NeurIPS 2025 Day 1

My notes on Day 1 of NeurIPS 2025. I wanted to wait until they were more complete, but it’s three months later now and they’re useful as is!

Explainable AI

Attribution

  • System output is a function of:
    • Training data, which makes the system sensitive to
    • Input features, that are transformed by
    • Model components to produce an output
  • Approaches for attributing outputs to features
    • Perturbation based approaches
      • Compare system output with a feature present versus with it removed
        • Game-theoretic approach
          • Compute marginal contribution of units
          • Combine these into Shapley metrics
      • Produces saliency maps
    • Gradient based approaches
      • Partial derivatives of the output with respect to input features
    • Linear approximations
      • Approximate model as locally linear: $f(x) \approx w^T x + b.$
        • Often enough to replace $x$ with binary indicator of feature presence.
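The game-theoretic perturbation approach above can be sketched with exact Shapley values on a toy model. Everything here is illustrative, not any particular library’s API: `shapley_values` enumerates feature subsets, absent features are replaced with a baseline value.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values: marginal contributions of each feature,
    averaged over all subsets with the usual Shapley weights.
    Absent features are set to a baseline value (toy assumption)."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        phi = 0.0
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += w * (value(set(S) | {j}) - value(set(S)))
        phis.append(phi)
    return phis

# Toy linear model: for additive models the Shapley value of feature j
# is exactly w_j * (x_j - baseline_j).
f = lambda z: 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]
print(shapley_values(f, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))  # ≈ [2.0, 3.0, -1.0]
```

The efficiency property of Shapley values holds here: the attributions sum to $f(x) - f(\text{baseline})$.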
  • Data attribution
    • Perturbation-based: e.g. leave-one-out: $f(x) - f^{-j}(x)$
      • Game theoretic metric: Data Shapley
    • Gradient-based
      • Loss gradient overlap
      • Influence functions: how much do parameters change when a specific training example is re-weighted: $\theta_{\epsilon, x_j} = \text{argmin}_\theta {1 \over n} \sum_{i=1}^n \ell(x_i, \theta) + \epsilon\, \ell(x_j, \theta)$ (removing the example corresponds to $\epsilon = -1/n$)
    • Linear approximations
      • Skip training, directly predict model output with a linear model.
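The leave-one-out score above can be made concrete with a deliberately trivial "model" that just predicts the training mean; `fit_mean` and `leave_one_out_scores` are made-up names for this sketch.

```python
def fit_mean(ys):
    # Trivial stand-in model: "training" is computing the mean.
    return sum(ys) / len(ys)

def leave_one_out_scores(ys):
    """LOO data attribution: f(x) - f^{-j}(x) for each training point j,
    i.e. how much the prediction changes when point j is dropped."""
    full = fit_mean(ys)
    scores = []
    for j in range(len(ys)):
        rest = ys[:j] + ys[j + 1:]
        scores.append(full - fit_mean(rest))
    return scores

ys = [1.0, 2.0, 9.0]
print(leave_one_out_scores(ys))  # → [-1.5, -1.0, 2.5]
```

The outlier 9.0 gets the largest score in magnitude, as expected: it moves the "model" the most.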
  • Component attribution
    • Component: Neuron, subnetwork, etc.
    • Perturbation based: Causal Mediation Analysis
      • Output with and without a specific component.
    • Game Theoretic: Neural Shapley
    • Causal tracing:
      • First, get the normal system output, $x \to f(x)$: What is the Capital of France?
      • Then, get the output for a perturbed input, $x' \to f(x')$: What is the Capital of ____?
      • Finally, bring in the component activation from the unperturbed case: $x' \to f_{k^*}(x')$.
      • Perturb the identified component to obtain target behaviour.
    • Gradient-based component attribution:
      • $f_{k^*}(x') - f(x') \approx \nabla_{c_k} f(x')\,[c_k(x) - c_k(x')]$?
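The causal-tracing recipe above can be mimicked with a toy two-path model, where the component activation cached from the clean run is patched into the corrupted run. `component`, `model`, and the `patched_c` hook are made-up stand-ins, not a real interpretability API.

```python
def component(x):
    # Stand-in for an internal component c_k of the network.
    return 2.0 * x

def model(x, patched_c=None):
    """Tiny stand-in model: a component path plus a residual path.
    Passing patched_c overrides the component activation, as in
    causal tracing / activation patching."""
    c = component(x) if patched_c is None else patched_c
    return c + 0.5 * x

x_clean, x_corrupt = 1.0, 0.0
clean_c = component(x_clean)                     # cache activation, clean run
f_corrupt = model(x_corrupt)                     # corrupted baseline output
f_patched = model(x_corrupt, patched_c=clean_c)  # restore the clean activation
print(f_corrupt, f_patched)
```

Here patching the single component recovers most of the clean output (2.0 out of 2.5), which is the signal causal tracing looks for when localizing behaviour to components.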

Building Inherently Explainable AI systems

  • Explainability as communication channel between the model and the human.
  • Inherently explainable architectures:
    • Replacing neural network layers to force explicit representations of human-understandable concepts.
      • Modifying loss functions accordingly
    • Claimed not to cause performance loss.
    • Transformers to Generalized Additive Models
      • Backpack Language Models
  • Inherently explainable training:
    • Gradient-based and perturbation-based approaches can produce different attributions than masking-based approaches.
    • One solution: change training to reflect the post-hoc explainability paradigm.
    • E.g. training on randomly masked data.
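The random-masking idea can be sketched as a simple augmentation: zero out each feature with probability `p` during training, so that masking-based post-hoc explanations later query the model in-distribution. The function name, mask value, and defaults are all illustrative assumptions.

```python
import random

def mask_features(x, p=0.3, mask_value=0.0, rng=None):
    """Randomly mask features of a training example with probability p.
    Sketch of masking-aware training, not any library's API."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    return [mask_value if rng.random() < p else v for v in x]

x = [0.5, -1.2, 3.3, 0.7]
print(mask_features(x))
```

Each returned entry is either the original feature value or the mask value, so the model sees the same kind of inputs a masking-based explainer will later feed it.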

Concepts and References

  • LogitLens
  • Shapley / Data Shapley / Neural Shapley
  • Causal Mediation Analysis
  • Kumar et al. 2022, probing classifiers will rely on spurious features.
  • Backpack Language Models
  • Generalized Additive Models
  • Neural Additive Models

Geometric Deep Learning

  • Groups capture symmetries
  • Invariant neural networks:
    • Output is unchanged by a transformed input: $f(g.x) = f(x)$
  • Equivariance:
    • Output transforms together with the input: $f(g.x) = g.f(x)$
  • Why do we care?
    • Learning efficiency
    • Noise robustness
    • Can affect loss-landscapes, learning
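Invariance can be checked concretely with a DeepSet-style toy: sum-pooling the per-element encodings makes the network permutation-invariant by construction, $f(g.x) = f(x)$ for any permutation $g$. Here `phi` and `rho` are arbitrary stand-in networks.

```python
def phi(v):
    # Per-element encoder (stand-in for a learned network).
    return (v, v * v)

def rho(s):
    # Readout on the pooled representation (stand-in network).
    return s[0] + 0.1 * s[1]

def deepset(xs):
    """Sum-pooled set network: rho(sum_i phi(x_i)).
    Summation commutes with permutations, hence invariance."""
    pooled = [sum(c) for c in zip(*(phi(v) for v in xs))]
    return rho(pooled)

xs = [1.0, 2.0, 3.0]
print(deepset(xs) == deepset(list(reversed(xs))))  # True: f(g.x) = f(x)
```

The invariance here is exact and architectural, unlike data augmentation, which only encourages it.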

Achieving equivariance

  • Modifying off-the-shelf (non-equivariant) networks
    • Canonicalization
      • E.g. alignment, registration
      • Problem: not always continuous: small input changes produce large output changes.
    • Group averaging
      • Average the output over all group-transformed inputs
      • Nice mathematical properties, including continuity
      • Problem: Groups can be huge, enumerating all group elements can be hard.
        • Solution: Don’t need to use full group, group generators suffice.
          • Problem: Group generators can be hard to find.
            • Solution: A not too large random subset of group elements can approximately capture the symmetries.
  • Building equivariant-networks
    • Data augmentation: train on group transformed data.
      • Problem: No guarantee to exactly capture the symmetry.
    • Weight-sharing: symmetries are reflected in a fixed weight structure.
      • E.g. CNNs capturing translation invariance.
      • Problem: Might not be sufficiently expressive.
    • From invariant theory:
      • Build networks from specific low-order polynomial functions of the input.
        • E.g. to make invariant to rotations, use dot products.
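Group averaging from the list above, sketched for the cyclic-shift group acting on a short vector: averaging an arbitrary non-invariant `f` over the whole orbit yields a function that is exactly invariant, because every shift of the input produces the same orbit. All names here are illustrative stand-ins.

```python
def f(xs):
    # Arbitrary, deliberately non-invariant base "network".
    return sum(w * v for w, v in zip((1.0, 2.0, 3.0), xs))

def cyclic_shifts(xs):
    # The group: all cyclic shifts of the input.
    return [xs[i:] + xs[:i] for i in range(len(xs))]

def group_average(f, xs):
    """Symmetrize f by averaging over all group-transformed inputs."""
    orbit = cyclic_shifts(xs)
    return sum(f(g) for g in orbit) / len(orbit)

xs = [4.0, 5.0, 6.0]
shifted = xs[1:] + xs[:1]
print(group_average(f, xs) == group_average(f, shifted))  # True: invariant
```

For large groups this enumeration is the expensive part, which is exactly why the notes above suggest generators or random subsets of group elements.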

Concepts and References

  • Equivariance
  • SignNet
  • DeepSet
  • Villar “Machine Learning and Invariant Theory”
  • “Symmetries in Neural Network Parameter Space.”
  • Weight space learning “Neural Nets as Data”
  • Model merging
  • Symmetry-invariant optimization
  • Neural Radiance Fields
  • Implicit Neural Representations
  • Any-dimensional models
  • “Representational Stability” B Farb ICM 2014
  • Manifold hypothesis mitigates the curse of dimensionality if curvature is bounded below by a positive constant
    • E.g. avoid hyperbolas, space-filling shapes.

