Cosyne 2026

My running notes from Cosyne 2026. Most were hastily written during/immediately after live presentations, so likely contain errors reflecting my misunderstandings. Apologies to the presenters.

Posters by Topic

Experimental

Data Analysis

Theory/Modeling

Posters by Session

Poster Session 1

1-007 Convergent motifs of early olfactory processing
  • Vertebrate and invertebrate olfactory systems are similar but not derived from a common ancestor, which implies convergent evolution.
  • Question: Can the structure of the olfactory system be determined normatively?
  • Model
    • Sparse monomolecular concentration vectors $\cc$
    • Sensed by receptors with affinities $\WW$.
    • Expressed, not necessarily one-to-one, in ORNs according to $\bE$.
    • Hill transformed, plus isotropic additive noise, to give ORN responses.$$ \rr = \varphi(\bE \WW \cc) + \eta.$$
    • ORN responses converge, not necessarily one to one, on glomeruli
  • Learned parameters of the Hill function, and mappings of ORs to ORNs, to maximize mutual information between odours and responses.
    • I think there was an energetic cost somewhere as well.
  • Found that the optimal mapping from ORs to ORNs was one-to-one ($\bE \approx \II$).
  • Downstream: also found a one-to-one mapping from ORNs to glomeruli.
  • Discussion: These results depend strongly on the assumed structure of the noise.
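  • A minimal sketch of the forward model above (hypothetical sizes, a generic Hill nonlinearity, and random $\WW$, $\bE$; the poster's exact parameterization and energetic cost are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

n_odors, n_receptors, n_orns = 50, 30, 40   # hypothetical sizes

# Sparse monomolecular concentration vector c
c = np.zeros(n_odors)
c[rng.choice(n_odors, size=3, replace=False)] = rng.lognormal(size=3)

# Receptor affinities W and OR-to-ORN expression map E (not necessarily 1-to-1)
W = rng.lognormal(mean=-1.0, size=(n_receptors, n_odors))
E = rng.dirichlet(np.ones(n_receptors), size=n_orns)     # each ORN mixes ORs

def hill(x, k=1.0, n=2.0):
    """Generic Hill nonlinearity; the poster's exact form is an assumption."""
    xn = np.maximum(x, 0.0) ** n
    return xn / (k ** n + xn)

# ORN responses: r = phi(E W c) + isotropic Gaussian noise
sigma = 0.05
r = hill(E @ W @ c) + sigma * rng.standard_normal(n_orns)
```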
1-063 Sparse glomerular representations explain odor discrimination in complex, concentration-varying mixtures
  • 2AFC task with one target monomolecular odour among up to 16 distractor odours.
  • Performance depended on concentration of target odour, but not number of distractor odours.
  • Recapitulated in model with sparse, highly tuned receptors.
    • These only sense the target odour, unresponsive to distractors, hence affected by target SNR but not number of distractors.
1-140 A generative diffusion model reveals V2’s representation of natural images
  • Modeled the manifold of images by training a diffusion model to produce images.
  • Generated two kinds of “noise”:
    • On-manifold: Gaussian perturbations to the diffusion model input.
    • Off-manifold: Gaussian perturbations to the diffusion model output.
  • Measured responses of Macaque V2.
    • On-manifold noise produced higher variability in the responses.
      • Such “noise” produces valid, different images, so the responses will reflect the new images generated.
      • Cosine similarity of these responses decreased with noise level.
    • Off-manifold noise produced similar responses across all noise levels.
      • This noise produces corrupted versions of the same images, so lack of variability perhaps reflects mapping of all of these to the same image.
      • Cosine similarity was similar across noise levels.
  • Comment: It’s like V2 is inverting the diffusion model.
    • On-manifold noise produces variable activity, as the mapping back to the image-generating latents varies.
    • Off-manifold noise produces constant responses, as the mapping back is to the same image.
1-089 Interpretable time-series analysis with Gumbel dynamics
  • Context was dynamics of discrete latent state generating observations.
  • We track the probability of being in each state.
  • We want state occupancies to be nearly one-hot, for interpretability.
    • We want the system to be mostly in one state, rather than distributed across states.
  • Standard approach uses softmax, which has two problems:
    • If we work with the probabilities, then these can stay nearly uniform.
      • The downstream decoder/observation model can expand the small fluctuations from uniformity as needed.
      • Hard to interpret, as state occupancy will be distributed.
    • Alternatively, we can sample from the softmax.
      • Fixes the interpretability issue, as exactly one state will be occupied.
      • Inefficient, as gradients etc. will only operate on the active state / sample.
  • Solution: Gumbel-Softmax (see the sketch below): $$ z \sim \text{GS}(\pi, \tau) \iff z = \text{softmax}\left({\pi + \eta \over \tau}\right), \quad \eta \sim \text{Gumbel}(0,1).$$
    • Pushes the softmax probabilities so state-occupancy is nearly one-hot.
      • Helps interpretability
      • More efficient than purely one-hot sampling, because states with small probabilities are also updated.
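  • A minimal sketch of drawing a Gumbel-Softmax sample as above (temperature and logits are illustrative; straight-through variants are omitted):

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Draw a near-one-hot relaxed sample z ~ GS(logits, tau)."""
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via inverse transform of uniform samples
    u = rng.uniform(low=1e-12, high=1.0, size=np.shape(logits))
    gumbel = -np.log(-np.log(u))
    x = (np.asarray(logits) + gumbel) / tau
    x = x - x.max()                   # numerical stability
    z = np.exp(x)
    return z / z.sum()

# Low temperature -> nearly one-hot state occupancy, but every state keeps
# a small, differentiable probability (unlike a hard argmax sample).
z = gumbel_softmax(np.log([0.2, 0.5, 0.3]), tau=0.2)
print(z.round(3))
```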
1-065 Continuous Multinomial Logistic Regression for Neural Decoding
  • Standard logistic regression: $$p(y_k|\xx) \propto \exp(-\ww_k^T \xx).$$
  • Weights are fixed per class.
  • Can be extended to a temporal dimension by treating each time bin independently.
    • This ignores temporal structure.
  • Idea: allow $\ww_k$ to vary smoothly in time by giving it a Gaussian process prior : $$ \ww_k \sim \text{GP}(\bzero, \lambda).$$
    • I think the different states were also linked, so that their weight vectors evolved together.
  • Take the posterior mean $\overline{\ww}_k(t)$ to compute output probabilities: $$ p(\yy_k | \xx(t)) \propto \exp(-\overline{\ww}_k(t)^T \xx(t)).$$
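  • A minimal sketch of such a decoder: per-class weights drawn from a squared-exponential GP prior over time, then pushed through a softmax; the poster's actual inference over $\ww_k(t)$ is not shown, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, K = 100, 5, 3                      # time bins, features, classes
t = np.linspace(0, 1, T)

# Squared-exponential GP prior on each weight's time course
lengthscale = 0.2
Kt = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / lengthscale ** 2)
L = np.linalg.cholesky(Kt + 1e-8 * np.eye(T))

# w[k, d, :] varies smoothly in time instead of being fixed per class
w = np.einsum('ts,kds->kdt', L, rng.standard_normal((K, D, T)))

x = rng.standard_normal((T, D))          # features at each time bin
logits = np.einsum('kdt,td->tk', w, x)   # class scores per time bin
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)        # p[t, k] = p(y=k | x(t))
```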

Poster Session 2

2-136 Statistical theory for inferring population geometry in high-dimensional neural data
  • Used RMT to investigate how covariance estimation varies with the number of neurons $N$ and trials $T$.
  • First result was on PCA dimension (participation ratio), showing how subsampling $N$ neurons and $T$ trials to $M$ neurons and $P$ trials affects estimation, by relating the PCA dimensions at the two configurations.
  • Next result was about how estimation error relative to the true covariance, $\|\hat C - C\|_F^2$, varies as $D_N/T$.
    • $D_N$ is the true PCA dimensionality for $N$ neurons in the infinite trials limit.
  • What if the true dimensionality is not known? Replication error, $\|\hat C_1 - \hat C_2\|_F^2$, varies as $2 \hat{D}_{N,T}/T$.
    • $\hat{D}_{N,T}$ is the dimensionality of the sample.
  • Estimating eigenvectors and eigenvalues:
    • Low rank signal model: $C = U D U^\top + \sigma^2 I$.
    • SNR for the $k$'th dimension: $\text{SNR}_k = d_k/(N \sigma^2)$.
    • There was some critical SNR threshold above which eigenvalues and eigenvectors could be estimated.
    • Effective $R^2$ is $R^2 = {\sum_k O_k \text{SNR}_k - K/N \over \sum_k \text{SNR}_k - 1}.$
      • $O_k$ is the alignment of the recovered eigenvector with the true eigenvector.
      • Recovering each mode contributes SNR, weighted by the alignment, but also brings noise (contributing $N^{-1}$ per mode).
      • Best rank to recover is the $K$ at which the numerator no longer increases, where adding one more mode brings more noise than signal.
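  • A minimal sketch of the empirical quantities involved, the participation-ratio dimensionality and the split-half replication error, without the RMT corrections from the poster:

```python
import numpy as np

def participation_ratio(C):
    """PCA dimensionality D = (sum_i lambda_i)^2 / sum_i lambda_i^2."""
    lam = np.linalg.eigvalsh(C)
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(2)
N, T = 100, 400                              # neurons, trials (illustrative)
X = rng.standard_normal((T, N)) @ np.diag(np.linspace(2.0, 0.1, N)) ** 0.5

# Split trials in half and compare the two sample covariances
X1, X2 = X[: T // 2], X[T // 2 :]
C1 = np.cov(X1, rowvar=False)
C2 = np.cov(X2, rowvar=False)

D_hat = participation_ratio(np.cov(X, rowvar=False))
replication_error = np.linalg.norm(C1 - C2, 'fro') ** 2
print(f"estimated dimensionality {D_hat:.1f}, replication error {replication_error:.1f}")
```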
2-157 Continuous partitioning of neuronal variability
  • Classic models of neural responses explain them as homogeneous Poisson processes whose rates are the product of a stimulus dependent tuning $f_s$ and a trial-specific gain $g_k$: $$y \sim \text{Poisson}(f_s g_k).$$
  • Innovation to capture temporal variability: the Continuous Modulated Poisson Model.
  • Tunings and gains as Gaussian processes: $$ \log g(t) \sim \text{GP}(0, K_g), \quad \log f_s(t) \sim \text{GP}(0, K_f(s)).$$
  • Gains modelled as $$K_g = \rho_g \exp\left(-{|t_1 - t_2|^q \over \ell_g}\right).$$
  • Can then e.g. monitor how parameters like optimal temporal correlation lengths $\ell_g$ and heavy-tailed-ness $q$ vary across brain areas.
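  • A minimal sketch of the generative side of such a model: a log-gain drawn from a GP with the kernel above modulating a fixed tuning curve (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 10, 200)                 # time (s)
dt = t[1] - t[0]

# Gain kernel K_g(t1, t2) = rho * exp(-|t1 - t2|^q / ell)
rho, ell, q = 0.5, 2.0, 1.5
Kg = rho * np.exp(-np.abs(t[:, None] - t[None, :]) ** q / ell)

# Slowly varying multiplicative gain g(t) = exp(GP sample)
log_g = np.linalg.cholesky(Kg + 1e-8 * np.eye(len(t))) @ rng.standard_normal(len(t))
g = np.exp(log_g)

f = 20.0 * np.exp(-0.5 * (t - 5.0) ** 2)    # stimulus-dependent tuning f_s(t), in Hz
counts = rng.poisson(f * g * dt)            # y ~ Poisson(f * g * dt) per bin
```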
2-092 Uncovering statistical structure in large-scale neural activity with restricted Boltzmann machines
  • Main idea: fit RBMs to neural data.
  • Marginalize out hidden states to get effective interactions between units at all orders.
    • Advantage over e.g. Schneidman’s approach by making it easier to estimate higher order interactions.
      • But surely subsampling problems must be the same? i.e. estimates of high-order interactions will be noisy due to lack of observations.
  • Compute index of higher order interactions between pairs of units.
    • Can indicate either missing units in the recording, or true higher order interactions (e.g. via glia).
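  • A minimal sketch of the marginalization step: integrating out the hidden units of a binary RBM gives an effective energy over visible units with interactions at all orders, and an effective pairwise coupling can be read off by finite differences (toy parameters, my own sign conventions):

```python
import numpy as np

rng = np.random.default_rng(4)
n_vis, n_hid = 6, 4
W = 0.5 * rng.standard_normal((n_vis, n_hid))   # visible-hidden weights
a = 0.1 * rng.standard_normal(n_vis)            # visible biases
b = 0.1 * rng.standard_normal(n_hid)            # hidden biases

def free_energy(v):
    """-log sum_h exp(-E(v, h)); contains all orders of visible interactions."""
    return -v @ a - np.sum(np.logaddexp(0.0, b + v @ W))

def effective_pair_coupling(i, j):
    """Finite-difference estimate of the effective pairwise interaction."""
    v = np.zeros(n_vis)
    F00 = free_energy(v)
    v[i] = 1; F10 = free_energy(v)
    v[i], v[j] = 0, 1; F01 = free_energy(v)
    v[i] = 1; F11 = free_energy(v)
    return -(F11 - F10 - F01 + F00)

print(effective_pair_coupling(0, 1))
```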
2-218 A unified theory of feature learning in RNNs and DNNs
  • Compared the solutions found by RNNs and DNNs on a regression task.
  • RNNs can be viewed as DNNs using temporal unrolling.
  • Key difference is that in the unrolled RNN, the layer weights are shared.
  • Weight sharing imposes an inductive bias which can make RNNs more sample efficient e.g. when learning temporal sequences.
2-152 Dynamical archetype analysis: Autonomous computation
  • Neural systems often have different geometry, but the same topology
    • Converge to the same pattern of fixed points, repelled by the same set of repellers etc.
  • This is called topological conjugacy: Two systems are topologically conjugate if there is a homeomorphism $\Phi$ that transforms one set of dynamics into the other.
  • The set of all homeomorphisms is too large, so parameterize using a Neural ODE.
  • Compute the distance between two sets of dynamics $f$, $g$ by minimizing a combination of a trajectory mismatch $d_\text{traj}$, and homeomorphism complexity $d_\text{cxty}$.
  • Trajectory loss: Given the trajectory at time $t$ starting at $x$ under the $f$ dynamics, $\phi_f^t(x)$, and similarly $\phi_g^t(x)$: $$ d_\text{traj}(\Phi; f, g) = \int_t \| \phi_f^t(x) - \underbrace{\Phi(\phi_g^t ( \Phi^{-1}(x)))}_{\text{$g$ trajectory in $f$ space}}\| \, dt$$
  • Complexity loss: $$ d_\text{cxty}(\Phi) = \int \| \nabla \Phi(x) - I \| \, dx,$$ evaluated along the $f$ trajectory (?).
  • Measure the distance of a given neural system to a fixed set of archetype dynamics, e.g. line attractor, ring attractor.
  • Showed that they could correctly map dynamics to their archetypes when the ground truth was known.
    • E.g. neural ring attractor dynamics (fly system? head direction system?) mapped to ring attractor archetype.
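  • A minimal sketch of evaluating $d_\text{traj}$ for a fixed candidate homeomorphism, here a linear shear between two planar rotations; the Neural-ODE parameterization and the complexity term are omitted:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Archetype dynamics f and an observed system g that is a sheared copy of it
F = np.array([[0.0, -1.0], [1.0, 0.0]])          # rotation (ring-like archetype)
P = np.array([[1.0, 0.5], [0.0, 1.0]])           # the "ground-truth" homeomorphism
G = np.linalg.inv(P) @ F @ P                     # conjugate dynamics

f = lambda t, x: F @ x
g = lambda t, y: G @ y
Phi = lambda y: P @ y                             # candidate homeomorphism
Phi_inv = lambda x: np.linalg.solve(P, x)

x0 = np.array([1.0, 0.0])
ts = np.linspace(0, 10, 400)

traj_f = solve_ivp(f, (0, 10), x0, t_eval=ts, rtol=1e-8).y.T
traj_g = solve_ivp(g, (0, 10), Phi_inv(x0), t_eval=ts, rtol=1e-8).y.T
mapped = traj_g @ P.T                             # Phi(phi_g^t(Phi^{-1}(x0)))

# Riemann-sum approximation of the integral over time
d_traj = np.sum(np.linalg.norm(traj_f - mapped, axis=1)) * (ts[1] - ts[0])
print(f"trajectory mismatch: {d_traj:.2e}")       # ~0: g matches the archetype
```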
2-208 Plastic Circuits for Context-Dependent Decisions
  • Investigated the effect of Short-Term Synaptic Plasticity (STSP) in an RNN performing the Mante 2013 task (output based on colour or motion signal, depending on context).
  • STSP: Strengths are determined as the product of utilization and activity, $w_{\text{eff},ij}(t) = w_{ij}\, u_j(t)\, x_j(t)$.
    • This is meant to model e.g. vesicle pool depletion etc.
  • A network using STSP could perform the task, one with Hebbian plasticity could not (?).
  • Found that in STSP, context information is stored in neural activity, not the synapses.
  • A network with fixed weights that wants to implement the same computation has to do so through its nonlinear activations, $A(t) x(t) \to W \phi(x(t))$, which presumably could get complicated or intractable.
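  • A minimal sketch of an STSP update of this kind, using standard Tsodyks-Markram-style dynamics for utilization $u$ and resources $x$; the poster's exact formulation is an assumption here:

```python
import numpy as np

rng = np.random.default_rng(5)
T, dt = 500, 1e-3                   # steps, step size (s)
tau_f, tau_d, U = 0.2, 0.5, 0.3     # facilitation/depression constants (assumed)
w = 1.0                             # static synaptic weight

u, x = U, 1.0
rate = 20.0                          # presynaptic Poisson rate (Hz)
w_eff = np.zeros(T)

for t in range(T):
    spike = rng.random() < rate * dt
    # Recovery toward baseline between spikes
    u += dt * (U - u) / tau_f
    x += dt * (1.0 - x) / tau_d
    if spike:
        u += U * (1.0 - u)           # facilitation: utilization jumps up
        x -= u * x                   # depression: resources are consumed
    w_eff[t] = w * u * x             # effective weight w_eff = w * u(t) * x(t)
```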
2-173 Spatiotemporal Dynamics in Recurrent Neural Networks as Flow Invariance
  • RNNs are often used to learn stimulus dynamics.
  • It’s natural to want equivariant hidden representations: flow in the stimulus results in corresponding flow in the latents.
  • Incorporating such equivariance into the RNN dynamics can dramatically speed up learning.
2-045 A non-local variational framework for optimal neural representations
  • Tuning curves are often defined by maximizing Fisher information.
  • Fisher information is a local measure – doesn’t capture e.g. errors due to jumps in the inferred values.
  • Mutual information is global, but can be hard to compute.
  • Solution: a non-local loss on tuning curves comparing all pairs of input stimuli: $$ L[f] = {1 \over (2 \pi)^2} \int_\theta \int_{\theta'} \ell(f(\theta), f(\theta'))\, d\theta\, d\theta',$$ where $\ell$ is the misclassification error.
  • How to solve this?
    • $f(\theta) = p(x|\theta)$, the population response.
    • The population response space is a manifold with the Fisher-Rao metric.
    • The responses to two stimuli are two points in this space, separated by a geodesic distance $d$.
    • Classification error $\ell$ is approximately erfc of this distance.
    • The set of all responses (to circular stimuli) forms a closed curve in this space.
    • The optimal tuning curves that minimize the loss form a circle in the space of square-root firing rates.
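  • A minimal numeric sketch of evaluating such a non-local loss for a circular stimulus and a Poisson population: pairwise distances are taken in square-root firing-rate coordinates (where the Fisher-Rao metric is approximately Euclidean) and $\ell$ is approximated with an erfc; the constants and von Mises tuning curves are illustrative, not the poster's:

```python
import numpy as np
from scipy.special import erfc

n_stim, n_neur = 90, 12
theta = np.linspace(0, 2 * np.pi, n_stim, endpoint=False)
prefs = np.linspace(0, 2 * np.pi, n_neur, endpoint=False)

# Illustrative von Mises tuning curves f(theta): (n_stim, n_neur) firing rates
kappa, peak = 2.0, 30.0
f = peak * np.exp(kappa * (np.cos(theta[:, None] - prefs[None, :]) - 1.0))

# Geodesic (Fisher-Rao) distance for Poisson responses ~ Euclidean in sqrt-rates
sq = np.sqrt(f)
d = 2.0 * np.linalg.norm(sq[:, None, :] - sq[None, :, :], axis=-1)

# Pairwise misclassification error, approximated as erfc of the distance
ell = 0.5 * erfc(d / (2.0 * np.sqrt(2.0)))

L = ell.mean()            # average over all stimulus pairs (the double integral)
print(f"non-local loss L[f] = {L:.4f}")
```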

Poster Session 3

3-154 Estimating neural coding fidelity in high dimensions with limited samples
  • d’ measures discriminability, but its estimate can be biased when there are many more neurons than trials.
  • Used RMT to estimate d’ in high-dimensional setting and produce a less-biased estimator.
  • Key quantity: Signal aligned spectrum: $$ G_\rho(x) = \sum_{i=1}^N \left({v_i^T u \over \|u\|}\right)^2 1_{x > \lambda_i},$$ where $v_i$ are the noise directions in decreasing order of variance $\lambda_i$, and $u$ is the signal direction.
  • In those terms, $$ d' = \|u\|^2 \int {1 \over \lambda}\, dG(\lambda).$$
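  • A minimal sketch of the plug-in version of this quantity, without the poster's RMT bias correction; the integral against $G$ reduces to a weighted sum over noise eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 50, 200
noise = rng.standard_normal((T, N)) @ np.diag(np.linspace(1.5, 0.2, N)) ** 0.5
u = rng.standard_normal(N)                     # signal direction (e.g. mean difference)

C = np.cov(noise, rowvar=False)                # sample noise covariance
lam, V = np.linalg.eigh(C)                     # eigenvalues, columns of V are v_i

# G weights: squared alignment of each noise direction with the signal
weights = (V.T @ u) ** 2 / (u @ u)

# d' = ||u||^2 * integral (1/lambda) dG(lambda) = ||u||^2 * sum_i weights_i / lambda_i
d_prime = (u @ u) * np.sum(weights / lam)
print(f"plug-in d' = {d_prime:.2f}")
```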

3-138 Identifying interpretable latent factors within and across brain regions
  • Decompose temporal activity into sparse, orthogonal factors convolved with Gaussian filters, possibly with some delay.
    • Orthogonality and sparsity give interpretability.
    • Convolution with Gaussian filters is faster to fit than general Gaussian process.
3-070 Generalization and memorization in mouse olfactory learning
  • Trained mice to distinguish a variety of sixteen-component mixtures to test generalization vs memorization.
  • Mice can do both, biased towards simple rules when these exist.
3-130 Sensory prediction errors update predictive representations
  • RSC is the source of sensory predictions (not prediction errors) to V1.
3-225 Noise Correlations for Efficient Learning
  • For optimal discrimination we want noise correlations to be orthogonal to the discriminating dimensions.
  • Tested humans on a joint color-motion discrimination task, where the rule would occasionally flip.
  • Modelled this with shallow linear net mapping color and motion to the decision.
  • Observed that noise correlations in the model were parallel to the optimal discrimination direction.
    • This would produce sub-optimal accuracy, and indeed it does.
    • But they hypothesize that it helps find the discriminating directions.
      • I think this is putting the cart before the horse.
      • Classifiers will find the discriminating directions, and that in turn will affect the noise correlations.
3-005 State-dependent modulation of neocortical sensory processing
  • Imaged mouse S1 and PPC during two-alternative multi-sensory discrimination task.
  • Modelled brain activity using a 3-state GLM-HMM.
  • Produced one “engaged” state and two “disengaged” states.
  • In the engaged state:
    • Stronger representations in S1 and PPC-A
    • Stronger communication from S1 to PPC-A
    • Stronger bottom-up activation.
3-069 Generalized DSA: Comparing Neural Population Dynamics by Identifying Optimal Linearizing Embeddings
  • DSA has two steps:
    • Map nonlinear dynamics to high (infinite) dimensional linear dynamics using Koopman theory.
    • Find an orthonormal coordinate transformation that best lines up one set with another: $$ d_\text{DSA}(A,B) = \min_{C \in O(N)} \|A – C B C^T\|.$$
  • Generalized DSA:
    • Find eigenspectrum of dynamics
    • Measure distance between spectra using optimal transport.
  • Ostrow: Is faster / works better than DSA.
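  • A minimal sketch of the generalized-DSA comparison step for two already-linear dynamics matrices: eigenspectra compared with an assignment-based (optimal-transport-style) distance in the complex plane; the Koopman lifting and the poster's exact OT formulation are not reproduced:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def spectrum_distance(A, B):
    """Assignment distance between eigenspectra (exact OT for uniform weights)."""
    lam_a = np.linalg.eigvals(A)
    lam_b = np.linalg.eigvals(B)
    cost = np.abs(lam_a[:, None] - lam_b[None, :])   # pairwise distances in C
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

rng = np.random.default_rng(7)
A = rng.standard_normal((10, 10)) / np.sqrt(10)
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))
B = Q @ A @ Q.T                                      # same dynamics, rotated basis

print(spectrum_distance(A, B))                       # ~0: spectra coincide
```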
3-030 Identifying Neural Activity Manifolds through Non-reversibility Analysis
  • Neural dynamics are generally not reversible.
    • i.e. $p(x_t = a, x_{t+1} = b) \neq p(x_t = b, x_{t+1}=a)$
  • Noise dynamics are reversible.
  • Idea: Dimension reduce dynamics by finding projections that produce maximally non-reversible dynamics.
  • Let $X_k \in \RR^{N \times T}$ be the responses to condition $k$.
  • Compute the covariance of the vectorized responses $C = \EE\left[\vec{X_k^T}\, \vec{X_k^T}^T\right].$
  • Split this into a reversible and non-reversible part:
    • Let $\sigma(C)$ be the time-transposed covariances.
    • $C^+ = C + \sigma(C)$
    • $C^- = C - \sigma(C)$
  • Non-reversibility index: $$\xi = {\|C^-\|_F \over \|C^+\|_F}.$$
  • It is tough to maximize this ratio directly, so instead just maximize the numerator:
    • Find $U$ to maximize $\|C^-\|$ for the projected data $Y_k = U^T X_k$.
    • Non-reversible part has a simple expression: $$ \|C^-\|_F^2 = \sum_{k,k'} \tr{Y_k Y_{k'}^T}^2 - \tr{Y_k Y_k^T Y_{k'} Y_{k'}^T}.$$
    • Can be kernelized.
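  • A minimal sketch of a non-reversibility index in a simplified form (my own hedged variant, not the vectorized-covariance construction above): the antisymmetric part of the lag-one cross-covariance vanishes for reversible dynamics:

```python
import numpy as np

rng = np.random.default_rng(8)

# Rotational (hence irreversible) linear-Gaussian process x_{t+1} = A x_t + noise
A = np.array([[0.9, -0.3], [0.3, 0.9]])
T = 20000
x = np.zeros((T, 2))
for t in range(T - 1):
    x[t + 1] = A @ x[t] + 0.1 * rng.standard_normal(2)

# Lag-one cross-covariance; time reversal maps C1 -> C1^T
xc = x - x.mean(axis=0)
C1 = xc[:-1].T @ xc[1:] / (T - 1)

C_plus = C1 + C1.T          # reversible (symmetric) part
C_minus = C1 - C1.T         # non-reversible (antisymmetric) part

xi = np.linalg.norm(C_minus) / np.linalg.norm(C_plus)
print(f"non-reversibility index xi = {xi:.3f}")
```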
3-111 Distinguishing probabilistic from heuristic neural representations of uncertainty
  • When are neural representations truly probabilistic rather than just heuristic?
  • Found that truly probabilistic reps require a bottleneck that forces learning of sufficient statistics.
  • Otherwise can just memorize inputs.
    • Measurable by checking whether the inputs can be predicted from the hidden states.

3-046 Canonical cortical circuits: A unified sampling machine for static and dynamic inference
  • Found that the canonical microcircuit was a substrate for Hamiltonian dynamics and allowed fast inference.

Workshops 1

Eero Simoncelli

  • Efficient coding (Barlow): information transmission while minimizing redundancy.
  • First part of the talk: building circuit models that progressively reduce different kinds of redundancy, and how these match up to corresponding visual areas.
  • Second part of talk: accessing the image manifold through denoising
    • Supervised training of image denoisers
      • For a least-squares loss the denoiser reports the posterior mean: $$ y = x + z \mapsto \hat x(y) = \int x\, p(x|y)\, dx .$$
      • The trained system implicitly contains information about the prior over images. How to access this information?
      • Tweedie’s identity: $$\hat{x}(y) = y + \sigma^2 \nabla_y \log p(y).$$
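  • A minimal numeric check of Tweedie's identity in a fully analytic case: a 1D Gaussian prior with Gaussian noise, where both the posterior mean and the score of $p(y)$ are closed-form; in the denoiser setting the analytic score is replaced by the trained network:

```python
import numpy as np

mu, s2, sigma2 = 1.0, 4.0, 0.25      # prior mean/variance, noise variance
y = 2.3                              # an observed noisy value

# Posterior mean E[x | y] for the conjugate Gaussian model
posterior_mean = (y / sigma2 + mu / s2) / (1.0 / sigma2 + 1.0 / s2)

# Tweedie: x_hat(y) = y + sigma^2 * d/dy log p(y), with p(y) = N(mu, s2 + sigma2)
score = -(y - mu) / (s2 + sigma2)
tweedie = y + sigma2 * score

print(posterior_mean, tweedie)       # the two agree
```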

Ken Harris

Workshops 2

Questions

  • Can the OT metric on tuning curves be pulled back to a metric in the response space to allow direct comparison of datasets with different sizes?
  • Non-reversibility analysis looks for an orthogonal projection in neural space. Is there also an optimal time scale, a projection along the time direction, to compute non-reversibility over?
  • Ken Harris mentioned the trivial neural code for numbers, where successive neurons code for successive significant digits.
    • This is an intensive code; information per dimension decays to zero in the large-$N$ limit.
    • What is a corresponding extensive code, that distributes the information evenly among neurons?
      • That is also easy to decode?
      • Does the naive encoding correspond to axis-aligned coordinates?
        • And then a distributed code could be a rotation?
  • Matthew Chalk’s talk
    • He defined a multi-resolution information metric by looking at how Fisher information changed with noise corruption of different magnitudes.
    • The eigenvectors of the metric give the most informative directions for each stimulus.
    • Can the multi-resolution information metric be derived from the locally most informative directions?
