My running notes from Cosyne 2026. Most were hastily written during/immediately after live presentations, so likely contain errors reflecting my misunderstandings. Apologies to the presenters.
Posters by Topic
Experimental
- 1-007 Convergent motifs of early olfactory processing
- 1-063 Sparse glomerular representations explain odor discrimination in complex, concentration-varying mixtures
- 3-070 Generalization and memorization in mouse olfactory learning
- 3-130 Sensory prediction errors update predictive representations
- 1-140 A generative diffusion model reveals V2’s representation of natural images
- 3-005 State-dependent modulation of neocortical sensory processing
- 3-225 Noise Correlations for Efficient Learning
Data Analysis
- 2-092 Uncovering statistical structure in large-scale neural activity with restricted Boltzmann machines
- 1-089 Interpretable time-series analysis with Gumbel dynamics
- 3-138 Identifying interpretable latent factors within and across brain regions
- 3-030 Identifying Neural Activity Manifolds through Non-reversibility Analysis
- 1-065 Continuous Multinomial Logistic Regression for Neural Decoding
- 2-157 Continuous partitioning of neuronal variability
- 2-152 Dynamical archetype analysis: Autonomous computation
- 3-069 Generalized DSA: Comparing Neural Population Dynamics by Identifying Optimal Linearizing Embeddings
Theory/Modeling
- 2-136 Statistical theory for inferring population geometry in high-dimensional neural data
- 3-154 Estimating neural coding fidelity in high dimensions with limited samples
- 2-045 A non-local variational framework for optimal neural representations
- 2-218 A unified theory of feature learning in RNNs and DNNs
- 2-208 Plastic Circuits for Context-Dependent Decisions
- 2-173 Spatiotemporal Dynamics in Recurrent Neural Networks as Flow Invariance
- 3-111 Distinguishing probabilistic from heuristic neural representations of uncertainty
- 3-046 Canonical cortical circuits: A unified sampling machine for static and dynamic inference
Posters by Session
Poster Session 1
1-007 Convergent motifs of early olfactory processing
- Vertebrate and invertebrate olfactory systems are similar, but are not derived from a common ancestor, which implies convergent evolution.
- Question: Can the structure of the olfactory system be determined normatively?
- Model
- Sparse monomolecular concentration vectors $\cc$
- Sensed by receptors with affinities $\WW$.
- Expressed, not necessarily one-to-one, in ORNs according to $\bE$.
- Hill transformed, plus isotropic additive noise, to give ORN responses: $$ \rr = \varphi(\bE \WW \cc) + \eta.$$
- ORN responses converge, not necessarily one to one, on glomeruli
- Learned parameters of the Hill function, and mappings of ORs to ORNs, to maximize mutual information between odours and responses.
- I think there was an energetic cost somewhere as well.
- Found the 1-1 mapping from ORs to ORNs ($\bE \approx \II$).
- Downstream: found 1-1 mapping from ORNs to glomeruli.
- Discussion: These results depend strongly on the assumed structure of the noise.
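To make the forward model concrete, here is a minimal numpy sketch of $\rr = \varphi(\bE \WW \cc) + \eta$ with a Hill nonlinearity. All dimensions, parameter values, and the noise scale are my own placeholders, not the poster's.

```python
import numpy as np

# Minimal sketch of the assumed forward model r = phi(E W c) + eta.
rng = np.random.default_rng(0)

n_odorants, n_receptors, n_orns = 50, 20, 20
c = np.zeros(n_odorants)                              # sparse monomolecular concentration vector
active = rng.choice(n_odorants, size=3, replace=False)
c[active] = rng.lognormal(size=3)

W = rng.lognormal(sigma=1.0, size=(n_receptors, n_odorants))  # receptor affinities
E = np.eye(n_orns, n_receptors)   # OR -> ORN expression; 1-to-1 is the optimum they report

def hill(a, k=1.0, n=2.0):
    """Hill nonlinearity mapping receptor drive to ORN activation."""
    return a ** n / (k ** n + a ** n)

r = hill(E @ W @ c) + 0.05 * rng.standard_normal(n_orns)   # ORN responses, isotropic noise
```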
1-063 Sparse glomerular representations explain odor discrimination in complex, concentration-varying mixtures
- 2AFC task with one target monomolecular odour among up to 16 distractor odours.
- Performance depended on concentration of target odour, but not number of distractor odours.
- Recapitulated in model with sparse, highly tuned receptors.
- These only sense the target odour and are unresponsive to distractors, so performance depends on target SNR but not on the number of distractors.
1-140 A generative diffusion model reveals V2’s representation of natural images
- Modeled the manifold of images by training a diffusion model to produce images.
- Generated two kinds of “noise”:
- On-manifold: Gaussian perturbations to the diffusion model input.
- Off-manifold: Gaussian perturbations to the diffusion model output.
- Measured responses of Macaque V2.
- On-manifold noise produced higher variability in the responses.
- Such “noise” produces valid, different images, so the responses will reflect the new images generated.
- Cosine similarity of these responses decreased with noise level.
- Off-manifold noise produced similar responses across all noise levels.
- This noise produces corrupted versions of the same images, so lack of variability perhaps reflects mapping of all of these to the same image.
- Cosine similarity was similar across noise levels.
- Comment: It’s like V2 is inverting the diffusion model.
- On-manifold noise produces variable activity because the mapping back to the image-generating latents varies.
- Off-manifold noise produces constant responses, as the mapping back is to the same image.
1-089 Interpretable time-series analysis with Gumbel dynamics
- The context was a discrete latent state whose dynamics generate the observations.
- We track the probability of being in each state.
- We want state occupancies to be nearly one-hot, for interpretability.
- We want the system to be mostly in one state, rather than distributed across states.
- Standard approach uses softmax, which has two problems:
- If we work with the probabilities, then these can stay nearly uniform.
- The downstream decoder/observation model can expand the small fluctuations from uniformity as needed.
- Hard to interpret, as state occupancy will be distributed.
- Alternatively, we can sample from the softmax.
- Fixes the interpretability issue as exactly one state will be occupied.
- Inefficient, as gradients etc. will only operate on that active state / sample.
- Solution: Gumbel-Softmax (sketch below): $$ z \sim \text{GS}(\pi, \tau) \iff z \sim \text{softmax}\left({\pi + \eta \over \tau}\right), \quad \eta \sim \text{Gumbel}(0,1).$$
- Pushes the softmax probabilities so state-occupancy is nearly one-hot.
- Helps interpretability
- More efficient than purely one-hot because states with small probabilities are also updated.
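A minimal sketch of the Gumbel-Softmax sampler, treating $\pi$ as logits (my own generic implementation; the temperature value is illustrative):

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=None):
    """Sample a near-one-hot state-occupancy vector via the Gumbel-Softmax trick.

    logits : unnormalized log-probabilities over discrete states.
    tau    : temperature; smaller values push samples closer to one-hot.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via the inverse CDF of uniform samples.
    u = rng.uniform(low=1e-12, high=1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))
    y = (np.asarray(logits) + g) / tau
    y = y - y.max()                 # numerically stable softmax
    e = np.exp(y)
    return e / e.sum()

# Example: three latent states; most mass lands on one state, but all stay differentiable.
print(gumbel_softmax(np.log([0.2, 0.5, 0.3]), tau=0.3))
```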
1-065 Continuous Multinomial Logistic Regression for Neural Decoding
- Standard logistic regression: $$p(y_k|\xx) \propto \exp(-\ww_k^T \xx).$$
- Weights are fixed per class.
- Can be extended to a temporal dimension by treating each time bin independently.
- This ignores temporal structure.
- Idea: allow $\ww_k$ to vary smoothly in time by giving it a Gaussian process prior: $$ \ww_k \sim \text{GP}(\bzero, \lambda).$$
- I think there was also some linking of the different classes, so that their weight trajectories evolved together.
- Take the posterior mean $\overline{\ww}_k(t)$ to compute output probabilities: $$ p(\yy_k | \xx(t)) \propto \exp(-\overline{\ww}_k(t)^T \xx(t)).$$
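A toy sketch of the generative side: per-class weights drawn from a smooth GP prior (squared-exponential kernel, my choice) and used in a per-time-bin softmax. This is my illustration, not the poster's inference procedure, and I use the conventional $+\ww^T\xx$ sign.

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, K = 100, 5, 3                     # time bins, features, classes
t = np.linspace(0.0, 1.0, T)

# Squared-exponential kernel: enforces smooth weight trajectories over time.
ell = 0.2
Kt = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ell ** 2) + 1e-6 * np.eye(T)
L = np.linalg.cholesky(Kt)

# Draw smooth time-varying weights w_k(t) for each class k and feature d.
W = np.einsum('ts,skd->tkd', L, rng.standard_normal((T, K, D)))   # (T, K, D)

# Decode: per-time-bin softmax over classes using the weights at that bin.
X = rng.standard_normal((T, D))                                    # neural features x(t)
logits = np.einsum('tkd,td->tk', W, X)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
```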
Poster Session 2
2-136 Statistical theory for inferring population geometry in high-dimensional neural data
- Used RMT to investigate how covariance estimation varies with the number of neurons $N$ and trials $T$.
- First result was on PCA dimension (participation ratio), showing how subsampling $N$ neurons and $T$ trials to $M$ neurons and $P$ trials affects estimation, by relating the PCA dimensions at the two configurations.
- Next result was about how estimation error relative to true covariance, $\|\hat C - C\|_F^2$, varies as $D_N/T$.
- $D_N$ is the true PCA dimensionality for $N$ neurons in the infinite trials limit.
- What if the true dimensionality is not known? Replication error, $\|\hat C_1 - \hat C_2\|_F^2$, varies as $2 \hat{D}_{N,T}/T$.
- $\hat{D}_{N,T}$ is the dimensionality of the sample.
- Estimating eigenvectors and eigenvalues:
- Low rank signal model: $C = U D U^T + \sigma^2 I$.
- SNR for the $k$'th dimension: $\text{SNR}_k = {d_k \over N \sigma^2}$.
- There was some critical SNR threshold above which eigenvalues and eigenvectors could be estimated.
- Effective $R^2$ is $R^2 = {\sum_k O_k \text{SNR}_k - K/N \over \sum_k \text{SNR}_k - 1}.$
- $O_k$ is the alignment of the recovered eigenvector with the true eigenvector.
- Recovering each mode contributes SNR, weighted by the alignment, but also brings noise (contributing $N^{-1}$ per mode).
- Best rank to recover is the $K$ at which the numerator no longer increases, where adding one more mode brings more noise than signal.
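For reference, the participation-ratio dimensionality these results are phrased in has a simple plug-in estimator; the poster's point is that this naive estimate is biased when trials are few, which the RMT corrections account for. A generic sketch (not their corrected estimator):

```python
import numpy as np

def participation_ratio(X):
    """PCA dimensionality D = (sum_i l_i)^2 / sum_i l_i^2 of data X (trials x neurons)."""
    C = np.cov(X, rowvar=False)
    evals = np.linalg.eigvalsh(C)
    return evals.sum() ** 2 / (evals ** 2).sum()

# Naive plug-in estimate on toy data with a decaying eigenspectrum.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 300)) @ np.diag(np.linspace(1.0, 0.01, 300))
print(participation_ratio(X))
```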
2-157 Continuous partitioning of neuronal variability
- Classic models of neural responses explain them as homogeneous Poisson processes whose rates are the product of a stimulus dependent tuning $f_s$ and a trial-specific gain $g_k$: $$y \sim \text{Poisson}(f_s g_k).$$
- Innovation to capture temporal variability: the Continuous Modulated Poisson Model.
- Tunings and gains as Gaussian processes: $$ \log g(t) \sim \text{GP}(0, K_g), \quad \log f_s(t) \sim \text{GP}(0, K_f(s)).$$
- Gains modelled as $$K_g = \rho_g \exp\left(-{|t_1 - t_2|^q \over \ell_g}\right).$$
- Can then e.g. monitor how parameters like optimal temporal correlation lengths $\ell_g$ and heavy-tailed-ness $q$ vary across brain areas.
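A rough sketch of the generative side: sample a slowly drifting log-gain from the stated kernel and use it to modulate a Poisson rate. The hyperparameters are placeholders and the tuning $f_s(t)$ is held constant for simplicity; this is my toy version, not the poster's fitting procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
t = np.arange(T, dtype=float)

# Gain kernel K_g = rho_g * exp(-|t1 - t2|^q / ell_g); placeholder hyperparameters.
rho_g, ell_g, q = 0.5, 50.0, 1.0
Kg = rho_g * np.exp(-np.abs(t[:, None] - t[None, :]) ** q / ell_g) + 1e-6 * np.eye(T)

log_g = np.linalg.cholesky(Kg) @ rng.standard_normal(T)   # log-gain GP sample
f_s = 5.0 * np.ones(T)                                     # stimulus-dependent tuning (constant here)
y = rng.poisson(f_s * np.exp(log_g))                       # modulated Poisson counts
```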
2-092 Uncovering statistical structure in large-scale neural activity with restricted Boltzmann machines
- Main idea: fit RBMs to neural data.
- Marginalize out hidden states to get effective interactions between units at all orders.
- Advantage over e.g. Schneidman’s approach by making it easier to estimate higher order interactions.
- But surely subsampling problems must be the same? i.e. estimates of high-order interactions will be noisy due to lack of observations.
- Compute index of higher order interactions between pairs of units.
- Can indicate either missing units in the recording, or true higher order interactions (e.g. via glia).
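For reference, a generic Bernoulli-RBM contrastive-divergence (CD-1) update of the kind one might use to fit such a model to binarized spike patterns; this is my own minimal version, and the poster's actual fitting procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, b, c, lr=0.01):
    """One CD-1 update. V: (batch, n_visible) binarized spike patterns."""
    ph = sigmoid(V @ W + c)                       # P(h = 1 | v)
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = sigmoid(h @ W.T + b)                     # reconstruction P(v = 1 | h)
    v1 = (rng.random(pv.shape) < pv).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    W += lr * (V.T @ ph - v1.T @ ph1) / len(V)    # positive minus negative phase
    b += lr * (V - v1).mean(axis=0)
    c += lr * (ph - ph1).mean(axis=0)
    return W, b, c

# Toy usage: 20 visible units (neurons), 5 hidden units.
nv, nh = 20, 5
W, b, c = 0.01 * rng.standard_normal((nv, nh)), np.zeros(nv), np.zeros(nh)
V = (rng.random((100, nv)) < 0.1).astype(float)   # fake binarized spikes
W, b, c = cd1_step(V, W, b, c)
```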
2-218 A unified theory of feature learning in RNNs and DNNs
- Compared the solutions in a regression task by RNNs and DNNs.
- RNNs can be viewed as DNNs using temporal unrolling.
- Key difference is that in the unrolled RNN, the layer weights are shared.
- Weight sharing imposes an inductive bias which can make RNNs more sample efficient e.g. when learning temporal sequences.
2-152 Dynamical archetype analysis: Autonomous computation
- Neural systems often have different geometry, but the same topology.
- They converge to the same pattern of fixed points, are repelled by the same set of repellers, etc.
- This is called topological conjugacy: Two systems are topologically conjugate if there is a homeomorphism $\Phi$ that transforms one set of dynamics into the other.
- The set of all homeomorphisms is too large, so parameterize using a Neural ODE.
- Compute the distance between two sets of dynamics $f$, $g$ by minimizing a combination of a trajectory mismatch $d_\text{traj}$, and homeomorphism complexity $d_\text{cxty}$.
- Trajectory loss: given the trajectory at time $t$ starting at $x$ under the $f$ dynamics, $\phi_f^t(x)$, and similarly $\phi_g^t(x)$: $$ d_\text{traj}(\Phi; f, g) = \int_t \| \phi_f^t(x) - \underbrace{\Phi(\phi_g^t ( \Phi^{-1}(x)))}_{\text{$g$ trajectory in $f$ space}}\| dt$$
- Complexity loss: $$ d_\text{cxty}(\Phi) = \int \| \nabla \Phi(x) - I \| dx,$$ evaluated along the $f$ trajectory (?).
- Measure the distance of a given neural system to a fixed set of archetype dynamics, e.g. line attractor, ring attractor.
- Showed that they could correctly map dynamics to their archetypes when the ground truth was known.
- E.g. neural ring attractor dynamics (fly system? head direction system?) mapped to ring attractor archetype.
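A toy illustration of the trajectory-mismatch term, with a known linear rescaling standing in for the Neural-ODE-parameterized $\Phi$; everything below is my own construction, not the poster's implementation.

```python
import numpy as np

def flow(dyn, x0, dt=0.01, steps=500):
    """Euler-integrate dx/dt = dyn(x) from x0 and return the trajectory."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(traj[-1] + dt * dyn(traj[-1]))
    return np.stack(traj)

def d_traj(f, g, Phi, Phi_inv, x0, dt=0.01, steps=500):
    """Mean mismatch between the f-trajectory and the g-trajectory mapped into f-space."""
    traj_f = flow(f, x0, dt, steps)
    traj_g = flow(g, Phi_inv(x0), dt, steps)
    traj_g_in_f = np.array([Phi(x) for x in traj_g])
    return np.mean(np.linalg.norm(traj_f - traj_g_in_f, axis=1))

# Two rotations related by the linear "homeomorphism" Phi(x) = S x; distance should be ~0.
A = np.array([[0.0, -1.0], [1.0, 0.0]])
S = np.diag([2.0, 0.5])
S_inv = np.linalg.inv(S)
f = lambda x: A @ x
g = lambda x: S_inv @ (A @ (S @ x))
print(d_traj(f, g, lambda x: S @ x, lambda x: S_inv @ x, np.array([1.0, 0.0])))
```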
2-208 Plastic Circuits for Context-Dependent Decisions
- Investigated the effect of Short-Term Synaptic Plasticity (STSP) in an RNN performing the Mante 2013 task (output based on colour or motion signal, depending on context).
- STSP: Strengths are determined as the product of utilization and activity, $w_{\text{eff},ij}(t) = w_{ij} u_j(t) x_j(t).$
- This is meant to model e.g. vesicle pool depletion.
- A network using STSP could perform the task, one with Hebbian plasticity could not (?).
- Found that in STSP, context information is stored in neural activity, not the synapses.
- A network with fixed weights wanting to implement the same thing has to do so through nonlinear activations, $A(t) x(t) \to W \phi(x(t))$, which presumably could get complicated or intractable.
2-173 Spatiotemporal Dynamics in Recurrent Neural Networks as Flow Invariance
- RNNs are often used to learn stimulus dynamics.
- It’s natural to want equivariant hidden representations: flow in the stimulus results in corresponding flow in the latents.
- Incorporating such equivariance into the RNN dynamics can dramatically speed up learning.
2-045 A non-local variational framework for optimal neural representations
- Tuning curves are often defined by maximizing Fisher information.
- Fisher information is a local measure – doesn’t capture e.g. errors due to jumps in the inferred values.
- Mutual information is global, but can be hard to compute.
- Solution: a non-local loss on tuning curves comparing all pairs of input stimuli: $$ L[f] = {1 \over (2 \pi)^2} \int_\theta \int_{\theta'} \ell(f(\theta), f(\theta'))\, d\theta\, d\theta',$$ where $\ell$ is misclassification error.
- How to solve this?
- $f(\theta) = p(x|\theta)$, the population response.
- The population response space is a manifold with the Fisher-Rao metric.
- The responses to two stimuli are two points in this space, separated by a geodesic distance $d$.
- Classification error $\ell$ is approximately erfc of this distance.
- The set of all responses (to circular stimuli) forms a closed curve in this space.
- The optimal tuning curves that minimize the loss form a circle in the space of square-root firing rates.
Poster Session 3
3-154 Estimating neural coding fidelity in high dimensions with limited samples
- $d'$ measures discriminability but can be biased when neurons $\gg$ trials.
- Used RMT to estimate $d'$ in the high-dimensional setting and produce a less-biased estimator.
- Key quantity: Signal-aligned spectrum: $$ G_\rho(x) = \sum_{i=1}^N \left({v_i^T u \over \|u\|}\right)^2 1_{x > \lambda_i},$$ where $v_i$ are the noise directions in decreasing order of variance $\lambda_i$, and $u$ is the signal direction.
- In those terms, $$ d' = \|u\|^2 \int {1 \over \lambda}\, dG(\lambda).$$
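A small sketch of that quantity as I wrote it down (I may be off on whether the poster calls this $d'$ or $d'^2$); for an invertible noise covariance the sum reduces to $u^T \Sigma^{-1} u$.

```python
import numpy as np

def dprime_signal_aligned(u, noise_cov):
    """||u||^2 * sum_i ((v_i . u)/||u||)^2 / lambda_i over the noise eigenpairs;
    for invertible noise covariance this equals u^T Sigma^{-1} u."""
    lam, V = np.linalg.eigh(noise_cov)            # eigenvalues ascending, columns = v_i
    align_sq = (V.T @ u / np.linalg.norm(u)) ** 2
    return np.linalg.norm(u) ** 2 * np.sum(align_sq / lam)

# Toy check against the closed form u^T Sigma^{-1} u.
rng = np.random.default_rng(4)
N = 10
A = rng.standard_normal((N, N))
Sigma = A @ A.T / N + 0.1 * np.eye(N)             # well-conditioned noise covariance
u = rng.standard_normal(N)                        # signal (mean-difference) direction
print(dprime_signal_aligned(u, Sigma), u @ np.linalg.solve(Sigma, u))
```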
3-138 Identifying interpretable latent factors within and across brain regions
- Decompose temporal activity into sparse, orthogonal factors convolved with Gaussian filters, possibly with some delay.
- Orthogonality and sparsity give interpretability.
- Convolution with Gaussian filters is faster to fit than general Gaussian process.
3-070 Generalization and memorization in mouse olfactory learning
- Trained mice to distinguish a variety of sixteen-component mixtures to test generalization vs. memorization.
- Mice can do both, biased towards simple rules when these exist.
3-130 Sensory prediction errors update predictive representations
- RSC is the source of sensory predictions (not prediction errors) to V1.
3-225 Noise Correlations for Efficient Learning
- For optimal discrimination we want noise correlations to be orthogonal to the discriminating dimensions.
- Tested humans on a joint color-motion discrimination task, where the rule would occasionally flip.
- Modelled this with shallow linear net mapping color and motion to the decision.
- Observed that noise correlations in the model were parallel to the optimal discrimination direction.
- This would produce sub-optimal accuracy, and indeed it does.
- But they hypothesize that it helps find the discriminating directions.
- I think this is putting the cart before the horse.
- Classifiers will find the discriminating directions, and that in turn will affect the noise correlations.
3-005 State-dependent modulation of neocortical sensory processing
- Imaged mouse S1 and PPC during two-alternative multi-sensory discrimination task.
- Modelled brain activity using a 3-state GLM-HMM
- Produced one “engaged” state, two “disengaged” states.
- In the engaged state:
- Stronger reps in S1, PPC-A
- Stronger communication from S1 to PPC-A
- Stronger bottom-up activation.
3-069 Generalized DSA: Comparing Neural Population Dynamics by Identifying Optimal Linearizing Embeddings
- DSA has two steps:
- Map nonlinear dynamics to high (infinite) dimensional linear dynamics using Koopman theory.
- Find an orthonormal coordinate transformation that best lines up one set with another: $$ d_\text{DSA}(A,B) = \min_{C \in O(N)} \|A – C B C^T\|.$$
- Generalized DSA:
- Find eigenspectrum of dynamics
- Measure distance between spectra using optimal transport.
- Ostrow: this is faster / works better than DSA.
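I didn't note the exact optimal-transport formulation, so as a minimal stand-in, here is a discrete-assignment distance between two eigenspectra in the complex plane (my own sketch):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def spectrum_ot_distance(A, B):
    """Assignment-based (discrete optimal-transport) distance between the
    eigenspectra of two linear(ized) dynamics matrices A and B."""
    lamA = np.linalg.eigvals(A)
    lamB = np.linalg.eigvals(B)
    cost = np.abs(lamA[:, None] - lamB[None, :])   # pairwise distances in the complex plane
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

# Toy check: similar matrices share a spectrum, so the distance is ~0.
rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6)) * 0.3
P = rng.standard_normal((6, 6))
B = np.linalg.inv(P) @ A @ P
print(spectrum_ot_distance(A, B))
```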
3-030 Identifying Neural Activity Manifolds through Non-reversibility Analysis
- Neural dynamics are generally not reversible.
- i.e. $p(x_t = a, x_{t+1} = b) \neq p(x_t = b, x_{t+1}=a)$
- Noise dynamics are reversible.
- Idea: Dimension reduce dynamics by finding projections that produce maximally non-reversible dynamics.
- Let $X_k \in \RR^{N \times T}$ be the responses to condition $k$.
- Compute the covariance of the vectorized responses, $C = \EE\left[\vec{X_k^T}\, \vec{X_k^T}^T\right].$
- Split this into a reversible and non-reversible part:
- Let $\sigma(C)$ be the time-transposed covariances.
- $C^+ = C + \sigma(C)$
- $C^- = C - \sigma(C)$
- Non-reversibility index: $$\xi = {\|C^-\|_F \over \|C^+\|_F}.$$
- Tough to maximize this coefficient itself; instead, just maximize the numerator:
- Find $U$ to maximize $\|C^-\|$ for the projected data $Y_k = U^T X_k$.
- Non-reversible part has a simple expression: $$ \|C^-\|_F^2 = \sum_{k,k'} \tr{Y_k Y_{k'}^T}^2 - \tr{Y_k Y_k^T Y_{k'} Y_{k'}^T}.$$
- Can be kernelized.
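A rough sketch of the non-reversibility index, using a simple lag-1 cross-covariance as a stand-in for their condition-wise construction of $C$ (my simplification, not their exact estimator):

```python
import numpy as np

def nonreversibility_index(X):
    """Rough non-reversibility index from lag-1 cross-covariances.

    X : (N, T) array of activity. C is the covariance between x_t and x_{t+1};
    its time-transpose sigma(C) is just C.T here. Reversible dynamics give a
    symmetric C, so the antisymmetric part C- vanishes.
    """
    X = X - X.mean(axis=1, keepdims=True)
    C = X[:, :-1] @ X[:, 1:].T / (X.shape[1] - 1)   # lag-1 cross-covariance
    C_plus = C + C.T
    C_minus = C - C.T
    return np.linalg.norm(C_minus) / np.linalg.norm(C_plus)

# Toy check: a noisy rotation (non-reversible) vs. a symmetric AR(1) process (reversible).
rng = np.random.default_rng(6)
A = np.array([[0.95, -0.2], [0.2, 0.95]])
B = np.array([[0.9, 0.0], [0.0, 0.9]])
x = np.zeros((2, 2000))
y = np.zeros((2, 2000))
for t in range(1, 2000):
    x[:, t] = A @ x[:, t - 1] + 0.1 * rng.standard_normal(2)
    y[:, t] = B @ y[:, t - 1] + 0.1 * rng.standard_normal(2)
print(nonreversibility_index(x), nonreversibility_index(y))
```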
3-111 Distinguishing probabilistic from heuristic neural representations of uncertainty
- When are neural representations truly probabilistic rather than just heuristic?
- Found that truly probabilistic reps require a bottleneck that forces learning of sufficient statistics.
- Otherwise can just memorize inputs.
- Measurable by whether the inputs can be predicted from the hidden states.
3-046 Canonical cortical circuits: A unified sampling machine for static and dynamic inference
- Found that the canonical microcircuit was a substrate for Hamiltonian dynamics and allowed fast inference.
Workshops 1
Eero
- Efficient coding (Barlow): information transmission while minimizing redundancy.
- First part of the talk: building circuit models that progressively reduce different kinds of redundancy, and how these match up to corresponding visual areas.
- Second part of talk: accessing the image manifold through denoising
- Supervised training of image denoisers
- For least-squares loss this reports the posterior mean: $$ y = x + z \mapsto \hat x(y) = \int x\, p(x|y)\, dx.$$
- The trained system implicitly contains information about the prior on images. How can we access this information?
- Tweedie’s identity: $$\hat{x}(y) = y + \sigma^2 \nabla_y \log p(y).$$
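A toy sketch of using Tweedie's identity to pull the score of $p(y)$ out of a least-squares denoiser, with a trivial stand-in denoiser; the actual sampling algorithm discussed in the talk is more involved.

```python
import numpy as np

# Tweedie's identity rearranged: a least-squares denoiser gives access to the
# score of the noisy-image density, grad_y log p(y) = (x_hat(y) - y) / sigma^2.

def score_from_denoiser(denoiser, y, sigma):
    """Estimate the score of p(y) from a denoiser via Tweedie's identity."""
    return (denoiser(y) - y) / sigma ** 2

def ascend_prior(denoiser, y, sigma, step=0.1, n_steps=50):
    """Toy use of the implicit prior: gradient ascent on log p(y)."""
    for _ in range(n_steps):
        y = y + step * sigma ** 2 * score_from_denoiser(denoiser, y, sigma)
    return y

# Example with a trivially "learnable" prior: images concentrated at a single point x0,
# for which the optimal denoiser just returns x0.
x0 = np.ones(4)
denoiser = lambda y: x0
y0 = np.full(4, 0.5)
print(ascend_prior(denoiser, y0, sigma=1.0))   # moves toward x0
```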
Ken Harris
Workshops 2
Questions
- Can the OT metric on tuning curves be pulled back to a metric in the response space to allow direct comparison of datasets with different sizes?
- Non-reversibility analysis looks for an orthogonal projection in neural space. Is there also an optimal time scale, a projection along the time direction, to compute non-reversibility over?
- Ken Harris mentioned the trivial neural code for numbers, where successive neurons code for successive significant digits.
- This is an intensive code: information per dimension decays to zero in the large $N$ limit.
- What is a corresponding extensive code, that distributes the information evenly among neurons?
- That is also easy to decode?
- Does the naive encoding correspond to axis-aligned coordinates?
- And a distributed code could be a rotation?
- Matthew Chalk’s talk
- He defined a multi-resolution information metric by looking at how Fisher information changed with noise corruption of different magnitudes.
- The eigenvectors of the metric give the most informative directions for each stimulus.
- Can the multi-resolution information metric be derived from the locally most informative directions?