Cosyne 2026

My running notes from Cosyne 2026. Most were hastily written during/immediately after live presentations, so likely contain errors reflecting my misunderstandings. Apologies to the presenters.

Poster Session 1

007 Convergent motifs of early olfactory processing
  • Vertebrate and invertebrate olfactory systems are similar, but not derived from a common ancestor, implies convergent evolution.
  • Question: Can the structure of the olfactory system be determined normatively?
  • Model
    • Sparse monomolecular concentration vectors $\cc$
    • Sensed by receptors with affinities $\WW$.
    • Expressed, not necessarily one-to-one, in ORNs according to $\bE$.
    • Hill transformed, plus isotropic additive noise, to give ORN responses: $$ \rr = \varphi(\bE \WW \cc) + \eta. $$
    • ORN responses converge, not necessarily one to one, on glomeruli
  • Learned parameters of the Hill function, and mappings of ORs to ORNs, to maximize mutual information between odours and responses.
    • I think there was an energetic cost somewhere as well.
  • Found a 1-to-1 mapping from ORs to ORNs ($\bE \approx \II$).
  • Downstream: found a 1-to-1 mapping from ORNs to glomeruli.
  • Discussion: These results depend strongly on the assumed structure of the noise.
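A minimal sketch of the forward model as I noted it; all sizes, the Hill parameters, and the noise scale are my own illustrative choices, not the poster's:

```python
import numpy as np

rng = np.random.default_rng(0)
n_odours, n_receptors, n_orns = 50, 20, 20

# Sparse monomolecular concentration vector c
c = np.zeros(n_odours)
c[rng.choice(n_odours, size=3, replace=False)] = rng.uniform(0.5, 2.0, size=3)

W = rng.uniform(0.0, 1.0, size=(n_receptors, n_odours))  # receptor affinities
E = np.eye(n_orns, n_receptors)                          # OR-to-ORN expression (here 1-to-1)

def hill(x, k=1.0, n=2.0):
    """Hill transform (k and n are illustrative parameter values)."""
    xn = np.maximum(x, 0.0) ** n
    return xn / (k ** n + xn)

eta = 0.05 * rng.standard_normal(n_orns)  # isotropic additive noise
r = hill(E @ W @ c) + eta                 # ORN responses
```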
063 Sparse glomerular representations explain odor discrimination in complex, concentration-varying mixtures
  • 2AFC task with one target monomolecular odour among up to 16 distractor odours.
  • Performance depended on concentration of target odour, but not number of distractor odours.
  • Recapitulated in model with sparse, highly tuned receptors.
    • These respond only to the target odour and are unresponsive to distractors, so they are affected by target SNR but not by the number of distractors.
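A toy recapitulation of that logic (thresholds, noise level, and concentrations are my assumptions): a receptor tuned only to the target ignores distractors entirely, so detection depends on target concentration but not mixture size.

```python
import numpy as np

rng = np.random.default_rng(1)

def detect(target_conc, n_distractors, n_trials=2000, noise=0.3, thresh=0.5):
    """Fraction of trials on which a target-only receptor exceeds threshold.
    Distractors never drive the receptor, so n_distractors enters the
    simulation but never the response."""
    resp = target_conc + noise * rng.standard_normal(n_trials)
    return (resp > thresh).mean()

hi_conc = detect(target_conc=2.0, n_distractors=1)
lo_conc = detect(target_conc=0.4, n_distractors=1)
many_distractors = detect(target_conc=2.0, n_distractors=16)
# hi_conc >> lo_conc, but hi_conc ≈ many_distractors
```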
140 A generative diffusion model reveals V2’s representation of natural images
  • Modeled the manifold of images by training a diffusion model to produce images.
  • Generated two kinds of “noise”:
    • On-manifold: Gaussian perturbations to the diffusion model input.
    • Off-manifold: Gaussian perturbations to the diffusion model output.
  • Measured responses of Macaque V2.
    • On-manifold noise produced higher variability in the responses.
      • Such “noise” produces valid, different images, so the responses will reflect the new images generated.
      • Cosine similarity of these responses decreased with noise level.
    • Off-manifold noise produced similar responses across all noise levels.
      • This noise produces corrupted versions of the same images, so lack of variability perhaps reflects mapping of all of these to the same image.
      • Cosine similarity was similar across noise levels.
  • Comment: It’s like V2 is inverting the diffusion model.
    • On-manifold noise produces variable activity, as the mapping back to the image-generating latents varies.
    • Off-manifold noise produces constant responses, as the mapping back is to the same image.
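My own analogy-level sketch of the "inverting the generator" intuition, with a unit circle standing in for the image manifold and projection onto it standing in for V2:

```python
import numpy as np

rng = np.random.default_rng(2)

def g(z):
    """Toy 'generator': latent angle -> point on the unit circle (the 'manifold')."""
    return np.array([np.cos(z), np.sin(z)])

def v2(x):
    """Toy 'V2': map an input back to its generating latent (projection onto the manifold)."""
    return np.arctan2(x[1], x[0])

z0 = 0.3
# On-manifold noise: perturb the latent -> genuinely different 'images'
on = np.array([v2(g(z0 + 0.2 * rng.standard_normal())) for _ in range(500)])
# Off-manifold noise: radial (manifold-orthogonal) perturbation of the output
# -> corrupted versions of the same 'image'
off = np.array([v2((1 + 0.2 * rng.standard_normal()) * g(z0)) for _ in range(500)])
# on.var() is large; off.var() is ~0: responses vary only under on-manifold noise
```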
089 Interpretable time-series analysis with Gumbel dynamics
  • Context was dynamics of discrete latent state generating observations.
  • We track the probability of being in each state.
  • We want state occupancies to be nearly one-hot, for interpretability.
    • We want the system to be mostly in one state, rather than distributed across states.
  • Standard approach uses softmax, which has two problems:
    • If we work with the probabilities, then these can stay nearly uniform.
      • The downstream decoder/observation model can expand the small fluctuations away from uniformity as needed.
      • Hard to interpret, as state occupancy will be distributed.
    • Alternatively, we can sample from the softmax.
      • Fixes the interpretability issue as exactly one state will be occupied
      • Inefficient, as gradients etc. will only operate on that active state/sample.
  • Solution: Gumbel-Softmax: $$ z \sim \text{GS}(\pi, \tau) \iff z = \text{softmax}\left({\log \pi + \eta \over \tau}\right), \quad \eta \sim \text{Gumbel}(0,1).$$
    • Pushes the softmax probabilities so state-occupancy is nearly one-hot.
      • Helps interpretability
      • More efficient than a purely one-hot sample, because states with small probabilities are also updated.
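A minimal NumPy sketch of the Gumbel-Softmax sample (the temperature value is my choice):

```python
import numpy as np

rng = np.random.default_rng(3)

def gumbel_softmax(log_pi, tau=0.3):
    """Draw a relaxed, near-one-hot sample from the Gumbel-Softmax distribution."""
    eta = -np.log(-np.log(rng.uniform(size=log_pi.shape)))  # Gumbel(0, 1) noise
    y = (log_pi + eta) / tau
    y = y - y.max()                                         # numerical stability
    e = np.exp(y)
    return e / e.sum()

z = gumbel_softmax(np.log(np.array([0.7, 0.2, 0.1])))
# z sums to 1 and every entry stays positive, so every state receives gradient;
# at low tau the mass concentrates on one state
```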
065 Continuous Multinomial Logistic Regression for Neural Decoding
  • Standard logistic regression: $$p(y_k|\xx) \propto \exp(-\ww_k^T \xx).$$
  • Weights are fixed per class.
  • Can be extended to a temporal dimension by treating each time bin independently.
    • This ignores temporal structure.
  • Idea: allow $\ww_k$ to vary smoothly in time by giving it a Gaussian process prior : $$ \ww_k \sim \text{GP}(\bzero, \lambda).$$
    • I think there was also some linking of the different states as well, so that their vector evolutions were also linked.
  • Take the mean $\overline{\ww}_k(t)$ to compute output probabilities: $$ p(\yy_k | \xx(t)) \propto \exp(-\overline{\ww}_k(t)^T \xx(t)).$$
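A sketch of the prior and decoder as I understood them, using an RBF kernel as a stand-in for the GP covariance (the lengthscale, feature dimension, and class count are my choices):

```python
import numpy as np

rng = np.random.default_rng(4)
T, D, K = 100, 5, 3                       # time bins, features, classes

t = np.linspace(0.0, 1.0, T)
# RBF kernel as the GP prior covariance over time (lengthscale 0.1 is illustrative)
Kt = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.1 ** 2) + 1e-6 * np.eye(T)
L = np.linalg.cholesky(Kt)
w = np.einsum('ts,skd->tkd', L, rng.standard_normal((T, K, D)))  # smooth w_k(t)

x = rng.standard_normal((T, D))           # neural features per time bin
logits = np.einsum('tkd,td->tk', w, x)    # w_k(t)^T x(t) for each class k
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)         # per-bin class probabilities
```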

Poster Session 2

136 Statistical theory for inferring population geometry in high-dimensional neural data

  • Used RMT to investigate how covariance estimation varies with the number of neurons $N$ and trials $T$.
  • First result was on PCA dimension (participation ratio): how subsampling from $N$ neurons and $T$ trials to $M$ neurons and $P$ trials affects estimation, by relating the PCA dimensions at the two configurations.
  • Next result was about how the estimation error relative to the true covariance, $\|\hat C - C\|_F^2$, varies as $D_N/T$.
    • $D_N$ is the true PCA dimensionality for $N$ neurons in the infinite trials limit.
  • What if the true dimensionality is not known? Replication error, $\|\hat C_1 - \hat C_2\|_F^2$, varies as $2 \hat{D}_{N,T}/T$.
    • $\hat{D}_{N,T}$ is the dimensionality of the sample.
  • Estimating eigenvectors and eigenvalues:
    • Low rank signal model: $C = U D U^T + \sigma^2 I$.
    • SNR for the $k$’th dimension: $\text{SNR}_k = d_k/(N \sigma^2)$.
    • There was some critical SNR threshold above which eigenvalues and eigenvectors could be estimated.
    • Effective $R^2$ is $R^2 = {\sum_k O_k \text{SNR}_k - K/N \over \sum_k \text{SNR}_k - 1}.$
      • $O_k$ is the alignment of the recovered eigenvector with the true eigenvector.
      • Recovering each mode contributes SNR, weighted by the alignment, but also brings noise (contributing $N^{-1}$ per mode).
      • Best rank to recover is the $K$ at which the numerator no longer increases, where adding one more mode brings more noise than signal.
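The replication-error idea is easy to simulate; here's my own sketch under a low-rank-plus-noise model (the rank, eigenvalues, and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 50, 2000

# Low-rank signal plus isotropic noise, as in the poster's model C = U D U^T + sigma^2 I
U, _ = np.linalg.qr(rng.standard_normal((N, 5)))
X = (rng.standard_normal((T, 5)) * np.sqrt([10.0, 8.0, 6.0, 4.0, 2.0])) @ U.T
X += rng.standard_normal((T, N))          # unit-variance observation noise

C_hat = np.cov(X.T)
D_hat = np.trace(C_hat) ** 2 / np.trace(C_hat @ C_hat)  # participation ratio

# Replication error from two independent halves of the trials
C1 = np.cov(X[: T // 2].T)
C2 = np.cov(X[T // 2 :].T)
rep_err = np.linalg.norm(C1 - C2, 'fro') ** 2
```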

157 Continuous partitioning of neuronal variability

  • Classic models of neural responses explain them as homogeneous Poisson processes whose rates are the product of a stimulus dependent tuning $f_s$ and a trial-specific gain $g_k$: $$y \sim \text{Poisson}(f_s g_k).$$
  • Innovation to capture temporal variability: the Continuous Modulated Poisson Model.
  • Tunings and gains as Gaussian processes: $$ \log g(t) \sim \text{GP}(0, K_g), \quad \log f_s(t) \sim \text{GP}(0, K_f(s)).$$
  • Gains modelled as $$K_g = \rho_g \exp\left(-{|t_1 - t_2|^q \over \ell_g}\right).$$
  • Can then e.g. monitor how parameters like optimal temporal correlation lengths $\ell_g$ and heavy-tailed-ness $q$ vary across brain areas.
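A sketch of sampling a gain trace from that kernel (hyperparameter values are mine; $q \in (0, 2]$ keeps the kernel valid):

```python
import numpy as np

rng = np.random.default_rng(6)

t = np.linspace(0.0, 10.0, 200)
rho_g, ell_g, q = 1.0, 2.0, 1.5            # illustrative hyperparameters
Kg = rho_g * np.exp(-np.abs(t[:, None] - t[None, :]) ** q / ell_g)
Kg += 1e-5 * np.eye(len(t))                # jitter for numerical stability

log_g = np.linalg.cholesky(Kg) @ rng.standard_normal(len(t))
g = np.exp(log_g)                          # multiplicative gain trace over time
```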

35 Gain-modulated linear networks: a tractable framework for context-dependent computations

92 Uncovering statistical structure in large-scale neural activity with restricted Boltzmann machines

  • Main idea: fit RBMs to neural data.
  • Marginalize out hidden states to get effective interactions between units at all orders.
    • Advantage over e.g. Schneidman’s approach: easier to estimate higher-order interactions.
      • But surely subsampling problems must be the same? i.e. estimates of high-order interactions will be noisy due to lack of observations.
  • Compute index of higher order interactions between pairs of units.
    • Can indicate either missing units in the recording, or true higher order interactions (e.g. via glia).
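The marginalization step, sketched for a binary RBM (sizes and weight scale are my choices): summing out the hiddens gives a log-marginal whose softplus term, expanded in $v$, contains visible-visible interactions at all orders.

```python
import numpy as np

rng = np.random.default_rng(7)
n_vis, n_hid = 6, 4

W = 0.5 * rng.standard_normal((n_vis, n_hid))  # visible-hidden couplings
b = rng.standard_normal(n_vis)                 # visible biases
c = rng.standard_normal(n_hid)                 # hidden biases

def log_p_unnorm(v):
    """Log of the unnormalized marginal p(v) with binary hiddens summed out.
    The softplus term couples the visible units through W at every order."""
    return v @ b + np.log1p(np.exp(c + v @ W)).sum()

v = rng.integers(0, 2, size=n_vis).astype(float)
val = log_p_unnorm(v)
```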

218 A unified theory of feature learning in RNNs and DNNs

  • Compared the solutions in a regression task by RNNs and DNNs.
  • RNNs can be viewed as DNNs using temporal unrolling.
  • Key difference is that in the unrolled RNN, the layer weights are shared.
  • Weight sharing imposes an inductive bias which can make RNNs more sample efficient e.g. when learning temporal sequences.
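The unrolling point in code (my own illustration): each timestep is one "layer", and every layer reuses the same $(W, U)$.

```python
import numpy as np

rng = np.random.default_rng(8)

def rnn_unrolled(x_seq, W, U, h0):
    """An RNN viewed as a deep net whose layers all share weights (W, U)."""
    h = h0
    for x in x_seq:              # one 'layer' per timestep, same W and U each time
        h = np.tanh(W @ h + U @ x)
    return h

H, D, T = 8, 3, 10               # hidden size, input size, sequence length
W = 0.5 * rng.standard_normal((H, H))
U = 0.5 * rng.standard_normal((H, D))
h = rnn_unrolled(rng.standard_normal((T, D)), W, U, np.zeros(H))
```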

152 Dynamical archetype analysis: Autonomous computation

208 Plastic Circuits for Context-Dependent Decisions

  • Investigated the effect of Short-Term Synaptic Plasticity (STSP) in an RNN performing the Mante 2013 task (output based on colour or motion signal, depending on context).
  • STSP: Strengths are determined as the product of utilization and activity: $w_{\text{eff},ij}(t) = w_{ij}\, u_j(t)\, x_j(t).$
    • This is meant to model vesicle pool depletion, etc.
  • A network using STSP could perform the task; one with Hebbian plasticity could not (?).
  • Found that in STSP, context information is stored in neural activity, not the synapses.
  • A network with fixed weights wanting to implement the same thing has to do so through nonlinear activations, $A(t) x(t) \to W \phi(x(t))$, which presumably could get complicated or intractable.
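A Tsodyks-Markram-style sketch of the $w_{\text{eff}} = w\, u\, x$ rule (the specific ODEs, time constants, and rates are my assumptions, not necessarily the poster's):

```python
def stsp_step(u, x, rate, dt=1e-3, tau_u=0.5, tau_x=0.2, U0=0.2):
    """One Euler step of short-term facilitation (u) and depression (x).
    Presynaptic activity transiently raises utilization u and depletes resources x."""
    du = (U0 - u) / tau_u + U0 * (1.0 - u) * rate
    dx = (1.0 - x) / tau_x - u * x * rate
    return u + dt * du, x + dt * dx

w = 1.0                                   # static synaptic weight
u, x = 0.2, 1.0                           # initial utilization and resources
for _ in range(1000):                     # 1 s of sustained presynaptic firing
    u, x = stsp_step(u, x, rate=20.0)
w_eff = w * u * x                         # effective weight tracks recent activity
```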

173 Spatiotemporal Dynamics in Recurrent Neural Networks as Flow Invariance

045 A nonlocal variational framework for optimal neural representations

