My running notes from Cosyne 2026. Most were hastily written during/immediately after live presentations, so likely contain errors reflecting my misunderstandings. Apologies to the presenters.
Poster Session 1
007 Convergent motifs of early olfactory processing
- Vertebrate and invertebrate olfactory systems are similar but not derived from a common ancestor, implying convergent evolution.
- Question: Can the structure of the olfactory system be determined normatively?
- Model
- Sparse monomolecular concentration vectors $\cc$
- Sensed by receptors with affinities $\WW$.
- Expressed, not necessarily one-to-one, in ORNs according to $\bE$.
- Hill transformed, plus isotropic additive noise, to give ORN responses: $$ \rr = \varphi(\bE \WW \cc) + \eta.$$
- ORN responses converge, not necessarily one-to-one, onto glomeruli.
- Learned parameters of the Hill function, and mappings of ORs to ORNs, to maximize mutual information between odours and responses.
- I think there was an energetic cost somewhere as well.
- Found the 1-1 mapping from ORs to ORNs ($\bE \approx \II$).
- Downstream: found 1-1 mapping from ORNs to glomeruli.
- Discussion: These results depend strongly on the assumed structure of the noise.
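The forward model above can be sketched in a few lines of numpy. All shapes, the Hill parameters, and the noise scale are my assumptions for illustration, not values from the poster.

```python
import numpy as np

rng = np.random.default_rng(0)

def hill(x, n=2.0, k=1.0):
    # Hill transform: saturating nonlinearity on non-negative drive
    xn = np.clip(x, 0, None) ** n
    return xn / (xn + k ** n)

n_odors, n_receptors, n_orns = 50, 20, 20
c = np.zeros(n_odors)                      # sparse monomolecular concentration vector
c[rng.choice(n_odors, size=3, replace=False)] = rng.random(3)
W = rng.random((n_receptors, n_odors))     # receptor affinities
E = np.eye(n_orns, n_receptors)            # 1-to-1 OR -> ORN expression (the optimum found)
eta = 0.01 * rng.standard_normal(n_orns)   # isotropic additive noise
r = hill(E @ W @ c) + eta                  # ORN responses
```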
063
Sparse glomerular representations explain odor discrimination in complex, concentration-varying mixtures
- 2AFC task with one target monomolecular odour among up to 16 distractor odours.
- Performance depended on concentration of target odour, but not number of distractor odours.
- Recapitulated in model with sparse, highly tuned receptors.
- These sense only the target odour and are unresponsive to distractors, so performance is affected by target SNR but not by the number of distractors.
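A toy numpy sketch of why a receptor tuned only to the target is insensitive to distractor count (odour dimensionality, noise level, and trial counts are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_odors = 64
target = 0
# A "sparse, highly tuned" receptor: responds only to the target odour
w = np.zeros(n_odors)
w[target] = 1.0

def response(conc_target, n_distractors, noise=0.05):
    c = np.zeros(n_odors)
    c[target] = conc_target
    d = rng.choice(np.arange(1, n_odors), size=n_distractors, replace=False)
    c[d] = rng.random(n_distractors)
    return w @ c + noise * rng.standard_normal()

# Mean response tracks target concentration, independent of distractor count
r_few  = np.mean([response(1.0, 2)  for _ in range(500)])
r_many = np.mean([response(1.0, 16) for _ in range(500)])
```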
140
A generative diffusion model reveals V2’s representation of natural images
- Modeled the manifold of images by training a diffusion model to produce images.
- Generated two kinds of “noise”:
- On-manifold: Gaussian perturbations to the diffusion model input.
- Off-manifold: Gaussian perturbations to the diffusion mode output.
- Measured responses of Macaque V2.
- On-manifold noise produced higher variability in the responses.
- Such “noise” produces valid but different images, so the responses reflect the newly generated images.
- Cosine similarity of these responses decreased with noise level.
- Off-manifold noise produced similar responses across all noise levels.
- This noise produces corrupted versions of the same images, so lack of variability perhaps reflects mapping of all of these to the same image.
- Cosine similarity was similar across noise levels.
- Comment: It’s like V2 is inverting the diffusion model.
- On-manifold noise produces variable activity as the mapping back to the image-generating latents, varies.
- Off-manifold noise produces constant responses, as the mapping back is to the same image.
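The two perturbation types can be illustrated with a stand-in generator in place of the trained diffusion model (the linear-tanh "generator" and all dimensions are my invention, purely to show where the noise enters):

```python
import numpy as np

rng = np.random.default_rng(2)

d_latent, d_image = 8, 64
G = rng.standard_normal((d_latent, d_image)) / np.sqrt(d_latent)

def generator(z):
    # Stand-in for the trained diffusion model: latent -> "image"
    return np.tanh(z @ G)

z = rng.standard_normal(d_latent)
sigma = 0.5

img = generator(z)
# On-manifold: perturb the model input, then generate -> a different valid image
on_manifold = generator(z + sigma * rng.standard_normal(d_latent))
# Off-manifold: perturb the model output -> a corrupted version of the same image
off_manifold = img + sigma * rng.standard_normal(d_image)
```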
089 Interpretable time-series analysis with Gumbel dynamics
- Context was dynamics of discrete latent state generating observations.
- We track the probability of being in each state.
- We want state occupancies to be nearly one-hot, i.e. the system mostly in one state rather than distributed across states, for interpretability.
- Standard approach uses softmax, which has two problems:
- If we work with the probabilities, then these can stay nearly uniform.
- The downstream decoder/observation model can expand these small fluctuations from uniformity as needed.
- Hard to interpret, as state occupancy will be distributed.
- Alternatively, we can sample from the softmax.
- Fixes the interpretability issue, as exactly one state will be occupied.
- Inefficient, as gradients etc. will only operate on that active state/sample.
- Solution: Gumbel-Softmax: $$ z \sim \text{GS}(\pi, \tau) \iff z = \text{softmax}\left(\frac{\log \pi + \eta}{\tau}\right), \quad \eta_i \sim \text{Gumbel}(0,1).$$
- Pushes the softmax probabilities so state-occupancy is nearly one-hot.
- Helps interpretability
- More efficient than purely one-hot sampling because states with small probabilities are also updated.
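A minimal numpy implementation of Gumbel-Softmax sampling, showing the temperature trade-off (logits and temperatures are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)

def gumbel_softmax(logits, tau, rng):
    """Continuous relaxation: softmax((logits + Gumbel noise) / tau)."""
    u = rng.random(logits.shape)
    g = -np.log(-np.log(u))          # Gumbel(0,1) samples via inverse CDF
    y = (logits + g) / tau
    y = y - y.max()                  # numerical stability
    e = np.exp(y)
    return e / e.sum()

logits = np.log(np.array([0.2, 0.5, 0.3]))
# Low temperature -> samples nearly one-hot; high temperature -> near-uniform
hard_max = np.mean([gumbel_softmax(logits, 0.1, rng).max() for _ in range(200)])
soft_max = np.mean([gumbel_softmax(logits, 5.0, rng).max() for _ in range(200)])
```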
065 Continuous Multinomial Logistic Regression for Neural Decoding
- Standard logistic regression: $$p(y_k|\xx) \propto \exp(\ww_k^T \xx).$$
- Weights are fixed per class.
- Can be extended to a temporal dimension by treating each time bin independently.
- This ignores temporal structure.
- Idea: allow $\ww_k$ to vary smoothly in time by giving it a Gaussian process prior : $$ \ww_k \sim \text{GP}(\bzero, \lambda).$$
- I think there was also some linking of the different states as well, so that their vector evolutions were also linked.
- Take the mean $\overline{\ww}_k(t)$ to compute output probabilities: $$ p(y_k | \xx(t)) \propto \exp(\overline{\ww}_k(t)^T \xx(t)).$$
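A sketch of the decoding model with smoothly time-varying weights, drawing each weight trajectory from a GP with an RBF kernel (kernel choice, all dimensions, and lengthscale are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

T, D, K = 50, 3, 4                      # time bins, features, classes
t = np.linspace(0, 1, T)
ell = 0.2
Kmat = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ell ** 2)
L = np.linalg.cholesky(Kmat + 1e-6 * np.eye(T))

# Smoothly time-varying weights: one independent GP draw per (class, feature)
w = np.einsum('ts,skd->tkd', L, rng.standard_normal((T, K, D)))

x = rng.standard_normal((T, D))         # features at each time bin
logits = np.einsum('tkd,td->tk', w, x)  # time-varying class scores
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)       # per-bin class probabilities
```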
Poster Session 2
136 Statistical theory for inferring population geometry in high-dimensional neural data
- Used RMT to investigate how covariance estimation varies with the number of neurons $N$ and trials $T$.
- First result was on PCA dimension (participation ratio): how subsampling from $N$ neurons and $T$ trials to $M$ neurons and $P$ trials affects the estimate, by relating the PCA dimensions at the two configurations.
- Next result was about how the estimation error relative to the true covariance, $\|\hat C - C\|_F^2$, varies as $D_N/T$.
- $D_N$ is the true PCA dimensionality for $N$ neurons in the infinite trials limit.
- What if the true dimensionality is not known? Replication error, $\|\hat C_1 - \hat C_2\|_F^2$, varies as $2 \hat{D}_{N,T}/T$.
- $\hat{D}_{N,T}$ is the dimensionality of the sample.
- Estimating eigenvectors and eigenvalues:
- Low rank signal model: $C = U D U^\top + \sigma^2 I$.
- SNR for the $k$'th dimension: $\mathrm{SNR}_k = d_k/(N \sigma^2)$.
- There was some critical SNR threshold above which eigenvalues and eigenvectors could be estimated.
- Effective $R^2$ is $$R^2 = \frac{\sum_k O_k\,\mathrm{SNR}_k - K/N}{\sum_k \mathrm{SNR}_k - 1}.$$
- $O_k$ is the alignment of the recovered eigenvector with the true eigenvector.
- Recovering each mode contributes SNR, weighted by the alignment, but also brings noise (contributing $N^{-1}$ per mode).
- Best rank to recover is the $K$ at which the numerator no longer increases, where adding one more mode brings more noise than signal.
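A numpy sketch of the quantities involved: participation ratio and replication error computed from two independent sample covariances of a low-rank-plus-noise ground truth (all sizes and spectra are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

def participation_ratio(C):
    # PR = (sum of eigenvalues)^2 / sum of squared eigenvalues
    lam = np.linalg.eigvalsh(C)
    return lam.sum() ** 2 / (lam ** 2).sum()

N, T = 30, 500
# Low-rank signal plus isotropic noise: C = U D U^T + sigma^2 I
U = np.linalg.qr(rng.standard_normal((N, 3)))[0]
C_true = U @ np.diag([5.0, 3.0, 2.0]) @ U.T + 0.5 * np.eye(N)
Lc = np.linalg.cholesky(C_true)

X1 = Lc @ rng.standard_normal((N, T))     # two independent "recordings"
X2 = Lc @ rng.standard_normal((N, T))
C1, C2 = X1 @ X1.T / T, X2 @ X2.T / T

pr = participation_ratio(C1)
repl_err = np.linalg.norm(C1 - C2, 'fro') ** 2   # cf. the 2*D_hat/T scaling
```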
157 Continuous partitioning of neuronal variability
- Classic models of neural responses explain them as homogeneous Poisson processes whose rates are the product of a stimulus dependent tuning $f_s$ and a trial-specific gain $g_k$: $$y \sim \text{Poisson}(f_s g_k).$$
- Innovation to capture temporal variability: the Continuous Modulated Poisson Model.
- Tunings and gains as Gaussian processes: $$ \log g(t) \sim \text{GP}(0, K_g), \quad \log f_s(t) \sim \text{GP}(0, K_f(s)).$$
- Gains modelled as $$K_g = \rho_g \exp\left(-{|t_1 - t_2|^q \over \ell_g}\right).$$
- Can then e.g. monitor how parameters like optimal temporal correlation lengths $\ell_g$ and heavy-tailed-ness $q$ vary across brain areas.
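Sampling from the generative model as I understood it (all kernel parameters are my guesses, and the stimulus dependence of $K_f(s)$ is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(6)

T = 100
t = np.linspace(0, 1, T)

def gp_sample(rho, ell, q, rng):
    # Kernel K(t1, t2) = rho * exp(-|t1 - t2|^q / ell)
    D = np.abs(t[:, None] - t[None, :])
    K = rho * np.exp(-D ** q / ell)
    L = np.linalg.cholesky(K + 1e-5 * np.eye(T))
    return L @ rng.standard_normal(T)

log_g = gp_sample(rho=0.25, ell=0.2, q=1.5, rng=rng)   # trial gain (log scale)
log_f = gp_sample(rho=1.0, ell=0.1, q=2.0, rng=rng)    # stimulus tuning (log scale)
y = rng.poisson(np.exp(log_f) * np.exp(log_g))         # spike counts per bin
```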
35 Gain-modulated linear networks: a tractable framework for context-dependent computations
92 Uncovering statistical structure in large-scale neural activity with restricted Boltzmann machines
- Main idea: fit RBMs to neural data.
- Marginalize out hidden states to get effective interactions between units at all orders.
- Advantage over e.g. Schneidman’s approach by making it easier to estimate higher order interactions.
- But surely subsampling problems must be the same? i.e. estimates of high-order interactions will be noisy due to lack of observations.
- Compute index of higher order interactions between pairs of units.
- Can indicate either missing units in the recording, or true higher order interactions (e.g. via glia).
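The marginalization step can be written down directly: integrating out binary hidden units of an RBM gives one softplus term per hidden unit, and expanding the softplus in powers of the visible units generates effective interactions at all orders. A minimal sketch (random small weights and sizes are my choices):

```python
import numpy as np

rng = np.random.default_rng(7)

n_vis, n_hid = 5, 3
a = 0.1 * rng.standard_normal(n_vis)          # visible biases
b = 0.1 * rng.standard_normal(n_hid)          # hidden biases
W = 0.1 * rng.standard_normal((n_hid, n_vis)) # couplings

def log_p_unnorm(v):
    # Binary hidden units marginalized out analytically:
    #   log p(v) = a.v + sum_j softplus(b_j + W_j . v) + const
    # The softplus expansion in v contains interactions at all orders.
    x = b + W @ v
    return a @ v + np.sum(np.logaddexp(0.0, x))  # logaddexp(0, x) = softplus(x)

v = rng.integers(0, 2, n_vis).astype(float)
lp = log_p_unnorm(v)
```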
218 A unified theory of feature learning in RNNs and DNNs
- Compared the solutions in a regression task by RNNs and DNNs.
- RNNs can be viewed as DNNs using temporal unrolling.
- Key difference is that in the unrolled RNN, the layer weights are shared.
- Weight sharing imposes an inductive bias which can make RNNs more sample efficient e.g. when learning temporal sequences.
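The weight-sharing point is easy to make concrete: an RNN is numerically identical to its unrolled DNN when every layer's weights are tied (toy dimensions and tanh nonlinearity are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)

d = 4
W = rng.standard_normal((d, d)) / np.sqrt(d)   # shared recurrent weights

def rnn(x_seq, W):
    h = np.zeros(d)
    for x in x_seq:                  # the same W is applied at every step
        h = np.tanh(W @ h + x)
    return h

def unrolled_dnn(x_seq, layers):
    h = np.zeros(d)
    for W_l, x in zip(layers, x_seq):  # one (possibly distinct) W per layer
        h = np.tanh(W_l @ h + x)
    return h

x_seq = rng.standard_normal((6, d))
# The RNN is exactly the unrolled DNN with all layer weights tied to W
h_rnn = rnn(x_seq, W)
h_dnn = unrolled_dnn(x_seq, [W] * len(x_seq))
```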
152 Dynamical archetype analysis: Autonomous computation
208 Plastic Circuits for Context-Dependent Decisions
- Investigated the effect of Short-Term Synaptic Plasticity (STSP) in an RNN performing the Mante 2013 task (output based on colour or motion signal, depending on context).
- STSP: Strengths are determined as the product of utilization and activity, $w_{\text{eff},ij}(t) = w_{ij}\, u_j(t)\, x_j(t)$.
- This is meant to model e.g. vesicle pool depletion etc.
- A network using STSP could perform the task, one with Hebbian plasticity could not (?).
- Found that in STSP, context information is stored in neural activity, not the synapses.
- A network with fixed weights wanting to implement the same thing has to do so through nonlinear activations, $A(t) x(t) \to W \phi(x(t))$, which presumably could get complicated and intractable.
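A sketch of the effective-weight rule with simple utilization/resource dynamics. I've filled in Tsodyks-Markram-style update equations as a common STSP parameterization; the poster's exact model may differ.

```python
import numpy as np

rng = np.random.default_rng(9)

n = 5
w = rng.standard_normal((n, n)) / np.sqrt(n)   # fixed structural weights
u = np.full(n, 0.5)    # utilization (facilitation variable)
x = np.full(n, 1.0)    # available resources (depression variable)

def step(r, u, x, dt=0.01, U=0.5, tau_u=0.1, tau_x=0.2):
    # Assumed Tsodyks-Markram-style dynamics driven by presynaptic rates r
    du = (U - u) / tau_u + U * (1 - u) * r
    dx = (1 - x) / tau_x - u * x * r
    u = u + dt * du
    x = x + dt * dx
    w_eff = w * (u * x)[None, :]    # w_eff,ij = w_ij * u_j * x_j
    return u, x, w_eff

r = rng.random(n)                   # presynaptic rates
u, x, w_eff = step(r, u, x)
```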