{"id":7292,"date":"2026-03-13T19:40:19","date_gmt":"2026-03-13T19:40:19","guid":{"rendered":"https:\/\/sinatootoonian.com\/?p=7292"},"modified":"2026-03-25T11:34:11","modified_gmt":"2026-03-25T11:34:11","slug":"cosyne-2026","status":"publish","type":"post","link":"https:\/\/sinatootoonian.com\/index.php\/2026\/03\/13\/cosyne-2026\/","title":{"rendered":"Cosyne 2026"},"content":{"rendered":"\n<p><em>My running notes from Cosyne 2026. Most were hastily written during\/immediately after live presentations, so likely contain errors reflecting my misunderstandings. Apologies to the presenters.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Posters by Topic<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Experimental<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"#convergent-motifs\">1-007 Convergent motifs of early olfactory processing<\/a><\/li>\n\n\n\n<li><a href=\"#sparse-glomerular\">1-063 <br>Sparse glomerular representations explain odor discrimination in complex, concentration-varying mixtures<\/a><\/li>\n\n\n\n<li><a href=\"#generalization-memorization\">3-070 Generalization and memorization in mouse olfactory learning<\/a><\/li>\n\n\n\n<li><a href=\"#sensory-prediction\">3-130 Sensory prediction errors update predictive representations<\/a><\/li>\n\n\n\n<li><a href=\"#generative-diffusion\">1-140 <br>A generative diffusion model reveals V2\u2019s <\/a>representation of natural images<\/li>\n\n\n\n<li><a href=\"#state-dependent\">3-005 State-dependent modulation of neocortical sensory processing<\/a><\/li>\n\n\n\n<li><a href=\"#noise-correlations\">3-225 Noise Correlations for Efficient Learning<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Data Analysis<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"#uncovering-statistical\">2-092 Uncovering statistical structure in large-scale neural activity with restricted Boltzmann machines<\/a><\/li>\n\n\n\n<li><a href=\"#interpretable-timeseries\">1-089 INTERPRETABLE TIME-SERIES ANALYSIS WITH <strong>GUMBEL DYNAMICS<\/strong><\/a><\/li>\n\n\n\n<li><a href=\"#identifying-interpretable\">3-138 Identifying interpretable latent factors within and across brain regions<\/a><\/li>\n\n\n\n<li><a href=\"#identifying-neural\">3-030 Identifying Neural Activity Manifolds through <strong>Non-reversibility <\/strong>Analysis<\/a><\/li>\n\n\n\n<li><a href=\"#continuous-multinomial\">1-065 Continuous Multinomial Logistic Regression for Neural Decoding<\/a><\/li>\n\n\n\n<li><a href=\"#continuous-partitioning\">2-157 Continuous partitioning of neuronal variability<\/a><\/li>\n\n\n\n<li><a href=\"#dynamical-archetype\">2-152 DYNAMICAL ARCHETYPE ANALYSIS: AUTONOMOUS COMPUTATION<\/a><\/li>\n\n\n\n<li><a href=\"#generalized-dsa\">3-069 Generalized DSA: Comparing Neural Population Dynamics by Identifying Optimal Linearizing Embeddings<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Theory\/Modeling<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"#statistical-theory\">2-136 Statistical theory for inferring population geometry in high-dimensional neural data<\/a><\/li>\n\n\n\n<li><a href=\"#estimating-neural\">3-154 Estimating neural coding fidelity in high dimensions with limited samples<\/a><\/li>\n\n\n\n<li><a href=\"#nonlocal-variational\">2-045 A non-local variational framework for optimal neural representations<\/a><\/li>\n\n\n\n<li><a href=\"#unified-theory\">2-218 A unified theory of feature learning in RNNs and DNNs<\/a><\/li>\n\n\n\n<li><a href=\"#plastic-circuits\">2-208 Plastic Circuits for Context-Dependent Decisions<\/a><\/li>\n\n\n\n<li><a href=\"#spatiotemporal-dynamics\">2-173 Spatiotemporal Dynamics in Recurrent Neural Networks as <strong>Flow Invariance<\/strong><\/a><\/li>\n\n\n\n<li><a href=\"#distinguishing-probabilistic\">3-111 Distinguishing probabilistic from heuristic neural representations of uncertainty<\/a><\/li>\n\n\n\n<li><a href=\"#canonical-cortical\">3-046 Canonical cortical circuits: A unified sampling machine for static and dynamic inference<\/a><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Posters by Session<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"#poster-session1\">Poster Session 1<\/a><\/h3>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"convergent-motifs\"><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2025.09.03.673748v1\">1-007 Convergent motifs of early olfactory processing<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Vertebrate and invertebrate olfactory systems are similar, but not derived from a common ancestor, implies convergent evolution.<\/li>\n\n\n\n<li>Question: Can the structure of the olfactory system be determined normatively?<\/li>\n\n\n\n<li>Model\n<ul class=\"wp-block-list\">\n<li>Sparse monomolecular concentration vectors $\\cc$<\/li>\n\n\n\n<li>Sensed by receptors with affinities $\\WW$.<\/li>\n\n\n\n<li>Expressed, not necessarily one-to-one, in ORNs according to $\\bE$.<\/li>\n\n\n\n<li>Hill transformed, plus isotropic additive noise, to give ORN responses.$$ \\rr = \\varphi(\\bE \\WW \\cc) + \\eta.$$<\/li>\n\n\n\n<li>ORN responses converge, not necessarily one to one, on glomeruli<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Learned parameters of the Hill function, and mappings of ORs to ORNs, to maximize mutual information between odours and responses.\n<ul class=\"wp-block-list\">\n<li>I think there was an energetic cost somewhere as well.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Found the 1-1 mapping from ORs to ORNs ($\\bE \\approx \\II$).<\/li>\n\n\n\n<li>Downstream: found 1-1 mapping from ORNs to glomeruli.<\/li>\n\n\n\n<li>Dicussion: These results depend strongly on the assumed structure of the noise.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"sparse-glomerular\">1-063 <br>Sparse glomerular representations explain odor discrimination in complex, concentration-varying mixtures<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>2AFC task with one target monomolecular odour among up to 16 distractor odours.<\/li>\n\n\n\n<li>Performance depended on concentration of target odour, but not number of distractor odours.<\/li>\n\n\n\n<li>Recapitulated in model with sparse, highly tuned receptors.\n<ul class=\"wp-block-list\">\n<li>These only sense the target odour, unresponsive to distractors, hence affected by target SNR but not number of distractors.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"generative-diffusion\">1-140 <br>A generative diffusion model reveals V2\u2019s representation of natural images<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Modeled the manifold of images by training a diffusion model to produce images.<\/li>\n\n\n\n<li>Generated two kinds of &#8220;noise&#8221;:\n<ul class=\"wp-block-list\">\n<li>On-manifold: Gaussian perturbations to the diffusion model <strong>input<\/strong>.<\/li>\n\n\n\n<li>Off-manifold: Gaussian perturbations to the diffusion mode <strong>output<\/strong>.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Measured responses of Macaque V2.\n<ul class=\"wp-block-list\">\n<li>On-manifold noise produced higher variability in the responses.\n<ul class=\"wp-block-list\">\n<li>Such &#8220;noise&#8221; produces valid, different images, so the responses will reflect the new images geneated.<\/li>\n\n\n\n<li>Cosine similarity of these responses decreased with noise level.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Off-manifold noise produced similar responses across all noise levels.\n<ul class=\"wp-block-list\">\n<li>This noise produces corrupted versions of the same images, so lack of variability perhaps reflects mapping of all of these to the same image.<\/li>\n\n\n\n<li>Cosine similarity was similar across noise levels.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Comment:<\/strong> It&#8217;s like V2 is inverting the diffusion model.\n<ul class=\"wp-block-list\">\n<li>On-manifold noise produces variable activity as the mapping back to the image-generating latents, varies.<\/li>\n\n\n\n<li>Off-manifold noise produces constant responses, as the mapping back is to the same image.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"interpretable-timeseries\">1-089 <a href=\"https:\/\/arxiv.org\/abs\/2509.21578\">Interpretable time-series analysis with Gumbel dynamics<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Context was dynamics of discrete latent state generating observations.<\/li>\n\n\n\n<li>We track the probability of being in each state.<\/li>\n\n\n\n<li>We want state occupancies to be nearly one-hot, for interpretability.\n<ul class=\"wp-block-list\">\n<li>We want the system to be mostly in one state, rather than distributed across states.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Standard approach uses softmax, which has two problems:\n<ul class=\"wp-block-list\">\n<li>If we work with the probabilities, then these can stay nearly uniform.\n<ul class=\"wp-block-list\">\n<li>Downstream decoder\/observation model transforming can expand the small fluctuations from uniformity as needed.<\/li>\n\n\n\n<li>Hard to interpret as states occupancy will be distributed.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Alternatively, we can sample from the softmax.\n<ul class=\"wp-block-list\">\n<li>Fixes the interpretability issue as exactly one state will be occupied<\/li>\n\n\n\n<li>In efficient, as gradients etc. will only operate on that active state \/ sample.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Solution:<strong>Gumbel Softmax<\/strong>: $$ z \\sim \\text{GS}(\\pi, \\tau) \\iff z \\sim \\text{softmax}\\left({\\pi + \\eta \\over \\tau}\\right), \\quad \\eta \\sim \\text{Gumbel}(0,1).$$\n<ul class=\"wp-block-list\">\n<li>Pushes the softmax probabilities so state-occupancy is nearly one-hot.\n<ul class=\"wp-block-list\">\n<li>Helps interpretability<\/li>\n\n\n\n<li>More efficient than purely one hot because states with small probabilities are also update.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"continuous-multinomial\"><a href=\"https:\/\/openreview.net\/forum?id=theeeNBSTG\">1-065 Continuous Multinomial Logistic Regression for Neural Decoding<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Standard logistic regression: $$p(y_k|\\xx) \\propto \\exp(-\\ww_k^T \\xx).$$<\/li>\n\n\n\n<li>Weights are fixed per class.<\/li>\n\n\n\n<li>Can be extended to a temporal dimension by treating each time bin independently.\n<ul class=\"wp-block-list\">\n<li>This ignore temporal structure.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Idea: allow $\\ww_k$ to vary smoothly in time by giving it a Gaussian process prior : $$ \\ww_k \\sim \\text{GP}(\\bzero, \\lambda).$$\n<ul class=\"wp-block-list\">\n<li>I think there was also some linking of the different states as well, so that their vector evolutions were also linked.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Take the mean $\\ww_k(t)$ to compute output probabilities. $$ p(\\yy_k | \\xx(t)) \\propto \\exp(-\\overline{\\ww}_k(t)^T \\xx).$$<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Poster Session 2<\/h3>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"statistical-theory\">2-136 Statistical theory for inferring population geometry in high-dimensional neural data<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Used RMT to investigate how covariance estimation varies with the number of neurons $N$ and trials $T$.<\/li>\n\n\n\n<li>First result was on PCA dimension (participation ratio), showing how subsampling $N$ neurons and $T$ trials to $M$ neurons and $P$ trials affects estimation by relating the PCA dimensions at the two configuration.<\/li>\n\n\n\n<li>Next result was about how estimation error relative to true covariance, $\\|\\hat C &#8211; C\\|_F^2$ varies as $D_N\/T$. \n<ul class=\"wp-block-list\">\n<li>$D_N$ is the true PCA dimensionality for $N$ neurons in the infinite trials limit.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>What if the true dimensionality is not known? Replication error, $\\|\\hat C_1 &#8211; \\hat C_2\\|_F^2$, varies as $2 \\hat{D}_{N,T}\/T$.\n<ul class=\"wp-block-list\">\n<li>$\\hat{D}_{N,T}$ is the dimensionality of the sample.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li> Estimating eigenvectors and eigenvalues:\n<ul class=\"wp-block-list\">\n<li>Low rank signal model: $C = U D U + \\sigma^2 I$.<\/li>\n\n\n\n<li>SNR for the $k$&#8217;th dimensions: $SNR_k = d_k\/{N \\sigma^2}$.<\/li>\n\n\n\n<li>There was some critical SNR threshold above which eigenvalues and eigenvectors could be estimated.<\/li>\n\n\n\n<li>Effective $R^2$ is $R^2 = {\\sum_k O_k SNR_k &#8211; {K\/N} \\over \\sum_k SNR_k -1 }.$\n<ul class=\"wp-block-list\">\n<li>O_k is the alignment of the recovered eigenvector with the true eigenvector.<\/li>\n\n\n\n<li>Recovering each mode contributes SNR, weighted by the alignment, but also brings noise (contributing $N^{-1}$ per mode).<\/li>\n\n\n\n<li>Best rank to recover is the $K$ at which the numerator no longer increases, where adding one more mode brings more noise than signal.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"continuous-partitioning\"><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2025.07.23.666404v1.full\">2-157 Continuous partitioning of neuronal variability<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Classic models of neural responses explain them as homogeneous Poisson processes whose rates are the product of a stimulus dependent tuning $f_s$ and a trial-specific gain $g_k$: $$y \\sim \\text{Poisson}(f_s g_k).$$<\/li>\n\n\n\n<li>Innovation to capture temporal variability: the <strong>Continuous Modulated Poisson Model<\/strong>. <\/li>\n\n\n\n<li>Tunings and gains as Gaussian processes: $$ \\log g(t) \\sim \\text{GP}(0, K_g), \\quad \\log f_s(t) \\sim \\text{GP}(0, K_f(s)).$$<\/li>\n\n\n\n<li>Gains modelled as $$K_g = \\rho_g \\exp\\left(-{|t_1 &#8211; t_2|^q \\over \\ell_g}\\right).$$<\/li>\n\n\n\n<li>Can then e.g. monitor how parameters like optimal temporal correlation lengths $\\ell_g$ and heavy-tailed-ness $q$ vary across brain areas.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"uncovering-statistical\"><a href=\"https:\/\/arxiv.org\/abs\/2603.11032\">2-092 Uncovering statistical structure in large-scale neural activity with restricted Boltzmann machines<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Main idea: fit RBMs to neural data.<\/li>\n\n\n\n<li>Marginalize out hidden states to get effective interactions between units at all orders.\n<ul class=\"wp-block-list\">\n<li>Advantage over e.g. Schneidman&#8217;s approach by making it easier to estimate higher order interactions.\n<ul class=\"wp-block-list\">\n<li>But surely subsampling problems must be the same? i.e. estimates of high-order interactions will be noisy due to lack of observations.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Compute index of higher order interactions between pairs of units.\n<ul class=\"wp-block-list\">\n<li>Can indicate either missing units in the recording, or true higher order interactions (e.g. via glia).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"unified-theory\">2-218 A unified theory of feature learning in RNNs and DNNs<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Compared the solutions in a regression task by RNNs and DNNs.<\/li>\n\n\n\n<li>RNNs can be viewed as DNNs using temporal unrolling.<\/li>\n\n\n\n<li>Key difference is that in the unrolled RNN, the layer weights are shared.<\/li>\n\n\n\n<li>Weight sharing imposes an inductive bias which can make RNNs more sample efficient e.g. when learning temporal sequences.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"dynamical-archetype\"><a href=\"https:\/\/arxiv.org\/abs\/2507.05505\">2-152 Dynamical archetype analysis: Autonomous computation<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Neural systems often have different geometry, but the same topology\n<ul class=\"wp-block-list\">\n<li>Converge to the same pattern of fixed points, repelled by the same set of repellers etc.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>This is called <strong>topological conjugacy<\/strong>: Two systems are topologically conjugate if there is a homeomorphism $\\Phi$ that transforms one set of dynamics into the other.<\/li>\n\n\n\n<li>The set of all homeomorphisms is too large, so parameterize using a Neural ODE.<\/li>\n\n\n\n<li>Compute the distance between two sets of dynamics $f$, $g$ by minimizing a combination of a trajectory mismatch $d_\\text{traj}$, and homeomorphism complexity $d_\\text{cxty}$.<\/li>\n\n\n\n<li>Trajectory loss: Given  the trajectory at time $t$ starting at $x$ under $f$, dynamics, $\\phi_f^t(x)$, and similarly $\\phi_g^t(x)$: $$ d_\\text{traj}(\\Phi; f, g) = \\int_t \\| \\phi_f^t(x) &#8211; \\underbrace{\\Phi(\\phi_g^t ( \\Phi^{-1}(x)))}_{\\text{$g$ trajectory in $f$ space}}\\| dt$$<\/li>\n\n\n\n<li>Complexity loss: $$ d_\\text{cxty}(\\Phi) = \\int \\| \\nabla \\Phi(x) &#8211; I \\| dx,$$ evaluated along the $f$ trajectory (?).<\/li>\n\n\n\n<li>Measure the distance of a given neural system to a fixed set of archetype dynamics, e.g. line attractor, ring attractor.<\/li>\n\n\n\n<li>Showed that they could correctly map dynamics to their archetypes when the ground truth was known. \n<ul class=\"wp-block-list\">\n<li>E.g. neural ring attractor dynamics (fly system? head direction system?) mapped to ring attractor archetype.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"plastic-circuits\">2-208 Plastic Circuits for Context-Dependent Decisions<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Investigated the effect of Short-Term Synaptic Plasticity (STSP) in an RNN performing the Mante 2013 task (output based on colour or motion signal, depending on context).<\/li>\n\n\n\n<li>STSP: Strengths are determined as the product of utilization and activity, $w_{\\text{eff}, ij}(t) =w_{ij} u_j(t) x_j(t).$\n<ul class=\"wp-block-list\">\n<li>This is mean to model e.g. vesicle pool depletion etc.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>A network using STSP could perform the task, one with Hebbian plasticity could not (?).<\/li>\n\n\n\n<li>Found that in STSP, context information is stored in neural activity, not the synapses.<\/li>\n\n\n\n<li>A network with fixed weights wanting to implement the same thing has to do so through nonlinear activations: $A(t) x(t) \\to W \\phi(x(t)),$, which presumably could get complicated, intractable.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"spatiotemporal-dynamics\"><a href=\"https:\/\/arxiv.org\/abs\/2409.13669v2\">2-173 Spatiotemporal Dynamics in Recurrent Neural Networks as Flow Invariance<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>RNNs are often used to learn stimulus dynamics.<\/li>\n\n\n\n<li>It&#8217;s natural to want equivariant hidden representations: flow in the stimulus results in corresponding flow in the latents.<\/li>\n\n\n\n<li>Incorporating such equivariance into the RNN dynamics can dramatically speed up learning.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"nonlocal-variational\">2-045 A non-local variational framework for optimal neural representations<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Tuning curves are often defined by maximizing Fisher information.<\/li>\n\n\n\n<li>Fisher information is a local measure &#8211; doesn&#8217;t capture e.g. errors due to jumps in the inferred values.<\/li>\n\n\n\n<li>Mutual information is global, but can be hard to compute.<\/li>\n\n\n\n<li>Solution: a non-local loss on tuning curves comparing all pairs of input stimuli: $$ L[f] = {1 \\over (2 \\pi)^2} \\int_\\theta \\int_{\\theta&#8217;} \\ell(f(\\theta), f(\\theta&#8217;)),$$ where $\\ell$ is misclassification error.<\/li>\n\n\n\n<li>How to solve this?\n<ul class=\"wp-block-list\">\n<li>$f(\\theta) = p(x|\\theta)$, the population response.<\/li>\n\n\n\n<li>The population response space is a manifold with the Fisher-Rao metric.<\/li>\n\n\n\n<li>The responses to two stimuli are two points in this space, separated by a geodesic distance $d$.<\/li>\n\n\n\n<li>Classification error $\\ell$ is approximately erfc of this distance.<\/li>\n\n\n\n<li>The set of all responses (to circular stimuli) forms a closed curve in this space.<\/li>\n\n\n\n<li>The optimal tuning curves that minimize the loss form a circle in the space of square-root firing rates.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Poster Session 3<\/h3>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"estimating-neural\">3-154 Estimating neural coding fidelity in high dimensions with limited samples<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>d&#8217; measures discriminability but can be biased when neurons >> trials.<\/li>\n\n\n\n<li>Used RMT to estimate d&#8217; in high-dimensional setting and produce a less-biased estimator.<\/li>\n\n\n\n<li>Key quantity: Signal aligned spectrum: $$ G_\\rho(x) = \\sum_{i=1}^N \\left({v_i^T u \\over \\|u\\|}\\right)^2 1_{x > \\lambda_i},$$ where $v_i$ are the noise directions in decreasing order of variance $\\lambda_i$, and $u$ is the signal direction. <\/li>\n\n\n\n<li>In those terms, $$ d&#8217; = \\|u\\|^2 \\int {1 \\over \\lambda} dG(\\lambda).$$<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"identifying-interpretable\"><br>3-138 Identifying interpretable latent factors within and across brain regions<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Decompose temporal activity into sparse factors, orthogonal factors convolved with Gaussian filters, possibly with some delay.\n<ul class=\"wp-block-list\">\n<li>Orthogonality and sparsity give interpretability.<\/li>\n\n\n\n<li>Convolution with Gaussian filters is faster to fit than general Gaussian process.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"generalization-memorization\">3-070 Generalization and memorization in mouse olfactory learning<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Trained mice to distinguish a variety of sixteen component mixtures to test generalization vs memorization.<\/li>\n\n\n\n<li>Mice can do both, biased towards simple rules when these exist.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"sensory-prediction\">3-130 Sensory prediction errors update predictive representations<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>RSC is the source of sensory predictions (not prediction errors) to V1.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"noise-correlations\">3-225 Noise Correlations for Efficient Learning<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>For optimal discrimination we want noise correlations to be orthogonal to the discriminating dimensions.<\/li>\n\n\n\n<li>Tested humans on a joint color-motion discrimination task, where the rule would occasionally flip. <\/li>\n\n\n\n<li>Modelled this with shallow linear net mapping color and motion to the decision.<\/li>\n\n\n\n<li>Observed that noise correlations in the model were <strong>parallel<\/strong> to the optimal discrimination direction.\n<ul class=\"wp-block-list\">\n<li>This would produce sub-optimal accuracy, and indeed it does.<\/li>\n\n\n\n<li>But they hypothesize that it helps find the discriminating directions.\n<ul class=\"wp-block-list\">\n<li>I think this is putting the cart before the horse.<\/li>\n\n\n\n<li>Classifiers will find the discriminating directions, and that in turn will affect the noise correlations.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"state-dependent\">3-005 State-dependent modulation of neocortical sensory processing<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Imaged mouse S1 and PPC during two-alternative multi-sensory discrimination task.<\/li>\n\n\n\n<li>Modelled brain activity using a 3-state GLM-HMM<\/li>\n\n\n\n<li>Produced one &#8220;engaged&#8221; state, two &#8220;disengaged&#8221; states.<\/li>\n\n\n\n<li>In the engaged state:\n<ul class=\"wp-block-list\">\n<li>Stronger reps in S1, PPC-A<\/li>\n\n\n\n<li>Stronger communication b\/w S1 to PPC-A<\/li>\n\n\n\n<li> Stronger bottom-up activation.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"generalized-dsa\">3-069 Generalized DSA: Comparing Neural Population Dynamics by Identifying Optimal Linearizing Embeddings<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>DSA has two steps: \n<ul class=\"wp-block-list\">\n<li>Map nonlinear dynamics to high (infinite) dimensional linear dynamics using Koopman theory.<\/li>\n\n\n\n<li>Find an orthonormal coordinate transformation that best lines up one set with another: $$ d_\\text{DSA}(A,B) = \\min_{C \\in O(N)} \\|A &#8211; C B C^T\\|.$$<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Genearlized DSA:\n<ul class=\"wp-block-list\">\n<li>Find eigenspectrum of dynamics<\/li>\n\n\n\n<li>Measure distance between spectra using optimal transport.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Ostrow: Is faster \/ works better than DSA.<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"identifying-neural\">3-030 Identifying Neural Activity Manifolds through Non-reversibility Analysis<\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Neural dynamics are generally <strong>not<\/strong> reversible.\n<ul class=\"wp-block-list\">\n<li>i.e. $p(x_t = a, x_{t+1} = b) != p(x_t = b, x_{t+1}=a)$<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Noise dynamics <strong>are<\/strong> reversible.<\/li>\n\n\n\n<li>Idea: Dimension reduce dynamics by finding projections that produce maximally non-reversible dynamics.<\/li>\n\n\n\n<li>Let $X_k \\in \\RR^{N \\times T}$ be the responses to condition $k$. <\/li>\n\n\n\n<li>Compute the covariance of the vectorized responses $C = \\EE(\\vec{X_k^T}, \\vec{X_k^T}^T).$<\/li>\n\n\n\n<li>Split this into a reversible and non-reversible part:\n<ul class=\"wp-block-list\">\n<li>Let $\\sigma(C)$ be the time-transposed covariances.<\/li>\n\n\n\n<li>$C^+ = C + \\sigma(C)$<\/li>\n\n\n\n<li>$C^- = C &#8211; \\sigma(C)$<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Non-reversibility index: $$\\xi = {\\|C^-\\|_F \\over \\|C^+\\|_F}.$$<\/li>\n\n\n\n<li>Tough to maximize this coefficient itself, instead just maximize the numerator:\n<ul class=\"wp-block-list\">\n<li>Find $U$ to maximize $\\|C^-\\|$ for the projected data $Y_k = U^T X_k$.<\/li>\n\n\n\n<li>Non-reversible part has a simple expression: $$ \\|C^-\\|_F^2 = \\sum_{k,k&#8217;} \\tr{Y_k Y_{k&#8217;}^T}^2 &#8211; \\tr{Y_k Y_k^T Y_{k&#8217;} Y_{k&#8217;}^T}.$$<\/li>\n\n\n\n<li>Can be kernelized.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"distinguishing-probabilistic\"><a href=\"https:\/\/openreview.net\/forum?id=haNKHOak3J\">3-111 Distinguishing probabilistic from heuristic neural representations of uncertainty<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When are neural representations truly probabilistic rather than just heuristic?<\/li>\n\n\n\n<li>Found that truly probabilistic reps require a bottleneck that forces learning of sufficient statistics.<\/li>\n\n\n\n<li>Otherwise can just memorize inputs.\n<ul class=\"wp-block-list\">\n<li>Measurable if can predict inputs from the hidden states.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"canonical-cortical\"><br><a href=\"https:\/\/openreview.net\/forum?id=BpBW4gJofo\">3-046 Canonical cortical circuits: A unified sampling machine for static and dynamic inference<\/a><\/h5>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Found that the canonical microcircuit was a substrate for Hamiltonian dynamics and allowed fast inference.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Workshops 1<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Eero<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Efficient coding (Barlow): information transmission while minimizing redundancy.<\/li>\n\n\n\n<li>First part of the talk: building circuit models that progressively reduce different kinds of redundancy, and how these match up to corresponding visual ares.<\/li>\n\n\n\n<li>Second part of talk: accessing the image manifold through denoising\n<ul class=\"wp-block-list\">\n<li>Supervised training of image denoisers\n<ul class=\"wp-block-list\">\n<li>For least squares loss this reports posterior mean: $$ y = x + z \\mapsto \\hat x(y) = \\int x p(x|y) dx .$$ <\/li>\n\n\n\n<li>The trained system implicitly contains information about the prior on image. How to access this information? <\/li>\n\n\n\n<li><strong>Tweedie&#8217;s<\/strong> identity: $$\\hat{x}(y) = y + \\sigma^2 \\nabla_y \\log p(y).$$<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Ken Harris<\/h3>\n\n\n\n<h2 class=\"wp-block-heading\">Workshops 2<\/h2>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Questions<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Can the OT metric on tuning curves be pulled back to a metric in the response space to allow direct comparison of datasets with different sizes?<\/li>\n\n\n\n<li>Non-reversibility analysis looks for an orthogonal projection in neural space. Is there also an optimal time scale, a projection along the time direction, to compute non-reversibility over?<\/li>\n\n\n\n<li>Ken Harris mentioned the trivial neural code for numbers, where successive neurons code for successive significant digits.\n<ul class=\"wp-block-list\">\n<li>This is an intensive code, information per dimensions decays to zero in the large N limit.<\/li>\n\n\n\n<li>What is a corresponding extensive code, that distributes the information evenly among neurons?\n<ul class=\"wp-block-list\">\n<li>That is also easy to decode?<\/li>\n\n\n\n<li>Does the naive encoding correspond to axis-aligned coordinates?\n<ul class=\"wp-block-list\">\n<li>And the a distributed code could be a rotation?<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Matthew Chalk&#8217;s talk\n<ul class=\"wp-block-list\">\n<li>He defined a multi-resolution information metric by looking at how fisher information changed with noise corruption of different magnitudes.<\/li>\n\n\n\n<li>The eigenvectors of the metric give the most informative directions for each stimulus.<\/li>\n\n\n\n<li>Can we derie Can the multi-resolution information metric be derived from locally most informative directions?<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>My running notes on Cosyne 2026.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,167],"tags":[],"class_list":["post-7292","post","type-post","status-publish","format-standard","hentry","category-blog","category-conference"],"acf":[],"_links":{"self":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/7292","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/comments?post=7292"}],"version-history":[{"count":91,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/7292\/revisions"}],"predecessor-version":[{"id":7533,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/7292\/revisions\/7533"}],"wp:attachment":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/media?parent=7292"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/categories?post=7292"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/tags?post=7292"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}