Author: Sina
-
Linearizing Covariance for the Free Model, Part II
I extend the linearization to include non-linear diagonal terms, but find that even the simplest approximation doesn’t capture the large values we’re after.
-
Linearizing the Covariance Loss for the Free Model
I linearize the covariance for the Free model, and find that I need to include an additional diagonal component.
-
Deep Linear Networks Learn Hierarchical Structure
Running notes on Saxe et al.’s “A mathematical theory of semantic development in deep neural networks.”
-
Linearizing the Covariance Loss
We’re after insight, not an exact solution. ChatGPT had a good suggestion to linearize the loss around $\zz = \bone$. In this post we do that.
-
The Diagonal Model with Centering
We’re going to try to make sense of the solutions to minimizing the following loss: $$L(\zz) = {1 \over 2} \|\XX^T \ZZ^T \JJ \ZZ \XX - \SS\|_F^2 + {\lambda \over 2}\|\zz - \bone\|_2^2$$
-
Plume Marginalization
I work out the expression for the likelihood after marginalizing out flow changes.
-
When to Smell in Stereo?
We show that stereo-olfaction beats mono-olfaction when searching surfaces for olfactory edges.
-
Unary vs. Binary Expressions of Independence
I discuss the unary and binary expressions of independence and how their meanings are slightly different.
-
Markov Blankets in Bayesian Networks
We determine how to find the Markov boundary of a node by first looking at some examples, then using a formal derivation.
-
Synchronization with Bimodal Spines
I update the activity-dependent synchrony model to make spines bimodal, increasing their inhibition when their parent GC is active.
-
Activity-Dependent Synchronization of Linear Integrate and Fire Units
We’re interested in activity-dependent synchronization: the inhibition a mitral cell receives from a granule cell spine requires that the spine was previously depolarized enough to activate its NMDA channels, which are required (via both the additional depolarization of the spine and the increased Ca2+ influx they provide) for vesicle release.…
-
Eigenvalue Density via the Stieltjes Transform
Notes on how to compute the eigenvalue density of a random matrix using the Stieltjes transform, based on Chapter 2 of “A First Course in Random Matrix Theory,” and conversations with ChatGPT.
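As a concrete instance of the inversion step these notes cover, here is a minimal numpy sketch (not code from the notes) recovering the Wigner semicircle density from its Stieltjes transform:

```python
import numpy as np

# Stieltjes transform of the Wigner semicircle law (unit variance):
#   g(z) = (z - sqrt(z - 2) * sqrt(z + 2)) / 2,
# written with principal square roots so that g(z) ~ 1/z at infinity.
def g_semicircle(z):
    return (z - np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

# Sokhotski-Plemelj inversion: rho(x) = -(1/pi) * lim_{eta->0+} Im g(x + i*eta)
eta = 1e-9
x = np.linspace(-1.9, 1.9, 7)
rho = -g_semicircle(x + 1j * eta).imag / np.pi

exact = np.sqrt(4 - x**2) / (2 * np.pi)  # semicircle density on [-2, 2]
print(np.max(np.abs(rho - exact)))       # small: numerical and exact agree
```

The branch choice matters: writing the square root as `sqrt(z - 2) * sqrt(z + 2)` keeps the imaginary part of $g$ negative just above the real axis, which is what the inversion formula assumes.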
-
Linking Representational Geometry and Neural Function
These are my notes on Harvey et al. “What representational similarity measures imply about decodable information.”
-
Memory erasure by dopamine-gated retrospective learning
Hastily written notes immediately after the Gatsby TNJC where this preprint was presented.
-
Multivariate Gaussians from Bayesian Networks
We show in detail how to compute the mean and covariance of the multivariate Gaussian produced by a linear-Gaussian Bayesian network.
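For the standard linear-Gaussian parameterization (an assumption here, not necessarily the one the post uses), the mean and covariance come out of two matrix identities; a minimal sketch:

```python
import numpy as np

# Linear-Gaussian BN: x_i = sum_j W[i, j] x_j + b_i + eps_i, eps_i ~ N(0, v_i),
# with W strictly lower triangular (nodes in topological order).
# Solving x = W x + b + eps gives x = (I - W)^{-1} (b + eps), hence
#   mean = (I - W)^{-1} b,   cov = (I - W)^{-1} diag(v) (I - W)^{-T}.
def bn_to_gaussian(W, b, v):
    A = np.linalg.inv(np.eye(len(b)) - W)
    return A @ b, A @ np.diag(v) @ A.T

# Toy chain x0 -> x1 -> x2 with unit edge weights and unit noise variances.
W = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
mean, cov = bn_to_gaussian(W, np.zeros(3), np.ones(3))
print(cov)  # variances 1, 2, 3 down the chain, as noise accumulates
```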
-
Notes on Toy Models of Superposition
On the Discord we’ve been discussing “Toy Models of Superposition” from Anthropic. It’s a long blog post, so these are my running notes to get people (and myself) up to speed if they’ve missed a week or two of the discussion. Problem Setup The authors’ basic aim is to demonstrate “superposition”: neurons representing multiple input…
-
Converting joint distributions to Bayesian networks
In these notes we discuss how to convert a joint distribution into a graph called a Bayesian network, and how the structure of the graph suggests ways to reduce the parameters required to specify the joint.
-
Jensen-Shannon Divergence
I discuss how the Jensen-Shannon divergence is a smoothed symmetrization of the KL divergence comparing two distributions, and connect it to the performance of their optimal binary discriminator.
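The smoothed symmetrization reads directly as code; a minimal numpy sketch (not from the post) for discrete distributions:

```python
import numpy as np

def kl(p, q):
    # KL divergence D(p || q) for discrete distributions, in nats;
    # terms with p[i] == 0 contribute zero.
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    # Jensen-Shannon divergence: average KL of p and q to their mixture m.
    # Comparing to m (never to the other distribution directly) is the
    # "smoothing" that keeps the result finite and symmetric.
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(js(p, q))  # finite despite the supports only partially overlapping
```

Unlike the raw KL divergence, this stays bounded (by $\ln 2$ in nats) even when the supports differ.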
-
Graph Spectra and Clustering
I describe how the spectrum of the graph built from a dataset can indicate how strongly the data are clustered.
-
Matching Pearson Correlations
In this post I switch to matching Pearson correlations, rather than covariances, and then generalize to the scalar product of an arbitrary function of the outputs.