Category: Blog
-
Linking Representational Geometry and Neural Function
These are my notes on Harvey et al., “What representational similarity measures imply about decodable information.”
-
Memory erasure by dopamine-gated retrospective learning
Hastily written notes immediately after the Gatsby TNJC where this preprint was presented.
-
Multivariate Gaussians from Bayesian Networks
We show in detail how to compute the mean and covariance of the multivariate Gaussian produced by a linear-Gaussian Bayesian network.
-
Notes on Toy Models of Superposition
On the Discord we’ve been discussing “Toy Models of Superposition” from Anthropic. It’s a long blog post, so these are my running notes to get people (and myself) up to speed if they’ve missed a week or two of the discussion. As I’ve started these notes midway through the discussion, I’ll start on the latest…
-
Converting joint distributions to Bayesian networks
In these notes we discuss how to convert a joint distribution into a graph called a Bayesian network, and how the structure of the graph suggests ways to reduce the parameters required to specify the joint.
-
Jensen-Shannon Divergence
I discuss how the Jensen-Shannon divergence is a smoothed symmetrization of the KL divergence comparing two distributions, and connect it to the performance of their optimal binary discriminator.
-
Graph Spectra and Clustering
I describe how the spectrum of the graph built from a dataset can indicate how clustered the data are.
-
Matching Pearson Correlations
In this post I switch to matching Pearson correlations, rather than covariances, and then generalize to the scalar product of an arbitrary function of the outputs.
-
Notes on Kernel PCA
Following Bishop, I show how to express the eigenvectors of the feature projections in terms of the eigenvectors of the kernel matrix, and how to compute the kernel of centered features from the uncentered one.
-
Dimension reduction of vector fields
I discuss two notions of dimension-reduction of vector fields from the “low-rank hypothesis” paper, and which might be the ‘correct’ one.
-
The low-rank hypothesis of complex systems
In this post I will summarize the paper “The low-rank hypothesis of complex systems” by Thibeault et al.
-
An iterative reweighted least squares miracle
I show what’s really happening in the iterative reweighted least squares updates for logistic regression described in PRML 4.3.3.
-
Computing with Line Attractors
These notes are based on Seung’s “How the Brain Keeps the Eyes Still”, where he discusses how a line attractor network may implement a memory of the desired fixation angle that ultimately drives the muscles in the eye.
-
A Free Connectivity Non-Solution
In this post I explore one possible unconstrained connectivity solution that turns out not to work.
-
How feature size interacts with regularization
In this post we’re going to explore how estimated feature size is affected by regularization. The intuition is that the shrinkage applied by regularization will mean low-amplitude features get (even more) swamped by additive noise.
-
The Logic of Free Connectivity
In this post we try to understand the diagonal term and the two rank-1 terms that we find when we fit connectivity without any constraints.
-
Quantization
In this post we try to understand the diagonal connectivity solutions by quantizing the elements to the three values $[0,1,z]$.
-
When is the distribution of two iid random variables spherically symmetric?
In this post we show that the joint distribution of two iid random variables is spherically symmetric iff the marginal distribution is Gaussian.
-
Decorrelation through gain control
Decorrelation is typically thought to require lateral interactions. But how much can we achieve just by gain control?
-
EM for Factor Analysis
In this note I work out the EM updates for factor analysis, following the presentation in PRML 12.2.4.