Category: Blog
-
The equivalent kernel for non-zero prior mean
This note is a brief addendum to Section 3.3 of Bishop on Bayesian Linear Regression. Some of the derivations in that section assume, for simplicity, that the prior mean on the weights is zero. Here we’ll relax this assumption and see what happens to the equivalent kernel. Background: The setting in that section is that,…
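For context, a sketch of the standard results from Bishop §3.3 that the note builds on (stated from memory, so check against the text): with prior $p(\mathbf w) = \mathcal N(\mathbf w \mid \mathbf m_0, \mathbf S_0)$ and noise precision $\beta$, the posterior over the weights is $$ p(\mathbf w \mid \mathbf t) = \mathcal N(\mathbf w \mid \mathbf m_N, \mathbf S_N), \qquad \mathbf m_N = \mathbf S_N\left(\mathbf S_0^{-1}\mathbf m_0 + \beta \Phi^T \mathbf t\right), \qquad \mathbf S_N^{-1} = \mathbf S_0^{-1} + \beta \Phi^T \Phi. $$ For $\mathbf m_0 = 0$ the predictive mean reduces to a weighted sum of the training targets, $y(\mathbf x) = \sum_n k(\mathbf x, \mathbf x_n)\, t_n$ with equivalent kernel $k(\mathbf x, \mathbf x') = \beta\, \boldsymbol\phi(\mathbf x)^T \mathbf S_N \boldsymbol\phi(\mathbf x')$; the question the note addresses is what replaces this expression when $\mathbf m_0 \neq 0$.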
-
RL produces more brain-like representations for motor learning than supervised learning
These are my rapidly scribbled notes on Codol et al.’s “Brain-like neural dynamics for behavioral control develop through reinforcement learning” (and likely contain errors). What learning algorithm does the baby’s brain use to learn motor tasks? We have at least two candidates: supervised learning (SL), which measures and minimizes discrepancies between desired and actual states…
-
Notes on the Geometry of Least Squares
In this post I expand on the details of Section 3.1.2 in Pattern Recognition and Machine Learning. We found that maximum likelihood estimation requires minimizing $$E(\mathbf w) = {1 \over 2} \sum_{n=1}^N (t_n - \ww^T \bphi(\xx_n))^2.$$ Here the vector $\bphi(\xx_n)$ contains each of our features evaluated on the single input datapoint $\xx_n$, $$\bphi(\xx_n) = [\phi_0(\xx_n),…
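An aside (not from the post): the geometric reading of this minimization is that the least-squares fit is the orthogonal projection of the target vector $\mathbf t$ onto the column space of the design matrix $\Phi$, which is easy to check numerically; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 4                      # N data points, M basis functions
Phi = rng.normal(size=(N, M))     # design matrix, rows are phi(x_n)^T
t = rng.normal(size=N)            # targets

# Least-squares weights from the normal equations: w = (Phi^T Phi)^{-1} Phi^T t
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Geometric view: the fitted values Phi w are the orthogonal projection
# of t onto the column space of Phi.
P = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)   # projection matrix
assert np.allclose(Phi @ w, P @ t)

# The residual is orthogonal to every column of Phi (i.e., to the subspace).
assert np.allclose(Phi.T @ (t - Phi @ w), 0)
```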
-
Inference by decorrelation
We frequently observe decorrelation in projection neuron responses. This has often been linked to either redundancy reduction or pattern separation. Can we make an explicit link to inference? A simple case to consider is $\ell_2$ regularized MAP inference, where $$ \log p(x|y) = L(x,y) = {1 \over 2\sigma^2} \|y - A x\|_2^2 + {\gamma \over…
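Assuming the truncated penalty is the usual ${\gamma \over 2}\|x\|_2^2$ (my guess from the $\ell_2$ framing, not the post), the MAP estimate has the closed form $x^\star = (A^TA + \gamma\sigma^2 I)^{-1}A^Ty$; a quick numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 10
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
sigma2, gamma = 0.5, 0.1

# Closed-form minimizer of (1/(2 sigma^2)) ||y - A x||^2 + (gamma/2) ||x||^2
x_map = np.linalg.solve(A.T @ A + gamma * sigma2 * np.eye(n), A.T @ y)

# Sanity check: the gradient of the objective vanishes at x_map.
grad = (A.T @ (A @ x_map - y)) / sigma2 + gamma * x_map
assert np.allclose(grad, 0, atol=1e-10)
```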
-
Background on Koopman Theory
These notes fill in some of the details of Section 2.1 of Kamb et al.’s “Time-Delay Observables for Koopman: Theory and Applications”. They were made by relying heavily on finite-dimensional intuition (operators as infinite-dimensional matrices), and by talking with ChatGPT, so likely contain errors. We are interested in understanding the time evolution of a dynamical…
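For readers new to the area, the basic object (standard material, not specific to these notes) is the following: for a discrete-time system $x_{k+1} = F(x_k)$, the Koopman operator $\mathcal K$ acts on scalar observables $g$ by composition with the dynamics, $$ (\mathcal K g)(x) = g(F(x)), \qquad \text{so that} \qquad g(x_{k+1}) = (\mathcal K g)(x_k). $$ The map $F$ may be nonlinear, but $\mathcal K$ is linear on the (generally infinite-dimensional) space of observables, which is what licenses the “operators as infinite-dimensional matrices” intuition.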
-
Notes on the Recognition-Parameterized Model
Recently, William Walker and colleagues proposed the Recognition Parameterized Model (RPM) to perform unsupervised learning of the causes behind observations, but without the need to reconstruct those observations. This post summarizes my (incomplete) understanding of the model. One popular approach to unsupervised learning is autoencoding, where we learn a low-dimensional representation of our data that…
-
Between geometry and topology
At one of the journal clubs I recently attended, we discussed “The Topology and Geometry of Neural Representations”. The motivation for the paper is that procedures like RSA, which capture the overlap of population representations of different stimuli, can be overly sensitive to some geometrical features of the representation the brain might not care about.…
-
Notes on Atick and Redlich 1993
In their 1993 paper Atick and Redlich consider the problem of learning receptive fields that optimize information transmission. They consider a linear transformation of a vector of retinal inputs $s$ to ganglion cell outputs of the same dimension $$y = Ks.$$ They aim to find a biologically plausible learning rule that will use the input…
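Not the paper’s derivation, but a bare-bones illustration of the linear setup: under $y = Ks$ the output covariance is $K C_s K^T$, and a whitening choice $K = C_s^{-1/2}$ decorrelates and equalizes the outputs, roughly the kind of code that information-transmission arguments favor when noise is negligible:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
B = rng.normal(size=(d, d))
C_s = B @ B.T + np.eye(d)            # covariance of the retinal inputs s (SPD)

# A whitening choice of K: C_s^{-1/2}, built from the eigendecomposition of C_s.
lam, V = np.linalg.eigh(C_s)
K = V @ np.diag(1.0 / np.sqrt(lam)) @ V.T

# Under y = K s, the output covariance is K C_s K^T; here it is the identity.
C_y = K @ C_s @ K.T
assert np.allclose(C_y, np.eye(d))
```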
-
Changing regularization, II
Today I went back to trying to understand the solution when using the original regularization. While doing so it occurred to me that if I use a slightly different regularization, I can get a closed-form solution for the feedforward connectivity $Z$, and without most (though not all) of the problems I was having in my…
-
Notes on Multiresolution Matrix Factorization
These are my notes from early January on Kondor et al.’s Multiresolution Matrix Factorization from 2014. This was a conference paper and the exposition was a bit terse in places, so below I try to fill in some of the details I thought were either missing or confusing. Motivating MMF: We will be interested in…
-
Changing regularization
This morning it occurred to me that the problems we’re having with our equation \begin{align}S^2 Z^2 S^2 - S C S = \lambda (Z^{-1} - I)\label{main}\tag{1}\end{align} are due to the regularizer we use, $\|Z - I\|_F^2$. This regularizer makes passing the input directly to the output the default behavior of the feedforward connections. But it’s…
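Not from the post, but when juggling equations like (1) I find a numerical residual check handy; a toy sketch with made-up $S$, $C$, and $\lambda$:

```python
import numpy as np

def residual(Z, S, C, lam):
    """Residual of S^2 Z^2 S^2 - S C S = lam * (Z^{-1} - I), for symmetric Z."""
    S2 = S @ S
    lhs = S2 @ Z @ Z @ S2 - S @ C @ S
    rhs = lam * (np.linalg.inv(Z) - np.eye(Z.shape[0]))
    return np.linalg.norm(lhs - rhs)

# Toy sanity check: with Z = I and C = S^2, both sides of (1) vanish for any lam.
n = 4
S = np.diag(np.linspace(1.0, 0.1, n))
assert np.isclose(residual(np.eye(n), S, S @ S, lam=0.3), 0.0)
```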
-
Wrangling quartics, V
Yesterday I went to discuss the problem with one of my colleagues. He had the interesting idea of modelling $S$, and especially $S^2$, as low rank, in particular as $S = s_1 e_1 e_1^T$. That is, shifting the focus from $Z$ to $S$. I tried this out today, and although it didn’t quite pan out,…
-
Wrangling quartics, IV
I’m trying to make some sense of $$ {1 \over \la'} \left(S^2 \wt Z_{UU}^2 S^2 - S \wt C_{VV} S\right) + I = \wt Z_{UU}^{-1}. \label{start}\tag{1}$$ Below I’m going to drop all the tildes and subscripts, for clarity. If we left-multiply by $Z$ we get $$ {1 \over \la'} Z(S^2 Z^2 S^2 - S…
-
Wrangling quartics, III
We are trying to understand the connectivity solutions $Z$ found when minimizing the objective $$ {1 \over 2 n^2 } \|X^T Z^T Z X - C\|_F^2 + {\la \over 2 m^2}\|Z - I\|_F^2.$$ Recap: We found in the previous post that solutions satisfy $$ {1 \over \la'} \left(S^2 \wt Z_{UU}^2 S^2 - S \wt C_{VV} S…
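Not part of the post, but I find it helpful to have the objective and its gradient in code when checking stationarity conditions; a toy-sized sketch (random stand-ins for $X$ and $C$, with $\la$ written as `lam`):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, lam = 5, 8, 0.1
X = rng.normal(size=(m, n)) / np.sqrt(n)          # stand-in data matrix
G = rng.normal(size=(n, n)); C = G @ G.T / n      # stand-in symmetric target

def objective(Z):
    M = X.T @ Z.T @ Z @ X - C
    return np.sum(M**2) / (2 * n**2) + lam * np.sum((Z - np.eye(m))**2) / (2 * m**2)

def gradient(Z):
    M = X.T @ Z.T @ Z @ X - C
    return Z @ X @ (M + M.T) @ X.T / n**2 + lam * (Z - np.eye(m)) / m**2

# Finite-difference check of the analytic gradient at a random point.
Z = np.eye(m) + 0.1 * rng.normal(size=(m, m))
eps = 1e-6
E = np.zeros((m, m)); E[0, 1] = 1.0
fd = (objective(Z + eps * E) - objective(Z - eps * E)) / (2 * eps)
assert np.isclose(fd, gradient(Z)[0, 1], rtol=1e-4, atol=1e-8)
```

Setting this gradient to zero and, for symmetric invertible $Z$ and symmetric $C$, left-multiplying by $Z^{-1}$ gives a condition of the form recapped above, which is a useful consistency check.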
-
How many neurons or trials to recover signal geometry?
This is my transcription of notes on a VVTNS talk by Itamar Landau about recovering the geometry of high-dimensional neural signals corrupted by noise. Caveat emptor: These notes are based on what I remember or hastily wrote down during the presentation, so they likely contain errors and omissions. Motivation: The broad question is: Under what…
-
Wrangling quartics, II
In the last post on this topic, we saw that when optimizing the objective $$ {1 \over 2 n^2 } \|X^T Z^T Z X - C\|_F^2 + {\la \over 2 m^2}\|Z - I\|_F^2,$$ any solution $Z$ is symmetric and satisfies $${2 \over n^2} \left( XX^T Z^2 XX^T - X C X^T\right) + {\la \over m^2} I…
-
Inverting arrowhead matrices.
I need to invert a matrix of the form $$ M = I + S^2 H S^2,$$ where $H$ is a symmetric matrix, and $S^2$ is diagonal. The elements of $S^2$ drop off very quickly, so effectively all that survives of $H$ is its first column and first row, scaled by $S_{1}^2 S^2$. The result is that…
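For reference, an arrowhead matrix (dense first row and column, diagonal elsewhere) can be inverted in closed form via the Schur complement of its leading entry; a small sketch for a generic arrowhead matrix, not the specific $I + S^2 H S^2$ structure above:

```python
import numpy as np

def invert_arrowhead(a, b, d):
    """Invert M = [[a, b^T], [b, diag(d)]] via the Schur complement of a."""
    Dinv_b = b / d                      # D^{-1} b, with D = diag(d)
    s = a - b @ Dinv_b                  # Schur complement of the (0, 0) entry
    n = len(d)
    Minv = np.empty((n + 1, n + 1))
    Minv[0, 0] = 1.0 / s
    Minv[0, 1:] = -Dinv_b / s
    Minv[1:, 0] = -Dinv_b / s
    Minv[1:, 1:] = np.diag(1.0 / d) + np.outer(Dinv_b, Dinv_b) / s
    return Minv

# Check against the dense inverse on a random, well-conditioned example.
rng = np.random.default_rng(4)
n = 6
a, b, d = 10.0, rng.uniform(-1, 1, size=n), rng.uniform(1.0, 2.0, size=n)
M = np.zeros((n + 1, n + 1))
M[0, 0], M[0, 1:], M[1:, 0], M[1:, 1:] = a, b, b, np.diag(d)
assert np.allclose(invert_arrowhead(a, b, d), np.linalg.inv(M))
```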
-
How many lateral dendrites cross a granule cell arbor?
The projection neurons of the olfactory bulb are the mitral cells and tufted cells. Most mitral cells don’t communicate with each other directly. Instead, they interact through the synapses that their lateral dendrites make onto granule cell arbors. Activation of these synapses excites the target granule cells, which in turn inhibit the mitral cells that…
-
Decomposing connectivity
While working on optimizing connectivity for whitening (see below) I remembered that it can be useful to decompose connectivity matrices relating neurons into components relating pseudo-neurons. In this post, I’ll show how this can be done, and highlight its application to the whitening problem. I will assume that our $N \times N$ connectivity matrix $W$…
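I don’t know exactly which decomposition the post settles on, but one natural choice (my illustration, not necessarily the post’s) is the SVD: writing $W = \sum_i \sigma_i u_i v_i^T$ expresses the neuron-to-neuron connectivity as a sum of rank-one pieces, each reading out one input pattern (a “pseudo-neuron”) $v_i$ and writing to one output pattern $u_i$:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 6
W = rng.normal(size=(N, N))                 # an N x N connectivity matrix

# SVD: W = sum_i s_i * outer(u_i, v_i). Each term maps the input pattern v_i
# to the output pattern u_i with gain s_i.
U, s, Vt = np.linalg.svd(W)
W_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(N))
assert np.allclose(W, W_rebuilt)

# Acting on an activity vector r, component i contributes s_i * (v_i . r) * u_i:
# a scalar readout of r followed by a broadcast output pattern.
r = rng.normal(size=N)
assert np.allclose(W @ r, sum(s[i] * (Vt[i] @ r) * U[:, i] for i in range(N)))
```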
-
Why mean, median, mode?
Recently while thinking about covariances I got to thinking about why we define an even simpler statistic, the mean, as we do. That’s what this post is about. Suppose we have a dataset $X$ consisting of $N$ numbers $x_1 \dots x_N$. Their mean, $\overline x,$ is of course $$ \overline{x} = {1 \over N} \sum_{i=1}^N…