Category: Blog
-
The equivalent kernel for non-zero prior mean
This note is a brief addendum to Section 3.3 of Bishop on Bayesian Linear Regression. Some of the derivations in that section assume, for simplicity, that the prior mean on the weights is zero. Here we’ll relax this assumption and see what happens to the equivalent kernel. Background: The setting in that section is that,…
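For context, a sketch of the standard results from Bishop §3.3 that the note builds on (stated from memory, so check against the text): with prior $p(\mathbf w) = \mathcal N(\mathbf w \mid \mathbf m_0, \mathbf S_0)$ and noise precision $\beta$, the posterior over the weights is $$ p(\mathbf w \mid \mathbf t) = \mathcal N(\mathbf w \mid \mathbf m_N, \mathbf S_N), \qquad \mathbf m_N = \mathbf S_N\left(\mathbf S_0^{-1}\mathbf m_0 + \beta \Phi^T \mathbf t\right), \qquad \mathbf S_N^{-1} = \mathbf S_0^{-1} + \beta \Phi^T \Phi. $$ For $\mathbf m_0 = 0$ the predictive mean reduces to a weighted sum of the training targets, $y(\mathbf x) = \sum_n k(\mathbf x, \mathbf x_n)\, t_n$ with equivalent kernel $k(\mathbf x, \mathbf x') = \beta\, \boldsymbol\phi(\mathbf x)^T \mathbf S_N \boldsymbol\phi(\mathbf x')$; the question the note addresses is what replaces this expression when $\mathbf m_0 \neq 0$.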
-
RL produces more brain-like representations for motor learning than supervised learning
These are my rapidly scribbled notes on Codol et al.’s “Brain-like neural dynamics for behavioral control develop through reinforcement learning” (and likely contain errors). What learning algorithm does the baby’s brain use to learn motor tasks? We have at least two candidates: supervised learning (SL), which measures and minimizes discrepancies between desired and actual states…
-
Notes on the Geometry of Least Squares
In this post I expand on the details of Section 3.1.2 in Pattern Recognition and Machine Learning. We found that maximum likelihood estimation requires minimizing $$E(\mathbf w) = {1 \over 2} \sum_{n=1}^N (t_n - \ww^T \bphi(\xx_n))^2.$$ Here the vector $\bphi(\xx_n)$ contains each of our features evaluated on the single input datapoint $\xx_n$, $$\bphi(\xx_n) = [\phi_0(\xx_n),…
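An aside (not from the post): the geometric reading of this minimization is that the least-squares fit is the orthogonal projection of the target vector $\mathbf t$ onto the column space of the design matrix $\Phi$, which is easy to check numerically; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 4                      # N data points, M basis functions
Phi = rng.normal(size=(N, M))     # design matrix, rows are phi(x_n)^T
t = rng.normal(size=N)            # targets

# Least-squares weights from the normal equations: w = (Phi^T Phi)^{-1} Phi^T t
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Geometric view: the fitted values Phi w are the orthogonal projection
# of t onto the column space of Phi.
P = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)   # projection matrix
assert np.allclose(Phi @ w, P @ t)

# The residual is orthogonal to every column of Phi (i.e., to the subspace).
assert np.allclose(Phi.T @ (t - Phi @ w), 0)
```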
-
Inference by decorrelation
We frequently observe decorrelation in projection neuron responses. This has often been linked to either redundancy reduction or pattern separation. Can we make an explicit link to inference? A simple case to consider is $\ell_2$ regularized MAP inference, where $$ \log p(x|y) = L(x,y) = {1 \over 2\sigma^2} \|y - A x\|_2^2 + {\gamma \over…
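Assuming the truncated penalty is the usual ${\gamma \over 2}\|x\|_2^2$ (my guess from the $\ell_2$ framing, not the post), the MAP estimate has the closed form $x^\star = (A^TA + \gamma\sigma^2 I)^{-1}A^Ty$; a quick numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 10
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
sigma2, gamma = 0.5, 0.1

# Closed-form minimizer of (1/(2 sigma^2)) ||y - A x||^2 + (gamma/2) ||x||^2
x_map = np.linalg.solve(A.T @ A + gamma * sigma2 * np.eye(n), A.T @ y)

# Sanity check: the gradient of the objective vanishes at x_map.
grad = (A.T @ (A @ x_map - y)) / sigma2 + gamma * x_map
assert np.allclose(grad, 0, atol=1e-10)
```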
-
Background on Koopman Theory
These notes fill in some of the details of Section 2.1 of Kamb et al.’s “Time-Delay Observables for Koopman: Theory and Applications”. They were made by relying heavily on finite-dimensional intuition (operators as infinite-dimensional matrices), and by talking with ChatGPT, so likely contain errors. We are interested in understanding the time evolution of a dynamical…
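For readers new to the area, the basic object (standard material, not specific to these notes) is the following: for a discrete-time system $x_{k+1} = F(x_k)$, the Koopman operator $\mathcal K$ acts on scalar observables $g$ by composition with the dynamics, $$ (\mathcal K g)(x) = g(F(x)), \qquad \text{so that} \qquad g(x_{k+1}) = (\mathcal K g)(x_k). $$ The map $F$ may be nonlinear, but $\mathcal K$ is linear on the (generally infinite-dimensional) space of observables, which is what licenses the “operators as infinite-dimensional matrices” intuition.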
-
Notes on the Recognition-Parameterized Model
Recently, William Walker and colleagues proposed the Recognition Parameterized Model (RPM) to perform unsupervised learning of the causes behind observations, but without the need to reconstruct those observations. This post summarizes my (incomplete) understanding of the model. One popular approach to unsupervised learning is autoencoding, where we learn a low-dimensional representation of our data that…
-
Between geometry and topology
At one of the journal clubs I recently attended, we discussed “The Topology and Geometry of Neural Representations”. The motivation for the paper is that procedures like RSA, which capture the overlap of population representations of different stimuli, can be overly sensitive to some geometrical features of the representation the brain might not care about.…
-
Notes on Atick and Redlich 1993
In their 1993 paper Atick and Redlich consider the problem of learning receptive fields that optimize information transmission. They consider a linear transformation of a vector of retinal inputs $s$ to ganglion cell outputs of the same dimension $$y = Ks.$$ They aim to find a biologically plausible learning rule that will use the input…
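Not the paper’s derivation, but a bare-bones illustration of the linear setup: under $y = Ks$ the output covariance is $K C_s K^T$, and a whitening choice $K = C_s^{-1/2}$ decorrelates and equalizes the outputs, roughly the kind of code that information-transmission arguments favor when noise is negligible:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
B = rng.normal(size=(d, d))
C_s = B @ B.T + np.eye(d)            # covariance of the retinal inputs s (SPD)

# A whitening choice of K: C_s^{-1/2}, built from the eigendecomposition of C_s.
lam, V = np.linalg.eigh(C_s)
K = V @ np.diag(1.0 / np.sqrt(lam)) @ V.T

# Under y = K s, the output covariance is K C_s K^T; here it is the identity.
C_y = K @ C_s @ K.T
assert np.allclose(C_y, np.eye(d))
```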
-
Changing regularization, II
Today I went back to trying to understand the solution when using the original regularization. While doing so it occurred to me that if I use a slightly different regularization, I can get a closed-form solution for the feedforward connectivity $Z$, and without most (though not all) of the problems I was having in my…
-
Notes on Multiresolution Matrix Factorization
These are my notes from early January on Kondor et al.’s Multiresolution Matrix Factorization from 2014. This was a conference paper and the exposition was a bit terse in places, so below I try to fill in some of the details I thought were either missing or confusing. Motivating MMF: We will be interested in…
-
Changing regularization
This morning it occurred to me that the problems we’re having with our equation \begin{align}S^2 Z^2 S^2 - S C S = \lambda (Z^{-1} - I)\label{main}\tag{1}\end{align} are due to the regularizer we use, $\|Z - I\|_F^2$. This regularizer makes passing the input directly to the output the default behavior of the feedforward connections. But it’s…
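Not from the post, but when juggling equations like (1) I find a numerical residual check handy; a toy sketch with made-up $S$, $C$, and $\lambda$:

```python
import numpy as np

def residual(Z, S, C, lam):
    """Residual of S^2 Z^2 S^2 - S C S = lam * (Z^{-1} - I), for symmetric Z."""
    S2 = S @ S
    lhs = S2 @ Z @ Z @ S2 - S @ C @ S
    rhs = lam * (np.linalg.inv(Z) - np.eye(Z.shape[0]))
    return np.linalg.norm(lhs - rhs)

# Toy sanity check: with Z = I and C = S^2, both sides of (1) vanish for any lam.
n = 4
S = np.diag(np.linspace(1.0, 0.1, n))
assert np.isclose(residual(np.eye(n), S, S @ S, lam=0.3), 0.0)
```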
-
Wrangling quartics, V
Yesterday I went to discuss the problem with one of my colleagues. He had the interesting idea of modelling $S$, and especially $S^2$, as low rank, in particular as $S = s_1 e_1 e_1^T$. That is, shifting the focus from $Z$ to $S$. I tried this out today, and although it didn’t quite pan out,…
-
Wrangling quartics, IV
I’m trying to make some sense of $$ {1 \over \la'} \left(S^2 \wt Z_{UU}^2 S^2 - S \wt C_{VV} S\right) + I = \wt Z_{UU}^{-1}. \label{start}\tag{1}$$ Below I’m going to drop all the tildes and subscripts, for clarity. If we left-multiply by $Z$ we get $$ {1 \over \la'} Z(S^2 Z^2 S^2 - S…
-
Wrangling quartics, III
We are trying to understand the connectivity solutions $Z$ found when minimizing the objective $$ {1 \over 2 n^2 } \|X^T Z^T Z X - C\|_F^2 + {\la \over 2 m^2}\|Z - I\|_F^2.$$ Recap: We found in the previous post that solutions satisfy $$ {1 \over \la'} \left(S^2 \wt Z_{UU}^2 S^2 - S \wt C_{VV} S…
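Not part of the post, but I find it helpful to have the objective and its gradient in code when checking stationarity conditions; a toy-sized sketch (random stand-ins for $X$ and $C$, with $\la$ written as `lam`):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, lam = 5, 8, 0.1
X = rng.normal(size=(m, n)) / np.sqrt(n)          # stand-in data matrix
G = rng.normal(size=(n, n)); C = G @ G.T / n      # stand-in symmetric target

def objective(Z):
    M = X.T @ Z.T @ Z @ X - C
    return np.sum(M**2) / (2 * n**2) + lam * np.sum((Z - np.eye(m))**2) / (2 * m**2)

def gradient(Z):
    M = X.T @ Z.T @ Z @ X - C
    return Z @ X @ (M + M.T) @ X.T / n**2 + lam * (Z - np.eye(m)) / m**2

# Finite-difference check of the analytic gradient at a random point.
Z = np.eye(m) + 0.1 * rng.normal(size=(m, m))
eps = 1e-6
E = np.zeros((m, m)); E[0, 1] = 1.0
fd = (objective(Z + eps * E) - objective(Z - eps * E)) / (2 * eps)
assert np.isclose(fd, gradient(Z)[0, 1], rtol=1e-4, atol=1e-8)
```

Setting this gradient to zero and, for symmetric invertible $Z$ and symmetric $C$, left-multiplying by $Z^{-1}$ gives a condition of the form recapped above, which is a useful consistency check.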
-
How many neurons or trials to recover signal geometry?
This is my transcription of notes on a VVTNS talk by Itamar Landau about recovering the geometry of high-dimensional neural signals corrupted by noise. Caveat emptor: These notes are based on what I remember or hastily wrote down during the presentation, so they likely contain errors and omissions. Motivation: The broad question is: Under what…
-
Wrangling quartics, II
In the last post on this topic, we saw that when optimizing the objective $$ {1 \over 2 n^2 } \|X^T Z^T Z X - C\|_F^2 + {\la \over 2 m^2}\|Z - I\|_F^2,$$ any solution $Z$ is symmetric and satisfies $${2 \over n^2} \left( XX^T Z^2 XX^T - X C X^T\right) + {\la \over m^2} I…
-
Inverting arrowhead matrices.
I need to invert a matrix of the form $$ M = I + S^2 H S^2,$$ where $H$ is a symmetric matrix, and $S^2$ is diagonal. The elements of $S^2$ drop off very quickly, so effectively all that survives of $H$ is its first column and first row, scaled by $S_{1}^2 S^2$. The result is that…
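For reference, an arrowhead matrix (dense first row and column, diagonal elsewhere) can be inverted in closed form via the Schur complement of its leading entry; a small sketch for a generic arrowhead matrix, not the specific $I + S^2 H S^2$ structure above:

```python
import numpy as np

def invert_arrowhead(a, b, d):
    """Invert M = [[a, b^T], [b, diag(d)]] via the Schur complement of a."""
    Dinv_b = b / d                      # D^{-1} b, with D = diag(d)
    s = a - b @ Dinv_b                  # Schur complement of the (0, 0) entry
    n = len(d)
    Minv = np.empty((n + 1, n + 1))
    Minv[0, 0] = 1.0 / s
    Minv[0, 1:] = -Dinv_b / s
    Minv[1:, 0] = -Dinv_b / s
    Minv[1:, 1:] = np.diag(1.0 / d) + np.outer(Dinv_b, Dinv_b) / s
    return Minv

# Check against the dense inverse on a random, well-conditioned example.
rng = np.random.default_rng(4)
n = 6
a, b, d = 10.0, rng.uniform(-1, 1, size=n), rng.uniform(1.0, 2.0, size=n)
M = np.zeros((n + 1, n + 1))
M[0, 0], M[0, 1:], M[1:, 0], M[1:, 1:] = a, b, b, np.diag(d)
assert np.allclose(invert_arrowhead(a, b, d), np.linalg.inv(M))
```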
-
How many lateral dendrites cross a granule cell arbor?
The projection neurons of the olfactory bulb are the mitral cells and tufted cells. Most mitral cells don’t communicate with each other directly. Instead, they interact through the synapses that their lateral dendrites make onto granule cell arbors. Activation of these synapses excites the target granule cells, which in turn inhibit the mitral cells that…
-
Decomposing connectivity
While working on optimizing connectivity for whitening (see below) I remembered that it can be useful to decompose connectivity matrices relating neurons into components relating pseudo-neurons. In this post, I’ll show how this can be done, and highlight its application to the whitening problem. I will assume that our $N \times N$ connectivity matrix $W$…
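I don’t know exactly which decomposition the post settles on, but one natural choice (my illustration, not necessarily the post’s) is the SVD: writing $W = \sum_i \sigma_i u_i v_i^T$ expresses the neuron-to-neuron connectivity as a sum of rank-one pieces, each reading out one input pattern (a “pseudo-neuron”) $v_i$ and writing to one output pattern $u_i$:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 6
W = rng.normal(size=(N, N))                 # an N x N connectivity matrix

# SVD: W = sum_i s_i * outer(u_i, v_i). Each term maps the input pattern v_i
# to the output pattern u_i with gain s_i.
U, s, Vt = np.linalg.svd(W)
W_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(N))
assert np.allclose(W, W_rebuilt)

# Acting on an activity vector r, component i contributes s_i * (v_i . r) * u_i:
# a scalar readout of r followed by a broadcast output pattern.
r = rng.normal(size=N)
assert np.allclose(W @ r, sum(s[i] * (Vt[i] @ r) * U[:, i] for i in range(N)))
```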
-
Why mean, median, mode?
Recently while thinking about covariances I got to thinking about why we define an even simpler statistic, the mean, as we do. That’s what this post is about. Suppose we have a dataset $X$ consisting of $N$ numbers $x_1 \dots x_N$. Their mean, $\overline x,$ is of course $$ \overline{x} = {1 \over N} \sum_{i=1}^N…