Category: Notes

  • Jensen-Shannon Divergence

    I discuss how the Jensen-Shannon divergence between two distributions is a smoothed, symmetrized version of the KL divergence, and connect it to the performance of the optimal binary discriminator between them.
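
    For reference, and in generic notation that need not match the post's, the Jensen-Shannon divergence between distributions $P$ and $Q$ averages their KL divergences to the mixture $M$: $$ \mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2} \mathrm{KL}(P \,\|\, M) + \tfrac{1}{2} \mathrm{KL}(Q \,\|\, M), \qquad M = \tfrac{1}{2}(P + Q).$$ Each KL term compares a distribution to the mixture rather than directly to the other distribution, which is what smooths and symmetrizes the comparison.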

  • Graph Spectra and Clustering

    I describe how the spectrum of a graph built from a dataset can indicate how strongly the data are clustered.
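
    As a standard reference point (the post may use a different graph construction), for an undirected graph with adjacency matrix $A$ and degree matrix $D$, the graph Laplacian is $$ L = D - A, $$ and the multiplicity of its zero eigenvalue equals the number of connected components; small nonzero eigenvalues likewise signal nearly disconnected, i.e. strongly clustered, structure.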

  • A simple property of sparse vectors

    We show that the difference between any $S$-sparse vector and any other vector with the same or smaller $\ell_1$ norm is $S$-dominant.

  • Understanding Expectation Maximization as Coordinate Ascent

    In these notes we follow Neal and Hinton (1998) and show how to view EM as coordinate ascent on the negative variational free energy.
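
    A minimal sketch of the identity involved, in generic notation that need not match the post's: for data $x$, latents $z$, parameters $\theta$, and any distribution $q(z)$, the negative variational free energy satisfies $$ \mathcal{F}(q, \theta) = \mathbb{E}_{q}\left[\log p(x, z \mid \theta)\right] + H(q) = \log p(x \mid \theta) - \mathrm{KL}\left(q(z) \,\|\, p(z \mid x, \theta)\right).$$ Maximizing over $q$ with $\theta$ fixed (the E-step) sets $q(z) = p(z \mid x, \theta)$, and maximizing over $\theta$ with $q$ fixed (the M-step) is the usual parameter update, so alternating the two is coordinate ascent on $\mathcal{F}$.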

  • Maximum likelihood PCA

    These are my derivations of the maximum likelihood estimates of the parameters of probabilistic PCA as described in Section 12.2.1 of Bishop, with some hints from Tipping and Bishop (1999).
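
    For orientation, the standard statement of the model as given in Tipping and Bishop (1999): probabilistic PCA posits $$ x = W z + \mu + \epsilon, \qquad z \sim \mathcal{N}(0, I), \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I), $$ and the maximum likelihood estimate of $W$ is $W_{\mathrm{ML}} = U_M (\Lambda_M - \sigma^2 I)^{1/2} R$, where the columns of $U_M$ are the leading eigenvectors of the sample covariance, $\Lambda_M$ holds the corresponding eigenvalues, and $R$ is an arbitrary orthogonal matrix.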

  • A noob’s-eye view of reinforcement learning

    I recently completed the Coursera Reinforcement Learning Specialization. These are my notes, still under construction, on some of what I learned. The course was based on Sutton and Barto’s freely available reinforcement learning book, so images will be from there unless otherwise stated. All errors are mine, so please let me know about any in…

  • Background on Koopman Theory

    These notes fill in some of the details of Section 2.1 of Kamb et al.’s “Time-Delay Observables for Koopman: Theory and Applications”. They were made by relying heavily on finite-dimensional intuition (operators as infinite-dimensional matrices), and by talking with ChatGPT, so likely contain errors. We are interested in understanding the time evolution of a dynamical…
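
    For context, the basic object (stated here in discrete time; the post may work in continuous time) is the Koopman operator $\mathcal{K}$, which acts on observables $g$ of the state by composition with the dynamics $x_{t+1} = F(x_t)$: $$ (\mathcal{K} g)(x) = g(F(x)), $$ so $\mathcal{K}$ is linear even when $F$ is nonlinear, at the cost of acting on an infinite-dimensional space of observables.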

  • Notes on the Recognition-Parameterized Model

    Recently, William Walker and colleagues proposed the Recognition Parameterized Model (RPM) to perform unsupervised learning of the causes behind observations, but without the need to reconstruct those observations. This post summarizes my (incomplete) understanding of the model. One popular approach to unsupervised learning is autoencoding, where we learn a low-dimensional representation of our data that…

  • Notes on Atick and Redlich 1993

    In their 1993 paper Atick and Redlich consider the problem of learning receptive fields that optimize information transmission. They consider a linear transformation of a vector of retinal inputs $s$ to ganglion cell outputs of the same dimension $$y = Ks.$$ They aim to find a biologically plausible learning rule that will use the input…
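
    As a generic information-theoretic reference point, and not necessarily the exact objective Atick and Redlich optimize: for Gaussian inputs with covariance $C_s$ passed through a linear map and corrupted by isotropic Gaussian noise, $y = Ks + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$, the transmitted information is $$ I(y; s) = \tfrac{1}{2} \log \det\left(I + \tfrac{1}{\sigma^2} K C_s K^T\right), $$ which is the kind of quantity one trades off against a cost on the outputs when optimizing information transmission.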

  • Notes on Multiresolution Matrix Factorization

    These are my notes from early January on Kondor et al.’s Multiresolution Matrix Factorization from 2014. This was a conference paper and the exposition was a bit terse in places, so below I try to fill in some of the details I thought were either missing or confusing. Motivating MMF: We will be interested in…
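
    As a sketch of the general shape of the factorization (the post and the paper spell out the precise constraints), MMF approximates a symmetric matrix $A$ with a sequence of sparse orthogonal matrices, each mixing only a few coordinates: $$ A \approx U_1^T U_2^T \cdots U_L^T \, H \, U_L \cdots U_2 U_1, $$ where $H$ is close to diagonal, so the successive levels play a role analogous to successive scales in a wavelet transform.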

  • Wrangling quartics, III

    We are trying to understand the connectivity solutions $Z$ found when minimizing the objective $$ {1 \over 2 n^2 } \|X^T Z^T Z X - C\|_F^2 + {\lambda \over 2 m^2}\|Z - I\|_F^2.$$ Recap: We found in the previous post that solutions satisfy $$ {1 \over \lambda'} \left(S^2 \widetilde Z_{UU}^2 S^2 - S \widetilde C_{VV} S…

  • Decomposing connectivity

    While working on optimizing connectivity for whitening (see below) I remembered that it can be useful to decompose connectivity matrices relating neurons into components relating pseudo-neurons. In this post, I’ll show how this can be done, and highlight its application to the whitening problem. I will assume that our $N \times N$ connectivity matrix $W$…