Tag: notes

  • EM for Factor Analysis

    In this note I work out the EM updates for factor analysis, following the presentation in PRML 12.2.4. In factor analysis our model of the observations in terms of latents is $$ p(\xx_n|\zz_n, \WW, \bmu, \bPsi) = \mathcal{N}(\xx_n;\WW \zz_n + \bmu, \bPsi).$$ Here $\bPsi$ is a diagonal matrix used to capture the variances of the…

  • Understanding Expectation Maximization as Coordinate Ascent

    These notes are based on what I learned from my first postdoc advisor, who learned it (I believe) from (Neal and Hinton 1998). See also section 4 of (Roweis and Ghahramani 1999) for a short derivation, and the broader discussion in Chapter 9 of Bishop, in particular Section 9.4. Introduction: When performing maximum likelihood estimation…

  • Maximum likelihood PCA

    These are my derivations of the maximum likelihood estimates of the parameters of probabilistic PCA as described in section 12.2.1 of Bishop, and with some hints from (Tipping and Bishop 1999). Once we have determined the maximum likelihood estimate of $\mu$ and plugged it in, we have (Bishop 12.44) $$ L = \ln p(X|W, \mu, \sigma^2)…

  • Natural parameterization of the Gaussian distribution

    The Gaussian distribution in the usual parameters: The Gaussian distribution in one dimension is often parameterized using the mean $\mu$ and the variance $\sigma^2$, in terms of which $$ p(x|\mu, \sigma^2) = {1 \over \sqrt{2\pi \sigma^2}} \exp\left(-{(x - \mu)^2 \over 2 \sigma^2} \right).$$ The Gaussian distribution is in the exponential family. For distributions in this…

  • A noob’s-eye view of reinforcement learning

    I recently completed the Coursera Reinforcement Learning Specialization. These are my notes, still under construction, on some of what I learned. The course was based on Sutton and Barto’s freely available reinforcement learning book, so images will be from there unless otherwise stated. All errors are mine, so please let me know about any in…

  • Notes on the evidence approximation

    These notes closely follow section 3.5 of Bishop on the Evidence Approximation, much of which is based on this paper on Bayesian interpolation by David MacKay, to which I refer below. Motivation: We have some dataset of inputs $\XX = \{\xx_1, \dots, \xx_N\}$ and corresponding outputs $\tt = \{t_1, \dots, t_N\}$, and we’re…

  • The equivalent kernel for non-zero prior mean

    This note is a brief addendum to Section 3.3 of Bishop on Bayesian Linear Regression. Some of the derivations in that section assume, for simplicity, that the prior mean on the weights is zero. Here we’ll relax this assumption and see what happens to the equivalent kernel. Background: The setting in that section is that,…

  • Notes on the Geometry of Least Squares

    In this post I expand on the details of section 3.1.2 in Pattern Recognition and Machine Learning. We found that maximum likelihood estimation requires minimizing $$E(\mathbf w) = {1 \over 2} \sum_{n=1}^N (t_n - \ww^T \bphi(\xx_n))^2.$$ Here the vector $\bphi(\xx_n)$ contains each of our features evaluated on the single input datapoint $\xx_n$, $$\bphi(\xx_n) = [\phi_0(\xx_n),…

  • Notes on Multiresolution Matrix Factorization

    These are my notes from early January on Kondor et al.’s Multiresolution Matrix Factorization from 2014. This was a conference paper and the exposition was a bit terse in places, so below I try to fill in some of the details I thought were either missing or confusing. Motivating MMF: We will be interested in…

  • How many neurons or trials to recover signal geometry?

    This is my transcription of notes on a VVTNS talk by Itamar Landau about recovering the geometry of high-dimensional neural signals corrupted by noise. Caveat emptor: These notes are based on what I remember or hastily wrote down during the presentation, so they likely contain errors and omissions. Motivation: The broad question is then: Under what…

  • Decomposing connectivity

    While working on optimizing connectivity for whitening (see below) I remembered that it can be useful to decompose connectivity matrices relating neurons into components relating pseudo-neurons. In this post, I’ll show how this can be done, and highlight its application to the whitening problem. I will assume that our $N \times N$ connectivity matrix $W$…