sinatootoonian.com

Tag: notes

Converting joint distributions to Bayesian networks

In these notes we discuss how to convert a joint distribution into a graph called a Bayesian network, and how the structure of the graph suggests ways to reduce the parameters required to specify the joint.

11 January 2026
Jensen-Shannon Divergence

I discuss how the Jensen-Shannon divergence is a smoothed symmetrization of the KL divergence comparing two distributions, and connect it to the performance of their optimal binary discriminator.

22 December 2025
Notes on Kernel PCA

Following Bishop, I show how to express the eigenvectors of the feature projections in terms of the eigenvectors of the kernel matrix, and how to compute the kernel of centered features from the uncentered one.

31 October 2025
EM for Factor Analysis

In this note I work out the EM updates for factor analysis, following the presentation in PRML 12.2.4.

20 April 2025
Understanding Expectation Maximization as Coordinate Ascent

In these notes we follow Neal and Hinton 1998 and show how to view EM as coordinate ascent on the negative variational free energy.

9 March 2025
Maximum likelihood PCA

These are my derivations of the maximum likelihood estimates of the parameters of probabilistic PCA as described in section 12.2.1 of Bishop, with some hints from Tipping and Bishop (1999).

5 March 2025
Natural parameterization of the Gaussian distribution

The Gaussian distribution in the usual parameters: The Gaussian distribution in one dimension is often parameterized using the mean $\mu$ and the variance $\sigma^2$, in terms of which $$ p(x|\mu, \sigma^2) = {1 \over \sqrt{2\pi \sigma^2}} \exp\left(-{(x - \mu)^2 \over 2 \sigma^2} \right).$$ The Gaussian distribution is in the exponential family. For distributions in this…

16 February 2025
A noob’s-eye view of reinforcement learning

I recently completed the Coursera Reinforcement Learning Specialization. These are my notes, still under construction, on some of what I learned. The course was based on Sutton and Barto’s freely available reinforcement learning book, so images will be from there unless otherwise stated. All errors are mine, so please let me know about any in…

15 January 2025
Notes on the evidence approximation

These notes closely follow section 3.5 of Bishop on the Evidence Approximation, much of which is based on this paper on Bayesian interpolation by David MacKay, to which I refer below. Motivation: We have some dataset of inputs $\XX = \{\xx_1, \dots, \xx_N\}$ and corresponding outputs $\tt = \{t_1, \dots, t_N\}$, and we’re…

30 December 2024
The equivalent kernel for non-zero prior mean

This note is a brief addendum to Section 3.3 of Bishop on Bayesian Linear Regression. Some of the derivations in that section assume, for simplicity, that the prior mean on the weights is zero. Here we’ll relax this assumption and see what happens to the equivalent kernel. Background: The setting in that section is that,…

20 December 2024
Notes on the Geometry of Least Squares

In this post I expand on the details of section 3.1.2 in Pattern Recognition and Machine Learning. We found that maximum likelihood estimation requires minimizing $$E(\ww) = {1 \over 2} \sum_{n=1}^N (t_n - \ww^T \bphi(\xx_n))^2.$$ Here the vector $\bphi(\xx_n)$ contains each of our features evaluated on the single input datapoint $\xx_n$, $$\bphi(\xx_n) = [\phi_0(\xx_n),…

25 October 2024
Notes on Multiresolution Matrix Factorization

These are my notes from early January on Kondor et al.’s Multiresolution Matrix Factorization from 2014. This was a conference paper and the exposition was a bit terse in places, so below I try to fill in some of the details I thought were either missing or confusing. Motivating MMF: We will be interested in…

1 April 2024
How many neurons or trials to recover signal geometry?

This is my transcription of notes on a VVTNS talk by Itamar Landau about recovering the geometry of high-dimensional neural signals corrupted by noise. Caveat emptor: These notes are based on what I remember or hastily wrote down during the presentation, so they likely contain errors and omissions. Motivation: The broad question is then: Under what…

22 February 2024
Decomposing connectivity

While working on optimizing connectivity for whitening (see below) I remembered that it can be useful to decompose connectivity matrices relating neurons into components relating pseudo-neurons. In this post, I’ll show how this can be done, and highlight its application to the whitening problem. I will assume that our $N \times N$ connectivity matrix $W$…

6 February 2024
