Tag: prml
-
EM for Factor Analysis
In this note I work out the EM updates for factor analysis, following the presentation in PRML 12.2.4. In factor analysis our model of the observations in terms of latents is $$ p(\xx_n|\zz_n, \WW, \bmu, \bPsi) = \mathcal{N}(\xx_n;\WW \zz_n + \bmu, \bPsi).$$ Here $\bPsi$ is a diagonal matrix used to capture the variances of the…
-
Automatic Relevance Determination for Probabilistic PCA
In this note I flesh out the computations for Section 12.2.3 of Bishop’s Pattern Recognition and Machine Learning, where he uses automatic relevance determination to select the dimensionality of the principal subspace in probabilistic PCA. The principal subspace describing the data is spanned by the columns $\ww_1, \dots, \ww_M$ of $\WW$. The proper Bayesian way to…
-
The equivalent kernel for non-zero prior mean
This note is a brief addendum to Section 3.3 of Bishop on Bayesian Linear Regression. Some of the derivations in that section assume, for simplicity, that the prior mean on the weights is zero. Here we’ll relax this assumption and see what happens to the equivalent kernel. Background: The setting in that section is that,…
-
Notes on the Geometry of Least Squares
In this post I expand on the details of Section 3.1.2 in Pattern Recognition and Machine Learning. We found that maximum likelihood estimation requires minimizing $$E(\mathbf w) = {1 \over 2} \sum_{n=1}^N (t_n - \ww^T \bphi(\xx_n))^2.$$ Here the vector $\bphi(\xx_n)$ contains each of our features evaluated on the single input datapoint $\xx_n$, $$\bphi(\xx_n) = [\phi_0(\xx_n),…