Notes on the Geometry of Least Squares
In this post I expand on the details of section 3.1.2 in Pattern Recognition and Machine Learning. We found that maximum likelihood estimation requires minimizing $$E(\ww) = {1 \over 2} \sum_{n=1}^N (t_n - \ww^T \bphi(\xx_n))^2.$$ Here the vector $\bphi(\xx_n)$ contains each of our features evaluated on the single input datapoint $\xx_n$, $$\bphi(\xx_n) = [\phi_0(\xx_n),…