Author: Sina
-
Notes on Kernel PCA
Following Bishop, I show how to express the eigenvectors of the covariance of the projected features in terms of the eigenvectors of the kernel matrix, and how to compute the kernel matrix of centered features from the uncentered one.
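As a preview of the centering result, here is the standard Gram-matrix centering identity from Bishop's treatment, with $\mathbf{1}_N$ denoting the $N \times N$ matrix whose every entry is $1/N$ and $K$ the uncentered kernel matrix:

$$ \tilde{K} \;=\; K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N. $$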
-
Dimension reduction of vector fields
I discuss two notions of dimension reduction for vector fields from the “low-rank hypothesis” paper, and consider which might be the ‘correct’ one.
-
The low-rank hypothesis of complex systems
In this post I will summarize the paper “The low-rank hypothesis of complex systems” by Thibeault et al.
-
An iterative reweighted least squares miracle
I show what’s really happening in the iterative reweighted least squares updates for logistic regression described in PRML 4.3.3.
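A minimal sketch of those updates in NumPy (my own toy implementation, not code from the post; `irls_logistic` and its arguments are hypothetical names). Each iteration is the Newton–Raphson update from PRML 4.3.3, $\mathbf{w} \leftarrow \mathbf{w} - (\Phi^T R \Phi)^{-1} \Phi^T (\mathbf{y} - \mathbf{t})$ with $R = \mathrm{diag}\bigl(y_n(1 - y_n)\bigr)$:

```python
import numpy as np

def irls_logistic(Phi, t, n_iter=20):
    """Logistic regression by iterative reweighted least squares (PRML 4.3.3).

    Phi : (N, M) design matrix (include a column of ones for a bias).
    t   : (N,) binary targets in {0, 1}.
    """
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y = 1.0 / (1.0 + np.exp(-Phi @ w))   # predicted class probabilities
        R = y * (1.0 - y)                    # diagonal of the weighting matrix
        H = Phi.T @ (R[:, None] * Phi)       # Hessian  Phi^T R Phi
        w = w - np.linalg.solve(H, Phi.T @ (y - t))  # Newton step
    return w
```

Each iteration is equivalent to solving a weighted least-squares problem whose weights $R$ depend on the current $\mathbf{w}$, hence the name.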
-
Computing with Line Attractors
These notes are based on Seung’s “How the Brain Keeps the Eyes Still”, where he discusses how a line attractor network may implement a memory of the desired fixation angle that ultimately drives the muscles in the eye.
-
A Free Connectivity Non-Solution
In this post I explore one possible unconstrained connectivity solution that turns out not to work.
-
How feature size interacts with regularization
In this post we’re going to explore how estimated feature size is affected by regularization. The intuition is that the shrinkage applied by regularization will mean low-amplitude features get (even more) swamped by additive noise.
-
The Logic of Free Connectivity
In this post we try to understand the diagonal term and the two rank-1 terms that we find when we fit connectivity without any constraints.
-
Quantization
In this post we try to understand the diagonal connectivity solutions by quantizing the elements to the three values $\{0, 1, z\}$.
-
When is the distribution of two iid random variables spherically symmetric?
In this post we show that the joint distribution of two iid random variables is spherically symmetric iff the marginal distribution is Gaussian.
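A sketch of the ‘only if’ direction (the classical Herschel–Maxwell argument): spherical symmetry of the joint density means that, for some function $g$,

$$ f(x)\,f(y) = g(x^2 + y^2), $$

and setting $y = 0$ identifies $g(u) = f(\sqrt{u})\,f(0)$. Writing $h(u) = \log f(\sqrt{u}) - \log f(0)$ turns this into Cauchy's functional equation $h(x^2) + h(y^2) = h(x^2 + y^2)$, whose continuous solutions are linear, so $\log f(x) = \log f(0) + b\,x^2$ with $b < 0$ for normalizability: a Gaussian.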
-
Decorrelation through gain control
Decorrelation is typically thought to require lateral interactions. But how much can we achieve just by gain control?
-
EM for Factor Analysis
In this note I work out the EM updates for factor analysis, following the presentation in PRML 12.2.4.
-
Automatic Relevance Determination for PPCA
In this note I flesh out the computations for Section 12.2.3 of Bishop’s Pattern Recognition and Machine Learning, where he uses automatic relevance determination to choose the dimensionality of the principal subspace in probabilistic PCA.
-
A simple property of sparse vectors
We show that the difference between an $S$-sparse vector and any other vector with the same or smaller $\ell_1$ norm is $S$-dominant.
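A sketch of the inequality involved (writing $T$ for the support of the $S$-sparse vector $x$, $h = x' - x$ for the difference, and assuming ‘$S$-dominant’ means the $\ell_1$ mass on $T$ dominates the mass off it): since $x$ vanishes off $T$,

$$ \|x\|_1 \;\ge\; \|x'\|_1 \;=\; \|x + h_T\|_1 + \|h_{T^c}\|_1 \;\ge\; \|x\|_1 - \|h_T\|_1 + \|h_{T^c}\|_1, $$

so $\|h_{T^c}\|_1 \le \|h_T\|_1$.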
-
Understanding Expectation Maximization as Coordinate Ascent
In these notes we follow Neal and Hinton 1998 and show how to view EM as coordinate ascent on the negative variational free energy.
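The decomposition underlying that view: for any distribution $q(Z)$ over the latents,

$$ \log p(X \mid \theta) \;=\; \underbrace{\mathbb{E}_{q}\!\left[\log p(X, Z \mid \theta)\right] + H[q]}_{\mathcal{F}(q,\,\theta)} \;+\; \mathrm{KL}\!\left(q(Z) \,\middle\|\, p(Z \mid X, \theta)\right), $$

so the E-step is ascent in $q$ (setting $q$ to the posterior, which zeroes the KL term) and the M-step is ascent in $\theta$, both on the same objective $\mathcal{F}$, the negative variational free energy.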
-
Maximum likelihood PCA
These are my derivations of the maximum likelihood estimates of the parameters of probabilistic PCA as described in Section 12.2.1 of Bishop, with some hints from Tipping and Bishop (1999).
-
Reaction rate inference
In this post we show that, given a set of first order reactions with unknown rates, inferring the reaction rates from a dataset of instantaneous species concentrations is a linear regression problem. We solve it for some toy examples.
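A minimal sketch of the idea (a hypothetical toy example of my own, not one from the post): for a single first-order reaction $A \to B$ we have $d[A]/dt = -k[A]$, so given noisy snapshots of $[A]$ and its derivative, the rate $k$ falls out of a regression through the origin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system: A -> B with true rate k, so dA/dt = -k * A.
k_true = 0.7
A = rng.uniform(0.1, 2.0, size=50)                 # instantaneous [A] snapshots
dA_dt = -k_true * A + 0.01 * rng.normal(size=50)   # noisy measured derivatives

# Inferring k is least squares: minimize sum_i (dA_dt_i + k * A_i)^2,
# which gives the closed-form estimate below.
k_hat = -(A @ dA_dt) / (A @ A)
print(f"true rate {k_true:.3f}, estimated {k_hat:.3f}")
```

With several coupled first-order reactions the same setup becomes a multivariate regression, with one design-matrix column per unknown rate.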
-
Natural parameterization of the Gaussian distribution
The Gaussian distribution in one dimension is often parameterized using the mean $\mu$ and the variance $\sigma^2$, in terms of which $$ p(x|\mu, \sigma^2) = {1 \over \sqrt{2\pi \sigma^2}} \exp\left(-{(x - \mu)^2 \over 2 \sigma^2} \right).$$ The Gaussian distribution is in the exponential family. For distributions in this…
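As a quick sketch of where the excerpt is headed: expanding $-(x-\mu)^2/(2\sigma^2)$ and matching terms to the exponential-family form $p(x \mid \boldsymbol\eta) = h(x) \exp\bigl(\boldsymbol\eta^T T(x) - A(\boldsymbol\eta)\bigr)$ with $h(x) = 1/\sqrt{2\pi}$ gives

$$ \boldsymbol\eta = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix}, \qquad T(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \qquad A(\boldsymbol\eta) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\log(-2\eta_2). $$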
-
The inference model with missing observations
The inference model isn’t giving good performance. But is this because we’re missing data? In the inference model, the recorded output activity is related to the input according to $$ (\sigma^2 \II + \AA \AA^T) \bLa = \YY,$$where we’ve absorbed $\gamma$ into $\AA$. We can model this as $N$ observations of $\yy$ given $\bla$, where$$…
-
A noob’s-eye view of reinforcement learning
I recently completed the Coursera Reinforcement Learning Specialization. These are my notes, still under construction, on some of what I learned. The course was based on Sutton and Barto’s freely available reinforcement learning book, so images will be from there unless otherwise stated. All errors are mine, so please let me know about any in…