Tag: optimization
-
Notes on the Geometry of Least Squares
In this post I expand on the details of section 3.1.2 in Pattern Recognition and Machine Learning. We found that maximum likelihood estimation requires minimizing $$E(\ww) = {1 \over 2} \sum_{n=1}^N (t_n - \ww^T \bphi(\xx_n))^2.$$ Here the vector $\bphi(\xx_n)$ contains each of our features evaluated on the single input datapoint $\xx_n$, $$\bphi(\xx_n) = [\phi_0(\xx_n),…
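For concreteness, here is a minimal NumPy sketch of that minimization, with a hypothetical polynomial feature map standing in for $\bphi$ (the data and features are illustrative, not from the post):

```python
import numpy as np

# Toy 1-D regression; the feature map and data are illustrative stand-ins.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
t = np.sin(np.pi * x) + 0.1 * rng.standard_normal(50)

# Design matrix Phi, whose n-th row is phi(x_n)^T = [phi_0(x_n), ..., phi_M(x_n)].
M = 3
Phi = np.stack([x**j for j in range(M + 1)], axis=1)

# Minimizing E(w) = 1/2 sum_n (t_n - w^T phi(x_n))^2 leads to the normal
# equations Phi^T Phi w = Phi^T t; lstsq solves them stably.
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print(w_ml)
```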
-
Inference by decorrelation
We frequently observe decorrelation in projection neuron responses. This has often been linked to either redundancy reduction or pattern separation. Can we make an explicit link to inference? A simple case to consider is $\ell_2$-regularized MAP inference, where $$ -\log p(x|y) = L(x,y) = {1 \over 2\sigma^2} \|y - A x\|_2^2 + {\gamma \over…
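If the truncated regularizer is the usual ${\gamma \over 2}\|x\|_2^2$ (an assumption on my part; the excerpt cuts off), then minimizing $L$ is ridge regression, with the closed form $x^\star = (A^T A + \gamma\sigma^2 I)^{-1} A^T y$. A sketch:

```python
import numpy as np

def map_l2(A, y, sigma2, gamma):
    # MAP estimate under L(x, y) = ||y - A x||^2 / (2 sigma^2)
    # + (gamma / 2) ||x||^2 (assuming the truncated term is quadratic).
    # Setting grad_x L = 0 gives (A^T A + gamma sigma^2 I) x = A^T y.
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + gamma * sigma2 * np.eye(n), A.T @ y)

# Illustrative sizes, not from the post.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
print(map_l2(A, y, sigma2=0.5, gamma=1.0))
```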
-
Notes on Atick and Redlich 1993
In their 1993 paper, Atick and Redlich consider the problem of learning receptive fields that optimize information transmission. They study a linear transformation of a vector of retinal inputs $s$ to ganglion cell outputs of the same dimension, $$y = Ks.$$ They aim to find a biologically plausible learning rule that will use the input…
-
Changing regularization, II
Today I went back to trying to understand the solution when using the original regularization. While doing so, it occurred to me that if I use a slightly different regularization, I can get a closed-form solution for the feedforward connectivity $Z$, and without most (though not all) of the problems I was having in my…
-
Decomposing connectivity
While working on optimizing connectivity for whitening (see below), I remembered that it can be useful to decompose connectivity matrices relating neurons into components relating pseudo-neurons. In this post, I’ll show how this can be done, and highlight its application to the whitening problem. I will assume that our $N \times N$ connectivity matrix $W$…
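The excerpt cuts off before the construction, but one natural reading of “components relating pseudo-neurons” is a rank-one expansion such as the SVD, $W = \sum_i \sigma_i u_i v_i^T$, where each term couples an input mode $v_i$ to an output mode $u_i$. That reading is my guess; a sketch:

```python
import numpy as np

# Hypothetical illustration (my reading of the truncated excerpt):
# expand an N x N connectivity matrix W into rank-one components via
# the SVD. Each term s_i * u_i v_i^T reads out one input mode v_i
# ("pseudo-neuron") and writes one output mode u_i with gain s_i.
rng = np.random.default_rng(2)
N = 4
W = rng.standard_normal((N, N))

U, s, Vt = np.linalg.svd(W)
components = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(N)]
assert np.allclose(sum(components), W)  # the components recompose W
```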
-
The closest rotation to a scaling
What is the closest rotation to a pure scaling? Intuitively, it should be the null rotation, the identity. One way to see this might be to consider starting with the identity scaling, for which it’s clearly true. If we then scale along one of the coordinate axes, there doesn’t seem to be any ‘torque’ that…
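One way to check this numerically uses a standard fact not stated in the excerpt: the nearest rotation to a matrix $A$ in Frobenius norm is the orthogonal factor $UV^T$ of its SVD $A = USV^T$. A sketch:

```python
import numpy as np

def nearest_rotation(A):
    # Nearest rotation to A in Frobenius norm: with the SVD A = U S V^T,
    # the closest orthogonal matrix is U V^T; flipping one column of U
    # when det < 0 forces a proper rotation (det = +1).
    U, _, Vt = np.linalg.svd(A)
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1
    return U @ Vt

# A pure scaling with positive factors: the nearest rotation is the
# identity, matching the intuition above.
S = np.diag([2.0, 0.5, 3.0])
print(nearest_rotation(S))
```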
-
Wrangling quartics
The problem we have is to minimize $${1\over 2} \|A^T Z^T Z A - B \|_F^2 + {\lambda \over 2} \|Z - I\|_F^2,$$ where $B$ is symmetric. This objective is quartic in $Z$, so its stationary points are the roots of a cubic matrix equation in $Z$, which is likely intractable. However, this problem might have…
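Failing a closed form, the objective is straightforward to attack numerically. A minimal gradient-descent sketch (writing $M = A^T Z^T Z A - B$, the gradient of the first term works out to $2ZAMA^T$, using the symmetry of $M$; the sizes and step size below are illustrative):

```python
import numpy as np

def grad(Z, A, B, lam):
    # With M = A^T Z^T Z A - B (symmetric, since B is), the gradient of
    # (1/2)||M||_F^2 in Z is 2 Z A M A^T; the second term adds lam (Z - I).
    M = A.T @ Z.T @ Z @ A - B
    return 2 * Z @ A @ M @ A.T + lam * (Z - np.eye(Z.shape[0]))

# Illustrative problem sizes and step size, not from the post.
rng = np.random.default_rng(3)
n, k, lam = 5, 3, 0.1
A = rng.standard_normal((n, k))
C = rng.standard_normal((k, k))
B = C + C.T  # symmetric, as required
Z = np.eye(n)
for _ in range(5000):
    Z -= 1e-3 * grad(Z, A, B, lam)
print(np.linalg.norm(A.T @ Z.T @ Z @ A - B))  # residual after descent
```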
-
Nearest orthonormal matrix
We sometimes need to find the nearest orthonormal matrix $Q$ to a given square matrix $X$. This can easily be done using Lagrange multipliers, as I’ll show here. By ‘nearest’ we will mean in the element-wise sum-of-squares sense. This is equivalent to the squared Frobenius norm, so our optimization problem is $$ \argmin_Q \; {1…
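The Lagrange-multiplier derivation is the subject of the post; the well-known answer it should recover is the orthogonal factor of the polar decomposition, $Q = UV^T$ from the SVD $X = USV^T$. A sketch:

```python
import numpy as np

def nearest_orthonormal(X):
    # Nearest orthonormal Q to square X in squared Frobenius norm:
    # with the SVD X = U S V^T, the minimizer is Q = U V^T, i.e. the
    # orthogonal factor of the polar decomposition X = Q (V S V^T).
    U, _, Vt = np.linalg.svd(X)
    return U @ Vt

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4))
Q = nearest_orthonormal(X)
print(np.allclose(Q.T @ Q, np.eye(4)))  # True: Q is orthonormal
```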
-
The nearest dataset with a given covariance
When analyzing neural activity, I sometimes want to find the nearest dataset to the one I’m working with that also has a desired covariance $\Sigma$. In this post I’m going to show how to compute such a dataset. Let the dataset we’re given be the $m \times n$ matrix $X$, and the new dataset that…
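Assuming mean-centred data and covariance defined as $X^T X / m$ (assumptions on my part; the excerpt cuts off), one route to the answer is a Procrustes-style argument: any candidate factors as $Q(m\Sigma)^{1/2}$ with $Q^T Q = I$, and the optimal $Q$ is $UV^T$ from the SVD of $X(m\Sigma)^{1/2}$. A sketch:

```python
import numpy as np

def psd_sqrt(M):
    # Symmetric PSD square root via the eigendecomposition.
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def nearest_with_covariance(X, Sigma):
    # Nearest Y to X (Frobenius norm) with Y^T Y / m = Sigma, assuming
    # mean-centred data. Write Y = Q S with S = (m Sigma)^{1/2} and
    # Q^T Q = I; minimizing ||Q S - X||_F over Q is then a Procrustes
    # problem, solved by Q = U V^T where U D V^T is the SVD of X S.
    m = X.shape[0]
    S = psd_sqrt(m * Sigma)
    U, _, Vt = np.linalg.svd(X @ S, full_matrices=False)
    return (U @ Vt) @ S

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 3))
X -= X.mean(axis=0)
Sigma = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]])
Y = nearest_with_covariance(X, Sigma)
print(np.allclose(Y.T @ Y / 100, Sigma))  # True
```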