Decorrelation is typically thought to require lateral interactions. But how much can we gain just by gain control?
The setting, as usual, is $N$-dimensional glomerular inputs $\xx$ driving projection neuron activity $\yy$ according to $\dot \yy \propto -\sigma^2 \yy + \xx - \WW \yy$, which at steady state ($\dot \yy = \bzero$) gives the input-output transformation $$(\II \sigma^2 + \WW)^{-1} \xx = \ZZ \xx = \yy.$$
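A minimal numerical sanity check of the steady-state formula, assuming the proportionality constant in $\dot \yy$ is 1: integrate the dynamics with forward Euler and compare the result to $\ZZ \xx$. The sizes, $\sigma^2$, the random lateral weights `W` and the step size are arbitrary illustrative choices, not values from any data.

```python
import numpy as np

# Hypothetical toy sizes and parameters, for illustration only.
rng = np.random.default_rng(0)
N = 5
sigma2 = 1.0                            # sigma^2, the leak
W = 0.1 * rng.standard_normal((N, N))   # arbitrary weak lateral weights
x = rng.standard_normal(N)              # glomerular input for one odour

def simulate(x, W, sigma2, dt=0.05, steps=2000):
    """Forward-Euler integration of dy/dt = -sigma2*y + x - W @ y, starting from y = 0."""
    y = np.zeros_like(x)
    for _ in range(steps):
        y = y + dt * (-sigma2 * y + x - W @ y)
    return y

# Closed-form steady state y = (sigma^2 I + W)^{-1} x = Z x.
Z = np.linalg.inv(sigma2 * np.eye(N) + W)
y_ss = Z @ x
y_sim = simulate(x, W, sigma2)
err = np.max(np.abs(y_sim - y_ss))      # should be tiny once the dynamics settle
```

The Euler fixed point satisfies $(\sigma^2 \II + \WW)\yy = \xx$ exactly, so the only discrepancy is the decaying transient.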
The data we have are an $N \times M$ matrix $\XX$ of input responses to $M$ odours, and an $N_y \times M$ matrix $\YY$ of output responses. Because the recorded input and output channels are not the same, we can’t compare predicted outputs $\ZZ \XX$ to observed outputs $\YY$ directly. Instead we compare the $M \times M$ odour-by-odour representation matrices. Representations are defined either as the raw overlaps, $$ \XX^T \ZZ^T \ZZ \XX \quad \text{vs.} \quad \YY^T\YY,$$ or as covariances, $$ \XX^T \ZZ^T \JJ_X^T \JJ_X \ZZ \XX \quad \text{vs.} \quad \YY^T\JJ_Y^T \JJ_Y\YY,$$ where the idempotent $\JJ$ matrices do mean subtraction: acting from the left, they subtract, for each odour, the mean response across cells.
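A minimal sketch of the two comparisons, assuming $\JJ$ is the standard centering matrix $\II - \mathbf{1}\mathbf{1}^T/n$ (symmetric and idempotent). The random `X`, `Y` and `Z` are placeholders, not data; the point is only that both sides end up as $M \times M$ matrices even though $N \neq N_y$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, N_y, M = 6, 4, 8                     # hypothetical sizes: input cells, output cells, odours
X = rng.standard_normal((N, M))         # input responses (cells x odours)
Y = rng.standard_normal((N_y, M))       # output responses from different cells
Z = np.diag(rng.uniform(0.5, 2.0, N))   # a candidate Z (diagonal, as assumed later)

def centering(n):
    """Centering matrix J = I - 11^T/n; J @ A subtracts each column's mean over the n rows."""
    return np.eye(n) - np.ones((n, n)) / n

J_X, J_Y = centering(N), centering(N_y)

# Raw overlaps: both are M x M, so they can be compared even though N != N_y.
overlap_pred = X.T @ Z.T @ Z @ X
overlap_obs = Y.T @ Y

# Mean-subtracted (covariance-style) versions.
cov_pred = X.T @ Z.T @ J_X.T @ J_X @ Z @ X
cov_obs = Y.T @ J_Y.T @ J_Y @ Y
```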
In this post we will assume that $\ZZ$ is diagonal, so that $\ZZ^T \ZZ = \ZZ^2$. I analyzed the simplest version of the problem, comparing overlaps and not regularizing $\ZZ$. In that case our loss is $$L(\ZZ) = {1\over 2} \|\XX^T \ZZ^2 \XX - \YY^T \YY\|_F^2.$$
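Because $\ZZ$ is diagonal, the loss depends only on the squared gains. A minimal sketch, parameterizing $\ZZ$ by the vector `z` of its diagonal (an illustrative choice). As a sanity check, the target here is generated by a known diagonal gain on the same cells (so $N_y = N$, unlike the real data), which should drive the loss to zero.

```python
import numpy as np

def loss(z, X, Y):
    """L(Z) = 1/2 ||X^T Z^2 X - Y^T Y||_F^2 with Z the diagonal matrix of z."""
    R = X.T @ np.diag(z**2) @ X - Y.T @ Y
    return 0.5 * np.sum(R**2)

rng = np.random.default_rng(2)
N, M = 6, 4
X = rng.standard_normal((N, M))
z_true = rng.uniform(0.5, 2.0, N)       # hypothetical ground-truth gains

# Synthetic outputs from the same cells, produced by pure gain control.
Y = np.diag(z_true) @ X

print(loss(z_true, X, Y))               # essentially zero (rounding only)
print(loss(np.ones(N), X, Y))           # positive for the wrong gains
```

Since only `z**2` enters, flipping the sign of any gain leaves the loss unchanged, which is why it is natural to work with $\ZZ^2$ directly.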
Since the Frobenius norm is invariant under orthogonal transformations, and $\YY^T \YY = \VV_Y \SS_Y^2 \VV_Y^T$ with $\VV_Y$ an orthogonal $M \times M$ matrix (the eigendecomposition of $\YY^T \YY$), $$L(\ZZ) = {1\over 2}\|\XX^T \ZZ^2 \XX - \VV_Y \SS_Y^2 \VV_Y^T \|_F^2 = {1\over 2}\|\VV_Y^T \XX^T \ZZ^2 \XX \VV_Y - \SS_Y^2 \|_F^2 = {1 \over 2}\|\XX^T \ZZ^2 \XX - \SS_Y^2 \|_F^2,$$ where in the last step I’ve relabelled $\XX \to \XX \VV_Y$.
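The reduction can be checked numerically. A minimal sketch, computing $\VV_Y$ and $\SS_Y^2$ with `numpy.linalg.eigh` on $\YY^T \YY$ (so $\VV_Y$ is a full orthogonal $M \times M$ matrix even when $N_y < M$); the random data and the diagonal `z2` of $\ZZ^2$ are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
N, N_y, M = 7, 3, 5
X = rng.standard_normal((N, M))
Y = rng.standard_normal((N_y, M))
z2 = rng.uniform(0.1, 2.0, N)           # diagonal of Z^2

# Full eigendecomposition Y^T Y = V_Y S_Y^2 V_Y^T, V_Y orthogonal (M x M).
s2, V = np.linalg.eigh(Y.T @ Y)

def frob_loss(A, B):
    """1/2 ||A - B||_F^2."""
    return 0.5 * np.sum((A - B) ** 2)

L_original = frob_loss(X.T @ np.diag(z2) @ X, Y.T @ Y)

X_rot = X @ V                           # the relabelling X -> X V_Y
L_rotated = frob_loss(X_rot.T @ np.diag(z2) @ X_rot, np.diag(s2))
```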
We next compute the gradient. The first step is to get the differential, which we’ll take with respect to $\ZZ^2$: \begin{align} dL &= \tr[(\XX^T \ZZ^2 \XX - \SS_Y^2)^T \XX^T d\ZZ^2 \XX]\\ &= \tr[\XX (\XX^T \ZZ^2 \XX - \SS_Y^2)^T \XX^T d\ZZ^2 ]\\ & = \tr[\XX \XX^T \ZZ^2 \XX \XX^T d\ZZ^2] - \tr[\XX \SS_Y^2 \XX^T d\ZZ^2], \end{align} where the second line uses the cyclic property of the trace and the third uses the symmetry of $\XX^T \ZZ^2 \XX - \SS_Y^2$.
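Since $\ZZ^2$ is constrained to be diagonal, only the diagonal of the matrix multiplying $d\ZZ^2$ contributes, so the gradient is \begin{align} \nabla_{\ZZ^2} L &= \diag{\XX \XX^T \ZZ^2 \XX \XX^T} - \diag{\XX \SS_Y^2 \XX^T}.\end{align}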
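A finite-difference check of this gradient, treating the diagonal of $\ZZ^2$ as a free vector `w` (nonnegativity is ignored here). Because the loss is quadratic in `w`, central differences match the analytic gradient up to rounding. `X` is assumed to be already rotated into the $\VV_Y$ basis, and all data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 6, 4
X = rng.standard_normal((N, M))        # assumed already in the V_Y basis
s2 = rng.uniform(0.5, 2.0, M)          # diagonal of S_Y^2
w = rng.uniform(0.5, 2.0, N)           # w = diagonal of Z^2

def loss(w):
    """1/2 ||X^T Z^2 X - S_Y^2||_F^2 as a function of w = diag(Z^2)."""
    R = X.T @ np.diag(w) @ X - np.diag(s2)
    return 0.5 * np.sum(R**2)

def grad(w):
    """Analytic gradient: diag(X X^T Z^2 X X^T) - diag(X S_Y^2 X^T)."""
    G = X @ X.T
    return np.diag(G @ np.diag(w) @ G) - np.diag(X @ np.diag(s2) @ X.T)

# Central finite differences along each coordinate direction of w.
eps = 1e-4
fd = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
               for e in np.eye(N)])
```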
Another way to parse this is as \begin{align} \nabla_{\ZZ^2} L &= \diag{\XX \underbrace{(\XX^T \ZZ^2 \XX - \SS_Y^2)}_{\AA} \XX^T}.\end{align}
Then $$\diag{\XX \AA \XX^T}_{i} = \sum_{k,m} \XX_{ik} \AA_{km} \XX_{im} = \xx^T_i \AA \xx_i,$$ where $\xx_i^T$ is the $i$-th row of $\XX$.
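Numerically, the $i$-th diagonal entry can be computed without forming the $N \times N$ matrix $\XX \AA \XX^T$, for example with `numpy.einsum`. A small check that the three forms agree, using a random symmetric `A` as a stand-in for $\AA$:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 6, 4
X = rng.standard_normal((N, M))
B = rng.standard_normal((M, M))
A = B + B.T                                      # symmetric, like X^T Z^2 X - S_Y^2

full = np.diag(X @ A @ X.T)                      # forms the N x N matrix first
per_row = np.einsum('ik,km,im->i', X, A, X)      # sum_{k,m} X_ik A_km X_im
quad = np.array([x_i @ A @ x_i for x_i in X])    # x_i^T A x_i, x_i the i-th row
```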
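If this is zero for all $i$, then, since $\XX$ has more rows than columns and we can assume it has full column rank, this suggests that $\AA$ is zero. This is because, if we take the eigendecomposition $\AA = \UU \DD \UU^T$ of the symmetric $\AA$, with the eigenvalues $\dd$ on the diagonal of $\DD$, and express $\xx_i = \UU \bal_i$, then $$\xx_i^T \AA \xx_i = \bal_i^T \UU^T \UU \DD \UU^T \UU \bal_i = \bal_i^T \DD \bal_i = \dd^T \bal_i^2,$$ where $\bal_i^2$ is elementwise. Stacking the $\bal_i^T$ as the rows of the $N \times M$ matrix $\bal = \XX \UU$, we get $$ \diag{\XX \AA \XX^T} = (\bal \odot \bal)\dd.$$ But $\bal$ will be full rank because $\XX$ is, and that means $\bal \odot \bal$ (most likely) is, so the only way for this result to be zero is if $\dd = \bzero$. One caveat: $\UU$, and hence $\bal$, depends on $\AA$, so this is a heuristic rather than a proof. The map $\AA \mapsto \diag{\XX \AA \XX^T}$ is linear in the $M(M+1)/2$ free entries of the symmetric $\AA$, so it can force $\AA = \bzero$ only when $N \ge M(M+1)/2$.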
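Because $\diag{\XX \AA \XX^T}$ is linear in $\AA$, whether its vanishing forces $\AA = \bzero$ can be checked directly: build the matrix of this linear map on a basis of symmetric $M \times M$ matrices and compute its rank. The sketch below does this for random Gaussian $\XX$ (an assumption standing in for real data; `sym_basis` and `map_rank` are hypothetical helper names). For generic $\XX$ the rank is $\min(N, M(M+1)/2)$, so with $M = 4$ odours the map is injective for $N = 12$ cells but not for $N = 6$.

```python
import numpy as np

def sym_basis(M):
    """Basis E_kl (k <= l) for the space of symmetric M x M matrices."""
    basis = []
    for k in range(M):
        for l in range(k, M):
            E = np.zeros((M, M))
            E[k, l] = E[l, k] = 1.0
            basis.append(E)
    return basis

def map_rank(X):
    """Rank of the linear map A -> diag(X A X^T) restricted to symmetric A."""
    M = X.shape[1]
    cols = [np.einsum('ik,km,im->i', X, E, X) for E in sym_basis(M)]
    return np.linalg.matrix_rank(np.column_stack(cols))

rng = np.random.default_rng(6)
M = 4                                                 # M(M+1)/2 = 10 free entries in A
rank_small = map_rank(rng.standard_normal((6, M)))    # N = 6 < 10
rank_large = map_rank(rng.standard_normal((12, M)))   # N = 12 >= 10
```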
Approach 1
One way to parse the gradient is as $$ \nabla_{\ZZ^2} L = (\XX \XX^T \odot \XX \XX^T)\zz^2 - (\XX \odot \XX) \ss_Y^2, \quad \checkmark$$ where $\zz^2$ and $\ss_Y^2$ are the diagonals of $\ZZ^2$ and $\SS_Y^2$. Setting this gradient to zero, we get $$(\XX \XX^T \odot \XX \XX^T)\zz^2 = (\XX \odot \XX) \ss_Y^2.$$
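The Hadamard form can be checked against the diagonal form, and the zero-gradient condition is then an $N \times N$ linear system. A minimal sketch with random placeholder data; it solves for unconstrained $\zz^2$ (called `w_star`) and ignores the requirement $\zz^2 \ge 0$, so entries can come out negative. For generic random $\XX$ with $N \le M(M+1)/2$ the matrix `H` is invertible.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M = 6, 4
X = rng.standard_normal((N, M))
s2 = rng.uniform(0.5, 2.0, M)           # diagonal of S_Y^2

G = X @ X.T
H = G * G                               # X X^T (Hadamard) X X^T, N x N
b = (X * X) @ s2                        # (X (Hadamard) X) s_Y^2, length N

def grad(w):
    """Gradient in the diagonal form: diag(X X^T Z^2 X X^T) - diag(X S_Y^2 X^T)."""
    return np.diag(G @ np.diag(w) @ G) - np.diag(X @ np.diag(s2) @ X.T)

w_test = rng.uniform(0.5, 2.0, N)

# Setting the gradient to zero gives the linear system H w = b.
w_star = np.linalg.solve(H, b)
```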
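To interpret this expression, we write $$[\XX \XX^T]_{ij} = \|\XX_i\| [\hat \XX_i \cdot \hat \XX_j] \|\XX_j\| = \DD_i \bTh_{ij} \DD_j,$$ where $\XX_i$ is the $i$-th row of $\XX$, $\hat \XX_i = \XX_i / \|\XX_i\|$, and $\DD$ is the diagonal matrix of row norms. Then $$\XX \XX^T \odot \XX \XX^T = \DD^2 \bTh^2 \DD^2.$$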
On the right-hand side, we have $\XX \odot \XX = \DD^2 (\hat \XX \odot \hat \XX)$. Multiplying both sides on the left by $\DD^{-2}$, our equation becomes $$ \bTh^2 \DD^2 \zz^2 = (\hat \XX \odot \hat \XX)\ss_Y^2 = \hat \XX^2 \ss_Y^2,$$ where we have to be careful with the exponents: $\bTh^2$ and $\hat \XX^2$ denote Hadamard (elementwise) squares, while for the diagonal $\DD$ the two kinds of square coincide.
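A numerical check of both factorizations and of the reduced equation, using random placeholder data: `w` and `s2` stand in for $\zz^2$ and $\ss_Y^2$, and the residual of the original equation should equal $\DD^2$ times the residual of the reduced one.

```python
import numpy as np

rng = np.random.default_rng(8)
N, M = 6, 4
X = rng.standard_normal((N, M))
w = rng.uniform(0.5, 2.0, N)            # stands in for z^2
s2 = rng.uniform(0.5, 2.0, M)           # stands in for s_Y^2

norms = np.linalg.norm(X, axis=1)       # ||X_i||, the row norms
D2 = np.diag(norms**2)                  # D^2
X_hat = X / norms[:, None]              # unit-norm rows
Theta = X_hat @ X_hat.T                 # cosines between rows of X

# Factorizations (Hadamard products written with *).
G = X @ X.T
lhs_factored = D2 @ (Theta * Theta) @ D2    # should equal G * G
rhs_factored = D2 @ (X_hat * X_hat)         # should equal X * X

# Residuals of the original zero-gradient equation and of the reduced one.
r_full = (G * G) @ w - (X * X) @ s2
r_reduced = (Theta * Theta) @ D2 @ w - (X_hat * X_hat) @ s2
```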
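The terms on the right-hand side are alignments: each entry, $[\hat \XX^2 \ss_Y^2]_i = \sum_k \hat \XX_{ik}^2 [\ss_Y^2]_k$, indicates how much input cell $i$’s receptive field, measured as energy, aligns with the output’s principal odour directions (the columns of $\VV_Y$), weighted by the output energy along each. So the natural thing is to express the left-hand side in those terms.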
To keep things simple, I’ll assume that I have the same number of odours as units, so that $\hat \XX^2$ is square, and that it is invertible. I then decompose $\bTh^2 = \hat \XX^2 \bPhi$ to get $$ \bPhi = [\hat \XX^2]^{-1} \bTh^2 = [\hat \XX \odot \hat \XX]^{-1} (\hat \XX \hat \XX^T \odot \hat \XX \hat \XX^T).$$
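Substituting this into the previous equation and multiplying on the left by $[\hat \XX^2]^{-1}$, we arrive at $$\bPhi \DD^2 \zz^2 = \ss_Y^2.$$ In other words, the columns of $\bPhi$ contain template receptive fields, which must combine with the nonnegative weights $\DD^2 \zz^2$ to match the target. If a nonnegative solution exists for $\ss_Y^2 = \mathbf{1}$ (a whitened target, $\YY^T \YY = \II$), then decorrelation is possible through gain control.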
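A minimal sketch of this final test for square $\XX$ ($N = M$), assuming $\hat \XX^2$ and $\bPhi$ are invertible: build $\bPhi$, solve $\bPhi \DD^2 \zz^2 = \mathbf{1}$, and check whether the resulting $\zz^2$ is nonnegative. The helper name `gain_for_whitening` and the random data are hypothetical. Since the target $\SS_Y^2 = \II$ leaves $\VV_Y$ arbitrary, no rotation of $\XX$ is needed. The sketch also reports the loss at that point, which shows how close to an exact whitening the gains get.

```python
import numpy as np

def gain_for_whitening(X):
    """Solve Phi D^2 z^2 = 1 for square X and return z^2 (entries may be negative)."""
    norms = np.linalg.norm(X, axis=1)              # row norms, the diagonal of D
    X_hat = X / norms[:, None]                     # unit-norm rows
    X_hat2 = X_hat * X_hat                         # Hadamard square of X_hat
    Theta2 = (X_hat @ X_hat.T) ** 2                # Hadamard square of Theta
    Phi = np.linalg.solve(X_hat2, Theta2)          # Phi = [X_hat^2]^{-1} Theta^2
    u = np.linalg.solve(Phi, np.ones(X.shape[0]))  # u = D^2 z^2
    return u / norms**2                            # z^2

rng = np.random.default_rng(9)
N = M = 5
X = rng.standard_normal((N, M))                   # placeholder responses, square case

z2 = gain_for_whitening(X)
feasible = bool(np.all(z2 >= 0))                  # real gains z exist iff z^2 >= 0

# With target S_Y^2 = I the choice of V_Y does not matter, so X needs no rotation.
residual_loss = 0.5 * np.sum((X.T @ np.diag(z2) @ X - np.eye(M)) ** 2)
```

If `feasible` is false, the stationary point needs a negative squared gain; a nonnegatively constrained solver would be the natural next step.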
$$\blacksquare$$