The problem we have is to minimize $${1\over 2} \|A ^T Z^T Z A – B \|_F^2 + {\lambda \over 2} \|Z – I\|_F^2,$$ where $B$ is symmetric. This is quartic over $Z$ so its solution will be the roots of some cubic function of $Z$, which is likely intractable. However, this problem might have some special structure that will make it soluble.
We’ll try to alleviate the quartic-ness, which comes from the first term, by moving $Z^TZ$ into a new variable $W$. Defining $$L(Z, W) = {1\over 2} \|A ^T W A – B \|_F^2 + {\lambda \over 2} \|Z – I\|_F^2,$$ we solve the constrained optimization problem $$\argmin_{Z,W} \\; L(Z, W) \quad \text{such that} \quad Z^TZ = W.$$
The Lagrangian is $$ \begin{align}L(Z, W , \La) &= {1\over 2} \tr((A ^T W A – B)(A^T W A – B)^T)\\ &+ {\la \over 2}\tr((Z – I)(Z-I)^T) +\tr(\Lambda^T (W – Z^T Z)).\end{align}$$
The differential for $Z$ is $$dL_Z = \la \tr(dZ(Z-I)^T) – \tr(\La^T dZ^T Z – \La^T Z^T dZ),$$ so the gradient is $$\nabla_Z L= \la (Z – I) – Z (\La^T + \La).$$ Setting it to zero we get $$ \la (Z -I) = Z(\La^T + \La).$$ Left multiplying by $Z^T$ and applying the constraint we get, after rearranging, $$ \begin{align}W (\la I- \La^T – \La) =\la Z^T.\tag{1}\label{eq1} \end{align}$$ Left Right multiplying both sides by their transpose, we get $$ W (\la I- \La^T – \La)^2 W =\la^2 Z^T Z = \la^2 W.$$ And since $W$ is invertible,
$$ W (\la I- \La^T – \La)^2 = \la^2 I \implies \la I- \La^T – \La = \la W^{-1/2}.$$ Plugging this into $\Eqn{eq1}$ we get $$Z = W^{1/2}.$$ This isn’t entirely trivial – the constraint could have been satisfied for e.g. any $ Q W^{1/2}$ with orthonormal $Q$. But presumably the regularization makes $Q = I$.
The differential for $W$ is $$dL_W = \tr(A^T dW A (A^T W A – B)^T ) + \tr(\La^T dW)$$ so the gradient is $$ \nabla_W L = A(A^T W A – B) A^T+ \La.$$ Adding this to its transpose, adding $\la I$ to both sides, setting the gradient contributions to zero and rearranging, we get $$ 2 A(A^T W A – B) A^T + \la I = \la I- \La – \La^T = \la W^{-1/2}.$$ In terms of $Z$,
$$ 2 A A^T Z^2 AA^T – 2 A BA^T + \la I = \la Z^{-1}.$$
So we’ve got a slightly simplified cubic. Any chance we can do something with this? The best bet is probably to change basis. Applying SVD,
$$ A = USV^T \implies 2 U S^2 U^T Z^2 U S^2 U^T – 2 U S V^T B V S U^T + \la I = \la Z^{-1}.$$ Using the notation $\wt{Z}_{UV} = U^T Z V$, we can clean this up a bit to $$ 2 U S^2 \wt{Z}_{UU}^2 S^2 U^T – 2 U S \wt{B}_{VV} S U^T + \la UU^T = \la U\wt{Z}_{UU}^{-1}U^T, $$ and left multiply by $U^T$ and right by $U$ to get $$ 2 S^2 \wt{Z}_{UU}^2 S^2 – 2 S \wt{B}_{VV} S + \la I = \la \wt{Z}_{UU}^{-1}.$$
We’d normally be hooped at this point, but for the special case I’m interested in, we have two further special properties: $V = I$, and $B$ is diagonal. $\wt{B}_{VV}$ then simplifies to just $B$, and we get $$ 2 S^2 \wt{Z}_{UU}^2 S^2 – 2 S^2 B + \la I = \la \wt{Z}_{UU}^{-1}.$$ This then implies that $\wt{Z}_{UU}$ is diagonal, so we get a system of independent cubic equations
$$ 2 S^4 \wt{Z}_{UU}^3 – (2 S^2 B – \la I) \wt{Z}_{UU} – \la I = 0.$$ There are closed-form formulas for the solutions of such equations, but they’re quite complicated and are unlikely to yield any insight.
Instead, we resign ourselves to the cases of very small and very large $\la$. When the regularizer dominates, $$\la \to \infty \implies \wt{Z}_{UU} \to I \implies Z \to I,$$ as we’d expect. When there’s little regularization, and using that $Z$ is full rank,
$$ \la \to 0 \implies 2 S^4 \wt{Z}_{UU}^3 \to 2 S^2 B \wt{Z}_{UU} \implies \wt{Z}_{UU} \to \sqrt{B}/S.$$
In summary, we managed to simplify the problem significantly but still ended up with a system of independent cubics that are too complicated to solve to yield any insight, so we just examined the solutions at the extremes of the regularization path.
Leave a Reply