{"id":1621,"date":"2024-04-04T10:51:19","date_gmt":"2024-04-04T09:51:19","guid":{"rendered":"https:\/\/sinatootoonian.com\/?p=1621"},"modified":"2025-12-27T16:04:56","modified_gmt":"2025-12-27T16:04:56","slug":"changing-regularization-ii","status":"publish","type":"post","link":"https:\/\/sinatootoonian.com\/index.php\/2024\/04\/04\/changing-regularization-ii\/","title":{"rendered":"Changing regularization, II"},"content":{"rendered":"\n<p>Today I went back to trying to understand the solution when using the original regularization. While doing so it occurred to me that if I use a slightly different regularization, I can get a closed-form solution for the feedforward connectivity $Z$, and without most (though not all) of the problems I was having in my <a href=\"https:\/\/sinatootoonian.com\/index.php\/2024\/03\/30\/changing-regularization\/\">previous attempt at changing the regularizer<\/a>.<\/p>\n\n\n\n<p>Before describing the idea, let&#8217;s recall the connectivity decomposition. Our loss function is $$L(Z) = {1 \\over 2 n^2} \\|Y^T Y &#8211; X^T Z^T Z X \\|_F^2 + {\\lambda \\over 2 m^2}\\|Z &#8211; I\\|_F^2.$$ As before, we apply SVD to write this as  $$L(Z) = {1 \\over 2 n^2} \\|V_Y S_Y^2 V_Y^T &#8211; V_X S_X U_X^T Z^T Z U_X S_X V_X^T\\|_F^2 + {\\lambda \\over 2 m^2}\\|Z &#8211; I\\|_F^2.$$ Letting $Q$ be an orthonormal completion of the basis formed by the columns of $U_X$, we can decompose the connectivity as \\begin{align} Z &amp;= U_X U_X^T Z U_X U_X^T + U_X U_X^T Z Q Q^T + Q Q^T Z U_X U_X^T + Q Q^T Z Q Q^T \\\\&amp;= U_X \\wt{Z}_{UU} U_X^T + U_X \\wt{Z}_{UQ} Q^T + Q  \\wt{Z}_{Q U} U_X^T + Q \\wt{Z}_{QQ}Q^T.\\end{align} <\/p>\n\n\n\n<p>We can then write the loss as \\begin{align}L(Z) &amp;= {1 \\over 2 n^2} \\|V_Y S_Y^2 V_Y^T &#8211; V_X S_X \\wt Z_{UU}^T \\wt Z_{UU} S_X V_X^T &#8211; V_X S_X \\wt Z_{QU}^T \\wt Z_{QU} S_X V_X^T\\|_F^2\\\\ &amp;+ {\\lambda \\over 2 m^2}\\left(\\|\\wt{Z}_{UU} &#8211; I\\|_F^2 + \\|\\wt{Z}_{QQ} &#8211; I\\|_F^2 + 
\\|\\wt{Z}_{QU}\\|_F^2 + \\|\\wt{Z}_{UQ}\\|_F^2\\right).\\end{align} <\/p>\n\n\n\n<p>The first term in the loss is the only thing keeping the regularization from sending $Z$ to $I$, and it only involves $\\wt Z_{UU}$ and $\\wt Z_{QU}$. Therefore, we know that regularization will set the remaining components to their regularization targets, and we can just consider the loss as a function of $\\wt{Z}_{UU}$ and $\\wt{Z}_{QU}$: \\begin{align}L(\\wt{Z}_{UU}, \\wt{Z}_{QU}) &amp;= {1 \\over 2 n^2} \\|V_Y S_Y^2 V_Y^T &#8211; V_X S_X  (\\wt Z_{UU}^T \\wt Z_{UU} + \\wt Z_{QU}^T \\wt Z_{QU}) S_X V_X^T\\|_F^2\\\\ &amp;+ {\\lambda \\over 2 m^2}\\left(\\|\\wt{Z}_{UU} &#8211; I\\|_F^2 + \\|\\wt{Z}_{QU}\\|_F^2\\right).\\end{align} <\/p>\n\n\n\n<p>Notice how in the first term $\\wt{Z}_{QU}$ shows up next to $S_X$, but in the regularization, it shows up alone. <mark style=\"background-color:#9DFF20\" class=\"has-inline-color\">The idea is: since we have some freedom in choosing the regularization, why not regularize $\\wt{Z}_{QU} S_X$ and $\\wt{Z}_{UU} S_X$ instead?<\/mark><\/p>\n\n\n\n<p>In that case, the loss becomes \\begin{align}L(\\wt{Z}_{UU}, \\wt{Z}_{QU}) &amp;= {1 \\over 2 n^2} \\|V_Y S_Y^2 V_Y^T &#8211; V_X S_X  (\\wt Z_{UU}^T \\wt Z_{UU} + \\wt Z_{QU}^T \\wt Z_{QU}) S_X V_X^T\\|_F^2\\\\ &amp;+ {\\lambda \\over 2 m^2}\\left(\\|\\wt{Z}_{UU} S_X &#8211; I\\|_F^2 + \\|\\wt{Z}_{QU} S_X\\|_F^2\\right).\\end{align} <\/p>\n\n\n\n<p>It&#8217;s then natural to define $$ F_U \\triangleq \\wt{Z}_{UU} S_X, \\quad F_Q \\triangleq \\wt{Z}_{QU} S_X$$ in terms of which the loss is<br>\\begin{align}L(F_U, F_Q) &amp;= {1 \\over 2 n^2} \\|V_Y S_Y^2 V_Y^T &#8211; V_X F_U^T F_U V_X^T &#8211; V_X F_Q^T F_Q V_X^T\\|_F^2\\\\&amp; +  {\\lambda \\over 2 m^2}\\left(\\|F_{U} &#8211; I\\|_F^2 + \\|F_{Q}\\|_F^2\\right).\\end{align} Notice how $F_U$ and $F_Q$ have absorbed $S_X$ in the first term.<\/p>\n\n\n\n<p>We can simplify further by defining the stacked matrices $$ F = \\left[\\begin{matrix}F_U \\\\ F_Q \\end{matrix}\\right], 
\\quad F_0 =\\left[\\begin{matrix} I  \\\\ 0 \\end{matrix}\\right]. $$ We then have the loss in terms of $F$ as \\begin{align}L(F) &amp;= {1 \\over 2 n^2} \\|V_Y S_Y^2 V_Y^T &#8211; V_X F^T F V_X^T\\|_F^2  +  {\\lambda \\over 2 m^2} \\|F &#8211; F_0\\|_F^2. \\end{align}<\/p>\n\n\n\n<p>Since the Frobenius norm is invariant to rotation, we can move $V_Y$ around in the first term. Defining $R = V_Y^T V_X$, we get, after shifting some constants into the regularizer, $$L(F) = {1 \\over 2} \\|S_Y^2 &#8211; R F^T F R^T\\|_F^2 + \\la&#8217; \\|F &#8211; F_0\\|_F^2, \\quad \\la&#8217; \\triangleq {\\la n^2 \\over 2 m^2}.$$ <\/p>\n\n\n\n<p>Taking derivatives, \\begin{align}\\nabla_F L &amp;= &#8211; 2 F (R^T (S_Y^2 &#8211; R F^T F R^T)R) + 2\\lambda&#8217; (F &#8211; F_0)\\\\<br>&amp;= &#8211; 2 F (R^T S_Y^2 R &#8211; F^T F) + 2 \\lambda&#8217; (F &#8211; F_0).\\end{align}<\/p>\n\n\n\n<p>Setting this to zero and left-multiplying by $F^T$, we get<br>$$ F^T F (R^T S_Y^2 R &#8211; F^T F) =  \\lambda&#8217; (F^T F &#8211; F^T F_0).$$<\/p>\n\n\n\n<p>Left-multiplying by $(F^T F)^{-1}$ and rearranging, $$  R^T \\left(S_Y^2 &#8211; \\la&#8217; \\right) R =  F^T F &#8211; \\la&#8217; (F^T F)^{-1} F^T F_0.$$<\/p>\n\n\n\n<p>Applying the SVD $F = U_F S_F V_F^T$, we get \\begin{align} R^T \\left(S_Y^2 &#8211; \\la&#8217; \\right) R &amp;=  V_F S_F^2 V_F^T &#8211; \\la&#8217; V_F S_F^{-2} V_F^T V_F S_F U_F^T F_0\\\\ &amp;=  V_F S_F^2 V_F^T &#8211; \\la&#8217; V_F S_F^{-1} U_F^T F_0.\\end{align}<\/p>\n\n\n\n<p>The solution to this is to set $U_F^T = [V_F^T, 0].$ In that case $U_F^T F_0 = V_F^T$, and \\begin{align} R^T \\left(S_Y^2 &#8211; \\la&#8217;\\right) R &amp;=  V_F \\left(S_F^2 &#8211; {\\la&#8217; \\over S_F}  \\right) V_F^T,\\end{align} so that $$ \\boxed{V_F = R^T = V_X^T V_Y.}$$ The singular values $S_F$ are then the solutions to the independent cubic equations<br>$$\\boxed{S_F^3 + \\left(\\la&#8217; &#8211; S_Y^2\\right)S_F &#8211; \\la&#8217; = 0.}$$ Checking the limits,<br>$$ \\lim_{\\lambda 
\\to \\infty} S_F = 1.$$<\/p>\n\n\n\n<p>So we get $$ \\boxed{F_U = V_F S_F V_F^T, \\quad F_Q = 0}$$ with $V_F$ and $S_F$ as above, from which \\begin{align}\\boxed{\\wt{Z}_{UU} = V_F S_F V_F^T S_X^{-1}, \\quad \\wt Z_{QU} = 0.}\\end{align}<\/p>\n\n\n\n<p><em><strong>Remark:<\/strong> That was a lot of work to derive that $F_Q = 0$. If we had known that at the outset, we wouldn&#8217;t have needed the derivative of the loss and could have derived the expression for $F_U$ directly. I wonder if there&#8217;s a more direct way to see that $F_Q$ must be 0?<\/em><\/p>\n\n\n\n<p>Translating this back to connectivity space,<br>\\begin{align} Z &amp;= U_X \\wt{Z}_{UU} U_X^T\\\\ &amp;= U_X V_F S_F V_F^T S_X^{-1} U_X^T \\\\ &amp;= U_X V_X^T V_Y S_F V_Y^T V_X S_X^{-1} U_X^T.\\end{align}<\/p>\n\n\n\n<p>We can interpret this as<br>$$ Z = \\underbrace{(U_X V_X^T)}_{\\text{Nearest `rotation&#8217; to } X} \\cdot \\underbrace{(V_Y S_F V_Y^T)}_{\\text{Approximately } \\sqrt{Y^T Y}} \\cdot \\underbrace{(V_X S_X^{-1}U_X^T)}_{\\text{Left pseudoinverse of } X}$$<\/p>\n\n\n\n<p>When applied to $X$, we get<br>$$ Z X = U_X V_X^T V_Y S_F V_Y^T V_X S_X^{-1} U_X^T U_X S_X V_X^T = U_X V_X^T V_Y S_F V_Y^T.$$<\/p>\n\n\n\n<p>Finally, $$ X^T Z^T Z X =  V_Y S_F V_Y^T V_X U_X^T U_X V_X^T V_Y S_F V_Y^T = V_Y S_F^2 V_Y^T.$$ <\/p>\n\n\n\n<p>This will tend to $Y^T Y$ at low regularization, and to the identity at high regularization. 
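As a quick numerical check of the boxed cubic for $S_F$, here is a minimal sketch (assuming NumPy; the helper `solve_sf` and the example values are mine, not taken from the fits below). For $\la' > 0$ the cubic has exactly one positive real root (one sign change among its coefficients), and the sketch confirms both limits, $S_F \to S_Y$ as $\la' \to 0$ and $S_F \to 1$ as $\la' \to \infty$:

```python
import numpy as np

def solve_sf(s_y, lam):
    """Positive real root of S_F^3 + (lam - s_y^2) S_F - lam = 0.

    For lam > 0 there is exactly one positive real root (one sign
    change in the coefficient sequence), so the selection below
    picks it unambiguously.
    """
    roots = np.roots([1.0, 0.0, lam - s_y**2, -lam])
    # keep (numerically) real roots, with a tolerance that scales with magnitude
    real = roots[np.abs(roots.imag) < 1e-8 * (1.0 + np.abs(roots))].real
    return real[real > 0].max()

s_y = 2.0
print(solve_sf(s_y, 1e-9))  # low-regularization limit: approx S_Y = 2
print(solve_sf(s_y, 1e6))   # high-regularization limit: approx 1
```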
The latter makes sense in hindsight because our regularization is in the eigenspace of $U_X$, and the connectivity is such that the corresponding pseudounits produce unit variance at the output.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Numerical verification<\/h3>\n\n\n\n<p>Here is what an example $F$ matrix looks like for one of the fits:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"379\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-3-1024x379.png\" alt=\"\" class=\"wp-image-1793\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-3-1024x379.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-3-300x111.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-3-768x284.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-3-1536x569.png 1536w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-3-2048x758.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>So $F_Q$ is indeed zero, and $F_U$ is symmetric. 
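Alongside the figures, the algebra itself can be checked end to end with a small sketch (assuming NumPy; the shapes, seed, and $\la'$ value are arbitrary choices of mine, not those of the actual fits, and I take $m > n$ so that $Q$ is nontrivial and $V_X$, $V_Y$ are square): it solves the per-mode cubic, builds the closed-form $Z$, and verifies $X^T Z^T Z X = V_Y S_F^2 V_Y^T$ numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5                      # m output units, n samples; m > n so V_X, V_Y are square
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, n))
lam = 0.1                        # lambda', chosen arbitrarily for the check

# Thin SVDs: X = U_X S_X V_X^T, Y = U_Y S_Y V_Y^T (V_X, V_Y are n x n orthogonal)
U_X, s_X, VT_X = np.linalg.svd(X, full_matrices=False)
_, s_Y, VT_Y = np.linalg.svd(Y, full_matrices=False)
V_X, V_Y = VT_X.T, VT_Y.T

# One cubic S_F^3 + (lam - S_Y^2) S_F - lam = 0 per mode
s_F = np.empty(n)
for i in range(n):
    roots = np.roots([1.0, 0.0, lam - s_Y[i] ** 2, -lam])
    real = roots[np.abs(roots.imag) < 1e-8 * (1.0 + np.abs(roots))].real
    s_F[i] = real[real > 0].max()

# Closed form: Z = (U_X V_X^T) (V_Y S_F V_Y^T) (V_X S_X^{-1} U_X^T)
Z = U_X @ VT_X @ V_Y @ np.diag(s_F) @ VT_Y @ V_X @ np.diag(1.0 / s_X) @ U_X.T

# The output Gram matrix should be V_Y S_F^2 V_Y^T
gram = X.T @ Z.T @ Z @ X
print(np.allclose(gram, V_Y @ np.diag(s_F**2) @ VT_Y))  # True
```

With `lam` near zero, `s_F` approaches `s_Y` and `gram` approaches $Y^T Y$, matching the low-regularization limit above.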
I can then compare $V_F$ and $S_F$ to their predicted values:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"532\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-4-1024x532.png\" alt=\"\" class=\"wp-image-1794\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-4-1024x532.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-4-300x156.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-4-768x399.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-4-1536x798.png 1536w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/04\/grafik-4.png 1556w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>This shows a good match modulo the sign-flipping on the eigenvectors, which is unavoidable due to their inherent sign ambiguity.<\/p>\n\n\n\n<p>So, the calculations seem correct.<\/p>\n\n\n\n<p>$$\\begin{flalign*} &amp;&amp; \\phantom{a} &amp; \\hfill \\square \\end{flalign*}$$<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today I went back to trying to understand the solution when using the original regularization. 
While doing so it occurred to me that if I use a slightly different regularization, I can get a closed-form solution for the feedforward connectivity $Z$, and without most (though not all) of the problems I was having in my [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,148],"tags":[36,52,56,3,50,57,40],"class_list":["post-1621","post","type-post","status-publish","format-standard","hentry","category-blog","category-research","tag-calculation","tag-decorrelation","tag-matrix-factorization","tag-optimization","tag-regularization","tag-whitening","tag-work"],"acf":[],"_links":{"self":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/1621","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/comments?post=1621"}],"version-history":[{"count":137,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/1621\/revisions"}],"predecessor-version":[{"id":1803,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/1621\/revisions\/1803"}],"wp:attachment":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/media?parent=1621"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/categories?post=1621"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/tags?post=1621"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}