{"id":1593,"date":"2024-03-30T19:00:14","date_gmt":"2024-03-30T19:00:14","guid":{"rendered":"https:\/\/sinatootoonian.com\/?p=1593"},"modified":"2025-12-27T16:05:20","modified_gmt":"2025-12-27T16:05:20","slug":"changing-regularization","status":"publish","type":"post","link":"https:\/\/sinatootoonian.com\/index.php\/2024\/03\/30\/changing-regularization\/","title":{"rendered":"Changing regularization"},"content":{"rendered":"\n<p>This morning it occurred to me that the problems we&#8217;re having with our equation \\begin{align}S^2 Z^2 S^2 &#8211; S C S = \\lambda (Z^{-1} &#8211; I)\\label{main}\\tag{1}\\end{align} are due to the regularizer we use, $\\|Z &#8211; I\\|_F^2$. This regularizer makes the default behavior of the feedforward connections passing the input directly to the output. But it&#8217;s also where the $Z^{-1}$ comes from in $\\Eqn{main}$, making the solution hard to understand.<\/p>\n\n\n\n<p>If instead we change our regularizer to $\\|Z^T Z &#8211; I\\|_F^2$, then not only are solutions easier to understand, but we get a closed-form answer. In this case, the objective becomes $$ L(Z) = {1 \\over 2} \\|Y^T Y &#8211; X^T Z^T Z X \\|_F^2 + {\\lambda \\over 2} \\|Z^T Z &#8211; I\\|_F^2.$$ This function only depends on $Z$ through $Z^TZ$, so letting $W = Z^T Z$, we instead optimize  $$ L(W) = {1 \\over 2} \\|Y^T Y &#8211; X^T W X \\|_F^2 + {\\lambda \\over 2} \\|W &#8211; I\\|_F^2.$$ Setting the gradient to zero, we get that $\\wt{W}_{UU}$ (which I will just call $W$ below for brevity), satisfies $$ S^2 W S^2 &#8211; S C S = \\lambda(I &#8211; W).$$ We can then explicitly solve for $W$ element-wise as $$ W_{ij} = {S_i S_j C_{ij} + \\lambda \\delta_{ij} \\over S_i^2 S_j^2 + \\lambda}.$$ <\/p>\n\n\n\n<p>Note that at this point we haven&#8217;t constrained $W$ at all. 
This can cause problems, as described below.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Problems<\/h3>\n\n\n\n<p>The closed-form solution above is great, but it comes with a few problems.<\/p>\n\n\n\n<p>First, the solution is for $W = Z^T Z$. Therefore $Z$ is not completely specified. The same issue occurs in the decorrelation literature, where further constraints on $Z$ are used to yield unique solutions. <\/p>\n\n\n\n<p>One such constraint is the ZCA solution, which requires $Z = Z^T$. In that case, $W = Z^2$ so we can try simply taking $Z = \\sqrt{W}.$ This would be fine except that for my data, $W$, while symmetric, has negative eigenvalues. The square root will produce connectivity with complex values, which seems hard to interpret, at the very least. <\/p>\n\n\n\n<p>The root of this problem is that minimizing $L(W)$, without any additional constraints on $W$, won&#8217;t necessarily produce an answer that&#8217;s positive semi-definite. In fact, we were lucky that the answer was even symmetric when there was no explicit requirement that this be the case.<\/p>\n\n\n\n<p>Another possibility is the PCA solution, which requires that $ZZ^T$ equal some diagonal matrix $D$. In that case, we get $Z (Z^T Z) = Z W = D Z$ or equivalently, taking the transpose and using the symmetry of $W$ and $D$, $W Z^T = Z^T D$. This means that the rows of $Z$ are the eigenvectors of $W$, which, naively, seems very tough to compute and neurally implement.<\/p>\n\n\n\n<p>An additional problem is the regularization itself. A neural implementation of e.g. the ZCA case constrains $Z^2$ to be near $I$. Squaring mixes information about different synapses and implies that updating one synapse requires knowledge about many others &#8211; i.e. the update is non-local.<\/p>\n\n\n\n<p>One possibility might be that the solutions found with the original regularization are similar to those found with the new regularization, with the appropriate additional constraints on $Z$. 
But at least for ZCA, this does not seem to be the case:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"523\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/03\/image-11-1024x523.png\" alt=\"\" class=\"wp-image-1600\" style=\"width:650px;height:auto\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/03\/image-11-1024x523.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/03\/image-11-300x153.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/03\/image-11-768x392.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/03\/image-11-1536x784.png 1536w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2024\/03\/image-11.png 1646w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>The right panel shows the closed-form solution described in the text.<\/em><\/figcaption><\/figure>\n\n\n\n<p>So I&#8217;ll keep this possibility in mind, but continue with my original regularization.<\/p>\n\n\n\n<p>$$\\begin{flalign*} &amp;&amp; \\phantom{a} &amp; \\hfill \\square \\end{flalign*}$$<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This morning it occurred to me that the problems we&#8217;re having with our equation \\begin{align}S^2 Z^2 S^2 &#8211; S C S = \\lambda (Z^{-1} &#8211; I)\\label{main}\\tag{1}\\end{align} are due to the regularizer we use, $\\|Z &#8211; I\\|_F^2$. This regularizer makes the default behavior of the feedforward connections passing the input directly to the output. 
But it&#8217;s [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,148],"tags":[52,13,50,40,51],"class_list":["post-1593","post","type-post","status-publish","format-standard","hentry","category-blog","category-research","tag-decorrelation","tag-pca","tag-regularization","tag-work","tag-zca"],"acf":[],"_links":{"self":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/1593","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/comments?post=1593"}],"version-history":[{"count":16,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/1593\/revisions"}],"predecessor-version":[{"id":1610,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/1593\/revisions\/1610"}],"wp:attachment":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/media?parent=1593"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/categories?post=1593"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/tags?post=1593"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}