{"id":7546,"date":"2026-04-21T16:19:04","date_gmt":"2026-04-21T15:19:04","guid":{"rendered":"https:\/\/sinatootoonian.com\/?p=7546"},"modified":"2026-04-21T16:19:05","modified_gmt":"2026-04-21T15:19:05","slug":"relating-gain-and-crowding-in-the-diagonal-model","status":"publish","type":"post","link":"https:\/\/sinatootoonian.com\/index.php\/2026\/04\/21\/relating-gain-and-crowding-in-the-diagonal-model\/","title":{"rendered":"Relating Gain and Crowding in the Diagonal Model"},"content":{"rendered":"\n<p>When we <a href=\"https:\/\/sinatootoonian.com\/index.php\/2026\/03\/05\/linearizing-the-covariance-loss\/\" data-type=\"post\" data-id=\"7014\">linearized the diagonal model<\/a> we determined the gains relative to unity as $$ \\bdelta = (\\GG^T \\GG + \\lambda \\II)^{-1} \\GG^T \\rr,$$ where $\\rr$ is the vectorized residual $\\SS - \\XX^T \\XX$. We&#8217;d like to not just report these numbers, but explain them. Complexity in the explanation derives from the correlations in the off-diagonal terms of $\\GG^T\\GG$, the overlaps between the representations of the different units. We found in that post that ignoring those overlaps doesn&#8217;t give a good fit. So we instead stuck with the formula above, but expressed it in terms of whitening the target covariance, and whitening the filters we&#8217;re matching it against. That may ultimately be what we have to do, but it again makes things hard to explain. <\/p>\n\n\n\n<p>Before we started doing mean-subtraction, we explained the gains the diagonal model assigned to a unit in terms of &#8220;alignment&#8221;, defined as its representation&#8217;s overlap with the target, and &#8220;crowding&#8221;, how much its representation aligned with those of the others. 
Can we do something similar here?<\/p>\n\n\n\n<p>To see how that would work, let&#8217;s rewrite the above equation as $$ (\\GG^T \\GG + \\lambda \\II) \\bdelta = \\GG^T \\rr.$$ In terms of single units, this says $$ \\sum_j (\\GG^T \\GG)_{ij} \\delta_j + \\lambda \\delta_i = \\bg_i^T \\rr \\triangleq a_i,$$ where $a_i$ is the unit&#8217;s alignment with the residual. <\/p>\n\n\n\n<p>Rearranging to isolate $\\delta_i$, we get $$ \\|\\bg_i\\|_2^2 \\delta_i + \\sum_{j \\neq i} \\bg_i^T \\bg_j \\delta_j + \\lambda \\delta_i = a_i.$$ We can then solve for $\\delta_i$: $$ \\delta_i = {a_i - b_i \\over \\|\\bg_i\\|_2^2 + \\lambda}, \\quad b_i \\triangleq \\sum_{j \\neq i} \\bg_i^T \\bg_j \\delta_j.$$<\/p>\n\n\n\n<p>Now $b_i$ looks like a crowding term, measuring the overlap of unit $i$&#8217;s representation with those of the other units, weighted by their gain deviations. The problem is that we need the deviations $\\delta_j$ of the other units, and these won&#8217;t be available until we actually perform the computation. What we&#8217;d like is to give experimentalists a simple rule, based only on the target and input representations, that predicts which units will have large gains. We don&#8217;t want this rule to depend on the gains of other units, as those won&#8217;t be initially available to the experimenters.<\/p>\n\n\n\n<p>We got around this before, when we weren&#8217;t averaging and were looking at the raw gains rather than their deviations, by using a crowding exponent: raising each crowding term to a fixed power less than 1. This was purely heuristic. And since the gains were being regularized to be near 1, even setting this exponent to 1, equivalent to assuming all the other gains were exactly 1, gave good results. We can&#8217;t do that here because we&#8217;re working with deviations, and their regularized target is 0.<\/p>\n\n\n\n<p>It would be useful if there were a simple empirical relationship that we could exploit to approximate $b_i$ without having to know much about the actual gains. 
To check, let&#8217;s plot $b_i$ against $\\hat b_i \\triangleq \\sum_{j \\neq i} \\bg_i^T \\bg_j$:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"465\" height=\"467\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-28.png\" alt=\"\" class=\"wp-image-7566\" style=\"aspect-ratio:0.9957506608224311;width:361px;height:auto\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-28.png 465w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-28-300x300.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-28-150x150.png 150w\" sizes=\"auto, (max-width: 465px) 100vw, 465px\" \/><\/figure>\n\n\n\n<p>A few things to notice:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The $b_i$ are all negative. Since $b_i$ enters the numerator as $-b_i$, the effect of the overlaps is to boost the gain beyond the individual contribution $a_i$.<\/li>\n\n\n\n<li>The relationship is almost perfectly linear.<\/li>\n\n\n\n<li>The slope is negative: the larger the overlaps with the rest of the population, the larger the boost.<\/li>\n\n\n\n<li>The intercept is essentially zero, so we effectively have a proportionality.<\/li>\n<\/ol>\n\n\n\n<p>Are these trivial consequences of a simple relationship?<\/p>\n\n\n\n<p>It&#8217;s certainly not a generic property: if I randomly generate $\\GG$ using Gaussian elements, I get:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"906\" height=\"826\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/04\/image-2.png\" alt=\"\" class=\"wp-image-7650\" style=\"aspect-ratio:1.0968728576701998;width:318px;height:auto\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/04\/image-2.png 906w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/04\/image-2-300x274.png 300w, 
https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/04\/image-2-768x700.png 768w\" sizes=\"auto, (max-width: 906px) 100vw, 906px\" \/><\/figure>\n\n\n\n<p>We can write our estimate of the residual as $$\\hat \\rr = \\GG \\bdelta = \\sum_i \\bg_i \\delta_i = \\bg_i \\delta_i + \\sum_{j \\neq i} \\bg_j \\delta_j \\triangleq \\bg_i \\delta_i + \\hat \\rr_i,$$ where $\\hat \\rr_i$ is our estimate with the contribution from unit $i$ removed.<\/p>\n\n\n\n<p>In these terms, $$ b_i = \\bg_i^T \\sum_{j \\neq i} \\bg_j \\delta_j = \\bg_i^T \\hat \\rr_i.$$ So it&#8217;s the overlap of a unit&#8217;s representation with the estimate formed without that unit. If that overlap is large and positive, it will depress the gain on that unit relative to the uncorrelated case. That makes sense &#8211; when we assume units are uncorrelated, each should make an independent contribution. If there&#8217;s overlap with what the other units can produce, that means the given unit&#8217;s contribution wasn&#8217;t unique, and can be distributed to the others, reducing its gain.<\/p>\n\n\n\n<p>Our estimate of this term is $$ \\hat b_i = \\bg_i^T \\sum_{j \\neq i} \\bg_j = \\bg_i^T \\overline \\bg_i,$$ where we&#8217;ve defined $$ \\overline \\bg_i \\triangleq \\sum_{j \\neq i} \\bg_j,$$ the mean population representation without unit $i$ (up to scaling).<\/p>\n\n\n\n<p>What we&#8217;ve found above is that $$ b_i \\propto -\\hat b_i.$$<\/p>\n\n\n\n<p>This says that the more a unit overlaps with the remaining population vector, the larger its effective boost (since $-b_i$ increases). This seems counterintuitive: more overlap should mean a less unique contribution, and hence a smaller gain. <\/p>\n\n\n\n<p>To better understand this, let&#8217;s write $\\hat \\rr_i$ in terms of the population mean.<\/p>\n\n\n\n<p>\\begin{align*} \\hat \\rr_i = \\sum_{j \\neq i} \\bg_j \\delta_j &amp;= \\sum_{j \\neq i} (\\bg_j - \\overline \\bg_i) \\delta_j + \\overline \\bg_i \\sum_{j \\neq i} \\delta_j, \\\\ b_i = \\bg_i^T \\hat \\rr_i &amp;= \\bg_i^T \\sum_{j \\neq i} (\\bg_j - \\overline \\bg_i) \\delta_j + \\bg_i^T \\overline \\bg_i \\sum_{j \\neq i} \\delta_j \\\\ &amp;= \\bg_i^T \\sum_{j \\neq i} (\\bg_j - \\overline \\bg_i) \\delta_j + \\hat b_i \\sum_{j \\neq i} \\delta_j \\\\ &amp;= d_i + c_i \\hat b_i.\\end{align*}<\/p>\n\n\n\n<p>This reveals a linear relationship between any given $b_i$ and its corresponding $\\hat b_i$, with slope $c_i \\triangleq \\sum_{j \\neq i} \\delta_j$ and intercept $d_i \\triangleq \\bg_i^T \\sum_{j \\neq i} (\\bg_j - \\overline \\bg_i) \\delta_j$. But these slopes and intercepts vary per unit:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"846\" height=\"297\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-30.png\" alt=\"\" class=\"wp-image-7609\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-30.png 846w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-30-300x105.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-30-768x270.png 768w\" sizes=\"auto, (max-width: 846px) 100vw, 846px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"821\" height=\"306\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-29.png\" alt=\"\" class=\"wp-image-7608\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-29.png 821w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-29-300x112.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-29-768x286.png 768w\" sizes=\"auto, (max-width: 821px) 100vw, 821px\" \/><\/figure>\n\n\n\n<p>Interestingly, the intercept scales linearly with $\\hat b_i$,<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"823\" height=\"323\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-31.png\" alt=\"\" class=\"wp-image-7610\" 
srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-31.png 823w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-31-300x118.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-31-768x301.png 768w\" sizes=\"auto, (max-width: 823px) 100vw, 823px\" \/><\/figure>\n\n\n\n<p>while the coefficients are more or less constant, at approximately $\\sum_j \\delta_j$:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"847\" height=\"308\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-32.png\" alt=\"\" class=\"wp-image-7611\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-32.png 847w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-32-300x109.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/03\/image-32-768x279.png 768w\" sizes=\"auto, (max-width: 847px) 100vw, 847px\" \/><\/figure>\n\n\n\n<p>Hmm&#8230;<\/p>\n\n\n\n<p>$$\\blacksquare$$<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When we linearized the diagonal model we determined the gains relative to unity as $$ \\bdelta = (\\GG^T \\GG + \\lambda \\II)^{-1} \\GG^T \\rr,$$ where $\\rr$ is the vectorized residual $\\SS - \\XX^T \\XX$. We&#8217;d like to not just report these numbers, but explain them. 
Complexity in the explanation derives from the correlations in the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,166,148],"tags":[],"class_list":["post-7546","post","type-post","status-publish","format-standard","hentry","category-blog","category-bulb-io","category-research"],"acf":[],"_links":{"self":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/7546","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/comments?post=7546"}],"version-history":[{"count":64,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/7546\/revisions"}],"predecessor-version":[{"id":7701,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/7546\/revisions\/7701"}],"wp:attachment":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/media?parent=7546"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/categories?post=7546"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/tags?post=7546"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}