{"id":79,"date":"2024-01-04T15:53:09","date_gmt":"2024-01-04T15:53:09","guid":{"rendered":"https:\/\/sinatootoonian.com\/?p=79"},"modified":"2025-12-28T15:35:09","modified_gmt":"2025-12-28T15:35:09","slug":"the-nearest-dataset-with-a-given-covariance","status":"publish","type":"post","link":"https:\/\/sinatootoonian.com\/index.php\/2024\/01\/04\/the-nearest-dataset-with-a-given-covariance\/","title":{"rendered":"The nearest dataset with a given covariance"},"content":{"rendered":"\n<p>When analyzing neural activity I sometimes want to find the nearest dataset to the one I&#8217;m working with that also has a desired covariance $\\Sigma$. In this post I&#8217;m going to show how to compute such a dataset. <\/p>\n\n\n\n<p>Let the dataset we&#8217;re given be the $m \\times n$ matrix $X$, and the new dataset that we will produce be the matrix $Y$, of the same size as $X$. We assume that $n \\ge m$, that is, our matrices have as many or more columns as row. Our problem is to find a matrix $Y$ close to $X$ and with the desired covariance. <\/p>\n\n\n\n<p>We will measure closeness of matrices using the squared Frobenius norm, which is just the sum of the squared element-wise differences, $$ \\|A &#8211; B\\|_F^2 = \\sum_{ij} (A_{ij} &#8211; B_{ij})^2 = \\text{tr}\\left[(A &#8211; B) (A-B)^T\\right].$$ We therefore arrive at a constrained optimization problem: $$ \\argmin_Y {1 \\over 2} \\|X &#8211; Y\\|_F^2 \\quad \\text{ such that }\\quad \\text{cov}(Y) = \\Sigma.$$<\/p>\n\n\n\n<p>For simplicity we will use the biased covariance of $Y$ $$ \\text{cov}(Y) = {1 \\over n}(Y &#8211; \\overline{y}1^T)(Y &#8211; \\overline{y}1^T)^T = {1 \\over n} Y Y^T &#8211; \\overline{yy}^T,$$ where $\\overline{y} = {1 \\over n} Y 1$ is the $m$-dimensional vector of row means. It will be useful to express $Y$ as $$ Y = \\widetilde{Y} + \\overline{y}1^T,$$  where $\\widetilde{Y}$ contains the mean-subtracted rows. Then $$ YY^T = \\widetilde{Y}\\widetilde{Y}^T + {1 \\over n^2} Y 1 1^T 1 1^T Y^T = \\widetilde{Y}\\widetilde{Y}^T + {Y 1 \\over n} 1^T 1 {1^T  Y^T \\over n} = \\widetilde{Y}\\widetilde{Y}^T + n \\overline{y} \\overline{y}.$$ We can therefore write the covariance condition as  $$ n \\, \\text{cov}(Y) = YY^T &#8211; n \\overline{yy}^T = \\widetilde{Y}\\widetilde{Y}^T + n \\overline{yy} &#8211; n \\overline{yy} = \\widetilde{Y}\\widetilde{Y}^T= n \\Sigma.$$ <\/p>\n\n\n\n<p>We now express our optimization problem in terms of the row-means and mean-subtracted rows of $X$ and $Y$ as $$ \\argmin_{\\widetilde{Y}, \\overline{y}} {1 \\over 2} \\|\\widetilde{X} + \\overline{x}1^T &#8211; \\widetilde{Y} &#8211; \\overline{y}1^T\\|_F^2 \\quad \\text{ such that }\\quad \\widetilde{Y}\\widetilde{Y}^T= n \\Sigma. $$<\/p>\n\n\n\n<p>The matrices containing the row means are orthogonal to the matrices containing the mean-subtracted rows, e.g. $$ \\widetilde{X} (\\overline{y}1^T)^T = \\widetilde{X} 1 \\overline{y}^T = 0\\overline{y}^T = 0.$$ Since our objective uses the squared Frobenius norm, orthogonality means we can apply the Pythagoreran theorem to split the objective into a sum of a part containing only the row means, and a part containing the mean subtracted rows, so $$ \\argmin_{\\widetilde{Y}, \\overline{y}} {1 \\over 2} \\|\\overline{x}1^T &#8211; \\overline{y}1^T\\|_F^2  + {1 \\over 2} \\|\\widetilde{X} &#8211; \\widetilde{Y}\\|_F^2 \\quad \\text{ such that }\\quad \\widetilde{Y}\\widetilde{Y}^T= n \\Sigma. 
We have now simplified our optimization problem to only involve the mean-subtracted rows of $Y$: $$ \argmin_{\widetilde{Y}} {1 \over 2} \|\widetilde{X} - \widetilde{Y}\|_F^2 \quad \text{ such that }\quad \widetilde{Y}\widetilde{Y}^T = n \Sigma. $$ To solve this problem, we first perform the decompositions $$ \widetilde{X} = U_x S_x V_x^T, \quad \widetilde{Y} = U_y S_y V_y^T, \quad \Sigma = U \Psi U^T.$$ Our covariance constraint then becomes $$ \widetilde{Y}\widetilde{Y}^T = U_y S_y^2 U_y^T = n \Sigma = n U \Psi U^T.$$ This implies that $$ U_y = U, \quad S_y = \sqrt{n \Psi} \triangleq S,$$ where we've defined $S$ to avoid having to carry square roots around.

Our optimization is now over just $V_y$, which we will henceforth call $V$: $$ \argmin_{V} {1 \over 2} \|\widetilde{X} - U S V^T \|_F^2 \quad \text{ such that }\quad V^T V = I. $$ Because the Frobenius norm is invariant to rotation, we can left-multiply the terms inside the norm by a rotation without changing its value. Left-multiplying by $U^T$, our problem is equivalent to $$ \argmin_{V} {1 \over 2} \|\widetilde{X}_1 - S V^T \|_F^2 \quad \text{ such that }\quad V^T V = I, $$ where we've defined $$ \widetilde{X}_1 \triangleq U^T \widetilde{X}. $$

We solve our optimization problem with Lagrange multipliers using the Lagrangian $$ L(V, \Lambda) = {1 \over 2} \text{tr}\left[(\widetilde{X}_1 - S V^T)(\widetilde{X}_1 - S V^T)^T\right] + \text{tr}[\Lambda^T (V^T V - I)].$$ To compute its gradient with respect to $V$ we compute the corresponding differential $$ dL = -\text{tr}[S \, dV^T (\widetilde{X}_1 - SV^T)^T] + \text{tr}[\Lambda^T dV^T V + \Lambda^T V^T dV].$$ Using the cyclic property of the trace, we have $$ dL = -\text{tr}[dV^T(\widetilde{X}_1 - SV^T)^T S] + \text{tr}[dV^T V \Lambda^T + \Lambda^T V^T dV],$$ from which the gradient of the Lagrangian with respect to $V$ is $$ L_V = -(\widetilde{X}_1 - SV^T)^T S + V(\Lambda + \Lambda^T).$$

Setting the gradient to zero, we arrive at $$ \widetilde{X}_1^T S - V S^2 = V (\Lambda + \Lambda^T).$$ Left-multiplying by $V^T$, using that the constraint $V^T V = I$ is satisfied at the optimum, and slightly rearranging, $$ V^T \widetilde{X}_1^T S = S^2 + \Lambda + \Lambda^T.$$ The expression on the right-hand side is symmetric, therefore the one on the left must be as well. Defining $$ S \widetilde{X}_1 \triangleq \widetilde{X}_2 = U_2 S_2 V_2^T,$$ we have that $$ V^T \widetilde{X}_2^T = V^T V_2 S_2 U_2^T $$ must be symmetric, which can only happen if $$ V^T V_2 = U_2 \iff V^T = U_2 V_2^T \iff V = V_2 U_2^T.$$

Putting all the pieces together, we have that $$ \boxed{ \begin{aligned} Y^* &= \overline{x}1^T + \widetilde{Y}^*,\\ \widetilde{Y}^* &= U \sqrt{n \Psi} \, U_2 V_2^T,\end{aligned}}$$ where $U$ and $\Psi$ are the eigenvectors and eigenvalues of $\Sigma$, respectively, and $U_2$ and $V_2$ are the left and right singular vectors of $\widetilde{X}_2 = S \widetilde{X}_1 = S U^T \widetilde{X}.$
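Here is a small numerical sketch of the boxed formula, assuming numpy; the variable names (`Xt`, `Sigma`, `Psi`, etc.) and the test data are mine, not from the post, and the only claim checked is that the constructed $\widetilde{Y}^*$ satisfies the covariance constraint.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 50
Xt = rng.standard_normal((m, n))
Xt -= Xt.mean(axis=1, keepdims=True)           # mean-subtracted rows X̃

A = rng.standard_normal((m, m))
Sigma = (A @ A.T) / m                          # an arbitrary valid target covariance

Psi, U = np.linalg.eigh(Sigma)                 # Σ = U Ψ Uᵀ
S = np.diag(np.sqrt(n * Psi))                  # S = sqrt(n Ψ)
U2, _, V2h = np.linalg.svd(S @ U.T @ Xt, full_matrices=False)   # X̃₂ = U₂ S₂ V₂ᵀ
Yt = U @ S @ U2 @ V2h                          # Ỹ* = U sqrt(n Ψ) U₂ V₂ᵀ

assert np.allclose(Yt @ Yt.T, n * Sigma)       # constraint Ỹ* Ỹ*ᵀ = n Σ holds
```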
One check of this expression is that if our dataset already has the desired covariance, the new dataset should equal the old one. That is, $$ n \Sigma = \widetilde{X} \widetilde{X}^T \implies \widetilde{Y}^* = \widetilde{X}.$$ In this case, $$ \begin{aligned} \widetilde{X} &= U \sqrt{n \Psi} V_x^T \\ \widetilde{X}_1 &= U^T \widetilde{X} = \sqrt{n \Psi} V_x^T \\ \widetilde{X}_2 &= S \widetilde{X}_1 = n \Psi V_x^T. \end{aligned}$$ Therefore $U_2 = I$ and $V_2 = V_x$, so we get $$ \widetilde{Y}^* = U \sqrt{n \Psi} U_2 V_2^T = U \sqrt{n \Psi} V_x^T = \widetilde{X},$$ as required.

## Code

```python
import numpy as np

def nearest_matrix_with_cov(X, Σ):
    """Return the matrix nearest to X (in Frobenius norm) whose biased covariance is Σ."""
    m, n = X.shape
    assert n >= m, "X has fewer columns than rows."
    xm = X.mean(axis=1)                       # row means of X
    E, U = np.linalg.eigh(Σ)                  # Σ = U diag(E) Uᵀ
    S = np.diag(np.sqrt(n * E))               # S = sqrt(n Ψ)
    Xms = X - xm[:, np.newaxis]               # mean-subtracted rows X̃
    X1 = U.T @ Xms                            # X̃₁ = Uᵀ X̃
    X2 = S @ X1                               # X̃₂ = S X̃₁
    U2, _, V2 = np.linalg.svd(X2, full_matrices=False)   # V2 holds V₂ᵀ
    Yms = U @ S @ U2 @ V2                     # Ỹ* = U S U₂ V₂ᵀ
    Y = Yms + xm[:, np.newaxis]               # restore the row means of X
    return Y
```
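A possible usage example, with made-up data chosen purely for illustration: ask for identity covariance and verify that the output satisfies the constraint while staying close to the input.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 200
X = rng.standard_normal((m, n)) * np.arange(1, m + 1)[:, np.newaxis]   # rows with different scales
Σ = np.eye(m)                                  # request identity covariance

Y = nearest_matrix_with_cov(X, Σ)
Yms = Y - Y.mean(axis=1, keepdims=True)
print(np.allclose(Yms @ Yms.T / n, Σ))         # should print True: constraint satisfied
print(np.linalg.norm(X - Y, 'fro'))            # Frobenius distance from the original data
```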