When analyzing neural activity I sometimes want to find the nearest dataset to the one I’m working with that also has a desired covariance $\Sigma$. In this post I’m going to show how to compute such a dataset.
Let the dataset we’re given be the $m \times n$ matrix $X$, and the new dataset that we will produce be the matrix $Y$, of the same size as $X$. We assume that $n \ge m$, that is, our matrices have as many or more columns as rows. Our problem is to find a matrix $Y$ that is as close to $X$ as possible while having the desired covariance.
We will measure closeness of matrices using the squared Frobenius norm, which is just the sum of the squared element-wise differences, $$ \|A - B\|_F^2 = \sum_{ij} (A_{ij} - B_{ij})^2 = \text{tr}\left[(A - B) (A-B)^T\right].$$ We therefore arrive at a constrained optimization problem: $$ \argmin_Y {1 \over 2} \|X - Y\|_F^2 \quad \text{ such that }\quad \text{cov}(Y) = \Sigma.$$
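As a quick numerical illustration of this identity, here is a small numpy sketch; the shapes and random seed are arbitrary choices for the example.

import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 7)), rng.standard_normal((4, 7))

# Sum of squared element-wise differences ...
elementwise = np.sum((A - B) ** 2)
# ... equals the trace form tr[(A - B)(A - B)^T].
trace_form = np.trace((A - B) @ (A - B).T)
assert np.isclose(elementwise, trace_form)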
For simplicity we will use the biased covariance of $Y$, $$ \text{cov}(Y) = {1 \over n}(Y - \overline{y}1^T)(Y - \overline{y}1^T)^T = {1 \over n} Y Y^T - \overline{y}\,\overline{y}^T,$$ where $\overline{y} = {1 \over n} Y 1$ is the $m$-dimensional vector of row means. It will be useful to express $Y$ as $$ Y = \widetilde{Y} + \overline{y}1^T,$$ where $\widetilde{Y}$ contains the mean-subtracted rows. Since $\widetilde{Y}1 = 0$, the cross terms vanish and $$ YY^T = \widetilde{Y}\widetilde{Y}^T + {1 \over n^2} Y 1 1^T 1 1^T Y^T = \widetilde{Y}\widetilde{Y}^T + {Y 1 \over n} 1^T 1 {1^T Y^T \over n} = \widetilde{Y}\widetilde{Y}^T + n \overline{y}\,\overline{y}^T.$$ We can therefore write the covariance condition as $$ n \, \text{cov}(Y) = YY^T - n \overline{y}\,\overline{y}^T = \widetilde{Y}\widetilde{Y}^T + n \overline{y}\,\overline{y}^T - n \overline{y}\,\overline{y}^T = \widetilde{Y}\widetilde{Y}^T = n \Sigma.$$
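Here is a small numpy sketch of this convention; note that np.cov with bias=True normalizes by $n$, matching the biased covariance above. The matrix sizes and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 10
Y = rng.standard_normal((m, n))

ybar = Y.mean(axis=1, keepdims=True)        # row means, shape (m, 1)
cov_formula = Y @ Y.T / n - ybar @ ybar.T   # (1/n) Y Y^T - ybar ybar^T
assert np.allclose(cov_formula, np.cov(Y, bias=True))

# The mean-subtracted rows carry all of the covariance: Ytilde Ytilde^T = n cov(Y).
Ytilde = Y - ybar
assert np.allclose(Ytilde @ Ytilde.T, n * np.cov(Y, bias=True))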
We now express our optimization problem in terms of the row-means and mean-subtracted rows of $X$ and $Y$ as $$ \argmin_{\widetilde{Y}, \overline{y}} {1 \over 2} \|\widetilde{X} + \overline{x}1^T - \widetilde{Y} - \overline{y}1^T\|_F^2 \quad \text{ such that }\quad \widetilde{Y}\widetilde{Y}^T= n \Sigma. $$
The matrices containing the row means are orthogonal to the matrices containing the mean-subtracted rows, e.g. $$ \widetilde{X} (\overline{y}1^T)^T = \widetilde{X} 1 \overline{y}^T = 0\overline{y}^T = 0.$$ Since our objective uses the squared Frobenius norm, orthogonality means we can apply the Pythagorean theorem to split the objective into a sum of a part containing only the row means and a part containing only the mean-subtracted rows, so $$ \argmin_{\widetilde{Y}, \overline{y}} {1 \over 2} \|\overline{x}1^T - \overline{y}1^T\|_F^2 + {1 \over 2} \|\widetilde{X} - \widetilde{Y}\|_F^2 \quad \text{ such that }\quad \widetilde{Y}\widetilde{Y}^T= n \Sigma. $$ Our constraint is only on the mean-subtracted rows, so we can set the row means of $Y$, $\overline{y}$, to match the row means of $X$, $\overline{x}$, thereby eliminating the first term in the objective.
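A quick numerical illustration of this orthogonality and the resulting Pythagorean split; the matrices, shapes, and seed below are arbitrary stand-ins.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 10
X, Y = rng.standard_normal((m, n)), rng.standard_normal((m, n))

xbar, ybar = X.mean(axis=1, keepdims=True), Y.mean(axis=1, keepdims=True)
Xtilde, Ytilde = X - xbar, Y - ybar
ones = np.ones((n, 1))

# Mean-subtracted rows are orthogonal to the rank-one mean matrices: Xtilde (ybar 1^T)^T = 0.
assert np.allclose(Xtilde @ (ybar @ ones.T).T, 0)

# Pythagorean split of the squared Frobenius norm of X - Y.
total = np.sum((X - Y) ** 2)
means_part = np.sum(((xbar - ybar) @ ones.T) ** 2)
tilde_part = np.sum((Xtilde - Ytilde) ** 2)
assert np.isclose(total, means_part + tilde_part)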
We have now simplified our optimization problem to only involve the mean-subtracted rows of $Y$: $$ \argmin_{\widetilde{Y}} {1 \over 2} \|\widetilde{X} - \widetilde{Y}\|_F^2 \quad \text{ such that }\quad \widetilde{Y}\widetilde{Y}^T= n \Sigma. $$ To solve this problem, we first perform the decompositions $$ \widetilde{X} = U_x S_x V_x^T, \quad \widetilde{Y} = U_y S_y V_y^T, \quad \Sigma = U \Psi U^T.$$ Our covariance constraint then becomes $$ \widetilde{Y}\widetilde{Y}^T = U_y S_y^2 U_y^T =n \Sigma = n U \Psi U^T.$$ We can therefore take $$ U_y = U, \quad S_y = \sqrt{n \Psi} \triangleq S,$$ where we’ve defined $S$ to avoid having to carry square roots around.
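As a sketch of this step: taking $U$ and $\Psi$ from np.linalg.eigh and setting $S = \sqrt{n\Psi}$ satisfies the covariance constraint for any $V_y$ with orthonormal columns. The random $\Sigma$ below is purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 10

# A random symmetric positive-definite target covariance (illustration only).
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)

Psi, U = np.linalg.eigh(Sigma)   # Sigma = U diag(Psi) U^T
S = np.diag(np.sqrt(n * Psi))    # S = sqrt(n Psi)

# With U_y = U and S_y = S, the constraint holds for any V_y with V_y^T V_y = I:
# (U S V_y^T)(U S V_y^T)^T = U S^2 U^T = n Sigma.
assert np.allclose(U @ S @ S @ U.T, n * Sigma)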
Our optimization is now over just $V_y$, which we will henceforth call $V$: $$ \argmin_{V} {1 \over 2} \|\widetilde{X} - U S V^T \|_F^2 \quad \text{ such that }\quad V^T V = I. $$ Because the Frobenius norm is invariant to orthogonal transformations, we can left-multiply the terms inside the norm by an orthogonal matrix without changing its value. Left-multiplying by $U^T$, our problem is equivalent to $$ \argmin_{V} {1 \over 2} \|\widetilde{X}_1 - S V^T \|_F^2 \quad \text{ such that }\quad V^T V = I, $$ where we’ve defined $$ \widetilde{X}_1 \triangleq U^T \widetilde{X}. $$
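A short numerical reminder of this invariance; the orthogonal matrix below comes from a QR decomposition and is just an illustration.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 10
A = rng.standard_normal((m, n))

# A random orthogonal matrix from the QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))

# Left-multiplying by an orthogonal matrix leaves the Frobenius norm unchanged.
assert np.isclose(np.linalg.norm(A), np.linalg.norm(Q.T @ A))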
We solve our optimization problem with Lagrange multipliers, using the Lagrangian $$ L(V, \Lambda) = {1 \over 2} \text{tr}\left[(\widetilde{X}_1 - S V^T)(\widetilde{X}_1 - S V^T)^T\right] + \text{tr}[\Lambda^T (V^TV - I)].$$ To compute its gradient with respect to $V$, we first compute the corresponding differential:
$$ dL= -\text{tr}[S \, dV^T (\widetilde{X}_1 - SV^T)^T] + \text{tr}[\Lambda^T dV^T V + \Lambda^T V^T dV].$$ Using the cyclic property of the trace (and that $\text{tr}[A] = \text{tr}[A^T]$), we have $$ dL= -\text{tr}[dV^T(\widetilde{X}_1 - SV^T)^T S] + \text{tr}[dV^T V (\Lambda + \Lambda^T)],$$ from which the gradient of the Lagrangian with respect to $V$ is $$ L_V = -(\widetilde{X}_1 - SV^T)^T S + V(\Lambda + \Lambda^T).$$
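As a sanity check on this gradient, here is a small finite-difference sketch; $\widetilde{X}_1$, $S$, $V$, and $\Lambda$ are random stand-ins with arbitrary shapes and seed.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 10
X1 = rng.standard_normal((m, n))             # stand-in for X-tilde_1
S = np.diag(rng.uniform(1.0, 2.0, size=m))   # stand-in for S (diagonal, positive)
V = rng.standard_normal((n, m))              # a generic point, not yet orthonormal
Lam = rng.standard_normal((m, m))            # stand-in for the multiplier Lambda

def lagrangian(V):
    R = X1 - S @ V.T
    return 0.5 * np.trace(R @ R.T) + np.trace(Lam.T @ (V.T @ V - np.eye(m)))

# Analytic gradient derived above.
L_V = -(X1 - S @ V.T).T @ S + V @ (Lam + Lam.T)

# Compare against central finite differences, entry by entry.
eps = 1e-6
num = np.zeros_like(V)
for i in range(n):
    for j in range(m):
        E = np.zeros_like(V)
        E[i, j] = eps
        num[i, j] = (lagrangian(V + E) - lagrangian(V - E)) / (2 * eps)
assert np.allclose(L_V, num, atol=1e-5)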
Setting the gradient to zero, we arrive at $$ \widetilde{X}^T_1 S - V S^2 = V (\Lambda + \Lambda^T).$$ Left-multiplying by $V^T$, using that the constraint $V^T V = I$ holds at the optimum, and rearranging slightly, $$ V^T \widetilde{X}_1^T S = S^2 + \Lambda + \Lambda^T.$$ The expression on the right-hand side is symmetric, therefore the one on the left must be as well. Defining $$ S \widetilde{X}_1 \triangleq \widetilde{X}_2 = U_2 S_2 V_2^T,$$ we have that $$V^T \widetilde{X}_2^T = V^T V_2 S_2 U_2^T $$ must be symmetric, and (choosing the stationary point that minimizes the objective) this gives $$ V^T V_2 = U_2 \iff V^T = U_2 V_2^T \iff V = V_2 U_2^T.$$
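A compact numerical sketch of this last step, with a random stand-in for $\widetilde{X}_2$: the choice $V = V_2 U_2^T$ has orthonormal columns and makes $V^T \widetilde{X}_2^T$ symmetric.

import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 10
X2 = rng.standard_normal((m, n))          # stands in for S X-tilde_1

U2, S2, V2t = np.linalg.svd(X2, full_matrices=False)   # X2 = U2 diag(S2) V2^T
V = V2t.T @ U2.T                          # V = V2 U2^T, shape (n, m)

assert np.allclose(V.T @ V, np.eye(m))    # V has orthonormal columns
M = V.T @ X2.T                            # equals U2 diag(S2) U2^T
assert np.allclose(M, M.T)                # symmetric (and positive semi-definite)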
Putting all the pieces together, we have that
$$ \begin{align*} \boxed{ \begin{aligned} Y^* &= \overline{x}1^T + \widetilde{Y}^*,\\
\widetilde{Y}^* &= U \sqrt{n \Psi}\, U_2 V_2^T,\end{aligned}}\end{align*}$$ where $U$ and $\Psi$ are the eigenvectors and eigenvalues of $\Sigma$, respectively, and $U_2$ and $V_2$ are the left and right singular vectors of $\widetilde{X}_2 = S \widetilde{X}_1 = S U^T \widetilde{X}.$
One check of this expression is that if our dataset already has the desired covariance, the new dataset should equal the old one. That is, $$n \Sigma = \widetilde{X} \widetilde{X}^T \implies \widetilde{Y}^* = \widetilde{X}.$$ In this case $\widetilde{X}\widetilde{X}^T = n U \Psi U^T$, so we can take $U_x = U$ and $S_x = \sqrt{n \Psi}$, giving $$ \begin{align*}\widetilde{X} &= U \sqrt{n \Psi} V_x^T, \\ \widetilde{X}_1 &= U^T \widetilde{X} = \sqrt{n \Psi} V_x^T, \\ \widetilde{X}_2 &= S \widetilde{X}_1 = n \Psi V_x^T. \end{align*}$$ Therefore $U_2 = I$ and $V_2 = V_x$, so we get $$ \widetilde{Y}^* = U \sqrt{n \Psi} U_2 V_2^T = U \sqrt{n \Psi} V_x^T = \widetilde{X},$$ as required.
Code
import numpy as np

def nearest_matrix_with_cov(X, Σ):
    """Return the matrix nearest to X (in Frobenius norm) with biased covariance Σ."""
    m, n = X.shape
    assert n >= m, "X has fewer columns than rows."
    xm = X.mean(axis=1)                    # row means of X (these are preserved in Y)
    E, U = np.linalg.eigh(Σ)               # Σ = U diag(E) U^T
    S = np.diag(np.sqrt(n * E))            # S = sqrt(n Ψ)
    Xms = X - xm[:, np.newaxis]            # mean-subtracted rows, X-tilde
    X1 = U.T @ Xms                         # X-tilde_1 = U^T X-tilde
    X2 = S @ X1                            # X-tilde_2 = S X-tilde_1
    U2, _, V2 = np.linalg.svd(X2, full_matrices=False)   # V2 holds V_2^T, as returned by numpy
    Yms = U @ S @ U2 @ V2                  # Y-tilde* = U S U_2 V_2^T
    Y = Yms + xm[:, np.newaxis]            # add the row means back
    return Y
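Here is a small usage sketch (random data, illustration only) that checks both that the output has the requested covariance and that a dataset which already satisfies the constraint is returned unchanged.

import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 50
X = rng.standard_normal((m, n))

# A random symmetric positive-definite target covariance (illustration only).
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)

Y = nearest_matrix_with_cov(X, Sigma)
assert np.allclose(np.cov(Y, bias=True), Sigma)     # Y has the requested covariance
assert np.allclose(Y.mean(axis=1), X.mean(axis=1))  # the row means are preserved

# A dataset that already has the desired covariance comes back (numerically) unchanged.
X_same = nearest_matrix_with_cov(X, np.cov(X, bias=True))
assert np.allclose(X_same, X)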