Nearest orthonormal matrix

We sometimes need to find the nearest orthonormal matrix $Q$ to a given square matrix $X$. This can easily be done using Lagrange multipliers, as I’ll show here.

By ‘nearest’ we will mean in the element-wise sum-of-squares sense. This is equivalent to the squared Frobenius norm, so our optimization problem is $$ \argmin_Q \; {1 \over 2} \|X – Q\|_F^2 \quad \text{such that} \quad QQ^T = I.$$

Using that $\|A\|_F^2 = \tr(AA^T)$, a Lagrangian for this problem is $$L(Q, \Lambda) = {1\over 2} \tr( (X – Q)(X-Q)^T) + \tr(\Lambda^T(I – QQ^T)).$$

We need the gradient with respect to $Q$ so we first compute the differential $$dL = -\tr(dQ (X – Q)^T) – \tr(\Lambda^T dQ Q^T – \Lambda^T Q dQ^T).$$ The gradient satisfies $dL = \tr(\nabla_Q L^T dQ)$. To extract the gradient most easily we want the $dQ$’s to sit at the end of each trace term in the differential. To achieve this, we apply the linearity of the trace to split the second term of the differential, the invariance of the trace to transposing its argument to change $dQ^T$ to $dQ$ wherever it appears, and the circular property of the trace to move $dQ$’s around to the last position, and arrive at
$$dL = -\tr( (X – Q)^T dQ) – \tr(Q^T \Lambda^T dQ) – \tr(Q^T \Lambda dQ).$$

We now read off the gradient by combining the contributions from each term, $$ \nabla_Q L = – (X – Q) – \Lambda Q – \Lambda^T Q.$$

Setting the gradient to zero, we get $$Q – X = (\Lambda + \Lambda^T)Q.$$ Right-multiplying by $Q^T$ and using that $QQ^T = I$ at the optimum, we get after slight rearrangement $$ I – \Lambda – \Lambda^T = XQ^T,$$ so $XQ^T$ is symmetric. Using SVD to decompose $X$ into $U S V^T$ we have
$$ XQ^T = U S V^T Q^T = Q X^T = Q V S U^T \implies Q V = U \implies Q = U V^T. $$

$$\begin{flalign*} && \phantom{a} & \hfill \blacksquare \end{flalign*}$$

In summary, $$\boxed{ X = U S V^T \implies Q = U V^T.} $$ That is, the nearest orthonormal matrix to $X$ has the same left and right singular vectors, and unit singular values.

This problem is a special case of the orthogonal Procrustes problem, in which we’re searching for the rotation $Q$ that brings a second matrix $Y$ closest to $X$. The problem we’ve discussed here corresponds to the case $Y = I$.

Comments

Leave a Reply Cancel reply