A simple property of sparse vectors

This came up in Chapter 7 of Wainwright’s “High-dimensional Statistics”. In that Chapter we’re interested in determining how close solutions $\hat \theta$ to different flavours of the Lasso problem come to the true, $S$-sparse vector $\theta^*$.

A useful notion is the set of $S$-dominant vectors (my terminology):
$$ C(S) = \{x: \|x_{S^c}\|_1 \le \|x_S\|_1\},$$ i.e. those for which at least half of their $\ell_1$ norm lives on the indices $S$.

It turns out for any vector $\theta^*$ that is $S$-sparse i.e. whose non-zero elements are in the set $S$ alone, if another vector $\hat \theta$ satisfies $$\|\hat \theta\|_1 \le \|\theta^*\|_1,$$ then their difference is $S$-dominant,
$$\hat \Delta = \hat \theta – \theta^* \in C(S).$$

We’ll show this algebraically first, and then try to give some intuition for why it’s the case. The key is decomposition into the parts on $S$ and $S^c$, and noting that $\theta^*_{S^c} = 0$:
\begin{align}
\|\hat \theta\|_1 \le \|\theta^*\|_1 &\implies \|\hat \theta_{S^c}\|_1 + \|\hat \theta_S\|_1 \le \|\theta^*_S\|_1 \\ &\implies \|\hat \theta_{S^c}\|_1 \le \|\theta^*_S\|_1 – \|\hat \theta_S\|_1 \\&\implies \|\hat \theta_{S^c}\|_1 \le \|(\theta^* – \hat \theta)_S\|_1\\ &\implies \|(\theta^* – \hat \theta)_{S^c}\|_1 \le \|(\theta^* – \hat \theta)_S\|_1.\end{align}

While I can follow the derivation above, it’s not intuitive to me why it’s true in general. However, it is intuitive in the simple case where $\hat \theta$ lives entirely on $S^c$, in which case the result follows immediately, since
$$ (\hat \theta – \theta^*)_{S^c} = \hat \theta, \quad \text{and} \quad (\hat \theta – \theta^*)_S = -\theta^*.$$

The main result says that this holds generally, even when $\hat \theta$ has a component that lives on $S$. To make the connection to the simple case above, we decompose $\hat \theta$ into parts that live on and off of $S$:
$$\hat \theta = \hat \theta_S + \hat \theta_{S^c}. $$ Because the $\ell_1$ norm distributes over elements,
$$ \|\hat \theta\|_1 = \|\hat \theta_S\|_1 + \|\hat \theta_{S^c}\|_1 \le \|\theta^*_S\|_1.$$ Rearranging we get a bound on the excess norm of $\hat \theta$,$$ \|\hat \theta_{S^c}\|_1 \le \|\theta^*_S\|_1 – \|\hat \theta_S\|_1, $$ which we then relate to the error norm using the triangle inequality $$ \dots \le \|\theta^* – \hat \theta\|_1.$$

Not sure if this was much more intuitive!

$$ \blacksquare$$


Posted

in

by

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *