How many neurons or trials to recover signal geometry?

These are my notes on a VVTNS talk by Itamar Landau about recovering the geometry of high-dimensional neural signals corrupted by noise. Caveat emptor: these notes are based on what I remember or hastily wrote down during the presentation, so they likely contain errors and omissions.

Motivation

  1. Experimentalists record from more and more neurons each year. In fact, there's a corresponding Moore's law, with the number of simultaneously recorded neurons doubling roughly every 7 years.
  2. Neural data often seems to lie on low-dimensional manifolds.
  3. Neural recordings are corrupted by noise.

The broad question is then: under what conditions can we recover the underlying geometry from noisy neural recordings? In particular, what are the influences of the number of neurons and the number of trials?

Problem setup

We consider an $N \times T$ data matrix $R$ containing the recorded activity $Y$ of $N$ neurons in $T$ trials, corrupted by additive noise. That is, $$R = Y + \text{noise}.$$ We further assume that the neural activity $Y$ lies in a $K$-dimensional subspace spanned by the orthonormal basis $U$, and has SVD $$ Y = \sigma_s U V^T,$$ where $U$ is $N \times K$, $V$ is $T \times K$, and the singular value $\sigma_s$, the same for all $K$ dimensions, measures the signal strength. The recorded activity is then $$ R = \sigma_s U V^T + \sigma_n Z,$$ where $Z$ is an $N \times T$ noise matrix whose elements are drawn independently (from a standard Gaussian, I believe, though I think they have results for other distributions as well), and $\sigma_n$ is the noise strength.
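As a concrete picture of the setup, here is a minimal NumPy sketch of the generative model. This is my own illustration; the parameter values are arbitrary choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# N neurons, T trials, K-dimensional signal subspace (illustrative values).
N, T, K = 200, 500, 3
sigma_s, sigma_n = 30.0, 1.0  # signal and noise strengths

# Random orthonormal bases U (N x K) and V (T x K) via QR decomposition.
U, _ = np.linalg.qr(rng.standard_normal((N, K)))
V, _ = np.linalg.qr(rng.standard_normal((T, K)))

# Rank-K signal with all singular values equal to sigma_s, plus iid Gaussian noise.
Y = sigma_s * U @ V.T
R = Y + sigma_n * rng.standard_normal((N, T))
```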

Problem statement

Given the setting above, the problem is then: how does the addition of noise distort the singular values and singular vectors of the recorded signal $R$ relative to those of the neuronal activity $Y$, as a function of the SNR, and how does this change when one records more neurons (increases $N$) or more trials (increases $T$)? Itamar phrased it alternatively as understanding the input-output map of singular values and singular vectors.

Distortions in singular values are measured by simply comparing the singular values of the recorded signal $R$ to those of the neural activity $Y$.

Distortions in singular vectors are measured in two ways. First, by the overlap $u_1^T \hat u_1$ between the leading (left) singular vector $u_1$ of the neural activity and that of the recordings, $\hat u_1$ – this is called the cross-overlap. Second, by the norm $\|\hat u_1^T U\|$ of the projection of the recordings' leading singular vector onto the signal subspace $U$ of the neural activity (the subspace overlap).
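Continuing the sketch above, both overlap measures are straightforward to compute (again, my own illustration of the definitions):

```python
# Leading left singular vectors of the signal and of the noisy recording.
u1 = np.linalg.svd(Y)[0][:, 0]       # true leading singular vector
u1_hat = np.linalg.svd(R)[0][:, 0]   # recovered leading singular vector

# Cross-overlap: u1^T u1_hat (absolute value, since the overall sign
# of a singular vector is arbitrary).
cross_overlap = np.abs(u1 @ u1_hat)

# Subspace overlap: norm of the projection of u1_hat onto the
# K-dimensional signal subspace spanned by the columns of U.
subspace_overlap = np.linalg.norm(u1_hat @ U)
```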

Signal models

They discussed results for three different signal models:

  1. Finite rank, where $K$ is fixed while $N$ and $T$ are varied;
  2. Extensive rank, where $K$ is some constant proportion of $N$;
  3. Power law, where the signals are samples from a multivariate normal whose covariance eigenvalue spectrum follows a power law (similar to, e.g., the Stringer et al. data from mouse visual cortex).

Key results

Finite rank signals

The finite rank case has been covered in previous work by other authors, e.g. here. In that setting the threshold SNR scales as $(NT)^{-1/2}$, which means that if we record more neurons, we can record fewer trials.

There are also threshold effects in the SNR. Below threshold, the signal singular values disappear into the noise bulk and the recovered singular vectors have essentially no overlap with the true ones. Above threshold, the signal singular values separate from the noise bulk and the singular vectors start having $O(1)$ overlap with the true ones.
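For what it's worth, here is my back-of-the-envelope reconstruction of where these scalings come from, assuming the standard spiked-matrix (BBP-type) phase transition applies; this is my own addition, not from the talk. The largest singular value of the noise bulk is approximately $\sigma_n(\sqrt{N} + \sqrt{T})$, and a rank-one signal singular value detaches from the bulk once $$\sigma_s > \sigma_n (NT)^{1/4}.$$ If we define the SNR per matrix element as signal power over noise power, i.e. $\text{SNR} = \sigma_s^2 / (NT \sigma_n^2)$ (using $\|u v^T\|_F = 1$), then at threshold $\sigma_s^2 = \sigma_n^2 \sqrt{NT}$ and $$\text{SNR}_\text{crit} = \frac{\sigma_n^2 \sqrt{NT}}{NT\,\sigma_n^2} = (NT)^{-1/2}.$$ Only the product $NT$ enters, which is why neurons and trials can be traded off against each other.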

Itamar and colleagues tested the predictions of the finite rank model on some mouse data from the Schnitzer lab by subsampling the data and seeing how the singular values and vectors changed. The finite rank theory:

  • Predicted singular value distortions well;
  • Predicted subspace overlap well;
  • Didn’t predict cross-overlap well.

Extensive rank signals

This was the motivation for discussing the extensive rank regime, which is described in their recent PRE paper. Some of the key results in this section were:

  • There are three phases in the recovered singular value distribution as the input signal strength is increased. At low signal strength, there’s only a single noise bulk. As the signal strength increases, a new bubble starts emerging out of the bulk. As the signal is increased further, the bubble separates entirely from the bulk (see the simulation sketch after this list).
  • The estimated (leading?) singular value (at the center of the bubble?) showed inflation, i.e., it was larger than the true value.
  • Related to inflation, there was competition between the recovered singular values, where the largest singular values would be larger than the true values and the smallest would be smaller (wouldn’t you get this just through the sorting?). The larger message here was that, unlike in the finite rank case, where the singular values don’t interact in the recovery and can be treated independently, in the extensive-rank case they do interact, presumably complicating the analysis.
  • In contrast to the finite rank setting, there wasn’t a threshold effect for cross-overlap. Even before the ‘bubbling-off’ phase of the recovered values, when all the recovered values were still in the noise bulk, there started to be $O(1)$ overlap between the recovered and true singular vectors.
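To see the bubble numerically, here is a quick simulation of my own (not from the paper) that plots the singular value histogram of $R$ at three signal strengths, with $K$ proportional to $N$:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
N, T = 400, 1000
K = N // 4          # extensive rank: K scales with N
sigma_n = 1.0

U, _ = np.linalg.qr(rng.standard_normal((N, K)))
V, _ = np.linalg.qr(rng.standard_normal((T, K)))

# Weak / intermediate / strong signal: bulk only, bubble emerging,
# bubble fully separated.
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, sigma_s in zip(axes, [5.0, 25.0, 60.0]):
    R = sigma_s * U @ V.T + sigma_n * rng.standard_normal((N, T))
    ax.hist(np.linalg.svd(R, compute_uv=False), bins=60)
    ax.set_title(f"sigma_s = {sigma_s}")
    ax.set_xlabel("singular value")
plt.show()
```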
Power law signals

They have also started looking at data with power-law statistics, such as in mouse visual cortex. The signal model here consists of $T$ independent samples from a centered multivariate Gaussian whose covariance spectrum obeys a power law. That is,
$$ Y = \{y_1, \dots, y_T\}, \quad y_i \sim \mathcal{N}(0, C), \quad C = U \Lambda U^T,$$ with the eigenvalues drawn from a Pareto density with exponent $\alpha$ and cutoff $\lambda_\text{min}$: $$ p(\lambda) = \frac{\alpha - 1}{\lambda_\text{min}} \left(\frac{\lambda}{\lambda_\text{min}}\right)^{-\alpha}, \quad \lambda \ge \lambda_\text{min}.$$
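A sketch of how one might sample from this model (my own code; the Pareto draw uses inverse-transform sampling, and the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 300, 1000
alpha, lam_min = 2.0, 1.0   # power law exponent and cutoff (illustrative)

# Eigenvalues from the Pareto density p(lam) ∝ lam^{-alpha}, lam >= lam_min,
# by inverting the CDF F(lam) = 1 - (lam/lam_min)^{1 - alpha}.
lam = lam_min * rng.uniform(size=N) ** (-1.0 / (alpha - 1.0))

# Random orthonormal eigenbasis; the covariance is C = U Λ U^T.
U, _ = np.linalg.qr(rng.standard_normal((N, N)))

# T samples y_i ~ N(0, C): scale white Gaussians by sqrt(Λ) in the eigenbasis.
Y = U @ (np.sqrt(lam)[:, None] * rng.standard_normal((N, T)))
```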

They were interested in estimating two properties of the signal: the power law exponent $\alpha$ and the power law cutoff $\lambda_\text{min}$. They found that for estimating $\alpha$, the product $NT$ is what matters, just as in the finite rank case, so you can trade off neurons against trials. In contrast, for $\lambda_\text{min}$, recording more neurons meant you also needed more trials – Itamar discussed this in terms of recovering the same extent of the power-law tail.
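As a naive illustration of exponent estimation (my own toy approach, continuing the sample above, and not necessarily what they did): the rank-ordered eigenvalues of a Pareto sample fall off roughly as $\lambda_{(i)} \propto i^{-1/(\alpha-1)}$, so a log-log fit of the empirical covariance eigenvalues against their rank gives a crude estimate of $\alpha$:

```python
# Empirical covariance eigenvalues, sorted in decreasing order.
eigs = np.sort(np.linalg.eigvalsh(Y @ Y.T / T))[::-1]
ranks = np.arange(1, N + 1)

# Fit the top half of the spectrum only (an ad hoc choice, to limit
# Marchenko-Pastur broadening at the small-eigenvalue end).
keep = slice(0, N // 2)
slope, _ = np.polyfit(np.log(ranks[keep]), np.log(eigs[keep]), 1)
alpha_hat = 1.0 - 1.0 / slope   # slope ≈ -1/(alpha - 1)
print(f"estimated alpha: {alpha_hat:.2f}")
```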

Summary

  • For finite rank signals, recovery depends on the product $NT$: we can trade off neurons against trials.
  • Both singular value and singular vector distortion show threshold effects.
  • With increasing signal strength a recovered singular value emerges from the noise bulk.
  • For extensive rank signals, the recovered singular value density ‘buds off’ a bubble around the recovered singular values as the signal strength increases.
  • There is therefore a threshold effect for the singular value distribution.
  • Interestingly, there is no threshold effect for the overlaps: even when the SNR is low enough that we only see a noise bulk, there can be $O(1)$ overlap of the recovered singular vectors with their true values.
  • What was the $NT$ tradeoff in the extensive rank case?
  • In the power law setting, we can trade neurons for trials if we’re estimating the power law exponent.
  • We need to scale trials with neurons if we’re interested in recovering the same tail-extent of the singular value distribution.

How would experimentalists know which signal regime they’re in, if any of those considered? Itamar addressed this on experimental data by looking at how the singular value and vector distortions varied under subsampling of the data, and compared these to the predictions from the different models. The predictions of the finite rank and extensive rank models for the singular value distortions were similar for the data he analyzed, but their predictions for the overlaps were different. So he agreed that the cross-overlap, or the subspace overlap, computed on train-test splits of subsampled data, would be a useful model-selection metric for determining the regime of the data.
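Here is my sketch of what such a train-test overlap metric could look like in code; the helper is hypothetical, not from the talk:

```python
import numpy as np

def traintest_overlap(R, rng):
    """Split trials in half, compute the leading left singular vector on
    each half, and return their absolute overlap. High overlap suggests
    the leading direction reflects signal rather than noise."""
    T = R.shape[1]
    perm = rng.permutation(T)
    u_a = np.linalg.svd(R[:, perm[: T // 2]])[0][:, 0]
    u_b = np.linalg.svd(R[:, perm[T // 2:]])[0][:, 0]
    return np.abs(u_a @ u_b)
```

Repeating this while subsampling neurons would trace out an overlap-versus-$N$ curve that could then be compared against the finite rank and extensive rank predictions.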
