Notes on Kernel PCA

We’d like to project datapoints along the principal components in feature space. And because features can be hard to compute, we’d like to do this without computing the features explicitly. We’ll see below that we can do so using the kernel function associated with the feature map.

We start by computing the principal components in feature space. If we assume that the features have zero mean, then the covariance in feature space is $$ \CC = {1 \over N} \sum_n \phi(\xx_n) \phi(\xx_n)^T.$$ The principal components are the eigenvectors of this matrix, and so satisfy $$ \CC \vv_i = \lambda_i \vv_i.$$
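To make this concrete, here’s a small sketch of doing PCA directly in feature space, assuming an explicit degree-2 polynomial feature map (chosen purely for illustration); the rest of these notes show how to avoid forming the features at all:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))

# Explicit degree-2 polynomial feature map (an illustrative choice):
# phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
Phi = np.stack([X[:, 0]**2,
                np.sqrt(2) * X[:, 0] * X[:, 1],
                X[:, 1]**2], axis=1)
Phi -= Phi.mean(axis=0)                 # zero-mean features, as assumed

C = Phi.T @ Phi / len(X)                # covariance in feature space
lam, V = np.linalg.eigh(C)              # C v_i = lambda_i v_i
```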

We don’t want to perform the eigendecomposition in feature space, so instead we notice, first, that the eigenvectors are linear combinations of the points in feature space,
$$ \vv_i = {1 \over \lambda_i} \CC \vv_i = \sum_n \phi(\xx_n) \underbrace{{\phi(\xx_n)^T \vv_i \over \lambda_i N}}_{a_{ni}} = \sum_n \phi(\xx_n) a_{ni}.$$

We can then rewrite the eigenvalue equation in terms of the coefficients $\aa_i$ as
$$ \underbrace{{1 \over N} \sum_n \phi(\xx_n) \phi(\xx_n)^T}_{\CC} \underbrace{\sum_m \phi(\xx_m) a_{mi}}_{\vv_i} = \lambda_i \sum_k \phi(\xx_k) a_{ki}.$$

Multiplying on the left by $\phi(\xx_q)^T$, we get
$${1 \over N} \sum_n \sum_m \phi(\xx_q)^T \phi(\xx_n) \phi(\xx_n)^T \phi(\xx_m) a_{mi} = \lambda_i \sum_p \phi(\xx_q)^T \phi(\xx_p) a_{pi}.$$

These inner products define the kernel, $$ K_{ab} = k(\xx_a, \xx_b) = \phi(\xx_a)^T \phi(\xx_b),$$ so we get $${1 \over N} \sum_n \sum_m K_{qn} K_{nm} a_{mi} = \lambda_i \sum_p K_{qp} a_{pi}.$$

Applying the symmetry of the kernel, we can write the above as
$$ \KK^2 \aa_i = \lambda_i N \KK \aa_i.$$ Left-multiplying by $\KK^{-1}$ (assuming $\KK$ is invertible; solutions that differ only in the null space of $\KK$ don’t affect the projections) we get $$\KK \aa_i = \lambda_i N \aa_i,$$ so the $\aa_i$ are eigenvectors of the kernel matrix $\KK$, with eigenvalues $\lambda_i N$.
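Computationally this is just an eigendecomposition of the $N \times N$ kernel matrix. A minimal sketch, assuming an RBF kernel (any valid kernel would do):

```python
import numpy as np

def rbf_kernel(Xa, Xb, gamma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # N = 50 points
K = rbf_kernel(X, X)                    # N x N kernel matrix

# K a_i = (lambda_i N) a_i: the a_i are eigenvectors of K.
mu, A = np.linalg.eigh(K)               # eigenvalues in ascending order
mu, A = mu[::-1], A[:, ::-1]            # leading components first
lam = mu / len(X)                       # lambda_i = mu_i / N
```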

Projection onto feature space eigenvectors

Now that we have the eigenvectors in feature space, we’d like to project a new data point onto them. We can do this using only the kernel, since $$ \phi(\xx)^T \vv_i = \phi(\xx)^T \sum_j \phi(\xx_j) a_{ji} = \sum_j k(\xx, \xx_j) a_{ji}.$$

So as long as we can compute the kernel efficiently for new data points, we can compute the eigenvector projections without having to compute the features themselves.
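In code, projecting new points is a single kernel evaluation against the training set followed by a matrix product. A sketch, again assuming an RBF kernel as a stand-in choice:

```python
import numpy as np

def rbf_kernel(Xa, Xb, gamma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))            # training points
K = rbf_kernel(X, X)
_, A = np.linalg.eigh(K)
A = A[:, ::-1]                          # columns are a_i, leading first

def project(X_new, X, A, n_components=2):
    """phi(x)^T v_i = sum_j k(x, x_j) a_{ji}: kernels only, no features."""
    return rbf_kernel(X_new, X) @ A[:, :n_components]

Z = project(rng.normal(size=(5, 2)), X, A)   # 5 new points -> 2 components
```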

Working with non-centered features

We’ve assumed above that the features were zero mean, i.e. that $$\sum_n \phi(\xx_n) = \bzero.$$ They won’t be in general. One obvious solution is to mean-subtract the features – that is, to use
$$ \widetilde{\phi}(\xx_n) = \phi(\xx_n) - {1 \over N} \sum_m \phi(\xx_m).$$ But this requires actually computing the features, which we want to avoid.

Instead, we note that the kernel for the centered features, $\widetilde \KK$, can be expressed in terms of the one for the uncentered features, $\KK$, since
\begin{align} \widetilde{\KK} &= (\phi(\XX) - {1 \over N} \phi(\XX)\bone \bone^T)^T (\phi(\XX) - {1 \over N} \phi(\XX)\bone \bone^T)\\
&= \phi(\XX)^T \phi(\XX) + {1 \over N^2} \bone \bone^T \phi(\XX)^T \phi(\XX) \bone \bone^T - {1 \over N} \bone \bone^T \phi(\XX)^T \phi(\XX) - {1 \over N} \phi(\XX)^T \phi(\XX) \bone \bone^T\\
&= \KK + {1 \over N^2} \bone \bone^T \KK \bone \bone^T - {1 \over N} \bone \bone^T \KK - {1 \over N} \KK \bone \bone^T,
\end{align} where $$\phi(\XX) = \left[\phi(\xx_1), \dots, \phi(\xx_N)\right].$$

So we can produce a centered kernel from the uncentered one, and then apply the results of the previous sections.
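The centering identity is easy to check numerically. In the sketch below the features are explicit random vectors standing in for $\phi(\xx_n)$ (so the centered kernel can be compared against explicitly centered features); this is only a verification setup, not the kernel trick itself:

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(3, 20))         # feature matrix phi(X): columns phi(x_n)
N = Phi.shape[1]
K = Phi.T @ Phi                        # uncentered kernel

# Centered kernel from K alone, using the identity above.
one = np.ones((N, N)) / N              # (1/N) * ones * ones^T
K_tilde = K + one @ K @ one - one @ K - K @ one

# Check against explicitly centered features.
Phi_c = Phi - Phi.mean(axis=1, keepdims=True)
assert np.allclose(K_tilde, Phi_c.T @ Phi_c)
```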

