The Gaussian distribution in the usual parameters
The Gaussian distribution in one dimension is often parameterized using the mean $\mu$ and the variance $\sigma^2$, in terms of which $$ p(x|\mu, \sigma^2) = {1 \over \sqrt{2\pi \sigma^2}} \exp\left(-{(x - \mu)^2 \over 2 \sigma^2} \right).$$
The Gaussian distribution is in the exponential family. For distributions in this family, the normalizing function carries information about the moments of the distribution. For example, in Bishop eq. 2.228 we see that the negative gradient of the logarithm of the normalizing function gives the mean.
A viewer on the YouTube channel noticed this, considered the form of the Gaussian distribution above, and wondered how the gradient of the log of the normalizing function there, $1/\sqrt{2\pi \sigma^2}$, could tell us anything about the mean, since the mean parameter doesn’t even show up in that function.
The Gaussian distribution in natural parameters
The confusion arises because the parameterization above is not the one we need to apply results about the exponential family. To find that parameterization, we need to express the distribution as (see Bishop 2.194)
$$ p(x|\eta) = h(x) g(\eta) \exp(\eta^T u(x)),$$ where $\eta$ is the vector of natural parameters of the distribution – which often don’t seem very natural!
To put the Gaussian in the form above, we match terms. The argument of the exponential in the usual parameterization is
\begin{align} -{1 \over 2\sigma^2} (x - \mu)^2 &= {-x^2 + 2 x\mu - \mu^2 \over 2\sigma^2}\\ &= \underbrace{\left[{\mu \over \sigma^2}, -{1 \over 2\sigma^2}\right]}_{\eta} {}^T \underbrace{(x, x^2)}_{u(x)} - {\mu^2 \over 2\sigma^2}.\end{align} We then get $$ p(x|\eta) = \underbrace{{1 \over \sqrt{2\pi}}}_{h(x)} \underbrace{ {1 \over \sqrt{\sigma^2}} \exp\left(-{\mu^2 \over 2\sigma^2}\right)}_{g(\eta)} \exp\left(\left[{\mu \over \sigma^2}, -{1 \over 2\sigma^2}\right]^T [x, x^2]\right).$$
This is now in the form we need. We see now that natural parameters for the Gaussian are $$ \eta = (\eta_1, \eta_2) = \left({\mu \over \sigma^2}, -{1 \over 2 \sigma^2}\right).$$ Note that other natural parameterizations are possible – e.g. if we’d taken $u(x) = (x, -x^2)$, then the second element of $\eta$ would have switched sign, etc.
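As a quick numerical sanity check of the matching above, here is a minimal Python sketch. The function names (`to_natural`, `to_usual`, `pdf_usual`, `pdf_natural`) and the test values are illustrative assumptions, not something from Bishop or the original derivation. It evaluates the density in both parameterizations and confirms they agree.

```python
import math

def to_natural(mu, var):
    """Map (mu, sigma^2) to eta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return mu / var, -1.0 / (2.0 * var)

def to_usual(eta1, eta2):
    """Invert the map: sigma^2 = -1 / (2 eta2), mu = eta1 * sigma^2."""
    var = -1.0 / (2.0 * eta2)
    return eta1 * var, var

def pdf_usual(x, mu, var):
    """Gaussian density in the (mu, sigma^2) parameterization."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def pdf_natural(x, eta1, eta2):
    """Gaussian density written as h(x) g(eta) exp(eta^T u(x)), u(x) = (x, x^2)."""
    h = 1.0 / math.sqrt(2.0 * math.pi)
    # g(eta) = (1 / sqrt(sigma^2)) exp(-mu^2 / (2 sigma^2)), rewritten in terms of eta
    g = math.sqrt(-2.0 * eta2) * math.exp(eta1 ** 2 / (4.0 * eta2))
    return h * g * math.exp(eta1 * x + eta2 * x ** 2)

# Arbitrary illustrative values.
mu, var = 1.5, 0.7
eta1, eta2 = to_natural(mu, var)
for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, pdf_usual(x, mu, var), pdf_natural(x, eta1, eta2))
```

The two printed densities should match to floating-point precision at every $x$, and `to_usual(*to_natural(mu, var))` should recover the original pair.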
Given the natural parameterization above, let’s verify the expression relating gradients of the normalizing function and moments of $u(x)$,
$$ - \nabla_\eta \log g(\eta) = \mathbb E (u(x)).$$
For us, \begin{align} g(\eta)&= {1 \over \sqrt{\sigma^2}} \exp\left(-{\mu^2 \over 2\sigma^2}\right) \\ L = \log g(\eta) &= {1 \over 2} \log{1 \over \sigma^2} -{\mu^2 \over 2\sigma^2}\\ &= {1\over 2} \log(-2\eta_2) + {\eta_1^2 \over 4 \eta_2}.\end{align}
Then $$-{\partial L \over \partial \eta_1} = - {\eta_1 \over 2\eta_2}= \mu = \mathbb E(x). \checkmark$$
Also $$-{\partial L \over \partial \eta_2} = -{1 \over 2\eta_2} + {\eta_1^2 \over 4 \eta_2^2} = \sigma^2 + \mu^2 = \operatorname{var}(x) + \mathbb E(x)^2 = \mathbb E(x^2) .\checkmark$$
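The two partial derivatives can also be checked numerically. The sketch below is an illustration only: the helper names (`log_g`, `neg_grad_log_g`), the step size, and the test values are my own assumptions. It approximates $-\nabla_\eta \log g(\eta)$ with central finite differences and compares the result to $\mu$ and $\sigma^2 + \mu^2$.

```python
import math

def log_g(eta1, eta2):
    """log g(eta) = (1/2) log(-2 eta2) + eta1^2 / (4 eta2), valid for eta2 < 0."""
    return 0.5 * math.log(-2.0 * eta2) + eta1 ** 2 / (4.0 * eta2)

def neg_grad_log_g(eta1, eta2, step=1e-6):
    """Central finite-difference approximation of -grad log g(eta)."""
    d1 = (log_g(eta1 + step, eta2) - log_g(eta1 - step, eta2)) / (2.0 * step)
    d2 = (log_g(eta1, eta2 + step) - log_g(eta1, eta2 - step)) / (2.0 * step)
    return -d1, -d2

# Arbitrary illustrative values.
mu, var = 1.5, 0.7
eta1, eta2 = mu / var, -1.0 / (2.0 * var)

m1, m2 = neg_grad_log_g(eta1, eta2)
print("E[x]   approx:", m1, "expected:", mu)
print("E[x^2] approx:", m2, "expected:", var + mu ** 2)
```

The approximations should agree with the expected moments up to the finite-difference error, which is small for this step size.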
So we see that the negative gradient of the logarithm of the normalizing function does indeed give us a mean, but it’s the mean of the statistics $u(x) = (x, x^2)$, and the normalizing function we have to use is $g(\eta)$, not the one in the ‘usual’ parameterization.
$$\blacksquare$$