The Normal Distribution
[SchonL11] provides a very nice exposition of this topic,
although its step (29) is not obvious. [Wan] offers another
treatment that is arguably clearer and more concise.
Exercise 5.1
The following facts are useful in this proof:
\[\begin{split}\newcommand{\E}[1]{\operatorname{E}\left[#1\right]}
\newcommand{\Cov}[1]{\operatorname{cov}\left(#1\right)}
\begin{gather*}
\boldsymbol{\mu} = \E{\mathbf{x}}\\\\
\E{\mathbf{A} \mathbf{x} + \mathbf{b}} =
\mathbf{A} \E{\mathbf{x}} + \mathbf{b}\\\\
\boldsymbol{\Sigma} = \Cov{\mathbf{x}} =
\E{
\left( \mathbf{x} - \E{\mathbf{x}} \right)
\left( \mathbf{x} - \E{\mathbf{x}} \right)^\top
} =
\E{
\mathbf{x} \mathbf{x}^\top - \mathbf{x} \E{\mathbf{x}}^\top -
\E{\mathbf{x}} \mathbf{x}^\top +
\E{\mathbf{x}} \E{\mathbf{x}}^\top
}\\\\
\Cov{\mathbf{A} \mathbf{x} + \mathbf{b}} =
\mathbf{A} \Cov{\mathbf{x}} \mathbf{A}^\top
\end{gather*}\end{split}\]
Let \(\mathbf{y} = \mathbf{A} \mathbf{x} + \mathbf{b}\) where
\(\mathbf{A}\) is nonsingular so that
\(\mathbf{x} = \mathbf{A}^{-1} (\mathbf{y} - \mathbf{b})\). The mean and
covariance are derived as
\[\begin{split}\boldsymbol{\mu} &= \E{\mathbf{x}}\\
&= \E{\mathbf{A}^{-1} (\mathbf{y} - \mathbf{b})}\\
&= \mathbf{A}^{-1} \E{\mathbf{y}} - \mathbf{A}^{-1} \mathbf{b}\\
\mathbf{A} \boldsymbol{\mu} + \mathbf{b} &= \E{\mathbf{y}}\\
&= \tilde{\boldsymbol{\mu}}\end{split}\]
and
\[\begin{split}\boldsymbol{\Sigma} &= \Cov{\mathbf{x}}\\
&= \Cov{\mathbf{A}^{-1} (\mathbf{y} - \mathbf{b})}\\
&= \mathbf{A}^{-1} \Cov{\mathbf{y} - \mathbf{b}} \mathbf{A}^{-\top}\\
&= \mathbf{A}^{-1} \Cov{\mathbf{y}} \mathbf{A}^{-\top}\\
\mathbf{A} \boldsymbol{\Sigma} \mathbf{A}^\top &= \Cov{\mathbf{y}}\\
&= \tilde{\boldsymbol{\Sigma}}.\end{split}\]
Thus
\[\DeclareMathOperator{\NormDist}{Norm}
Pr(\mathbf{y}) =
\NormDist_{\mathbf{y}}\left[
\mathbf{A} \boldsymbol{\mu} + \mathbf{b},
\mathbf{A} \boldsymbol{\Sigma} \mathbf{A}^\top
\right].\]
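This result can be checked numerically. The following NumPy sketch (the mean, covariance, transform, and sample size are all arbitrary illustrative choices) samples \(\mathbf{x}\), applies the transform, and compares the empirical moments of \(\mathbf{y}\) against \(\mathbf{A} \boldsymbol{\mu} + \mathbf{b}\) and \(\mathbf{A} \boldsymbol{\Sigma} \mathbf{A}^\top\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters for x (arbitrary choices).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.5]])

# A nonsingular transform y = A x + b.
A = np.array([[2.0, 1.0], [0.5, 3.0]])
b = np.array([0.5, -1.0])

# Sample x ~ Norm[mu, Sigma] and push the samples through the transform.
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b

# Empirical moments of y should approach A mu + b and A Sigma A^T.
mu_y_hat = y.mean(axis=0)
Sigma_y_hat = np.cov(y.T)
mu_y = A @ mu + b
Sigma_y = A @ Sigma @ A.T
```

With 200,000 samples the empirical mean and covariance agree with the closed-form expressions to within sampling noise.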
Exercise 5.2
See Exercise 5.1 for the derivations of
the following terms.
A solution to
\[\begin{split}\begin{aligned}
\mathbf{I} &= \tilde{\boldsymbol{\Sigma}}\\
&= \mathbf{A} \boldsymbol{\Sigma} \mathbf{A}^{\top}\\
\mathbf{A}^{-1} \mathbf{A}^{-\top} &= \boldsymbol{\Sigma}
\end{aligned}
\quad \text{and} \quad
\begin{aligned}
\boldsymbol{0} &= \tilde{\boldsymbol{\mu}}\\
&= \mathbf{A} \boldsymbol{\mu} + \mathbf{b}\\
\mathbf{b} &= -\mathbf{A} \boldsymbol{\mu}
\end{aligned}\end{split}\]
is to set \(\mathbf{A} = \boldsymbol{\Sigma}^{-1 / 2}\) resulting in
\(\mathbf{b} = -\boldsymbol{\Sigma}^{-1 / 2} \boldsymbol{\mu}\).
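The whitening transform can be verified directly. This sketch (with an arbitrary SPD covariance) builds \(\boldsymbol{\Sigma}^{-1/2}\) from an eigendecomposition and confirms that the transformed variable has zero mean and identity covariance:

```python
import numpy as np

# Arbitrary SPD covariance and mean (illustrative values).
Sigma = np.array([[4.0, 1.2], [1.2, 2.0]])
mu = np.array([3.0, -1.0])

# Symmetric inverse square root via eigendecomposition:
# Sigma^{-1/2} = V diag(w^{-1/2}) V^T.
w, V = np.linalg.eigh(Sigma)
A = V @ np.diag(w ** -0.5) @ V.T
b = -A @ mu

# y = A x + b then has identity covariance and zero mean.
Sigma_y = A @ Sigma @ A.T
mu_y = A @ mu + b
```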
Exercise 5.3
Recall that
\[\begin{split}Pr(\mathbf{x} = \begin{bmatrix} \mathbf{x}_1\\ \mathbf{x}_2 \end{bmatrix})
&= Pr(\mathbf{x}_1, \mathbf{x}_2)\\
&= \NormDist_{\mathbf{x}}\left[
\boldsymbol{\mu}
= \begin{bmatrix}
\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2
\end{bmatrix},
\boldsymbol{\Sigma}
= \begin{bmatrix}
\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{21}^\top\\
\boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}
\end{bmatrix}
\right]\\
&= \frac{1}{
(2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{1 / 2}
}
\exp\left[
-0.5 (\mathbf{x} - \boldsymbol{\mu})^\top
\boldsymbol{\Sigma}^{-1}
(\mathbf{x} - \boldsymbol{\mu})
\right]\end{split}\]
where
\(\boldsymbol{\Sigma}_{11} \in \mathbb{R}^{p \times p}\),
\(\boldsymbol{\Sigma}_{21} \in \mathbb{R}^{q \times p}\),
\(\boldsymbol{\Sigma}_{22} \in \mathbb{R}^{q \times q}\), and
\(p + q = D\).
The Schur complement \(\mathbf{S}\) of \(\boldsymbol{\Sigma}_{11}\) in
\(\boldsymbol{\Sigma}\) is defined as
\[\mathbf{S} =
\boldsymbol{\Sigma}_{22} -
\boldsymbol{\Sigma}_{21}
\boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{21}^\top.\]
It is symmetric positive definite because \(\boldsymbol{\Sigma}\) is
positive definite according to (5.7). This quantity is useful for deriving a
closed-form expression for the inverse of the full covariance matrix:
\[\begin{split}\boldsymbol{\Sigma}^{-1}
&= \left(
\begin{bmatrix}
\mathbf{I}_p & \boldsymbol{0}\\
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{11}^{-1} &
\mathbf{I}_q
\end{bmatrix}
\begin{bmatrix}
\boldsymbol{\Sigma}_{11} & \boldsymbol{0}\\
\boldsymbol{0} & \mathbf{S}
\end{bmatrix}
\begin{bmatrix}
\mathbf{I}_p &
\boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{21}^\top\\
\boldsymbol{0} & \mathbf{I}_q
\end{bmatrix}
\right)^{-1}\\
&= \begin{bmatrix}
\mathbf{I}_p &
-\boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{21}^\top\\
\boldsymbol{0} & \mathbf{I}_q
\end{bmatrix}
\begin{bmatrix}
\boldsymbol{\Sigma}_{11}^{-1} & \boldsymbol{0}\\
\boldsymbol{0} & \mathbf{S}^{-1}
\end{bmatrix}
\begin{bmatrix}
\mathbf{I}_p & \boldsymbol{0}\\
-\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{11}^{-1} &
\mathbf{I}_q
\end{bmatrix}\\
&= \begin{bmatrix}
\boldsymbol{\Sigma}_{11}^{-1} +
\boldsymbol{\Sigma}_{11}^{-1}
\boldsymbol{\Sigma}_{21}^\top
\mathbf{S}^{-1}
\boldsymbol{\Sigma}_{21}
\boldsymbol{\Sigma}_{11}^{-1} &
-\boldsymbol{\Sigma}_{11}^{-1}
\boldsymbol{\Sigma}_{21}^\top \mathbf{S}^{-1}\\
-\mathbf{S}^{-1} \boldsymbol{\Sigma}_{21}
\boldsymbol{\Sigma}_{11}^{-1} &
\mathbf{S}^{-1}
\end{bmatrix}.\end{split}\]
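The closed-form block inverse can be verified numerically. This sketch (with an arbitrary SPD covariance and an arbitrary partition \(p = 2\), \(q = 3\)) assembles the inverse block by block and compares it against a direct inversion:

```python
import numpy as np

rng = np.random.default_rng(1)

# Random SPD covariance, partitioned with p = 2, q = 3.
p, q = 2, 3
D = p + q
M = rng.normal(size=(D, D))
Sigma = M @ M.T + D * np.eye(D)
S11 = Sigma[:p, :p]
S21 = Sigma[p:, :p]          # q x p block, matching Sigma_21
S22 = Sigma[p:, p:]

# Schur complement of Sigma_11 in Sigma.
S = S22 - S21 @ np.linalg.inv(S11) @ S21.T

# Assemble the closed-form inverse block by block.
S11i = np.linalg.inv(S11)
Si = np.linalg.inv(S)
top_left = S11i + S11i @ S21.T @ Si @ S21 @ S11i
top_right = -S11i @ S21.T @ Si
inv_blocks = np.block([[top_left, top_right],
                       [top_right.T, Si]])
```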
The same decomposition simplifies the determinant of
\(\boldsymbol{\Sigma}\) to
\[\begin{split}\left\vert \boldsymbol{\Sigma} \right\vert
&= \left\vert
\begin{bmatrix}
\mathbf{I}_p & \boldsymbol{0}\\
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{11}^{-1} &
\mathbf{I}_q\\
\end{bmatrix}
\begin{bmatrix}
\boldsymbol{\Sigma}_{11} & \boldsymbol{0}\\
\boldsymbol{0} & \mathbf{S}
\end{bmatrix}
\begin{bmatrix}
\mathbf{I}_p &
\boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{21}^\top\\
\boldsymbol{0} & \mathbf{I}_q
\end{bmatrix}
\right\vert\\
&= \left\vert
\begin{bmatrix}
\mathbf{I}_p & \boldsymbol{0}\\
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{11}^{-1} &
\mathbf{I}_q\\
\end{bmatrix}
\right\vert
\left\vert
\begin{bmatrix}
\boldsymbol{\Sigma}_{11} & \boldsymbol{0}\\
\boldsymbol{0} & \mathbf{S}
\end{bmatrix}
\right\vert
\left\vert
\begin{bmatrix}
\mathbf{I}_p &
\boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{21}^\top\\
\boldsymbol{0} & \mathbf{I}_q
\end{bmatrix}
\right\vert
    & \quad & \det(AB) = \det(A) \det(B)\\
&= \left\vert
\begin{bmatrix}
\boldsymbol{\Sigma}_{11} & \boldsymbol{0}\\
\boldsymbol{0} & \mathbf{S}
\end{bmatrix}
\right\vert
& \quad & \det\left( \mathbf{T}_n \right) = \prod_{k = 1}^n a_{kk}\\
&= \left\vert \boldsymbol{\Sigma}_{11} \right\vert
\left\vert \mathbf{S} \right\vert
& \quad & \text{block matrix determinant property.}\end{split}\]
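The determinant identity \(\left\vert \boldsymbol{\Sigma} \right\vert = \left\vert \boldsymbol{\Sigma}_{11} \right\vert \left\vert \mathbf{S} \right\vert\) can likewise be checked (again with an arbitrary SPD matrix):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random SPD Sigma, partitioned with p = q = 2.
p = 2
M = rng.normal(size=(4, 4))
Sigma = M @ M.T + 4 * np.eye(4)
S11 = Sigma[:p, :p]
S21 = Sigma[p:, :p]
S22 = Sigma[p:, p:]

# Schur complement of Sigma_11 in Sigma.
S = S22 - S21 @ np.linalg.inv(S11) @ S21.T

det_full = np.linalg.det(Sigma)
det_factored = np.linalg.det(S11) * np.linalg.det(S)
```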
\[\begin{split}& Pr(\mathbf{x}_1)\\
&= \int Pr(\mathbf{x}_1, \mathbf{x}_2) d\mathbf{x}_2\\
&= \int
\frac{1}{
(2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{1 / 2}
}
\exp\left[ -0.5
\begin{bmatrix}
\mathbf{x}_1 - \boldsymbol{\mu}_1\\
\mathbf{x}_2 - \boldsymbol{\mu}_2
\end{bmatrix}^\top
\begin{bmatrix}
\boldsymbol{\Lambda}_{11} & \boldsymbol{\Lambda}_{21}^\top\\
\boldsymbol{\Lambda}_{21} & \boldsymbol{\Lambda}_{22}
\end{bmatrix}
\begin{bmatrix}
\mathbf{x}_1 - \boldsymbol{\mu}_1\\
\mathbf{x}_2 - \boldsymbol{\mu}_2
\end{bmatrix}
\right] d\mathbf{x}_2
    & \quad & \boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1} =
\begin{bmatrix}
\boldsymbol{\Lambda}_{11} & \boldsymbol{\Lambda}_{21}^\top\\
\boldsymbol{\Lambda}_{21} & \boldsymbol{\Lambda}_{22}
\end{bmatrix}\\
&= \int
\frac{1}{
(2 \pi)^{(p + q) / 2}
\left\vert \boldsymbol{\Sigma}_{11} \right\vert^{1 / 2}
\left\vert \mathbf{S} \right\vert^{1 / 2}
}
\exp\left[ -0.5
\left(
(\mathbf{x}_1 - \boldsymbol{\mu}_1)^\top \boldsymbol{\Lambda}_{11}
(\mathbf{x}_1 - \boldsymbol{\mu}_1) +
2 (\mathbf{x}_1 - \boldsymbol{\mu}_1)^\top
\boldsymbol{\Lambda}_{21}^\top
(\mathbf{x}_2 - \boldsymbol{\mu}_2) +
(\mathbf{x}_2 - \boldsymbol{\mu}_2)^\top
\boldsymbol{\Lambda}_{22} (\mathbf{x}_2 - \boldsymbol{\mu}_2)
\right)
\right] d\mathbf{x}_2\\
&= \int
\NormDist_{\mathbf{x}_1}\left[
\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}
\right]
\frac{1}{
(2 \pi)^{q / 2}
\left\vert \mathbf{S} \right\vert^{1 / 2}
}
\exp\left[ -0.5
\left[
(\mathbf{x}_2 - \boldsymbol{\mu}_2) -
\boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1}
(\mathbf{x}_1 - \boldsymbol{\mu}_1)
\right]^\top
\mathbf{S}^{-1}
\left[
(\mathbf{x}_2 - \boldsymbol{\mu}_2) -
\boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1}
(\mathbf{x}_1 - \boldsymbol{\mu}_1)
\right]
\right] d\mathbf{x}_2\\
&= \NormDist_{\mathbf{x}_1}\left[
\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}
\right]
\int
\NormDist_{\mathbf{x}_2}\left[
\boldsymbol{\mu}_2 +
\boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1}
(\mathbf{x}_1 - \boldsymbol{\mu}_1),
\mathbf{S}
\right] d\mathbf{x}_2\\
&= \NormDist_{\mathbf{x}_1}\left[
\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}
\right]\end{split}\]
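The marginalization result can be confirmed by quadrature in the simplest case \(p = q = 1\): integrating the joint density over \(x_2\) on a fine grid should recover \(\NormDist_{x_1}\left[\mu_1, \Sigma_{11}\right]\). A sketch with arbitrary illustrative parameters:

```python
import numpy as np

# A bivariate normal with p = q = 1 (illustrative parameters).
mu = np.array([0.5, -1.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.5]])
Sigma_inv = np.linalg.inv(Sigma)
det = np.linalg.det(Sigma)

def joint(x1, x2):
    """Bivariate normal density Pr(x1, x2)."""
    d = np.array([x1 - mu[0], x2 - mu[1]])
    return np.exp(-0.5 * d @ Sigma_inv @ d) / (2 * np.pi * np.sqrt(det))

# Integrate the joint over x2 on a wide, fine grid (Riemann sum;
# the tails are negligible well before the grid boundary).
x1 = 1.3
grid = np.linspace(-20.0, 20.0, 8001)
dx = grid[1] - grid[0]
marginal = sum(joint(x1, v) for v in grid) * dx

# Closed-form marginal: Norm_{x1}[mu_1, Sigma_11].
expected = np.exp(-0.5 * (x1 - mu[0]) ** 2 / Sigma[0, 0]) \
    / np.sqrt(2 * np.pi * Sigma[0, 0])
```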
Exercise 5.4
The proposed expression is an inverse if and only if it satisfies the
definition of the matrix inverse:
\[M M^{-1} = M^{-1} M = I.\]
A simple way to show this is to decompose the block matrix \(M\) using the
Schur complement of \(D\) in \(M\). The upper, diagonal, and lower
triangular factors then cancel out.
Exercise 5.5
Another expression for \(\boldsymbol{\Sigma}^{-1}\) in
Exercise 5.3 is
\[\begin{split}\boldsymbol{\Sigma}^{-1}
&= \left(
\begin{bmatrix}
\mathbf{I}_p &
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}\\
\boldsymbol{0} & \mathbf{I}_q\\
\end{bmatrix}
\begin{bmatrix}
\mathbf{S} & \boldsymbol{0}\\
\boldsymbol{0} & \boldsymbol{\Sigma}_{22}
\end{bmatrix}
\begin{bmatrix}
\mathbf{I}_p & \boldsymbol{0}\\
\boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\Sigma}_{21} & \mathbf{I}_q
\end{bmatrix}
\right)^{-1}\\
&= \begin{bmatrix}
\mathbf{I}_p & \boldsymbol{0}\\
-\boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\Sigma}_{21} & \mathbf{I}_q
\end{bmatrix}
\begin{bmatrix}
\mathbf{S}^{-1} & \boldsymbol{0}\\
\boldsymbol{0} & \boldsymbol{\Sigma}_{22}^{-1}
\end{bmatrix}
\begin{bmatrix}
\mathbf{I}_p &
-\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}\\
\boldsymbol{0} & \mathbf{I}_q\\
\end{bmatrix}\\
&= \begin{bmatrix}
\mathbf{S}^{-1} &
-\mathbf{S}^{-1}
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}\\
-\boldsymbol{\Sigma}_{22}^{-1}
\boldsymbol{\Sigma}_{21} \mathbf{S}^{-1} &
\boldsymbol{\Sigma}_{22}^{-1} +
\boldsymbol{\Sigma}_{22}^{-1}
\boldsymbol{\Sigma}_{21}
\mathbf{S}^{-1}
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}
\end{bmatrix}\end{split}\]
where
\[\mathbf{S} =
\boldsymbol{\Sigma}_{11} -
\boldsymbol{\Sigma}_{21}^\top
\boldsymbol{\Sigma}_{22}^{-1}
\boldsymbol{\Sigma}_{21}\]
is the Schur complement of \(\boldsymbol{\Sigma}_{22}\) in
\(\boldsymbol{\Sigma}\). The determinant of \(\boldsymbol{\Sigma}\) is
simplified to
\[\left\vert \boldsymbol{\Sigma} \right\vert =
\left\vert \mathbf{S} \right\vert
\left\vert \boldsymbol{\Sigma}_{22} \right\vert.\]
Going through the same motions gives
\[\begin{split}& Pr(\mathbf{x}_1, \mathbf{x}_2)\\
&= \frac{1}{
(2 \pi)^{(p + q) / 2}
\left\vert \mathbf{S} \right\vert^{1 / 2}
\left\vert \boldsymbol{\Sigma}_{22} \right\vert^{1 / 2}
}
\exp\left[
\left(
\left( \mathbf{x}_1 - \boldsymbol{\mu}_1 \right)^\top
\boldsymbol{\Lambda}_{11}
\left( \mathbf{x}_1 - \boldsymbol{\mu}_1 \right) +
2 \left( \mathbf{x}_1 - \boldsymbol{\mu}_1 \right)^\top
\boldsymbol{\Lambda}_{21}^\top
\left( \mathbf{x}_2 - \boldsymbol{\mu}_2 \right) +
\left( \mathbf{x}_2 - \boldsymbol{\mu}_2 \right)^\top
\boldsymbol{\Lambda}_{22}
\left( \mathbf{x}_2 - \boldsymbol{\mu}_2 \right)
\right)
\right]^{-0.5}\\
&= \NormDist_{\mathbf{x}_2}\left[
\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22}
\right]
\frac{1}{(2 \pi)^{p / 2} \left\vert \mathbf{S} \right\vert^{1 / 2}}
\exp\left[
\left(
\left( \mathbf{x}_1 - \boldsymbol{\mu}_1 \right) -
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}
\left( \mathbf{x}_2 - \boldsymbol{\mu}_2 \right)
\right)^\top
\mathbf{S}^{-1}
\left(
\left( \mathbf{x}_1 - \boldsymbol{\mu}_1 \right) -
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}
\left( \mathbf{x}_2 - \boldsymbol{\mu}_2 \right)
\right)
  \right]^{-0.5}\\
&= \NormDist_{\mathbf{x}_2}\left[
\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22}
\right]
\NormDist_{\mathbf{x}_1}\left[
\boldsymbol{\mu}_1 +
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}
(\mathbf{x}_2 - \boldsymbol{\mu}_2),
\mathbf{S}
\right].\end{split}\]
Rearranging the equations using conditional probability (2.4) results in
\[Pr(\mathbf{x}_1 \mid \mathbf{x}_2) =
\frac{Pr(\mathbf{x}_1, \mathbf{x}_2)}{Pr(\mathbf{x}_2)} =
\NormDist_{\mathbf{x}_1}\left[
\boldsymbol{\mu}_1 +
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}
(\mathbf{x}_2 - \boldsymbol{\mu}_2),
\mathbf{S}
\right].\]
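The factorization \(Pr(\mathbf{x}_1, \mathbf{x}_2) = Pr(\mathbf{x}_1 \mid \mathbf{x}_2) \, Pr(\mathbf{x}_2)\) can be checked pointwise. This sketch (scalar blocks \(p = q = 1\), arbitrary illustrative parameters, and a hand-rolled density helper) evaluates both sides at a test point:

```python
import numpy as np

def norm_pdf(x, mu, Sigma):
    """Density of Norm_x[mu, Sigma]; accepts scalars or vectors."""
    d = np.atleast_1d(x) - np.atleast_1d(mu)
    Sigma = np.atleast_2d(Sigma)
    D = len(d)
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) \
        / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))

# Illustrative joint over (x1, x2) with p = q = 1.
mu1, mu2 = 0.3, -0.7
Sigma = np.array([[1.5, 0.6], [0.6, 2.0]])
S11, S21, S22 = Sigma[0, 0], Sigma[1, 0], Sigma[1, 1]

x1, x2 = 0.9, -1.4
joint = norm_pdf([x1, x2], [mu1, mu2], Sigma)

# Conditional from the Schur complement of Sigma_22 in Sigma.
S = S11 - S21 * S21 / S22
cond_mean = mu1 + S21 / S22 * (x2 - mu2)
conditional = norm_pdf(x1, cond_mean, S)
marginal2 = norm_pdf(x2, mu2, S22)
```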
Exercise 5.6
When the covariance is diagonal (i.e., the individual variables are
independent), the off-diagonal blocks (e.g., \(\boldsymbol{\Sigma}_{21}\)) in
Exercise 5.5 are zero. Thus
\[\begin{split}Pr(\mathbf{x}_1 \mid \mathbf{x}_2)
&= \NormDist_{\mathbf{x}_1}\left[
\boldsymbol{\mu}_1 +
\boldsymbol{\Sigma}_{21}^\top \boldsymbol{\Sigma}_{22}^{-1}
(\mathbf{x}_2 - \boldsymbol{\mu}_2),
\boldsymbol{\Sigma}_{11} -
\boldsymbol{\Sigma}_{21}^\top
\boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\Sigma}_{21}
\right]\\
&= \NormDist_{\mathbf{x}_1}\left[
\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}
\right]\\
&= Pr(\mathbf{x}_1).\end{split}\]
Exercise 5.7
Let \(x, a, b \in \mathbb{R}^D\) and
\(A, B \in \mathbb{R}^{D \times D}\).
\[\begin{split}& \NormDist_{x}[a, A] \NormDist_{x}[b, B]\\
&= \frac{1}{\left\vert 2 \pi A \right\vert^{1 / 2}}
\exp\left[ (x - a)^\top A^{-1} (x - a) \right]^{-0.5}
\frac{1}{\left\vert 2 \pi B \right\vert^{1 / 2}}
\exp\left[ (x - b)^\top B^{-1} (x - b) \right]^{-0.5}\\
&= \frac{1}{(2 \pi)^{D} \left\vert AB \right\vert^{1 / 2}}
\exp\left[
x^\top A^{-1} x - 2 x^\top A^{-1} a + a^\top A^{-1} a +
x^\top B^{-1} x - 2 x^\top B^{-1} b + b^\top B^{-1} b
\right]^{-0.5}\\
&= \frac{1}{(2 \pi)^{D} \left\vert AB \right\vert^{1 / 2}}
\exp\left[
x^\top (A^{-1} + B^{-1}) x - 2 x^\top (A^{-1} a + B^{-1} b) +
a^\top A^{-1} a + b^\top B^{-1} b
\right]^{-0.5}
& \quad & \text{rearrange terms to expose pattern}\\
&= \frac{1}{(2 \pi)^{D} \left\vert AB \right\vert^{1 / 2}}
\exp\left[
(x - \boldsymbol{\mu})^\top (A^{-1} + B^{-1}) (x - \boldsymbol{\mu}) -
\boldsymbol{\mu}^\top (A^{-1} + B^{-1}) \boldsymbol{\mu} +
a^\top A^{-1} a + b^\top B^{-1} b
\right]^{-0.5}
& \quad & \text{completing the square}\\
&= \frac{
\left\vert \boldsymbol{\Sigma} \right\vert^{1 / 2}
}{
(2 \pi)^{D / 2} \left\vert AB \right\vert^{1 / 2}
}
\exp\left[
a^\top A^{-1} a + b^\top B^{-1} b -
\boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}
\right]^{-0.5}
\NormDist_{x}[\boldsymbol{\mu}, \boldsymbol{\Sigma}]\\
&\propto \NormDist_{x}[\boldsymbol{\mu}, \boldsymbol{\Sigma}]\end{split}\]
where \(\boldsymbol{\mu} = \boldsymbol{\Sigma} (A^{-1} a + B^{-1} b)\) and
\(\boldsymbol{\Sigma} = (A^{-1} + B^{-1})^{-1}\).
Exercise 5.8
The results of Exercise 5.7 illustrate that
the new mean and variance are respectively
\[\mu =
\frac{
\sigma_1^{-2} \mu_1 + \sigma_2^{-2} \mu_2
}{
\sigma_1^{-2} + \sigma_2^{-2}
} =
a \mu_1 + b \mu_2
\quad \text{and} \quad
\sigma^2 = \frac{1}{\sigma_1^{-2} + \sigma_2^{-2}}\]
where \(a, b > 0\) and \(a + b = 1\).
Assuming \(\sigma_1^2, \sigma_2^2 > 0\), the following (applicable to both)
shows that the new variance is smaller than either of them:
\[\begin{split}\sigma_1^{-2} + \sigma_2^{-2} &> \sigma_1^{-2}\\
\sigma_1^2 &> (\sigma_1^{-2} + \sigma_2^{-2})^{-1}\\
&= \sigma^2.\end{split}\]
The variance proof is clever in that it starts from the desired conclusion
(\(\sigma^2 < \sigma_1^2\)) and works backwards to an obviously true
proposition (\(\sigma_1^{-2} + \sigma_2^{-2} > \sigma_1^{-2}\)) under the
stated assumptions; the proof is then presented in reverse order.
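A quick numerical illustration of both claims (the two means and variances are arbitrary choices):

```python
import numpy as np

# Two univariate Gaussians to multiply (illustrative values).
mu1, s1 = 2.0, 1.5   # s1, s2 are the variances sigma_1^2, sigma_2^2
mu2, s2 = -1.0, 0.5

# Precision-weighted mean and combined variance.
mu = (mu1 / s1 + mu2 / s2) / (1 / s1 + 1 / s2)
var = 1 / (1 / s1 + 1 / s2)

# The mixing weight a in mu = a mu1 + (1 - a) mu2, with 0 < a < 1,
# so the new mean lies between the two input means.
a = (1 / s1) / (1 / s1 + 1 / s2)
```

The assertions below confirm that the new mean is a convex combination of \(\mu_1\) and \(\mu_2\) and that the new variance is smaller than either input variance.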
Exercise 5.9
Exercise 5.7 states that
\[\kappa =
\frac{
\left\vert \boldsymbol{\Sigma} \right\vert^{1 / 2}
}{
(2 \pi)^{D / 2} \left\vert AB \right\vert^{1 / 2}
}
\exp\left[
\left(
a^\top A^{-1} a + b^\top B^{-1} b -
\boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}
\right)
\right]^{-0.5}.\]
Notice that
\[\begin{split}\frac{
\left\vert \boldsymbol{\Sigma} \right\vert^{1 / 2}
}{
\left\vert AB \right\vert^{1 / 2}
}
&= \left(
\left\vert AB \right\vert
\left\vert A^{-1} + B^{-1} \right\vert
\right)^{-1 / 2}\\
&= \left(
\left\vert A \right\vert
\left\vert A^{-1} + B^{-1} \right\vert
\left\vert B \right\vert
\right)^{-1 / 2}\\
&= \left(
\left\vert A (A^{-1} + B^{-1}) B \right\vert
\right)^{-1 / 2}\\
&= \left(
\left\vert A + B \right\vert
\right)^{-1 / 2}\end{split}\]
and
\[\begin{split}& \exp\left[
a^\top A^{-1} a + b^\top B^{-1} b -
\boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}
\right]^{-0.5}\\
&= \exp\left[
a^\top A^{-1} a + b^\top B^{-1} b -
a^\top A^{-1} \boldsymbol{\Sigma} A^{-1} a -
b^\top B^{-1} \boldsymbol{\Sigma} B^{-1} b -
2 a^\top A^{-1} \boldsymbol{\Sigma} B^{-1} b
\right]^{-0.5}\\
&= \exp\left[
a^\top A^{-1} a + b^\top B^{-1} b -
a^\top (A \boldsymbol{\Sigma}^{-1} A)^{-1} a -
b^\top (B \boldsymbol{\Sigma}^{-1} B)^{-1} b -
2 a^\top (B \boldsymbol{\Sigma}^{-1} A)^{-1} b
\right]^{-0.5}\\
&= \exp\left[
a^\top A^{-1} a + b^\top B^{-1} b -
a^\top (A + B)^{-1} B A^{-1} a -
b^\top (A + B)^{-1} A B^{-1} b -
2 a^\top (A + B)^{-1} b
\right]^{-0.5}\\
&= \exp\left[
a^\top \left( A^{-1} - (A + B)^{-1} B A^{-1} \right) a +
b^\top \left( B^{-1} - (A + B)^{-1} A B^{-1} \right) b -
2 a^\top (A + B)^{-1} b
\right]^{-0.5}\\
&= \exp\left[
a^\top (A + B)^{-1} a - 2 a^\top (A + B)^{-1} b + b^\top (A + B)^{-1} b
\right]^{-0.5}
& \quad & \text{(a)}\\
&= \exp\left[
    (a - b)^\top (A + B)^{-1} (a - b)
\right]^{-0.5}.\end{split}\]
Thus \(\kappa = \NormDist_{a}[b, A + B]\).
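The identity \(\NormDist_{x}[a, A] \NormDist_{x}[b, B] = \NormDist_{a}[b, A + B] \NormDist_{x}[\boldsymbol{\mu}, \boldsymbol{\Sigma}]\) holds pointwise, so it can be checked at an arbitrary test point. A sketch with illustrative \(D = 2\) parameters:

```python
import numpy as np

def norm_pdf(x, mu, Sigma):
    """Density of Norm_x[mu, Sigma]."""
    d = np.atleast_1d(x) - np.atleast_1d(mu)
    Sigma = np.atleast_2d(Sigma)
    D = len(d)
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) \
        / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))

# Illustrative means and SPD covariances in D = 2.
a = np.array([1.0, 0.0]); A = np.array([[2.0, 0.3], [0.3, 1.0]])
b = np.array([-0.5, 2.0]); B = np.array([[1.0, -0.2], [-0.2, 3.0]])

# Combined parameters from Exercise 5.7.
Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
Sigma = np.linalg.inv(Ai + Bi)
mu = Sigma @ (Ai @ a + Bi @ b)

# Compare both sides of the identity at a test point.
x = np.array([0.7, -1.1])
lhs = norm_pdf(x, a, A) * norm_pdf(x, b, B)
kappa = norm_pdf(a, b, A + B)
rhs = kappa * norm_pdf(x, mu, Sigma)
```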
(a)
One approach to this solution is to assume the desired identities
\[\begin{split}A^{-1} - (A + B)^{-1} B A^{-1} &= (A + B)^{-1}\\
B^{-1} - (A + B)^{-1} A B^{-1} &= (A + B)^{-1}\end{split}\]
hold and try to solve for \((A + B)^{-1}\). This leads to the following
identities:
\[\begin{split}(A + B) A^{-1} &= I + B A^{-1}\\
A^{-1} &= (A + B)^{-1} (I + B A^{-1})\\\\
(A + B) B^{-1} &= A B^{-1} + I\\
B^{-1} &= (A + B)^{-1} (A B^{-1} + I).\end{split}\]
The purpose of the assumption is to derive some kind of obviously true
proposition and then work backwards. The solution in the book made use of the
clever observation that
\[a^\top A^{-1} a = a^\top (A + B)^{-1} (A + B) A^{-1} a =
a^\top (A + B)^{-1} a + a^\top (A + B)^{-1} B A^{-1} a\]
and
\[b^\top B^{-1} b = b^\top (A + B)^{-1} (A + B) B^{-1} b =
b^\top (A + B)^{-1} b + b^\top (A + B)^{-1} A B^{-1} b.\]
Exercise 5.10
Suppose \(x \in \mathbb{R}^n\), \(A \in \mathbb{R}^{n \times m}\),
\(y \in \mathbb{R}^m\), \(b \in \mathbb{R}^n\), and
\(\Sigma \in \mathbb{R}^{n \times n}\).
\[\begin{split}& \NormDist_x[Ay + b, \Sigma]\\
&= \frac{1}{\left\vert 2 \pi \Sigma \right\vert^{1 / 2}}
\exp\left[
(x - Ay - b)^\top \Sigma^{-1} (x - Ay - b)
\right]^{-0.5}\\
&= \frac{1}{(2 \pi)^{n / 2} \left\vert \Sigma \right\vert^{1 / 2}}
\exp\left[
    x^\top \Sigma^{-1} x - 2 x^\top \Sigma^{-1} A y -
2 x^\top \Sigma^{-1} b + y^\top A^\top \Sigma^{-1} Ay +
2 y^\top A^\top \Sigma^{-1} b + b^\top \Sigma^{-1} b
\right]^{-0.5}\\
&= \frac{1}{(2 \pi)^{n / 2} \left\vert \Sigma \right\vert^{1 / 2}}
\exp\left[
    x^\top \Sigma^{-1} x - 2 x^\top \Sigma^{-1} b +
b^\top \Sigma^{-1} b
\right]^{-0.5}
\exp\left[
y^\top A^\top \Sigma^{-1} A y -
2 y^\top A^\top \Sigma^{-1} (x - b)
\right]^{-0.5}\\
&= \kappa_1
\exp\left[
y^\top A^\top \Sigma^{-1} A y -
2 y^\top A^\top \Sigma^{-1} (x - b)
\right]^{-0.5}\\
&= \kappa_1
\exp\left[
\left(
y - \Sigma' A^\top \Sigma^{-1} (x - b)
\right)^\top
\Sigma'^{-1}
\left(
y - \Sigma' A^\top \Sigma^{-1} (x - b)
\right) -
(x - b)^\top \Sigma^{-1} A \Sigma' A^\top \Sigma^{-1} (x - b)
\right]^{-0.5}\\
&= \kappa_1
\exp\left[ (A' x + b')^\top \Sigma'^{-1} (A' x + b') \right]^{0.5}
\exp\left[
\left(
y - (A' x + b')
\right)^\top
\Sigma'^{-1}
\left(
y - (A' x + b')
\right)
\right]^{-0.5}\\
&= \kappa_2 \left\vert 2 \pi \Sigma' \right\vert^{1 / 2}
\NormDist_y[A' x + b', \Sigma']\\
&= \kappa \NormDist_y[A' x + b', \Sigma']\end{split}\]
where
\[\begin{split}\Sigma' &= (A^\top \Sigma^{-1} A)^{-1} \in \mathbb{R}^{m \times m}\\\\
A' &= \Sigma' A^\top \Sigma^{-1} \in \mathbb{R}^{m \times n}\\\\
b' &= -\Sigma' A^\top \Sigma^{-1} b \in \mathbb{R}^{m}\end{split}\]
\[\begin{split}\kappa
&= (2 \pi)^{(m - n) / 2}
\frac{
\left\vert \Sigma' \right\vert^{1 / 2}
}{
\left\vert \Sigma \right\vert^{1 / 2}
}
\exp\left[
x^\top \Sigma^{-1} x - 2 x^\top \Sigma^{-1} b + b^\top \Sigma^{-1} b
\right]^{-0.5}
\exp\left[
(A' x + b')^\top \Sigma'^{-1} (A' x + b')
\right]^{0.5}\\
&= (2 \pi)^{(m - n) / 2}
\frac{
\left\vert \Sigma' \right\vert^{1 / 2}
}{
\left\vert \Sigma \right\vert^{1 / 2}
}
\exp\left[
(x - b)^\top
\left(
\Sigma^{-1} - \Sigma^{-1} A \Sigma' A^\top \Sigma^{-1}
\right) (x - b)
\right]^{-0.5}.\end{split}\]
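Since this is again a pointwise identity, it can be verified numerically. The sketch below (with \(n = 3\), \(m = 2\), a random full-column-rank \(A\), and a random SPD \(\Sigma\), all arbitrary choices) evaluates \(\NormDist_x[Ay + b, \Sigma]\) and \(\kappa \NormDist_y[A'x + b', \Sigma']\) at a random point:

```python
import numpy as np

def norm_pdf(x, mu, Sigma):
    """Density of Norm_x[mu, Sigma]."""
    d = np.atleast_1d(x) - np.atleast_1d(mu)
    Sigma = np.atleast_2d(Sigma)
    D = len(d)
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) \
        / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))

rng = np.random.default_rng(3)

# n = 3, m = 2: x in R^3, y in R^2, A full column rank (a.s.).
n, m = 3, 2
A = rng.normal(size=(n, m))
b = rng.normal(size=n)
M = rng.normal(size=(n, n))
Sigma = M @ M.T + n * np.eye(n)
Si = np.linalg.inv(Sigma)

# Reparametrized quantities from the derivation.
Sp = np.linalg.inv(A.T @ Si @ A)   # Sigma'
Ap = Sp @ A.T @ Si                 # A'
bp = -Sp @ A.T @ Si @ b            # b'

# Compare both sides at a random (x, y).
x = rng.normal(size=n)
y = rng.normal(size=m)
lhs = norm_pdf(x, A @ y + b, Sigma)
kappa = (2 * np.pi) ** ((m - n) / 2) \
    * np.sqrt(np.linalg.det(Sp) / np.linalg.det(Sigma)) \
    * np.exp(-0.5 * (x - b) @ (Si - Si @ A @ Sp @ A.T @ Si) @ (x - b))
rhs = kappa * norm_pdf(y, Ap @ x + bp, Sp)
```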
References
- SchonL11
Thomas B. Schön and Fredrik Lindsten. Manipulating the multivariate Gaussian density. Division of Automatic Control, Linköping University, Sweden, Tech. Rep., 2011.
- Wan
Ruye Wang. Marginal and conditional distributions of multivariate normal distribution. http://fourier.eng.hmc.edu/e161/lectures/gaussianprocess/node7.html. Accessed on 2017-06-11.