Temporal Models
Exercise 19.1
Suppose
\[\DeclareMathOperator{\NormDist}{Norm}
Pr(\mathbf{w}_{t - 1} \mid \mathbf{x}_{1 \ldots t - 1}) =
\NormDist_{\mathbf{w}_{t - 1}}\left[
\boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}_{t - 1}
\right].\]
\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1})
&= \int
Pr(\mathbf{w}_t, \mathbf{w}_{t - 1} \mid \mathbf{x}_{1 \ldots t - 1})
d\mathbf{w}_{t - 1}
& \quad & \text{(2.1)}\\
&= \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t - 1})
Pr(\mathbf{w}_{t - 1} \mid \mathbf{x}_{1 \ldots t - 1})
d\mathbf{w}_{t - 1}
& \quad & \text{Markov assumption}\\
&= \int \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_{t - 1},
\boldsymbol{\Sigma}_p
\right]
\NormDist_{\mathbf{w}_{t - 1}}\left[
\boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}_{t - 1}
\right]
d\mathbf{w}_{t - 1}
& \quad & \text{(19.6)}\\
&= \kappa_1 \kappa_2
\int \NormDist_{\mathbf{w}_{t - 1}}\left[
\boldsymbol{\mu}'', \boldsymbol{\Sigma}''
\right]
d\mathbf{w}_{t - 1}
& \quad & \text{(a), (b)}\\
&= \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1},
\boldsymbol{\Sigma}_p +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top
\right]
& \quad & \text{(c)}\\
&= \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_+, \boldsymbol{\Sigma}_+
\right]\end{split}\]
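As a numerical sanity check of the final result (a scalar sketch with arbitrary constants, not part of the original solution), the integral above can be evaluated by quadrature and compared against the closed form:

```python
# Scalar check of the prediction step: integrate
#   Norm_wt[mu_p + psi*w, var_p] * Norm_w[mu_tm1, var_tm1] dw
# and compare with Norm_wt[mu_p + psi*mu_tm1, var_p + psi^2 * var_tm1].
import numpy as np

def norm_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

mu_p, var_p, psi = 0.3, 0.5, 1.2        # transition model (arbitrary values)
mu_tm1, var_tm1 = -0.7, 0.8             # previous posterior
w_t = 0.9                               # point at which to evaluate the predictive

w = np.linspace(-15, 15, 150001)        # quadrature grid over w_{t-1}
integrand = norm_pdf(w_t, mu_p + psi * w, var_p) * norm_pdf(w, mu_tm1, var_tm1)
numeric = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(w)))
closed = norm_pdf(w_t, mu_p + psi * mu_tm1, var_p + psi ** 2 * var_tm1)
assert abs(numeric - closed) < 1e-6
```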
(a)
By Exercise 5.10,
\[\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_{t - 1},
\boldsymbol{\Sigma}_p
\right] =
\kappa_1 \NormDist_{\mathbf{w}_{t - 1}}\left[
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t,
\boldsymbol{\Sigma}'
\right]\]
where
\[\begin{split}\boldsymbol{\Sigma}'
&= (\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\boldsymbol{\Psi})^{-1}
\\\\
\boldsymbol{\Psi}'
&= \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\\\\
\boldsymbol{\mu}'
&= -\boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\boldsymbol{\mu}_p
\\\\
\kappa_1
&= \frac{
\left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2}
}{
\left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2}
}
\exp\left[
-0.5
(\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\left(
\boldsymbol{\Sigma}_p^{-1} -
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}'
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right)
(\mathbf{w}_t - \boldsymbol{\mu}_p)
\right].\end{split}\]
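The identity in (a) can be spot-checked numerically (an illustrative sketch with seeded random values; \(\boldsymbol{\Psi}\) is assumed square and invertible so that \(\boldsymbol{\Sigma}'\) exists):

```python
# Pointwise check of Norm_wt[mu_p + Psi w, Sigma_p] = kappa_1 * Norm_w[mu' + Psi' wt, Sigma'].
import numpy as np

rng = np.random.default_rng(0)
D = 3
A = rng.standard_normal((D, D))
Sigma_p = A @ A.T + D * np.eye(D)          # random SPD covariance
Psi = rng.standard_normal((D, D))          # square transition matrix
mu_p, w_t, w_tm1 = (rng.standard_normal(D) for _ in range(3))

def gauss(x, mu, S):
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(
        (2 * np.pi) ** len(x) * np.linalg.det(S))

Sp_inv = np.linalg.inv(Sigma_p)
Sigma_pr = np.linalg.inv(Psi.T @ Sp_inv @ Psi)    # Sigma'
Psi_pr = Sigma_pr @ Psi.T @ Sp_inv                # Psi'
mu_pr = -Sigma_pr @ Psi.T @ Sp_inv @ mu_p         # mu'
d = w_t - mu_p
kappa_1 = np.sqrt(np.linalg.det(Sigma_pr) / np.linalg.det(Sigma_p)) * np.exp(
    -0.5 * d @ (Sp_inv - Sp_inv @ Psi @ Sigma_pr @ Psi.T @ Sp_inv) @ d)

lhs = gauss(w_t, mu_p + Psi @ w_tm1, Sigma_p)
rhs = kappa_1 * gauss(w_tm1, mu_pr + Psi_pr @ w_t, Sigma_pr)
assert np.isclose(lhs, rhs)
```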
(b)
By Exercise 5.7 and
Exercise 5.9,
\[\kappa_1 \NormDist_{\mathbf{w}_{t - 1}}\left[
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t,
\boldsymbol{\Sigma}'
\right]
\NormDist_{\mathbf{w}_{t - 1}}\left[
\boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}_{t - 1}
\right] =
\kappa_1 \kappa_2 \NormDist_{\mathbf{w}_{t - 1}}\left[
\boldsymbol{\mu}'', \boldsymbol{\Sigma}''
\right]\]
where
\[\begin{split}\boldsymbol{\Sigma}''
&= \left(
{\boldsymbol{\Sigma}'}^{-1} + \boldsymbol{\Sigma}_{t - 1}^{-1}
\right)^{-1}
\\\\
\boldsymbol{\mu}''
&= \boldsymbol{\Sigma}''
\left(
{\boldsymbol{\Sigma}'}^{-1}
\left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t \right) +
\boldsymbol{\Sigma}_{t - 1}^{-1} \boldsymbol{\mu}_{t - 1}
\right)
\\\\
\kappa_2
&= \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t}\left[
\boldsymbol{\mu}_{t - 1},
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1}
\right].\end{split}\]
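The product identity in (b) can likewise be verified pointwise (a sketch with seeded random values; the product of two Gaussian densities in the same variable is an unnormalized Gaussian with constant \(\kappa_2\)):

```python
# Check Norm_w[a, A] * Norm_w[b, B] = kappa_2 * Norm_w[mu'', Sigma'']
# with kappa_2 = Norm_a[b, A + B], at an arbitrary point w.
import numpy as np

rng = np.random.default_rng(1)
D = 2

def spd():
    M = rng.standard_normal((D, D))
    return M @ M.T + D * np.eye(D)

def gauss(x, mu, S):
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(
        (2 * np.pi) ** D * np.linalg.det(S))

a, b, w = (rng.standard_normal(D) for _ in range(3))
A, B = spd(), spd()

Sigma_pp = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))          # Sigma''
mu_pp = Sigma_pp @ (np.linalg.solve(A, a) + np.linalg.solve(B, b))     # mu''
kappa_2 = gauss(a, b, A + B)
assert np.isclose(gauss(w, a, A) * gauss(w, b, B),
                  kappa_2 * gauss(w, mu_pp, Sigma_pp))
```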
(c)
\[\begin{split}& \kappa_1 \kappa_2\\
&= \frac{
\left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2}
}{
\left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2}
}
\exp\left[
(\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\left(
\boldsymbol{\Sigma}_p^{-1} -
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}'
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right)
(\mathbf{w}_t - \boldsymbol{\mu}_p)
\right]^{-0.5}
\NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t}\left[
\boldsymbol{\mu}_{t - 1},
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1}
\right]\\
&= \frac{
\left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2}
\exp\left[
(\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\left(
\boldsymbol{\Sigma}_p^{-1} -
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}'
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right)
(\mathbf{w}_t - \boldsymbol{\mu}_p) +
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)^\top
\left(
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1}
\right)^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)
\right]^{-0.5}
}{
(2 \pi)^{D / 2}
\left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2}
\left\vert
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1}
\right\vert^{1 / 2}
}\\
&= \frac{1}{
(2 \pi)^{D / 2}
\left\vert
\boldsymbol{\Sigma}_p +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top
\right\vert^{1 / 2}
}
\exp\left[
\left(
\mathbf{w}_t -
\boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)^\top
\left(
\boldsymbol{\Sigma}_p +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top
\right)^{-1}
\left(
\mathbf{w}_t -
\boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)
\right]^{-0.5}
& \quad & \text{(c.1), (c.2)}\end{split}\]
(c.1)
\[\begin{split}\frac{
\left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2}
}{
\left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2}
\left\vert
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1}
\right\vert^{1 / 2}
}
&= \left(
\left\vert
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right\vert
\left\vert \boldsymbol{\Sigma}_p \right\vert
\left\vert
\left(
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right)^{-1} + \boldsymbol{\Sigma}_{t - 1}
\right\vert
\right)^{-1 / 2}
& \quad & \text{(C.11)}\\
&= \left(
\left\vert \boldsymbol{\Sigma}_p \right\vert
\left\vert
\mathbf{I} +
\left(
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right) \boldsymbol{\Sigma}_{t - 1}
\right\vert
\right)^{-1 / 2}
& \quad & \text{(C.10)}\\
&= \left(
\left\vert \boldsymbol{\Sigma}_p \right\vert
\left\vert
\mathbf{I} +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1}
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right\vert
\right)^{-1 / 2}
& \quad & \text{Sylvester's Determinant Theorem}\\
&= \left\vert
\boldsymbol{\Sigma}_p +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top
\right\vert^{-1 / 2}
& \quad & \text{(C.10)}\end{split}\]
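The determinant identity derived in (c.1) admits a quick numerical check (a sketch with seeded random SPD matrices and a square \(\boldsymbol{\Psi}\)):

```python
# Check |Sigma'|^{1/2} / (|Sigma_p|^{1/2} |Sigma' + Sigma_{t-1}|^{1/2})
#     = |Sigma_p + Psi Sigma_{t-1} Psi^T|^{-1/2}.
import numpy as np

rng = np.random.default_rng(2)
D = 3

def spd():
    M = rng.standard_normal((D, D))
    return M @ M.T + D * np.eye(D)

Sigma_p, Sigma_tm1 = spd(), spd()
Psi = rng.standard_normal((D, D))                              # square, a.s. invertible
Sigma_pr = np.linalg.inv(Psi.T @ np.linalg.inv(Sigma_p) @ Psi) # Sigma'

lhs = np.sqrt(np.linalg.det(Sigma_pr)) / np.sqrt(
    np.linalg.det(Sigma_p) * np.linalg.det(Sigma_pr + Sigma_tm1))
rhs = 1 / np.sqrt(np.linalg.det(Sigma_p + Psi @ Sigma_tm1 @ Psi.T))
assert np.isclose(lhs, rhs)
```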
(c.2)
\[\begin{split}& \exp\left[
(\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\left(
\boldsymbol{\Sigma}_p^{-1} -
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}'
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right)
(\mathbf{w}_t - \boldsymbol{\mu}_p) +
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)^\top
\left(
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1}
\right)^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)
\right]^{-0.5}\\
&= \exp\left[
\left(
\mathbf{w}_t -
\boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)^\top
\left(
\boldsymbol{\Sigma}_p^{-1} -
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\left(
\boldsymbol{\Sigma}_{t - 1}^{-1} +
\boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1}
\boldsymbol{\Psi}
\right)^{-1}
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right)
\left(
\mathbf{w}_t -
\boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)
\right]^{-0.5}
& \quad & \text{(c.3)}\\
&= \exp\left[
\left(
\mathbf{w}_t -
\boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)^\top
\left(
\boldsymbol{\Sigma}_p +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top
\right)^{-1}
\left(
\mathbf{w}_t -
\boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)
\right]^{-0.5}
& \quad & \text{(C.61)}\end{split}\]
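The last step of (c.2) rests on the matrix inversion (Woodbury) identity (C.61), which can be checked numerically (a sketch with seeded random values):

```python
# Check (Sigma_p + Psi Sigma_{t-1} Psi^T)^{-1}
#     = Sp_inv - Sp_inv Psi (Sigma_{t-1}^{-1} + Psi^T Sp_inv Psi)^{-1} Psi^T Sp_inv.
import numpy as np

rng = np.random.default_rng(3)
D = 3

def spd():
    M = rng.standard_normal((D, D))
    return M @ M.T + D * np.eye(D)

Sigma_p, Sigma_tm1 = spd(), spd()
Psi = rng.standard_normal((D, D))
Sp_inv = np.linalg.inv(Sigma_p)

direct = np.linalg.inv(Sigma_p + Psi @ Sigma_tm1 @ Psi.T)
woodbury = Sp_inv - Sp_inv @ Psi @ np.linalg.inv(
    np.linalg.inv(Sigma_tm1) + Psi.T @ Sp_inv @ Psi) @ Psi.T @ Sp_inv
assert np.allclose(direct, woodbury)
```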
(c.3)
Notice that the summands can be decomposed into
\[\begin{split}& (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\left(
\boldsymbol{\Sigma}_p^{-1} -
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}'
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right)
(\mathbf{w}_t - \boldsymbol{\mu}_p)\\
&= (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\boldsymbol{\Sigma}_p^{-1}
(\mathbf{w}_t - \boldsymbol{\mu}_p) -
(\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}'
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
(\mathbf{w}_t - \boldsymbol{\mu}_p)\\
&= (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\boldsymbol{\Sigma}_p^{-1}
(\mathbf{w}_t - \boldsymbol{\mu}_p) -
(\boldsymbol{\Psi}' \mathbf{w}_t + \boldsymbol{\mu}')^\top
{\boldsymbol{\Sigma}'}^{-1}
(\boldsymbol{\Psi}' \mathbf{w}_t + \boldsymbol{\mu}')\end{split}\]
and
\[\begin{split}& \left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)^\top
\left(
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1}
\right)^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)\\
&= \left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)^\top
\left(
{\boldsymbol{\Sigma}'}^{-1} -
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\Sigma}_{t - 1}^{-1} +
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right)^{-1}
{\boldsymbol{\Sigma}'}^{-1}
\right)
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)
& \quad & \text{(c.4)}\\
&= \left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)^\top
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right) -
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)^\top
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\Sigma}_{t - 1}^{-1} +
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right)^{-1}
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)\\
&= \left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)^\top
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right) -
\left(
\mathbf{w}_t - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)^\top
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\left(
\boldsymbol{\Sigma}_{t - 1}^{-1} +
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right)^{-1}
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\left(
\mathbf{w}_t - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right).\end{split}\]
Since
\[\begin{split}& \left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)^\top
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)\\
&= \left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t
\right)^\top
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t
\right) -
\boldsymbol{\mu}_{t - 1}^\top
{\boldsymbol{\Sigma}'}^{-1}
\left(
2 \boldsymbol{\mu}' +
2 \boldsymbol{\Psi}' \mathbf{w}_t -
\boldsymbol{\mu}_{t - 1}
\right)\\
&= \left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t
\right)^\top
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t
\right) -
\boldsymbol{\mu}_{t - 1}^\top \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1}
\left(
2 \mathbf{w}_t -
2 \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)\end{split}\]
and
\[\begin{split}& (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top
\boldsymbol{\Sigma}_p^{-1}
(\mathbf{w}_t - \boldsymbol{\mu}_p) -
\boldsymbol{\mu}_{t - 1}^\top \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1}
\left(
2 \mathbf{w}_t -
2 \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)\\
&= \left(
\mathbf{w}_t - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)^\top
\boldsymbol{\Sigma}_p^{-1}
\left(
\mathbf{w}_t - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right),\end{split}\]
the sum of the original summands is
\[\left(
\mathbf{w}_t -
\boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right)^\top
\left(
\boldsymbol{\Sigma}_p^{-1} -
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\left(
\boldsymbol{\Sigma}_{t - 1}^{-1} +
\boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1}
\boldsymbol{\Psi}
\right)^{-1}
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right)
\left(
\mathbf{w}_t -
\boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}
\right).\]
(c.4)
See Exercise 5.9 for more details.
\[\begin{split}\left( \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right)^{-1}
&= {\boldsymbol{\Sigma}'}^{-1} -
\left( \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right)^{-1}
\boldsymbol{\Sigma}_{t - 1} {\boldsymbol{\Sigma}'}^{-1}\\
&= {\boldsymbol{\Sigma}'}^{-1} -
{\boldsymbol{\Sigma}'}^{-1}
\left(
\mathbf{I} + \boldsymbol{\Sigma}_{t - 1} {\boldsymbol{\Sigma}'}^{-1}
\right)^{-1}
\boldsymbol{\Sigma}_{t - 1} {\boldsymbol{\Sigma}'}^{-1}\\
&= {\boldsymbol{\Sigma}'}^{-1} -
{\boldsymbol{\Sigma}'}^{-1}
\left[
\boldsymbol{\Sigma}_{t - 1}
\left(
\boldsymbol{\Sigma}_{t - 1}^{-1} +
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right)
\right]^{-1}
\boldsymbol{\Sigma}_{t - 1} {\boldsymbol{\Sigma}'}^{-1}\\
&= {\boldsymbol{\Sigma}'}^{-1} -
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\Sigma}_{t - 1}^{-1} +
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right)^{-1}
{\boldsymbol{\Sigma}'}^{-1}\end{split}\]
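The special case of the inversion identity used in (c.4) also checks out numerically (a sketch with seeded random values; recall \({\boldsymbol{\Sigma}'}^{-1} = \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}\)):

```python
# Check (Sigma' + Sigma_{t-1})^{-1}
#     = Spr_inv - Spr_inv (Sigma_{t-1}^{-1} + Spr_inv)^{-1} Spr_inv.
import numpy as np

rng = np.random.default_rng(4)
D = 3

def spd():
    M = rng.standard_normal((D, D))
    return M @ M.T + D * np.eye(D)

Sigma_p, Sigma_tm1 = spd(), spd()
Psi = rng.standard_normal((D, D))
Spr_inv = Psi.T @ np.linalg.inv(Sigma_p) @ Psi      # Sigma'^{-1}
Sigma_pr = np.linalg.inv(Spr_inv)                   # Sigma'

lhs = np.linalg.inv(Sigma_pr + Sigma_tm1)
rhs = Spr_inv - Spr_inv @ np.linalg.inv(
    np.linalg.inv(Sigma_tm1) + Spr_inv) @ Spr_inv
assert np.allclose(lhs, rhs)
```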
Exercise 19.2
\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t})
&= \frac{
Pr(\mathbf{w}_t, \mathbf{x}_{1 \ldots t})
}{
Pr(\mathbf{x}_{1 \ldots t})
}\\
&= \frac{
Pr(\mathbf{x}_t \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1})
Pr(\mathbf{x}_{1 \ldots t - 1})
}{
Pr(\mathbf{x}_t \mid \mathbf{x}_{1 \ldots t - 1})
Pr(\mathbf{x}_{1 \ldots t - 1})
}\\
&= \frac{
Pr(\mathbf{x}_t \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1})
}{
\int Pr(\mathbf{x}_t, \mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1})
d\mathbf{w}_t
}\\
&= \frac{
\NormDist_{\mathbf{x}_t}\left[
\boldsymbol{\mu}_m + \boldsymbol{\Phi} \mathbf{w}_t,
\boldsymbol{\Sigma}_m
\right]
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_+,
\boldsymbol{\Sigma}_+
\right]
}{
\int Pr(\mathbf{x}_t \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) d\mathbf{w}_t
}
& \quad & \text{(19.8), (19.9)}\\
&= \frac{
\kappa_1 \kappa_2 \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_t,
\boldsymbol{\Sigma}_t
\right]
}{
\int \kappa_1 \kappa_2
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t
\right] d\mathbf{w}_t
}
& \quad & \text{(a), (b)}\\
&= \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_t,
\boldsymbol{\Sigma}_t
\right]\end{split}\]
(a)
Suppose \(\mathbf{x}_\cdot \in \mathbb{R}^n\) and
\(\mathbf{w}_\cdot \in \mathbb{R}^m\). By
Exercise 5.10,
\[\NormDist_{\mathbf{x}_t}\left[
\boldsymbol{\mu}_m + \boldsymbol{\Phi} \mathbf{w}_t,
\boldsymbol{\Sigma}_m
\right] =
\kappa_1 \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t,
\boldsymbol{\Sigma}'
\right]\]
where
\[\begin{split}\boldsymbol{\Sigma}'
&= \left(
\boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \boldsymbol{\Phi}
\right)^{-1}
\\\\
\boldsymbol{\Phi}'
&= \boldsymbol{\Sigma}' \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1}
\\\\
\boldsymbol{\mu}'
&= -\boldsymbol{\Sigma}' \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1}
\boldsymbol{\mu}_m
\\\\
\kappa_1
&= (2 \pi)^{(m - n) / 2}
\frac{
\left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2}
}{
\left\vert \boldsymbol{\Sigma}_m \right\vert^{1 / 2}
}
\exp\left[
-0.5
(\mathbf{x}_t - \boldsymbol{\mu}_m)^\top
\left(
\boldsymbol{\Sigma}_m^{-1} -
\boldsymbol{\Sigma}_m^{-1} \boldsymbol{\Phi} \boldsymbol{\Sigma}'
\boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1}
\right)
(\mathbf{x}_t - \boldsymbol{\mu}_m)
\right].\end{split}\]
(b)
By Exercise 5.7 and
Exercise 5.9,
\[\kappa_1 \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t,
\boldsymbol{\Sigma}'
\right]
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_+, \boldsymbol{\Sigma}_+
\right] =
\kappa_1 \kappa_2 \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t
\right]\]
where
\[\begin{split}\boldsymbol{\Sigma}_t
&= \left(
{\boldsymbol{\Sigma}'}^{-1} + \boldsymbol{\Sigma}_+^{-1}
\right)^{-1}
= \left(
\boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \boldsymbol{\Phi} +
\boldsymbol{\Sigma}_+^{-1}
\right)^{-1}
\\\\
\boldsymbol{\mu}_t
&= \boldsymbol{\Sigma}_t
\left(
{\boldsymbol{\Sigma}'}^{-1} \left(
\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t
\right) +
\boldsymbol{\Sigma}_+^{-1} \boldsymbol{\mu}_+
\right)
= \boldsymbol{\Sigma}_t
\left(
\boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \left(
\mathbf{x}_t - \boldsymbol{\mu}_m
\right) +
\boldsymbol{\Sigma}_+^{-1} \boldsymbol{\mu}_+
\right)
\\\\
\kappa_2
&= \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t}\left[
\boldsymbol{\mu}_+,
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_+
\right].\end{split}\]
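The information-form update for \(\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t\) above can be cross-checked against the familiar Kalman-gain form, which is algebraically equivalent (a sketch with seeded random values; the gain form is a standard rearrangement, not from the text):

```python
# Information form:  Sigma_t = (Phi^T Sm^{-1} Phi + S+^{-1})^{-1},
#                    mu_t = Sigma_t (Phi^T Sm^{-1} (x - mu_m) + S+^{-1} mu+)
# Gain form:         K = S+ Phi^T (Phi S+ Phi^T + Sm)^{-1},
#                    mu_t = mu+ + K (x - mu_m - Phi mu+),  Sigma_t = (I - K Phi) S+.
import numpy as np

rng = np.random.default_rng(5)
m, n = 3, 2                                  # state dim m, measurement dim n

def spd(d):
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

Sigma_m, Sigma_plus = spd(n), spd(m)
Phi = rng.standard_normal((n, m))
mu_m, x_t = rng.standard_normal(n), rng.standard_normal(n)
mu_plus = rng.standard_normal(m)

Sm_inv = np.linalg.inv(Sigma_m)
Sigma_t = np.linalg.inv(Phi.T @ Sm_inv @ Phi + np.linalg.inv(Sigma_plus))
mu_t = Sigma_t @ (Phi.T @ Sm_inv @ (x_t - mu_m) + np.linalg.solve(Sigma_plus, mu_plus))

K = Sigma_plus @ Phi.T @ np.linalg.inv(Phi @ Sigma_plus @ Phi.T + Sigma_m)
mu_gain = mu_plus + K @ (x_t - mu_m - Phi @ mu_plus)
Sigma_gain = (np.eye(m) - K @ Phi) @ Sigma_plus
assert np.allclose(Sigma_t, Sigma_gain) and np.allclose(mu_t, mu_gain)
```

The gain form avoids inverting \(\boldsymbol{\Sigma}_+\) and only inverts an \(n \times n\) matrix, which is why it is preferred when the measurement dimension is smaller than the state dimension.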
Exercise 19.3
\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t})
&= \frac{
\NormDist_{\mathbf{x}_t}\left[
\boldsymbol{\mu}_m + \boldsymbol{\Phi} \mathbf{w}_t,
\boldsymbol{\Sigma}_m
\right]
\sum_{k = 1}^K \lambda_k
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{+k},
\boldsymbol{\Sigma}_{+k}
\right]
}{
\int Pr(\mathbf{x}_t \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1})
d\mathbf{w}_t
}
& \quad & \text{(19.8) and Exercise 19.2}\\
&= \frac{
\kappa \sum_{k = 1}^K \kappa_k \lambda_k \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{tk},
\boldsymbol{\Sigma}_{tk}
\right]
}{
\kappa \sum_{k = 1}^K \kappa_{k} \lambda_k
}
& \quad & \text{(a), (b)}\\
&= \sum_{k = 1}^K \lambda'_k \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{tk},
\boldsymbol{\Sigma}_{tk}
\right]
& \quad & \lambda'_k =
\frac{
\kappa_k \lambda_k
}{
\sum_{k' = 1}^K \kappa_{k'} \lambda_{k'}
}.\end{split}\]
See Exercise 19.2 for more details.
In the next time update step, the prediction becomes
\[\begin{split}Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots t})
&= \int Pr(\mathbf{w}_{t + 1}, \mathbf{w}_t \mid \mathbf{x}_{1 \ldots t})
d\mathbf{w}_t
& \quad & \text{(2.1)}\\
&= \int Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) d\mathbf{w}_t
& \quad & \text{Markov assumption}\\
&= \int \NormDist_{\mathbf{w}_{t + 1}}\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_t,
\boldsymbol{\Sigma}_p
\right]
\sum_{k = 1}^K \lambda'_k
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk}
\right] d\mathbf{w}_t
& \quad & \text{(19.6) and Exercise 19.1}\\
&= \sum_{k = 1}^K \lambda'_k \int
\NormDist_{\mathbf{w}_{t + 1}}\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_t,
\boldsymbol{\Sigma}_p
\right]
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk}
\right] d\mathbf{w}_t
& \quad & \text{sum rule in integration}\\
&= \sum_{k = 1}^K \lambda'_k
\NormDist_{\mathbf{w}_{t + 1}}\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \boldsymbol{\mu}_{tk},
\boldsymbol{\Sigma}_p +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_{tk} \boldsymbol{\Psi}^\top
\right]
& \quad & \text{(c) from Exercise 19.1}\\
&= \sum_{k = 1}^K \lambda'_k
\NormDist_{\mathbf{w}_{t + 1}}\left[
\boldsymbol{\mu}_{+k}, \boldsymbol{\Sigma}_{+k}
\right].\end{split}\]
See Exercise 19.1 for more details.
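The component-weight update \(\lambda'_k \propto \kappa_k \lambda_k\) can be sketched in a scalar example (assumed toy numbers; with \(\phi = 1\), \(\kappa_k\) reduces to the marginal likelihood of \(x_t\) under component \(k\)):

```python
# Gaussian-sum measurement update: each component is updated as in
# Exercise 19.2 and the weights are reweighted by kappa_k.
import numpy as np

def norm_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

mu_m, var_m, phi = 0.0, 0.4, 1.0             # measurement model
lam = np.array([0.3, 0.7])                   # prior mixture weights
mu_plus = np.array([-1.0, 2.0])              # component predictive means
var_plus = np.array([0.5, 1.5])              # component predictive variances
x_t = 0.8                                    # observed measurement

# kappa_k evaluated as Norm[x_t; mu_m + phi*mu_{+k}, phi^2 var_{+k} + var_m]
kappa = norm_pdf(x_t, mu_m + phi * mu_plus, phi ** 2 * var_plus + var_m)
lam_new = kappa * lam / np.sum(kappa * lam)  # lambda'_k

var_t = 1 / (phi ** 2 / var_m + 1 / var_plus)                  # updated variances
mu_t = var_t * (phi * (x_t - mu_m) / var_m + mu_plus / var_plus)
assert np.isclose(np.sum(lam_new), 1.0)
```

Here the second component sits closer to the measurement, so its weight grows from 0.7 toward 1 after the update.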
(a)
By (a) and (b) from Exercise 19.2,
\[\kappa \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t,
\boldsymbol{\Sigma}'
\right]
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{+k}, \boldsymbol{\Sigma}_{+k}
\right] =
\kappa \kappa_{k} \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk}
\right]\]
where
\[\begin{split}\boldsymbol{\Sigma}_{tk}
&= \left(
{\boldsymbol{\Sigma}'}^{-1} +
\boldsymbol{\Sigma}_{+k}^{-1}
\right)^{-1}
= \left(
\boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \boldsymbol{\Phi} +
\boldsymbol{\Sigma}_{+k}^{-1}
\right)^{-1}
\\\\
\boldsymbol{\mu}_{tk}
&= \boldsymbol{\Sigma}_{tk}
\left(
{\boldsymbol{\Sigma}'}^{-1} \left(
\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t
\right) +
\boldsymbol{\Sigma}_{+k}^{-1} \boldsymbol{\mu}_{+k}
\right)
= \boldsymbol{\Sigma}_{tk}
\left(
\boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \left(
\mathbf{x}_t - \boldsymbol{\mu}_m
\right) +
\boldsymbol{\Sigma}_{+k}^{-1} \boldsymbol{\mu}_{+k}
\right)
\\\\
\kappa_{k}
&= \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t}\left[
\boldsymbol{\mu}_{+k},
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{+k}
\right].\end{split}\]
(b)
\[\begin{split}\int \kappa \sum_{k = 1}^K \kappa_{k} \lambda_k
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk}
\right] d\mathbf{w}_t
&= \kappa \sum_{k = 1}^K \int \kappa_{k} \lambda_k
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk}
\right] d\mathbf{w}_t
& \quad & \text{sum rule in integration}\\
&= \kappa \sum_{k = 1}^K \kappa_{k} \lambda_k
& \quad & \text{densities integrate to 1}\end{split}\]
Exercise 19.4
Max-marginals inference is essentially (10.16):
\[\DeclareMathOperator*{\argmax}{arg\,max}
\hat{\mathbf{w}} =
\argmax_{\mathbf{w}_t} Pr(\mathbf{w}_t \mid \mathbf{w}_{t - 1}).\]
The temporal model could still be (19.5) where
\(\boldsymbol{\Psi} = \boldsymbol{\Psi}_1\) or
\(\boldsymbol{\Psi} = \boldsymbol{\Psi}_2\).
A simple strategy is to choose the state transition matrix that maximizes the
likelihood of the current time step [GH00].
Exercise 19.5
The joint posterior distribution can be factorized into an HMM (11.1), which
can be solved in \(\mathcal{O}(TK^2)\) using the Viterbi algorithm where
\(K\) is the number of possible states (see
Exercise 11.2 for more details).
In the Kalman filter, \(T\) grows as more measurements are taken, so
computing the marginal posteriors is preferred because they can be obtained in
closed form.
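The \(\mathcal{O}(TK^2)\) Viterbi decoding mentioned above can be sketched on a toy HMM (assumed example values, not from the text):

```python
# Minimal Viterbi decoder: each step does O(K^2) work over state pairs.
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """log_pi: (K,) initial, log_A: (K,K) transition, log_B: (K,M) emission."""
    T, K = len(obs), len(log_pi)
    score = log_pi + log_B[:, obs[0]]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_A              # cand[i, j]: into state j via i
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_B[:, obs[t]]
    path = [int(np.argmax(score))]                 # backtrack the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

log = np.log
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])             # 2 states, 2 symbols
states = viterbi(log(pi), log(A), log(B), [0, 0, 1, 1])
assert states == [0, 0, 1, 1]
```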
Exercise 19.6
The following are based on Section 11.4.4 and [Sch].
The forward pass starts with
\[\mathbf{m}_{\mathbf{x}_1 \rightarrow g_1} =
\delta[\mathbf{x}_1^*]
\qquad \text{(11.36).}\]
The message is then forwarded as
\[\begin{split}\mathbf{m}_{g_1 \rightarrow \mathbf{w}_1}
&= \int Pr(\mathbf{x}_1 \mid \mathbf{w}_1)
\delta\left[ \mathbf{x}_1^* \right] d\mathbf{x}_1\\
&= Pr(\mathbf{x}_1 = \mathbf{x}_1^* \mid \mathbf{w}_1)
& \quad & \text{(11.37).}\end{split}\]
Generalizing the message yields the measurement model
\[\mathbf{m}_{g_t \rightarrow \mathbf{w}_t} =
Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{w}_t)
\qquad \text{(19.8).}\]
At time step \(t = 1\), the result is arbitrary as suggested in the
paragraph after (19.16) where
\[\begin{split}Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{w}_t)
&= \frac{
Pr(\mathbf{x}_t = \mathbf{x}_t^*, \mathbf{w}_t)
}{
Pr(\mathbf{w}_t)
}\\
&= \frac{
Pr(\mathbf{w}_t \mid \mathbf{x}_t = \mathbf{x}_t^*)
Pr(\mathbf{x}_t = \mathbf{x}_t^*)
}{
Pr(\mathbf{w}_t)
}\\
Pr(\mathbf{w}_t \mid \mathbf{x}_t = \mathbf{x}_t^*)
&= \frac{
Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{w}_t) Pr(\mathbf{w}_t)
}{
Pr(\mathbf{x}_t = \mathbf{x}_t^*)
}
& \quad & \text{(19.1).}\end{split}\]
This means the first hidden variable incorporates the prior information and
forwards the normalized message as
\[\begin{split}\mathbf{m}_{\mathbf{w}_1 \rightarrow g_{12}}
&= \mathbf{m}_{g_1 \rightarrow \mathbf{w}_1}
\frac{Pr(\mathbf{w}_1)}{Pr(\mathbf{x}_1 = \mathbf{x}_1^*)}\\
&= \frac{
Pr(\mathbf{x}_1 = \mathbf{x}_1^* \mid \mathbf{w}_1) Pr(\mathbf{w}_1)
}{
Pr(\mathbf{x}_1 = \mathbf{x}_1^*)
}\\
&= Pr(\mathbf{w}_1 \mid \mathbf{x}_1 = \mathbf{x}_1^*)
& \quad & \text{(11.35).}\end{split}\]
Generalizing what the function node (at \(t > 1\)) forwards yields the
prediction step
\[\begin{split}\mathbf{m}_{g_{t - 1, t} \rightarrow \mathbf{w}_t}
&= \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t - 1})
Pr(\mathbf{w}_{t - 1} \mid \mathbf{x}_{1 \ldots t - 1})
d\mathbf{w}_{t - 1}\\
&= Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1})
& \quad & \text{(11.37), (19.9).}\end{split}\]
Generalizing what the unobserved variable (at \(t > 1\)) forwards yields the
measurement incorporation step
\[\begin{split}\mathbf{m}_{\mathbf{w}_t \rightarrow g_{t, t + 1}}
&= \frac{
\mathbf{m}_{g_{t} \rightarrow \mathbf{w}_t}
\mathbf{m}_{g_{t - 1, t} \rightarrow \mathbf{w}_t}
}{
Pr(\mathbf{x}_{1 \ldots t})
}\\
&= Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1})\\
&= Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t})
& \quad & \text{(11.35), (19.10).}\end{split}\]
Notice that the backward pass is not needed because the forward pass propagates
normalized messages.
Exercise 19.7
By inspection, the fixed interval smoother runs after the Kalman filter: it
waits until \(T\) observations have been made and then retrospectively
computes \(Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots T})\) for
\(t < T\).
The base case of this inductive proof is
\[\begin{split}Pr(\mathbf{w}_T \mid \mathbf{x}_{1 \ldots T})
&= \frac{
Pr(\mathbf{x}_T \mid \mathbf{w}_T)
Pr(\mathbf{w}_T \mid \mathbf{x}_{1 \ldots T - 1})
}{
Pr(\mathbf{x}_{1 \ldots T})
}\\
&= \NormDist_{\mathbf{w}_T}\left[
\boldsymbol{\mu}_{T \mid T}, \boldsymbol{\Sigma}_{T \mid T}
\right]
& \quad & \text{(19.10).}\end{split}\]
Insights from [Fle] suggest that D-separation should be
invoked. The inductive step is then
\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots T})
&= \int Pr(\mathbf{w}_{t + 1}, \mathbf{w}_t \mid \mathbf{x}_{1 \ldots T})
d\mathbf{w}_{t + 1}
& \quad & \text{(2.1)}\\
&= \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots T})
Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots T})
d\mathbf{w}_{t + 1}
& \quad & \text{(2.6) with Markov assumption}\\
&= \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t})
Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots T})
d\mathbf{w}_{t + 1}
& \quad & \text{D-separation}\\
&= \int
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}'_{t + 1}, \boldsymbol{\Sigma}'_{t + 1}
\right]
\NormDist_{\mathbf{w}_{t + 1}}\left[
\boldsymbol{\mu}_{t + 1 \mid T}, \boldsymbol{\Sigma}_{t + 1 \mid T}
\right]
d\mathbf{w}_{t + 1}
& \quad & \text{(a)}\\
&= \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{t \mid T},
\boldsymbol{\Sigma}_{t \mid T}
\right]
& \quad & \text{(b).}\end{split}\]
(a)
\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t})
&= \frac{
Pr(\mathbf{w}_t, \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t})
}{
Pr(\mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t})
}
& \quad & \text{(2.4)}\\
&= \frac{
Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t})
Pr(\mathbf{x}_{1 \ldots t})
}{
Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots t})
Pr(\mathbf{x}_{1 \ldots t})
}
& \quad & \text{(2.5)}\\
&= \frac{
Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t})
}{
\int Pr(\mathbf{w}_t, \mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots t})
d\mathbf{w}_{t}
}
& \quad & \text{(2.1)}\\
&= \frac{
Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t})
}{
\int Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) d\mathbf{w}_{t}
}
& \quad & \text{(2.6) with Markov assumption}\\
&= \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}'_{t + 1}, \boldsymbol{\Sigma}'_{t + 1}
\right]
& \quad & \text{(a.1)}\end{split}\]
(a.1)
\[\begin{split}Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t)
Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t})
&= \NormDist_{\mathbf{w}_{t + 1}}\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_t,
\boldsymbol{\Sigma}_p
\right]
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t
\right]
& \quad & \text{(19.6), (19.10)}\\
&= \kappa_1 \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_{t + 1},
\boldsymbol{\Sigma}'
\right]
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t
\right]
& \quad & \text{(a.2)}\\
&= \kappa_1 \kappa_2 \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}'_{t + 1}, \boldsymbol{\Sigma}'_{t + 1}
\right]
& \quad & \text{Exercise 5.7 and 5.9}\end{split}\]
where
\[\begin{split}\boldsymbol{\Sigma}'_{t + 1}
&= \left(
{\boldsymbol{\Sigma}'}^{-1} + \boldsymbol{\Sigma}_t^{-1}
\right)^{-1}\\
&= \left(
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} +
\boldsymbol{\Sigma}_t^{-1}
\right)^{-1}
\\\\
\boldsymbol{\mu}'_{t + 1}
&= \boldsymbol{\Sigma}'_{t + 1}
\left(
{\boldsymbol{\Sigma}'}^{-1}
\left(
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_{t + 1}
\right) +
\boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t
\right)\\
&= \boldsymbol{\Sigma}'_{t + 1}
\left(
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_p
\right) +
\boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t
\right)\\
&= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1} \mathbf{w}_{t + 1} -
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p +
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Sigma}_t^{-1}
\boldsymbol{\mu}_t
\\\\
\kappa_2
&= \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_{t + 1}}
\left[
\boldsymbol{\mu}_t,
\boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_t
\right].\end{split}\]
See Exercise 5.7 and
Exercise 5.9 for more details.
(a.2)
By Exercise 5.10,
\[\NormDist_{\mathbf{w}_{t + 1}}\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_t,
\boldsymbol{\Sigma}_p
\right] =
\kappa_1 \NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_{t + 1},
\boldsymbol{\Sigma}'
\right]\]
where
\[\begin{split}\boldsymbol{\Sigma}'
&= \left(
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\right)^{-1}
\\\\
\boldsymbol{\Psi}'
&= \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\\\\
\boldsymbol{\mu}'
&= -\boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\boldsymbol{\mu}_p
\\\\
\kappa_1
&= \frac{
\left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2}
}{
\left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2}
}
\exp\left[
-0.5
(\mathbf{w}_{t + 1} - \boldsymbol{\mu}_p)^\top
\left(
\boldsymbol{\Sigma}_p^{-1} -
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}'
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}
\right)
(\mathbf{w}_{t + 1} - \boldsymbol{\mu}_p)
\right].\end{split}\]
(b)
The generative equations for the distributions from
\[Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots T}) =
\int Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t})
Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots T}) d\mathbf{w}_{t + 1}\]
are
\[\begin{split}\mathbf{w}_t
&= \boldsymbol{\mu}'_{t + 1} + \boldsymbol{\epsilon}_{t + 1}\\
&= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1} \mathbf{w}_{t + 1} -
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p +
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t +
\boldsymbol{\epsilon}_{t + 1}
& \quad & \text{(a.1)}\\
&= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1} \mathbf{w}_{t + 1} -
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p +
\left(
\boldsymbol{\Sigma}_t -
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}_t
\right) \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t +
\boldsymbol{\epsilon}_{t + 1}
& \quad & \text{Exercise 5.9 (a)}\\
&= \boldsymbol{\mu}_t +
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1}
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \boldsymbol{\mu}_t
\right) +
\boldsymbol{\epsilon}_{t + 1}\\
&= \boldsymbol{\mu}_t +
\mathbf{C}_t
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_{+ \mid t + 1}
\right) +
\boldsymbol{\epsilon}_{t + 1}
& \quad & \text{(b.1) and (19.9)}\end{split}\]
and
\[\mathbf{w}_{t + 1} =
\boldsymbol{\mu}_{t + 1 \mid T} + \boldsymbol{\epsilon}_{t + 1 \mid T}\]
where
\[\begin{split}\DeclareMathOperator{\Cov}{\mathrm{Cov}}
\DeclareMathOperator{\E}{\mathrm{E}}
\E[\boldsymbol{\epsilon}_{t + 1}]
&= \E[\boldsymbol{\epsilon}_{t + 1 \mid T}]
= \boldsymbol{0}
\\\\
\Cov(\boldsymbol{\epsilon}_{t + 1}, \boldsymbol{\epsilon}_{t + 1})
&= \E\left[
\left(
\boldsymbol{\epsilon}_{t + 1} -
\E[\boldsymbol{\epsilon}_{t + 1}]
\right)
\left(
\boldsymbol{\epsilon}_{t + 1} -
\E[\boldsymbol{\epsilon}_{t + 1}]
\right)^\top
\right]
= \E\left[
\boldsymbol{\epsilon}_{t + 1} \boldsymbol{\epsilon}_{t + 1}^\top
\right] -
\E[\boldsymbol{\epsilon}_{t + 1}] \E[\boldsymbol{\epsilon}_{t + 1}]^\top
= \boldsymbol{\Sigma}'_{t + 1}
\\\\
\Cov\left(
\boldsymbol{\epsilon}_{t + 1 \mid T},
\boldsymbol{\epsilon}_{t + 1 \mid T}
\right)
&= \E\left[
\boldsymbol{\epsilon}_{t + 1 \mid T}
\boldsymbol{\epsilon}_{t + 1 \mid T}^\top
\right] -
\E[\boldsymbol{\epsilon}_{t + 1 \mid T}]
\E[\boldsymbol{\epsilon}_{t + 1 \mid T}]^\top
= \boldsymbol{\Sigma}_{t + 1 \mid T}
\\\\
\Cov(\boldsymbol{\epsilon}_{t + 1}, \boldsymbol{\epsilon}_{t + 1 \mid T})
&= \boldsymbol{0},\end{split}\]
which implies
\[\Cov(\mathbf{w}_{t + 1}, \boldsymbol{\epsilon}_{t + 1}) =
\boldsymbol{0}.\]
These assumptions result in
\[Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots T}) =
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}_{t \mid T},
\boldsymbol{\Sigma}_{t \mid T}
\right]
\qquad \text{(b.3), (b.4).}\]
(b.1)
\[\begin{split}\mathbf{C}_t
&= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1}\\
&= \left(
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} +
\boldsymbol{\Sigma}_t^{-1}
\right)^{-1}
\boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1}\\
&= \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top
\left(
\boldsymbol{\Sigma}_p +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top
\right)^{-1}\\
&= \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_{+ \mid t + 1}^{-1}
& \quad & \text{(19.9)}\end{split}\]
To simplify the notation, define \(A = \boldsymbol{\Sigma}_p\) and
\(B = \boldsymbol{\Psi} \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top\).
By Exercise 5.9 (a),
\(\boldsymbol{\Sigma}_t =
\boldsymbol{\Sigma}'_{t + 1}
\left(
\mathbf{I} +
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\boldsymbol{\Sigma}_t
\right)\).
\[\begin{split}\boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top (A + B)^{-1}
&= \boldsymbol{\Sigma}'_{t + 1}
\left(
\mathbf{I} +
\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}
\boldsymbol{\Sigma}_t
\right)
\boldsymbol{\Psi}^\top
(A + B)^{-1}\\
&= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top (A + B)^{-1} +
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
A^{-1} B (A + B)^{-1}\\
&= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top (A + B)^{-1} +
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top A^{-1} B
\left( B^{-1} - (A + B)^{-1} A B^{-1} \right)
& \quad & \text{Exercise 5.9 (a)}\\
&= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top A^{-1} +
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top (A + B)^{-1} -
\boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
A^{-1} B (A + B)^{-1} A B^{-1}\\
&= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top
\boldsymbol{\Sigma}_p^{-1}
& \quad & \text{(b.2)}\end{split}\]
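The equality of the two forms of \(\mathbf{C}_t\) above (the matrix-inversion-lemma "push-through" step) can be confirmed numerically, e.g. with NumPy and random SPD covariances:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 3
def spd():
    A = rng.normal(size=(D, D))
    return A @ A.T + np.eye(D)   # random symmetric positive definite

Sigma_p, Sigma_t = spd(), spd()
Psi = rng.normal(size=(D, D))
inv = np.linalg.inv

# C_t in its information form and in its covariance (push-through) form.
lhs = inv(Psi.T @ inv(Sigma_p) @ Psi + inv(Sigma_t)) @ Psi.T @ inv(Sigma_p)
rhs = Sigma_t @ Psi.T @ inv(Sigma_p + Psi @ Sigma_t @ Psi.T)
assert np.allclose(lhs, rhs)
```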
(b.2)
\[\begin{split}A^{-1} B (A + B)^{-1} A B^{-1}
&= A^{-1} B \left( A^{-1} - (A + B)^{-1} B A^{-1} \right) A B^{-1}
& \quad & \text{Exercise 5.9 (a)}\\
&= A^{-1} B A^{-1} A B^{-1} - A^{-1} B (A + B)^{-1} B A^{-1} A B^{-1}\\
&= A^{-1} - A^{-1} B (A + B)^{-1}\\
A^{-1} B (A + B)^{-1} A B^{-1} (A + B)
&= \left( A^{-1} - A^{-1} B (A + B)^{-1} \right) (A + B)\\
&= A^{-1} (A + B) - A^{-1} B\\
&= \mathbf{I}\\
A^{-1} B (A + B)^{-1} A B^{-1}
&= (A + B)^{-1}\end{split}\]
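The identity derived in (b.2) holds for any invertible \(A\) and \(B\) with \(A + B\) invertible, not just covariance matrices; a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 3
# Shifted random matrices; invertibility of A, B, and A + B holds
# almost surely for this construction.
A = rng.normal(size=(D, D)) + D * np.eye(D)
B = rng.normal(size=(D, D)) + D * np.eye(D)
inv = np.linalg.inv

lhs = inv(A) @ B @ inv(A + B) @ A @ inv(B)
rhs = inv(A + B)
assert np.allclose(lhs, rhs)
```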
(b.3)
\[\begin{split}\boldsymbol{\mu}_{t \mid T}
&= \E[\mathbf{w}_t]\\
&= \boldsymbol{\mu}_t +
\mathbf{C}_t
\left(
\E[\mathbf{w}_{t + 1}] - \boldsymbol{\mu}_{+ \mid t + 1}
\right) +
\E[\boldsymbol{\epsilon}_{t + 1}]
& \quad & \text{(2.14), (2.15), (2.16)}\\
&= \boldsymbol{\mu}_t +
\mathbf{C}_t \left(
\boldsymbol{\mu}_{t + 1 \mid T} - \boldsymbol{\mu}_{+ \mid t + 1}
\right)\end{split}\]
(b.4)
\[\begin{split}\boldsymbol{\Sigma}_{t \mid T}
&= \Cov(\mathbf{w}_t, \mathbf{w}_t)\\
&= \E\left[
\left(
\mathbf{w}_t - \E[\mathbf{w}_t]
\right)
\left(
\mathbf{w}_t - \E[\mathbf{w}_t]
\right)^\top
\right]\\
&= \E\left[
\left(
\mathbf{C}_t
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T}
\right) +
\boldsymbol{\epsilon}_{t + 1}
\right)
\left(
\mathbf{C}_t
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T}
\right) +
\boldsymbol{\epsilon}_{t + 1}
\right)^\top
\right]\\
&= \mathbf{C}_t \E\left[
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T}
\right)
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T}
\right)^\top
\right] \mathbf{C}_t^\top +
\mathbf{C}_t
\E\left[
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T}
\right) \boldsymbol{\epsilon}_{t + 1}^\top
\right] +
\E\left[
\boldsymbol{\epsilon}_{t + 1}
\left(
\mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T}
\right)^\top
\right] \mathbf{C}_t^\top +
\E\left[
\boldsymbol{\epsilon}_{t + 1} \boldsymbol{\epsilon}_{t + 1}^\top
\right]\\
&= \mathbf{C}_t \boldsymbol{\Sigma}_{t + 1 \mid T} \mathbf{C}_t^\top +
\boldsymbol{\Sigma}'_{t + 1}\\
&= \mathbf{C}_t \boldsymbol{\Sigma}_{t + 1 \mid T} \mathbf{C}_t^\top +
\left(
\boldsymbol{\Sigma}_t -
\boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top
\left(
\boldsymbol{\Sigma}_p +
\boldsymbol{\Psi} \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top
\right)^{-1}
\boldsymbol{\Psi} \boldsymbol{\Sigma}_t
\right)
& \quad & \text{(C.61)}\\
&= \boldsymbol{\Sigma}_t +
\mathbf{C}_t \boldsymbol{\Sigma}_{t + 1 \mid T} \mathbf{C}_t^\top -
\mathbf{C}_t
\boldsymbol{\Sigma}_{+ \mid t + 1} \mathbf{C}_t^\top
& \quad & \text{(b.1)}\\
&= \boldsymbol{\Sigma}_t +
\mathbf{C}_t
\left(
\boldsymbol{\Sigma}_{t + 1 \mid T} -
\boldsymbol{\Sigma}_{+ \mid t + 1}
\right)
\mathbf{C}_t^\top\end{split}\]
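Putting (b.1), (b.3), and (b.4) together gives the fixed-interval smoother recursions. The sketch below (a scalar LDS with zero offsets \(\boldsymbol{\mu}_p = \boldsymbol{\mu}_m = \mathbf{0}\), an assumed simplification with illustrative parameter values) runs the Kalman filter and then the smoother, and checks the result against brute-force conditioning of the joint Gaussian:

```python
import numpy as np

# Scalar LDS with zero offsets (mu_p = mu_m = 0), an assumed simplification.
psi, sp = 0.8, 0.3   # transition: w_n = psi * w_{n-1} + Norm[0, sp]
phi, sm = 1.0, 0.5   # measurement: x_n = phi * w_n + Norm[0, sm]
s0 = 1.0             # prior variance of w_1
N = 4
rng = np.random.default_rng(3)
x = rng.normal(size=N)               # arbitrary observed measurements

# Kalman filter (forward pass): filtered and predicted moments.
mu_f, S_f, mu_pred, S_pred = [], [], [], []
mu, S = 0.0, s0
for n in range(N):
    if n > 0:
        mu, S = psi * mu, psi * S * psi + sp          # predict
    mu_pred.append(mu)
    S_pred.append(S)
    K = S * phi / (phi * S * phi + sm)                # Kalman gain
    mu, S = mu + K * (x[n] - phi * mu), (1 - K * phi) * S
    mu_f.append(mu)
    S_f.append(S)

# Fixed-interval smoother (backward pass), eqs. (b.3) and (b.4).
mu_s, S_s = [mu_f[-1]], [S_f[-1]]
for n in range(N - 2, -1, -1):
    C = S_f[n] * psi / S_pred[n + 1]                  # C_t, eq. (b.1)
    mu_s.insert(0, mu_f[n] + C * (mu_s[0] - mu_pred[n + 1]))
    S_s.insert(0, S_f[n] + C * (S_s[0] - S_pred[n + 1]) * C)

# Brute force: build the joint Gaussian over (w, x) and condition on x.
Cw = np.zeros((N, N))
Cw[0, 0] = s0
for n in range(1, N):
    Cw[n, :n] = psi * Cw[n - 1, :n]
    Cw[:n, n] = Cw[n, :n]
    Cw[n, n] = psi * Cw[n - 1, n - 1] * psi + sp
Cx = phi * Cw * phi + sm * np.eye(N)
Cwx = phi * Cw
post_mu = Cwx @ np.linalg.solve(Cx, x)
post_C = Cw - Cwx @ np.linalg.solve(Cx, Cwx.T)
assert np.allclose(mu_s, post_mu)
assert np.allclose(S_s, np.diag(post_C))
```

The smoothed marginals agree with the exact posterior marginals, as they must for a linear-Gaussian model.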
Exercise 19.8
The graphical model for the Kalman filter is
\[Pr(\{ \mathbf{x}_n \}_{n = 1}^N, \{ \mathbf{w}_n \}_{n = 1}^N) =
\left( \prod_{n = 1}^N Pr(\mathbf{x}_n \mid \mathbf{w}_n) \right)
\left( \prod_{n = 2}^N Pr(\mathbf{w}_n \mid \mathbf{w}_{n - 1}) \right)
Pr(\mathbf{w}_1)
\qquad \text{(10.19), (11.1).}\]
[Arc] is good for verifying the previous result and
Exercise 19.7, and [Mac]
could also serve to verify the results of this exercise.
Note that [BMM96] and [MBM96] are useless; they are not worth even
skimming. Just read the book’s explanations instead.
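This factorization is exactly what ancestral sampling follows; a minimal sketch (with arbitrary illustrative parameter values, not from the exercise) that draws one sequence from the model:

```python
import numpy as np

rng = np.random.default_rng(4)
D, N = 2, 50
# Illustrative parameter values (assumed for this sketch).
mu0, Sigma0 = np.zeros(D), np.eye(D)             # Pr(w_1)
Psi, Sigma_p = 0.9 * np.eye(D), 0.1 * np.eye(D)  # Pr(w_n | w_{n-1})
Phi, Sigma_m = np.eye(D), 0.5 * np.eye(D)        # Pr(x_n | w_n)
mu_p, mu_m = np.zeros(D), np.zeros(D)

# Sample parents before children, following the factorization.
w = [rng.multivariate_normal(mu0, Sigma0)]
for n in range(1, N):
    w.append(rng.multivariate_normal(mu_p + Psi @ w[-1], Sigma_p))
x = [rng.multivariate_normal(mu_m + Phi @ wn, Sigma_m) for wn in w]
assert len(w) == N and len(x) == N
```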
(i)
This supervised learning scenario is a fully observed Markov model
([JB01]); i.e., the training set consists of \(I\)
matched sequences of states \(\{ \mathbf{w}_{in} \}_{i = 1, n = 1}^{I, N}\) and
measurements \(\{ \mathbf{x}_{in} \}_{i = 1, n = 1}^{I, N}\).
Maximum likelihood (or another technique such as maximum a posteriori or the
Bayesian approach) can be applied to fit the parameters
\(\boldsymbol{\theta} =
\left\{
\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0,
\boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p, \boldsymbol{\Psi},
\boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m, \boldsymbol{\Phi}
\right\}\) to the data:
\[\begin{split}\hat{\boldsymbol{\theta}}
&= \argmax_{\boldsymbol{\theta}} \prod_{i = 1}^I
Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N,
\{ \mathbf{w}_{in} \}_{n = 1}^N \mid \boldsymbol{\theta})
& \quad & \text{(10.21)}\\
&= \argmax_{\boldsymbol{\theta}}
\sum_{i = 1}^I
\log Pr(\mathbf{w}_{i1} \mid \boldsymbol{\theta}) +
\sum_{n = 1}^N
\log Pr(\mathbf{x}_{in} \mid
\mathbf{w}_{in}, \boldsymbol{\theta}) +
\sum_{n = 2}^N
\log Pr(\mathbf{w}_{in} \mid
\mathbf{w}_{i(n - 1)}, \boldsymbol{\theta})\\
&= \argmax_{\boldsymbol{\theta}}
\sum_{i = 1}^I
\log \NormDist_{\mathbf{w}_{i1}}\left[
\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0
\right] +
\sum_{n = 1}^N
\log \NormDist_{\mathbf{x}_{in}}
\left[
\boldsymbol{\mu}_m + \boldsymbol{\Phi} \mathbf{w}_{in},
\boldsymbol{\Sigma}_m
\right] +
\sum_{n = 2}^N
\log \NormDist_{\mathbf{w}_{in}}
\left[
\boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_{i(n - 1)},
\boldsymbol{\Sigma}_p
\right]
& \quad & \text{(i.a), (19.6), (19.8)}\\
&= \argmax_{\boldsymbol{\theta}}
-\frac{1}{2} \sum_{i = 1}^I
D_w \log 2 \pi +
\log \left\vert \boldsymbol{\Sigma}_0 \right\vert +
\left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right)^\top
\boldsymbol{\Sigma}_0^{-1}
\left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right) +\\
&\qquad
\sum_{n = 1}^N
D_m \log 2 \pi +
\log \left\vert \boldsymbol{\Sigma}_m \right\vert +
\left(
\mathbf{x}_{in} - \boldsymbol{\mu}_m -
\boldsymbol{\Phi} \mathbf{w}_{in}
\right)^\top
\boldsymbol{\Sigma}_m^{-1}
\left(
\mathbf{x}_{in} - \boldsymbol{\mu}_m -
\boldsymbol{\Phi} \mathbf{w}_{in}
\right) +\\
&\qquad
\sum_{n = 2}^N
D_p \log 2 \pi +
\log \left\vert \boldsymbol{\Sigma}_p \right\vert +
\left(
\mathbf{w}_{in} - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \mathbf{w}_{i(n - 1)}
\right)^\top
\boldsymbol{\Sigma}_p^{-1}
\left(
\mathbf{w}_{in} - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \mathbf{w}_{i(n - 1)}
\right)
& \quad & \text{(5.1)}\end{split}\]
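Because the states are observed, this objective decouples: the transition terms, for instance, reduce to a linear regression of \(\mathbf{w}_{in}\) on \(\mathbf{w}_{i(n - 1)}\). A sketch (synthetic states generated from an assumed ground-truth \(\boldsymbol{\Psi}\) with zero offsets) recovering \(\boldsymbol{\Psi}\) and \(\boldsymbol{\mu}_p\) by least squares:

```python
import numpy as np

rng = np.random.default_rng(6)
D, N, I = 2, 200, 5
Psi_true = np.array([[0.9, 0.1], [0.0, 0.8]])   # assumed ground truth
W = []
for _ in range(I):
    w = [rng.normal(size=D)]
    for _ in range(N - 1):
        w.append(Psi_true @ w[-1] + 0.1 * rng.normal(size=D))
    W.append(np.array(w))

# Stack (w_{i(n-1)}, 1) -> w_{in} pairs across sequences and regress;
# the slope recovers Psi and the intercept recovers mu_p (zero here).
prev = np.concatenate([w[:-1] for w in W])
curr = np.concatenate([w[1:] for w in W])
X = np.hstack([prev, np.ones((len(prev), 1))])
coef, *_ = np.linalg.lstsq(X, curr, rcond=None)
Psi_hat, mu_p_hat = coef[:D].T, coef[D]
assert np.allclose(Psi_hat, Psi_true, atol=0.1)
assert np.allclose(mu_p_hat, 0.0, atol=0.1)
```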
(ii)
This unsupervised learning scenario treats the states
\(\{ \mathbf{w}_{in} \}_{i = 1, n = 1}^{I, N}\) as hidden; only the
measurements \(\{ \mathbf{x}_{in} \}_{i = 1, n = 1}^{I, N}\) are observed,
resulting in
\[\begin{split}\hat{\boldsymbol{\theta}}
&= \argmax_{\boldsymbol{\theta}}
\prod_{i = 1}^I
Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N \mid \boldsymbol{\theta})\\
&= \argmax_{\boldsymbol{\theta}}
\prod_{i = 1}^I
\int Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N, \mathbf{h}_i \mid
\boldsymbol{\theta})
d\mathbf{h}_i\end{split}\]
where \(\mathbf{h}_i = \{ \mathbf{w}_{in} \}_{n = 1}^N\), which can be
solved using the EM algorithm [Par].
The E-step consists of computing the posterior distribution over the states for
each time sequence
\[\begin{split}q_i(\mathbf{h}_i)
&= Pr(\mathbf{h}_i \mid
\{ \mathbf{x}_{in} \}_{n = 1}^N, \boldsymbol{\theta})\\
&= Pr(\{ \mathbf{w}_{in} \}_{n = 1}^N \mid
\{ \mathbf{x}_{in} \}_{n = 1}^N, \boldsymbol{\theta})\\
&= Pr(\mathbf{w}_{iN} \mid
\{ \mathbf{x}_{in} \}_{n = 1}^N, \boldsymbol{\theta})
\prod_{m = 1}^{N - 1}
Pr(\mathbf{w}_{i(N - m)} \mid \mathbf{w}_{i(N - m + 1)},
\{ \mathbf{x}_{in} \}_{n = 1}^{N - m},
\boldsymbol{\theta})
& \quad & \text{Exercise 19.7 (a),}\end{split}\]
which can be computed using the terms that result from running the Kalman filter
followed by the Kalman fixed-interval smoother. See
Exercise 19.7 for more details. It is
important to realize that \(q_i(\mathbf{h}_i)\) itself is not used directly
in the M-step; the E-step’s purpose is to estimate the expected value and
covariance of each hidden variable
\[Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t}) =
\NormDist_{\mathbf{w}_t}\left[
\boldsymbol{\mu}'_{t + 1}, \boldsymbol{\Sigma}'_{t + 1}
\right].\]
Since no prior knowledge can be leveraged beyond the Gaussian assumption,
the initial parameters can be randomly initialized.
In the M-step, the lower bound is maximized with respect to the parameters
\(\boldsymbol{\theta} =
\left\{
\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0,
\boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p, \boldsymbol{\Psi},
\boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m, \boldsymbol{\Phi}
\right\}\) so that
\[\begin{split}\DeclareMathOperator{\tr}{\mathrm{tr}}
\boldsymbol{\theta}^{[t + 1]}
&= \argmax_{\boldsymbol{\theta}}
\sum_{i = 1}^I
\int q_i^{[t]}(\mathbf{h}_i)
\log Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N, \mathbf{h}_i \mid
\boldsymbol{\theta}) d\mathbf{h}_i
& \quad & \text{(7.51)}\\
&= \argmax_{\boldsymbol{\theta}}
\sum_{i = 1}^I
\E\left[
\log Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N,
\{ \mathbf{w}_{in} \}_{n = 1}^N \mid \boldsymbol{\theta})
\right]\\
&= \argmax_{\boldsymbol{\theta}}
-\frac{1}{2} \left(
C + I \log \left\vert \boldsymbol{\Sigma}_0 \right\vert +
I N \log \left\vert \boldsymbol{\Sigma}_m \right\vert +
I (N - 1) \log \left\vert \boldsymbol{\Sigma}_p \right\vert +
\tr\left(
\E[Z] \boldsymbol{\Sigma}_0^{-1}
\right) +
\tr\left(
\E[M] \boldsymbol{\Sigma}_m^{-1}
\right) +
\tr\left(
\E[P] \boldsymbol{\Sigma}_p^{-1}
\right)
\right)
& \quad & \text{(i), (ii.a), (ii.b), (ii.c).}\end{split}\]
(ii.a)
\[C = I D_w \log 2 \pi + I N D_m \log 2 \pi + I (N - 1) D_p \log 2 \pi\]
and
\[\begin{split}\sum_{i = 1}^I
\left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right)^\top
\boldsymbol{\Sigma}_0^{-1}
\left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right)
&= \tr\left[
\sum_{i = 1}^I
\left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right)
\left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right)^\top
\boldsymbol{\Sigma}_0^{-1}
\right]
& \quad & \text{(C.14), (C.15)}\\
&= \tr\left[
Z \boldsymbol{\Sigma}_0^{-1}
\right]\end{split}\]
(ii.b)
\[\begin{split}& \sum_{i = 1}^I \sum_{n = 1}^N
\left(
\mathbf{x}_{in} - \boldsymbol{\mu}_m -
\boldsymbol{\Phi} \mathbf{w}_{in}
\right)^\top
\boldsymbol{\Sigma}_m^{-1}
\left(
\mathbf{x}_{in} - \boldsymbol{\mu}_m -
\boldsymbol{\Phi} \mathbf{w}_{in}
\right)\\
&= \tr\left[
\sum_{i = 1}^I \sum_{n = 1}^N
\left(
\mathbf{x}_{in} - \boldsymbol{\mu}_m -
\boldsymbol{\Phi} \mathbf{w}_{in}
\right)
\left(
\mathbf{x}_{in} - \boldsymbol{\mu}_m -
\boldsymbol{\Phi} \mathbf{w}_{in}
\right)^\top
\boldsymbol{\Sigma}_m^{-1}
\right]
& \quad & \text{(C.14), (C.15)}\\
&= \tr\left[
M \boldsymbol{\Sigma}_m^{-1}
\right]\end{split}\]
(ii.c)
\[\begin{split}& \sum_{i = 1}^I \sum_{n = 2}^N
\left(
\mathbf{w}_{in} - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \mathbf{w}_{i(n - 1)}
\right)^\top
\boldsymbol{\Sigma}_p^{-1}
\left(
\mathbf{w}_{in} - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \mathbf{w}_{i(n - 1)}
\right)\\
&= \tr\left[
\sum_{i = 1}^I \sum_{n = 2}^N
\left(
\mathbf{w}_{in} - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \mathbf{w}_{i(n - 1)}
\right)
\left(
\mathbf{w}_{in} - \boldsymbol{\mu}_p -
\boldsymbol{\Psi} \mathbf{w}_{i(n - 1)}
\right)^\top
\boldsymbol{\Sigma}_p^{-1}
\right]
& \quad & \text{(C.14), (C.15)}\\
&= \tr\left[
P \boldsymbol{\Sigma}_p^{-1}
\right]\end{split}\]
Exercise 19.9
The mean and covariance of the points are respectively
\[\begin{split}\sum_{j = 0}^{2D_\mathbf{w}} a_j \hat{\mathbf{w}}^{[j]}
&= a_0 \boldsymbol{\mu}_{t - 1} +
\sum_{j = 1}^{D_\mathbf{w}}
\frac{1 - a_0}{2D_\mathbf{w}}
\left(
\boldsymbol{\mu}_{t - 1} +
\sqrt{\frac{D_\mathbf{w}}{1 - a_0}}
\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_j
\right) +\\
&\qquad
\sum_{j = D_\mathbf{w} + 1}^{2D_\mathbf{w}}
\frac{1 - a_0}{2D_\mathbf{w}}
\left(
\boldsymbol{\mu}_{t - 1} -
\sqrt{\frac{D_\mathbf{w}}{1 - a_0}}
\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_{j - D_\mathbf{w}}
\right)
& \quad & \text{(19.40), (19.41)}\\
&= a_0 \boldsymbol{\mu}_{t - 1} +
2 D_\mathbf{w} \frac{1 - a_0}{2D_\mathbf{w}} \boldsymbol{\mu}_{t - 1}\\
&= \boldsymbol{\mu}_{t - 1}\end{split}\]
and
\[\begin{split}\sum_{j = 0}^{2D_\mathbf{w}} a_j
\left( \hat{\mathbf{w}}^{[j]} - \boldsymbol{\mu}_{t - 1} \right)
\left( \hat{\mathbf{w}}^{[j]} - \boldsymbol{\mu}_{t - 1} \right)^\top
&= \sum_{j = 1}^{D_\mathbf{w}}
\frac{1 - a_0}{2D_\mathbf{w}} \frac{D_\mathbf{w}}{1 - a_0}
\left(
\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_j
\right)
\left(
\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_j
\right)^\top +\\
&\qquad
\sum_{j = D_\mathbf{w} + 1}^{2D_\mathbf{w}}
\frac{1 - a_0}{2D_\mathbf{w}} \frac{D_\mathbf{w}}{1 - a_0}
\left(
-\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_{j - D_\mathbf{w}}
\right)
\left(
-\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_{j - D_\mathbf{w}}
\right)^\top
& \quad & \text{(19.40), (19.41)}\\
&= \sum_{j = 1}^{D_\mathbf{w}}
\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_j
\mathbf{e}_j^\top {\boldsymbol{\Sigma}_{t - 1}^{1 / 2}}^\top\\
&= \sum_{j = 1}^{D_\mathbf{w}}
\mathbf{U} \boldsymbol{\Lambda}^{1/2} \mathbf{e}_j
\mathbf{e}_j^\top \boldsymbol{\Lambda}^{1/2} \mathbf{V}^\top\\
&= \sum_{j = 1}^{D_\mathbf{w}}
\lambda_j \mathbf{U}_{\cdot j} \mathbf{V}_{j \cdot}^\top\\
&= \boldsymbol{\Sigma}_{t - 1}\end{split}\]
where the SVD of
\[\begin{split}\boldsymbol{\Sigma}_{t - 1}
&= \mathbf{U} \boldsymbol{\Lambda} \mathbf{V}^\top\\
&= \sum_j \lambda_j \mathbf{U}_{\cdot j} \mathbf{V}_{j \cdot}^\top,
\\\\
\boldsymbol{\Sigma}_{t - 1}^{1 / 2}
&= \mathbf{U} \boldsymbol{\Lambda}^{1 / 2},
\\\\
{\boldsymbol{\Sigma}_{t - 1}^{1 / 2}}^\top
&= \boldsymbol{\Lambda}^{1 / 2} \mathbf{V}^\top.\end{split}\]
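This can also be checked numerically. The sketch below (using a Cholesky factor as one valid \(\boldsymbol{\Sigma}_{t - 1}^{1 / 2}\) and an arbitrary central weight \(a_0 \in (0, 1)\)) confirms that the sigma points reproduce the mean and covariance exactly:

```python
import numpy as np

rng = np.random.default_rng(5)
D = 3
a0 = 0.3                            # central weight, assumed in (0, 1)
mu = rng.normal(size=D)
A = rng.normal(size=(D, D))
Sigma = A @ A.T + np.eye(D)         # random SPD covariance

L = np.linalg.cholesky(Sigma)       # one valid root: L @ L.T == Sigma
scale = np.sqrt(D / (1 - a0))
pts = ([mu]
       + [mu + scale * L[:, j] for j in range(D)]
       + [mu - scale * L[:, j] for j in range(D)])
wts = [a0] + [(1 - a0) / (2 * D)] * (2 * D)

mean = sum(w * p for w, p in zip(wts, pts))
cov = sum(w * np.outer(p - mu, p - mu) for w, p in zip(wts, pts))
assert np.allclose(mean, mu) and np.allclose(cov, Sigma)
```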
Exercise 19.20
\[\begin{split}\mathbf{x}
&= \mathbf{g}[\mathbf{w}, \boldsymbol{\epsilon}]
& \quad & \text{(19.30)}\\
\begin{bmatrix} x_1\\ y_1\\ x_2\\ y_2 \end{bmatrix}
&= \begin{bmatrix} u_1\\ v_1\\ u_2\\ v_2 \end{bmatrix} \frac{1}{1 + w} +
\boldsymbol{\epsilon}
& \quad & \text{(19.50)}\end{split}\]
[Hoo] has a nice worked out example that makes the following
more understandable.
\[\begin{split}\boldsymbol{\Phi}
&= \frac{
\partial \mathbf{g}[\mathbf{w}, \boldsymbol{\epsilon}]
}{\partial \mathbf{w}}
& \quad & \text{(19.31)}\\
&= \frac{
\partial \mathbf{g}[\mathbf{w}, \boldsymbol{\epsilon}]
}{
\partial \left\{ u_1, v_1, u_2, v_2, w\right\}
}\\
&= \frac{1}{1 + w} \begin{bmatrix}
1 & 0 & 0 & 0 & -\frac{u_1}{1 + w}\\
0 & 1 & 0 & 0 & -\frac{v_1}{1 + w}\\
0 & 0 & 1 & 0 & -\frac{u_2}{1 + w}\\
0 & 0 & 0 & 1 & -\frac{v_2}{1 + w}
\end{bmatrix}
\\\\
\boldsymbol{\Upsilon}
&= \frac{
\partial \mathbf{g}[\mathbf{w}, \boldsymbol{\epsilon}]
}{
\partial \boldsymbol{\epsilon}
}\\
&= \mathbf{I}
& \quad & \text{(19.31).}\end{split}\]
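The analytic Jacobian \(\boldsymbol{\Phi}\) can be validated against central finite differences (evaluating \(\mathbf{g}\) at \(\boldsymbol{\epsilon} = \mathbf{0}\), with arbitrary test values for the state):

```python
import numpy as np

def g(s):
    # s = (u1, v1, u2, v2, w); projection of (19.50) with epsilon = 0
    *uv, w = s
    return np.array(uv) / (1 + w)

def jac(s):
    # Analytic Jacobian: (1 / (1 + w)) [ I | -uv / (1 + w) ]
    *uv, w = s
    J = np.eye(4, 5) / (1 + w)
    J[:, 4] = -np.array(uv) / (1 + w) ** 2
    return J

s = np.array([1.0, 2.0, 3.0, 4.0, 0.5])   # arbitrary test state
eps = 1e-6
num = np.column_stack([
    (g(s + eps * e) - g(s - eps * e)) / (2 * eps) for e in np.eye(5)])
assert np.allclose(jac(s), num, atol=1e-5)
```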
References
- Arc
Cedric Archambeau. Filtering and smoothing in dynamical systems. http://www0.cs.ucl.ac.uk/staff/C.Archambeau/ATML/atml_files/atml08_lect2_dynsyst.pdf. Accessed on 2017-08-03.
- BMM96
Gary D Brushe, Robert E Mahony, and John B Moore. A forward-backward algorithm for ML state and sequence estimation. In ISSPA, 224–227. 1996.
- Fle
Tristan Fletcher. The Kalman filter explained. https://tristan-fletcher-fdxe.squarespace.com/s/LDS-87ae.pdf. Accessed on 2017-08-02.
- Hoo
Adam Hoover. Extended Kalman filter. http://www.ces.clemson.edu/ ahoover/ece854/lecture-notes/lecture-ekf.pdf. Accessed on 2017-08-02.
- JB01
Michael I Jordan and Chris Bishop. An introduction to graphical models. Unpublished book, 2001. p. 40.
- Mac
Lester Mackey. Linear Gaussian state space model. http://web.stanford.edu/ lmackey/stats306b/doc/stats306b-spring14-lecture11_scribed.pdf. Accessed on 2017-08-03.
- MBM96
Robert E Mahony, Gary D Brushe, and John B Moore. Hybrid algorithms for maximum likelihood and maximum a posteriori sequence estimation. In ISSPA, 451–454. 1996.
- Par
Lucas C. Parra. Hidden Markov model Kalman filter. http://bme.ccny.cuny.edu/faculty/parra/teaching/biomed-dsp/class10.pdf. Accessed on 2017-08-03.
- Sch
Sandro Schönborn. Graphical models: sum-product algorithm. http://cs-wwwarchiv.cs.unibas.ch/lehre/hs11/cs351/_Slides/Schoenborn_SumProduct.pdf. Accessed on 2017-08-02.