Temporal Models

Exercise 19.1

Suppose

\[\DeclareMathOperator{\NormDist}{Norm} Pr(\mathbf{w}_{t - 1} \mid \mathbf{x}_{1 \ldots t - 1}) = \NormDist_{\mathbf{w}_{t - 1}}\left[ \boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}_{t - 1} \right].\]
\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) &= \int Pr(\mathbf{w}_t, \mathbf{w}_{t - 1} \mid \mathbf{x}_{1 \ldots t - 1}) d\mathbf{w}_{t - 1} & \quad & \text{(2.1)}\\ &= \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t - 1}) Pr(\mathbf{w}_{t - 1} \mid \mathbf{x}_{1 \ldots t - 1}) d\mathbf{w}_{t - 1} & \quad & \text{Markov assumption}\\ &= \int \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_{t - 1}, \boldsymbol{\Sigma}_p \right] \NormDist_{\mathbf{w}_{t - 1}}\left[ \boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}_{t - 1} \right] d\mathbf{w}_{t - 1} & \quad & \text{(19.6)}\\ &= \kappa_1 \kappa_2 \int \NormDist_{\mathbf{w}_{t - 1}}\left[ \boldsymbol{\mu}'', \boldsymbol{\Sigma}'' \right] d\mathbf{w}_{t - 1} & \quad & \text{(a), (b)}\\ &= \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}_p + \boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top \right] & \quad & \text{(c)}\\ &= \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_+, \boldsymbol{\Sigma}_+ \right]\end{split}\]
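As a numerical sanity check of this prediction step, one can sample from the temporal model and compare the empirical moments of \(\mathbf{w}_t\) against the closed-form \(\boldsymbol{\mu}_+ = \boldsymbol{\mu}_p + \boldsymbol{\Psi}\boldsymbol{\mu}_{t-1}\) and \(\boldsymbol{\Sigma}_+ = \boldsymbol{\Sigma}_p + \boldsymbol{\Psi}\boldsymbol{\Sigma}_{t-1}\boldsymbol{\Psi}^\top\); all parameter values below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D state; all parameter values are made up for illustration.
mu_prev = np.array([1.0, -0.5])
Sigma_prev = np.array([[0.5, 0.1], [0.1, 0.3]])
mu_p = np.array([0.2, 0.4])
Psi = np.array([[0.9, 0.1], [-0.2, 0.8]])
Sigma_p = np.array([[0.2, 0.0], [0.0, 0.1]])

# Closed-form prediction (the result of Exercise 19.1).
mu_plus = mu_p + Psi @ mu_prev
Sigma_plus = Sigma_p + Psi @ Sigma_prev @ Psi.T

# Monte Carlo: sample w_{t-1}, push through the temporal model, add noise.
n = 200_000
w_prev = rng.multivariate_normal(mu_prev, Sigma_prev, size=n)
noise = rng.multivariate_normal(np.zeros(2), Sigma_p, size=n)
w_t = mu_p + w_prev @ Psi.T + noise

emp_mean = w_t.mean(axis=0)
emp_cov = np.cov(w_t.T)
```

With this sample size the empirical mean and covariance agree with the closed form to about two decimal places.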

(a)

By Exercise 5.10,

\[\NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_{t - 1}, \boldsymbol{\Sigma}_p \right] = \kappa_1 \NormDist_{\mathbf{w}_{t - 1}}\left[ \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t, \boldsymbol{\Sigma}' \right]\]

where

\[\begin{split}\boldsymbol{\Sigma}' &= (\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi})^{-1} \\\\ \boldsymbol{\Psi}' &= \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \\\\ \boldsymbol{\mu}' &= -\boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p \\\\ \kappa_1 &= \frac{ \left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2} }{ \left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2} } \exp\left[ -0.5 (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \left( \boldsymbol{\Sigma}_p^{-1} - \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right) (\mathbf{w}_t - \boldsymbol{\mu}_p) \right].\end{split}\]
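This change-of-variable identity can be checked pointwise for a square (hence full-rank) \(\boldsymbol{\Psi}\), in which case the \((2\pi)\) factors on both sides match; the numerical values below are arbitrary:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    # Multivariate normal density (NumPy only).
    d = len(mu)
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / np.sqrt(
        (2 * np.pi) ** d * np.linalg.det(Sigma))

# Arbitrary illustrative parameters with square, invertible Psi.
mu_p = np.array([0.3, -0.1])
Psi = np.array([[0.9, 0.2], [-0.3, 0.8]])
Sigma_p = np.array([[0.4, 0.1], [0.1, 0.5]])
Sp_inv = np.linalg.inv(Sigma_p)

# Sigma', Psi', mu' exactly as defined above.
Sigma_prime = np.linalg.inv(Psi.T @ Sp_inv @ Psi)
Psi_prime = Sigma_prime @ Psi.T @ Sp_inv
mu_prime = -Sigma_prime @ Psi.T @ Sp_inv @ mu_p

w_t = np.array([0.7, 0.2])
w_prev = np.array([-0.4, 1.1])

kappa1 = np.sqrt(np.linalg.det(Sigma_prime) / np.linalg.det(Sigma_p)) * np.exp(
    -0.5 * (w_t - mu_p)
    @ (Sp_inv - Sp_inv @ Psi @ Sigma_prime @ Psi.T @ Sp_inv)
    @ (w_t - mu_p))

# Both sides of the Exercise 5.10 identity, evaluated at the same point.
lhs = gauss_pdf(w_t, mu_p + Psi @ w_prev, Sigma_p)
rhs = kappa1 * gauss_pdf(w_prev, mu_prime + Psi_prime @ w_t, Sigma_prime)
```

The two densities agree at every \((\mathbf{w}_t, \mathbf{w}_{t-1})\) pair, not just the one tested here.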

(b)

By Exercise 5.7 and Exercise 5.9,

\[\kappa_1 \NormDist_{\mathbf{w}_{t - 1}}\left[ \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t, \boldsymbol{\Sigma}' \right] \NormDist_{\mathbf{w}_{t - 1}}\left[ \boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}_{t - 1} \right] = \kappa_1 \kappa_2 \NormDist_{\mathbf{w}_{t - 1}}\left[ \boldsymbol{\mu}'', \boldsymbol{\Sigma}'' \right]\]

where

\[\begin{split}\boldsymbol{\Sigma}'' &= \left( {\boldsymbol{\Sigma}'}^{-1} + \boldsymbol{\Sigma}_{t - 1}^{-1} \right)^{-1} \\\\ \boldsymbol{\mu}'' &= \boldsymbol{\Sigma}'' \left( {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t \right) + \boldsymbol{\Sigma}_{t - 1}^{-1} \boldsymbol{\mu}_{t - 1} \right) \\\\ \kappa_2 &= \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t}\left[ \boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right].\end{split}\]
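The product identity from Exercises 5.7 and 5.9 can also be verified pointwise: evaluated at any \(\mathbf{w}_{t-1}\), the product of the two densities equals \(\kappa_2\) times the reweighted density. In the sketch below, \(\mathbf{a}, \mathbf{A}\) stand in for \(\boldsymbol{\mu}' + \boldsymbol{\Psi}'\mathbf{w}_t, \boldsymbol{\Sigma}'\) and \(\mathbf{b}, \mathbf{B}\) for \(\boldsymbol{\mu}_{t-1}, \boldsymbol{\Sigma}_{t-1}\); all values are arbitrary:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    # Multivariate normal density (NumPy only).
    d = len(mu)
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / np.sqrt(
        (2 * np.pi) ** d * np.linalg.det(Sigma))

rng = np.random.default_rng(5)
a, b = rng.standard_normal(2), rng.standard_normal(2)
A = np.array([[0.6, 0.1], [0.1, 0.4]])
B = np.array([[0.8, -0.2], [-0.2, 0.5]])

# Sigma'' and mu'' from the product-of-Gaussians formulas above.
Sigma_pp = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))
mu_pp = Sigma_pp @ (np.linalg.solve(A, a) + np.linalg.solve(B, b))
kappa2 = gauss_pdf(a, b, A + B)

# Evaluate both sides of the identity at a random point.
w = rng.standard_normal(2)
lhs = gauss_pdf(w, a, A) * gauss_pdf(w, b, B)
rhs = kappa2 * gauss_pdf(w, mu_pp, Sigma_pp)
```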

(c)

\[\begin{split}& \kappa_1 \kappa_2\\ &= \frac{ \left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2} }{ \left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2} } \exp\left[ (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \left( \boldsymbol{\Sigma}_p^{-1} - \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right) (\mathbf{w}_t - \boldsymbol{\mu}_p) \right]^{-0.5} \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t}\left[ \boldsymbol{\mu}_{t - 1}, \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right]\\ &= \frac{ \left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2} \exp\left[ (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \left( \boldsymbol{\Sigma}_p^{-1} - \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right) (\mathbf{w}_t - \boldsymbol{\mu}_p) + \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)^\top \left( \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right)^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right) \right]^{-0.5} }{ (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2} \left\vert \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right\vert^{1 / 2} }\\ &= \frac{1}{ (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma}_p + \boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top \right\vert^{1 / 2} } \exp\left[ \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right)^\top \left( \boldsymbol{\Sigma}_p + \boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top \right)^{-1} \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right) \right]^{-0.5} & \quad & \text{(c.1), (c.2)}\end{split}\]

(c.1)

\[\begin{split}\frac{ \left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2} }{ \left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2} \left\vert \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right\vert^{1 / 2} } &= \left( \left\vert \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right\vert \left\vert \boldsymbol{\Sigma}_p \right\vert \left\vert \left( \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right)^{-1} + \boldsymbol{\Sigma}_{t - 1} \right\vert \right)^{-1 / 2} & \quad & \text{(C.11)}\\ &= \left( \left\vert \boldsymbol{\Sigma}_p \right\vert \left\vert \mathbf{I} + \left( \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right) \boldsymbol{\Sigma}_{t - 1} \right\vert \right)^{-1 / 2} & \quad & \text{(C.10)}\\ &= \left( \left\vert \boldsymbol{\Sigma}_p \right\vert \left\vert \mathbf{I} + \boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right\vert \right)^{-1 / 2} & \quad & \text{Sylvester's Determinant Theorem}\\ &= \left\vert \boldsymbol{\Sigma}_p + \boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top \right\vert^{-1 / 2} & \quad & \text{(C.10)}\end{split}\]
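Identity (c.1) is easy to confirm numerically for a square, invertible \(\boldsymbol{\Psi}\) (the derivation only needs \(\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}\) to be invertible); the random SPD matrices below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(d):
    # Random symmetric positive-definite matrix.
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

d = 3
Sigma_p = random_spd(d)
Sigma_prev = random_spd(d)          # plays the role of Sigma_{t-1}
Psi = rng.standard_normal((d, d))   # square, almost surely invertible

Sigma_prime = np.linalg.inv(Psi.T @ np.linalg.inv(Sigma_p) @ Psi)

# Left side: |Sigma'|^{1/2} / (|Sigma_p|^{1/2} |Sigma' + Sigma_{t-1}|^{1/2}).
lhs = np.sqrt(np.linalg.det(Sigma_prime)) / np.sqrt(
    np.linalg.det(Sigma_p) * np.linalg.det(Sigma_prime + Sigma_prev))

# Right side: |Sigma_p + Psi Sigma_{t-1} Psi^T|^{-1/2}.
rhs = 1.0 / np.sqrt(np.linalg.det(Sigma_p + Psi @ Sigma_prev @ Psi.T))
```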

(c.2)

\[\begin{split}& \exp\left[ (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \left( \boldsymbol{\Sigma}_p^{-1} - \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right) (\mathbf{w}_t - \boldsymbol{\mu}_p) + \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)^\top \left( \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right)^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right) \right]^{-0.5}\\ &= \exp\left[ \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right)^\top \left( \boldsymbol{\Sigma}_p^{-1} - \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \left( \boldsymbol{\Sigma}_{t - 1}^{-1} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right)^{-1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right) \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right) \right]^{-0.5} & \quad & \text{(c.3)}\\ &= \exp\left[ \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right)^\top \left( \boldsymbol{\Sigma}_p + \boldsymbol{\Psi} \boldsymbol{\Sigma}_{t - 1} \boldsymbol{\Psi}^\top \right)^{-1} \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right) \right]^{-0.5} & \quad & \text{(C.61)}\end{split}\]

(c.3)

Notice that the summands can be decomposed as

\[\begin{split}& (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \left( \boldsymbol{\Sigma}_p^{-1} - \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right) (\mathbf{w}_t - \boldsymbol{\mu}_p)\\ &= (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \boldsymbol{\Sigma}_p^{-1} (\mathbf{w}_t - \boldsymbol{\mu}_p) - (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} (\mathbf{w}_t - \boldsymbol{\mu}_p)\\ &= (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \boldsymbol{\Sigma}_p^{-1} (\mathbf{w}_t - \boldsymbol{\mu}_p) - (\boldsymbol{\Psi}' \mathbf{w}_t + \boldsymbol{\mu}')^\top {\boldsymbol{\Sigma}'}^{-1} (\boldsymbol{\Psi}' \mathbf{w}_t + \boldsymbol{\mu}')\end{split}\]

and

\[\begin{split}& \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)^\top \left( \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right)^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)\\ &= \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)^\top \left( {\boldsymbol{\Sigma}'}^{-1} - {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\Sigma}_{t - 1}^{-1} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right)^{-1} {\boldsymbol{\Sigma}'}^{-1} \right) \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right) & \quad & \text{(c.4)}\\ &= \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)^\top {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right) - \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)^\top {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\Sigma}_{t - 1}^{-1} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right)^{-1} {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)\\ &= \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)^\top {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right) - \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right)^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \left( \boldsymbol{\Sigma}_{t - 1}^{-1} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right)^{-1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right).\end{split}\]

Since

\[\begin{split}& \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)^\top {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)\\ &= \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t \right)^\top {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t \right) - \boldsymbol{\mu}_{t - 1}^\top {\boldsymbol{\Sigma}'}^{-1} \left( 2 \boldsymbol{\mu}' + 2 \boldsymbol{\Psi}' \mathbf{w}_t - \boldsymbol{\mu}_{t - 1} \right)\\ &= \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t \right)^\top {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_t \right) - \boldsymbol{\mu}_{t - 1}^\top \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \left( 2 \mathbf{w}_t - 2 \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right)\end{split}\]

and

\[\begin{split}& (\mathbf{w}_t - \boldsymbol{\mu}_p)^\top \boldsymbol{\Sigma}_p^{-1} (\mathbf{w}_t - \boldsymbol{\mu}_p) - \boldsymbol{\mu}_{t - 1}^\top \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \left( 2 \mathbf{w}_t - 2 \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right)\\ &= \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right)^\top \boldsymbol{\Sigma}_p^{-1} \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right),\end{split}\]

the sum of the original summands is

\[\left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right)^\top \left( \boldsymbol{\Sigma}_p^{-1} - \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \left( \boldsymbol{\Sigma}_{t - 1}^{-1} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right)^{-1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right) \left( \mathbf{w}_t - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_{t - 1} \right).\]

(c.4)

See Exercise 5.9 for more details.

\[\begin{split}\left( \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right)^{-1} &= {\boldsymbol{\Sigma}'}^{-1} - \left( \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{t - 1} \right)^{-1} \boldsymbol{\Sigma}_{t - 1} {\boldsymbol{\Sigma}'}^{-1}\\ &= {\boldsymbol{\Sigma}'}^{-1} - {\boldsymbol{\Sigma}'}^{-1} \left( \mathbf{I} + \boldsymbol{\Sigma}_{t - 1} {\boldsymbol{\Sigma}'}^{-1} \right)^{-1} \boldsymbol{\Sigma}_{t - 1} {\boldsymbol{\Sigma}'}^{-1}\\ &= {\boldsymbol{\Sigma}'}^{-1} - {\boldsymbol{\Sigma}'}^{-1} \left[ \boldsymbol{\Sigma}_{t - 1} \left( \boldsymbol{\Sigma}_{t - 1}^{-1} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right) \right]^{-1} \boldsymbol{\Sigma}_{t - 1} {\boldsymbol{\Sigma}'}^{-1}\\ &= {\boldsymbol{\Sigma}'}^{-1} - {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\Sigma}_{t - 1}^{-1} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right)^{-1} {\boldsymbol{\Sigma}'}^{-1}\end{split}\]
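This matrix-inversion identity (a special case of the Woodbury identity, using \({\boldsymbol{\Sigma}'}^{-1} = \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi}\)) can be verified numerically; the matrices below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_spd(d):
    # Random symmetric positive-definite matrix.
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

d = 3
Sigma_p = random_spd(d)
Sigma_prev = random_spd(d)          # plays the role of Sigma_{t-1}
Psi = rng.standard_normal((d, d))

Sp_inv = Psi.T @ np.linalg.inv(Sigma_p) @ Psi   # {Sigma'}^{-1}
Sigma_prime = np.linalg.inv(Sp_inv)

lhs = np.linalg.inv(Sigma_prime + Sigma_prev)
rhs = Sp_inv - Sp_inv @ np.linalg.inv(
    np.linalg.inv(Sigma_prev) + Sp_inv) @ Sp_inv
```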

Exercise 19.2

\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) &= \frac{ Pr(\mathbf{w}_t, \mathbf{x}_{1 \ldots t}) }{ Pr(\mathbf{x}_{1 \ldots t}) }\\ &= \frac{ Pr(\mathbf{x}_t \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) Pr(\mathbf{x}_{1 \ldots t - 1}) }{ Pr(\mathbf{x}_t \mid \mathbf{x}_{1 \ldots t - 1}) Pr(\mathbf{x}_{1 \ldots t - 1}) }\\ &= \frac{ Pr(\mathbf{x}_t \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) }{ \int Pr(\mathbf{x}_t, \mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) d\mathbf{w}_t }\\ &= \frac{ \NormDist_{\mathbf{x}_t}\left[ \boldsymbol{\mu}_m + \boldsymbol{\Phi} \mathbf{w}_t, \boldsymbol{\Sigma}_m \right] \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_+, \boldsymbol{\Sigma}_+ \right] }{ \int Pr(\mathbf{x}_t \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) d\mathbf{w}_t } & \quad & \text{(19.8), (19.9)}\\ &= \frac{ \kappa_1 \kappa_2 \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t \right] }{ \int \kappa_1 \kappa_2 \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t \right] d\mathbf{w}_t } & \quad & \text{(a), (b)}\\ &= \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t \right]\end{split}\]

(a)

Suppose \(\mathbf{x}_\cdot \in \mathbb{R}^n\) and \(\mathbf{w}_\cdot \in \mathbb{R}^m\). By Exercise 5.10,

\[\NormDist_{\mathbf{x}_t}\left[ \boldsymbol{\mu}_m + \boldsymbol{\Phi} \mathbf{w}_t, \boldsymbol{\Sigma}_m \right] = \kappa_1 \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t, \boldsymbol{\Sigma}' \right]\]

where

\[\begin{split}\boldsymbol{\Sigma}' &= \left( \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \boldsymbol{\Phi} \right)^{-1} \\\\ \boldsymbol{\Phi}' &= \boldsymbol{\Sigma}' \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \\\\ \boldsymbol{\mu}' &= -\boldsymbol{\Sigma}' \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \boldsymbol{\mu}_m \\\\ \kappa_1 &= (2 \pi)^{(m - n) / 2} \frac{ \left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2} }{ \left\vert \boldsymbol{\Sigma}_m \right\vert^{1 / 2} } \exp\left[ -0.5 (\mathbf{x}_t - \boldsymbol{\mu}_m)^\top \left( \boldsymbol{\Sigma}_m^{-1} - \boldsymbol{\Sigma}_m^{-1} \boldsymbol{\Phi} \boldsymbol{\Sigma}' \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \right) (\mathbf{x}_t - \boldsymbol{\mu}_m) \right].\end{split}\]

(b)

By Exercise 5.7 and Exercise 5.9,

\[\kappa_1 \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t, \boldsymbol{\Sigma}' \right] \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_+, \boldsymbol{\Sigma}_+ \right] = \kappa_1 \kappa_2 \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t \right]\]

where

\[\begin{split}\boldsymbol{\Sigma}_t &= \left( {\boldsymbol{\Sigma}'}^{-1} + \boldsymbol{\Sigma}_+^{-1} \right)^{-1} = \left( \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \boldsymbol{\Phi} + \boldsymbol{\Sigma}_+^{-1} \right)^{-1} \\\\ \boldsymbol{\mu}_t &= \boldsymbol{\Sigma}_t \left( {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t \right) + \boldsymbol{\Sigma}_+^{-1} \boldsymbol{\mu}_+ \right) = \boldsymbol{\Sigma}_t \left( \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \left( \mathbf{x}_t - \boldsymbol{\mu}_m \right) + \boldsymbol{\Sigma}_+^{-1} \boldsymbol{\mu}_+ \right) \\\\ \kappa_2 &= \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t}\left[ \boldsymbol{\mu}_+, \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_+ \right].\end{split}\]
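The precision-form update above is algebraically equivalent to the familiar Kalman-gain form of the measurement incorporation step, which avoids inverting \(\boldsymbol{\Sigma}_+\). A sketch with arbitrary parameter values, comparing the two forms:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical dimensions: 2-D state w, 3-D measurement x.
m, n = 2, 3
Phi = rng.standard_normal((n, m))
mu_m = rng.standard_normal(n)
Sigma_m = np.diag([0.3, 0.2, 0.5])
mu_plus = rng.standard_normal(m)
A = rng.standard_normal((m, m))
Sigma_plus = A @ A.T + np.eye(m)
x_t = rng.standard_normal(n)

# Precision (information) form from Exercise 19.2.
Sm_inv = np.linalg.inv(Sigma_m)
Sigma_t = np.linalg.inv(Phi.T @ Sm_inv @ Phi + np.linalg.inv(Sigma_plus))
mu_t = Sigma_t @ (Phi.T @ Sm_inv @ (x_t - mu_m)
                  + np.linalg.inv(Sigma_plus) @ mu_plus)

# Kalman-gain form: should agree exactly.
K = Sigma_plus @ Phi.T @ np.linalg.inv(Phi @ Sigma_plus @ Phi.T + Sigma_m)
mu_t_gain = mu_plus + K @ (x_t - mu_m - Phi @ mu_plus)
Sigma_t_gain = (np.eye(m) - K @ Phi) @ Sigma_plus
```

The gain form only inverts an \(n \times n\) matrix in measurement space, which is why it is usually preferred in practice.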

Exercise 19.3

\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) &= \frac{ \NormDist_{\mathbf{x}_t}\left[ \boldsymbol{\mu}_m + \boldsymbol{\Phi} \mathbf{w}_t, \boldsymbol{\Sigma}_m \right] \sum_{k = 1}^K \lambda_k \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{+k}, \boldsymbol{\Sigma}_{+k} \right] }{ \int Pr(\mathbf{x}_t \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) d\mathbf{w}_t } & \quad & \text{(19.8) and Exercise 19.2}\\ &= \frac{ \kappa \sum_{k = 1}^K \kappa_k \lambda_k \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk} \right] }{ \kappa \sum_{k = 1}^K \kappa_{k} \lambda_k } & \quad & \text{(a), (b)}\\ &= \sum_{k = 1}^K \lambda'_k \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk} \right] & \quad & \lambda'_k = \frac{ \kappa_k \lambda_k }{ \sum_{k' = 1}^K \kappa_{k'} \lambda_{k'} }.\end{split}\]

See Exercise 19.2 for more details.

In the next time update step, the prediction becomes

\[\begin{split}Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots t}) &= \int Pr(\mathbf{w}_{t + 1}, \mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) d\mathbf{w}_t & \quad & \text{(2.1)}\\ &= \int Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) d\mathbf{w}_t & \quad & \text{Markov assumption}\\ &= \int \NormDist_{\mathbf{w}_{t + 1}}\left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_t, \boldsymbol{\Sigma}_p \right] \sum_{k = 1}^K \lambda'_k \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk} \right] d\mathbf{w}_t & \quad & \text{(19.6) and Exercise 19.1}\\ &= \sum_{k = 1}^K \lambda'_k \int \NormDist_{\mathbf{w}_{t + 1}}\left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_t, \boldsymbol{\Sigma}_p \right] \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk} \right] d\mathbf{w}_t & \quad & \text{sum rule in integration}\\ &= \sum_{k = 1}^K \lambda'_k \NormDist_{\mathbf{w}_{t + 1}}\left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_p + \boldsymbol{\Psi} \boldsymbol{\Sigma}_{tk} \boldsymbol{\Psi}^\top \right] & \quad & \text{(c) from Exercise 19.1}\\ &= \sum_{k = 1}^K \lambda'_k \NormDist_{\mathbf{w}_{t + 1}}\left[ \boldsymbol{\mu}_{+k}, \boldsymbol{\Sigma}_{+k} \right].\end{split}\]

See Exercise 19.1 for more details.

(a)

By (a) and (b) from Exercise 19.2,

\[\kappa \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t, \boldsymbol{\Sigma}' \right] \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{+k}, \boldsymbol{\Sigma}_{+k} \right] = \kappa \kappa_{k} \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk} \right]\]

where

\[\begin{split}\boldsymbol{\Sigma}_{tk} &= \left( {\boldsymbol{\Sigma}'}^{-1} + \boldsymbol{\Sigma}_{+k}^{-1} \right)^{-1} = \left( \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \boldsymbol{\Phi} + \boldsymbol{\Sigma}_{+k}^{-1} \right)^{-1} \\\\ \boldsymbol{\mu}_{tk} &= \boldsymbol{\Sigma}_{tk} \left( {\boldsymbol{\Sigma}'}^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t \right) + \boldsymbol{\Sigma}_{+k}^{-1} \boldsymbol{\mu}_{+k} \right) = \boldsymbol{\Sigma}_{tk} \left( \boldsymbol{\Phi}^\top \boldsymbol{\Sigma}_m^{-1} \left( \mathbf{x}_t - \boldsymbol{\mu}_m \right) + \boldsymbol{\Sigma}_{+k}^{-1} \boldsymbol{\mu}_{+k} \right) \\\\ \kappa_{k} &= \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Phi}' \mathbf{x}_t}\left[ \boldsymbol{\mu}_{+k}, \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_{+k} \right].\end{split}\]

(b)

\[\begin{split}\int \kappa \sum_{k = 1}^K \kappa_{k} \lambda_k \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk} \right] d\mathbf{w}_t &= \kappa \sum_{k = 1}^K \int \kappa_{k} \lambda_k \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{tk}, \boldsymbol{\Sigma}_{tk} \right] d\mathbf{w}_t & \quad & \text{sum rule in integration}\\ &= \kappa \sum_{k = 1}^K \kappa_{k} \lambda_k & \quad & \text{each normal density integrates to one}\end{split}\]
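Since the common factor \(\kappa\) cancels in the normalization, each updated weight is the old weight multiplied by that component's evidence: a standard linear-Gaussian marginalization gives \(\kappa \kappa_k = \NormDist_{\mathbf{x}_t}[\boldsymbol{\mu}_m + \boldsymbol{\Phi}\boldsymbol{\mu}_{+k}, \boldsymbol{\Sigma}_m + \boldsymbol{\Phi}\boldsymbol{\Sigma}_{+k}\boldsymbol{\Phi}^\top]\). A sketch of the weight update with arbitrary values:

```python
import numpy as np

rng = np.random.default_rng(4)

def gauss_pdf(x, mu, Sigma):
    # Multivariate normal density (NumPy only).
    d = len(mu)
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / np.sqrt(
        (2 * np.pi) ** d * np.linalg.det(Sigma))

# Hypothetical 3-component mixture prior over a 2-D state.
K = 3
lam = np.array([0.5, 0.3, 0.2])
mus = [rng.standard_normal(2) for _ in range(K)]
Sigmas = [np.eye(2) * s for s in (0.4, 0.7, 1.0)]

# Hypothetical measurement model x = mu_m + Phi w + noise.
Phi = np.array([[1.0, 0.2], [0.0, 1.0]])
mu_m = np.zeros(2)
Sigma_m = 0.3 * np.eye(2)
x_t = np.array([0.5, -0.4])

# Per-component evidence of x_t, i.e. kappa * kappa_k.
ev = np.array([
    gauss_pdf(x_t, mu_m + Phi @ mus[k], Sigma_m + Phi @ Sigmas[k] @ Phi.T)
    for k in range(K)])

# Updated mixture weights lambda'_k.
lam_new = lam * ev / np.sum(lam * ev)
```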

Exercise 19.4

Max-marginals inference is essentially (10.16):

\[\DeclareMathOperator*{\argmax}{arg\,max} \hat{\mathbf{w}} = \argmax_{\mathbf{w}_t} Pr(\mathbf{w}_t \mid \mathbf{w}_{t - 1}).\]

The temporal model could still be (19.5) where \(\boldsymbol{\Psi} = \boldsymbol{\Psi}_1\) or \(\boldsymbol{\Psi} = \boldsymbol{\Psi}_2\).

A simple strategy is to choose the state transition matrix that maximizes the likelihood at the current time step [GH00].

Exercise 19.5

The joint posterior distribution can be factorized into an HMM (11.1), which can be solved in \(\mathcal{O}(TK^2)\) using the Viterbi algorithm where \(K\) is the number of possible states (see Exercise 11.2 for more details).
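A minimal Viterbi implementation in log space illustrates the \(\mathcal{O}(TK^2)\) cost: the maximization over the previous state is a \(K \times K\) operation per time step. The function and its argument names are illustrative:

```python
import numpy as np

def viterbi(log_prior, log_trans, log_like):
    """MAP state sequence for a discrete-state HMM.

    log_prior: (K,) log Pr(w_1); log_trans: (K, K) log Pr(w_t | w_{t-1});
    log_like: (T, K) log Pr(x_t | w_t).  Runs in O(T K^2) time.
    """
    T, K = log_like.shape
    score = log_prior + log_like[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in state i, then moving to j.
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_like[t]
    # Backtrack from the best final state.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```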

In the Kalman filter, \(T\) grows as more measurements are taken, so computing the marginal posteriors is preferred because they can be updated recursively in closed form.

Exercise 19.6

The following are based on Section 11.4.4 and [Sch].

The forward pass starts with

\[\mathbf{m}_{\mathbf{x}_1 \rightarrow g_1} = \delta[\mathbf{x}_1^*] \qquad \text{(11.36).}\]

The message is then forwarded as

\[\begin{split}\mathbf{m}_{g_1 \rightarrow \mathbf{w}_1} &= \int Pr(\mathbf{x}_1 \mid \mathbf{w}_1) \delta\left[ \mathbf{x}_1^* \right] d\mathbf{x}_1\\ &= Pr(\mathbf{x}_1 = \mathbf{x}_1^* \mid \mathbf{w}_1) & \quad & \text{(11.37).}\end{split}\]

Generalizing the message yields the measurement model

\[\mathbf{m}_{g_t \rightarrow \mathbf{w}_t} = Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{w}_t) \qquad \text{(19.8).}\]

At time step \(t = 1\), the normalization is arbitrary, as suggested in the paragraph after (19.16), where

\[\begin{split}Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{w}_t) &= \frac{ Pr(\mathbf{x}_t = \mathbf{x}_t^*, \mathbf{w}_t) }{ Pr(\mathbf{w}_t) }\\ &= \frac{ Pr(\mathbf{w}_t \mid \mathbf{x}_t = \mathbf{x}_t^*) Pr(\mathbf{x}_t = \mathbf{x}_t^*) }{ Pr(\mathbf{w}_t) }\\ Pr(\mathbf{w}_t \mid \mathbf{x}_t = \mathbf{x}_t^*) &= \frac{ Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{w}_t) Pr(\mathbf{w}_t) }{ Pr(\mathbf{x}_t = \mathbf{x}_t^*) } & \quad & \text{(19.1).}\end{split}\]

This means the first hidden variable adds prior information and forwards the message normalized as

\[\begin{split}\mathbf{m}_{\mathbf{w}_1 \rightarrow g_{12}} &= \mathbf{m}_{g_1 \rightarrow \mathbf{w}_1} \frac{Pr(\mathbf{w}_1)}{Pr(\mathbf{x}_1 = \mathbf{x}_1^*)}\\ &= \frac{ Pr(\mathbf{x}_1 = \mathbf{x}_1^* \mid \mathbf{w}_1) Pr(\mathbf{w}_1) }{ Pr(\mathbf{x}_1 = \mathbf{x}_1^*) }\\ &= Pr(\mathbf{w}_1 \mid \mathbf{x}_1 = \mathbf{x}_1^*) & \quad & \text{(11.35).}\end{split}\]

Generalizing what the function node (at \(t > 1\)) forwards yields the prediction step

\[\begin{split}\mathbf{m}_{g_{t - 1, t} \rightarrow \mathbf{w}_t} &= \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t - 1}) Pr(\mathbf{w}_{t - 1} \mid \mathbf{x}_{1 \ldots t - 1}) d\mathbf{w}_{t - 1}\\ &= Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) & \quad & \text{(11.37), (19.9).}\end{split}\]

Generalizing what the unobserved variable (at \(t > 1\)) forwards yields the measurement incorporation step

\[\begin{split}\mathbf{m}_{\mathbf{w}_t \rightarrow g_{t, t + 1}} &= \frac{ \mathbf{m}_{g_{t} \rightarrow \mathbf{w}_t} \mathbf{m}_{g_{t - 1, t} \rightarrow \mathbf{w}_t} }{ Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{x}_{1 \ldots t - 1}) }\\ &= \frac{ Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t - 1}) }{ Pr(\mathbf{x}_t = \mathbf{x}_t^* \mid \mathbf{x}_{1 \ldots t - 1}) }\\ &= Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) & \quad & \text{(11.35), (19.10).}\end{split}\]

Notice that the backward pass is not needed because the forward pass propagates normalized messages.
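The full forward pass assembles these messages into the familiar filter loop. The sketch below uses the gain form of the measurement incorporation step and treats `mu0`, `Sigma0` as a prior on the first state; all names and values are illustrative:

```python
import numpy as np

def kalman_filter(xs, mu_p, Psi, Sigma_p, mu_m, Phi, Sigma_m, mu0, Sigma0):
    """Forward pass: alternate prediction and measurement incorporation.

    Returns the filtering moments of Pr(w_t | x_{1..t}) at each step.
    """
    mu, Sigma = mu0, Sigma0
    out = []
    for x in xs:
        # Prediction step (time update).
        mu_plus = mu_p + Psi @ mu
        Sigma_plus = Sigma_p + Psi @ Sigma @ Psi.T
        # Measurement incorporation step via the Kalman gain.
        K = Sigma_plus @ Phi.T @ np.linalg.inv(
            Phi @ Sigma_plus @ Phi.T + Sigma_m)
        mu = mu_plus + K @ (x - mu_m - Phi @ mu_plus)
        Sigma = (np.eye(len(mu)) - K @ Phi) @ Sigma_plus
        out.append((mu, Sigma))
    return out
```

For a 1-D random walk observed with noise, repeated identical observations drive the filtering mean toward the observed value while the covariance settles at its steady state.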

Exercise 19.7

By inspection, the fixed-interval smoother runs after the Kalman filter, i.e., it waits until \(T\) observations have been made and then retrospectively calculates \(Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots T})\) for \(t < T\).

The base case of this inductive proof is

\[\begin{split}Pr(\mathbf{w}_T \mid \mathbf{x}_{1 \ldots T}) &= \frac{ Pr(\mathbf{x}_T \mid \mathbf{w}_T) Pr(\mathbf{w}_T \mid \mathbf{x}_{1 \ldots T - 1}) }{ Pr(\mathbf{x}_{1 \ldots T}) }\\ &= \NormDist_{\mathbf{w}_T}\left[ \boldsymbol{\mu}_{T \mid T}, \boldsymbol{\Sigma}_{T \mid T} \right] & \quad & \text{(19.10).}\end{split}\]

Insights from [Fle] suggest that D-separation should be invoked. The inductive step is then

\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots T}) &= \int Pr(\mathbf{w}_{t + 1}, \mathbf{w}_t \mid \mathbf{x}_{1 \ldots T}) d\mathbf{w}_{t + 1} & \quad & \text{(2.1)}\\ &= \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots T}) Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots T}) d\mathbf{w}_{t + 1} & \quad & \text{(2.6) with Markov assumption}\\ &= \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t}) Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots T}) d\mathbf{w}_{t + 1} & \quad & \text{D-separation}\\ &= \int \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}'_{t + 1}, \boldsymbol{\Sigma}'_{t + 1} \right] \NormDist_{\mathbf{w}_{t + 1}}\left[ \boldsymbol{\mu}_{t + 1 \mid T}, \boldsymbol{\Sigma}_{t + 1 \mid T} \right] d\mathbf{w}_{t + 1} & \quad & \text{(a)}\\ &= \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{t \mid T}, \boldsymbol{\Sigma}_{t \mid T} \right] & \quad & \text{(b).}\end{split}\]

(a)

\[\begin{split}Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t}) &= \frac{ Pr(\mathbf{w}_t, \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t}) }{ Pr(\mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t}) } & \quad & \text{(2.4)}\\ &= \frac{ Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) Pr(\mathbf{x}_{1 \ldots t}) }{ Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots t}) Pr(\mathbf{x}_{1 \ldots t}) } & \quad & \text{(2.5)}\\ &= \frac{ Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) }{ \int Pr(\mathbf{w}_t, \mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots t}) d\mathbf{w}_{t} } & \quad & \text{(2.1)}\\ &= \frac{ Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) }{ \int Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) d\mathbf{w}_{t} } & \quad & \text{(2.6) with Markov assumption}\\ &= \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}'_{t + 1}, \boldsymbol{\Sigma}'_{t + 1} \right] & \quad & \text{(a.1)}\end{split}\]

(a.1)

\[\begin{split}Pr(\mathbf{w}_{t + 1} \mid \mathbf{w}_t) Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots t}) &= \NormDist_{\mathbf{w}_{t + 1}}\left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_t, \boldsymbol{\Sigma}_p \right] \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t \right] & \quad & \text{(19.6), (19.10)}\\ &= \kappa_1 \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_{t + 1}, \boldsymbol{\Sigma}' \right] \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_t, \boldsymbol{\Sigma}_t \right] & \quad & \text{(a.2)}\\ &= \kappa_1 \kappa_2 \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}'_{t + 1}, \boldsymbol{\Sigma}'_{t + 1} \right] & \quad & \text{Exercise 5.7 and 5.9}\end{split}\]

where

\[\begin{split}\boldsymbol{\Sigma}'_{t + 1} &= \left( \boldsymbol{\Sigma}'^{-1} + \boldsymbol{\Sigma}_t^{-1} \right)^{-1}\\ &= \left( \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} + \boldsymbol{\Sigma}_t^{-1} \right)^{-1} \\\\ \boldsymbol{\mu}'_{t + 1} &= \boldsymbol{\Sigma}'_{t + 1} \left( \boldsymbol{\Sigma}'^{-1} \left( \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_{t + 1} \right) + \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t \right)\\ &= \boldsymbol{\Sigma}'_{t + 1} \left( \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_p \right) + \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t \right)\\ &= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \mathbf{w}_{t + 1} - \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p + \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t \\\\ \kappa_2 &= \NormDist_{\boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_{t + 1}} \left[ \boldsymbol{\mu}_t, \boldsymbol{\Sigma}' + \boldsymbol{\Sigma}_t \right].\end{split}\]

See Exercise 5.7 and Exercise 5.9 for more details.

(a.2)

By Exercise 5.10,

\[\NormDist_{\mathbf{w}_{t + 1}}\left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_t, \boldsymbol{\Sigma}_p \right] = \kappa_1 \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}' + \boldsymbol{\Psi}' \mathbf{w}_{t + 1}, \boldsymbol{\Sigma}' \right]\]

where

\[\begin{split}\boldsymbol{\Sigma}' &= \left( \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \right)^{-1} \\\\ \boldsymbol{\Psi}' &= \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \\\\ \boldsymbol{\mu}' &= -\boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p \\\\ \kappa_1 &= \frac{ \left\vert \boldsymbol{\Sigma}' \right\vert^{1 / 2} }{ \left\vert \boldsymbol{\Sigma}_p \right\vert^{1 / 2} } \exp\left[ -0.5 (\mathbf{w}_{t + 1} - \boldsymbol{\mu}_p)^\top \left( \boldsymbol{\Sigma}_p^{-1} - \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}' \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \right) (\mathbf{w}_{t + 1} - \boldsymbol{\mu}_p) \right].\end{split}\]

(b)

The generative equations for the distributions from

\[Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots T}) = \int Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t}) Pr(\mathbf{w}_{t + 1} \mid \mathbf{x}_{1 \ldots T}) d\mathbf{w}_{t + 1}\]

are

\[\begin{split}\mathbf{w}_t &= \boldsymbol{\mu}'_{t + 1} + \boldsymbol{\epsilon}_{t + 1}\\ &= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \mathbf{w}_{t + 1} - \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p + \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t + \boldsymbol{\epsilon}_{t + 1} & \quad & \text{(a.1)}\\ &= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \mathbf{w}_{t + 1} - \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\mu}_p + \left( \boldsymbol{\Sigma}_t - \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}_t \right) \boldsymbol{\Sigma}_t^{-1} \boldsymbol{\mu}_t + \boldsymbol{\epsilon}_{t + 1} & \quad & \text{Exercise 5.9 (a)}\\ &= \boldsymbol{\mu}_t + \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \boldsymbol{\mu}_t \right) + \boldsymbol{\epsilon}_{t + 1}\\ &= \boldsymbol{\mu}_t + \mathbf{C}_t \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_{+ \mid t + 1} \right) + \boldsymbol{\epsilon}_{t + 1} & \quad & \text{(b.1) and (19.9)}\end{split}\]

and

\[\mathbf{w}_{t + 1} = \boldsymbol{\mu}_{t + 1 \mid T} + \boldsymbol{\epsilon}_{t + 1 \mid T}\]

where

\[\begin{split}\DeclareMathOperator{\Cov}{\mathrm{Cov}} \DeclareMathOperator{\E}{\mathrm{E}} \E[\boldsymbol{\epsilon}_{t + 1}] &= \E[\boldsymbol{\epsilon}_{t + 1 \mid T}] = \boldsymbol{0} \\\\ \Cov(\boldsymbol{\epsilon}_{t + 1}, \boldsymbol{\epsilon}_{t + 1}) &= \E\left[ \left( \boldsymbol{\epsilon}_{t + 1} - \E[\boldsymbol{\epsilon}_{t + 1}] \right) \left( \boldsymbol{\epsilon}_{t + 1} - \E[\boldsymbol{\epsilon}_{t + 1}] \right)^\top \right] = \E\left[ \boldsymbol{\epsilon}_{t + 1} \boldsymbol{\epsilon}_{t + 1}^\top \right] - \E[\boldsymbol{\epsilon}_{t + 1}] \E[\boldsymbol{\epsilon}_{t + 1}]^\top = \boldsymbol{\Sigma}'_{t + 1} \\\\ \Cov\left( \boldsymbol{\epsilon}_{t + 1 \mid T}, \boldsymbol{\epsilon}_{t + 1 \mid T} \right) &= \E\left[ \boldsymbol{\epsilon}_{t + 1 \mid T} \boldsymbol{\epsilon}_{t + 1 \mid T}^\top \right] - \E[\boldsymbol{\epsilon}_{t + 1 \mid T}] \E[\boldsymbol{\epsilon}_{t + 1 \mid T}]^\top = \boldsymbol{\Sigma}_{t + 1 \mid T} \\\\ \Cov(\boldsymbol{\epsilon}_{t + 1}, \boldsymbol{\epsilon}_{t + 1 \mid T}) &= \boldsymbol{0},\end{split}\]

which implies

\[\Cov(\mathbf{w}_{t + 1}, \boldsymbol{\epsilon}_{t + 1}) = \boldsymbol{0}.\]

These assumptions result in

\[Pr(\mathbf{w}_t \mid \mathbf{x}_{1 \ldots T}) = \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}_{t \mid T}, \boldsymbol{\Sigma}_{t \mid T} \right] \qquad \text{(b.3), (b.4).}\]

(b.1)

\[\begin{split}\mathbf{C}_t &= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}\\ &= \left( \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} + \boldsymbol{\Sigma}_t^{-1} \right)^{-1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1}\\ &= \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top \left( \boldsymbol{\Sigma}_p + \boldsymbol{\Psi} \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top \right)^{-1}\\ &= \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_{+ \mid t + 1}^{-1} & \quad & \text{(19.9)}\end{split}\]

To simplify notation, define \(A = \boldsymbol{\Sigma}_p\) and \(B = \boldsymbol{\Psi} \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top\). By Exercise 5.9 (a), \(\boldsymbol{\Sigma}_t = \boldsymbol{\Sigma}'_{t + 1} \left( \mathbf{I} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}_t \right)\).

\[\begin{split}\boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top (A + B)^{-1} &= \boldsymbol{\Sigma}'_{t + 1} \left( \mathbf{I} + \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}_t \right) \boldsymbol{\Psi}^\top (A + B)^{-1}\\ &= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top (A + B)^{-1} + \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top A^{-1} B (A + B)^{-1}\\ &= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top (A + B)^{-1} + \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top A^{-1} B \left( B^{-1} - (A + B)^{-1} A B^{-1} \right) & \quad & \text{Exercise 5.9 (a)}\\ &= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top A^{-1} + \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top (A + B)^{-1} - \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top A^{-1} B (A + B)^{-1} A B^{-1}\\ &= \boldsymbol{\Sigma}'_{t + 1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}_p^{-1} & \quad & \text{(b.2)}\end{split}\]
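The equivalence of the two expressions for \(\mathbf{C}_t\) is easy to sanity-check numerically. A minimal sketch in NumPy (the random SPD construction and the dimension are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4

def random_spd(d):
    # random symmetric positive-definite matrix
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

Sigma_p = random_spd(D)   # stand-in for Sigma_p
Sigma_t = random_spd(D)   # stand-in for Sigma_t
Psi = rng.standard_normal((D, D))
inv = np.linalg.inv

# form 1: Sigma'_{t+1} Psi^T Sigma_p^{-1}
Sigma_prime = inv(Psi.T @ inv(Sigma_p) @ Psi + inv(Sigma_t))
C1 = Sigma_prime @ Psi.T @ inv(Sigma_p)
# form 2: Sigma_t Psi^T (Sigma_p + Psi Sigma_t Psi^T)^{-1}
C2 = Sigma_t @ Psi.T @ inv(Sigma_p + Psi @ Sigma_t @ Psi.T)

assert np.allclose(C1, C2)
```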

(b.2)

\[\begin{split}A^{-1} B (A + B)^{-1} A B^{-1} &= A^{-1} B \left( A^{-1} - (A + B)^{-1} B A^{-1} \right) A B^{-1} & \quad & \text{Exercise 5.9 (a)}\\ &= A^{-1} B A^{-1} A B^{-1} - A^{-1} B (A + B)^{-1} B A^{-1} A B^{-1}\\ &= A^{-1} - A^{-1} B (A + B)^{-1}\\ A^{-1} B (A + B)^{-1} A B^{-1} (A + B) &= \left( A^{-1} - A^{-1} B (A + B)^{-1} \right) (A + B)\\ &= A^{-1} (A + B) - A^{-1} B\\ &= \mathbf{I}\\ A^{-1} B (A + B)^{-1} A B^{-1} &= (A + B)^{-1}\end{split}\]
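The (b.2) identity can likewise be checked numerically (an illustrative sketch; the matrix size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 3

def random_spd(d):
    # random symmetric positive-definite matrix (invertible)
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

A = random_spd(D)
B = random_spd(D)
inv = np.linalg.inv

lhs = inv(A) @ B @ inv(A + B) @ A @ inv(B)
rhs = inv(A + B)
assert np.allclose(lhs, rhs)
```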

(b.3)

\[\begin{split}\boldsymbol{\mu}_{t \mid T} &= \E[\mathbf{w}_t]\\ &= \boldsymbol{\mu}_t + \mathbf{C}_t \left( \E[\mathbf{w}_{t + 1}] - \boldsymbol{\mu}_{+ \mid t + 1} \right) + \E[\boldsymbol{\epsilon}_{t + 1}] & \quad & \text{(2.14), (2.15), (2.16)}\\ &= \boldsymbol{\mu}_t + \mathbf{C}_t \left( \boldsymbol{\mu}_{t + 1 \mid T} - \boldsymbol{\mu}_{+ \mid t + 1} \right)\end{split}\]

(b.4)

\[\begin{split}\boldsymbol{\Sigma}_{t \mid T} &= \Cov(\mathbf{w}_t, \mathbf{w}_t)\\ &= \E\left[ \left( \mathbf{w}_t - \E[\mathbf{w}_t] \right) \left( \mathbf{w}_t - \E[\mathbf{w}_t] \right)^\top \right]\\ &= \E\left[ \left( \mathbf{C}_t \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T} \right) + \boldsymbol{\epsilon}_{t + 1} \right) \left( \mathbf{C}_t \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T} \right) + \boldsymbol{\epsilon}_{t + 1} \right)^\top \right]\\ &= \mathbf{C}_t \E\left[ \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T} \right) \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T} \right)^\top \right] \mathbf{C}_t^\top + \mathbf{C}_t \E\left[ \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T} \right) \boldsymbol{\epsilon}_{t + 1}^\top \right] + \E\left[ \boldsymbol{\epsilon}_{t + 1} \left( \mathbf{w}_{t + 1} - \boldsymbol{\mu}_{t + 1 \mid T} \right)^\top \right] \mathbf{C}_t^\top + \E\left[ \boldsymbol{\epsilon}_{t + 1} \boldsymbol{\epsilon}_{t + 1}^\top \right]\\ &= \mathbf{C}_t \boldsymbol{\Sigma}_{t + 1 \mid T} \mathbf{C}_t^\top + \boldsymbol{\Sigma}'_{t + 1}\\ &= \mathbf{C}_t \boldsymbol{\Sigma}_{t + 1 \mid T} \mathbf{C}_t^\top + \left( \boldsymbol{\Sigma}_t - \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top \left( \boldsymbol{\Sigma}_p + \boldsymbol{\Psi} \boldsymbol{\Sigma}_t \boldsymbol{\Psi}^\top \right)^{-1} \boldsymbol{\Psi} \boldsymbol{\Sigma}_t \right) & \quad & \text{(C.61)}\\ &= \boldsymbol{\Sigma}_t + \mathbf{C}_t \boldsymbol{\Sigma}_{t + 1 \mid T} \mathbf{C}_t^\top - \mathbf{C}_t \boldsymbol{\Sigma}_{+ \mid t + 1} \mathbf{C}_t^\top & \quad & \text{(b.1)}\\ &= \boldsymbol{\Sigma}_t + \mathbf{C}_t \left( \boldsymbol{\Sigma}_{t + 1 \mid T} - \boldsymbol{\Sigma}_{+ \mid t + 1} \right) \mathbf{C}_t^\top\end{split}\]
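Together, (b.1), (b.3), and (b.4) form the backward recursion of the fixed-interval (Rauch-Tung-Striebel) smoother. A minimal NumPy sketch of one backward step using the notation above (the function name and shapes are illustrative, not the book's code):

```python
import numpy as np

def smooth_step(mu_t, Sigma_t, mu_next_T, Sigma_next_T, mu_p, Sigma_p, Psi):
    """One backward step of the fixed-interval smoother.

    mu_t, Sigma_t           : filtered moments of Pr(w_t | x_{1..t})
    mu_next_T, Sigma_next_T : smoothed moments of Pr(w_{t+1} | x_{1..T})
    mu_p, Sigma_p, Psi      : temporal model w_{t+1} = mu_p + Psi w_t + noise
    """
    # one-step-ahead prediction (19.9): Pr(w_{t+1} | x_{1..t})
    mu_plus = mu_p + Psi @ mu_t
    Sigma_plus = Sigma_p + Psi @ Sigma_t @ Psi.T
    # smoother gain (b.1)
    C = Sigma_t @ Psi.T @ np.linalg.inv(Sigma_plus)
    # smoothed moments (b.3), (b.4)
    mu_T = mu_t + C @ (mu_next_T - mu_plus)
    Sigma_T = Sigma_t + C @ (Sigma_next_T - Sigma_plus) @ C.T
    return mu_T, Sigma_T
```

Note that when the smoothed moments at \(t + 1\) coincide with the predicted moments, the step returns the filtered moments unchanged, as (b.3) and (b.4) require.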

Exercise 19.8

The graphical model for the Kalman filter is

\[Pr(\{ \mathbf{x}_n \}_{n = 1}^N, \{ \mathbf{w}_n \}_{n = 1}^N) = \left( \prod_{n = 1}^N Pr(\mathbf{x}_n \mid \mathbf{w}_n) \right) \left( \prod_{n = 2}^N Pr(\mathbf{w}_n \mid \mathbf{w}_{n - 1}) \right) Pr(\mathbf{w}_1) \qquad \text{(10.19), (11.1).}\]

[Arc] is good for verifying this previous result and Exercise 19.7. [Mac] could also serve to verify the results of this exercise.
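The factorization above is exactly what the standard filtering recursions exploit. As an illustrative sketch (not the book's code), one predict/update step of the Kalman filter in NumPy, with names following the chapter's notation:

```python
import numpy as np

def kalman_step(mu, Sigma, x, mu_p, Sigma_p, Psi, mu_m, Sigma_m, Phi):
    """One predict + measurement-update step of the Kalman filter.

    Temporal model (19.6):    w_t = mu_p + Psi w_{t-1} + noise(Sigma_p)
    Measurement model (19.8): x_t = mu_m + Phi w_t + noise(Sigma_m)
    """
    # prediction: Pr(w_t | x_{1..t-1})
    mu_plus = mu_p + Psi @ mu
    Sigma_plus = Sigma_p + Psi @ Sigma @ Psi.T
    # innovation covariance and Kalman gain
    S = Phi @ Sigma_plus @ Phi.T + Sigma_m
    K = Sigma_plus @ Phi.T @ np.linalg.inv(S)
    # measurement update: Pr(w_t | x_{1..t})
    mu_new = mu_plus + K @ (x - mu_m - Phi @ mu_plus)
    Sigma_new = Sigma_plus - K @ Phi @ Sigma_plus
    return mu_new, Sigma_new
```

In the scalar case with \(\boldsymbol{\Phi} = 1\) this reduces to the familiar precision-weighted average of prediction and measurement.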

Note that [BMM96] and [MBM96] add little here; rather than skimming those papers, just read the book's explanations.

(i)

This supervised learning scenario is a fully observed Markov model ([JB01]); i.e., the training set consists of \(I\) matched sets of states \(\{ \mathbf{w}_{in} \}_{i = 1, n = 1}^{I, N}\) and measurements \(\{ \mathbf{x}_{in} \}_{i = 1, n = 1}^{I, N}\).

Maximum likelihood (or another technique like maximum a posteriori and the Bayesian approach) can be applied to fit the parameters \(\boldsymbol{\theta} = \left\{ \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0, \boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p, \boldsymbol{\Psi}, \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m, \boldsymbol{\Phi} \right\}\) to the data:

\[\begin{split}\hat{\boldsymbol{\theta}} &= \argmax_{\boldsymbol{\theta}} \prod_{i = 1}^I Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N, \{ \mathbf{w}_{in} \}_{n = 1}^N \mid \boldsymbol{\theta}) & \quad & \text{(10.21)}\\ &= \argmax_{\boldsymbol{\theta}} \sum_{i = 1}^I \log Pr(\mathbf{w}_{i1} \mid \boldsymbol{\theta}) + \sum_{n = 1}^N \log Pr(\mathbf{x}_{in} \mid \mathbf{w}_{in}, \boldsymbol{\theta}) + \sum_{n = 2}^N \log Pr(\mathbf{w}_{in} \mid \mathbf{w}_{i(n - 1)}, \boldsymbol{\theta})\\ &= \argmax_{\boldsymbol{\theta}} \sum_{i = 1}^I \log \NormDist_{\mathbf{w}_{i1}}\left[ \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0 \right] + \sum_{n = 1}^N \log \NormDist_{\mathbf{x}_{in}} \left[ \boldsymbol{\mu}_m + \boldsymbol{\Phi} \mathbf{w}_{in}, \boldsymbol{\Sigma}_m \right] + \sum_{n = 2}^N \log \NormDist_{\mathbf{w}_{in}} \left[ \boldsymbol{\mu}_p + \boldsymbol{\Psi} \mathbf{w}_{i(n - 1)}, \boldsymbol{\Sigma}_p \right] & \quad & \text{(i.a), (19.6), (19.8)}\\ &= \argmax_{\boldsymbol{\theta}} -\frac{1}{2} \sum_{i = 1}^I D_w \log 2 \pi + \log \left\vert \boldsymbol{\Sigma}_0 \right\vert + \left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right)^\top \boldsymbol{\Sigma}_0^{-1} \left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right) +\\ &\qquad \sum_{n = 1}^N D_m \log 2 \pi + \log \left\vert \boldsymbol{\Sigma}_m \right\vert + \left( \mathbf{x}_{in} - \boldsymbol{\mu}_m - \boldsymbol{\Phi} \mathbf{w}_{in} \right)^\top \boldsymbol{\Sigma}_m^{-1} \left( \mathbf{x}_{in} - \boldsymbol{\mu}_m - \boldsymbol{\Phi} \mathbf{w}_{in} \right) +\\ &\qquad \sum_{n = 2}^N D_p \log 2 \pi + \log \left\vert \boldsymbol{\Sigma}_p \right\vert + \left( \mathbf{w}_{in} - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \mathbf{w}_{i(n - 1)} \right)^\top \boldsymbol{\Sigma}_p^{-1} \left( \mathbf{w}_{in} - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \mathbf{w}_{i(n - 1)} \right) & \quad & \text{(5.1)}\end{split}\]
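Because the objective decomposes over the three parameter groups, the maximization splits into the sample moments of \(\mathbf{w}_{i1}\) for \((\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)\) and two linear regressions (with Gaussian residual covariances) for the temporal and measurement models. A sketch of the temporal fit only; the helper name `fit_temporal` and the data layout are assumptions for illustration:

```python
import numpy as np

def fit_temporal(W):
    """ML fit of mu_p, Psi, Sigma_p from fully observed state sequences.

    W : array of shape (I, N, D) holding the states w_{in}.
    """
    I, N, D = W.shape
    prev = W[:, :-1].reshape(-1, D)   # all w_{i(n-1)}
    curr = W[:, 1:].reshape(-1, D)    # all w_{in}
    # regress w_n on [w_{n-1}, 1]; solves for Psi and mu_p jointly
    X = np.hstack([prev, np.ones((prev.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, curr, rcond=None)
    Psi, mu_p = coef[:D].T, coef[D]
    resid = curr - X @ coef
    Sigma_p = resid.T @ resid / resid.shape[0]  # ML (biased) covariance
    return mu_p, Psi, Sigma_p
```

The measurement parameters \((\boldsymbol{\mu}_m, \boldsymbol{\Phi}, \boldsymbol{\Sigma}_m)\) follow from the same regression pattern with \(\mathbf{x}_{in}\) regressed on \(\mathbf{w}_{in}\).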

(ii)

This unsupervised learning scenario treats the states \(\{ \mathbf{w}_{in} \}_{i = 1, n = 1}^{I, N}\) as hidden; only the measurements \(\{ \mathbf{x}_{in} \}_{i = 1, n = 1}^{I, N}\) are observed, resulting in

\[\begin{split}\hat{\boldsymbol{\theta}} &= \argmax_{\boldsymbol{\theta}} \prod_{i = 1}^I Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N \mid \boldsymbol{\theta})\\ &= \argmax_{\boldsymbol{\theta}} \prod_{i = 1}^I \int Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N, \mathbf{h}_i \mid \boldsymbol{\theta}) d\mathbf{h}_i\end{split}\]

where \(\mathbf{h}_i = \{ \mathbf{w}_{in} \}_{n = 1}^N\), which can be solved using the EM algorithm [Par].

The E-step consists of computing the posterior distribution over the states for each time sequence

\[\begin{split}q_i(\mathbf{h}_i) &= Pr(\mathbf{h}_i \mid \{ \mathbf{x}_{in} \}_{n = 1}^N, \boldsymbol{\theta})\\ &= Pr(\{ \mathbf{w}_{in} \}_{n = 1}^N \mid \{ \mathbf{x}_{in} \}_{n = 1}^N, \boldsymbol{\theta})\\ &= Pr(\mathbf{w}_{iN} \mid \{ \mathbf{x}_{in} \}_{n = 1}^N, \boldsymbol{\theta}) \prod_{m = 1}^{N - 1} Pr(\mathbf{w}_{i(N - m)} \mid \mathbf{w}_{i(N - m + 1)}, \{ \mathbf{x}_{in} \}_{n = 1}^{N - m}, \boldsymbol{\theta}) & \quad & \text{Exercise 19.7 (a),}\end{split}\]

which can be computed using the terms that result from running the Kalman filter followed by the Kalman fixed-interval smoother. See Exercise 19.7 for more details. It is important to realize that \(q_i(\mathbf{h}_i)\) itself is not used directly in the M-step; the E-step's purpose is to estimate the expected value and covariance of each hidden variable

\[Pr(\mathbf{w}_t \mid \mathbf{w}_{t + 1}, \mathbf{x}_{1 \ldots t}) = \NormDist_{\mathbf{w}_t}\left[ \boldsymbol{\mu}'_{t + 1}, \boldsymbol{\Sigma}'_{t + 1} \right].\]

Since no prior knowledge is available beyond the Gaussian assumptions, the initial parameters can be chosen randomly.

In the M-step, the lower bound is maximized with respect to the parameters \(\boldsymbol{\theta} = \left\{ \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0, \boldsymbol{\mu}_p, \boldsymbol{\Sigma}_p, \boldsymbol{\Psi}, \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m, \boldsymbol{\Phi} \right\}\) so that

\[\begin{split}\DeclareMathOperator{\tr}{\mathrm{tr}} \boldsymbol{\theta}^{[t + 1]} &= \argmax_{\boldsymbol{\theta}} \sum_{i = 1}^I \int q_i^{[t]}(\mathbf{h}_i) \log Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N, \mathbf{h}_i \mid \boldsymbol{\theta}) d\mathbf{h}_i & \quad & \text{(7.51)}\\ &= \argmax_{\boldsymbol{\theta}} \sum_{i = 1}^I \E\left[ \log Pr(\{ \mathbf{x}_{in} \}_{n = 1}^N, \{ \mathbf{w}_{in} \}_{n = 1}^N \mid \boldsymbol{\theta}) \right]\\ &= \argmax_{\boldsymbol{\theta}} -\frac{1}{2} \left( C + I \log \left\vert \boldsymbol{\Sigma}_0 \right\vert + I N \log \left\vert \boldsymbol{\Sigma}_m \right\vert + I (N - 1) \log \left\vert \boldsymbol{\Sigma}_p \right\vert + \tr\left( \E[Z] \boldsymbol{\Sigma}_0^{-1} \right) + \tr\left( \E[M] \boldsymbol{\Sigma}_m^{-1} \right) + \tr\left( \E[P] \boldsymbol{\Sigma}_p^{-1} \right) \right) & \quad & \text{(i), (ii.a), (ii.b), (ii.c).}\end{split}\]
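Setting the derivative of the bound with respect to each covariance to zero then yields the usual closed forms, e.g. \(\boldsymbol{\Sigma}_0 = \E[Z] / I\). A quick numerical check (with an arbitrary SPD stand-in for \(\E[Z]\); the dimensions are illustrative) that this choice minimizes the \(\boldsymbol{\Sigma}_0\)-dependent terms:

```python
import numpy as np

rng = np.random.default_rng(2)
D, I = 3, 10
A = rng.standard_normal((D, D))
Z = A @ A.T + D * np.eye(D)   # stand-in for E[Z] (symmetric positive definite)

def objective(Sigma):
    # the Sigma_0-dependent part of the negative lower bound
    sign, logdet = np.linalg.slogdet(Sigma)
    return I * logdet + np.trace(Z @ np.linalg.inv(Sigma))

Sigma_hat = Z / I             # claimed closed-form minimizer
base = objective(Sigma_hat)
for _ in range(100):
    P = rng.standard_normal((D, D)) * 0.01
    perturbed = Sigma_hat + (P + P.T) / 2   # small symmetric perturbation
    assert objective(perturbed) >= base
```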

(ii.a)

\[C = I D_w \log 2 \pi + I N D_m \log 2 \pi + I (N - 1) D_p \log 2 \pi\]

and

\[\begin{split}\sum_{i = 1}^I \left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right)^\top \boldsymbol{\Sigma}_0^{-1} \left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right) &= \tr\left[ \sum_{i = 1}^I \left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right) \left( \mathbf{w}_{i1} - \boldsymbol{\mu}_0 \right)^\top \boldsymbol{\Sigma}_0^{-1} \right] & \quad & \text{(C.14), (C.15)}\\ &= \tr\left[ Z \boldsymbol{\Sigma}_0^{-1} \right]\end{split}\]

(ii.b)

\[\begin{split}& \sum_{i = 1}^I \sum_{n = 1}^N \left( \mathbf{x}_{in} - \boldsymbol{\mu}_m - \boldsymbol{\Phi} \mathbf{w}_{in} \right)^\top \boldsymbol{\Sigma}_m^{-1} \left( \mathbf{x}_{in} - \boldsymbol{\mu}_m - \boldsymbol{\Phi} \mathbf{w}_{in} \right)\\ &= \tr\left[ \sum_{i = 1}^I \sum_{n = 1}^N \left( \mathbf{x}_{in} - \boldsymbol{\mu}_m - \boldsymbol{\Phi} \mathbf{w}_{in} \right) \left( \mathbf{x}_{in} - \boldsymbol{\mu}_m - \boldsymbol{\Phi} \mathbf{w}_{in} \right)^\top \boldsymbol{\Sigma}_m^{-1} \right] & \quad & \text{(C.14), (C.15)}\\ &= \tr\left[ M \boldsymbol{\Sigma}_m^{-1} \right]\end{split}\]

(ii.c)

\[\begin{split}& \sum_{i = 1}^I \sum_{n = 2}^N \left( \mathbf{w}_{in} - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \mathbf{w}_{i(n - 1)} \right)^\top \boldsymbol{\Sigma}_p^{-1} \left( \mathbf{w}_{in} - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \mathbf{w}_{i(n - 1)} \right)\\ &= \tr\left[ \sum_{i = 1}^I \sum_{n = 2}^N \left( \mathbf{w}_{in} - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \mathbf{w}_{i(n - 1)} \right) \left( \mathbf{w}_{in} - \boldsymbol{\mu}_p - \boldsymbol{\Psi} \mathbf{w}_{i(n - 1)} \right)^\top \boldsymbol{\Sigma}_p^{-1} \right] & \quad & \text{(C.14), (C.15)}\\ &= \tr\left[ P \boldsymbol{\Sigma}_p^{-1} \right]\end{split}\]

Exercise 19.9

The mean and covariance of the points are respectively

\[\begin{split}\sum_{j = 0}^{2D_\mathbf{w}} a_j \hat{\mathbf{w}}^{[j]} &= a_0 \boldsymbol{\mu}_{t - 1} + \sum_{j = 1}^{D_\mathbf{w}} \frac{1 - a_0}{2D_\mathbf{w}} \left( \boldsymbol{\mu}_{t - 1} + \sqrt{\frac{D_\mathbf{w}}{1 - a_0}} \boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_j \right) +\\ &\qquad \sum_{j = D_\mathbf{w} + 1}^{2D_\mathbf{w}} \frac{1 - a_0}{2D_\mathbf{w}} \left( \boldsymbol{\mu}_{t - 1} - \sqrt{\frac{D_\mathbf{w}}{1 - a_0}} \boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_{j - D_\mathbf{w}} \right) & \quad & \text{(19.40), (19.41)}\\ &= a_0 \boldsymbol{\mu}_{t - 1} + 2 D_\mathbf{w} \frac{1 - a_0}{2D_\mathbf{w}} \boldsymbol{\mu}_{t - 1}\\ &= \boldsymbol{\mu}_{t - 1}\end{split}\]

and

\[\begin{split}\sum_{j = 0}^{2D_\mathbf{w}} a_j \left( \hat{\mathbf{w}}^{[j]} - \boldsymbol{\mu}_{t - 1} \right) \left( \hat{\mathbf{w}}^{[j]} - \boldsymbol{\mu}_{t - 1} \right)^\top &= \sum_{j = 1}^{D_\mathbf{w}} \frac{1 - a_0}{2D_\mathbf{w}} \frac{D_\mathbf{w}}{1 - a_0} \left( \boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_j \right) \left( \boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_j \right)^\top +\\ &\qquad \sum_{j = D_\mathbf{w} + 1}^{2D_\mathbf{w}} \frac{1 - a_0}{2D_\mathbf{w}} \frac{D_\mathbf{w}}{1 - a_0} \left( -\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_{j - D_\mathbf{w}} \right) \left( -\boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_{j - D_\mathbf{w}} \right)^\top & \quad & \text{(19.40), (19.41)}\\ &= \sum_{j = 1}^{D_\mathbf{w}} \boldsymbol{\Sigma}_{t - 1}^{1 / 2} \mathbf{e}_j \mathbf{e}_j^\top {\boldsymbol{\Sigma}_{t - 1}^{1 / 2}}^\top\\ &= \sum_{j = 1}^{D_\mathbf{w}} \mathbf{U} \boldsymbol{\Lambda}^{1/2} \mathbf{e}_j \mathbf{e}_j^\top \boldsymbol{\Lambda}^{1/2} \mathbf{V}^\top\\ &= \sum_{j = 1}^{D_\mathbf{w}} \lambda_j \mathbf{U}_{\cdot j} \mathbf{V}_{j \cdot}^\top\\ &= \boldsymbol{\Sigma}_{t - 1}\end{split}\]

where, since \(\boldsymbol{\Sigma}_{t - 1}\) is symmetric positive semi-definite (so its SVD has \(\mathbf{U} = \mathbf{V}\)),

\[\begin{split}\boldsymbol{\Sigma}_{t - 1} &= \mathbf{U} \boldsymbol{\Lambda} \mathbf{V}^\top\\ &= \sum_j \lambda_j \mathbf{U}_{\cdot j} \mathbf{V}_{j \cdot}^\top, \\\\ \boldsymbol{\Sigma}_{t - 1}^{1 / 2} &= \mathbf{U} \boldsymbol{\Lambda}^{1 / 2}, \\\\ {\boldsymbol{\Sigma}_{t - 1}^{1 / 2}}^\top &= \boldsymbol{\Lambda}^{1 / 2} \mathbf{V}^\top.\end{split}\]
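The calculation can be confirmed numerically by constructing the \(2 D_\mathbf{w} + 1\) sigma points of (19.40)-(19.41) and recovering the moments. A sketch (the dimension and the choice \(a_0 = 0.4\) are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
D = 3
A = rng.standard_normal((D, D))
Sigma = A @ A.T + D * np.eye(D)   # Sigma_{t-1}
mu = rng.standard_normal(D)       # mu_{t-1}
a0 = 0.4                          # weight of the central point

# matrix square root via eigendecomposition (symmetric, so U == V)
lam, U = np.linalg.eigh(Sigma)
sqrtSigma = U @ np.diag(np.sqrt(lam))
scale = np.sqrt(D / (1 - a0))

points = [mu]
weights = [a0]
for j in range(D):                # (19.40): +/- scaled columns of Sigma^{1/2}
    points.append(mu + scale * sqrtSigma[:, j])
    weights.append((1 - a0) / (2 * D))
for j in range(D):
    points.append(mu - scale * sqrtSigma[:, j])
    weights.append((1 - a0) / (2 * D))
points, weights = np.array(points), np.array(weights)

mean = weights @ points
diff = points - mean
cov = (weights[:, None] * diff).T @ diff
assert np.allclose(mean, mu)
assert np.allclose(cov, Sigma)
```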

Exercise 19.20

\[\begin{split}\mathbf{x} &= \mathbf{g}[\mathbf{w}, \boldsymbol{\epsilon}] & \quad & \text{(19.30)}\\ \begin{bmatrix} x_1\\ y_1\\ x_2\\ y_2 \end{bmatrix} &= \begin{bmatrix} u_1\\ v_1\\ u_2\\ v_2 \end{bmatrix} \frac{1}{1 + w} + \boldsymbol{\epsilon} & \quad & \text{(19.50)}\end{split}\]

[Hoo] has a nice worked-out example that makes the following more understandable.

\[\begin{split}\boldsymbol{\Phi} &= \frac{ \partial \mathbf{g}[\mathbf{w}, \boldsymbol{\epsilon}] }{\partial \mathbf{w}} & \quad & \text{(19.31)}\\ &= \frac{ \partial \mathbf{g}[\mathbf{w}, \boldsymbol{\epsilon}] }{ \partial \left\{ u_1, v_1, u_2, v_2, w\right\} }\\ &= \frac{1}{1 + w} \begin{bmatrix} 1 & 0 & 0 & 0 & -\frac{u_1}{1 + w}\\ 0 & 1 & 0 & 0 & -\frac{v_1}{1 + w}\\ 0 & 0 & 1 & 0 & -\frac{u_2}{1 + w}\\ 0 & 0 & 0 & 1 & -\frac{v_2}{1 + w} \end{bmatrix} \\\\ \boldsymbol{\Upsilon} &= \frac{ \partial \mathbf{g}[\mathbf{w}, \boldsymbol{\epsilon}] }{ \partial \boldsymbol{\epsilon} }\\ &= \mathbf{I} & \quad & \text{(19.31).}\end{split}\]
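The analytic Jacobian above can be validated against central finite differences; a sketch (the evaluation point is arbitrary):

```python
import numpy as np

def g(state, eps):
    # measurement model (19.50): x = [u1, v1, u2, v2] / (1 + w) + eps
    u1, v1, u2, v2, w = state
    return np.array([u1, v1, u2, v2]) / (1 + w) + eps

def jacobian(state):
    # analytic Jacobian dg/d{u1, v1, u2, v2, w} derived above
    u1, v1, u2, v2, w = state
    J = np.zeros((4, 5))
    J[:4, :4] = np.eye(4) / (1 + w)
    J[:, 4] = -np.array([u1, v1, u2, v2]) / (1 + w) ** 2
    return J

state = np.array([1.0, 2.0, -0.5, 0.3, 0.2])
eps = np.zeros(4)

# central finite differences
h = 1e-6
J_num = np.zeros((4, 5))
for k in range(5):
    d = np.zeros(5)
    d[k] = h
    J_num[:, k] = (g(state + d, eps) - g(state - d, eps)) / (2 * h)

assert np.allclose(jacobian(state), J_num, atol=1e-6)
```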

References

Arc

Cedric Archambeau. Filtering and smoothing in dynamical systems. http://www0.cs.ucl.ac.uk/staff/C.Archambeau/ATML/atml_files/atml08_lect2_dynsyst.pdf. Accessed on 2017-08-03.

BMM96

Gary D Brushe, Robert E Mahony, and John B Moore. A forward-backward algorithm for ML state and sequence estimation. In ISSPA, 224–227. 1996.

Fle

Tristan Fletcher. The Kalman filter explained. https://tristan-fletcher-fdxe.squarespace.com/s/LDS-87ae.pdf. Accessed on 2017-08-02.

Hoo

Adam Hoover. Extended Kalman filter. http://www.ces.clemson.edu/ ahoover/ece854/lecture-notes/lecture-ekf.pdf. Accessed on 2017-08-02.

JB01

Michael I Jordan and Chris Bishop. An introduction to graphical models. unpublished book, 2001. pg. 40.

Mac

Lester Mackey. Linear Gaussian state space model. http://web.stanford.edu/ lmackey/stats306b/doc/stats306b-spring14-lecture11_scribed.pdf. Accessed on 2017-08-03.

MBM96

Robert E Mahony, Gary D Brushe, and John B Moore. Hybrid algorithms for maximum likelihood and maximum a posteriori sequence estimation. In ISSPA, 451–454. 1996.

Par

Lucas C. Parra. Hidden Markov model, Kalman filter. http://bme.ccny.cuny.edu/faculty/parra/teaching/biomed-dsp/class10.pdf. Accessed on 2017-08-03.

Sch

Sandro Schonborn. Graphical models: sum-product algorithm. http://cs-wwwarchiv.cs.unibas.ch/lehre/hs11/cs351/_Slides/Schoenborn_SumProduct.pdf. Accessed on 2017-08-02.