Common Probability Distributions
When fitting probability models to data, it is necessary to quantify the
uncertainty of the fit.
If the posterior distribution is in the same family as the prior
distribution, then the prior and the posterior are called conjugate
distributions, and the prior is known as the conjugate prior for the
likelihood.
When a likelihood is multiplied by its conjugate prior, the result is
proportional to a new distribution that has the same form as the conjugate
prior.
Exercise 3.1
Let \(X\) be a Bernoulli-distributed random variable with parameter
\(\lambda\), taking values \(x \in \{0, 1\}\). Its mean and variance are
\[\begin{split}\DeclareMathOperator{\BernDist}{Bern}
\mathrm{E}[X] &= \sum_{x \in \{0, 1\}} x \BernDist_x[\lambda]\\
&= 0 (1 - \lambda) + 1 (\lambda)\\
&= \lambda\end{split}\]
and
\[\begin{split}\mathrm{E}\left[ (X - \mathrm{E}[X])^2 \right]
&= \mathrm{E}[X^2] - \mathrm{E}[X]^2
& \quad & \text{Exercise 2.10}\\
&= \sum_{x \in \{0, 1\}} x^2 \BernDist_x[\lambda] - \lambda^2
& \quad & \text{(2.12) where } f(X) \mapsto f[x] = x^2\\
&= 0^2 (1 - \lambda) + 1^2 (\lambda) - \lambda^2\\
&= \lambda (1 - \lambda).\end{split}\]
See Exercise 2.10 for more details.
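Both results can be checked numerically by summing over the two outcomes. The following is a minimal sketch; the function name `bernoulli_moments` and the test value 0.3 are introduced here purely for illustration.

```python
def bernoulli_moments(lam):
    """Return (mean, variance) of a Bernoulli variable by direct summation."""
    # Pr(x = 0) = 1 - lam and Pr(x = 1) = lam.
    pmf = {0: 1.0 - lam, 1: lam}
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x ** 2 * p for x, p in pmf.items())
    # Variance via E[X^2] - E[X]^2, the identity used in the derivation.
    return mean, second_moment - mean ** 2


mean, var = bernoulli_moments(0.3)
# Compare against the closed forms lambda and lambda * (1 - lambda).
assert abs(mean - 0.3) < 1e-12
assert abs(var - 0.3 * 0.7) < 1e-12
```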
Exercise 3.2
A useful fact is the relationship between the beta function and the gamma
function:
\[\begin{split}B(x, y)
&= \int_0^1 t^{x - 1} (1 - t)^{y - 1} dt\\
&= \frac{\Gamma[x] \Gamma[y]}{\Gamma[x + y]}
& \quad & \text{convolution integral property}\end{split}\]
where \(\Gamma[z] = \int_0^\infty t^{z - 1} e^{-t} dt\).
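Finding the mode (position of the peak) of the beta distribution is equivalent
to finding the value of \(\lambda\) that maximizes the density. For
\(\alpha, \beta > 1\) the maximum lies in the interior of \((0, 1)\), so it is
found by setting the derivative to zero: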
\[\begin{split}\DeclareMathOperator{\BetaDist}{Beta}
\frac{\partial}{\partial \lambda} \BetaDist_\lambda[\alpha, \beta]
&= B(\alpha, \beta)^{-1} \frac{\partial}{\partial \lambda}
\left(
\lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}
\right)\\
0 &= B(\alpha, \beta)^{-1}
\left[
(\alpha - 1) (1) \lambda^{\alpha - 2} (1 - \lambda)^{\beta - 1} +
(\beta - 1) (-1) \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 2}
\right]\\
0 &= \lambda^{\alpha - 2} (1 - \lambda)^{\beta - 2}
\left[ (\alpha - 1) (1 - \lambda) - (\beta - 1) \lambda \right]\\
0 &= -\lambda \left[ (\alpha - 1) + (\beta - 1) \right] + (\alpha - 1)\\
\lambda &= \frac{\alpha - 1}{\alpha + \beta - 2}\end{split}\]
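As a sanity check on the closed-form mode, the peak of the unnormalized density can be located on a grid. This is a rough sketch under the assumption that \(\alpha, \beta > 1\); the helper names and the grid resolution are illustrative choices.

```python
def beta_mode(alpha, beta):
    """Closed-form mode; valid only for alpha > 1 and beta > 1."""
    return (alpha - 1.0) / (alpha + beta - 2.0)


def beta_mode_grid(alpha, beta, steps=10000):
    """Locate the peak of the unnormalized density on a uniform grid."""
    grid = [i / steps for i in range(1, steps)]
    # The normalizer B(alpha, beta) does not move the peak, so it is omitted.
    return max(grid, key=lambda t: t ** (alpha - 1) * (1 - t) ** (beta - 1))


# The two estimates should agree up to the grid spacing.
assert abs(beta_mode(3.0, 5.0) - beta_mode_grid(3.0, 5.0)) < 1e-3
```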
Exercise 3.3
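Recall that the beta distribution has mean
\(\mu = \frac{\alpha}{\alpha + \beta}\) and variance
\(\sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}\).
Notice that \(1 - \mu = \frac{\beta}{\alpha + \beta}\) and
\(\sigma^2 = \frac{\mu (1 - \mu)}{\alpha + \beta + 1} \iff
\frac{\mu (1 - \mu)}{\sigma^2} = \alpha + \beta + 1\).
Since \(\alpha = \mu (\alpha + \beta)\),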
\[\alpha = \mu \left( \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right)\]
\[\begin{split}\beta &= \frac{\mu (1 - \mu)}{\sigma^2} - 1 - \alpha\\
&= \frac{\mu (1 - \mu)}{\sigma^2} - 1 -
\mu \left( \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right)\\
&= (1 - \mu) \frac{\mu (1 - \mu)}{\sigma^2} + \mu - 1\\
&= (1 - \mu) \left( \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right)\end{split}\]
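The two formulas can be exercised with a round trip: compute the mean and variance from known parameters, then recover the parameters. The sketch below assumes \(0 < \mu < 1\) and \(0 < \sigma^2 < \mu (1 - \mu)\); the function names are hypothetical.

```python
def beta_params_from_moments(mu, var):
    """Map a mean and variance to (alpha, beta) of a beta distribution."""
    # Shared factor mu * (1 - mu) / var - 1, which equals alpha + beta.
    common = mu * (1.0 - mu) / var - 1.0
    return mu * common, (1.0 - mu) * common


def beta_moments(alpha, beta):
    """Mean and variance of Beta[alpha, beta]."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1.0))


# Round trip: parameters -> moments -> parameters.
alpha, beta = beta_params_from_moments(*beta_moments(2.0, 6.0))
assert abs(alpha - 2.0) < 1e-9 and abs(beta - 6.0) < 1e-9
```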
Exercise 3.4
Recall that \(\BetaDist_\lambda[\alpha, \beta] =
B(\alpha, \beta)^{-1} \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}\). Let
\(x\) denote \(\lambda\) in the generalized form of the exponential family:
\[\begin{split}Pr(x \mid \boldsymbol{\theta})
&= a[x] \exp\left(
\mathbf{b}[\boldsymbol{\theta}]^\top \mathbf{c}[x] -
d[\boldsymbol{\theta}]
\right)\\
&= a[x]
\exp\left( \mathbf{b}[\boldsymbol{\theta}]^\top \mathbf{c}[x] \right)
\exp\left( -d[\boldsymbol{\theta}] \right).\end{split}\]
The Beta distribution can be represented in the generalized form of the
exponential family as follows:
\[\begin{split}\begin{gather*}
a[x] = 1\\
\mathbf{b}[\boldsymbol{\theta}] =
\begin{bmatrix} \alpha - 1\\ \beta - 1 \end{bmatrix}\\
\mathbf{c}[x] = \begin{bmatrix} \log(x)\\ \log(1 - x) \end{bmatrix}\\
d[\boldsymbol{\theta}] = \log{B(\alpha, \beta)}.
\end{gather*}\end{split}\]
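One way to confirm this decomposition is to evaluate both forms at a point in \((0, 1)\) and compare. The sketch below does so with the standard library; the function names and the evaluation point are illustrative, and \(\log B(\alpha, \beta)\) is computed with `math.lgamma` via the gamma-function identity from Exercise 3.2.

```python
import math


def beta_pdf(x, alpha, beta):
    """Beta density written directly, for 0 < x < 1."""
    norm = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return x ** (alpha - 1.0) * (1.0 - x) ** (beta - 1.0) / norm


def beta_pdf_expfam(x, alpha, beta):
    """Beta density as a[x] * exp(b^T c[x] - d) with a[x] = 1."""
    b = (alpha - 1.0, beta - 1.0)
    c = (math.log(x), math.log(1.0 - x))
    # d = log B(alpha, beta)
    d = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(b[0] * c[0] + b[1] * c[1] - d)


assert math.isclose(beta_pdf(0.3, 2.0, 5.0), beta_pdf_expfam(0.3, 2.0, 5.0),
                    rel_tol=1e-9)
```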
Exercise 3.5
Given \(z > 0\),
\[\begin{split}\Gamma[z + 1]
&= \int_0^\infty t^{(z + 1) - 1} e^{-t} dt\\
&= \left[ -t^z e^{-t} \right]_0^\infty -
\int_0^\infty -e^{-t} z t^{z - 1} dt
& \quad & \text{integration by parts with }
u = t^z, dv = e^{-t} dt, v = -e^{-t}, du = z t^{z - 1} dt\\
&= z \Gamma[z]\end{split}\]
because
\[\begin{split}\left[ -t^z e^{-t} \right]_0^\infty
&= \lim_{t \rightarrow \infty} -t^z e^{-t} - \left( -(0)^z e^{-0} \right)\\
&= \lim_{t \rightarrow \infty} -\frac{t^z}{e^t}
& \quad & \text{convert to indeterminate form } \frac{\infty}{\infty}\\
&= \lim_{t \rightarrow \infty}
-\frac{z (z - 1) \cdots (z - n + 1) \, t^{z - n}}{e^t}
& \quad & \text{apply L'Hopital's rule } n = \lceil z \rceil
\text{ times, so that } z - n \leq 0\\
&= 0.\end{split}\]
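The recursion can be spot-checked against the standard library's `math.gamma` for integer and non-integer arguments. This is only a numerical illustration, not a proof; the helper name and tolerance are assumptions made for this sketch.

```python
import math


def gamma_recursion_holds(z, rel_tol=1e-9):
    """Check Gamma[z + 1] == z * Gamma[z] up to floating-point error."""
    return math.isclose(math.gamma(z + 1.0), z * math.gamma(z), rel_tol=rel_tol)


for z in (0.5, 1.0, 2.7, 6.0):
    assert gamma_recursion_holds(z)
```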
Exercise 3.6
\[\begin{split}\DeclareMathOperator{\NormDist}{Norm}
Pr(x \mid \mu) Pr(\mu)
&= \NormDist_x[\mu, 1.0] \NormDist_\mu[\mu_p, \sigma_p^2]\\
&= \frac{1}{\sqrt{2 \pi}}
\exp\left[-0.5 (x - \mu)^2 \right]
\frac{1}{\sqrt{2 \pi \sigma_p^2}}
\exp\left[-0.5 \frac{(\mu - \mu_p)^2}{\sigma_p^2}\right]\\
&= \frac{1}{2 \pi \sigma_p}
\exp\left[
-0.5 \left(
x^2 - 2 \mu x + \mu^2 +
\frac{\mu^2}{\sigma_p^2} -
\frac{2 \mu \mu_p}{\sigma_p^2} +
\frac{\mu_p^2}{\sigma_p^2}
\right)
\right]\\
&= \frac{1}{2 \pi \sigma_p}
\exp\left[ -0.5 \left( x^2 + \frac{\mu_p^2}{\sigma_p^2} \right) \right]
\exp\left[
-0.5 \left(
\mu^2 + \frac{\mu^2}{\sigma_p^2} -
2 \mu x - \frac{2 \mu \mu_p}{\sigma_p^2}
\right)
\right]\\
&= \kappa_1
\exp\left[
-0.5 \left(
\frac{\sigma_p^2 + 1}{\sigma_p^2} \mu^2 -
2 \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2} \mu
\right)
\right]\\
&= \kappa_1
\exp\left[
\frac{-0.5 (\sigma_p^2 + 1)}{\sigma_p^2}
\left(
\mu^2 - 2 \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \mu
\right)
\right]
& \quad & \text{set up for completing the square}\\
&= \kappa_1
\exp\left[
\frac{-0.5}{\sigma_p^2 (\sigma_p^2 + 1)^{-1}}
\left(
\mu^2 - 2 \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \mu +
\left( \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \right)^2
\right)
\right]
\exp\left[
\frac{0.5}{\sigma_p^2 (\sigma_p^2 + 1)^{-1}}
\left( \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \right)^2
\right]\\
&= \kappa_2
\exp\left[
-0.5
\frac{
\left( \mu - \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \right)^2
}{
\sigma_p^2 (\sigma_p^2 + 1)^{-1}
}
\right]\\
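&= \kappa_3 \NormDist_\mu \left[
\frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1},
\sigma_p^2 (\sigma_p^2 + 1)^{-1}
\right]\end{split}\]
where \(\kappa_1\), \(\kappa_2\), and \(\kappa_3\) collect the factors that do
not depend on \(\mu\).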
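The result says that the product of likelihood and prior, divided by the derived normal over \(\mu\), is a constant in \(\mu\). The sketch below checks this at two values of \(\mu\); the function names and the numeric inputs are illustrative assumptions, and the likelihood variance is fixed at 1 as in the exercise.

```python
import math


def norm_pdf(x, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_params(x, mu_p, var_p):
    """Mean and variance of the normal over mu derived above."""
    return (var_p * x + mu_p) / (var_p + 1.0), var_p / (var_p + 1.0)


def kappa3(mu, x, mu_p, var_p):
    """Ratio of likelihood * prior to the derived normal at a given mu."""
    mean, var = posterior_params(x, mu_p, var_p)
    return norm_pdf(x, mu, 1.0) * norm_pdf(mu, mu_p, var_p) / norm_pdf(mu, mean, var)


# The ratio should be the same no matter which mu is used.
assert math.isclose(kappa3(-0.4, 1.5, 0.2, 2.0), kappa3(0.9, 1.5, 0.2, 2.0),
                    rel_tol=1e-9)
```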
Exercise 3.7
Recall that
\[\begin{split}\NormDist_x[\mu, \sigma^2]
&= \frac{1}{\sqrt{2 \pi \sigma^2}}
\exp\left[ -0.5 \frac{(x - \mu)^2}{\sigma^2} \right]\\
&= \frac{1}{\sqrt{2 \pi \sigma^2}}
\exp\left[ \frac{-0.5}{\sigma^2} (x^2 - 2 x \mu + \mu^2) \right].\end{split}\]
The univariate normal distribution can be represented in the generalized form of
the exponential family as follows:
\[\begin{split}Pr(x \mid \boldsymbol{\theta})
&= a[x] \exp\left(
\mathbf{b}[\boldsymbol{\theta}]^\top \mathbf{c}[x] -
d[\boldsymbol{\theta}]
\right)\\
&= a[x]
\exp\left(
\mathbf{b}[\boldsymbol{\theta}]^\top \mathbf{c}[x]
\right)
\exp\left( -d[\boldsymbol{\theta}] \right)\end{split}\]
where
\[\begin{split}\begin{gather*}
a[x] = 1\\
\mathbf{b}[\boldsymbol{\theta}] =
-\frac{0.5}{\sigma^2} \begin{bmatrix} 1\\ -2 \mu \end{bmatrix}\\
\mathbf{c}[x] = \begin{bmatrix} x^2\\ x \end{bmatrix}\\
d[\boldsymbol{\theta}] =
0.5 \frac{\mu^2}{\sigma^2} + \log{\sqrt{2 \pi \sigma^2}}.
\end{gather*}\end{split}\]
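As with the beta case, the decomposition can be verified by evaluating both forms at an arbitrary point. The following sketch uses hypothetical function names and arbitrary test values.

```python
import math


def norm_pdf(x, mu, var):
    """Univariate normal density written directly."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def norm_pdf_expfam(x, mu, var):
    """Normal density as a[x] * exp(b^T c[x] - d) with a[x] = 1."""
    # b = -0.5 / var * [1, -2 mu]
    b = (-0.5 / var, mu / var)
    c = (x ** 2, x)
    d = 0.5 * mu ** 2 / var + math.log(math.sqrt(2.0 * math.pi * var))
    return math.exp(b[0] * c[0] + b[1] * c[1] - d)


assert math.isclose(norm_pdf(0.7, 1.2, 2.5), norm_pdf_expfam(0.7, 1.2, 2.5),
                    rel_tol=1e-9)
```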
Exercise 3.8
Finding the mode (position of the peak) of the normal scaled inverse gamma
distribution is equivalent to finding its maximum.
\[\begin{split}\DeclareMathOperator{\NormInvGamDist}{NormInvGam}
\frac{\partial}{\partial \mu}
\NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta]
&= \frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}}
\frac{\beta^\alpha}{\Gamma[\alpha]}
\left( \frac{1}{\sigma^2} \right)^{\alpha + 1}
\frac{\partial}{\partial \mu} \exp\left[
-\frac{2\beta + \gamma(\delta - \mu)^2}{2 \sigma^2}
\right]\\
0 &= \NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta]
\left( \frac{\gamma (\delta - \mu)}{\sigma^2} \right)\\
\mu &= \delta\end{split}\]
\[\begin{split}\frac{\partial}{\partial \sigma}
\NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta]
&= \frac{\beta^\alpha}{\Gamma[\alpha]} \sqrt{\frac{\gamma}{2 \pi}}
\frac{\partial}{\partial \sigma} \left[
\sigma^{-2 \alpha - 3}
\exp\left[
-\frac{2\beta + \gamma(\delta - \mu)^2}{2 \sigma^2}
\right]
\right]\\
0 &= (-2 \alpha - 3) \sigma^{-1}
\NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta] +
\NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta]
\left( \frac{2\beta + \gamma(\delta - \mu)^2}{\sigma^3} \right)\\
0 &= (-2 \alpha - 3) \sigma^2 +
\left( 2\beta + \gamma(\delta - \mu)^2 \right)\\
\sigma^2 &= \frac{2\beta + \gamma(\delta - \mu)^2}{2 \alpha + 3}\\
&= \frac{\beta}{\alpha + 3 / 2}
& \quad & \text{substitute } \mu = \delta\end{split}\]
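A quick numerical check is to evaluate the density at the stationary point, with \(\mu = \delta\) substituted into the expression for \(\sigma^2\), and confirm that nearby points have lower density. The sketch below assumes the density as written in the derivation; the function names, parameter values, and perturbation sizes are arbitrary choices.

```python
import math


def nig_pdf(mu, var, alpha, beta, gamma, delta):
    """Normal-scaled inverse gamma density over (mu, sigma^2)."""
    return (math.sqrt(gamma) / math.sqrt(2.0 * math.pi * var)
            * beta ** alpha / math.gamma(alpha)
            * (1.0 / var) ** (alpha + 1.0)
            * math.exp(-(2.0 * beta + gamma * (delta - mu) ** 2) / (2.0 * var)))


def nig_mode(alpha, beta, gamma, delta):
    """Stationary point: mu = delta, sigma^2 = 2 beta / (2 alpha + 3)."""
    return delta, 2.0 * beta / (2.0 * alpha + 3.0)


params = (2.0, 3.0, 1.5, 0.5)  # alpha, beta, gamma, delta
mu_m, var_m = nig_mode(*params)
peak = nig_pdf(mu_m, var_m, *params)
# Perturbing either coordinate should lower the density.
assert peak > nig_pdf(mu_m + 0.1, var_m, *params)
assert peak > nig_pdf(mu_m, 1.1 * var_m, *params)
assert peak > nig_pdf(mu_m, 0.9 * var_m, *params)
```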
Exercise 3.9
\[\begin{split}\prod_{i = 1}^I \BernDist_{x_i}[\lambda]
\cdot \BetaDist_\lambda[\alpha, \beta]
&= \prod_{i = 1}^I \lambda^{x_i} (1 - \lambda)^{1 - x_i} \cdot
\frac{\Gamma[\alpha + \beta]}{\Gamma[\alpha] \Gamma[\beta]}
\lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}\\
&= \lambda^{\sum_i x_i} (1 - \lambda)^{\sum_i (1 - x_i)}
\frac{\Gamma[\alpha + \beta]}{\Gamma[\alpha] \Gamma[\beta]}
\lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}\\
&= \frac{\Gamma[\alpha + \beta]}{\Gamma[\alpha] \Gamma[\beta]}
\lambda^{\alpha - 1 + \sum_i x_i}
(1 - \lambda)^{\beta - 1 + I - \sum_i x_i}\\
&= \frac{\Gamma[\alpha + \beta]}{\Gamma[\alpha] \Gamma[\beta]}
\frac{
\Gamma\left[ \alpha + \sum_i x_i \right]
\Gamma\left[ \beta + I - \sum_i x_i \right]
}{
\Gamma[\alpha + \beta + I]
}
\BetaDist_\lambda\left[
\alpha + \sum_i x_i, \beta + I - \sum_i x_i
\right]\end{split}\]
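In practice the result reduces to a counting rule for the parameters plus a constant that does not depend on \(\lambda\). The sketch below implements both and checks the identity at one value of \(\lambda\); the function names, the data, and the evaluation point are illustrative assumptions.

```python
import math


def beta_pdf(lam, alpha, beta):
    """Beta density with the gamma-function normalizer."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * lam ** (alpha - 1.0) * (1.0 - lam) ** (beta - 1.0)


def beta_posterior_params(alpha, beta, xs):
    """Add the number of ones to alpha and the number of zeros to beta."""
    ones = sum(xs)
    return alpha + ones, beta + len(xs) - ones


def kappa(alpha, beta, xs):
    """Constant in front of the updated beta distribution."""
    a_post, b_post = beta_posterior_params(alpha, beta, xs)
    return (math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
            * math.gamma(a_post) * math.gamma(b_post) / math.gamma(a_post + b_post))


xs, lam = [1, 0, 1, 1], 0.37
likelihood = math.prod(lam ** x * (1.0 - lam) ** (1 - x) for x in xs)
lhs = likelihood * beta_pdf(lam, 2.0, 3.0)
rhs = kappa(2.0, 3.0, xs) * beta_pdf(lam, *beta_posterior_params(2.0, 3.0, xs))
assert math.isclose(lhs, rhs, rel_tol=1e-9)
```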
Exercise 3.10
\[\begin{split}\DeclareMathOperator{\CatDist}{Cat}
\DeclareMathOperator{\DirDist}{Dir}
\prod_{i = 1}^I \CatDist_{x_i}[\lambda_{1 \ldots K}] \cdot
\DirDist_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}]
&= \prod_{i = 1}^I \prod_{j = 1}^K \lambda_j^{x_{ij}} \cdot
\frac{
\Gamma\left[ \sum_{j = 1}^K \alpha_j \right]
}{
\prod_{j = 1}^K \Gamma[\alpha_j]
}
\prod_{j = 1}^K \lambda_j^{\alpha_j - 1}\\
&= \frac{
\Gamma\left[ \sum_{j = 1}^K \alpha_j \right]
}{
\prod_{j = 1}^K \Gamma[\alpha_j]
}
\prod_{j = 1}^K \lambda_j^{\alpha_j - 1 + N_j}
& \quad & N_j = \sum_{i = 1}^I x_{ij}\\
&= \frac{
\Gamma\left[ \sum_{j = 1}^K \alpha_j \right]
}{
\prod_{j = 1}^K \Gamma[\alpha_j]
}
\frac{
\prod_{j = 1}^K \Gamma[\alpha_j + N_j]
}{
\Gamma\left[ \sum_{j = 1}^K \alpha_j + N_j \right]
}
\DirDist_{\lambda_{1 \ldots K}}[\alpha_1 + N_1, \ldots, \alpha_K + N_K]\\
&= \frac{
\Gamma\left[ \sum_{j = 1}^K \alpha_j \right]
}{
\prod_{j = 1}^K \Gamma[\alpha_j]
}
\frac{
\prod_{j = 1}^K \Gamma[\alpha_j + N_j]
}{
\Gamma\left[ I + \sum_{j = 1}^K \alpha_j \right]
}
\DirDist_{\lambda_{1 \ldots K}}[\alpha_1 + N_1, \ldots, \alpha_K + N_K]
& \quad & \sum_j N_j = I\end{split}\]
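The parameter update again amounts to counting. The sketch below assumes each observation is given as a category index in \(0 \ldots K - 1\) rather than as the one-hot vector \(x_{ij}\) used in the derivation; the function name is hypothetical.

```python
def dirichlet_posterior_params(alphas, observations):
    """Add the per-category counts N_j to the Dirichlet parameters."""
    counts = [0] * len(alphas)
    for k in observations:
        # Equivalent to summing the one-hot indicators x_ij over i.
        counts[k] += 1
    return [a + n for a, n in zip(alphas, counts)]


posterior = dirichlet_posterior_params([1.0, 1.0, 1.0], [0, 2, 2, 1, 2])
# The parameters grow by exactly the number of observations I.
assert sum(posterior) == 3.0 + 5
```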
Exercise 3.11
\[\begin{split}& \prod_{i = 1}^I \NormDist_{x_i}[\mu, \sigma^2] \cdot
\NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta]\\
&= \prod_{i = 1}^I
\frac{1}{\sqrt{2 \pi \sigma^2}}
\exp\left[ -\frac{(x_i - \mu)^2}{2 \sigma^2} \right] \cdot
\frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}}
\frac{\beta^\alpha}{\Gamma[\alpha]}
\left( \frac{1}{\sigma^2} \right)^{\alpha + 1}
\exp\left[
-\frac{2\beta + \gamma(\delta - \mu)^2}{2 \sigma^2}
\right]\\
&= \left( 2 \pi \sigma^2 \right)^{-I / 2}
\frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}}
\frac{\beta^\alpha}{\Gamma[\alpha]}
\left( \frac{1}{\sigma^2} \right)^{\alpha + 1}
\exp\left[
-\frac{1}{2 \sigma^2}
\left(
2\beta + \gamma(\delta - \mu)^2 + \sum_i (x_i - \mu)^2
\right)
\right]\\
&= \frac{1}{(2 \pi)^{I / 2}}
\frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}}
\frac{\beta^\alpha}{\Gamma[\alpha]}
\left( \frac{1}{\sigma^2} \right)^{\alpha + \frac{I}{2} + 1}
\exp\left[
-\frac{1}{2 \sigma^2}
\left(
2\beta + \gamma \delta^2 - 2 \gamma \delta \mu + \gamma \mu^2 +
\sum_i x_i^2 - 2 \mu \sum_i x_i + I \mu^2
\right)
\right]\\
&= \frac{1}{(2 \pi)^{I / 2}}
\frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}}
\frac{\beta^\alpha}{\Gamma[\alpha]}
\left( \frac{1}{\sigma^2} \right)^{\alpha + \frac{I}{2} + 1}
\exp\left[
-\frac{1}{2 \sigma^2}
\left(
2\beta + \sum_i x_i^2 + \gamma \delta^2 - 2 \gamma \delta \mu -
2 \mu \sum_i x_i + \tilde{\gamma} \mu^2
\right)
\right]
& \quad & \text{swapped in } \tilde{\gamma}\\
&= \frac{1}{(2 \pi)^{I / 2}}
\frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}}
\frac{\beta^\alpha}{\Gamma[\alpha]}
\left( \frac{1}{\sigma^2} \right)^{\alpha + \frac{I}{2} + 1}
\exp\left[
-\frac{1}{2 \sigma^2}
\left(
2 \tilde{\beta} +
\frac{(\gamma \delta + \sum_i x_i)^2}{\tilde{\gamma}} -
2 \gamma \delta \mu - 2 \mu \sum_i x_i + \tilde{\gamma} \mu^2
\right)
\right]
& \quad & \text{swapped in } \tilde{\beta}\\
&= \frac{1}{(2 \pi)^{I / 2}}
\frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}}
\frac{\beta^\alpha}{\Gamma[\alpha]}
\left( \frac{1}{\sigma^2} \right)^{\tilde{\alpha} + 1}
\exp\left[
-\frac{2 \tilde{\beta} + \tilde{\gamma} (\tilde{\delta} - \mu)^2}
{2 \sigma^2}
\right]
& \quad & \text{swapped in } \tilde{\alpha} \text{ and } \tilde{\delta}\\
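&= \kappa \NormInvGamDist_{\mu, \sigma^2}\left[
\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}
\right]
& \quad & \text{swapped in } \kappa\end{split}\]
where
\[\begin{split}\begin{gather*}
\tilde{\alpha} = \alpha + \frac{I}{2}\\
\tilde{\gamma} = \gamma + I\\
\tilde{\delta} = \frac{\gamma \delta + \sum_i x_i}{\tilde{\gamma}}\\
\tilde{\beta} = \beta + \frac{\sum_i x_i^2}{2} + \frac{\gamma \delta^2}{2} -
\frac{(\gamma \delta + \sum_i x_i)^2}{2 \tilde{\gamma}}\\
\kappa = \frac{1}{(2 \pi)^{I / 2}}
\frac{\sqrt{\gamma}}{\sqrt{\tilde{\gamma}}}
\frac{\beta^\alpha}{\tilde{\beta}^{\tilde{\alpha}}}
\frac{\Gamma[\tilde{\alpha}]}{\Gamma[\alpha]}.
\end{gather*}\end{split}\]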
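The substitutions in this derivation define a parameter update that is easy to compute from the data. The sketch below uses the definitions of the tilde quantities that make each "swapped in" step an identity: \(\tilde{\alpha} = \alpha + I / 2\), \(\tilde{\gamma} = \gamma + I\), \(\tilde{\delta} = (\gamma \delta + \sum_i x_i) / \tilde{\gamma}\), and \(\tilde{\beta} = \beta + \sum_i x_i^2 / 2 + \gamma \delta^2 / 2 - (\gamma \delta + \sum_i x_i)^2 / (2 \tilde{\gamma})\). The function name and the example data are illustrative.

```python
def nig_posterior_params(alpha, beta, gamma, delta, xs):
    """Conjugate update of normal-scaled inverse gamma parameters."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    alpha_t = alpha + n / 2.0
    gamma_t = gamma + n
    delta_t = (gamma * delta + sum_x) / gamma_t
    beta_t = (beta + 0.5 * sum_x2 + 0.5 * gamma * delta ** 2
              - 0.5 * (gamma * delta + sum_x) ** 2 / gamma_t)
    return alpha_t, beta_t, gamma_t, delta_t


# Example with two observations and a prior centered at zero.
print(nig_posterior_params(1.0, 1.0, 1.0, 0.0, [1.0, 3.0]))
```

With no data (`xs = []`) the function returns the prior parameters unchanged, which is a useful smoke test for the formulas.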
Exercise 3.12
A useful trace property is
\[\DeclareMathOperator{\tr}{\mathrm{tr}}
\tr\left[ z z^\top A^{-1} \right] = z^\top A^{-1} z.\]
\[\begin{split}\DeclareMathOperator{\NorIWisDist}{NorIWis}
& \prod_{i = 1}^I
\NormDist_{\mathbf{x}_i}[\boldsymbol{\mu}, \boldsymbol{\Sigma}]
\cdot
\NorIWisDist_{\boldsymbol{\mu}, \boldsymbol{\Sigma}}[
\alpha, \boldsymbol{\Psi}, \gamma, \boldsymbol{\delta}
]\\
&= \prod_{i = 1}^I
\frac{1}{
(2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{1/2}
}
\exp\left[
-0.5 (\mathbf{x}_i - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}
(\mathbf{x}_i - \boldsymbol{\mu})
\right]
\cdot
\frac{
\gamma^{D / 2} \left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2}
\exp\left[
-0.5 \left(
\tr\left[ \boldsymbol{\Psi} \boldsymbol{\Sigma}^{-1} \right] +
\gamma (\boldsymbol{\mu} - \boldsymbol{\delta})^\top
\boldsymbol{\Sigma}^{-1}
(\boldsymbol{\mu} - \boldsymbol{\delta})
\right)
\right]
}{
2^{\alpha D / 2} (2 \pi)^{D / 2}
\left\vert \boldsymbol{\Sigma} \right\vert^{(\alpha + D + 2) / 2}
\Gamma_D[\alpha / 2]
}\end{split}\]
\[\begin{split}&= \frac{
\left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2}
}{
\pi^{ID / 2} \Gamma_D[\alpha / 2]
}
\frac{
\exp\left[
-0.5 \left(
\tr\left[ \boldsymbol{\Psi} \boldsymbol{\Sigma}^{-1} \right] +
\gamma (\boldsymbol{\mu} - \boldsymbol{\delta})^\top
\boldsymbol{\Sigma}^{-1}
(\boldsymbol{\mu} - \boldsymbol{\delta}) +
\sum_i
(\mathbf{x}_i - \boldsymbol{\mu})^\top
\boldsymbol{\Sigma}^{-1}
(\mathbf{x}_i - \boldsymbol{\mu})
\right)
\right]
}{
2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2}
\left\vert
\boldsymbol{\Sigma}
\right\vert^{(\tilde{\alpha} + D + 2) / 2}
}
& \quad & \text{swapped in } \tilde{\alpha}\\
&= \frac{
\left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2}
}{
\pi^{ID / 2} \Gamma_D[\alpha / 2]
}
\frac{
\exp\left[
-0.5 \left(
\tr\left[ \boldsymbol{\Psi} \boldsymbol{\Sigma}^{-1} \right] +
\gamma \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1}
\boldsymbol{\mu} -
2 \gamma \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1}
\boldsymbol{\delta} +
\gamma \boldsymbol{\delta}^\top \boldsymbol{\Sigma}^{-1}
\boldsymbol{\delta} +
\sum_i
\mathbf{x}_i^\top \boldsymbol{\Sigma}^{-1} \mathbf{x}_i -
2 \sum_i
\boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \mathbf{x}_i +
I \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}
\right)
\right]
}{
2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2}
\left\vert
\boldsymbol{\Sigma}
\right\vert^{(\tilde{\alpha} + D + 2) / 2}
}\\
&= \frac{
\left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2}
}{
\pi^{ID / 2} \Gamma_D[\alpha / 2]
}
\frac{
\exp\left[
-0.5 \left(
\tr\left[
\left(
\boldsymbol{\Psi} +
\gamma \boldsymbol{\delta} \boldsymbol{\delta}^\top +
\sum_i \mathbf{x}_i \mathbf{x}_i^\top
\right)
\boldsymbol{\Sigma}^{-1}
\right] +
\tilde{\gamma} \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1}
\boldsymbol{\mu} -
2 \gamma \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1}
\boldsymbol{\delta} -
2 \sum_i \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1}
\mathbf{x}_i
\right)
\right]
}{
2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2}
\left\vert
\boldsymbol{\Sigma}
\right\vert^{(\tilde{\alpha} + D + 2) / 2}
}
& \quad & \text{swapped in } \tilde{\gamma}\\
&= \frac{
\left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2}
}{
\pi^{ID / 2} \Gamma_D[\alpha / 2]
}
\frac{
\exp\left[
-0.5 \left(
\tr\left[
\left(
\boldsymbol{\Psi} +
\gamma \boldsymbol{\delta} \boldsymbol{\delta}^\top +
\sum_i \mathbf{x}_i \mathbf{x}_i^\top
\right)
\boldsymbol{\Sigma}^{-1}
\right] +
\tilde{\gamma}
(\boldsymbol{\mu} - \tilde{\boldsymbol{\delta}})^\top
\boldsymbol{\Sigma}^{-1}
(\boldsymbol{\mu} - \tilde{\boldsymbol{\delta}}) -
\tilde{\gamma} \tilde{\boldsymbol{\delta}}^\top
\boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\delta}}
\right)
\right]
}{
2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2}
\left\vert
\boldsymbol{\Sigma}
\right\vert^{(\tilde{\alpha} + D + 2) / 2}
}
& \quad & \text{swapped in } \tilde{\boldsymbol{\delta}}\\
&= \frac{
\left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2}
}{
\pi^{ID / 2} \Gamma_D[\alpha / 2]
}
\frac{
\exp\left[
-0.5 \left(
\tr\left[
\tilde{\boldsymbol{\Psi}} \boldsymbol{\Sigma}^{-1}
\right] +
\tilde{\gamma}
(\boldsymbol{\mu} - \tilde{\boldsymbol{\delta}})^\top
\boldsymbol{\Sigma}^{-1}
(\boldsymbol{\mu} - \tilde{\boldsymbol{\delta}})
\right)
\right]
}{
2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2}
\left\vert
\boldsymbol{\Sigma}
\right\vert^{(\tilde{\alpha} + D + 2) / 2}
}
& \quad & \text{swapped in } \tilde{\boldsymbol{\Psi}}\\
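&= \kappa \NorIWisDist_{\boldsymbol{\mu}, \boldsymbol{\Sigma}}\left[
\tilde{\alpha},
\tilde{\boldsymbol{\Psi}},
\tilde{\gamma},
\tilde{\boldsymbol{\delta}}
\right]
& \quad & \text{swapped in } \kappa\end{split}\]
where
\[\begin{split}\begin{gather*}
\tilde{\alpha} = \alpha + I\\
\tilde{\gamma} = \gamma + I\\
\tilde{\boldsymbol{\delta}} =
\frac{\gamma \boldsymbol{\delta} + \sum_i \mathbf{x}_i}{\tilde{\gamma}}\\
\tilde{\boldsymbol{\Psi}} =
\boldsymbol{\Psi} +
\gamma \boldsymbol{\delta} \boldsymbol{\delta}^\top +
\sum_i \mathbf{x}_i \mathbf{x}_i^\top -
\tilde{\gamma} \tilde{\boldsymbol{\delta}} \tilde{\boldsymbol{\delta}}^\top\\
\kappa = \frac{1}{\pi^{ID / 2}}
\frac{
\left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2}
}{
\left\vert \tilde{\boldsymbol{\Psi}} \right\vert^{\tilde{\alpha} / 2}
}
\frac{\gamma^{D / 2}}{\tilde{\gamma}^{D / 2}}
\frac{\Gamma_D[\tilde{\alpha} / 2]}{\Gamma_D[\alpha / 2]}.
\end{gather*}\end{split}\]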
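The multivariate update can be sketched in plain Python with nested lists standing in for vectors and matrices. The tilde definitions used here are the ones that make each "swapped in" step an identity: \(\tilde{\alpha} = \alpha + I\), \(\tilde{\gamma} = \gamma + I\), \(\tilde{\boldsymbol{\delta}} = (\gamma \boldsymbol{\delta} + \sum_i \mathbf{x}_i) / \tilde{\gamma}\), and \(\tilde{\boldsymbol{\Psi}} = \boldsymbol{\Psi} + \gamma \boldsymbol{\delta} \boldsymbol{\delta}^\top + \sum_i \mathbf{x}_i \mathbf{x}_i^\top - \tilde{\gamma} \tilde{\boldsymbol{\delta}} \tilde{\boldsymbol{\delta}}^\top\). The function name and the list-based representation are assumptions for illustration; a real implementation would likely use an array library.

```python
def niw_posterior_params(alpha, psi, gamma, delta, xs):
    """Conjugate update of normal inverse Wishart parameters.

    psi is a D x D list of lists, delta a length-D list, and xs a list of
    length-D observations.
    """
    n, dim = len(xs), len(delta)
    alpha_t = alpha + n
    gamma_t = gamma + n
    # gamma * delta + sum_i x_i, computed per component.
    weighted_sum = [gamma * delta[j] + sum(x[j] for x in xs) for j in range(dim)]
    delta_t = [v / gamma_t for v in weighted_sum]
    # Psi + gamma delta delta^T + sum_i x_i x_i^T - gamma_t delta_t delta_t^T
    psi_t = [[psi[r][c]
              + gamma * delta[r] * delta[c]
              + sum(x[r] * x[c] for x in xs)
              - gamma_t * delta_t[r] * delta_t[c]
              for c in range(dim)]
             for r in range(dim)]
    return alpha_t, psi_t, gamma_t, delta_t


identity = [[1.0, 0.0], [0.0, 1.0]]
print(niw_posterior_params(3.0, identity, 1.0, [0.0, 0.0],
                           [[1.0, 0.0], [0.0, 1.0]]))
```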