Common Probability Distributions

  • When fitting probability models to data, it is necessary to know the uncertainty of the fit.

    • This uncertainty is represented as a probability distribution over the parameters of the fitted model.

    • See Table 3.1 for a summary.

  • If the posterior distributions are in the same family as the prior distributions, then the prior and the posterior are called conjugate distributions.

    • The prior is then called the conjugate prior of the likelihood.

    • When a distribution is multiplied by its conjugate prior, the result is proportional to a new distribution that has the same form as the conjugate prior.

Exercise 3.1

Let \(X\) be a Bernoulli-distributed random variable parameterized by \(\lambda\), taking values \(x \in \{0, 1\}\). Its mean and variance are

\[\begin{split}\DeclareMathOperator{\BernDist}{Bern} \mathrm{E}[X] &= \sum_{x \in X} x \BernDist_x[\lambda]\\ &= 0 (1 - \lambda) + 1 (\lambda)\\ &= \lambda\end{split}\]

and

\[\begin{split}\mathrm{E}\left[ (X - \mathrm{E}[X])^2 \right] &= \mathrm{E}[X^2] - \mathrm{E}[X]^2 & \quad & \text{Exercise 2.10}\\ &= \sum_{x \in X} x^2 \BernDist_x[\lambda] - \lambda^2 & \quad & \text{(2.12) where } f(X) \mapsto f[x] = x^2\\ &= 0^2 (1 - \lambda) + 1^2 (\lambda) - \lambda^2\\ &= \lambda (1 - \lambda).\end{split}\]

See Exercise 2.10 for more details.
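
As a quick numerical sanity check of the two results above, the following Python sketch evaluates both sums directly over the support \(\{0, 1\}\). The function name `bernoulli_moments` is hypothetical and introduced only for illustration.

```python
def bernoulli_moments(lam):
    """Return (mean, variance) of a Bernoulli(lam) variable by direct summation."""
    support = (0, 1)
    pmf = {0: 1.0 - lam, 1: lam}
    mean = sum(x * pmf[x] for x in support)
    second_moment = sum(x ** 2 * pmf[x] for x in support)
    # Variance as E[X^2] - E[X]^2, the identity used in the derivation.
    return mean, second_moment - mean ** 2


mean, var = bernoulli_moments(0.3)
print(mean, var)
```

For \(\lambda = 0.3\) the two values should agree with \(\lambda\) and \(\lambda (1 - \lambda)\) up to floating-point rounding.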

Exercise 3.2

A useful fact is the relationship between the beta function and the gamma function:

\[\begin{split}B(x, y) &= \int_0^1 t^{x - 1} (1 - t)^{y - 1} dt\\ &= \frac{\Gamma[x] \Gamma[y]}{\Gamma[x + y]} & \quad & \text{convolution integral property}\end{split}\]

where \(\Gamma[z] = \int_0^\infty t^{z - 1} e^{-t} dt\).

Finding the mode (position of the peak) of the beta distribution is equivalent to finding the parameter that maximizes the beta distribution:

\[\begin{split}\DeclareMathOperator{\BetaDist}{Beta} \frac{\partial}{\partial \lambda} \BetaDist_\lambda[\alpha, \beta] &= B(\alpha, \beta)^{-1} \frac{\partial}{\partial \lambda} \left( \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1} \right)\\ 0 &= B(\alpha, \beta)^{-1} \left[ (\alpha - 1) (1) \lambda^{\alpha - 2} (1 - \lambda)^{\beta - 1} + (\beta - 1) (-1) \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 2} \right]\\ 0 &= \lambda^{\alpha - 2} (1 - \lambda)^{\beta - 2} \left[ (\alpha - 1) (1 - \lambda) - (\beta - 1) \lambda \right]\\ 0 &= -\lambda \left[ (\alpha - 1) + (\beta - 1) \right] + (\alpha - 1)\\ \lambda &= \frac{\alpha - 1}{\alpha + \beta - 2}\end{split}\]
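
The stationary point found above is a maximum only when \(\alpha > 1\) and \(\beta > 1\). Under that assumption, a minimal Python sketch can compare the closed-form mode against a brute-force search over a grid. The helper names `beta_pdf` and `beta_mode` and the grid resolution are illustrative choices, not part of the exercise.

```python
import math


def beta_pdf(lam, alpha, beta):
    """Beta density, with log B(alpha, beta) computed from log-gamma values."""
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp((alpha - 1) * math.log(lam) + (beta - 1) * math.log(1 - lam) - log_b)


def beta_mode(alpha, beta):
    """Closed-form mode derived above; assumes alpha > 1 and beta > 1."""
    return (alpha - 1) / (alpha + beta - 2)


alpha, beta = 3.0, 5.0
grid = [i / 10000 for i in range(1, 10000)]  # points in the open interval (0, 1)
numeric_mode = max(grid, key=lambda lam: beta_pdf(lam, alpha, beta))
print(numeric_mode, beta_mode(alpha, beta))
```

The two printed values should agree to within the grid spacing.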

Exercise 3.3

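Since \(\mu = \frac{\alpha}{\alpha + \beta}\), notice that \(1 - \mu = \frac{\beta}{\alpha + \beta}\) and \(\sigma^2 = \frac{\mu (1 - \mu)}{\alpha + \beta + 1} \iff \frac{\mu (1 - \mu)}{\sigma^2} = \alpha + \beta + 1\). Hence \(\alpha = \mu (\alpha + \beta)\), which gives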

\[\alpha = \mu \left( \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right)\]
\[\begin{split}\beta &= \frac{\mu (1 - \mu)}{\sigma^2} - 1 - \alpha\\ &= \frac{\mu (1 - \mu)}{\sigma^2} - 1 - \mu \left( \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right)\\ &= (1 - \mu) \frac{\mu (1 - \mu)}{\sigma^2} + \mu - 1\\ &= (1 - \mu) \left( \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right)\end{split}\]
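
The two formulas above can be checked with a round trip: compute the mean and variance from known parameters, then recover the parameters. The sketch below assumes \(0 < \mu < 1\) and \(0 < \sigma^2 < \mu (1 - \mu)\); the function names are hypothetical.

```python
def beta_params_from_moments(mu, var):
    """Recover (alpha, beta) from the mean and variance of a beta distribution."""
    common = mu * (1 - mu) / var - 1  # this quantity equals alpha + beta
    return mu * common, (1 - mu) * common


def beta_moments(alpha, beta):
    """Mean and variance of a beta distribution."""
    total = alpha + beta
    return alpha / total, alpha * beta / (total ** 2 * (total + 1))


mu, var = beta_moments(2.0, 6.0)
alpha_hat, beta_hat = beta_params_from_moments(mu, var)
print(alpha_hat, beta_hat)
```

The recovered values should match the starting parameters up to rounding.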

Exercise 3.4

Recall that \(\BetaDist_\lambda[\alpha, \beta] = B(\alpha, \beta)^{-1} \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}\). Writing \(x\) for \(\lambda\), the generalized form of the exponential family is

\[\begin{split}Pr(x \mid \boldsymbol{\theta}) &= a[x] \exp\left( \mathbf{b}[\boldsymbol{\theta}]^\top \mathbf{c}[x] - d[\boldsymbol{\theta}] \right)\\ &= a[x] \exp\left( \mathbf{b}[\boldsymbol{\theta}]^\top \mathbf{c}[x] \right) \exp\left( -d[\boldsymbol{\theta}] \right).\end{split}\]

The Beta distribution can be represented in the generalized form of the exponential family as follows:

\[\begin{split}\begin{gather*} a[x] = 1\\ \mathbf{b}[\boldsymbol{\theta}] = \begin{bmatrix} \alpha - 1\\ \beta - 1 \end{bmatrix}\\ \mathbf{c}[x] = \begin{bmatrix} \log(x)\\ \log(1 - x) \end{bmatrix}\\ d[\boldsymbol{\theta}] = \log{B(\alpha, \beta)}. \end{gather*}\end{split}\]
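
One way to gain confidence in this decomposition is to assemble the density from the four pieces and compare it with the usual formula. The following Python sketch does that at a few points; the function names and test values are illustrative assumptions.

```python
import math


def beta_pdf(x, alpha, beta):
    """Beta density written directly with gamma functions."""
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)


def beta_exponential_family(x, alpha, beta):
    """Same density assembled from a[x], b[theta], c[x], and d[theta]."""
    a = 1.0
    b = (alpha - 1, beta - 1)
    c = (math.log(x), math.log(1 - x))
    # d[theta] = log B(alpha, beta)
    d = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return a * math.exp(b[0] * c[0] + b[1] * c[1] - d)


for x in (0.1, 0.5, 0.8):
    print(x, beta_pdf(x, 2.5, 4.0), beta_exponential_family(x, 2.5, 4.0))
```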

Exercise 3.5

Given \(z > 0\),

\[\begin{split}\Gamma[z + 1] &= \int_0^\infty t^{(z + 1) - 1} e^{-t} dt\\ &= \left[ -t^z e^{-t} \right]_0^\infty - \int_0^\infty -e^{-t} z t^{z - 1} dt & \quad & \text{integration by parts with } u = t^z, dv = e^{-t} dt, v = -e^{-t}, du = z t^{z - 1} dt\\ &= z \Gamma[z]\end{split}\]

because

\[\begin{split}\left[ -t^z e^{-t} \right]_0^\infty &= \lim_{t \rightarrow \infty} -t^z e^{-t} - \left( -(0)^z e^{-0} \right)\\ &= \lim_{t \rightarrow \infty} -\frac{t^z}{e^t} & \quad & \text{since } 0^z = 0 \text{ for } z > 0 \text{; indeterminate form } \frac{\infty}{\infty}\\ &= \lim_{t \rightarrow \infty} -\frac{z (z - 1) \cdots (z - n + 1) \, t^{z - n}}{e^t} & \quad & \text{apply L'Hopital's rule } n = \lceil z \rceil \text{ times, so } z - n \leq 0\\ &= 0.\end{split}\]
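
The recurrence can also be checked numerically with the standard library's `math.gamma`, including for non-integer \(z\). This is only a spot check at a few arbitrarily chosen values, not a proof.

```python
import math

# Compare Gamma[z + 1] with z * Gamma[z] for several z > 0, including non-integers.
checks = {z: (math.gamma(z + 1), z * math.gamma(z)) for z in (0.5, 1.0, 2.7, 10.0)}
for z, (lhs, rhs) in checks.items():
    print(z, lhs, rhs)
```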

Exercise 3.6

\[\begin{split}\DeclareMathOperator{\NormDist}{Norm} Pr(x \mid \mu) Pr(\mu) &= \NormDist_x[\mu, 1.0] \NormDist_\mu[\mu_p, \sigma_p^2]\\ &= \frac{1}{\sqrt{2 \pi}} \exp\left[-0.5 (x - \mu)^2 \right] \frac{1}{\sqrt{2 \pi \sigma_p^2}} \exp\left[-0.5 \frac{(\mu - \mu_p)^2}{\sigma_p^2}\right]\\ &= \frac{1}{2 \pi \sigma_p} \exp\left[ -0.5 \left( x^2 - 2 \mu x + \mu^2 + \frac{\mu^2}{\sigma_p^2} - \frac{2 \mu \mu_p}{\sigma_p^2} + \frac{\mu_p^2}{\sigma_p^2} \right) \right]\\ &= \frac{1}{2 \pi \sigma_p} \exp\left[ -0.5 \left( x^2 + \frac{\mu_p^2}{\sigma_p^2} \right) \right] \exp\left[ -0.5 \left( \mu^2 + \frac{\mu^2}{\sigma_p^2} - 2 \mu x - \frac{2 \mu \mu_p}{\sigma_p^2} \right) \right]\\ &= \kappa_1 \exp\left[ -0.5 \left( \frac{\sigma_p^2 + 1}{\sigma_p^2} \mu^2 - 2 \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2} \mu \right) \right]\\ &= \kappa_1 \exp\left[ \frac{-0.5 (\sigma_p^2 + 1)}{\sigma_p^2} \left( \mu^2 - 2 \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \mu \right) \right] & \quad & \text{set up for completing the square}\\ &= \kappa_1 \exp\left[ \frac{-0.5}{\sigma_p^2 (\sigma_p^2 + 1)^{-1}} \left( \mu^2 - 2 \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \mu + \left( \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \right)^2 \right) \right] \exp\left[ \frac{0.5}{\sigma_p^2 (\sigma_p^2 + 1)^{-1}} \left( \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \right)^2 \right]\\ &= \kappa_2 \exp\left[ -0.5 \frac{ \left( \mu - \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1} \right)^2 }{ \sigma_p^2 (\sigma_p^2 + 1)^{-1} } \right]\\ &= \kappa_3 \NormDist_\mu \left[ \frac{\sigma_p^2 x + \mu_p}{\sigma_p^2 + 1}, \sigma_p^2 (\sigma_p^2 + 1)^{-1} \right]\end{split}\]
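
If the result above is right, the ratio of the product \(Pr(x \mid \mu) Pr(\mu)\) to the final normal density must not depend on \(\mu\). The Python sketch below checks this at a few values of \(\mu\). The helper names and the numbers chosen for \(x\), \(\mu_p\), and \(\sigma_p^2\) are arbitrary assumptions.

```python
import math


def norm_pdf(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def posterior_params(x, mu_p, var_p):
    """Mean and variance of the normal in mu that the product is proportional to."""
    return (var_p * x + mu_p) / (var_p + 1), var_p / (var_p + 1)


x, mu_p, var_p = 1.5, -0.5, 4.0
post_mean, post_var = posterior_params(x, mu_p, var_p)

# The ratio plays the role of kappa_3 and should be the same for every mu.
ratios = [
    norm_pdf(x, mu, 1.0) * norm_pdf(mu, mu_p, var_p) / norm_pdf(mu, post_mean, post_var)
    for mu in (-2.0, 0.0, 0.7, 3.0)
]
print(post_mean, post_var, ratios)
```

The constant ratio equals the normal density of \(x\) with mean \(\mu_p\) and variance \(\sigma_p^2 + 1\), which is the evidence for this model.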

Exercise 3.7

Recall that

\[\begin{split}\NormDist_x[\mu, \sigma^2] &= \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left[ -0.5 \frac{(x - \mu)^2}{\sigma^2} \right]\\ &= \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left[ \frac{-0.5}{\sigma^2} (x^2 - 2 x \mu + \mu^2) \right].\end{split}\]

The univariate normal distribution can be represented in the generalized form of the exponential family as follows:

\[\begin{split}Pr(x \mid \boldsymbol{\theta}) &= a[x] \exp\left( \mathbf{b}[\boldsymbol{\theta}]^\top \mathbf{c}[x] - d[\boldsymbol{\theta}] \right)\\ &= a[x] \exp\left( \mathbf{b}[\boldsymbol{\theta}]^\top \mathbf{c}[x] \right) \exp\left( -d[\boldsymbol{\theta}] \right)\end{split}\]

where

\[\begin{split}\begin{gather*} a[x] = 1\\ \mathbf{b}[\boldsymbol{\theta}] = -\frac{0.5}{\sigma^2} \begin{bmatrix} 1\\ -2 \mu \end{bmatrix}\\ \mathbf{c}[x] = \begin{bmatrix} x^2\\ x \end{bmatrix}\\ d[\boldsymbol{\theta}] = 0.5 \frac{\mu^2}{\sigma^2} + \log{\sqrt{2 \pi \sigma^2}}. \end{gather*}\end{split}\]
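
As a spot check of this decomposition, the following Python sketch builds the density from \(a[x]\), \(\mathbf{b}[\boldsymbol{\theta}]\), \(\mathbf{c}[x]\), and \(d[\boldsymbol{\theta}]\) and compares it with the standard formula. The function names and test values are illustrative assumptions.

```python
import math


def norm_pdf(x, mu, var):
    """Univariate normal density in its usual form."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def norm_exponential_family(x, mu, var):
    """Same density assembled from the exponential family components."""
    a = 1.0
    scale = -0.5 / var
    b = (scale * 1.0, scale * (-2.0 * mu))
    c = (x ** 2, x)
    d = 0.5 * mu ** 2 / var + math.log(math.sqrt(2 * math.pi * var))
    return a * math.exp(b[0] * c[0] + b[1] * c[1] - d)


for x in (-1.0, 0.3, 2.5):
    print(x, norm_pdf(x, 0.7, 1.8), norm_exponential_family(x, 0.7, 1.8))
```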

Exercise 3.8

Finding the mode (position of the peak) of the normal scaled inverse gamma distribution is equivalent to finding its maximum.

\[\begin{split}\DeclareMathOperator{\NormInvGamDist}{NormInvGam} \frac{\partial}{\partial \mu} \NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta] &= \frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}} \frac{\beta^\alpha}{\Gamma[\alpha]} \left( \frac{1}{\sigma^2} \right)^{\alpha + 1} \frac{\partial}{\partial \mu} \exp\left[ -\frac{2\beta + \gamma(\delta - \mu)^2}{2 \sigma^2} \right]\\ 0 &= \NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta] \left( \frac{\gamma (\delta - \mu)}{\sigma^2} \right)\\ \mu &= \delta\end{split}\]
\[\begin{split}\frac{\partial}{\partial \sigma} \NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta] &= \frac{\beta^\alpha}{\Gamma[\alpha]} \sqrt{\frac{\gamma}{2 \pi}} \frac{\partial}{\partial \sigma} \left[ \sigma^{-2 \alpha - 3} \exp\left[ -\frac{2\beta + \gamma(\delta - \mu)^2}{2 \sigma^2} \right] \right]\\ 0 &= (-2 \alpha - 3) \sigma^{-1} \NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta] + \NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta] \left( \frac{2\beta + \gamma(\delta - \mu)^2}{\sigma^3} \right)\\ &= (-2 \alpha - 3) \sigma^2 + \left( 2\beta + \gamma(\delta - \mu)^2 \right)\\ \sigma^2 &= \frac{2\beta + \gamma(\delta - \mu)^2}{2 \alpha + 3}\\ &= \frac{2 \beta}{2 \alpha + 3} & \quad & \text{substitute } \mu = \delta\end{split}\]
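
A simple way to test the claimed mode is to evaluate the log density there and at nearby points, which should all be lower. The Python sketch below does this with \(\mu = \delta\) substituted into the expression for \(\sigma^2\). The parameter values and step sizes are arbitrary assumptions.

```python
import math


def log_norm_inv_gamma(mu, var, alpha, beta, gamma, delta):
    """Log density of the normal scaled inverse gamma at (mu, var)."""
    return (
        0.5 * math.log(gamma) - 0.5 * math.log(2 * math.pi * var)
        + alpha * math.log(beta) - math.lgamma(alpha)
        - (alpha + 1) * math.log(var)
        - (2 * beta + gamma * (delta - mu) ** 2) / (2 * var)
    )


alpha, beta, gamma, delta = 2.0, 1.5, 3.0, 0.4
mode_mu = delta
mode_var = 2 * beta / (2 * alpha + 3)  # sigma^2 at the mode once mu = delta

peak = log_norm_inv_gamma(mode_mu, mode_var, alpha, beta, gamma, delta)
neighbours = [
    log_norm_inv_gamma(mode_mu + dm, mode_var + dv, alpha, beta, gamma, delta)
    for dm in (-0.05, 0.0, 0.05)
    for dv in (-0.05, 0.0, 0.05)
    if (dm, dv) != (0.0, 0.0)
]
print(all(value < peak for value in neighbours))
```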

Exercise 3.9

\[\begin{split}\prod_{i = 1}^I \BernDist_{x_i}[\lambda] \cdot \BetaDist_\lambda[\alpha, \beta] &= \prod_{i = 1}^I \lambda^{x_i} (1 - \lambda)^{1 - x_i} \cdot \frac{\Gamma[\alpha + \beta]}{\Gamma[\alpha] \Gamma[\beta]} \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}\\ &= \lambda^{\sum_i x_i} (1 - \lambda)^{\sum_i (1 - x_i)} \frac{\Gamma[\alpha + \beta]}{\Gamma[\alpha] \Gamma[\beta]} \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}\\ &= \frac{\Gamma[\alpha + \beta]}{\Gamma[\alpha] \Gamma[\beta]} \lambda^{\alpha - 1 + \sum_i x_i} (1 - \lambda)^{\beta - 1 + I - \sum_i x_i}\\ &= \frac{\Gamma[\alpha + \beta]}{\Gamma[\alpha] \Gamma[\beta]} \frac{ \Gamma\left[ \alpha + \sum_i x_i \right] \Gamma\left[ \beta + I - \sum_i x_i \right] }{ \Gamma[\alpha + \beta + I] } \BetaDist_\lambda\left[ \alpha + \sum_i x_i, \beta + I - \sum_i x_i \right]\end{split}\]
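
The result above amounts to a simple update rule: add the number of ones to \(\alpha\) and the number of zeros to \(\beta\). The Python sketch below implements that rule and checks that the likelihood times the prior, divided by the updated beta density, gives the same constant for every \(\lambda\). The function names and the data are hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math


def log_beta_fn(a, b):
    """log B(a, b) via log-gamma values."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(lam, a, b):
    return math.exp((a - 1) * math.log(lam) + (b - 1) * math.log(1 - lam) - log_beta_fn(a, b))


def beta_bernoulli_update(xs, alpha, beta):
    """Posterior beta parameters after observing binary data xs."""
    successes = sum(xs)
    return alpha + successes, beta + len(xs) - successes


xs = [1, 0, 1, 1, 0]
alpha, beta = 2.0, 2.0
post_alpha, post_beta = beta_bernoulli_update(xs, alpha, beta)


def likelihood_times_prior(lam):
    likelihood = math.prod(lam ** x * (1 - lam) ** (1 - x) for x in xs)
    return likelihood * beta_pdf(lam, alpha, beta)


# The constant in front of the posterior beta density in the derivation.
kappa = math.exp(log_beta_fn(post_alpha, post_beta) - log_beta_fn(alpha, beta))
ratios = [
    likelihood_times_prior(lam) / beta_pdf(lam, post_alpha, post_beta)
    for lam in (0.1, 0.5, 0.9)
]
print(post_alpha, post_beta, kappa, ratios)
```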

Exercise 3.10

\[\begin{split}\DeclareMathOperator{\CatDist}{Cat} \DeclareMathOperator{\DirDist}{Dir} \prod_{i = 1}^I \CatDist_{x_i}[\lambda_{1 \ldots K}] \cdot \DirDist_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}] &= \prod_{i = 1}^I \prod_{j = 1}^K \lambda_j^{x_{ij}} \cdot \frac{ \Gamma\left[ \sum_{j = 1}^K \alpha_j \right] }{ \prod_{j = 1}^K \Gamma[\alpha_j] } \prod_{j = 1}^K \lambda_j^{\alpha_j - 1}\\ &= \frac{ \Gamma\left[ \sum_{j = 1}^K \alpha_j \right] }{ \prod_{j = 1}^K \Gamma[\alpha_j] } \prod_{j = 1}^K \lambda_j^{\alpha_j - 1 + N_j} & \quad & N_j = \sum_{i = 1}^I x_{ij}\\ &= \frac{ \Gamma\left[ \sum_{j = 1}^K \alpha_j \right] }{ \prod_{j = 1}^K \Gamma[\alpha_j] } \frac{ \prod_{j = 1}^K \Gamma[\alpha_j + N_j] }{ \Gamma\left[ \sum_{j = 1}^K (\alpha_j + N_j) \right] } \DirDist_{\lambda_{1 \ldots K}}[\alpha_1 + N_1, \ldots, \alpha_K + N_K]\\ &= \frac{ \Gamma\left[ \sum_{j = 1}^K \alpha_j \right] }{ \prod_{j = 1}^K \Gamma[\alpha_j] } \frac{ \prod_{j = 1}^K \Gamma[\alpha_j + N_j] }{ \Gamma\left[ I + \sum_{j = 1}^K \alpha_j \right] } \DirDist_{\lambda_{1 \ldots K}}[\alpha_1 + N_1, \ldots, \alpha_K + N_K] & \quad & \sum_j N_j = I\end{split}\]
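
In practice the update above reduces to adding the per-category counts \(N_j\) to the Dirichlet parameters. The Python sketch below illustrates this. As a simplifying assumption, observations are given as integer category labels \(0, \ldots, K - 1\) rather than the one-hot vectors \(x_{ij}\) used in the derivation, and the function name is hypothetical.

```python
from collections import Counter


def dirichlet_categorical_update(observations, alphas):
    """Posterior Dirichlet parameters alpha_j + N_j for labels in 0..K-1."""
    counts = Counter(observations)  # N_j: how often category j was observed
    return [alpha + counts[j] for j, alpha in enumerate(alphas)]


alphas = [1.0, 1.0, 1.0]
observations = [0, 2, 2, 1, 2]
posterior = dirichlet_categorical_update(observations, alphas)
print(posterior)
```

The posterior parameters should sum to \(I + \sum_j \alpha_j\), matching the last step of the derivation.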

Exercise 3.11

The derivation below substitutes the posterior parameters

\[\tilde{\alpha} = \alpha + \frac{I}{2}, \quad \tilde{\gamma} = \gamma + I, \quad \tilde{\delta} = \frac{\gamma \delta + \sum_i x_i}{\gamma + I}, \quad \tilde{\beta} = \beta + \frac{\sum_i x_i^2}{2} + \frac{\gamma \delta^2}{2} - \frac{(\gamma \delta + \sum_i x_i)^2}{2 (\gamma + I)}.\]

\[\begin{split}& \prod_{i = 1}^I \text{Norm}_{x_i}[\mu, \sigma^2] \cdot \NormInvGamDist_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta]\\ &= \prod_{i = 1}^I \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left[ -\frac{(x_i - \mu)^2}{2 \sigma^2} \right] \cdot \frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}} \frac{\beta^\alpha}{\Gamma[\alpha]} \left( \frac{1}{\sigma^2} \right)^{\alpha + 1} \exp\left[ -\frac{2\beta + \gamma(\delta - \mu)^2}{2 \sigma^2} \right]\\ &= \left( 2 \pi \sigma^2 \right)^{-I / 2} \frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}} \frac{\beta^\alpha}{\Gamma[\alpha]} \left( \frac{1}{\sigma^2} \right)^{\alpha + 1} \exp\left[ -\frac{1}{2 \sigma^2} \left( 2\beta + \gamma(\delta - \mu)^2 + \sum_i (x_i - \mu)^2 \right) \right]\\ &= \frac{1}{(2 \pi)^{I / 2}} \frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}} \frac{\beta^\alpha}{\Gamma[\alpha]} \left( \frac{1}{\sigma^2} \right)^{\alpha + \frac{I}{2} + 1} \exp\left[ -\frac{1}{2 \sigma^2} \left( 2\beta + \gamma \delta^2 - 2 \gamma \delta \mu + \gamma \mu^2 + \sum_i x_i^2 - 2 \mu \sum_i x_i + I \mu^2 \right) \right]\\ &= \frac{1}{(2 \pi)^{I / 2}} \frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}} \frac{\beta^\alpha}{\Gamma[\alpha]} \left( \frac{1}{\sigma^2} \right)^{\alpha + \frac{I}{2} + 1} \exp\left[ -\frac{1}{2 \sigma^2} \left( 2\beta + \sum_i x_i^2 + \gamma \delta^2 - 2 \gamma \delta \mu - 2 \mu \sum_i x_i + \tilde{\gamma} \mu^2 \right) \right] & \quad & \text{swapped in } \tilde{\gamma}\\ &= \frac{1}{(2 \pi)^{I / 2}} \frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}} \frac{\beta^\alpha}{\Gamma[\alpha]} \left( \frac{1}{\sigma^2} \right)^{\alpha + \frac{I}{2} + 1} \exp\left[ -\frac{1}{2 \sigma^2} \left( 2 \tilde{\beta} + \frac{(\gamma \delta + \sum_i x_i)^2}{\tilde{\gamma}} - 2 \gamma \delta \mu - 2 \mu \sum_i x_i + \tilde{\gamma} \mu^2 \right) \right] & \quad & \text{swapped in } \tilde{\beta}\\ &= \frac{1}{(2 \pi)^{I / 2}} \frac{\sqrt{\gamma}}{\sigma \sqrt{2 \pi}} \frac{\beta^\alpha}{\Gamma[\alpha]} \left( \frac{1}{\sigma^2} \right)^{\tilde{\alpha} + 1} 
\exp\left[ -\frac{2 \tilde{\beta} + \tilde{\gamma} (\tilde{\delta} - \mu)^2} {2 \sigma^2} \right] & \quad & \text{swapped in } \tilde{\alpha} \text{ and } \tilde{\delta}\\ &= \kappa \NormInvGamDist_{\mu, \sigma^2}\left[ \tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta} \right] & \quad & \text{swapped in } \kappa\end{split}\]
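
The substitutions in the derivation imply \(\tilde{\alpha} = \alpha + I/2\), \(\tilde{\gamma} = \gamma + I\), \(\tilde{\delta} = (\gamma \delta + \sum_i x_i) / \tilde{\gamma}\), and \(2 \tilde{\beta} = 2 \beta + \sum_i x_i^2 + \gamma \delta^2 - (\gamma \delta + \sum_i x_i)^2 / \tilde{\gamma}\). The Python sketch below implements this update and checks that \(\log \kappa\) is the same at several \((\mu, \sigma^2)\) points. The function names, data, and prior values are arbitrary assumptions.

```python
import math


def log_norm(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def log_norm_inv_gamma(mu, var, alpha, beta, gamma, delta):
    """Log density of the normal scaled inverse gamma at (mu, var)."""
    return (
        0.5 * math.log(gamma) - 0.5 * math.log(2 * math.pi * var)
        + alpha * math.log(beta) - math.lgamma(alpha)
        - (alpha + 1) * math.log(var)
        - (2 * beta + gamma * (delta - mu) ** 2) / (2 * var)
    )


def nig_update(xs, alpha, beta, gamma, delta):
    """Posterior parameters (alpha, beta, gamma, delta) with tildes."""
    n, sum_x = len(xs), sum(xs)
    sum_x2 = sum(x * x for x in xs)
    gamma_t = gamma + n
    alpha_t = alpha + n / 2
    delta_t = (gamma * delta + sum_x) / gamma_t
    beta_t = (
        beta + 0.5 * sum_x2 + 0.5 * gamma * delta ** 2
        - 0.5 * (gamma * delta + sum_x) ** 2 / gamma_t
    )
    return alpha_t, beta_t, gamma_t, delta_t


xs = [0.2, 1.1, -0.4, 0.9]
prior = (2.0, 1.0, 1.5, 0.0)
posterior = nig_update(xs, *prior)

# log kappa = log(likelihood * prior) - log(posterior); it should not vary.
log_kappas = [
    sum(log_norm(x, mu, var) for x in xs)
    + log_norm_inv_gamma(mu, var, *prior)
    - log_norm_inv_gamma(mu, var, *posterior)
    for mu, var in ((0.0, 1.0), (0.5, 0.3), (-1.0, 2.5))
]
print(posterior, log_kappas)
```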

Exercise 3.12

A useful trace property is

\[\DeclareMathOperator{\tr}{\mathrm{tr}} \tr\left[ z z^\top A^{-1} \right] = z^\top A^{-1} z.\]

The derivation below uses this property together with the posterior parameters

\[\tilde{\alpha} = \alpha + I, \quad \tilde{\gamma} = \gamma + I, \quad \tilde{\boldsymbol{\delta}} = \frac{\gamma \boldsymbol{\delta} + \sum_i \mathbf{x}_i}{\gamma + I}, \quad \tilde{\boldsymbol{\Psi}} = \boldsymbol{\Psi} + \gamma \boldsymbol{\delta} \boldsymbol{\delta}^\top + \sum_i \mathbf{x}_i \mathbf{x}_i^\top - \tilde{\gamma} \tilde{\boldsymbol{\delta}} \tilde{\boldsymbol{\delta}}^\top.\]
\[\begin{split}\DeclareMathOperator{\NorIWisDist}{NorIWis} & \prod_{i = 1}^I \NormDist_{x_i}[\boldsymbol{\mu}, \boldsymbol{\Sigma}] \cdot \NorIWisDist_{\boldsymbol{\mu}, \boldsymbol{\Sigma}}[ \alpha, \boldsymbol{\Psi}, \gamma, \boldsymbol{\delta} ]\\ &= \prod_{i = 1}^I \frac{1}{ (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{1/2} } \exp\left[ -0.5 (\mathbf{x}_i - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) \right] \cdot \frac{ \gamma^{D / 2} \left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \exp\left[ -0.5 \left( \tr\left[ \boldsymbol{\Psi} \boldsymbol{\Sigma}^{-1} \right] + \gamma (\boldsymbol{\mu} - \boldsymbol{\delta})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \boldsymbol{\delta}) \right) \right] }{ 2^{\alpha D / 2} (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{(\alpha + D + 2) / 2} \Gamma_D[\alpha / 2] }\end{split}\]
\[\begin{split}&= \frac{ \left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2} }{ \pi^{ID / 2} \Gamma_D[\alpha / 2] } \frac{ \exp\left[ -0.5 \left( \tr\left[ \boldsymbol{\Psi} \boldsymbol{\Sigma}^{-1} \right] + \gamma (\boldsymbol{\mu} - \boldsymbol{\delta})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \boldsymbol{\delta}) + \sum_i (\mathbf{x}_i - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) \right) \right] }{ 2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{(\tilde{\alpha} + D + 2) / 2} } & \quad & \text{swapped in } \tilde{\alpha}\\ &= \frac{ \left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2} }{ \pi^{ID / 2} \Gamma_D[\alpha / 2] } \frac{ \exp\left[ -0.5 \left( \tr\left[ \boldsymbol{\Psi} \boldsymbol{\Sigma}^{-1} \right] + \gamma \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} - 2 \gamma \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\delta} + \gamma \boldsymbol{\delta}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\delta} + \sum_i \mathbf{x}_i^\top \boldsymbol{\Sigma}^{-1} \mathbf{x}_i - 2 \sum_i \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \mathbf{x}_i + I \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} \right) \right] }{ 2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{(\tilde{\alpha} + D + 2) / 2} }\\ &= \frac{ \left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2} }{ \pi^{ID / 2} \Gamma_D[\alpha / 2] } \frac{ \exp\left[ -0.5 \left( \tr\left[ \left( \boldsymbol{\Psi} + \gamma \boldsymbol{\delta} \boldsymbol{\delta}^\top + \sum_i \mathbf{x}_i \mathbf{x}_i^\top \right) \boldsymbol{\Sigma}^{-1} \right] + \tilde{\gamma} \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} - 2 \gamma \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\delta} - 2 \sum_i \boldsymbol{\mu}^\top \boldsymbol{\Sigma}^{-1} \mathbf{x}_i \right) \right] }{ 2^{\tilde{\alpha} 
D / 2} (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{(\tilde{\alpha} + D + 2) / 2} } & \quad & \text{swapped in } \tilde{\gamma}\\ &= \frac{ \left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2} }{ \pi^{ID / 2} \Gamma_D[\alpha / 2] } \frac{ \exp\left[ -0.5 \left( \tr\left[ \left( \boldsymbol{\Psi} + \gamma \boldsymbol{\delta} \boldsymbol{\delta}^\top + \sum_i \mathbf{x}_i \mathbf{x}_i^\top \right) \boldsymbol{\Sigma}^{-1} \right] + \tilde{\gamma} (\boldsymbol{\mu} - \tilde{\boldsymbol{\delta}})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \tilde{\boldsymbol{\delta}}) - \tilde{\gamma} \tilde{\boldsymbol{\delta}}^\top \boldsymbol{\Sigma}^{-1} \tilde{\boldsymbol{\delta}} \right) \right] }{ 2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{(\tilde{\alpha} + D + 2) / 2} } & \quad & \text{swapped in } \tilde{\boldsymbol{\delta}}\\ &= \frac{ \left\vert \boldsymbol{\Psi} \right\vert^{\alpha / 2} \gamma^{D / 2} }{ \pi^{ID / 2} \Gamma_D[\alpha / 2] } \frac{ \exp\left[ -0.5 \left( \tr\left[ \tilde{\boldsymbol{\Psi}} \boldsymbol{\Sigma}^{-1} \right] + \tilde{\gamma} (\boldsymbol{\mu} - \tilde{\boldsymbol{\delta}})^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \tilde{\boldsymbol{\delta}}) \right) \right] }{ 2^{\tilde{\alpha} D / 2} (2 \pi)^{D / 2} \left\vert \boldsymbol{\Sigma} \right\vert^{(\tilde{\alpha} + D + 2) / 2} } & \quad & \text{swapped in } \tilde{\boldsymbol{\Psi}}\\ &= \kappa \NorIWisDist_{\boldsymbol{\mu}, \boldsymbol{\Sigma}}\left[ \tilde{\alpha}, \tilde{\boldsymbol{\Psi}}, \tilde{\gamma}, \tilde{\boldsymbol{\delta}} \right] & \quad & \text{swapped in } \kappa\end{split}\]
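
The substitutions in the derivation imply \(\tilde{\alpha} = \alpha + I\), \(\tilde{\gamma} = \gamma + I\), \(\tilde{\boldsymbol{\delta}} = (\gamma \boldsymbol{\delta} + \sum_i \mathbf{x}_i) / \tilde{\gamma}\), and \(\tilde{\boldsymbol{\Psi}} = \boldsymbol{\Psi} + \gamma \boldsymbol{\delta} \boldsymbol{\delta}^\top + \sum_i \mathbf{x}_i \mathbf{x}_i^\top - \tilde{\gamma} \tilde{\boldsymbol{\delta}} \tilde{\boldsymbol{\delta}}^\top\). The following Python sketch checks this for \(D = 2\) using plain nested lists. It compares only the factors that depend on \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), so the log ratios should be zero up to rounding. All helper names and numerical values are hypothetical.

```python
import math

D = 2  # dimensionality assumed by the 2x2 helpers below


def inverse_and_det(m):
    """Inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def quad_form(v, m):
    """v^T m v, which equals tr[v v^T m] by the trace property."""
    return sum(v[i] * m[i][j] * v[j] for i in range(D) for j in range(D))


def trace_of_product(p, q):
    return sum(p[i][j] * q[j][i] for i in range(D) for j in range(D))


def log_prior_unnormalized(mu, sigma, alpha, psi, gamma, delta):
    """Log normal inverse Wishart density, dropping factors free of mu and Sigma."""
    sigma_inv, det = inverse_and_det(sigma)
    diff = [mu[k] - delta[k] for k in range(D)]
    exponent = trace_of_product(psi, sigma_inv) + gamma * quad_form(diff, sigma_inv)
    return -0.5 * (alpha + D + 2) * math.log(det) - 0.5 * exponent


def log_likelihood_unnormalized(xs, mu, sigma):
    """Log product of normals, dropping the (2 pi) factors."""
    sigma_inv, det = inverse_and_det(sigma)
    total = -0.5 * len(xs) * math.log(det)
    for x in xs:
        total -= 0.5 * quad_form([x[k] - mu[k] for k in range(D)], sigma_inv)
    return total


def niw_update(xs, alpha, psi, gamma, delta):
    """Posterior parameters (alpha, Psi, gamma, delta) with tildes."""
    n = len(xs)
    gamma_t = gamma + n
    delta_t = [(gamma * delta[k] + sum(x[k] for x in xs)) / gamma_t for k in range(D)]
    psi_t = [
        [
            psi[i][j] + gamma * delta[i] * delta[j]
            + sum(x[i] * x[j] for x in xs)
            - gamma_t * delta_t[i] * delta_t[j]
            for j in range(D)
        ]
        for i in range(D)
    ]
    return alpha + n, psi_t, gamma_t, delta_t


xs = [[0.5, 1.0], [1.5, -0.2], [0.1, 0.4]]
prior = (4.0, [[1.0, 0.2], [0.2, 2.0]], 1.5, [0.0, 0.5])
posterior = niw_update(xs, *prior)

test_points = [
    ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    ([0.7, 0.3], [[2.0, 0.5], [0.5, 1.0]]),
    ([-1.0, 1.2], [[0.6, -0.1], [-0.1, 0.9]]),
]
log_ratios = [
    log_likelihood_unnormalized(xs, mu, sigma)
    + log_prior_unnormalized(mu, sigma, *prior)
    - log_prior_unnormalized(mu, sigma, *posterior)
    for mu, sigma in test_points
]
print(log_ratios)
```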