Models for Visual Words

Latent Dirichlet allocation is a density model for the data in a corpus. It does not involve a “world” term that we wish to infer.

Exercise 20.1

Sections 9.9 and 9.10 describe discriminative approaches that handle multi-class problems. The data are then defined as the normalized word frequencies \(\mathbf{x} = [T_1, T_2, \ldots, T_K] / \sum_k T_k\), where \(T_k\) counts how often visual word \(k\) occurs in the image.
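As a minimal sketch (the vocabulary size and counts below are invented for illustration), the conversion from raw visual-word counts to the normalized frequency vector \(\mathbf{x}\) that such a classifier would consume might look like this:

```python
import numpy as np

# Hypothetical image: 40 detected visual words drawn from a K-word vocabulary.
K = 5
rng = np.random.default_rng(0)
word_ids = rng.integers(0, K, size=40)

T = np.bincount(word_ids, minlength=K)  # raw counts T_1, ..., T_K
x = T / T.sum()                         # normalized word frequencies
print(x, x.sum())                       # x sums to 1; feed it to the multi-class classifier
```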

Exercise 20.2

The following is the generative process corresponding to the plate notation in Figure 20.6; a sampling sketch in code follows the list.

For each image indexed by \(i \in \{1, \ldots, I\}\) in a corpus:

  1. Choose \(\DeclareMathOperator{\PoissonDist}{Pois} J_i \sim \PoissonDist(\xi)\).

  2. Choose an \(M\)-dimensional part probability vector \(\boldsymbol{\pi}_i\) from the distribution \(\DeclareMathOperator{\DirDist}{Dir} Pr(\boldsymbol{\pi}_i) = \DirDist_{\boldsymbol{\pi}_i}[\boldsymbol{\alpha}]\).

  3. For each visual word indexed by \(j \in \{1, \ldots, J_i\}\) in image \(i\):

    1. Choose a part \(p_{ij} \in \{1, \ldots, M\}\) from the distribution \(\DeclareMathOperator{\CatDist}{Cat} Pr(p_{ij}) = \CatDist_{p_{ij}}[\boldsymbol{\pi}_i]\).

    2. Choose a visual word \(f_{ij} \in \{1, \ldots, K\}\) from the distribution \(Pr(f_{ij} \mid p_{ij}) = \CatDist_{f_{ij}}[\boldsymbol{\lambda}_{p_{ij}}]\) where \(Pr(\boldsymbol{\lambda}_m) = \DirDist_{\boldsymbol{\lambda}_m}[\boldsymbol{\beta}]\).
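A minimal NumPy sketch of this generative process is given below; the corpus size, number of parts, vocabulary size, and hyperparameter values are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

I, M, K = 3, 4, 10               # images, parts, visual-word vocabulary (assumed)
alpha, beta, xi = 0.5, 0.5, 20.0

# Part-to-word distributions lambda_m ~ Dir(beta), shared across the corpus.
lam = rng.dirichlet(np.full(K, beta), size=M)                    # shape (M, K)

corpus = []
for i in range(I):
    J_i = rng.poisson(xi)                                        # number of visual words
    pi_i = rng.dirichlet(np.full(M, alpha))                      # part probabilities for image i
    parts = rng.choice(M, size=J_i, p=pi_i)                      # p_ij ~ Cat(pi_i)
    words = np.array([rng.choice(K, p=lam[m]) for m in parts])   # f_ij ~ Cat(lambda_{p_ij})
    corpus.append((parts, words))
```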

The joint distribution for this generative process’ graphical model is

\[\begin{split}Pr(\boldsymbol{\pi}, \boldsymbol{\lambda}, \mathbf{p}, \mathbf{f} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) &= Pr(\boldsymbol{\lambda} \mid \boldsymbol{\beta}) Pr(\boldsymbol{\pi} \mid \boldsymbol{\alpha}) Pr(\mathbf{p} \mid \boldsymbol{\pi}) Pr(\mathbf{f} \mid \mathbf{p}, \boldsymbol{\lambda}) & \quad & \text{(10.3)}\\ &= \prod_{m = 1}^M Pr(\boldsymbol{\lambda}_m \mid \boldsymbol{\beta}) \cdot \prod_{i = 1}^I Pr(\boldsymbol{\pi}_i \mid \boldsymbol{\alpha}) \cdot \prod_{i = 1}^I \prod_{j = 1}^{J_i} Pr(p_{ij} \mid \boldsymbol{\pi}_i) \cdot \prod_{i = 1}^I \prod_{j = 1}^{J_i} Pr(f_{ij} \mid \boldsymbol{\lambda}_{p_{ij}}) & \quad & \text{d-Separation from generative process}\\ &= \prod_{m = 1}^M Pr(\boldsymbol{\lambda}_m \mid \boldsymbol{\beta}) \cdot \prod_{i = 1}^I Pr(\boldsymbol{\pi}_i \mid \boldsymbol{\alpha}) \prod_{j = 1}^{J_i} Pr(p_{ij} \mid \boldsymbol{\pi}_i) Pr(f_{ij} \mid \boldsymbol{\lambda}_{p_{ij}})\end{split}\]

where \(\mathbf{p} = \{ p_{ij} \}_{i = 1, j = 1}^{I, J_i}\) and \(\mathbf{f} = \{ f_{ij} \}_{i = 1, j = 1}^{I, J_i}\) are the hidden part labels and observed visual word labels, respectively. This joint distribution is also known as the complete-data likelihood of the corpus.

When the part and visual word labels are known, the Bayesian approach (or another technique such as ML or MAP) can be applied to fit the parameters \(\boldsymbol{\theta} = \left\{ \{ \boldsymbol{\pi}_i \}_{i = 1}^I, \{ \boldsymbol{\lambda}_m \}_{m = 1}^M \right\}\) to the data. The Bayesian approach requires calculating a posterior over the parameters as

\[\begin{split}Pr(\boldsymbol{\theta} \mid \mathbf{p}, \mathbf{f}, \boldsymbol{\alpha}, \boldsymbol{\beta}) &= \frac{ Pr(\boldsymbol{\pi}, \boldsymbol{\lambda}, \mathbf{p}, \mathbf{f} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) }{ Pr(\mathbf{p}, \mathbf{f} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) }\\ &= \frac{ Pr(\boldsymbol{\lambda} \mid \boldsymbol{\beta}) Pr(\boldsymbol{\pi} \mid \boldsymbol{\alpha}) Pr(\mathbf{p} \mid \boldsymbol{\pi}) Pr(\mathbf{f} \mid \mathbf{p}, \boldsymbol{\lambda}) }{ Pr(\mathbf{f} \mid \mathbf{p}, \boldsymbol{\alpha}, \boldsymbol{\beta}) Pr(\mathbf{p} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) }\\ &= \frac{ Pr(\boldsymbol{\lambda} \mid \boldsymbol{\beta}) Pr(\mathbf{f} \mid \mathbf{p}, \boldsymbol{\lambda}) Pr(\boldsymbol{\pi} \mid \boldsymbol{\alpha}) Pr(\mathbf{p} \mid \boldsymbol{\pi}) }{ \left( \int Pr(\boldsymbol{\lambda}, \mathbf{f} \mid \mathbf{p}, \boldsymbol{\alpha}, \boldsymbol{\beta}) d\boldsymbol{\lambda} \right) \left( \prod_{i = 1}^I \int Pr(\boldsymbol{\pi}_i, \mathbf{p}_{i \cdot} \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) d\boldsymbol{\pi}_i \right) } & \quad & \text{marginalization and independence from generative process}\\ &= \frac{ \prod_{m = 1}^M Pr(\boldsymbol{\lambda}_m \mid \boldsymbol{\beta}) \cdot \prod_{i = 1}^I \prod_{j = 1}^{J_i} Pr(f_{ij} \mid \boldsymbol{\lambda}_{p_{ij}}) \cdot \prod_{i = 1}^I Pr(\boldsymbol{\pi}_i \mid \boldsymbol{\alpha}) \prod_{j = 1}^{J_i} Pr(p_{ij} \mid \boldsymbol{\pi}_i) }{ \left( \int \prod_{m = 1}^M Pr(\boldsymbol{\lambda}_m \mid \boldsymbol{\beta}) \cdot \prod_{i = 1}^I \prod_{j = 1}^{J_i} Pr(f_{ij} \mid \boldsymbol{\lambda}_{p_{ij}}) d\boldsymbol{\lambda} \right) \left( \prod_{i = 1}^I \int Pr(\boldsymbol{\pi}_i \mid \boldsymbol{\alpha}) \prod_{j = 1}^{J_i} Pr(p_{ij} \mid \boldsymbol{\pi}_i) d\boldsymbol{\pi}_i \right) } & \quad & \text{d-Separation from generative process}\\ &= \prod_{m = 1}^M \DirDist_{\boldsymbol{\lambda}_m}\left[ \beta + F_{m1}, \ldots, \beta + F_{mK} \right] \cdot \prod_{i = 1}^I \DirDist_{\boldsymbol{\pi}_i}\left[ \alpha + P_{i1}, \ldots, \alpha + P_{iM} \right] & \quad & \text{Exercise 20.3 (a) and (b)}\end{split}\]

See Exercise 20.3 for more details.
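A sketch of the resulting count-and-update computation is given below; the function name and the per-image data layout are illustrative assumptions, not from the text.

```python
import numpy as np

def posterior_params(corpus, I, M, K, alpha, beta):
    """Conjugate posterior Dirichlet parameters when part labels are observed."""
    F = np.zeros((M, K))                 # F_mk: times word k was generated by part m
    P = np.zeros((I, M))                 # P_im: times part m appears in image i
    for i, (parts, words) in enumerate(corpus):
        for m, k in zip(parts, words):
            F[m, k] += 1
            P[i, m] += 1
    return beta + F, alpha + P           # parameters of Dir over lambda_m and pi_i

# Tiny invented corpus: two images, M = 2 parts, K = 3 visual words.
corpus = [(np.array([0, 0, 1]), np.array([2, 0, 1])),
          (np.array([1, 1]),    np.array([1, 2]))]
lam_post, pi_post = posterior_params(corpus, I=2, M=2, K=3, alpha=0.5, beta=0.5)
```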

The predictive density (the probability of a new data point \(\{p^*, f^*\}\) under the fitted model) is

\[\begin{split}& Pr(p^*, f^* \mid \mathbf{p}, \mathbf{f}, \boldsymbol{\alpha}, \boldsymbol{\beta})\\ &= \int Pr(p^*, f^* \mid \boldsymbol{\theta}) Pr(\boldsymbol{\theta} \mid \mathbf{p}, \mathbf{f}, \boldsymbol{\alpha}, \boldsymbol{\beta}) d\boldsymbol{\theta}\\ &= \int Pr(f^* \mid \boldsymbol{\lambda}_{p^*}) Pr(p^* \mid \boldsymbol{\pi}_*) Pr(\boldsymbol{\theta} \mid \mathbf{p}, \mathbf{f}, \boldsymbol{\alpha}, \boldsymbol{\beta}) d\boldsymbol{\theta} & \quad & \text{d-Separation from generative process}\\ &= \int \CatDist_{f^*}[\boldsymbol{\lambda}_{p^*}] \CatDist_{p^*}[\boldsymbol{\pi}_*] \cdot \prod_{m = 1}^M \DirDist_{\boldsymbol{\lambda}_m}\left[ \beta + F_{m1}, \ldots, \beta + F_{mK} \right] \cdot \prod_{i = 1}^I \DirDist_{\boldsymbol{\pi}_i}\left[ \alpha + P_{i1}, \ldots, \alpha + P_{iM} \right] d\boldsymbol{\lambda} d\boldsymbol{\pi} & \quad & \text{(20.5)}\\ &= \frac{\beta + F_{mk}}{K \beta + \sum_{k = 1}^K F_{mk}} \frac{\alpha + P_{im}}{M \alpha + \sum_{m = 1}^M P_{im}} & \quad & \text{(a), (b), and } \Gamma[1 + n] = n \Gamma[n] \text{ for } n > 0.\end{split}\]

[Tu14][Gri02][Gre05] are helpful resources for proving these relations.
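In code, the closed-form predictive density reduces to two ratios of pseudocounts. The sketch below assumes the count matrices \(F\) and \(P\) from Exercise 20.3 and treats \(i\) as the image to which the new point belongs; the counts themselves are invented for illustration.

```python
import numpy as np

def predictive(F, P, i, p_star, f_star, alpha, beta):
    """Pr(p*, f* | p, f, alpha, beta) from the count matrices F (M x K) and P (I x M)."""
    M, K = F.shape
    word_term = (beta + F[p_star, f_star]) / (K * beta + F[p_star].sum())
    part_term = (alpha + P[i, p_star]) / (M * alpha + P[i].sum())
    return word_term * part_term

F = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]])   # invented counts, M = 2, K = 3
P = np.array([[3.0, 0.0], [0.0, 4.0]])             # invented counts, I = 2, M = 2
print(predictive(F, P, i=0, p_star=0, f_star=1, alpha=0.5, beta=0.5))
```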

(a)

\[\begin{split}& \prod_{m = 1}^M \DirDist_{\boldsymbol{\lambda}_m}\left[ \beta + F_{m1}, \ldots, \beta + F_{mK} \right] \cdot \CatDist_{f^*}[\boldsymbol{\lambda}_{p^*}]\\ &= \left( \prod_{m = 1}^M \frac{ \Gamma\left[ K \beta + \sum_{k = 1}^K F_{mk} \right] }{ \prod_{k = 1}^K \Gamma\left[ \beta + F_{mk} \right] } \prod_{k = 1}^K \boldsymbol{\lambda}_{mk}^{\beta - 1 + F_{mk}} \right) \cdot \prod_{k = 1}^K \boldsymbol{\lambda}_{p^* k}^{\delta[f^* - k]} & \quad & \text{(3.9), (3.8)}\\ &= \left( \prod_{m = 1}^M \frac{ \Gamma\left[ K \beta + \sum_{k = 1}^K F_{mk} \right] }{ \prod_{k = 1}^K \Gamma\left[ \beta + F_{mk} \right] } \prod_{k = 1}^K \boldsymbol{\lambda}_{mk}^{\beta - 1 + F_{mk}} \right) \cdot \prod_{m = 1}^M \prod_{k = 1}^K \boldsymbol{\lambda}_{mk}^{\tilde{F}_{mk}} & \quad & \tilde{F}_{mk} = \delta\left[ f^* - k \right] \delta\left[ p^* - m \right]\\ &= \prod_{m = 1}^M \frac{ \Gamma\left[ K \beta + \sum_{k = 1}^K F_{mk} \right] }{ \prod_{k = 1}^K \Gamma\left[ \beta + F_{mk} \right] } \prod_{k = 1}^K \boldsymbol{\lambda}_{mk}^{\beta - 1 + F_{mk} + \tilde{F}_{mk}}\\ &= \prod_{m = 1}^M \frac{ \Gamma\left[ K \beta + \sum_{k = 1}^K F_{mk} \right] }{ \prod_{k = 1}^K \Gamma\left[ \beta + F_{mk} \right] } \frac{ \prod_{k = 1}^K \Gamma\left[ \beta + F_{mk} + \tilde{F}_{mk} \right] }{ \Gamma\left[ K \beta + \sum_{k = 1}^K F_{mk} + \tilde{F}_{mk} \right] } \DirDist_{\boldsymbol{\lambda}_m}\left[ \beta + F_{m1} + \tilde{F}_{m1}, \ldots, \beta + F_{mK} + \tilde{F}_{mK} \right]\end{split}\]

(b)

\[\begin{split}& \prod_{i = 1}^I \DirDist_{\boldsymbol{\pi}_i}\left[ \alpha + P_{i1}, \ldots, \alpha + P_{iM} \right] \cdot \CatDist_{p^*}[\boldsymbol{\pi}_*]\\ &= \left( \prod_{i = 1}^I \frac{ \Gamma\left[ M \alpha + \sum_{m = 1}^M P_{im} \right] }{ \prod_{m = 1}^M \Gamma\left[ \alpha + P_{im} \right] } \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{\alpha - 1 + P_{im}} \right) \cdot \prod_{m = 1}^M \boldsymbol{\pi}_{*m}^{\delta\left[ p^* - m \right]} & \quad & \text{(3.9), (3.8)}\\ &= \left( \prod_{i = 1}^I \frac{ \Gamma\left[ M \alpha + \sum_{m = 1}^M P_{im} \right] }{ \prod_{m = 1}^M \Gamma\left[ \alpha + P_{im} \right] } \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{\alpha - 1 + P_{im}} \right) \cdot \prod_{i = 1}^I \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{\tilde{P}_{im}} & \quad & \tilde{P}_{im} = \delta\left[ p^* - m \right] \delta\left[ * - i \right]\\ &= \prod_{i = 1}^I \frac{ \Gamma\left[ M \alpha + \sum_{m = 1}^M P_{im} \right] }{ \prod_{m = 1}^M \Gamma\left[ \alpha + P_{im} \right] } \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{\alpha - 1 + P_{im} + \tilde{P}_{im}}\\ &= \prod_{i = 1}^I \frac{ \Gamma\left[ M \alpha + \sum_{m = 1}^M P_{im} \right] }{ \prod_{m = 1}^M \Gamma\left[ \alpha + P_{im} \right] } \frac{ \prod_{m = 1}^M \Gamma\left[ \alpha + P_{im} + \tilde{P}_{im} \right] }{ \Gamma\left[ M \alpha + \sum_{m = 1}^M P_{im} + \tilde{P}_{im} \right] } \DirDist_{\boldsymbol{\pi}_i}\left[ \alpha + P_{i1} + \tilde{P}_{i1}, \ldots, \alpha + P_{iM} + \tilde{P}_{iM} \right]\end{split}\]

Exercise 20.3

The following is based on the derivations in Exercise 3.10.

The likelihood term in LDA is given by

\[\begin{split}& \int \prod_{m = 1}^M Pr(\boldsymbol{\lambda}_m \mid \boldsymbol{\beta}) \cdot \prod_{i = 1}^I \prod_{j = 1}^{J_i} Pr(f_{ij} \mid \boldsymbol{\lambda}_{p_{ij}}) d\boldsymbol{\lambda}_{1 \ldots M} & \quad & \text{(20.10)}\\ &= \left( \frac{\Gamma\left[K \beta\right]}{\Gamma[\beta]^K} \right)^M \prod_{m = 1}^M \frac{ \prod_{k = 1}^K \Gamma\left[ \beta + F_{mk} \right] }{ \Gamma\left[ K \beta + \sum_{k = 1}^K F_{mk} \right] } \cdot \int \prod_{m = 1}^M \DirDist_{\boldsymbol{\lambda}_m}\left[ \beta + F_{m1}, \ldots, \beta + F_{mK} \right] d\boldsymbol{\lambda}_{1 \ldots M} & \quad & \text{(a)}\\ &= \left( \frac{\Gamma\left[K \beta\right]}{\Gamma[\beta]^K} \right)^M \prod_{m = 1}^M \frac{ \prod_{k = 1}^K \Gamma[\beta + F_{mk}] }{ \Gamma\left[ K \beta + \sum_{k = 1}^K F_{mk} \right] }.\end{split}\]

The prior term in LDA is given by

\[\begin{split}& \prod_{i = 1}^I \int Pr(\boldsymbol{\pi}_i \mid \boldsymbol{\alpha}) \prod_{j = 1}^{J_i} Pr(p_{ij} \mid \boldsymbol{\pi}_i) d\boldsymbol{\pi}_{i} & \quad & \text{(20.11)}\\ &= \prod_{i = 1}^I \frac{\Gamma[M \alpha]}{\Gamma[\alpha]^M} \frac{ \prod_{m = 1}^M \Gamma\left[ \alpha + P_{im} \right] }{ \Gamma\left[ M \alpha + \sum_{m = 1}^M P_{im} \right] } \int \DirDist_{\boldsymbol{\pi}_i}\left[ \alpha + P_{i1}, \ldots, \alpha + P_{iM} \right] d\boldsymbol{\pi}_i & \quad & \text{(b)}\\ &= \left( \frac{\Gamma[M \alpha]}{\Gamma[\alpha]^M} \right)^I \prod_{i = 1}^I \frac{ \prod_{m = 1}^M \Gamma\left[ \alpha + P_{im} \right] }{ \Gamma\left[ M \alpha + \sum_{m = 1}^M P_{im} \right] }.\end{split}\]
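Both marginal terms are products of ratios of gamma functions, so in practice they are best evaluated in log space. The following sketch uses `scipy.special.gammaln` with count matrices as defined above; the counts are invented for illustration.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_terms(F, P, alpha, beta):
    """Log of the marginalized likelihood (20.10) and prior (20.11) terms."""
    M, K = F.shape
    I, _ = P.shape
    log_lik = (M * (gammaln(K * beta) - K * gammaln(beta))
               + gammaln(beta + F).sum()
               - gammaln(K * beta + F.sum(axis=1)).sum())
    log_prior = (I * (gammaln(M * alpha) - M * gammaln(alpha))
                 + gammaln(alpha + P).sum()
                 - gammaln(M * alpha + P.sum(axis=1)).sum())
    return log_lik, log_prior

F = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]])   # invented counts
P = np.array([[3.0, 0.0], [0.0, 4.0]])
print(log_marginal_terms(F, P, alpha=0.5, beta=0.5))
```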

(a)

\[\begin{split}\prod_{m = 1}^M Pr(\boldsymbol{\lambda}_m \mid \boldsymbol{\beta}) \cdot \prod_{i = 1}^I \prod_{j = 1}^{J_i} Pr(f_{ij} \mid \boldsymbol{\lambda}_{p_{ij}}) &= \prod_{m = 1}^M \DirDist_{\boldsymbol{\lambda}_m}[\boldsymbol{\beta}] \cdot \prod_{i = 1}^I \prod_{j = 1}^{J_i} \CatDist_{f_{ij}}[\boldsymbol{\lambda}_{p_{ij}}] & \quad & \text{(20.5), (20.7)}\\ &= \left( \prod_{m = 1}^M \frac{ \Gamma\left[ \sum_{k = 1}^K \boldsymbol{\beta}_k \right] }{ \prod_{k = 1}^K \Gamma\left[ \boldsymbol{\beta}_k \right] } \prod_{k = 1}^K \boldsymbol{\lambda}_{mk}^{\boldsymbol{\beta}_k - 1} \right) \cdot \prod_{i = 1}^I \prod_{j = 1}^{J_i} \prod_{k = 1}^K \boldsymbol{\lambda}_{p_{ij} k}^{\delta\left[ f_{ij} - k \right]} & \quad & \text{(3.9), (3.8)}\\ &= \left( \prod_{m = 1}^M \frac{\Gamma\left[K \beta\right]}{\Gamma[\beta]^K} \prod_{k = 1}^K \boldsymbol{\lambda}_{mk}^{\beta - 1} \right) \cdot \prod_{m = 1}^M \prod_{k = 1}^K \boldsymbol{\lambda}_{mk}^{F_{mk}} & \quad & F_{mk} = \sum_{i = 1}^I \sum_{j = 1}^{J_i} \delta\left[ f_{ij} - k \right] \delta\left[ p_{ij} - m \right]\\ &= \prod_{m = 1}^M \frac{\Gamma\left[K \beta\right]}{\Gamma[\beta]^K} \prod_{k = 1}^K \boldsymbol{\lambda}_{mk}^{\beta - 1 + F_{mk}}\\ &= \prod_{m = 1}^M \frac{\Gamma\left[K \beta\right]}{\Gamma[\beta]^K} \frac{ \prod_{k = 1}^K \Gamma[\beta + F_{mk}] }{ \Gamma\left[ K \beta + \sum_{k = 1}^K F_{mk} \right] } \DirDist_{\boldsymbol{\lambda}_m}\left[ \beta + F_{m1}, \ldots, \beta + F_{mK} \right]\end{split}\]

(b)

\[\begin{split}\prod_{i = 1}^I Pr(\boldsymbol{\pi}_i \mid \boldsymbol{\alpha}) \prod_{j = 1}^{J_i} Pr(p_{ij} \mid \boldsymbol{\pi}_i) &= \prod_{i = 1}^I \DirDist_{\boldsymbol{\pi}_i}[\boldsymbol{\alpha}] \prod_{j = 1}^{J_i} \CatDist_{p_{ij}}[\boldsymbol{\pi}_i] & \quad & \text{(20.5), (20.7)}\\ &= \prod_{i = 1}^I \left( \frac{ \Gamma\left[ \sum_{m = 1}^M \boldsymbol{\alpha}_m \right] }{ \prod_{m = 1}^M \Gamma\left[ \boldsymbol{\alpha}_m \right] } \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{\boldsymbol{\alpha}_m - 1} \cdot \prod_{j = 1}^{J_i} \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{\delta\left[ p_{ij} - m \right]} \right) & \quad & \text{(3.9), (3.8)}\\ &= \prod_{i = 1}^I \left( \frac{\Gamma[M \alpha]}{\Gamma[\alpha]^M} \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{\alpha - 1} \cdot \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{P_{im}} \right) & \quad & P_{im} = \sum_{j = 1}^{J_i} \delta[p_{ij} - m]\\ &= \prod_{i = 1}^I \frac{\Gamma\left[M \alpha \right]}{\Gamma[\alpha]^M} \prod_{m = 1}^M \boldsymbol{\pi}_{im}^{\alpha - 1 + P_{im}}\\ &= \prod_{i = 1}^I \frac{\Gamma[M \alpha]}{\Gamma[\alpha]^M} \frac{ \prod_{m = 1}^M \Gamma\left[ \alpha + P_{im} \right] }{ \Gamma\left[ M \alpha + \sum_{m = 1}^M P_{im} \right] } \DirDist_{\boldsymbol{\pi}_i}\left[ \alpha + P_{i1}, \ldots, \alpha + P_{iM} \right]\end{split}\]

Exercise 20.4

Adapting LDA to the single-author topic model in [FFP05] (Figure 3) results in the following generative process; a sampling sketch follows the list.

Each image has only one object. For each image indexed by \(i \in \{1, \ldots, I\}\) in a corpus:

  1. Choose an object \(w_i \in \{1, \ldots, N\}\) from the distribution \(Pr(w_i) = \CatDist_{w_i}[\boldsymbol{\nu}]\).

  2. Choose an \(M\)-dimensional part probability vector \(\boldsymbol{\pi}_n\) from the distribution \(Pr(\boldsymbol{\pi}_n \mid w_i, \boldsymbol{\alpha}) = \DirDist_{\boldsymbol{\pi}_n}[\boldsymbol{\alpha}_{w_i}]\) where \(\DeclareMathOperator{\GammaDist}{Gamma} Pr(\boldsymbol{\alpha}_{w_i}) = \GammaDist_{\boldsymbol{\alpha}_{w_i}}[\kappa, \theta]\).

  3. Choose \(J_i \sim \PoissonDist(\xi)\).

  4. For each visual word indexed by \(j \in \{1, \ldots, J_i\}\) in image \(i\):

    1. Choose a part \(p_{ij} \in \{1, \ldots, M\}\) from the distribution \(Pr(p_{ij} \mid \boldsymbol{\pi}_n) = \CatDist_{p_{ij}}[\boldsymbol{\pi}_n]\).

    2. Choose a visual word \(f_{ij} \in \{1, \ldots, K\}\) from the distribution \(Pr(f_{ij} \mid p_{ij}, \boldsymbol{\lambda}) = \CatDist_{f_{ij}}[\boldsymbol{\lambda}_{p_{ij}}]\) where \(Pr(\boldsymbol{\lambda}_m) = \DirDist_{\boldsymbol{\lambda}_m}[\boldsymbol{\beta}]\).
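A sampling sketch of this single-object variant follows; all sizes, priors, and hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

I, N, M, K = 4, 3, 5, 12                  # images, objects, parts, vocabulary (assumed)
beta, xi, kappa, theta = 0.5, 15.0, 2.0, 1.0
nu = np.full(N, 1.0 / N)                  # object prior

lam = rng.dirichlet(np.full(K, beta), size=M)      # part-to-word distributions
alpha_w = rng.gamma(kappa, theta, size=(N, M))     # per-object Dirichlet parameters

for i in range(I):
    w_i = rng.choice(N, p=nu)                      # object category of the image
    pi = rng.dirichlet(alpha_w[w_i])               # part probabilities given the object
    J_i = rng.poisson(xi)
    parts = rng.choice(M, size=J_i, p=pi)                        # p_ij
    words = np.array([rng.choice(K, p=lam[m]) for m in parts])   # f_ij
```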

Exercise 20.5

See Figure 1 of [RZGSS04].

The Author-Topic model observes the words and the set of authors of each document; all other nodes are hidden. For each word, the generative process picks an author uniformly at random from the document's author set, draws a topic from that author's (fixed) topic distribution, and then draws the word from that topic's word distribution.
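A per-word sketch of that process (the author set, sizes, and symmetric priors below are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

A, T, V = 3, 4, 20                           # authors, topics, vocabulary (assumed)
theta = rng.dirichlet(np.ones(T), size=A)    # per-author topic distributions
phi = rng.dirichlet(np.ones(V), size=T)      # per-topic word distributions

doc_authors = np.array([0, 2])               # observed author set of one document
for _ in range(5):
    x = rng.choice(doc_authors)              # author picked uniformly at random
    z = rng.choice(T, p=theta[x])            # topic from that author's distribution
    w = rng.choice(V, p=phi[z])              # word from the topic
```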

Exercise 20.6

Notice that the parts in the constellation model (Figure 20.10) induce a spatial distribution over their associated words.

Taking the graphical model of Exercise 20.5 as a template, adjacent visual words can be encouraged to take the same part labels by adding a feature-position node indexed by the selected topic \(z\); in short, add another observed node with \(z\) as its parent.

The Gibbs sampling procedure will include an additional 2D normal distribution similar to (20.27).
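A sketch of how such a spatial factor could enter the Gibbs conditional for a single part label is given below; the function and variable names are assumptions rather than the book's notation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def part_conditional(counts_term, x_pos, mu, Sigma):
    """Unnormalized LDA Gibbs weights multiplied by a 2D normal spatial factor.

    counts_term: (M,) count-based weights from the standard LDA update
    x_pos:       (2,) observed feature position
    mu, Sigma:   per-part means (M, 2) and covariances (M, 2, 2)
    """
    M = counts_term.shape[0]
    spatial = np.array([multivariate_normal.pdf(x_pos, mean=mu[m], cov=Sigma[m])
                        for m in range(M)])
    p = counts_term * spatial
    return p / p.sum()

weights = part_conditional(np.array([2.5, 1.5]), np.array([0.3, -0.2]),
                           mu=np.zeros((2, 2)), Sigma=np.stack([np.eye(2)] * 2))
```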

Exercise 20.7

The graphical model can be constructed from the following generative process; a sampling sketch follows the variable descriptions below.

For each image indexed by \(i \in \{1, \ldots, I\}\) in a corpus:

  1. Choose a scene \(s_i \in \{1, \ldots, C\}\) from the distribution \(Pr(s_i) = \CatDist_{s_i}[\boldsymbol{\nu}]\).

  2. Choose \(J_i \sim \PoissonDist(\xi)\).

  3. For each visual word indexed by \(j \in \{1, \ldots, J_i\}\) in image \(i\):

    1. Choose an object \(w_{ij} \in \{1, \ldots, L\}\) from the distribution \(Pr(w_{ij} \mid s_i = c) = \CatDist_{w_{ij}}[\boldsymbol{\phi}_c]\).

    2. Choose a part \(p_{ij} \in \{1, \ldots, M\}\) from the distribution \(Pr(p_{ij} \mid w_{ij} = n) = \CatDist_{p_{ij}}[\boldsymbol{\pi}_n]\).

    3. Choose a visual word \(f_{ij} \in \{1, \ldots, K\}\) from the distribution \(Pr(f_{ij} \mid p_{ij} = m) = \CatDist_{f_{ij}}[\boldsymbol{\lambda}_m]\).

    4. Choose a feature position \(\mathbf{x}_{ij}\) from the distribution \(\DeclareMathOperator{\NormDist}{Norm} Pr(\mathbf{x}_{ij} \mid p_{ij} = m, w_{ij} = n) = \NormDist_{\mathbf{x}_{ij}}\left[ \boldsymbol{\mu}_n^{(w)} + \boldsymbol{\mu}_m^{(p)}, \boldsymbol{\Sigma}_n^{(w)} + \boldsymbol{\Sigma}_m^{(p)} \right]\).

\(\{ s_i \}_{i = 1}^I, \{ f_{ij} \}_{i = 1, j = 1}^{I, J_i}, \{ \mathbf{x}_{ij} \}_{i = 1, j = 1}^{I, J_i}\) are observable (manifest) variables.

\(\{ w_{ij} \}_{i = 1, j = 1}^{I, J_i}, \{ p_{ij} \}_{i = 1, j = 1}^{I, J_i}\) are unobservable (latent) variables.

The rest are parameters to be estimated with fixed hyperparameters.
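A sampling sketch of one image under this scene model follows; all sizes, priors, and spatial parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

C, L, M, K = 2, 3, 4, 15                      # scenes, objects, parts, vocabulary (assumed)
nu = np.full(C, 1.0 / C)                      # scene prior
phi = rng.dirichlet(np.ones(L), size=C)       # scene-to-object distributions
pi = rng.dirichlet(np.ones(M), size=L)        # object-to-part distributions
lam = rng.dirichlet(np.ones(K), size=M)       # part-to-word distributions
mu_w, mu_p = rng.normal(size=(L, 2)), rng.normal(size=(M, 2))
Sig_w = np.stack([np.eye(2)] * L)             # per-object spatial covariances
Sig_p = np.stack([0.25 * np.eye(2)] * M)      # per-part spatial covariances

s = rng.choice(C, p=nu)                       # scene s_i
for _ in range(rng.poisson(20)):              # J_i visual words
    w = rng.choice(L, p=phi[s])               # object w_ij
    m = rng.choice(M, p=pi[w])                # part p_ij
    f = rng.choice(K, p=lam[m])               # visual word f_ij
    x = rng.multivariate_normal(mu_w[w] + mu_p[m], Sig_w[w] + Sig_p[m])
```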

Exercise 20.8

Slides 2 to 9 of [Lau] classify latent variable models as follows:

                             Manifest variable
  Latent variable      Metrical                   Categorical
  Metrical             Factor analysis            Latent trait analysis
  Categorical          Latent profile analysis    Latent class analysis

Discrete factor analysis is another name for latent trait analysis. Categorical variables can be either ordinal or nominal, and metrical variables can be either discrete or continuous.

See [Bar80] for more details.

References

Bar80

David J Bartholomew. Factor analysis for categorical data. Journal of the Royal Statistical Society. Series B (Methodological), pages 293–321, 1980.

FFP05

Li Fei-Fei and Pietro Perona. A Bayesian hierarchical model for learning natural scene categories. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 2, 524–531. IEEE, 2005.

Gre05

Gregor Heinrich. Parameter estimation for text analysis. Technical report, 2005.

Gri02

Tom Griffiths. Gibbs sampling in the generative model of latent Dirichlet allocation. Unpublished note, 2002.

Lau

Steffen Lauritzen. Latent variable models and factor analysis. http://www.stats.ox.ac.uk/~steffen/teaching/fsmHT07/fsm607c.pdf. Accessed on 2017-08-03.

RZGSS04

Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, and Padhraic Smyth. The author-topic model for authors and documents. In Proceedings of the 20th conference on Uncertainty in artificial intelligence, 487–494. AUAI Press, 2004.

Tu14

Stephen Tu. The Dirichlet-multinomial and Dirichlet-categorical models for Bayesian inference. Computer Science Division, UC Berkeley, 2014.