Learning and Inference in Vision

Regression

Suppose the data are independent of the model parameters, i.e. \(Pr(x \mid \boldsymbol{\theta}) = Pr(x)\).

\[\begin{split}Pr(\boldsymbol{\theta} \mid w, x) &= \frac{Pr(\boldsymbol{\theta}, w, x)}{Pr(w, x)}\\ &= \frac{ Pr(w \mid x, \boldsymbol{\theta}) Pr(x \mid \boldsymbol{\theta}) Pr(\boldsymbol{\theta}) }{ Pr(w \mid x) Pr(x) }\\ &= \frac{ Pr(w \mid x, \boldsymbol{\theta}) Pr(\boldsymbol{\theta}) }{ Pr(w \mid x) } & \quad & Pr(x \mid \boldsymbol{\theta}) = Pr(x).\end{split}\]

Applications

Suppose the world state is independent of the model parameters, i.e. \(Pr(w \mid \boldsymbol{\theta}) = Pr(w)\) [Eng].

\[\begin{split}Pr(w \mid x, \boldsymbol{\theta}) &= \frac{Pr(w, x, \boldsymbol{\theta})}{Pr(x, \boldsymbol{\theta})}\\ &= \frac{ Pr(x \mid w, \boldsymbol{\theta}) Pr(w \mid \boldsymbol{\theta}) Pr(\boldsymbol{\theta}) }{ Pr(x \mid \boldsymbol{\theta}) Pr(\boldsymbol{\theta}) }\\ &= \frac{ Pr(x \mid w, \boldsymbol{\theta}) Pr(w) }{ Pr(x \mid \boldsymbol{\theta}) } & \quad & Pr(w \mid \boldsymbol{\theta}) = Pr(w).\end{split}\]
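As a sanity check of this factorization, the following Python sketch (a toy discrete joint with invented numbers) constructs \(Pr(w, x, \boldsymbol{\theta})\) so that the world state is independent of the model parameters, and verifies that \(Pr(w \mid x, \boldsymbol{\theta})\) computed from the full joint matches \(Pr(x \mid w, \boldsymbol{\theta}) Pr(w) / Pr(x \mid \boldsymbol{\theta})\).

```python
import numpy as np

# Toy discrete distributions; the numbers are arbitrary illustrations.
Pr_theta = np.array([0.3, 0.7])             # Pr(theta), two candidate models
Pr_w = np.array([0.25, 0.75])               # Pr(w) = Pr(w | theta): world independent of model
Pr_x_given_w_theta = np.array([             # Pr(x | w, theta), indexed as [x, w, theta]
    [[0.1, 0.4], [0.6, 0.2]],
    [[0.9, 0.6], [0.4, 0.8]],
])

# Full joint Pr(w, x, theta), indexed as [x, w, theta].
joint = Pr_x_given_w_theta * Pr_w[None, :, None] * Pr_theta[None, None, :]
assert np.isclose(joint.sum(), 1.0)

# Posterior from the joint: Pr(w | x, theta) = Pr(w, x, theta) / Pr(x, theta).
posterior_from_joint = joint / joint.sum(axis=1, keepdims=True)

# Posterior from the derived expression: Pr(x | w, theta) Pr(w) / Pr(x | theta).
Pr_x_given_theta = (Pr_x_given_w_theta * Pr_w[None, :, None]).sum(axis=1, keepdims=True)
posterior_derived = Pr_x_given_w_theta * Pr_w[None, :, None] / Pr_x_given_theta

assert np.allclose(posterior_from_joint, posterior_derived)
print(posterior_derived[0])                 # Pr(w | x = 0, theta) for each theta
```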

Exercise 6.1

(i), (iii), and (iv) are classification problems while (ii) and (v) are regression problems.

(i)

\(\mathbf{w}\) represents a discrete state describing whether a face is male or female.

\(\mathbf{x}\) is an image of a face that has been discretized into pixels spanning some color space.

(ii)

\(\mathbf{w}\) represents a continuous state describing the 3D pose of a human body, which covers all physically possible orientations and positions.

\(\mathbf{x}\) is an image of a body that has been discretized into pixels spanning some color space.

(iii)

\(\mathbf{w}\) represents a discrete state spanning the four suits (hearts, diamonds, clubs, spades) of a playing card.

\(\mathbf{x}\) is an image of a playing card that has been discretized into pixels spanning some color space.

(iv)

\(\mathbf{w}\) represents a discrete binary state describing whether a face image matches another face image.

\(\mathbf{x}\) consists of a pair of face images where each image has been discretized into pixels spanning some color space.

(v)

\(\mathbf{w}\) represents a continuous state describing the 3D position of a point.

\(\mathbf{x}\) consists of the images produced by a set of cameras and their correspondences, all of which have been discretized into pixels spanning some color space.

Exercise 6.2

Discriminative

According to [Brub][Brua], this is known as multinomial logistic regression.

Use a categorical distribution to model the univariate discrete multi-valued world state \(\mathbf{w}\) as \(Pr(\mathbf{w})\).

Let \(L_m(x) = \phi_{m, 0} + \phi_{m, 1} x\) denote the linear function of the data \(x\) for \(m = 1, 2, \ldots, M - 1\).

Define the probability of observing one of the \(M\) possible outcomes as

\[\lambda_M(x) = \left( 1 + \sum_{i = 1}^{M - 1} \exp L_i(x) \right)^{-1} \quad \text{and} \quad \lambda_m(x) = \lambda_M(x) \exp L_m(x)\]

where \(\sum_m \lambda_m(x) = 1\) for \(x = 1, 2, \ldots, K\).

Applying the same notations as (3.8) gives

\[\DeclareMathOperator{\CatDist}{Cat} Pr(\mathbf{w} \mid x, \boldsymbol{\theta}) = \CatDist_{\mathbf{w}}\left[ \boldsymbol{\lambda}(x) \right]\]

where \(\boldsymbol{\theta} = \left\{ \phi_{1, 0}, \phi_{1, 1}, \ldots, \phi_{M - 1, 0}, \phi_{M - 1, 1} \right\}\), \(\mathbf{w} = \mathbf{e}_m\), and \(\boldsymbol{\lambda} = \left( \lambda_1, \ldots, \lambda_M \right)^\top\).
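As a numerical illustration of this construction (the parameter values below are arbitrary), the following sketch checks that \(\lambda_1(x), \ldots, \lambda_M(x)\) form a valid categorical parameter vector for every discrete data value \(x\).

```python
import numpy as np

M = 4                                   # number of world-state classes (arbitrary)
K = 5                                   # number of discrete data values x = 1..K (arbitrary)
rng = np.random.default_rng(0)
phi = rng.normal(size=(M - 1, 2))       # hypothetical parameters phi_{m,0}, phi_{m,1}

def lam(x):
    """Return the categorical parameters (lambda_1(x), ..., lambda_M(x))."""
    L = phi[:, 0] + phi[:, 1] * x               # linear activations L_1(x), ..., L_{M-1}(x)
    lam_M = 1.0 / (1.0 + np.exp(L).sum())       # reference-category probability lambda_M(x)
    return np.append(lam_M * np.exp(L), lam_M)  # lambda_m(x) = lambda_M(x) exp L_m(x)

for x in range(1, K + 1):
    probs = lam(x)
    assert np.isclose(probs.sum(), 1.0) and (probs > 0).all()
    print(x, np.round(probs, 3))
```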

Generative

Since the world state is a discrete multi-valued univariate, define a prior distribution over the world state as

\[Pr(\mathbf{w}) = \CatDist_{\mathbf{w}}\left[ \boldsymbol{\lambda}' \right]\]

where \(\mathbf{w} = \mathbf{e}_m\) and \(\boldsymbol{\lambda}' = \left( \lambda'_1, \ldots, \lambda'_M \right)^\top\).

Use a categorical distribution to model the discrete multi-valued univariate data \(\mathbf{x}\) as \(Pr(\mathbf{x})\).

Let \(L_k(w) = \phi_{k, 0} + \phi_{k, 1} w\) denote the linear function of the world state \(w\) for \(k = 1, 2, \ldots, K - 1\).

Define the probability of observing one of the \(K\) possible outcomes as

\[\lambda_K(w) = \left( 1 + \sum_{i = 1}^{K - 1} \exp L_i(w) \right)^{-1} \quad \text{and} \quad \lambda_k(w) = \lambda_K(w) \exp L_k(w)\]

where \(\sum_k \lambda_k(w) = 1\) for all \(w = 1, 2, \ldots, M\).

Applying the same notations as (3.8) yields

\[Pr(\mathbf{x} \mid w, \boldsymbol{\theta}) = \CatDist_{\mathbf{x}}\left[ \boldsymbol{\lambda}(w) \right]\]

where \(\boldsymbol{\theta} = \left\{ \boldsymbol{\lambda}', \phi_{1, 0}, \phi_{1, 1}, \ldots, \phi_{K - 1, 0}, \phi_{K - 1, 1} \right\}\), \(\mathbf{x} = \mathbf{e}_k\), and \(\boldsymbol{\lambda} = \left( \lambda_1, \ldots, \lambda_K \right)^\top\).

Exercise 6.3

Since the world state is univariate and continuous, define a prior distribution over the world state as

\[\DeclareMathOperator{\NormDist}{Norm} Pr(w) = \NormDist_w\left[ \mu_p, \sigma_p^2 \right].\]

Use a Bernoulli distribution to model the univariate binary discrete data \(x\) as \(Pr(x)\).

Let \(\lambda(w) = \phi_0 + \phi_1 w\) denote a linear function of the world state \(w\). The generative regression model is then

\[\DeclareMathOperator{\BernDist}{Bern} \DeclareMathOperator{\sigmoid}{sig} Pr(x \mid w, \boldsymbol{\theta}) = \BernDist_x\left[ \sigmoid\left( \lambda(w) \right) \right] = \BernDist_x\left[ \frac{1}{1 + \exp\left[ -\phi_0 - \phi_1 w \right]} \right]\]

where \(\boldsymbol{\theta} = \{ \mu_p, \sigma_p^2, \phi_0, \phi_1 \}\).
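Inference in this generative model combines the Bernoulli likelihood with the normal prior through Bayes' rule. The sketch below (all parameter values invented) evaluates \(Pr(w \mid x) \propto Pr(x \mid w) Pr(w)\) on a grid of \(w\) and normalizes numerically.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit        # the logistic sigmoid sig(a) = 1 / (1 + exp(-a))

# Hypothetical parameter values theta = {mu_p, sigma_p^2, phi_0, phi_1}.
mu_p, sigma_p = 0.0, 1.0
phi_0, phi_1 = -0.5, 2.0

w = np.linspace(-5, 5, 2001)           # grid over the continuous world state
prior = norm.pdf(w, mu_p, sigma_p)     # Pr(w)

for x in (0, 1):
    lam = expit(phi_0 + phi_1 * w)     # Bernoulli parameter as a function of w
    likelihood = lam if x == 1 else 1.0 - lam
    posterior = likelihood * prior
    posterior /= posterior.sum()       # normalize numerically on the grid
    print(f"x = {x}: posterior mean of w ~ {np.sum(w * posterior):+.3f}")
```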

Exercise 6.4

Use a beta distribution to model the univariate continuous world state \(w \in [0, 1]\) as \(Pr(w)\).

Since the data \(x\) come from a univariate continuous distribution, we can (arbitrarily) model them as \(Pr(x) = \NormDist_x\left[ \mu, \sigma^2 \right]\) and express the parameters of the beta distribution in those terms (see Exercise 3.3):

\[\alpha = \mu \left( \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right) \quad \text{and} \quad \beta = (1 - \mu) \left( \frac{\mu (1 - \mu)}{\sigma^2} - 1 \right).\]

The discriminative regression model is then

\[\DeclareMathOperator{\BetaDist}{Beta} Pr(w \mid x, \boldsymbol{\theta}) = \BetaDist_w[\alpha, \beta]\]

where \(\boldsymbol{\theta} = \left\{ \mu, \sigma^2 \right\}\).
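The moment matching can be verified numerically; the sketch below (illustrative values only, requiring \(\sigma^2 < \mu (1 - \mu)\)) converts a mean and variance into \((\alpha, \beta)\) and checks that the resulting beta distribution reproduces them.

```python
from scipy.stats import beta as beta_dist

def beta_params(mu, sigma2):
    """Moment-match a beta distribution to a mean mu in (0, 1) and variance sigma2."""
    common = mu * (1.0 - mu) / sigma2 - 1.0
    return mu * common, (1.0 - mu) * common

mu, sigma2 = 0.3, 0.02                                    # hypothetical values
alpha, b = beta_params(mu, sigma2)
print(alpha, b)                                           # (2.85, 6.65)
print(beta_dist.mean(alpha, b), beta_dist.var(alpha, b))  # recovers (0.3, 0.02)
```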

Exercise 6.5

\[\begin{split}L &= \sum_{i = 1}^I \log \NormDist_{w_i}\left[ \phi_0 + \phi_1 x_i, \sigma^2 \right]\\ &= \sum_{i = 1}^I \log \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left[ -\frac{(w_i - \phi_0 - \phi_1 x_i)^2}{2 \sigma^2} \right]\\ &= -\frac{I}{2} \log 2 \pi - \frac{I}{2} \log \sigma^2 - \frac{1}{2 \sigma^2} \sum_{i = 1}^I (w_i - \phi_0 - \phi_1 x_i)^2\end{split}\]

(a)

\[\begin{split}\frac{\partial L}{\partial \phi_0} &= -\frac{1}{2 \sigma^2} \sum_{i = 1}^I 2 (w_i - \phi_0 - \phi_1 x_i) (-1)\\ 0 &= \frac{1}{\sigma^2} \sum_{i = 1}^I (w_i - \phi_0 - \phi_1 x_i)\\ \phi_0 &= \frac{1}{I} \sum_{i = 1}^I (w_i - \phi_1 x_i)\end{split}\]

(b)

\[\begin{split}\frac{\partial L}{\partial \phi_1} &= -\frac{1}{2 \sigma^2} \sum_{i = 1}^I 2 (w_i - \phi_0 - \phi_1 x_i) (-x_i)\\ 0 &= \frac{1}{\sigma^2} \sum_{i = 1}^I (w_i x_i - \phi_0 x_i - \phi_1 x_i^2)\\ \phi_1 &= \frac{\sum_{i = 1}^I x_i (w_i - \phi_0)}{\sum_{i = 1}^I x_i^2}\end{split}\]

(c)

\[\begin{split}\frac{\partial L}{\partial \sigma} &= -\frac{I}{2 \sigma^2} 2 \sigma - \frac{1}{2 \sigma^3} (-2) \sum_{i = 1}^I (w_i - \phi_0 - \phi_1 x_i)^2\\ \frac{I}{\sigma} &= \frac{1}{\sigma^3} \sum_{i = 1}^I (w_i - \phi_0 - \phi_1 x_i)^2\\ \sigma^2 &= \frac{1}{I} \sum_{i = 1}^I (w_i - \phi_0 - \phi_1 x_i)^2\end{split}\]
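The stationarity conditions (a) and (b) coincide with ordinary least squares, so they can be checked against a numerical fit. The sketch below uses synthetic data with invented ground-truth parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
I = 400
x = rng.uniform(-2, 2, size=I)
w = 1.5 + 0.8 * x + rng.normal(scale=0.3, size=I)    # synthetic data, arbitrary parameters

# Maximum likelihood fit (equivalent to least squares under the normal noise model).
phi_1, phi_0 = np.polyfit(x, w, 1)
sigma2 = np.mean((w - phi_0 - phi_1 * x) ** 2)       # condition (c)

# The fitted parameters satisfy the stationarity conditions (a) and (b).
assert np.isclose(phi_0, np.mean(w - phi_1 * x))                      # (a)
assert np.isclose(phi_1, np.sum(x * (w - phi_0)) / np.sum(x ** 2))    # (b)
print(phi_0, phi_1, sigma2)
```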

Exercise 6.6

\[\begin{split}Pr(w_i \mid x_i) &= \frac{Pr(w_i, x_i)}{Pr(x_i)}\\ &= \NormDist_{w_i}\left[ \mu_w + \sigma_{xw}^2 \sigma_{xx}^{-1} (x_i - \mu_x), \sigma_{ww}^2 - \sigma_{xw}^2 \sigma_{xx}^{-1} \sigma_{xw}^2 \right] & \quad & \text{(5.13) and Exercise 5.5}\\ &= \NormDist_{w_i}\left[ \phi_0 + \phi_1 x_i, \sigma^2 \right] & \quad & \text{Exercise 6.5, (a), (b), (c)}\end{split}\]

where

\[\begin{split}\phi_0 &= \mu_w - \sigma_{xw}^2 \sigma_{xx}^{-1} \mu_x\\\\ \phi_1 &= \sigma_{xw}^2 \sigma_{xx}^{-1}\\\\ \sigma^2 &= \sigma_{ww}^2 - \sigma_{xw}^2 \sigma_{xx}^{-1} \sigma_{xw}^2.\end{split}\]

See Exercise 5.5 and Exercise 6.5 for more details.

(a)

In order to simplify notations, rewrite the MLE of \(\phi_0\) as

\[\phi_0 = \frac{1}{I} \sum_{i = 1}^I w_i - \phi_1 x_i = \mu_w - \phi_1 \mu_x\]

where \(\mu_w = I^{-1} \sum_{i = 1}^I w_i\) and \(\mu_x = I^{-1} \sum_{i = 1}^I x_i\).

(b)

In order to simplify notations, rewrite the MLE of \(\phi_1\) as

\[\begin{split}\phi_1 &= \frac{\sum_{i = 1}^I x_i (w_i - \phi_0)}{\sum_{i = 1}^I x_i^2}\\ \phi_1 \sum_{i = 1}^I x_i^2 &= \sum_{i = 1}^I \left( x_i w_i - x_i (\mu_w - \phi_1 \mu_x) \right) & \quad & \text{(a)}\\ \phi_1 &= \frac{ \sum_{i = 1}^I (x_i w_i - x_i \mu_w) }{ \sum_{i = 1}^I (x_i^2 - x_i \mu_x) }\\ &= \frac{ I^{-1} \sum_{i = 1}^I (x_i w_i - x_i \mu_w) }{ I^{-1} \sum_{i = 1}^I (x_i^2 - x_i \mu_x) }\\ &= \left( \frac{\sum_{i = 1}^I x_i w_i}{I} - \mu_x \mu_w \right) \left( \frac{\sum_{i = 1}^I x_i^2}{I} - \mu_x^2 \right)^{-1}\\ &= \left( I^{-1} \sum_{i = 1}^I (x_i - \mu_x)(w_i - \mu_w) \right) \left( I^{-1} \sum_{i = 1}^I (x_i - \mu_x)^2 \right)^{-1} = \sigma_{xw}^2 \sigma_{xx}^{-1}.\end{split}\]

(c)

Substituting in the MLE of \(\phi_0\) and \(\phi_1\) into \(\sigma^2\) gives

\[\begin{split}\sigma^2 &= I^{-1} \sum_{i = 1}^I (w_i - \phi_0 - \phi_1 x_i)^2\\ &= I^{-1} \sum_{i = 1}^I w_i^2 - 2 w_i (\phi_0 + \phi_1 x_i) + (\phi_0 + \phi_1 x_i)^2\\ &= I^{-1} \sum_{i = 1}^I w_i^2 - 2 w_i \phi_0 - 2 w_i x_i \phi_1 + \phi_0^2 + 2 \phi_0 \phi_1 x_i + \phi_1^2 x_i^2\\ &= I^{-1} \sum_{i = 1}^I w_i^2 + \phi_1^2 \left( x_i^2 - 2 x_i \mu_x + \mu_x^2 \right) + \phi_1 \left( 2 \mu_w x_i - 2 \mu_x \mu_w - 2 x_i w_i + 2 \mu_x w_i \right) + \left( \mu_w^2 - 2 \mu_w w_i \right)\\ &= I^{-1} \sum_{i = 1}^I w_i^2 + \phi_1^2 \left( x_i^2 - \mu_x^2 \right) - 2 \phi_1 \left( x_i w_i - \mu_x \mu_w \right) - \mu_w^2\\ &= \frac{\sum_{i = 1}^I w_i^2 - \mu_w^2}{I} + \frac{\phi_1^2}{I} \left( \sum_{i = 1}^I x_i^2 - \mu_x^2 \right) - \frac{2 \phi_1}{I} \left( \sum_{i = 1}^I x_i w_i - \mu_x \mu_w \right)\\ &= \frac{\sum_{i = 1}^I w_i^2 - \mu_w^2}{I} - \left( \frac{\sum_{i = 1}^I x_i w_i}{I} - \mu_x \mu_w \right)^2 \left( \frac{\sum_{i = 1}^I x_i^2}{I} - \mu_x^2 \right)^{-1}\\ &= \frac{\sum_{i = 1}^I (w_i - \mu_w)^2}{I} - \left( I^{-1} \sum_{i = 1}^I (x_i - \mu_x) (w_i - \mu_w) \right)^2 \left( I^{-1} \sum_{i = 1}^I (x_i - \mu_x)^2 \right)^{-1}\\ &= \sigma_{ww}^2 - \sigma_{xw}^2 \sigma_{xx}^{-1} \sigma_{xw}^2 & \quad & \text{definition of covariance with uniform probability.}\end{split}\]
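In other words, fitting a joint normal to \((x, w)\) by maximum likelihood and then conditioning on \(x\) yields the same predictive distribution as the maximum likelihood linear regression of Exercise 6.5. The sketch below checks this on synthetic data (all ground-truth values invented).

```python
import numpy as np

rng = np.random.default_rng(2)
I = 1000
x = rng.normal(1.0, 2.0, size=I)
w = -0.5 + 0.7 * x + rng.normal(scale=0.4, size=I)   # synthetic data, arbitrary parameters

# Generative route: fit a joint normal over (x, w), then condition on x.
mu_x, mu_w = x.mean(), w.mean()
cov = np.cov(np.stack([x, w]), bias=True)            # bias=True divides by I, matching the MLE
sigma_xx, sigma_xw, sigma_ww = cov[0, 0], cov[0, 1], cov[1, 1]
phi_0_gen = mu_w - sigma_xw / sigma_xx * mu_x
phi_1_gen = sigma_xw / sigma_xx
sigma2_gen = sigma_ww - sigma_xw ** 2 / sigma_xx

# Discriminative route: maximum likelihood linear regression (Exercise 6.5).
phi_1_dis, phi_0_dis = np.polyfit(x, w, 1)
sigma2_dis = np.mean((w - phi_0_dis - phi_1_dis * x) ** 2)

assert np.allclose([phi_0_gen, phi_1_gen, sigma2_gen],
                   [phi_0_dis, phi_1_dis, sigma2_dis])
print(phi_0_gen, phi_1_gen, sigma2_gen)
```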

Exercise 6.7

(1)

Assuming a uniform prior \(Pr(w)\) simplifies (6.11) to

\[Pr(w \mid x) = \frac{ Pr(x \mid w) Pr(w) }{ \sum_{w \in \{ 0, 1 \}} Pr(x \mid w) Pr(w) } = \frac{ Pr(x \mid w) }{ Pr(x \mid w = 1) + Pr(x \mid w = 0) }.\]

The points on the decision boundary obey

\[\begin{split}Pr(w = 0 \mid x) &= Pr(w = 1 \mid x)\\ Pr(x \mid w = 0) &= Pr(x \mid w = 1)\\ \NormDist_x\left[ \mu_0, \sigma_0^2 \right] &= \NormDist_x\left[ \mu_1, \sigma_1^2 \right]\\ -\frac{1}{2} \log 2 \pi - \frac{1}{2} \log \sigma_0^2 - \frac{(x - \mu_0)^2}{2 \sigma_0^2} &= -\frac{1}{2} \log 2 \pi - \frac{1}{2} \log \sigma_1^2 - \frac{(x - \mu_1)^2}{2 \sigma_1^2} & \quad & \text{rearrange into a quadratic equation using log normals}\\ \log \sigma_0^2 + \frac{(x - \mu_0)^2}{\sigma_0^2} &= \log \sigma_1^2 + \frac{(x - \mu_1)^2}{\sigma_1^2}\\ a x^2 + bx + c &= 0\end{split}\]

where

\[\begin{split}a &= \sigma_0^{-2} - \sigma_1^{-2}\\\\ b &= 2 \left( \mu_1 \sigma_1^{-2} - \mu_0 \sigma_0^{-2} \right)\\\\ c &= \mu_0^2 \sigma_0^{-2} - \mu_1^2 \sigma_1^{-2} + \log \sigma_0^2 - \log \sigma_1^2.\end{split}\]
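A quick numerical check (with arbitrary values for the means and variances) solves this quadratic and confirms that the two class-conditional densities, and hence the two posteriors under a uniform prior, agree at the roots.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class-conditional parameters.
mu_0, sigma_0 = -1.0, 1.0
mu_1, sigma_1 = 2.0, 1.5

a = sigma_0 ** -2 - sigma_1 ** -2
b = 2 * (mu_1 * sigma_1 ** -2 - mu_0 * sigma_0 ** -2)
c = (mu_0 ** 2 * sigma_0 ** -2 - mu_1 ** 2 * sigma_1 ** -2
     + np.log(sigma_0 ** 2) - np.log(sigma_1 ** 2))

disc = np.sqrt(b ** 2 - 4 * a * c)                   # real for these parameter values
for x in ((-b - disc) / (2 * a), (-b + disc) / (2 * a)):
    # Equal class-conditional likelihoods imply equal posteriors under a uniform prior.
    assert np.isclose(norm.pdf(x, mu_0, sigma_0), norm.pdf(x, mu_1, sigma_1))
    print(x)
```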

(2)

The decision boundary of the logistic regression model satisfies

\[\begin{split}Pr(w = 0 \mid x) &= Pr(w = 1 \mid x)\\ \BernDist_{w = 0}\left[ \sigmoid\left( \phi_0 + \phi_1 x \right) \right] &= \BernDist_{w = 1}\left[ \sigmoid\left( \phi_0 + \phi_1 x \right) \right]\\ 1 - \sigmoid\left(\phi_0 + \phi_1 x \right) &= \sigmoid\left( \phi_0 + \phi_1 x \right)\\ 1 + \exp \left( -\phi_0 - \phi_1 x \right) &= 2\\ \phi_1 x + \phi_0 &= 0.\end{split}\]

Exercise 6.8

The following uses the results of Exercise 6.7.

(1)

Suppose \(Pr(w)\) is uniform, \(\mu_0 = 0\), \(\sigma_0^2 = \sigma^2\), \(\mu_1 = 0\), and \(\sigma_1^2 = 1.5 \sigma^2\). Then

\[\begin{split}a &= \sigma_0^{-2} - \sigma_1^{-2} = \frac{1}{3 \sigma^2}\\\\ b &= 2 \left( \mu_1 \sigma_1^{-2} - \mu_0 \sigma_0^{-2} \right) = 0\\\\ c &= \mu_0^2 \sigma_0^{-2} - \mu_1^2 \sigma_1^{-2} + \log \sigma_0^2 - \log \sigma_1^2 = -\log 1.5\end{split}\]

(2)

In order for the discriminative classifier to have the same decision boundary, the argument of the logistic sigmoid needs to be the quadratic function

\[\phi_2 x^2 + \phi_1 x + \phi_0\]

where

\[\begin{split}\begin{gather*} \phi_2 = a\\ \phi_1 = 0\\ \phi_0 = c. \end{gather*}\end{split}\]
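A numerical check (with \(\sigma\) chosen arbitrarily) confirms that the boundary sits at \(x = \pm\sqrt{3 \sigma^2 \log 1.5}\) and that the quadratic-activation logistic regression model places \(Pr(w = 1 \mid x) = 0.5\) at the same points.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import norm

sigma = 1.3                              # arbitrary illustration
phi_2 = 1.0 / (3.0 * sigma ** 2)         # a
phi_0 = -np.log(1.5)                     # c (and phi_1 = b = 0)

x_boundary = np.sqrt(3.0 * sigma ** 2 * np.log(1.5))
for x in (-x_boundary, x_boundary):
    # Equal class-conditional likelihoods for the generative model ...
    assert np.isclose(norm.pdf(x, 0.0, sigma), norm.pdf(x, 0.0, np.sqrt(1.5) * sigma))
    # ... and Pr(w = 1 | x) = 0.5 for the quadratic logistic regression model.
    assert np.isclose(expit(phi_2 * x ** 2 + phi_0), 0.5)
    print(x)
```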

Exercise 6.9

Let \(G(n)\) and \(D(n)\) denote the number of parameters in the generative and discriminative models, respectively, as a function of the dimensionality \(n\) of the data \(\mathbf{x} \in \mathbb{R}^n\).

Generative Model

Suppose the prior is uniform and the model parameters are \(\boldsymbol{\theta} = \left\{ \boldsymbol{\mu}_0, \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_0, \boldsymbol{\Sigma}_1 \right\}\).

Recall that an \(n \times n\) symmetric matrix (e.g. a covariance matrix) has \(\frac{n (n + 1)}{2}\) free parameters.

\[G(n) = 2n + 2 \frac{n (n + 1)}{2} = n^2 + 3n.\]

Discriminative Model

The model parameters consist of \(\boldsymbol{\theta} = \{ \phi_0, \boldsymbol{\phi} \}\) where \(\boldsymbol{\phi} \in \mathbb{R}^n\).

\[D(n) = n + 1.\]
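Tabulating the two counts for a few data dimensionalities (a sketch; the values of \(n\) are arbitrary) highlights that the generative model grows quadratically while the discriminative model grows only linearly.

```python
def G(n):
    """Generative model: two mean vectors (n each) plus two symmetric covariances (n(n + 1)/2 each)."""
    return 2 * n + 2 * (n * (n + 1) // 2)

def D(n):
    """Discriminative model: offset phi_0 plus a gradient vector phi of length n."""
    return n + 1

for n in (1, 10, 100, 10000):            # e.g. a 100 x 100 grayscale image gives n = 10000
    print(n, G(n), D(n))
```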

Exercise 6.10

The goal now is to infer a multi-valued label \(w_n \in \{0, 1, 2\}\) that indicates whether the \(n\text{th}\) pixel in the image is part of a known background \((w = 0)\), foreground \((w = 1)\), or shadow \((w = 2)\).

The prior \(Pr(w)\) would be a categorical distribution.

Since the background is known and there is lighting in the scene, shadows will make the pixels “dimmer”. In addition to Equations (6.16) and (6.17), the class conditional distribution of the shadows could be modeled as

\[Pr(\mathbf{x}_n \mid w = 2) = \NormDist_{\mathbf{x}_n}\left[ \boldsymbol{\mu}_{n2}, \boldsymbol{\Sigma}_{n2} \right].\]
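A per-pixel inference sketch under this model might look as follows; the class means, covariances, and prior below are invented placeholders, with the shadow mean taken to be a dimmed copy of the background mean.

```python
import numpy as np
from scipy.stats import multivariate_normal

def pixel_posterior(x_n, mus, Sigmas, prior):
    """Pr(w_n | x_n) over {background, foreground, shadow} for one RGB pixel x_n."""
    likelihoods = np.array([
        multivariate_normal.pdf(x_n, mean=mu, cov=Sigma)
        for mu, Sigma in zip(mus, Sigmas)
    ])
    unnorm = likelihoods * prior
    return unnorm / unnorm.sum()

# Hypothetical per-pixel parameters.
mu_bg = np.array([120.0, 130.0, 110.0])
mus = [mu_bg, np.array([60.0, 60.0, 200.0]), 0.6 * mu_bg]   # background, foreground, shadow
Sigmas = [25.0 * np.eye(3), 400.0 * np.eye(3), 60.0 * np.eye(3)]
prior = np.array([0.6, 0.2, 0.2])                           # categorical prior Pr(w_n)

print(pixel_posterior(np.array([75.0, 80.0, 68.0]), mus, Sigmas, prior))  # mostly shadow
```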

References

Brua

Jerry Brunner. Logistic regression. http://www.utstat.toronto.edu/~brunner/oldclass/312f12/lectures/312f12LogisticRegression1.pdf. Accessed on 2017-06-19.

Brub

Jerry Brunner. Multinomial logit models. http://www.utstat.toronto.edu/~brunner/oldclass/312f12/lectures/312f12MultinomialLogit.pdf. Accessed on 2017-06-19.

Eng

Barbara Engelhardt. Introduction: MLE, MAP, Bayesian reasoning. https://web.archive.org/web/20150225224855/https://genome.duke.edu/labs/engelhardt/courses/scribe/lec_08_28_2013.pdf. Accessed on 2017-06-17.