Models for Transformations

Affine Transformation Model

Affine transformations provide a good mapping when the depth variation of the plane as seen by the camera is small relative to the mean distance from the camera. This scenario occurs when the viewing angle is not too oblique, the camera is distant, and the field of view is small.

Geometric Properties of the Homography

For a geometric interpretation, see Figures 15.13 and 15.14. The homography maps between:

  • points on a plane in the real world and their positions in an image,

  • points in two different images of the same plane, and

  • two images of a 3D object where the camera has rotated but not translated.

Exercise 15.1

The inverse of \(\mathbf{x}_2 = \boldsymbol{\Omega}_1 \mathbf{x}_1 + \boldsymbol{\tau}_1\) needs to satisfy

\[\begin{split}\mathbf{x}_1 &= \boldsymbol{\Omega}_2 \mathbf{x}_2 + \boldsymbol{\tau}_2\\ &= \boldsymbol{\Omega}_2 \left( \boldsymbol{\Omega}_1 \mathbf{x}_1 + \boldsymbol{\tau}_1 \right) + \boldsymbol{\tau}_2\\ &= \boldsymbol{\Omega}_2 \boldsymbol{\Omega}_1 \mathbf{x}_1 + \boldsymbol{\Omega}_2 \boldsymbol{\tau}_1 + \boldsymbol{\tau}_2.\end{split}\]

Thus \(\boldsymbol{\Omega}_2 = \boldsymbol{\Omega}_1^{-1} = \boldsymbol{\Omega}_1^\top\) and \(\boldsymbol{\tau}_2 = -\boldsymbol{\Omega}_2 \boldsymbol{\tau}_1 = -\boldsymbol{\Omega}_1^\top \boldsymbol{\tau}_1\).

Exercise 15.2

A 2D line is defined as \(\mathbf{l} \cdot \tilde{\mathbf{x}} = \mathbf{l}^\top \tilde{\mathbf{x}} = 0\).

If the points are transformed such that \(\tilde{\mathbf{x}}' = \mathbf{T} \tilde{\mathbf{x}}\), then the line becomes \(\mathbf{l}' = \mathbf{T}^{-\top} \mathbf{l}\) such that

\[\begin{split}\mathbf{l}' \cdot \tilde{\mathbf{x}}' &= \mathbf{l}'^\top \tilde{\mathbf{x}}'\\ &= \mathbf{l}^\top \mathbf{T}^{-1} \mathbf{T} \tilde{\mathbf{x}}\\ &= \mathbf{l}^\top \tilde{\mathbf{x}}\\ &= 0.\end{split}\]

Exercise 15.3

The DLT algorithm in section \(15.2.4\) can be reused for lines \(\mathbf{l}\) and \(\mathbf{l}'\) such that

\[\tilde{\mathbf{l}}_2 \sim H \tilde{\mathbf{l}}_1 \rightarrow \min_{\mathbf{H}} \tilde{\mathbf{l}}_2 - H \tilde{\mathbf{l}}_1.\]

Exercise 15.4

A conic is defined as \(\tilde{\mathbf{x}}^\top \mathbf{C} \tilde{\mathbf{x}} = 0\).

If the points are transformed such that \(\tilde{\mathbf{x}}' = \mathbf{T} \tilde{\mathbf{x}}\), then the conic becomes \(\mathbf{C}' = \mathbf{T}^{-\top} \mathbf{C} \mathbf{T}^{-1}\) such that

\[\begin{split}{\tilde{\mathbf{x}}'}^\top \mathbf{C}' \tilde{\mathbf{x}}' &= \tilde{\mathbf{x}}^\top \mathbf{T}^\top \mathbf{T}^{-\top} \mathbf{C} \mathbf{T}^{-1} \mathbf{T} \tilde{\mathbf{x}}\\ &= \tilde{\mathbf{x}}^\top \mathbf{C} \tilde{\mathbf{x}}\\ &= 0.\end{split}\]

Exercise 15.5

\[\begin{split}\mathbf{T}_\text{Euclidean} &= \begin{bmatrix} \omega_{11} & \omega_{12} & \omega_{13} & \tau_x\\ \omega_{21} & \omega_{22} & \omega_{23} & \tau_y\\ \omega_{31} & \omega_{32} & \omega_{33} & \tau_z\\ 0 & 0 & 0 & 1 \end{bmatrix}\\\\ \mathbf{T}_\text{similarity} &= \begin{bmatrix} \rho \omega_{11} & \rho \omega_{12} & \rho \omega_{13} & \tau_x\\ \rho \omega_{21} & \rho \omega_{22} & \rho \omega_{23} & \tau_y\\ \rho \omega_{31} & \rho \omega_{32} & \rho \omega_{33} & \tau_z\\ 0 & 0 & 0 & 1 \end{bmatrix}\\\\ \mathbf{T}_\text{affine} &= \begin{bmatrix} \phi_{11} & \phi_{12} & \phi_{13} & \tau_x\\ \phi_{21} & \phi_{22} & \phi_{23} & \tau_y\\ \phi_{31} & \phi_{32} & \phi_{33} & \tau_z\\ 0 & 0 & 0 & 1 \end{bmatrix}\\\\ \mathbf{T}_\text{projective} &= \begin{bmatrix} \phi_{11} & \phi_{12} & \phi_{13} & \phi_{14}\\ \phi_{21} & \phi_{22} & \phi_{23} & \phi_{24}\\ \phi_{31} & \phi_{32} & \phi_{33} & \phi_{34}\\ \phi_{41} & \phi_{42} & \phi_{43} & \phi_{44} \end{bmatrix}\end{split}\]

Euclidean, similarity, affine, and perspective have respectively 6, 7, 12, and 15 unconstrained parameters. Notice that even though a 3D rotation matrix has nine parameters, there are six (orthonormal) constraints:

\[\begin{split}\left\vert \boldsymbol{\omega}_{\cdot, 1} \right\vert = \left\vert \boldsymbol{\omega}_{\cdot, 2} \right\vert = \left\vert \boldsymbol{\omega}_{\cdot, 3} \right\vert\\\\ \boldsymbol{\omega}_{\cdot, 3} = \boldsymbol{\omega}_{\cdot, 1} \times \boldsymbol{\omega}_{\cdot, 2}.\end{split}\]

Exercise 15.6

The DLT algorithm can be used to solve for the parameters. Each matched pair of 2D points on the plane or 3D points would generate two or three equations respectively, so only six pairs of 2D points or four pairs of 3D points are needed to solve for the twelve unknowns.

Exercise 15.7

The 1D affine transformation is defined as \(x' = ax + b\; \forall x \in \mathbb{R}\). The ratio of two distances is invariant to a 1D affine transformation because

\[\begin{split}\frac{x_1' - x_2'}{x_2' - x_3'} &= \frac{(a x_1 + b) - (a x_2 + b)}{(a x_2 + b) - (a x_3 + b)}\\ &= \frac{a (x_1 - x_2)}{a (x_2 - x_3)}\\ &= \frac{x_1 - x_2}{x_2 - x_3}\\ &= I.\end{split}\]

Exercise 15.8

The 1D projective transformation is defined as \(x' = \frac{ax + b}{cx + d}\; \forall x \in \mathbb{R}\). The cross-ratio of distances is invariant to a 1D projective transformation because

\[\begin{split}\frac{(x_3' - x_1') (x_4' - x_2')}{(x_3' - x_2') (x_4' - x_1')} &= \frac{ \left( \frac{a x_3 + b}{c x_3 + d} - \frac{a x_1 + b}{c x_1 + d} \right) \left( \frac{a x_4 + b}{c x_4 + d} - \frac{a x_2 + b}{c x_2 + d} \right) }{ \left( \frac{a x_3 + b}{c x_3 + d} - \frac{a x_2 + b}{c x_2 + d} \right) \left( \frac{a x_4 + b}{c x_4 + d} - \frac{a x_1 + b}{c x_1 + d} \right) }\\ &= \frac{ \left[ (a x_1 + b) (c x_3 + d) - (a x_3 + b) (c x_1 + d) \right] \left[ (a x_2 + b) (c x_4 + d) - (a x_4 + b) (c x_2 + d) \right] }{ \left[ (a x_1 + b) (c x_4 + d) - (a x_4 + b) (c x_1 + d) \right] \left[ (a x_2 + b) (c x_3 + d) - (a x_3 + b) (c x_2 + d) \right] } & \quad & \text{(a)}\\ &= \frac{ \left( ad x_1 - ad x_3 - bc x_1 + bc x_3 \right) \left( ad x_2 - ad x_4 - bc x_2 + bc x_4 \right) }{ \left( ad x_1 - ad x_4 - bc x_1 + bc x_4 \right) \left( ad x_2 - ad x_3 - bc x_2 + bc x_3 \right) }\\ &= \frac{ (ad - bc) (x_1 - x_3) (ad - bc) (x_2 - x_4) }{ (ad - bc) (x_1 - x_4) (ad - bc) (x_2 - x_3) }\\ &= \frac{ (x_3 - x_1) (x_4 - x_2) }{ (x_3 - x_2) (x_4 - x_1) }.\end{split}\]

(a)

The common denominator \((c x_1 + d) (c x_2 + d) (c x_3 + d) (c x_4 + d)\) is canceled out.

Exercise 15.9

Let \(\boldsymbol{\phi}_{k,\cdot}\) represent the \(k^\text{th}\) row of the matrix \(\boldsymbol{\Phi}\).

\[\begin{split}\tilde{\mathbf{x}} \times \boldsymbol{\Phi} \tilde{\mathbf{w}} &= \begin{vmatrix} i & j & k\\ x & y & 1\\ \boldsymbol{\phi}_{1,\cdot} \tilde{\mathbf{w}} & \boldsymbol{\phi}_{2,\cdot} \tilde{\mathbf{w}} & \boldsymbol{\phi}_{3,\cdot} \tilde{\mathbf{w}} \end{vmatrix}\\ &= \begin{bmatrix} y \boldsymbol{\phi}_{3,\cdot} \tilde{\mathbf{w}} - \boldsymbol{\phi}_{2,\cdot} \tilde{\mathbf{w}}\\ \boldsymbol{\phi}_{1,\cdot} \tilde{\mathbf{w}} - x \boldsymbol{\phi}_{3,\cdot} \tilde{\mathbf{w}}\\ x \boldsymbol{\phi}_{2,\cdot} \tilde{\mathbf{w}} - y \boldsymbol{\phi}_{1,\cdot} \tilde{\mathbf{w}} \end{bmatrix}\\ &= \begin{bmatrix} y (\phi_{3,1} u + \phi_{3,2} v + \phi_{3,3}) - (\phi_{2,1} u + \phi_{2,2} v + \phi_{2,3})\\ (\phi_{1,1} u + \phi_{1,2} v + \phi_{1,3}) - x (\phi_{3,1} u + \phi_{3,2} v + \phi_{3,3})\\ x (\phi_{2,1} u + \phi_{2,2} v + \phi_{2,3}) - y (\phi_{1,1} u + \phi_{1,2} v + \phi_{1,3}) \end{bmatrix}\\ &= \boldsymbol{0}.\end{split}\]

Note that cross product is merely one interpretation; see Exercise 15.3 for a different perspective.

Exercise 15.10

The camera model in homogeneous coordinates (14.24) is \(\tilde{\mathbf{x}} = \boldsymbol{\Lambda} \left[ \begin{array}{c|c} \boldsymbol{\Omega} & \boldsymbol{\tau} \end{array} \right] \tilde{\mathbf{w}}\).

Since the initial camera extrinsic parameters are \(\boldsymbol{\Omega} = \mathbf{I}\) and \(\boldsymbol{\tau} = \boldsymbol{0}\), \(\tilde{\mathbf{x}}_1 = \boldsymbol{\Lambda} \mathbf{w}\).

The camera then undergoes a pure rotation transformation, with \(\boldsymbol{\Omega} = \boldsymbol{\Omega}_1\), resulting in \(\tilde{\mathbf{x}}_2 = \boldsymbol{\Lambda} \boldsymbol{\Omega}_1 \mathbf{w}\).

The homography \(\boldsymbol{\Phi}\) mapping from image 1 to image 2 needs to satisfy

\[\begin{split}\tilde{\mathbf{x}}_2 &= \boldsymbol{\Phi} \tilde{\mathbf{x}}_1\\ \boldsymbol{\Lambda} \boldsymbol{\Omega}_1 \mathbf{w} &= \boldsymbol{\Phi} \boldsymbol{\Lambda} \mathbf{w};\end{split}\]

thus \(\boldsymbol{\Phi} = \boldsymbol{\Lambda} \boldsymbol{\Omega}_1 \boldsymbol{\Lambda}^{-1}\).

Exercise 15.11

As mentioned in Exercise 15.5, the 3D rotation matrix has three degrees of freedom so either one 3D point match or two 2D point matches are needed.

Exercise 15.12

Let \(k\) denote the number of iterations to run RANSAC. Let \(p\) be the probability that RANSAC selects only inliers from the input data set when it chooses the \(n\) points from which the model parameters are estimated. Let \(w\) denote the probability of choosing an inlier each time a single point is selected.

Each point is selected independently, so \(w^n\) is the probability that all \(n\) points are inliers and \(1 - w^n\) is the probability that at least one of the \(n\) points is an outlier.

We can upper bound \(k\) by assuming a point which has been selected once is replaced and can be selected again in the same iteration. The probability that RANSAC never selects a set of \(n\) points which all are inliers is

\[\begin{split}(1 - w^n)^k &= 1 - p\\ k &= \frac{\log(1 - p)}{\log(1 - w^n)}.\end{split}\]

Exercise 15.13

Apply the EM algorithm as shown in “Mixture of Gaussians” (section 7.4).

I think the t-distribution is more appropriate to handle the outliers. The mixture of Gaussians makes more sense when the data is multimodal. The noise can be large or small, so it is unclear which distribution is the noise.

Exercise 15.14

This is analogous to forward mapping versus reverse mapping. The former requires hole-filling while the latter is hole-free and even supports filtering of the source image.