Multiple Cameras

Rectification

  • Dense reconstruction is the process of estimating the depth at every point in the image.

  • Planar rectification is suitable when the epipole is sufficiently far outside the image. Otherwise, mapping the epipoles to infinity will distort the image greatly.

  • Polar rectification is suitable when planar rectification is not.

    • The first axis represents the distance to the epipole while the second axis denotes the angle from the epipole.
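The polar coordinate change can be sketched numerically. The example below uses a hypothetical epipole and image points, and omits the actual pixel resampling and the matching of angular ranges between the two images:

```python
import numpy

def to_polar(points, epipole):
    """Map image points to (distance, angle) coordinates about the epipole."""
    d = points - epipole
    r = numpy.hypot(d[:, 0], d[:, 1])        # first axis: distance to the epipole
    theta = numpy.arctan2(d[:, 1], d[:, 0])  # second axis: angle from the epipole
    return numpy.column_stack([r, theta])

# a hypothetical epipole outside the image; points on one epipolar line
# share the same angle and differ only in distance
epipole = numpy.array([-50.0, 120.0])
line = numpy.array([[0.0, 120.0], [100.0, 120.0], [200.0, 120.0]])
print(to_polar(line, epipole))
```

After rectification, rows of constant angle become corresponding scanlines in the two images.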

Exercise 16.1

\(x_1\) and \(x_2\) lie on the epipolar plane spanned by the two optical centers and the point where their rays intersect. The line joining the optical centers is called the baseline. An epipolar line is traced out by projecting the points along one camera's ray into the other camera. The projection of \(\mathcal{O}_1\) into camera plane 2 is called an epipole (and vice versa). All the epipolar lines in an image intersect at its epipole.

Exercise 16.2

\[\begin{split}\mathbf{a} \times \mathbf{b} &= \begin{vmatrix} i & j & k\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3 \end{vmatrix}\\ &= \begin{bmatrix} a_2 b_3 - b_2 a_3\\ -(a_1 b_3 - b_1 a_3)\\ a_1 b_2 - b_1 a_2 \end{bmatrix}\\ &= \begin{bmatrix} 0 & -a_3 & a_2\\ a_3 & 0 & -a_1\\ -a_2 & a_1 & 0 \end{bmatrix} \begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix}\end{split}\]
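The identity above is easy to verify numerically; a minimal check with arbitrary vectors:

```python
import numpy

def skew(a):
    """Skew-symmetric matrix [a]_x such that [a]_x b = a x b."""
    return numpy.asarray([[0.0, -a[2], a[1]],
                          [a[2], 0.0, -a[0]],
                          [-a[1], a[0], 0.0]])

a = numpy.asarray([1.0, 2.0, 3.0])
b = numpy.asarray([-4.0, 5.0, 6.0])
print(numpy.cross(a, b))   # the cross product directly
print(skew(a) @ b)         # the same result as a matrix-vector product
```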

Exercise 16.3

Let \(\{ \mathbf{\Omega} \in \mathcal{SO}(3), \boldsymbol{\tau} \in \mathbb{R}^3 \}\) (defined in world coordinates) denote the change of coordinates from the corresponding camera frame to the world frame:

\[\mathbf{w} = \left[ \begin{array}{c|c} \boldsymbol{\Omega} & \boldsymbol{\tau} \end{array} \right] \tilde{\mathbf{x}} = \boldsymbol{\Omega} \mathbf{x} + \boldsymbol{\tau}\]

where the diacritic \(\tilde{\cdot}\) denotes homogeneous coordinates, and \(\mathbf{w}, \mathbf{x} \in \mathbb{R}^3\) are the corresponding points in world and camera coordinates respectively. Consequently, the reverse change of frames in 4D homogeneous coordinates is

\[\begin{split}\begin{bmatrix} \boldsymbol{\Omega} & \boldsymbol{\tau}\\ \boldsymbol{0}^\top & 1 \end{bmatrix}^{-1} = \left( \begin{bmatrix} \mathbf{I} & \boldsymbol{\tau}\\ \boldsymbol{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \boldsymbol{\Omega} & \boldsymbol{0}\\ \boldsymbol{0}^\top & 1 \end{bmatrix} \right)^{-1} = \begin{bmatrix} \boldsymbol{\Omega}^\top & \boldsymbol{0}\\ \boldsymbol{0}^\top & 1 \end{bmatrix} \begin{bmatrix} \mathbf{I} & -\boldsymbol{\tau}\\ \boldsymbol{0}^\top & 1 \end{bmatrix} = \begin{bmatrix} \boldsymbol{\Omega}^\top & -\boldsymbol{\Omega}^\top \boldsymbol{\tau}\\ \boldsymbol{0}^\top & 1 \end{bmatrix}.\end{split}\]
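The closed-form inverse can be confirmed against a numerical inverse for an arbitrary rigid transform; the rotation below is a hypothetical rotation about the z-axis:

```python
import numpy

theta = 0.3
omega = numpy.asarray([[numpy.cos(theta), -numpy.sin(theta), 0.0],
                       [numpy.sin(theta), numpy.cos(theta), 0.0],
                       [0.0, 0.0, 1.0]])
tau = numpy.asarray([1.0, -2.0, 0.5])

# assemble [Omega tau; 0 1] and the claimed inverse [Omega^T -Omega^T tau; 0 1]
M = numpy.eye(4)
M[:3, :3], M[:3, 3] = omega, tau
Minv = numpy.eye(4)
Minv[:3, :3], Minv[:3, 3] = omega.T, -omega.T @ tau

print(numpy.allclose(Minv, numpy.linalg.inv(M)))  # True
```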

The nodal points in world coordinates are

\[\mathcal{O}_1 = \boldsymbol{\Omega}_1 \boldsymbol{0} + \boldsymbol{\tau}_1 \qquad \land \qquad \mathcal{O}_2 = \boldsymbol{\Omega}_2 \boldsymbol{0} + \boldsymbol{\tau}_2\]

since the projective geometry of image formation sets the center of projection at the origin in camera coordinates [Lanb].

The epipolar constraint states that \(\{ \mathbf{w}, \mathbf{x}_1, \mathbf{x}_2, \mathcal{O}_1, \mathcal{O}_2 \}\) are all coplanar:

\[\begin{split}0 &= \overrightarrow{\mathcal{O}_1 \mathbf{w}} \cdot \left( \overrightarrow{\mathcal{O}_1 \mathcal{O}_2} \times \overrightarrow{\mathcal{O}_2 \mathbf{w}} \right)\\ &= \left[ \left( \boldsymbol{\Omega}_1 \mathbf{x}_1 + \boldsymbol{\tau}_1 \right) - \mathcal{O}_1 \right]^\top \left[ \left( \mathcal{O}_2 - \mathcal{O}_1 \right) \times \left( \left( \boldsymbol{\Omega}_2 \mathbf{x}_2 + \boldsymbol{\tau}_2 \right) - \mathcal{O}_2 \right) \right] & \quad & \text{convert everything into world coordinates}\\ &= \left( \boldsymbol{\Omega}_1 \mathbf{x}_1 \right)^\top \left[ \left( \boldsymbol{\tau}_2 - \boldsymbol{\tau}_1 \right) \times \left( \boldsymbol{\Omega}_2 \mathbf{x}_2 \right) \right] & \quad & \text{definition of nodal points}\\ &= \mathbf{x}_1^\top \mathbf{E} \mathbf{x}_2\end{split}\]

where \(\mathbf{E} = \boldsymbol{\Omega}_1^\top \left[ \boldsymbol{\tau}_2 - \boldsymbol{\tau}_1 \right]_\times \boldsymbol{\Omega}_2\) is the essential matrix [Jep].
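The constraint can be checked on a synthetic configuration; the camera poses and the world point below are arbitrary choices for illustration, not values from the text:

```python
import numpy

def skew(t):
    return numpy.asarray([[0.0, -t[2], t[1]],
                          [t[2], 0.0, -t[0]],
                          [-t[1], t[0], 0.0]])

def rot_z(theta):
    return numpy.asarray([[numpy.cos(theta), -numpy.sin(theta), 0.0],
                          [numpy.sin(theta), numpy.cos(theta), 0.0],
                          [0.0, 0.0, 1.0]])

# hypothetical camera-to-world poses and a world point
omega1, tau1 = rot_z(0.1), numpy.asarray([0.0, 0.0, 0.0])
omega2, tau2 = rot_z(-0.2), numpy.asarray([1.0, 0.5, 0.0])
w = numpy.asarray([0.3, -0.4, 5.0])

# rays in camera coordinates: w = Omega x + tau  =>  x = Omega^T (w - tau)
x1 = omega1.T @ (w - tau1)
x2 = omega2.T @ (w - tau2)

E = omega1.T @ skew(tau2 - tau1) @ omega2
print(x1 @ E @ x2)  # ~0 up to floating point error
```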

Exercise 16.4

The essential matrix is a \(3 \times 3\) matrix that relates homogeneous points in one image to epipolar lines in the other. It contains six degrees of freedom (three for rotation, three for translation), but is ambiguous up to scale, leaving five. It has rank two, with two equal non-zero singular values. If we know the intrinsic matrices of the two cameras, we can decompose the essential matrix to recover the rotation exactly and the translation up to an unknown scaling factor.
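These properties (rank two, two equal non-zero singular values) can be checked on an essential matrix assembled from an arbitrary rotation and translation:

```python
import numpy

def skew(t):
    return numpy.asarray([[0.0, -t[2], t[1]],
                          [t[2], 0.0, -t[0]],
                          [-t[1], t[0], 0.0]])

theta = 0.4  # a hypothetical rotation about the z-axis
omega = numpy.asarray([[numpy.cos(theta), -numpy.sin(theta), 0.0],
                       [numpy.sin(theta), numpy.cos(theta), 0.0],
                       [0.0, 0.0, 1.0]])
tau = numpy.asarray([1.0, 2.0, 3.0])
E = skew(tau) @ omega

d = numpy.linalg.svd(E, compute_uv=False)
print(d)  # two equal singular values (the norm of tau) and one (numerical) zero
print(numpy.linalg.matrix_rank(E))  # 2
```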

Exercise 16.5

It seems the second camera undergoes a pure translation because both epipoles are the same point at infinity, i.e. their homogeneous coordinates have a zero third component (see Figure 16.3).

[1]:
import numpy

# rotation by ninety degrees about the z-axis
W = numpy.asarray([[0, -1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])
E = numpy.asarray([[0, 0, 10],
                   [0, 0, 0],
                   [-10, 0, 0]])
U, d, Vt = numpy.linalg.svd(E)

# one epipole spans the right null space of E:
# the last row of V^T (equivalently the last column of V)
e1 = Vt[-1]
print('e1: {0}'.format(e1))

# the other epipole spans the left null space of E:
# the last column of U (equivalently the last row of U^T)
e2 = U[:, -1]
print('e2: {0}'.format(e2))

# recover the translation (up to scale and sign) and the rotation
tau_x = U @ numpy.diag(d) @ W @ U.T
tau = numpy.asarray([tau_x[2, 1], tau_x[0, 2], tau_x[1, 0]])
omega = U @ numpy.linalg.inv(W) @ Vt
print('tau: {0}'.format(tau))
print('omega:\n{0}'.format(omega))

# map each point to its epipolar line in the other image
for x in [[1, -1, 1], [-5, -2, 1]]:
    l = E @ numpy.asarray(x)
    print('(x, l): ({0}, {1})'.format(x, l))
e1: [0. 1. 0.]
e2: [0. 1. 0.]
tau: [  0. -10.   0.]
omega:
[[-1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0. -1.]]
(x, l): ([1, -1, 1], [ 10   0 -10])
(x, l): ([-5, -2, 1], [10  0 50])

Exercise 16.6

Recall that \(U\) and \(V\) are orthogonal matrices in \(E_{m \times n} = U_{m \times m} L_{m \times n} V_{n \times n}^\top\) where \(U^\top U = I_m\) and \(V^\top V = I_n\).

\[\begin{split}\boldsymbol{\tau}_{\times} \boldsymbol{\Omega} &= U L W U^\top U W^{-1} V^\top\\ &= U L W W^{-1} V^\top\\ &= U L V^\top\\ &= E\end{split}\]
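Because \(W W^{-1} = I\), the recomposition holds numerically; here it is checked on the essential matrix from Exercise 16.5:

```python
import numpy

W = numpy.asarray([[0.0, -1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
E = numpy.asarray([[0.0, 0.0, 10.0],
                   [0.0, 0.0, 0.0],
                   [-10.0, 0.0, 0.0]])
U, d, Vt = numpy.linalg.svd(E)

tau_x = U @ numpy.diag(d) @ W @ U.T   # tau_x = U L W U^T
omega = U @ W.T @ Vt                  # omega = U W^{-1} V^T (W is a rotation)
print(numpy.allclose(tau_x @ omega, E))  # True
```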

Exercise 16.7

It is important to note that the essential matrix in Exercise 16.3 assumes the cameras are calibrated as

\[\mathbf{x} = \boldsymbol{\Lambda} \left[ \begin{array}{c|c} \boldsymbol{\Omega} & \boldsymbol{\tau} \end{array} \right] \tilde{\mathbf{w}} = \boldsymbol{\Lambda} \left( \boldsymbol{\Omega} \mathbf{w} + \boldsymbol{\tau} \right)\]

where \(\boldsymbol{\Lambda} = \mathbf{I}_3\). When the cameras are not calibrated, they transform the camera coordinates to homogeneous image (pixel) coordinates [Lana]:

\[0 = \mathbf{x}_1^\top \mathbf{E} \mathbf{x}_2 = \tilde{\mathbf{p}}_1^\top \boldsymbol{\Lambda}_1^{-\top} \mathbf{E} \boldsymbol{\Lambda}_2^{-1} \tilde{\mathbf{p}}_2.\]

The fundamental matrix \(\mathbf{F} \in \mathbb{R}^{3 \times 3}\) plays the role of the essential matrix for cameras with arbitrary intrinsic matrices [Fis]. It has seven degrees of freedom because it still has the constraint \(\det{\mathbf{F}} = 0\) and is ambiguous up to scale.
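The relation \(\mathbf{F} = \boldsymbol{\Lambda}_1^{-\top} \mathbf{E} \boldsymbol{\Lambda}_2^{-1}\) can be verified on a synthetic pair of cameras. The intrinsic matrices, camera centers, and world point below are hypothetical, and identity rotations are used to keep the essential matrix simple:

```python
import numpy

def skew(t):
    return numpy.asarray([[0.0, -t[2], t[1]],
                          [t[2], 0.0, -t[0]],
                          [-t[1], t[0], 0.0]])

# two cameras with identity rotations and a world point
tau1 = numpy.asarray([0.0, 0.0, 0.0])
tau2 = numpy.asarray([1.0, 0.0, 0.0])
w = numpy.asarray([0.2, -0.3, 4.0])
x1, x2 = w - tau1, w - tau2   # rays in camera coordinates
E = skew(tau2 - tau1)         # essential matrix when Omega_1 = Omega_2 = I

# hypothetical intrinsic matrices
L1 = numpy.asarray([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
L2 = numpy.asarray([[750.0, 0.0, 300.0], [0.0, 750.0, 220.0], [0.0, 0.0, 1.0]])

p1, p2 = L1 @ x1, L2 @ x2     # homogeneous pixel coordinates
F = numpy.linalg.inv(L1).T @ E @ numpy.linalg.inv(L2)
print(p1 @ F @ p2)            # ~0 up to floating point error
```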

Exercise 16.8

See Exercise 15.12 for details.

[2]:
import numpy

# the probability that a RANSAC sample of n matches is all inliers is w^n,
# so k trials succeed at least once with probability p when
# k = log(1 - p) / log(1 - w^n)
outliers = 0.3
w = 1.0 - outliers
p = 0.99

for n in [8, 7]:
    k = numpy.log(1 - p) / numpy.log(1 - w**n)
    print('n = {0}: k = {1}'.format(n, k))
n = 8: k = 77.5589168821102
n = 7: k = 53.58343779552468

Exercise 16.9

Let \(\mathbf{l}_3^1 = \mathbf{F}_{13} \tilde{\mathbf{x}}_1\) and \(\mathbf{l}_3^2 = \mathbf{F}_{23} \tilde{\mathbf{x}}_2\) denote the epipolar lines in image 3 corresponding to the points \(\{ \mathbf{x}_1, \mathbf{x}_2 \}\). The position of the corresponding point in image 3 satisfies \(\tilde{\mathbf{x}}_3 = \mathbf{l}_3^1 \times \mathbf{l}_3^2\).

This solution assumes the epipolar lines have a unique intersection. See the trifocal tensor or quadrifocal tensor for a more robust solution.
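The transfer can be checked on a synthetic scene. The three camera centers and the world point below are hypothetical, and identity rotations are used so that the matrices mapping points to epipolar lines take the simple skew-symmetric form:

```python
import numpy

def skew(t):
    return numpy.asarray([[0.0, -t[2], t[1]],
                          [t[2], 0.0, -t[0]],
                          [-t[1], t[0], 0.0]])

# three camera centers (identity rotations) and a world point
tau1 = numpy.asarray([0.0, 0.0, 0.0])
tau2 = numpy.asarray([1.0, 0.0, 0.0])
tau3 = numpy.asarray([0.0, 1.0, 0.0])
w = numpy.asarray([0.3, 0.2, 2.0])
x1, x2, x3 = w - tau1, w - tau2, w - tau3  # homogeneous image points

# map the points in images 1 and 2 to epipolar lines in image 3
F13 = skew(tau1 - tau3)
F23 = skew(tau2 - tau3)
l31 = F13 @ x1
l32 = F23 @ x2

# the corresponding point in image 3 is the intersection of the two lines
x3_hat = numpy.cross(l31, l32)
print(x3_hat / x3_hat[2])  # matches x3 / x3[2]
```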

Exercise 16.10

(i)

Define the \(\{ \mathbf{P}, \mathbf{W}, \mathbf{T} \}\) as follows:

\[\begin{split}\mathbf{P} &= \begin{bmatrix} \boldsymbol{\pi}_1\\ \boldsymbol{\pi}_2\\ \vdots\\ \boldsymbol{\pi}_I\\ \end{bmatrix}\\\\ \mathbf{W} &= \begin{bmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \cdots & \mathbf{w}_J \end{bmatrix}\\ \mathbf{T} &= \begin{bmatrix} \boldsymbol{\tau}_1 & \boldsymbol{\tau}_1 & \cdots & \boldsymbol{\tau}_1\\ \boldsymbol{\tau}_2 & \boldsymbol{\tau}_2 & \cdots & \boldsymbol{\tau}_2\\ \vdots & \vdots & \ddots & \vdots\\ \boldsymbol{\tau}_I & \boldsymbol{\tau}_I & \cdots & \boldsymbol{\tau}_I \end{bmatrix}\end{split}\]

(ii)

[TK92] is the origin of this technique.

The translation along the Z-axis is lost due to the orthographic camera model. In order to solve for the rotation using only \(\mathbf{X}\), the measurements need to be centered, which eliminates the remaining translation components.

The factorized solution is not unique because

\[\mathbf{X} - \mathbf{T} = \mathbf{P} \mathbf{W} = \mathbf{P} \mathbf{Q} \mathbf{Q}^{-1}\mathbf{W}\]

where \(\mathbf{Q} \in \mathcal{GL}(3)\). If \(\mathbf{P}\) is restricted to be an orthonormal matrix, then the solution is unique up to an unknown initial orientation of the world reference frame.
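A noise-free sketch of the factorization on synthetic orthographic measurements (the scene and view counts are arbitrary): centering each row removes the translations, and a rank-3 SVD then factorizes the measurement matrix exactly, with \(\mathbf{P}\) recovered only up to the \(\mathbf{Q}\) ambiguity noted above.

```python
import numpy

rng = numpy.random.default_rng(0)

# synthetic orthographic measurements: I views of J world points
I, J = 5, 20
W_pts = rng.normal(size=(3, J))
X = numpy.zeros((2 * I, J))
for i in range(I):
    Q, _ = numpy.linalg.qr(rng.normal(size=(3, 3)))  # random orthonormal basis
    tau = rng.normal(size=2)
    X[2 * i:2 * i + 2] = Q[:2] @ W_pts + tau[:, None]

# centering each row removes the translations; the result has rank three
Xc = X - X.mean(axis=1, keepdims=True)
U, d, Vt = numpy.linalg.svd(Xc, full_matrices=False)
P = U[:, :3] * d[:3]
Wh = Vt[:3]
print(numpy.allclose(P @ Wh, Xc))  # True: an exact rank-3 factorization
```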

Exercise 16.11

Define

\[\begin{split}\mathbf{J} = \begin{bmatrix} \mathbf{D}_1 & \mathbf{V}_1\\ \mathbf{D}_2 & \mathbf{V}_2\\ \mathbf{D}_3 & \mathbf{V}_3 \end{bmatrix}\end{split}\]

where \(\mathbf{D}_\cdot, \mathbf{V}_\cdot\) take on the structure in Figure 16.17b.

\[\begin{split}\mathbf{J}^\top \mathbf{J} = \begin{bmatrix} \mathbf{D}_1^\top & \mathbf{D}_2^\top & \mathbf{D}_3^\top\\ \mathbf{V}_1^\top & \mathbf{V}_2^\top & \mathbf{V}_3^\top \end{bmatrix} \begin{bmatrix} \mathbf{D}_1 & \mathbf{V}_1\\ \mathbf{D}_2 & \mathbf{V}_2\\ \mathbf{D}_3 & \mathbf{V}_3 \end{bmatrix} = \left[ \begin{array}{c|c} \mathbf{D}_1^\top \mathbf{D}_1 + \mathbf{D}_2^\top \mathbf{D}_2 + \mathbf{D}_3^\top \mathbf{D}_3 & \mathbf{D}_1^\top \mathbf{V}_1 + \mathbf{D}_2^\top \mathbf{V}_2 + \mathbf{D}_3^\top \mathbf{V}_3\\ \hline \mathbf{V}_1^\top \mathbf{D}_1 + \mathbf{V}_2^\top \mathbf{D}_2 + \mathbf{V}_3^\top \mathbf{D}_3 & \mathbf{V}_1^\top \mathbf{V}_1 + \mathbf{V}_2^\top \mathbf{V}_2 + \mathbf{V}_3^\top \mathbf{V}_3 \end{array} \right]\end{split}\]

Appendix C.8.2 shows that the above matrix sub-blocks can be inverted via the Schur complement identity.
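A numeric sketch of that identity on a small symmetric positive-definite matrix with the same 2x2 block structure (the block sizes are arbitrary):

```python
import numpy

rng = numpy.random.default_rng(1)

# a symmetric positive-definite matrix partitioned into 2x2 blocks
n, m = 4, 3
G = rng.normal(size=(10, n + m))
A = G.T @ G + (n + m) * numpy.eye(n + m)
B, C, D = A[:n, :n], A[:n, n:], A[n:, n:]

# invert via the Schur complement S = B - C D^{-1} C^T
Di = numpy.linalg.inv(D)
Si = numpy.linalg.inv(B - C @ Di @ C.T)
Ainv = numpy.block([[Si, -Si @ C @ Di],
                    [-Di @ C.T @ Si, Di + Di @ C.T @ Si @ C @ Di]])
print(numpy.allclose(Ainv, numpy.linalg.inv(A)))  # True
```

Only the inverses of \(S\) (the size of one block) and \(D\) are needed, which is what makes this attractive for the sparse bundle-adjustment structure in Figure 16.17b.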

References

Fis

Bob Fisher. Fundamental matrix. http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/EPSRC_SSAZ/node22.html. Accessed on 2017-07-22.

Jep

Allan Jepson. Epipolar geometry. http://www.cs.toronto.edu/~jepson/csc420/notes/epiPolarGeom.pdf. Accessed on 2017-07-22.

Lana

Michael Langer. Camera models. http://www.cim.mcgill.ca/~langer/558/4-cameramodel.pdf. Accessed on 2017-07-22.

Lanb

Michael Langer. Projective geometry of image formation. http://www.cim.mcgill.ca/~langer/558/1-imageprojection.pdf. Accessed on 2017-07-22.

TK92

Carlo Tomasi and Takeo Kanade. Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision, 9(2):137–154, 1992.