The Pinhole Camera

Homogeneous Coordinates

Homogeneous coordinates linearize the projection equations into a matrix-vector product. The original mapping from 3D Cartesian to 2D Cartesian coordinates is nonlinear due to the division by \(w\). The mapping from 4D homogeneous to 3D homogeneous coordinates is linear because the division by \(w\) has been side-stepped.
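The linear mapping can be sketched as follows, with an assumed focal length and point purely for illustration. The projection is one matrix product, and the only nonlinearity is the final division by the last homogeneous component.

```python
import numpy

# Hypothetical intrinsic matrix with focal length f (illustrative values).
f = 2.0
Lambda = numpy.asarray([[f, 0, 0, 0],
                        [0, f, 0, 0],
                        [0, 0, 1, 0]], dtype=float)

# A 3D point in homogeneous coordinates (appended w = 1).
X = numpy.asarray([1.0, 2.0, 4.0, 1.0])

# The 4D homogeneous to 3D homogeneous mapping is a single linear product.
x_h = Lambda @ X

# The nonlinearity is deferred to one final division by the last component.
x = x_h[:2] / x_h[2]
print(x)  # [0.5 1. ]
```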

Solutions in this representation are not guaranteed to be the same as those for the original problem. These solutions minimize more abstract objective functions based on algebraic error. They are generally close enough to provide a good starting point for nonlinear optimization of the true cost function.

Learning Extrinsic Parameters

The idiosyncrasies of the camera intrinsics can be discarded by pre-multiplying with \(\Lambda^{-1}\). The result can be interpreted as a normalized camera, i.e. one whose \(\Lambda\) is the identity. The transformed coordinates are known as normalized image coordinates.
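As a small sketch with a hypothetical intrinsic matrix, pre-multiplying a pixel position by \(\Lambda^{-1}\) removes the focal-length scaling and principal-point offset:

```python
import numpy

# Hypothetical intrinsics: focal lengths and principal point in pixels.
Lambda = numpy.asarray([[100.0, 0.0, 50.0],
                        [0.0, 100.0, 50.0],
                        [0.0, 0.0, 1.0]])

# A pixel position in homogeneous coordinates.
x_pix = numpy.asarray([75.0, 25.0, 1.0])

# Pre-multiplying by the inverse intrinsics yields normalized image
# coordinates, as if the camera had identity intrinsics.
x_norm = numpy.linalg.inv(Lambda) @ x_pix
print(x_norm)  # [ 0.25 -0.25  1.  ]
```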

Reparameterization (section B.4) is one trick to convert constrained optimization problems into unconstrained ones.
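One common instance of this trick (an illustrative sketch, not the book's specific construction) is to rewrite a parameter that must stay positive, such as a focal length, as the exponential of an unconstrained variable:

```python
import numpy

# Reparameterize a positivity-constrained parameter phi as phi = exp(psi),
# where psi ranges over all reals; any generic unconstrained optimizer can
# then search over psi directly.
def phi(psi):
    return numpy.exp(psi)

# Every real psi maps to a valid (positive) phi.
print(phi(-3.0) > 0, phi(0.0) > 0, phi(3.0) > 0)  # True True True
```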

Exercise 14.1

[1]:
import numpy

# Sensor dimensions in cm.
sensor_w = 1
sensor_h = 1

# Horizontal field of view.
fov_h = numpy.deg2rad(60)

# Half the sensor width subtends half the field of view.
focal_length = (sensor_w / 2) / numpy.tan(fov_h / 2)
print('{:.4f} cm'.format(focal_length))

# Pixel dimensions for a 100 x 200 pixel image.
pixel_w = sensor_w / 100
pixel_h = sensor_h / 200

# Focal length expressed in pixel units along each axis.
fl_x = focal_length / pixel_w
fl_y = focal_length / pixel_h
print('{:.4f} px'.format(fl_x))
print('{:.4f} px'.format(fl_y))
0.8660 cm
86.6025 px
173.2051 px

Exercise 14.2

The dolly zoom adjusts the distance of the camera to a foreground object in the scene while simultaneously controlling the camera’s field of view (a function of the focal length). This keeps the foreground object a constant size in the image throughout the entire capture sequence.

In the illustrations, the green points \((X, Y, Z)\) on the constant plane are projected onto the image plane \((x, y)\) at the same location throughout the dolly zoom. Since the image plane is a square, the focal length is the same for both dimensions.

The initial configuration gives the relation

\[\frac{\phi}{Z - w} = \frac{x}{X} = \frac{y}{Y}\]

where \(w = 0\) and \(\phi = 1\). When the camera moves forward to \(w' = 100\), the focal length changes to

\[\begin{split}\frac{\phi'}{Z - w'} &= \frac{x}{X} = \frac{y}{Y} = \frac{\phi}{Z - w}\\ \phi' &= \frac{\phi}{Z - w} (Z - w').\end{split}\]
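Plugging in the stated values \(w = 0\), \(\phi = 1\), and \(w' = 100\), with an assumed object depth of \(Z = 200\) for illustration, the updated focal length is:

```python
# Illustrative dolly zoom: object plane at Z = 200 (assumed), camera starting
# at w = 0 with focal length phi = 1, then moving forward to w' = 100.
Z, w, phi = 200.0, 0.0, 1.0
w_prime = 100.0

# phi' = phi / (Z - w) * (Z - w') keeps the projected size constant.
phi_prime = phi / (Z - w) * (Z - w_prime)
print(phi_prime)  # 0.5
```

Halving the distance to the object here requires halving the focal length, i.e. widening the field of view.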

Exercise 14.3

\[\begin{split}\Lambda_\text{ortho} &= \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}\\\\ \Lambda_\text{perspective} &= \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 / f & 1 \end{bmatrix}\\\\ \Lambda_\text{weak perspective} &= \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & z_r / f \end{bmatrix}\end{split}\]
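The three matrices above can be applied to the same homogeneous point to compare the models; the focal length, reference depth, and point below are assumed values for illustration only.

```python
import numpy

f, z_r = 2.0, 4.0  # assumed focal length and reference depth

cameras = {
    'orthographic': numpy.asarray([[1., 0., 0., 0.],
                                   [0., 1., 0., 0.],
                                   [0., 0., 0., 1.]]),
    'perspective': numpy.asarray([[1., 0., 0., 0.],
                                  [0., 1., 0., 0.],
                                  [0., 0., 1. / f, 1.]]),
    'weak perspective': numpy.asarray([[1., 0., 0., 0.],
                                       [0., 1., 0., 0.],
                                       [0., 0., 0., z_r / f]]),
}

# A homogeneous 3D point at the reference depth z_r.
X = numpy.asarray([1.0, 2.0, z_r, 1.0])

# Project with each model and divide out the last homogeneous component.
projected = {name: (P @ X)[:2] / (P @ X)[2] for name, P in cameras.items()}
for name, x in projected.items():
    print(name, x)
```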

Exercise 14.4

The intersection of the homogeneous lines \(\mathbf{l}_1\) and \(\mathbf{l}_2\) is the point \(\tilde{\mathbf{x}} = \mathbf{l}_1 \times \mathbf{l}_2\): the cross product is orthogonal to both lines, so \(\mathbf{l}_1^\top \tilde{\mathbf{x}} = \mathbf{l}_2^\top \tilde{\mathbf{x}} = 0\) and the point lies on both.

Notice that all parallel homogeneous lines will converge at infinity where the last component of the homogeneous representation is zero.

[2]:
import numpy

_ = '{} = {} x {}'
# Pairs of homogeneous lines; the second pair is parallel.
pair_lines = [(numpy.asarray([3, 1, 1]), numpy.asarray([-1, 0, 1])),
              (numpy.asarray([1, 0, 1]), numpy.asarray([3, 0, 1]))]
for (a, b) in pair_lines:
    # The cross product gives the homogeneous intersection point.
    print(_.format(numpy.cross(a, b), a, b))
[ 1 -4  1] = [3 1 1] x [-1  0  1]
[0 2 0] = [1 0 1] x [3 0 1]

Exercise 14.5

Since a homogeneous point \(\tilde{\mathbf{x}}\) lies on a homogeneous line \(\mathbf{l}\) when \(\mathbf{l}^\top \tilde{\mathbf{x}} = 0\), the line that joins the homogeneous points \(\tilde{\mathbf{x}}_1\) and \(\tilde{\mathbf{x}}_2\) is \(\mathbf{l} = \tilde{\mathbf{x}}_1 \times \tilde{\mathbf{x}}_2\).

[3]:
import numpy

_ = '{} = {} x {}'
# Pairs of homogeneous points to join.
pair_points = [(numpy.asarray([2, 2, 1]), numpy.asarray([-2, -2, 1]))]
for (a, b) in pair_points:
    # The cross product gives the homogeneous line through both points.
    print(_.format(numpy.cross(a, b), a, b))
[ 4 -4  0] = [2 2 1] x [-2 -2  1]

Exercise 14.6

\[\begin{split}\begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} a & b & c\\ b & d & e\\ c & e & f \end{bmatrix} \begin{bmatrix} x\\ y\\ 1 \end{bmatrix} = ax^2 + 2bxy + 2cx + dy^2 + 2ey + f = 0\end{split}\]

Notice that we can multiply the solution by any real value (e.g. \(\frac{1}{f}\)) and still satisfy the condition. The scale ambiguity enables us to find a unique solution with a minimum of five points. Stacking each point’s equation forms \(\mathbf{A} \mathbf{w} = \mathbf{0}\); the right null space of \(\mathbf{A}\) can be found using the SVD (see 14.30).
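This procedure can be sketched with five points sampled from a known conic (the unit circle \(x^2 + y^2 = 1\), chosen for illustration); the smallest right singular vector of the stacked matrix recovers the conic parameters up to scale.

```python
import numpy

# Five points on the unit circle x^2 + y^2 = 1 (an illustrative conic).
pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
       (numpy.sqrt(0.5), numpy.sqrt(0.5))]

# Each point contributes one row [x^2, 2xy, 2x, y^2, 2y, 1] of A.
A = numpy.asarray([[x * x, 2 * x * y, 2 * x, y * y, 2 * y, 1.0]
                   for x, y in pts])

# The parameter vector w = (a, b, c, d, e, f) spans the right null space
# of A, i.e. the right singular vector with the smallest singular value.
_, _, Vt = numpy.linalg.svd(A)
w = Vt[-1]
w = w / w[0]  # remove the scale ambiguity
print(numpy.round(w, 6))
```

The recovered vector is proportional to \((1, 0, 0, 1, 0, -1)\), matching \(x^2 + y^2 - 1 = 0\).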

Exercise 14.7

One idea is to reduce the problem to a regular camera calibration by computing a local homography for each corner and converting image coordinates to projector coordinates.
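Once such a local homography is available, the conversion itself is a single homogeneous product. The matrix and corner below are hypothetical values for illustration.

```python
import numpy

# Hypothetical local homography, assumed already estimated from decoded
# camera-projector correspondences around one checkerboard corner.
H = numpy.asarray([[1.1, 0.0, 5.0],
                   [0.0, 0.9, -3.0],
                   [0.0, 0.0, 1.0]])

# A detected corner in homogeneous image coordinates (illustrative).
corner_img = numpy.asarray([10.0, 20.0, 1.0])

# Map into projector coordinates and divide out the homogeneous scale.
corner_proj_h = H @ corner_img
corner_proj = corner_proj_h[:2] / corner_proj_h[2]
print(corner_proj)  # [16. 15.]
```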

Exercise 14.8

The minimum number of binary striped light patterns needed to estimate the camera-projector correspondences for a projector image of size \(H \times W\) is \(\lceil \log_2 H \rceil\) horizontal patterns and \(\lceil \log_2 W \rceil\) vertical patterns.
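A minimal sketch for the vertical patterns, assuming a projector width of \(W = 8\): each pattern encodes one bit of the column index, and a camera pixel that observes the resulting bit sequence decodes its projector column directly.

```python
import numpy

W = 8  # assumed projector width; ceil(log2(8)) = 3 vertical patterns suffice

n = int(numpy.ceil(numpy.log2(W)))
cols = numpy.arange(W)

# Pattern k is the k-th bit of each column index: n binary stripe images.
patterns = [(cols >> k) & 1 for k in range(n)]

# A pixel observing bits (b_0, ..., b_{n-1}) decodes its projector column
# as sum(b_k * 2^k), recovering the camera-projector correspondence.
decoded = sum(p << k for k, p in enumerate(patterns))
print(numpy.array_equal(decoded, cols))  # True
```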

Exercise 14.9

The book’s author blended the pixel contributions from different cameras using the depth information as the weight. Another approach could be to estimate a set of 3D points around the region of interest and ray cast against that point set.

Exercise 14.10

Since the positions of the object and the point light source are known, simply cast a ray from any point on the plane toward the light; the point is in shadow if the ray intersects the object.
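Equivalently, each object point can be projected from the light onto the plane. A minimal sketch, assuming an illustrative light position, object point, and a ground plane at \(z = 0\):

```python
import numpy

# Illustrative setup: point light at L, an object point P, and a ground
# plane z = 0; the shadow of P is where the ray from L through P hits the plane.
L = numpy.asarray([0.0, 0.0, 10.0])
P = numpy.asarray([1.0, 1.0, 5.0])

# Parameterize the ray as L + t (P - L) and solve for the z = 0 crossing.
t = L[2] / (L[2] - P[2])
shadow = L + t * (P - L)
print(shadow)  # [2. 2. 0.]
```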