High-Accuracy Stereo Depth Maps Using Structured Light

Motivation(s)

The recent resurgence of interest in stereo correspondence has been spurred by breakthroughs in matching strategies and optimization algorithms, but progress is hampered by a lack of complex ground-truth data sets. Previous work has relied on hand-labeled or trivial synthetic content.

Proposed Solution(s)

The authors propose structured light patterns that uniquely label each pixel; these labels are then used to establish inter-image correspondence. The technique requires a pair of cameras and one or more light projectors. See the end of section 2 for the processing pipeline.

Evaluation(s)

Several high-quality depth maps were generated using a single camera, a linear stage to translate the camera, and a single video projector. One advantage of this approach is that working with rectified view disparities obviates the need to calibrate the video projectors. One disadvantage is that the evaluation does not include scenes with interreflections.

Future Direction(s)

  • Would a camera simulation with synthetic scenery be an accurate measure of this technique and its improved variations?

  • How much would the accuracy improve if the view disparity estimates were done via probabilistic programming?

Question(s)

  • Would a better projector simplify gray-code decoding?

Analysis

Structured light gray code patterns can vastly improve the precision of depth maps generated by stereo cameras.

The view disparity estimates, and how the different disparity estimates were combined, seem somewhat ad hoc. It is interesting that gray codes yield better results because the proposed technique can avoid estimating the albedo, whereas sine waves are susceptible to scene interreflections and to nonlinearities of the camera and projector.

Notes

  • Structured Light

    • Gray Codes

      • Since adjacent code words differ in only one bit, gray codes are well suited to binary position encoding: an error at a pattern boundary shifts the decoded position by at most one.

      • Requires \(\lceil \log_2 n \rceil\) patterns to distinguish \(n\) locations.

      • Decoding

        • Need code pattern and its inverse.

          • Fogging inside the projector adds a low-frequency average of intensities to the projected pattern, so a fixed intensity threshold is unreliable.

          • At each pixel, the bit is set according to whichever of the pattern or its inverse is brighter.

        • Need different exposures to distinguish light patterns.

          • Shadowed areas, low-albedo surfaces, highly reflective surfaces, and oblique viewing angles typically result in unknown labels.

        • Since each illumination pixel spans 2–4 camera pixels, gray-code labels are interpolated along prominent directions using a sliding 1D window (see the sketch below).
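
The following is a minimal Python/NumPy sketch of the gray-code bookkeeping described above: the binary-reflected code, the \(\lceil \log_2 n \rceil\) pattern count, and the pattern-versus-inverse bit decision. The array shapes, the `min_contrast` threshold, and the helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gray_encode(i: int) -> int:
    """Binary-reflected gray code: adjacent integers differ in exactly one bit."""
    return i ^ (i >> 1)

def gray_decode(g: int) -> int:
    """Recover the integer position from its gray code."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i

# ceil(log2 n) patterns are enough to distinguish n projector columns.
n_columns = 1024
num_patterns = int(np.ceil(np.log2(n_columns)))   # 10 patterns for 1024 columns

def decode_bit(pattern: np.ndarray, inverse: np.ndarray, min_contrast: int = 8) -> np.ndarray:
    """Per-pixel bit decision: 1 where the pattern image is brighter than its inverse,
    0 where the inverse is brighter, and -1 (unknown) where the contrast is too low
    (shadow, low albedo, oblique angle)."""
    p = pattern.astype(np.int32)
    q = inverse.astype(np.int32)
    bit = (p > q).astype(np.int8)
    bit[np.abs(p - q) < min_contrast] = -1
    return bit

def bits_to_labels(bit_planes: np.ndarray) -> np.ndarray:
    """bit_planes: (num_patterns, H, W) array of {1, 0, -1}; returns decoded column
    labels, with -1 wherever any constituent bit was unknown."""
    unknown = (bit_planes < 0).any(axis=0)
    code = np.zeros(bit_planes.shape[1:], dtype=np.int64)
    for plane in bit_planes:                       # most significant pattern first
        code = (code << 1) | np.clip(plane, 0, 1).astype(np.int64)
    labels = np.vectorize(gray_decode)(code)
    labels[unknown] = -1
    return labels
```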

    • Sine Waves

      • Takes advantage of gray-level resolution to reduce the number of required images or improve the precision for the same number of images.

      • Projects patterns at two frequencies and 12 phases.

      • Decoding

        • Assuming a linear image formation process, reformulate the problem so that it is linear in the unknowns.

        • Use linear least squares to solve for the unknown coefficients (see the sketch below).
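
Below is a minimal sketch of the per-pixel least-squares step for the sine-wave patterns, assuming the common phase-shift model \(I_k = c_0 + c_1 \cos\delta_k + c_2 \sin\delta_k\) with known shifts \(\delta_k\); the paper's exact parameterization may differ, so the recovered phase and amplitude here are illustrative.

```python
import numpy as np

def fit_phase(intensities: np.ndarray, shifts: np.ndarray):
    """Per-pixel phase from K phase-shifted sine-wave images.

    intensities: (K, H, W) observed images; shifts: (K,) known phase shifts delta_k.
    Under a linear image formation model, I_k = c0 + c1*cos(delta_k) + c2*sin(delta_k),
    which is linear in the unknowns (c0, c1, c2) and can be solved with least squares.
    """
    K, H, W = intensities.shape
    A = np.stack([np.ones(K), np.cos(shifts), np.sin(shifts)], axis=1)   # (K, 3) design matrix
    B = intensities.reshape(K, H * W).astype(float)                      # one column per pixel
    coeffs, *_ = np.linalg.lstsq(A, B, rcond=None)                       # coeffs: (3, H*W)
    c0, c1, c2 = coeffs
    phase = np.arctan2(-c2, c1).reshape(H, W)       # unknown projector phase per pixel
    amplitude = np.hypot(c1, c2).reshape(H, W)      # low amplitude marks unreliable pixels
    return phase, amplitude
```

With the 12 phases noted above, `shifts = 2 * np.pi * np.arange(12) / 12` would be a natural choice of \(\delta_k\).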

  • Disparity Computation

    • View disparities

      • Independently compute left-to-right (LR) and right-to-left (RL) disparities and cross-check them for consistency (see the sketch after this list).

    • Illumination Disparities

      • Pixels that have high residual errors are treated as outliers.

      • The projective depth and pixel code values can be used to solve for the illumination source’s projection matrix via least squares in homogeneous coordinates.
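
As a closing illustration of the view-disparity cross-check, here is a minimal sketch assuming rectified images in which a left pixel \((x, y)\) matches \((x - d_L(x, y), y)\) in the right image; the sign convention, the one-pixel tolerance, and the use of NaN for rejected pixels are assumptions rather than details from the paper.

```python
import numpy as np

def cross_check(d_left: np.ndarray, d_right: np.ndarray, tol: float = 1.0) -> np.ndarray:
    """Keep only disparities that survive the LR/RL consistency test.

    d_left and d_right are (H, W) disparity maps computed independently from each view.
    A left pixel (x, y) with disparity d should land on a right pixel whose own
    disparity maps it back to (x, y); larger disagreements are rejected (NaN).
    """
    H, W = d_left.shape
    ys, xs = np.mgrid[0:H, 0:W]
    x_right = np.round(xs - d_left).astype(int)        # matching column in the right image
    in_bounds = (x_right >= 0) & (x_right < W)
    d_back = np.full((H, W), np.inf)
    d_back[in_bounds] = d_right[ys[in_bounds], x_right[in_bounds]]
    consistent = in_bounds & (np.abs(d_left - d_back) <= tol)
    return np.where(consistent, d_left, np.nan)
```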

References

SS03

Daniel Scharstein and Richard Szeliski. High-accuracy stereo depth maps using structured light. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), volume 1. IEEE, 2003.