Multi-view stereo reconstructs dense 3D models from images captured at multiple viewpoints and has been applied in computer graphics, robotics, computer-aided design, and human-computer interaction. Because it relies solely on finding feature correspondences under epipolar constraints, it fundamentally struggles with featureless objects. In their research, Nvidia proposes polarimetric multi-view stereo, which combines per-pixel photometric information from polarization with epipolar constraints from multiple views for 3D reconstruction. Polarization reveals surface normal information, which helps propagate depth into featureless regions, something conventional multi-view stereo cannot do well. Nvidia also proposes a polarization imaging model that handles real-world objects with mixed polarization, which allows polarimetric multi-view stereo to be applied outdoors under uncontrolled illumination. The researchers prove that there are exactly two types of ambiguities in estimating surface azimuth angles from polarization, which they resolve through graph optimization and iso-depth contour tracing. Doing so significantly improves the initial depth map estimates, which are then fused together into a complete 3D reconstruction. The results of the study showed high-quality 3D reconstruction and better performance than conventional multi-view stereo methods, especially on featureless objects (such as ceramic tiles), office rooms with white walls, and highly reflective cars outdoors.

Polarization images provide information about surface normal vector fields for a wide range of materials (specular, diffuse, glass, metal, etc.). Polarimetric multi-view stereo is a completely passive approach, so it can be applied to a wide variety of objects outdoors under uncontrolled illumination. Using a polarization camera such as the FD-1665P, the scene can be completely characterized in a single temporally-synchronized shot, which avoids motion blur and allows imaging of dynamic scenes. All prior work assumes either purely diffuse or purely specular polarized reflection, which has made this technology impractical for many real-world objects that exhibit mixed polarized reflection.
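As a linear polarizer rotates in front of the camera, the intensity at each pixel varies sinusoidally with twice the polarizer angle, and the phase of that sinusoid is the phase angle that constrains the surface normal's azimuth. Below is a minimal sketch in Python with NumPy, assuming the standard transmitted-radiance model I(θ) = a + b·cos(2θ) + c·sin(2θ); the function name and the synthetic data are illustrative, not from the paper.

```python
import numpy as np

def fit_polarization(angles, intensities):
    """Fit I(theta) = a + b*cos(2*theta) + c*sin(2*theta) by linear least squares.

    Returns the phase angle (defined modulo pi) and the degree of polarization.
    """
    A = np.stack([np.ones_like(angles),
                  np.cos(2 * angles),
                  np.sin(2 * angles)], axis=1)
    a, b, c = np.linalg.lstsq(A, intensities, rcond=None)[0]
    phase = 0.5 * np.arctan2(c, b) % np.pi   # phase angle of the sinusoid
    dop = np.hypot(b, c) / a                 # degree of polarization
    return phase, dop

# Synthetic check: seven polarizer angles spaced 30 degrees apart,
# mirroring the capture setup described later, with a true phase of 40 degrees.
angles = np.deg2rad(np.arange(0, 181, 30))
true_phase = np.deg2rad(40.0)
intensities = 1.0 + 0.3 * np.cos(2 * (angles - true_phase))
phase, dop = fit_polarization(angles, intensities)
```

Because the model is linear in (a, b, c), a handful of polarizer rotations is enough to recover the phase angle per pixel in closed form.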

Nvidia's research proves that polarized images determine the azimuth angles of surface normals up to two types of ambiguities: the π-ambiguity and the π/2-ambiguity. The goal was to resolve these ambiguities in azimuth angle estimation and then use the azimuth angles to propagate depth from sparse points with sufficient features into featureless regions for dense 3D reconstruction. They resolve the π/2-ambiguity with graph optimization and bypass the π-ambiguity with iso-depth contour tracing. This significantly improves the initial depth maps estimated by a classical multi-view stereo approach. Nvidia's approach is completely passive and works under uncontrolled illumination outdoors, rather than requiring active illumination, diffuse lighting, or distant lighting. This makes their method applicable to a wide variety of objects with mixed polarized diffuse and specular reflections, instead of being limited to diffuse-only or specular-only reflection.
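The two ambiguities can be made concrete by enumerating the azimuth hypotheses consistent with one measured phase angle. The phase angle is only defined modulo π (the π-ambiguity), and mixed diffuse/specular reflection may shift it by π/2 (the π/2-ambiguity), yielding four candidates in [0, 2π). This small Python sketch is an illustration of that counting argument, not the paper's solver.

```python
import numpy as np

def azimuth_candidates(phase):
    """All azimuth hypotheses consistent with a measured phase angle.

    k = 0, 2 cover the pi-ambiguity; k = 1, 3 add the pi/2 shift that
    arises when the reflection may be specular- rather than diffuse-dominant.
    """
    return sorted((phase + k * np.pi / 2) % (2 * np.pi) for k in range(4))

# A measured phase of 40 degrees leaves four possible azimuths.
cands = azimuth_candidates(np.deg2rad(40.0))
```

Graph optimization picks between the diffuse and specular hypotheses (the π/2 choice), while iso-depth contour tracing sidesteps the remaining π choice entirely, since a contour traced in either direction stays at the same depth.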

Flowchart of the proposed polarimetric multi-view stereo algorithm

Shown above is the proposed polarimetric multi-view stereo algorithm. The input consists of polarized images captured at multiple viewpoints, either with polarization cameras or with a linear polarizer rotated to multiple angles. Classical multi-view stereo methods first recover the camera poses and an initial 3D shape for well-textured regions. A phase angle map is then computed for each view from the corresponding polarized images; resolving the ambiguities in these maps yields azimuth angles, which are used to recover depth in featureless regions. Finally, the depth maps from multiple views are fused together to recover the complete 3D shape.
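The depth-propagation step can be illustrated with a toy sketch: starting from a pixel whose depth is known from the textured-region reconstruction, walk perpendicular to the estimated azimuth angle, the direction along which depth stays locally constant, and copy the seed depth to each pixel visited. The Python below is a simplified illustration under assumed conventions (azimuth measured in the image plane, unit pixel steps, nearest-pixel sampling), not the paper's actual contour-tracing procedure.

```python
import numpy as np

def trace_iso_depth(azimuth, seed, depth, steps=50, step_len=1.0):
    """Propagate a known depth along an iso-depth contour.

    azimuth : 2D array of per-pixel azimuth angles (radians)
    seed    : (row, col) pixel with known depth
    Returns a dict mapping visited pixels to the propagated depth.
    """
    h, w = azimuth.shape
    y, x = float(seed[0]), float(seed[1])
    out = {}
    for _ in range(steps):
        iy, ix = int(round(y)), int(round(x))
        if not (0 <= iy < h and 0 <= ix < w):
            break
        out[(iy, ix)] = depth
        d = azimuth[iy, ix] + np.pi / 2   # iso-depth direction: normal to azimuth
        y += step_len * np.sin(d)
        x += step_len * np.cos(d)
    return out

# Toy map: constant azimuth 0 everywhere, so the contour runs straight down a column.
trace = trace_iso_depth(np.zeros((10, 10)), seed=(0, 5), depth=2.0)
```

Because the walk follows directions rather than feature matches, it carries depth straight across textureless pixels where correspondence-based stereo has nothing to match.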

In their study, Nvidia captured five scenes under both natural indoor and outdoor illumination (vase, tile, balloon, corner, car). All images were captured using a Canon EOS 7D camera with a 50 mm lens. A Hoya linear polarizer was mounted in front of the camera lens. For each view, seven images were captured with the polarizer angles spaced 30° apart. Exemplar images and the camera poses recovered from VisualSFM are shown in the leftmost column (shown below).

    Top to bottom rows: Vase, Tile, Balloon, Corner, Car

In the figure shown above, Nvidia compares against two multi-view stereo methods, MVE and Gipuma, by showing the results after depth fusion for all methods. For example, on the car scene, MVE produced a reasonable reconstruction but with many outliers, while Gipuma could only reconstruct a skeleton of the car. Nvidia's polarimetric multi-view stereo method achieved the most complete and accurate 3D reconstruction of the car, thanks to the phase angles estimated from polarization and the depth propagation.

In conclusion, Nvidia successfully presented polarimetric multi-view stereo, a novel, completely passive approach for dense 3D reconstruction. Polarimetric multi-view stereo shows its strength especially on featureless regions and non-Lambertian surfaces, where it propagates depth estimated in well-textured regions into featureless regions, guided by the azimuth angles estimated from polarized images. Results demonstrated high-quality 3D reconstruction and better performance than standard multi-view stereo methods.

To learn more about polarimetric multi-view stereo, click here.