Nvidia Research: Polarimetric Multi-View Stereo

Multi-view stereo imagery allows dense 3D models to be reconstructed from multiple sensors and has been applied in computer graphics, robotics, computer-aided design, and human-computer interaction. Because it relies solely on finding feature correspondences under epipolar constraints, it is fundamentally limited when dealing with featureless objects. In their research, Nvidia proposes polarimetric multi-view stereo imaging, which combines per-pixel photometric information from polarization with epipolar constraints from multiple views for 3D reconstruction. Polarization reveals surface normal information, which helps propagate depth into featureless regions, something conventional multi-view stereo cannot do as well. Nvidia also proposes a polarization imaging model that can handle real-world objects with mixed polarization, which makes polarimetric multi-view stereo applicable outdoors under uncontrolled illumination. The researchers prove that there are exactly two types of ambiguities in estimating surface azimuth angles from polarization, which they resolve with graph optimization and iso-depth contour tracing. This significantly improves the initial depth map estimates, which are later fused together for complete 3D reconstruction. The study demonstrated high-quality 3D reconstruction and overall better performance than conventional multi-view stereo methods, especially on featureless objects (such as ceramic tiles), office rooms with white walls, and highly reflective cars outdoors.

Polarization images provide information about surface normal vector fields for a wide range of materials (specular, diffuse, glass, metal, etc.). Polarimetric multi-view stereo is a completely passive approach and can be applied to a wide variety of objects outdoors under uncontrolled illumination. Using a polarization camera such as the FD-1665P, the scene can be completely characterized in a single temporally-synchronized shot, which avoids motion blur and allows imaging of dynamic scenes. All prior work assumes either purely diffuse or purely specular polarization reflection, which has made the technology impractical for many real-world objects with mixed polarization reflection.

Nvidia's research proves that polarized images can determine the azimuth angles of surface normals up to two types of ambiguities: the π-ambiguity and the π/2-ambiguity. The goal was to resolve both ambiguities in azimuth angle estimation simultaneously, and to use the azimuth angles to propagate depth from sparse points with sufficient features into featureless regions for dense 3D reconstruction. They resolve the π/2-ambiguity with graph optimization and bypass the π-ambiguity with iso-depth contour tracing, which significantly improves the initial depth maps estimated by a classical multi-view stereo approach. Nvidia's approach is completely passive and works under uncontrolled illumination outdoors, rather than requiring active illumination, diffuse lighting, or distant lighting. This makes the method applicable to a wide variety of objects with mixed polarized diffuse and specular reflections, instead of being limited to diffuse-only or specular-only reflection.
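The two ambiguities can be made concrete with a small sketch. The phase angle recovered from polarization fixes the surface azimuth only up to a 180° flip (the π-ambiguity), and depending on whether the reflection is polarized-diffuse or polarized-specular, the true azimuth is further offset by 90° (the π/2-ambiguity). A minimal illustration of the resulting candidate set (the function name and interface are my own, not from the paper):

```python
import numpy as np

def azimuth_candidates(phase_angle):
    """Candidate surface azimuth angles (radians, in [0, 2*pi)) implied by a
    measured polarization phase angle.

    pi-ambiguity: the phase angle fixes the azimuth only up to a 180-degree
    flip. pi/2-ambiguity: depending on whether the reflection is polarized
    diffuse or polarized specular, the azimuth is further offset by 90
    degrees. Together these yield four candidates spaced 90 degrees apart.
    """
    phi = phase_angle % (2.0 * np.pi)
    return sorted((phi + k * np.pi / 2.0) % (2.0 * np.pi) for k in range(4))
```

Graph optimization decides between the two 90°-separated interpretations, while tracing iso-depth contours sidesteps the remaining 180° flip, since a contour is the same whichever direction the azimuth points along it.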

Flowchart of proposed polarimetric multi-view algorithm

Shown above is the proposed polarimetric multi-view stereo algorithm. The input consists of polarized images captured at multiple viewpoints, either with polarization cameras or with a linear polarizer rotated to multiple angles. Classical multi-view stereo methods are used first to recover the camera positions and an initial 3D shape for well-textured regions. The phase angle map for each view is then computed from the corresponding polarized images; resolving the ambiguities in these maps yields azimuth angles, which are used to recover depth in featureless regions. Finally, the depth maps from multiple views are fused together to recover the complete 3D shape.

In their study, Nvidia captured five scenes under both natural indoor and outdoor illumination (vase, tile, balloon, corner, car). All images were captured using a Canon EOS 7D camera with a 50 mm lens and a Hoya linear polarizer mounted in front of the lens. For each view, seven images were captured with the polarizer angles spaced 30° apart. Example images and the camera poses recovered from VisualSFM are shown in the leftmost column of the figure below.

    Top to bottom rows: Vase, Tile, Balloon, Corner, Car
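With a rotating linear polarizer, each pixel's intensity varies sinusoidally with the polarizer angle, I(θ) = a + b·cos 2θ + c·sin 2θ, and the phase angle follows from the fitted coefficients. A sketch of how the seven measurements per view could be turned into one phase-angle value (the function name and interface are assumptions, not from the paper):

```python
import numpy as np

def fit_phase_angle(intensities, angles_deg):
    """Least-squares fit of I(theta) = a + b*cos(2*theta) + c*sin(2*theta)
    to the intensities of one pixel seen through a linear polarizer at the
    given angles (e.g. seven angles spaced 30 degrees apart).
    Returns the phase angle 0.5*atan2(c, b), wrapped to [0, pi).
    """
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    design = np.stack(
        [np.ones_like(theta), np.cos(2 * theta), np.sin(2 * theta)], axis=1
    )
    a, b, c = np.linalg.lstsq(design, np.asarray(intensities, dtype=float),
                              rcond=None)[0]
    return (0.5 * np.arctan2(c, b)) % np.pi
```

Fitting more angles than the three unknowns require, as the seven 30°-spaced exposures do, averages out sensor noise in the recovered phase angle.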

In the figure shown above, Nvidia compares its method against two multi-view stereo methods, MVE and Gipuma, showing the results after depth fusion for all three. For example, on the car scene, MVE produced a reasonable reconstruction but with many outliers, and Gipuma could only reconstruct a skeleton of the car. Nvidia's polarimetric multi-view stereo method achieved the most complete and accurate 3D reconstruction of the car, thanks to the phase angle estimated from polarization and depth propagation.

In conclusion, Nvidia successfully presented polarimetric multi-view stereo, a completely passive, novel approach for dense 3D reconstruction. It shows its strength especially on featureless regions and non-Lambertian surfaces, where depth estimated in well-textured regions is propagated into featureless regions guided by the azimuth angles estimated from polarized images. Results demonstrated high-quality 3D reconstruction and better performance than standard multi-view stereo methods.

To learn more about polarimetric multi-view stereo, click here.

Comparison of Polarimetric Cameras

Of the four characteristics of light used in scene analysis (wavelength, amplitude, coherence, and polarization), polarization is relatively new to the remote sensing community. Polarization can distinguish features in both natural and manmade objects that other detection modes used in remote sensing cannot, yet polarization imaging is not as prevalent as panchromatic and multispectral techniques. Evaluating polarimetric cameras can help drive its adoption, and Jarrad A. Smoke, of the Naval Postgraduate School in Monterey, CA, is doing just that. In his thesis, he compared two polarization imaging technologies to assess their capabilities and potential use in remote sensing.

Smoke compared two different polarimetric imaging systems: the Bossa Nova Salsa polarimetric camera and the FluxData FD-1665P. He collected images with both systems and evaluated their performance. The two systems operate on different principles: the Salsa uses a Division of Time Polarimeter (DoTP), which is sensitive to movement, while FluxData's camera uses a Division of Amplitude Polarimeter (DoAmP), which splits the incoming light so that scene movement does not introduce errors. The objective of the study was to determine the similarities and differences of the captured images and to compare how well each camera assessed and depicted the effects of polarization, noting the advantages and disadvantages of each technique. His analysis aimed at a better understanding of how the two technologies can be used, to find common ground for polarization imaging.

The Bossa Nova Tech Salsa polarization camera combines polarization analysis with regular digital video camera capabilities. Its main feature is the ability to display full Stokes parameters and calculations in real time for each pixel. The camera uses a patented polarization filter based on a ferroelectric liquid crystal, which reduces acquisition time and separates polarized light onto a 782 x 582 pixel detector operating in the 400 to 700 nm range. The camera has a standard 1-megapixel 12-bit CCD sensor and an interchangeable F-mount lens, is powered by 15 V DC, and has FireWire and USB connections for data input/output to a connected computer. The Salsa uses a division of time polarimeter (DoTP) to capture imagery: from sequential images taken with the polarization filter rotated to different orientations, the camera constructs the linear Stokes vectors pixel by pixel using a modified Pickering method. Because of the filter rotation, movement in the scene results in miscalculations, which limits the Salsa's use to static scenes.
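The modified Pickering method itself is not spelled out here, but the standard four-angle version of a DoTP Stokes calculation looks like the following sketch (the function name is mine, for illustration):

```python
def linear_stokes_dotp(I0, I45, I90, I135):
    """Linear Stokes parameters from four sequential exposures through a
    polarizer rotated to 0, 45, 90 and 135 degrees (the classic
    Pickering-style scheme behind a division-of-time polarimeter).
    Inputs may be scalars or same-shaped image arrays.
    """
    S0 = 0.5 * (I0 + I45 + I90 + I135)  # total intensity
    S1 = I0 - I90                       # horizontal minus vertical
    S2 = I45 - I135                     # +45 minus -45 degrees
    return S0, S1, S2
```

Because the four exposures are taken one after another, anything that moves between them corrupts S1 and S2, which is exactly why the Salsa is limited to static scenes.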

The FD-1665P 3-CCD camera captures video and images at three linear polarization directions simultaneously, in full color, eliminating all timing and movement issues. The Stokes parameters can be calculated by the user from the output images or video stream; the calculation is not built into the camera's processing. The camera sensor is a Sony ICX285 progressive-scan Charge Coupled Device (CCD) with a sensor size of 1628 x 1236 pixels (1.4 megapixels), and the camera has an interchangeable F-mount lens. At full resolution, the FD-1665P is capable of 30 frames per second. The sensor converts light into electric charges that are processed into electronic signals for digital images. The three CCD sensors behind three polarizers implement division of amplitude polarimetry (DoAmP), which avoids the timing issues observed with DoTP: the images are captured simultaneously through a non-polarizing 3-way beam-splitter prism, pass through three non-color-selective polarizers, and are refocused onto the sensors.

FluxData Input/Output Channels (left) and Registered Images (right)

For this study, the FD-1665P polarization filters were oriented at 0, 135, and 90 degrees (traditionally they are oriented at 0, 45, and 90 degrees). Reversing the sign in the calculations allowed the Stokes vector computation to produce results corresponding to those obtained from the Salsa. The ability to output data prior to the Stokes calculations is a feature not available on the Salsa. Each channel of the FD-1665P offers a set of analog and digital controls, as displayed in the image above, and the ability to manipulate each channel gives the user control to counter gain and saturation effects.
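The sign reversal can be made explicit. With the usual 0/45/90 layout the linear Stokes parameters are S0 = I0 + I90, S1 = I0 - I90, S2 = 2·I45 - I0 - I90; swapping the 45° channel for a 135° one flips only the sign of S2. A sketch under those assumptions (the function name is mine):

```python
def linear_stokes_fd1665p(I0, I135, I90):
    """Linear Stokes parameters from three simultaneous channels with
    polarizers at 0, 135 and 90 degrees. Relative to the traditional
    0/45/90 layout (S2 = 2*I45 - I0 - I90), using a 135-degree channel
    reverses the sign of the S2 term.
    """
    S0 = I0 + I90
    S1 = I0 - I90
    S2 = -(2.0 * I135 - I0 - I90)  # sign reversed vs. a 45-degree channel
    return S0, S1, S2
```

Because all three channels are exposed at the same instant, this computation is valid even when the scene is moving, unlike the sequential DoTP case.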

Smoke collected imagery in and around Monterey by mounting the cameras on tripods next to each other and capturing images at approximately the same time to get similar sun angles in both systems.

How the cameras determine polarization was compared using the ENVI and IDL software packages commonly used in remote sensing. The comparison of cameras and images followed the kind of testing applied to modern phone cameras: ease of use, quality of photos, cost, support, and various other aspects were all analyzed to determine the best use of each camera and how a customer would use each technology. Stokes vectors and products were calculated for the FD-1665P using ENVI and IDL in real time. Band algebra and histogram manipulation were used to compare the FluxData and Salsa images. The total intensity captured by the two cameras varied greatly, and manipulation of the histogram scale in ENVI was used to scale the images and eliminate noise and saturation.

08 September, 2016. Hermann Hall, Monterey, CA. FluxData (left) Salsa (right)

Again, it is easier to distinguish and identify objects using the FD-1665P. When zooming in and capturing values of DoLP, the FD-1665P gave higher values on objects such as windows and cars, while the Salsa loses much of its detail when zooming in on objects.
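The degree of linear polarization compared here is the standard quantity DoLP = sqrt(S1^2 + S2^2) / S0, computable from either camera's Stokes output. A minimal sketch (the function name and the zero-intensity guard are my own additions):

```python
import numpy as np

def dolp(S0, S1, S2):
    """Degree of linear polarization from the linear Stokes parameters.
    Accepts scalars or same-shaped image arrays; pixels with zero total
    intensity are mapped to 0 rather than dividing by zero.
    """
    S0 = np.asarray(S0, dtype=float)
    safe_S0 = np.where(S0 == 0.0, 1.0, S0)       # avoid division by zero
    magnitude = np.hypot(S1, S2)                 # sqrt(S1^2 + S2^2)
    return np.where(S0 == 0.0, 0.0, magnitude / safe_S0)
```

DoLP ranges from 0 for unpolarized light to 1 for fully linearly polarized light, which is why smooth dielectric surfaces such as windows and car paint stand out in the FD-1665P imagery.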

The first objective in comparing the cameras was to compare the Stokes images. The calculations from each camera produced similar results; however, the FD-1665P displayed a higher-resolution image with more detail than the Salsa Stokes images. One major advantage of the FD-1665P is the detail and differences captured in the background. The DoTP of the Salsa is affected by movement and gives false data for moving objects such as trees, whereas FluxData's DoAmP is not affected by movement (though registration errors do occur, from very small misalignments of the frames on the camera, in shadows and along some object outlines).

The remaining analysis in Smoke's study focused on the FD-1665P. This camera's higher resolution and DoAmP technology allow a more accurate per-pixel calculation of Stokes vectors in moving scenes. In addition, the FD-1665P captures a color image, and the Umov effect was explored to see how wavelength and color affect polarization for use in classifying objects.

Smoke found that although both cameras provide similar polarization representations, FluxData's division of amplitude polarimeter minimizes false polarization effects caused by scene movement. This advantage over the Salsa's division of time polarimeter allows the camera to be used on moving vehicles and aircraft, because it does not rely on a rotating filter to calculate Stokes parameters. In future work, FluxData's DoAmP technique can be used on ground and air vehicles to capture overhead angles and help expand the library of polarization scenes. Additionally, FluxData has the capability to calculate Stokes parameters in real time with software and code implementations. Future work with real-time imagery will allow the user to select areas of interest more easily and adjust angles to best capture a scene and identify objects.

To read the full article, click here.