
Which principles affect the image quality of 3D reconstruction

Three-dimensional reconstruction is based on the triangulation principle. Given two calibrated images (i.e., the intrinsic and extrinsic camera parameters are known) and a pair of corresponding points on them (projections of the same point on the surface of a scene object), the ray from each image's center of projection through its corresponding point can be traced back into space; the two rays converge at a single spatial point, yielding the three-dimensional coordinates of that point on the object's surface.

Take two images as an example. Let P and P' be the camera matrices of the two images in the same world coordinate system, and let m and m' be a pair of corresponding points, i.e., points satisfying the epipolar constraint. The goal is to compute, from P and P', the spatial point M that projects to m and m'. The back-projection ray through m and the back-projection ray through m' both lie in the plane determined by the two camera centers and M; since the two rays are not parallel to each other, they must meet at a single point in space. In other words, the two back-projection rays and the camera baseline form a triangle whose vertices are the two camera centers and the intersection of the back-projection rays; that intersection is the desired spatial point M, as shown in Figure 4.1.

Figure 4.1 Principle of three-dimensional reconstruction

There is one exception: a spatial point that lies on the camera baseline cannot be recovered, because in that case the two back-projection rays coincide with the baseline, so their intersection, and hence the spatial point, cannot be uniquely determined.
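The triangulation described above can be sketched with the standard linear (DLT) method. This is a minimal illustration, not the text's own algorithm: each correspondence contributes two rows of a homogeneous system A X = 0, whose least-squares solution is the spatial point. The camera matrices and the test point below are invented for the demonstration.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation: recover a 3D point X from a pair of
    corresponding image points x1, x2 (inhomogeneous pixel coordinates)
    and 3x4 camera matrices P1, P2. Each view contributes two rows of
    the homogeneous system A X = 0, solved via SVD."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                  # null vector of A (homogeneous point)
    return X[:3] / X[3]         # dehomogenize

# Toy check: two cameras, the second shifted along the x-axis (assumed setup)
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 3.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))   # recovers X_true
```

With noise-free correspondences the recovered point matches the ground truth exactly; the degenerate baseline case from the paragraph above shows up here as a rank-deficient A with no unique null vector.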

4.2 MVSNet

MVS is a technique for recovering the dense structure of a scene from multiple views with a certain degree of overlap. Traditional methods use geometric and photometric consistency to construct a matching cost, aggregate that cost, and then estimate depth values. Although traditional methods achieve high depth-estimation accuracy, there is still considerable room to improve their completeness, owing to false matches in scenes that lack texture or have drastically changing illumination. In recent years, convolutional neural networks have been successfully applied to feature matching to improve the accuracy of stereo matching. Against this background, Yao Yao et al. at the Hong Kong University of Science and Technology proposed MVSNet, an end-to-end depth-estimation framework based on deep learning, in 2018.
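The "construct cost, aggregate, estimate depth" pipeline mentioned above can be sketched in its simplest form: an aggregated cost volume with one matching cost per depth hypothesis and pixel, from which a winner-take-all depth map is read off. The volume shape and values below are invented for illustration; real pipelines build the costs from photometric consistency across views.

```python
import numpy as np

def wta_depth(cost_volume, depth_values):
    """Winner-take-all depth estimation from an aggregated cost volume.
    cost_volume: (D, H, W) array of matching costs, one per depth
    hypothesis and pixel; depth_values: the D hypothesized depths.
    Per pixel, pick the hypothesis with the lowest aggregated cost."""
    best = np.argmin(cost_volume, axis=0)      # (H, W) index map
    return np.asarray(depth_values)[best]      # (H, W) depth map

# Toy volume: 3 depth hypotheses over a 2x2 image (made-up costs)
depths = [1.0, 2.0, 3.0]
cost = np.array([[[0.9, 0.1], [0.5, 0.8]],
                 [[0.2, 0.7], [0.4, 0.3]],
                 [[0.6, 0.9], [0.1, 0.6]]])
print(wta_depth(cost, depths))
```

MVSNet keeps this cost-volume structure but builds the costs from learned features and replaces the hard argmin with a regularized, differentiable depth regression.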

Multi-view stereo (MVS) matching is a core problem in computer vision. It can be thought of as the inverse of photographing a scene: the camera projects a three-dimensional scene into two dimensions, while multi-view stereo matching does the opposite, recovering the real three-dimensional scene from images captured at different viewpoints.

Traditional approaches use hand-designed similarity metrics and regularization methods to compute dense correspondences of the scene (e.g., normalized cross-correlation and semi-global matching). These methods can achieve good results on well-textured Lambertian surfaces. However, on non-Lambertian surfaces and in weakly textured regions, hand-designed similarity metrics become unreliable, leading to incomplete reconstructions. As the rankings on MVS benchmarks show, these methods attain high accuracy, but there is still considerable room for improvement in reconstruction completeness.
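Normalized cross-correlation, one of the hand-designed similarity metrics named above, can be sketched in a few lines. This is a generic illustration (the patch values are invented): after subtracting each patch's mean and normalizing by the standard deviations, the score is invariant to affine brightness changes, which is exactly why it fails where texture is weak and every patch looks alike.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-8):
    """Normalized cross-correlation between two equally sized patches.
    Scores lie in [-1, 1]; 1 indicates a perfect match up to an affine
    brightness change. Used as a matching cost in traditional MVS."""
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + eps
    return float((a * b).sum() / denom)

# A textured patch matches a brightness/contrast-shifted copy of itself
patch = np.arange(9.0).reshape(3, 3)
print(ncc(patch, 2.0 * patch + 5.0))   # close to 1.0
```

On a flat, textureless patch the zero-mean numerator and denominator both vanish, so the score carries no information; this is the untrustworthiness in weakly textured regions that the paragraph describes.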

Recent advances in convolutional neural networks have sparked interest in improving stereo matching. Conceptually, learning-based algorithms can capture global semantic information, such as priors on specular highlights and reflections, enabling more robust matching. Several two-view stereo methods have been explored in which neural networks replace hand-designed similarity metrics or regularization; they show better results and have progressively outperformed traditional methods in stereo matching. Indeed, the stereo matching task is well suited to CNNs because the image pairs are already rectified, so the problem reduces to per-pixel disparity estimation in the horizontal direction.