Method for fusing multiple panoramas and reconstructing a three-dimensional image
1. A method for fusing multiple panoramas and reconstructing a three-dimensional image, characterized by comprising the following steps:
step 1: acquiring data images of an indoor scene through a scanning platform: an RGB map and a depth map;
step 2: camera motion localization, i.e. pose estimation;
step 3: preprocessing the acquired RGB images of the indoor scene and the corresponding depth maps to construct a single 3D panorama;
step 4: acquiring data for and constructing a plurality of panoramas;
step 5: fusing the plurality of panoramas, i.e. stitching them through consistent alignment and registration;
step 6: completing the noise-resistant three-dimensional reconstruction of the indoor scene through the fused panoramas.
2. The method for multi-panorama fusion and three-dimensional reconstruction according to claim 1, wherein the scanning platform in step (1) is a wheeled balance robot with a two-wheel chassis; a stable pan-tilt head is mounted on the platform of the wheeled balance robot through a truss; three unsynchronized RGB-D depth cameras are fixed on this pan-tilt head, spaced 120 degrees apart so as to cover the whole field of view; and the three RGB-D depth cameras can rotate in situ in the scene through the rotation of the scanning platform.
3. The method for multi-panorama fusion and three-dimensional reconstruction according to claim 2, wherein step (2) is specifically carried out as follows:
calculating the pose of the camera from the acquired RGB images and depth images and estimating the camera motion: feature points are matched between corresponding frames, the camera motion is estimated from the resulting point pairs, and a final globally optimized value is obtained by solving an ICP (Iterative Closest Point) problem, thereby estimating the pose of the camera.
4. The method for multi-panorama fusion and three-dimensional reconstruction according to claim 3, wherein step (3) is specifically carried out as follows:
constructing a panoramic image through equirectangular image projection: the original color and depth measurements are transformed into the equirectangular representation of the desired panorama so that the sensor noise can be modeled statistically, and the initially acquired panorama is optimized by filtering or completion so as to preserve its geometric quality; when processing in the panoramic image domain, a conventional data structure for generating a point cloud or patch mesh is not used; instead, an organized image is generated;
the problem of misregistration of the cameras is solved through a scanning platform fixedly provided with a plurality of asynchronous cameras; according to the coaxiality of the camera motion, under the condition of not depending on synchronism or obvious landmark co-occurrence, the states of the cameras are jointly obtained; the regularization constraint implementation is realized under a factor graph optimization framework; the regularization term comprises three terms: a landmark observation factor term, an attitude regularization factor term and a smoothness factor term; since all cameras and axes constitute a fixed object and move together during scanning, a unified physical model and external model can be used to describe their motion; all the cameras and the axes are changed into a mixture by utilizing the characteristic of coaxial rotation, and the mixture moves together in the scanning process; especially for in-situ rotation, the multiple poses of the camera, i.e. six-axis, three-dimensional translation and three-dimensional rotation, which need to be considered originally can be represented by only one degree of freedom, i.e. the azimuth angle of the rotator.
5. The method for multi-panorama fusion and three-dimensional reconstruction according to claim 4, wherein step (4) is specifically carried out as follows:
acquiring data images at different positioning points by controlling the motion of the scanning platform; during scanning, the mobile device performs in-situ rotation at a plurality of positioning points, and because the scanning platform is a two-wheeled balance robot, in-situ rotation at each positioning point is achieved by driving the two wheels at equal speeds in opposite directions; the number and positions of the positioning points are set according to the size and structure of the indoor scene;
then constructing panoramas corresponding to the different positioning points by the method of step (3) from the obtained data images of the different positioning points.
6. The method for multi-panorama fusion and three-dimensional reconstruction according to claim 5, wherein step (5) is specifically carried out as follows:
for registration between two panoramas, a dense correspondence between the pixels of the two panoramas is constructed, a geometric distance is established and minimized iteratively, the relative transformation between the two panoramas is estimated with an ICP (Iterative Closest Point) algorithm, and the fusion of the multiple panoramas is finally achieved.
Background
With the vigorous development of AI technology and the continual emergence of new devices, three-dimensional reconstruction has become a hot research topic in computer graphics. Its main task is to build three-dimensional models of the real physical world from data acquired by various sensors, using mathematical tools such as multi-view geometry, probability and statistics, and optimization theory, thereby establishing a bridge between the real world and the virtual world. Three-dimensional reconstruction therefore has wide applications in many fields, such as manufacturing, medicine, film and television production, cultural relic preservation, augmented reality, virtual reality, and positioning and navigation. The application of indoor three-dimensional scene reconstruction in augmented reality is developing particularly rapidly, including indoor augmented reality games, robot navigation, AR furniture and home viewing, and the like.
At present, three-dimensional scene reconstruction technology for augmented reality is developing especially rapidly, particularly in the field of indoor reconstruction. However, most conventional approaches rely on a single RGB-D camera; reconstructing indoor three-dimensional scenes with multiple RGB-D cameras used jointly, by fusing multiple panoramas, remains novel.
The task of understanding and navigating modern scenes requires highly accurate 3D scene databases, mostly obtained by hand-held scanning or panoramic scanning. Hand-held scanning takes an RGB-D video stream as input and uses a modern dense reconstruction system or visual SLAM algorithm to track and integrate sequential frames. Panoramic scanning, on the other hand, arranges the scanning process as multiple in-situ rotations to build 3D panoramas that are progressively integrated from different viewpoints. Compared with hand-held scanning, which requires continuous attention to areas with sufficient geometric or photometric features for reliable tracking, in-situ rotation is much easier to track, and panoramic scanning has become a practical alternative for industrial and commercial applications. Various techniques have been developed to construct 360-degree panoramas by panoramic scanning; they can be classified into three categories, i.e., 2D to 2D, 2D to 3D, and 3D to 3D, according to the input and output image types (i.e., whether depth information is included). While coarse depth information can be recovered with a 2D RGB camera for canonical stitching and VR/AR applications, the depth quality is generally not acceptable for high-definition 3D reconstruction. Current 3D-to-3D techniques based on a single RGB-D camera limit the freedom of view while the sensor moves and therefore cannot cover most of a spherical panorama. This narrow field-of-view problem can be solved by using multiple RGB-D cameras (e.g., vertically aligned for horizontal rotation), but this in turn introduces new camera calibration and synchronization problems.
Definitions of related nouns and symbols
1. Equirectangular projection (ERP projection)
Equirectangular (equidistant cylindrical) projection is currently the most widely used VR video projection mode; it was first devised by ancient Greek navigators around 100 BC for drawing maps. By mapping the Earth's meridians to equally spaced vertical lines and its parallels to equally spaced horizontal lines, a map with a 2:1 aspect ratio is produced, as shown in fig. 1. In panoramic images and videos, the idea of equidistant cylindrical projection is to store the data on every line of latitude with the same number of sampling points, thereby obtaining a rectangular video on the corresponding two-dimensional plane; in a normalized plane coordinate system, u and v can take any value within [0,1].
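The mapping just described can be sketched as follows (an illustrative Python snippet, not part of the claimed method; the function name `dir_to_erp_uv` and the y-up axis convention are assumptions made for the example):

```python
import numpy as np

def dir_to_erp_uv(d):
    """Map a 3D viewing direction to normalized equirectangular (u, v) in [0, 1].

    Longitude (azimuth) maps linearly to u and latitude to v, so every line of
    latitude is stored with the same number of samples, giving the 2:1 layout.
    """
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)
    lon = np.arctan2(d[0], d[2])           # azimuth in (-pi, pi]
    lat = np.arcsin(np.clip(d[1], -1, 1))  # elevation in [-pi/2, pi/2]
    u = (lon / np.pi + 1.0) / 2.0          # normalized horizontal coordinate
    v = (lat / (np.pi / 2) + 1.0) / 2.0    # normalized vertical coordinate
    return u, v
```

The forward direction (0, 0, 1) lands at the panorama center (u, v) = (0.5, 0.5).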
2. BA model optimization (Bundle Adjustment)
BA (bundle adjustment) optimization is, simply put, the extraction of the optimal 3D model and camera parameters from visual images. Consider the rays emanating from an arbitrary feature point, each of which becomes a pixel or detected feature point on the imaging plane of one of several cameras. If we adjust the camera poses and the spatial positions of the feature points so that these rays converge consistently at the cameras' optical centers, i.e. so that the reprojection error is minimized, this adjustment is called BA.
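The quantity a BA solver minimizes is the reprojection error. A minimal sketch of a single residual term, assuming a pinhole camera with intrinsics fx, fy, cx, cy (illustrative only, not the patent's implementation):

```python
import numpy as np

def reprojection_residual(R, t, X, uv, fx, fy, cx, cy):
    """Residual between the projection of 3D point X under pose (R, t)
    and its observed pixel uv; BA minimizes the sum of squares of such
    residuals over all camera poses and all feature points jointly."""
    Xc = R @ X + t                     # point in camera coordinates
    u = fx * Xc[0] / Xc[2] + cx        # pinhole projection
    v = fy * Xc[1] / Xc[2] + cy
    return np.array([u, v]) - np.asarray(uv, dtype=float)
```

A full BA stacks these residuals for every (camera, point) observation and optimizes R, t and X with a nonlinear least-squares solver.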
3. Factor Graph (Factor Graph)
The factor graph is one kind of probabilistic graphical model; there are many such models, the most common being Bayesian networks and Markov random fields. In probabilistic graphical models, computing the marginal distribution of a given variable is a common problem. Among the many solutions, one is to convert the Bayesian network or Markov random field into a factor graph and then solve it with the sum-product algorithm, which can efficiently compute the marginal distribution of each variable. In more detail, a global function of multiple variables is factorized into the product of several local functions, and the bipartite graph obtained from this product is called a factor graph. A factor graph is thus a graphical representation of a factorization and generally contains two kinds of nodes: variable nodes and function (factor) nodes. Since a global function can be decomposed into the product of several local functions, these local functions and their corresponding variables can be represented on the factor graph.
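As a toy illustration of the sum-product idea on a two-variable factor graph (the factor values below are chosen arbitrarily for the example):

```python
import numpy as np

# Chain factor graph:  a -- f2(a, b) -- b, with a unary factor f1(a).
# The global function is f1(a) * f2(a, b); both variables are binary.
f1 = np.array([0.6, 0.4])                # f1(a)
f2 = np.array([[0.9, 0.1], [0.2, 0.8]])  # f2(a, b), rows indexed by a

# Sum-product: the message from f1 into variable a passes through f2 to b;
# summing over a yields the (unnormalized) marginal of b.
msg_a = f1
marg_b = msg_a @ f2          # sum over a of f1(a) * f2(a, b)
marg_b = marg_b / marg_b.sum()
```

The result equals the brute-force marginal of the joint f1(a)·f2(a, b), which is what makes the sum-product algorithm attractive on larger graphs.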
Disclosure of Invention
In order to combine the construction of 3D panoramas with the three-dimensional reconstruction of large indoor scenes and to develop an indoor SLAM mobile robot, the invention solves the misregistration of multiple unsynchronized cameras when constructing 3D panoramas and the inherent sensor noise during reconstruction, and provides a method for multi-panorama fusion and three-dimensional reconstruction.
Firstly, a plurality of 3D panoramas are constructed with multiple unsynchronized cameras, and then the resulting panoramas are fused and stitched together so as to reconstruct a large indoor scene robustly against noise. Our approach mounts multiple unsynchronized RGB-D cameras on a mobile robot platform, which can rotate in situ at different positions in the scene. Because the unsynchronized cameras rotate about the same common axis, each rotation provides new viewing angles for them without requiring their fields of view to overlap substantially. Based on this key observation, we propose a new way to track these cameras jointly, namely through the mobile platform hardware and the corresponding regularization constraints. The key point of the method is to perform noise-resistant three-dimensional reconstruction of a large indoor scene by resolving the asynchrony of multiple cameras and fusing multiple 3D panoramas.
The primary problem of multi-camera panoramic scanning is how to recover the relative poses of the RGB-D frames: most commercial depth sensors (e.g., Kinect and PrimeSense) do not support shutter synchronization, and forced grouping by timestamps ignores the motion during the shutter interval and thus misaligns the final image. A second problem is the inherent sensor noise. Previous work has handled noise during the integration of successive scan frames with several common data structures, such as the Truncated Signed Distance Function (TSDF), the Probabilistic Signed Distance Function (PSDF), and surfels, but few works further consider the effect of noise during frame registration. Since the subsequent inter-panorama registration and image fusion steps are both affected by these uncertain measurements, it is important to model the depth measurement noise after panorama construction.
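For reference, the TSDF integration mentioned above amortizes noise by a weighted running average per voxel. A minimal per-voxel sketch in the style of KinectFusion (the truncation distance and function name are assumptions made for the example):

```python
import numpy as np

def tsdf_update(tsdf, weight, sdf_obs, trunc=0.05):
    """Weighted running-average TSDF update for a single voxel.

    sdf_obs is the new signed-distance observation (meters); it is
    truncated to [-1, 1] after scaling by the truncation distance, then
    averaged into the stored value with the accumulated weight."""
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)
    new_weight = weight + 1.0
    new_tsdf = (tsdf * weight + d) / new_weight
    return new_tsdf, new_weight
```

Repeated observations of the same voxel average out zero-mean depth noise, which is why such structures handle noise well during integration but not during registration.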
The invention realizes the following functions: the three-dimensional reconstruction of a large indoor scene is completed in two steps, first the construction of single 3D panoramas and second the fusion of multiple panoramas; through these two steps the whole indoor scene can be reconstructed in three dimensions.
A method for fusing multiple panoramas and reconstructing a three-dimensional image comprises the following steps:
step 1: acquiring data images of an indoor scene through a scanning platform: an RGB map and a depth map;
step 2: camera motion localization, i.e. pose estimation;
step 3: preprocessing the acquired RGB images of the indoor scene and the corresponding depth maps to construct a single 3D panorama;
step 4: acquiring data for and constructing a plurality of panoramas;
step 5: fusing the plurality of panoramas, i.e. stitching them through consistent alignment and registration;
step 6: completing the noise-resistant three-dimensional reconstruction of the indoor scene through the fused panoramas.
The scanning platform in step (1) is a wheeled balance robot with a two-wheel chassis; a stable pan-tilt head is mounted on the platform of the wheeled balance robot through a truss; three unsynchronized RGB-D depth cameras are fixed on this pan-tilt head, spaced 120 degrees apart so as to cover the whole field of view; and the three RGB-D depth cameras can rotate in situ in the scene through the rotation of the scanning platform.
Step (2) is specifically carried out as follows:
calculating the pose of the camera from the acquired RGB images and depth images and estimating the camera motion: feature points are matched between corresponding frames, the camera motion is estimated from the resulting point pairs, and a final globally optimized value is obtained by solving an ICP (Iterative Closest Point) problem, thereby estimating the pose of the camera.
Step (3) is specifically carried out as follows:
Constructing a panoramic image through equirectangular image projection: the original color and depth measurements are transformed into the equirectangular representation of the desired panorama so that the sensor noise can be modeled statistically, and the initially acquired panorama is optimized by filtering or completion so as to preserve its geometric quality. When processing in the panoramic image domain, instead of using a conventional data structure to generate a point cloud or patch mesh, an organized image is produced.
The problem of camera misregistration is solved by a scanning platform on which a plurality of unsynchronized cameras are rigidly mounted; exploiting the coaxiality of the camera motions, the states of the cameras are estimated jointly without relying on shutter synchronization or salient landmark co-occurrence. This is realized through regularization constraints under a factor graph optimization framework. The regularization comprises three terms: a landmark observation factor term, a pose regularization factor term, and a smoothness factor term. Since all cameras and the rotation axis constitute a rigid assembly that moves together during scanning, a unified physical and extrinsic model can be used to describe their motion. By exploiting the coaxial rotation, all cameras and the axis are treated as one rigid body that moves together during scanning. In particular, for in-situ rotation, the camera pose that would otherwise require six degrees of freedom, i.e. three-dimensional translation and three-dimensional rotation, can be represented by a single degree of freedom, namely the azimuth angle of the rotating platform.
The specific method of step (4) is as follows:
Data images at different positioning points are acquired by controlling the motion of the scanning platform. During scanning, the mobile device performs in-situ rotation at a plurality of positioning points; because the scanning platform is a two-wheeled balance robot, in-situ rotation at each positioning point is achieved by driving the two wheels at equal speeds in opposite directions. The number and positions of the positioning points are set according to the size and structure of the indoor scene.
Then, panoramas corresponding to the different positioning points are constructed by the method of step (3) from the obtained data images of the different positioning points.
the specific method of the step (5) is as follows:
For registration between two panoramas, a dense correspondence between the pixels of the two panoramas is constructed, a geometric distance is established and minimized iteratively, the relative transformation between the two panoramas is estimated with an ICP (Iterative Closest Point) algorithm, and the fusion of the multiple panoramas is finally achieved.
The invention has the following beneficial effects:
the method provided by the invention provides a flexible two-step three-dimensional reconstruction frame based on the indoor scene by combining the construction and fusion of the panoramic image, combines the advantages of the traditional slam high-quality algorithm and the 3D-based panoramic image, can obtain a more accurate indoor scene reconstruction effect, realizes higher-quality reconstruction, and is realized by carrying the mobile robot, so that a new possibility is provided for the subsequent positioning and navigation of large indoor scene service robots and the like.
The invention tracks the unsynchronized cameras jointly by constraining the consistency of their motion, without relying on significant parallax or shutter synchronization, so that the reconstruction result is more accurate.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
The objects and effects of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.
In order to realize the three-dimensional reconstruction of an indoor scene, the method combines panorama construction and fusion and provides a flexible two-step three-dimensional reconstruction framework for indoor scenes. It combines the advantages of high-quality traditional SLAM algorithms with those of 3D panoramas, obtains a more accurate indoor scene reconstruction, solves the misregistration of multiple unsynchronized cameras when constructing a 3D panorama and the inherent sensor noise during reconstruction, and provides a method for multi-panorama fusion and three-dimensional reconstruction; the implementation flow is shown in figure 1. The specific implementation steps are as follows:
Step (1): data images of an indoor scene, namely RGB maps and the corresponding depth maps, are obtained by scanning with the RGB-D depth cameras installed on the scanning platform. The scanning platform is a wheeled balance robot with a two-wheel chassis; a stable pan-tilt head is mounted on the platform of the wheeled balance robot through a truss; three unsynchronized RGB-D depth cameras are fixed on this pan-tilt head, spaced 120 degrees apart so as to cover the whole field of view (from ceiling to floor); and the three RGB-D depth cameras can rotate in situ in the scene through the rotation of the scanning platform.
Step (2): camera pose estimation.
The pose of the camera is calculated from the acquired RGB images and depth images and the camera motion is estimated: feature points are matched between corresponding frames, the camera motion is estimated from the resulting point pairs, and a final globally optimized value is obtained by solving an ICP (Iterative Closest Point) problem, thereby estimating the pose of the camera.
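The closed-form step at the heart of such point-pair alignment (the inner solve of ICP) can be sketched as follows, assuming matched 3D point pairs are already available (an illustrative Kabsch/Umeyama sketch, not the patent's exact solver):

```python
import numpy as np

def rigid_from_pairs(P, Q):
    """Least-squares rigid transform (R, t) with Q ≈ P @ R.T + t, from
    matched 3D point pairs (rows of P and Q), via SVD of the cross-covariance."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # reflection guard: force det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```

With noise-free correspondences this recovers the ground-truth rotation and translation exactly; inside ICP it is re-solved each iteration with updated correspondences.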
Step (3): the acquired RGB images of the indoor scene and the corresponding depth maps are preprocessed to construct a single 3D panorama.
The original RGB-D pixels are uniformly re-projected to the target domain through equirectangular image projection while their adjacency is preserved; the original color and depth measurements are transformed into the equirectangular representation of the desired panorama so that the sensor noise can be modeled statistically, and the initially acquired panorama is optimized (by filtering or completion, e.g. a GC filter) so as to preserve its geometric quality. When processing in the panoramic image domain, instead of using a conventional data structure to generate a point cloud or patch mesh, an organized image is produced, which is more favorable for statistics on and optimization of the original depth measurements. Such a panorama can convey most of the valid measurements: because each original frame pair to be integrated has little disparity, almost all of the original image areas can be merged into the panorama with little occlusion. For the construction of panoramas there are several alternative configurations, such as cube maps, stereographic projection images, and equirectangular image projection; among them, equirectangular image projection is the best way to uniformly re-project the original RGB-D pixels to the target domain while maintaining their neighborhood relations.
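The re-projection of one RGB-D pixel into the organized equirectangular image can be sketched as follows (an illustrative snippet with assumed pinhole intrinsics K, camera-to-panorama rotation R_cam, and a y-up axis convention; not the patent's implementation):

```python
import numpy as np

def rgbd_pixel_to_erp(u_px, v_px, depth, K, R_cam, W, H):
    """Back-project one depth pixel to a 3D ray, rotate it into the panorama
    frame, and return its (column, row) in a W x H equirectangular image."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # back-projection through the pinhole model
    X = depth * np.array([(u_px - cx) / fx, (v_px - cy) / fy, 1.0])
    d = R_cam @ X
    d = d / np.linalg.norm(d)
    lon = np.arctan2(d[0], d[2])
    lat = np.arcsin(np.clip(d[1], -1.0, 1.0))
    col = int((lon / np.pi + 1.0) / 2.0 * (W - 1))
    row = int((lat / (np.pi / 2) + 1.0) / 2.0 * (H - 1))
    return col, row
```

Writing color and depth to (col, row) this way yields an organized image, i.e. panorama pixels that preserve the adjacency of the original measurements.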
The problem of camera misregistration is solved by a scanning platform on which a plurality of unsynchronized cameras are rigidly mounted; exploiting the coaxiality of the camera motions, their states are estimated jointly without relying on shutter synchronization or salient landmark co-occurrence. This is realized through regularization constraints under a factor graph optimization framework. The regularization comprises three terms: a landmark observation factor term (relating the frame poses to the landmark points), a pose regularization factor term (constraining the camera motion to be consistent with horizontal rotation and estimating the pose of the rotation axis), and a smoothness factor term (constraining the angular velocity to be consistent between successive frames so that it remains constant). Since all cameras and the rotation axis constitute a rigid assembly that moves together during scanning, a unified physical and extrinsic model can be used to describe their motion. In particular, for such in-place rotation, once the extrinsic relations between the axis and the multiple cameras are resolved, the state of the cameras can be parameterized by only 1 DoF, the azimuth of the rotating platform. By exploiting the coaxial rotation, all cameras and the axis are treated as one rigid body that moves together during scanning; the camera pose that would otherwise require six degrees of freedom, i.e. three-dimensional translation and three-dimensional rotation, is thus represented by a single degree of freedom, namely the azimuth angle of the rotating platform.
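The 1-DoF parameterization above can be sketched as follows (illustrative; the vertical axis is assumed to be y, and (R_ext, t_ext) stands for the fixed camera-to-platform extrinsic, which in practice is one of the unknowns resolved by the factor graph):

```python
import numpy as np

def camera_pose_from_azimuth(theta, R_ext, t_ext):
    """World pose of a camera rigidly mounted on a platform rotating in place
    about the vertical axis: the only free variable is the azimuth theta."""
    c, s = np.cos(theta), np.sin(theta)
    R_axis = np.array([[ c, 0.0, s],
                       [0.0, 1.0, 0.0],
                       [-s, 0.0, c]])   # rotation about the vertical (y) axis
    R_wc = R_axis @ R_ext               # compose with the fixed extrinsic
    t_wc = R_axis @ t_ext
    return R_wc, t_wc
```

All six pose parameters of every camera on the rig are thus functions of the single scalar theta, which is what the pose regularization factor exploits.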
Step (4): data for a plurality of panoramas are acquired and the panoramas are constructed.
To reconstruct the whole large indoor scene in three dimensions, data acquired at a single positioning point can hardly cover the whole indoor scene; data must be acquired at 2-3 positioning points at least. The data images at the different positioning points are acquired by controlling the motion of the scanning platform. During scanning, the mobile device performs in-situ rotation at a plurality of positioning points; because the scanning platform is a two-wheeled balance robot, in-situ rotation at each positioning point is achieved by driving the two wheels at equal speeds in opposite directions. The number and positions of the positioning points are set according to the size and structure of the indoor scene.
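Why equal and opposite wheel speeds yield in-situ rotation follows from the standard differential-drive kinematics (a generic sketch, not specific to the patent's robot):

```python
def diff_drive_twist(v_left, v_right, wheel_base):
    """Body twist of a two-wheel differential-drive base.

    v is the forward speed, omega the yaw rate; equal wheel speeds in
    opposite directions give v = 0, i.e. pure in-situ rotation."""
    v = (v_right + v_left) / 2.0
    omega = (v_right - v_left) / wheel_base
    return v, omega
```

For example, wheel speeds of -0.3 m/s and +0.3 m/s on a 0.4 m wheel base give zero forward speed and a yaw rate of 1.5 rad/s.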
Then, panoramas corresponding to the different positioning points are constructed by the method of step (3) from the obtained data images of the different positioning points.
and (5) fusing the multiple panoramas, namely, stitching the multiple panoramas in a consistent alignment and registration manner.
For registration between two panoramas, a dense correspondence between the pixels of the two panoramas is constructed, a geometric distance is established and minimized iteratively, the relative transformation between the two panoramas is estimated with an ICP (Iterative Closest Point) algorithm, and the fusion of the multiple panoramas is finally achieved.
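The iterative loop described above, with dense nearest-neighbour correspondences and a closed-form rigid solve per iteration, can be sketched as follows (a minimal point-to-point ICP over small clouds; a practical implementation would use a spatial index instead of the brute-force search shown here):

```python
import numpy as np

def icp(P, Q, iters=20):
    """Point-to-point ICP aligning source points P (rows) onto target Q.

    Each iteration builds dense nearest-neighbour correspondences and solves
    the closed-form rigid transform minimising their squared distances."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        Pt = P @ R.T + t
        # dense correspondence: nearest target point for every source point
        nn = np.argmin(((Pt[:, None] - Q[None]) ** 2).sum(-1), axis=1)
        M = Q[nn]
        # closed-form rigid solve (SVD of the cross-covariance)
        cP, cM = Pt.mean(axis=0), M.mean(axis=0)
        H = (Pt - cP).T @ (M - cM)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        dR = Vt.T @ D @ U.T
        dt = cM - dR @ cP
        R, t = dR @ R, dR @ t + dt   # compose the incremental update
    return R, t
```

The returned (R, t) is the estimated relative transformation between the two panoramas' point sets; applying it to every source panorama brings them into one common frame for fusion.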
Step (6): the noise-resistant three-dimensional reconstruction of the indoor scene is completed through the fused multiple panoramas.