Method, apparatus and device for producing a high-precision map, and computer storage medium
1. A method for producing a high-precision map, comprising:
acquiring point cloud data and front-view image data respectively collected at each position point by acquisition devices, to obtain a point cloud sequence and a front-view image sequence;
performing registration between the front-view images and the point cloud data on the point cloud sequence and the front-view image sequence;
converting the front-view image sequence into a top view according to a registration result, and determining coordinate information of each pixel in the top view; and
identifying map elements in the top view to obtain high-precision map data.
2. The method of claim 1, wherein the performing registration between the front-view images and the point cloud data on the point cloud sequence and the front-view image sequence comprises:
registering adjacent images in the front-view image sequence to obtain a set formed by corresponding pixels in the adjacent images; and
projecting the point cloud data onto the set to obtain coordinate information of each pixel in the set.
3. The method of claim 2, further comprising, before the projecting the point cloud data onto the set:
performing distortion correction on the point cloud data according to the amount of movement of the lidar device during the full rotation in which it collects the point cloud data.
4. The method of claim 2, further comprising, before the projecting the point cloud data onto the set:
determining a reference point cloud in the point cloud sequence; and
registering the other point cloud data frame by frame with the reference point cloud as a reference.
5. The method of claim 4, wherein the determining a reference point cloud in the point cloud sequence comprises:
registering the other point cloud data frame by frame with the first frame in the point cloud sequence as a reference; and
taking, as the reference point cloud, the frame of point cloud having the highest registration point ratio with its preceding and following frames of point cloud data in the point cloud sequence.
6. The method of claim 4 or 5, wherein the registering the other point cloud data frame by frame comprises:
learning a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered;
transforming the point cloud serving as the reference by using the transformation matrix to obtain the registered adjacent point cloud; and
taking the adjacent point cloud as a new reference and returning to the step of learning a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered, until the registration of all point cloud data in the point cloud sequence is completed.
7. The method of claim 6, wherein the learning a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered comprises:
learning the transformation matrix between the two frames of point clouds from the point cloud serving as the reference and the adjacent point cloud by using an Iterative Closest Point (ICP) algorithm;
wherein the loss function of the ICP algorithm is the mean or weighted mean of the distances between each transformed point, obtained by transforming each point in the point cloud serving as the reference according to the transformation matrix, and its closest point in the adjacent point cloud.
8. The method of claim 7, further comprising, before the determining a reference point cloud in the point cloud sequence:
determining a set formed by the corresponding point clouds in adjacent images;
wherein, when the weighted mean is determined, the weight used for each distance is determined according to whether the point in the point cloud serving as the reference belongs to the set formed by the corresponding point clouds.
9. The method of claim 2, wherein the projecting the point cloud data onto the set to obtain coordinate information of each pixel in the set comprises:
projecting the coordinates of the point cloud data onto the set to obtain coordinate information of the point clouds corresponding to the pixels in the front-view images; and
converting the coordinate information of the point clouds corresponding to the pixels in the front-view images into coordinate information of the pixels according to a rotation and translation matrix from the lidar coordinate system to the image acquisition device coordinate system.
10. The method of claim 1, wherein the converting the front-view image sequence into a top view according to the registration result and determining coordinate information of each pixel in the top view comprises:
converting each frame of front-view image in the front-view image sequence into a respective top view based on inverse perspective transformation;
performing matching on the corresponding top view according to the coordinate information of the pixels in the front-view image, to determine the coordinate information of the pixels in that top view; and
stitching the top views according to the coordinate information of the pixels in the top views to obtain a final top view.
11. The method of claim 1, wherein the identifying map elements in the top view to obtain high-precision map data comprises:
identifying road information in the top view; and
superimposing the identified road information onto the top view for display to obtain the high-precision map data.
12. An apparatus for producing a high-precision map, comprising:
an acquiring unit configured to acquire point cloud data and front-view image data respectively collected at each position point by acquisition devices, to obtain a point cloud sequence and a front-view image sequence;
a registration unit configured to perform registration between the front-view images and the point cloud data on the point cloud sequence and the front-view image sequence;
a conversion unit configured to convert the front-view image sequence into a top view according to a registration result and determine coordinate information of each pixel in the top view; and
an identification unit configured to identify map elements in the top view to obtain high-precision map data.
13. The apparatus of claim 12, wherein the registration unit comprises:
a first registration subunit configured to register adjacent images in the front-view image sequence to obtain a set formed by corresponding pixels in the adjacent images; and
a projection subunit configured to project the point cloud data onto the set to obtain coordinate information of each pixel in the set.
14. The apparatus of claim 13, wherein the registration unit further comprises:
a correction subunit configured to perform distortion correction on the point cloud data according to the amount of movement of the lidar device during the full rotation in which it collects the point cloud data, and then provide the corrected point cloud data to the projection subunit.
15. The apparatus of claim 13, wherein the registration unit further comprises:
a reference subunit configured to determine a reference point cloud in the point cloud sequence; and
a second registration subunit configured to register the other point cloud data frame by frame with the reference point cloud as a reference, and provide the registered point cloud data to the projection subunit.
16. The apparatus according to claim 15, wherein the reference subunit is configured to provide the first frame in the point cloud sequence to the second registration subunit as a reference so that the other point cloud data are registered frame by frame, obtain a registration result from the second registration subunit, and take, as the reference point cloud, the frame of point cloud having the highest registration point ratio with its preceding and following frames of point cloud data in the point cloud sequence.
17. The apparatus of claim 15 or 16, wherein the second registration subunit is specifically configured to:
learn a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered;
transform the point cloud serving as the reference by using the transformation matrix to obtain the registered adjacent point cloud; and
take the adjacent point cloud as a new reference and return to the operation of learning a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered, until the registration of all point cloud data in the point cloud sequence is completed.
18. The apparatus of claim 17, wherein the second registration subunit, when learning a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered, is specifically configured to:
learn the transformation matrix between the two frames of point clouds from the point cloud serving as the reference and the adjacent point cloud by using an Iterative Closest Point (ICP) algorithm;
wherein the loss function of the ICP algorithm is the mean or weighted mean of the distances between each transformed point, obtained by transforming each point in the point cloud serving as the reference according to the transformation matrix, and its closest point in the adjacent point cloud.
19. The apparatus of claim 18, wherein the registration unit further comprises:
a third registration subunit configured to determine a set formed by the corresponding point clouds in adjacent images;
wherein, when the second registration subunit determines the weighted mean, the weight used for each distance is determined according to whether the point in the point cloud serving as the reference belongs to the set formed by the corresponding point clouds.
20. The apparatus according to claim 13, wherein the projection subunit is specifically configured to project the coordinates of the point cloud data onto the set to obtain coordinate information of the point clouds corresponding to the pixels in the front-view images, and convert the coordinate information of the point clouds corresponding to the pixels in the front-view images into coordinate information of the pixels according to a rotation and translation matrix from the lidar coordinate system to the image acquisition device coordinate system.
21. The apparatus according to claim 12, wherein the conversion unit is specifically configured to convert each frame of front-view image in the front-view image sequence into a respective top view based on inverse perspective transformation; perform matching on the corresponding top view according to the coordinate information of the pixels in the front-view image, to determine the coordinate information of the pixels in that top view; and stitch the top views according to the coordinate information of the pixels in the top views to obtain a final top view.
22. The apparatus according to claim 12, wherein the identification unit is specifically configured to identify road information in the top view, and superimpose the identified road information onto the top view for display to obtain the high-precision map data.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-11.
Background
High-precision maps, an important part of autonomous driving systems, are one of the key factors promoting the development of autonomous driving. Conventional maps have lower accuracy and can only provide road-level route planning. By providing high-precision positioning, lane-level path planning capability and rich road element information, a high-precision map helps a vehicle know position information in advance, plan a driving route accurately, predict complex road-surface conditions and better avoid potential risks. Therefore, how to produce high-precision maps has become an urgent problem to be solved.
Disclosure of Invention
In view of the above, the present disclosure provides a method, an apparatus, a device and a computer storage medium for producing a high-precision map.
According to a first aspect of the present disclosure, there is provided a method for producing a high-precision map, including:
acquiring point cloud data and front-view image data respectively collected at each position point by acquisition devices, to obtain a point cloud sequence and a front-view image sequence;
performing registration between the front-view images and the point cloud data on the point cloud sequence and the front-view image sequence;
converting the front-view image sequence into a top view according to a registration result, and determining coordinate information of each pixel in the top view; and
identifying map elements in the top view to obtain high-precision map data.
According to a second aspect of the present disclosure, there is provided an apparatus for producing a high-precision map, including:
an acquiring unit configured to acquire point cloud data and front-view image data respectively collected at each position point by acquisition devices, to obtain a point cloud sequence and a front-view image sequence;
a registration unit configured to perform registration between the front-view images and the point cloud data on the point cloud sequence and the front-view image sequence;
a conversion unit configured to convert the front-view image sequence into a top view according to a registration result and determine coordinate information of each pixel in the top view; and
an identification unit configured to identify map elements in the top view to obtain high-precision map data.
According to a third aspect of the present disclosure, there is provided an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method as described above.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flowchart of a method for producing a high-precision map according to an embodiment of the present disclosure;
fig. 2 is a flow chart of a preferred registration process provided by the embodiments of the present disclosure;
fig. 3 is a flowchart of a method for registering point cloud data frame by frame according to an embodiment of the disclosure;
FIGS. 4a and 4b are exemplary diagrams of a front view image and a top view image, respectively;
fig. 5 is a structural diagram of a production device of a high-precision map provided by an embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Some high-precision map production schemes already exist, but they are mainly based on point cloud technology: a large amount of dense point cloud data is collected by lidar devices, the point cloud data is processed and recognized to obtain information such as roads and ground markings, the data is corrected manually, and the high-precision map data is finally generated. However, this conventional approach depends heavily on the point cloud data. Because urban roads have complex spatial structures, a large amount of manual registration work is needed to guarantee the precision of the high-precision map. As a result, the production efficiency of high-precision maps is low, the labor cost is high, and the professional skill requirements on operators are high, which ultimately hinders large-scale production of high-precision maps.
In view of the above, the present disclosure provides a method for producing a high-precision map that differs from the conventional approach described above. The method provided by the present disclosure is described in detail below with reference to embodiments.
Fig. 1 is a flowchart of a method for producing a high-precision map according to an embodiment of the present disclosure. The execution subject of the method may be an apparatus, which may be an application located in a local terminal, a functional unit such as a Software Development Kit (SDK) or a plug-in within such an application, or may be located on a server side; this is not particularly limited in the embodiments of the present disclosure. As shown in fig. 1, the method may include:
In 101, point cloud data and front-view image data respectively collected by the acquisition devices at each position point are acquired, and a point cloud sequence and a front-view image sequence are obtained.
In 102, registration between the front-view images and the point cloud data is performed on the point cloud sequence and the front-view image sequence.
In 103, the front-view image sequence is converted into a top view according to the registration result, and coordinate information of each pixel in the top view is determined.
In 104, map elements are identified in the top view to obtain high-precision map data.
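Purely as an illustration, the overall flow of steps 101 to 104 can be sketched in Python-style pseudocode, where every helper function is a hypothetical placeholder for an operation that is detailed in the following sections:

```python
# Illustrative sketch only: each helper function is a hypothetical placeholder
# for one of the operations described in the detailed description below.

def build_hd_map(front_images, point_clouds):
    # 101: the two input sequences, one frame per position point
    assert len(front_images) == len(point_clouds)

    # 102: register front-view images with the point cloud data
    pixel_sets = register_adjacent_images(front_images)       # corresponding pixels (step 201)
    point_clouds = correct_distortion(point_clouds)            # optional de-skew (step 202)
    point_clouds = register_point_clouds(point_clouds)         # steps 203-204
    pixel_coords = project_points_to_pixels(point_clouds, pixel_sets)  # step 205

    # 103: convert each front view to a top view and stitch by pixel coordinates
    top_views = [inverse_perspective(img) for img in front_images]
    final_top_view = stitch_top_views(top_views, pixel_coords)

    # 104: identify map elements (road information) in the top view
    road_info = recognize_road_information(final_top_view)
    return overlay(final_top_view, road_info)
```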
With the above technical solution, the image data collected by the image acquisition device and the point cloud data collected by the lidar device are fused to achieve automatic registration, and the final high-precision map is generated based on the registration result. This approach does not require a large amount of additional manual registration work, which improves production efficiency, reduces labor cost and the professional skill requirements on operators, and lays a foundation for large-scale production of high-precision maps.
The respective steps in the above-described embodiment are described in detail below.
Step 101, namely acquiring the point cloud data and front-view image data respectively collected by the acquisition devices at each position point to obtain a point cloud sequence and a front-view image sequence, is first described in detail with reference to an embodiment.
The acquisition equipment involved in this step mainly includes the following two types:
image acquisition devices, such as cameras, video cameras, etc., can perform image acquisition either periodically or after being triggered.
The laser radar equipment can acquire data, namely point cloud data, of a surrounding environment surface reflection point set in a mode of emitting laser scanning at fixed time or after being triggered. These point cloud data include coordinate information of the points, typically coordinates under the lidar device coordinate system.
It may also comprise a device with positioning function, i.e. responsible for the acquisition of position information, such as a GNSS (Global Navigation Satellite System) device.
In the disclosure, a mobile device (e.g., a collection vehicle) may be used to carry the collection devices, and then the collection devices may collect data at a certain frequency during the travel of the mobile device, or may be triggered to collect data at the same location.
For example, the front-view images captured by the image acquisition device at a certain acquisition frequency form a front-view image sequence {I_1, I_2, …, I_N}, where I_i is the frame of front-view image acquired at time t_i.
The point cloud data collected by the lidar device at a certain acquisition frequency form a point cloud sequence {P_1, P_2, …, P_N}, where P_i is the frame of point cloud data collected at time t_i. Each frame of point cloud data contains the coordinate information of M points, i.e. P_i = {p_1, p_2, …, p_M}, where p_j is the coordinate information of the j-th point.
The position data collected by the positioning device at a certain acquisition frequency form a position sequence {L_1, L_2, …, L_N}, where L_i is the position data collected at time t_i.
Here N is the number of acquisitions performed by the acquisition devices, i.e. the number of frames collected by each acquisition device.
As a preferred embodiment, in order to guarantee data synchronization and the subsequent registration process, clock synchronization and/or joint calibration may be performed between the acquisition devices in advance.
When clock synchronization is performed between the acquisition devices, millisecond-level accuracy is preferably required. The specific synchronization scheme may be, for example, GPS-based "PPS (Pulse Per Second) + NMEA (National Marine Electronics Association)", or the Ethernet-based IEEE 1588 (or IEEE 802.1AS) clock synchronization protocol.
Joint calibration of the acquisition devices mainly aims at obtaining the intrinsic and extrinsic parameters of the image acquisition device, the extrinsic parameters of the lidar device, the rotation and translation matrix M1 from the lidar coordinate system to the image acquisition device coordinate system, and the intrinsic parameter matrix M2 of the image acquisition device.
The joint calibration mainly involves presetting a calibration board and adjusting the lidar device and the image acquisition device so that both observe it, i.e. capturing images and collecting point clouds of the board. At least three corresponding two-dimensional points on the image and three-dimensional points in the point cloud are then found, forming at least three point pairs. A PnP (Perspective-n-Point) solution is computed using these point pairs, from which the transformation relationship between the lidar coordinate system and the image acquisition device coordinate system can be obtained.
Mature existing technologies can be used for the clock synchronization and joint calibration between devices, which are not detailed here.
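As an illustrative sketch of the PnP step (not a required implementation), OpenCV's solvePnP can recover the rotation and translation between the lidar coordinate system and the camera coordinate system from such 2D-3D point pairs; all numeric values below are placeholders:

```python
# Illustrative only: placeholder point pairs and intrinsics; a real calibration
# would detect the calibration-board corners in the image and in the point cloud.
import cv2
import numpy as np

# 3D points of the calibration board in the lidar coordinate system (meters)
object_points = np.array([[0.0, 0.0, 5.0], [0.5, 0.0, 5.0], [0.5, 0.5, 5.0],
                          [0.0, 0.5, 5.0], [0.25, 0.25, 5.0], [0.1, 0.4, 5.0]],
                         dtype=np.float64)
# Corresponding 2D pixels detected in the camera image
image_points = np.array([[320.0, 240.0], [400.0, 242.0], [398.0, 170.0],
                         [322.0, 168.0], [360.0, 205.0], [335.0, 182.0]],
                        dtype=np.float64)
# Camera intrinsic matrix M2 (from intrinsic calibration) and zero distortion
M2 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, M2, dist)
R, _ = cv2.Rodrigues(rvec)         # rotation part of the extrinsics
M1 = np.hstack([R, tvec])          # 3x4 rotation-and-translation matrix, lidar -> camera
print("lidar -> camera extrinsics:\n", M1)
```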
Step 102, namely performing registration between the front-view images and the point cloud data on the point cloud sequence and the front-view image sequence, is described in detail below with reference to an embodiment.
The idea of this registration is to first register adjacent images in the front-view image sequence to obtain a set formed by corresponding pixels in the adjacent images, and then project the point cloud data into that set so as to obtain coordinate information of each pixel in the set. The registration process thus determines, on the one hand, the more reliable pixels and, on the other hand, the coordinate information of those pixels. A preferred implementation is described below. Fig. 2 is a flowchart of a preferred registration process provided by an embodiment of the present disclosure; as shown in fig. 2, the process may include the following steps:
In 201, adjacent images in the front-view image sequence are registered, and a set formed by corresponding pixels in the adjacent images is obtained.
Because the image acquisition device captures images at a certain frequency, two adjacent frames of images differ from each other. The purpose of this step is to find out which pixels in adjacent images correspond to each other. Suppose two consecutive frames are image I_i and image I_{i+1}; after image registration, K pixels in image I_i are found to correspond to K pixels in image I_{i+1}, which can be represented as a set of pixel pairs {(a_1, b_1), (a_2, b_2), …, (a_K, b_K)}, where pixel a_1 in image I_i corresponds to pixel b_1 in image I_{i+1}, a_2 corresponds to b_2, and so on.
For the registration, a feature-based method, a deep learning method or the like may be employed. The feature-based method mainly involves determining a feature for each pixel in the two frames of images, for example a SIFT (Scale-Invariant Feature Transform) feature, and then performing feature matching based on similarity so as to obtain the corresponding pixels. For example, two pixels whose feature similarity exceeds a preset similarity threshold are matched successfully.
The deep learning method mainly involves using a convolutional neural network, for example a VGG (Visual Geometry Group) network, to generate a feature vector representation for each pixel, and then performing feature matching based on the feature vector representations of the pixels in the two frames of images, thereby obtaining the corresponding pixels. For example, two pixels whose feature vector similarity exceeds a preset similarity threshold are matched successfully.
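A minimal sketch of the feature-based variant is given below, using SIFT features and Lowe's ratio test as the similarity criterion; the file names and the 0.75 ratio threshold are assumptions made only for illustration:

```python
# Illustrative sketch: register two adjacent front-view frames with SIFT features.
import cv2

img_a = cv2.imread("frame_i.png", cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread("frame_i_plus_1.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_a, desc_a = sift.detectAndCompute(img_a, None)
kp_b, desc_b = sift.detectAndCompute(img_b, None)

# Match descriptors and keep only sufficiently similar pairs (Lowe's ratio test)
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = matcher.knnMatch(desc_a, desc_b, k=2)
corresponding_pixels = []
for best, second_best in knn_matches:
    if best.distance < 0.75 * second_best.distance:
        corresponding_pixels.append((kp_a[best.queryIdx].pt,
                                     kp_b[best.trainIdx].pt))
# corresponding_pixels plays the role of the "set formed by corresponding pixels"
# for this pair of adjacent images.
```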
In 202, distortion correction is performed on the point cloud data according to the amount of movement of the lidar device during the full rotation in which it collects the point cloud data.
This step is preferably performed to help improve the accuracy of the point cloud data in the subsequent registration process.
The image acquisition device uses a global shutter, so an image can be considered to be captured instantaneously. The lidar device, however, does not acquire a frame instantaneously; a frame is generally completed after the transceiver rotates one full circle, i.e. 360 degrees. Assuming one rotation takes 100 ms, the first and last points of a frame of point cloud data formed in one acquisition period differ by 100 ms. In addition, the lidar device acquires data while moving, so the point cloud data is distorted and cannot truly reflect the real environment at a single moment. In order to better register the image data with the point cloud data, distortion correction is therefore performed on the point cloud data in this step.
Because the lidar computes the coordinates of each laser point in its own coordinate system, the reference coordinate system of each column of laser points differs as the lidar moves, even though the points belong to the same frame of point cloud. Distortion correction therefore needs to unify them into one coordinate system.
The idea of distortion correction is to calculate the motion of the lidar during acquisition and then compensate for this amount of motion, including rotation and translation, in each frame of point cloud. Specifically, the first laser point in a frame of point cloud is determined; for each subsequent laser point, the rotation angle and translation relative to the first laser point can be determined, and the compensating rotation and translation are applied to obtain the coordinate information of the corrected laser point.
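The following sketch illustrates the principle of this motion compensation under simplifying assumptions (constant velocity and yaw rate during one sweep, a known per-point timestamp fraction); it is an illustration of the idea rather than the exact correction procedure:

```python
# Illustrative de-skew sketch, assuming constant planar motion during one sweep
# and a per-point timestamp fraction in [0, 1] relative to the first laser point.
import numpy as np

def deskew_scan(points, time_frac, velocity_xy, yaw_rate, sweep_time=0.1):
    """points: (N, 3) coordinates in the lidar frame; time_frac: (N,) in [0, 1];
    velocity_xy: (vx, vy) in m/s; yaw_rate: rad/s; sweep_time: seconds per full
    rotation (0.1 s matches the 100 ms example above). Returns the points
    expressed in the lidar frame at the time of the first laser point."""
    corrected = np.empty_like(points)
    for k, (p, s) in enumerate(zip(points, time_frac)):
        dt = s * sweep_time                 # time elapsed since the first point
        yaw = yaw_rate * dt                 # rotation accumulated since the first point
        c, si = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, -si, 0.0], [si, c, 0.0], [0.0, 0.0, 1.0]])
        t = np.array([velocity_xy[0] * dt, velocity_xy[1] * dt, 0.0])
        # compensate the lidar's own motion: map the point back to the first-point frame
        corrected[k] = R @ p + t
    return corrected
```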
Still further, after step 202, a set of corresponding point clouds in neighboring images may also be determined.
Specifically, a projection matrix from the point cloud to the image can be obtained from the intrinsic parameter matrix of the image acquisition device, the rotation matrix from the image acquisition device coordinate system to the image plane, and the rotation and translation matrix from the lidar coordinate system to the image acquisition device coordinate system; the point cloud data is then projected onto the image using this projection matrix. After projection, the set of corresponding point clouds in adjacent images can be determined. Suppose two consecutive frames are image I_i and image I_{i+1}; after projection, a set of K_1 points projected onto image I_i and a set of K_2 points projected onto image I_{i+1} are obtained, and the intersection of the two sets is taken as the set of corresponding point clouds in images I_i and I_{i+1}. The purpose of this set will be referred to in subsequent embodiments.
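For illustration, projecting one frame of point cloud onto an image with the intrinsic matrix M2 and the lidar-to-camera rotation and translation matrix M1 can be sketched as follows; the matrix shapes and the image-bounds check are assumptions of the sketch:

```python
# Illustrative point-cloud-to-image projection sketch.
import numpy as np

def project_to_image(points_lidar, M1, M2, image_w, image_h):
    """points_lidar: (N, 3) lidar coordinates; M1: 3x4 [R | t] lidar->camera;
    M2: 3x3 camera intrinsics. Returns pixel coordinates and a visibility mask."""
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])  # (N, 4)
    pts_cam = (M1 @ pts_h.T).T                      # (N, 3) camera coordinates
    in_front = pts_cam[:, 2] > 0.0                  # keep points in front of the camera
    uvw = (M2 @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                   # perspective division
    on_image = (uv[:, 0] >= 0) & (uv[:, 0] < image_w) & \
               (uv[:, 1] >= 0) & (uv[:, 1] < image_h)
    return uv, in_front & on_image

# Points projected onto two adjacent images can then be intersected to obtain the
# set of corresponding point clouds mentioned above.
```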
In 203, a reference point cloud in the sequence of point clouds is determined.
In particular, the first frame in the point cloud sequence may be used as the reference point cloud.
In practice, however, the first frame of point cloud in the point cloud sequence may not be the most accurate one. Accordingly, the present disclosure provides a preferred way of determining the reference point cloud: the other point cloud data may be registered frame by frame with the first frame in the point cloud sequence as a reference, and the frame of point cloud having the highest registration point ratio with its preceding and following frames of point cloud data in the point cloud sequence is taken as the reference point cloud.
When other point cloud data are registered frame by frame, the method as shown in fig. 3 can be adopted, and the method comprises the following steps:
in 301, a transformation matrix between two frames of point clouds is learned from a point cloud as a reference and its neighboring point clouds that have not been registered.
For example, if the first frame is used as the reference, the transformation matrix between the first frame of point cloud and the second frame of point cloud is learned from these two frames.
Two adjacent frames of point clouds are in fact related by a rotation and a translation, so in theory the second frame of point cloud can be obtained by rotating and translating the first frame of point cloud. In an actual scenario, however, the acquisition devices may bump during traveling and other deviations may exist, so the transformation matrix can be learned, for example, by an ICP (Iterative Closest Point) method. Denoting the rotation matrix as R and the translation matrix as t, the loss function used when learning R and t may be the mean or weighted mean of the distances between each transformed point, obtained by transforming each point in the point cloud serving as the reference according to the transformation matrix, and its closest point in the adjacent point cloud.
For example, the following loss function may be employed:
E(R, t) = (1/n) · Σ_{i=1}^{n} ‖ (R·x_i + t) − x̂_i ‖        (1)
where E(R, t) represents the loss function, x_i represents a point in the point cloud serving as the reference (e.g. the first frame of point cloud), R·x_i + t denotes the transformation applied to x_i according to the transformation matrix, x̂_i is the point closest to the transformed x_i in the adjacent point cloud (e.g. the second frame of point cloud), and n is the number of points that can be matched. The transformation matrices R and t are finally learned with the goal of minimizing the above loss function.
For another example, the following loss function may also be employed:
E(R, t) = (1/n) · Σ_{i=1}^{n} w_i · ‖ (R·x_i + t) − x̂_i ‖        (2)
Unlike equation (1) above, a weighting coefficient w_i is added, whose value can be determined according to whether the point x_i in the point cloud serving as the reference belongs to the set formed by the corresponding point clouds. For example, the following assignment may be employed:
w_i = α if x_i belongs to the set formed by the corresponding point clouds, and w_i = 1 otherwise,
where α ≥ 1; for example, α = 1.5 or α = 2.0 may be used.
In addition to the ICP method described above, a method based on deep feature learning, such as DGR (Deep Global Registration), may also be employed.
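As a non-limiting sketch, a generic point-to-point ICP with the optional per-point weights w_i of equation (2) can be written as follows; note that the closed-form SVD (Kabsch) update minimizes a squared-distance version of the weighted loss, which is a common practical substitute for the distance form of equations (1) and (2), and the fixed iteration count is an assumption:

```python
# Illustrative point-to-point ICP sketch (textbook variant, not a required design).
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, weights=None, iters=30):
    """source: (N, 3) point cloud serving as the reference; target: (M, 3) adjacent
    point cloud. weights: optional (N,) array, e.g. alpha for points belonging to
    the corresponding-point-cloud set and 1 otherwise. Returns R (3x3) and t (3,)
    such that R @ p + t maps source points onto target."""
    if weights is None:
        weights = np.ones(len(source))
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    for _ in range(iters):
        moved = source @ R.T + t
        _, nn = tree.query(moved)              # closest point in the adjacent cloud
        matched = target[nn]
        w = weights / weights.sum()
        # weighted centroids and cross-covariance
        mu_s = (w[:, None] * source).sum(axis=0)
        mu_m = (w[:, None] * matched).sum(axis=0)
        H = (w[:, None] * (source - mu_s)).T @ (matched - mu_m)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:               # avoid reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_m - R @ mu_s
    return R, t
```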
In 302, the point cloud serving as a reference is transformed by using the transformation matrix to obtain a registered adjacent point cloud.
For example, each point in the first frame of point cloud is transformed by the transformation matrices R and t to obtain each point after registration in the second frame of point cloud.
In 303, it is judged whether there is an adjacent point cloud in the point cloud sequence that has not yet been registered; if so, step 304 is performed; otherwise, the current registration process ends.
In 304, the adjacent point cloud is taken as a new reference, and the process returns to step 301.
For example, the registered second frame of point cloud is then used as the new reference, and the above process is performed on the third frame of point cloud to obtain its registered points. The registered third frame is then taken as the new reference to register the fourth frame of point cloud in the same way, and so on.
After the registration of all frames of point clouds in the point cloud sequence is finished, the frame of point cloud having the highest registration point ratio with its preceding and following frames of point cloud data in the point cloud sequence is taken as the reference point cloud. The registration point ratio A_j of the j-th frame of point cloud P_j may be determined, for example, by a formula of the following form:
A_j = ( |match(P_j, P_{j−1})| + |match(P_j, P_{j+1})| ) / ( |P_{j−1}| + |P_{j+1}| )
where match() represents the intersection of the points that can be registered between the two frames of point clouds, which can be embodied as the intersection between the points obtained by transforming one frame of point cloud according to the transformation matrix and the points of the other frame of point cloud; |·| represents the number of points in a set, e.g. |P_{j−1}| represents the number of points in the (j−1)-th frame of point cloud.
With continued reference to fig. 2, in 204, the other point cloud data are registered frame by frame with the reference point cloud as the reference.
After the reference point cloud is determined, the method shown in fig. 3 is used to register the other point cloud data frame by frame with the reference point cloud as the reference. If the reference point cloud is the first frame, the subsequent frames of point clouds are registered in turn. If the reference point cloud is not the first frame, the frames of point clouds are registered forward and backward from the reference point cloud. A registered point cloud sequence is finally obtained.
In 205, the point cloud data after registration is projected to the set obtained in step 201, so as to obtain coordinate information of each pixel in the set.
Specifically, the coordinates of the point cloud data are projected onto the set to obtain coordinate information of the point clouds corresponding to the pixels in the front-view images, and the coordinate information of the point clouds corresponding to the pixels in the front-view images is converted into coordinate information of the pixels according to the rotation and translation matrix from the lidar coordinate system to the image acquisition device coordinate system.
The set here is composed of the corresponding pixels obtained after registration of adjacent images. The point cloud data are projected onto the respective images, and the points (i.e. laser points) falling into the set are taken, so that the coordinate information of the point clouds corresponding to the pixels of these sets in the front-view images is obtained. For the manner of projecting the point cloud data onto the image, reference may be made to the related description in the previous embodiment, which is not repeated here.
After the coordinate information of the point cloud corresponding to a pixel in the front-view image is obtained, since the lidar coordinate system differs from the image acquisition device coordinate system, the coordinate information of the point cloud needs to be converted into the coordinate information of the pixel, that is, converted into the image acquisition device coordinate system.
The above step 103, i.e., "converting the front-view image sequence into a top-view image according to the registration result and determining the coordinate information of each pixel in the top-view image", is described in detail below with reference to the embodiment.
In this step, each frame of front-view image in the front-view image sequence may be converted into a top view based on inverse perspective transformation; matching is then performed on the corresponding top view according to the coordinate information of the pixels in the front-view image, so as to determine the coordinate information of each pixel in the top view.
Inverse perspective transformation is currently a common way of performing image projection transformation. Its essence is to transform the front-view image acquired by the image acquisition device onto the z = 0 plane of the world coordinate system.
Assuming that the coordinate information of a pixel in the front view is represented as (u, v), it needs to be converted into coordinates (x, y, z) in the world coordinate system. The following parameters can be obtained in the joint calibration process:
γ: the angle between the projection of the optical axis o of the image acquisition device onto the z = 0 plane and the y-axis;
θ: the angle by which the optical axis o of the image acquisition device deviates from the z = 0 plane;
2α: the field of view of the image acquisition device;
Rx: the resolution of the image acquisition device in the horizontal direction;
Ry: the resolution of the image acquisition device in the vertical direction.
The inverse perspective transformation model can be expressed in the following form:
x(u, v) = h · cot( (θ − α) + v · 2α/(Ry − 1) ) · cos( (γ − α) + u · 2α/(Rx − 1) )
y(u, v) = h · cot( (θ − α) + v · 2α/(Ry − 1) ) · sin( (γ − α) + u · 2α/(Rx − 1) )
z(u, v) = 0
where h is the height of the image acquisition device above the ground, and cot() is the cotangent function.
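The following sketch applies the above inverse perspective mapping to a single pixel; the parameter values are placeholders, and the formulation follows the model given above:

```python
# Illustrative inverse perspective mapping for one pixel; all values are placeholders.
import numpy as np

def ipm_pixel_to_ground(u, v, h, gamma, theta, alpha, Rx, Ry):
    """u, v: pixel column and row; angles in radians; returns (x, y, 0)."""
    pitch = (theta - alpha) + v * (2.0 * alpha / (Ry - 1.0))   # vertical viewing angle
    yaw = (gamma - alpha) + u * (2.0 * alpha / (Rx - 1.0))     # horizontal viewing angle
    rho = h / np.tan(pitch)                                    # h * cot(pitch)
    return rho * np.cos(yaw), rho * np.sin(yaw), 0.0

# Example: bottom-centre pixel of a 1280x720 image captured 1.6 m above the ground,
# with a 60-degree field of view, zero yaw and a 10-degree downward pitch.
x, y, z = ipm_pixel_to_ground(u=640, v=719, h=1.6, gamma=0.0,
                              theta=np.deg2rad(10), alpha=np.deg2rad(30),
                              Rx=1280, Ry=720)
```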
Through the above inverse perspective transformation, a front-view image such as the one shown in fig. 4a can be converted into a top view as shown in fig. 4b.
According to the inverse perspective transformation, each frame of front-view image in the front-view image sequence can be converted into a top view; if there are N frames of front-view images, N top views are obtained. These top views actually overlap one another, and in particular most adjacent top views overlap. Since the coordinate information of the pixels in the top views has been obtained in the above process, the top views can be stitched pixel by pixel based on the position information of the pixels in each top view, finally obtaining the final top view.
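A much simplified stitching sketch is given below; the output resolution and the strategy of letting later frames overwrite earlier ones in overlapping regions are assumptions made only for illustration:

```python
# Illustrative stitching of per-frame top views into one raster, using the
# per-pixel world coordinates obtained above.
import numpy as np

def stitch_top_views(top_views, world_xy, resolution=0.05):
    """top_views: list of (H, W) or (H, W, 3) images; world_xy: list of (H, W, 2)
    arrays giving the world (x, y) of every pixel; resolution: metres per output
    pixel. Returns the stitched top view."""
    all_xy = np.concatenate([w.reshape(-1, 2) for w in world_xy])
    x_min, y_min = all_xy.min(axis=0)
    x_max, y_max = all_xy.max(axis=0)
    W = int(np.ceil((x_max - x_min) / resolution)) + 1
    H = int(np.ceil((y_max - y_min) / resolution)) + 1
    canvas = np.zeros((H, W) + top_views[0].shape[2:], dtype=top_views[0].dtype)
    for img, xy in zip(top_views, world_xy):
        cols = ((xy[..., 0] - x_min) / resolution).astype(int)
        rows = ((xy[..., 1] - y_min) / resolution).astype(int)
        canvas[rows, cols] = img        # later frames simply overwrite earlier ones
    return canvas
```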
The following describes in detail the above step 104, i.e., "identifying the map elements for the top view to obtain the high-precision map data", with reference to the embodiment.
In this step, road information may be recognized in the top view obtained in step 103, and the identified road information is then superimposed onto the top view for display, so as to obtain the high-precision map data.
The road information may include lane lines, lane line types (e.g. white solid line, single yellow solid line, double yellow solid line, yellow dashed-solid line, diversion line, yellow no-stopping line, etc.), colors, lane guidance arrow information, lane types (e.g. main lane, bus lane, tidal lane, etc.), and the like.
For the above recognition, a semantic segmentation model based on a deep neural network, such as DeepLab v3, may be used to segment the road information. Image recognition techniques based on deep neural networks, such as Faster R-CNN (Regions with CNN features), may also be employed to identify the above road information.
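As an illustrative sketch only, a torchvision DeepLabV3 model fine-tuned on road-marking classes could be applied to the top view as follows; the number of classes and the checkpoint path are hypothetical, and this is not the specific model of the present disclosure:

```python
# Illustrative semantic segmentation of the top view; class count and weights file
# are hypothetical placeholders.
import torch
import torchvision

NUM_CLASSES = 8          # e.g. background + several lane-line / arrow / lane types
model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=NUM_CLASSES)
model.load_state_dict(torch.load("road_elements_deeplabv3.pth"))  # hypothetical weights
model.eval()

def segment_top_view(top_view_tensor):
    """top_view_tensor: float tensor of shape (1, 3, H, W), values in [0, 1]."""
    with torch.no_grad():
        logits = model(top_view_tensor)["out"]        # (1, NUM_CLASSES, H, W)
    return logits.argmax(dim=1)                       # per-pixel class labels

# The resulting label map can then be colourized and superimposed on the top view
# for manual checking, as described below.
```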
The recognition based on the top view described here mainly concerns ground elements, that is, mainly road information. Other map elements, such as traffic signs and buildings, are identified from the front-view images; existing technologies may be used for that part, which is not limited in the present disclosure.
After the identified road information is superimposed onto the top view for display, an operator can directly check the top-view data with the superimposed road information, correct problematic data, and generate the final high-precision map data.
The above is a detailed description of the method provided by the present disclosure, and the following is a detailed description of the apparatus provided by the present disclosure with reference to the embodiments.
Fig. 5 is a structural diagram of an apparatus for producing a high-precision map provided by an embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 may include an acquiring unit 510, a registration unit 520, a conversion unit 530 and an identification unit 540. The main functions of the component units are as follows:
the acquiring unit 510 is configured to acquire point cloud data and front-view image data acquired by the acquiring device at each position point, so as to obtain a point cloud sequence and a front-view image sequence.
The acquisition devices at least include an image acquisition device for collecting the front-view images and a lidar device for collecting the point cloud data.
As a preferred embodiment, in order to guarantee data synchronization and the subsequent registration process, clock synchronization and/or joint calibration may be performed between the acquisition devices in advance.
The registration unit 520 is configured to perform registration between the front-view images and the point cloud data on the point cloud sequence and the front-view image sequence.
The conversion unit 530 is configured to convert the front-view image sequence into a top view according to the registration result and determine coordinate information of each pixel in the top view.
The identification unit 540 is configured to identify map elements in the top view to obtain high-precision map data.
Specifically, the registration unit 520 may include a first registration subunit 521 and a projection subunit 522, and may further include a correction subunit 523, a reference subunit 524, a second registration subunit 525 and a third registration subunit 526.
The first registration subunit 521 is configured to register adjacent images in the front-view image sequence to obtain a set formed by corresponding pixels in the adjacent images.
And a projection subunit 522, configured to project the point cloud data to the set, so as to obtain coordinate information of each pixel in the set.
In a preferred embodiment, the correction subunit 523 is configured to perform distortion correction on the point cloud data according to the amount of movement of the lidar device during the full rotation in which it collects the point cloud data, and then provide the corrected point cloud data to the projection subunit 522.
As another preferred embodiment, the reference subunit 524 is configured to determine a reference point cloud in the point cloud sequence.
And a second registration subunit 525, configured to register, frame by frame, other point cloud data with the reference point cloud as a reference, and provide the registered point cloud data to the projection subunit 522.
The two approaches may be combined; for example, distortion correction is first performed on the point cloud data by the correction subunit 523, then the reference point cloud is determined by the reference subunit 524, and registration is performed by the second registration subunit 525.
The reference subunit 524 may use the first frame of point cloud in the point cloud sequence as the reference point cloud. As a preferred embodiment, the reference subunit 524 is specifically configured to provide the first frame in the point cloud sequence to the second registration subunit 525 as a reference so that the other point cloud data are registered frame by frame, obtain the registration result from the second registration subunit 525, and take, as the reference point cloud, the frame of point cloud having the highest registration point ratio with its preceding and following frames of point cloud data in the point cloud sequence.
The second registration subunit 525 is configured to:
learn a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered;
transform the point cloud serving as the reference by using the transformation matrix to obtain the registered adjacent point cloud; and
take the adjacent point cloud as a new reference and return to the operation of learning a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered, until the registration of all point cloud data in the point cloud sequence is completed.
In one implementation, when learning a transformation matrix between two frames of point clouds from the point cloud serving as the reference and its adjacent point cloud that has not yet been registered, the second registration subunit 525 is specifically configured to learn the transformation matrix between the two frames of point clouds from the point cloud serving as the reference and the adjacent point cloud by using the Iterative Closest Point (ICP) algorithm, where the loss function of the ICP algorithm is the mean or weighted mean of the distances between each transformed point, obtained by transforming each point in the point cloud serving as the reference according to the transformation matrix, and its closest point in the adjacent point cloud.
In a preferred embodiment, the third registration subunit 526 is configured to determine a set of corresponding point clouds in neighboring images.
When the second registration subunit 525 determines the weighted average, the weight value adopted by each distance is determined according to whether a point in the point cloud serving as a reference belongs to a set formed by corresponding point clouds.
The projection subunit 522 is specifically configured to project the coordinates of the point cloud data onto the set to obtain coordinate information of the point clouds corresponding to the pixels in the front-view images, and convert the coordinate information of the point clouds corresponding to the pixels in the front-view images into coordinate information of the pixels according to the rotation and translation matrix from the lidar coordinate system to the image acquisition device coordinate system.
The conversion unit 530 is specifically configured to convert each frame of front-view image in the front-view image sequence into a top view based on inverse perspective transformation, and perform matching on the top view according to the coordinate information of the pixels in the front-view image to determine the coordinate information of each pixel in the top view.
The identification unit 540 is specifically configured to identify road information in the top view, and superimpose the identified road information onto the top view for display to obtain the high-precision map data.
The road information may include lane lines, lane line types (e.g. white solid line, single yellow solid line, double yellow solid line, yellow dashed-solid line, diversion line, yellow no-stopping line, etc.), colors, lane guidance arrow information, lane types (e.g. main lane, bus lane, tidal lane, etc.), and the like.
For the above recognition, a semantic segmentation model based on a deep neural network, such as DeepLab v3, may be used to segment the road information. Image recognition techniques based on deep neural networks, such as Faster R-CNN (Regions with CNN features), may also be employed to identify the above road information.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 6 is a block diagram of an electronic device for implementing the method for producing a high-precision map according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. Various programs and data required for the operation of the device 600 can also be stored in the RAM 603. The computing unit 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the respective methods and processes described above, such as the method for producing a high-precision map. For example, in some embodiments, the method for producing a high-precision map may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608.
In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method for producing a high-precision map described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the method for producing a high-precision map in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.