6DoF pose estimation method of reflective workpiece
1. A method for estimating the 6DoF pose of a reflective workpiece, characterized by comprising the following steps:
extracting features of the reflective workpiece through a convolutional neural network and a point cloud network, respectively;
dividing the features equally into a plurality of subsets by adopting a divide-and-conquer feature extraction algorithm;
fusing the plurality of subsets, inputting the fused subsets into a deep Hough voting network, and outputting the 3D key points of each subset;
and fitting each 3D key point by using a RANSAC fitting algorithm to obtain a 6DoF pose estimation result.
2. The method for estimating the 6DoF pose of a reflective workpiece according to claim 1, wherein extracting the features of the reflective workpiece through a convolutional neural network and a point cloud network specifically comprises:
inputting the point cloud data of the reflective workpiece into the convolutional neural network and the point cloud network, respectively, and extracting the features of the reflective workpiece, wherein the features comprise: surface information of the reflective workpiece, and geometric information in the point cloud and in its corresponding normal map.
3. The method for estimating the 6DoF pose of a reflective workpiece according to claim 1, wherein dividing the features equally into a plurality of subsets by adopting a divide-and-conquer feature extraction algorithm specifically comprises: dividing the features into a plurality of subsets with the same semantic label according to a clustering algorithm.
4. The method for estimating the 6DoF pose of a reflective workpiece according to claim 1, wherein fusing the plurality of subsets, inputting the fused subsets into a deep Hough voting network, and outputting the 3D key points of each subset specifically comprises:
after the plurality of subsets are fused, voting on each subset through a shared perceptron of the deep Hough voting network, and then passing the votes to a clustering module of the deep Hough voting network for processing, so as to obtain the 3D key points of each subset.
Background
6DoF pose estimation is an important component of applications such as robot grasping tasks and augmented reality, but it is made difficult by lighting changes, scene occlusion and object truncation. In particular, unstructured factors such as light reflection on the surface of metal parts and mutual occlusion when parts are placed at random pose a huge challenge to object pose estimation.
At present, approaches to the above problems, such as identifying parts with an approximate nearest neighbor algorithm on the basis of SURF feature extraction, only target parts with regular shapes and orderly arrangement; in-depth research on pose estimation for parts with complex shapes in randomly stacked arrangements is lacking. Moreover, when the workpiece is made of galvanized material with an uneven surface, reflection is severe, so complete 3D modeling of the workpiece cannot be achieved with tools such as RealSense, no matter how the lighting is adjusted from multiple angles or whether a suitable amount of developer is sprayed. Modeling software such as SolidWorks can achieve a certain effect, but the result differs greatly from the real state of the workpiece. On the other hand, most existing key-point-based methods use the 8 vertices of the object's 3D bounding box as key points; when missing point cloud information leaves the workpiece shape incomplete, these methods cannot handle key point detection well because the key points do not lie on the workpiece. Therefore, in existing industrial robot tasks of grasping workpieces such as metal parts, the pose estimation effect is poor because of reflection on the workpiece surface or because of unstructured factors such as mutual occlusion when the workpieces are placed at random.
Disclosure of Invention
The present application provides a 6DoF pose estimation method for a reflective workpiece, which is used to solve the technical problem that, in existing industrial robot tasks of grasping workpieces such as metal parts, the pose estimation effect is poor because of reflection on the workpiece surface or because of unstructured factors such as mutual occlusion when the workpieces are placed at random.
In view of the above, a first aspect of the present application provides a method for estimating a 6DoF pose of a reflective workpiece, the method including:
extracting features of the reflective workpiece through a convolutional neural network and a point cloud network, respectively;
dividing the features equally into a plurality of subsets by adopting a divide-and-conquer feature extraction algorithm;
fusing the plurality of subsets, inputting the fused subsets into a deep Hough voting network, and outputting the 3D key points of each subset;
and fitting each 3D key point by using a RANSAC fitting algorithm to obtain a 6DoF pose estimation result.
Optionally, extracting the features of the reflective workpiece through the convolutional neural network and the point cloud network specifically includes:
inputting the point cloud data of the reflective workpiece into the convolutional neural network and the point cloud network, respectively, and extracting the features of the reflective workpiece, wherein the features comprise: surface information of the reflective workpiece, and geometric information in the point cloud and in its corresponding normal map.
Optionally, dividing the features equally into a plurality of subsets by adopting the divide-and-conquer feature extraction algorithm specifically includes: dividing the features into a plurality of subsets with the same semantic label according to a clustering algorithm.
Optionally, fusing the plurality of subsets, inputting the fused subsets into the deep Hough voting network, and outputting the 3D key points of each subset specifically includes:
after the plurality of subsets are fused, voting on each subset through a shared perceptron of the deep Hough voting network, and then passing the votes to a clustering module of the deep Hough voting network for processing, so as to obtain the 3D key points of each subset.
As can be seen from the above technical solution, the present application has the following advantages:
The present application provides a method for estimating the 6DoF pose of a reflective workpiece, which comprises the following steps: first, extracting features of the reflective workpiece through a convolutional neural network and a point cloud network, respectively, wherein the features comprise surface information of the reflective workpiece and geometric information in the point cloud and in its corresponding normal map; dividing the features equally into a plurality of subsets by adopting a divide-and-conquer feature extraction algorithm; then fusing the plurality of subsets, inputting the fused subsets into a deep Hough voting network, and outputting the 3D key points of each subset; and finally, fitting each 3D key point through a RANSAC fitting algorithm to obtain a 6DoF pose estimation result.
In this method, a convolutional neural network first extracts the surface information of the reflective workpiece while a point cloud network extracts the geometric information in the point cloud and its normal map. A divide-and-conquer feature extraction algorithm then divides the features equally into a plurality of subsets, and a deep Hough voting network detects the 3D key points of each subset; the dense 2D-3D correspondence is robust to scenes in which workpiece information is severely lost due to reflection and similar causes. Finally, the 6DoF pose parameters are estimated from the predicted key points by RANSAC, a voting-style method that is not restricted by parametric fitting assumptions, so the method tolerates illumination-induced noise better and is more accurate under severe reflection. This solves the technical problem that, in existing industrial robot tasks of grasping workpieces such as metal parts, the pose estimation effect is poor because of reflection on the workpiece surface or because of unstructured factors such as mutual occlusion when the workpieces are placed at random.
Drawings
FIG. 1 is a schematic flow chart of a first embodiment of a method for estimating a 6DoF pose of a reflective workpiece according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a second embodiment of a method for estimating a 6DoF pose of a reflective workpiece according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The applicant considers that unstructured factors, such as mutual occlusion when reflective workpieces are placed at random, leave the information of the reflective workpiece incomplete. The information loss caused by reflection is therefore treated as an occlusion problem and the point cloud data is processed directly: a deep Hough voting network lets points vote from the visible part where information is salient, so that the positions of the key points of the metal workpiece can be inferred even when incomplete information means the key points may not lie on the workpiece surface. The specific methods of the present application are as follows.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a first embodiment of a method for estimating a 6DoF pose of a reflective workpiece according to an embodiment of the present application.
The method for estimating the 6DoF pose of the reflective workpiece provided by the embodiment of the application comprises the following steps:
Step 101, extracting the features of the reflective workpiece through a convolutional neural network and a point cloud network, respectively.
It should be noted that, in this embodiment, the point cloud data of the reflective workpiece is input into the convolutional neural network to extract the surface features and geometric information of the reflective workpiece. The point cloud data is not limited to data obtained by scanning and may also be obtained by converting the depth information of an RGB-D image. At the same time, the point cloud data is input into the point cloud network to extract the geometric information in the point cloud and its normal map. The features of the reflective workpiece are thereby obtained.
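The conversion from RGB-D depth information to point cloud data mentioned above can be illustrated with a minimal sketch. It assumes a pinhole camera model; the function name `depth_to_point_cloud` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are illustrative and do not come from the application. Pixels with no depth reading, as may happen on reflective surfaces, are simply dropped.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into an (N, 3) point cloud.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy). Pixels with depth <= 0 are treated as missing.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # column and row index grids
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Toy 2x2 depth image with one missing pixel.
depth = np.array([[1.0, 0.0],
                  [2.0, 2.0]])
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud.shape)  # (3, 3)
```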
Step 102, dividing the features into a plurality of subsets by adopting a divide-and-conquer feature extraction algorithm.
The subsets of this embodiment are formed according to semantic labels, that is, features with the same semantic label are grouped into one class.
The divide-and-conquer feature extraction technique of this embodiment is described as follows:
When illumination, occlusion or similar conditions cause severe loss of workpiece information, point cloud acquisition is incomplete and uncertainty weakens the connections between local features, so the key points voted for by the real points of the workpiece in a single region may not lie on the workpiece surface. Therefore, at the feature extraction stage the point cloud data of the workpiece is divided evenly into P sub-parts (subsets), the 2D feature points obtained after semantic segmentation are assigned to the subsets for voting, and the features of the subsets are then aggregated and classified.
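As a rough illustration of dividing the point cloud evenly into P subsets, the sketch below orders the points along their first principal axis and cuts them into equal-sized chunks. The text does not state the partition rule, so this ordering, the function name `split_into_subsets` and the parameter `num_subsets` are assumptions made only for the example.

```python
import numpy as np

def split_into_subsets(points, num_subsets):
    """Split an (N, 3) point cloud into `num_subsets` near-equal groups.

    Assumption: points are sorted by their projection onto the first
    principal axis and then cut into equal-sized chunks.
    """
    centered = points - points.mean(axis=0)
    # The first right-singular vector is the direction of largest variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    order = np.argsort(centered @ vt[0])
    return [points[idx] for idx in np.array_split(order, num_subsets)]

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3)) * np.array([5.0, 1.0, 1.0])
subsets = split_into_subsets(points, num_subsets=4)
print([len(s) for s in subsets])  # [25, 25, 25, 25]
```

Each subset can then be passed through feature extraction independently, which is what allows the subsets to be handled in parallel.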
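Given a set of subset points $\{p_i\}_{i=1}^{N}$ consisting of visible parts and a set of selected key points $\{kp_j\}_{j=1}^{M}$ belonging to the same instance $I$,
let $x_i$ be the 3D coordinates and $f_i$ the extracted features, so that $p_i$ is represented as $p_i = [x_i; f_i]$.
After the features of the points in each subset are extracted, the offset $of_i^{j}$ of the $i$-th point relative to the selected $j$-th key point is calculated.
Learning is then supervised according to the following loss function:
$$L_{\text{keypoints}} = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} \left\lVert of_i^{j} - of_i^{j*} \right\rVert \, \mathbb{1}(p_i \in I)$$
where $of_i^{j*}$ represents the ground truth translation offset, $\mathbb{1}(\cdot)$ is the indicator function that is 1 only when point $p_i$ belongs to instance $I$, $M$ is the total number of selected target key points, and $N$ is the total number of subset points.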
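The supervised offset loss described above can be sketched numerically as follows. This is an illustration under stated assumptions, not the application's implementation: the ground truth offset is taken as key point position minus point position, the norm is taken as the L1 norm, and the names `keypoint_offset_loss` and `on_instance` are hypothetical.

```python
import numpy as np

def keypoint_offset_loss(pred_offsets, points, keypoints, on_instance):
    """Mean offset error over N points and M key points.

    pred_offsets: (N, M, 3) predicted offsets from each point to each key point
    points:       (N, 3) point coordinates
    keypoints:    (M, 3) ground truth key point coordinates
    on_instance:  (N,) boolean mask, True where the point lies on the instance
    """
    # Assumption: ground truth offset = key point position - point position.
    gt_offsets = keypoints[None, :, :] - points[:, None, :]   # (N, M, 3)
    # Assumption: L1 norm of the offset error.
    err = np.abs(pred_offsets - gt_offsets).sum(axis=2)        # (N, M)
    err = err * on_instance[:, None]                           # mask off-instance points
    return err.sum() / len(points)

points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
keypoints = np.array([[1.0, 1.0, 1.0]])
on_instance = np.array([True, False])
zero_pred = np.zeros((2, 1, 3))
print(keypoint_offset_loss(zero_pred, points, keypoints, on_instance))  # 1.5
```

A perfect prediction, equal to the ground truth offsets, gives a loss of zero; points that do not belong to the instance contribute nothing.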
Grouping the features, extracting them per subset and then aggregating them allows feature extraction to be carried out in parallel, which further improves the feature extraction.
Step 103, fusing the plurality of subsets, inputting the fused subsets into a deep Hough voting network, and outputting the 3D key points of each subset.
The deep Hough voting network comprises a key point detection module and a semantic segmentation module, each composed of a plurality of shared perceptrons. The semantic segmentation module forces the model to extract global and local features on the instance in order to distinguish different objects, which helps to localize points on the object and benefits the key point inference process. Therefore, in this embodiment, the plurality of subsets are fused and then input into the deep Hough voting network. It should be noted that the votes in each subset are processed independently by a shared perceptron and then passed to the voting clustering module for processing, so as to obtain the 3D key points of each subset.
Step 104, fitting each 3D key point through a RANSAC fitting algorithm to obtain a 6DoF pose estimation result.
It should be noted that reflection and similar factors make the point cloud data highly scattered and noisy. Because the RANSAC algorithm is not restricted by parametric fitting assumptions, it is better suited to scenes affected by uncertain factors such as illumination intensity, and it facilitates the final fitting. Therefore, each 3D key point is fitted through the RANSAC fitting algorithm, and the 6DoF pose estimation result of the reflective workpiece is obtained.
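One way to picture the RANSAC fitting step is the sketch below, which estimates a rotation and translation from predicted 3D key points that contain outliers. It is an illustration under assumptions not stated in the application: the key point positions in the object model frame (`model_kps`) are known, each predicted key point corresponds to one model key point, and each hypothesis is fitted with the SVD-based Kabsch method. The names `fit_rigid` and `ransac_pose`, the threshold and the iteration count are all hypothetical.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares R, t such that dst is approximately src @ R.T + t (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # avoid returning a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, dst_c - r @ src_c

def ransac_pose(model_kps, pred_kps, threshold=0.01, iterations=200, seed=0):
    """Fit a 6DoF pose to key point pairs while ignoring outlier predictions."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(model_kps), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(model_kps), size=3, replace=False)
        r, t = fit_rigid(model_kps[sample], pred_kps[sample])
        residual = np.linalg.norm(model_kps @ r.T + t - pred_kps, axis=1)
        inliers = residual < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found no consistent set of key points")
    r, t = fit_rigid(model_kps[best], pred_kps[best])  # refit on all inliers
    return r, t, best

# Toy data: 8 key points, a known pose, and two corrupted predictions.
rng = np.random.default_rng(1)
model_kps = rng.uniform(-0.1, 0.1, size=(8, 3))
angle = np.deg2rad(30.0)
r_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.5])
pred_kps = model_kps @ r_true.T + t_true
pred_kps[0] += 0.5   # outlier, e.g. a key point corrupted by reflection
pred_kps[5] -= 0.5   # second outlier
r_est, t_est, inliers = ransac_pose(model_kps, pred_kps)
```

In this toy case the two corrupted key points are rejected and the pose is recovered from the remaining six, which is the kind of noise tolerance the step above relies on.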
In the method for estimating the 6DoF pose of the reflective workpiece provided by this embodiment, a convolutional neural network first extracts the surface information of the reflective workpiece while a point cloud network extracts the geometric information in the point cloud and its normal map. A divide-and-conquer feature extraction algorithm then divides the features equally into a plurality of subsets, and a deep Hough voting network detects the 3D key points of each subset; the dense 2D-3D correspondence is robust to scenes in which workpiece information is severely lost due to reflection and similar causes. Finally, the 6DoF pose parameters are estimated from the predicted key points by RANSAC, a voting-style method that is not restricted by parametric fitting assumptions, so the method tolerates illumination-induced noise better and is more accurate under severe reflection. This solves the technical problem that, in existing industrial robot tasks of grasping workpieces such as metal parts, the pose estimation effect is poor because of reflection on the workpiece surface or because of unstructured factors such as mutual occlusion when the workpieces are placed at random.
The above is a first embodiment of the method for estimating the 6DoF pose of the reflective workpiece provided in the embodiment of the present application, and the following is a second embodiment of the method for estimating the 6DoF pose of the reflective workpiece provided in the embodiment of the present application.
Referring to FIG. 2, FIG. 2 is a schematic flow chart of a second embodiment of a method for estimating a 6DoF pose of a reflective workpiece according to an embodiment of the present application.
The method for estimating the 6DoF pose of the reflective workpiece provided by the embodiment of the application comprises the following steps:
Step 201, inputting the point cloud data of the reflective workpiece into a convolutional neural network and a point cloud network, respectively, and extracting the features of the reflective workpiece, wherein the features comprise: surface information of the reflective workpiece, and geometric information in the point cloud and in its corresponding normal map.
Step 201 of this embodiment is the same as step 101 of the first embodiment; please refer to the description of step 101, which is not repeated here.
Step 202, the features are divided into a plurality of subsets with the same semantic labels according to a clustering algorithm.
Step 203, after the plurality of subsets are fused, voting on each subset through a shared perceptron of the deep Hough voting network, and then passing the votes to a clustering module of the deep Hough voting network for processing, so as to obtain the 3D key points of each subset.
For steps 202 and 203, the VoteNet-based vote clustering and classification technique of this embodiment is described as follows:
A clustering algorithm is used to distinguish different instances that have the same semantic label, so that the points on the same instance vote for the target key points of that instance; the votes are clustered by uniform sampling and by grouping according to spatial proximity.
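From a set of votes $\{v_i = [y_i; g_i] \in \mathbb{R}^{3+C}\}$, $K$ votes are selected by farthest point sampling based on $\{y_i\}$ in 3D Euclidean space, giving $\{v_{i_k}\}$, $k = 1, \dots, K$. $K$ clusters are then formed from the votes near each sampled vote:
$$C_k = \left\{ v_i^{(k)} \,\middle|\, \lVert y_i - y_{i_k} \rVert \le r \right\}, \quad k = 1, \dots, K$$
where $(3 + C)$ is the number of feature dimensions of each vote after the semantic segmentation module (the 3D position $y_i$ plus the $C$-dimensional feature $g_i$), and $r$ is the grouping radius.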
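The sampling and grouping of votes can be sketched as below. This is a minimal illustration, assuming each vote is a row of 3 position values followed by C feature values, that sampling starts from the first vote, and that a cluster collects all votes within a fixed radius of a sampled vote. The names `farthest_point_sampling`, `group_votes` and `radius` are illustrative.

```python
import numpy as np

def farthest_point_sampling(xyz, k):
    """Return indices of k points chosen greedily to be far from each other."""
    chosen = [0]                                   # assumption: start from index 0
    dist = np.linalg.norm(xyz - xyz[0], axis=1)    # distance to nearest chosen point
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[nxt], axis=1))
    return np.array(chosen)

def group_votes(votes, k, radius):
    """votes: (num_votes, 3 + C). Cluster k holds the votes within `radius`
    of the k-th sampled vote, measured on the 3D positions only."""
    xyz = votes[:, :3]
    centers = farthest_point_sampling(xyz, k)
    clusters = [np.flatnonzero(np.linalg.norm(xyz - xyz[c], axis=1) <= radius)
                for c in centers]
    return centers, clusters

# Two tight groups of votes, around (0, 0, 0) and (1, 1, 1), with C = 2 features.
base = np.array([[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [0.0, 0.01, 0.0],
                 [0.0, 0.0, 0.01], [0.01, 0.01, 0.0]])
xyz = np.vstack([base, base + 1.0])
votes = np.hstack([xyz, np.zeros((10, 2))])
centers, clusters = group_votes(votes, k=2, radius=0.2)
print(centers.tolist())  # [0, 9]
```

Here the two sampled votes fall in different groups, and each cluster recovers exactly the five votes of its group.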
The PointNet-based multilayer perceptron aggregation technique is described as follows:
Given a vote cluster $C = \{w_i\}$, $i = 1, \dots, n$, with cluster center $w_j$, where $w_i = [z_i; h_i]$, $z_i \in \mathbb{R}^3$ is the vote location and $h_i \in \mathbb{R}^C$ is the vote feature. To make use of the local vote geometry, the vote locations are converted into local normalized coordinates by $z'_i = (z_i - z_j)/r$. The aggregated input is then passed through a PointNet-like module to generate the cluster output $p(C)$:
$$p(C) = \mathrm{MLP}_2\left\{ \max_{i=1,\dots,n} \left\{ \mathrm{MLP}_1\left([z'_i; h_i]\right) \right\} \right\}$$
The votes in each subset are processed independently by $\mathrm{MLP}_1$, then max-pooled into a single feature vector and passed to $\mathrm{MLP}_2$, in which the information from the different votes is further combined.
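The aggregation just described can be sketched in a few lines. The sketch assumes small fully connected layers with ReLU activations and random weights; the layer sizes, the helper names `mlp`, `aggregate_cluster` and `random_layers`, and the use of the first vote as the cluster center are illustrative choices, not details taken from the application.

```python
import numpy as np

def mlp(x, layers):
    """Apply a stack of (weight, bias) layers with ReLU to the rows of x."""
    for w, b in layers:
        x = np.maximum(x @ w + b, 0.0)
    return x

def aggregate_cluster(z, h, center, radius, mlp1, mlp2):
    """z: (n, 3) vote locations, h: (n, C) vote features."""
    z_local = (z - center) / radius                  # local normalized coordinates
    per_vote = mlp(np.concatenate([z_local, h], axis=1), mlp1)  # each vote independently
    pooled = per_vote.max(axis=0)                    # channel-wise max pooling
    return mlp(pooled[None, :], mlp2)[0]             # combine into one cluster vector

def random_layers(rng, sizes):
    """Hypothetical helper: random weights, zero biases."""
    return [(rng.normal(size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
n, c = 6, 4
z = rng.normal(size=(n, 3))
h = rng.normal(size=(n, c))
mlp1 = random_layers(rng, [3 + c, 16, 16])
mlp2 = random_layers(rng, [16, 8])
center = z[0]                                        # assumption: first vote is the center
out = aggregate_cluster(z, h, center, 0.3, mlp1, mlp2)
perm = rng.permutation(n)
out_perm = aggregate_cluster(z[perm], h[perm], center, 0.3, mlp1, mlp2)
```

Because the pooling is a channel-wise maximum, the cluster output does not depend on the order of the votes, so `out` and `out_perm` agree.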
Step 204, fitting each 3D key point through a RANSAC fitting algorithm to obtain a 6DoF pose estimation result.
Step 204 of this embodiment is the same as step 104 of the first embodiment; please refer to the description of step 104, which is not repeated here.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate that only A is present, only B is present, or both A and B are present, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed method can be implemented in other ways, and the above embodiments are only used for illustrating the technical solutions of the present application, but not limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.