Storage tray detection and positioning method based on an RGBD camera
1. A storage tray detection and positioning method based on an RGBD camera, characterized by comprising the following steps:
Step 1, collecting RGB pictures containing storage trays by on-site photographing and online crawling, storing the pictures, and establishing a training data set; and inputting the training data set into a YOLOv5 network model to train the weight parameters, obtaining a trained YOLOv5 model.
Step 2, setting parameters of the RGBD camera such as image format, resolution, and frame rate to a unified format; starting the RGBD camera after performing hardware initialization and software initialization on it; respectively acquiring an RGB image and a depth image containing a storage tray; and performing Gaussian filtering noise reduction on each acquired frame of the RGB image and the depth image.
Step 3, deploying the YOLOv5 model trained in step 1 on a miniature artificial intelligence edge computing development platform; inputting the collected RGB image data containing storage trays into the trained YOLOv5 network model and performing inference calculation with the deep learning inference optimizer TensorRT, realizing target detection in the RGB images; if no target storage tray is detected, returning to step 2 to acquire images again; if a target is detected, marking the target storage tray in the RGB image with a rectangular frame.
Step 4, aligning the depth image containing the storage tray under the same timestamp with the RGB image, and acquiring the depth Z_c of the geometric center of the rectangular frame of the target storage tray in the RGB image and the depths Z_1 and Z_2 of the middle points of its left and right boundaries; and calibrating and acquiring the intrinsic parameter matrix of the RGB camera of the RGBD camera according to the camera calibration principle.
Step 5, calculating the coordinates o(u, v) of the geometric center o of the target rectangular frame of step 3 in the pixel coordinate system, and the coordinates a(u_1, v_1) and b(u_2, v_2) of the middle points a and b of the left and right boundaries of the frame; and then converting the calculated pixel coordinates into image coordinates through the translation relation, using the intrinsic parameters of the RGB camera.
Step 6, converting the 2D coordinates in the image coordinate system of step 5 into 3D coordinates in the camera coordinate system according to the perspective projection relation; and calculating the relative attitude of the tray plane in the camera coordinate system from the 3D coordinates of the two points a and b of step 5.
2. The storage tray detection and positioning method based on an RGBD camera as claimed in claim 1, wherein establishing the training data set in step 1 specifically comprises:
1-1, acquiring 500 storage tray pictures with a resolution of 1080p or higher, by photographing storage trays on site and by crawling pictures from the internet, to construct a training data set comprising storage tray pictures shot from multiple angles;
1-2, annotating and saving the training data set pictures with the LabelImg annotation tool, using the YOLO annotation format, with pictures containing a storage tray as positive samples and pictures containing no storage tray as negative samples.
3. The storage tray detection and positioning method based on an RGBD camera as claimed in claim 1, wherein the Gaussian filtering noise reduction in step 2 comprises: performing a weighted average over the whole image, the value of each pixel being replaced by a weighted average of its own value and the values of the other pixels in its neighborhood. The Gaussian filter equation is as follows:

G(x, y) = (1 / (2πσ²)) exp(-(x² + y²) / (2σ²))

where σ is the standard deviation and (x, y) are the pixel coordinates.
4. The storage tray detection and positioning method based on an RGBD camera as claimed in claim 1, wherein the specific process of aligning the depth image with the RGB image in step 4 is: calling the official SDK interface of the RGBD camera to map each pixel in the depth image to the corresponding pixel in the RGB image, i.e., aligning the depth image with the RGB image.
5. The storage tray detection and positioning method based on an RGBD camera as claimed in claim 1, wherein the intrinsic parameter matrix of the RGB camera obtained in step 4 has the following form:

K = | f_x   0    u_0 |
    |  0   f_y   v_0 |
    |  0    0     1  |
where f_x and f_y are the focal lengths in the x and y directions, and (u_0, v_0) are the principal point coordinates; the conversion from pixel coordinates to image coordinates and then to camera coordinates is realized through the intrinsic parameter matrix.
6. The storage tray detection and positioning method based on an RGBD camera as claimed in claim 1, wherein in step 5 the pixel coordinates are converted into image coordinates according to the translation relation, with the calculation formula:
x_i = (u_i - u_0)d_x,  y_i = (v_i - v_0)d_y,  i = c, 1, 2
where (u_0, v_0) are the principal point coordinates, i.e., the coordinates of the origin of the image coordinate system in the pixel coordinate system, in pixels, with (u_c, v_c) denoting o(u, v); d_x and d_y are the actual physical size of each pixel in the u-axis and v-axis directions, in mm. The coordinates of o, a, and b in the image coordinate system are o(x_c, y_c), a(x_1, y_1), and b(x_2, y_2), respectively.
7. The storage tray detection and positioning method based on an RGBD camera as claimed in claim 1, wherein in step 6 the 2D image coordinates are converted into 3D coordinates in the camera coordinate system according to the perspective projection relation, with the calculation formula:

X_i = x_i Z_i / f,  Y_i = y_i Z_i / f,  i = c, 1, 2
where f is the focal length of the RGB camera and Z_i are the depth values. From the 2D coordinates o(x_c, y_c), a(x_1, y_1), and b(x_2, y_2) in the image coordinate system of step 5, the 3D coordinates in the camera coordinate system are calculated as O(X_c, Y_c, Z_c), A(X_1, Y_1, Z_1), and B(X_2, Y_2, Z_2), respectively.
The coordinate conversion from the coordinates (u, v) in the pixel coordinate system to the coordinates (X_c, Y_c, Z_c) in the camera coordinate system, written in matrix form, is:

Z_c [u, v, 1]^T = K [X_c, Y_c, Z_c]^T
where K is the intrinsic parameter matrix of the RGB camera and Z_c is the depth value at the point (u, v).
8. The storage tray detection and positioning method based on an RGBD camera as claimed in claim 1, wherein in step 6 the relative attitude of the tray plane in the camera coordinate system is calculated from the 3D coordinates of the two points a and b, with the calculation formula:

φ = arctan((Z_2 - Z_1) / (X_2 - X_1))
where φ is the yaw angle of the storage tray plane relative to the camera; the pose of the tray plane relative to the camera, i.e., the relative displacement O(X_c, Y_c, Z_c) and the relative attitude φ, is thus obtained from steps 5 and 6.
Background
With the rapid development of logistics information technology, robots have been widely used in many fields. Among them, the automated guided vehicle (AGV) is the most widely adopted and has the broadest range of applications, and can be found in warehouses large and small. The pallet forklift AGV is a common fork-type AGV: its load capacity is large, its movement is flexible, and its working efficiency is high, so it plays an important role in intelligent warehousing. Pallet forklift AGVs are mainly divided into fixed-rail and trackless forklifts. A fixed-rail forklift has a fixed travel path and a relatively single working mode, whereas a trackless forklift is not limited to a fixed path, has a higher degree of intelligence, wider application scenarios, and more flexible implementation; in terms of cost, technology, and application, the trackless forklift is the future direction of automated forklift development.
Because the warehouse environment is complex and changeable, and is affected by factors such as uneven illumination, limited overall positioning accuracy, and accumulated transport errors, automated handling is difficult to achieve. Overcoming these difficulties to realize accurate, fast, and robust detection and positioning of storage trays is therefore key to improving the intelligence level of automated forklifts.
Existing storage tray detection methods are mainly based on lidar, vision sensors, or a combination of the two. Although lidar-based detection is robust to illumination changes, the sensor is generally too expensive, its data is sparse, and the features it captures are not distinctive, so it is not suitable for large-scale use. Vision sensors provide rich feature information at low cost, and are therefore more suitable for practical applications than lidar-based methods.
Disclosure of Invention
The invention aims to overcome the difficulties in realizing automated forklift handling and the shortcomings of existing storage tray detection methods, and provides a storage tray detection and positioning method based on an RGBD camera. The method involves an RGBD camera and a miniature artificial intelligence edge computing development platform: the RGBD camera collects images, and a trained YOLOv5 model deployed on the platform performs inference calculation. The RGB image is input into the YOLOv5 network for tray detection, and the position and attitude of the tray center relative to the camera are calculated by acquiring the tray depth information from the depth map and combining it with the camera intrinsic parameters.
The purpose of the invention is realized by the following technical scheme: a storage tray detection and positioning method based on an RGBD camera, comprising the following steps:
Step 1, collecting RGB pictures containing storage trays by on-site photographing and online crawling, storing the pictures, and establishing a training data set; and inputting the training data set into a YOLOv5 network model to train the weight parameters, obtaining a trained YOLOv5 model.
Step 2, setting parameters of the RGBD camera such as image format, resolution, and frame rate to a unified format; starting the RGBD camera after performing hardware initialization and software initialization on it; respectively acquiring an RGB image and a depth image containing a storage tray; and performing Gaussian filtering noise reduction on each acquired frame of the RGB image and the depth image.
Step 3, deploying the YOLOv5 model trained in step 1 on a miniature artificial intelligence edge computing development platform; inputting the collected RGB image data containing storage trays into the trained YOLOv5 network model and performing inference calculation with the deep learning inference optimizer TensorRT, realizing target detection in the RGB images; if no target storage tray is detected, returning to step 2 to acquire images again; if a target is detected, marking the target storage tray in the RGB image with a rectangular frame.
Step 4, aligning the depth image containing the storage tray under the same timestamp with the RGB image, and acquiring the depth Z_c of the geometric center of the rectangular frame of the target storage tray in the RGB image and the depths Z_1 and Z_2 of the middle points of its left and right boundaries; and calibrating and acquiring the intrinsic parameter matrix of the RGB camera of the RGBD camera according to the camera calibration principle.
Step 5, calculating the coordinates o(u, v) of the geometric center o of the target rectangular frame of step 3 in the pixel coordinate system, and the coordinates a(u_1, v_1) and b(u_2, v_2) of the middle points a and b of the left and right boundaries of the frame; and then converting the calculated pixel coordinates into image coordinates through the translation relation, using the intrinsic parameters of the RGB camera.
Step 6, converting the 2D coordinates in the image coordinate system of step 5 into 3D coordinates in the camera coordinate system according to the perspective projection relation; and calculating the relative attitude of the tray plane in the camera coordinate system from the 3D coordinates of the two points a and b of step 5.
Further, establishing the training data set in step 1 specifically comprises:
1-1, acquiring 500 storage tray pictures with a resolution of 1080p or higher, by photographing storage trays on site and by crawling pictures from the internet, to construct a training data set comprising storage tray pictures shot from multiple angles;
1-2, annotating and saving the training data set pictures with the LabelImg annotation tool, using the YOLO annotation format, with pictures containing a storage tray as positive samples and pictures containing no storage tray as negative samples.
Further, the specific process of the Gaussian filtering noise reduction in step 2 comprises: performing a weighted average over the whole image, the value of each pixel being replaced by a weighted average of its own value and the values of the other pixels in its neighborhood. The Gaussian filter equation is as follows:

G(x, y) = (1 / (2πσ²)) exp(-(x² + y²) / (2σ²))

where σ is the standard deviation and (x, y) are the pixel coordinates.
Further, the specific process of aligning the depth image with the RGB image in step 4 is: calling the official SDK interface of the RGBD camera to map each pixel in the depth image to the corresponding pixel in the RGB image, i.e., aligning the depth image with the RGB image.
Further, in step 4 the intrinsic parameter matrix of the RGB camera needs to be obtained; it has the following form:

K = | f_x   0    u_0 |
    |  0   f_y   v_0 |
    |  0    0     1  |
where f_x and f_y are the focal lengths in the x and y directions, and (u_0, v_0) are the principal point coordinates; the conversion from pixel coordinates to image coordinates and then to camera coordinates is realized through the intrinsic parameter matrix.
Further, in step 5 the pixel coordinates are converted into image coordinates according to the translation relation, with the calculation formula:
x_i = (u_i - u_0)d_x,  y_i = (v_i - v_0)d_y,  i = c, 1, 2
where (u_0, v_0) are the principal point coordinates, i.e., the coordinates of the origin of the image coordinate system in the pixel coordinate system, in pixels, with (u_c, v_c) denoting o(u, v); d_x and d_y are the actual physical size of each pixel in the u-axis and v-axis directions, in mm. The coordinates of o, a, and b in the image coordinate system are o(x_c, y_c), a(x_1, y_1), and b(x_2, y_2), respectively.
Further, in step 6 the 2D image coordinates are converted into 3D coordinates in the camera coordinate system according to the perspective projection relation, with the calculation formula:

X_i = x_i Z_i / f,  Y_i = y_i Z_i / f,  i = c, 1, 2
where f is the focal length of the RGB camera and Z_i are the depth values. From the 2D coordinates o(x_c, y_c), a(x_1, y_1), and b(x_2, y_2) in the image coordinate system of step 5, the 3D coordinates in the camera coordinate system are calculated as O(X_c, Y_c, Z_c), A(X_1, Y_1, Z_1), and B(X_2, Y_2, Z_2), respectively.
The coordinate conversion from the coordinates (u, v) in the pixel coordinate system to the coordinates (X_c, Y_c, Z_c) in the camera coordinate system, written in matrix form, is:

Z_c [u, v, 1]^T = K [X_c, Y_c, Z_c]^T
where K is the intrinsic parameter matrix of the RGB camera and Z_c is the depth value at the point (u, v).
Further, in step 6 the relative attitude of the tray plane in the camera coordinate system is calculated from the 3D coordinates of the two points a and b, with the calculation formula:

φ = arctan((Z_2 - Z_1) / (X_2 - X_1))
where φ is the yaw angle of the storage tray plane relative to the camera; the pose of the tray plane relative to the camera, i.e., the relative displacement O(X_c, Y_c, Z_c) and the relative attitude φ, is thus obtained from steps 5 and 6.
The beneficial effects of the invention are as follows: the invention provides a storage tray detection and positioning method based on an RGBD camera. Exploiting the RGBD camera's ability to acquire both RGB images and depth images, the method first feeds the RGB image into a trained YOLOv5 network for inference calculation, then obtains the depth information of the storage tray from the depth image, and finally performs coordinate transformation by combining the depth information with the camera intrinsic parameters through the projection principle, calculating the displacement and attitude of the storage tray relative to the RGBD camera. The invention requires no feature tags to be laid out and no specified tray color or form; compared with the prior art, it is simple to implement, highly accurate, and strongly robust, and greatly improves the safety and intelligence level of automated forklifts.
Drawings
Fig. 1 is a flow chart of the storage tray detection and positioning method based on an RGBD camera.
Fig. 2 shows the storage tray detection result.
Fig. 3 is a front view of the storage tray during detection.
Fig. 4 is a top view of the storage tray during detection.
Detailed Description
The technical solutions of the present invention are further described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them; all other embodiments derived by a person skilled in the art from the embodiments given herein without inventive effort fall within the scope of the present invention.
As shown in Fig. 1, the invention provides a storage tray detection and positioning method based on an RGBD camera, comprising the following steps:
the RGBD camera used in the embodiment of the invention is Realsense D435, and the camera comprises an RGB camera, two infrared cameras and an infrared projector, wherein the RGB camera is used for acquiring RGB images of 8-bit 3 channels; the infrared camera acquires depth distance information of each pixel point by using a binocular principle, namely a depth image of a 16-bit single channel; the infrared projector is used for performing infrared light supplement in a scene with dark light.
Step 1-1, collecting and storing RGB pictures of storage trays and establishing a training data set, specifically comprising:
(1) acquiring 500 storage tray pictures with a resolution of 1080p or higher, by photographing storage trays on site with the RealSense D435 camera and by crawling pictures from the internet, to construct a training data set comprising storage tray pictures shot from multiple angles;
(2) annotating and saving the training data set pictures with the LabelImg annotation tool, using the YOLO annotation format, with pictures containing a storage tray as positive samples and pictures containing no storage tray as negative samples.
Step 1-2, inputting the training data set into the YOLOv5 network model to train the weight parameters and obtain the trained YOLOv5 model, specifically comprising:
(1) configuring a virtual environment, downloading the latest YOLOv5 source code locally, and installing the environment required to run YOLOv5;
(2) installing the necessary dependency packages in the virtual environment according to the requirements.txt file;
(3) downloading a pre-trained model from the link given by YOLOv5 and placing it in the same directory as detect.py;
(4) selecting, in view of the large size of the storage tray target and the limited hardware performance of the development platform, the fastest YOLOv5 model variant suitable for large-target detection;
(5) modifying the parameter configuration of YOLOv5: since the only detection class is the storage tray, only the number of classes needs to be changed to 1, with all other parameters kept at their default values;
(6) starting training;
(7) saving the original model obtained from YOLOv5 training as a .pt file; a minimal training sketch is given after this list.
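As an illustration only, the training step can be scripted as below. This is a minimal sketch assuming the standard ultralytics/yolov5 repository layout; the dataset paths, the yolov5s.pt weights file, the image size, and the epoch count are illustrative assumptions, not values specified by the invention.

```python
# Minimal training sketch (assumed: standard yolov5 repo layout; paths,
# weights file, image size, and epoch count are illustrative).
import subprocess
from pathlib import Path

# Hypothetical single-class dataset configuration (number of classes = 1, per step (5)).
dataset_yaml = """\
train: datasets/pallet/images/train
val: datasets/pallet/images/val
nc: 1
names: ['pallet']
"""
Path("data/pallet.yaml").write_text(dataset_yaml)

# Launch training with the pre-trained weights of step (3); yolov5s.pt is
# assumed here as the fastest variant referred to in step (4).
subprocess.run([
    "python", "train.py",
    "--data", "data/pallet.yaml",
    "--weights", "yolov5s.pt",
    "--img", "640",
    "--epochs", "100",
], check=True)
# Training saves the original .pt model of step (7), e.g. runs/train/exp/weights/best.pt.
```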
Step 2-1, configuring parameters such as the output image format, resolution, and frame rate of the RealSense D435 RGBD camera to a unified format, and performing hardware and software initialization on the camera. If the initialization succeeds, the camera is started directly to acquire images; if the initialization fails, an exception is thrown and the program terminates.
Step 2-2, starting the RealSense D435 RGBD camera and acquiring the RGB image and the depth image respectively in two processing modes. If an image is acquired successfully, the next step performs preprocessing on it; if acquisition fails, the image data is acquired again.
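A minimal sketch of steps 2-1 and 2-2 using the pyrealsense2 SDK follows; the 640×480 resolution and 30 fps frame rate are illustrative choices, not values fixed by the invention.

```python
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Unified output format: 8-bit 3-channel color and 16-bit single-channel depth.
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)

try:
    profile = pipeline.start(config)        # hardware/software initialization
except RuntimeError as e:
    raise SystemExit(f"camera initialization failed: {e}")  # throw and terminate

frames = pipeline.wait_for_frames()         # blocks until a frameset arrives
color_frame = frames.get_color_frame()
depth_frame = frames.get_depth_frame()
if not color_frame or not depth_frame:
    pass                                    # acquisition failed: re-acquire next loop
```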
Step 2-3, performing image preprocessing on each collected RGB image and depth image, using Gaussian filtering to reduce noise. The specific process of the Gaussian filtering noise reduction is: performing a weighted average over the whole image, the value of each pixel being replaced by a weighted average of its own value and the values of the other pixels in its neighborhood. The Gaussian filter equation is as follows:

G(x, y) = (1 / (2πσ²)) exp(-(x² + y²) / (2σ²))

where σ is the standard deviation and (x, y) are the pixel coordinates.
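Continuing the sketch above, the filtering can be done with OpenCV; the 5×5 kernel and σ = 1.0 are illustrative parameter choices, not values from the invention.

```python
import cv2
import numpy as np

# Convert the RealSense frames acquired above into numpy arrays.
color_image = np.asanyarray(color_frame.get_data())
depth_image = np.asanyarray(depth_frame.get_data())

# Each output pixel becomes a Gaussian-weighted average of its neighborhood.
color_denoised = cv2.GaussianBlur(color_image, (5, 5), sigmaX=1.0)
depth_denoised = cv2.GaussianBlur(depth_image, (5, 5), sigmaX=1.0)
```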
Step 3-1, deploying the YOLOv5 model trained in step 1-2 on the miniature artificial intelligence edge computing development platform. The inference process uses the deep learning inference optimizer TensorRT, an SDK for deep learning inference that provides an API and parsers to import trained models from deep learning frameworks and generate an inference engine, reducing the amount of computation while keeping accuracy and realizing fast, efficient deployment of inference; the network outputs the obtained calculation result. The miniature artificial intelligence edge computing development platform carries an 8-core 64-bit ARM CPU and a 512-core GPU with Tensor Cores, comes pre-installed with Ubuntu 18.04, and has the software drivers and libraries installed, including ROS, CUDA, cuDNN, the GPU driver, OpenCV, and TensorRT.
Step 3-2, analyzing and processing the image: inputting the acquired image data into the trained YOLOv5 network model for inference calculation, with the network outputting the calculation result. The method specifically comprises:
(1) installing TensorRT, downloading the version of the deep learning inference optimizer TensorRT corresponding to the installed CUDA and cuDNN versions;
(2) creating network and builder objects with the TensorRT API, and then converting the YOLOv5 model from .pt format into a .wts binary file;
(3) building an inference engine with the builder object, loading the weights, obtaining the YOLOv5 output head, and serializing the engine locally, generating yolov5.engine;
(4) reading yolov5.engine, creating a runtime to deserialize and load the engine, and loading an IExecutionContext to execute the inference calculation;
(5) framing the detected target with a rectangular box and outputting the detection result, as shown in Fig. 2; a sketch of the inference of sub-step (4) is given after this list.
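A minimal sketch of sub-step (4) with the TensorRT Python API and pycuda follows, assuming a static-shape engine with one input binding and one output binding; the 1×3×640×640 input shape is a typical YOLOv5 assumption, and pre-/post-processing (letterboxing, decoding, non-maximum suppression, box drawing) is omitted.

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
with open("yolov5.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())  # deserialize the engine
context = engine.create_execution_context()            # IExecutionContext

# Host/device buffers; binding 0 = input, binding 1 = output (assumed layout).
h_input = np.zeros((1, 3, 640, 640), dtype=np.float32)
h_output = np.zeros(tuple(engine.get_binding_shape(1)), dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

cuda.memcpy_htod(d_input, h_input)                  # upload preprocessed image
context.execute_v2([int(d_input), int(d_output)])   # synchronous inference
cuda.memcpy_dtoh(h_output, d_output)                # download raw detections
```

The .pt-to-.wts conversion and engine serialization of sub-steps (2)-(3) follow the workflow the patent describes; exact helper scripts depend on the deployment repository used and are not reproduced here.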
Step 3-3, detecting the target in the RGB image: if no target storage tray is detected, returning to step 2-2 to acquire images again; if a target is detected, marking the target storage tray in the RGB image with a rectangular frame and proceeding to the next step.
Step 4-1, aligning the depth image under the same timestamp with the RGB image, and acquiring the depth Z_c of the geometric center of the target storage tray in the RGB image and the depths Z_1 and Z_2 of the middle points of the left and right boundaries. Specifically, the SDK interface of the RealSense D435 camera is called to map each pixel in the depth image to the corresponding pixel in the RGB image, i.e., to align the depth image with the RGB image, after which the depth values of the region corresponding to the tray are read.
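A minimal sketch of this alignment follows, reusing the pipeline from the step 2 sketch; the pixel positions (u, v), (u1, v1), and (u2, v2) are assumed to come from the rectangular frame returned in step 3-3.

```python
import pyrealsense2 as rs

align = rs.align(rs.stream.color)       # processing block mapping depth onto RGB
frames = pipeline.wait_for_frames()     # pipeline from the step 2 sketch
aligned = align.process(frames)
depth_frame = aligned.get_depth_frame()

# Depth in meters at the box center o(u, v) and at the left/right
# boundary midpoints a(u1, v1) and b(u2, v2).
Zc = depth_frame.get_distance(u, v)
Z1 = depth_frame.get_distance(u1, v1)
Z2 = depth_frame.get_distance(u2, v2)
```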
Step 4-2, calibrating and obtaining the intrinsic parameter matrix of the RGB camera of the RealSense D435 RGBD camera according to the camera calibration principle. The intrinsic parameter matrix has the form:

K = | f_x   0    u_0 |
    |  0   f_y   v_0 |
    |  0    0     1  |
where f_x and f_y are the focal lengths in the x and y directions, and (u_0, v_0) are the principal point coordinates; the conversion from pixel coordinates to image coordinates and then to camera coordinates is realized through the intrinsic parameter matrix.
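As a sketch of one way to obtain these values, the RealSense SDK can also report the factory RGB intrinsics directly (an alternative to offline checkerboard calibration, which the patent equally allows); the attribute names below follow the pyrealsense2 API.

```python
profile = pipeline.get_active_profile()  # pipeline from the step 2 sketch
color_profile = profile.get_stream(rs.stream.color).as_video_stream_profile()
intr = color_profile.get_intrinsics()

fx, fy = intr.fx, intr.fy     # focal lengths, in pixels
u0, v0 = intr.ppx, intr.ppy   # principal point (u_0, v_0)

K = [[fx, 0.0, u0],
     [0.0, fy, v0],
     [0.0, 0.0, 1.0]]         # intrinsic parameter matrix K
```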
Step 5-1, calculating the coordinates o(u, v) of the geometric center o of the target rectangular frame of step 3-3 in the pixel coordinate system, used to calculate the displacement of the storage tray relative to the camera; Fig. 3 shows a front view of the storage tray during detection.
Step 5-2, calculating the coordinates a(u_1, v_1) and b(u_2, v_2) of the middle points a and b of the left and right boundaries of the target rectangular frame of step 3-3 in the pixel coordinate system, used to calculate the angular attitude of the storage tray relative to the camera; Fig. 4 shows a top view of the storage tray during detection.
Step 5-3, converting the pixel coordinates of steps 5-1 and 5-2 into image coordinates according to the translation relation. The specific calculation formula is:

x_i = (u_i - u_0)d_x,  y_i = (v_i - v_0)d_y,  i = c, 1, 2
where (u_0, v_0) are the principal point coordinates, i.e., the coordinates of the origin of the image coordinate system in the pixel coordinate system, in pixels, with (u_c, v_c) denoting o(u, v); d_x and d_y are the actual physical size of each pixel in the u-axis and v-axis directions, in mm. The coordinates of o, a, and b in the image coordinate system are o(x_c, y_c), a(x_1, y_1), and b(x_2, y_2), respectively; see Fig. 3.
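A minimal sketch of this translation follows; the pixel pitch values d_x = d_y = 0.003 mm are illustrative placeholders, not datasheet values.

```python
def pixel_to_image(u, v, u0, v0, dx, dy):
    """Translate pixel coordinates (u, v) to image-plane coordinates (x, y)."""
    return (u - u0) * dx, (v - v0) * dy

xc, yc = pixel_to_image(u, v, u0, v0, 0.003, 0.003)    # center o
x1, y1 = pixel_to_image(u1, v1, u0, v0, 0.003, 0.003)  # left midpoint a
x2, y2 = pixel_to_image(u2, v2, u0, v0, 0.003, 0.003)  # right midpoint b
```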
Step 5-4, converting the 2D coordinates in the image coordinate system of step 5-3 into 3D coordinates in the camera coordinate system according to the perspective projection relation. The calculation formula is:

X_i = x_i Z_i / f,  Y_i = y_i Z_i / f,  i = c, 1, 2
where f is the focal length of the RGB camera and Z_i are the depth values acquired in step 4-1. From the 2D coordinates o(x_c, y_c), a(x_1, y_1), and b(x_2, y_2) in the image coordinate system of step 5-3, the 3D coordinates in the camera coordinate system are calculated as O(X_c, Y_c, Z_c), A(X_1, Y_1, Z_1), and B(X_2, Y_2, Z_2), respectively; see Fig. 4.
The coordinate conversion from the coordinates (u, v) in the pixel coordinate system to the coordinates (X_c, Y_c, Z_c) in the camera coordinate system, written in matrix form, is:

Z_c [u, v, 1]^T = K [X_c, Y_c, Z_c]^T
where K is the intrinsic parameter matrix of the RGB camera and Z_c is the depth value at the point (u, v).
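A minimal sketch of the back-projection follows; f must be in the same physical units as x and y (mm here), and the equivalent pixel-unit form X = (u - u_0)Z/f_x, Y = (v - v_0)Z/f_y may be used instead with the pixel focal lengths f_x and f_y from the intrinsics sketch above.

```python
def image_to_camera(x, y, Z, f):
    """Back-project image point (x, y) with depth Z into camera coordinates."""
    return x * Z / f, y * Z / f, Z

Xc, Yc, _ = image_to_camera(xc, yc, Zc, f)  # O: tray center
X1, Y1, _ = image_to_camera(x1, y1, Z1, f)  # A: left boundary midpoint
X2, Y2, _ = image_to_camera(x2, y2, Z2, f)  # B: right boundary midpoint
```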
Step 5-5, calculating the relative attitude of the tray plane in the camera coordinate system from the 3D coordinates of the two points A and B of step 5-4. The calculation formula is:

φ = arctan((Z_2 - Z_1) / (X_2 - X_1))
where φ is the yaw angle of the storage tray plane relative to the camera; the pose of the tray plane relative to the camera, i.e., the relative displacement O(X_c, Y_c, Z_c) and the relative attitude φ, is obtained through the above steps.
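A one-line sketch of the attitude computation follows; atan2 is used rather than a plain arctangent to avoid division by zero when X_2 = X_1.

```python
import math

# Yaw of the tray plane from the offset between A(X1, Y1, Z1) and B(X2, Y2, Z2).
phi = math.atan2(Z2 - Z1, X2 - X1)  # radians
# Full relative pose: displacement (Xc, Yc, Zc) plus yaw angle phi.
```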
The invention requires no feature tags to be laid out and no specified tray color or form; compared with the prior art, it is simple to implement, highly accurate, and strongly robust, and greatly improves the safety and intelligence level of automated forklifts. It should be noted that the steps of the storage tray detection and positioning method based on an RGBD camera provided by the invention may be implemented with modules, devices, and units corresponding to the method.
Those skilled in the art will appreciate that the method and the various apparatuses provided by the invention may be regarded as a hardware system, and the means included for implementing the various functions may be regarded as structures within hardware components; the means for performing the functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the specific embodiments described above; various modifications or alterations may be made by those skilled in the art without departing from the technical principles of the invention, and these likewise fall within the scope of the invention. In the absence of conflict, the embodiments of the present application and the features within the embodiments may be combined with each other arbitrarily.