Human body key point data expansion method based on VR
1. A human body key point data expansion method based on virtual reality VR is characterized in that: the method comprises the following steps:
s1: a user wears a virtual reality head mounted display (VR) HMD;
s2: synchronizing time service of a VR HMD and camera connection device, and setting sensor data acquisition frequency of the VR HMD to be consistent with shooting frequency of a camera;
s3: establishing a camera aperture center as an original point OcWith the coordinate axis Xc、YcAnd Zc(ii) a Set up with the user directly belowThe surface being an origin OwWith coordinate axis X in the world coordinate systemw、YwAnd Zw(ii) a Placing a camera at a certain distance right in front of a user, keeping the angle between a camera coordinate system and a world coordinate system consistent, shooting user actions to be identified, and acquiring RGB (red, green and blue) videos of the user actions;
s4: extracting 2D pixel coordinates of key points of each frame and HMD sensor data;
s5: calculating the origin O of the camera relative to the world coordinate systemwThe offset vector T and the rotation matrix R after translation and rotation;
s6: obtaining pixel coordinates (u, v) of the key points of the human body in the RGB picture from the obtained RGB video by using a key point extraction method;
s7: using a camera calibration method based on a calibration object to acquire a camera internal parameter f;
s8: acquiring a world coordinate system (X) of a corresponding point through pixel coordinates (u, v) of four key points of a nose, a neck and left and right shoulders of a user by using a camera imaging principlew,Yw,Zw);
S9: calculating the 3D world coordinate System (X) of the keypointsw,Yw,Zw);
S10: calculating an offset vector T and a rotation matrix R of a required shooting angle and position;
s11: calculating and storing the 2D pixel coordinates (u, v) of the corresponding key point at the new camera position;
s12: judging whether the 2D pixel coordinates (u, v) reach the required data volume; if so, ending the program; if not, the process returns to S10.
2. The method of claim 1, in which the method comprises, for each of the one or more virtual reality VR-based human body key point data augmentation: the user actions include lying down sitting, nodding and standing up.
3. The method of claim 2, in which the method comprises, for each of the VR-based human body key point data augmentation: the offset vector T is calculated as follows:
the calculation formula of the rotation matrix R is as follows:
R=RxRyRz
wherein, theta,And σ is the angle of rotation about the x, Y and z axes, respectively, Y0As the origin of coordinates O of the cameracRelative world origin of coordinates OwDistance in the Y-axis direction; z0As the origin of coordinates O of the cameracRelative world origin of coordinates OwAnd the distance in the Z-axis direction.
4. The method of claim 3, in which the method comprises, for each of the one or more virtual reality VR models, a method of human body key point data augmentation in which: the world coordinate system (X)w,Yw,Zw) The calculation formula of (a) is as follows:
wherein Z isvrObtaining depth information for VR sensor during user movement, (u)0,v0) Half the resolution of the image.
5. The method of claim 4, in which the method comprises the steps of: the calculation formula of the 2D pixel coordinates (u, v) in S11 is as follows:
Background
In an online platform implemented by using VR technology, users all exist in a virtual scene in an avatar mode. These avatars can follow the user's movements, but the specific actions require the user to actively select, and the avatars cannot directly synchronize the user's actions, which undoubtedly reduces the sense of immersion of the virtual world. The actions of the user are automatically recognized in a deep learning mode, action data are needed as input, and the action data have multiple modes and are good and bad.
The data set of the motion recognition is mainly divided into an RGB video data set, a depth video data set and a human body skeleton sequence data set according to data modalities. The human skeleton sequence can also be called as human key points, the 3D human key points can be obtained by a depth camera, and can also be obtained by a binocular camera and a key point extraction algorithm. The 2D human body key points can be obtained through an RGB video and key point extraction algorithm, and specific data are 2D pixel coordinates of a certain key point (such as a neck, a left eye and a right eye) of a human body in an RGB picture. The 2D human key points can be used for motion recognition, but when the RGB video, the 2D key points and the 3D key points collected by the same motion are respectively used for motion recognition, the effect of using the 2D key points is often inferior to that of using the RGB video or the 3D human key points. But adopt human key point to carry out action recognition, compare RGB video's advantage in being difficult to receive the illumination influence, can better protect user privacy.
In order to improve the effect of using the 2D key points as motion recognition, the method can be implemented by adding data of the 2D key points of the same motion, which are taken at different angles and distances. In general this requires RGB video that can cover different angles, distances. The RGB video data set for motion recognition can be obtained by collecting network videos or self-shooting, and all reasonable shooting angles cannot be covered in the network videos or the self-shooting.
Without a depth camera, but with a VRHMD (virtual reality head mounted display), coordinate conversion between 2D and 3D can be achieved using depth information provided by the VRHMD. The depth information of the nose, the neck and the left and right shoulders is basically consistent with the depth information collected by the VRHMD in the process of human body movement. In one frame, a 3D world coordinate system of a certain key point of a human body in space is fixed, but 2D pixel coordinates corresponding to the key point in a shot picture are mapped to different positions according to the shooting angle and distance, and the mapping relationship is shown in fig. 1.
Disclosure of Invention
In view of the above, the present invention provides a VR-based method for expanding human body key point data.
In order to achieve the purpose, the invention provides the following technical scheme:
a human body key point data expansion method based on virtual reality VR comprises the following steps:
s1: a user wears a virtual reality head mounted display (VR) HMD;
s2: synchronizing time service of a VR HMD and camera connection device, and setting sensor data acquisition frequency of the VR HMD to be consistent with shooting frequency of a camera;
s3: establishing a camera aperture center as an original point OcWith the coordinate axis Xc、YcAnd Zc(ii) a Establishing a ground point O with the ground right below the user as an originwWith coordinate axis X in the world coordinate systemw、YwAnd Zw(ii) a Placing a camera at a certain distance right in front of a user, keeping the angle between a camera coordinate system and a world coordinate system consistent, shooting user actions to be identified, and acquiring RGB (red, green and blue) videos of the user actions;
s4: extracting 2D pixel coordinates of key points of each frame and HMD sensor data;
s5: calculating the origin O of the camera relative to the world coordinate systemwThe offset vector T and the rotation matrix R after translation and rotation;
s6: obtaining pixel coordinates (u, v) of the key points of the human body in the RGB picture from the obtained RGB video by using a key point extraction method;
s7: using a camera calibration method based on a calibration object to acquire a camera internal parameter f;
s8: acquiring a world coordinate system (X) of a corresponding point through pixel coordinates (u, v) of four key points of a nose, a neck and left and right shoulders of a user by using a camera imaging principlew,Yw,Zw);
S9: calculating the 3D world coordinate System (X) of the keypointsw,Yw,Zw);
S10: calculating an offset vector T and a rotation matrix R of a required shooting angle and position;
s11: calculating and storing the 2D pixel coordinates (u, v) of the corresponding key point at the new camera position;
s12: judging whether the 2D pixel coordinates (u, v) reach the required data volume; if so, ending the program; if not, the process returns to S10.
Optionally, the user actions include groveling, nodding, and standing up.
Optionally, the calculation formula of the offset vector T is as follows:
the calculation formula of the rotation matrix R is as follows:
R=RxRyRz
wherein, theta,And σ is the angle of rotation about the x, Y and z axes, respectively, Y0As the origin of coordinates O of the cameracRelative world origin of coordinates OwDistance in the Y-axis direction; z0As the origin of coordinates O of the cameracRelative world origin of coordinates OwAnd the distance in the Z-axis direction.
Optionally, the world coordinate system (X)w,Yw,Zw) The calculation formula of (a) is as follows:
wherein Z isvrObtaining depth information for VR sensor during user movement, (u)0,v0) Half the resolution of the image.
Optionally, the calculation formula of the 2D pixel coordinates (u, v) in S11 is as follows:
the invention has the beneficial effects that: through VR HMD and ordinary RGB camera, human key point training data under a large amount of different shooting angles, positions has been obtained. The extended 2D human body key point data is combined with VR sensor data to train an action recognition model, so that the precision and the stability of the model are greatly improved.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a mapping relationship diagram;
FIG. 2 is a schematic view of a photograph;
FIG. 3 is a schematic view of a camera after rotation;
FIG. 4 is a flow chart of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Wherein the showings are for the purpose of illustrating the invention only and not for the purpose of limiting the same, and in which there is shown by way of illustration only and not in the drawings in which there is no intention to limit the invention thereto; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
Synchronizing time service of the VR HMD and the camera connecting device, and setting sensor data acquisition frequency of the VR HMD to be consistent with shooting frequency of the camera.
The camera is put at a reasonable position. For the subsequent data enhancement processing, it is necessary to ensure that the center of the camera aperture is the origin OcRelative to the ground right below the user as the origin OwThe offset vector T and rotation matrix R data of the world coordinate system of (3) are known. Usually placed at a distance right in front, the two coordinate systems are kept angularly identical. Shooting user actions needing to be identified, such as lying down, nodding and standing up.
The shooting schematic is shown in fig. 2.
The offset vector formula is as follows:
the rotation matrix R is formulated as follows, theta,σ is the angle of rotation around the x, y, z axes, respectively: y is0As the origin of coordinates O of the cameracRelative world origin of coordinates OwDistance in the Y-axis direction;
R=RxRyRz
and acquiring pixel coordinates (u, v) of the key points of the human body in the RGB picture from the shot RGB video by using a key point extraction method.
And acquiring the camera internal parameter f by using a camera calibration method based on a calibration object.
Acquiring a world coordinate system (X) of a corresponding point through pixel coordinates (u, v) of four key points of a nose, a neck and left and right shoulders by using a camera imaging principlew,Yw,Zw)。
The formula is as follows, ZvrObtaining depth information for VR sensor during user movement, (u)0,v0) Half the resolution of the image.
Calculating the origin O of the camera relative to the world coordinate systemwA translated, rotated offset vector T and a rotation matrix R. 3D world coordinate System (X) of keypointsw,Yw,Zw) The new 2D pixel coordinates (u, v) corresponding to the key points can be obtained through formula calculation without changing, and the formula is as follows:
Z0as the origin of coordinates O of the cameracRelative world origin of coordinates OwDistance in Z-axis direction, Z of the dotted line camera0、Y0Representative camera OcOnly relative to OwAnd rotating, wherein no displacement exists in the X-axis direction and the Z-axis direction, and the displacement corresponds to the last formula (acquiring new 2D pixel coordinates corresponding to key points in a shot picture after the camera rotates through the 3D world coordinates of the key points).
The shooting schematic diagram after the camera is rotated is shown in fig. 3. The dark lines represent the 2D key point data extracted from the original (front) shot video, and the light lines represent the 2D key point data extracted from the RGB picture taken after the camera has rotated (this data is obtained directly from the above formula).
Repeatedly acquiring the origin O of the camera relative to the world coordinate system when shooting a certain framewAfter arbitrary translation and rotation, the extension of the data of the human body key point is realized by the 2D key point pixel coordinate data of the shoulder and above of the user at a new shooting position and angle. The scheme can acquire the 2D pixel coordinates of four key points at any shooting angle and distance.
The overall flow chart of the invention is shown in fig. 4.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.
- 上一篇:石墨接头机器人自动装卡簧、装栓机
- 下一篇:一种手语互译方法及系统