Three-dimensional face recognition generation method and related device
1. A method for generating three-dimensional face recognition is characterized by comprising the following steps:
acquiring an RGB image and a three-dimensional depth image of a target face;
calculating a rotation and translation matrix according to the internal and external parameters of the RGB image and the three-dimensional depth image, wherein the rotation and translation matrix can reflect the mapping relation between the RGB image and the three-dimensional depth image;
obtaining the position relation between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation and translation matrix;
decomposing the color channels of the RGB image into three independent channels of R, G and B;
decomposing a coordinate channel of the three-dimensional depth image into three independent channels of x, y and z;
taking the R, G, B three channels and the x, y and z three channels as six data input channels of a target model, wherein the target model is a model obtained by training a deep neural network model on a plurality of RGB image training samples;
inputting the RGB image and the face image information of the three-dimensional depth image into the target model through the six data input channels;
and generating a three-dimensional image of the target face through the target model.
2. The generation method of claim 1, wherein before the taking of the R, G, B three channels and the x, y and z three channels as the six data input channels of the target model, the method further comprises:
acquiring a group of training samples, wherein the training samples comprise RGB images of at least two human faces;
inputting the training sample into an initial model, wherein the initial model is a model established based on a deep neural network;
calculating a loss value of the training sample through the initial model;
judging whether the loss value is smaller than a preset value or not, and if so, determining the initial model as a target model;
if not, adjusting the parameters of the initial model according to the loss value, and inputting the training sample into the initial model with updated parameters again.
3. The generation method according to claim 2, wherein the calculating a loss value of the training sample through the initial model comprises:
extracting a feature vector of the training sample through the initial model;
and calculating a loss value according to the feature vector.
4. The generation method according to claim 1, wherein after the obtaining of the positional relationship between the RGB image and the three-dimensional depth image by rotation and/or translation according to the rotation-translation matrix, the method further comprises:
aligning the rotated and/or translated RGB image and the three-dimensional depth image to the same size.
5. The generation method according to claim 4, wherein before the inputting of the RGB image and the facial image information of the three-dimensional depth image into the target model through the six data input channels, the method further comprises:
adjusting the input sizes of the six data input channels to be the same as the sizes of the RGB image and the three-dimensional depth image.
6. The generation method according to any one of claims 1 to 5, wherein the taking the R, G, B three channels and the x, y, z three channels as six data input channels of a target model comprises:
and taking the R, G, B three channels and the x, y and z three channels as six data input channels of the target model through a concat function, wherein the concat function is used for connecting a plurality of channels as the input of the next network layer.
7. The generation method according to any one of claims 1 to 5, wherein the generating a three-dimensional image of a target face by the target model includes:
extracting a characteristic vector of the face image information through the target model;
calculating a total loss value of the target model according to the feature vector;
judging whether the total loss value is smaller than a preset value or not, if so, generating a target face three-dimensional image through the target model;
if not, adjusting the parameters of the target model according to the total loss value, and inputting the face image information into the updated target model again.
8. The generation method according to any one of claims 1 to 5, wherein the acquiring the RGB image and the three-dimensional depth image of the target face comprises:
acquiring an RGB image of a target face through an RGB camera;
and acquiring a three-dimensional depth image of the target face through a three-dimensional depth camera.
9. A three-dimensional face recognition generation apparatus, comprising:
the first acquisition unit is used for acquiring an RGB image and a three-dimensional depth image of a target face;
the first calculation unit is used for calculating a rotation and translation matrix according to internal and external parameters of the RGB image and the three-dimensional depth image, and the rotation and translation matrix can reflect the mapping relation between the RGB image and the three-dimensional depth image;
the rotation translation unit is used for obtaining the position relation between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation translation matrix;
a first decomposition unit for decomposing the color channel of the RGB image into R, G, B three independent channels;
the second decomposition unit is used for decomposing the coordinate channel of the three-dimensional depth image into three independent channels of x, y and z;
the synthesis unit is used for taking the R, G, B three channels and the x, y and z three channels as six data input channels of a target model, wherein the target model is obtained by training a plurality of RGB image training samples based on a deep neural network model;
the first input unit is used for inputting the RGB images and the face image information of the three-dimensional depth image into the target model through the six data input channels;
and the generating unit is used for generating a three-dimensional image of the target face through the target model.
10. The generation apparatus according to claim 9, characterized in that the apparatus further comprises, before the synthesis unit:
the second acquisition unit is used for acquiring a group of training samples, and the training samples comprise RGB images of at least two human faces;
the second input unit is used for inputting the training samples into an initial model, and the initial model is a model established based on a deep neural network;
a second calculation unit, configured to calculate a loss value of the training sample through the initial model;
the judging unit is used for judging whether the loss value is smaller than a preset value or not;
a determining unit, configured to determine the initial model as a target model after the determining unit determines that the loss value is smaller than a preset value;
and the adjusting unit is used for adjusting the parameters of the initial model according to the loss value after the judging unit judges that the loss value is greater than or equal to the preset value, and inputting the training sample into the initial model with updated parameters again.
Background
With the rapid progress of image processing and pattern recognition technology and the convenience brought by computer vision, face recognition technology is applied more and more widely in modern life. As the technical requirements for liveness detection in face recognition rise, three-dimensional cameras are used increasingly widely in practical face recognition products.
However, at present there are few methods that combine an RGB camera and a three-dimensional depth camera to perform three-dimensional face recognition, and the low face recognition rate is also a problem to be solved urgently.
Disclosure of Invention
The embodiments of the application provide a three-dimensional face recognition generation method and a related device, which greatly improve the face recognition rate.
A first aspect of the embodiments of the present application provides a method for generating three-dimensional face recognition, including:
acquiring an RGB image and a three-dimensional depth image of a target face;
calculating a rotation and translation matrix according to the internal and external parameters of the RGB image and the three-dimensional depth image, wherein the rotation and translation matrix can reflect the mapping relation between the RGB image and the three-dimensional depth image;
obtaining the position relation between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation and translation matrix;
decomposing the color channels of the RGB image into three independent channels of R, G and B;
decomposing a coordinate channel of the three-dimensional depth image into three independent channels of x, y and z;
taking the R, G, B three channels and the x, y and z three channels as six data input channels of a target model, wherein the target model is a model obtained by training a deep neural network model on a plurality of RGB image training samples;
inputting the RGB image and the face image information of the three-dimensional depth image into the target model through the six data input channels;
and generating a three-dimensional image of the target face through the target model.
Optionally, before the three R, G, B channels and the three x, y, and z channels are used as six data input channels of a target model, the method further includes:
acquiring a group of training samples, wherein the training samples comprise RGB images of at least two human faces;
inputting the training sample into an initial model, wherein the initial model is a model established based on a deep neural network;
calculating a loss value of the training sample through the initial model;
judging whether the loss value is smaller than a preset value or not, and if so, determining the initial model as a target model;
if not, adjusting the parameters of the initial model according to the loss value, and inputting the training sample into the initial model with updated parameters again.
Optionally, the calculating, by the initial model, a loss value of the training sample includes:
extracting a feature vector of the training sample through the initial model;
and calculating a loss value according to the feature vector.
Optionally, after the obtaining the positional relationship between the RGB image and the three-dimensional depth image by rotation and/or translation according to the rotation and translation matrix, the method further includes:
aligning the rotated and/or translated RGB image and the three-dimensional depth image to the same size.
Optionally, before the inputting the RGB image and the face image information of the three-dimensional depth image into the target model through the six data input channels, the method further includes:
the input sizes of the six data input channels are adjusted to be the same as the size of the RGB image and the three-dimensional depth image.
Optionally, the taking the R, G, B three channels and the x, y, and z three channels as six data input channels of a target model includes:
and taking the R, G, B three channels and the x, y and z three channels as six data input channels of the target model through a concat function, wherein the concat function is used for connecting a plurality of channels as the input of the next network layer.
Optionally, the generating a three-dimensional image of a target face through the target model includes:
extracting a characteristic vector of the face image information through the target model;
calculating a total loss value of the target model according to the feature vector;
judging whether the total loss value is smaller than a preset value or not, if so, generating a target face three-dimensional image through the target model;
if not, adjusting the parameters of the target model according to the total loss value, and inputting the face image information into the updated target model again.
Optionally, the acquiring the RGB image and the three-dimensional depth image of the target face includes:
acquiring an RGB image of a target face through an RGB camera;
and acquiring a three-dimensional depth image of the target face through a three-dimensional depth camera.
A second aspect of the embodiments of the present application provides a device for generating three-dimensional face recognition, including:
the first acquisition unit is used for acquiring an RGB image and a three-dimensional depth image of a target face;
the first calculation unit is used for calculating a rotation and translation matrix according to internal and external parameters of the RGB image and the three-dimensional depth image, and the rotation and translation matrix can reflect the mapping relation between the RGB image and the three-dimensional depth image;
the rotation translation unit is used for obtaining the position relation between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation translation matrix;
a first decomposition unit for decomposing the color channel of the RGB image into R, G, B three independent channels;
the second decomposition unit is used for decomposing the coordinate channel of the three-dimensional depth image into three independent channels of x, y and z;
the synthesis unit is used for taking the R, G, B three channels and the x, y and z three channels as six data input channels of a target model, wherein the target model is obtained by training a plurality of RGB image training samples based on a deep neural network model;
the first input unit is used for inputting the RGB images and the face image information of the three-dimensional depth image into the target model through the six data input channels;
and the generating unit is used for generating a three-dimensional image of the target face through the target model.
Optionally, before the synthesizing unit, the apparatus further includes:
the second acquisition unit is used for acquiring a group of training samples, and the training samples comprise RGB images of at least two human faces;
the second input unit is used for inputting the training samples into an initial model, and the initial model is a model established based on a deep neural network;
a second calculation unit, configured to calculate a loss value of the training sample through the initial model;
the judging unit is used for judging whether the loss value is smaller than a preset value or not;
a determining unit, configured to determine the initial model as a target model after the determining unit determines that the loss value is smaller than a preset value;
and the adjusting unit is used for adjusting the parameters of the initial model according to the loss value after the judging unit judges that the loss value is greater than or equal to the preset value, and inputting the training sample into the initial model with updated parameters again.
A third aspect of the embodiments of the present application provides a device for generating three-dimensional face recognition, including:
the device comprises a processor, a memory, an input and output unit and a bus;
the processor is connected with the memory, the input and output unit and the bus;
the processor performs the following operations:
acquiring an RGB image and a three-dimensional depth image of a target face;
calculating a rotation and translation matrix according to the internal and external parameters of the RGB image and the three-dimensional depth image, wherein the rotation and translation matrix can reflect the mapping relation between the RGB image and the three-dimensional depth image;
obtaining the position relation between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation and translation matrix;
decomposing the color channels of the RGB image into three independent channels of R, G and B;
decomposing a coordinate channel of the three-dimensional depth image into three independent channels of x, y and z;
taking the R, G, B three channels and the x, y and z three channels as six data input channels of a target model, wherein the target model is a model obtained by training a deep neural network model on a plurality of RGB image training samples;
inputting the RGB image and the face image information of the three-dimensional depth image into the target model through the six data input channels;
and generating a three-dimensional image of the target face through the target model.
An embodiment of the present application provides a computer-readable storage medium, where a program is stored on the computer-readable storage medium, and when the program is executed on a computer, the method for generating three-dimensional face recognition according to any one of the first aspect is executed.
According to the technical scheme, the embodiment of the application has the following advantages:
the method comprises the steps of firstly obtaining an RGB image and a three-dimensional depth image of a target face, inputting the RGB image and the three-dimensional depth image into a target model obtained by training a deep neural network model on a plurality of RGB image training samples, carrying out feature extraction and loss value calculation through the target model, and finally generating a three-dimensional image of the target face. The method greatly improves the face recognition rate.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of a method for generating three-dimensional face recognition in the embodiment of the present application;
fig. 2-1 is a schematic flow chart of another embodiment of a method for generating three-dimensional face recognition in the embodiment of the present application;
fig. 2-2 is a schematic flow chart of another embodiment of a method for generating three-dimensional face recognition in the embodiment of the present application;
fig. 3 is a schematic structural diagram of an embodiment of a device for generating three-dimensional face recognition in the embodiment of the present application;
fig. 4 is a schematic structural diagram of another embodiment of a three-dimensional face recognition generation apparatus in an embodiment of the present application;
fig. 5 is a schematic structural diagram of another embodiment of a three-dimensional face recognition generation apparatus in the embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the scope of protection of the present application.
The embodiments of the application provide a three-dimensional face recognition generation method and a related device, which greatly improve the face recognition rate.
Referring to fig. 1, an embodiment of a method for generating three-dimensional face recognition in the embodiment of the present application includes:
101. acquiring an RGB image and a three-dimensional depth image of a target face;
it should be noted that in the embodiment of the present application, an RGB image is an M×N×3 array of color pixels, where each color pixel is a set of three values corresponding to the red, green and blue components of the RGB image at a particular spatial position. An RGB image can also be viewed as a "stack" of three gray-scale images which, when fed to the red, green and blue inputs of a color display, produce a color image on the screen; the three images that form an RGB color image are conventionally referred to as the red, green and blue component images. A picture that has (or appears to have) height, width and depth may be referred to as a three-dimensional (or 3D) image, or a three-dimensional depth image.
In the embodiment of the application, to improve the face recognition rate, the RGB image and the three-dimensional depth image of the target face need to be obtained simultaneously so that the relevant calculations can be performed to obtain the final RGB-D image, i.e., the three-dimensional image of the target face.
102. Calculating a rotation and translation matrix according to the internal and external parameters of the RGB image and the three-dimensional depth image, wherein the rotation and translation matrix can reflect the mapping relation between the RGB image and the three-dimensional depth image;
it should be noted that, in the embodiment of the present application, after the RGB image and the three-dimensional depth image are obtained, the mapping relationship from the RGB image to the three-dimensional depth image, or from the three-dimensional depth image to the RGB image, needs to be obtained. Specifically, a rotation and translation matrix is calculated according to the internal and external parameters (intrinsics and extrinsics) of the cameras that produced the RGB image and the three-dimensional depth image, and the mapping relationship is obtained through this rotation and translation matrix, which reflects the mapping relation between the RGB image and the three-dimensional depth image.
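The application does not give the formula for this matrix; as a hedged illustration only, assuming each camera's extrinsics are calibrated as a world-to-camera rotation R and translation t (all names below are hypothetical), the relative rotation and translation matrix could be assembled with numpy as follows:

```python
import numpy as np

def relative_extrinsics(R_depth, t_depth, R_rgb, t_rgb):
    """Relative rotation/translation mapping depth-camera coordinates into
    RGB-camera coordinates, given world-to-camera extrinsics of each camera."""
    R_rel = R_rgb @ R_depth.T            # rotation: depth frame -> RGB frame
    t_rel = t_rgb - R_rel @ t_depth      # translation: depth frame -> RGB frame
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R_rel, t_rel   # 4x4 homogeneous rotation-translation matrix
    return T
```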
103. Obtaining the position relation between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation and translation matrix;
it should be noted that, in the embodiment of the present application, the images are rotated and/or translated according to the mapping relationship given by the rotation and translation matrix so as to adjust the positional relationship between the RGB image and the three-dimensional depth image; this is done to better eliminate the positional deviation between the RGB image and the three-dimensional depth image.
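Purely for illustration, and assuming the relative transform T from the previous sketch together with pinhole intrinsic matrices K_depth and K_rgb (hypothetical names, not specified in the application), a depth pixel can be mapped into the RGB image roughly like this:

```python
import numpy as np

def depth_pixel_to_rgb(u, v, d, K_depth, K_rgb, T):
    """Back-project depth pixel (u, v) with depth d, move it into the RGB camera
    frame with the rotation-translation matrix T, and re-project it into the RGB image."""
    p_depth = d * np.linalg.inv(K_depth) @ np.array([u, v, 1.0])  # 3D point in the depth-camera frame
    p_rgb = T[:3, :3] @ p_depth + T[:3, 3]                        # rotate/translate into the RGB-camera frame
    uvw = K_rgb @ p_rgb                                           # project with the RGB intrinsics
    return uvw[:2] / uvw[2]                                       # pixel coordinates in the RGB image
```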
104. Decomposing the color channel of the RGB image into R, G, B independent channels;
105. decomposing a coordinate channel of the three-dimensional depth image into three independent channels of x, y and z;
in this embodiment of the present application, in order to input the image information of the acquired RGB image and three-dimensional depth image into the target model for training, the color channels of the acquired RGB image need to be decomposed into three independent channels R, G and B, and the face image information of the acquired three-dimensional depth image needs to be decomposed into three independent channels x, y and z, where x, y and z represent the three-dimensional coordinates in the image; in practice the three channels carry the dimensional information, i.e. they correspond to the three dimensions of length, width and height respectively.
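The application does not spell out how the x, y and z channels are produced; one common way (an assumption here, not the patent's stated method) is to back-project the depth map with the depth camera's intrinsics. A minimal numpy sketch:

```python
import numpy as np

def depth_to_xyz(depth, K):
    """Turn an H x W depth map into an H x W x 3 map of x, y, z coordinate
    channels, using a pinhole intrinsic matrix K (hypothetical calibration)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

# The three color channels of the RGB image are obtained by simple indexing:
# r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
```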
106. Taking the R, G, B three channels and the x, y and z three channels as six data input channels of a target model, wherein the target model is obtained by training a plurality of RGB image training samples based on a deep neural network model;
it should be noted that, in the embodiment of the present application, R, G, B color channels and x, y, and z depth map coordinate channels are connected by a Concat function, and these channels are used as inputs of the next network layer, that is, as data input channels of a target model that has been trained in advance.
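As an illustration only (the application does not name a framework), the channel concatenation could look like the following PyTorch sketch, where `rgb` and `xyz` are assumed to be aligned H × W × 3 numpy arrays such as those from the sketches above:

```python
import torch

rgb_t = torch.from_numpy(rgb).permute(2, 0, 1).float()  # 3 x H x W color channels
xyz_t = torch.from_numpy(xyz).permute(2, 0, 1).float()  # 3 x H x W coordinate channels
x6 = torch.cat([rgb_t, xyz_t], dim=0)                   # 6 x H x W: the six data input channels
batch = x6.unsqueeze(0)                                 # 1 x 6 x H x W, ready for the next network layer
```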
107. Inputting the RGB image and the face image information of the three-dimensional depth image into the target model through the six data input channels;
in the embodiment of the application, the six channels of information are input through the data input channels and used for training in the target model.
It should be noted that, generally, in the field of numerical analysis, 1 to 5 hidden layers can solve most problems. When processing image or voice data, however, the network structure becomes complicated and may require hundreds of neural layers, and training also needs a great amount of computing power. Therefore, pre-trained models such as YOLO, ResNet and VGG can be used: the major parts of their network layers are extracted and put into one's own network, and the model is trained on that basis. In this case, only the few layers that are added on top still need to be trained.
In the embodiment of the application, a neural network and data are designed; the network layer may use ResNet5, a VGG convolutional neural network, or another network structure, and the specific structure is not limited in this application.
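As a hedged sketch only (not the patent's concrete network), a standard torchvision ResNet can be adapted to accept the six-channel input by replacing its first convolution; the embedding size of 512 and the choice of ResNet-18 are assumptions:

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_backbone(in_channels=6, embedding_dim=512):
    """ResNet backbone whose first convolution accepts the 6 RGB+xyz channels
    (or 3 channels for an RGB-only model); embedding_dim is an assumed size."""
    net = resnet18()
    net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                          padding=3, bias=False)          # 6-channel input layer
    net.fc = nn.Linear(net.fc.in_features, embedding_dim)  # feature-vector output head
    return net
```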
108. And generating a three-dimensional image of the target face through the target model.
It should be noted that, in the embodiment of the present application, training is performed through a target model, a loss value is calculated, model adjustment is performed according to the loss value, and finally, a three-dimensional image of a target face is generated.
In the embodiment of the application, a three-dimensional face recognition generation method is designed: an RGB image and a three-dimensional depth image of a target face are first obtained; the RGB image and the three-dimensional depth image are input into a target model obtained by training a deep neural network model on a plurality of RGB image training samples; feature extraction and loss value calculation are carried out through the target model; and finally a three-dimensional image of the target face is generated. The method greatly improves the face recognition rate.
The above is a general description of the three-dimensional face recognition generation method; it is described in detail below.
Referring to fig. 2, another embodiment of the method for generating three-dimensional face recognition in the embodiment of the present application includes:
201. acquiring an RGB image of a target face through an RGB camera;
202. acquiring a three-dimensional depth image of the target face through a three-dimensional depth camera;
it should be noted that, in the embodiment of the present application, an RGB camera and a three-dimensional depth camera are used to obtain an RGB image and a three-dimensional image of a human face of the same person.
203. Calculating a rotation and translation matrix according to the internal and external parameters of the RGB image and the three-dimensional depth image, wherein the rotation and translation matrix can reflect the mapping relation between the RGB image and the three-dimensional depth image;
204. obtaining the position relation between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation and translation matrix;
in the embodiment of the present application, steps 203 to 204 are similar to steps 102 to 103 described above, and are not described herein again.
205. Aligning the rotated and/or translated RGB image and the three-dimensional depth image to the same size;
it should be noted that, in the embodiment of the present application, in order to better fuse the RGB image and the three-dimensional depth image, the acquired RGB image and the three-dimensional depth image need to be aligned to the same size, for example, a size of 112 × 112 may be used, and the specific size is not limited, as long as the sizes are consistent.
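A minimal sketch of this alignment-to-common-size step, assuming OpenCV (an assumption; the patent does not name a library) and the example 112 × 112 size, with `rgb` and `xyz` as hypothetical image arrays:

```python
import cv2

SIZE = (112, 112)  # example size from the text; any consistent size works
rgb_aligned = cv2.resize(rgb, SIZE, interpolation=cv2.INTER_LINEAR)
xyz_aligned = cv2.resize(xyz, SIZE, interpolation=cv2.INTER_NEAREST)  # nearest keeps depth edges sharp
```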
206. Decomposing the color channel of the RGB image into R, G, B independent channels;
207. decomposing a coordinate channel of the three-dimensional depth image into three independent channels of x, y and z;
in the embodiment of the present application, steps 206 to 207 are similar to steps 104 to 105, and are not described herein again.
208. Acquiring a group of training samples, wherein the training samples comprise RGB images of at least two human faces;
it should be noted that it is generally difficult to obtain RGB images and depth images of that many people, and a common practice is to train an initial model with RGB information first and then fine-tune it with RGB-D information to obtain a better recognition model. Therefore, the present application first trains a visible-light face recognition model whose input is RGB on the open-source ms1m face recognition data, and then performs further training in the manner described in the present application to obtain the final target model.
In the embodiment of the application, a group of RGB images containing a plurality of faces is obtained as a training sample for training an initial model.
209. Inputting the training sample into an initial model, wherein the initial model is a model established based on a deep neural network;
it should be noted that, in the embodiment of the present application, after the training samples are obtained, each training object (RGB image) is sequentially input into an initial model, where the initial model is a model established based on a deep neural network (such as ResNet).
210. Extracting a feature vector of the training sample through the initial model;
211. calculating a loss value according to the feature vector;
in this embodiment of the present application, the feature vector of each training object is extracted through the convolutional layers of the initial model, flattened and passed through a fully connected layer, and then fed to the loss function to perform classification, so as to obtain a loss value.
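Illustratively (assuming PyTorch, the hypothetical `build_backbone` sketch above with a 3-channel input since the initial model is trained on RGB only, and a plain softmax cross-entropy head as a stand-in for the patent's loss), the loss computation for one batch could look like:

```python
import torch
import torch.nn as nn

num_identities = 10_000                                  # hypothetical number of identities in the training set
backbone = build_backbone(in_channels=3, embedding_dim=512)  # RGB-only initial model
classifier = nn.Linear(512, num_identities)              # fully connected classification head
criterion = nn.CrossEntropyLoss()                        # softmax cross-entropy as a stand-in loss

batch_images = torch.randn(8, 3, 112, 112)               # dummy RGB batch (real training samples would go here)
batch_labels = torch.randint(0, num_identities, (8,))    # dummy identity labels

features = backbone(batch_images)                        # feature vectors of the training objects
loss = criterion(classifier(features), batch_labels)     # loss value compared against the preset value
```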
In addition, it should be noted that the feature norm extracted from a high-quality picture is large while the feature norm extracted from a low-quality picture is small; during back-propagation, the features of low-quality pictures therefore generate larger gradients and receive more attention from the network.
212. Judging whether the loss value is smaller than a preset value, if so, executing a step 213; if not, go to step 214.
It should be noted that, in the embodiment of the present application, a preset value is set, whether the network is converged is determined by determining whether the loss value is smaller than the preset value, if the loss value is smaller than the preset value, it is determined that the network is converged, and step 213 is executed; if the loss value is greater than or equal to the predetermined value, it indicates that the network has not converged, and step 214 is executed.
213. Determining the initial model as a target model;
it should be noted that, in the embodiment of the present application, if the loss value is smaller than the preset value, it indicates that the network has converged, the initial model is determined as the target model, and step 215 is performed.
214. Adjusting parameters of the initial model according to the loss value, and inputting the training sample into the initial model with updated parameters again;
it should be noted that, in this embodiment of the application, if the loss value is greater than or equal to the preset value, it indicates that the network does not converge, the parameters of the initial model are adjusted according to the loss value, the training sample is re-input into the initial model after the parameters are updated, the loss value is recalculated, and whether the loss value is smaller than the preset value is determined, until the loss value is smaller than the preset value, this step may be ended, and step 213 is performed.
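A schematic training loop for this convergence check, continuing the previous sketch (purely illustrative; the preset value and optimizer settings are assumptions):

```python
import torch

optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()), lr=0.1, momentum=0.9)

PRESET = 0.05                                     # hypothetical preset loss value
loss = criterion(classifier(backbone(batch_images)), batch_labels)
while loss.item() >= PRESET:
    optimizer.zero_grad()
    loss.backward()                               # adjust parameters according to the loss value
    optimizer.step()
    # re-input the training sample into the model with updated parameters
    loss = criterion(classifier(backbone(batch_images)), batch_labels)
# loss < PRESET: the network has converged and the initial model becomes the target model
```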
215. Using the R, G, B three channels and the x, y and z three channels as six data input channels of a target model through a concat function, wherein the concat function is used for connecting a plurality of channels as the input of a next network layer, and the target model is a model obtained by training based on a deep neural network model according to a plurality of RGB image training samples;
it should be noted that, in the embodiment of the present application, R, G, B color channels and x, y, and z depth map coordinate channels are connected by a Concat function, and these channels are used as inputs of the next network layer, that is, as data input channels of a target model that has been trained in advance.
216. Adjusting input sizes of the six data input channels to be the same as sizes of the RGB image and the three-dimensional depth image;
it should be noted that the size of the network data input is consistent with the size of the RGB image and the three-dimensional depth image aligned in step 205, that is, 112 × 112, the data input is 6 channels (RGBxyz), during training, the size of the Batchsize is set according to the actual situation, the size of the Batchsize is inversely related to the complexity of the network, that is, the video memory amount occupied by different networks under the same Batchsize is different. For example, if an 8 card Tesla V10032G is used, the actual Batchsize is set to 512, i.e., the network input size is 512 x 6 x 112.
217. Inputting the RGB image and the face image information of the three-dimensional depth image into the target model through the six data input channels;
in the embodiment of the present application, step 217 is similar to step 107 described above, and is not described herein again.
218. Extracting a characteristic vector of the face image information through the target model;
219. calculating a total loss value of the target model according to the feature vector;
in the embodiment of the application, the feature vector of each training object is extracted through the convolutional layers of the target model, flattened and passed through a fully connected layer, and then fed to the loss function to perform classification, so as to obtain a loss value.
It should be noted that the feature norm extracted from a high-quality picture is large while the feature norm extracted from a low-quality picture is small; during back-propagation, the features of low-quality pictures therefore generate larger gradients and receive more attention from the network.
220. Judging whether the total loss value is smaller than a preset value;
it should be noted that, in the embodiment of the present application, a preset value is set, whether the network is converged is determined by determining whether the total loss value is smaller than the preset value, if the total loss value is smaller than the preset value, it is determined that the network is converged, and step 221 is executed; if the loss value is greater than or equal to the predetermined value, it indicates that the network is not converged, and step 222 is executed.
It should be noted that, in the embodiment of the present application, a loss function is designed to supervise the input data and labels until the network converges; the loss function may be softmax, CosFace, ArcFace, A-Softmax, or a similar loss function.
Furthermore, it is important to select the learning rate, which needs to be readjusted each time other parameters of the network are adjusted.
To find the optimum learning rate, one can start with a very low value and then slowly multiply it by a constant until a very high value is reached. For example, in the embodiment of the present application, the initial learning rate of the ArcFace loss used in training may be set to 0.1, and the learning rate is multiplied by 0.1 every 150,000 iterations until 500,000 iterations of training are completed.
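The learning-rate schedule described above (initial 0.1, multiplied by 0.1 every 150,000 iterations, up to 500,000 iterations) could be expressed, assuming PyTorch (the application names no framework) and a hypothetical `model`, as:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)   # model: hypothetical target model
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150_000, 300_000, 450_000], gamma=0.1)       # x0.1 every 150k iterations

for step in range(500_000):
    # ... forward pass, ArcFace-style loss, backward pass, optimizer.step() ...
    scheduler.step()   # advance the per-iteration learning-rate schedule
```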
221. And generating a three-dimensional image of the target face through the target model.
It should be noted that, in the embodiment of the present application, if the total loss value is smaller than the preset value, it indicates that the network has converged, and the target face three-dimensional image is directly generated through the target model.
222. And adjusting parameters of the target model according to the total loss value, and inputting the face image information into the updated target model again.
It should be noted that, in this embodiment of the application, if the total loss value is greater than or equal to the preset value, it indicates that the network is not converged, the parameters of the target model are adjusted according to the total loss value, the face image information is re-input into the updated target model, the loss value is re-calculated, and whether the loss value is smaller than the preset value is determined, until the loss value is smaller than the preset value, this step may be ended, and step 221 is performed.
In the embodiment of the application, a three-dimensional face recognition generation method based on an RGB-D camera is designed. A face RGB image and a three-dimensional depth image of the same person are obtained through an RGB camera and a three-dimensional depth camera respectively, and together they form the three-dimensional face information. The three-dimensional face information is fed into a pre-trained target model for training, and a three-dimensional image of the target face is generated. The whole training framework is completed based on a deep network, so the method is simple and fast, and the face recognition rate is improved to a great extent.
Referring to fig. 3, an embodiment of a device for generating three-dimensional face recognition in the embodiment of the present application includes:
a first obtaining unit 301, configured to obtain an RGB image of a target face and a three-dimensional depth image;
a first calculating unit 302, configured to calculate a rotation and translation matrix according to internal and external parameters of the RGB image and the three-dimensional depth image, where the rotation and translation matrix may reflect a mapping relationship between the RGB image and the three-dimensional depth image;
a rotation translation unit 303, configured to obtain a position relationship between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation translation matrix;
a first decomposition unit 304 for decomposing the color channel of the RGB image into R, G, B three independent channels;
a second decomposition unit 305, configured to decompose a coordinate channel of the three-dimensional depth image into three independent channels, namely x, y, and z;
a synthesizing unit 306, configured to use the R, G, B three channels and the x, y, and z three channels as six data input channels of a target model, where the target model is a model obtained by training based on a deep neural network model according to a plurality of RGB image training samples;
a first input unit 307, configured to input the RGB image and face image information of the three-dimensional depth image into the target model through the six data input channels;
and the generating unit 308 is used for generating a three-dimensional image of the target face through the target model.
In the embodiment of the application, a three-dimensional face recognition generation apparatus is designed: the first obtaining unit 301 acquires an RGB image and a three-dimensional depth image of a target face; the first input unit 307 inputs the RGB image and the three-dimensional depth image into a target model obtained by training a deep neural network model on a plurality of RGB image training samples; feature extraction and loss value calculation are carried out through the target model; and the generating unit 308 generates a three-dimensional image of the target face. The apparatus greatly improves the face recognition rate.
The above is a general description of the functions of the units of the three-dimensional face recognition generation apparatus; the functions of these units are described in detail below.
Referring to fig. 4, in the embodiment of the present application, another embodiment of a device for generating three-dimensional face recognition includes:
a first obtaining unit 401, configured to obtain an RGB image of a target face and a three-dimensional depth image;
a first calculating unit 402, configured to calculate a rotation and translation matrix according to internal and external parameters of the RGB image and the three-dimensional depth image, where the rotation and translation matrix may reflect a mapping relationship between the RGB image and the three-dimensional depth image;
a rotation translation unit 403, configured to obtain a position relationship between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation translation matrix;
a first decomposition unit 404 for decomposing the color channel of the RGB image into R, G, B three independent channels;
a second decomposition unit 405, configured to decompose a coordinate channel of the three-dimensional depth image into three independent channels, namely x, y, and z;
a second obtaining unit 406, configured to obtain a group of training samples, where the training samples include RGB images of at least two human faces;
a second input unit 407, configured to input the training sample into an initial model, where the initial model is a model established based on a deep neural network;
a second calculating unit 408, configured to calculate a loss value of the training sample through the initial model;
a judging unit 409, configured to judge whether the loss value is smaller than a preset value;
a determining unit 410, configured to determine the initial model as a target model after the determining unit 409 determines that the loss value is smaller than a preset value;
and an adjusting unit 411, configured to, after the determining unit 409 determines that the loss value is greater than or equal to the preset value, adjust the parameter of the initial model according to the loss value, and input the training sample into the initial model with updated parameters again.
A synthesizing unit 412, configured to use the R, G, B three channels and the x, y, and z three channels as six data input channels of a target model, where the target model is a model obtained by training based on a deep neural network model according to a plurality of RGB image training samples;
a first input unit 413, configured to input the RGB image and face image information of the three-dimensional depth image into the target model through the six data input channels;
and the generating unit 414 is configured to generate a three-dimensional image of the target face through the target model.
In the embodiment of the present application, the functions of each unit module correspond to the steps in the embodiments shown in fig. 1 to fig. 2, and are not described herein again.
Referring to fig. 5, another embodiment of the apparatus for generating three-dimensional face recognition in the embodiment of the present application includes:
a processor 501, a memory 502, an input-output unit 503, and a bus 504;
the processor 501 is connected with the memory 502, the input/output unit 503 and the bus 504;
the processor 501 performs the following operations:
acquiring an RGB image and a three-dimensional depth image of a target face;
calculating a rotation and translation matrix according to the internal and external parameters of the RGB image and the three-dimensional depth image, wherein the rotation and translation matrix can reflect the mapping relation between the RGB image and the three-dimensional depth image;
obtaining the position relation between the RGB image and the three-dimensional depth image through rotation and/or translation according to the rotation and translation matrix;
decomposing the color channels of the RGB image into three independent channels of R, G and B;
decomposing a coordinate channel of the three-dimensional depth image into three independent channels of x, y and z;
taking the R, G, B three channels and the x, y and z three channels as six data input channels of a target model, wherein the target model is a model obtained by training a deep neural network model on a plurality of RGB image training samples;
inputting the RGB image and the face image information of the three-dimensional depth image into the target model through the six data input channels;
and generating a three-dimensional image of the target face through the target model.
In this embodiment, the functions of the processor 501 correspond to the steps in the embodiments shown in fig. 1 to fig. 2, and are not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.