Virtual character face-pinching method and apparatus, electronic device, and storage medium
1. A face-pinching method for a virtual character, the method comprising:
in response to an acquisition request of a user, acquiring a user face image of the user;
performing skeleton point parameter extraction on the user face image by using a trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image, wherein the skeleton point generation network is obtained by training on user face image samples and virtual face image samples generated based on preset skeleton point parameters; and
performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image.
2. The method of claim 1, wherein the performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image comprises:
inputting the skeleton point parameters into a preset rendering engine, so that the preset rendering engine reconstructs face information based on the skeleton point parameters; and
receiving the face information reconstructed by the preset rendering engine, and determining, based on the face information, the virtual face image corresponding to the user face image.
3. The method of claim 1 or 2, wherein the skeleton point generation network comprises a feature extraction layer and a fully connected layer, and the performing skeleton point parameter extraction on the user face image by using the trained skeleton point generation network to obtain the skeleton point parameters corresponding to the user face image comprises:
inputting the user face image into the feature extraction layer of the skeleton point generation network to obtain image features output by the feature extraction layer;
inputting the image features output by the feature extraction layer into the fully connected layer of the skeleton point generation network to obtain skeleton point parameters output by the fully connected layer; and
determining, based on the skeleton point parameters output by the fully connected layer, the skeleton point parameters corresponding to the user face image.
4. The method of claim 3, wherein the skeleton point generation network is trained by:
acquiring a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
performing feature extraction on the user face image sample and the virtual face image sample respectively by using the feature extraction layer of the skeleton point generation network to obtain a first image sample feature and a second image sample feature, performing skeleton point parameter extraction on the second image sample feature by using the fully connected layer of the skeleton point generation network, and determining predicted skeleton point parameters corresponding to the virtual face image sample;
determining a first loss function value based on the first image sample feature and the second image sample feature, and determining a second loss function value based on the preset skeleton point parameters and the predicted skeleton point parameters; and
adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain the trained skeleton point generation network.
5. The method of claim 4, wherein a domain discrimination network is further provided, and the determining a first loss function value based on the first image sample feature and the second image sample feature comprises:
inputting the first image sample feature and the second image sample feature into the domain discrimination network to obtain a loss function value output by the domain discrimination network; and
determining the first loss function value based on the loss function value output by the domain discrimination network.
6. The method of claim 5, wherein the adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain the trained skeleton point generation network comprises:
determining whether a loss function sum value of the first loss function value and the second loss function value is smaller than a preset threshold;
if not, adjusting one or more of the skeleton point generation network and the domain discrimination network, and determining an adjusted first loss function value and an adjusted second loss function value based on the adjusted network(s); and
obtaining the trained network when the loss function sum value of the adjusted first loss function value and the adjusted second loss function value is smaller than the preset threshold.
7. The method of claim 5 or 6, wherein the inputting the first image sample feature and the second image sample feature into the domain discrimination network to obtain a loss function value output by the domain discrimination network comprises:
inputting the first image sample feature into the domain discrimination network to obtain a first image category output by the domain discrimination network, and determining a first comparison result between the first image category and a first annotation category indicated by the user face image sample;
inputting the second image sample feature into the domain discrimination network to obtain a second image category output by the domain discrimination network, and determining a second comparison result between the second image category and a second annotation category indicated by the virtual face image sample; and
determining the loss function value output by the domain discrimination network based on the first comparison result and the second comparison result.
8. The method of any one of claims 5-7, wherein a gradient reversal layer connected to the domain discrimination network is further provided, and the method further comprises:
reversing, by using the gradient reversal layer, a gradient value corresponding to the loss function value output by the domain discrimination network to obtain a reversed gradient value; and
adjusting the skeleton point generation network according to the reversed gradient value.
9. The method of any one of claims 1-8, wherein the virtual face image sample is obtained by:
in response to an input request of the user, acquiring preset skeleton point parameters input by the user; and
performing face rendering on the preset skeleton point parameters by using a preset rendering engine to obtain the virtual face image sample.
10. A face-pinching apparatus for a virtual character, the apparatus comprising:
an acquisition module configured to acquire, in response to an acquisition request of a user, a user face image of the user;
a generation module configured to perform skeleton point parameter extraction on the user face image by using a trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image, wherein the skeleton point generation network is obtained by training on user face image samples and virtual face image samples generated based on preset skeleton point parameters; and
a rendering module configured to perform face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image.
11. An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate over the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the virtual character face-pinching method according to any one of claims 1-9.
12. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the virtual character face-pinching method according to any one of claims 1-9.
Background
With the development of computer technology and mobile terminals, more and more role-playing games are emerging. To meet the personalized needs of different players, a face-pinching function is usually provided when a player creates a virtual character, so that the player can create a character to their own liking, for example, a virtual character resembling the player's real face.
In the related art, a face-pinching scheme trains a dedicated skeleton point parameter regressor for each user; once the regressor outputs the corresponding skeleton point parameters, the game character can be rendered with them. Because this scheme trains the regressor by iterative regression, the training process converges slowly, and a separate regressor has to be trained for each user, which is time-consuming and labor-intensive.
Disclosure of Invention
Embodiments of the present disclosure provide at least a virtual character face-pinching method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a face-pinching method for a virtual character, the method including:
in response to an acquisition request of a user, acquiring a user face image of the user;
performing skeleton point parameter extraction on the user face image by using a trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image, where the skeleton point generation network is obtained by training on user face image samples and virtual face image samples generated based on preset skeleton point parameters; and
performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image.
In a possible implementation, the performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image includes:
inputting the skeleton point parameters into a preset rendering engine, so that the preset rendering engine reconstructs face information based on the skeleton point parameters; and
receiving the face information reconstructed by the preset rendering engine, and determining, based on the face information, the virtual face image corresponding to the user face image.
In a possible implementation, the skeleton point generation network includes a feature extraction layer and a fully connected layer, and the performing skeleton point parameter extraction on the user face image by using the trained skeleton point generation network to obtain the skeleton point parameters corresponding to the user face image includes:
inputting the user face image into the feature extraction layer of the skeleton point generation network to obtain image features output by the feature extraction layer;
inputting the image features output by the feature extraction layer into the fully connected layer of the skeleton point generation network to obtain skeleton point parameters output by the fully connected layer; and
determining, based on the skeleton point parameters output by the fully connected layer, the skeleton point parameters corresponding to the user face image.
In a possible implementation, the skeleton point generation network is trained as follows:
acquiring a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
performing feature extraction on the user face image sample and the virtual face image sample respectively by using the feature extraction layer of the skeleton point generation network to obtain a first image sample feature and a second image sample feature, performing skeleton point parameter extraction on the second image sample feature by using the fully connected layer of the skeleton point generation network, and determining predicted skeleton point parameters corresponding to the virtual face image sample;
determining a first loss function value based on the first image sample feature and the second image sample feature, and determining a second loss function value based on the preset skeleton point parameters and the predicted skeleton point parameters; and
adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain the trained skeleton point generation network.
In a possible implementation, a domain discrimination network is further included, and the determining a first loss function value based on the first image sample feature and the second image sample feature includes:
inputting the first image sample feature and the second image sample feature into the domain discrimination network to obtain a loss function value output by the domain discrimination network; and
determining the first loss function value based on the loss function value output by the domain discrimination network.
In a possible implementation, the adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain the trained skeleton point generation network includes:
determining whether a loss function sum value of the first loss function value and the second loss function value is smaller than a preset threshold;
if not, adjusting one or more of the skeleton point generation network and the domain discrimination network, and determining an adjusted first loss function value and an adjusted second loss function value based on the adjusted network(s); and
obtaining the trained network when the loss function sum value of the adjusted first loss function value and the adjusted second loss function value is smaller than the preset threshold.
In a possible implementation, the inputting the first image sample feature and the second image sample feature into the domain discrimination network to obtain a loss function value output by the domain discrimination network includes:
inputting the first image sample feature into the domain discrimination network to obtain a first image category output by the domain discrimination network, and determining a first comparison result between the first image category and a first annotation category indicated by the user face image sample;
inputting the second image sample feature into the domain discrimination network to obtain a second image category output by the domain discrimination network, and determining a second comparison result between the second image category and a second annotation category indicated by the virtual face image sample; and
determining the loss function value output by the domain discrimination network based on the first comparison result and the second comparison result.
In a possible implementation, a gradient reversal layer connected to the domain discrimination network is further included, and the method further includes:
reversing, by using the gradient reversal layer, a gradient value corresponding to the loss function value output by the domain discrimination network to obtain a reversed gradient value; and
adjusting the skeleton point generation network according to the reversed gradient value.
In a possible implementation, the virtual face image sample is obtained as follows:
in response to an input request of the user, acquiring preset skeleton point parameters input by the user; and
performing face rendering on the preset skeleton point parameters by using a preset rendering engine to obtain the virtual face image sample.
In a second aspect, an embodiment of the present disclosure further provides a face-pinching apparatus for a virtual character, the apparatus including:
an acquisition module configured to acquire, in response to an acquisition request of a user, a user face image of the user;
a generation module configured to perform skeleton point parameter extraction on the user face image by using a trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image, where the skeleton point generation network is obtained by training on user face image samples and virtual face image samples generated based on preset skeleton point parameters; and
a rendering module configured to perform face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the virtual character face-pinching method according to the first aspect or any one of its implementations.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the virtual character face-pinching method according to the first aspect or any one of its implementations.
According to the virtual character face-pinching scheme provided by the embodiments of the present disclosure, once the user face image is obtained, the trained skeleton point generation network can perform skeleton point parameter extraction on it to obtain the corresponding skeleton point parameters. The skeleton point generation network serves as a general-purpose network trained on both user face image samples and virtual face image samples, so the skeleton point parameters for any real user face can be extracted by this one network, which saves time and labor.
Meanwhile, because the skeleton point generation network learns the features of user faces and virtual faces simultaneously, and the preset skeleton point parameters also provide data support for the skeleton point parameters corresponding to the user face image, a virtual face image similar to the user's face can be rendered quickly, with high face-pinching efficiency and high similarity.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly described below. The drawings here are incorporated into and form part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. The following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a virtual character face-pinching method provided by an embodiment of the present disclosure;
Fig. 2 shows a schematic application diagram of a virtual character face-pinching method provided by an embodiment of the present disclosure;
Fig. 3 shows a schematic application diagram of another virtual character face-pinching method provided by an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of a virtual character face-pinching apparatus provided by an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated in the figures here, can be arranged and designed in a wide variety of configurations. Therefore, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of it. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
It should be noted that like reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The term "and/or" herein merely describes an associative relationship, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Research has shown that a face-pinching scheme provided in the related art trains a dedicated skeleton point parameter regressor for each user and renders the game character with the skeleton point parameters once the regressor can output them. Because this scheme trains the regressor by iterative regression, the training process converges slowly, and a separate regressor has to be trained for each user, which is time-consuming and labor-intensive.
Based on this research, the present disclosure provides a virtual character face-pinching method and apparatus, an electronic device, and a storage medium.
To facilitate understanding of the present embodiments, the virtual character face-pinching method disclosed in the embodiments of the present disclosure is first described in detail. The execution subject of the method is generally a computer device with certain computing capability, for example a terminal device, a server, or another processing device; the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, or a wearable device. In some possible implementations, the virtual character face-pinching method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to Fig. 1, which is a flowchart of a virtual character face-pinching method provided by an embodiment of the present disclosure, the method includes steps S101 to S103, where:
S101: in response to an acquisition request of a user, acquiring a user face image of the user;
S102: performing skeleton point parameter extraction on the user face image by using the trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image, where the skeleton point generation network is obtained by training on user face image samples and virtual face image samples generated based on preset skeleton point parameters;
S103: performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image.
Here, to facilitate understanding of the virtual character face-pinching method provided by the embodiments of the present disclosure, its application scenario is first explained. The method can mainly be applied to games or other fields with face-pinching requirements. Taking games as an example, when a game user intends to play through a virtual character, the user often wants the virtual character to resemble a real person more closely, so as to deepen immersion in the game.
To meet users' personalized needs, the related art provides a face-pinching scheme that trains a dedicated skeleton point parameter regressor for each user and renders the game character with the skeleton point parameters output by the regressor. However, this scheme trains the regressor by iterative regression, so the training process converges slowly, and a separate regressor has to be trained for each user, which is time-consuming and labor-intensive.
To solve these problems, the embodiments of the present disclosure implement face pinching with a skeleton point generation network trained on both user face image samples and virtual face image samples generated from preset skeleton point parameters. Training is complete when the output target skeleton point parameters match the preset skeleton point parameters closely enough, and when the feature match between the virtual face image samples and the corresponding user face image samples satisfies a target condition. As a result, the virtual face image finally rendered for a user's face image can be more similar to that face image; moreover, the skeleton point generation network is a general-purpose network that converges quickly, so the overall face-pinching efficiency is also high.
The user face image may be a face image of the user obtained after the user's authorization; it may be captured on the spot by the terminal the user currently holds, or pre-stored on that terminal.
The skeleton point generation network can be trained on user face image samples and virtual face image samples; that is, the network can learn the features of real user faces and the features of virtual faces simultaneously, so that feature consistency constrains the similarity between the real face (corresponding to the user's face) and the virtual face finally obtained by face pinching.
In the embodiments of the present disclosure, the virtual face image corresponding to the user face image can be obtained based on the skeleton point parameters corresponding to that image, specifically through the following steps:
step 1: inputting the skeleton point parameters into a preset rendering engine, so that the preset rendering engine reconstructs face information based on the skeleton point parameters;
step 2: receiving the face information reconstructed by the preset rendering engine, and determining, based on the face information, the virtual face image corresponding to the user face image.
Here, face rendering may be implemented with a preset rendering engine; in games, this may be the game engine, which reconstructs face information from the input skeleton point parameters. The face information may correspond to a three-dimensional face model determined by the parameters of each skeleton point: once the skeleton point parameters corresponding to the user face image are determined, the three-dimensional face model can be determined from the parameter values, and the rendered virtual face image can then be determined through the mapping from three-dimensional space to the two-dimensional image plane.
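The following is a purely illustrative sketch of this data flow; the template mesh, per-parameter blend offsets, image size, parameter count, and toy rasterizer are all hypothetical stand-ins for a real game engine, not the disclosed engine's API:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_VERTS, NUM_BONE_PARAMS = 1000, 208           # illustrative sizes, not fixed by the disclosure

TEMPLATE = rng.standard_normal((NUM_VERTS, 3))                        # stand-in template face mesh
BLEND = rng.standard_normal((NUM_VERTS, 3, NUM_BONE_PARAMS)) * 0.01   # per-parameter vertex offsets

def render_virtual_face(bone_params: np.ndarray) -> np.ndarray:
    """Skeleton point parameters -> 3D face model -> 2D virtual face image."""
    verts = TEMPLATE + BLEND @ bone_params       # deform the 3D face model by the parameters
    img = np.zeros((256, 256))                   # toy orthographic rasterizer: plot x,y positions
    xy = (verts[:, :2] - verts[:, :2].min(axis=0)) / (np.ptp(verts[:, :2], axis=0) + 1e-8)
    xy = (xy * 255).astype(int)
    img[xy[:, 1], xy[:, 0]] = 1.0
    return img
```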
The skeleton point generation network in the embodiments of the present disclosure may include a feature extraction layer for feature extraction and a fully connected layer for generating the skeleton point parameters, which can be used as follows:
step 1: inputting the user face image into the feature extraction layer of the skeleton point generation network to obtain the image features output by the feature extraction layer;
step 2: inputting the image features output by the feature extraction layer into the fully connected layer of the skeleton point generation network to obtain the skeleton point parameters output by the fully connected layer;
step 3: determining, based on the skeleton point parameters output by the fully connected layer, the skeleton point parameters corresponding to the user face image.
Here, the user face image is fed to the feature extraction layer to extract image features, and feeding those features to the fully connected layer yields the skeleton point parameters corresponding to the user face image.
It should be noted that the feature extraction layer in the embodiments of the present disclosure may consist of four residual convolution modules. The fully connected layer may be a single layer or multiple layers; different numbers of fully connected layers may be provided for different training purposes, for example, three fully connected layers.
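A minimal PyTorch sketch of such a network follows: four residual convolution modules as the feature extraction layer and three fully connected layers, per the text. The channel widths, 3-channel input, and the parameter count NUM_BONE_PARAMS (kept equal to the rendering sketch above) are illustrative assumptions, not values fixed by the disclosure:

```python
import torch
import torch.nn as nn

NUM_BONE_PARAMS = 208  # hypothetical count, matching the rendering sketch above

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride, 1), nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1), nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride)  # 1x1 conv so the residual sum shapes match
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

class SkeletonPointNet(nn.Module):
    """Feature extraction layer (four residual convolution modules) + three fully connected layers."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            ResidualBlock(3, 64), ResidualBlock(64, 128),
            ResidualBlock(128, 256), ResidualBlock(256, 512),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 256), nn.ReLU(inplace=True),
            nn.Linear(256, NUM_BONE_PARAMS),
        )

    def forward(self, image):
        feat = self.features(image)   # image features (also used for domain discrimination in training)
        return feat, self.fc(feat)    # features and skeleton point parameters
```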
To facilitate understanding of the above determination of the skeleton point parameters, a further explanation is given with reference to Fig. 2.
As shown in Fig. 2, for an acquired user face image (i.e., a real face image), the feature extraction layer of the skeleton point generation network outputs image features, and feeding those features through three sequentially connected fully connected layers yields the skeleton point parameters corresponding to the user face image. In effect, the fully connected layers map image features to skeleton point parameters.
Since the training process of the skeleton point generation network is the key step in realizing virtual character face pinching, it is described in detail below.
The virtual character face-pinching method provided by the embodiments of the present disclosure can train the skeleton point generation network through the following steps:
step 1: acquiring a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
step 2: performing feature extraction on the user face image sample and the virtual face image sample respectively by using the feature extraction layer of the skeleton point generation network, to obtain a first image sample feature and a second image sample feature; performing skeleton point parameter extraction on the second image sample feature by using the fully connected layer of the skeleton point generation network, and determining predicted skeleton point parameters corresponding to the virtual face image sample;
step 3: determining a first loss function value based on the first image sample feature and the second image sample feature, and determining a second loss function value based on the preset skeleton point parameters and the predicted skeleton point parameters;
step 4: adjusting the skeleton point generation network based on the first loss function value and the second loss function value to obtain the trained skeleton point generation network.
In the embodiments of the present disclosure, the training process and the application process of the skeleton point generation network share the same forward pass: the feature extraction layer performs feature extraction, and the fully connected layer generates the skeleton point parameters.
Unlike the application process, which determines skeleton point parameters for a single user face image, the training process also determines skeleton point parameters for virtual faces. This is mainly because directly labeling skeleton point parameters for user face images is very difficult: pinching a game face that resembles a real face usually requires professional artists to adjust hundreds of parameters, often taking more than an hour per image. By contrast, virtual face images can be generated from preset skeleton point parameters entered by a user together with a preset rendering engine, which saves time and labor. Moreover, the preset skeleton point parameters can serve as reference information for the predicted skeleton point parameters output by the fully connected layer, so the skeleton point generation network can be trained through this parameter matching.
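A sketch of assembling such training pairs, reusing the hypothetical `render_virtual_face` stand-in from the rendering sketch above; uniform random sampling of the preset parameters is an assumption made for illustration, whereas the text describes parameters entered by a user:

```python
def make_virtual_samples(num_samples: int):
    """Build (virtual face image, preset skeleton point parameters) training pairs."""
    samples = []
    for _ in range(num_samples):
        preset = rng.uniform(-1.0, 1.0, NUM_BONE_PARAMS)  # preset skeleton point parameters
        image = render_virtual_face(preset)               # preset rendering engine (stand-in)
        samples.append((image, preset))                   # the preset parameters label the image
    return samples
```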
To make the virtual face approximate the real face, the training of the skeleton point generation network can be constrained here by feature consistency between the first image sample features (the features of the user face image samples) and the second image sample features (the features of the virtual face image samples).
This feature consistency corresponds to the first loss function value, and the parameter matching corresponds to the second loss function value; the skeleton point generation network is adjusted by the two loss function values together to obtain the trained skeleton point generation network.
In the embodiments of the present disclosure, the first loss function value can be determined with a domain discrimination network, specifically through the following steps:
step 1: inputting the first image sample feature and the second image sample feature into the domain discrimination network to obtain a loss function value output by the domain discrimination network;
step 2: determining the first loss function value based on the loss function value output by the domain discrimination network.
In the embodiments of the present disclosure, the loss function value output by the domain discrimination network may be determined as follows:
step 1: inputting the first image sample feature into the domain discrimination network to obtain a first image category output by the domain discrimination network, and determining a first comparison result between that first image category and the first annotation category indicated by the user face image sample; inputting the second image sample feature into the domain discrimination network to obtain a second image category output by the domain discrimination network, and determining a second comparison result between that second image category and the second annotation category indicated by the virtual face image sample;
step 2: determining the loss function value output by the domain discrimination network based on the first comparison result and the second comparison result.
Here, suppose the first annotation category indicated by the user face image samples is labeled 0 and the second annotation category indicated by the virtual face image samples is labeled 1. Optimizing the first loss function value drives the domain discrimination network to predict 0 for real-face features and 1 for virtual-face features. The closer the first image category is to the first annotation category, and the second image category to the second annotation category, the smaller the loss function value.
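A minimal sketch of this loss: the 0/1 label convention follows the text, while the small MLP discriminator over the 512-dimensional features of the network sketched above and the binary cross-entropy form of the comparison are assumptions:

```python
import torch
import torch.nn as nn

domain_net = nn.Sequential(          # domain discrimination network (assumed small MLP)
    nn.Linear(512, 128), nn.ReLU(inplace=True),
    nn.Linear(128, 1),               # logit for the "virtual" category
)
bce = nn.BCEWithLogitsLoss()

def domain_loss(first_feat: torch.Tensor, second_feat: torch.Tensor) -> torch.Tensor:
    """first_feat: user face sample features (label 0); second_feat: virtual sample features (label 1)."""
    real_logits = domain_net(first_feat)
    virt_logits = domain_net(second_feat)
    loss_real = bce(real_logits, torch.zeros_like(real_logits))  # first comparison result
    loss_virt = bce(virt_logits, torch.ones_like(virt_logits))   # second comparison result
    return loss_real + loss_virt
```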
The virtual character face-pinching method provided by the embodiments of the present disclosure can adjust the skeleton point generation network based on the first loss function value and the second loss function value, specifically through the following steps:
step 1: determining whether the loss function sum value of the first loss function value and the second loss function value is smaller than a preset threshold;
step 2: if not, adjusting one or more of the skeleton point generation network and the domain discrimination network, and determining an adjusted first loss function value and an adjusted second loss function value based on the adjusted network(s);
step 3: obtaining the trained network when the loss function sum value of the adjusted first loss function value and the adjusted second loss function value is smaller than the preset threshold.
Here, the network adjustment strategy can be determined from the comparison between the loss function sum value and the preset threshold: for example, only the skeleton point generation network is adjusted, or only the domain discrimination network is adjusted, or both are adjusted, until each network converges once the loss function sum value is sufficiently small.
It should be noted that, when adjusting the skeleton point generation network, the adjustment may target the feature extraction layer or the fully connected layer of the network, depending on the training requirements; no specific limitation is made here.
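A hedged sketch of one such training iteration and the stopping rule, reusing `SkeletonPointNet` and `domain_loss` from the sketches above; the mean-squared-error form of the second loss, the joint optimizer, and the threshold value are assumptions for illustration:

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()   # assumed form of the second (parameter) loss
THRESHOLD = 0.05     # illustrative preset threshold

def train_step(net, optimizer, real_imgs, virt_imgs, preset_params) -> float:
    real_feat, _ = net(real_imgs)              # first image sample features
    virt_feat, pred_params = net(virt_imgs)    # second features and predicted parameters
    loss1 = domain_loss(real_feat, virt_feat)  # first loss function value
    loss2 = mse(pred_params, preset_params)    # second loss function value
    total = loss1 + loss2                      # loss function sum value
    optimizer.zero_grad()
    total.backward()                           # gradients reach both networks via the shared graph
    optimizer.step()
    return total.item()

# Training continues while the sum value is not below the preset threshold, e.g.:
#   net = SkeletonPointNet()
#   opt = torch.optim.Adam([*net.parameters(), *domain_net.parameters()], lr=1e-4)
#   while train_step(net, opt, real_imgs, virt_imgs, preset_params) >= THRESHOLD: ...
```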
To help explain the training process of the network, a further description is given with reference to Fig. 3:
As shown in Fig. 3, the inputs are a real face image sample (corresponding to a user face image sample) and a virtual face image sample. After passing through the feature extraction layer of the skeleton point generation network, the first image sample feature corresponding to the user face image sample and the second image sample feature corresponding to the virtual face image sample can be determined.
The first image sample feature and the second image sample feature are input into the domain discrimination network, and the loss function value output by the domain discrimination network serves as the first loss function value. In addition, the predicted skeleton point parameters can be determined through the last two fully connected layers of the skeleton point generation network, and the second loss function value can be determined from the preset skeleton point parameters and the predicted skeleton point parameters.
The network can then be trained based on the first loss function value and the second loss function value.
Since the technical purpose of the virtual character face-pinching method provided by the embodiments of the present disclosure is to generate virtual faces that are more similar to real faces, adversarial learning can be implemented here with a gradient reversal layer, so that the skeleton point generation network cannot distinguish real faces from virtual faces; this in turn benefits the subsequent extraction of skeleton point parameters from user face images by the skeleton point generation network.
The gradient reversal layer in the embodiments of the present disclosure may be connected to the domain discrimination network. The gradient value corresponding to the loss function value output by the domain discrimination network is reversed by the gradient reversal layer to obtain a reversed gradient value, and the skeleton point generation network is adjusted according to the reversed gradient value.
In this way, through the effect of the gradient reversal layer, real-face features tend to be predicted as coming from virtual faces and virtual-face features as coming from real faces, which achieves the goal of making the features of real faces and virtual faces consistent.
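This is the standard gradient reversal trick from domain-adversarial training: identity in the forward pass, negated gradient in the backward pass, so that minimizing the domain loss pushes the feature extractor to confuse the two domains. A sketch (the scaling factor `lambd` is an assumption; the disclosure does not specify one):

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)                    # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None  # reversed (and scaled) gradient; None for lambd

def grad_reverse(x: torch.Tensor, lambd: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lambd)

# Usage: insert the reversal layer between the feature extractor and the domain
# discrimination network, e.g. logits = domain_net(grad_reverse(feat)).
```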
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict execution order or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inherent logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide a virtual character face-pinching apparatus corresponding to the virtual character face-pinching method. Since the principle by which the apparatus solves the problem is similar to that of the method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.
Referring to Fig. 4, which is a schematic diagram of a virtual character face-pinching apparatus provided by an embodiment of the present disclosure, the apparatus includes: an acquisition module 401, a generation module 402, and a rendering module 403, wherein:
the acquisition module 401 is configured to acquire, in response to an acquisition request of a user, a user face image of the user;
the generation module 402 is configured to perform skeleton point parameter extraction on the user face image by using the trained skeleton point generation network, to obtain skeleton point parameters corresponding to the user face image, where the skeleton point generation network is obtained by training on user face image samples and virtual face image samples generated based on preset skeleton point parameters;
the rendering module 403 is configured to perform face rendering based on the skeleton point parameters, to obtain a virtual face image corresponding to the user face image.
According to the embodiments of the present disclosure, once the user face image is obtained, the trained skeleton point generation network can perform skeleton point parameter extraction on it to obtain the corresponding skeleton point parameters. The skeleton point generation network serves as a general-purpose network trained on both user face image samples and virtual face image samples, so the skeleton point parameters for any real user face can be extracted by this one network, which saves time and labor.
Meanwhile, because the skeleton point generation network learns the features of user faces and virtual faces simultaneously, and the preset skeleton point parameters also provide data support for the skeleton point parameters corresponding to the user face image, a virtual face image similar to the user's face can be rendered quickly, with high face-pinching efficiency and high similarity.
In a possible implementation, the rendering module 403 is configured to perform face rendering based on the skeleton point parameters according to the following steps, to obtain a virtual face image corresponding to the user face image:
inputting the skeleton point parameters into a preset rendering engine, so that the preset rendering engine reconstructs face information based on the skeleton point parameters;
receiving the face information reconstructed by the preset rendering engine, and determining, based on the face information, the virtual face image corresponding to the user face image.
In a possible implementation, the skeleton point generation network includes a feature extraction layer and a fully connected layer, and the generation module 402 is configured to perform skeleton point parameter extraction on the user face image by using the trained skeleton point generation network according to the following steps, to obtain skeleton point parameters corresponding to the user face image:
inputting the user face image into the feature extraction layer of the skeleton point generation network to obtain image features output by the feature extraction layer;
inputting the image features output by the feature extraction layer into the fully connected layer of the skeleton point generation network to obtain skeleton point parameters output by the fully connected layer;
determining, based on the skeleton point parameters output by the fully connected layer, the skeleton point parameters corresponding to the user face image.
In a possible implementation, the apparatus further includes a training module 404, configured to train the skeleton point generation network according to the following steps:
acquiring a user face image sample and a virtual face image sample generated based on preset skeleton point parameters;
performing feature extraction on the user face image sample and the virtual face image sample respectively by using the feature extraction layer of the skeleton point generation network, to obtain a first image sample feature and a second image sample feature; performing skeleton point parameter extraction on the second image sample feature by using the fully connected layer of the skeleton point generation network, and determining predicted skeleton point parameters corresponding to the virtual face image sample;
determining a first loss function value based on the first image sample feature and the second image sample feature, and determining a second loss function value based on the preset skeleton point parameters and the predicted skeleton point parameters;
adjusting the skeleton point generation network based on the first loss function value and the second loss function value, to obtain the trained skeleton point generation network.
In a possible implementation, a domain discrimination network is further included, and the training module 404 is configured to determine the first loss function value based on the first image sample feature and the second image sample feature according to the following steps:
inputting the first image sample feature and the second image sample feature into the domain discrimination network to obtain a loss function value output by the domain discrimination network;
determining the first loss function value based on the loss function value output by the domain discrimination network.
In a possible implementation, the training module 404 is configured to adjust the skeleton point generation network based on the first loss function value and the second loss function value according to the following steps, to obtain the trained skeleton point generation network:
determining whether the loss function sum value of the first loss function value and the second loss function value is smaller than a preset threshold;
if not, adjusting one or more of the skeleton point generation network and the domain discrimination network, and determining an adjusted first loss function value and an adjusted second loss function value based on the adjusted network(s);
obtaining the trained network when the loss function sum value of the adjusted first loss function value and the adjusted second loss function value is smaller than the preset threshold.
In a possible implementation, the training module 404 is configured to input the first image sample feature and the second image sample feature into the domain discrimination network according to the following steps, to obtain the loss function value output by the domain discrimination network:
inputting the first image sample feature into the domain discrimination network to obtain a first image category output by the domain discrimination network, and determining a first comparison result between that first image category and the first annotation category indicated by the user face image sample;
inputting the second image sample feature into the domain discrimination network to obtain a second image category output by the domain discrimination network, and determining a second comparison result between that second image category and the second annotation category indicated by the virtual face image sample;
determining the loss function value output by the domain discrimination network based on the first comparison result and the second comparison result.
In a possible implementation, a gradient reversal layer connected to the domain discrimination network is further included, and the training module 404 is further configured to:
reverse, by using the gradient reversal layer, the gradient value corresponding to the loss function value output by the domain discrimination network, to obtain a reversed gradient value;
adjust the skeleton point generation network according to the reversed gradient value.
In a possible implementation, the acquisition module 401 is configured to obtain the virtual face image sample according to the following steps:
in response to an input request of the user, acquiring preset skeleton point parameters input by the user;
performing face rendering on the preset skeleton point parameters by using a preset rendering engine, to obtain the virtual face image sample.
For the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant description in the above method embodiments, which is not repeated here.
An embodiment of the present disclosure further provides an electronic device. As shown in Fig. 5, which is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure, the electronic device includes: a processor 501, a memory 502, and a bus 503. The memory 502 stores machine-readable instructions executable by the processor 501 (for example, execution instructions corresponding to the acquisition module 401, the generation module 402, and the rendering module 403 in the apparatus of Fig. 4). When the electronic device runs, the processor 501 and the memory 502 communicate through the bus 503, and the machine-readable instructions, when executed by the processor 501, perform the following processing:
in response to an acquisition request of a user, acquiring a user face image of the user and a trained skeleton point generation network;
performing skeleton point parameter extraction on the user face image by using the trained skeleton point generation network to obtain skeleton point parameters corresponding to the user face image, where the skeleton point generation network is obtained by training on user face image samples and virtual face image samples generated based on preset skeleton point parameters;
performing face rendering based on the skeleton point parameters to obtain a virtual face image corresponding to the user face image.
The embodiments of the present disclosure further provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the virtual character face-pinching method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure further provide a computer program product carrying program code, where the instructions included in the program code may be used to perform the steps of the virtual character face-pinching method described in the above method embodiments; for details, refer to the above method embodiments, which are not repeated here.
The computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a standalone product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods of the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely specific embodiments of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited to them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art can still modify, or readily conceive of changes to, the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features, within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall be covered by it. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.