Content generation method and apparatus based on motion data redirection, and computer device
1. A method for content generation based on motion data redirection, the method comprising the steps of:
responding to a target content generation request sent by a client, analyzing the target content generation request, and acquiring first motion data acquired when a source object makes at least one target action and second motion data acquired when the target object makes any one action;
obtaining a target conversion factor according to the first motion data, the second motion data and a pre-trained conversion factor calculation model, wherein the target conversion factor is used for converting the bone length of the source object in the first motion data into the bone length of the target object, and the bone length is determined based on the position of a human key point in the motion data;
obtaining redirected first motion data according to the first motion data, the target conversion factor and a preset motion data redirection function;
generating target content of the target object to make the at least one target action according to the redirected first motion data;
and sending the target content to the client, wherein the client displays the target content in an interactive interface.
2. The method for generating content based on motion data redirection according to claim 1, wherein before responding to the target content generation request sent by the client, the method further comprises the steps of:
obtaining a motion data sample and a random conversion factor of the source object; the motion data sample of the source object is motion data acquired when the source object performs a plurality of actions;
obtaining a reversely redirected motion data sample according to the motion data sample, the random conversion factor and a preset motion data reverse redirection function, wherein the preset motion data reverse redirection function converts the bone length of the source object in the motion data sample based on the reciprocal of the random conversion factor;
randomly acquiring motion data acquired when the source object performs an action from the motion data sample;
and training the conversion factor calculation model according to the random conversion factor, the motion data sample subjected to reverse redirection, the motion data acquired when the source object performs an action, a preset model loss function and a preset model optimization algorithm to obtain the pre-trained conversion factor calculation model.
3. The method for generating content based on motion data redirection according to claim 2, wherein the training of the conversion factor calculation model according to the random conversion factor, the motion data sample subjected to reverse redirection, the motion data acquired when the source object performs an action, the preset model loss function and the preset model optimization algorithm to obtain the pre-trained conversion factor calculation model comprises:
randomly initializing trainable parameters of the conversion factor calculation model to obtain the initialized conversion factor calculation model;
inputting the motion data sample, the motion data sample subjected to reverse redirection and the motion data acquired when the source object performs an action into the initialized conversion factor calculation model to obtain a predicted conversion factor for converting the motion data sample subjected to reverse redirection into the motion data sample;
iteratively training the trainable parameters of the conversion factor calculation model according to the predicted conversion factor, the random conversion factor, the preset model loss function and the preset model optimization algorithm until the error between the predicted conversion factor output by the conversion factor calculation model and the random conversion factor meets a preset error threshold, thereby obtaining the pre-trained conversion factor calculation model.
4. The content generation method based on motion data redirection according to any one of claims 1 to 3, wherein the pre-trained conversion factor calculation model includes an action coding sub-model, a motion coding sub-model and a conversion factor prediction sub-model, and the obtaining of the target conversion factor according to the first motion data, the second motion data and the pre-trained conversion factor calculation model includes:
inputting the first motion data into the motion coding sub-model to obtain the coded first motion data output by the motion coding sub-model;
inputting the second motion data into the action coding sub-model to obtain the coded second motion data output by the action coding sub-model;
and inputting the coded first motion data and the coded second motion data into the conversion factor prediction sub-model to obtain the target conversion factor output by the conversion factor prediction sub-model.
5. The method for generating content based on motion data redirection according to any one of claims 1 to 3, wherein the analyzing of the target content generation request to obtain the first motion data acquired when the source object makes at least one target action and the second motion data acquired when the target object makes any one action comprises the steps of:
analyzing the target content generation request, and acquiring a source content identifier and a static image containing the human key points of the target object; the source content corresponding to the source content identifier is a dynamic image of the source object making at least one target action;
acquiring the first motion data according to the source content, wherein the first motion data comprises position information of the human key points when the source object makes at least one target action;
and acquiring second motion data according to the static image, wherein the second motion data comprises the position information of the human key points when the target object performs the action in the static image.
6. The method for content generation based on motion data redirection according to claim 5, further comprising the steps of:
responding to a live broadcast interaction request sent by an anchor client, analyzing the live broadcast interaction request, and storing the source content in the live broadcast interaction request;
and sending content generation popup data to a viewer client that has joined a live broadcast room, wherein the viewer client displays a content generation popup in a live broadcast room interface according to the content generation popup data, and, in response to a trigger instruction on the content generation popup, acquires the static image uploaded by a viewer and sends the target content generation request comprising the static image and the source content identifier to a server.
7. The content generation method based on motion data redirection according to any one of claims 1 to 3, characterized in that the method further comprises:
acquiring surface texture data of the target object, the surface texture data being acquired when the target object performs any one action;
generating target content of the target object to make the at least one target action according to the redirected first motion data, comprising the steps of:
generating the target content of the target object for making the at least one target action according to the redirected first motion data and the surface texture data of the target object.
8. The content generation method based on motion data redirection according to any one of claims 1 to 3, wherein the target object is a virtual character object, the method further comprising:
acquiring a virtual character motion model corresponding to the target object;
generating target content of the target object to make the at least one target action according to the redirected first motion data, comprising the steps of:
inputting the redirected first motion data into a virtual character motion model corresponding to the target object, and generating the target content of the target object for making the at least one target motion.
9. A content generation apparatus based on motion data redirection, comprising:
the first response unit is used for responding to a target content generation request sent by a client, analyzing the target content generation request, and acquiring first motion data acquired when a source object makes at least one target action and second motion data acquired when the target object makes any one action;
a first obtaining unit, configured to obtain a target conversion factor according to the first motion data, the second motion data, and a pre-trained conversion factor calculation model, where the target conversion factor is used to convert a bone length of the source object in the first motion data into a bone length of the target object, and the bone length is determined based on a position of a human key point in the motion data;
the redirection unit is used for obtaining redirected first motion data according to the first motion data, the target conversion factor and a preset motion data redirection function;
a content generating unit, configured to generate target content of the target object making the at least one target action according to the redirected first motion data;
and the display unit is used for sending the target content to the client, wherein the client displays the target content in an interactive interface.
10. A computer device, comprising a processor, a memory and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any one of claims 1 to 8 are implemented when the processor executes the computer program.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
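The training procedure of claims 2 and 3 can be illustrated with a minimal sketch. The following Python code assumes 2-D human key points and a hypothetical parent-indexed skeleton (`PARENTS`); neither representation is fixed by the claims, and the learned conversion factor calculation model itself is omitted, so only the construction of one self-supervised training example is shown:

```python
import random

# Hypothetical 4-joint kinematic chain: index of each joint's parent (-1 = root).
PARENTS = [-1, 0, 1, 2]

def scale_bones(pose, parents, factor):
    """Rebuild a pose with every bone vector scaled by `factor`.

    Bone directions (and hence joint angles) are unchanged; only bone
    lengths change, which is the key property of the claimed conversion
    factor."""
    out = [None] * len(pose)
    for joint, parent in enumerate(parents):
        if parent < 0:
            out[joint] = pose[joint]  # root joint keeps its position
        else:
            dx = pose[joint][0] - pose[parent][0]
            dy = pose[joint][1] - pose[parent][1]
            out[joint] = (out[parent][0] + factor * dx,
                          out[parent][1] + factor * dy)
    return out

def make_training_example(sample_pose, parents):
    """One self-supervised example per claim 2: reverse-redirect the motion
    sample with the reciprocal of a random conversion factor; the factor
    itself serves as the supervision label for the model."""
    factor = random.uniform(0.5, 2.0)                     # random conversion factor
    reversed_pose = scale_bones(sample_pose, parents, 1.0 / factor)
    return reversed_pose, sample_pose, factor             # (input, reference, label)
```

Applying `scale_bones` with the sampled factor to the reversed pose recovers the original sample, which is exactly the consistency the trained conversion factor calculation model is meant to exploit.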
Background
Motion data redirection refers to the process of redirecting the motion data of a source object to a target object whose skeleton differs in length, proportion or even topology, that is, editing and modifying the motion data of the source object according to the skeleton structure of the target object on the premise of keeping the original motion characteristics.
Currently, motion data redirection is widely applied to drive virtual characters to synthesize animated content, or to drive real characters to synthesize professional motion video content. However, existing content generation methods based on motion data redirection are often implemented through motion data cross reconstruction, which leads to distorted actions in the generated content and low content generation efficiency. Moreover, motion data cross reconstruction requires a series of human key point sets of the target object as input, which makes data acquisition difficult and costly.
Disclosure of Invention
The embodiments of the present application provide a content generation method and apparatus based on motion data redirection, and a computer device, which can solve the technical problems of action distortion and low generation efficiency. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a content generation method based on motion data redirection, including:
responding to a target content generation request sent by a client, analyzing the target content generation request, and acquiring first motion data acquired when a source object makes at least one target action and second motion data acquired when the target object makes any one action;
obtaining a target conversion factor according to the first motion data, the second motion data and a pre-trained conversion factor calculation model, wherein the target conversion factor is used for converting the bone length of the source object in the first motion data into the bone length of the target object, and the bone length is determined based on the position of a human key point in the motion data;
obtaining redirected first motion data according to the first motion data, the target conversion factor and a preset motion data redirection function;
generating target content of the target object to make the at least one target action according to the redirected first motion data;
and sending the target content to the client, wherein the client displays the target content in an interactive interface.
In a second aspect, an embodiment of the present application provides a content generation apparatus based on motion data redirection, including:
the first response unit is used for responding to a target content generation request sent by a client, analyzing the target content generation request, and acquiring first motion data acquired when a source object makes at least one target action and second motion data acquired when the target object makes any one action;
a first obtaining unit, configured to obtain a target conversion factor according to the first motion data, the second motion data, and a pre-trained conversion factor calculation model, where the target conversion factor is used to convert a bone length of the source object in the first motion data into a bone length of the target object, and the bone length is determined based on a position of a human key point in the motion data;
the redirection unit is used for obtaining redirected first motion data according to the first motion data, the target conversion factor and a preset motion data redirection function;
a content generating unit, configured to generate target content of the target object making the at least one target action according to the redirected first motion data;
and the display unit is used for sending the target content to the client, wherein the client displays the target content in an interactive interface.
In a third aspect, embodiments of the present application provide a computer device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the method according to the first aspect.
In the embodiment of the application, a server responds to a target content generation request sent by a client, analyzes the target content generation request, and acquires first motion data acquired when a source object makes at least one target action and second motion data acquired when the target object makes any one action. The server then obtains a target conversion factor according to the first motion data, the second motion data and a pre-trained conversion factor calculation model, wherein the target conversion factor is used for converting the bone length of the source object in the first motion data into the bone length of the target object, and the bone length is determined based on the position of a human key point in the motion data. Next, the server obtains redirected first motion data according to the first motion data, the target conversion factor and a preset motion data redirection function, and generates target content of the target object making the at least one target action based on the redirected first motion data. Finally, the server sends the target content to the client, and the client displays the target content in an interactive interface. In the present application, the target conversion factor can be obtained from the first motion data acquired when the source object makes at least one target action, the second motion data acquired when the target object makes any one action, and the pre-trained conversion factor calculation model. Because the target conversion factor only changes the length of bones and does not change the included angle between bones, distortion of actions in the generated content can be effectively avoided and content generation efficiency is improved. In addition, because the pre-trained conversion factor calculation model only needs, as input, the second motion data acquired when the target object makes any one action, the acquisition difficulty and cost of the motion data are also reduced.
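The summary above emphasizes that the target conversion factor changes only bone lengths and leaves the included angles between bones untouched. Under simplifying assumptions not fixed by the application (2-D key points, a hypothetical parent-indexed skeleton), this property can be checked directly:

```python
import math

def scale_bones(pose, parents, factor):
    # Scale each bone vector (child minus parent) by `factor`, root fixed.
    out = [None] * len(pose)
    for j, p in enumerate(parents):
        out[j] = pose[j] if p < 0 else (
            out[p][0] + factor * (pose[j][0] - pose[p][0]),
            out[p][1] + factor * (pose[j][1] - pose[p][1]),
        )
    return out

def joint_angle(a, b, c):
    """Angle at joint b formed by bones b->a and b->c, in radians."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.acos(dot / (math.hypot(*v1) * math.hypot(*v2)))

# A bent 3-joint chain: the angle at the middle joint survives any factor.
pose = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
scaled = scale_bones(pose, [-1, 0, 1], 1.7)
assert abs(joint_angle(*pose) - joint_angle(*scaled)) < 1e-9
```

Because the redirection only rescales bone vectors, any joint angle computed before and after scaling is identical, which is why the generated actions avoid distortion.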
For a better understanding and implementation, the technical solutions of the present application are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic view of an application scenario of a content generation method based on motion data redirection according to an embodiment of the present application;
fig. 2 is a schematic view of another application scenario of the content generation method based on motion data redirection according to the embodiment of the present application;
fig. 3 is a schematic flowchart of a content generation method based on motion data redirection according to a first embodiment of the present application;
fig. 4 is a schematic flowchart of S101 in a content generation method based on motion data redirection according to a first embodiment of the present application;
fig. 5 is a schematic diagram of another application scenario of the content generation method based on motion data redirection according to the embodiment of the present application;
fig. 6 is another schematic flow chart of a content generation method based on motion data redirection according to the first embodiment of the present application;
FIG. 7 is a schematic diagram of a training process of a conversion factor calculation model according to an embodiment of the present disclosure;
fig. 8 is a schematic diagram of a redirection process of first motion data according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a conversion factor calculation model provided in an embodiment of the present application;
fig. 10 is a flowchart illustrating a content generation method based on motion data redirection according to a second embodiment of the present application;
fig. 11 is a schematic structural diagram of a content generation apparatus based on motion data redirection according to a third embodiment of the present application;
fig. 12 is a schematic structural diagram of a computer device according to a fourth embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "upon", "when", or "in response to a determination", depending on the context.
As will be appreciated by those skilled in the art, the terms "client" and "terminal device" as used herein include both wireless-signal receiver devices, which have only receive capability and no transmit capability, and hardware devices capable of two-way communication over a two-way communication link. Such a device may include: a cellular or other communication device, with or without a multi-line display, such as a personal computer or a tablet; a PCS (Personal Communications Service) device, which may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; or a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver. As used herein, a "client" or "terminal device" may be portable, transportable, installed in a vehicle (aeronautical, maritime, and/or land-based), or situated and/or configured to operate locally and/or in a distributed fashion at any other location on earth and/or in space. The "client" or "terminal device" used herein may also be a communication terminal, a web terminal, or a music/video playing terminal, such as a PDA, an MID (Mobile Internet Device) and/or a mobile phone with a music/video playing function, or a smart TV, a set-top box, and the like.
The hardware referred to by the names "server", "client", "service node", etc. is essentially a computer device with the performance of a personal computer, and is a hardware device having necessary components disclosed by the von neumann principle, such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device, an output device, etc., wherein a computer program is stored in the memory, and the central processing unit loads a program stored in an external memory into the internal memory to run, executes instructions in the program, and interacts with the input and output devices, thereby accomplishing specific functions.
It should be noted that the concept of "server" as referred to in this application can be extended to the case of a server cluster. According to the network deployment principle understood by those skilled in the art, the servers should be logically divided, and in physical space, the servers may be independent from each other but can be called through an interface, or may be integrated into one physical computer or a set of computer clusters. Those skilled in the art will appreciate this variation and should not be so limited as to restrict the implementation of the network deployment of the present application.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a content generation method based on motion data redirection according to an embodiment of the present application, where the application scenario includes a client 101 and a server 102 provided in the embodiment of the present application, and the client 101 interacts with the server 102.
The hardware to which the client 101 is directed essentially refers to a computer device, and in particular, as shown in fig. 1, it may be a computer device of the type of a smartphone, a smart interactive tablet, a personal computer, or the like. The client 101 may access the internet via a known network access method to establish a data communication link with the server 102.
The server 102 is a service server, and may be responsible for further connecting with related audio data servers, video streaming servers, and other servers providing related support, so as to form a logically associated server cluster for providing services for related terminal devices, such as the client 101 shown in fig. 1.
In the embodiment of the present application, a user may click to access an application (for example, YY) installed on the client 101, and browse, generate, and upload content in the application, where the content may be short videos, user dynamics, news, and the like.
Referring to fig. 2, fig. 2 is a schematic view of another application scenario of the content generation method based on motion data redirection according to the embodiment of the present application, where the application scenario is a live webcast scenario, in which a client 101 is divided into an anchor client 1011 and a viewer client 1012, and the anchor client 1011 and the viewer client 1012 interact with each other through a server 102.
The anchor client 1011 is a client that sends a webcast video, and is generally a client used by an anchor (i.e., a live anchor user) in webcasting.
The viewer client 1012 is the end that receives and watches the live video, and is typically the client employed by a viewer watching the video in the webcast (i.e., a live viewer user).
Similarly, the hardware pointed to by the anchor client 1011 and the viewer client 1012 is, in essence, a computer device.
In this embodiment, the anchor client 1011 and the audience client 1012 can join the same live broadcast room (i.e., a live broadcast channel), and the live broadcast room is a chat room implemented by means of internet technology, and generally has an audio/video broadcast control function. The anchor user is live in the live room through the anchor client 1011, and the viewer at the viewer client 1012 can log in to the server 102 to watch the live.
In the live broadcast room, interaction between the anchor user and the audience users can be realized through known online interaction modes such as voice, video, characters and the like, generally, the anchor user performs programs for the audience users in the form of audio and video streams, and economic transaction behaviors can also be generated in the interaction process.
Specifically, the viewer user clicks to access an application (e.g., YY) installed on the viewer client 1012 and chooses to enter any live room to watch the live broadcast. Various forms of online interaction between the viewer user and the anchor user may occur during the live broadcast; for example, viewer users can give virtual gifts to the anchor, and the anchor user can also stimulate viewer interaction through the functions provided by the live broadcast room.
Based on the application scenario, the embodiment of the application provides a content generation method based on motion data redirection. Referring to fig. 3, fig. 3 is a schematic flowchart of a content generation method based on motion data redirection according to a first embodiment of the present application, where the method includes the following steps:
S101: responding to a target content generation request sent by a client, analyzing the target content generation request, and acquiring first motion data acquired when a source object makes at least one target action and second motion data acquired when the target object makes any one action.
S102: and obtaining a target conversion factor according to the first motion data, the second motion data and a pre-trained conversion factor calculation model, wherein the target conversion factor is used for converting the bone length of the source object in the first motion data into the bone length of the target object, and the bone length is determined based on the position of a human key point in the motion data.
S103: and obtaining the redirected first motion data according to the first motion data, the target conversion factor and a preset motion data redirection function.
S104: and generating target content of the target object for making the at least one target action according to the redirected first motion data.
S105: and sending the target content to the client, wherein the client displays the target content in an interactive interface.
In this embodiment, a content generation method based on motion data redirection is mainly explained by taking a server as an execution subject, and specifically, the method includes the following steps:
For step S101, the client sends a target content generation request to the server; the server responds to the target content generation request sent by the client, analyzes the target content generation request, and acquires first motion data acquired when the source object makes at least one target action and second motion data acquired when the target object makes any one action.
In the embodiment of the present application, the motion data is data for driving a virtual character to synthesize animation content or driving a real character to synthesize motion video content, and typically, the motion data includes a position set of human key points of a certain object (which may be a virtual character or a real character) at different time stamps.
In short, if the motion data is used to drive the virtual character to synthesize the animation content, the motion data includes the position information of the human key points of the virtual character in each frame of animation image, and if the motion data is used to drive the real character to synthesize the motion video content, the motion data includes the position information of the human key points of the real character in each frame of motion video image.
The position information of the key points of the human body is used for positioning the head position and the limb position of the object in each frame picture. In an alternative embodiment, the body key points include, but are not limited to, the top of the head, five sense organs, neck, and major joint points of the limbs, etc.
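Since motion data reduces to per-frame positions of human key points, a bone length is simply the distance between two connected key points. A small illustration, in which the key point names and 2-D coordinates are assumptions made for the example:

```python
import math

# One frame of hypothetical motion data: named human key points -> 2-D positions.
frame = {
    "neck": (0.0, 1.6),
    "shoulder_l": (-0.2, 1.5),
    "elbow_l": (-0.45, 1.25),
}

def bone_length(frame, a, b):
    """Length of the bone connecting key points `a` and `b`."""
    return math.dist(frame[a], frame[b])

# Left upper arm: distance from the left shoulder to the left elbow.
upper_arm = bone_length(frame, "shoulder_l", "elbow_l")
```

This is the sense in which the embodiments determine bone length "based on the position of a human key point in the motion data".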
Even when performing the same motion sequence, different objects produce inconsistent motion data because their statures and body types differ. Therefore, in order to generate target content of at least one target action performed by a target object (which may be a virtual character or a real character), it is necessary to redirect the motion data acquired when the source object performs the at least one target action, that is, the first motion data mentioned in the embodiments of the present application.
Before redirecting the motion data, the server needs to acquire first motion data acquired when the source object makes at least one target motion and second motion data acquired when the target object makes any one motion.
Specifically, in an optional embodiment, the target content generation request sent by the client includes first motion data acquired when the source object makes at least one target action and second motion data acquired when the target object makes any one action, and thus, after the server parses the target content generation request, the server may directly acquire the first motion data acquired when the source object makes at least one target action and the second motion data acquired when the target object makes any one action in the target content generation request.
It is to be understood that, in this case, the acquisition of the first motion data and the second motion data is performed by the client.
In another alternative embodiment, the acquisition of the first motion data and the second motion data is performed by a server or by a motion data acquisition device connected to the server.
The client sends a target content generation request that includes a source content identifier and a static image containing the human key points of the target object; the server then responds to the target content generation request and, according to the parsed source content identifier and the static image containing the human key points of the target object, obtains the first motion data acquired when the source object makes at least one target action and the second motion data acquired when the target object makes any one action.
Specifically, referring to fig. 4, the step of analyzing the target content generation request in step S101, and acquiring first motion data acquired when the source object performs at least one target motion and second motion data acquired when the target object performs any one motion includes steps S1011 to S1013, as follows:
s1011: analyzing the target content generation request, and acquiring a source content identifier and a static image containing the human key points of the target object; and the source content corresponding to the source content identification is a dynamic image of the source object making at least one target action.
S1012: and acquiring the first motion data according to the source content, wherein the first motion data comprises position information of the human key points when the source object makes at least one target action.
S1013: and acquiring second motion data according to the static image, wherein the second motion data comprises the position information of the key points of the human body when the target object performs the action in the static image.
In this embodiment, after analyzing the target content generation request to obtain the source content identifier and the static image including the human key point of the target object, the server calls the corresponding source content according to the source content identifier.
The source content corresponding to the source content identifier is a dynamic image for the source object to perform at least one target action, and the dynamic image may be an animation or a video, and a specific format of the dynamic image is not limited herein.
In an alternative embodiment, the static image may be a whole body image of the target object, and the motion performed by the target object in the whole body image is not limited herein, and the static image may include a human body key point of the target object.
Then, the server collects the first motion data and the second motion data respectively based on the source content and the static image.
Specifically, the server may input the source content and the static image into a preset motion data acquisition model respectively to obtain the first motion data and the second motion data, or the server may send the source content and the static image to a motion data acquisition device and receive the first motion data and the second motion data returned by the motion data acquisition device.
The collected first motion data comprise position information of the human key points when the source object makes at least one target action, and the collected second motion data comprise position information of the human key points when the target object makes an action in the static image.
In the embodiment of the present application, for different application scenarios, the process of triggering the client to generate the target content generation request is different, and specific reference may be made to the detailed description in the second embodiment of the present application, so as to determine how the method provided in the embodiment of the present application may be specifically applied in a short video scenario and a live network scenario.
And aiming at the step S102, the server calculates a model according to the first motion data, the second motion data and the pre-trained conversion factor to obtain a target conversion factor.
In an optional embodiment, the pre-trained conversion factor calculation model may be set in the server 102, and after the server 102 acquires the first motion data and the second motion data, the first motion data and the second motion data are input to the pre-trained conversion factor calculation model to obtain the target conversion factor.
In another alternative embodiment, in order to reduce the load of the server 102, please refer to fig. 5, and fig. 5 is a schematic diagram of another application scenario of the content generation method based on motion data redirection according to the embodiment of the present application, in which the server 102 is connected to the motion data processing server 103 to form a logically associated service cluster, and the motion data processing server 103 may be configured to process motion data and may also be configured to perform training of a conversion factor calculation model.
After the first motion data and the second motion data are acquired by the server 102, the first motion data and the second motion data may be sent to the motion data processing server 103, a pre-trained conversion factor calculation model is set in the motion data processing server 103, and after the first motion data and the second motion data are received, the first motion data and the second motion data are input to the pre-trained conversion factor calculation model to obtain a target conversion factor, and the target conversion factor is returned to the server 102.
Wherein the target conversion factor is used to convert the bone length of the source object in the first motion data to the bone length of the target object, the bone length being determined based on the locations of the human keypoints in the motion data.
In the embodiment of the present application, a bone refers to the edge formed by connecting human body key points, and the bone length refers to a relative bone length. For example: setting the length of the bone between the head key point and the mid-hip key point to 1, the relative lengths of the bones formed by connecting the other human body key points can be scaled relative to 1; adopting relative bone lengths makes the redirection of motion data more convenient.
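As an illustrative sketch only (not part of the original disclosure), the relative bone length described above could be computed from 2D key-point positions as follows; the toy skeleton edges and the choice of the head-to-mid-hip bone as the reference are assumptions for the example:

```python
import numpy as np

def relative_bone_lengths(keypoints, edges, ref_edge=0):
    """Bone lengths scaled so the reference bone (e.g. head to mid-hip) is 1.

    keypoints: (k, 2) array of 2D human key-point positions.
    edges: list of (i, j) key-point index pairs, each pair defining a bone.
    ref_edge: index into `edges` of the reference bone.
    """
    lengths = np.array([np.linalg.norm(keypoints[i] - keypoints[j])
                        for i, j in edges])
    return lengths / lengths[ref_edge]

# Toy skeleton: head at (0, 4), mid-hip at (0, 0), knee at (0, -2).
pose = np.array([[0.0, 4.0], [0.0, 0.0], [0.0, -2.0]])
rel = relative_bone_lengths(pose, edges=[(0, 1), (1, 2)])  # head-hip, hip-knee
```

With the head-to-mid-hip bone normalized to 1, the hip-to-knee bone in this toy pose has relative length 0.5, regardless of the image scale at which the key points were detected.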
The following will describe in detail the training process of the conversion factor calculation model, which may be performed in the server 102 or the motion data processing server 103, but is not limited thereto. Specifically, referring to fig. 6, fig. 6 is another schematic flow chart of the content generation method based on motion data redirection according to the first embodiment of the present application; before step S102 is executed, the method further includes steps S106 to S109, as follows:
S106: obtaining a motion data sample and a random conversion factor of the source object, wherein the motion data sample of the source object is motion data acquired when the source object makes a plurality of actions.
S107: and obtaining a reversely redirected motion data sample according to the motion data sample, the random conversion factor and a preset motion data reverse redirection function, wherein the preset motion data reverse redirection function converts the bone length of the source object in the motion data sample based on the reciprocal of the random conversion factor.
S108: and randomly acquiring motion data acquired when the source object performs an action from the motion data samples.
S109: and training the conversion factor calculation model according to the random conversion factor, the motion data sample subjected to reverse redirection, the motion data acquired when the source object performs an action, a preset model loss function and a preset model optimization algorithm to obtain the pre-trained conversion factor calculation model.
In an alternative embodiment, the conversion factor calculation model may be any deep learning model, such as: a recurrent neural network model, a long short-term memory network model, and the like. In another alternative embodiment, the conversion factor calculation model may also be an integrated model of several deep learning models, such as: the conversion factor calculation model may include a first deep learning model for motion encoding, a second deep learning model for action encoding, and a third deep learning model for prediction.
Before the conversion factor calculation model is trained, motion data samples and random conversion factors of the source object need to be acquired.
Wherein the motion data samples of the source object are motion data acquired when the source object makes a plurality of motions. Here, the action made by the source object is not limited, and may be any of several actions.
In the present embodiment, the motion data sample of the i-th source object is represented as X_i, X_i ∈ R^(t_i × k × 2), meaning that the motion data sample of the i-th source object contains t_i × k × 2 real numbers, where t_i is the sequence length of the motion data sample, defined as the number of actions performed by the source object, k is the number of human key points in a single action, and 2 indicates that the motion data sample is two-dimensional, the position of one human key point being determined by two coordinates.
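Concretely, under these conventions a motion data sample is just a three-dimensional array, and the random draw of step S108 simply selects one frame of it; the particular t_i and k values below are arbitrary example assumptions:

```python
import numpy as np

t_i, k = 16, 17     # sequence length and key-point count (arbitrary assumptions)
X_i = np.zeros((t_i, k, 2))     # one motion data sample: t_i x k x 2 real numbers
rng = np.random.default_rng(0)
x_i = X_i[rng.integers(t_i)]    # motion data of one randomly drawn action, k x 2
```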
The random conversion factor is generated by the server through any random number generation algorithm, and can be any positive number.
After the random conversion factor is obtained, the server obtains a reversely redirected motion data sample according to the motion data sample, the random conversion factor and a preset motion data reverse redirection function.
Before understanding the preset motion data reverse redirection function, the preset motion data redirection function is explained.
Assuming that the preset motion data redirection function is called to process motion data a (motion data acquired when an object a performs any action): within the function, the bone lengths of object a are acquired according to the positions of the human key points in motion data a, the bone lengths are then transformed based on a conversion factor (for example: shortened, lengthened or kept unchanged), and the positions of the human key points are recalculated according to the transformed bone lengths, thereby obtaining the motion data redirected based on the conversion factor.
It should be noted that, in the embodiment of the present application, the preset motion data redirection function only changes the lengths of the bones and does not change the included angles between the bones.
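A minimal sketch of such a redirection function, under the assumption that the skeleton is given as a parent table and that scaling each bone vector by the conversion factor while keeping its direction is an acceptable realization of "change length, keep angles" (the patent does not disclose the exact implementation):

```python
import numpy as np

def redirect_pose(keypoints, alpha, parents):
    """Rescale every bone by `alpha` while preserving all joint angles.

    keypoints: (k, 2) array of 2D key-point positions.
    alpha: positive conversion factor applied to every bone length.
    parents: parents[j] is the parent key point of joint j, -1 for the root;
             joints are assumed ordered so each parent precedes its children.
    """
    out = np.empty_like(keypoints, dtype=float)
    for j, p in enumerate(parents):
        if p < 0:
            out[j] = keypoints[j]               # root position is kept as-is
        else:
            bone = keypoints[j] - keypoints[p]  # bone direction is unchanged
            out[j] = out[p] + alpha * bone      # only the length is scaled
    return out

# Root at the origin, child straight up, grandchild off to the right.
pose = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
doubled = redirect_pose(pose, 2.0, parents=[-1, 0, 1])
```

Because each bone vector keeps its direction, all included angles are preserved. The reverse redirection described above then corresponds to calling the same routine with the reciprocal factor: `redirect_pose(doubled, 0.5, [-1, 0, 1])` recovers the original pose.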
And converting the bone length of the source object in the motion data sample by the preset motion data reverse redirection function based on the reciprocal of the random conversion factor to obtain a reversely redirected motion data sample.
In the present embodiment, the random conversion factor is represented as α_i, the preset motion data redirection function is represented as f, the preset motion data reverse redirection function is represented as f⁻¹, and the reversely redirected motion data sample is denoted X_i′; then X_i′ = f⁻¹(X_i, α_i).
By performing the reverse redirection operation on the motion data samples, paired motion data need not be collected when training the conversion factor calculation model; that is, there is no need to collect motion data of different objects performing the same action, which greatly reduces the difficulty and cost of data acquisition.
In addition to the reverse redirection of the motion data samples, it is also necessary to randomly acquire, from the motion data sample X_i, motion data x_i collected when the source object makes one action, where x_i ∈ R^(k × 2), meaning that the motion data collected when the i-th source object makes one action contain k × 2 real numbers, k being the number of human key points in a single action and 2 indicating that the motion data are two-dimensional, the position of one human key point being determined by two coordinates.
Then, the conversion factor calculation model is trained according to the random conversion factor α_i, the motion data sample X_i, the reversely redirected motion data sample X_i′, and the motion data x_i collected when the source object makes one action, to obtain the pre-trained conversion factor calculation model.
The preset model loss function may be any one of loss functions, for example: a mean square error loss function, a cross entropy loss function, etc., and are not limited herein.
The preset model optimization algorithm may also be any deep learning optimization algorithm, for example: momentum optimization algorithm, RMSProp optimization algorithm or Adam optimization algorithm, etc.
Specifically, before training is started, the trainable parameters of the conversion factor calculation model are randomly initialized to obtain an initialized conversion factor calculation model. The reversely redirected motion data sample and the motion data collected when the source object makes one action are then input into the initialized model to obtain a predicted conversion factor for converting the reversely redirected motion data sample back into the motion data sample. The trainable parameters of the conversion factor calculation model are then iteratively trained according to the predicted conversion factor, the random conversion factor, the preset model loss function and the preset model optimization algorithm, until the error between the predicted conversion factor output by the model and the random conversion factor meets a preset error threshold, yielding the pre-trained conversion factor calculation model.
In an alternative embodiment, the predicted conversion factor is α̂_i = F_θ(X_i′, x_i), where F_θ represents the conversion factor calculation model with trainable parameters θ, θ represents the trainable parameters that have not yet been trained, α̂_i represents the predicted conversion factor, and L represents the preset model loss function. The pre-trained trainable parameters θ* are the value of θ that makes L(α̂_i, α_i) reach a minimum meeting the preset error threshold, i.e., θ* = argmin_θ L(F_θ(X_i′, x_i), α_i).
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a training process of the conversion factor calculation model according to an embodiment of the present application. As shown in fig. 7, the motion data sample X_i of the source object and the random conversion factor α_i are acquired; the motion data sample X_i is then reversely redirected according to the random conversion factor α_i to obtain the reversely redirected motion data sample X_i′; motion data x_i collected when the source object makes one action are randomly acquired from the motion data sample X_i; and the conversion factor calculation model is then trained according to the random conversion factor α_i, the motion data sample X_i, the reversely redirected motion data sample X_i′ and the motion data x_i, to obtain the pre-trained conversion factor calculation model, i.e., the pre-trained trainable parameters θ*.
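Purely as a hedged illustration of this training objective (the patent leaves the model family open), the loop below trains a deliberately tiny stand-in for F_θ — a single linear layer fitted by mean-squared-error gradient descent on synthetic data; all shapes, sizes and hyperparameters are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data following the patent's shape conventions
# (n, t_i and k are arbitrary assumptions).
n, t_i, k = 100, 8, 5
X_prime = rng.normal(size=(n, t_i * k * 2))  # reversely redirected samples, flattened
x_i = rng.normal(size=(n, k * 2))            # one randomly drawn action per sample
alpha = rng.uniform(0.5, 2.0, size=n)        # the random conversion factors (labels)

# Stand-in F_theta: one linear layer; theta = (weights, bias).
features = np.concatenate([X_prime, x_i], axis=1)
theta = np.zeros(features.shape[1])
bias = float(alpha.mean())                   # start at the label mean
lr = 1e-3
for _ in range(200):                         # plain gradient descent on MSE
    err = features @ theta + bias - alpha    # predicted minus random factor
    theta -= lr * 2.0 * features.T @ err / n
    bias -= lr * 2.0 * err.mean()
final_loss = float(((features @ theta + bias - alpha) ** 2).mean())
```

A real implementation would replace the linear layer with the deep learning sub-models named above (e.g. recurrent or long short-term memory networks) and the hand-written loop with a framework optimizer such as Adam, per the preset model optimization algorithm; the structure of the loop — predict α̂_i, compare against the random α_i, update θ — is the part this sketch is meant to show.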
In an alternative embodiment, the first motion data may be represented as X_s, X_s ∈ R^(t_s × k × 2), meaning that the first motion data collected when the source object makes at least one target action contain t_s × k × 2 real numbers, where t_s is the sequence length of the first motion data, k is the number of human key points in a single action, and 2 indicates that the motion data are two-dimensional, the position of one human key point being determined by two coordinates.
The second motion data may be represented as x_t, x_t ∈ R^(k × 2), meaning that the second motion data acquired when the target object performs any one action contain k × 2 real numbers.
The target conversion factor is represented as α; then α = F_θ*(X_s, x_t), where θ* denotes the pre-trained trainable parameters and F_θ* denotes the pre-trained conversion factor calculation model.
Referring to fig. 8, fig. 8 is a schematic view illustrating a redirection process of the first motion data according to an embodiment of the present application. As shown in fig. 8, after the pre-trained trainable parameters θ* are obtained, the first motion data X_s and the second motion data x_t may be input to the pre-trained conversion factor calculation model F_θ* to obtain the target conversion factor α.
The following description is made with respect to step S103. As shown in fig. 8, the redirected first motion data X_t are obtained according to the first motion data X_s, the target conversion factor α and the preset motion data redirection function f, i.e., X_t = f(X_s, α).
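To make step S103 concrete, here is a hedged sketch in which the redirection function f applies the same bone rescaling to every frame of the sequence; the parent-table skeleton representation and the uniform per-bone scaling are assumptions for the example, not the patent's disclosed implementation:

```python
import numpy as np

def redirect_sequence(X_s, alpha, parents):
    """X_t = f(X_s, alpha): scale every bone in every frame by alpha,
    keeping joint angles and each frame's root position unchanged."""
    X_t = np.empty_like(X_s, dtype=float)
    for t, frame in enumerate(X_s):
        for j, p in enumerate(parents):
            if p < 0:
                X_t[t, j] = frame[j]                          # root is fixed
            else:
                X_t[t, j] = X_t[t, p] + alpha * (frame[j] - frame[p])
    return X_t

# Two frames of a two-joint chain; a target conversion factor of 0.5
# (as the pre-trained model might output) halves every bone length.
X_s = np.array([[[0.0, 0.0], [0.0, 2.0]],
                [[1.0, 0.0], [1.0, 2.0]]])
X_t = redirect_sequence(X_s, 0.5, parents=[-1, 0])
```

The root trajectory of the source motion is carried over unchanged; only the bone lengths are converted toward the target object's proportions.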
In an alternative embodiment, please refer to fig. 9, which is a schematic structural diagram of the conversion factor calculation model provided in the embodiment of the present application. As shown in fig. 9, the pre-trained conversion factor calculation model includes a motion coding sub-model, an action coding sub-model and a conversion factor prediction sub-model. After the first motion data and the second motion data are input into the pre-trained conversion factor calculation model, the first motion data are input into the motion coding sub-model to obtain coded first motion data, the second motion data are input into the action coding sub-model to obtain coded second motion data, and the coded first motion data and the coded second motion data are then input into the conversion factor prediction sub-model to obtain the target conversion factor.
From a functional perspective, the action coding sub-model and the motion coding sub-model differ in that the action coding sub-model encodes motion data corresponding to a single action, whereas the motion coding sub-model encodes motion data corresponding to at least one complete action sequence.
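The three sub-models can be pictured as the following composition; the concrete encoders here (mean pooling over time, flattening, a fixed softplus head) are placeholder assumptions standing in for whatever deep learning sub-models an implementation would actually train:

```python
import numpy as np

def motion_encoder(X):
    """Motion coding sub-model stand-in: a (t, k, 2) sequence is reduced to a
    fixed-size code by mean pooling the flattened frames over time."""
    return X.reshape(X.shape[0], -1).mean(axis=0)

def action_encoder(x):
    """Action coding sub-model stand-in: a single (k, 2) pose -> flat code."""
    return x.reshape(-1)

def conversion_factor_predictor(code_motion, code_action):
    """Prediction sub-model stand-in: maps the two codes to one scalar;
    softplus guarantees a valid (positive) conversion factor."""
    z = float(np.concatenate([code_motion, code_action]).sum()) * 0.01
    return float(np.log1p(np.exp(z)))

k = 5
X_s = np.ones((8, k, 2))       # first motion data (dummy values)
x_t = np.ones((k, 2))          # second motion data (dummy values)
alpha = conversion_factor_predictor(motion_encoder(X_s), action_encoder(x_t))
```

The point of the composition is the data flow of fig. 9: the sequence and the single pose are encoded separately, and only their codes meet in the prediction head.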
As will be described below with respect to S104, the server generates target content of the target object making the at least one target action according to the redirected first motion data.
The redirected first motion data comprise the positions of the human key points when the target object makes the at least one target action, which can also be understood as the combination of the positions of the target object's human key points at different timestamps.
The server can generate target content of the target object for making at least one target action based on the redirected first motion data.
For example: assuming the source object is a dancer (real person), the first motion data are motion data collected when the dancer makes at least one dance action, the target object is a disabled person (real person), and the second motion data are motion data collected while the disabled person is sitting. After the target conversion factor is obtained according to the first motion data, the second motion data and the pre-trained conversion factor calculation model, the first motion data may be redirected using the target conversion factor to obtain redirected first motion data, in which the bone length of the source object, determined by the positions of the human key points, has been converted to the bone length of the target object. The target content generated based on the redirected first motion data is thus content of the disabled person making the at least one dance action.
For another example: assuming the source object is a player (real character), the first motion data are motion data collected when the player makes at least one funny action, the target object is a cartoon character (virtual character), and the second motion data are motion data collected while the cartoon character is standing. After the target conversion factor is obtained according to the first motion data, the second motion data and the pre-trained conversion factor calculation model, the first motion data may be redirected using the target conversion factor to obtain redirected first motion data, in which the bone length of the source object, determined by the positions of the human key points, has been converted to the bone length of the target object. The target content generated based on the redirected first motion data is thus content of the cartoon character making the at least one funny action.
In order to make the generated target content more realistic, the method further needs to acquire surface texture data of the target object acquired when the target object performs the any one motion, and then generate the target content of the target object performing the at least one target motion according to the redirected first motion data and the surface texture data of the target object.
In an alternative embodiment, the surface texture data further includes texture data of a face surface of the target object and texture data of a limb (i.e., bone) surface of the target object, including skin texture, clothing texture, and the like.
In the embodiment, the generated target content can be more vivid and the effect is better by acquiring the surface texture data of the target object.
In another optional embodiment, if the target object is a virtual character, the virtual character motion model corresponding to the target object may be directly obtained.
Various texture information of the target object is already stored in the virtual character motion model, and the target content of the target object making the at least one target action can be generated only by inputting the redirected first motion data into the virtual character motion model corresponding to the target object.
In this embodiment, since the extraction process of the surface texture information of the target object is reduced, the generation speed of the target content can be further increased.
As will be described with respect to step S105, the server sends the target content to the client, and the client receives the target content and adds the target content to the interactive interface for display.
The interactive interface is different in different application scenarios, and in a short video application scenario, the interactive interface may be a short video generation interface of a user, and in a live webcast application scenario, the interactive interface may be a live webcast interface, which is not limited herein.
According to the method and the device, the target conversion factor can be obtained through the first motion data acquired when the source object makes at least one target action, the second motion data acquired when the target object makes any one action, and the pre-trained conversion factor calculation model. Since the target conversion factor only changes the lengths of the bones and does not change the included angles between the bones, distortion of the actions in the generated content can be effectively avoided and content generation efficiency improved. In addition, since the pre-trained conversion factor calculation model only needs, as input, second motion data acquired when the target object makes any one action, the difficulty and cost of motion data acquisition are also reduced.
Referring to fig. 10, fig. 10 is a schematic flow chart of a content generation method based on motion data redirection according to a second embodiment of the present application, where on the basis of executing steps S101 to S105, the method further includes steps S110 to S111, which are as follows:
S110: responding to a live broadcast interaction request sent by an anchor client, analyzing the live broadcast interaction request, and storing the source content in the live broadcast interaction request.
S111: sending content generation popup data to a viewer client joined to a live broadcast room, wherein the viewer client displays a content generation popup in the live broadcast room interface according to the content generation popup data, responds to a trigger instruction for the content generation popup, acquires the static image uploaded by a viewer, and sends the target content generation request comprising the static image and the source content identifier to the server.
Steps S110 to S111 are used to describe how to trigger the viewer client to generate a target generation request and send the target generation request to the server in the live webcast application scenario.
Specifically, the server responds to a live broadcast interaction request sent by an anchor client, analyzes the live broadcast interaction request, and stores the source content in the live broadcast interaction request. The source content may be at least one video frame played in a video window of the live broadcast, for example: video content of the anchor dancing.
In an alternative embodiment, the anchor client may be triggered to send the live broadcast interaction request as follows. The server issues live broadcast interactive control data to the anchor client; the anchor client receives the data and displays the live broadcast interactive control in the live broadcast room interface accordingly; when the anchor interacts with the control, the anchor client is triggered to execute the associated process, for example: recording the video pictures played in the video window, generating a live broadcast interaction request based on the recorded video pictures, and sending the live broadcast interaction request to the server.
And then, the server responds to the live broadcast interaction request sent by the anchor client, analyzes the live broadcast interaction request, and sends content generation popup data to the audience client which joins the live broadcast room.
The content generation popup data are used for displaying the content generation popup in the live broadcast room interface, and related prompt information is displayed in the content generation popup to guide the audience user to generate target content, i.e., a target video of the audience user performing the anchor's actions. The prompt information may include the source content, guide words, and the like.
In an optional embodiment, the content generation popup includes a photo upload control and a confirmation interaction control. The audience user uploads, through interaction with the photo upload control, a static image including the audience user's human body key points, for example a whole-body photograph, then clicks the confirmation interaction control to send a trigger instruction for the content generation popup. The audience client responds to the trigger instruction, obtains the static image uploaded by the audience user, and sends a target content generation request including the static image and a source content identifier to the server.
In the embodiment, how to trigger the viewer client to generate the target generation request and send the target generation request to the server in a live webcast scene is described in detail, so as to more clearly determine how the content generation method based on motion data redirection is applied to the live webcast scene.
In addition, in the embodiment of the present application, how to trigger the viewer client to generate the target generation request and send the target generation request to the server in the short video scene is further described, specifically as follows:
In a short video scene, there is no need to distinguish between a viewer client and an anchor client; both are collectively referred to as clients.
In an optional embodiment, a content generation function may be added to the interactive interface. Specifically, a first content generation component may be displayed in the short video generation interface; the user may browse the content in the first content generation component, for example: select one of the short videos as source content and upload a static image including the user's human body key points, such as a whole-body photograph, thereby triggering the client to execute the process associated with the first content generation component, generate a target content generation request including the static image and a source content identifier, and send the target content generation request to the server.
In another alternative embodiment, a second content generation component may be displayed directly below each short video in the short video browsing interface. The user may upload a static image including the user's human body key points, such as a whole-body photograph, by clicking the second content generation component, thereby triggering the client to execute the process associated with the second content generation component, generate a target content generation request including the source content identifier associated with the second content generation component and the static image, and send the target content generation request to the server.
In the embodiment, how to trigger the viewer client to generate the target generation request and send the target generation request to the server in the short video application scenario is illustrated, so as to more clearly determine how the content generation method based on the motion data redirection is applied to the short video application scenario.
Please refer to fig. 11, which is a schematic structural diagram of a content generating device based on motion data redirection according to a third embodiment of the present application. The apparatus may be implemented as all or part of a server in software, hardware, or a combination of both. The apparatus 11 comprises:
a first response unit 111, configured to respond to a target content generation request sent by a client, analyze the target content generation request, and obtain first motion data acquired when a source object makes at least one target action and second motion data acquired when the target object makes any one action;
a first obtaining unit 112, configured to obtain a target conversion factor according to the first motion data, the second motion data, and a pre-trained conversion factor calculation model, where the target conversion factor is used to convert a bone length of the source object in the first motion data into a bone length of the target object, and the bone length is determined based on a position of a key point of a human body in the motion data;
a redirection unit 113, configured to obtain redirected first motion data according to the first motion data, the target conversion factor, and a preset motion data redirection function;
a content generating unit 114, configured to generate target content of the target object for making the at least one target action according to the redirected first motion data;
and the display unit is used for sending the target content to the client, wherein the client displays the target content in an interactive interface.
It should be noted that, when the content generating device based on the redirection of motion data provided in the foregoing embodiment executes the content generating method based on the redirection of motion data, only the division of the above functional modules is taken as an example, and in practical applications, the above functions may be distributed to different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. In addition, the content generation device based on the motion data redirection provided by the above embodiment and the content generation method based on the motion data redirection belong to the same concept, and details of the implementation process are shown in the method embodiment and are not described herein again.
Fig. 12 is a schematic structural diagram of a computer device according to a fourth embodiment of the present application. As shown in fig. 12, the computer device 12 may include: a processor 120, a memory 121, and a computer program 122 stored in the memory 121 and executable on the processor 120, such as: a content generation method based on motion data redirection; the steps of the first to second embodiments are implemented when the processor 120 executes the computer program 122.
The processor 120 may include one or more processing cores. The processor 120 is connected to the various parts of the computer device 12 by various interfaces and lines, and executes the various functions of the computer device 12 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 121 and by invoking data in the memory 121. Optionally, the processor 120 may be implemented in at least one of the following hardware forms: Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 120 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed on the touch display screen; and the modem handles wireless communication. Alternatively, the modem may not be integrated into the processor 120 but may instead be implemented by a separate chip.
The memory 121 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 121 includes a non-transitory computer-readable medium. The memory 121 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 121 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch instructions), instructions for implementing the above method embodiments, and the like; and the data storage area may store the data involved in the above method embodiments. Alternatively, the memory 121 may be at least one storage device located remotely from the processor 120.
An embodiment of the present application further provides a computer storage medium that may store a plurality of instructions suitable for being loaded by a processor to execute the method steps of the foregoing embodiments; for the specific execution process, refer to the detailed descriptions of those embodiments, which are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and do not limit the protection scope of the present application. For the specific working processes of the units and modules in the system, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, each embodiment has its own emphasis; for parts not described or illustrated in one embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described apparatus/terminal device embodiments are merely illustrative. For example, the division into modules or units is only a logical functional division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flows of the methods of the above embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like.
The present invention is not limited to the above embodiments. Any modifications and variations that do not depart from the spirit and scope of the present invention are intended to fall within the scope of the claims of the present invention and their technical equivalents.