Behavior identification method, system, device and storage medium for a webcast anchor
1. A behavior identification method of a webcast anchor is characterized by comprising the following steps:
acquiring network live broadcast video data;
detecting an anchor time sequence action in the network live video data by using a time sequence evaluation module to generate first anchor action sequence data;
inferring an anchor action sequence using a linear conditional random field to generate second anchor action sequence data; and
performing anchor behavior identification and summary description by utilizing a multi-classification support vector machine based on the second anchor action sequence data.
2. The method of claim 1, wherein the detecting, with the timing evaluation module, an anchor timing action in the webcast video data to generate first anchor action sequence data comprises:
extracting depth features in a live network video;
performing a time series evaluation on the depth features to generate a plurality of probability sequences corresponding to a start time and an end time of an anchor action; and
based on the plurality of probability sequences, first anchor action sequence data is generated.
3. The method of claim 2, wherein the using the linear conditional random field to infer the anchor action sequence to generate the second anchor action sequence data comprises:
modeling the anchor action sequence by utilizing a linear conditional random field, and inferring a more reasonable action sequence as the second anchor action sequence data according to the logical relation between preceding and following actions in the first anchor action sequence data.
4. The method of claim 3, wherein the identifying and profiling the anchor behavior using a multi-classification support vector machine based on the second anchor action sequence data comprises:
collecting anchor behavior data of the webcast video to construct a data set;
selecting various anchor behaviors in the network live video as recognition targets;
realizing modeling of the anchor action sequence by utilizing a multi-classification support vector machine, and obtaining an anchor action identification result through mapping between the anchor action sequence and anchor action semantics; and
extracting key actions in the second anchor action sequence data to carry out anchor action summary description.
5. The method for behavior recognition of a webcast anchor according to claim 2, wherein the extracting depth features in the webcast video comprises:
extracting spatial features and temporal features in the webcast video through a two-stream convolutional network, and fusing the spatial features and the temporal features into a global spatio-temporal feature.
6. The method of claim 5, wherein the time-series evaluating the depth features to generate a plurality of probability sequences corresponding to a start time and an end time of an anchor action comprises:
performing time sequence evaluation on the global spatio-temporal features through a time sequence convolutional layer, and generating a start-time probability sequence and an end-time probability sequence of the anchor action respectively through a sigmoid activation function.
7. The method of claim 6, wherein generating first anchor action sequence data based on the plurality of probability sequences comprises:
generating candidate time sequence action nominations based on the starting time probability sequence and the ending time probability sequence;
classifying the actions by utilizing a softmax layer to obtain a detection result of the anchor time sequence action; and
generating the first anchor action sequence data based on the detection results of a plurality of the anchor time-series actions.
8. A behavior recognition system of a webcast anchor, comprising:
the acquisition module is used for acquiring network live broadcast video data;
the time sequence evaluation module is used for detecting an anchor time sequence action in the network live video data to generate first anchor action sequence data;
the anchor action sequence reasoning module is used for reasoning an anchor action sequence by utilizing a linear conditional random field to generate second anchor action sequence data; and
the behavior identification and summary description module is used for carrying out anchor behavior identification and summary description by utilizing a multi-classification support vector machine based on the second anchor action sequence data.
9. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for behavior recognition of a webcast anchor according to any of claims 1 to 7.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, wherein the computer program, when being executed by a processor, implements the steps of the method for behavior recognition of a webcast anchor according to any one of claims 1 to 7.
Background
With the advent of the new media era, webcast video has become a popular form of entertainment in the daily life of Internet users. According to the 47th Statistical Report on Internet Development in China released by the China Internet Network Information Center, as of December 2020 the number of webcast users in China had reached 617 million, accounting for 62.4% of all Internet users. Webcasting promotes content through anchors who attract audiences; compared with traditional media anchors, webcast anchors are of various types, have diverse forms of expression and offer rich live content. However, some anchors, in pursuit of profit and popularity, mix improper behavior into their live content, which brings great harm to the online ecological environment. Therefore, intelligent analysis techniques tailored to the specific characteristics and requirements of current webcast video are urgently needed to identify and supervise anchor behavior.
Traditional human behavior recognition methods are mainly designed for trimmed videos that contain only a single action, so the recognition task is relatively simple.
However, webcast video contains rich and complex anchor actions, and it is difficult for traditional human behavior recognition methods to accurately recognize and classify these rich and complex actions.
Disclosure of Invention
The invention provides a behavior identification method, system, device and storage medium for a webcast anchor, aiming at overcoming the difficulty in the prior art of accurately identifying and classifying the many rich and complex actions in webcast video, so as to obtain higher identification accuracy and generate reliable summary descriptions. Applying the method to webcast video therefore makes anchor behavior recognition feasible and of significant application value.
Specifically, the embodiment of the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a method for identifying a behavior of a webcast anchor, where the method includes:
acquiring network live broadcast video data;
detecting an anchor time sequence action in the network live video data by using a time sequence evaluation module to generate first anchor action sequence data;
inferring an anchor action sequence using a linear conditional random field to generate second anchor action sequence data; and
performing anchor behavior identification and summary description by utilizing a multi-classification support vector machine based on the second anchor action sequence data.
Further, the behavior identification method of the webcast anchor further comprises the following steps:
the detecting, with a timing evaluation module, an anchor timing action in the webcast video data to generate first anchor action sequence data includes:
extracting depth features in a live network video;
performing a time series evaluation on the depth features to generate a plurality of probability sequences corresponding to a start time and an end time of an anchor action; and
based on the plurality of probability sequences, first anchor action sequence data is generated.
Further, the behavior identification method of the webcast anchor further comprises the following steps:
said inferring the anchor action sequence using the linear conditional random field to generate second anchor action sequence data comprises:
modeling the anchor action sequence by utilizing a linear conditional random field, and inferring a more reasonable action sequence as the second anchor action sequence data according to the logical relation between preceding and following actions in the first anchor action sequence data.
Further, the behavior identification method of the webcast anchor further comprises the following steps:
the identifying and profiling the anchor behavior by using a multi-classification support vector machine based on the second anchor action sequence data comprises:
collecting anchor behavior data of the webcast video to construct a data set;
selecting various anchor behaviors in the network live video as recognition targets;
realizing modeling of the anchor action sequence by utilizing a multi-classification support vector machine, and obtaining an anchor action identification result through mapping between the anchor action sequence and anchor action semantics; and
extracting key actions in the second anchor action sequence data to carry out the anchor action summary description.
Further, the behavior identification method of the webcast anchor further comprises the following steps:
the extracting the depth features in the live webcast video comprises:
extracting spatial features and temporal features in the live webcast video through a two-stream convolutional network, and fusing the spatial features and the temporal features into a global spatio-temporal feature.
Further, the behavior identification method of the webcast anchor further comprises the following steps:
the time-series evaluating the depth features to generate a plurality of probability sequences corresponding to a start time and an end time of an anchor action comprises:
performing time sequence evaluation on the global spatio-temporal features through a time sequence convolutional layer, and generating a start-time probability sequence and an end-time probability sequence of the anchor action respectively through a sigmoid activation function.
Further, the behavior identification method of the webcast anchor further comprises the following steps:
the generating first anchor action sequence data based on the plurality of probability sequences comprises:
generating candidate time sequence action nominations based on the starting time probability sequence and the ending time probability sequence;
classifying the actions by utilizing a softmax layer to obtain a detection result of the anchor time sequence action; and
generating the first anchor action sequence data based on the detection results of a plurality of the anchor time-series actions.
In a second aspect, an embodiment of the present invention further provides a behavior recognition system for a webcast anchor, including:
the acquisition module is used for acquiring network live broadcast video data;
the time sequence evaluation module is used for detecting an anchor time sequence action in the network live video data to generate first anchor action sequence data;
the anchor action sequence reasoning module is used for reasoning an anchor action sequence by utilizing a linear conditional random field to generate second anchor action sequence data; and
the behavior identification and summary description module is used for carrying out anchor behavior identification and summary description by utilizing a multi-classification support vector machine based on the second anchor action sequence data.
In a third aspect, an embodiment of the present invention further provides an apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the behavior recognition method for a webcast anchor when executing the program.
In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above-mentioned behavior recognition method for a webcast anchor.
According to the technical scheme, the invention provides a behavior identification method, system, device and storage medium for a webcast anchor. A time sequence evaluation method is first used to realize time sequence detection of the anchor actions and generate the anchor action sequence contained in the video; then the anchor action sequence is modeled with a linear conditional random field, and a more reasonable action sequence is inferred from the logical relation between preceding and following actions in the sequence; finally, a multi-classification support vector machine classifies the action sequence to obtain the corresponding anchor behavior category, and a summary description of the anchor behavior is realized by extracting the key action categories in the action sequence.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of a behavior recognition method for a webcast anchor according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a behavior recognition system of a webcast anchor according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The various terms or phrases used herein have the ordinary meaning as is known to those skilled in the art, and even then, it is intended that the present invention not be limited to the specific terms or phrases set forth herein. To the extent that the terms and phrases referred to herein have a meaning inconsistent with the known meaning, the meaning ascribed to the present invention controls; and have the meaning commonly understood by a person of ordinary skill in the art if not defined herein.
In order to strengthen the guidance and regulation of network live broadcasting, reinforce correct orientation and value guidance, and prevent and curb the spread of vulgarity and other unhealthy trends, the official website of the National Radio and Television Administration issued the 'Notice on Strengthening the Management of Network Show Live Streaming and E-commerce Live Streaming' in November 2020. The notice explicitly requires network live broadcast platforms to carry out label-based classification management of live program content and the corresponding anchor behavior, to label programs by categories such as 'music', 'dance', 'singing', 'fitness', 'games' and 'food', and to study and adopt targeted incentive and punishment measures according to the characteristics of live programs with different content.
In the prior art, traditional human behavior recognition methods mainly target trimmed videos that contain only a single action, so the recognition task is relatively simple. However, webcast video contains rich and complex anchor actions, and it is difficult for traditional human behavior recognition methods to accurately recognize and classify these rich and complex actions.
In view of the above, in a first aspect, an embodiment of the present invention provides a behavior recognition method for a webcast anchor that exploits the advantages of a linear conditional random field in structured inference.
Specifically, the anchor behavior in a webcast video is composed of multiple action elements. Within a segment of anchor behavior there are certain boundaries between anchor action segments and non-action segments, and time sequence action detection obtains the temporal position of each action and its corresponding action category, which improves the accuracy of anchor behavior recognition. Meanwhile, to recognize anchor behavior, a mapping between the anchor action sequence and the high-level anchor behavior must be established. The anchor action sequence has an obvious time sequence context: there is a strong correlation between preceding and following actions, and the category of the previous action helps infer the category of the next action, so time sequence context modeling of the relations within the action sequence yields a more accurate action sequence. The linear conditional random field is an efficient discriminative model, favored by many researchers for its tractability and accurate inference; it can produce a more accurate anchor action sequence through inference, and anchor behavior recognition is then realized through multi-classification.
The behavior recognition method of the webcast anchor of the present invention is described below with reference to fig. 1.
Fig. 1 is a flowchart of a behavior recognition method for a live webcast according to an embodiment of the present invention.
In this embodiment, it should be noted that the behavior identification method based on the webcast anchor may include the following steps:
s1: acquiring network live broadcast video data;
s2: detecting an anchor time sequence action in the network live video data by using a time sequence evaluation module to generate first anchor action sequence data;
s3: inferring an anchor action sequence using a linear conditional random field to generate second anchor action sequence data; and
s4: and based on the second anchor action sequence data, utilizing a multi-classification support vector machine to perform anchor behavior identification and summary description.
In addition, the method for identifying the behavior of the webcast anchor provided by an embodiment of the present invention can also be described as including the following steps: first, the time sequence evaluation module is used to detect the anchor time sequence actions and generate an anchor action sequence; second, a linear conditional random field is used to perform anchor action sequence reasoning and capture the logical relationship between anchor actions; finally, a multi-classification support vector machine is used to realize anchor behavior identification and summary description, and the final recognition result is returned.
The time sequence evaluation module of the invention is divided into three parts: first, the global spatio-temporal features of the webcast video are extracted; then, time sequence evaluation is used to generate the start and end probability sequences of the actions; finally, candidate action nominations are computed and classified to obtain the anchor time sequence action detection results and determine the final anchor action sequence.
For example, step S2 may be to implement anchor timing action detection using a timing evaluation module.
Specifically, in order to determine the anchor action sequence contained in a webcast video, the method first extracts the global spatio-temporal features of the webcast video with a two-stream convolutional network; then performs time sequence evaluation on the global spatio-temporal features to generate the start probability sequence and end probability sequence of the actions; and finally computes candidate action nominations from the probability sequences and classifies the actions with a softmax layer to obtain the anchor time sequence action detection results and determine the final anchor action sequence (which may be referred to as the first anchor action sequence data).
More specifically, in this embodiment, it should be noted that the method for identifying the behavior of a webcast anchor may further include: detecting an anchor time sequence action in the webcast video data with the time sequence evaluation module to generate first anchor action sequence data (S2), which may further include but is not limited to: extracting depth features in the webcast video; performing time sequence evaluation on the depth features to generate a plurality of probability sequences corresponding to the start time and the end time of the anchor action; and generating first anchor action sequence data based on the plurality of probability sequences.
Further, in this embodiment, it should be noted that the method for identifying the behavior of a webcast anchor further includes: the extracting of the depth features in the live webcast video comprises: extracting spatial features and temporal features in the live webcast video through a two-stream convolutional network, and fusing the spatial features and the temporal features into a global spatio-temporal feature.
Specifically, a live webcast video is an untrimmed video stream. The video stream is segmented at a frame interval δ, and from each video segment one video frame and an optical flow stack formed by stacking five optical flow frames are extracted. The video frame is then input into the spatial stream convolutional network and the stacked optical flow into the temporal stream convolutional network; both networks take ResNet as the basic backbone and output the spatial features and the temporal features from their last layers respectively. Finally, the features of the two channels are fused to obtain the global spatio-temporal feature. By controlling the frame interval δ, the invention ensures that the depth feature dimensions of webcast videos of different lengths are the same, where δ is determined by the scale of the data set.
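To make the two-stream arrangement above concrete, the following PyTorch sketch pairs a spatial stream fed with a single RGB frame and a temporal stream fed with a stacked optical-flow volume, then concatenates the two pooled features into one global feature. The ResNet-18 backbone, the 10-channel flow stack (five frames, two flow components each) and the fusion by concatenation are illustrative assumptions; the text only specifies a ResNet-based two-stream network whose outputs are fused.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18  # requires a recent torchvision

class TwoStreamFeatureExtractor(nn.Module):
    """Spatial stream on one RGB frame, temporal stream on a stacked optical-flow
    volume; the two pooled 512-d outputs are fused by concatenation."""

    def __init__(self, flow_channels: int = 10):  # assumed: 5 flow frames x (dx, dy)
        super().__init__()
        self.spatial = resnet18(weights=None)
        self.spatial.fc = nn.Identity()            # keep the 512-d pooled feature

        self.temporal = resnet18(weights=None)
        # Replace the first conv so the temporal stream accepts the flow stack.
        self.temporal.conv1 = nn.Conv2d(flow_channels, 64, kernel_size=7,
                                        stride=2, padding=3, bias=False)
        self.temporal.fc = nn.Identity()

    def forward(self, rgb_frame: torch.Tensor, flow_stack: torch.Tensor) -> torch.Tensor:
        spatial_feat = self.spatial(rgb_frame)     # (B, 512)
        temporal_feat = self.temporal(flow_stack)  # (B, 512)
        return torch.cat([spatial_feat, temporal_feat], dim=1)  # global feature (B, 1024)

# One fused feature vector per video segment sampled at frame interval delta.
extractor = TwoStreamFeatureExtractor()
feature = extractor(torch.randn(1, 3, 224, 224), torch.randn(1, 10, 224, 224))
print(feature.shape)  # torch.Size([1, 1024])
```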
Further, in this embodiment, it should be noted that the method for identifying a behavior of a live webcast further includes: time-sequential evaluation of the depth features to generate a plurality of probability sequences corresponding to a start time and an end time of the anchor action comprises: and performing time sequence evaluation on the global space-time characteristics through the time sequence convolutional layer, and respectively generating a starting time probability sequence and an ending time probability sequence of the anchor action through a sigmoid activation function.
Specifically, since a time sequence convolutional layer can capture local semantic information such as action boundaries, the invention adopts time sequence convolutional layers to evaluate the start time and end time of the anchor actions in the global spatio-temporal feature sequence. A time sequence convolutional layer is denoted as:
Conv(c_f, c_k, Act) (1)
where c_f, c_k and Act are respectively the number of convolution kernels, the convolution kernel size and the activation function of the time sequence convolutional layer. The time sequence evaluation module of the invention consists of two time sequence convolutional layers, Conv(256, 3, ReLU) and Conv(2, 1, Sigmoid); two sigmoid activation functions in the last layer serve as classifiers and generate the start probability sequence and the end probability sequence of the actions respectively, where the sigmoid function is:
sigmoid(x) = 1 / (1 + e^(−x))
The sigmoid activation function maps the features into the (0, 1) range to obtain probability curves. Through the above process, the start probability sequence P_S and the end probability sequence P_E of the time sequence actions are obtained, which reflect the probability that an action starts or ends at each position of the global spatio-temporal features.
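A minimal PyTorch sketch of the two layers described above, Conv(256, 3, ReLU) followed by Conv(2, 1, Sigmoid), where the two output channels are read as the start and end probability sequences; the input feature dimension of 1024 and the padding choice are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class TemporalEvaluationModule(nn.Module):
    """Conv(256, 3, ReLU) followed by Conv(2, 1, Sigmoid); the two output
    channels are the start-probability and end-probability sequences."""

    def __init__(self, feature_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feature_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 2, kernel_size=1),
            nn.Sigmoid(),   # maps every position into (0, 1)
        )

    def forward(self, features: torch.Tensor):
        # features: (batch, feature_dim, T) global spatio-temporal sequence
        probs = self.net(features)          # (batch, 2, T)
        return probs[:, 0], probs[:, 1]     # P_S, P_E

p_start, p_end = TemporalEvaluationModule()(torch.randn(1, 1024, 100))
print(p_start.shape, p_end.shape)  # torch.Size([1, 100]) torch.Size([1, 100])
```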
Further, in this embodiment, it should be noted that the method for identifying a behavior of a live webcast further includes: generating first anchor action sequence data based on the plurality of probability sequences comprises: generating candidate time sequence action nominations based on the starting time probability sequence and the ending time probability sequence; utilizing a softmax layer to realize the classification of actions to obtain a detection result of the anchor time sequence actions; and generating first anchor action sequence data based on the detection results of the plurality of anchor time series actions.
Specifically, after the time sequence evaluation module generates the start probability sequence P_S and the end probability sequence P_E, the invention generates candidate time sequence action nominations from these probability sequences. First, half of the maximum probability value in each probability sequence is selected as a threshold; then all probability points greater than the threshold are selected to form new sequences B_S and B_E, which respectively store the point coordinates t_s and t_e that satisfy the screening condition. Next, each start position t_s in B_S is matched with each end position t_e in B_E to form a time sequence action nomination interval, requiring that the start position be earlier than the end position. The resulting time sequence action nomination is expressed as the tuple (t_s, t_e, p_ts, p_te, p_cc, p_cr), where p_ts and p_te are respectively the probability values at the start point and the end point, and p_cc and p_cr are respectively the classification confidence score and the regression confidence score calculated from the boundary matching confidence map. The set of ranked time sequence action nominations is denoted Ψ.
After the time sequence action nominations and their corresponding confidence scores are generated, a soft-NMS algorithm is adopted to reduce the confidence scores of redundant results, i.e., to lower their ranking in the nomination set. The soft-NMS decay is expressed as:
s_i = s_i, if IoU(M, b_i) < N
s_i = s_i · (1 − IoU(M, b_i)), if IoU(M, b_i) ≥ N
where M is the nomination interval with the current highest confidence score, b_i is a candidate time sequence nomination interval, s_i is its confidence score, and N is a preset threshold. Through the above formula, the confidence scores of nominations whose overlap with M reaches the threshold are suppressed. After the de-duplicated time sequence action candidate nominations are obtained, the K nominations with the highest confidence scores are selected according to the scale of the data set, and the corresponding feature sequence intervals are sent to a softmax layer for classification, finally yielding the anchor time sequence action detection results of the webcast video; the anchor action sequence is then obtained by combining all the action detection results.
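The thresholding, start/end matching and score decay described above can be sketched in plain NumPy as follows; the temporal IoU definition and the linear decay follow the standard soft-NMS algorithm and are assumptions insofar as the text does not spell them out.

```python
import numpy as np

def generate_proposals(p_start, p_end):
    """Keep positions above half the peak probability and pair every start
    with every later end to form candidate temporal action nominations."""
    b_s = np.where(p_start > 0.5 * p_start.max())[0]
    b_e = np.where(p_end > 0.5 * p_end.max())[0]
    return [(ts, te) for ts in b_s for te in b_e if ts < te]

def temporal_iou(a, b):
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def soft_nms(proposals, scores, iou_threshold=0.5):
    """Linear soft-NMS: decay the scores of proposals that overlap the current
    best proposal by at least the threshold, instead of discarding them."""
    proposals, scores = list(proposals), list(scores)
    ranked = []
    while proposals:
        best = int(np.argmax(scores))
        m, s = proposals.pop(best), scores.pop(best)
        ranked.append((m, s))
        for i, b in enumerate(proposals):
            iou = temporal_iou(m, b)
            if iou >= iou_threshold:
                scores[i] *= (1.0 - iou)
    return ranked

props = generate_proposals(np.random.rand(100), np.random.rand(100))
ranked = soft_nms(props, np.random.rand(len(props)))
```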
However, since there is a false detection phenomenon in the time-series operation detection process, there may be an erroneous operation type in the operation sequence, thereby affecting the effectiveness of the anchor behavior recognition. Based on this, the present invention utilizes linear conditional random fields to enable reasoning about the anchor action sequence (i.e., step S3).
In this embodiment, it should be noted that the method for identifying the behavior of a webcast anchor may further include: inferring the anchor action sequence using a linear conditional random field to generate second anchor action sequence data (S3), which may further include, but is not limited to: modeling the anchor action sequence by utilizing a linear conditional random field, and inferring a more reasonable action sequence as the second anchor action sequence data according to the logical relation between preceding and following actions in the first anchor action sequence data.
Specifically, the action categories in the anchor action sequence are first modeled with a linear conditional random field and the conditional probability of the model is determined; then, in order to obtain accurate model parameters, maximum likelihood estimation is adopted to infer each parameter of the linear conditional random field when training on the anchor action sequence data; finally, the expectation of the model is calculated through a forward-backward algorithm to realize inference of the model, and a more accurate anchor action sequence (which may be referred to as the second anchor action sequence data) is finally obtained.
For example, for the construction of a linear conditional random field model:
After the anchor action sequence generated by the time sequence action detection is obtained, it is modeled with a linear conditional random field. Given an anchor action sequence X, the conditional probability of the linear conditional random field is the normalized product of potential functions:
P(Y | X) = (1 / Z(X)) · exp( Σ_i Σ_k λ_k t_k(Y_(i−1), Y_i, X, i) + Σ_i Σ_l μ_l s_l(Y_i, X, i) )
where Z(X) is a normalization factor that sums over all possible output anchor action sequences:
Z(X) = Σ_Y exp( Σ_i Σ_k λ_k t_k(Y_(i−1), Y_i, X, i) + Σ_i Σ_l μ_l s_l(Y_i, X, i) )
Here t_k(Y_(i−1), Y_i, X, i) is a transition feature function between marker positions i−1 and i of the action sequence; s_l(Y_i, X, i) is a state feature function at position i of the action sequence; λ_k and μ_l are the weights of the transition feature functions and the state feature functions respectively, learned from the training set.
The weights, transition feature functions and state feature functions can be expressed with a unified notation: when there are K_1 transition features and K_2 state features, a single feature function f_k(Y_(i−1), Y_i, X, i) is used to denote both kinds of features. By summing each feature function over every position i and using ω_k to denote the weight of feature function f_k, the parameterized form of the linear conditional random field simplifies to:
P(Y | X) = (1 / Z(X)) · exp( Σ_k Σ_i ω_k f_k(Y_(i−1), Y_i, X, i) )
for example, training for a linear conditional random field model:
In order to capture the logical relationship between preceding and following actions and derive accurate model parameters, maximum likelihood estimation is adopted to infer all parameters of the model when training on the anchor action sequence data. Maximum likelihood estimation takes the λ that maximizes the likelihood as the solution, i.e., the model parameters are given by the λ that maximizes the log-likelihood of P(Y | X, λ) over the training data:
L(λ) = Σ_j log P(Y^(j) | X^(j), λ)
for example, reasoning for a linear conditional random field model:
since L (λ) is a convex function, the function takes a maximum at the point where the derivative is 0, and the partial derivative formula for λ is derived as follows:
wherein, OkFor the frequency of occurrence of samples in the training samples, EkIs the expectation of the feature k on a linear conditional random field model, and the expectation is calculated by using a forward-backward algorithm to realize the reasoning of the model. Conditional probability P (Y)i|X)、P(Yi-1,Yi| X) and the corresponding mathematical expectation formula are as follows:
wherein alpha isiAnd betaiFor forward and backward vectors, MiThe function being a state transition matrix at position i, YiIndicates the current action type of the position of the anchor action sequence i, Yi-1Indicates the type of action at the previous time. The inference of the linear conditional random field is realized through the above formula, an action sequence inference model is constructed, action sequences in a test set are input into the conditional random field model, and an anchor action sequence which accords with anchor behavior logic of the live video is inferred according to the weight of a characteristic function.
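As a rough illustration of how such sequence-level inference corrects an implausible detection, the sketch below scores each candidate label with a detection confidence (playing the role of the state features) and a transition score between consecutive labels (playing the role of the transfer features), and decodes the best path with the Viterbi algorithm. The toy labels and hand-set probabilities are purely illustrative, and maximum-likelihood training of the weights is not shown.

```python
import numpy as np

def viterbi_decode(emission, transition):
    """emission[t, k]: score of label k at position t (e.g. detection confidence);
    transition[j, k]: score for moving from label j to label k.
    Returns the label sequence maximizing the summed score (the MAP path)."""
    T, K = emission.shape
    score = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        for k in range(K):
            cand = score[t - 1] + transition[:, k] + emission[t, k]
            back[t, k] = int(np.argmax(cand))
            score[t, k] = cand[back[t, k]]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

# Toy labels: 0 = cooking, 1 = eating, 2 = running (illustrative only).
emission = np.log(np.array([[0.8, 0.1, 0.1],
                            [0.2, 0.2, 0.6],   # likely a false detection of "running"
                            [0.7, 0.2, 0.1],
                            [0.1, 0.8, 0.1]]))
transition = np.log(np.array([[0.6, 0.3, 0.1],   # cooking tends to precede eating
                              [0.2, 0.7, 0.1],
                              [0.3, 0.3, 0.4]]))
print(viterbi_decode(emission, transition))
# -> [0, 0, 0, 1]: the implausible "running" step is corrected by the transitions
```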
After the inferred anchor action sequence is obtained, behavior recognition of the anchor needs to be realized according to the anchor action sequence, and behavior semantics of the anchor in the network live video needs to be determined.
In this embodiment, it should be noted that the method for identifying the behavior of a webcast anchor may further include: performing anchor behavior identification and summary description by utilizing a multi-classification support vector machine based on the second anchor action sequence data (S4), which may further include but is not limited to: collecting anchor behavior data of webcast videos to construct a data set; selecting multiple anchor behaviors in the webcast video as recognition targets; realizing modeling of the anchor action sequence by utilizing a multi-classification support vector machine, and obtaining an anchor action identification result through the mapping between the anchor action sequence and anchor action semantics; and extracting the key actions in the second anchor action sequence data to carry out the anchor action summary description.
Specifically, collecting anchor behavior data of a live webcast video to construct a data set, and selecting five representative anchor behaviors in the live webcast video as recognition targets; then, a multi-classification support vector machine is adopted to realize the high-efficiency modeling of the anchor action sequence, and the anchor action recognition result is obtained through the mapping from the anchor action sequence to the anchor action semantics; and finally, realizing the behavior summary description of the anchor by extracting the key actions in the anchor action sequence.
For example, for a multi-class support vector machine:
After the inferred anchor action sequence is obtained, the invention first collects anchor behavior data of webcast videos to construct a data set and selects five representative anchor behaviors in webcast video, namely: art live broadcast, food live broadcast, makeup live broadcast, fitness live broadcast and sensitive-content live broadcast.
The invention adopts a decision tree type multi-classification support vector machine method, the idea is to divide each root node into two subclasses, continue to classify the subclasses until each class is traversed, at this time, an inverted binary tree structure is formed, each decision node of the binary tree trains a support vector machine classifier, thereby realizing multi-classification of the anchor behavior. Training a two-class support vector machine at each decision node in the binary tree, and training a sample set A { (x) in a given action sequence1,y1),(x2,y2),…,(xN,yN)},yiE { -1,1}, and assuming that the sample space is linearly time-divisible, the classification hyperplane can be expressed by the following linear equation:
ωTx+b=0 (13)
wherein, omega is the normal vector of the classification hyperplane, and b is the offset, which is used for making the classification hyperplane more flexible. Assuming that the hyperplane (ω, b) correctly classifies all training behavior samples, then for the training behavior sample (x)i,yi) For epsilon A:
in order to improve the robustness to unknown sample points, the support vector machine finds the classification hyperplane of the maximum interval between different behaviors, and the calculation formula is as follows:
and generating a final support vector machine model when the 1/| omega | | | is maximum, and carrying out anchor behavior identification based on the model to obtain the behavior category of the anchor action sequence.
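A minimal sketch of a decision-tree style multi-class SVM in the spirit described above, using scikit-learn's SVC as the per-node binary classifier. Peeling one behavior class off at each node, the toy feature vectors and the class names are illustrative assumptions rather than details taken from the text.

```python
import numpy as np
from sklearn.svm import SVC

class DecisionTreeSVM:
    """Decision-tree style multi-class SVM: at every node a binary SVM separates
    one behavior class from the remaining ones, forming an inverted binary tree."""

    def fit(self, X, y):
        self.nodes = []                       # list of (class_label, binary SVM)
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        classes = list(np.unique(y))
        while len(classes) > 1:
            target = classes.pop(0)           # peel one class off at this node
            clf = SVC(kernel="linear")
            clf.fit(X, (y == target).astype(int))
            self.nodes.append((target, clf))
            keep = y != target
            X, y = X[keep], y[keep]
        self.fallback = classes[0]
        return self

    def predict_one(self, x):
        for label, clf in self.nodes:
            if clf.predict([x])[0] == 1:
                return label
        return self.fallback

# Toy action-sequence features (e.g. counts of detected action categories).
X = [[3, 0, 1], [0, 4, 0], [1, 1, 3], [4, 0, 0], [0, 3, 1], [0, 1, 4]]
y = ["food", "fitness", "talent", "food", "fitness", "talent"]
model = DecisionTreeSVM().fit(X, y)
print(model.predict_one([2, 0, 1]))   # expected to resemble the "food" pattern
```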
For example, for generating an anchor behavior summary description:
The anchor behavior summary description provided by the invention selects the key action categories within the anchor behavior after the anchor behavior recognition result is obtained, and generates a high-level summary description of the anchor behavior, which facilitates rapid understanding of webcast video content and is helpful for its further management. Specifically, when the anchor behavior data set is constructed, a subset of anchor actions is established for each anchor behavior category, each subset containing all actions strongly associated with that behavior. After the anchor behavior category of the webcast video and its corresponding anchor action sequence are obtained, the actions that appear in the subset of that behavior category are screened out of the anchor action sequence, and finally the anchor behavior summary description is generated by combining the behavior category and the extracted key actions.
For example, after the inferred anchor action sequence {cooking, running, cooking, eating food} is obtained through the linear conditional random field model, the anchor behavior is first identified with the multi-classification support vector machine. If the behavior category is food live broadcast, the action categories contained in the food live broadcast subset are then extracted from the action sequence; for instance, the two key action categories cooking and eating are extracted, and an anchor behavior summary description is generated: "this is a food live broadcast, the anchor has cooked and eaten the food". Through this summary description method, a high-level general description of the anchor behavior in the webcast video can be produced, providing a reference basis for subsequent supervision of webcast content.
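A small sketch of this subset-based summary generation; the behavior categories, the action subsets and the sentence template are illustrative assumptions.

```python
# Hypothetical action subsets per behavior category (illustrative only).
ACTION_SUBSETS = {
    "food live broadcast": {"cooking", "eating food", "tasting"},
    "fitness live broadcast": {"running", "lifting weights", "stretching"},
}

def summarize(behavior_category, action_sequence):
    """Keep only the actions that belong to the recognized category's subset
    and join them into a short high-level summary sentence."""
    key_actions = [a for a in dict.fromkeys(action_sequence)          # de-duplicate, keep order
                   if a in ACTION_SUBSETS.get(behavior_category, set())]
    return (f"This is a {behavior_category}; the anchor has "
            + " and ".join(key_actions) + ".")

sequence = ["cooking", "running", "cooking", "eating food"]
print(summarize("food live broadcast", sequence))
# -> This is a food live broadcast; the anchor has cooking and eating food.
```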
In summary, the method provided by the embodiment of the present invention for identifying anchor behavior in webcast video with a linear conditional random field first generates the anchor action sequence contained in the video and then infers the complex behavior of the anchor from that action sequence to realize behavior recognition. Second, the invention proposes using a linear conditional random field to perform inference on the anchor action sequence, correcting falsely detected actions by exploiting the logical relationship between preceding and following anchor actions in the sequence. Finally, the invention proposes using a multi-classification support vector machine to realize the mapping from the anchor action sequence to the anchor behavior, and provides a novel anchor behavior summary description method, so that anchor behavior in webcast video can be identified quickly and effectively.
Based on the same inventive concept, in another aspect, an embodiment of the present invention provides a behavior recognition system for a webcast anchor.
The behavior recognition system of the webcast anchor provided by the present invention is described below with reference to fig. 2, and the behavior recognition system of the webcast anchor and the behavior recognition method of the webcast anchor described above may be referred to in a corresponding manner.
Fig. 2 is a schematic structural diagram of a behavior recognition system of a webcast anchor according to an embodiment of the present invention.
In this embodiment, it should be noted that the behavior recognition system 1 of the webcast includes: the acquisition module 10 is used for acquiring live network video data; the time sequence evaluation module 20 is configured to detect an anchor time sequence action in the webcast video data to generate first anchor action sequence data; an anchor action sequence inference module 30 for inferring an anchor action sequence using a linear conditional random field to generate second anchor action sequence data; and a behavior recognition and summary description module 40, configured to perform anchor behavior recognition and summary description by using a multi-classification support vector machine based on the second anchor action sequence data.
The behavior recognition system of the webcast anchor provided by the embodiment of the present invention can be used for executing the behavior recognition method of the webcast anchor described in the above embodiment, and the working principle and the beneficial effect are similar, so detailed descriptions are omitted here, and specific contents can be referred to the introduction of the above embodiment.
In this embodiment, it should be noted that each module in the apparatus according to the embodiment of the present invention may be integrated into a whole or may be separately disposed. The modules can be combined into one module, and can also be further split into a plurality of sub-modules.
Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the invention.
In this embodiment, it should be noted that the electronic device may include: a processor (processor) 310, a communication interface (Communications Interface) 320, a memory (memory) 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a behavior recognition method of a webcast anchor, the method comprising: acquiring network live broadcast video data; detecting an anchor time sequence action in the network live video data by using a time sequence evaluation module to generate first anchor action sequence data; inferring an anchor action sequence using a linear conditional random field to generate second anchor action sequence data; and performing anchor behavior identification and summary description by using a multi-classification support vector machine based on the second anchor action sequence data.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the behavior recognition method of a webcast anchor, the method comprising: acquiring network live broadcast video data; detecting an anchor time sequence action in the network live video data by using a time sequence evaluation module to generate first anchor action sequence data; inferring an anchor action sequence using a linear conditional random field to generate second anchor action sequence data; and performing anchor behavior identification and summary description by using a multi-classification support vector machine based on the second anchor action sequence data.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the present disclosure, reference to the description of the terms "embodiment," "this embodiment," "yet another embodiment," or the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.