Cover determination method and device, computer device, and storage medium


1. A cover determination method, the method comprising:

respectively acquiring quality reference information of a plurality of frames of images in a video in response to a cover determination request for the video, wherein the quality reference information is used for indicating the image quality of a corresponding image;

respectively acquiring content reference information of the multiple frames of images, wherein the content reference information is used for indicating the richness of the image content of the corresponding image;

determining a cover of the video from the multi-frame images based on the quality reference information and the content reference information of the multi-frame images.

2. The method according to claim 1, wherein the respectively obtaining content reference information of the plurality of frames of images comprises:

for any frame image in the multi-frame images, carrying out Laplacian edge detection on the image to obtain an edge detection image;

determining a variance value of the edge detection image based on pixel values of the edge detection image, the variance value being determined as the content reference information.

3. The method according to claim 1, wherein the respectively obtaining quality reference information of a plurality of frames of images in the video comprises:

and for any frame of image in the multi-frame images, inputting the image into an image quality model, predicting the image quality of the image through the image quality model to obtain quality reference information of the image, wherein the image quality model is used for predicting the image quality of the image.

4. The method of claim 1, wherein the determining, from the multi-frame image, a cover of the video based on the quality reference information and the content reference information of the multi-frame image comprises:

selecting a first image from the multi-frame images, and if the content reference information of the first image is larger than a content score threshold value, determining the first image as a cover of the video, wherein the first image is the image with the highest quality reference information in the multi-frame images.

5. The method of claim 4, wherein after said selecting the first image from the plurality of images, the method further comprises:

if the content reference information of the first image is smaller than or equal to the content score threshold, selecting a second image from the multi-frame images, wherein the second image is an image with the highest quality reference information except the first image in the multi-frame images;

and if the content reference information of the second image is larger than the content score threshold value, determining the second image as the cover of the video.

6. The method of claim 4, wherein the determining the cover of the video from the multi-frame image based on the quality reference information and the content reference information of the multi-frame image comprises:

and if the images with the content reference information larger than the content score threshold value do not exist in the multi-frame images, determining the first image as the cover of the video.

7. A cover determination device, the device comprising:

the first acquisition module is used for responding to a cover determination request of a video and respectively acquiring quality reference information of a plurality of frames of images in the video, and the quality reference information is used for indicating the image quality of corresponding images;

the second acquisition module is used for respectively acquiring content reference information of the multiple frames of images, and the content reference information is used for indicating the richness of the image content of the corresponding image;

and the determining module is used for determining the cover of the video from the multi-frame images based on the quality reference information and the content reference information of the multi-frame images.

8. The apparatus of claim 7, wherein the second obtaining module is configured to:

for any frame image in the multi-frame images, carrying out Laplacian edge detection on the image to obtain an edge detection image;

determining a variance value of the edge detection image based on pixel values of the edge detection image, the variance value being determined as the content reference information.

9. A computer device comprising a processor and a memory, the memory having stored therein at least one program code, the at least one program code loaded into and executed by the processor to implement the cover determination method as claimed in any one of claims 1 to 6.

10. A computer-readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor to implement the cover determination method as claimed in any one of claims 1 to 6.

Background

With the continuous development of internet technology, the emergence of a wide variety of videos has greatly enriched people's lives. In order to let users know the content of a video more quickly and accurately, or to improve the click-through rate of the video, a corresponding video cover usually needs to be set for each video.

Currently, the cover determination method is generally as follows: multiple frames of images in a video are obtained, quality reference information of each frame of image is obtained, the quality reference information being used for indicating the image quality of the corresponding image, the image with the highest quality reference information is selected from the multiple frames of images, and that image is determined as the cover of the video.

However, the cover determined by the above process may have monotonous content, such as a solid-color image, and therefore may not accurately express the video content, which reduces the accuracy of cover determination.

Disclosure of Invention

The embodiment of the application provides a cover determining method and device, computer equipment and a storage medium, which can improve the accuracy of the determined cover. The technical scheme is as follows:

in one aspect, a cover determination method is provided, the method comprising:

respectively acquiring quality reference information of a plurality of frames of images in the video in response to a cover determining request for the video, wherein the quality reference information is used for indicating the image quality of the corresponding images;

respectively acquiring content reference information of the multiple frames of images, wherein the content reference information is used for indicating the richness of the image content of the corresponding images;

and determining a cover page of the video from the multi-frame images based on the quality reference information and the content reference information of the multi-frame images.

In some embodiments, respectively acquiring the content reference information of the plurality of frames of images includes:

performing Laplacian edge detection on any frame image in the multi-frame images to obtain an edge detection image;

based on the pixel values of the edge detection image, a variance value of the edge detection image is determined, which is determined as the content reference information.

In some embodiments, respectively obtaining the quality reference information of the plurality of frames of images in the video includes:

and for any frame image in the multi-frame images, inputting the image into an image quality model, predicting the image quality of the image through the image quality model, and obtaining the quality reference information of the image, wherein the image quality model is used for predicting the image quality of the image.

In some embodiments, determining the cover page of the video from the plurality of frames of images based on the quality reference information and the content reference information of the plurality of frames of images comprises:

and selecting a first image from the multi-frame images, and if the content reference information of the first image is greater than a content score threshold value, determining the first image as the cover of the video, wherein the first image is the image with the highest quality reference information in the multi-frame images.

In some embodiments, after selecting the first image from the plurality of images, the method further comprises:

if the content reference information of the first image is less than or equal to the content score threshold value, selecting a second image from the multi-frame images, wherein the second image is the image with the highest quality reference information except the first image in the multi-frame images;

and if the content reference information of the second image is larger than the content score threshold value, determining the second image as the cover page of the video.

In some embodiments, determining the cover page of the video from the plurality of frames of images based on the quality reference information and the content reference information of the plurality of frames of images comprises:

and if the images with the content reference information larger than the content score threshold value do not exist in the multi-frame images, determining the first image as the cover of the video.

In one aspect, there is provided a cover determination apparatus, the apparatus including:

the first acquisition module is used for responding to a cover determination request of a video and respectively acquiring quality reference information of a plurality of frames of images in the video, and the quality reference information is used for indicating the image quality of corresponding images;

the second acquisition module is used for respectively acquiring content reference information of the multiple frames of images, and the content reference information is used for indicating the richness of the image content of the corresponding image;

and the determining module is used for determining the cover of the video from the multi-frame images based on the quality reference information and the content reference information of the multi-frame images.

In some embodiments, the second obtaining module is configured to:

performing Laplacian edge detection on any frame image in the multi-frame images to obtain an edge detection image;

based on the pixel values of the edge detection image, a variance value of the edge detection image is determined, which is determined as the content reference information.

In some embodiments, the first obtaining module is configured to:

and for any frame image in the multi-frame images, inputting the image into an image quality model, predicting the image quality of the image through the image quality model, and obtaining the quality reference information of the image, wherein the image quality model is used for predicting the image quality of the image.

In some embodiments, the determining module is to:

and selecting a first image from the multi-frame images, and if the content reference information of the first image is greater than a content score threshold value, determining the first image as the cover of the video, wherein the first image is the image with the highest quality reference information in the multi-frame images.

In some embodiments, the determining module is further configured to:

if the content reference information of the first image is less than or equal to the content score threshold value, selecting a second image from the multi-frame images, wherein the second image is the image with the highest quality reference information except the first image in the multi-frame images;

and if the content reference information of the second image is larger than the content score threshold value, determining the second image as the cover page of the video.

In some embodiments, the determining module is to:

and if the images with the content reference information larger than the content score threshold value do not exist in the multi-frame images, determining the first image as the cover of the video.

In one aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory having stored therein at least one program code, the at least one program code being loaded into and executed by the processor to implement the cover page determination method described above.

In one aspect, a computer-readable storage medium having at least one program code stored therein is provided, the at least one program code being loaded into and executed by a processor to implement the cover page determination method described above.

In one aspect, a computer program product is provided, the computer program product comprising computer program code stored in a computer readable storage medium, the computer program code being read by a processor of a computer device from the computer readable storage medium, the computer program code being executed by the processor such that the computer device performs the cover page determination method described above.

According to the technical scheme provided by the embodiments of the application, when the cover of the video is determined, not only the quality reference information of the image but also the content reference information of the image is considered. Since the content reference information indicates the richness of the image content of the corresponding image, the determined cover has high image quality and rich image content, and can accurately express the video content, which improves the accuracy of cover determination.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.

FIG. 1 is a schematic diagram of an environment for implementing a cover determination method according to an embodiment of the present application;

FIG. 2 is a flow chart of a cover determination method provided in an embodiment of the present application;

FIG. 3 is a flow chart of a cover determination method provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a cover determination apparatus according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of an implementation environment of a cover determination method according to an embodiment of the present application. Referring to fig. 1, the implementation environment includes: a terminal 101 and a server 102.

The terminal 101 may be at least one of a smartphone, a smart watch, a desktop computer, a laptop computer, a virtual reality terminal, an augmented reality terminal, a wireless terminal, and the like. The terminal 101 may run various types of applications, such as a video application, a social application, a live-streaming application, and the like; the terminal 101 has a communication function and can access the internet. The terminal 101 generally refers to one of a plurality of terminals, and this embodiment is illustrated only with the terminal 101. Those skilled in the art will appreciate that the number of terminals may be greater or fewer.

The server 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms. The server 102 and the terminal 101 may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the application. Optionally, the number of servers 102 may be greater or fewer, which the embodiments of the application also do not limit. Of course, the server 102 may also include other functional servers to provide more comprehensive and diverse services.

In some embodiments, the cover determination method provided by the present application is executed jointly by the terminal 101 and the server 102, as follows: a user logs in to an application such as a video application, a social application, or a live-streaming application through the terminal 101 and publishes a video in the application; correspondingly, the terminal 101, in response to the publishing operation on the video, sends a cover determination request for the video to the server 102, and after receiving the request, the server 102 determines the cover of the video by using the cover determination method provided by the present application.

Fig. 2 is a flowchart of a cover determination method according to an embodiment of the present application. The embodiment is described with a server as an execution subject, and referring to fig. 2, the embodiment includes:

201. The server, in response to a cover determination request for a video, respectively acquires quality reference information of a plurality of frames of images in the video, wherein the quality reference information is used for indicating the image quality of the corresponding images.

In the embodiment of the present application, the cover page determination request is for requesting determination of a cover page of a video. In some embodiments, the cover page determination request is triggered by the terminal, for example, when a user wants to publish a video in any application, a publishing operation is performed on the video, and the terminal triggers a process of sending the cover page determination request to the server in response to the publishing operation on the video.

In the embodiments of the present application, the quality reference information is a quality score of the image, and the quality score is used for indicating the image quality of the corresponding image. The image quality comprises image fidelity and image intelligibility, wherein the image fidelity is used for expressing the degree of deviation between the current image and a standard image, and the image intelligibility is used for expressing the degree to which a person or a machine can extract relevant feature information from the current image. Optionally, the image quality is embodied in aspects of the image such as composition, color contrast, color saturation, and contrast between light and dark. For example, the larger the quality reference information of the image, the better the image quality of the image, that is, the better the effects corresponding to the composition, color contrast, color saturation, and light-dark contrast of the image.

202. The server respectively obtains content reference information of the multiple frames of images, and the content reference information is used for indicating the richness of the image content of the corresponding images.

In the embodiment of the application, the content reference information is content scores of the images, and the content scores are used for indicating the richness of the image contents of the corresponding images. For example, the larger the content reference information, the richer the image content representing the image.

203. The server determines a cover page of the video from the plurality of frames of images based on the quality reference information and the content reference information of the plurality of frames of images.

In some embodiments, the server selects an image with high quality reference information and high content reference information from the multi-frame images based on the quality reference information and the content reference information of the multi-frame images, and determines the selected image as a cover page of the video.

According to the technical scheme provided by the embodiments of the application, when the cover of the video is determined, not only the quality reference information of the image but also the content reference information of the image is considered. Since the content reference information indicates the richness of the image content of the corresponding image, the determined cover has high image quality and rich image content, and can accurately express the video content, which improves the accuracy of cover determination.

In some embodiments, respectively acquiring the content reference information of the plurality of frames of images includes:

performing Laplacian edge detection on any frame image in the multi-frame images to obtain an edge detection image;

based on the pixel values of the edge detection image, a variance value of the edge detection image is determined, which is determined as the content reference information.

In some embodiments, respectively obtaining the quality reference information of the plurality of frames of images in the video includes:

and for any frame image in the multi-frame images, inputting the image into an image quality model, predicting the image quality of the image through the image quality model, and obtaining the quality reference information of the image, wherein the image quality model is used for predicting the image quality of the image.

In some embodiments, determining the cover page of the video from the plurality of frames of images based on the quality reference information and the content reference information of the plurality of frames of images comprises:

and selecting a first image from the multi-frame images, and if the content reference information of the first image is greater than a content score threshold value, determining the first image as the cover of the video, wherein the first image is the image with the highest quality reference information in the multi-frame images.

In some embodiments, after selecting the first image from the plurality of images, the method further comprises:

if the content reference information of the first image is less than or equal to the content score threshold value, selecting a second image from the multi-frame images, wherein the second image is the image with the highest quality reference information except the first image in the multi-frame images;

and if the content reference information of the second image is larger than the content score threshold value, determining the second image as the cover page of the video.

In some embodiments, determining the cover page of the video from the plurality of frames of images based on the quality reference information and the content reference information of the plurality of frames of images comprises:

and if the images with the content reference information larger than the content score threshold value do not exist in the multi-frame images, determining the first image as the cover of the video.

Fig. 2 shows the basic flow of the present application; the scheme provided in the present application is further explained below based on a specific embodiment. Fig. 3 is a flowchart of a cover determination method provided in an embodiment of the present application. The embodiment is described with a server as the execution subject, and referring to fig. 3, the embodiment includes:

301. The server, in response to a cover determination request for the video, extracts frames from the video to obtain multiple frames of images of the video.

In some embodiments, in response to the cover determination request for the video, the server extracts a second number of frames of images every interval of a first number of frames of images, thereby obtaining the multiple frames of images from the video.

The first number and the second number are both preset fixed values. Taking a first number of 10 and a second number of 2 as an example, assuming that the video includes 100 frames of images, 20 frames of images can be extracted by extracting 2 frames of images every 10 frames of images. By extracting frames from the video in this way, the subsequent cover determination process is performed on the partial images obtained by frame extraction, which reduces the processing load on the server, increases its processing speed, and improves the efficiency of cover determination.

It should be noted that the frame extraction process in step 301 is optional. In some embodiments, the server may perform step 302 based on all frame images of the video, that is, without performing frame extraction in response to the cover determination request. For example, assuming that 10 frames of images are decoded per second, a 10-second video yields 100 frames of images, and the subsequent cover determination process is performed based on those 100 frames.
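A minimal sketch of this interval sampling, assuming OpenCV for decoding; the function and parameter names (sample_frames, interval, per_interval) are illustrative, not from the original:

```python
import cv2

def sample_frames(video_path: str, interval: int = 10, per_interval: int = 2):
    """Keep the first `per_interval` frames of every `interval`-frame window."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % interval < per_interval:  # e.g. 2 frames out of every 10
            frames.append(frame)
        index += 1
    cap.release()
    return frames  # for a 100-frame video with defaults: 20 frames
```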

302. The server respectively acquires the quality reference information of the plurality of frames of images, wherein the quality reference information is used for indicating the image quality of the corresponding images.

In the embodiments of the present application, the quality reference information is a quality score of the image, and the quality score is used for indicating the image quality of the corresponding image. In some embodiments, the quality reference information is obtained based on Image Quality Assessment (IQA), a method for assessing the quality of an image. Optionally, image quality assessment includes full-reference, reduced-reference (semi-reference), and no-reference image quality assessment. Full-reference image quality assessment predicts the quality of an image from the complete information of a reference image; reduced-reference image quality assessment predicts the quality of an image from partial information of a reference image; no-reference image quality assessment predicts image quality without any reference image information.

In some embodiments, for any one of the plurality of frames of images, the server inputs the image into an image quality model, and predicts the image quality of the image through the image quality model to obtain the quality reference information of the image, wherein the image quality model is used for predicting the image quality of the image.

The image quality model is used for predicting the image quality of the image. In some embodiments, the image quality model is a deep learning model, that is, an algorithm that gradually extracts higher-level features from the raw input through multiple processing layers with complex structures or composed of multiple non-linear transformations. For example, the image quality model may be a Deep Bilinear Convolutional Neural Network (DB-CNN), a no-reference image quality assessment model based on deep learning.

Taking the image quality model as a deep convolutional neural network as an example, the image quality model comprises an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer. The input layer is used for feeding the current image acquired by the server into the image quality model and converting the input image into a numerical matrix to facilitate the subsequent operations of the model; the convolutional layers are used for performing convolution operations on the matrix generated by the input layer and extracting local features from the image based on the convolution results, and the image quality model may comprise one or more convolutional layers; the pooling layers are used for downsampling the feature maps produced by the convolutional layers into matrices of smaller dimension so as to further distill the image features, and the image quality model may comprise one or more pooling layers; the fully connected layer is used for integrating the extracted local features into a complete image feature through a weight matrix and computing the quality score of the image based on that feature; the output layer is used for taking the quality score produced by the fully connected layer and outputting it as the quality reference information of the image.
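As a hedged illustration only, the layer structure just described (input, convolution, pooling, fully connected layer, score) might look as follows in PyTorch; the framework choice, layer sizes, and the class name ImageQualityModel are assumptions, not the patent's specification:

```python
import torch
import torch.nn as nn

class ImageQualityModel(nn.Module):
    """Toy quality-scoring CNN: conv/pool layers extract local features,
    a fully connected head maps them to a single quality score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # pooling layer shrinks the feature map
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # pool to a fixed-size descriptor
        )
        self.head = nn.Linear(32, 1)          # fully connected layer -> quality score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)       # (batch, 32) image feature
        return self.head(f).squeeze(1)        # one quality score per image
```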

The image quality model adopted in the embodiments of the present application is a trained model. In some embodiments, the server obtains a plurality of sample images and the Mean Opinion Score (MOS) of each sample image, and performs model training based on the sample images and their mean opinion scores to obtain the image quality model. The mean opinion score is a score obtained by having humans directly rate an image. Specifically, the training process of the image quality model includes: in the first iteration, the plurality of sample images are respectively input into an initial model to obtain the score training result of the first iteration; a loss function is computed based on the score training result of the first iteration and the mean opinion scores corresponding to the sample images, and the model parameters of the initial model are adjusted based on the loss function; the parameters adjusted in the first iteration are used as the model parameters of the second iteration, and the second iteration is performed; this iterative process is repeated, and in the N-th iteration the parameters adjusted in the (N-1)-th iteration are used as the new model parameters, until training meets a target condition, at which point the model from the iteration meeting the target condition is taken as the image quality model. N is a positive integer greater than 1. Optionally, the target condition may be that the number of training iterations of the initial model reaches a target number, which may be a preset number of training iterations; alternatively, the target condition may be that the loss value meets a target threshold condition, such as a loss value less than 0.00001. The embodiments of the present application do not limit this.
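The iterative training just described might be sketched as follows, under stated assumptions: the ImageQualityModel above, an MSE loss against the MOS labels, and a data loader yielding (images, mos) pairs; all names and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs: int = 10, lr: float = 1e-4, loss_floor: float = 1e-5):
    """One iteration = forward pass on sample images, loss vs. MOS labels,
    parameter update; stop when the iteration budget or loss target is met."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for epoch in range(epochs):                  # target number of iterations
        for images, mos in loader:               # sample images + mean opinion scores
            pred = model(images)
            loss = criterion(pred, mos.float())
            opt.zero_grad()
            loss.backward()                      # adjust model parameters via the loss
            opt.step()
            if loss.item() < loss_floor:         # target threshold condition
                return model
    return model
```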

In some embodiments, the image quality model used in the embodiments of the present application is a deep bilinear model composed of two convolutional neural networks, that is, it includes two feature extractors. In an alternative embodiment, the two convolutional neural networks are respectively used for processing images under different distortion conditions; for example, one handles synthetic distortion and the other authentic distortion, so that the deep bilinear model can process both simultaneously. Optionally, synthetic distortion and authentic distortion are respectively processed by the two convolutional neural networks in the deep bilinear model to obtain a feature corresponding to each; the two output features are bilinearly combined into a joint feature, and the image quality of the image is then predicted based on the joint feature. Optionally, the bilinear combination is a weighted sum of the two features, a weighted average of the two features, or a product of the two features.
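A sketch of the bilinear combination step, assuming the two branch networks have already produced feature vectors f_syn and f_real; the weighted-sum variant is shown and the outer-product variant is noted in a comment, with all names illustrative:

```python
import torch

def bilinear_combine(f_syn: torch.Tensor, f_real: torch.Tensor,
                     w_syn: float = 0.5, w_real: float = 0.5) -> torch.Tensor:
    """Combine the synthetic-distortion and authentic-distortion features.
    Weighted-sum variant; the product variant would be an outer product,
    e.g. torch.einsum('bi,bj->bij', f_syn, f_real).flatten(1)."""
    return w_syn * f_syn + w_real * f_real  # joint feature fed to the quality head
```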

303. The server respectively obtains content reference information of the multiple frames of images, and the content reference information is used for indicating the richness of the image content of the corresponding images.

In the embodiment of the application, the content reference information is content scores of the images, and the content scores are used for indicating the richness of the image contents of the corresponding images.

In some embodiments, for any one of the plurality of frames of images, the server performs Laplacian edge detection on the image to obtain an edge detection image, determines a variance value of the edge detection image based on the pixel values of the edge detection image, and determines the variance value as the content reference information.

In the embodiments of the present application, Laplacian edge detection is a method of edge detection based on the Laplacian operator; specifically, edges are detected using the second derivative of the image. The Laplacian operator measures the high-frequency components of an image, that is, the regions where brightness or gray level changes drastically. The amount of high-frequency content is positively correlated with the richness of the image content: the more high-frequency components an image contains, the richer its content. Detecting high-frequency components amounts mainly to detecting image edges, i.e., the detail edges contained in the image. It should be noted that the number of detail edges in an image is positively correlated with its high-frequency content, and accordingly with the richness of the image content: the more detail edges an image contains, the richer its content.

In some embodiments, for any one of the plurality of frames of images, the server performs Laplacian edge detection on the image using the following formula (1):

∇²f = ∂²f/∂x² + ∂²f/∂y²    (1)

where x represents the horizontal direction; y represents the vertical direction; ∂² denotes the second derivative; and f is the pixel value of a pixel point.

In the above embodiment, Laplacian edge detection can detect rapid changes in pixel values from the pixel-value variation of the image in the horizontal and vertical directions, and thus the distribution of detail edges in the image, so that the richness of the image content can be determined.

In some embodiments, the server determines the variance value of the edge detection image based on the pixel values of the edge detection image by: the server obtains pixel values of a plurality of pixel points in the edge detection image, averages the pixel values of the plurality of pixel points to obtain an average pixel value, determines a difference value between the pixel value of each pixel point and the average pixel value to obtain a plurality of difference values, and determines a variance value of the image based on a sum of squares of the plurality of difference values and the number of the plurality of pixel points.

Optionally, the server determines the variance value of the edge detection image from the number of pixel points included in the edge detection image, their pixel values, and their average pixel value, using the following formula (2):

D = (1/n) · Σᵢ₌₁ⁿ (xᵢ − E)²    (2)

where D is the variance value of the edge detection image; E is the average pixel value of the pixel points; n is the number of pixel points included in the edge detection image; and x₁, x₂, …, xₙ are the pixel values of the corresponding pixel points.
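Formulas (1) and (2) together reduce to a few lines of code; the following is a minimal sketch assuming OpenCV and a BGR color input, with the function name content_score being illustrative:

```python
import cv2

def content_score(image) -> float:
    """Laplacian variance as a content score: more detail edges => larger variance."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edge = cv2.Laplacian(gray, cv2.CV_64F)  # second-derivative edge detection, formula (1)
    return float(edge.var())                # variance of the edge image, formula (2)
```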

In other embodiments, the server may further obtain the content score of the image first and then obtain the quality score of the image. The execution sequence of step 302 and step 303 is not limited in the embodiment of the present application.

In the embodiments of the application, the high-frequency components in the image can be measured with the Laplacian operator, and the content score is considered on top of the quality score, so that an image with both a high quality score and a high content score can be determined as the cover. The determined cover therefore has good image quality and rich image content, which improves the accuracy of cover determination.

304. The server determines a cover page of the video from the plurality of frames of images based on the quality reference information and the content reference information of the plurality of frames of images.

In some embodiments, after obtaining the quality reference information and content reference information of the multiple frames of images, the server checks the images in descending order of quality reference information, judging in turn whether the content reference information of each image is greater than the content score threshold, until the first image whose content reference information exceeds the threshold is found, and determines that image as the cover of the video.

The content score threshold is a predetermined fixed threshold, such as 1300, and is used to filter the images based on the content reference information. Images whose content reference information is greater than the content score threshold are images with rich content; images whose content reference information is not greater than the threshold are images whose content is not rich. Setting the content score threshold thus screens out the content-rich images.

In some embodiments, the server selects a first image from the multiple frames of images, the first image being the image with the highest quality reference information; if the content reference information of the first image is greater than the content score threshold, the first image is determined as the cover of the video. If the content reference information of the first image is less than or equal to the content score threshold, a second image is selected, the second image being the image with the highest quality reference information other than the first image; if the content reference information of the second image is greater than the content score threshold, the second image is determined as the cover of the video. If the content reference information of the second image is also less than or equal to the threshold, the server continues with the image having the highest quality reference information among the remaining images, until the first image whose content reference information exceeds the content score threshold is found and determined as the cover of the video.

In some embodiments, if there is no image with content reference information greater than the content score threshold in the multi-frame image, the server determines the first image as a cover page of the video.

In a specific example, assume that 100 frames of images are obtained by frame extraction from the video. The quality reference information (quality score) of the 100 frames is computed with the deep convolutional neural network and recorded as Score1; the variance value is computed with the Laplacian operator and recorded as Score2. A threshold T (e.g., 1300) is set for Score2, and Score2 is used to screen the ranking by Score1 to determine the cover of the video. The specific screening process is as follows: among the 100 frames, record the image with the highest Score1 as A, the image with the second-highest Score1 as B, and the image with the third-highest Score1 as C. First judge whether the Score2 of image A is greater than the threshold T; if so, determine image A as the cover. If the Score2 of image A is not greater than T, judge the images in descending order of Score1 until the first image whose Score2 exceeds T is found. If no image has a Score2 greater than T, the image with the highest Score1 (image A) is selected and determined as the cover of the video.
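A sketch of that screening loop, with illustrative names (quality for Score1, content for Score2, choose_cover); it assumes the per-frame scores have already been computed:

```python
def choose_cover(frames, quality, content, threshold=1300.0):
    """Walk frames in descending Score1 order; return the first whose Score2
    exceeds the threshold, else fall back to the best-quality frame."""
    order = sorted(range(len(frames)), key=lambda i: quality[i], reverse=True)
    for i in order:
        if content[i] > threshold:
            return frames[i]
    return frames[order[0]]  # no content-rich frame: highest-quality frame wins
```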

According to the technical scheme provided by the embodiments of the application, when the cover of the video is determined, not only the quality reference information of the image but also the content reference information of the image is considered. Since the content reference information indicates the richness of the image content of the corresponding image, the determined cover has high image quality and rich image content, and can accurately express the video content, which improves the accuracy of cover determination.

Fig. 4 is a schematic structural diagram of a cover determination apparatus provided in an embodiment of the present application, and referring to fig. 4, the apparatus includes:

a first obtaining module 401, configured to obtain, in response to a cover determination request for a video, quality reference information of multiple frames of images in the video, where the quality reference information is used to indicate the image quality of the corresponding images;

a second obtaining module 402, configured to obtain content reference information of the multiple frames of images, where the content reference information is used to indicate the richness of the image content of the corresponding image;

a determining module 403, configured to determine a cover of the video from the multi-frame images based on the quality reference information and the content reference information of the multi-frame images.

According to the technical scheme provided by the embodiments of the application, when the cover of the video is determined, not only the quality reference information of the image but also the content reference information of the image is considered. Since the content reference information indicates the richness of the image content of the corresponding image, the determined cover has high image quality and rich image content, and can accurately express the video content, which improves the accuracy of cover determination.

In some embodiments, the second obtaining module 402 is configured to:

performing Laplacian edge detection on any frame image in the multi-frame images to obtain an edge detection image;

based on the pixel values of the edge detection image, a variance value of the edge detection image is determined, which is determined as the content reference information.

In some embodiments, the first obtaining module 401 is configured to:

and for any frame image in the multi-frame images, inputting the image into an image quality model, predicting the image quality of the image through the image quality model, and obtaining the quality reference information of the image, wherein the image quality model is used for predicting the image quality of the image.

In some embodiments, the determining module 403 is configured to:

and selecting a first image from the multi-frame images, and if the content reference information of the first image is greater than a content score threshold value, determining the first image as the cover of the video, wherein the first image is the image with the highest quality reference information in the multi-frame images.

In some embodiments, the determining module 403 is further configured to:

if the content reference information of the first image is less than or equal to the content score threshold value, selecting a second image from the multi-frame images, wherein the second image is the image with the highest quality reference information except the first image in the multi-frame images;

and if the content reference information of the second image is larger than the content score threshold value, determining the second image as the cover page of the video.

In some embodiments, the determining module 403 is configured to:

and if the images with the content reference information larger than the content score threshold value do not exist in the multi-frame images, determining the first image as the cover of the video.

It should be noted that: in the cover determination device provided in the above embodiment, when determining the cover, only the division of the above function modules is exemplified, and in practical applications, the function distribution may be completed by different function modules according to needs, that is, the internal structure of the device is divided into different function modules to complete all or part of the functions described above. In addition, the cover determination device and the cover determination method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.

The computer device referred to in the present application may be provided as a terminal. Fig. 5 shows a block diagram of a terminal 500 according to an exemplary embodiment of the present application. The terminal 500 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 500 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.

In general, the terminal 500 includes: a processor 501 and a memory 502.

The processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 501 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.

Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one program code for execution by processor 501 to implement the cover determination methods provided by the method embodiments herein.

In some embodiments, the terminal 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502 and peripheral interface 503 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, display screen 505, camera assembly 506, audio circuitry 507, positioning assembly 508, and power supply 509.

The peripheral interface 503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The Radio Frequency circuit 504 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 504 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 504 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 504 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.

The display screen 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, the display screen 505 also has the ability to capture touch signals on or over the surface of the display screen 505. The touch signal may be input to the processor 501 as a control signal for processing. At this point, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display screen 505 may be one, disposed on the front panel of the terminal 500; in other embodiments, the display screens 505 may be at least two, respectively disposed on different surfaces of the terminal 500 or in a folded design; in other embodiments, the display 505 may be a flexible display disposed on a curved surface or a folded surface of the terminal 500. Even more, the display screen 505 can be arranged in a non-rectangular irregular figure, i.e. a shaped screen. The Display screen 505 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.

The camera assembly 506 is used to capture images or video. Optionally, camera assembly 506 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 506 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.

Audio circuitry 507 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 501 for processing, or inputting the electric signals to the radio frequency circuit 504 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 500. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 507 may also include a headphone jack.

The positioning component 508 is used for determining the current geographic location of the terminal 500 for navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

Power supply 509 is used to power the various components in terminal 500. The power source 509 may be alternating current, direct current, disposable or rechargeable. When power supply 509 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, terminal 500 also includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: acceleration sensor 511, gyro sensor 512, pressure sensor 513, fingerprint sensor 514, optical sensor 515, and proximity sensor 516.

The acceleration sensor 511 may detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 501 may control the display screen 505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 511. The acceleration sensor 511 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 512 may detect a body direction and a rotation angle of the terminal 500, and the gyro sensor 512 may cooperate with the acceleration sensor 511 to acquire a 3D motion of the user on the terminal 500. The processor 501 may implement the following functions according to the data collected by the gyro sensor 512: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

The pressure sensor 513 may be disposed on a side frame of the terminal 500 and/or underneath the display screen 505. When the pressure sensor 513 is disposed on the side frame of the terminal 500, a user's holding signal of the terminal 500 may be detected, and the processor 501 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed at the lower layer of the display screen 505, the processor 501 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 505. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 514 is used for collecting a fingerprint of the user, and the processor 501 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 501 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 514 may be disposed on the front, back, or side of the terminal 500. When a physical button or a vendor Logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or the vendor Logo.

The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the display screen 505 based on the ambient light intensity collected by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the display screen 505 is increased; when the ambient light intensity is low, the display brightness of the display screen 505 is reduced. In another embodiment, processor 501 may also dynamically adjust the shooting parameters of camera head assembly 506 based on the ambient light intensity collected by optical sensor 515.

A proximity sensor 516, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 500. The proximity sensor 516 is used to measure the distance between the user and the front surface of the terminal 500. In one embodiment, when the proximity sensor 516 detects that this distance gradually decreases, the processor 501 controls the display screen 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance gradually increases, the processor 501 controls the display screen 505 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the structure shown in fig. 5 does not limit the terminal 500, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.

The computer device provided by the embodiments of the application may be provided as a server. Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application. The server 600 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 601 and one or more memories 602, where the memory 602 stores at least one program code, and the at least one program code is loaded and executed by the processor 601 to implement the cover determination method provided by the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described here.

In an exemplary embodiment, a computer-readable storage medium, such as a memory including program code, is also provided; the program code is executable by a processor in a terminal or a server to perform the cover determination method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by program code instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
