Commodity image quality identification method and device, computing equipment and storage medium
1. A quality identification method of a commodity image is characterized by comprising the following steps:
acquiring a commodity image;
inputting the commodity image into a pre-trained image background recognition model, and obtaining a background quality score of the commodity image based on an output result of the image background recognition model;
inputting the commodity image to a pre-trained commodity position recognition model, and obtaining a commodity position quality score of the commodity image based on an output result of the commodity position recognition model;
inputting the commodity image to a pre-trained user visual experience recognition model, and obtaining a visual experience quality score of the commodity image based on an output result of the user visual experience recognition model;
and obtaining the total quality score of the commodity image according to the background quality score, the commodity position quality score and the visual experience quality score.
2. The method of claim 1, wherein prior to said inputting the commodity image to a pre-trained user visual experience recognition model, the method further comprises:
constructing a user visual experience recognition model and a twin model of the user visual experience recognition model;
for any sample commodity image, inputting the sample commodity image and the visual experience quality scoring label of the sample commodity image into the twin model, and inputting a noise-added sample image, obtained by subjecting the sample commodity image to noise processing, and the visual experience quality scoring label of the noise-added sample image into the user visual experience recognition model;
calculating a loss function according to the difference between the output result of the twin model and the output result of the user visual experience recognition model;
and outputting the trained user visual experience recognition model when the preset loss condition is met.
3. The method of claim 2, wherein calculating a loss function based on a difference between the output of the twin model and the output of the user visual experience recognition model further comprises:
if the visual experience quality score output by the twin model is greater than or equal to the visual experience quality score output by the user visual experience recognition model, the loss function is not calculated;
and if the visual experience quality score output by the twin model is smaller than the visual experience quality score output by the user visual experience recognition model, calculating the loss function.
4. The method of claim 3, wherein said calculating said loss function further comprises:
calculating a first regression loss of the twin model according to the visual experience quality score output by the twin model, and calculating a second regression loss of the user visual experience recognition model according to the visual experience quality score output by the user visual experience recognition model;
and calculating the loss function according to the first regression loss and the second regression loss.
5. The method of any of claims 2-4, wherein obtaining a visual experience quality score of the commodity image based on the output result of the user visual experience recognition model further comprises:
acquiring a visual experience sub-score and a noise sub-score output by the user visual experience recognition model;
and obtaining the visual experience quality score of the commodity image according to the visual experience sub-score and the noise sub-score.
6. The method of any of claims 1-4, wherein the image background recognition model comprises: a plurality of Bottleneck modules of different structures.
7. The method according to any one of claims 1 to 4, wherein the obtaining of the commodity position quality score of the commodity image based on the output result of the commodity position identification model further comprises:
acquiring commodity position categories, commodity coordinate information and coordinate probability information output by the commodity position identification model;
and obtaining the commodity position quality score of the commodity image according to the commodity position category, the commodity coordinate information and/or the coordinate probability information.
8. An apparatus for recognizing quality of an image of a commodity, comprising:
the acquisition module is used for acquiring a commodity image;
the first execution module is used for inputting the commodity image to a pre-trained image background recognition model and obtaining a background quality score of the commodity image based on an output result of the image background recognition model;
the second execution module is used for inputting the commodity image to a pre-trained commodity position recognition model and obtaining a commodity position quality score of the commodity image based on an output result of the commodity position recognition model;
the third execution module is used for inputting the commodity image to a pre-trained user visual experience recognition model and obtaining a visual experience quality score of the commodity image based on an output result of the user visual experience recognition model;
and the integration module is used for obtaining the total quality score of the commodity image according to the background quality score, the commodity position quality score and the visual experience quality score.
9. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the quality identification method of the commodity image according to any one of claims 1-7.
10. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the quality identification method of a commodity image according to any one of claims 1 to 7.
Background
In the field of electronic commerce, commodity images display commodity information to users in an intuitive and rapid manner. The quality of a commodity image directly or indirectly affects the user's desire to browse or purchase the commodity, so identifying the quality of commodity images is particularly important.
However, the inventor found the following defect in the prior art: the quality of a commodity image is identified manually, which results in both low identification efficiency and low identification accuracy.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method, an apparatus, a computing device, and a storage medium for quality recognition of an image of a commodity that overcome or at least partially solve the above problems.
According to an aspect of the present invention, there is provided a method for identifying quality of an image of a commodity, including:
acquiring a commodity image;
inputting the commodity image into a pre-trained image background recognition model, and obtaining a background quality score of the commodity image based on an output result of the image background recognition model;
inputting the commodity image to a pre-trained commodity position recognition model, and obtaining a commodity position quality score of the commodity image based on an output result of the commodity position recognition model;
inputting the commodity image to a pre-trained user visual experience recognition model, and obtaining a visual experience quality score of the commodity image based on an output result of the user visual experience recognition model;
and obtaining the total quality score of the commodity image according to the background quality score, the commodity position quality score and the visual experience quality score.
According to another aspect of the present invention, there is provided a quality recognition apparatus of an image of a commodity, including:
the acquisition module is used for acquiring a commodity image;
the first execution module is used for inputting the commodity image to a pre-trained image background recognition model and obtaining a background quality score of the commodity image based on an output result of the image background recognition model;
the second execution module is used for inputting the commodity image to a pre-trained commodity position recognition model and obtaining a commodity position quality score of the commodity image based on an output result of the commodity position recognition model;
the third execution module is used for inputting the commodity image to a pre-trained user visual experience recognition model and obtaining a visual experience quality score of the commodity image based on an output result of the user visual experience recognition model;
and the integration module is used for obtaining the total quality score of the commodity image according to the background quality score, the commodity position quality score and the visual experience quality score.
According to yet another aspect of the present invention, there is provided a computing device comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the quality identification method of the commodity image.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the above method for identifying quality of an image of a commodity.
According to the quality identification method and apparatus of the commodity image, the computing device and the storage medium, a commodity image is acquired; the commodity image is input into an image background recognition model, and a background quality score of the commodity image is obtained based on the output result of the image background recognition model; the commodity image is input into a commodity position identification model, and a commodity position quality score of the commodity image is obtained based on the output result of the commodity position identification model; the commodity image is input into a user visual experience recognition model, and a visual experience quality score of the commodity image is obtained based on the output result of the user visual experience recognition model; and a total quality score is obtained from the background quality score, the commodity position quality score and the visual experience quality score. With this scheme, the total quality score of the commodity image is obtained automatically from three dimensions, namely the image background, the commodity position and the user's visual experience, which improves both the precision and the efficiency of commodity image quality scoring.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart illustrating a method for identifying quality of an image of a commodity according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an image background recognition model according to an embodiment of the present invention;
fig. 3 shows a schematic structural diagram of a Bottleneck_3x3 module according to an embodiment of the present invention;
fig. 4 shows a schematic structural diagram of a Bottleneck_5x5 module according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a structure of a product location identification model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a CBL unit according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating a Focus unit according to an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating an SPP unit according to an embodiment of the present invention;
FIG. 9 is a diagram illustrating a CSP1_X unit according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram illustrating a Res component according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating a CSP2_X unit according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating an exemplary structure of a user visual experience recognition model according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram illustrating a quality recognition apparatus for an image of a commodity according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 is a flowchart illustrating a method for identifying quality of an image of a commodity according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step S110, a product image is acquired.
The commodity image is the image on which quality identification is subsequently performed, and it contains information relevant to the commodity. For example, the commodity image may be a commodity display image on a shopping website. The present embodiment does not limit the type, format, size, and the like of the acquired commodity image.
Unlike the prior art, in the present embodiment the quality of the acquired commodity image is not identified manually. Instead, based on a machine learning algorithm, a quality score of the commodity image is obtained automatically through the subsequent steps S120 to S150 from three dimensions: the background of the commodity image, the position of the commodity in the commodity image, and the user's visual experience of the commodity image.
And step S120, inputting the commodity image into a pre-trained image background recognition model, and obtaining the background quality score of the commodity image based on the output result of the image background recognition model.
The image background adopted by a commodity image directly influences how well the user perceives the commodity characteristics, and thus the user's desire to purchase. For example, if the background of the commodity image is too cluttered, the characteristics of the commodity cannot be highlighted, and the user's desire to purchase is reduced. In this step, the quality of the commodity image is identified from the image background dimension, so as to obtain the background quality score of the commodity image.
Specifically, the present embodiment is constructed with an image background recognition model in advance, and the image background recognition model is constructed based on a neural network algorithm. The present embodiment does not limit the specific structure of the image background recognition model. Alternatively, the specific structure of the image background recognition model may be as shown in fig. 2.
As can be seen from fig. 2, the image background recognition model includes a plurality of Bottleneck modules of different structures (namely a Bottleneck_3x3 module and a Bottleneck_5x5 module), an Input layer, a Conv layer (convolution layer), a Concat layer (splice layer), an AvgPooling layer (average pooling layer), a Flatten layer (flattening layer), and a Class_prediction layer (result output layer).
As shown in fig. 3, the Bottleneck_3x3 module includes: an Input layer, a DW_3x3 layer (depthwise separable convolution layer), a Conv layer (convolution layer), a BN layer (Batch Normalization layer), a MaxPooling layer (maximum pooling layer), an AvgPooling layer (average pooling layer), and an Add layer (addition layer). In the Bottleneck_3x3 module, adopting the DW_3x3 layer together with the edge Conv_1x1 layer during training prevents the gradient from vanishing during back propagation, so the loss keeps decreasing; this improves both the convergence speed and the training precision of the model. In addition, the Bottleneck_3x3 module comprises three DW_3x3 layers, so the image background recognition model can effectively learn the background features of the commodity image, which improves the recognition accuracy of the image background recognition model.
As shown in fig. 4, the Bottleneck_5x5 module includes: an Input layer, a DW_3x3 layer, a Conv layer, a BN layer, an AvgPooling layer, a DW_1x5 layer, a DW_5x1 layer, a DW_5x5 layer, and an Add layer. Compared with the Bottleneck_3x3 module, the Bottleneck_5x5 module has additional DW_1x5, DW_5x1 and DW_5x5 layers. The DW_1x5 and DW_5x1 layers increase the extracted features through channel separation, and because the receptive fields of the DW_1x5, DW_5x1, DW_5x5 and DW_3x3 layers differ, the background features of the commodity image can be learned from multiple dimensions, further increasing the fitting capability and accuracy of the algorithm on complex and non-linear problems.
In addition, adopting the DW layers (including the DW_1x5, DW_5x1, DW_5x5 and DW_3x3 layers) reduces the parameter count and computation of the image background recognition model, saving computing resources and improving computational efficiency; it also increases the receptive field of the model and improves its recognition precision.
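For illustration, the following PyTorch sketch shows one way a Bottleneck_3x3-style module built from depthwise separable convolutions could be assembled. The channel widths, activation choice and the exact placement of the pooling and Add layers in figs. 2-4 are assumptions for the sketch, not the structure of this embodiment.

```python
import torch
import torch.nn as nn

class BottleneckDW3x3(nn.Module):
    """Illustrative depthwise-separable bottleneck (widths/ordering assumed)."""
    def __init__(self, channels: int):
        super().__init__()
        # Edge Conv_1x1: a short parallel path that keeps gradients flowing
        # during back propagation.
        self.edge = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        layers = []
        for _ in range(3):  # three stacked DW_3x3 layers
            layers += [
                nn.Conv2d(channels, channels, 3, padding=1,
                          groups=channels, bias=False),    # depthwise 3x3
                nn.Conv2d(channels, channels, 1, bias=False),  # pointwise 1x1
                nn.BatchNorm2d(channels),                   # BN layer
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.edge(x)  # Add layer: fuse the two paths

x = torch.randn(1, 32, 56, 56)
print(BottleneckDW3x3(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```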
Further, the sample commodity images required for training the image background recognition model are obtained. A sample commodity image may be an image of a commodity against any of various image backgrounds, such as a white background, a black background, a life scene background, or a solid color background. For any obtained sample commodity image, a background class label is generated, the sample commodity image and the corresponding background class label are input into the constructed image background recognition model, and the trained image background recognition model is output when a preset loss condition is met.
After the trained image background recognition model is obtained, the commodity image obtained in step S110 is subjected to background recognition by using the image background recognition model. Specifically, the image background recognition model outputs a background classification result of the commodity image.
In an optional implementation manner, a background quality score matched with the background classification result output by the image background recognition model may be looked up according to a pre-established mapping between different background classifications and background quality scores; the matched background quality score is the background quality score of the commodity image obtained in step S110. In this way, the background quality score of the commodity image can be obtained quickly from the output result of the image background recognition model.
In yet another alternative embodiment, when commodity images are applied in different scenes, the background quality score corresponding to the same background classification may differ. For example, analysis of a large amount of data shows that in a digital product application scene, commodity images with a pure white background correspond to a high user purchase rate, while in a food product application scene, commodity images with a colored background correspond to a high user purchase rate. Based on this, in this embodiment, after the background classification result output by the image background recognition model is obtained, the application scene of the commodity image obtained in step S110 is further obtained, and the matched background quality score is then looked up according to both the background classification result and the application scene. The background quality score matched with the background classification result and the application scene is the background quality score of the commodity image obtained in step S110. In this way, the background quality score of the commodity image can be obtained accurately from the output result of the image background recognition model.
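As a minimal sketch of this lookup, assuming hypothetical scene names, background classes and score values (the embodiment only requires that such a mapping exist):

```python
# Hypothetical (application scene, background class) -> score table.
BACKGROUND_SCORES = {
    ("digital", "white"): 1.0,
    ("digital", "solid_color"): 0.7,
    ("food", "solid_color"): 0.9,
    ("food", "white"): 0.6,
}

def background_quality_score(scene: str, background: str,
                             default: float = 0.5) -> float:
    """Return the score matched to the classification result and scene."""
    return BACKGROUND_SCORES.get((scene, background), default)

print(background_quality_score("food", "solid_color"))  # 0.9
```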
Step S130, inputting the commodity image to a pre-trained commodity position recognition model, and obtaining the commodity position quality score of the commodity image based on the output result of the commodity position recognition model.
The position occupied by the commodity in the commodity image also influences the user's perception of the commodity characteristics, and in turn the user's desire to purchase. For example, when a commodity sits at the boundary of the commodity image, the user's desire to purchase may be reduced. Based on this, this step identifies the quality of the commodity image from the commodity position dimension, obtaining a commodity position quality score of the commodity image.
Specifically, the present embodiment is constructed in advance with a commodity position identification model, which is constructed based on a neural network algorithm. The present embodiment does not limit the specific structure of the product position recognition model. Alternatively, the specific structure of the product location identification model is shown in fig. 5.
As can be seen from fig. 5, the commodity position identification model includes: CBL units, Focus units, SPP units, CSP1_X units, and CSP2_X units.
As shown in fig. 6, the CBL unit has the structure Conv + BN + Leaky_ReLU, where Conv is a convolutional layer, BN is a batch normalization layer, and Leaky_ReLU is the Leaky_ReLU activation function. This structure enhances the feature extraction effect of the model and thus improves its prediction precision.
As shown in fig. 7, the Focus unit performs a slicing (Slice) operation on the input, thereby achieving channel separation. For example, after an original n x n x 3 input image is sliced, an (n/2) x (n/2) x 12 feature map is generated, which increases feature extraction and further improves the prediction accuracy of the model. After the Focus unit slices the input image, tensor splicing is performed through Concat to expand the tensor dimensionality, and the data is finally output through a CBL unit inside the Focus unit.
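A minimal PyTorch sketch of the CBL unit of fig. 6 and the Focus slicing of fig. 7 follows; the output channel count and the Leaky_ReLU slope are assumptions for the sketch.

```python
import torch
import torch.nn as nn

def cbl(in_ch: int, out_ch: int, k: int = 1) -> nn.Sequential:
    """CBL unit: Conv + BN + Leaky_ReLU (slope 0.1 assumed)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.1, inplace=True),
    )

def focus_slice(x: torch.Tensor) -> torch.Tensor:
    """Slice an (N, 3, n, n) input into four spatial sub-samples and
    Concat them along channels, giving (N, 12, n/2, n/2)."""
    return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                      x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)

x = torch.randn(1, 3, 640, 640)
out = cbl(12, 64, 3)(focus_slice(x))  # Focus: Slice -> Concat -> CBL
print(out.shape)  # torch.Size([1, 64, 320, 320])
```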
As shown in fig. 8, the SPP unit includes a plurality of MaxPool layers; for example, the SPP unit may adopt maximum pooling at scales of 1x1, 3x3, 5x5, 7x7, 9x9, 11x11 and 13x13 and perform multi-scale fusion after feature extraction, thereby enhancing the robustness and accuracy of the network, reducing the number of parameters in the model, and increasing its prediction speed.
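The SPP unit can be sketched as parallel stride-1 max-pools fused by Concat; the kernel sizes follow the example above, while fusion by plain channel concatenation is an assumption of the sketch.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Parallel MaxPool layers at several scales, fused by Concat."""
    def __init__(self, kernel_sizes=(1, 3, 5, 7, 9, 11, 13)):
        super().__init__()
        # Stride-1, 'same'-padded pooling keeps the spatial size unchanged.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([pool(x) for pool in self.pools], dim=1)

x = torch.randn(1, 64, 20, 20)
print(SPP()(x).shape)  # torch.Size([1, 448, 20, 20]): 64 channels x 7 scales
```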
As shown in fig. 9, the CSP1_X unit down-samples the feature map, thereby increasing the receptive field and enhancing feature extraction on small target samples. The CSP1_X unit includes X Res components; the specific structure of a Res component is shown in fig. 10, where ADD is a tensor addition operation. The Res component uses a residual structure to deepen the network hierarchy of the model, enhance the feature extraction effect and suppress overfitting during training. Furthermore, multi-scale and multi-dimensional feature fusion is carried out through the Concat in the CSP1_X unit, enriching the diversity of features and further improving the prediction accuracy of the model.
As shown in fig. 11, the CSP2_X unit also down-samples the feature map to increase the receptive field. The CSP2_X unit differs from the CSP1_X unit in that it replaces the X Res components with 2X CBL units; the CBL units enhance the feature extraction effect, and the Concat performs multi-scale and multi-dimensional feature fusion, enriching the diversity of features and further improving the prediction accuracy of the model.
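A sketch of the Res component of fig. 10 and a CSP1_X-style unit of fig. 9 follows, reusing the cbl helper from the Focus sketch above (redefined here so the block runs standalone). The split into two half-width branches and the 1x1 fusion after Concat are assumptions of the sketch.

```python
import torch
import torch.nn as nn

def cbl(in_ch, out_ch, k=1):  # CBL helper, as in the Focus sketch above
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1, inplace=True))

class ResComponent(nn.Module):
    """Res component (fig. 10): two CBL layers plus a tensor ADD."""
    def __init__(self, ch: int):
        super().__init__()
        self.block = nn.Sequential(cbl(ch, ch, 1), cbl(ch, ch, 3))

    def forward(self, x):
        return x + self.block(x)  # ADD: residual connection

class CSP1(nn.Module):
    """CSP1_X sketch: X Res components on the main branch, a CBL shortcut
    branch, fused by Concat; channel split/widths are assumed."""
    def __init__(self, ch: int, x_blocks: int = 1):
        super().__init__()
        self.main = nn.Sequential(
            cbl(ch, ch // 2), *[ResComponent(ch // 2) for _ in range(x_blocks)])
        self.shortcut = cbl(ch, ch // 2)
        self.fuse = cbl(ch, ch)  # 1x1 CBL after Concat

    def forward(self, x):
        return self.fuse(torch.cat([self.main(x), self.shortcut(x)], dim=1))

x = torch.randn(1, 64, 40, 40)
print(CSP1(64, x_blocks=2)(x).shape)  # torch.Size([1, 64, 40, 40])
```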
Further, as can be seen from fig. 5, the commodity position identification model includes three Output layers (Output1, Output2, and Output3). Output1 outputs the commodity position category, which is either the subject category or the non-subject category: the subject category means the commodity occupies a subject position in the commodity image, and the non-subject category means it occupies a non-subject position. Output2 outputs the commodity coordinate information, i.e., the specific coordinates of the commodity in the commodity image. Output3 outputs the coordinate probability information, i.e., the prediction probability of the corresponding coordinates.
Further, the sample commodity images required for training the commodity position recognition model are obtained. A sample commodity image may be an image of a commodity at any of various commodity positions. For any obtained sample commodity image, a commodity position label is generated, the sample commodity image and the corresponding commodity position label are input into the constructed commodity position identification model, and the trained commodity position identification model is output when a preset loss condition is met.
After the trained commodity position recognition model is obtained, commodity position recognition is performed on the commodity image obtained in step S110 using the model. Specifically, the commodity position category, the commodity coordinate information, and the coordinate probability information output by the commodity position identification model are acquired, and the commodity position quality score of the commodity image acquired in step S110 is obtained according to the commodity position category, the commodity coordinate information, and/or the coordinate probability information; for example, from a mapping relationship between these outputs and quality scores.
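As a sketch of one possible scoring rule (the embodiment only requires that the score be derived from the category, the coordinates and/or the coordinate probability; the concrete rule and weights below are hypothetical):

```python
def position_quality_score(category: str, box: tuple, prob: float,
                           img_w: int, img_h: int) -> float:
    """Score a commodity position from the model's three outputs.

    box is (x1, y1, x2, y2) in pixels; prob is the coordinate probability.
    """
    base = 1.0 if category == "subject" else 0.4  # hypothetical category weight
    x1, y1, x2, y2 = box
    # Penalize commodities whose box center drifts toward the image boundary.
    cx, cy = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h
    centering = 1.0 - 2.0 * max(abs(cx - 0.5), abs(cy - 0.5))
    return base * max(centering, 0.0) * prob

print(position_quality_score("subject", (100, 120, 500, 520), 0.92, 600, 640))
# 0.92: a centered subject box scaled by its coordinate probability
```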
And step S140, inputting the commodity image to a pre-trained user visual experience recognition model, and obtaining the visual experience quality score of the commodity image based on the output result of the user visual experience recognition model.
The user's visual experience of a commodity image (e.g., its aesthetic quality) also affects the user's perception of the commodity characteristics and, in turn, the desire to purchase; for example, a poorly composed commodity image reduces the user's purchase desire. Based on this, this step identifies the quality of the commodity image from the user visual experience dimension, obtaining a visual experience quality score of the commodity image.
Specifically, the present embodiment is constructed with a user visual experience recognition model in advance, and the user visual experience recognition model is constructed based on a neural network algorithm. The embodiment does not limit the specific structure of the user visual experience recognition model. Alternatively, the specific structure of the user visual experience recognition model is shown in fig. 12.
As can be seen in fig. 12, the user visual experience recognition model includes: an Input layer, a Conv layer, a Flatten layer, and two Output layers (Output1 and Output2). Output1 in the user visual experience recognition model outputs a visual experience sub-score, specifically a user aesthetic experience score; Output2 outputs a noise sub-score, specifically a noise score of the image.
In the process of training the user visual experience recognition model, in order to improve the recognition accuracy of the user visual experience recognition model, the twin model of the user visual experience recognition model is further constructed besides the user visual experience recognition model. The twin model and the user visual experience recognition model have the same structure, but training samples and model parameters of the twin model and the user visual experience recognition model are different in the training process.
Sample data required for training the user visual experience recognition model and the twin model is then acquired. Specifically, for any sample commodity image, the sample commodity image and its visual experience quality scoring label are input to the twin model, while the noise-added sample image obtained by subjecting the sample commodity image to noise processing and the visual experience quality scoring label of the noise-added sample image are input to the user visual experience recognition model. This embodiment does not limit the specific noise processing manner; for example, random Gaussian noise and/or random filtering may be applied to the sample commodity image. In this embodiment, the visual experience quality scoring labels include a user aesthetic experience scoring label (which may be obtained from aesthetic indexes such as composition, color matching, contrast and texture) and a noise sub-scoring label.
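A minimal sketch of such noise processing, assuming 8-bit images and using Gaussian noise plus an optional Gaussian blur as the "random filtering" (the concrete noise parameters are not specified by the embodiment):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def add_noise(image: np.ndarray, max_sigma: float = 15.0,
              blur_sigma: float = 1.0) -> np.ndarray:
    """Produce a noise-added sample image from a sample commodity image."""
    sigma = np.random.uniform(0.0, max_sigma)  # random noise level
    noisy = image.astype(np.float32) + np.random.normal(0.0, sigma, image.shape)
    if np.random.rand() < 0.5:                 # randomly apply the filter
        # Blur the spatial dimensions only, not the color channels.
        noisy = gaussian_filter(noisy, sigma=(blur_sigma, blur_sigma, 0))
    return np.clip(noisy, 0, 255).astype(np.uint8)

sample = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(add_noise(sample).shape)  # (64, 64, 3)
```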
A loss function is calculated according to the difference between the output result of the twin model and the output result of the user visual experience recognition model, and the trained user visual experience recognition model is output when a preset loss condition is met. Here, each output result includes a visual experience sub-score and a noise sub-score. If the visual experience quality score output by the twin model is greater than or equal to the visual experience quality score output by the user visual experience recognition model, the loss function is not calculated; if the visual experience quality score output by the twin model is smaller than that output by the user visual experience recognition model, the loss function is calculated. Whether the loss function meets the preset loss condition is then judged: if so, the current user visual experience recognition model is output; if not, the model parameters of the user visual experience recognition model are adjusted and the next round of training is performed, until the preset loss condition is met and the trained user visual experience recognition model is output.

Optionally, in calculating the loss function, a first regression loss of the twin model is calculated according to the visual experience quality score output by the twin model, a second regression loss of the user visual experience recognition model is calculated according to the visual experience quality score output by the user visual experience recognition model, and the loss function is then calculated from the first regression loss and the second regression loss. For example, the difference between the first regression loss and the second regression loss may be used as the loss function.
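The following sketch shows one reading of this training step. Mean-squared error is assumed as the regression loss (the embodiment does not fix its form); per the condition above, the loss is skipped when the twin's score already dominates, and the difference of the two regression losses is used as the loss function.

```python
import torch
import torch.nn.functional as F

def twin_loss(twin_score: torch.Tensor, model_score: torch.Tensor,
              twin_label: torch.Tensor, model_label: torch.Tensor):
    """Loss for one (clean, noise-added) sample pair; None means 'skip'."""
    if (twin_score >= model_score).all():
        return None  # twin already scores higher: the loss is not calculated
    first = F.mse_loss(twin_score, twin_label)     # first regression loss
    second = F.mse_loss(model_score, model_label)  # second regression loss
    return first - second  # difference used as the loss function

# Usage: only back-propagate (into the recognition model) when a loss exists.
loss = twin_loss(torch.tensor([0.6]), torch.tensor([0.8], requires_grad=True),
                 torch.tensor([0.7]), torch.tensor([0.5]))
if loss is not None:
    loss.backward()
```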
After the pre-trained user visual experience recognition model is obtained, the commodity image is input into it, the visual experience sub-score and the noise sub-score output by the model are obtained, and the visual experience quality score of the commodity image is then obtained from the two sub-scores. For example, the weighted sum of the visual experience sub-score and the noise sub-score may be used as the visual experience quality score of the commodity image; a combined sketch of this weighted summation is given after step S150 below.
Optionally, in order to ensure the accuracy of the final quality score of the commodity image, the same sample data is used in this embodiment when training the image background recognition model, the commodity position recognition model, and the user visual experience recognition model.
In addition, the present embodiment does not limit the execution order of step S120, step S130 and step S140; they may be executed sequentially in any order, or executed concurrently.
And S150, obtaining the total quality score of the commodity image according to the background quality score, the commodity position quality score and the visual experience quality score.
Specifically, corresponding weight coefficients are respectively assigned to the background quality score, the commodity position quality score and the visual experience quality score, and the total quality score of the commodity image is obtained as the weighted sum of the three scores.
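As a minimal sketch of steps S140 and S150 together, with hypothetical weight coefficients (the embodiment only prescribes the weighted-sum form, not the weights):

```python
# Hypothetical weight coefficients; only the weighted-sum form is prescribed.
W_EXPERIENCE, W_NOISE = 0.7, 0.3                    # step S140 sub-score weights
W_BACKGROUND, W_POSITION, W_VISUAL = 0.3, 0.3, 0.4  # step S150 weights

def visual_experience_score(experience_sub: float, noise_sub: float) -> float:
    """Weighted sum of the visual experience and noise sub-scores (S140)."""
    return W_EXPERIENCE * experience_sub + W_NOISE * noise_sub

def total_quality_score(background: float, position: float,
                        visual: float) -> float:
    """Weighted sum of the three per-dimension quality scores (S150)."""
    return W_BACKGROUND * background + W_POSITION * position + W_VISUAL * visual

visual = visual_experience_score(0.8, 0.6)     # ~0.74
print(total_quality_score(0.9, 0.92, visual))  # ~0.842 with these weights
```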
Thus, based on a machine learning algorithm, the total quality score of the commodity image is obtained automatically from three dimensions: the background of the commodity image, the position of the commodity within it, and the user's visual experience of it. This improves both the accuracy and the efficiency of commodity image quality scoring.
Fig. 13 is a schematic structural diagram illustrating a commodity image quality recognition apparatus according to an embodiment of the present invention.
As shown in fig. 13, the commodity image quality recognition apparatus 1300 includes: an obtaining module 1310, a first executing module 1320, a second executing module 1330, a third executing module 1340, and an integration module 1350.
An obtaining module 1310 for obtaining an image of a commodity;
a first executing module 1320, configured to input the commodity image into a pre-trained image background recognition model, and obtain a background quality score of the commodity image based on an output result of the image background recognition model;
a second executing module 1330, configured to input the commodity image into a pre-trained commodity position identification model, and obtain a commodity position quality score of the commodity image based on an output result of the commodity position identification model;
a third executing module 1340, configured to input the commodity image to a pre-trained user visual experience recognition model, and obtain a visual experience quality score of the commodity image based on an output result of the user visual experience recognition model;
and an integration module 1350, configured to obtain a total quality score of the commodity image according to the background quality score, the commodity position quality score and the visual experience quality score.
In an alternative embodiment, the third executing module 1340 is further configured to: before inputting the commodity image into a pre-trained user visual experience recognition model, constructing a user visual experience recognition model and a twin model of the user visual experience recognition model;
for any sample commodity image, inputting the sample commodity image and the visual experience quality scoring label of the sample commodity image into the twin model, and inputting a noise-added sample image, obtained by subjecting the sample commodity image to noise processing, and the visual experience quality scoring label of the noise-added sample image into the user visual experience recognition model;
calculating a loss function according to the difference between the output result of the twin model and the output result of the user visual experience recognition model;
and outputting the trained user visual experience recognition model when the preset loss condition is met.
In an alternative embodiment, the third executing module 1340 is further configured to: if the visual experience quality score output by the twin model is greater than or equal to the visual experience quality score output by the user visual experience recognition model, the loss function is not calculated;
and if the visual experience quality score output by the twin model is smaller than the visual experience quality score output by the user visual experience recognition model, calculating the loss function.
In an alternative embodiment, the third executing module 1340 is further configured to: calculating a first regression loss of the twin model according to the visual experience quality score output by the twin model, and calculating a second regression loss of the user visual experience recognition model according to the visual experience quality score output by the user visual experience recognition model;
and calculating the loss function according to the first regression loss and the second regression loss.
In an alternative embodiment, the third executing module 1340 is further configured to: acquiring a visual experience sub-score and a noise sub-score output by the user visual experience recognition model;
and obtaining the visual experience quality score of the commodity image according to the visual experience sub-score and the noise sub-score.
In an alternative embodiment, the image background recognition model comprises: a plurality of Bottleneck modules of different structures.
In an alternative embodiment, the second executing module 1330 is further configured to: acquiring commodity position categories, commodity coordinate information and coordinate probability information output by the commodity position identification model;
and obtaining the commodity position quality score of the commodity image according to the commodity position category, the commodity coordinate information and/or the coordinate probability information.
The specific implementation process of each module in the apparatus may refer to the description of the corresponding part in the method embodiment shown in fig. 1, which is not described herein again.
Thus, based on a machine learning algorithm, the total quality score of the commodity image is obtained automatically from three dimensions: the background of the commodity image, the position of the commodity within it, and the user's visual experience of it. This improves both the accuracy and the efficiency of commodity image quality scoring.
An embodiment of the present invention provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction that causes a processor to perform the quality identification method of the commodity image in any of the above method embodiments.
The executable instructions may be specifically configured to cause the processor to:
acquiring a commodity image;
inputting the commodity image into a pre-trained image background recognition model, and obtaining a background quality score of the commodity image based on an output result of the image background recognition model;
inputting the commodity image to a pre-trained commodity position recognition model, and obtaining a commodity position quality score of the commodity image based on an output result of the commodity position recognition model;
inputting the commodity image to a pre-trained user visual experience recognition model, and obtaining a visual experience quality score of the commodity image based on an output result of the user visual experience recognition model;
and obtaining the total quality score of the commodity image according to the background quality score, the commodity position quality score and the visual experience quality score.
In an alternative embodiment, the executable instructions may be specifically configured to cause the processor to:
prior to said inputting said commodity image into a pre-trained user visual experience recognition model,
constructing a user visual experience recognition model and a twin model of the user visual experience recognition model;
for any sample commodity image, inputting the sample commodity image and the visual experience quality scoring label of the sample commodity image into the twin model, and inputting a noise-added sample image, obtained by subjecting the sample commodity image to noise processing, and the visual experience quality scoring label of the noise-added sample image into the user visual experience recognition model;
calculating a loss function according to the difference between the output result of the twin model and the output result of the user visual experience recognition model;
and outputting the trained user visual experience recognition model when the preset loss condition is met.
In an alternative embodiment, the executable instructions may be specifically configured to cause the processor to:
if the visual experience quality score output by the twin model is greater than or equal to the visual experience quality score output by the user visual experience recognition model, the loss function is not calculated;
and if the visual experience quality score output by the twin model is smaller than the visual experience quality score output by the user visual experience recognition model, calculating the loss function.
In an alternative embodiment, the executable instructions may be specifically configured to cause the processor to:
calculating a first regression loss of the twin model according to the visual experience quality score output by the twin model, and calculating a second regression loss of the user visual experience recognition model according to the visual experience quality score output by the user visual experience recognition model;
and calculating the loss function according to the first regression loss and the second regression loss.
In an alternative embodiment, the executable instructions may be specifically configured to cause the processor to:
acquiring a visual experience sub-score and a noise sub-score output by the user visual experience recognition model;
and obtaining the visual experience quality score of the commodity image according to the visual experience sub-score and the noise sub-score.
In an alternative embodiment, the image background recognition model comprises: a plurality of Bottleneck modules of different structures.
In an alternative embodiment, the executable instructions may be specifically configured to cause the processor to:
acquiring commodity position categories, commodity coordinate information and coordinate probability information output by the commodity position identification model;
and obtaining the commodity position quality score of the commodity image according to the commodity position category, the commodity coordinate information and/or the coordinate probability information.
Thus, based on a machine learning algorithm, the total quality score of the commodity image is obtained automatically from three dimensions: the background of the commodity image, the position of the commodity within it, and the user's visual experience of it. This improves both the accuracy and the efficiency of commodity image quality scoring.
Fig. 14 is a schematic structural diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the present invention are not intended to limit the specific implementations of computing devices.
As shown in fig. 14, the computing device may include: a processor (processor)1402, a Communications Interface 1404, a memory 1406, and a communication bus 1408.
Wherein: the processor 1402, communication interface 1404, and memory 1406 communicate with each other via a communication bus 1408. A communication interface 1404 for communicating with network elements of other devices, such as clients or other servers. The processor 1402, configured to execute the program 1410, may specifically perform relevant steps in the above method embodiments.
In particular, program 1410 may include program code that includes computer operating instructions.
Processor 1402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The computing device may include one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
Memory 1406 is used to store a program 1410. Memory 1406 may comprise high-speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
Program 1410 may be specifically configured to cause processor 1402 to perform the following operations:
the executable instructions may be specifically configured to cause the processor to:
acquiring a commodity image;
inputting the commodity image into a pre-trained image background recognition model, and obtaining a background quality score of the commodity image based on an output result of the image background recognition model;
inputting the commodity image to a pre-trained commodity position recognition model, and obtaining a commodity position quality score of the commodity image based on an output result of the commodity position recognition model;
inputting the commodity image to a pre-trained user visual experience recognition model, and obtaining a visual experience quality score of the commodity image based on an output result of the user visual experience recognition model;
and obtaining the total quality score of the commodity image according to the background quality score, the commodity position quality score and the visual experience quality score.
In an alternative embodiment, program 1410 may be specifically configured to cause processor 1402 to perform the following operations:
before inputting the commodity image into a pre-trained user visual experience recognition model, constructing a user visual experience recognition model and a twin model of the user visual experience recognition model;
for any sample commodity image, inputting the sample commodity image and the visual experience quality scoring label of the sample commodity image into the twin model, and inputting a noise-added sample image, obtained by subjecting the sample commodity image to noise processing, and the visual experience quality scoring label of the noise-added sample image into the user visual experience recognition model;
calculating a loss function according to the difference between the output result of the twin model and the output result of the user visual experience recognition model;
and outputting the trained user visual experience recognition model when the preset loss condition is met.
In an alternative embodiment, program 1410 may be specifically configured to cause processor 1402 to perform the following operations:
if the visual experience quality score output by the twin model is greater than or equal to the visual experience quality score output by the user visual experience recognition model, the loss function is not calculated;
and if the visual experience quality score output by the twin model is smaller than the visual experience quality score output by the user visual experience recognition model, calculating the loss function.
In an alternative embodiment, program 1410 may be specifically configured to cause processor 1402 to perform the following operations:
calculating a first regression loss of the twin model according to the visual experience quality score output by the twin model, and calculating a second regression loss of the user visual experience recognition model according to the visual experience quality score output by the user visual experience recognition model;
and calculating the loss function according to the first regression loss and the second regression loss.
In an alternative embodiment, program 1410 may be specifically configured to cause processor 1402 to perform the following operations:
acquiring a visual experience sub-score and a noise sub-score output by the user visual experience recognition model;
and obtaining the visual experience quality score of the commodity image according to the visual experience sub-score and the noise sub-score.
In an alternative embodiment, the image background recognition model comprises: a plurality of Bottleneck modules of different structures.
In an alternative embodiment, program 1410 may be specifically configured to cause processor 1402 to perform the following operations:
acquiring commodity position categories, commodity coordinate information and coordinate probability information output by the commodity position identification model;
and obtaining the commodity position quality score of the commodity image according to the commodity position category, the commodity coordinate information and/or the coordinate probability information.
Thus, based on a machine learning algorithm, the total quality score of the commodity image is obtained automatically from three dimensions: the background of the commodity image, the position of the commodity within it, and the user's visual experience of it. This improves both the accuracy and the efficiency of commodity image quality scoring.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specified otherwise.