Method for training image recognition model, image recognition method and image recognition device
1. A method of training an image recognition model, the method comprising:
acquiring an initial training image, and performing augmentation processing of image signal processing on the initial training image to obtain a processed training image; and
training an image recognition model based on the processed training image.
2. The method of claim 1, wherein performing the augmentation processing of image signal processing on the initial training image comprises:
performing at least one of the following on the initial training image: RGB domain augmentation processing, HSV domain augmentation processing, and YUV domain augmentation processing.
3. The method of claim 2, wherein the RGB domain augmentation process comprises at least one of color information adjustment, gamma transformation, and random histogram equalization;
the parameters in the color correction matrix and the offset matrix adopted in the color information adjustment are obtained by finely adjusting preset parameters according to random variables; the gamma coefficient adopted in the gamma transformation is obtained by finely adjusting preset parameters according to random variables; the random histogram equalization refers to determining whether to perform histogram equalization based on a random variable.
4. The method according to claim 3, wherein when the RGB domain augmentation process includes at least two of color information adjustment, gamma transformation, and random histogram equalization, the color information adjustment is performed before the gamma transformation, and the gamma transformation is performed before the random histogram equalization.
5. The method of claim 2, wherein the HSV domain augmentation process includes saturation adjustment and/or contrast adjustment;
the saturation parameter used in the saturation adjustment is obtained by finely adjusting a preset parameter according to a random variable, and the contrast parameter used in the contrast adjustment is obtained by finely adjusting the preset parameter according to the random variable.
6. The method of claim 2, wherein the YUV domain augmentation process comprises YUV domain noise reduction and/or edge enhancement;
the parameters of the low-pass filter adopted in the YUV domain noise reduction are obtained by finely adjusting preset parameters according to random variables, and the parameters adopted in the edge enhancement are obtained by finely adjusting the preset parameters according to the random variables.
7. The method of claim 1, wherein in training the image recognition model, the image recognition model is trained based on both the initial training image and the processed training image.
8. An image recognition method, characterized in that the method comprises:
acquiring an image to be recognized;
performing image recognition on the image to be recognized based on a trained image recognition model, wherein the image recognition model is trained based on the method for training the image recognition model according to any one of claims 1 to 7.
9. An image recognition device, characterized in that the image recognition device comprises a memory and a processor, the memory having stored thereon a computer program for execution by the processor, the computer program, when executed by the processor, causing the processor to carry out the image recognition method as claimed in claim 8.
10. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed, performs a method of training an image recognition model according to any one of claims 1-7 or performs an image recognition method according to claim 8.
Background
Image recognition, which refers to a technique for processing, analyzing and understanding images by using a computer in order to recognize various targets and objects, is a practical application of deep learning algorithms. Image recognition is generally performed based on a trained neural network model, and the quality of the image to be recognized significantly affects the performance of the image recognition algorithm executed by the neural network model.
In model training, the training image set often consists of images shot by one particular camera. In the actual use of image recognition products (such as face recognition products) or SDKs, however, the images to be recognized that are input to the model may come from different cameras, and their quality cannot be guaranteed, so the model may perform poorly when recognizing images shot by some cameras. Improving the robustness of the model to the imaging device (module) is therefore a topic worth studying.
Disclosure of Invention
According to an aspect of the present application, there is provided a method for training an image recognition model, the method comprising: acquiring an initial training image, and performing augmentation processing of image signal processing on the initial training image to obtain a processed training image; and training an image recognition model based on the processed training image.
In an embodiment of the present application, performing the augmentation processing of image signal processing on the initial training image includes: performing at least one of the following on the initial training image: RGB domain augmentation processing, HSV domain augmentation processing, and YUV domain augmentation processing.
In one embodiment of the present application, the RGB domain augmentation process includes at least one of color information adjustment, gamma transformation, and random histogram equalization; wherein, the parameters in the color correction matrix and the offset matrix adopted in the color information adjustment are obtained by finely adjusting the preset parameters according to random variables; the gamma coefficient adopted in the gamma transformation is obtained by finely adjusting preset parameters according to random variables; the random histogram equalization refers to determining whether to perform histogram equalization based on a random variable.
In one embodiment of the present application, when the RGB domain augmentation process includes at least two of color information adjustment, gamma transformation, and random histogram equalization, the color information adjustment is performed before the gamma transformation, and the gamma transformation is performed before the random histogram equalization.
In one embodiment of the present application, the HSV domain augmentation process includes saturation adjustment and/or contrast adjustment; the saturation parameter used in the saturation adjustment is obtained by finely adjusting a preset parameter according to a random variable, and the contrast parameter used in the contrast adjustment is obtained by finely adjusting the preset parameter according to the random variable.
In one embodiment of the application, the YUV domain augmentation process includes YUV domain noise reduction and/or edge enhancement; the parameters of the low-pass filter adopted in the YUV domain noise reduction are obtained by finely adjusting preset parameters according to random variables, and the parameters adopted in the edge enhancement are obtained by finely adjusting the preset parameters according to the random variables.
In one embodiment of the application, the image recognition model is a face recognition model.
According to another aspect of the present application, there is provided an image recognition method, the method including: acquiring an image to be recognized; and performing image recognition on the image to be recognized based on a trained image recognition model, wherein the image recognition model is obtained by training based on the above-mentioned method for training an image recognition model.
According to yet another aspect of the present application, there is provided an image recognition apparatus comprising a memory and a processor, the memory having stored thereon a computer program for execution by the processor, the computer program, when executed by the processor, causing the processor to perform the above-mentioned image recognition method.
According to a further aspect of the application, a storage medium is provided, on which a computer program is stored which, when being executed, carries out the above-mentioned method for training an image recognition model or carries out the above-mentioned image recognition method.
According to the method for training an image recognition model, the image recognition method and the image recognition device of the present application, performing augmentation processing of image signal processing on the initial training images makes it possible to simulate, from training images shot by one or a few types of cameras, training images shot by many different types of cameras. An image recognition model trained on the augmented images therefore generalizes better to images to be recognized shot by different imaging modules, and performs well when recognizing such images.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 shows a schematic block diagram of an example electronic device for implementing a method of training an image recognition model, an image recognition method, and an image recognition device according to embodiments of the present application.
Fig. 2 shows a schematic flow diagram of a method of training an image recognition model according to an embodiment of the present application.
Fig. 3 illustrates an example of an augmentation process in a method of training an image recognition model according to an embodiment of the present application.
Fig. 4 illustrates an example of RGB domain augmentation processing in a method of training an image recognition model according to an embodiment of the present application.
Fig. 5 illustrates an example of HSV domain augmentation processing in a method of training an image recognition model according to an embodiment of the present application.
Fig. 6 shows an example of YUV domain augmentation processing in a method of training an image recognition model according to an embodiment of the present application.
Fig. 7 shows a schematic flow chart of an image recognition method according to an embodiment of the present application.
Fig. 8 shows a schematic block diagram of an image recognition device according to an embodiment of the present application.
Fig. 9 shows a schematic block diagram of an image recognition apparatus according to another embodiment of the present application.
Fig. 10 shows a schematic block diagram of an image recognition apparatus according to yet another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, exemplary embodiments according to the present application will be described in detail below with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the application described in the application without inventive step, shall fall within the scope of protection of the application.
In recent years, research on technologies based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has advanced rapidly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques and application systems for simulating and extending human intelligence. AI is a comprehensive discipline that involves many technical areas, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning and neural networks. Computer vision, an important branch of artificial intelligence, enables machines to perceive and recognize the world. Computer vision technology generally includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, robot navigation and positioning, and the like.
With the research and progress of artificial intelligence technology, the technology has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access control, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone imaging, cloud services, smart homes, wearable devices, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, person-ID verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
An example electronic device 100 for implementing the method for training an image recognition model, the image recognition method, and the image recognition device of embodiments of the present invention is described below with reference to fig. 1.
As shown in FIG. 1, electronic device 100 includes one or more processors 102, one or more memory devices 104, an input device 106, and an output device 108, which are interconnected via a bus system 110 and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device may have other components and structures as desired.
The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The memory device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 102 to implement the client-side functionality (implemented by the processor) and/or other desired functionality in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like. The input device 106 may be any interface for receiving information.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., to a user), and may include one or more of a display, a speaker, and the like. The output device 108 may also be any other device having an output function.
For example, an example electronic device for implementing the method and apparatus for training an image recognition model according to the embodiments of the present invention may be implemented as a terminal such as a smartphone, a tablet computer, a camera, and the like.
In the following, a method 200 of training an image recognition model according to an embodiment of the present application will be described with reference to fig. 2. As shown in FIG. 2, a method 200 of training an image recognition model may include the steps of:
In step S210, an initial training image is obtained, and augmentation processing of image signal processing is performed on the initial training image to obtain a processed training image.
In step S220, an image recognition model is trained based on the processed training images.
In the embodiment of the present application, the initial training image obtained in step S210 may be an image captured by one or several cameras (imaging devices, also referred to as modules). As described above, if only images captured by a few cameras are used as the training image set, the trained image recognition model performs well on images to be recognized that were captured by those cameras. In actual use, however, the source (module) of the images to be recognized is uncertain, especially when the trained image recognition model is sold as an SDK, so the performance of the image recognition model cannot be guaranteed.
Therefore, in the embodiment of the present application, Image Signal Processing (ISP) augmentation processing is performed on the initial training image obtained in step S210. In the ISP augmentation processing, the parameters in the ISP process that affect the performance of the image recognition model (such as the parameters of a color transformation matrix, the coefficient of a gamma transformation, a saturation parameter, a contrast parameter, and the like, which are described in more detail in the examples below) are obtained by fine-tuning preset parameters according to random variables, and the initial training image is processed based on these parameters. The processing manner depends on the parameter: the parameters of the color transformation matrix are used for color information adjustment, the gamma coefficient for gamma transformation, the saturation parameter for saturation adjustment, the contrast parameter for contrast adjustment, and so on. The processed training image is then used to train the image recognition model, so that the image recognition model performs well when recognizing images to be recognized shot by different imaging modules.
This is because the main difference between different modules lies in their ISP algorithms; ISP is the process by which a module converts the raw image collected by the image sensor into the RGB image used in everyday applications. Performing ISP augmentation processing on the initial training image therefore makes it possible to simulate, from training images shot by one or a few types of cameras, training images shot by many different types of cameras. An image recognition model trained on the augmented images generalizes better to images to be recognized shot by different imaging modules, and performs well when recognizing such images.
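The idea of fine-tuning a preset ISP parameter according to a random variable can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the name `jitter_parameter`, the additive uniform perturbation, and the ranges used are all hypothetical, since the source does not specify a distribution or a range.

```python
import numpy as np

def jitter_parameter(preset, max_delta, rng=None):
    """Return the preset value(s) plus a small random offset.

    Assumption: the fine adjustment is an additive uniform perturbation in
    [-max_delta, +max_delta]; the source does not fix the distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    preset = np.asarray(preset, dtype=np.float64)
    return preset + rng.uniform(-max_delta, max_delta, size=preset.shape)

rng = np.random.default_rng(0)
# Scalar preset, e.g. a gamma coefficient (2.2 is an illustrative value).
gamma = jitter_parameter(2.2, max_delta=0.1, rng=rng)
# Matrix preset, e.g. a color correction matrix (identity is illustrative).
ccm = jitter_parameter(np.eye(3), max_delta=0.05, rng=rng)
```

Each call yields a slightly different parameter set, so repeated calls over the same initial image would produce differently processed training images.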
In an embodiment of the present application, the ISP augmentation process performed on the initial training image in step S210 may include at least one of the following: RGB domain augmentation processing, HSV domain augmentation processing, and YUV domain augmentation processing. RGB, HSV and YUV are different color spaces. Accordingly, the RGB domain augmentation processing refers to processing in the RGB color space, the HSV domain augmentation processing refers to processing in the HSV color space, and the YUV domain augmentation processing refers to processing in the YUV color space. Different ISP processing stages are generally performed in different color spaces. Performing augmentation processing on at least one ISP processing stage achieves ISP augmentation of the initial training image; performing augmentation processing on several ISP processing stages further strengthens the training data and thus enhances the generalization of the trained image recognition model to different modules.
In an embodiment of the present application, when the ISP augmentation process includes at least two of the RGB domain augmentation process, the HSV domain augmentation process, and the YUV domain augmentation process, the RGB domain augmentation process is performed before the HSV domain augmentation process, and the HSV domain augmentation process is performed before the YUV domain augmentation process, which may simplify the procedure and improve efficiency. For example, RGB domain augmentation may be performed on the initial training image to obtain an RGB-domain-augmented image, and HSV domain augmentation may then be performed on that image to obtain an HSV-domain-augmented image, which is used as the image for training the image recognition model. For another example, RGB domain augmentation may be performed on the initial training image to obtain an RGB-domain-augmented image, and YUV domain augmentation may then be performed on that image to obtain a YUV-domain-augmented image, which is used as the image for training the image recognition model. For another example, HSV domain augmentation may be performed on the initial training image to obtain an HSV-domain-augmented image, and YUV domain augmentation may then be performed on that image to obtain a YUV-domain-augmented image, which is used as the image for training the image recognition model.
In another example, RGB domain augmentation processing may be performed on the initial training image to obtain an RGB-domain-augmented image, HSV domain augmentation processing may then be performed on that image to obtain an HSV-domain-augmented image, and YUV domain augmentation processing may finally be performed on the HSV-domain-augmented image to obtain a YUV-domain-augmented image, which is used as the image for training the image recognition model, as shown in fig. 3. In this example, the ISP augmentation process includes all three of the RGB domain, HSV domain, and YUV domain augmentation processes, which can improve the training capability of the augmented images to the greatest extent and enhance the generalization of the trained image recognition model to different modules to the greatest extent.
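The fixed ordering of the domain stages described above (RGB before HSV before YUV, with any stage optional) can be sketched as a small composition function. The function name `isp_augment` and the callable-per-stage interface are assumptions made for illustration; the stand-in stages below only record the order in which they run.

```python
def isp_augment(image, rgb_stage=None, hsv_stage=None, yuv_stage=None):
    """Apply the enabled domain stages in the fixed order RGB -> HSV -> YUV.

    Each stage is a hypothetical callable taking and returning an image;
    a stage set to None is skipped.
    """
    for stage in (rgb_stage, hsv_stage, yuv_stage):
        if stage is not None:
            image = stage(image)
    return image

# Stand-in stages that record the call order instead of editing pixels.
calls = []

def make_stage(name):
    def stage(img):
        calls.append(name)
        return img
    return stage

# Only RGB and YUV augmentation enabled: RGB still runs before YUV.
result = isp_augment([0], rgb_stage=make_stage("rgb"), yuv_stage=make_stage("yuv"))
```

Passing the stages as keyword arguments keeps the order independent of how the caller lists them, which matches the ordering constraint in the text.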
In an embodiment of the present application, the RGB domain augmentation process may include at least one of color information adjustment, gamma transformation, and random histogram equalization. The parameters in the Color Correction Matrix (CCM, generally a 3 × 3 matrix) and the offset matrix (generally a 3 × 1 matrix) used in the color information adjustment are obtained by fine-tuning preset parameters according to random variables; the gamma coefficient used in the gamma transformation is obtained by fine-tuning a preset parameter according to a random variable; and random histogram equalization refers to determining whether to perform histogram equalization based on a random variable.
In this embodiment, by using a color correction matrix and an offset matrix whose parameters are obtained by fine-tuning preset parameters according to random variables, the color information of the initial training image can be adjusted (generally, fine-tuned) to obtain training images with different color information, thereby augmenting the image color information. In addition, applying a global gamma transformation to the image with a gamma coefficient obtained by fine-tuning a preset parameter according to a random variable produces random corrections of varying degrees for overexposure or underexposure, thereby augmenting the image exposure correction. Furthermore, some modules perform histogram equalization and others do not. Random histogram equalization, in which a random variable (for example, taking the value 1 or 0) controls whether histogram equalization is performed, therefore augments the training images with respect to the presence or absence of histogram equalization.
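A minimal sketch of the color information adjustment and the gamma transformation is given below. It assumes a floating-point RGB image with values in [0, 1], an identity CCM and zero offset as the presets, a preset gamma of 2.2, and the convention `out = in ** (1 / gamma)`; none of these values or conventions is stated in the source, and the function names are hypothetical.

```python
import numpy as np

def adjust_color(rgb, ccm, offset):
    """Apply a 3x3 color correction matrix and a 3x1 offset to an HxWx3 image."""
    out = rgb @ ccm.T + offset.reshape(1, 1, 3)   # per pixel: ccm @ p + offset
    return np.clip(out, 0.0, 1.0)

def gamma_transform(rgb, gamma):
    """Global gamma transformation (assumed convention: out = in ** (1 / gamma))."""
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))                        # stand-in training image

# Presets plus small random perturbations (ranges are illustrative only).
ccm = np.eye(3) + rng.uniform(-0.05, 0.05, (3, 3))
offset = np.zeros(3) + rng.uniform(-0.02, 0.02, 3)
gamma = 2.2 + rng.uniform(-0.2, 0.2)

# Color information adjustment first, then gamma transformation.
augmented = gamma_transform(adjust_color(image, ccm, offset), gamma)
```

Clipping after the matrix step keeps the values in range so that the power function in the gamma step is well defined.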
In the embodiment of the present application, when the RGB domain augmentation process includes at least two of color information adjustment, gamma transformation, and random histogram equalization, the color information adjustment is performed before the gamma transformation, and the gamma transformation is performed before the random histogram equalization, which can achieve a better augmentation effect. For example, the color information of the initial training image may be adjusted to obtain a color-adjusted image, and the color-adjusted image may then be gamma-transformed to obtain a gamma-transformed image, which is used as the RGB-domain-augmented image. For another example, the color information of the initial training image may be adjusted to obtain a color-adjusted image, and the color-adjusted image may then be subjected to random histogram equalization to obtain the RGB-domain-augmented image. For another example, the initial training image may first be gamma-transformed to obtain a gamma-transformed image, and the gamma-transformed image may then be subjected to random histogram equalization to obtain the RGB-domain-augmented image.
In another example, the color information of the initial training image may be adjusted to obtain a color-adjusted image, the color-adjusted image may then be gamma-transformed to obtain a gamma-transformed image, and the gamma-transformed image may finally be subjected to random histogram equalization to obtain the RGB-domain-augmented image, as shown in fig. 4. In this example, the RGB domain augmentation process includes all three of color information adjustment, gamma transformation, and random histogram equalization, so that the training capability of the RGB-domain-augmented image can be improved to the greatest extent and the generalization of the trained image recognition model to different modules can be enhanced.
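The last step of the RGB chain, random histogram equalization, can be sketched as below. The sketch assumes an 8-bit RGB image, per-channel equalization, and a Bernoulli draw with probability 0.5 as the random variable; the source only says that a random variable decides whether equalization is performed, so these choices and the function names are illustrative.

```python
import numpy as np

def equalize_channel(channel):
    """Histogram-equalize one uint8 channel using its cumulative distribution."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied level
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant channel: nothing to equalize
        return channel.copy()
    lut = np.round((cdf - cdf_min) / denom * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[channel]

def random_histogram_equalization(rgb_u8, rng, probability=0.5):
    """Equalize every channel only when the random draw falls below probability."""
    if rng.random() >= probability:
        return rgb_u8                  # random variable says: skip equalization
    channels = [equalize_channel(rgb_u8[..., c]) for c in range(3)]
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(0)
image = rng.integers(60, 180, size=(8, 8, 3), dtype=np.uint8)  # low-contrast stand-in
augmented = random_histogram_equalization(image, rng)
```

In a full RGB chain this function would be called on the output of the gamma transformation, consistent with the ordering described above.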
In embodiments of the present application, the HSV domain augmentation process may include saturation adjustment and/or contrast adjustment; the saturation parameter used in the saturation adjustment is obtained by fine-tuning a preset parameter according to a random variable, and the contrast parameter used in the contrast adjustment is obtained by fine-tuning a preset parameter according to a random variable. In the ISP processing of different modules, contrast and saturation are often tuned according to the aesthetic preferences of the engineer who tunes the ISP, so the saturation and contrast styles of different modules differ. Therefore, in this embodiment, a saturation parameter is randomly generated (within a reasonable range) in the HSV domain by fine-tuning a preset parameter according to a random variable, and is used to adjust the saturation of the initial training image or of the training image subjected to the RGB domain augmentation process, thereby augmenting the image saturation information. Likewise, a contrast parameter is randomly generated (within a reasonable range) in the HSV domain by fine-tuning a preset parameter according to a random variable, and is used to adjust the contrast of the image, thereby augmenting the image contrast information.
In the embodiment of the present application, when the HSV domain augmentation process includes both the saturation adjustment and the contrast adjustment, the saturation adjustment is performed before the contrast adjustment. For example, the saturation adjustment may be performed on the initial training image or the training image subjected to the RGB domain augmentation process to obtain a saturation-adjusted image, and then the contrast adjustment may be performed on the saturation-adjusted image to obtain a training image subjected to the HSV domain augmentation process, which is shown in fig. 5. In this example, the HSV domain augmentation process includes both saturation adjustment and contrast adjustment, which can maximize the training ability of images after HSV domain augmentation process and enhance the generalization of the trained image recognition model to different modules.
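The HSV stage, with saturation adjustment performed before contrast adjustment, can be sketched as follows. The sketch assumes a floating-point RGB image in [0, 1], a multiplicative gain on the S channel, and contrast implemented as a linear stretch of the V channel around 0.5; the source does not give these formulas, so they and the name `hsv_augment` are assumptions. The conversion uses the standard-library `colorsys` module wrapped in `np.vectorize`, which is simple rather than fast.

```python
import colorsys
import numpy as np

# Element-wise wrappers around the standard-library color conversions.
_rgb_to_hsv = np.vectorize(colorsys.rgb_to_hsv)
_hsv_to_rgb = np.vectorize(colorsys.hsv_to_rgb)

def hsv_augment(rgb, saturation_gain, contrast_gain):
    """Adjust saturation, then contrast, in HSV space (HxWx3 floats in [0, 1])."""
    h, s, v = _rgb_to_hsv(rgb[..., 0], rgb[..., 1], rgb[..., 2])
    s = np.clip(s * saturation_gain, 0.0, 1.0)               # saturation first
    v = np.clip((v - 0.5) * contrast_gain + 0.5, 0.0, 1.0)   # then contrast on V
    r, g, b = _hsv_to_rgb(h, s, v)
    return np.stack([r, g, b], axis=-1)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))                    # stand-in training image

# Preset gains of 1.0 plus small random perturbations (illustrative ranges).
saturation_gain = 1.0 + rng.uniform(-0.2, 0.2)
contrast_gain = 1.0 + rng.uniform(-0.2, 0.2)
augmented = hsv_augment(image, saturation_gain, contrast_gain)
```

With both gains equal to 1.0 the function returns the input image up to floating-point error, which makes the presets a neutral starting point for the random perturbation.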
In embodiments of the present application, the YUV domain augmentation processing may include YUV domain noise reduction and/or edge enhancement; the parameters of the low-pass filter used in the YUV domain noise reduction are obtained by fine-tuning preset parameters according to random variables, and the parameters used in the edge enhancement are obtained by fine-tuning preset parameters according to random variables. In this embodiment, the parameters of the low-pass filter are randomly generated (within a reasonable range) in the YUV domain by fine-tuning preset parameters according to random variables, and the filter is used to denoise the initial training image, the training image subjected to the RGB domain augmentation process, or the training image subjected to the HSV domain augmentation process, thereby augmenting the image denoising effect. Likewise, the parameters of a sharpening algorithm are randomly generated (within a reasonable range) in the YUV domain by fine-tuning preset parameters according to random variables, and are used to enhance the edges of the image, thereby augmenting the degree of image sharpening.
In an embodiment of the present application, when YUV domain augmentation processing includes both noise reduction and edge enhancement, the YUV domain noise reduction is performed before the edge enhancement. For example, the initial training image, the training image subjected to RGB domain augmentation processing, or the training image subjected to HSV domain augmentation processing may be denoised to obtain a denoised image, and then the denoised image may be subjected to edge enhancement to obtain a YUV domain augmented training image, as shown in fig. 6. In this example, the YUV domain augmentation processing includes both noise reduction and edge enhancement, which can maximally improve the training capability of the image after YUV domain augmentation processing, and enhance the generalization of the trained image recognition model to different modules.
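The YUV stage, with noise reduction performed before edge enhancement, can be sketched as below. Several choices here are assumptions not found in the source: the BT.601 conversion matrix, a separable Gaussian as the low-pass filter, unsharp masking as the sharpening algorithm, filtering only the Y (luma) plane, and converting the result back to RGB.

```python
import numpy as np

# BT.601 RGB -> YUV matrix, used here as an assumed conversion.
RGB_TO_YUV = np.array([[ 0.299,    0.587,    0.114  ],
                       [-0.14713, -0.28886,  0.436  ],
                       [ 0.615,   -0.51499, -0.10001]])
YUV_TO_RGB = np.linalg.inv(RGB_TO_YUV)

def gaussian_kernel(sigma, radius=2):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def low_pass(plane, sigma):
    """Separable Gaussian low-pass filter with edge replication (HxW plane)."""
    k = gaussian_kernel(sigma)
    padded = np.pad(plane, len(k) // 2, mode="edge")
    rows = np.apply_along_axis(lambda m: np.convolve(m, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="valid"), 0, rows)

def yuv_augment(rgb, sigma, sharpen_amount):
    """Denoise the luma plane, then sharpen it with an unsharp mask."""
    yuv = rgb @ RGB_TO_YUV.T
    y = low_pass(yuv[..., 0], sigma)                     # noise reduction first
    y = y + sharpen_amount * (y - low_pass(y, sigma))    # then edge enhancement
    yuv = np.stack([y, yuv[..., 1], yuv[..., 2]], axis=-1)
    return np.clip(yuv @ YUV_TO_RGB.T, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                    # stand-in training image in [0, 1]

# Preset filter strength and sharpening amount plus small random perturbations.
sigma = 1.0 + rng.uniform(-0.2, 0.2)
sharpen_amount = 0.5 + rng.uniform(-0.1, 0.1)
augmented = yuv_augment(image, sigma, sharpen_amount)
```

Varying `sigma` and `sharpen_amount` per image imitates modules whose ISPs denoise and sharpen to different degrees.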
The above exemplarily illustrates the ISP augmentation process on the initial training image in the method for training the image recognition model according to the embodiment of the present application. The image recognition model is trained based on the training image subjected to the ISP augmentation processing, and the obtained trained image recognition model has better performance when image recognition is carried out on the images to be recognized shot by different cameras.
In the embodiment of the application, the image recognition model can be trained by combining the initial training image and the training image subjected to the ISP augmentation processing, so that the training capability of the training image set can be further improved, and the image recognition model with higher robustness for images shot by different modules can be obtained.
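A minimal sketch of combining the initial training images with their ISP-augmented versions is given below. The function name `build_training_set`, the `(image, label)` pair representation, and the `copies_per_image` parameter are assumptions made for illustration; the application does not prescribe a data layout.

```python
def build_training_set(samples, augment, copies_per_image=1):
    """Return the initial samples plus augmented copies.

    samples: list of (image, label) pairs.
    augment: callable mapping an image to an ISP-augmented image.
    """
    combined = list(samples)  # keep every initial training image
    for image, label in samples:
        for _ in range(copies_per_image):
            # ISP augmentation changes appearance, not identity, so the label is reused.
            combined.append((augment(image), label))
    return combined
```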
In the embodiment of the application, the image recognition model can be a face recognition model, and the face recognition model obtained based on the training method can obtain high-precision face recognition results for images shot by different cameras, which has significant beneficial effects in some application scenarios, such as security and law enforcement fields.
Based on the above description, according to the method for training the image recognition model of the embodiment of the present application, performing ISP augmentation processing on the initial training image achieves the effect of simulating training images shot by various different types of cameras from training images shot by one or several types of cameras. The image recognition model trained on the augmented images therefore generalizes better across different modules, and thus has good performance when performing image recognition on images to be recognized shot by different imaging modules.
The above exemplarily illustrates a method for training an image recognition model according to an embodiment of the present application. An image recognition method according to an embodiment of the present application is described below with reference to fig. 7. Fig. 7 shows a schematic flow diagram of an image recognition method 700 according to an embodiment of the application. As shown in fig. 7, an image recognition method 700 according to an embodiment of the present application may include the following steps:
In step S710, an image to be recognized is acquired.
In step S720, image recognition is performed on the image to be recognized based on the trained image recognition model, wherein the training image adopted by the image recognition model during training is a training image obtained by performing augmentation processing of image signal processing on an initial training image.
In the embodiment of the present application, the image recognition model used by the image recognition method 700 to recognize the image to be recognized is trained on training images obtained by performing augmentation processing of image signal processing on the initial training image; that is, the model is trained according to the method for training an image recognition model according to the embodiment of the present application. As described above, that training method performs ISP augmentation processing on the initial training image, which achieves the effect of simulating training images shot by various different types of cameras from training images shot by one or several types of cameras, so that the image recognition model trained on the augmented images generalizes better across different modules. Therefore, by using such an image recognition model, the image recognition method 700 according to the embodiment of the present application can have good performance when performing image recognition on images to be recognized shot by different imaging modules. Those skilled in the art can understand the training method of the image recognition model adopted by the image recognition method 700 in combination with the foregoing description, and therefore, for brevity, the detailed description is omitted here.
An image recognition apparatus provided according to another aspect of the present application, which may be used to perform the method for training an image recognition model and/or the image recognition method according to the embodiments of the present application described above, is described below with reference to fig. 8 to 10. The structure and specific operations of the image recognition device according to the embodiments of the present application can be understood by those skilled in the art with reference to the foregoing descriptions, and for the sake of brevity, specific details are not repeated here, and only some main operations are described.
Fig. 8 shows a schematic block diagram of an image recognition device 800 according to an embodiment of the present application. As shown in fig. 8, the image recognition device 800 includes an augmentation module 810 and a training module 820. The augmentation module 810 is configured to obtain an initial training image, and perform augmentation processing of image signal processing on the initial training image to obtain a processed training image. The training module 820 is configured to train the image recognition model based on the processed training images output by the augmentation module 810.
In an embodiment of the present application, the initial training image acquired by the augmentation module 810 may be an image captured by one or several cameras (imaging devices, also referred to as modules). As described above, if only images captured by these few cameras are used as the training image set, the trained image recognition model can obtain good performance when performing image recognition on images to be recognized captured by those cameras. In practice, however, the source (module) of the images to be recognized is uncertain when the trained image recognition model is used, especially when the model is sold as an SDK, and thus the performance of the image recognition model cannot be guaranteed. Therefore, in the embodiment of the present application, the augmentation module 810 performs augmentation processing of image signal processing (ISP) on the initial training image. In the ISP augmentation processing, parameters in the ISP process that affect the performance of the image recognition model (such as parameters of a color transformation matrix, coefficients of gamma transformation, saturation parameters, contrast parameters, and the like, which are described in more detail in the following examples) are obtained by fine-tuning preset parameters according to random variables, and the initial training image is processed based on these parameters. The processing manner depends on the parameter itself: the parameters of the color transformation matrix are used for color information adjustment, the coefficients of gamma transformation for gamma transformation, the saturation parameters for saturation adjustment, the contrast parameters for contrast adjustment, and so on. When the processed training image is used by the training module 820 to train the image recognition model, it can be ensured that the image recognition model has good performance when image recognition is carried out on images to be recognized shot by different imaging modules. This is because the main difference between different modules lies in their ISP algorithms, the ISP being the process by which a module converts the raw image collected by the image sensor into the RGB image used in everyday applications. Therefore, in the embodiment of the present application, the ISP augmentation processing performed on the initial training image by the augmentation module 810 achieves the effect of simulating training images shot by various different types of cameras from training images shot by one or several types of cameras, so that the image recognition model trained by the training module 820 on the augmented images generalizes better across different modules and has good performance when performing image recognition on images to be recognized shot by different imaging modules.
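The idea of obtaining each ISP parameter by fine-tuning a preset value with a random variable, while staying within a reasonable range, can be sketched as follows. The `ParamSpec` structure, the parameter names, and every numeric preset, offset, and bound are hypothetical values chosen only for illustration; the application does not specify them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamSpec:
    preset: float     # preset parameter value
    max_delta: float  # largest random offset applied to the preset
    low: float        # lower bound of the assumed "reasonable range"
    high: float       # upper bound of the assumed "reasonable range"


# Hypothetical presets and ranges, for illustration only.
ISP_PARAMS = {
    "gamma": ParamSpec(1.0, 0.3, 0.6, 1.5),
    "saturation": ParamSpec(1.0, 0.2, 0.7, 1.3),
    "contrast": ParamSpec(1.0, 0.2, 0.7, 1.3),
}


def sample_isp_params(rng, specs=ISP_PARAMS):
    """Draw one set of ISP parameters: preset + random offset, clamped to the range."""
    params = {}
    for name, spec in specs.items():
        value = spec.preset + rng.uniform(-spec.max_delta, spec.max_delta)
        params[name] = min(max(value, spec.low), spec.high)
    return params
```

Each call with a different random state can be read as simulating one more camera module whose ISP tuning differs slightly from the preset.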
In an embodiment of the present application, the ISP augmentation processing performed by the augmentation module 810 on the initial training image may include at least one of: RGB domain augmentation processing, HSV domain augmentation processing, and YUV domain augmentation processing. RGB, HSV, and YUV are different color spaces. Accordingly, the RGB domain augmentation processing refers to processing in the RGB color space, the HSV domain augmentation processing refers to processing in the HSV color space, and the YUV domain augmentation processing refers to processing in the YUV color space. Different ISP processing stages are generally arranged in different color spaces, and ISP augmentation of the initial training image can be achieved by performing augmentation processing on at least one ISP processing stage. Naturally, performing augmentation processing on a plurality of ISP processing stages further strengthens the training capability of the training data, and accordingly enhances the generalization of the trained image recognition model to different modules.
In an embodiment of the present application, when the ISP augmentation processing performed by the augmentation module 810 includes at least two of RGB domain augmentation processing, HSV domain augmentation processing, and YUV domain augmentation processing, the RGB domain augmentation processing is performed before the HSV domain augmentation processing, and the HSV domain augmentation processing is performed before the YUV domain augmentation processing, which may simplify the processing procedure and improve efficiency. For example, the augmentation module 810 may perform RGB domain augmentation on the initial training image to obtain an RGB domain augmented image, and then perform HSV domain augmentation on the RGB domain augmented image to obtain an HSV domain augmented image, which is used as an image for training the image recognition model. For another example, the augmentation module 810 may perform RGB domain augmentation on the initial training image to obtain an RGB domain augmented image, and then perform YUV domain augmentation on the RGB domain augmented image to obtain a YUV domain augmented image, which is used as an image for training the image recognition model. For another example, the augmentation module 810 may perform HSV domain augmentation on the initial training image to obtain an HSV domain augmented image, and then perform YUV domain augmentation on the HSV domain augmented image to obtain a YUV domain augmented image, which is used as an image for training the image recognition model.
In another example, the augmentation module 810 may perform RGB domain augmentation on the initial training image to obtain an RGB domain augmented image, perform HSV domain augmentation on the RGB domain augmented image to obtain an HSV domain augmented image, and perform YUV domain augmentation on the HSV domain augmented image to obtain a YUV domain augmented image, which is used as an image for training the image recognition model. In this example, the ISP augmentation processing includes RGB domain augmentation processing, HSV domain augmentation processing, and YUV domain augmentation processing, which can improve the training capability of the augmented image to the maximum extent, and enhance the generalization of the trained image recognition model to different modules to the maximum extent.
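The fixed RGB, then HSV, then YUV ordering, with any subset of the three stages enabled, can be sketched as follows. The names `STAGE_ORDER` and `isp_augment`, and the representation of each stage as a callable, are assumptions for illustration.

```python
STAGE_ORDER = ("rgb", "hsv", "yuv")  # order required when two or more stages are used


def isp_augment(image, stages):
    """Apply the enabled stages in RGB -> HSV -> YUV order.

    stages: dict mapping a stage name in STAGE_ORDER to a callable image -> image.
    The insertion order of the dict is ignored; STAGE_ORDER decides the sequence.
    """
    unknown = set(stages) - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    if not stages:
        raise ValueError("at least one augmentation stage is required")
    for name in STAGE_ORDER:
        if name in stages:
            image = stages[name](image)
    return image
```

For instance, passing only `"rgb"` and `"yuv"` callables reproduces the second example above, in which the RGB domain augmented image is fed directly to the YUV domain augmentation.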
In an embodiment of the present application, the RGB domain augmentation processing performed by the augmentation module 810 may include at least one of color information adjustment, gamma transformation, and random histogram equalization. The parameters in the color correction matrix (CCM for short, generally a 3 × 3 matrix) and the offset matrix (generally a 3 × 1 matrix) used in the color information adjustment are obtained by fine-tuning preset parameters according to random variables; the gamma coefficient used in the gamma transformation is obtained by fine-tuning preset parameters according to random variables; and the random histogram equalization refers to determining whether to perform histogram equalization based on a random variable.
In this embodiment, the augmentation module 810 may perform color information adjustment (generally, fine adjustment) on the initial training image by using a color correction matrix and an offset matrix whose parameters are obtained by fine-tuning preset parameters according to random variables, so as to obtain training images with different color information, thereby implementing augmentation of image color information. In addition, the augmentation module 810 may perform a global gamma transformation on the image using a gamma coefficient obtained (within a reasonable range) by fine-tuning the preset parameter according to a random variable, which randomly corrects different degrees of overexposure or underexposure, thereby achieving augmentation of the image exposure correction. In addition, some modules perform histogram equalization and others do not, so the augmentation module 810 may implement augmentation of whether the image has a histogram equalization effect by performing random histogram equalization, that is, by using a random variable (such as a value of 1 or 0) to control whether histogram equalization is performed.
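A minimal sketch of the color information adjustment and the gamma transformation is given below, operating on RGB pixels normalized to [0, 1]. Several points are assumptions not stated in the application: the identity matrix and zero vector as presets, the jitter magnitudes, the power-law form `out = in ** gamma`, and all function names.

```python
import random

IDENTITY_CCM = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # assumed preset
ZERO_OFFSET = [0.0, 0.0, 0.0]                                        # assumed preset


def clamp01(value):
    return min(max(value, 0.0), 1.0)


def jitter_matrix(preset, max_delta, rng):
    """Fine-tune every entry of a 3x3 preset matrix with its own random offset."""
    return [[v + rng.uniform(-max_delta, max_delta) for v in row] for row in preset]


def jitter_vector(preset, max_delta, rng):
    return [v + rng.uniform(-max_delta, max_delta) for v in preset]


def adjust_color(pixel, ccm, offset):
    """out = CCM (3x3) * pixel (3x1) + offset (3x1), clamped to [0, 1]."""
    return tuple(
        clamp01(sum(ccm[r][c] * pixel[c] for c in range(3)) + offset[r])
        for r in range(3)
    )


def gamma_transform(pixel, gamma):
    """Global power-law mapping applied identically to every channel."""
    return tuple(clamp01(v) ** gamma for v in pixel)


def rgb_color_and_gamma(pixels, rng, ccm_delta=0.05, offset_delta=0.02, gamma_delta=0.2):
    """Color information adjustment first, then gamma transformation."""
    ccm = jitter_matrix(IDENTITY_CCM, ccm_delta, rng)
    offset = jitter_vector(ZERO_OFFSET, offset_delta, rng)
    gamma = 1.0 + rng.uniform(-gamma_delta, gamma_delta)
    return [gamma_transform(adjust_color(p, ccm, offset), gamma) for p in pixels]
```

One set of parameters is drawn per image rather than per pixel, so the whole image is transformed consistently, as a single camera module would transform it.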
In the embodiment of the present application, when the RGB domain augmentation processing performed by the augmentation module 810 includes at least two of color information adjustment, gamma transformation, and random histogram equalization, the color information adjustment is performed before the gamma transformation, and the gamma transformation is performed before the random histogram equalization, so that a better augmentation effect can be obtained. For example, the augmentation module 810 may perform color information adjustment on the initial training image to obtain a color-adjusted image, and then perform gamma transformation on the color-adjusted image to obtain a gamma-transformed image, which is used as the image subjected to RGB domain augmentation processing. For another example, the augmentation module 810 may first perform color information adjustment on the initial training image to obtain a color-adjusted image, and then perform random histogram equalization on the color-adjusted image to obtain a randomly histogram-equalized image, which is used as the image subjected to RGB domain augmentation processing. For another example, the augmentation module 810 may perform gamma transformation on the initial training image to obtain a gamma-transformed image, and then perform random histogram equalization on the gamma-transformed image to obtain a randomly histogram-equalized image, which is used as the image subjected to RGB domain augmentation processing.
In another example, the augmentation module 810 may perform color information adjustment on the initial training image to obtain a color-adjusted image, perform gamma transformation on the color-adjusted image to obtain a gamma-transformed image, and perform random histogram equalization on the gamma-transformed image to obtain a randomly histogram-equalized image, which is used as the image subjected to RGB domain augmentation processing. In this example, the RGB domain augmentation processing performed by the augmentation module 810 includes color information adjustment, gamma transformation, and random histogram equalization, which can improve the training capability of the image after RGB domain augmentation processing to the maximum extent and enhance the generalization of the trained image recognition model to different modules.
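The last step of the RGB domain chain, random histogram equalization, can be sketched as follows: a random variable taking the value 1 or 0 decides whether equalization is applied. As a simplifying assumption, the sketch equalizes a single 8-bit channel given as a flat list; how the application applies equalization to a three-channel image is not specified, and the function names are hypothetical.

```python
import random


def equalize(values):
    """Histogram-equalize a flat list of 8-bit intensities (0..255)."""
    if not values:
        return []
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    n = len(values)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to spread out
        return list(values)
    span = n - cdf_min
    # Integer arithmetic with round-half-up keeps the mapping exact.
    return [((cdf[v] - cdf_min) * 255 + span // 2) // span for v in values]


def random_equalize(values, rng):
    """Apply histogram equalization only when the random variable equals 1."""
    switch = rng.randint(0, 1)
    return equalize(values) if switch == 1 else list(values)
```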
In an embodiment of the present application, the HSV domain augmentation processing performed by the augmentation module 810 may include saturation adjustment and/or contrast adjustment; the saturation parameter used in the saturation adjustment is obtained by fine-tuning a preset parameter according to a random variable, and the contrast parameter used in the contrast adjustment is obtained by fine-tuning a preset parameter according to a random variable. In the ISP processing of different modules, contrast and saturation are often tuned according to the aesthetic preferences of the person tuning the ISP, so the saturation and contrast styles of different modules differ. Therefore, in this embodiment, the augmentation module 810 may implement augmentation of the image saturation information by randomly generating a saturation parameter (within a reasonable range), obtained by fine-tuning the preset parameter according to a random variable in the HSV domain, and using it to perform saturation adjustment on the initial training image or the training image subjected to RGB domain augmentation processing. In addition, the augmentation module 810 may implement augmentation of the image contrast information by randomly generating a contrast parameter (within a reasonable range) in the same way and using it to perform contrast adjustment on the image.
In an embodiment of the present application, when the HSV domain augmentation process by the augmentation module 810 includes both saturation adjustment and contrast adjustment, the saturation adjustment is performed prior to the contrast adjustment. For example, the augmentation module 810 may first perform saturation adjustment on the initial training image or the training image subjected to RGB domain augmentation to obtain a saturation-adjusted image, and then perform contrast adjustment on the saturation-adjusted image to obtain a training image subjected to HSV domain augmentation. In this example, the HSV domain augmentation processing performed by the augmentation module 810 includes both saturation adjustment and contrast adjustment, which can improve the training capability of images after HSV domain augmentation processing to the maximum extent and enhance the generalization of the trained image recognition model to different modules.
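The saturation-then-contrast order can be sketched with the standard-library `colorsys` module. The specific formulas are assumptions rather than the application's definitions: saturation is scaled multiplicatively on the S channel, contrast is a linear stretch of the V channel around 0.5, and the presets of 1.0 with a jitter of up to 0.2 are hypothetical.

```python
import colorsys
import random


def clamp01(value):
    return min(max(value, 0.0), 1.0)


def hsv_adjust(pixel, saturation, contrast):
    """Saturation adjustment first, then contrast adjustment, on one RGB pixel in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    s = clamp01(s * saturation)              # step 1: scale the S channel
    v = clamp01((v - 0.5) * contrast + 0.5)  # step 2: stretch V around mid-grey
    return colorsys.hsv_to_rgb(h, s, v)


def hsv_augment(pixels, rng, max_delta=0.2):
    """Draw one saturation and one contrast parameter per image, then apply them."""
    saturation = 1.0 + rng.uniform(-max_delta, max_delta)
    contrast = 1.0 + rng.uniform(-max_delta, max_delta)
    return [hsv_adjust(p, saturation, contrast) for p in pixels]
```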
In an embodiment of the present application, the YUV domain augmentation processing performed by the augmentation module 810 may include noise reduction and/or edge enhancement; the parameters of the low-pass filter adopted in the YUV domain noise reduction are obtained by finely adjusting preset parameters according to random variables, and the parameters adopted in the edge enhancement are obtained by finely adjusting the preset parameters according to the random variables. In this embodiment, the augmentation module 810 may implement augmentation of the image noise reduction effect by performing fine tuning on the preset parameters according to the random variable in the YUV domain to randomly generate (within a reasonable range) parameters of the low-pass filter to reduce noise of the initial training image, the RGB-domain augmented training image, or the HSV-domain augmented training image. In addition, the augmentation module 810 may implement the augmentation of the image sharpening degree by performing fine tuning on the preset parameters according to the random variable in the YUV domain and randomly generating parameters of a sharpening algorithm (within a reasonable range) to perform edge enhancement on the image.
In an embodiment of the present application, when the YUV domain augmentation processing by the augmentation module 810 includes both noise reduction and edge enhancement, the YUV domain noise reduction is performed before the edge enhancement. For example, the augmentation module 810 may perform noise reduction on the initial training image or the training image subjected to RGB domain augmentation processing or the training image subjected to HSV domain augmentation processing to obtain a noise-reduced image, and then perform edge enhancement on the noise-reduced image to obtain a YUV domain augmented training image. In this example, the YUV domain augmentation processing performed by the augmentation module 810 includes both noise reduction and edge enhancement, which can improve the training capability of the image after YUV domain augmentation processing to the maximum extent, and enhance the generalization of the trained image recognition model to different modules.
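Because the noise reduction and edge enhancement operate in the YUV domain, an RGB training image has to be converted before and after the step. The sketch below assumes the analog BT.601 YUV definition and assumes that only the Y (luma) values are processed while U and V are left untouched; neither choice is stated in the application, and `apply_on_luma` is a hypothetical helper name.

```python
def rgb_to_yuv(r, g, b):
    """BT.601 luma and scaled color differences (assumed color-space definition)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)


def yuv_to_rgb(y, u, v):
    """Exact algebraic inverse of rgb_to_yuv."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


def apply_on_luma(pixels, luma_fn):
    """Run luma_fn on the Y values only and leave U and V (chroma) untouched.

    pixels: list of (r, g, b) tuples; luma_fn: callable list[float] -> list[float]
    returning one processed Y value per input value.
    """
    yuv = [rgb_to_yuv(*p) for p in pixels]
    new_y = luma_fn([y for y, _, _ in yuv])
    return [yuv_to_rgb(y, u, v) for y, (_, u, v) in zip(new_y, yuv)]
```

A denoising or sharpening routine that works on luma values can then be passed as `luma_fn`, with the converted result serving as the YUV domain augmented training image.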
The above exemplarily shows the ISP augmentation process of the initial training image by the augmentation module 810 in the image recognition apparatus according to the embodiment of the present application. The training module 820 trains the image recognition model based on the training image after the ISP augmentation processing, and the obtained trained image recognition model has better performance when performing image recognition on images to be recognized shot by different cameras.
In the embodiment of the present application, the training module 820 may further train the image recognition model by combining the initial training image and the training image subjected to the ISP augmentation processing, which may further improve the training capability of the training image set and yield an image recognition model with higher robustness for images captured by different modules.
Based on the above description, according to the image recognition device of the embodiment of the present application, the augmentation module performs ISP augmentation processing on the initial training image, which achieves the effect of simulating training images shot by various different types of cameras from training images shot by one or several types of cameras. The image recognition model trained by the training module on the augmented images therefore generalizes better across different modules, and thus has good performance when performing image recognition on images to be recognized shot by different imaging modules.
Fig. 9 shows a schematic block diagram of an image recognition apparatus 900 according to another embodiment of the present application. As shown in fig. 9, the image recognition apparatus 900 may include an acquisition module 910 and a recognition module 920. The acquisition module 910 is configured to acquire an image to be recognized. The recognition module 920 is configured to perform image recognition on the image to be recognized based on the trained image recognition model, where the image recognition model is trained based on the method for training an image recognition model according to the embodiment of the present application. The image recognition apparatus 900 according to the embodiment of the present application may perform the image recognition method 700 according to the embodiment of the present application described previously. The detailed operation of the image recognition apparatus 900 according to the embodiment of the present application can be understood by those skilled in the art in combination with the foregoing description, and for brevity, the detailed description is omitted here.
Fig. 10 shows a schematic block diagram of an image recognition apparatus 1000 according to yet another embodiment of the present application. As shown in fig. 10, the image recognition apparatus 1000 according to the embodiment of the present application may include a memory 1010 and a processor 1020, where the memory 1010 stores a computer program executed by the processor 1020, and the computer program, when executed by the processor 1020, causes the processor 1020 to execute the method for training the image recognition model or the image recognition method according to the embodiment of the present application. The detailed operation of the image recognition apparatus 1000 according to the embodiment of the present application can be understood by those skilled in the art with reference to the foregoing description, and for the sake of brevity, specific details are not repeated herein, and only some main operations of the processor 1020 are described.
In one embodiment of the application, the computer program, when executed by the processor 1020, causes the processor 1020 to perform the steps of: acquiring an initial training image, and performing augmentation processing of image signal processing on the initial training image to obtain a processed training image; and training an image recognition model based on the processed training images.
In one embodiment of the present application, the augmentation processing of image signal processing that the computer program, when executed by the processor 1020, causes the processor 1020 to perform on the initial training image includes performing at least one of the following on the initial training image: RGB domain augmentation processing, HSV domain augmentation processing, and YUV domain augmentation processing.
In one embodiment of the application, the computer program, when executed by the processor 1020, causes the processor 1020 to perform RGB domain augmentation processing including at least one of color information adjustment, gamma transformation, and random histogram equalization; the parameters in the color correction matrix and the offset matrix adopted in the color information adjustment are obtained by finely adjusting preset parameters according to random variables; the gamma coefficient adopted in the gamma transformation is obtained by finely adjusting preset parameters according to random variables; the random histogram equalization refers to determining whether to perform histogram equalization based on a random variable.
In one embodiment of the application, the computer program, when executed by the processor 1020, causes the processor 1020 to perform HSV domain augmentation processing including saturation adjustment and/or contrast adjustment; the saturation parameter used in the saturation adjustment is obtained by finely adjusting a preset parameter according to a random variable, and the contrast parameter used in the contrast adjustment is obtained by finely adjusting the preset parameter according to the random variable.
In an embodiment of the application, the computer program, when executed by the processor 1020, causes the processor 1020 to perform YUV domain augmentation processing including noise reduction and/or edge enhancement; the parameters of the low-pass filter adopted in the YUV domain noise reduction are obtained by finely adjusting preset parameters according to random variables, and the parameters adopted in the edge enhancement are obtained by finely adjusting the preset parameters according to the random variables.
In one embodiment of the application, the image recognition model is a face recognition model.
In one embodiment of the application, the computer program, when executed by the processor 1020, further causes the processor 1020 to perform the steps of: acquiring an image to be recognized; and performing image recognition on the image to be recognized based on a trained image recognition model, wherein the image recognition model is obtained by training based on the method for training the image recognition model.
Furthermore, according to an embodiment of the present application, there is also provided a storage medium on which program instructions are stored, which when executed by a computer or a processor are used to execute the method for training an image recognition model or the corresponding steps of the image recognition method of the embodiment of the present application. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
Based on the above description, according to the method for training the image recognition model, the image recognition method, and the image recognition device of the embodiment of the application, performing ISP augmentation processing on the initial training image achieves the effect of simulating training images shot by various different types of cameras from training images shot by one or several types of cameras. The image recognition model trained on the augmented images therefore generalizes better across different modules, and thus has good performance when performing image recognition on images to be recognized shot by different imaging modules.
Although the example embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the above-described example embodiments are merely illustrative and are not intended to limit the scope of the present application thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present application. All such changes and modifications are intended to be included within the scope of the present application as claimed in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative. For example, the division of the units is only one logical functional division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another device, or some features may be omitted or not executed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the description of exemplary embodiments of the present application, various features of the present application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, the method of the present application should not be construed as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features, but not others, that are included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some of the modules according to embodiments of the present application. The present application may also be embodied as apparatus programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.
The above is merely a description of specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.