Image classification method and device, computer readable storage medium and equipment

Document No. 9318, published 2021-09-17

1. An image classification method, comprising:

acquiring an image to be processed;

scaling the image to be processed to obtain input images of different sizes;

inputting the input image into a pre-trained image classification model based on a convolutional neural network to classify the input image and obtain its classification probability, so as to obtain the image class to which the image belongs, wherein the image classification model is composed of a plurality of classifiers of different scales, the classifier of each scale corresponds to an input image of one size, and different classifiers correspond to input images of different sizes; and wherein, after the image to be processed is scaled, the method further comprises the following steps:

screening the images to be processed according to image sharpness, and taking the images whose sharpness exceeds a sharpness threshold as the images to be classified, wherein acquiring the image sharpness of an image to be processed comprises:

acquiring a grayscale image of the image to be processed;

dividing the grayscale image of the image to be processed into a plurality of image blocks;

calculating the sharpness of each image block; and

calculating the average of the sharpness values of the image blocks, and taking the average as the sharpness of the image to be processed.

2. The image classification method according to claim 1, wherein, when the classification probability of the image to be classified is calculated, a corresponding weight is set for the classification result of the classifier of each scale, and the input image is classified according to the corresponding weight and the classification probability of the input image for the classifier of each scale.

3. The image classification method according to claim 1, wherein the image to be processed is scaled using a bilinear interpolation algorithm or a bicubic interpolation algorithm.

4. An image classification apparatus, comprising:

the image acquisition module is used for acquiring an image to be processed;

the image preprocessing module is used for preprocessing the image to be processed to obtain input images of different sizes;

the image classification module is used for inputting the input image into a pre-trained image classification model based on a convolutional neural network to classify the input image and obtain its classification probability, so as to obtain the image class to which the image belongs, wherein the image classification model is composed of a plurality of classifiers of different scales, the classifier of each scale corresponds to an input image of one size, and different classifiers correspond to input images of different sizes;

wherein, after the image to be processed is scaled, the following steps are further performed:

screening the images to be processed according to image sharpness, and taking the images whose sharpness exceeds a sharpness threshold as the images to be classified, wherein acquiring the image sharpness of an image to be processed comprises:

acquiring a grayscale image of the image to be processed;

dividing the grayscale image of the image to be processed into a plurality of image blocks;

calculating the sharpness of each image block; and

calculating the average of the sharpness values of the image blocks, and taking the average as the sharpness of the image to be processed.

5. The image classification apparatus according to claim 4, wherein, when the classification probability of the image to be classified is calculated, a corresponding weight is set for the classification result of the classifier of each scale, and the input image is classified according to the corresponding weight and the classification probability of the input image for the classifier of each scale.

6. The image classification apparatus according to claim 4, wherein the image to be processed is scaled using a bilinear interpolation algorithm or a bicubic interpolation algorithm.

7. An image classification device, comprising a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, implement the method of any one of claims 1 to 3.

8. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 3.

Background

Medical images are an important basis for recording a patient's condition, and acquiring high-quality medical images is of great significance for medical diagnosis. In recent years, with the rapid development of deep learning, convolutional neural networks have become a research hotspot and have been successfully applied to fields such as image classification, object detection, and pedestrian detection. However, existing work uses a single-scale classifier for image classification, so the classification of different types of samples carries uncertainty, and classification accuracy needs further improvement.

Disclosure of Invention

In view of the above shortcomings of the prior art, it is an object of the present invention to provide an image classification method, apparatus, computer-readable storage medium and device, which are used to solve the above problems of the prior art.

To achieve the above and other related objects, the present invention provides an image classification method, including:

acquiring an image to be processed;

scaling the image to be processed to obtain input images of different sizes;

inputting the input image into a pre-trained image classification model based on a convolutional neural network to classify the input image and obtain its classification probability, so as to obtain the image class to which the image belongs, wherein the image classification model is composed of a plurality of classifiers of different scales, the classifier of each scale corresponds to an input image of one size, and different classifiers correspond to input images of different sizes; after the image to be processed is scaled, the method further comprises the following steps:

screening the images to be processed according to image sharpness, and taking the images whose sharpness exceeds a sharpness threshold as the images to be classified, wherein acquiring the image sharpness of an image to be processed comprises:

acquiring a grayscale image of the image to be processed;

dividing the grayscale image of the image to be processed into a plurality of image blocks;

calculating the sharpness of each image block; and

calculating the average of the sharpness values of the image blocks, and taking the average as the sharpness of the image to be processed.

Optionally, when the classification probability of the image to be classified is calculated, a corresponding weight is set for the classification result of the classifier of each scale, and the input image is classified according to the corresponding weight and the classification probability of the input image corresponding to the classifier of each scale.

Optionally, a bilinear interpolation algorithm or a bicubic interpolation algorithm is used to perform scaling processing on the image to be processed.

To achieve the above and other related objects, the present invention provides an image classification apparatus, comprising:

the image acquisition module is used for acquiring an image to be processed;

the image preprocessing module is used for preprocessing the image to be processed to obtain input images of different sizes;

the image classification module is used for inputting the input image into a pre-trained image classification model based on a convolutional neural network to classify the input image and obtain its classification probability, so as to obtain the image class to which the image belongs, wherein the image classification model is composed of a plurality of classifiers of different scales, the classifier of each scale corresponds to an input image of one size, and different classifiers correspond to input images of different sizes;

After the image to be processed is scaled, the following steps are further performed:

screening the images to be processed according to image sharpness, and taking the images whose sharpness exceeds a sharpness threshold as the images to be classified, wherein acquiring the image sharpness of an image to be processed comprises:

acquiring a grayscale image of the image to be processed;

dividing the grayscale image of the image to be processed into a plurality of image blocks;

calculating the sharpness of each image block; and

calculating the average of the sharpness values of the image blocks, and taking the average as the sharpness of the image to be processed.

Optionally, when the classification probability of the image to be classified is calculated, a corresponding weight is set for the classification result of the classifier of each scale, and the input image is classified according to the corresponding weight and the classification probability of the input image corresponding to the classifier of each scale.

Optionally, a bilinear interpolation algorithm or a bicubic interpolation algorithm is used to perform scaling processing on the image to be processed.

To achieve the above and other related objects, the present invention provides an image classification device comprising a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, implement the method described above.

To achieve the above and other related objects, the present invention provides a computer-readable storage medium including a program which, when run on a computer, causes the computer to execute the method described above.

As described above, the image classification method, apparatus, computer-readable storage medium and device provided by the present invention have the following advantages:

The invention discloses an image classification method comprising: acquiring an image to be processed; scaling the image to be processed to obtain input images of different sizes; and inputting the input images into a pre-trained image classification model to classify them and obtain their classification probabilities, so as to obtain the image class to which the image belongs, wherein the image classification model is composed of a plurality of classifiers of different scales, the classifier of each scale corresponds to an input image of one size, and different classifiers correspond to input images of different sizes. The invention classifies images through a multi-scale neural network, ensures classification robustness for multi-scale input images, and has significant advantages in image classification.

Drawings

FIG. 1 is a flowchart illustrating an image classification method according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating bilinear interpolation according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of bicubic interpolation according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the BiCubic function according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating an image sharpness obtaining method according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a first CNN according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a second CNN according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a third CNN according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an image classification apparatus according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below by way of specific examples, and those skilled in the art will readily understand other advantages and effects of the present invention from the disclosure of this specification. The invention may also be practiced or applied through other, different embodiments, and the details of this specification may be modified or changed in various respects without departing from the spirit and scope of the present invention. It should be noted that the features of the following embodiments and examples may be combined with one another in the absence of conflict.

It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present invention in a schematic way; the drawings show only the components related to the present invention rather than the actual number, shape and size of the components in implementation, where the type, quantity and proportion of the components may be changed freely and the component layout may be more complicated.

As shown in fig. 1, an embodiment of the present application provides an image classification method, including:

S11, acquiring an image to be processed;

S12, scaling the image to be processed to obtain input images of different sizes;

S13, inputting the input image into a pre-trained image classification model to classify the input image and obtain its classification probability, so as to obtain the image class to which the image belongs, wherein the image classification model is composed of a plurality of classifiers of different scales, the classifier of each scale corresponds to an input image of one size, and different classifiers correspond to input images of different sizes.

The invention classifies images through a multi-scale neural network, ensures classification robustness for multi-scale input images, and has significant advantages in image classification. The image may be a medical image, such as a CT (Computed Tomography), MRI (Magnetic Resonance Imaging), PET (Positron Emission Tomography), SPECT (Single-Photon Emission Computed Tomography), or ultrasound image.

In an embodiment, a nearest neighbor interpolation algorithm, a bilinear interpolation algorithm, or a bicubic interpolation algorithm is adopted to perform scaling processing on the image to be processed.
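By way of illustration only (OpenCV usage, not taken from the disclosure; the function name and pyramid sizes are assumptions), the scaling step might be sketched as follows, where the three interpolation flags correspond to the three algorithms named above:

```python
import cv2

def build_input_pyramid(image, sizes=((64, 64), (128, 128), (256, 256)),
                        interpolation=cv2.INTER_LINEAR):
    """Scale one image to several sizes; the sizes here are illustrative."""
    return [cv2.resize(image, size, interpolation=interpolation) for size in sizes]

# cv2.INTER_NEAREST, cv2.INTER_LINEAR and cv2.INTER_CUBIC implement the
# nearest-neighbor, bilinear and bicubic algorithms mentioned in the text.
```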

Mathematically, bilinear interpolation is the extension of linear interpolation to an interpolation function of two variables; the core idea is to perform linear interpolation in each of the two directions.

As shown in fig. 2, the pixel values of the four points Q11(x1, y1), Q12(x1, y2), Q21(x2, y1), and Q22(x2, y2) are known, and the value at the point P(x, y) is obtained by bilinear interpolation.

Assume P = (0.6, 0.6). The four nearest pixel points are then Q11 = (0,1), Q12 = (0,0), Q21 = (1,1) and Q22 = (1,0), and the pixel value at P = (0.6, 0.6) is determined by its distance to these points, the pixel value of a closer point having greater influence. The calculation proceeds as follows.

First, the pixel value at R1 = (x, y1) is calculated from the pixel values of the two points Q11 = (0,1) and Q21 = (1,1), and the pixel value at R2 = (x, y2) is calculated from the pixel values of the two points Q12 = (0,0) and Q22 = (1,0):

$$f(R_1) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{11}) + \frac{x - x_1}{x_2 - x_1} f(Q_{21}), \qquad f(R_2) \approx \frac{x_2 - x}{x_2 - x_1} f(Q_{12}) + \frac{x - x_1}{x_2 - x_1} f(Q_{22})$$

As this example shows, the closer the point P = (x, y) is to x1 on the x axis, the larger the weights of the pixel values at Q11 = (0,1) and Q12 = (0,0); otherwise, the pixel values at Q21 = (1,1) and Q22 = (1,0) carry the larger weights.

Then, the pixel value at P = (x, y) is calculated from the pixel values at R1 = (x, y1) and R2 = (x, y2) obtained in the previous step:

$$f(P) \approx \frac{y_2 - y}{y_2 - y_1} f(R_1) + \frac{y - y_1}{y_2 - y_1} f(R_2)$$

As this equation shows, the closer P = (x, y) is to y1 on the y axis, the larger the proportion contributed by the value at R1 = (x, y1) obtained from Q11 = (0,1) and Q21 = (1,1); otherwise, the value at R2 = (x, y2) contributes the larger proportion. This yields the pixel value at P = (x, y).
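The two interpolation steps above can be written as a small function; a minimal sketch follows, in which the function name is illustrative and the corner values used in the call are made-up numbers:

```python
def bilinear(x, y, x1, y1, x2, y2, q11, q12, q21, q22):
    """Bilinear interpolation at (x, y) inside the rectangle spanned by
    (x1, y1)-(x2, y2); qAB is the known pixel value at the point QAB."""
    # Step 1: interpolate along x at the two y levels, giving R1 and R2.
    r1 = (x2 - x) / (x2 - x1) * q11 + (x - x1) / (x2 - x1) * q21  # at (x, y1)
    r2 = (x2 - x) / (x2 - x1) * q12 + (x - x1) / (x2 - x1) * q22  # at (x, y2)
    # Step 2: interpolate along y between R1 and R2.
    return (y2 - y) / (y2 - y1) * r1 + (y - y1) / (y2 - y1) * r2

# The example from the text: P = (0.6, 0.6) on the unit square; the corner
# pixel values 10, 20, 30, 40 are illustrative only.
value = bilinear(0.6, 0.6, x1=0, y1=1, x2=1, y2=0,
                 q11=10, q12=20, q21=30, q22=40)
```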

Bicubic interpolation is the most commonly used interpolation method in two-dimensional space. In this method, the value of the function f at the point (x, y) is obtained as a weighted average of the nearest sixteen sample points in a rectangular grid, which requires two cubic interpolation polynomials, one in each direction.

As shown in fig. 3, assume the source image A has size m × n and the target image B, scaled by a factor of K, has size M × N, i.e., K = M/m. Every pixel of A is known and B is unknown; to compute the value of each pixel (X, Y) in the target image B, the corresponding position (x, y) of that pixel in the source image A must first be found. The 16 pixels of A nearest to (x, y) are then used as the parameters for computing the pixel value B(X, Y): the weights of these 16 pixels are computed with the BiCubic basis function, and the value of the pixel (X, Y) of image B equals the weighted superposition of the 16 pixels.

According to the proportional relation x/X = m/M = 1/K, the coordinate of B(X, Y) on A is A(x, y) = A(X × (m/M), Y × (n/N)) = A(X/K, Y/K). As shown in fig. 3, the point P is the position in the source image A corresponding to (X, Y) of the target image B. The coordinates of P generally contain a fractional part, so write P = (x + u, y + v), where x and y are the integer parts and u and v are the fractional parts (the distance from the blue point to the red point within the a11 cell). The 16 nearest pixels, denoted a(i, j) (i, j = 0, 1, 2, 3) as shown in fig. 3, are then obtained.

The BiCubic basis function is constructed as

$$W(x) = \begin{cases} (a+2)|x|^3 - (a+3)|x|^2 + 1, & |x| \le 1 \\ a|x|^3 - 5a|x|^2 + 8a|x| - 4a, & 1 < |x| < 2 \\ 0, & \text{otherwise} \end{cases}$$

where a = -0.5. Fig. 4 shows the shape of the BiCubic function. In the present application, the argument x of the BiCubic function represents the distance from a pixel point to the point P, and evaluating W at these distances yields the weights of the 16 pixels. The BiCubic basis function is one-dimensional while pixels are two-dimensional, so the rows and columns of pixels are computed separately. For example, the distance from a00 to P(x + u, y + v) is (1 + u, 1 + v), so the abscissa weight of a00 is i_0 = W(1 + u), its ordinate weight is j_0 = W(1 + v), and the contribution of a00 to B(X, Y) is (pixel value of a00) × i_0 × j_0. Accordingly, the abscissa weights of the four pixel columns are W(1 + u), W(u), W(1 - u) and W(2 - u), and the ordinate weights of the four pixel rows are W(1 + v), W(v), W(1 - v) and W(2 - v). The pixel value B(X, Y) is then

$$B(X, Y) = \sum_{i=0}^{3} \sum_{j=0}^{3} a(i, j) \, W(d^{\mathrm{row}}_i) \, W(d^{\mathrm{col}}_j)$$

where $d^{\mathrm{row}}_i \in \{1+v,\, v,\, 1-v,\, 2-v\}$ and $d^{\mathrm{col}}_j \in \{1+u,\, u,\, 1-u,\, 2-u\}$ are the vertical and horizontal distances from a(i, j) to P.
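A self-contained sketch of the BiCubic kernel and the 16-point weighted superposition follows; the function names are illustrative and the border clamping is an implementation assumption not specified in the text:

```python
import numpy as np

def bicubic_kernel(x, a=-0.5):
    """BiCubic basis function W(x) with a = -0.5, as constructed above."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def bicubic_sample(src, px, py):
    """Value at fractional position (px, py) of a 2-D grayscale array."""
    x, y = int(np.floor(px)), int(np.floor(py))
    u, v = px - x, py - y
    # Horizontal / vertical distances from the 4x4 neighborhood to P.
    wx = [bicubic_kernel(1 + u), bicubic_kernel(u),
          bicubic_kernel(1 - u), bicubic_kernel(2 - u)]
    wy = [bicubic_kernel(1 + v), bicubic_kernel(v),
          bicubic_kernel(1 - v), bicubic_kernel(2 - v)]
    value = 0.0
    for i in range(4):        # rows y - 1 .. y + 2
        for j in range(4):    # columns x - 1 .. x + 2
            yy = min(max(y - 1 + i, 0), src.shape[0] - 1)  # clamp at borders
            xx = min(max(x - 1 + j, 0), src.shape[1] - 1)
            value += src[yy, xx] * wy[i] * wx[j]
    return value
```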

in an embodiment, after performing scaling processing on the image to be processed, the method further includes:

screening the images to be processed according to image sharpness, and taking the images whose sharpness exceeds the sharpness threshold as the images to be classified; the screened images are then classified by the image classification model. The sharpness threshold can be set empirically as the standard for judging whether the sharpness of an image to be processed meets the requirement: if the sharpness of the image to be processed is greater than or equal to the threshold, the sharpness meets the required standard; if it is smaller than the threshold, it does not.

In an embodiment, as shown in fig. 5, acquiring the image sharpness of the image to be processed includes:

S51, acquiring a grayscale image of the image to be processed;

If the image to be processed is itself a grayscale image, it is used directly as the grayscale image; if not, the gray value of each pixel in the image to be processed can be calculated to generate the corresponding grayscale image.

S52, dividing the grayscale image of the image to be processed into a plurality of image blocks;

After the grayscale image corresponding to the image to be processed is resized to a preset size, it may be divided into a plurality of sub image blocks. For example, it may be divided into 5 × 5 sub image blocks, each of size 100 × 100.

S53, calculating the sharpness of each image block;

S54, calculating the average of the sharpness values of the image blocks, and taking the average as the sharpness of the image to be processed.

After the sharpness of the image to be processed is calculated, the images can be screened against the sharpness threshold, and the images to be processed whose sharpness exceeds the threshold are passed to the subsequent steps as images to be classified.
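The disclosure does not fix the per-block sharpness measure; the sketch below assumes the common variance-of-Laplacian measure, and the helper names, the 500 × 500 preset size implied by the 5 × 5 grid of 100 × 100 blocks, and the screening helper are illustrative:

```python
import cv2
import numpy as np

def image_sharpness(image, grid=(5, 5), preset_size=(500, 500)):
    """Block-averaged sharpness (S51-S54). The Laplacian-variance measure
    and the preset size are assumptions for illustration."""
    if image.ndim == 3:                       # S51: obtain the grayscale image
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = cv2.resize(image, preset_size)
    rows, cols = grid
    h, w = image.shape[0] // rows, image.shape[1] // cols
    scores = []
    for r in range(rows):                     # S52: split into image blocks
        for c in range(cols):
            block = image[r * h:(r + 1) * h, c * w:(c + 1) * w]
            # S53: per-block sharpness (variance of the Laplacian response)
            scores.append(cv2.Laplacian(block, cv2.CV_64F).var())
    return float(np.mean(scores))             # S54: average over all blocks

def screen_images(images, threshold):
    """Keep only images whose sharpness exceeds the threshold."""
    return [img for img in images if image_sharpness(img) > threshold]
```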

In an embodiment, the classifier is a convolutional neural network, and the image classification model of the present application includes a plurality of convolutional neural networks of different scales.

A Convolutional Neural Network (CNN) is a class of feedforward neural networks that contain convolution computations and have a deep structure, and is one of the representative algorithms of deep learning. Convolutional neural networks have a representation learning capability and can perform shift-invariant classification of input information according to their hierarchical structure.

Convolutional neural networks are constructed by imitating the biological mechanism of visual perception and support both supervised and unsupervised learning. Because the convolution kernel parameters in the hidden layers are shared and the connections between layers are sparse, a convolutional neural network can learn grid-like topological features such as pixels and audio with a small amount of computation, works stably, and imposes no additional feature engineering requirements on the data.

A Convolutional Neural Network (CNN) is a multilayer perceptron whose number of trainable parameters is reduced by local connections and weight sharing. A single-scale CNN requires a fixed-size input image for an image classification task, whereas image sizes in real environments vary widely. During convolution, the network receptive field covers image blocks of different sizes: the feature information obtained by convolving large image blocks is richer and better reflects the local characteristics of the image, while the features obtained by convolving small image blocks better reflect the global characteristics of the image. The invention therefore uses CNNs of different scales to extract features from input images of different sizes, obtaining more image features; the input images are classified through a Softmax function, which outputs the classification probability values.

In a specific embodiment, the image classification model includes 3 CNNs of different scales, namely a first CNN, a second CNN, and a third CNN; of course, CNNs of more scales may be provided according to actual needs and performance requirements, and the specific number of scales is not limited here.

In the present application, CNNs of three scales are provided, covering small-size, medium-size, and large-size images.

First CNN: as shown in fig. 6, the first CNN has 4 convolutional layers, 2 pooling layers, 2 fully-connected layers, and 1 Softmax layer, with a convolutional layer stride of 1 and a pooling layer stride of 2. The activation function used by the convolutional layers and the fully-connected layers is the Rectified Linear Unit (ReLU) function.

Second CNN: as shown in fig. 7, the second CNN has 4 convolutional layers, 4 pooling layers, 1 fully-connected layer, and 1 Softmax layer, with a convolutional layer stride of 1 and a pooling layer stride of 2. The activation function used by the fully-connected layer is the Sigmoid function.

Third CNN: as shown in fig. 8, the third CNN has 5 convolutional layers, 3 pooling layers, 2 fully-connected layers, and 1 Softmax layer. The activation function used by the convolutional layers and the fully-connected layers is the ReLU function.
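For concreteness, a minimal PyTorch sketch of the first CNN is given below. The disclosure fixes only the layer counts, strides and activations; the class name, channel counts, kernel sizes, 64 × 64 input resolution and class count are illustrative assumptions:

```python
import torch.nn as nn

class FirstCNN(nn.Module):
    """Sketch of the first CNN: 4 conv layers (stride 1), 2 pooling layers
    (stride 2), 2 fully-connected layers and a Softmax output, with ReLU
    activations, matching the layer counts stated in the text."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
            nn.Conv2d(32, 64, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),  # 64x64 input -> 16x16 maps
            nn.Linear(256, num_classes),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```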

In the invention, the pre-trained image classification model is composed of a plurality of classifiers of different scales. Therefore, when the classification probability of the image to be classified is calculated, a corresponding weight is set for the classification result of the classifier of each scale, and the image to be classified is classified according to the corresponding weight and the classification probability of the image to be classified for the classifier of each scale.

In this application, the 3 Softmax functions of the image classification model are output in parallel, and the probability output matrix of each input image is obtained as

$$P(x) = \begin{pmatrix} p_{11}(x) & p_{12}(x) & p_{13}(x) \\ p_{21}(x) & p_{22}(x) & p_{23}(x) \\ \vdots & \vdots & \vdots \\ p_{n1}(x) & p_{n2}(x) & p_{n3}(x) \end{pmatrix}$$

where n denotes the number of image classes, $p_{n1}(x)$ denotes the classification probability that image x belongs to class n after classification by the first CNN, $p_{n2}(x)$ the probability that x belongs to class n after classification by the second CNN, and $p_{n3}(x)$ the probability that x belongs to class n after classification by the third CNN. Within each column, the row index with the highest classification probability is the class predicted for the sample x by the Softmax function of the corresponding classifier.

In one embodiment, the information entropy

$$H_i(x) = -\sum_{j=1}^{n} p_{ij}(x) \log p_{ij}(x)$$

is introduced to characterize the uncertainty of the Softmax classification of the i-th classifier for the input sample x, where $p_{ij}(x)$ denotes the probability with which the Softmax function of the i-th sub-network judges the input sample x to belong to class j (the (j, i) entry of P(x)). The larger the information entropy of a classifier's Softmax function, the higher the uncertainty of its classification, the weaker its ability to classify the input sample x, and the smaller the weight its Softmax output should receive, and vice versa. Accordingly, the weight of the Softmax function of each sub-network of the multi-scale CNN is taken inversely proportional to its information entropy and normalized, for example:

$$w_i(x) = \frac{1 / H_i(x)}{\sum_{k=1}^{3} 1 / H_k(x)}$$

After the weights of the multi-scale CNN are obtained, each column of the probability output matrix P(x) is multiplied by the weight of the corresponding classifier, giving a new probability output matrix $\tilde{P}(x)$ whose entry in row j and column i is $w_i(x)\,p_{ij}(x)$.

The rows of $\tilde{P}(x)$ are then summed across the columns (i.e., across the three classifiers), and the class with the maximum weighted sum is taken as the final classification result:

$$\hat{y}(x) = \arg\max_{j \in \{1, \dots, n\}} \sum_{i=1}^{3} w_i(x) \, p_{ij}(x)$$
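A minimal sketch of this entropy-weighted fusion, assuming the inverse-entropy weight formula given above; the function name and the example probabilities are illustrative:

```python
import numpy as np

def fuse_predictions(prob_matrix, eps=1e-12):
    """Entropy-weighted fusion of the Softmax outputs of 3 classifiers.

    prob_matrix: array of shape (n, 3); column i holds the Softmax
    probabilities of the i-th CNN over the n classes."""
    P = np.asarray(prob_matrix, dtype=float)
    # H_i(x) = -sum_j p_ij log p_ij, computed per classifier (per column).
    H = -(P * np.log(P + eps)).sum(axis=0)
    w = 1.0 / (H + eps)
    w /= w.sum()                       # normalized weights w_i(x)
    fused = P @ w                      # per-class weighted sum over classifiers
    return int(np.argmax(fused)), w   # final class index and the weights used

# Example: 4 classes, 3 classifiers (each column sums to 1). The third,
# uniform column has maximum entropy and thus receives the smallest weight.
probs = np.array([[0.70, 0.40, 0.25],
                  [0.10, 0.30, 0.25],
                  [0.10, 0.20, 0.25],
                  [0.10, 0.10, 0.25]])
label, weights = fuse_predictions(probs)
```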

the invention considers the characteristics that the classification performances of the Softmax functions of different classifiers are different for the same input image and the classification performances of the Softmax functions of the same classifier are different for different input images, adaptively endows more reasonable weights for different input images, and solves the problem of misclassification caused by too close probability values output by the Softmax functions of a single-scale CNN.

As shown in fig. 9, an embodiment of the present application provides an image classification apparatus, including:

the image acquisition module is used for acquiring an image to be processed;

the image preprocessing module is used for preprocessing the image to be processed to obtain input images of different sizes;

the image classification module is used for inputting the input image into a pre-trained image classification model to classify the input image and obtain its classification probability, so as to obtain the image class to which the image belongs, wherein the image classification model is composed of a plurality of classifiers of different scales, the classifier of each scale corresponds to an input image of one size, and different classifiers correspond to input images of different sizes.

The apparatus provided in the above embodiment can execute the method provided in any embodiment of the present invention and has the corresponding functional modules and beneficial effects. For technical details not elaborated in the above embodiment, reference may be made to the method provided in any embodiment of the present invention.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

It should be noted that, as is clear from the above description of the embodiments, part or all of the present application can be implemented by software in combination with a necessary general-purpose hardware platform. The functions, if implemented in the form of software functional units and sold or used as a separate product, may also be stored in a computer-readable storage medium. With this understanding, embodiments of the present invention provide a computer-readable storage medium including a program which, when run on a computer, causes the computer to perform the method shown in fig. 1.

An embodiment of the present invention provides an image classification device, which includes a processor coupled to a memory, where the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the method shown in fig. 1 is implemented.

With this understanding in mind, the technical solutions of the present application, or the portions thereof that contribute to the prior art, may be embodied in the form of a software product that may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, a network of computers, or other electronic devices, cause the one or more machines to perform operations in accordance with embodiments of the present application. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions. The storage medium may be located on a local server or a third-party server, such as a third-party cloud service platform; the specific cloud service platform, such as Alibaba Cloud or Tencent Cloud, is not limited here. The application is operational with numerous general-purpose or special-purpose computing system environments or configurations, for example a personal computer, a dedicated server computer, a mainframe computer, or a computer configured as a node in a distributed system.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The foregoing embodiments merely illustrate the principles and utilities of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.
