Image data augmentation method based on discriminant area positioning
1. An image data augmentation method based on discriminant area positioning is characterized by comprising the following steps:
calculating a class activation map of an original image by a CAM method, and positioning a discriminant region by upsampling the class activation map and taking a threshold value;
randomly sampling a square area in the discriminant area, and cutting and scaling the original image based on the square area;
generating a corresponding mixed mask based on the discriminant region masks of two randomly acquired pictures of different classes, and mixing the discriminant regions and non-discriminant regions of the two pictures;
selecting a plurality of images from the training set as original images when each iteration starts, and generating a plurality of augmentation samples from the original images when each iteration ends; the training sample of each iteration is composed of an augmentation sample generated by the previous iteration and the original image of the current iteration; and repeating the iteration until the training end condition is met.
2. The method of claim 1, wherein computing a class activation map of an original image using a CAM method and locating a discriminant region by upsampling and thresholding the class activation map comprises:
calculating a class activation map of the real class of the original image by a CAM method;
upsampling the class activation map to the same size as the original image, and normalizing all pixel values in the class activation map to [0, 1];
and based on a set threshold, taking the region of the normalized class activation map whose pixel values exceed the threshold as the discriminant region.
3. The method according to claim 2, wherein before the computing the class activation map of the true class of the original image by the CAM method, the method further comprises:
and performing global average pooling on the feature map of the original image to obtain a dimension-reduced vector.
4. The method according to claim 1, wherein randomly sampling a square region in the discriminant region, and cropping and scaling the original image based on the square region comprises:
determining the range of the center point of the square by setting a pixel threshold value in the discriminant area;
determining the side length range of the square based on the position and size of the bounding rectangle of the discriminant region and the position of the square's center point;
and cropping and scaling the current image by random sampling, based on the square's center and side length ranges, to obtain a corresponding augmented sample.
5. The method of claim 1, wherein generating a corresponding hybrid mask based on discriminant region masks of two randomly acquired classes of pictures and mixing discriminant and non-discriminant regions of the two classes of pictures comprises:
randomly acquiring two pictures of different categories, calculating corresponding discriminant areas, and calculating corresponding two discriminant area masks based on the discriminant areas;
and taking the union of the two discriminant region masks, and mixing the discriminant regions and non-discriminant regions of the two pictures of different classes based on the generated mixed mask.
Background
Data augmentation is a commonly used regularization method in image classification. Among image data augmentation methods, region cropping and region mixing are two common operations, but traditional methods apply these operations to the original data randomly, with a preset probability: they consider neither the distribution of the data nor the model's preference for it, make insufficient use of the useful information in the data, and therefore yield only a limited improvement in model performance. Specifically, traditional region cropping randomly crops a partial region from the original image, but the cropped region may not contain enough effective information; in that case the new sample is effectively a noise sample and contributes little to model performance. Traditional region mixing is represented by CutMix, which takes two complementary regions from two pictures and mixes them, the category label of the mixed picture being obtained by mixing the labels of the two originals. CutMix does not consider whether the regions taken from the two original pictures contain sufficient discriminative information, so the generated samples may likewise become noise samples.
Disclosure of Invention
The invention aims to provide an image data augmentation method based on discriminant region localization, which solves the problem that generated samples become noise samples because they do not contain enough discriminant information.
In order to achieve the above object, the present invention provides an image data augmentation method based on discriminant region localization, comprising the steps of:
calculating a class activation map of an original image by a CAM method, and positioning a discriminant region by upsampling the class activation map and taking a threshold value;
randomly sampling a square area in the discriminant area, and cutting and scaling the original image based on the square area;
generating a corresponding mixed mask based on discriminant region masks of the randomly acquired two types of pictures, and mixing discriminant regions and non-discriminant regions of the two types of pictures;
selecting a plurality of images from the training set as original images when each iteration starts, and generating a plurality of augmentation samples from the original images when each iteration ends; the training sample of each iteration is composed of an augmentation sample generated by the previous iteration and the original image of the current iteration; and repeating the iteration until the training end condition is met.
Calculating the class activation map of the original image by the CAM method, and locating the discriminant region by upsampling the class activation map and thresholding it, comprises the following steps:
calculating a class activation map of the real class of the original image by a CAM method;
upsampling the class activation map to the same size as the original image, and normalizing all pixel values in the class activation map to [0, 1];
and based on a set threshold, taking the region of the normalized class activation map whose pixel values exceed the threshold as the discriminant region.
Before the CAM method is adopted to calculate the class activation map of the real category of the original image, the method further comprises the following steps:
and performing global average pooling on the feature map of the original image to obtain a dimension-reduced vector.
Randomly sampling a square area in the discriminant area, and clipping and scaling the original image based on the square area, including:
determining the range of the center point of the square by setting a pixel threshold value in the discriminant area;
determining the side length range of the square based on the position and size of the bounding rectangle of the discriminant region and the position of the square's center point;
and cropping and scaling the current image by random sampling, based on the square's center and side length ranges, to obtain a corresponding augmented sample.
The method for generating a corresponding mixed mask based on discriminant region masks of two randomly acquired types of pictures and mixing discriminant regions and non-discriminant regions of the two types of pictures comprises the following steps:
randomly acquiring two pictures of different categories, calculating corresponding discriminant areas, and calculating corresponding two discriminant area masks based on the discriminant areas;
and taking the union of the two discriminant region masks, and mixing the discriminant regions and non-discriminant regions of the two pictures of different classes based on the generated mixed mask.
The invention relates to an image data augmentation method based on discriminant region localization, which adopts the CAM (Class Activation Mapping) method to calculate a class activation map of an original image and locates a discriminant region by upsampling the class activation map and thresholding it. The data augmentation comprises two operations, region cropping and region mixing: region cropping randomly samples a square region within the discriminant region and crops the original image based on that square region; region mixing randomly takes two pictures of different classes, calculates the discriminant region mask of each, and mixes the discriminant and non-discriminant regions of the two pictures according to the masks. In each iteration, augmented samples are generated from the original images and, together with the next iteration's original images, form the training samples of the next iteration, until training ends. The invention solves the problem that samples generated by traditional methods become noise samples because they do not contain enough discriminant information.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of a method for augmenting image data based on discriminant area positioning according to the present invention.
Fig. 2 is a schematic diagram of the discriminant region locating method provided by the present invention.
Fig. 3 is a schematic diagram of a region clipping method provided by the present invention.
FIG. 4 is a schematic diagram of a zone mixing method provided by the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In the description of the present invention, it is to be understood that the terms "length", "width", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships illustrated in the drawings, and are used merely for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention. Further, in the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Referring to fig. 1, the present invention provides an image data augmentation method based on discriminant area positioning, including the following steps:
s101, calculating a class activation map of an original image by a CAM method, and positioning a discriminant area by upsampling the class activation map and taking a threshold value.
Specifically, for the picture samples in the current training batch, the CAM method is first used to output the class activation map of each sample's true class; the class activation map is upsampled to the size of the original image, and all of its pixel values are then normalized to [0, 1]. A threshold θ_1 (a hyperparameter) is set, and the region of the class activation map whose pixel values exceed θ_1 is taken as the discriminant region S. As shown in Fig. 2, the detailed flow is as follows:
in fig. 2, GAP represents the global average pooling and assumes that the current sample belongs to the jth class. Suppose that the feature map output by the last convolutional layer of the CNN network is F ═ { F ═ F1,F2,...,FCIn which Fi∈RH×W,i∈[1,C]C, H and W represent the number of channels in the feature map, and the height and width of the feature map, respectively, the definition of the global average pooling is as follows:
s.t.k∈{1,2,...,C}
After global average pooling, the feature map is reduced to a vector f ∈ R^{C×1×1}, and the class activation map of the current image's true class can then be obtained by the CAM method. The class activation map is upsampled to the size of the original image and all pixel values are normalized to [0, 1]; finally, a threshold θ_1 ∈ [0, 1] is set, and the region of the class activation map whose pixel values exceed θ_1 is taken as the discriminant region, denoted S.
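The localization step above can be sketched in NumPy. This is a minimal illustration, not the patent's implementation: the function and parameter names (`locate_discriminant_region`, `fc_weights`) are assumptions, and nearest-neighbour upsampling via `np.kron` stands in for whatever interpolation a real pipeline would use.

```python
import numpy as np

def locate_discriminant_region(feature_maps, fc_weights, class_idx,
                               image_size, theta1=0.5):
    """Locate the discriminant region S of one image.

    feature_maps: (C, H, W) output of the last convolutional layer.
    fc_weights:   (num_classes, C) classifier weights.
    Returns a boolean mask of shape image_size (pixels > theta1).
    """
    C, H, W = feature_maps.shape
    # GAP vector f (feeds the classifier during training; only the
    # weights are needed to build the class activation map itself).
    f = feature_maps.mean(axis=(1, 2))
    # CAM of the true class: weighted sum of the channel feature maps.
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    # Nearest-neighbour upsampling to the original image size
    # (assumes image_size is an integer multiple of (H, W)).
    cam = np.kron(cam, np.ones((image_size[0] // H, image_size[1] // W)))
    # Normalize all pixel values to [0, 1].
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    # Pixels above theta1 form the discriminant region S.
    return cam > theta1
```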
S102, randomly sampling a square area in the discriminant area, and clipping and scaling the original image based on the square area.
Specifically, after the discriminant region S of a picture is obtained, in order to increase sample diversity, region cropping does not crop the entire region S directly but instead randomly crops partial regions from it. In the present invention, each cropped partial region is a square, denoted B. To ensure that the cropped square B contains enough discriminant information, a pixel threshold θ_2 is set within the discriminant region S; the region of S whose pixel values exceed θ_2 is denoted S′, and the center of square B may only fall within S′. Meanwhile, the side length of B is constrained by the axis-aligned bounding rectangle of the discriminant region S: the maximum side length ensures that B does not exceed this bounding rectangle, and a hyperparameter φ ∈ [0, 1], multiplied by the length of the rectangle's shorter side, gives the minimum side length of B. Once the center-point range and side-length range of B are determined, B is obtained by random sampling each time; the corresponding region is cropped from the original image and scaled to the original image size as a new sample, which has the same category label as the original sample.
As shown in Fig. 3, after the discriminant region S is obtained, each region-cropping operation cuts a square region from S as a new sample. The square is denoted B, its center point (b_x, b_y), and its side length a, where (b_x, b_y) and a are uniformly sampled random numbers. To ensure that the cropped region contains enough discriminant information, (b_x, b_y) and a require suitable sampling ranges. To this end, a threshold θ_2 is set:
θ_2 = θ_1 + λ(max(S) − θ_1)
where λ ∈ [0, 1] and max(S) denotes the maximum pixel value in the discriminant region. The region of S whose pixel values exceed θ_2 is denoted S′, and (b_x, b_y) is restricted to S′. Next, the range of the side length a of square B is determined, the size of the cropping region B being governed by the size of the discriminant region S. First, the axis-aligned bounding rectangle of S is found; its top-left corner is denoted (s_x, s_y), and its width and height are denoted s_w and s_h, respectively. Let the side length range of B be [a_min, a_max], where:
a_max = 2 × min(b_x − s_x, b_y − s_y, s_x + s_w − b_x, s_y + s_h − b_y)
a_min = min(a_max, φ × min(s_w, s_h))
where φ ∈ [0, 1] is a hyperparameter. The formula for a_max ensures that B falls within the bounding rectangle of the discriminant region S, and the formula for a_min ensures both that a_min ≤ a_max and that a_min scales with the size of the discriminant region S. Once the ranges of (b_x, b_y) and a are determined, a square region B is obtained by random sampling each time, the corresponding region is cropped from the original image and scaled to the original image size, and the result is taken as an augmented sample; an augmented sample obtained by region cropping has the same category label as the original sample.
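The sampling of (b_x, b_y) and a described above can be sketched as follows; this is a hedged NumPy illustration in which the function name `sample_crop_square` and the default hyperparameter values are assumptions, not values fixed by the invention.

```python
import numpy as np

def sample_crop_square(S, theta1, lam=0.5, phi=0.5, rng=None):
    """Sample the crop square B for region cropping.

    S is a 2-D array of normalized class-activation values; pixels
    greater than theta1 form the discriminant region. Returns the
    square's center (b_x, b_y) and side length a.
    """
    rng = np.random.default_rng() if rng is None else rng
    # theta_2 = theta_1 + lambda * (max(S) - theta_1); the center of B
    # may only fall in S' = {pixels of S greater than theta_2}.
    theta2 = theta1 + lam * (S.max() - theta1)
    ys, xs = np.nonzero(S > theta2)
    idx = rng.integers(len(xs))
    bx, by = int(xs[idx]), int(ys[idx])
    # Axis-aligned bounding rectangle of the discriminant region S.
    rys, rxs = np.nonzero(S > theta1)
    sx, sy = rxs.min(), rys.min()
    sw, sh = rxs.max() - sx + 1, rys.max() - sy + 1
    # a_max keeps B inside the bounding rectangle; a_min = min(a_max,
    # phi * shorter rectangle side) ties the minimum to the size of S.
    a_max = 2 * min(bx - sx, by - sy, sx + sw - bx, sy + sh - by)
    a_min = min(a_max, phi * min(sw, sh))
    a = rng.uniform(a_min, a_max) if a_max > a_min else float(a_max)
    return bx, by, a
```

The caller would then crop the a×a square centered at (b_x, b_y) from the original image and rescale it to the original image size.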
S103, generating a corresponding mixed mask based on the discriminant region masks of the two randomly acquired types of pictures, and mixing the discriminant regions and the non-discriminant regions of the two types of pictures.
Specifically, as shown in Fig. 4, two pictures of different categories, denoted x_1 and x_2, are randomly taken from the current training batch; their discriminant regions S_1 and S_2 are calculated, and the corresponding discriminant region masks, denoted M_1 and M_2, are derived from them. Taking the union of the two masks yields the mixed mask M of the discriminant regions of the two pictures:
M(i, j) = max(M_1(i, j), M_2(i, j))
where (i, j) is the position index of the picture. The augmented samples are then generated according to the following formulas:
x_1′ = M ⊙ x_1 + (1 − M) ⊙ x_2
x_2′ = M ⊙ x_2 + (1 − M) ⊙ x_1
where ⊙ denotes element-wise multiplication, x_1′ is the augmented sample of x_1 and carries the same label as x_1, and x_2′ is the augmented sample of x_2 and carries the same label as x_2.
As can be seen from Fig. 4, the region mixing method of the present invention generates two augmented samples at a time, each containing the discriminant region of only one original sample; the label of each augmented sample is the category label of the original sample to which its discriminant region belongs, which avoids category confusion after region mixing.
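A minimal NumPy sketch of the mixing step, assuming boolean masks and channel-last images (the function name `mix_regions` is illustrative):

```python
import numpy as np

def mix_regions(x1, x2, M1, M2):
    """Mix two pictures via the union M of their discriminant-region
    masks: each augmented sample takes one picture inside M and the
    other picture outside it, so it carries one discriminant region."""
    # Union mask M, broadcast across the channel dimension.
    M = (M1 | M2)[..., None].astype(x1.dtype)
    x1_aug = M * x1 + (1 - M) * x2   # discriminant region of x1, background of x2
    x2_aug = M * x2 + (1 - M) * x1   # discriminant region of x2, background of x1
    return x1_aug, x2_aug
```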
S104, selecting a plurality of images from the training set as original images when each iteration starts, and generating a plurality of augmentation samples from the original images when each iteration ends; the training sample of each iteration is composed of an augmentation sample generated by the previous iteration and the original image of the current iteration; and repeating the iteration until the training end condition is met.
Specifically, during convolutional neural network training, after each iteration ends the above three steps are performed to generate a batch of augmented samples, which together with the original samples serve as the training samples of the next iteration. Note that the augmented samples of each iteration are generated only from original samples, never from the augmented samples of the previous iteration; here the training set consists of all acquired images.
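The batching schedule above can be sketched as a small generator, with `augment` standing in for the three augmentation steps (the names are illustrative assumptions):

```python
def training_batches(original_batches, augment):
    """Yield the training batch of each iteration: this iteration's
    originals plus augments generated from the PREVIOUS iteration's
    originals (augments are never generated from augments)."""
    prev_augmented = []
    for originals in original_batches:
        yield originals + prev_augmented
        prev_augmented = [augment(x) for x in originals]
```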
Advantageous effects
The invention mainly solves the problem of traditional image data augmentation methods that generated samples become noise samples because they do not contain enough discriminant information, and has the following beneficial effects:
(1) By cropping within the discriminant region, interference from regions irrelevant to classification is removed, so the model focuses more on learning the features of the discriminant region, improving its classification performance.
(2) By mixing the discriminant region of one picture with the non-discriminant region of another picture of a different class, each generated new sample is guaranteed to contain discriminant information of only one class, preventing it from becoming a noise sample; at the same time, the background variation of each class is enriched, improving the model's feature extraction in complex and changing scenes.
The invention relates to an image data augmentation method based on discriminant region localization, which adopts the CAM (Class Activation Mapping) method to calculate a class activation map of an original image and locates a discriminant region by upsampling the class activation map and thresholding it. The data augmentation comprises two operations, region cropping and region mixing: region cropping randomly samples a square region within the discriminant region and crops the original image based on the sampled square region; region mixing randomly takes two pictures of different classes, calculates the discriminant region mask of each, and mixes the discriminant and non-discriminant regions of the two pictures according to the masks. The augmented samples obtained from the original images, together with the next batch of original images, serve as the training samples of the next iteration; in the first iteration, augmented samples are generated directly from the acquired original images, and when the iteration satisfying the training-end condition is reached, the augmented samples of that iteration are output and training ends.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.