Crop image segmentation and extraction algorithm based on MASK RCNN
1. A crop image segmentation and extraction algorithm based on MASK RCNN, applied to image data sets of various crops, wherein images are preprocessed and then input into the network, and the crop images are segmented and extracted with high precision, high efficiency and no damage, so that crop growers can follow the growth dynamics of their crops in real time and manage them better, the algorithm comprising the following steps:
(1) preprocessing: dividing the crop image data set into a training set and a test set at a ratio of 8:2, and labeling the training set with Labelme;
(2) input: the crop image data set;
(3) initialization: the maximum number of iterations Max_step;
(4) extracting crop image features with a residual network and a feature pyramid network augmented with path aggregation and feature enhancement to generate feature maps;
(5) passing the feature maps through a region proposal network with optimized target anchor-box ratios to extract candidate target regions, and performing binary foreground/background classification with a softmax classifier;
(6) filtering the candidate target regions with non-maximum suppression, discarding targets that do not meet the threshold;
(7) passing the feature maps generated in step (4) and the candidate regions retained in step (6) to ROIAlign;
(8) computing pixel values at floating-point coordinates by bilinear interpolation, preserving the spatial information of the feature maps;
(9) resizing the feature maps so that they are all 256 × 256;
(10) passing the result of step (9) through the newly added micro fully connected network to generate the mask loss;
(11) passing the result of step (9) through a fully connected network to generate the box and cls losses;
(12) predicting the target edges of the result of step (9) with a sobel operator to generate the edge loss;
(13) summing the results of steps (10)-(12) to form the loss function, and outputting Precision, Recall, Average Precision, Mean Average Precision, and the F1 score;
(14) judging whether the maximum number of iterations Max_step has been reached; if not, returning to step (4); if so, outputting the crop segmentation images and the Precision-Recall curves.
In recent years, with the vigorous promotion of smart agriculture, traditional agricultural cultivation has gradually been abandoned and agriculture is developing towards intelligence and automation. In the early 1970s, image processing technology began to be applied in agriculture, mainly to crop disease identification, phenotype detection and quality grading, greatly improving cultivation efficiency and crop yield. Image segmentation decomposes an image into several regions according to differences in grey value and extracts the region of interest to the researcher. It is a key link in image processing and strongly influences the result of image analysis. Through image segmentation, crop information can be acquired efficiently and non-destructively, helping growers follow the growth dynamics of their crops in real time and manage them better.
The traditional Mask RCNN network is built with Keras on a TensorFlow back end, and the limitations of that framework prevent the network from performing at its best. The present network is built on the PyTorch framework; the gain is not only the general advantages of PyTorch over Keras with a TensorFlow back end, but also the improved performance of the Mask RCNN network itself under the new framework. GPU memory is used more efficiently, and computation speed and accuracy are noticeably improved. The new framework is also easy to debug, highly modular, convenient for building models, and flexible in moving data and parameters between the CPU and the GPU.
To determine the category and location of objects of interest in an image at the instance level, the most popular target detection algorithms are R-CNN, Fast R-CNN and Mask RCNN. However, these frameworks require a large amount of training data and do not achieve end-to-end detection. Their localization ability is limited, and during feature extraction the semantic information of shallow and deep layers becomes confused as the number of convolution layers increases, so the problem that features are difficult to cover and extract comprehensively urgently needs to be solved.
The invention provides a crop image segmentation and extraction algorithm based on Mask RCNN that optimizes the residual network (ResNet) backbone by adding path aggregation and feature enhancement to the network. Different weights are introduced into the loss function for targets of different proportions, the target edges are predicted with a sobel operator in the mask branch output by the ROI, edge loss is added to the loss function, and the classification performance and segmentation precision of the algorithm are tested on the Fruits 360 data set after labeling with Labelme. Comparison experiments with the FCN, U-net and Mask RCNN image extraction algorithms show that the average precision, mean average precision and F1 score of the proposed crop image extraction algorithm are superior to those of the comparison algorithms, and that the network has excellent performance with better accuracy, robustness and generalization.
The aim of the invention is to add path aggregation and feature enhancement to the network design, add a micro fully connected layer to the mask branch output by the ROI, and add edge loss to the loss function by fusing a sobel operator, thereby providing a crop image segmentation and extraction algorithm based on Mask RCNN.
Unlike the traditional Mask RCNN image segmentation algorithm, the feature pyramid network here adds path aggregation and feature enhancement. The feature pyramid network is connected top-down through lateral connections, adding high-level semantics to the extracted features, which helps classification; low-level features, however, are more useful for localizing pixels, and although the high-level features of the feature pyramid also carry low-level characteristics, the information path is too long for them to fuse well. The invention further fuses features with a bottom-up idea, shortening the path by which high-level features obtain low-level information, and constructs a clear lateral connection path from the low levels to the high levels to shorten the information path and let bottom-level information flow faster. A bottom-up enhancement path is created, and the improved feature pyramid network then uses the precise localization signals preserved in the bottom-up low-level features to improve the feature pyramid structure.
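A minimal sketch of this bottom-up path aggregation idea, assuming the feature pyramid network already yields maps P2-P5 with 256 channels each; module and tensor names are illustrative and not the patent's code:

```python
import torch.nn as nn
import torch.nn.functional as F

class BottomUpPathAggregation(nn.Module):
    """PANet-style bottom-up augmentation applied on top of FPN outputs P2..P5."""
    def __init__(self, channels=256, num_levels=4):
        super().__init__()
        # one stride-2 conv per transition to the next (coarser) level, plus a smoothing conv
        self.down_convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=2, padding=1) for _ in range(num_levels - 1)])
        self.smooth_convs = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=1, padding=1) for _ in range(num_levels - 1)])

    def forward(self, fpn_feats):      # fpn_feats = [P2, P3, P4, P5], finest level first
        outs = [fpn_feats[0]]          # N2 = P2
        for i in range(len(fpn_feats) - 1):
            # downsample the previous aggregated level and fuse it with the next FPN level
            down = self.down_convs[i](outs[-1])
            fused = F.relu(self.smooth_convs[i](down + fpn_feats[i + 1]))
            outs.append(fused)
        return outs                    # [N2, N3, N4, N5]
```

The design choice shown here is the one described above: the bottom-up path gives high-level features a short route to the precise localization signal in the low-level features.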
The invention optimizes the target anchor-box ratios in the region proposal network. The region proposal network is equivalent to a class-agnostic target detector based on a sliding window: on top of the convolutional network, a sliding window scans the feature map to generate anchors, and 9 target boxes of different areas are used at each sliding-window position to estimate the size and position of the original target. The proposal stage generates a large number of anchors of different sizes and aspect ratios that overlap so as to cover as much of the image as possible; the size of the proposals and the required overlap (IoU) with the target region directly affect the classification result. In order to suit more types of crop regions, the anchor scales are adjusted from the original {32 × 32, 64 × 64, 256 × 256} with aspect ratios {1:2, 1:1, 2:1} to {16 × 16, 32 × 32, 64 × 64, 256 × 256} with aspect ratios {1:2, 1:1, 3:1}. After this adjustment, the detection and segmentation of small targets are improved and the detection rate of the target boxes is increased.
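As an illustration, the adjusted scales and aspect ratios could be expressed with torchvision's AnchorGenerator; this is a sketch under the assumption that each of the four scales is assigned to one pyramid level, not the patent's actual code:

```python
from torchvision.models.detection.rpn import AnchorGenerator

# Adjusted configuration described above: scales {16, 32, 64, 256} (one per assumed
# pyramid level) and aspect ratios {1:2, 1:1, 3:1} expressed as (0.5, 1.0, 3.0).
sizes = ((16,), (32,), (64,), (256,))
aspect_ratios = ((0.5, 1.0, 3.0),) * len(sizes)
anchor_generator = AnchorGenerator(sizes=sizes, aspect_ratios=aspect_ratios)
```

Such an anchor generator can be passed to a detection model as its rpn_anchor_generator, so the proposal stage covers the smaller crop regions mentioned above.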
A micro fully connected layer is added to the mask branch output by the ROI: an additional branch leads from conv3 to a fully connected layer, passing through two 3 × 3 convolution layers conv4_fc and conv5_fc, where the number of channels of conv5_fc is halved to reduce computation and the mask size is set to 28 × 28. The fully connected layer produces a 784 × 1 × 1 vector, which is reshaped to the same spatial size as the mask predicted by the FCN and finally added to that prediction to obtain the final output.
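A rough PyTorch sketch of such a fused mask head, assuming the FCN branch is the usual four 3 × 3 convolutions plus a deconvolution producing a 28 × 28 mask, that conv3 denotes the third of those convolutions, and that the channel counts are 256 with conv5_fc halved to 128; layer names beyond those in the description are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class FusedMaskHead(nn.Module):
    """Mask branch with an extra micro fully connected path fused into the FCN output."""
    def __init__(self, in_channels=256, mask_size=28):
        super().__init__()
        # standard FCN branch: conv1..conv4 + deconv + 1x1 predictor (class-agnostic for brevity)
        self.conv1 = nn.Conv2d(in_channels, 256, 3, padding=1)
        self.conv2 = nn.Conv2d(256, 256, 3, padding=1)
        self.conv3 = nn.Conv2d(256, 256, 3, padding=1)
        self.conv4 = nn.Conv2d(256, 256, 3, padding=1)
        self.deconv = nn.ConvTranspose2d(256, 256, 2, stride=2)
        self.mask_pred = nn.Conv2d(256, 1, 1)
        # micro FC branch from conv3: conv4_fc, conv5_fc (channels halved), then FC -> 784
        self.conv4_fc = nn.Conv2d(256, 256, 3, padding=1)
        self.conv5_fc = nn.Conv2d(256, 128, 3, padding=1)
        self.fc = nn.Linear(128 * 14 * 14, mask_size * mask_size)
        self.mask_size = mask_size

    def forward(self, x):                       # x: ROIAlign features, e.g. (N, 256, 14, 14)
        y = F.relu(self.conv1(x))
        y = F.relu(self.conv2(y))
        y3 = F.relu(self.conv3(y))
        y = F.relu(self.conv4(y3))
        fcn_mask = self.mask_pred(F.relu(self.deconv(y)))   # (N, 1, 28, 28)
        z = F.relu(self.conv4_fc(y3))
        z = F.relu(self.conv5_fc(z))
        # 784-dimensional vector reshaped to 28 x 28 and fused with the FCN prediction
        fc_mask = self.fc(z.flatten(1)).view(-1, 1, self.mask_size, self.mask_size)
        return fcn_mask + fc_mask
```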
The method uses a sobel operator to predict target edges and adds an edge loss to the loss function. First, the annotated image is converted into a binary segmentation image of the crop, i.e. the target mask; then the prediction mask output by the mask branch and the target mask are each convolved with the sobel operator, whose standard horizontal and vertical 3 × 3 kernels are Gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]] and Gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]].
The sobel kernels describe the horizontal and vertical gradients respectively. The edge strengths along the x and y axes can be used during gradient descent in training: using the direction of the edges helps minimize the total loss, and this extra information speeds up training and saves a large amount of training time.
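A minimal sketch of such an edge term, under the assumption that the edge loss is an L1 difference between the Sobel responses of the predicted and target masks (the exact form and weighting are those of the patent's formula; L1 is an assumption here):

```python
import torch
import torch.nn.functional as F

# Standard Sobel kernels Gx, Gy, shaped (out_channels=2, in_channels=1, 3, 3).
SOBEL = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]],
                      [[[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]]]])

def edge_loss(pred_mask, target_mask):
    """pred_mask, target_mask: (N, 1, H, W) tensors with values in [0, 1]."""
    kernels = SOBEL.to(pred_mask.device, pred_mask.dtype)
    pred_edges = F.conv2d(pred_mask, kernels, padding=1)      # (N, 2, H, W): Gx and Gy responses
    target_edges = F.conv2d(target_mask, kernels, padding=1)
    return F.l1_loss(pred_edges, target_edges)                # assumed L1 penalty on the edge maps
```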
The edge loss added by the invention is combined with the other loss terms as follows:
the invention aims to solve the following model and realize accurate and efficient identification and segmentation of crop images:
min L = min(L_cls + L_box + L_mask + L_edge)
where L_cls represents the classification error; L_box represents the detection error; L_mask represents the segmentation error; and L_edge represents the edge loss error added by the sobel operator.
For evaluation, Recall, Precision, Average Precision (AP), Mean Average Precision (mAP) and F1 are used as the indexes for crop image extraction, with Precision = TP/(TP + FP), Recall = TP/(TP + FN) and F1 = 2 × Precision × Recall/(Precision + Recall). For each category in the data set a curve can be drawn from Precision and Recall, and the Average Precision (AP) is the area enclosed by this curve and the coordinate axes; mAP is the average of the AP values over all categories.
where TP is the number of positive samples predicted correctly; FP is the number of negative samples predicted as positive; FN is the number of positive samples predicted as negative; and N is the number of categories in the data set.
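A small sketch of these metrics; the helper names are illustrative, and AP is approximated here as the area under a sampled precision-recall curve:

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and F1 from raw counts TP, FP, FN."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def average_precision(recalls, precisions):
    """Area enclosed by the PR curve, given sampled recall/precision values."""
    order = np.argsort(recalls)
    return np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order])

def mean_average_precision(per_class_ap):
    """mAP: the mean of the per-class AP values over the N categories."""
    return sum(per_class_ap) / len(per_class_ap)
```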
The purpose of the invention is realized by the following technical scheme:
(1) preprocessing: dividing the crop image data set into a training set and a test set at a ratio of 8:2, and labeling the training set with Labelme;
(2) input: the crop image data set;
(3) initialization: the maximum number of iterations Max_step;
(4) extracting crop image features with a residual network and a feature pyramid network augmented with path aggregation and feature enhancement to generate feature maps;
(5) passing the feature maps through a region proposal network with optimized target anchor-box ratios to extract candidate target regions, and performing binary foreground/background classification with a softmax classifier;
(6) filtering the candidate target regions with non-maximum suppression, discarding targets that do not meet the threshold;
(7) passing the feature maps generated in step (4) and the candidate regions retained in step (6) to ROIAlign;
(8) computing pixel values at floating-point coordinates by bilinear interpolation, preserving the spatial information of the feature maps (a bilinear-interpolation sketch is given after this list);
(9) resizing the feature maps so that they are all 256 × 256;
(10) passing the result of step (9) through the newly added micro fully connected network to generate the mask loss;
(11) passing the result of step (9) through a fully connected network to generate the box and cls losses;
(12) predicting the target edges of the result of step (9) with a sobel operator to generate the edge loss;
(13) summing the results of steps (10)-(12) to form the loss function, and outputting Precision, Recall, Average Precision, Mean Average Precision, and the F1 score;
(14) judging whether the maximum number of iterations Max_step has been reached; if not, returning to step (4); if so, outputting the crop segmentation images and the Precision-Recall curves.
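As referenced in step (8), the following is a minimal sketch of bilinear interpolation at a floating-point coordinate of a feature map, the operation ROIAlign relies on; the helper name is hypothetical and this is a standalone illustration, not the ROIAlign implementation itself:

```python
def bilinear_sample(feature_map, x, y):
    """Sample a (C, H, W) tensor at the floating-point location (x, y)."""
    H, W = feature_map.shape[-2:]
    x0, y0 = int(x), int(y)                           # surrounding integer grid points
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    dx, dy = x - x0, y - y0                           # fractional offsets
    top = (1 - dx) * feature_map[..., y0, x0] + dx * feature_map[..., y0, x1]
    bottom = (1 - dx) * feature_map[..., y1, x0] + dx * feature_map[..., y1, x1]
    return (1 - dy) * top + dy * bottom               # weighted average of the four neighbours
```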
Compared with the prior art, the invention has the following advantages and positive effects:
first, path aggregation and feature enhancement are added to the feature pyramid network: features are further fused with a bottom-up idea, the path by which high-level features obtain low-level features is shortened, and a clear lateral connection path is constructed from the low levels to the high levels to shorten the information path and let bottom-level information flow faster. A bottom-up enhancement path is established, and the improved feature pyramid network then uses the precise localization signals preserved in the bottom-up low-level features to improve the feature pyramid framework;
second, the target anchor-box ratios in the region proposal network are optimized: the anchor scales are adjusted from the original {32 × 32, 64 × 64, 256 × 256} with aspect ratios {1:2, 1:1, 2:1} to {16 × 16, 32 × 32, 64 × 64, 256 × 256} with aspect ratios {1:2, 1:1, 3:1}. After this adjustment, the detection and segmentation of small targets are improved and the detection rate of the target box is increased;
third, a micro fully connected layer is added to the mask branch output by the ROI: a branch leads from conv3 to the fully connected layer, passing through two 3 × 3 convolution layers conv4_fc and conv5_fc with 256 channels, where the number of channels of conv5_fc is halved to reduce computation, and the mask size is set to 28 × 28. The 784 × 1 × 1 vector produced by the fully connected layer is reshaped to the same spatial size as the mask predicted by the FCN and finally added to that output to obtain the final prediction, improving the accuracy of the mask-branch prediction and achieving a more precise image segmentation effect;
fourth, the target edges are predicted with the sobel operator and an edge loss is added to the loss function. The annotated image is converted into a binary segmentation image of the crop, i.e. the target mask; the prediction mask output by the mask branch and the target mask are then used as inputs, the direction of the edges is used to minimize the total loss, and this extra information speeds up training and saves a large amount of training time.
Fig. 1 is a flow chart of a crop image segmentation and extraction algorithm based on MASK RCNN according to the present invention;
FIG. 2 is a comparison graph before and after feature pyramid network optimization of the algorithm of the present invention;
FIG. 3 is a comparison graph of the mask branch before and after adding the micro fully connected layer in the algorithm of the present invention;
FIG. 4 is a plot of the Average Precision (AP) of part of the crops, comparing the algorithm of the present invention with the FCN, U-net and Mask Rcnn algorithms;
FIG. 5 is a graph comparing the partial crop extraction and segmentation effects of the algorithm of the present invention with the FCN, U-net and Mask Rcnn algorithms.
Embodiments and advantages of the present invention will be apparent from the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.
(1) preprocessing: dividing the crop image data set into a training set and a test set at a ratio of 8:2, and labeling the training set with Labelme;
(2) input: the crop image data set;
(3) initialization: the maximum number of iterations Max_step;
(4) extracting crop image features with a residual network and a feature pyramid network augmented with path aggregation and feature enhancement to generate feature maps;
(5) passing the feature maps through a region proposal network with optimized target anchor-box ratios to extract candidate target regions, and performing binary foreground/background classification with a softmax classifier;
(6) filtering the candidate target regions with non-maximum suppression, discarding targets that do not meet the threshold;
(7) passing the feature maps generated in step (4) and the candidate regions retained in step (6) to ROIAlign;
(8) computing pixel values at floating-point coordinates by bilinear interpolation, preserving the spatial information of the feature maps;
(9) resizing the feature maps so that they are all 256 × 256;
(10) passing the result of step (9) through the newly added micro fully connected network to generate the mask loss;
(11) passing the result of step (9) through a fully connected network to generate the box and cls losses;
(12) predicting the target edges of the result of step (9) with a sobel operator to generate the edge loss;
(13) summing the results of steps (10)-(12) to form the loss function, and outputting Precision, Recall, Average Precision, Mean Average Precision, and the F1 score;
(14) judging whether the maximum number of iterations Max_step has been reached; if not, returning to step (4); if so, outputting the crop segmentation images and the Precision-Recall curves (a schematic training-loop sketch follows this list).
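To make the flow of steps (4)-(14) concrete, the following is the schematic training-loop sketch referenced in step (14), built on torchvision's stock maskrcnn_resnet50_fpn as a stand-in; it does not include the path-aggregation, anchor, micro fully connected layer or edge-loss modifications described above, and the data-loader and function names are placeholders:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

def train(train_loader, num_classes, max_step, device="cuda"):
    # steps (2)-(3): input data set and iteration budget; the backbone/FPN, RPN,
    # ROIAlign and heads of steps (4)-(9) are wrapped inside the torchvision model.
    model = maskrcnn_resnet50_fpn(num_classes=num_classes).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=1e-4)
    model.train()
    step = 0
    while step < max_step:                        # step (14): iterate until Max_step
        for images, targets in train_loader:      # targets: boxes, labels, masks per image
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            loss_dict = model(images, targets)    # steps (10)-(11): cls, box and mask losses
            loss = sum(loss_dict.values())        # step (13): summed loss (edge loss not included here)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= max_step:
                break
    return model
```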
The simulation experiments of the invention were implemented in the open-source PyTorch learning framework using the Python language. The hardware environment was a Dell T5820 workstation equipped with two NVIDIA Quadro P4000 graphics cards (8 GB) and a 64-bit Ubuntu 16.04 operating system.
In the simulation experiments, the experimental object is the Fruits 360 data set from the Kaggle platform, and comparison experiments compare the crop image segmentation and extraction algorithm based on MASK RCNN provided by the invention with the existing FCN, U-net and Mask Rcnn algorithms. It is worth pointing out that the U-net and Mask Rcnn methods compared here are among the most advanced algorithms proposed.
The Fruits 360 data set is split into an 80% training set and a 20% test set. Table 1 shows the AP values of part of the crops and the per-category mAP values for each image segmentation and extraction algorithm, with the maximum AP and mAP values of the same data set under the same conditions in bold; Table 2 shows the per-category mAP and F-measure (F1) values of each image segmentation and extraction algorithm on the whole Fruits 360 data set, with the maximum mAP and F-measure (F1) values of the same data set under the same conditions in bold. It can be seen that the method of the invention achieves the best prediction results in all cases; more specifically, it improves on the suboptimal Mask Rcnn algorithm by nearly 10% across all results. In addition, the method converges well: fig. 4 shows the PR-curve results of the method on the Fruits 360 data set, from which it can be seen that the method has good convergence and stability and approaches its final value within a smaller number of iterations. Fig. 5 shows the crop image segmentation and extraction results of the method and of each comparison algorithm: the FCN algorithm segments crops poorly at low resolution, and when the pineapple image is segmented and extracted the U-net and Mask Rcnn results are unsatisfactory, while the method of the invention still maintains a good segmentation effect. In conclusion, the method is efficient, converges quickly and computes accurate results, and is an effective image segmentation and extraction method.
Table 1 Comparison of the AP and mAP values of part of the crops for each image segmentation and extraction algorithm
Table 2 Comparison of the mAP and F1 values of each image segmentation and extraction algorithm