Pavement crack detection and segmentation method based on deep learning
1. A deep learning-based pavement crack detection and segmentation method, characterized by comprising the following steps:
step 1, crack image acquisition and preprocessing
Crack images under different pavement conditions are collected with intelligent detection vehicle equipment, and the collected images cover three types, namely transverse cracks, longitudinal cracks and reticular cracks. The images are then denoised: noise in the raw images captured by the camera of the intelligent detection vehicle equipment would impair the subsequent model's feature extraction for crack targets, so frequency-domain filtering is used to remove the noise carried by high-frequency components;
step 2, image annotation
Marking the category, position and contour of the cracks in the denoised images, and randomly dividing the annotated crack images into a training set and a test set at a ratio of 9:1 for training the subsequent model;
step 3, constructing a C-Mask RCNN model
On the basis of the Mask R-CNN model, the part of the model after the region proposal network is improved: several detectors with different IoU thresholds for dividing positive and negative samples are cascaded to progressively improve the quality of the candidate boxes generated by the model, so that cracks are accurately located under a high threshold. The improved C-Mask RCNN model consists of a feature extraction network, a region proposal network and a multi-layer detector. The feature extraction network extracts features from the input crack image, and the region proposal network generates candidate boxes. The detectors output two results, namely a crack detection result and a crack segmentation result. Each detector comprises a pooling layer, a fully connected layer, a classification part, a bounding-box regression part and a mask segmentation part. The input of each detector is the candidate boxes produced by the previous detector through bounding-box regression; that is, the candidate boxes output by the first detector are fed to the pooling layer of the second detector, the candidate boxes output by the second detector are fed to the pooling layer of the third detector, and so on. The later a detector sits in the cascade, the larger its IoU threshold for dividing positive and negative samples, so each detector only handles candidate boxes at its own IoU threshold level, which improves the quality of the candidate boxes and the training effect of the model. In addition, each detector outputs a mask M of the crack target in the candidate box, and the masks are connected in series so that crack pixel information is exchanged between detectors: the mask $M_i$ of the current detector is convolved with a 1 × 1 convolution kernel and connected to the mask $M_{i+1}$ of the next detector, so that $M_{i+1}$ obtains the information of $M_i$. The mask information output by every detector is thus fused, which makes the segmentation of crack pixels finer and achieves high-precision crack detection and segmentation;
$$f(x, b) = f_T \circ f_{T-1} \circ \cdots \circ f_1(x, b) \qquad (1)$$
equation (1) is the architecture of the cascaded multi-layer detector, where $f(x, b)$ denotes the cascaded result of the whole model, $f_t$ denotes the $t$-th detector (so $f_T$ is the last one), $T$ is the number of cascade stages, $x$ is the input and $b$ is the candidate box; each detector $f_t$ works on the candidate boxes $b_t$ of its own stage, and the candidate boxes $b_t$ obtained at a later stage are of higher quality than the candidate boxes $b_{t-1}$ of the previous stage;
as equation (1) shows, the model output integrates the results of all detector layers, and each later detector operates on the output of the detectors before it; as more detectors are cascaded, feature mismatch between stages lowers accuracy, that is, detection performance degrades when the number of cascaded detectors is too large;
step 4, training the C-Mask RCNN model
Inputting the training set and the test set divided in step 2 into the constructed C-Mask RCNN model for training;
the training process is as follows:
1) feeding the crack images of the training set and their corresponding ground-truth labels into the model;
2) setting the weights, biases and learning rate of the model, wherein the weights and biases are initialized with random values close to 0 and the initial learning rate is set to 0.0001;
3) using an optimizer to minimize the loss function, iteratively updating the weights and biases to obtain the optimal model weights;
4) inputting the test set images to test the model and calculating its accuracy;
step 5, setting parameters of the C-Mask RCNN model
Different feature extraction networks, optimizers and threshold parameters are selected to train and compare the model, and the best-performing combination is adopted as the final model parameters; the C-Mask RCNN model has three thresholds that must be set manually, namely the non-maximum suppression threshold nms_thr, the IoU threshold iou_thr and the segmentation threshold mask_thr_binary, wherein the non-maximum suppression threshold nms_thr controls the number of candidate boxes retained by the non-maximum suppression algorithm, the IoU threshold iou_thr controls the quality of the generated candidate boxes, and the segmentation threshold mask_thr_binary relates to how finely cracks are segmented within the detection box, a larger value giving a more detailed segmentation result;
step 6, outputting crack detection and segmentation results
The C-Mask RCNN model finally outputs two results, namely a crack detection result and a segmentation result; the generated crack detection and segmentation results are output onto the image;
the specific output process is as follows:
the detection part generates the coordinates of the four vertices of the detection box, and the box is drawn on the original image as the crack detection result; the segmentation part stores the pixels identified as cracks in the original image, and these crack pixels are colored and drawn on the original image as the crack segmentation result.
2. The deep learning-based pavement crack detection and segmentation method according to claim 1, wherein the number of cascaded detectors is three.
3. The deep learning-based pavement crack detection and segmentation method according to claim 1, wherein, as the detectors regress layer by layer, the IoU between the candidate boxes and the ground-truth labels increases and the candidate boxes generated by later detectors differ less and less from the ground truth, so each stage improves candidate-box quality less than the one before; therefore, to reduce the impact of this diminishing improvement, the IoU thresholds of the detectors are set in a curve-increasing manner in which each increment is smaller than the previous one, the values being 0.5, 0.62 and 0.72 in turn.
4. The deep learning-based pavement crack detection and segmentation method according to claim 1, wherein a ResNeXt101 network is selected as the feature extraction network of the C-Mask RCNN model, the optimizer for model training is the Adam optimizer, the learning rate is adjusted dynamically with an initial value of 0.0001, the non-maximum suppression threshold nms_thr is set to 0.5, the IoU threshold iou_thr is set to 0.6, and the segmentation threshold mask_thr_binary is set to 0.5.
Background
Pavement cracks are a common road defect that can seriously shorten the service life of a road and endanger driving safety, so pavement crack detection is of great research significance. With the development of artificial intelligence in recent years, deep learning has achieved good results in the intelligent detection of pavement cracks. Compared with traditional crack detection methods, deep learning offers good generalization, robustness and high detection precision, and can be applied to complex and varied pavement conditions.
Although research on pavement crack detection and segmentation has advanced in recent years, existing methods still have several problems: 1) real pavement crack backgrounds are complex, with interference from illumination, shadows, pavement contamination and the like, and crack pixels may resemble background pixels, all of which lowers crack identification accuracy; 2) many existing methods are limited in application scenario and perform a single task, mostly only crack detection or crack segmentation: a detection model can locate cracks but cannot calculate parameters such as crack length, width and area, while a segmentation model can segment crack pixels but cannot locate cracks or distinguish how many there are, so a single-task model (a detection model or a segmentation model) can complete only one of the two tasks; 3) combined detection-and-segmentation models can miscount cracks when the pixel segmentation result is broken into disconnected pieces, or produce an incomplete segmentation when a low-accuracy detection model generates a detection box that does not fully enclose the crack.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a deep learning-based pavement crack detection and segmentation method that addresses the low model accuracy and single-task limitation of existing methods, segmenting crack pixels within the generated detection boxes while locating the cracks.
In order to achieve the purpose, the invention adopts the following technical scheme:
in the deep learning-based pavement crack detection and segmentation method, the Mask R-CNN model integrates crack detection and segmentation into a single network model, segmenting crack pixels within the generated detection boxes while locating the targets. However, Mask R-CNN is a single-threshold detector: to balance positive and negative samples in multi-class tasks, its IoU (Intersection over Union) threshold for dividing positive and negative samples is set to 0.5. This division is poorly suited to crack targets. A slender target such as a crack usually lies along the diagonal of its candidate box, so because of background interference only a small number of crack proposals serve as positive samples in model training, which can produce many false detections and, in turn, inaccurate localization. Directly raising the IoU threshold, however, unbalances the distribution of positive and negative samples, and the reduced number of positive samples causes missed detections. Therefore, to address the inaccurate detection of the Mask R-CNN model, the invention improves the model by introducing multi-threshold detection and provides a C-Mask RCNN model for crack detection and segmentation, with the following specific steps:
step 1, crack image acquisition and preprocessing
Crack images under different pavement conditions are collected with intelligent detection vehicle equipment, and the collected images cover three types, namely transverse cracks, longitudinal cracks and reticular cracks. The images are then denoised: noise in the raw images captured by the camera of the intelligent detection vehicle equipment would impair the subsequent model's feature extraction for crack targets, so frequency-domain filtering is used to remove the noise carried by high-frequency components;
step 2, image annotation
Marking the crack type, position and contour in the denoised images, and randomly dividing the annotated crack images into a training set and a test set at a ratio of 9:1 for training the subsequent model;
step 3, constructing a C-Mask RCNN model
On the basis of the Mask R-CNN model, the part of the model after the region proposal network is improved: several detectors with different IoU thresholds for dividing positive and negative samples are cascaded to progressively improve the quality of the candidate boxes generated by the model, so that cracks are accurately located under a high threshold. The improved C-Mask RCNN model consists of a feature extraction network, a region proposal network and a multi-layer detector. The feature extraction network extracts features from the input crack image, and the region proposal network generates candidate boxes. The detectors output two results, namely a crack detection result and a segmentation result. Each detector comprises a pooling layer, a fully connected layer, a classification part, a bounding-box regression part and a mask segmentation part. The input of each detector is the candidate boxes produced by the previous detector through bounding-box regression; that is, the candidate boxes output by the first detector are fed to the pooling layer of the second detector, the candidate boxes output by the second detector are fed to the pooling layer of the third detector, and so on. The later a detector sits in the cascade, the larger its IoU threshold for dividing positive and negative samples, so each detector only handles candidate boxes at its own IoU threshold level, which improves the quality of the candidate boxes and the training effect of the model. In addition, each detector outputs a mask M of the crack target in the candidate box, and the masks are connected in series so that crack pixel information is exchanged between detectors: the mask $M_i$ of the current detector is convolved with a 1 × 1 convolution kernel and connected to the mask $M_{i+1}$ of the next detector, so that $M_{i+1}$ obtains the information of $M_i$. The mask information output by every detector is thus fused, which makes the segmentation of crack pixels finer and achieves high-precision crack detection and segmentation;
$$f(x, b) = f_T \circ f_{T-1} \circ \cdots \circ f_1(x, b) \qquad (1)$$
equation (1) is the architecture of the cascaded multi-layer detector, where $f(x, b)$ denotes the cascaded result of the whole model, $f_t$ denotes the $t$-th detector (so $f_T$ is the last one), $T$ is the number of cascade stages, $x$ is the input and $b$ is the candidate box; each detector $f_t$ works on the candidate boxes $b_t$ of its own stage, and the candidate boxes $b_t$ obtained at a later stage are of higher quality than the candidate boxes $b_{t-1}$ of the previous stage;
as equation (1) shows, the model output integrates the results of all detector layers, and each later detector operates on the output of the detectors before it; as more detectors are cascaded, feature mismatch between stages lowers accuracy, that is, detection performance degrades when the number of cascaded detectors is too large; therefore, three detectors are cascaded;
as the detectors regress layer by layer, the IoU between the candidate boxes and the ground-truth labels increases and the candidate boxes generated by later detectors differ less and less from the ground truth, so each stage improves candidate-box quality less than the one before; therefore, to reduce the impact of this diminishing improvement, the IoU thresholds of the detectors are set in a curve-increasing manner in which each increment is smaller than the previous one, the values being 0.5, 0.62 and 0.72 in turn.
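Equation (1) is a function composition: the proposals pass through $f_1$ first and $f_T$ last, each stage refining the boxes it receives. The Python sketch below shows only that control flow. `make_toy_stage` is a hypothetical stand-in for a trained detector's box-regression head (it simply moves each box part of the way towards a fixed target); a real stage would also pool features from $x$, classify and predict a mask.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Stage = Callable[[object, List[Box]], List[Box]]


def cascade(stages: List[Stage], x: object, boxes: List[Box]) -> List[Box]:
    """Apply f_T o ... o f_1 to the proposals: stages[0] plays f_1 and each
    later stage refines the boxes produced by the stage before it."""
    for stage in stages:
        boxes = stage(x, boxes)
    return boxes


def make_toy_stage(target: Box, step: float) -> Stage:
    """Hypothetical regression head: moves every box a fraction `step`
    of the way towards `target` (illustration only)."""
    def stage(x: object, boxes: List[Box]) -> List[Box]:
        return [tuple(c + step * (t - c) for c, t in zip(box, target)) for box in boxes]
    return stage


# Usage: three toy stages, each halving the remaining distance to the target box
gt = (10.0, 10.0, 50.0, 50.0)
refined = cascade([make_toy_stage(gt, 0.5) for _ in range(3)], None, [(0.0, 0.0, 40.0, 40.0)])
print(refined)  # [(8.75, 8.75, 48.75, 48.75)]
```

Each toy stage closes a smaller absolute gap than the one before it, which loosely mirrors the diminishing quality gains that motivate limiting the cascade to three stages.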
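The per-stage IoU thresholds decide which candidate boxes each detector treats as positive samples. The sketch below, a plain-Python illustration rather than the model's training code, computes IoU for axis-aligned boxes, labels three proposals against one ground-truth box at the thresholds 0.5, 0.62 and 0.72 given above, and shows that the increments shrink (0.12, then 0.10). `assign_labels` is a hypothetical helper name.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Curve-increasing schedule: each increment is smaller than the previous one
STAGE_IOU_THRESHOLDS = [0.5, 0.62, 0.72]
increments = [round(b - a, 2) for a, b in zip(STAGE_IOU_THRESHOLDS, STAGE_IOU_THRESHOLDS[1:])]


def assign_labels(proposals: List[Box], gt: Box, thr: float) -> List[int]:
    """1 = positive sample, 0 = negative sample, by comparing IoU with the stage threshold."""
    return [1 if iou(p, gt) >= thr else 0 for p in proposals]


gt = (0.0, 0.0, 10.0, 10.0)
proposals = [(0.0, 0.0, 10.0, 6.0), (0.0, 0.0, 10.0, 7.0), (0.0, 0.0, 10.0, 8.0)]  # IoU 0.6, 0.7, 0.8
for thr in STAGE_IOU_THRESHOLDS:
    print(thr, assign_labels(proposals, gt, thr))
```

With a fixed set of proposals, a higher threshold leaves fewer positives, which is why the cascade refines the boxes at each stage before the next, stricter detector labels them.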
Step 4, training the C-Mask RCNN model
Inputting the training set and the test set divided in step 2 into the constructed C-Mask RCNN model for training;
the training process is as follows:
1) feeding the crack images of the training set and their corresponding ground-truth labels into the model;
2) setting the weights, biases and learning rate of the model, wherein the weights and biases are initialized with random values close to 0 and the initial learning rate is set to 0.0001;
3) using an optimizer to minimize the loss function, iteratively updating the weights and biases to obtain the optimal model weights;
4) inputting the test set images to test the model and calculating its accuracy;
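Steps 2) and 3) can be illustrated on a toy problem. The NumPy sketch below initializes weights near 0, uses an initial learning rate of 0.0001 and updates the parameters with a hand-written Adam optimizer (the optimizer chosen in step 5). Assumptions: the toy linear model and mean-squared-error loss stand in for the C-Mask RCNN losses, and the step decay to 10% after 80% of training is a hypothetical stand-in for the unspecified "dynamic adjustment" of the learning rate.

```python
import numpy as np


def adam_train(X, y, lr=1e-4, steps=2000, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Minimise the mean squared error of y ~ X @ w + b with a hand-written Adam optimiser."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0.0, 0.01, size=X.shape[1]), np.zeros(1)]  # weights, bias: near 0
    m = [np.zeros_like(p) for p in params]  # first-moment estimates
    v = [np.zeros_like(p) for p in params]  # second-moment estimates
    losses = []
    for t in range(1, steps + 1):
        w, b = params
        err = X @ w + b[0] - y
        losses.append(float(np.mean(err ** 2)))
        grads = [2.0 * X.T @ err / len(y), np.array([2.0 * err.mean()])]
        # Assumed "dynamic adjustment": drop the learning rate to 10% after 80% of the steps
        lr_t = lr if t <= int(0.8 * steps) else lr * 0.1
        for k, g in enumerate(grads):
            m[k] = beta1 * m[k] + (1 - beta1) * g
            v[k] = beta2 * v[k] + (1 - beta2) * g ** 2
            m_hat = m[k] / (1 - beta1 ** t)  # bias-corrected moments
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] = params[k] - lr_t * m_hat / (np.sqrt(v_hat) + eps)
    w, b = params
    return w, float(b[0]), losses


X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
w, b, losses = adam_train(X, y)
print(losses[0], losses[-1])
```

Because Adam's step size is roughly bounded by the learning rate, a rate of 0.0001 moves each parameter only slightly per iteration; in a real detector this is offset by many thousands of iterations.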
step 5, setting parameters of the C-Mask RCNN model
Different feature extraction networks, optimizers and threshold parameters are selected to train and compare the model, and the best-performing combination is adopted as the final model parameters; the C-Mask RCNN model has three thresholds that must be set manually, namely the non-maximum suppression threshold nms_thr, the IoU threshold iou_thr and the segmentation threshold mask_thr_binary, wherein the non-maximum suppression threshold nms_thr controls the number of candidate boxes retained by the non-maximum suppression algorithm, the IoU threshold iou_thr controls the quality of the generated candidate boxes, and the segmentation threshold mask_thr_binary relates to how finely cracks are segmented within the detection box, a larger value giving a more detailed segmentation result;
a ResNeXt101 network is selected as the feature extraction network of the C-Mask RCNN model, the optimizer for model training is the Adam optimizer, the learning rate is adjusted dynamically with an initial value of 0.0001, the non-maximum suppression threshold nms_thr is set to 0.5, the IoU threshold iou_thr is set to 0.6, and the segmentation threshold mask_thr_binary is set to 0.5;
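Two of these thresholds act at inference time in a way that is easy to show. The NumPy sketch below is a generic greedy non-maximum suppression controlled by nms_thr and a per-pixel binarization controlled by mask_thr_binary, using the values chosen above (0.5 each). It is written for illustration and is not the model's actual post-processing code; iou_thr is not modeled because the text does not say where it is applied.

```python
import numpy as np


def nms(boxes: np.ndarray, scores: np.ndarray, nms_thr: float = 0.5) -> list:
    """Greedy NMS: keep the highest-scoring box, drop boxes whose IoU with it
    exceeds nms_thr, and repeat. boxes is an N x 4 float array of (x1, y1, x2, y2)."""
    order = np.argsort(scores)[::-1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        ious = inter / (areas[i] + areas[rest] - inter)
        order = rest[ious <= nms_thr]  # survivors go to the next round
    return keep


def binarize_mask(prob_mask: np.ndarray, mask_thr_binary: float = 0.5) -> np.ndarray:
    """Keep pixels whose predicted crack probability reaches mask_thr_binary."""
    return prob_mask >= mask_thr_binary


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, nms_thr=0.5))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed at nms_thr = 0.5 but would survive at 0.7; a lower nms_thr therefore leaves fewer boxes on the image.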
step 6, outputting crack detection and segmentation results
The C-Mask RCNN model finally outputs two results, namely a crack detection result and a segmentation result; the generated crack detection and segmentation results are output onto the image;
the specific output process is as follows:
the detection part generates the coordinates of the four vertices of the detection box, and the box is drawn on the original image as the crack detection result; the segmentation part stores the pixels identified as cracks in the original image, and these crack pixels are colored and drawn on the original image as the crack segmentation result.
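The output step can be sketched in NumPy: draw the outline defined by the box's four vertices and blend a color into the crack pixels. Assumptions: the image is an H × W × 3 uint8 array, the box is given as integer pixel corners (x1, y1, x2, y2), the mask is a boolean H × W array, and the colors and 50% blend factor are arbitrary choices; `draw_results` is a hypothetical helper, not part of any library.

```python
import numpy as np


def draw_results(image, box, mask, box_color=(255, 0, 0), mask_color=(0, 255, 0), alpha=0.5):
    """Return a copy of `image` with the detection box outline and a
    semi-transparent colored crack mask drawn on it."""
    out = image.astype(np.float32).copy()
    # Segmentation result: blend the mask color into the crack pixels
    out[mask] = (1 - alpha) * out[mask] + alpha * np.array(mask_color, dtype=np.float32)
    # Detection result: vertices (x1,y1), (x2,y1), (x2,y2), (x1,y2) define the outline
    x1, y1, x2, y2 = box
    out[y1, x1:x2 + 1] = box_color
    out[y2, x1:x2 + 1] = box_color
    out[y1:y2 + 1, x1] = box_color
    out[y1:y2 + 1, x2] = box_color
    return out.round().astype(np.uint8)


img = np.zeros((10, 10, 3), dtype=np.uint8)
crack = np.zeros((10, 10), dtype=bool)
crack[4, 4] = True
overlay = draw_results(img, (2, 2, 7, 7), crack)
```

Drawing the mask before the box keeps the box outline fully visible where the two overlap.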
Compared with the prior art, the invention has the beneficial effects that:
the invention improves the model structure on the basis of a Mask R-CNN model, provides a C-Mask RCNN model for cascade multi-threshold detection, improves the quality of generated candidate frames, and realizes accurate positioning and segmentation of crack images under high-threshold detection.
Drawings
FIG. 1 is an overall block diagram of the method of the present invention.
FIGS. 2a and 2b show crack detection and segmentation results for two transverse cracks, FIGS. 2c and 2d show crack detection and segmentation results for longitudinal cracks, and FIGS. 2e and 2f show crack detection and segmentation results for reticular cracks.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Step 1: crack image acquisition and preprocessing.
The intelligent detection vehicle equipment was used to collect crack images on campus roads and on highways, covering different pavements and scenes. The acquired crack images comprise three categories, namely transverse, longitudinal and reticular cracks, for a total of 6400 images for model training and testing: 2200 transverse cracks, 2200 longitudinal cracks and 2000 reticular cracks, each image being 1280 × 960 pixels. The acquired crack images are denoised by frequency-domain filtering: the noise carried by high-frequency components is removed via the discrete Fourier transform, which enhances the crack information while suppressing noise.
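Frequency-domain denoising via the discrete Fourier transform can be sketched with NumPy's FFT routines. The text does not give the filter shape or cutoff, so this sketch assumes an ideal circular low-pass filter on a grayscale image, with a hypothetical `cutoff` expressed as a fraction of the smaller image dimension.

```python
import numpy as np


def lowpass_denoise(gray: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Remove high-frequency components of a grayscale uint8 image with an
    ideal low-pass filter in the 2-D DFT domain."""
    h, w = gray.shape
    # Shift the zero-frequency component to the centre of the spectrum
    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    spectrum[dist > cutoff * min(h, w)] = 0  # zero out high frequencies
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum)).real
    return np.clip(np.round(filtered), 0, 255).astype(np.uint8)


# A flat gray image corrupted by a checkerboard, the highest-frequency pattern possible
i, j = np.indices((64, 64))
noisy = (100 + 20 * (-1.0) ** (i + j)).astype(np.uint8)
clean = lowpass_denoise(noisy)
```

An ideal filter is the simplest choice but can cause ringing near sharp edges such as crack boundaries; a Gaussian low-pass mask is a common smoother alternative. A 1280 × 960 color frame would be converted to grayscale or filtered per channel.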
Step 2: image annotation.
The crack type, position and contour in the denoised crack images are annotated with the VIA annotation tool, and the annotated images are randomly divided at a ratio of 9:1 into a training set (5760 images) and a test set (640 images) for training the subsequent model.
Step 3: constructing the C-Mask RCNN model.
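A random 9:1 split of the 6400 annotated images yields the 5760/640 partition stated above. The sketch below uses only the standard library; the file names and the fixed seed are illustrative assumptions (a seed makes the split reproducible between runs).

```python
import random


def split_dataset(items, train_ratio=0.9, seed=42):
    """Randomly split annotated images into a training set and a test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


names = [f"img_{k:04d}.jpg" for k in range(6400)]  # hypothetical file names
train, test = split_dataset(names)
print(len(train), len(test))  # 5760 640
```

A purely random split does not guarantee the 9:1 ratio within each crack category; splitting each category separately would be a stratified alternative.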
To solve the inaccurate detection caused by the single-threshold detector of the Mask R-CNN model, the invention improves the part after the region proposal network on the basis of Mask R-CNN. As shown in FIG. 1, the improved C-Mask RCNN model consists of a feature extraction network, the region proposal network and three detector layers. The part inside the dashed box in FIG. 1 is the improved part: the candidate boxes generated by each detector through bounding-box regression are sent to the next detector, and the masks M output by the detectors are fused through a 1 × 1 convolution so that the crack segmentation is finer.
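The 1 × 1 convolution that passes mask information from one detector to the next is a per-pixel linear map across channels. The text does not say whether "connected" means addition or concatenation, so this NumPy sketch assumes additive fusion, $M_{i+1} \leftarrow M_{i+1} + \mathrm{conv}_{1\times1}(M_i)$, similar in spirit to the mask information flow of Hybrid Task Cascade. Masks are treated as C × H × W arrays, and the weights are hypothetical placeholders for learned parameters.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution on a C_in x H x W array: the same linear map across
    channels at every pixel. weight: C_out x C_in, bias: C_out."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def fuse_masks(masks, weights, biases):
    """Pass mask information down the cascade (additive fusion assumed):
    stage i+1 receives a 1x1-convolved copy of stage i's fused mask, so the
    last stage carries information from every earlier stage."""
    fused = [masks[0]]
    for m_next, w, b in zip(masks[1:], weights, biases):
        fused.append(m_next + conv1x1(fused[-1], w, b))
    return fused


m1 = np.ones((2, 3, 3))
m2 = np.zeros((2, 3, 3))
m3 = np.zeros((2, 3, 3))
eye, zero = np.eye(2), np.zeros(2)
fused = fuse_masks([m1, m2, m3], [eye, eye], [zero, zero])
```

With identity weights the first mask propagates unchanged to every later stage; learned weights would instead decide how much of each channel to carry forward.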
The invention tested cascades of 3, 4 and 5 detectors; the results are shown in Table 1, where mAP@75 denotes the accuracy of the model with the IoU threshold set to 0.75 and mAP@50 its accuracy with the IoU threshold set to 0.5. Detection performance is best when 3 detectors are cascaded.
TABLE 1 Results of cascading different numbers of detectors
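The IoU threshold in mAP@50 and mAP@75 decides whether a prediction counts as a correct detection. A full mAP computation also integrates precision over recall, which this simplified sketch omits; it only shows greedy matching of predictions (assumed sorted by descending score) to ground-truth boxes. `count_true_positives` is a hypothetical helper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(preds: List[Box], gts: List[Box], iou_thr: float) -> int:
    """Match each prediction to its best still-unmatched ground truth; it is a
    true positive only if that IoU reaches iou_thr."""
    matched, tp = set(), 0
    for p in preds:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j not in matched and iou(p, g) > best:
                best, best_j = iou(p, g), j
        if best_j >= 0 and best >= iou_thr:
            matched.add(best_j)
            tp += 1
    return tp


gt = [(0.0, 0.0, 10.0, 10.0)]
pred = [(0.0, 0.0, 10.0, 7.0)]  # IoU 0.7 with the ground truth
print(count_true_positives(pred, gt, 0.5), count_true_positives(pred, gt, 0.75))  # 1 0
```

The same box counts as correct under mAP@50 but not under mAP@75, which is why mAP@75 is the stricter measure of localization quality.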
The invention compared two ways of increasing the IoU thresholds of the three detectors: linear growth and curve growth. With linear growth the IoU thresholds are 0.5, 0.6 and 0.7, increasing by a fixed 0.1 each time; with curve growth they are 0.5, 0.62 and 0.72, each increment being smaller than the previous one. Table 2 compares the two schemes, and the results show that detection performance is better when the detector thresholds increase along a curve. In summary, the invention uses three cascaded detectors with IoU thresholds of 0.5, 0.62 and 0.72, respectively.
TABLE 2 Comparison of different IoU threshold growth schemes
Step 4: training the C-Mask RCNN model.
The training set and test set divided in step 2 are input into the constructed C-Mask RCNN model for training. The initial learning rate is set to 0.0001 and adjusted dynamically, and the model is trained for 20 epochs with batch_size set to 1, for a total of 17500 iterations.
Step 5: setting the parameters of the C-Mask RCNN model.
1) ResNet101 and ResNeXt101 were each used as the feature extraction network of the C-Mask RCNN model for training; Table 3 compares the detection and segmentation accuracy of the C-Mask RCNN model under the two feature extraction networks.
TABLE 3 model comparison under two feature extraction networks
2) SGD, SGD-M, AdaGrad, RMSProp and Adam were each used as the model optimizer in experiments; Table 4 shows the model training results under the five optimizers.
TABLE 4 comparison of model training results for different optimizers
3) The C-Mask RCNN model has three thresholds that must be set manually, namely the NMS threshold (nms_thr), the IoU threshold (iou_thr) and the mask threshold (mask_thr_binary), whose settings were optimized experimentally. Table 5 shows the results of the model under different thresholds: with nms_thr set to 0.5, iou_thr set to 0.6 and mask_thr_binary set to 0.5, the model achieves the highest mAP, with a detection accuracy of 95.4% and a segmentation accuracy of 93.5%, giving the best performance.
TABLE 5 Accuracy of C-Mask RCNN under different threshold settings
Step 6: outputting the crack detection and segmentation results.
The crack detection and segmentation results generated by the C-Mask RCNN model are displayed on the images. The results are shown in FIG. 2a, FIG. 2b, FIG. 2c, FIG. 2d, FIG. 2e and FIG. 2f, which show that the improved C-Mask RCNN model identifies transverse, longitudinal and reticular cracks well.