Novel solar cell panel hot spot detection method based on YOLOv3
1. As a new technology for solar panel hot spot detection, the YOLOv3 algorithm has a number of outstanding advantages. The basic idea of the YOLO algorithm is as follows: first, features are extracted from the input image through a feature extraction network to obtain a feature map of a certain size, such as 13 × 13; the input image is then divided into 13 × 13 grid cells, and if the center coordinate of an object falls into a grid cell, that grid cell predicts the object. Each grid cell predicts a fixed number of bounding boxes (2 in YOLOv1, 5 in YOLOv2, and 3 in YOLOv3, with differing initial sizes), and only the bounding box with the largest IOU with the ground truth is used to predict the object. The conventional method for detecting solar panel hot spots is to measure the volt-ampere characteristic curve, whose biggest disadvantage is that the specific position of the hot spot cannot be determined. The CTCT (complete total cross tied array) structure method is a photovoltaic array connection mode proposed by Chengzi, Liwar Bing et al.; it greatly improves the accuracy of hot spot detection, but installing the required large number of current sensors is complicated and costly. To improve the accuracy and speed of hot spot detection, the invention provides a solar panel hot spot detection method based on YOLOv3 aimed at these problems. The method comprises the following steps:
(1) Solar cell panel infrared picture preprocessing
The method comprises the following specific steps:
Step 1) Translation transformation of images
The image translation transformation adds a specified horizontal offset and a vertical offset to all pixel coordinates in the image. The translation transformation is represented in matrix form as:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (1)$$

From equation (1), the coordinate mapping of the translation transformation is:

$$x = x_0 + d_x, \qquad y = y_0 + d_y \quad (2)$$

where $d_x$ and $d_y$ are the horizontal and vertical offsets, respectively.
Step 2) Mirror transformation of images
The mirror transformation of an image falls into two categories: horizontal mirroring and vertical mirroring. Horizontal mirroring symmetrically swaps the pixel coordinates of the left and right halves of the image about its vertical center line. Vertical mirroring is likewise a symmetric transformation of the image's pixels, but about the horizontal center line.
The mapping relationship of the horizontal mirror transformation is as follows:

$$x = \mathrm{width} - x_0, \qquad y = y_0 \quad (3)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & \mathrm{width} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (4)$$

where width in equation (4) is the width of the image. Similarly, if the height of the image is height, the mapping relationship of the vertical mirror transformation is as follows:

$$x = x_0, \qquad y = \mathrm{height} - y_0 \quad (5)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & \mathrm{height} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (6)$$
Step 3) Scaling of images
The scaling transformation changes the size of the image: the width or height of the image changes after scaling. Two parameters are important in the scaling of an image: the horizontal scaling factor and the vertical scaling factor. The horizontal scaling factor controls the horizontal pixels. When it equals 1, the width of the image remains unchanged; when it is less than 1, the width shrinks and the image is compressed in the horizontal direction; conversely, when it is greater than 1, the width grows, i.e., the image is stretched in the horizontal direction. The vertical scaling factor behaves the same way, but in the vertical direction. In practice, the aspect ratio of the original image should be preserved, i.e., the horizontal and vertical scaling factors should take the same value, so that the image is not deformed by the scaling transformation. The coordinate mapping of the scaling transformation is as follows:

$$x = S_x \cdot x_0 \quad (7)$$

$$y = S_y \cdot y_0 \quad (8)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (9)$$

where $S_x$ and $S_y$ in equation (9) are the horizontal and vertical scaling factors, respectively.
(2) Acquiring the position of the solar panel closest to the picture center
The method comprises the following specific steps:
Step 1) constructing the panel feature extraction network Darknet-53 to extract features from the input image and fully capture shallow feature information;
Step 2) fusing the extracted shallow feature information with the 4-scale detection of YOLOv3 to construct a multi-scale Feature Pyramid Network (FPN);
This structure connects features at multiple scales so that objects of different sizes can be detected. The left side of the structure is a bottom-up feedforward network composed of several groups of convolutional layers for feature extraction; the right side is a top-down pathway that enhances high-level features through lateral connections with the left-side path, using upsampling to keep the dimensions consistent with the left side.
Step 3) generating anchor boxes with the K-means algorithm, performing bounding box regression and multi-label classification according to the loss function, and optimizing the solution model by stochastic gradient descent;
the loss function of YOLOv3 consists of 3 parts:
the target confidence loss (equation (10));
the target classification loss (equation (11));
the target localization loss, comprising two terms (equation (12)).
Equation (13) is the general formula for the derivative of the loss function with respect to a weight in a deep neural network, where $L$ is the loss function; $w_{ij}$ is a weight parameter in the network; $node_j$ is a neuron in the deep neural network; $\partial L / \partial node_j$ is the derivative with respect to that neuron's output value; and $x_{ij}$ is an input value in the deep neural network. Equation (14) is the weight update formula obtained by chain differentiation of the x-coordinate term of the target localization loss, where $L_{locx}$ is the x-coordinate offset loss function in YOLOv3.
Step 4) extracting features from the input picture with the pre-trained network model, sending them to the multi-scale detection module of the simplified YOLO network for prediction, removing redundant boxes by non-maximum suppression (NMS), and predicting the optimal target object.
The lengths and widths of the target boxes in the training data set are clustered by the K-means method: K initial cluster centers are selected, the distance from every data object to each cluster center is computed one by one, and each object is assigned to the nearest cluster. The invention selects a suitable IOU score and adopts 12 anchor boxes over 4 scales: (12,26), (15,45), (24,23), (29,51), (33,81), (35,54), (46,100), (54,67), (87,105), (105,170), (150,245), (165,321). Non-maximum suppression (NMS) sorts the boxes by the classifier's class probability, iteratively computes the IOU between the highest-scoring box and each remaining box, filters out boxes whose IOU is too large, and repeats until all rectangular boxes to be kept have been marked, thereby removing redundant candidates and predicting the optimal target object.
Step 5) computing the Manhattan distance from the center of each detected target to the center of the whole picture, and recording the top-left and bottom-right coordinates of the nearest one.
$$c = |x_1 - x_2| + |y_1 - y_2| \quad (15)$$
(3) Solar panel hot spot detection based on YOLOv3
1) constructing the hot spot feature extraction network Darknet-53 to extract features from the input image and fully capture shallow feature information; 2) fusing the extracted shallow feature information with the 3-scale detection of YOLOv3 to construct a multi-scale feature pyramid network; 3) generating 12 anchor boxes with the K-means algorithm, performing bounding box regression and multi-label classification according to the loss function, and optimizing the solution model by stochastic gradient descent; 4) extracting features from the input picture with the pre-trained network model, sending them to the multi-scale detection module of the simplified YOLO network for prediction, removing redundant boxes by non-maximum suppression (NMS), and predicting the optimal target object. The concrete steps are the same as in step (2).
(4) The hot spots falling within the recorded solar panel range are determined based on the Manhattan distance.
Background
Due to the earth's rotation, the solar altitude angle changes periodically, which causes the mounting frame to cast periodic shadows on the solar panel. The performance of a shaded panel no longer matches that of the other, unshaded panels: its current decreases and it takes on a negative voltage, i.e., the shaded panel turns into a load in the circuit. The panel then continuously dissipates the power generated by the other panels and converts that energy into heat, so the temperature of the area centered on it keeps rising. When the temperature rises to a certain degree, the panel is damaged; the damaged area is called a hot spot. If hot spots are not detected in time and the faulty component replaced, the hot spot area grows; solder joints on the panel melt and the grid lines are damaged, until the temperature produced by the hot spot effect becomes so high that the whole group of panels is damaged beyond repair. Therefore, to ensure that solar panels can serve for a long time under the specified conditions, they must be inspected with reasonable methods and techniques.
The conventional method for detecting solar panel hot spots is to measure the volt-ampere characteristic curve. Because it does not account for environmental factors such as temperature and illumination intensity, it is only suitable for detecting simple defects on the panel; moreover, its greatest disadvantage is that the specific location of a defect cannot be determined. The CTCT (complete total cross tied array) structure method is a photovoltaic array connection mode proposed by Chengzi, Liwar Bing et al. It greatly improves the accuracy of hot spot detection, but installing the required large number of current sensors is complicated and costly. Hot spot detection based on infrared images is fast and accurate and does not require direct measurement of the panel's parameters: a thermal infrared imager photographs the solar panel, the collected infrared image is analyzed, and if the temperature of a certain part shows an obvious difference, that part is a fault location. The application of infrared imaging to solar panel hot spot detection can be traced back to King, D.L. and colleagues at the end of the 20th century. The invention introduces the YOLOv3 technique on top of infrared imaging, so that the position of hot spots can be detected more quickly and accurately.
Disclosure of Invention
The invention provides a novel solar cell panel hot spot detection method based on YOLOv3. The purpose of the invention is realized as follows:
(1) Solar cell panel infrared picture preprocessing
The method comprises the following specific steps:
Step 1) Translation transformation of images
The image translation transformation adds a specified horizontal offset and a vertical offset to all pixel coordinates in the image. The translation transformation is represented in matrix form as:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (1)$$

From equation (1), the coordinate mapping of the translation transformation is:

$$x = x_0 + d_x, \qquad y = y_0 + d_y \quad (2)$$

where $d_x$ and $d_y$ are the horizontal and vertical offsets, respectively.
Step 2) Mirror transformation of images
The mirror transformation of an image falls into two categories: horizontal mirroring and vertical mirroring. Horizontal mirroring symmetrically swaps the pixel coordinates of the left and right halves of the image about its vertical center line. Vertical mirroring is likewise a symmetric transformation of the image's pixels, but about the horizontal center line.
The mapping relationship of the horizontal mirror transformation is as follows:

$$x = \mathrm{width} - x_0, \qquad y = y_0 \quad (3)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & \mathrm{width} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (4)$$

where width in equation (4) is the width of the image. Similarly, if the height of the image is height, the mapping relationship of the vertical mirror transformation is as follows:

$$x = x_0, \qquad y = \mathrm{height} - y_0 \quad (5)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & \mathrm{height} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (6)$$
Step 3) Scaling of images
The scaling transformation changes the size of the image: the width or height of the image changes after scaling. Two parameters are important in the scaling of an image: the horizontal scaling factor and the vertical scaling factor. The horizontal scaling factor controls the horizontal pixels. When it equals 1, the width of the image remains unchanged; when it is less than 1, the width shrinks and the image is compressed in the horizontal direction; conversely, when it is greater than 1, the width grows, i.e., the image is stretched in the horizontal direction. The vertical scaling factor behaves the same way, but in the vertical direction. In practice, the aspect ratio of the original image should be preserved, i.e., the horizontal and vertical scaling factors should take the same value, so that the image is not deformed by the scaling transformation. The coordinate mapping of the scaling transformation is as follows:

$$x = S_x \cdot x_0 \quad (7)$$

$$y = S_y \cdot y_0 \quad (8)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (9)$$

where $S_x$ and $S_y$ in equation (9) are the horizontal and vertical scaling factors, respectively.
(2) Acquiring the position of the solar panel closest to the picture center
The method comprises the following specific steps:
Step 1) constructing the panel feature extraction network Darknet-53 to extract features from the input image and fully capture shallow feature information;
Step 2) fusing the extracted shallow feature information with the 4-scale detection of YOLOv3 to construct a multi-scale Feature Pyramid Network (FPN);
This structure connects features at multiple scales so that objects of different sizes can be detected. The left side of the structure is a bottom-up feedforward network composed of several groups of convolutional layers for feature extraction; the right side is a top-down pathway that enhances high-level features through lateral connections with the left-side path, using upsampling to keep the dimensions consistent with the left side.
Step 3) generating anchor boxes with the K-means algorithm, performing bounding box regression and multi-label classification according to the loss function, and optimizing the solution model by stochastic gradient descent;
the loss function of YOLOv3 consists of 3 parts:
the target confidence loss (equation (10));
the target classification loss (equation (11));
the target localization loss, comprising two terms (equation (12)).
Equation (13) is the general formula for the derivative of the loss function with respect to a weight in a deep neural network, where $L$ is the loss function; $w_{ij}$ is a weight parameter in the network; $node_j$ is a neuron in the deep neural network; $\partial L / \partial node_j$ is the derivative with respect to that neuron's output value; and $x_{ij}$ is an input value in the deep neural network. Equation (14) is the weight update formula obtained by chain differentiation of the x-coordinate term of the target localization loss, where $L_{locx}$ is the x-coordinate offset loss function in YOLOv3.
Step 4) extracting features from the input picture with the pre-trained network model, sending them to the multi-scale detection module of the simplified YOLO network for prediction, removing redundant boxes by non-maximum suppression (NMS), and predicting the optimal target object.
The lengths and widths of the target boxes in the training data set are clustered by the K-means method: K initial cluster centers are selected, the distance from every data object to each cluster center is computed one by one, and each object is assigned to the nearest cluster. The invention selects a suitable IOU score and, according to the relationship between the IOU and the anchor boxes, adopts 12 anchor boxes over 4 scales: (12,26), (15,45), (24,23), (29,51), (33,81), (35,54), (46,100), (54,67), (87,105), (105,170), (150,245), (165,321). Non-maximum suppression (NMS) sorts the boxes by the classifier's class probability, iteratively computes the IOU between the highest-scoring box and each remaining box, filters out boxes whose IOU is too large, and repeats until all rectangular boxes to be kept have been marked, thereby removing redundant candidates and predicting the optimal target object.
Step 5) computing the Manhattan distance from the center of each detected target to the center of the whole picture, and recording the top-left and bottom-right coordinates of the nearest one.
$$c = |x_1 - x_2| + |y_1 - y_2| \quad (15)$$
(3) Solar panel hot spot detection based on YOLOv3
1) constructing the hot spot feature extraction network Darknet-53 to extract features from the input image and fully capture shallow feature information; 2) fusing the extracted shallow feature information with the 3-scale detection of YOLOv3 to construct a multi-scale feature pyramid network; 3) generating 12 anchor boxes with the K-means algorithm, performing bounding box regression and multi-label classification according to the loss function, and optimizing the solution model by stochastic gradient descent; 4) extracting features from the input picture with the pre-trained network model, sending them to the multi-scale detection module of the simplified YOLO network for prediction, removing redundant boxes by non-maximum suppression (NMS), and predicting the optimal target object. The concrete method is the same as in step (2).
(4) Computing the hot spots falling within the recorded solar panel range based on the Manhattan distance
The invention has the following effects: the method can accurately detect hot spots on the solar cell panel, effectively overcomes the traditional detection modes' sensitivity to background texture, ambient light, and per-picture parameter adjustment, improves the accuracy and speed of target detection, and to a certain extent avoids overfitting when training the neural network.
Description of the Drawings
FIG. 1 is a flow chart of a solar panel hot spot detection method based on YOLOv3 in the invention
Detailed Description
The following describes the implementation of the method of the invention:
(1) Solar cell panel infrared picture preprocessing
The method comprises the following specific steps:
Step 1) Translation transformation of images
The image translation transformation adds a specified horizontal offset and a vertical offset to all pixel coordinates in the image. The translation transformation is represented in matrix form as:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (1)$$

From equation (1), the coordinate mapping of the translation transformation is:

$$x = x_0 + d_x, \qquad y = y_0 + d_y \quad (2)$$

where $d_x$ and $d_y$ are the horizontal and vertical offsets, respectively.
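For illustration, the translation of equations (1)-(2) can be implemented as an affine warp. The following is a minimal sketch assuming OpenCV; the input file name is hypothetical:

```python
# Minimal sketch of the translation transform of equations (1)-(2), assuming OpenCV.
import cv2
import numpy as np

def translate(image, dx, dy):
    h, w = image.shape[:2]
    # The affine matrix [[1, 0, dx], [0, 1, dy]] is the top two rows of equation (1).
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(image, M, (w, h))

# Usage (hypothetical input file):
shifted = translate(cv2.imread("panel_ir.jpg"), dx=30, dy=15)
```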
Step 2) Mirror transformation of images
The mirror transformation of an image falls into two categories: horizontal mirroring and vertical mirroring. Horizontal mirroring symmetrically swaps the pixel coordinates of the left and right halves of the image about its vertical center line. Vertical mirroring is likewise a symmetric transformation of the image's pixels, but about the horizontal center line.
The mapping relationship of the horizontal mirror transformation is as follows:

$$x = \mathrm{width} - x_0, \qquad y = y_0 \quad (3)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & \mathrm{width} \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (4)$$

where width in equation (4) is the width of the image. Similarly, if the height of the image is height, the mapping relationship of the vertical mirror transformation is as follows:

$$x = x_0, \qquad y = \mathrm{height} - y_0 \quad (5)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & \mathrm{height} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (6)$$
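For illustration, both mirror transforms of equations (3)-(6) correspond to a single flip call; a minimal sketch assuming OpenCV (the file name is hypothetical):

```python
# Minimal sketch of the mirror transforms of equations (3)-(6), assuming OpenCV.
import cv2

img = cv2.imread("panel_ir.jpg")   # hypothetical input file
horizontal = cv2.flip(img, 1)      # x -> width - x, equations (3)-(4)
vertical = cv2.flip(img, 0)        # y -> height - y, equations (5)-(6)
```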
Step 3) Scaling of images
The scaling transformation changes the size of the image: the width or height of the image changes after scaling. Two parameters are important in the scaling of an image: the horizontal scaling factor and the vertical scaling factor. The horizontal scaling factor controls the horizontal pixels. When it equals 1, the width of the image remains unchanged; when it is less than 1, the width shrinks and the image is compressed in the horizontal direction; conversely, when it is greater than 1, the width grows, i.e., the image is stretched in the horizontal direction. The vertical scaling factor behaves the same way, but in the vertical direction. In practice, the aspect ratio of the original image should be preserved, i.e., the horizontal and vertical scaling factors should take the same value, so that the image is not deformed by the scaling transformation. The coordinate mapping of the scaling transformation is as follows:

$$x = S_x \cdot x_0 \quad (7)$$

$$y = S_y \cdot y_0 \quad (8)$$

represented as a matrix:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \quad (9)$$

where $S_x$ and $S_y$ in equation (9) are the horizontal and vertical scaling factors, respectively.
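For illustration, the scaling of equations (7)-(9) with equal factors, as required above, might look like this (a sketch assuming OpenCV; the file name is hypothetical):

```python
# Minimal sketch of the scaling transform of equations (7)-(9), assuming OpenCV.
# Sx and Sy are kept equal so the aspect ratio is preserved and the image
# is not deformed, as required above.
import cv2

def scale(image, s):
    return cv2.resize(image, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)

enlarged = scale(cv2.imread("panel_ir.jpg"), 1.5)  # stretched (S > 1)
reduced = scale(cv2.imread("panel_ir.jpg"), 0.5)   # compressed (S < 1)
```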
(2) Acquiring the position of the solar panel closest to the picture center
The method comprises the following specific steps:
Step 1) constructing the panel feature extraction network Darknet-53 to extract features from the input image and fully capture shallow feature information;
Step 2) fusing the extracted shallow feature information with the 4-scale detection of YOLOv3 to construct a multi-scale Feature Pyramid Network (FPN);
This structure connects features at multiple scales so that objects of different sizes can be detected. The left side of the structure is a bottom-up feedforward network composed of several groups of convolutional layers for feature extraction; the right side is a top-down pathway that enhances high-level features through lateral connections with the left-side path, using upsampling to keep the dimensions consistent with the left side.
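The top-down fusion described here can be sketched as follows; this is an illustrative PyTorch fragment under assumed channel counts, not the patent's exact network:

```python
# Minimal sketch of the top-down fusion: the deep feature map is upsampled 2x
# and concatenated with the shallower layer, keeping spatial dimensions
# consistent with the left-side (bottom-up) path.
import torch
import torch.nn as nn

class TopDownFusion(nn.Module):
    def __init__(self, deep_ch):
        super().__init__()
        self.lateral = nn.Conv2d(deep_ch, deep_ch // 2, kernel_size=1)
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, deep, shallow):
        up = self.upsample(self.lateral(deep))   # e.g., 13x13 -> 26x26
        return torch.cat([up, shallow], dim=1)   # fuse with the shallow layer

# Usage with assumed channel counts: a 13x13 deep map and a 26x26 shallow map.
fusion = TopDownFusion(deep_ch=1024)
fused = fusion(torch.randn(1, 1024, 13, 13), torch.randn(1, 512, 26, 26))
print(fused.shape)  # torch.Size([1, 1024, 26, 26])
```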
Step 3) generating anchor boxes with the K-means algorithm, performing bounding box regression and multi-label classification according to the loss function, and optimizing the solution model by stochastic gradient descent;
the loss function of YOLOv3 consists of 3 parts:
the target confidence loss (equation (10));
the target classification loss (equation (11));
the target localization loss, comprising two terms (equation (12)); the standard forms of equations (10)-(12) are reconstructed below.
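The explicit formulas (10)-(12) are not legible in the source text. For reference, in standard YOLOv3 the confidence and classification losses are binary cross-entropies and the localization loss consists of two squared-error terms; the patent's exact weighting may differ:

$$L_{conf} = -\sum_i \left[\hat{C}_i \log C_i + (1-\hat{C}_i)\log(1-C_i)\right] \quad (10)$$

$$L_{cls} = -\sum_{i \in obj}\sum_c \left[\hat{p}_i(c)\log p_i(c) + (1-\hat{p}_i(c))\log(1-p_i(c))\right] \quad (11)$$

$$L_{loc} = \sum_{i \in obj}\left[(t_x^i-\hat{t}_x^i)^2 + (t_y^i-\hat{t}_y^i)^2\right] + \sum_{i \in obj}\left[(t_w^i-\hat{t}_w^i)^2 + (t_h^i-\hat{t}_h^i)^2\right] \quad (12)$$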
Equation (13) is the general formula for the derivative of the loss function with respect to a weight in a deep neural network, where $L$ is the loss function; $w_{ij}$ is a weight parameter in the network; $node_j$ is a neuron in the deep neural network; $\partial L / \partial node_j$ is the derivative with respect to that neuron's output value; and $x_{ij}$ is an input value in the deep neural network. Equation (14) is the weight update formula obtained by chain differentiation of the x-coordinate term of the target localization loss, where $L_{locx}$ is the x-coordinate offset loss function in YOLOv3.
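From the symbol definitions above, equations (13) and (14) are presumably the standard chain-rule and stochastic gradient descent update formulas; the following is a hedged reconstruction, with $\eta$ denoting the learning rate:

$$\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial node_j} \cdot \frac{\partial node_j}{\partial w_{ij}} = \frac{\partial L}{\partial node_j}\, x_{ij} \quad (13)$$

$$w_{ij} \leftarrow w_{ij} - \eta\, \frac{\partial L_{locx}}{\partial node_j}\, x_{ij} \quad (14)$$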
Step 4) extracting features from the input picture with the pre-trained network model, sending them to the multi-scale detection module of the simplified YOLO network for prediction, removing redundant boxes by non-maximum suppression (NMS), and predicting the optimal target object.
The lengths and widths of the target boxes in the training data set are clustered by the K-means method: K initial cluster centers are selected, the distance from every data object to each cluster center is computed one by one, and each object is assigned to the nearest cluster. The invention selects a suitable IOU score and adopts 12 anchor boxes over 4 scales: (12,26), (15,45), (24,23), (29,51), (33,81), (35,54), (46,100), (54,67), (87,105), (105,170), (150,245), (165,321). Non-maximum suppression (NMS) sorts the boxes by the classifier's class probability, iteratively computes the IOU between the highest-scoring box and each remaining box, filters out boxes whose IOU is too large, and repeats until all rectangular boxes to be kept have been marked, thereby removing redundant candidates and predicting the optimal target object.
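The NMS procedure just described can be sketched as follows (illustrative Python/NumPy, not the patent's exact implementation; the IOU threshold is an assumed parameter):

```python
# Minimal sketch of non-maximum suppression as described above.
# boxes: N x 4 float array of (x1, y1, x2, y2); scores: class probabilities.
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    order = np.argsort(scores)[::-1]  # sort by the classifier's class probability
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)             # mark the highest-scoring box for keeping
        rest = order[1:]
        # filter out the boxes whose IOU with the best box is too large
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep
```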
Step 5) computing the Manhattan distance from the center of each detected target to the center of the whole picture, and recording the top-left and bottom-right coordinates of the nearest one.
$$c = |x_1 - x_2| + |y_1 - y_2| \quad (15)$$
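Step 5) and equation (15) amount to the following selection rule (an illustrative Python sketch):

```python
# Minimal sketch of step 5): pick the detected panel whose center is closest
# (in Manhattan distance, equation (15)) to the center of the whole picture.
def nearest_to_center(boxes, img_w, img_h):
    """boxes: list of (x1, y1, x2, y2) detections; returns the nearest box."""
    cx, cy = img_w / 2, img_h / 2
    def manhattan(b):
        bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return abs(bx - cx) + abs(by - cy)   # c = |x1 - x2| + |y1 - y2|
    return min(boxes, key=manhattan)         # record its corner coordinates
```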
(3) Solar panel hot spot detection based on YOLOv3
1) constructing the hot spot feature extraction network Darknet-53 to extract features from the input image and fully capture shallow feature information; 2) fusing the extracted shallow feature information with the 3-scale detection of YOLOv3 to construct a multi-scale feature pyramid network; 3) generating 12 anchor boxes with the K-means algorithm, performing bounding box regression and multi-label classification according to the loss function, and optimizing the solution model by stochastic gradient descent; 4) extracting features from the input picture with the pre-trained network model, sending them to the multi-scale detection module of the simplified YOLO network for prediction, removing redundant boxes by non-maximum suppression (NMS), and predicting the optimal target object.
(4) Computing the hot spots falling within the recorded solar panel range based on the Manhattan distance
(5) Experimental verification
Step 1) Infrared images of the solar panels are collected and organized into an experimental data set in the Pascal VOC format, then split proportionally into a training set and a test set. The feature extraction network Darknet-53 of the invention is modified as follows: between the feature maps with input and output size 208 × 208, the 2 convolution operations and 1 shortcut (direct) connection are changed to 4 convolutions and 2 shortcuts; between the 104 × 104 feature maps, the 4 convolutions and 2 shortcuts are changed to 12 convolutions and 6 shortcuts; between the 52 × 52 feature maps, the 16 convolutions and 8 shortcuts are changed to 12 convolutions and 6 shortcuts; between the 26 × 26 feature maps, the 16 convolutions and 8 shortcuts are changed to 8 convolutions and 4 shortcuts; and between the 13 × 13 feature maps, the 8 convolutions and 4 shortcuts are changed to 4 convolutions and 2 shortcuts. The transition layers alternate 1 × 1 and 3 × 3 convolution layers; the 1 × 1 convolutions help smooth the extracted features and prevent further loss of feature information during downsampling. A sketch of one such residual stage follows.
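For illustration, one stage of the modified Darknet-53 can be sketched as below (PyTorch, with channel counts assumed); each residual block contributes 2 convolutions and 1 shortcut connection, so the modified 208 × 208 stage corresponds to repeats=2:

```python
# Minimal sketch of a Darknet-53-style residual stage with a configurable
# number of blocks; each block = 2 convolutions + 1 shortcut connection.
import torch.nn as nn

class ConvBNLeaky(nn.Module):
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.block(x)

class ResidualBlock(nn.Module):
    """1x1 then 3x3 convolution with a shortcut: 2 convs, 1 direct connection."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = ConvBNLeaky(ch, ch // 2, 1)
        self.conv2 = ConvBNLeaky(ch // 2, ch, 3)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

def make_stage(ch, repeats):
    # e.g., repeats=2 gives 4 convolutions and 2 shortcut connections,
    # matching the modified 208 x 208 stage described above.
    return nn.Sequential(*[ResidualBlock(ch) for _ in range(repeats)])
```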
Step 2) The 3-scale detection of YOLOv3 is increased to 4 scales, forming four branch structures with an input size of 416 × 416. Each branch shares the features extracted by the residual backbone network; the branches at resolutions 13 × 13, 26 × 26, and 52 × 52 are upsampled by a factor of 2, and each upsampled feature layer is concatenated with a shallow feature layer, so that detection is performed independently on the fused feature maps of the 4 scales. The improved multi-scale fusion learns stronger position features from the shallow feature layers and fuses them with the upsampled deep features for more accurate fine-grained detection. Compared with the original 3-scale detection of the YOLOv3 network, shallow feature information from more scales is fused, the representation capability of the feature pyramid is enhanced, the detection precision for small targets is improved, and the miss rate is reduced. The network performance before and after the scale improvement is evaluated by the mAP (mean average precision), Precision, and Recall indexes.
TP (True Positives) are positive samples correctly classified as positive; TN (True Negatives) are negative samples correctly classified as negative; FP (False Positives) are negative samples wrongly classified as positive; and FN (False Negatives) are positive samples wrongly classified as negative.
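From these definitions, the Precision and Recall used for the evaluation follow directly:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$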
Step 3) 12 anchor boxes are generated by the K-means algorithm, and bounding box regression and multi-label classification are performed according to the loss function; redundant boxes are removed by non-maximum suppression (NMS) and the optimal target object is predicted. The specific steps are as follows: the lengths and widths of the target boxes in the training data set are clustered by the K-means method: K initial cluster centers are selected, the distance from every data object to each cluster center is computed one by one, and each object is assigned to the nearest cluster. The invention selects a suitable IOU score and, according to the relationship between the IOU and the anchor boxes, adopts 12 anchor boxes over 4 scales: (12,26), (15,45), (24,23), (29,51), (33,81), (35,54), (46,100), (54,67), (87,105), (105,170), (150,245), (165,321). Non-maximum suppression (NMS) sorts the boxes by the classifier's class probability, iteratively computes the IOU between the highest-scoring box and each remaining box, filters out boxes whose IOU is too large, and repeats until all rectangular boxes to be kept have been marked, thereby removing redundant candidates and predicting the optimal target object. A sketch of the anchor clustering follows.
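The K-means anchor clustering just described is commonly implemented with an IOU-based distance; a minimal sketch under that assumption (illustrative, not the patent's exact procedure):

```python
# Minimal sketch of K-means anchor clustering with distance = 1 - IOU,
# assuming NumPy; boxes is an N x 2 float array of (w, h) training labels.
import numpy as np

def iou_wh(box, clusters):
    # IOU between a (w, h) box and each cluster center, anchored at the origin.
    inter = np.minimum(box[0], clusters[:, 0]) * np.minimum(box[1], clusters[:, 1])
    return inter / (box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter)

def kmeans_anchors(boxes, k=12, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]  # initial centers
    for _ in range(iters):
        # assign each box to the center with the largest IOU (shortest distance)
        nearest = np.argmax(np.stack([iou_wh(b, clusters) for b in boxes]), axis=1)
        for j in range(k):
            if np.any(nearest == j):
                clusters[j] = boxes[nearest == j].mean(axis=0)  # update center
    return clusters
```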
In the experiment, the mAP indexes for the two kinds of hot spots (linear and circular) detected in the solar panel infrared pictures are 92% for the Line class and 85% for the Circle class.