Infrared thermal human body posture identification method based on deep learning

Document No.: 5669 | Published: 2021-09-17

1. An infrared thermal human body posture recognition method based on deep learning, characterized by comprising the following steps:

S1, a data set establishing step: setting up an infrared image capture platform; capturing infrared thermal human body posture image and video data sets of one or more persons with a thermal imager; dividing the data sets into a training set, a verification set and a test set according to a data volume ratio; annotating the data sets with an image annotation tool to obtain label files in xml format; and converting the xml label files into txt files and then into the COCO data format;

S2, a model establishing step: adopting a YOLOv3 target detection model based on the Darknet-53 network as the training network model; setting the hyper-parameters of the network model; using YOLOv3 model weights trained on the VOC2012 data set as pre-training weights for transfer learning; repeatedly training and verifying the network model on the data in the training set and the verification set to obtain YOLOv3 training weights for infrared thermal human body posture recognition; loading the weights into a prediction and evaluation program; continuously adjusting the network parameters according to the prediction and evaluation results; and thereby establishing an infrared thermal human body posture recognition model based on the PyTorch deep learning framework;

S3, a model using step: processing the static images and dynamic videos in the infrared thermal human body posture data set with the YOLOv3 model to obtain static recognition results and dynamic recognition results.

2. The infrared thermal human body posture recognition method based on deep learning of claim 1, wherein the data set establishing step of S1 specifically comprises:

S11, at the National Engineering Laboratory of Robot Visual Perception and Control Technology, capturing infrared thermal human body posture image and video data sets of one or more persons with a FLIR A6702sc thermal imager, the data sets covering single-person and multi-person posture recognition, wherein the single-person actions comprise walking, standing, jumping, punching, kicking or picking up objects and the multi-person actions comprise walking, standing, waving, shaking hands, hugging or clapping hands; collecting each posture action from different volunteers and/or different angles; expanding the data set, class by class, through a data augmentation method applied to the dynamic behavior data set so that the number of samples in each category is basically consistent; manually annotating each infrared thermal human body posture image sample with the image annotation tool labelImg; and collecting all samples and their corresponding labels to obtain the processed infrared thermal human body posture data set;

S12, randomly dividing the infrared thermal human body posture image samples in the data set at a data volume ratio of 8:1:1 to obtain three sub-data sets, namely a training set, a verification set and a test set, wherein the number of samples of each posture category within each sub-data set is basically consistent;

S13, performing image normalization and data augmentation on the infrared thermal human body posture samples in the training set with torchvision, the image preprocessing module of the PyTorch framework in Python, applying operations such as rotation, flipping, color gamut transformation and resizing to the image samples to increase the data volume of the training set; then shuffling the human body posture image sample labels with the voc2yolo3 program, converting the xml label files into the standard VOC data format to generate txt files, and converting the txt files into the COCO data format with the voc_annotation program.

3. The infrared thermal human body posture recognition method based on deep learning of claim 2, wherein the model building step of S2 specifically includes:

S21, first using YOLOv3 network model weights trained on the VOC2012 data set as pre-training weights for feature transfer; setting the hyper-parameters of the YOLOv3 model; extracting features from the input image through the backbone feature extraction network Darknet-53; extracting three feature layers in total; applying five convolutions to each of the three feature layers, after which one part of the result is used to output the prediction corresponding to that feature layer and the other part is upsampled (UpSampling2D) and combined with another feature layer; and then repeatedly training and verifying on the infrared thermal human body posture data in the training set and the verification set to obtain an infrared thermal human body posture recognition YOLOv3 model;

S22, screening the trained network model weights obtained in S21 and selecting those with the lowest total loss and validation loss as the network model weights; loading the weights into an evaluation program to obtain the mAP of the model and the recall, accuracy and precision of each category; modifying the network parameters and retraining until the required values of these indexes are met; and plotting the training loss and validation loss curves with the TensorBoard visualization tool of the TensorFlow framework;

S23, testing the pre-trained data set processing model with the data in the test set, including static images and dynamic videos, to obtain a complete data set processing model.

4. The infrared thermal human body posture recognition method based on deep learning of claim 2, wherein the S21 feature extraction part specifically comprises: the YOLOv3 target detection model adopts Darknet-53 as the backbone feature extraction network; Darknet-53 consists of DarknetConv2D and Residual modules and makes extensive use of residual skip connections; downsampling is performed five times with a stride of 2 and a convolution kernel size of 3, giving feature dimensions of 64, 128, 256, 512 and 1024 respectively; no average pooling layer or fully connected layer is used; L2 regularization is applied to each convolution; and each convolution is followed by batch normalization (BatchNormalization) and a LeakyReLU activation function.

5. The infrared thermal human body posture recognition method based on deep learning of claim 2, wherein the S21 feature utilization part specifically comprises: the feature utilization part of the YOLOv3 target detection model extracts multiple feature layers for target detection; three feature layers are extracted in total, located at the middle, lower-middle and bottom layers of the Darknet-53 backbone, with shapes (52, 52, 256), (26, 26, 512) and (13, 13, 1024); five convolutions are applied to each of the three feature layers, after which one part of the result is used to output the prediction corresponding to that feature layer and the other part is upsampled (UpSampling2D) and combined with another feature layer.

6. The infrared thermal human body posture recognition method based on deep learning of claim 2, characterized in that: the hyper-parameters comprise at least the size of an image sample in the dataset to be input, the size of the batch, the learning rate size, the number of iterations and the number of classes.

7. The infrared thermal human body posture recognition method based on deep learning of claim 2, wherein S22 specifically includes: compiling the YOLOv3 target detection model with a compile function, using YOLOv3 model weights trained on the VOC2012 data set as pre-training weights, a cross-entropy loss function as the loss function and Adam as the optimizer; training for 50 epochs with the network parameters frozen and for 50 epochs after all parameters are unfrozen; and, after optimization and updating, repeatedly training and verifying the preliminary data set processing model on the data in the training set and the verification set to obtain a pre-trained data set processing model.

8. The infrared thermal human body posture recognition method based on deep learning of claim 2, wherein S23 specifically includes: testing the YOLOv3 target detection model with the data in the test set, plotting the training loss and validation loss curves with the TensorBoard visualization tool of the TensorFlow framework, and evaluating the test-set data with an mAP plotting program to obtain the average precision (AP), accuracy, recall and mean average precision (mAP) values of each human body posture category, thereby finally obtaining a complete data set processing model.

9. The infrared thermal human body posture recognition method based on deep learning of claim 3, characterized in that: a Python (.py) program can be called to perform infrared thermal human body posture recognition on all images in a designated folder and to output the path, category and probability of each image together with the target detection result image; and the yolo_video.py program can be called to extract frames from any video in avi/mp4 format at a designated frame number, feed the frames to the trained YOLO prediction program, output images annotated with bounding boxes, and finally synthesize the output images automatically into a video at the same frame rate, completing the video recognition.

Background

With the rapid development of the field of artificial intelligence in recent years, the related technologies such as pattern recognition, machine learning, deep learning and the like are becoming mature, and are beginning to be widely applied to various fields of human production and life, and to gradually influence and change the production and life style of human beings.

Human body posture can express very rich meanings, and human body posture estimation refers to a process of restoring the positions of human body joint points in a given picture or video, and plays a vital role in describing human body posture and predicting human body behaviors. In recent years, with the development of deep learning technology, human body posture estimation is more and more widely applied to various fields of computer vision, such as human-computer interaction, behavior recognition, intelligent monitoring and the like.

Given the evident market demand, visible-light image recognition can no longer cope with increasingly complex application environments. Compared with visible-light images, infrared images offer strong support against the problems that hamper traditional computer vision, such as illumination changes, shadows and poor night-time visibility. A posture recognition system based on visible-light images and video works normally only when light is sufficient and the observed objects are clearly visible; it fails in low light and severe weather, so it cannot meet users' need for uninterrupted operation. An infrared thermal imaging human body posture recognition system based on deep learning can both overcome interference from low-visibility conditions such as smoke, dust, fog, rain and snow and work continuously day and night. Such systems are increasingly used in civil and military fields and therefore have important research value.

In summary, since human body posture recognition under visible light cannot meet these requirements, designing a new human body posture recognition technology with fast recognition speed, strong anti-interference capability and excellent recognition performance is a technical problem that those skilled in the art hope to solve.

Disclosure of Invention

In view of the above-mentioned defects of the existing human body posture recognition technology under the visible light condition, the present invention aims to provide an infrared thermal human body posture recognition method based on deep learning, which is specifically as follows.

An infrared thermal human body posture identification method based on deep learning comprises the following steps:

S1, a data set establishing step: at the National Engineering Laboratory of Robot Visual Perception and Control Technology, capturing infrared thermal human body posture image and video data sets of one or more persons with a FLIR A6702sc thermal imager; randomly dividing the data into a training set, a verification set and a test set at a data volume ratio of 8:1:1; annotating the data sets with an image annotation tool to obtain label files in xml format; and converting the files into txt format and then into the COCO data format;

S2, a model establishing step: adopting a YOLOv3 target detection model based on the Darknet-53 network as the training network model; setting the hyper-parameters of the network model; using YOLOv3 model weights trained on the VOC2012 data set as pre-training weights for transfer learning; repeatedly training and verifying the network model on the data in the training set and the verification set to obtain YOLOv3 training weights for infrared thermal human body posture recognition; loading the weights into a prediction and mAP evaluation program; continuously adjusting the network parameters according to the prediction and evaluation results; and thereby establishing an infrared thermal human body posture recognition model based on the PyTorch deep learning framework;

S3, a model using step: processing the static images and dynamic videos in the infrared thermal human body posture data set with the YOLOv3 model to obtain static recognition results and dynamic recognition results.

Preferably, the step of establishing the data set in S1 specifically includes:

S11, at the National Engineering Laboratory of Robot Visual Perception and Control Technology, a FLIR A6702sc thermal imager is used to capture infrared thermal human body posture image and video data sets of one or more persons. The posture recognition data sets cover single-person and multi-person posture recognition; the single-person actions comprise walking, standing, jumping, punching, kicking or picking up objects, and the multi-person actions comprise walking, standing, waving, shaking hands, hugging or clapping hands. Each behavior is collected from different volunteers and/or different angles, including front, back, left and right. The data set is expanded class by class, either by extracting video frames from the dynamic behavior data set or by data augmentation methods such as translation or horizontal flipping, so that the number of samples in each category is basically consistent. Each infrared thermal human body posture image sample is manually annotated with the image annotation tool labelImg, and all samples and their corresponding labels are collected to obtain the processed infrared thermal human body posture data set;

S12, the infrared thermal human body posture image samples in the data set are randomly divided into a training set, a verification set and a test set at a data volume ratio of 8:1:1, giving three sub-data sets in which the number of samples of each category is basically consistent;

S13, image normalization and data augmentation are applied to the infrared thermal human body posture samples in the training set with torchvision, the image preprocessing module of the PyTorch framework in Python; operations such as rotation, flipping, color gamut transformation and resizing are applied to the image samples to increase the data volume of the training set; the human body posture image sample labels are then shuffled with the voc2yolo3 program, the xml label files are converted into the standard VOC data format to generate txt files, and the txt files are converted into the COCO data format with the voc_annotation program.
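The per-class 8:1:1 random split of S12 can be sketched as follows. This is a minimal illustration and not the program used in the invention: the function name `stratified_split`, the class names and the file names are hypothetical, and ten classes of 300 samples each are assumed for the example.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratios=(8, 1, 1), seed=0):
    """Split (path, label) pairs into train/val/test separately for each class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    total = sum(ratios)
    splits = {"train": [], "val": [], "test": []}
    for label, paths in sorted(by_class.items()):
        rng.shuffle(paths)
        n_train = len(paths) * ratios[0] // total
        n_val = len(paths) * ratios[1] // total
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["val"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        # Whatever remains after train and val goes to the test set.
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return splits


# Hypothetical data: 10 posture classes with 300 images each.
classes = ["walking", "standing", "jumping", "punching", "kicking",
           "picking_up", "waving", "shaking_hands", "hugging", "clapping"]
samples = [(f"{c}_{i:03d}.jpg", c) for c in classes for i in range(300)]
splits = stratified_split(samples)
print({name: len(items) for name, items in splits.items()})
# {'train': 2400, 'val': 300, 'test': 300}
```

Splitting inside each class, rather than over the pooled data, is what keeps the number of samples per posture basically consistent in all three sub-data sets.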

Preferably, the step of establishing the model in S2 specifically includes:

S21, first, YOLOv3 network model weights trained on the VOC2012 data set are used as pre-training weights for feature transfer, and the hyper-parameters of the YOLOv3 model are set. Features are extracted from the input image through the backbone feature extraction network Darknet-53; the feature utilization part extracts three feature layers in total and applies five convolutions to each of them, after which one part of the result is used to output the prediction corresponding to that feature layer and the other part is upsampled (UpSampling2D) and combined with another feature layer. The infrared thermal human body posture data in the training set and the verification set are then used for repeated training and verification to obtain an infrared thermal human body posture recognition YOLOv3 model;

S22, the trained network model weights obtained in S21 are screened, and those with the lowest total loss and validation loss are selected as the network model weights; the weights are then loaded into an evaluation program to obtain the mAP of the model and the recall, accuracy and precision of each category, and the network parameters are modified and the model retrained until the required values of these indexes are met; the training loss and validation loss curves are plotted with TensorBoard, the visualization tool provided by the TensorFlow framework.

S23, the pre-trained data set processing model from S21 is tested with the data in the test set, including static images and dynamic videos, to obtain a complete data set processing model.
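The weight screening of S22 can be illustrated with a short sketch. The checkpoint records, file names and loss values below are made up for illustration, and ranking by validation loss first with total (training) loss as the tie-breaker is an assumption; the source only states that the weights with the lowest total loss and validation loss are selected.

```python
def pick_best(checkpoints):
    """Return the checkpoint with the lowest validation loss.

    Ties are broken by the lower total (training) loss.
    """
    return min(checkpoints, key=lambda c: (c["val_loss"], c["total_loss"]))


# Hypothetical training log: one record per saved weight file.
checkpoints = [
    {"epoch": 80, "total_loss": 3.9, "val_loss": 4.6, "path": "ep080.pth"},
    {"epoch": 95, "total_loss": 3.4, "val_loss": 4.2, "path": "ep095.pth"},
    {"epoch": 100, "total_loss": 3.3, "val_loss": 4.4, "path": "ep100.pth"},
]
best = pick_best(checkpoints)
print(best["path"])  # ep095.pth
```

Preferring the validation loss guards against picking a weight file that merely fits the training set better.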

Preferably, the S21 feature extraction part specifically includes: the YOLOv3 target detection model adopts Darknet-53 as the backbone feature extraction network; Darknet-53 consists of DarknetConv2D and Residual modules and makes extensive use of residual skip connections; downsampling is performed five times with a stride of 2 and a convolution kernel size of 3, giving feature dimensions of 64, 128, 256, 512 and 1024 respectively; no average pooling layer or fully connected layer is used; L2 regularization is applied to each convolution; and each convolution is followed by batch normalization (BatchNormalization) and a LeakyReLU activation function.
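A backbone of this kind can be sketched in PyTorch as below. This is an illustrative sketch, not the code of the invention. Three things are assumptions taken from the commonly published Darknet-53 design and not stated in the paragraph above: the 1 × 1 then 3 × 3 layout inside each residual unit, the residual repeat counts (1, 2, 8, 8, 4) and the LeakyReLU slope of 0.1. The helper names are hypothetical.

```python
import torch
from torch import nn


def conv_bn_leaky(c_in, c_out, kernel, stride=1):
    """Convolution followed by batch normalization and LeakyReLU."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel, stride=stride,
                  padding=kernel // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),  # assumed negative slope
    )


class Residual(nn.Module):
    """1x1 then 3x3 convolution, added back to the input (skip connection)."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            conv_bn_leaky(channels, channels // 2, 1),
            conv_bn_leaky(channels // 2, channels, 3),
        )

    def forward(self, x):
        return x + self.body(x)


def stage(c_in, c_out, n_blocks):
    """One stride-2, kernel-3 downsampling convolution plus residual units."""
    layers = [conv_bn_leaky(c_in, c_out, 3, stride=2)]
    layers += [Residual(c_out) for _ in range(n_blocks)]
    return nn.Sequential(*layers)


# Five downsampling stages with feature dimensions 64 ... 1024.
backbone = nn.Sequential(
    conv_bn_leaky(3, 32, 3),
    stage(32, 64, 1),
    stage(64, 128, 2),
    stage(128, 256, 8),
    stage(256, 512, 8),
    stage(512, 1024, 4),
)

with torch.no_grad():
    out = backbone.eval()(torch.zeros(1, 3, 64, 64))
print(tuple(out.shape))  # (1, 1024, 2, 2): spatial size divided by 2**5
```

In PyTorch the L2 regularization mentioned above is usually supplied through the optimizer's `weight_decay` argument and not inside the layers; that mapping is likewise an assumption of this sketch.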

Preferably, the S21 feature utilization part specifically includes: the feature utilization part of the YOLOv3 target detection model extracts multiple feature layers for target detection; three feature layers are extracted in total, located at the middle, lower-middle and bottom layers of the Darknet-53 backbone, with shapes (52, 52, 256), (26, 26, 512) and (13, 13, 1024). Five convolutions are applied to each of the three feature layers, after which one part of the result is used to output the prediction corresponding to that feature layer and the other part is upsampled (UpSampling2D) and combined with another feature layer.

Preferably, the hyper-parameters of S2 include at least a size of an image sample in the dataset to be input, a batch size, a learning rate size, a number of iterations, and a number of categories.

Preferably, S22 specifically includes: compiling the YOLOv3 target detection model with a compile function, using YOLOv3 model weights trained on the VOC2012 data set as pre-training weights, a cross-entropy loss function as the loss function and Adam as the optimizer; training for 50 epochs with the network parameters frozen and then for a further 50 epochs with all parameters unfrozen. After optimization and updating, the preliminary data set processing model is repeatedly trained and verified on the data in the training set and the verification set, and the network parameters are adjusted continuously to obtain a better pre-trained data set processing model. The network parameters are adjusted mainly on the basis of the training loss and validation loss curves: the learning rate, the number of iterations and the batch size are adjusted as appropriate according to the curve trend and the loss values.

Preferably, S23 specifically includes: testing the YOLOv3 target detection model with the data in the test set, plotting the training loss and validation loss curves with the TensorBoard visualization tool of the TensorFlow framework, and evaluating the test-set data with an mAP plotting program to obtain the average precision (AP), accuracy, recall and mean average precision (mAP) values of each human body posture category, thereby finally obtaining a complete data set processing model.
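The AP and mAP values mentioned in S23 can be computed along the following lines. The sketch assumes that each detection has already been matched against the ground truth (for example by an IoU threshold) and flagged as a true or false positive, and it uses the all-point interpolated area under the precision-recall curve. Both choices are assumptions, since the source does not describe its mAP program, and the detections in the example are made up.

```python
def average_precision(scores, is_tp, n_gt):
    """Area under the precision-recall curve for one class.

    scores: confidence of each detection
    is_tp:  True if that detection matched a ground-truth box
    n_gt:   number of ground-truth boxes of this class
    """
    if n_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to its (scores, is_tp, n_gt) triple."""
    aps = {name: average_precision(*args) for name, args in per_class.items()}
    return sum(aps.values()) / len(aps), aps


# Made-up detections for two posture classes.
m_ap, aps = mean_average_precision({
    "walking": ([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4),
    "standing": ([0.9, 0.5], [True, True], 2),
})
print(aps["walking"], aps["standing"], m_ap)  # 0.625 1.0 0.8125
```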

Preferably, S3 specifically includes: a Python (.py) program can be called to perform infrared thermal human body posture recognition on all images in a designated folder and to output the path, category and probability of each image together with the target detection result image. The yolo_video.py program is called to extract frames from any video in avi/mp4 format at a designated frame number; the frames are fed to the trained YOLO prediction program, which outputs images annotated with bounding boxes, and finally the output images are automatically synthesized into a video at the same frame rate, completing the video recognition.
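The video branch of S3 can be sketched with OpenCV as follows. This is not the yolo_video.py program of the invention. The `detect` callable stands in for the trained YOLOv3 prediction step and is hypothetical, as is the `step` parameter used here to model the "designated frame number" as a sampling interval; `detect` is assumed to return an annotated frame of the same size as its input.

```python
def keep_frame(index, step):
    """True for every step-th frame, starting with frame 0."""
    return index % step == 0


def annotate_video(src, dst, detect, step=1):
    """Read src, run detect on the sampled frames, write them to dst."""
    import cv2  # imported here so keep_frame is usable without OpenCV

    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS)  # reuse the source frame rate
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(dst, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of the video
                break
            if keep_frame(index, step):
                writer.write(detect(frame))  # frame with boxes drawn on it
            index += 1
    finally:
        cap.release()
        writer.release()


print([i for i in range(10) if keep_frame(i, 3)])  # [0, 3, 6, 9]
```

A call such as `annotate_video("input.mp4", "output.mp4", detect=my_detector, step=1)` would process every frame; the file names and `my_detector` are placeholders.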

The advantages of the invention are mainly embodied in the following aspects:

With the infrared thermal human body posture recognition method based on deep learning provided by the invention, human body posture estimation on infrared thermal images is realized by fusing machine vision and deep learning, which largely overcomes the shortcomings of existing visible-light image recognition. The method not only markedly improves the recognition rate and accuracy in dark environments or severe weather but, being a thermal-imaging-based deep learning method, also adapts to complex application scenes such as dark environments, and it has practical application prospects.

In addition, the method provides a reference for related problems in the same field: it can be extended and studied further, and similar ideas and operations can be applied to other platforms and to civil and military applications, giving it broad application prospects and high practical value.

The embodiments of the present invention are described in detail below with reference to the accompanying drawings, to facilitate understanding of the technical solutions of the present invention.

Drawings

FIG. 1 is a schematic overall flow diagram of the process of the present invention;

FIG. 2 is a schematic diagram of an infrared thermographic human pose dataset of the method of the present invention;

FIG. 3 is a schematic diagram of an infrared thermographic human pose data set annotation tool of the method of the present invention;

FIG. 4 is the training-set loss (train loss) curve obtained when training on the data set according to the present invention;

FIG. 5 is the verification-set loss (val loss) curve obtained when training on the data set according to the present invention;

FIG. 6 shows the average precision (AP) of each posture and the mean average precision (mAP) on the test set when testing the data set according to the present invention;

FIGS. 7-16 are graphs of recognition results of various poses of a test set during testing of a data set in accordance with the present invention;

Detailed Description

The invention discloses an infrared thermal human body posture identification method based on deep learning, and the scheme details are as follows.

As shown in fig. 1, an infrared thermal human body posture recognition method based on deep learning includes the following steps:

S1, a data set establishing step: at the National Engineering Laboratory of Robot Visual Perception and Control Technology, capturing infrared thermal human body posture image and video data sets of one or more persons with a FLIR A6702sc thermal imager; dividing the data into a training set, a verification set and a test set at a data volume ratio of 8:1:1, where, because all images are captured at the same resolution, the division is made by number of images; annotating the data sets with an image annotation tool to obtain label files in xml format; and converting the xml label files into txt files and then into the COCO data format;

S2, a model establishing step: adopting a YOLOv3 target detection model based on the Darknet-53 network as the training network model; setting the hyper-parameters of the network model; using YOLOv3 model weights trained on the VOC2012 data set as pre-training weights for transfer learning; repeatedly training and verifying the network model on the data in the training set and the verification set to obtain YOLOv3 training weights for infrared thermal human body posture recognition; loading the weights into a prediction and evaluation program, namely the mAP program described in S23; continuously adjusting the network parameters according to the prediction and evaluation results; and thereby establishing an infrared thermal human body posture recognition model based on the PyTorch deep learning framework;

S3, a model using step: processing the static images and dynamic videos in the infrared thermal human body posture data set with the YOLOv3 model to obtain static recognition results and dynamic recognition results.

The data set establishing step S1 specifically includes:

S11, at the National Engineering Laboratory of Robot Visual Perception and Control Technology, a FLIR A6702sc thermal imager is used to capture infrared thermal human body posture image and video data sets of one or more persons. The posture recognition data sets cover single-person and multi-person posture recognition; the action types comprise single-person actions such as walking, standing, jumping, punching, kicking and picking up objects, multi-person actions such as walking, standing, waving, shaking hands, hugging and clapping hands, and other human body posture actions. Each behavior is collected from different volunteers and/or different angles, including front, back, left and right. The data set is expanded class by class, either by extracting video frames from the dynamic behavior data set or by data augmentation methods such as translation or horizontal flipping, so that the number of samples in each category is basically consistent. Each infrared thermal human body posture image sample is manually annotated with the image annotation tool labelImg, and all samples and their corresponding labels are collected to obtain the processed infrared thermal human body posture data set;

S12, the infrared thermal human body posture image samples in the data set are randomly divided at a ratio of training set : verification set : test set = 8:1:1 to obtain three sub-data sets, namely a training set, a verification set and a test set, in which the number of images of each posture category is basically consistent. For example, if the training set contains 1000 images covering ten postures, about one hundred images per posture is preferable;

In the embodiment of the invention, the training data are infrared thermal imaging human body posture samples of walking, standing, jumping, punching, kicking, picking up objects, waving, shaking hands, hugging and clapping hands. 300 samples are collected for each class, giving 300 × 10 = 3000 samples in total, and the samples of each class are randomly divided into a training set (2400 samples), a verification set (300 samples) and a test set (300 samples) at a ratio of 8:1:1. The data sets are then annotated with the image annotation tool labelImg.

S13, image normalization and data augmentation are applied to the infrared thermal human body posture samples in the training set with torchvision, the image preprocessing module of the PyTorch framework in Python; operations such as rotation, flipping, color gamut transformation and resizing are applied to the image samples to increase the data volume of the training set; the human body posture image sample labels are then shuffled with the voc2yolo3 program, the xml label files are converted into the standard VOC data format to generate txt files, and the txt files are converted into the COCO data format with the voc_annotation program.
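The xml-to-txt label conversion in S13 can be illustrated with the standard library alone. labelImg writes Pascal VOC style xml with `object`, `name` and `bndbox` elements. The one-line txt layout used below (image path followed by `xmin,ymin,xmax,ymax,class_id` for each box) is an assumption for illustration and not taken from the source; the function name, class names and file name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list; the index in this list becomes the class id.
CLASSES = ["walking", "standing", "jumping", "punching", "kicking",
           "picking_up", "waving", "shaking_hands", "hugging", "clapping"]


def voc_xml_to_line(xml_text, image_path, classes=CLASSES):
    """Convert one VOC annotation into 'path x1,y1,x2,y2,cls ...'."""
    root = ET.fromstring(xml_text)
    fields = [image_path]
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in classes:
            continue  # skip labels outside the known classes
        box = obj.find("bndbox")
        coords = [int(float(box.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        fields.append(",".join(str(v) for v in coords + [classes.index(name)]))
    return " ".join(fields)


example_xml = """
<annotation>
  <filename>ir_0001.jpg</filename>
  <object>
    <name>standing</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>210</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>
"""
line = voc_xml_to_line(example_xml, "ir_0001.jpg")
print(line)  # ir_0001.jpg 48,30,210,400,1
```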

S2, the model establishing step specifically comprises:

S21, first, YOLOv3 network model weights trained on the VOC2012 data set are used as pre-training weights for feature transfer learning, and the hyper-parameters of the YOLOv3 model are set. Features are extracted from the input image through the backbone feature extraction network Darknet-53; the feature utilization part extracts three feature layers in total and applies five convolutions to each of them, after which one part of the result is used to output the prediction corresponding to that feature layer and the other part is upsampled (UpSampling2D) and combined with another feature layer. The infrared thermal human body posture data in the training set and the verification set are then used for repeated training and verification to obtain an infrared thermal human body posture recognition YOLOv3 model;

The foregoing operation may specifically be as follows. The YOLOv3 target detection model adopts Darknet-53 as the backbone feature extraction network, and Darknet-53 consists of DarknetConv2D and Residual modules. A residual stage in Darknet-53 first performs one 3 × 3 convolution with a stride of 2 and saves the resulting layer; it then performs a 1 × 1 convolution followed by a 3 × 3 convolution and adds that result to the saved layer to give the final result. Residual skip connections are used extensively; downsampling is performed five times with a stride of 2 and a convolution kernel size of 3, giving feature dimensions of 64, 128, 256, 512 and 1024 respectively; no average pooling layer or fully connected layer is used; L2 regularization is applied to each convolution; and each convolution is followed by batch normalization (BatchNormalization) and a LeakyReLU activation function. ReLU sets all negative values to zero, whereas LeakyReLU gives negative values a non-zero slope. In its standard form, with a small positive slope $\alpha$, the LeakyReLU activation function can be expressed as:

$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases}$$

The feature utilization part of the yolov3 target detection model extracts multiple feature layers for target detection. Three feature layers are extracted in total; they are located at different positions of the darknet53 backbone, namely the middle layer, the lower-middle layer and the bottom layer, and their shapes are (52, 52, 256), (26, 26, 512) and (13, 13, 1024) respectively. Each of the three feature layers undergoes 5 convolutions; after this processing, one part of the result is used to output the prediction result corresponding to that feature layer, and the other part is upsampled (UpSampling2D) and then combined with another feature layer.
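The three grid sizes follow from the five stride-2 downsamplings: the extracted layers sit after the third, fourth and fifth downsampling, so the cumulative strides are 8, 16 and 32. The sketch below works this out for a 416-pixel input. The prediction channel count of 3 × (5 + num_classes) follows the usual yolov3 convention of three anchor boxes per scale, which is an assumption here because the text does not state it; the function name is illustrative.

```python
def yolo_feature_shapes(input_size: int = 416, num_classes: int = 10,
                        anchors_per_scale: int = 3):
    """Return the backbone feature shape and prediction shape for each scale.

    anchors_per_scale=3 is the conventional yolov3 choice (an assumption).
    """
    strides = (8, 16, 32)                  # after the 3rd, 4th and 5th downsampling
    backbone_channels = (256, 512, 1024)   # feature dimensions given in the text
    # Per anchor: 4 box offsets + 1 objectness score + one score per class.
    head_channels = anchors_per_scale * (5 + num_classes)
    shapes = []
    for stride, channels in zip(strides, backbone_channels):
        grid = input_size // stride
        shapes.append({
            "feature": (grid, grid, channels),
            "prediction": (grid, grid, head_channels),
        })
    return shapes


shapes = yolo_feature_shapes()
for entry in shapes:
    print(entry)
```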

The hyper-parameters comprise at least the size of the input image samples (input_shape), the batch size (batch_size), the number of iterations (epochs), the learning rate (lr) and the number of classes (num_classes). In an embodiment of the present invention, input_shape is set to 416 × 416 × 3; the batch size Freeze_batch_size is set to 8 and Unfreeze_batch_size to 4 (batch sizes are typically powers of two, 2^N, such as 32, 64 or 128); the number of classes num_classes is set to 10; the number of iterations Freeze_epochs is set to 50 and Unfreeze_epochs to 100; and the learning rate Freeze_lr is set to 1e-3 and Unfreeze_lr to 1e-4.
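One way to keep these settings together is a small configuration object, sketched below. The class and field names are illustrative rather than taken from the original program, and the input is assumed to be 416 × 416 pixels with 3 channels. The divisibility check reflects the five stride-2 downsamplings of the backbone.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainConfig:
    """Hyper-parameters of the embodiment; field names are hypothetical."""
    input_shape: Tuple[int, int, int] = (416, 416, 3)  # assumed height, width, channels
    num_classes: int = 10
    freeze_batch_size: int = 8
    unfreeze_batch_size: int = 4
    freeze_epochs: int = 50
    unfreeze_epochs: int = 100
    freeze_lr: float = 1e-3
    unfreeze_lr: float = 1e-4

    def validate(self) -> None:
        height, width, _ = self.input_shape
        # Five stride-2 downsamplings shrink each side by a factor of 32.
        if height % 32 or width % 32:
            raise ValueError("input height and width should be multiples of 32")
        if self.num_classes < 1:
            raise ValueError("num_classes must be positive")


config = TrainConfig()
config.validate()
print(config)
```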

Transfer learning is a machine learning method in which a model trained on data A is used as the starting point when developing a model for data B. The invention performs transfer learning from a yolov3 model pre-trained on the VOC2012 data set.

S22, the trained network model weights obtained from S21 are screened, and the weights with the lowest total loss and val loss are selected as the network model weights. These weights are then imported into the evaluation program to obtain the mAP of the model and the recall, accuracy and precision of each category. The network parameters are modified accordingly and the model is retrained until these indexes meet the requirements. The TensorBoard visualization tool module under the TensorFlow framework is used to plot the train loss and val loss curves.
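The weight screening in S22 can be sketched as a simple selection over saved checkpoints. The text asks for the lowest total loss and val loss without saying how to combine them, so this sketch assumes validation loss is compared first and total loss breaks ties. The checkpoint names and loss values are made up for illustration.

```python
def select_best_checkpoint(checkpoints):
    """Return the checkpoint with the lowest val loss, then lowest total loss.

    The ordering of the two criteria is an assumption, not from the text.
    """
    return min(checkpoints, key=lambda c: (c["val_loss"], c["total_loss"]))


# Hypothetical records, one per saved epoch.
checkpoints = [
    {"name": "ep098", "total_loss": 12.4, "val_loss": 13.9},
    {"name": "ep099", "total_loss": 12.1, "val_loss": 13.2},
    {"name": "ep100", "total_loss": 11.8, "val_loss": 13.2},
]

best = select_best_checkpoint(checkpoints)
print(best["name"])
```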

Specifically, the yolov3 target detection model is compiled using a compile function. The pre-training weights are yolov3 model weights based on the VOC2012 data set, the loss function is a cross entropy loss function, and the optimizer is the Adam optimizer. The model is trained for 50 epochs with the network parameters frozen and for a further 50 epochs after all parameters are unfrozen. After these optimization updates, the preliminary data set processing model is repeatedly trained and verified on the data in the training set and the verification set to obtain a pre-training data set processing model.
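The two-phase training described above can be sketched as a per-epoch schedule. This assumes that the value 100 given for the unfrozen phase is the final epoch index, so that the unfrozen phase lasts 50 epochs, in line with the paragraph above. It also assumes that "frozen" refers to the pretrained backbone. The function and key names are illustrative.

```python
def two_phase_schedule(freeze_epochs=50, total_epochs=100,
                       freeze_lr=1e-3, unfreeze_lr=1e-4,
                       freeze_batch_size=8, unfreeze_batch_size=4):
    """Yield the training settings to use for each epoch.

    Epochs [0, freeze_epochs) keep the pretrained backbone frozen;
    the remaining epochs train all parameters at a lower learning rate.
    """
    for epoch in range(total_epochs):
        frozen = epoch < freeze_epochs
        yield {
            "epoch": epoch,
            "backbone_trainable": not frozen,
            "lr": freeze_lr if frozen else unfreeze_lr,
            "batch_size": freeze_batch_size if frozen else unfreeze_batch_size,
        }


schedule = list(two_phase_schedule())
print(schedule[49])
print(schedule[50])
```

Lowering the learning rate and batch size when the backbone is unfrozen is a common precaution: more parameters receive gradients, so memory use rises and large steps could damage the pretrained features.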

The cross entropy loss function is a smooth function. In essence it applies the cross entropy of information theory to the classification problem, and its formula is

$$H(p, q) = -\sum_{i} p(x_i) \log q(x_i)$$

where $p$ is the true label distribution and $q$ is the distribution predicted by the model.
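As a worked illustration of cross entropy applied to classification, the sketch below evaluates the information-theoretic definition for a one-hot label and a predicted distribution. The clipping constant `eps` is an added numerical safeguard, and the example probabilities are made up.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i) of two discrete distributions.

    eps clips predicted probabilities away from zero to avoid log(0).
    """
    return -sum(p * math.log(max(q, eps)) for p, q in zip(target, predicted))


target = [0.0, 1.0, 0.0]          # one-hot label: the sample belongs to class 1
confident = [0.1, 0.8, 0.1]       # prediction that favours the correct class
uncertain = [0.4, 0.3, 0.3]       # prediction that favours a wrong class

loss_confident = cross_entropy(target, confident)
loss_uncertain = cross_entropy(target, uncertain)
print(loss_confident, loss_uncertain)
```

With a one-hot label the sum reduces to the negative log of the probability assigned to the correct class, so the loss shrinks as that probability approaches 1.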

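The Adam optimizer is an optimization method that computes an adaptive learning rate for each parameter. It stores an exponentially decaying average of the past squared gradients $v_t$ and also maintains an exponentially decaying average of the past gradients $m_t$:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$

where $m_t$ is the exponential moving average of the gradient, $v_t$ is the exponential moving average of the squared gradient, and $g_t$ is the gradient at time step $t$.

Because $m_t$ and $v_t$ are initialized as zero vectors, they are biased towards 0. These biases are counteracted by computing the bias-corrected $\hat{m}_t$ and $\hat{v}_t$:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

The gradient update rule is:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t$$

where $\theta$ denotes the model parameters and $\eta$ the learning rate.

The hyper-parameters are set to:

$\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$.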
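The Adam procedure described above (moment estimates, bias correction, parameter update) can be sketched in plain Python. This is a minimal illustration on a toy quadratic, not the training code of the embodiment. The learning rate of 0.05 and the objective are arbitrary choices, and ε is taken as 1e-8, the usual Adam default.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a list of scalar parameters (t starts at 1)."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # decaying average of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g    # decaying average of squared gradients
        m_hat = m_i / (1 - beta1 ** t)             # bias-corrected first moment
        v_hat = v_i / (1 - beta2 ** t)             # bias-corrected second moment
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


# Toy problem: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
theta, m, v = [0.0], [0.0], [0.0]
for t in range(1, 1001):
    grad = [2.0 * (theta[0] - 3.0)]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta[0])
```

Because of the bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient scale, which is why Adam is comparatively insensitive to how the loss is scaled.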

S23, the pre-training data set processing model is tested with the data in the test set, including the static images and the dynamic videos in S21, to obtain a complete data set processing model.

Specifically, the yolov3 target detection model is tested with the data in the test set. The TensorBoard visualization tool module under the TensorFlow framework is used to draw the train loss and val loss curves, and the test set data are evaluated by the prediction and evaluation (mAP) program to obtain the average precision (AP) value, accuracy value and recall value of each human body posture category and the mean average precision (mAP) value, finally giving a complete data set processing model.
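The per-category AP and the overall mAP mentioned above can be computed as sketched below. The sketch assumes detections have already been matched to ground-truth boxes, for example by an IoU threshold, so each detection carries a true-positive flag. It uses the all-point interpolated area under the precision-recall curve, which may differ from the evaluation program of the embodiment. The category names and detection lists are made up.

```python
def average_precision(detections, num_ground_truth):
    """AP for one class from (confidence, is_true_positive) pairs.

    num_ground_truth must be positive. Detections are ranked by confidence,
    and the area under the interpolated precision-recall curve is returned.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each point is the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to (detections, num_ground_truth)."""
    aps = {name: average_precision(dets, n) for name, (dets, n) in per_class.items()}
    return aps, sum(aps.values()) / len(aps)


# Hypothetical results for two posture categories.
per_class = {
    "standing": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "sitting": ([(0.95, True)], 1),
}
aps, m_ap = mean_average_precision(per_class)
print(aps, m_ap)
```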

S3, a model using step: the static images and the dynamic videos in the infrared thermal human body posture data set are processed with the yolov3 model to obtain the static recognition results and the dynamic recognition results.

Specifically, infrared thermal human body posture recognition can be performed on all images in a designated folder by calling the predict program. The yolo-video-py program is called to extract frames, at a specified frame count, from any human body video in avi/mp4 format; the extracted images are then imported into the trained yolo prediction program, which outputs images annotated with bounding boxes; finally, the output images are automatically synthesized into a video at the same frame rate to complete the recognition video.

The infrared thermal human body posture identification method based on deep learning provided by the invention realizes human body posture estimation from infrared thermal images by fusing machine vision and deep learning technologies, and largely overcomes the shortcomings of existing visible light image recognition technology. The method not only significantly improves the recognition rate and accuracy in dark environments or severe weather conditions but, being a deep learning method based on thermal imaging, can also adapt to complex application scenes, and has practical application prospects.

Specifically, the method of the invention takes a yolov3 network pre-trained on the VOC2012 data set as the basic network and implements the algorithm flow on the PyTorch platform.

Evaluation of the actual performance of the method on different types of image data sets shows that it generalizes well across data sets: the mean average precision (mAP) on single-person images, multi-person images and mixed single-person and multi-person images reaches 90.64%, 84.01% and 87.48% respectively. The method provides strong technical support for recognizing human body posture in dark environments or severe weather conditions, where visible light imaging is deficient.

In addition, the method of the invention provides a reference for other related problems in the same field. It can be extended and studied further on this basis, and similar ideas and operations can be applied to other platforms. It can be applied in both the civil and military fields, and therefore has very wide application prospects and high practical value.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Finally, it should be understood that although the present description refers to embodiments, not every embodiment contains only a single technical solution, and such description is for clarity only, and those skilled in the art should integrate the description, and the technical solutions in the embodiments can be appropriately combined to form other embodiments understood by those skilled in the art.
