Pedestrian re-identification method, apparatus, device and storage medium based on deep learning
1. A pedestrian re-identification method based on deep learning is characterized by comprising the following steps:
carrying out human body target detection on the source image data to determine a target pedestrian;
carrying out attribute detection on the target pedestrian and determining local region attribute information of the target pedestrian;
performing feature extraction on a local region of the target pedestrian to obtain features of the local region, performing weighted summation on the features, and determining the features of the target pedestrian;
determining the similarity between the features of the target pedestrian and each feature in a pedestrian feature library;
and determining a pedestrian re-identification result according to the similarity.
2. The pedestrian re-identification method based on deep learning according to claim 1, wherein the carrying out human body target detection on the source image data to determine the target pedestrian comprises:
acquiring a real-time video sequence to obtain a plurality of source image data;
and carrying out human body target detection on each source image data through a YOLOv5 network, and determining a plurality of target pedestrians in each source image data.
3. The pedestrian re-identification method based on deep learning according to claim 1, wherein the performing attribute detection on the target pedestrian and determining local area attribute information of the target pedestrian comprises:
performing attribute detection on the target pedestrian through a YOLOv5 network, and determining local area attribute information of the target pedestrian;
wherein the attribute information includes a pedestrian whole body region, a pedestrian head region, a pedestrian upper body region, a pedestrian lower body region, a pedestrian category, and a pedestrian confidence.
4. The pedestrian re-identification method based on deep learning according to claim 1, wherein the step of performing feature extraction on the local region of the target pedestrian to obtain features of the local region and performing weighted summation on the features to determine the features of the target pedestrian comprises:
inputting the attribute information of the target pedestrian into a feature extraction network, and extracting the feature of the target pedestrian by the feature extraction network;
wherein the step of the feature extraction network extracting the features of the target pedestrian comprises:
processing the attribute information through a shared CNN to obtain a first processing result;
performing label distinguishing on the first processing result to obtain second processing results of different types;
inputting the second processing results of different types into an independent network for feature extraction to obtain third processing results;
carrying out feature weighted summation processing on a third processing result obtained by the independent network processing to obtain a fourth processing result;
outputting the features of the target pedestrian.
5. The pedestrian re-identification method based on deep learning of claim 1, wherein the method further comprises a step of constructing a pedestrian feature library.
6. The pedestrian re-identification method based on deep learning of claim 1, wherein the determining of the similarity between the features of the target pedestrian and the features in the pedestrian feature library comprises:
acquiring attribute information of the target pedestrian, and filtering the features in the pedestrian feature library according to the attribute information;
extracting features to be selected in a filtered pedestrian feature library, calculating a vector distance between the features to be selected and the features of the target pedestrian, and determining the similarity between the features of the target pedestrian and each feature in the pedestrian feature library;
wherein the filtering conditions include age condition, gender condition, and hair length condition.
7. The pedestrian re-identification method based on deep learning according to claim 4, wherein the inference step of the feature extraction network comprises:
converting the trained model data into a TensorRT inference model;
carrying out Int8 quantization processing on the TensorRT inference model;
and performing inference on the quantized model through TensorRT to obtain the features of the target pedestrian.
8. A pedestrian re-identification device based on deep learning is characterized by comprising:
the target detection module is used for carrying out human body target detection on the source image data and determining a target pedestrian;
the attribute detection module is used for carrying out attribute detection on the target pedestrian and determining the local region attribute information of the target pedestrian;
the feature extraction module is used for performing feature extraction on a local region of the target pedestrian to obtain features of the local region, performing weighted summation on the features, and determining the features of the target pedestrian;
the similarity calculation module is used for determining the similarity between the features of the target pedestrian and each feature in a pedestrian feature library;
and the identification module is used for determining a pedestrian re-identification result according to the similarity.
9. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-7.
Background
Pedestrian re-identification (person re-identification, abbreviated ReID) is a technique that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. Pedestrians are an important component of video surveillance scenes, and pedestrian re-identification technology is used to identify and track them. The technology mainly comprises two parts: pedestrian feature extraction and pedestrian feature comparison. Pedestrian feature extraction extracts a representative feature vector from each pedestrian detected in a snapshot; pedestrian feature comparison calculates a distance-based similarity between the extracted features and the pedestrian features in a feature library, and the higher the similarity, the higher the probability that the two are the same person.
In existing methods based on feature representation, a pedestrian is generally represented by extracting more robust discriminative features. Although the computational complexity of these methods is relatively low, their accuracy is also relatively low.
Methods based on distance metrics generally learn a discriminative distance metric function so that the distance between images of the same pedestrian is smaller than the distance between images of different pedestrians. These methods require a complex learning process and cannot adequately meet the timeliness and accuracy requirements of practical applications.
In actual scenes, conditions are complex and changeable. The appearance of a pedestrian varies greatly across different cameras because of differences in pedestrian scale, illumination changes, target occlusion, viewing angle and the like, so pedestrian re-identification faces great challenges in practical applications.
Disclosure of Invention
In view of this, embodiments of the present invention provide a pedestrian re-identification method, apparatus, device and storage medium based on deep learning, so as to improve identification accuracy and identification timeliness.
One aspect of the present invention provides a pedestrian re-identification method based on deep learning, including:
carrying out human body target detection on the source image data to determine a target pedestrian;
carrying out attribute detection on the target pedestrian and determining local region attribute information of the target pedestrian;
performing feature extraction on a local region of the target pedestrian to obtain features of the local region, performing weighted summation on the features, and determining the features of the target pedestrian;
determining the similarity between the features of the target pedestrian and each feature in a pedestrian feature library;
and determining a pedestrian re-identification result according to the similarity.
Optionally, the performing human body target detection on the source image data to determine a target pedestrian includes:
acquiring a real-time video sequence to obtain a plurality of source image data;
and carrying out human body target detection on each source image data through a YOLOv5 network, and determining a plurality of target pedestrians in each source image data.
Optionally, the carrying out attribute detection on the target pedestrian and determining local region attribute information of the target pedestrian includes:
performing attribute detection on the target pedestrian through a YOLOv5 network, and determining local area attribute information of the target pedestrian; wherein the attribute information includes a pedestrian whole body region, a pedestrian head region, a pedestrian upper body region, a pedestrian lower body region, a pedestrian category, and a pedestrian confidence.
Optionally, the performing feature extraction on the local region of the target pedestrian to obtain features of the local region and performing weighted summation on the features to determine the features of the target pedestrian includes:
inputting the attribute information of the target pedestrian into a feature extraction network, and extracting the feature of the target pedestrian by the feature extraction network;
wherein the step of the feature extraction network extracting the features of the target pedestrian comprises:
processing the attribute information through a shared CNN to obtain a first processing result;
performing label distinguishing on the first processing result to obtain second processing results of different types;
inputting the second processing results of different types into an independent network for feature extraction to obtain third processing results;
carrying out feature weighted summation processing on a third processing result obtained by the independent network processing to obtain a fourth processing result;
outputting the features of the target pedestrian.
Optionally, the method further comprises the step of constructing a pedestrian feature library.
Optionally, the determining the similarity between the feature of the target pedestrian and each feature in a pedestrian feature library includes:
acquiring attribute information of the target pedestrian, and filtering the features in the pedestrian feature library according to the attribute information;
extracting features to be selected in a filtered pedestrian feature library, calculating a vector distance between the features to be selected and the features of the target pedestrian, and determining the similarity between the features of the target pedestrian and each feature in the pedestrian feature library;
wherein the filtering conditions include age condition, gender condition, and hair length condition.
Optionally, the inference step of the feature extraction network comprises:
converting the trained model data into a TensorRT inference model;
carrying out Int8 quantization processing on the TensorRT inference model;
and performing inference on the quantized model through TensorRT to obtain the features of the target pedestrian.
The embodiment of the invention also provides a pedestrian re-identification device based on deep learning, which comprises:
the target detection module is used for carrying out human body target detection on the source image data and determining a target pedestrian;
the attribute detection module is used for carrying out attribute detection on the target pedestrian and determining the local region attribute information of the target pedestrian;
the feature extraction module is used for performing feature extraction on a local region of the target pedestrian to obtain features of the local region, performing weighted summation on the features, and determining the features of the target pedestrian;
the similarity calculation module is used for determining the similarity between the features of the target pedestrian and each feature in a pedestrian feature library;
and the identification module is used for determining a pedestrian re-identification result according to the similarity.
The embodiment of the invention also provides an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
An embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the method described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, the computer instructions are stored in a computer readable storage medium, the computer instructions can be read by a processor of a computer device from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method.
The embodiment of the invention carries out human body target detection on source image data to determine target pedestrians; then, carrying out attribute detection on the target pedestrian, and determining local region attribute information of the target pedestrian; then, carrying out feature extraction on a local region of the target pedestrian to obtain features of the local region, carrying out weighted summation on the features and determining the features of the target pedestrian; determining the similarity between the features of the target pedestrian and each feature in a pedestrian feature library; and finally, determining a pedestrian re-identification result according to the similarity. The embodiment of the invention is beneficial to improving the identification precision and the identification timeliness.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flowchart of the overall steps provided by an embodiment of the present invention;
FIG. 2 is a flowchart of the feature extraction network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In view of the problems in the prior art, the invention provides a pedestrian re-identification method, apparatus, device and storage medium based on deep learning. The overall steps of the method may include: 1. detecting individual pedestrians in the full image; 2. performing body part detection on each individual pedestrian to obtain four regions, namely the whole body, the head, the upper body and the lower body of the pedestrian; 3. extracting the features of each region with the feature network; 4. filtering the pedestrian feature library by condition to reduce the number of features to be compared; 5. calculating the distance between feature vectors to obtain the similarity.
Specifically, as shown in FIG. 1, the method of the present invention comprises the following steps:
Step 1: In practical application scenarios, the source data comes from camera snapshots or a real-time video sequence. Most of the images are full-scene images, and each image generally contains a plurality of pedestrian targets. The invention therefore performs pedestrian target detection on the full image to detect individual pedestrians. The system uses, but is not limited to, a YOLOv5 network to detect the pedestrian targets and obtain individual pedestrians. The YOLOv5 network is trained as follows:
1) organizing the labeled data and dividing it into a training set, a validation set and a test set;
2) converting the data format, namely converting the XML label files into TXT files in YOLO format;
3) selecting the YOLOv5s model, modifying the configuration file, training the model, and outputting the trained model file.
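The format conversion in training step 2) can be sketched as follows. This is a minimal illustration, not the inventors' tooling: it assumes the XML label files follow a Pascal VOC style layout (`size/width`, `size/height`, `object/name`, `object/bndbox`), which the text does not specify, and the function names and the class name `pedestrian` are hypothetical. Each output line uses the YOLO TXT convention of a class index followed by the box center and size, all normalized by the image dimensions.

```python
import xml.etree.ElementTree as ET


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box to normalized (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def voc_xml_to_yolo_lines(xml_text, class_names):
    """Return one YOLO TXT line per labeled object whose class is known.

    Assumes a Pascal VOC style annotation (an assumption, see lead-in).
    Objects whose name is not in class_names are skipped.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue
        box = obj.find("bndbox")
        coords = [float(box.find(tag).text)
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        cx, cy, w, h = voc_box_to_yolo(*coords, img_w, img_h)
        class_id = class_names.index(name)
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

In use, the returned lines would be written to a TXT file with the same base name as the image, one file per image.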
Step 2: Attribute detection is carried out on each individual pedestrian to obtain four regions, namely the whole body, the head, the upper body and the lower body of the pedestrian, which are used as the input of the feature extraction model. Pedestrian attribute detection is a necessary step for extracting pedestrian structured information. The system uses, but is not limited to, a YOLOv5 network to detect the pedestrian attributes and obtain each body region of the pedestrian.
Step 3: Pedestrian feature extraction extracts discriminative features for a pedestrian. As shown in FIG. 2, the images of each body region of a pedestrian are input into the network, pass through a shared CNN and are distinguished by label, and then enter the corresponding independent network, which extracts the features of each part. Each part feature is reduced to a 512-dimensional vector by global pooling, and the part features are combined by feature weighted summation (formula 1) to form the output. The main feature extraction part of the network, namely the shared CNN, is a ResNet50 network modified at the conv5 layer: a sub-network is added for each part to extract the features of that part, and the features are pooled to output the feature values. The feature weighted summation formula is as follows:
$$F = P_h f_h + P_u f_u + P_l f_l + P_w f_w, \qquad f_h, f_u, f_l, f_w, F \in \mathbb{R}^n \tag{1}$$
wherein the vectors $f_h$, $f_u$, $f_l$ and $f_w$ are the head region feature vector, the upper body region feature vector, the lower body region feature vector and the whole body region feature vector, respectively; $F$ is the weighted summed feature vector; $P_h$, $P_u$, $P_l$ and $P_w$ are the detection confidences of the corresponding regions; and $n$ is the vector dimension, with a value of 512.
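The confidence-weighted summation of the region features described above can be sketched in plain Python. This is a minimal sketch under a stated assumption: it takes formula 1 to be an unnormalized sum in which each region feature vector is multiplied by the detection confidence of that region, which is one reading of the description rather than a confirmed form. The function name and the dictionary keys are hypothetical.

```python
REGIONS = ("head", "upper_body", "lower_body", "whole_body")  # hypothetical keys


def weighted_region_sum(region_features, region_confidences):
    """Fuse per-region feature vectors into one vector.

    region_features:    dict mapping region name -> list of floats (same length n)
    region_confidences: dict mapping region name -> detection confidence P
    Assumes an unnormalized weighted sum: fused[i] = sum over regions of P * f[i].
    """
    dim = len(region_features[REGIONS[0]])
    fused = [0.0] * dim
    for region in REGIONS:
        feature = region_features[region]
        if len(feature) != dim:
            raise ValueError(f"feature for {region!r} has length "
                             f"{len(feature)}, expected {dim}")
        confidence = region_confidences[region]
        for i in range(dim):
            fused[i] += confidence * feature[i]
    return fused
```

In the embodiment each vector would have 512 components; a low-confidence region (for example an occluded lower body) then contributes proportionally less to the fused feature.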
Step 4: The pedestrian feature library is built from features extracted by the network in step 3. The method first extracts the pedestrian structured information and, during feature comparison, uses this structured information to filter the feature library, which reduces the number of features to be compared, reduces time consumption and improves timeliness. The filtering conditions are age group, gender and hair length. After training is finished, a weight model is obtained and converted into the INT8 type through INT8 quantization. During inference, the feature vector of the specified layer is obtained and the similarity is calculated, which reduces the time consumed by forward computation.
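The attribute-based filtering of the feature library can be illustrated with a short sketch. It assumes each library entry stores the three structured attributes named in the text alongside its feature vector; the `GalleryEntry` class, its field names and the attribute values are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GalleryEntry:
    """One record of the pedestrian feature library (hypothetical layout)."""
    person_id: str
    age_group: str
    gender: str
    hair_length: str
    feature: List[float]


def filter_gallery(gallery: List[GalleryEntry],
                   age_group: Optional[str] = None,
                   gender: Optional[str] = None,
                   hair_length: Optional[str] = None) -> List[GalleryEntry]:
    """Keep entries that match every attribute that is given.

    An attribute left as None is not used as a filter condition.
    """
    wanted = {"age_group": age_group,
              "gender": gender,
              "hair_length": hair_length}
    return [
        entry for entry in gallery
        if all(value is None or getattr(entry, name) == value
               for name, value in wanted.items())
    ]
```

Only the entries that survive the filter need to be passed to the distance calculation, which is where the time saving comes from.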
Step 5: The vector distance between the extracted features and each feature in the feature library is calculated, and the results are sorted by the calculated distance; the higher the rank, the higher the similarity. The conventional cosine distance is used to calculate the distance between feature vectors; the smaller the distance, the higher the similarity. A re-ranking method is used to optimize the distance calculation.
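The cosine distance comparison and sorting in this step can be sketched as below. The sketch defines cosine distance as one minus cosine similarity, which is the common convention; the function names and the `(person_id, feature)` pair layout are hypothetical, and the re-ranking refinement mentioned in the text is not included because its details are not given.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as unrelated to everything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Sort gallery entries by ascending cosine distance to the query.

    gallery: iterable of (person_id, feature) pairs.
    Returns a list of (person_id, distance); the first item is the best match.
    """
    scored = [(person_id, cosine_distance(query, feature))
              for person_id, feature in gallery]
    return sorted(scored, key=lambda item: item[1])
```

Because the distance depends only on the angle between vectors, scaling a feature vector does not change its rank.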
Step 6: All model files in the embodiment of the invention are trained weight files. They are converted into TensorRT inference models, quantized to Int8, and then run with TensorRT, which improves the inference speed of the models. TensorRT processes the trained feature extraction network model in three steps:
First, TensorRT parses the trained network model and eliminates unused output layers to reduce computation.
Second, the network is vertically fused, that is, the conv layer, the BN layer and the ReLU layer are fused into one layer.
Third, the network is horizontally merged, that is, layers that take the same input tensor and perform the same operation are fused together.
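The arithmetic behind the vertical fusion of the conv, BN and ReLU layers can be illustrated with a small numerical sketch. It is not TensorRT code and does not reflect how TensorRT implements fusion internally; it only shows, for a simplified case where each output channel is a dot product over the input, that folding the batch normalization parameters into the weights and bias gives the same result as running the layers separately. All function names are hypothetical.

```python
import math


def fold_bn_into_conv(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch norm parameters into weights and biases.

    weights: list of per-output-channel weight lists.
    Uses scale = gamma / sqrt(var + eps), so that
    scale * (w.x + b - mean) + beta == (scale * w).x + ((b - mean) * scale + beta).
    """
    fused_w, fused_b = [], []
    for w_row, b, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([scale * w for w in w_row])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b


def conv_bn_relu(x, weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Reference path: linear op, then batch norm, then ReLU, as three steps."""
    out = []
    for w_row, b, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        z = g * (z - m) / math.sqrt(v + eps) + bt
        out.append(max(0.0, z))
    return out


def fused_conv_relu(x, fused_w, fused_b):
    """Fused path: one linear op with folded parameters, then ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(w_row, x)) + b)
            for w_row, b in zip(fused_w, fused_b)]
```

The fused path performs one multiply-accumulate pass per channel instead of a linear pass followed by a separate normalization pass, which is the source of the speed-up at inference time.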
In summary, the invention has the following advantages:
1. In practical applications, the invention can serve as a main function of a pedestrian structuring product. It makes effective use of pedestrian structured information, provides more conditions for pedestrian re-identification, and can improve the timeliness and accuracy of pedestrian retrieval.
2. The pedestrian feature extraction method of the invention constructs global and local fine-grained features by extracting local features and then fusing them, so that the pedestrian features are more representative and the accuracy of pedestrian retrieval in practical applications is improved.
3. The invention reduces the inference time of feature extraction through Int8 quantization of the feature extraction model, and can improve timeliness in practical applications.
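The idea of Int8 quantization referred to in advantage 3 can be sketched as follows. This is a simplified illustration that assumes symmetric per-tensor quantization with a scale derived from the largest absolute value; it is not the calibration procedure TensorRT uses, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with a single shared scale.

    Assumption: symmetric per-tensor quantization, scale = max|v| / 127.
    Returns (quantized integers, scale).
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]
```

Each value is stored in 8 bits instead of 32, and the reconstruction error of any value is at most about half of the scale, which is the accuracy cost traded for faster inference.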
The embodiment of the invention also provides a pedestrian re-identification device based on deep learning, which comprises:
the target detection module is used for carrying out human body target detection on the source image data and determining a target pedestrian;
the attribute detection module is used for carrying out attribute detection on the target pedestrian and determining the local region attribute information of the target pedestrian;
the feature extraction module is used for performing feature extraction on a local region of the target pedestrian to obtain features of the local region, performing weighted summation on the features, and determining the features of the target pedestrian;
the similarity calculation module is used for determining the similarity between the features of the target pedestrian and each feature in a pedestrian feature library;
and the identification module is used for determining a pedestrian re-identification result according to the similarity.
The embodiment of the invention also provides an electronic device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
An embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the method described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, the computer instructions are stored in a computer readable storage medium, the computer instructions can be read by a processor of a computer device from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.