Model generation method, target detection method, device, equipment and storage medium
1. A method of model generation, comprising:
obtaining a scaling coefficient of a batch normalization layer in an intermediate detection model after preliminary training is completed, wherein the intermediate detection model is obtained by training an original detection model based on a plurality of training samples, and the training samples comprise sample images and sample labeling results of known targets in the sample images;
screening out coefficients to be pruned from the scaling coefficients according to the numerical values of the scaling coefficients;
and screening out a channel to be pruned corresponding to the coefficient to be pruned from each channel of the intermediate detection model, and performing channel pruning on the channel to be pruned to generate a target detection model.
2. The method according to claim 1, wherein the selecting the coefficient to be pruned from each of the scaling coefficients according to the magnitude of the value of each of the scaling coefficients comprises:
and obtaining a pruning threshold value of the scaling coefficient according to the numerical value of each scaling coefficient and a preset pruning rate, and screening out the coefficient to be pruned from each scaling coefficient according to the pruning threshold value.
3. The method of claim 1, further comprising:
screening prunable convolutional layers from the convolutional layers of the intermediate detection model, wherein the prunable convolutional layers comprise convolutional layers other than 1 x 1 convolutional layers and/or convolutional layers in classification regression branches;
screening out target scaling coefficients corresponding to the prunable convolutional layers from the scaling coefficients;
correspondingly, the screening out the coefficient to be pruned from each scaling coefficient according to the value of each scaling coefficient includes: and screening out the coefficient to be pruned from each target scaling coefficient according to the numerical value of each target scaling coefficient.
4. The method according to claim 1, wherein the screening out the channel to be pruned corresponding to the coefficient to be pruned from the channels of the intermediate detection model comprises:
and screening out an output channel of the current convolutional layer corresponding to the current pruning coefficient in the coefficients to be pruned and an input channel of the next convolutional layer of the current convolutional layer from all channels of all convolutional layers of the intermediate detection model, and taking the output channel and the input channel as the channels to be pruned.
5. The method of claim 1, further comprising:
and acquiring the training samples, and performing sparse training based on the batch normalization layer on the original detection model based on the plurality of training samples to obtain the intermediate detection model.
6. The method of claim 5, wherein the target loss function in the original detection model is composed of an original loss function and an L1 regular constraint function, and wherein the L1 regular constraint function comprises a loss function that is L1 regular constraint on each of the scaling coefficients.
7. The method of claim 6, wherein the target loss function L is represented by the following formula:
$$L = \sum_{(x, y)} l(f(x, W), y) + \lambda \sum_{\gamma} g(\gamma)$$
wherein x is the sample image, y is the sample labeling result, W is a parameter value in the original detection model, f(x, W) is a sample prediction result of a known target in the sample image, γ is a scaling coefficient, λ is a penalty coefficient, l(·) is the original loss function, and g(·) is the L1 regular constraint function.
8. The method according to claim 1, wherein the performing channel pruning on the channel to be pruned to generate a target detection model comprises:
channel pruning is carried out on the channel to be pruned to obtain a pruning detection model;
and carrying out fine tuning training on the pruning detection model to generate a target detection model.
9. The method of claim 1, wherein the original detection model comprises a Single Shot MultiBox Detector (SSD), or wherein the original detection model comprises the SSD and a backbone network of the SSD comprises an inception_v3 structure.
10. A method of object detection, comprising:
acquiring an image to be detected and a target detection model generated according to the method of any one of claims 1 to 9;
and inputting the image to be detected into the target detection model, and obtaining a target detection result of the target to be detected in the image to be detected according to an output result of the target detection model.
11. A model generation apparatus, comprising:
a first obtaining module, configured to obtain a scaling coefficient of a batch normalization layer in an intermediate detection model after preliminary training is completed, wherein the intermediate detection model is obtained by training an original detection model based on a plurality of training samples, and the training samples comprise sample images and sample labeling results of known targets in the sample images;
a first screening module, configured to screen out coefficients to be pruned from the scaling coefficients according to the numerical values of the scaling coefficients;
and a model generation module, configured to screen out a channel to be pruned corresponding to the coefficient to be pruned from each channel of the intermediate detection model, and to perform channel pruning on the channel to be pruned to generate a target detection model.
12. An object detection device, comprising:
a second acquisition module for acquiring an image to be detected and a target detection model generated according to the method of any one of claims 1 to 9;
and the target detection module is used for inputting the image to be detected into the target detection model and obtaining a target detection result of the target to be detected in the image to be detected according to an output result of the target detection model.
13. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the model generation method as claimed in any one of claims 1-9, or the object detection method as claimed in claim 10.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the model generation method as claimed in any one of claims 1 to 9, or the object detection method as claimed in claim 10.
Background
Object Detection (Object Detection) is the basis of many computer vision tasks, and can be used to determine whether an Object to be detected of interest exists in an image to be detected, and to accurately locate the Object to be detected. Moreover, the target detection technology can be combined with the technologies of target tracking, target re-identification and the like, and is applied to the fields of artificial intelligence systems, vehicle automatic driving systems, intelligent robots, intelligent logistics and the like.
In the process of implementing the invention, the inventor found the following technical problem in the prior art: most existing target detection technologies are implemented with deep learning models, and because these models are large, their detection speed tends to be low. The problem is particularly pronounced on devices with limited computing resources, which makes it difficult to deploy target detection technologies directly in actual projects.
For example, in the field of intelligent logistics, the large-scale use of unmanned delivery vehicles can reduce delivery cost and improve delivery efficiency, and vision-based target detection is an important technical means by which unmanned delivery vehicles perceive their surroundings. However, for reasons of mass production and cost, the onboard processors of unmanned delivery vehicles are mostly built on Xavier platforms, whose computing resources are relatively limited. As a result, a target detection model running on the onboard processor is relatively slow, which directly affects the environment perception capability of the unmanned delivery vehicle and, in turn, its delivery efficiency. Improving the detection speed of the target detection model is therefore very important for the development of the intelligent logistics field.
Disclosure of Invention
The embodiments of the invention provide a model generation method, a target detection method, an apparatus, a device and a storage medium, which improve the detection speed of a model by compressing the model.
In a first aspect, an embodiment of the present invention provides a model generation method, which may include:
obtaining a scaling coefficient of a batch normalization layer in an intermediate detection model after preliminary training is completed, wherein the intermediate detection model is obtained after an original detection model is trained on the basis of a plurality of training samples, and the training samples comprise sample images and sample labeling results of known targets in the sample images;
screening out coefficients to be pruned from the scaling coefficients according to the numerical values of the scaling coefficients;
and screening out channels to be pruned corresponding to the coefficients to be pruned from all channels of the intermediate detection model, and performing channel pruning on the channels to be pruned to generate a target detection model.
Optionally, the screening out the coefficient to be pruned from each scaling coefficient according to the value of each scaling coefficient may include:
and obtaining the pruning threshold value of the scaling coefficient according to the numerical value of each scaling coefficient and the preset pruning rate, and screening out the coefficient to be pruned from each scaling coefficient according to the pruning threshold value.
Optionally, the model generation method may further include:
screening prunable convolutional layers from the convolutional layers of the intermediate detection model, wherein the prunable convolutional layers comprise convolutional layers other than 1 x 1 convolutional layers and/or convolutional layers in classification regression branches;
screening out target scaling coefficients corresponding to the prunable convolutional layers from the scaling coefficients;
correspondingly, screening out the coefficients to be pruned from the scaling coefficients according to the values of the scaling coefficients may include: screening out the coefficients to be pruned from the target scaling coefficients according to the values of the target scaling coefficients.
Optionally, screening out a channel to be pruned corresponding to the coefficient to be pruned from each channel of the intermediate detection model may include:
and screening an output channel of the current convolutional layer corresponding to the current pruning coefficient in the coefficients to be pruned and an input channel of the next convolutional layer of the current convolutional layer from all channels of all convolutional layers of the intermediate detection model, and taking the output channel and the input channel as the channels to be pruned.
Optionally, the model generation method may further include:
and acquiring training samples, and performing batch normalization layer-based sparse training on the original detection model based on a plurality of training samples to obtain an intermediate detection model.
Optionally, the target loss function in the original detection model is composed of an original loss function and an L1 regular constraint function, and the L1 regular constraint function includes a loss function subjected to L1 regular constraint on each scaling coefficient.
Optionally, the target loss function L is represented by the following formula:
$$L = \sum_{(x, y)} l(f(x, W), y) + \lambda \sum_{\gamma} g(\gamma)$$
where x is the sample image, y is the sample labeling result, W is the parameter value in the original detection model, f(x, W) is the sample prediction result of the known target in the sample image, γ is the scaling coefficient, λ is the penalty coefficient, l(·) is the original loss function, and g(·) is the L1 regular constraint function.
Optionally, performing channel pruning on a channel to be pruned to generate a target detection model, which may include:
channel pruning is carried out on a channel to be pruned to obtain a pruning detection model;
and carrying out fine tuning training on the pruning detection model to generate a target detection model.
Optionally, the original detection model includes a Single Shot MultiBox Detector (SSD), or the original detection model includes the SSD and a backbone network of the SSD includes an inception_v3 structure.
In a second aspect, an embodiment of the present invention further provides a target detection method, which may include:
acquiring an image to be detected and a target detection model generated according to any one of the methods;
and inputting the image to be detected into the target detection model, and obtaining a target detection result of the target to be detected in the image to be detected according to an output result of the target detection model.
In a third aspect, an embodiment of the present invention further provides a model generation apparatus, which may include:
a first acquisition module, configured to acquire the scaling coefficient of a batch normalization layer in an intermediate detection model after preliminary training is completed, wherein the intermediate detection model is obtained by training an original detection model based on a plurality of training samples, and the training samples comprise sample images and sample labeling results of known targets in the sample images;
a first screening module, configured to screen out the coefficients to be pruned from the scaling coefficients according to the numerical values of the scaling coefficients;
and a model generation module, configured to screen out the channel to be pruned corresponding to the coefficient to be pruned from each channel of the intermediate detection model, and to perform channel pruning on the channel to be pruned to generate the target detection model.
In a fourth aspect, an embodiment of the present invention further provides an object detection apparatus, which may include:
the second acquisition module is used for acquiring an image to be detected and a target detection model generated according to any one of the methods;
and the target detection module is used for inputting the image to be detected into the target detection model and obtaining a target detection result of the target to be detected in the image to be detected according to an output result of the target detection model.
In a fifth aspect, an embodiment of the present invention further provides an apparatus, which may include:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the model generation method or the object detection method provided by any of the embodiments of the present invention.
In a sixth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the model generation method or the target detection method provided in any embodiment of the present invention.
According to the technical scheme of the embodiment of the invention, the scaling coefficients of the batch normalization layer in the intermediate detection model after preliminary training are obtained, and the coefficients to be pruned can be screened out from the scaling coefficients according to their values. Because the coefficients to be pruned correspond to the channels to be pruned, the channels to be pruned can be screened out from the channels of the intermediate detection model, and channel pruning is performed on them to generate the target detection model. The technical scheme combines channel pruning with the intermediate detection model: channel pruning is performed on the intermediate detection model according to its scaling coefficients after preliminary training is completed, thereby improving the detection speed of the model by compressing the model.
Drawings
FIG. 1 is a flow chart of a model generation method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a model generation method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a target detection method according to a third embodiment of the present invention;
FIG. 4a is a flowchart illustrating compression of a model in a target detection method according to a third embodiment of the present invention;
FIG. 4b is a flowchart illustrating the pruning of a model in a target detection method according to a third embodiment of the present invention;
FIG. 5 is a block diagram showing a model generating apparatus according to a fourth embodiment of the present invention;
FIG. 6 is a block diagram of a target detection apparatus according to a fifth embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a model generation method according to a first embodiment of the present invention. The embodiment is applicable to the situation of compressing the deep learning model in the target detection technology. The method can be executed by a model generation device provided by the embodiment of the invention, the device can be realized by software and/or hardware, and the device can be integrated on various electronic devices.
Referring to fig. 1, the method of the embodiment of the present invention specifically includes the following steps:
S110, obtaining a scaling coefficient of a batch normalization layer in an intermediate detection model after preliminary training is completed, wherein the intermediate detection model is obtained by training an original detection model based on a plurality of training samples, and the training samples comprise sample images and sample labeling results of known targets in the sample images.
The original detection model is an untrained model for visual detection among deep learning models. Such models can be divided into anchor-based models, anchor-free models, and models that fuse the two; the difference lies in whether anchors are used to extract candidate boxes. An anchor, also referred to as an anchor box, is one of a set of rectangular boxes obtained by running a clustering algorithm on the training samples before model training.
Specifically, anchor-based original detection models include Faster R-CNN, SSD (Single Shot MultiBox Detector), YoloV2, YoloV3, and the like; anchor-free original detection models include CornerNet, ExtremeNet, CenterNet, FCOS, and the like; original detection models that fuse an anchor-based branch and an anchor-free branch include FSAF, SFace, GA-RPN, and the like. In particular, the SSD is a single-stage (one-stage) detection model: it has no candidate region generation (region proposal) stage and directly generates the class probabilities and position coordinates of the object to be detected, so it has a great advantage in detection speed and is better suited to unmanned delivery vehicles and mobile terminals. Thus, as an optional example, the original detection model may be an SSD, and on this basis, the backbone network of the SSD may be an inception_v3 structure.
Thus, after the original detection model is trained based on a plurality of training samples, an intermediate detection model after preliminary training can be obtained, the training samples can include sample images and sample labeling results of known targets in the sample images, the sample images can be a frame of image, a video sequence and the like, and the sample labeling results can be category probabilities and position coordinates.
It should be noted that each convolutional layer of the original detection model is immediately followed by a Batch Normalization (BN) layer. The BN layer normalizes the scale of the output of each convolutional layer, which helps avoid vanishing and exploding gradients during training. Each BN layer includes scaling coefficients (gamma coefficients) and offset coefficients (beta coefficients), and the number of scaling coefficients in a BN layer equals the number of channels in the convolutional layer adjacent to it; that is, each scaling coefficient corresponds to one channel in the convolutional layer. For example, if a BN layer has 32 scaling coefficients, the convolutional layer adjacent to it has 32 channels, and the BN layer also has 32 channels. In addition, in both the training stage and the application stage of the original detection model, each scaling coefficient is multiplied by the corresponding channel in the convolutional layer, so the value of a scaling coefficient directly affects whether the corresponding channel in the convolutional layer plays a role. Therefore, the scaling coefficients of the batch normalization layers in the intermediate detection model can be obtained, and which channels of the intermediate detection model are to be pruned can be determined according to the scaling coefficients.
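Illustratively, the per-channel role of the scaling coefficient can be sketched as follows. This is a minimal sketch rather than the implementation of the embodiment: the function name `batch_norm_channel`, the activation values, and the `eps` constant are assumptions introduced for illustration only.

```python
import math


def batch_norm_channel(values, gamma, beta, eps=1e-5):
    """Normalize one channel, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# Hypothetical activations for two output channels of one convolutional layer.
channels = [[0.5, 1.5, -2.0, 3.0], [4.0, -1.0, 0.0, 2.0]]
gammas = [0.0, 1.2]  # one scaling coefficient per channel
betas = [0.1, -0.3]  # one offset coefficient per channel

outputs = [
    batch_norm_channel(c, g, b) for c, g, b in zip(channels, gammas, betas)
]
```

With a scaling coefficient of 0, the first channel collapses to its constant offset and no longer carries any information from the convolution, while the second channel still varies with its input.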
And S120, screening out coefficients to be pruned from the scaling coefficients according to the numerical values of the scaling coefficients.
There are multiple ways to screen out the coefficients to be pruned from the scaling coefficients according to their values. For example, the values of the scaling coefficients can be sorted, the median of the scaling coefficients can be obtained from the sorting result, and the coefficients to be pruned can be screened out according to the median. As another example, the mean of the values of the scaling coefficients can be calculated, and the coefficients to be pruned can be screened out according to the mean. As yet another example, a pruning threshold of the scaling coefficients can be obtained according to the values of the scaling coefficients and a preset pruning rate, and the coefficients to be pruned can be screened out according to the pruning threshold, where a coefficient to be pruned may be a scaling coefficient whose value is less than or equal to the pruning threshold; and so on.
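Illustratively, the pruning-rate variant may be sketched as follows. The function names are hypothetical, and two points are assumptions not stated in the embodiment: coefficients are compared by absolute value (a trained scaling coefficient may be negative), and the threshold is taken as the k-th smallest magnitude, where k is the number of coefficients times the pruning rate, rounded down.

```python
def pruning_threshold(scaling_coefficients, pruning_rate):
    """Return the k-th smallest magnitude, or None if nothing is pruned."""
    if not 0.0 <= pruning_rate < 1.0:
        raise ValueError("pruning_rate must be in [0, 1)")
    magnitudes = sorted(abs(g) for g in scaling_coefficients)
    num_to_prune = int(len(magnitudes) * pruning_rate)
    if num_to_prune == 0:
        return None
    return magnitudes[num_to_prune - 1]


def select_coefficients_to_prune(scaling_coefficients, pruning_rate):
    """Return indices of coefficients whose magnitude is <= the threshold."""
    threshold = pruning_threshold(scaling_coefficients, pruning_rate)
    if threshold is None:
        return []
    return [
        i for i, g in enumerate(scaling_coefficients) if abs(g) <= threshold
    ]


# Hypothetical scaling coefficients gathered from the BN layers.
gammas = [0.9, 0.01, 0.4, 0.002, 0.7, 0.05, 0.3, 0.6]
to_prune = select_coefficients_to_prune(gammas, pruning_rate=0.5)
print(to_prune)  # [1, 3, 5, 6]
```

Because the comparison is "less than or equal to", ties at the threshold can mark slightly more coefficients than the preset pruning rate implies.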
S130, screening out channels to be pruned corresponding to the coefficients to be pruned from all channels of the intermediate detection model, and performing channel pruning on the channels to be pruned to generate a target detection model.
Each scaling coefficient is in one-to-one correspondence with one channel in one convolutional layer, and one channel in one convolutional layer is also in one-to-one correspondence with one channel in a BN layer adjacent to the convolutional layer, so that channels to be pruned can be screened from the channels of the intermediate detection model according to the coefficients to be pruned, and the channels to be pruned are channels with lower importance, which may be one channel in one convolutional layer or one channel in one BN layer.
Furthermore, channel pruning can be carried out on the channel to be pruned to generate a target detection model, so that the effect of model compression is realized. Wherein, channel pruning (channel pruning) is a mode of simplifying the model by deleting redundant channels in the model, and is a structured compression mode; moreover, after channel pruning is performed on the channels to be pruned, the convolution kernels corresponding to the channels to be pruned are correspondingly deleted, so that the operation amount of convolution is also reduced through the channel pruning. Illustratively, if a convolutional layer is 32 channels, the BN layer immediately adjacent to the convolutional layer is also 32 channels, each channel in the BN layer includes a scaling factor and an offset factor, and the coefficient to be pruned is selected from the scaling factors, so that which channels in the BN layer are channels to be pruned can be determined according to the coefficient to be pruned, and accordingly, which channels in the convolutional layer are channels to be pruned can be determined.
Optionally, channel pruning may be implemented as follows: from the channels of the convolutional layers of the intermediate detection model, the output channel of the current convolutional layer that corresponds to the current coefficient to be pruned is screened out, together with the matching input channel of the convolutional layer that follows the current convolutional layer, and the output channel and the input channel are taken as the channels to be pruned. This is because the output channels of the current convolutional layer are the input channels of the next convolutional layer. Illustratively, if the output channels of the current convolutional layer are numbered 1 to 32, the input channels of the next convolutional layer are also numbered 1 to 32; if output channel 17 of the current convolutional layer, which corresponds to the current coefficient to be pruned, is a channel to be pruned, then input channel 17 of the next convolutional layer is also a channel to be pruned.
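Illustratively, removing an output channel together with the matching input channel of the next layer may be sketched as follows. This is a minimal sketch under assumptions: weights are represented as nested lists indexed as `weight[output_channel][input_channel]`, each entry stands in for one convolution kernel, and the function and variable names are hypothetical.

```python
def prune_channel_pair(current_weight, bn_gamma, bn_beta, next_weight,
                       pruned_channels):
    """Drop output channels of the current layer, their BN parameters,
    and the matching input channels of the next layer."""
    pruned = set(pruned_channels)
    keep = [c for c in range(len(current_weight)) if c not in pruned]
    new_current = [current_weight[c] for c in keep]
    new_gamma = [bn_gamma[c] for c in keep]
    new_beta = [bn_beta[c] for c in keep]
    new_next = [[row[c] for c in keep] for row in next_weight]
    return new_current, new_gamma, new_beta, new_next


# Current layer: 4 output channels, 3 input channels (placeholder kernels).
current_weight = [[f"cur_o{o}_i{i}" for i in range(3)] for o in range(4)]
# Next layer: 2 output channels, 4 input channels.
next_weight = [[f"next_o{o}_i{i}" for i in range(4)] for o in range(2)]
bn_gamma = [0.8, 0.5, 0.0, 0.9]
bn_beta = [0.1, 0.0, 0.2, -0.1]

new_current, new_gamma, new_beta, new_next = prune_channel_pair(
    current_weight, bn_gamma, bn_beta, next_weight, pruned_channels=[2]
)
print(len(new_current), len(new_next[0]))  # 3 3
```

The sketch does not compensate for the offset coefficient of a removed channel, so the pruned model is expected to need retraining before its accuracy is comparable.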
According to the technical scheme of the embodiment of the invention, the scaling coefficients of the batch normalization layer in the intermediate detection model after preliminary training are obtained, and the coefficients to be pruned can be screened out from the scaling coefficients according to their values. Because the coefficients to be pruned correspond to the channels to be pruned, the channels to be pruned can be screened out from the channels of the intermediate detection model, and channel pruning is performed on them to generate the target detection model. The technical scheme combines channel pruning with the intermediate detection model: channel pruning is performed on the intermediate detection model according to its scaling coefficients after preliminary training is completed, thereby improving the detection speed of the model by compressing the model.
In an optional technical solution, the model generation method may further include: screening prunable convolutional layers from the convolutional layers of the intermediate detection model, wherein the prunable convolutional layers comprise convolutional layers other than 1 x 1 convolutional layers and/or convolutional layers in classification regression branches; screening out target scaling coefficients corresponding to the prunable convolutional layers from the scaling coefficients; correspondingly, screening out the coefficients to be pruned from the scaling coefficients according to the values of the scaling coefficients may include: screening out the coefficients to be pruned from the target scaling coefficients according to the values of the target scaling coefficients.
The original detection model generally includes two parts, a backbone network and classification regression branches. The backbone network is used to extract feature maps, and the classification regression branches are a classification branch and a regression branch that branch off from the backbone network and are used to classify or regress the extracted feature maps. Since the categories of the classification regression are usually fixed, the convolutional layers in the classification regression branches can be kept unchanged as far as possible, which keeps the output dimensions fixed and simplifies the implementation code. Thus, the convolutional layers other than the 1 x 1 convolutional layers and/or the convolutional layers in the classification regression branches can be taken as prunable convolutional layers, and the coefficients to be pruned can be screened out from the target scaling coefficients of the prunable convolutional layers.
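Illustratively, the screening of prunable convolutional layers may be sketched as follows. The `ConvLayerInfo` record, the layer names, and the two flags are hypothetical and introduced only for illustration; the flags mirror the "and/or" wording, so either exclusion can be applied alone or both together.

```python
from dataclasses import dataclass


@dataclass
class ConvLayerInfo:
    name: str
    kernel_size: int
    in_head: bool  # True if the layer is in a classification/regression branch


def select_prunable_layers(layers, skip_1x1=True, skip_head=True):
    """Return names of layers that remain eligible for channel pruning."""
    prunable = []
    for layer in layers:
        if skip_1x1 and layer.kernel_size == 1:
            continue
        if skip_head and layer.in_head:
            continue
        prunable.append(layer.name)
    return prunable


# Hypothetical layer inventory of an intermediate detection model.
layers = [
    ConvLayerInfo("backbone.conv1", 3, False),
    ConvLayerInfo("backbone.reduce1", 1, False),
    ConvLayerInfo("backbone.conv2", 3, False),
    ConvLayerInfo("head.cls", 3, True),
    ConvLayerInfo("head.reg", 3, True),
]
prunable = select_prunable_layers(layers)
print(prunable)  # ['backbone.conv1', 'backbone.conv2']
```

Only the scaling coefficients of the BN layers that follow the returned layers would then serve as target scaling coefficients.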
An optional technical scheme is that after channel pruning is carried out on a channel to be pruned, a pruning detection model can be generated firstly; and then, carrying out fine tuning training on the pruning detection model to generate a target detection model. That is, the simplified pruning detection model after channel pruning can be subjected to fine tuning training, so that the detection effect is recovered, that is, the original performance of the model is maintained as much as possible while the model is compressed. The fine tuning training process may be: acquiring historical images and historical labeling results of known targets in the historical images, and taking the historical images and the historical labeling results as a group of historical samples; training the pruning detection model based on a plurality of historical samples to obtain a target detection model. In addition, the historical sample and the training sample are the same sample data in general, that is, in the fine adjustment training, the historical image and the sample image may be the same image, and the historical labeling result and the sample labeling result may be the same labeling result.
Example two
Fig. 2 is a flowchart of a model generation method provided in the second embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the model generating method may further include: and acquiring training samples, and performing batch normalization layer-based sparse training on the original detection model based on a plurality of training samples to obtain an intermediate detection model. The same or corresponding terms as those in the above embodiments are not explained in detail herein.
Referring to fig. 2, the method of the present embodiment may specifically include the following steps:
S210, obtaining training samples, and performing batch normalization layer-based sparse training on the original detection model based on a plurality of training samples to obtain an intermediate detection model, wherein the training samples comprise sample images and sample labeling results of known targets in the sample images.
When the original detection model is subjected to BN-layer-based sparse training, an intermediate detection model with sparse BN layers can be obtained; that is, sparsity is introduced into the dense connections of the original detection model. Illustratively, one option for BN-layer-based sparse training is to apply an L1 regularization constraint to the scaling coefficients in the original detection model, so that the original detection model adjusts its parameters toward structural sparsity. Here the scaling coefficients (gamma coefficients) in the BN layer act as switch coefficients of the information flow channels and control whether each information flow channel is open or closed.
The reason is that applying the L1 regularization constraint to the scaling coefficients during model training drives more scaling coefficients to 0. In both the model training stage and the application stage, each scaling coefficient is multiplied by the corresponding channel in the convolutional layer; when more scaling coefficients are 0, the channels in the convolutional layers corresponding to those scaling coefficients no longer play any role. In other words, greatly compressing the scaling coefficients already has the effect of channel pruning. On this basis, when the channels to be pruned are screened according to the preset pruning rate, the more scaling coefficients with a value of 0 there are in the intermediate detection model, the less likely it is that a channel whose scaling coefficient is non-zero will be pruned. The network structure of the generated target detection model is then more consistent with that of the intermediate detection model, so the detection performance of the two is more consistent; that is, the effect of model compression is achieved while the detection performance is ensured.
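To make the "switch" behaviour concrete, the following minimal sketch applies a standard batch normalization transform to one channel. The function name, the sample activations and the `eps` value are assumptions chosen for illustration; the embodiment does not specify them.

```python
import math


def batch_norm_channel(values, gamma, beta, eps=1e-5):
    """Normalize one channel over a batch, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


activations = [0.5, -1.2, 3.0, 0.7]   # illustrative channel activations

# With a non-zero scaling coefficient the output still varies with the input.
active = batch_norm_channel(activations, gamma=1.5, beta=0.1)

# With a scaling coefficient of 0 every output equals beta, so the channel
# carries no information about its input.
closed = batch_norm_channel(activations, gamma=0.0, beta=0.1)

print(active)
print(closed)
```

In this sketch the closed channel outputs the same constant for every input, which is why such a channel is a natural candidate for pruning.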
On this basis, the target loss function of the original detection model can be composed of an original loss function and an L1 regularization constraint function, and the L1 regularization constraint function can include a loss term that applies the L1 regularization constraint to each scaling coefficient. That is, an L1 regularization constraint term on the scaling coefficients of the BN layer is introduced on top of the original loss function. During training, the target loss function is minimized, and each parameter value in the model is adjusted according to the solving result.
Alternatively, the target loss function L may be represented by the following formula:
$$L = \sum_{(x, y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma} g(\gamma)$$
where x is the sample image, y is the sample labeling result, W is the parameter value in the original detection model, f(x, W) is the sample prediction result of the known target in the sample image, γ is the scaling coefficient, λ is the penalty coefficient, l() is the original loss function, and g() is the L1 regularization constraint function, i.e. g(γ) = |γ|. Moreover, since the L1 regularization constraint function acts only on the scaling coefficients of the BN layer, when the gradient is back-propagated for the update, the product of the sign of the scaling coefficient, sign(γ), and the penalty coefficient λ needs to be added to the gradient γ_grad of the scaling coefficient. The formula is as follows:
$$\gamma_{grad} = \gamma_{grad} + \lambda \cdot \mathrm{sign}(\gamma)$$
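The gradient correction above can be illustrated with a small sketch. The plain gradient-descent step, the learning rate `lr` and the value of `lam` are assumptions made for the example; the task gradient is set to zero only to isolate the effect of the penalty term.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def add_l1_subgradient(gammas, gamma_grads, lam):
    """gamma_grad = gamma_grad + lam * sign(gamma), applied to each coefficient."""
    return [grad + lam * sign(gamma) for gamma, grad in zip(gammas, gamma_grads)]


def sgd_step(gammas, grads, lr):
    """Plain gradient-descent update (an assumption; any optimizer could be used)."""
    return [gamma - lr * grad for gamma, grad in zip(gammas, grads)]


gammas = [0.8, -0.3, 0.0, 0.05]      # illustrative BN scaling coefficients
task_grads = [0.0, 0.0, 0.0, 0.0]    # zero task gradient, to isolate the penalty
lam = 0.01                           # hypothetical penalty coefficient
lr = 0.1                             # hypothetical learning rate

grads = add_l1_subgradient(gammas, task_grads, lam)
updated = sgd_step(gammas, grads, lr)
print(grads)
print(updated)
```

With the task gradient switched off, each non-zero coefficient moves toward 0 by lr * lam per step, which is the sparsifying pressure the L1 term is meant to provide.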
S220, obtaining the scaling coefficients of the batch normalization layer in the intermediate detection model, and screening out the coefficient to be pruned from each scaling coefficient according to the numerical value of each scaling coefficient.
S230, screening out channels to be pruned corresponding to the coefficients to be pruned from all channels of the intermediate detection model, and performing channel pruning on the channels to be pruned to generate a target detection model.
According to the technical scheme of the embodiment of the invention, BN-layer-based sparse training is performed on the original detection model based on a plurality of training samples, so that an intermediate detection model with sparse BN layers can be obtained; the target detection model can be generated by performing channel pruning on the intermediate detection model, and the detection performance of the intermediate detection model and that of the target detection model remain consistent, that is, the effect of model compression is achieved while the detection performance is ensured.
Example three
Fig. 3 is a flowchart of a target detection method provided in the third embodiment of the present invention. This embodiment is applicable to performing target detection on an image to be detected based on the target detection model generated by the method of any of the above embodiments. The method can be executed by the target detection device provided by the embodiment of the present invention; the device can be implemented in software and/or hardware and can be integrated on various electronic devices.
Referring to fig. 3, the method of the embodiment of the present invention specifically includes the following steps:
S310, acquiring an image to be detected and a target detection model generated according to the method of any embodiment.
The image to be detected may be a frame image, a video sequence, etc., and the target detection model may be a visual detection model generated according to the method described in any of the above embodiments. For example, one method of generating the target detection model may be as shown in fig. 4a:
First, based on the original SSD detection model, an inception_v3 structure is used as the backbone network. During model training, with the original parameter settings retained, an L1 regularization constraint is applied to the gamma coefficients in the BN layers adjacent to the convolutional layers, so that the model adjusts its parameters toward structural sparsity and the BN layers become sparse. Second, after the BN-layer-based sparse training is completed, the channels in the corresponding convolutional layers and BN layers of the preliminarily trained intermediate detection model can be cut according to the scaling coefficients of the BN layers and the preset pruning rate, which simplifies the model and improves the detection speed. Finally, fine-tuning training is performed on the simplified model after channel pruning to recover the detection effect.
On this basis, the above channel pruning process can be as shown in fig. 4b. First, it is determined which convolutional layers are to undergo channel pruning. The embodiment of the present invention applies two restrictions: neither the 1 × 1 convolutional layers nor the convolutional layers in the classification regression branch are pruned, which keeps the output dimension unchanged. Second, the gamma coefficients in the BN layers corresponding to the remaining pruneable convolutional layers are collected and sorted, and a pruning threshold for the gamma coefficients is calculated according to the preset pruning rate. Third, the channels in each convolutional layer and BN layer are selected according to the pruning threshold, and the channels whose gamma coefficients are larger than the pruning threshold are retained, thereby determining the masks (MASK) of the channels to be retained in the BN layer and in the convolutional layer adjacent to the BN layer. Finally, the corresponding channels in the convolutional layer and the BN layer are retained according to the mask, and the channels that are not retained are pruned.
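The threshold and mask steps can be sketched as below. The function names, layer names and gamma values are hypothetical. One assumption is worth flagging: the sketch ranks coefficients by absolute value, since a gamma coefficient may be negative, whereas the text above speaks only of the value of the coefficient.

```python
def pruning_threshold(all_gammas, prune_rate):
    """Return the largest |gamma| among the lowest prune_rate fraction of coefficients."""
    ranked = sorted(abs(g) for g in all_gammas)
    num_pruned = int(len(ranked) * prune_rate)
    if num_pruned == 0:
        return -1.0  # below every |gamma|, so nothing is pruned
    return ranked[num_pruned - 1]


def build_masks(layer_gammas, threshold):
    """Mark a channel True (retained) when its |gamma| exceeds the threshold."""
    return {name: [abs(g) > threshold for g in gammas]
            for name, gammas in layer_gammas.items()}


# Illustrative gamma coefficients of the BN layers of two pruneable layers.
layer_gammas = {
    "backbone.conv1": [0.9, 0.01, 0.4, 0.0],
    "backbone.conv3": [0.02, 0.7, 0.3, 0.6],
}
all_gammas = [g for gammas in layer_gammas.values() for g in gammas]

threshold = pruning_threshold(all_gammas, prune_rate=0.5)
masks = build_masks(layer_gammas, threshold)
print(threshold)
print(masks)
```

Because the threshold is computed over all pruneable layers together, individual layers may lose more or fewer channels than the preset rate; a practical implementation would probably also guard against a layer losing every channel.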
S320, inputting the image to be detected into the target detection model, and obtaining a target detection result of the target to be detected in the image to be detected according to an output result of the target detection model.
As an alternative example, continuing with the example in the background, the above target detection method may be applied to the detection of visual targets on unmanned delivery vehicles in the field of intelligent logistics. Although most on-board processors of unmanned delivery vehicles are built on the Xavier platform, whose computing resources are relatively limited, the target detection model involved in the target detection method is small in scale and fast in detection, so the unmanned delivery vehicle can still operate in a truly unmanned manner even under limited computing resources. Moreover, because structured pruning is carried out at the channel level, the generated compact model can run directly on mature frameworks such as PyTorch, MXNet and TensorFlow, or on hardware platforms such as GPU and FPGA, without the support of a special algorithm library, which makes it more convenient to apply.
To further verify the detection accuracy of the target detection method, the method was tested on a 5-class subset (car, pedestrian, truck, bus, rider) of the BDD data set, and the quantitative results are shown in the following two tables. As can be seen from the data in the tables, the structured-pruning target detection method of the embodiment of the present invention achieves a relatively significant compression effect while keeping part of the convolutional layers and BN layers unchanged, and the detection result mAP (the average over the 5-class subset) decreases only slightly.
| Preset pruning rate | Number of model parameters | Model computation (FLOPs) |
| --- | --- | --- |
| base | 28.29M | 9.45G |
| 30% | 21.55M | 5.97G |
| 50% | 16.76M | 5.60G |
| 80% | 11.24M | 4.94G |

| Preset pruning rate | mAP | car | pedestrian | truck | bus | rider |
| --- | --- | --- | --- | --- | --- | --- |
| base | 37.56 | 56.66 | 22.95 | 44.56 | 44.37 | 19.28 |
| 80% | 37.32 | 56.55 | 23.26 | 44.11 | 44.13 | 18.55 |
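As a quick check of the "relatively significant compression effect", the relative reductions can be computed from the reported figures. The sketch below uses only the numbers from the two tables; the dictionary layout and function name are illustrative.

```python
# Figures copied from the two tables (parameters in millions, FLOPs in giga).
results = {
    "base": {"params_m": 28.29, "flops_g": 9.45},
    "30%": {"params_m": 21.55, "flops_g": 5.97},
    "50%": {"params_m": 16.76, "flops_g": 5.60},
    "80%": {"params_m": 11.24, "flops_g": 4.94},
}
map_scores = {"base": 37.56, "80%": 37.32}


def reduction(base, value):
    """Relative reduction with respect to the unpruned baseline."""
    return (base - value) / base


base = results["base"]
summary = {
    rate: (reduction(base["params_m"], row["params_m"]),
           reduction(base["flops_g"], row["flops_g"]))
    for rate, row in results.items() if rate != "base"
}
map_drop = map_scores["base"] - map_scores["80%"]

for rate, (param_cut, flops_cut) in summary.items():
    print(f"{rate}: parameters -{param_cut:.1%}, FLOPs -{flops_cut:.1%}")
print(f"mAP drop at 80%: {map_drop:.2f}")
```

At the 80% preset pruning rate this works out to roughly 60% fewer parameters and 48% fewer FLOPs for an mAP drop of 0.24. The overall parameter reduction being smaller than the preset rate is consistent with some layers being left unpruned.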
According to the technical scheme of the embodiment of the invention, the target detection can be carried out on the image to be detected based on the generated target detection model, and the target detection model is a simplified model after model compression, so that the detection speed of the target to be detected in the image to be detected can be effectively improved, and the original performance of the model can be kept as much as possible.
Example four
Fig. 5 is a block diagram of a model generation apparatus according to a fourth embodiment of the present invention, which is configured to execute the model generation method according to any of the above embodiments. The apparatus and the model generation method of each embodiment belong to the same inventive concept; for details not described in the embodiment of the model generation apparatus, reference may be made to the embodiments of the model generation method. Referring to fig. 5, the apparatus may specifically include: a first obtaining module 410, a first screening module 420, and a model generation module 430.
The first obtaining module 410 is configured to obtain a scaling coefficient of a batch normalization layer in an intermediate detection model after preliminary training is completed, where the intermediate detection model is obtained by training an original detection model based on multiple training samples, and the training samples include sample images and sample labeling results of known targets in the sample images;
the first screening module 420 is configured to screen out coefficients to be pruned from the scaling coefficients according to the magnitude of each scaling coefficient;
and the model generating module 430 is configured to screen out a channel to be pruned corresponding to the coefficient to be pruned from each channel of the intermediate detection model, and perform channel pruning on the channel to be pruned, so as to generate the target detection model.
Optionally, the first screening module 420 may be specifically configured to:
and obtaining the pruning threshold value of the scaling coefficient according to the numerical value of each scaling coefficient and the preset pruning rate, and screening out the coefficient to be pruned from each scaling coefficient according to the pruning threshold value.
Optionally, on the basis of the above apparatus, the apparatus may further include:
a second screening module for screening pruneable convolutional layers from the convolutional layers of the intermediate detection model, the pruneable convolutional layers including convolutional layers other than 1 x 1 convolutional layer and/or convolutional layers in the classification regression branch;
the third screening module is used for screening out the target scaling coefficients corresponding to the pruneable convolutional layers from the scaling coefficients;
correspondingly, the first screening module 420 may be specifically configured to:
and screening out the coefficient to be pruned from each target scaling coefficient according to the numerical value of each target scaling coefficient.
Optionally, the model generating module 430 may specifically include:
a channel-to-be-pruned screening unit, which is used for screening, from the channels of each convolutional layer of the intermediate detection model, the output channel of the current convolutional layer corresponding to the current pruning coefficient among the coefficients to be pruned and the input channel of the convolutional layer following the current convolutional layer, and taking the output channel and the input channel as the channels to be pruned.
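The pairing of an output channel of the current convolutional layer with the matching input channel of the next convolutional layer can be sketched with nested lists. The weight layout `weight[out_channel][in_channel]` and all values are assumptions for illustration; each entry stands in for a whole convolution kernel.

```python
def prune_out_channels(weight, keep):
    """Drop the output channels (rows) whose mask entry is False."""
    return [row for row, kept in zip(weight, keep) if kept]


def prune_in_channels(weight, keep):
    """Drop the input channels (columns) whose mask entry is False."""
    return [[w for w, kept in zip(row, keep) if kept] for row in weight]


def prune_bn(params, keep):
    """Drop the per-channel BN parameters whose mask entry is False."""
    return [p for p, kept in zip(params, keep) if kept]


# Current layer: 3 output channels, 2 input channels.
current_weight = [[1, 2], [3, 4], [5, 6]]
bn_gamma = [0.9, 0.0, 0.4]
# Next layer: 2 output channels, 3 input channels (fed by the current layer).
next_weight = [[10, 11, 12], [13, 14, 15]]

keep = [True, False, True]   # mask derived from the BN gamma coefficients

pruned_current = prune_out_channels(current_weight, keep)
pruned_gamma = prune_bn(bn_gamma, keep)
pruned_next = prune_in_channels(next_weight, keep)
print(pruned_current, pruned_gamma, pruned_next)
```

After pruning, the number of output channels of the current layer again equals the number of input channels of the next layer, so the two layers remain compatible.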
Optionally, on the basis of the above apparatus, the apparatus may further include:
and the third obtaining module is used for obtaining the training samples and performing batch normalization layer-based sparse training on the original detection model based on the training samples to obtain an intermediate detection model.
Optionally, the target loss function in the original detection model is composed of an original loss function and an L1 regularization constraint function, and the L1 regularization constraint function includes a loss term that applies the L1 regularization constraint to each scaling coefficient.
Alternatively, the target loss function L is represented by the following formula:
$$L = \sum_{(x, y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma} g(\gamma)$$
where x is the sample image, y is the sample labeling result, W is the parameter value in the original detection model, f(x, W) is the sample prediction result of the known target in the sample image, γ is the scaling coefficient, λ is the penalty coefficient, l() is the original loss function, and g() is the L1 regularization constraint function.
Optionally, the model generating module 430 may specifically include:
the channel pruning unit is used for carrying out channel pruning on a channel to be pruned to obtain a pruning detection model;
and the fine tuning training unit is used for performing fine tuning training on the pruning detection model to generate a target detection model.
In the model generation device provided by the fourth embodiment of the present invention, the first obtaining module obtains the scaling coefficients of the batch normalization layer in the intermediate detection model after the preliminary training is completed; the first screening module screens out the coefficients to be pruned from the scaling coefficients according to the numerical value of each scaling coefficient; and, because the coefficients to be pruned correspond to the channels to be pruned, the model generation module screens out the channels to be pruned corresponding to the coefficients to be pruned from the channels of the intermediate detection model and performs channel pruning on them to generate the target detection model. The device combines channel pruning with the intermediate detection model: channel pruning is performed on the intermediate detection model according to the scaling coefficients in the intermediate detection model after preliminary training, which improves the detection speed of the model by compressing it.
The model generation device provided by the embodiment of the invention can execute the model generation method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the model generating apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Example five
Fig. 6 is a block diagram of a target detection apparatus according to a fifth embodiment of the present invention, which is configured to execute the target detection method according to any of the embodiments described above. The apparatus and the target detection method of each embodiment belong to the same inventive concept; for details not described in the embodiment of the target detection apparatus, reference may be made to the embodiment of the target detection method. Referring to fig. 6, the apparatus may specifically include: a second obtaining module 510 and a target detection module 520.
The second obtaining module 510 is configured to obtain an image to be detected and a target detection model generated according to the method in any one of the first embodiment and the second embodiment;
and the target detection module 520 is configured to input the image to be detected into the target detection model, and obtain a target detection result of the target to be detected in the image to be detected according to an output result of the target detection model.
The target detection device provided by the fifth embodiment of the invention can perform target detection on the image to be detected based on the generated target detection model through the mutual cooperation of the second acquisition module and the target detection module, and can effectively improve the detection speed of the target to be detected in the image to be detected and keep the original performance of the model as much as possible because the target detection model is a simplified model after model compression.
The target detection device provided by the embodiment of the invention can execute the target detection method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the object detection apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Example six
Fig. 7 is a schematic structural diagram of an apparatus according to a sixth embodiment of the present invention, as shown in fig. 7, the apparatus includes a memory 610, a processor 620, an input device 630, and an output device 640. The number of processors 620 in the device may be one or more, and one processor 620 is taken as an example in fig. 7; the memory 610, processor 620, input device 630, and output device 640 in the apparatus may be connected by a bus or other means, such as by bus 650 in fig. 7.
The memory 610 may be used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the model generation method in the embodiment of the present invention (e.g., the first obtaining module 410, the first screening module 420, and the model generation module 430 in the model generation apparatus), or program instructions/modules corresponding to the target detection method in the embodiment of the present invention (e.g., the second obtaining module 510 and the target detection module 520 in the target detection apparatus). The processor 620 executes the various functional applications and data processing of the device, i.e., implements the above-described model generation method or target detection method, by running the software programs, instructions, and modules stored in the memory 610.
The memory 610 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to use of the device, and the like. Further, the memory 610 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 610 may further include memory located remotely from processor 620, which may be connected to the device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the device. The output device 640 may include a display device such as a display screen.
Example seven
A seventh embodiment of the present invention provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for model generation, the method comprising:
obtaining a scaling coefficient of a batch normalization layer in an intermediate detection model after preliminary training is completed, wherein the intermediate detection model is obtained after an original detection model is trained on the basis of a plurality of training samples, and the training samples comprise sample images and sample labeling results of known targets in the sample images;
screening out coefficients to be pruned from the scaling coefficients according to the numerical values of the scaling coefficients;
and screening out channels to be pruned corresponding to the coefficients to be pruned from all channels of the intermediate detection model, and performing channel pruning on the channels to be pruned to generate a target detection model.
Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the model generation method provided by any embodiment of the present invention.
Example eight
An eighth embodiment of the present invention provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for object detection, the method comprising:
acquiring an image to be detected and a target detection model generated according to the method of the seventh embodiment;
and inputting the image to be detected into the target detection model, and obtaining a target detection result of the target to be detected in the image to be detected according to an output result of the target detection model.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. With this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.