Moving target identification system and method based on FPGA
1. An FPGA-based moving target identification system, characterized by comprising a data signal acquisition module, a data signal preprocessing module, a target identification module and a data signal display module;
the data signal acquisition module comprises a CMOS camera, a data conversion unit, an acquisition data storage unit and an acquisition module control unit;
the data signal preprocessing module comprises a filtering unit, an edge detection unit, a moving target positioning unit and a preprocessing data storage unit;
the target identification module comprises a pre-processing unit, a quantization unit, a neural network detection unit and a post-processing unit;
the data signal display module comprises a display data storage unit, a display module control unit and a VGA display;
firstly, the data signal acquisition module is responsible for acquiring video images of a target area in real time, converting the acquired video images into gray images and storing the gray images in the acquisition data storage unit; the data signal preprocessing module sequentially reads out the video image data stored in the acquisition data storage unit, removes noise by median filtering to obtain all gray information of the moving target and the background, obtains a real-time binary image by edge detection, and extracts the moving target in the video sequence by the inter-frame difference method to complete the positioning of the moving target calibration frame, storing the positioned data in the preprocessing data storage unit; the target identification module then takes the content inside the moving target calibration frame in the preprocessing data storage unit as the image to be detected, inputs it into the neural network detection unit, completes the identification of the image to be detected using the image identification neural network, and stores the identification result in the display data storage unit; finally, the data signal display module displays the identification result for the moving target of the video sequence stored in the display data storage unit.
2. An FPGA-based moving target identification method, characterized by comprising the following steps:
step one: the data signal acquisition module completes real-time video image acquisition of a target area, and performs data format conversion and storage of the acquired images;
step two: the data signal preprocessing module reads out, in order, the video sequence image data stored in the acquisition data storage unit, obtains a real-time binary image through median-filter denoising and edge detection, extracts the moving target by the inter-frame difference method, and completes the positioning and storage of the moving target calibration frame;
step three: the target identification module takes the content inside the moving target calibration frame in the preprocessing data storage unit as the image to be detected, identifies the image to be detected with the neural network detection unit after it passes through the pre-processing unit and the quantization unit, and stores the identification result through the post-processing unit;
step four: the data signal display module takes out, in order, the data stored in the display data storage unit through the display module control unit, and then displays the identification result for the moving target in the video sequence on the VGA display.
3. The FPGA-based moving target identification method as claimed in claim 2, wherein the specific method of step one is as follows:
firstly, the acquisition module control unit controls the CMOS camera to shoot the target area in real time to obtain a real-time video sequence; since the output format of the CMOS camera is RGB, the data conversion unit converts the RGB format into the YCbCr format to obtain the gray image of the video sequence, using the standard conversion:
Y = 0.299R + 0.587G + 0.114B
Cb = 0.564(B - Y) + 128
Cr = 0.713(R - Y) + 128
in the formula, Y represents the luminance signal and serves as the gray value of the image pixel, Cb represents the difference between the blue component of the RGB input signal and the luminance value, Cr represents the difference between the red component of the RGB input signal and the luminance value, and R, G and B represent the gray values of the red, green and blue color components, respectively;
the video sequence images in the YCbCr format are then stored in the acquisition data storage unit;
4. The FPGA-based moving target identification method according to claim 3, wherein the specific method of step two is as follows:
firstly, the data signal preprocessing module takes the original gray images of the video sequence out of the acquisition data storage unit into the filtering unit, and filters them with a median filter to suppress the influence of environmental change and noise interference, eliminating the salt-and-pepper noise in the original gray images and preventing a low signal-to-noise ratio and degraded image quality; the specific implementation is as follows:
f(x, y) = median{ g(s, t) }, (s, t) ∈ Sxy
where g(s, t) is the original gray image of the video sequence, f(x, y) is the filtered gray image, and Sxy is the filtering window; a mask window of (2n+1) × (2n+1) points is selected, n being a positive integer, the center of the window is placed at a pixel of the original gray image, all values inside the window are sorted by magnitude, the middle value replaces the original value of that pixel, and the window slides to the next pixel until the whole image has been traversed; the noise is thus removed, yielding all gray information of the moving target and the background;
the edge detection unit then detects the edge contours in the filtered gray images of the video sequence with edge detection operators and uses them as feature information; the specific implementation is as follows:
a 3 × 3 convolution window is selected, in which Pij (i, j = 1, 2, 3) denotes the 8-neighborhood of image pixel point (x, y); edge detection operators are set for the four gradient directions 0°, 45°, 90° and 135°, and each operator is convolved with Pij to obtain the gradient values in the four directions as follows:
T0 = (P31 + 2P32 + P33) - (P11 + 2P12 + P13)
T45 = (P23 + 2P33 + P32) - (P12 + 2P11 + P21)
T90 = (P13 + 2P23 + P33) - (P11 + 2P21 + P31)
T135 = (P12 + 2P13 + P23) - (P21 + 2P31 + P32)
the final gradient T(x, y) takes the maximum of the four gradient values:
T(x, y) = max(T0, T45, T90, T135)
further, an image edge dynamic threshold t(x, y) is set according to the 8-neighborhood Pij of image pixel point (x, y); the gradient value T(x, y) of each window is compared with t(x, y), pixels whose gradient exceeds the threshold being set to 1 and all others to 0, so as to obtain the binary edge image of the video sequence;
then, the moving target positioning unit compares the values of all corresponding pixels in two consecutive frames of the video sequence by the inter-frame difference method; the formula is as follows:
di(x, y) = |hi(x, y) - hi-1(x, y)|
where hi(x, y) is the value of pixel (x, y) in the current (i-th) frame of the video sequence, hi-1(x, y) is the value of the same pixel in the previous frame, and di(x, y) is the difference value; the difference value is compared with a dynamic difference threshold J to judge whether a moving target exists: if di(x, y) is greater than J, motion is considered to occur at the corresponding position, and if di(x, y) is less than J, the point is suppressed; this completes the extraction of the moving target, yielding a binary image containing only the moving target, whose boundary is taken as the moving target calibration frame;
the dynamic difference threshold J is the average of the corresponding pixel values in the two consecutive frames, J = [hi(x, y) + hi-1(x, y)]/2;
And finally, the data for which moving target positioning is complete are stored in the preprocessing data storage unit, so that the target identification module can read them at any time.
5. The FPGA-based moving target identification method according to claim 4, wherein the specific method of step three is as follows:
firstly, the target identification module takes the data out of the preprocessing data storage unit into the pre-processing unit, extracts the content inside the moving target calibration frame as the image to be detected, divides each pixel value of the image to be detected by 256 to map it into the interval [0, 1], resizes the image to 416 × 416 while preserving its aspect ratio, fills the remaining area with 0.5 to obtain a 416 × 416 × 3 image array, and stores the processed image array in the on-chip cache;
then, the quantization unit performs fixed-point quantization on the input feature map output by the pre-processing unit and on the convolution kernel weights of the image identification neural network used in the neural network detection unit, and stores the results in the on-chip cache; the quantization is as follows:
xfixed = (int)(xfloat * 2^q)
xfloat = (float)(xfixed * 2^(-q))
where xfixed is the fixed-point number, xfloat is the floating-point number, and q is the fractional bit width of the fixed-point number;
the neural network detection unit then takes the data out of the on-chip cache, identifies the target information using the convolutional neural network model, and outputs an array containing the category information of the moving target object; finally, the post-processing unit restores this array to the size of the image to be detected according to its aspect ratio, completing the image identification, and writes the identification result into the display data storage unit.
6. The FPGA-based moving target identification method of claim 5, further characterized in that the image identification neural network of the neural network detection unit is a YOLOv2 neural network; YOLOv2 consists of 32 layers in total, including 23 convolutional layers, 5 max-pooling layers, 2 routing layers, 1 reordering layer and 1 detection layer; the convolutional layers convolve the input feature map with the corresponding convolution kernels to extract features; the pooling layers down-sample the input feature map by max pooling, reducing its scale while preserving a certain invariance, and generally follow a convolutional layer; the routing layers concatenate the output feature maps of several layers as the input feature map of the next layer, essentially fusing multi-scale feature information; the reordering layer samples the input feature map by position within each neighborhood, the pixels at the same position forming one output sub-map; the detection layer, as the last layer of the YOLOv2 convolutional neural network, is responsible for classifying and outputting the detection results.
Technical Field
Moving target identification is one of the basic problems in the field of image processing. Its aim is to use computer vision and digital image processing techniques to automatically analyze and process acquired video and judge whether a moving target exists in it, and it has important applications in public safety, intelligent navigation, video surveillance, aerospace, medical processing, industrial production and other fields. With the rise and rapid development of deep learning, moving target identification algorithms based on deep learning have drawn wide attention and developed greatly, and they perform better than traditional moving target identification algorithms.
As the accuracy and real-time requirements of moving target identification rise, the computational complexity and memory requirements of deep learning algorithms also grow sharply, and current general-purpose processors cannot meet the computing demand. A Field-Programmable Gate Array (FPGA), as a high-performance, low-power programmable chip, is suited to small-batch streaming applications, is very well suited to the inference stage of deep learning, and can greatly shorten the development cycle. Using an FPGA for moving target identification therefore achieves real-time detection speed and low-power operation, and fits the real application scenarios of moving target identification.
Summary of the Invention
the technical problem to be solved by the invention is as follows: the invention provides a moving target identification system and method based on an FPGA (field programmable gate array), which can effectively position a moving object in real time and accurately identify the moving object, and solve the problems of low identification speed and high power consumption of the conventional moving target identification device.
The technical scheme of the invention is as follows:
a moving target recognition system based on FPGA comprises a data signal acquisition module, a data signal preprocessing module, a target recognition module and a data signal display module.
The data signal acquisition module comprises a CMOS camera, a data conversion unit, an acquisition data storage unit and an acquisition module control unit.
The data signal preprocessing module comprises a filtering unit, an edge detection unit, a moving target positioning unit and a preprocessing data storage unit.
The target identification module comprises a pre-processing unit, a quantization unit, a neural network detection unit and a post-processing unit.
The data signal display module comprises a display data storage unit, a display module control unit and a VGA display.
Firstly, the data signal acquisition module is responsible for acquiring video images of a target area in real time, converting the acquired video images into gray images and storing the gray images in the acquisition data storage unit. The data signal preprocessing module sequentially reads out the video image data stored in the acquisition data storage unit, removes noise by median filtering to obtain all gray information of the moving target and the background, obtains a real-time binary image by edge detection, and extracts the moving target in the video sequence by the inter-frame difference method to complete the positioning of the moving target calibration frame, storing the positioned data in the preprocessing data storage unit. The target identification module then takes the content inside the moving target calibration frame in the preprocessing data storage unit as the image to be detected, inputs it into the neural network detection unit, completes the identification of the image to be detected using the image identification neural network, and stores the identification result in the display data storage unit. Finally, the data signal display module displays the identification result for the moving target of the video sequence stored in the display data storage unit.
A moving target identification method based on FPGA comprises the following steps:
the method comprises the following steps: the data signal acquisition module finishes the real-time video image acquisition of the target area and converts and stores the acquired data image in a data format.
Step two: the data signal preprocessing module reads out, in order, the video sequence image data stored in the acquisition data storage unit, obtains a real-time binary image through median-filter denoising and edge detection, extracts the moving target by the inter-frame difference method, and completes the positioning and storage of the moving target calibration frame.
Step three: the target identification module takes the content inside the moving target calibration frame in the preprocessing data storage unit as the image to be detected, identifies the image to be detected with the neural network detection unit after it passes through the pre-processing unit and the quantization unit, and stores the identification result through the post-processing unit.
Step four: the data signal display module takes out, in order, the data stored in the display data storage unit through the display module control unit, and then displays the identification result for the moving target in the video sequence on the VGA display.
The specific method of step one is as follows:
Firstly, the acquisition module control unit controls the CMOS camera to shoot the target area in real time to obtain a real-time video sequence; since the output format of the CMOS camera is RGB, the data conversion unit converts the RGB format into the YCbCr format to obtain the gray image of the video sequence, using the standard conversion:
Y = 0.299R + 0.587G + 0.114B
Cb = 0.564(B - Y) + 128
Cr = 0.713(R - Y) + 128
In the formula, Y represents the luminance signal and serves as the gray value of the image pixel, Cb represents the difference between the blue component of the RGB input signal and the luminance value, Cr represents the difference between the red component of the RGB input signal and the luminance value, and R, G and B represent the gray values of the red, green and blue color components, respectively.
The video sequence images in the YCbCr format are then stored in the acquisition data storage unit.
The specific method of step two is as follows:
Firstly, the data signal preprocessing module takes the original gray images of the video sequence out of the acquisition data storage unit into the filtering unit, and filters them with a median filter to suppress the influence of environmental change and noise interference, eliminating the salt-and-pepper noise in the original gray images and preventing a low signal-to-noise ratio and degraded image quality. The specific implementation is as follows:
f(x, y) = median{ g(s, t) }, (s, t) ∈ Sxy
where g(s, t) is the original gray image of the video sequence, f(x, y) is the filtered gray image, and Sxy is the filtering window. A mask window of (2n+1) × (2n+1) points is selected, n being a positive integer, the center of the window is placed at a pixel of the original gray image, all values inside the window are sorted by magnitude, the middle value replaces the original value of that pixel, and the window slides to the next pixel until the whole image has been traversed; the noise is thus removed, yielding all gray information of the moving target and the background.
Then, an edge detection unit detects an edge contour in the video sequence gray level image after filtering processing by using an edge detection operator and uses the edge contour as feature information, and the specific implementation mode is as follows:
A 3 × 3 convolution window is selected, in which Pij (i, j = 1, 2, 3) denotes the 8-neighborhood of image pixel point (x, y); edge detection operators are set for the four gradient directions 0°, 45°, 90° and 135°, and each operator is convolved with Pij to obtain the gradient values in the four directions as follows:
T0 = (P31 + 2P32 + P33) - (P11 + 2P12 + P13)
T45 = (P23 + 2P33 + P32) - (P12 + 2P11 + P21)
T90 = (P13 + 2P23 + P33) - (P11 + 2P21 + P31)
T135 = (P12 + 2P13 + P23) - (P21 + 2P31 + P32)
The final gradient T(x, y) takes the maximum of the four gradient values:
T(x, y) = max(T0, T45, T90, T135)
Further, an image edge dynamic threshold t(x, y) is set according to the 8-neighborhood Pij of image pixel point (x, y). The gradient value T(x, y) of each window is compared with t(x, y), pixels whose gradient exceeds the threshold being set to 1 and all others to 0, so as to obtain the binary edge image of the video sequence.
The moving target positioning unit then compares the values of all corresponding pixels in two consecutive frames of the video sequence by the inter-frame difference method. The formula is as follows:
di(x, y) = |hi(x, y) - hi-1(x, y)|
where hi(x, y) is the value of pixel (x, y) in the current (i-th) frame of the video sequence, hi-1(x, y) is the value of the same pixel in the previous frame, and di(x, y) is the difference value. The difference value is compared with a dynamic difference threshold J to judge whether a moving target exists: if di(x, y) is greater than J, motion is considered to occur at the corresponding position, and if di(x, y) is less than J, the point is suppressed. This completes the extraction of the moving target, yielding a binary image containing only the moving target, whose boundary is taken as the moving target calibration frame.
The dynamic difference threshold J is the average of the corresponding pixel values in the two consecutive frames, J = [hi(x, y) + hi-1(x, y)]/2.
Finally, the data for which moving target positioning is complete are stored in the preprocessing data storage unit, so that the target identification module can read them at any time.
The specific method of step three is as follows:
Firstly, the target identification module takes the data out of the preprocessing data storage unit into the pre-processing unit, extracts the content inside the moving target calibration frame as the image to be detected, divides each pixel value of the image to be detected by 256 to map it into the interval [0, 1], resizes the image to 416 × 416 while preserving its aspect ratio, fills the remaining area with 0.5 to obtain a 416 × 416 × 3 image array, and stores the processed image array in the on-chip cache.
Then, the quantization unit performs fixed-point quantization on the input feature map output by the pre-processing unit and on the convolution kernel weights of the image identification neural network used in the neural network detection unit, and stores the results in the on-chip cache; the quantization is as follows:
xfixed = (int)(xfloat * 2^q)
xfloat = (float)(xfixed * 2^(-q))
where xfixed is the fixed-point number, xfloat is the floating-point number, and q is the fractional bit width of the fixed-point number.
The neural network detection unit then takes the data out of the on-chip cache, identifies the target information using the convolutional neural network model, and outputs an array containing the category information of the moving target object. Finally, the post-processing unit restores this array to the size of the image to be detected according to its aspect ratio, completing the image identification, and writes the identification result into the display data storage unit.
Furthermore, the image identification neural network of the neural network detection unit adopts the YOLOv2 neural network. YOLOv2 consists of 32 layers in total, including 23 convolutional layers, 5 max-pooling layers, 2 routing layers, 1 reordering layer and 1 detection layer. The convolutional layers convolve the input feature map with the corresponding convolution kernels to extract features; the pooling layers down-sample the input feature map by max pooling, reducing its scale while preserving a certain invariance, and generally follow a convolutional layer; the routing layers concatenate the output feature maps of several layers as the input feature map of the next layer, essentially fusing multi-scale feature information; the reordering layer samples the input feature map by position within each neighborhood, the pixels at the same position forming one output sub-map; the detection layer, as the last layer of the YOLOv2 convolutional neural network, is responsible for classifying and outputting the detection results.
The invention has the following beneficial effects:
The invention provides an FPGA-based moving target identification system and method that make full use of the computing resources of the FPGA and of the correlation between adjacent video frames to position moving objects in a video sequence and identify their categories more quickly, achieving real-time detection speed and low-power operation, and therefore suiting outdoor application scenarios of moving target identification.
Drawings
FIG. 1 is a block diagram of a hardware implementation platform of an embodiment of the invention;
FIG. 2 is a flow chart of the present invention.
Detailed Description
The method of the present invention is described below fully, clearly and in detail with reference to the accompanying drawings, so as to make the objects and effects of the invention more apparent.
As shown in FIG. 1, the FPGA-based moving target identification system of the present invention comprises a data signal acquisition module, a data signal preprocessing module, a target identification module and a data signal display module.
The data signal acquisition module comprises a CMOS camera, a data conversion unit, an acquisition data storage unit and an acquisition module control unit.
The data signal preprocessing module comprises a filtering unit, an edge detection unit, a moving target positioning unit and a preprocessing data storage unit.
The target identification module comprises a pre-processing unit, a quantization unit, a neural network detection unit and a post-processing unit.
The data signal display module comprises a display data storage unit, a display module control unit and a VGA display.
Firstly, the data signal acquisition module is responsible for acquiring video images of a target area in real time, converting the acquired video images into gray images and storing the gray images in the acquisition data storage unit. The data signal preprocessing module sequentially reads out the video image data stored in the acquisition data storage unit, removes noise by median filtering to obtain all gray information of the moving target and the background, obtains a real-time binary image by edge detection, and extracts the moving target in the video sequence by the inter-frame difference method to complete the positioning of the moving target calibration frame, storing the positioned data in the preprocessing data storage unit. The target identification module then takes the content inside the moving target calibration frame in the preprocessing data storage unit as the image to be detected, inputs it into the neural network detection unit, completes the identification of the image to be detected using the image identification neural network, and stores the identification result in the display data storage unit. Finally, the data signal display module displays the identification result for the moving target of the video sequence stored in the display data storage unit.
As shown in FIG. 2, the FPGA-based moving target identification method includes the following steps:
the method comprises the following steps: the data signal acquisition module finishes the real-time video image acquisition of the target area and converts and stores the acquired data image in a data format.
Firstly, the acquisition module control unit controls the CMOS camera to shoot the target area in real time to obtain a real-time video sequence; since the output format of the CMOS camera is RGB, the data conversion unit converts the RGB format into the YCbCr format to obtain the gray image of the video sequence, using the standard conversion:
Y = 0.299R + 0.587G + 0.114B
Cb = 0.564(B - Y) + 128
Cr = 0.713(R - Y) + 128
In the formula, Y represents the luminance signal and serves as the gray value of the image pixel, Cb represents the difference between the blue component of the RGB input signal and the luminance value, Cr represents the difference between the red component of the RGB input signal and the luminance value, and R, G and B represent the gray values of the red, green and blue color components, respectively.
The video sequence images in the YCbCr format are then stored in the acquisition data storage unit, so that the data signal preprocessing module can read them at any time.
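For illustration, a minimal software sketch of this conversion, assuming 8-bit RGB input and the standard BT.601 coefficients given above; the function and variable names are illustrative, not part of the patent:

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB image to 8-bit YCbCr (BT.601)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance, used as the gray value
    cb = 0.564 * (b - y) + 128.0            # blue-difference chroma
    cr = 0.713 * (r - y) + 128.0            # red-difference chroma
    return np.clip(np.stack([y, cb, cr], axis=-1), 0, 255).astype(np.uint8)

# The Y channel alone is the gray image written to the acquisition data storage unit.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # stand-in camera frame
gray = rgb_to_ycbcr(frame)[..., 0]
```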
Step two: the data signal preprocessing module reads out, in order, the video sequence image data stored in the acquisition data storage unit, obtains a real-time binary image through median-filter denoising and edge detection, extracts the moving target by the inter-frame difference method, and completes the positioning and storage of the moving target calibration frame.
Firstly, the data signal preprocessing module takes the original gray images of the video sequence out of the acquisition data storage unit into the filtering unit, and filters them with a median filter to suppress the influence of environmental change and noise interference, eliminating the salt-and-pepper noise in the original gray images and preventing a low signal-to-noise ratio and degraded image quality. The specific implementation is as follows:
f(x, y) = median{ g(s, t) }, (s, t) ∈ Sxy
where g(s, t) is the original gray image of the video sequence, f(x, y) is the filtered gray image, and Sxy is the filtering window. A mask window of (2n+1) × (2n+1) points is selected, n being a positive integer, the center of the window is placed at a pixel of the original gray image, all values inside the window are sorted by magnitude, the middle value replaces the original value of that pixel, and the window slides to the next pixel until the whole image has been traversed; the noise is thus removed, yielding all gray information of the moving target and the background.
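A minimal sketch of this sliding-window median filter, in pure NumPy; on the FPGA this would typically be a line-buffered sorting network, but the software form shows the logic:

```python
import numpy as np

def median_filter(img: np.ndarray, n: int = 1) -> np.ndarray:
    """Median filtering with a (2n+1) x (2n+1) mask window."""
    k = 2 * n + 1
    padded = np.pad(img, n, mode="edge")  # replicate borders so every pixel has a full window
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            window = padded[y:y + k, x:x + k]  # (2n+1) x (2n+1) neighborhood
            out[y, x] = np.median(window)      # middle value replaces the pixel
    return out
```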
The edge detection unit then detects the edge contours in the filtered gray images of the video sequence with edge detection operators and uses them as feature information, which filters out a large amount of invalid information and reduces interference in the extraction of the moving target. The specific implementation is as follows:
A 3 × 3 convolution window is selected, in which Pij (i, j = 1, 2, 3) denotes the 8-neighborhood of image pixel point (x, y):
| P11 P12 P13 |
| P21 P22 P23 |
| P31 P32 P33 |
Edge detection operators are set for the four gradient directions 0°, 45°, 90° and 135°, consistent with the gradient expressions below:
0°:  | -1 -2 -1 |   45°:  | -2 -1  0 |   90°:  | -1  0  1 |   135°: |  0  1  2 |
     |  0  0  0 |         | -1  0  1 |         | -2  0  2 |         | -1  0  1 |
     |  1  2  1 |         |  0  1  2 |         | -1 -2 -1 |         | -2 -1  0 |
Each of the four edge detection operators is convolved with Pij to obtain the gradient values in the four directions:
T0 = (P31 + 2P32 + P33) - (P11 + 2P12 + P13)
T45 = (P23 + 2P33 + P32) - (P12 + 2P11 + P21)
T90 = (P13 + 2P23 + P33) - (P11 + 2P21 + P31)
T135 = (P12 + 2P13 + P23) - (P21 + 2P31 + P32)
The final gradient T(x, y) takes the maximum of the four gradient values:
T(x, y) = max(T0, T45, T90, T135)
Further, an image edge dynamic threshold t(x, y) is set according to the 8-neighborhood Pij of image pixel point (x, y). The gradient value T(x, y) of each window is compared with t(x, y), pixels whose gradient exceeds the threshold being set to 1 and all others to 0, so as to obtain the binary edge image of the video sequence.
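The four directional operators and the threshold comparison can be sketched as follows. Since the patent's exact dynamic-threshold formula is not reproduced here, the sketch substitutes a scaled neighborhood mean; that choice, and all names, are assumptions:

```python
import numpy as np

# Sobel-style operators for 0, 45, 90 and 135 degrees,
# matching the gradient expressions T0..T135 above.
KERNELS = [
    np.array([[-1, -2, -1], [ 0, 0, 0], [ 1,  2,  1]]),  # 0 degrees
    np.array([[-2, -1,  0], [-1, 0, 1], [ 0,  1,  2]]),  # 45 degrees
    np.array([[-1,  0,  1], [-2, 0, 2], [-1,  0,  1]]),  # 90 degrees
    np.array([[ 0,  1,  2], [-1, 0, 1], [-2, -1,  0]]),  # 135 degrees
]

def edge_binarize(img: np.ndarray) -> np.ndarray:
    """Binary edge image: 1 where the maximum directional gradient T(x, y)
    exceeds a dynamic threshold t(x, y) derived from the 3 x 3 neighborhood."""
    img = img.astype(np.int32)
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            p = img[y - 1:y + 2, x - 1:x + 2]                  # neighborhood P_ij
            t_grad = max(int((k * p).sum()) for k in KERNELS)  # T(x, y)
            t_thr = 4 * int(p.mean())  # assumed stand-in for the dynamic threshold
            out[y, x] = 1 if t_grad > t_thr else 0
    return out
```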
The moving target positioning unit then compares the values of all corresponding pixels in two consecutive frames of the video sequence by the inter-frame difference method. The formula is as follows:
di(x, y) = |hi(x, y) - hi-1(x, y)|
where hi(x, y) is the value of pixel (x, y) in the current (i-th) frame of the video sequence, hi-1(x, y) is the value of the same pixel in the previous frame, and di(x, y) is the difference value. The difference value is compared with a dynamic difference threshold J to judge whether a moving target exists: if di(x, y) is greater than J, motion is considered to occur at the corresponding position, and if di(x, y) is less than J, the point is suppressed. This completes the extraction of the moving target, yielding a binary image containing only the moving target, whose boundary is taken as the moving target calibration frame.
The dynamic difference threshold J is the average of the corresponding pixel values in the two consecutive frames, J = [hi(x, y) + hi-1(x, y)]/2.
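A direct software rendering of the inter-frame difference with the per-pixel dynamic threshold J, plus a helper that derives the calibration frame as the tight bounding box of the motion mask; the box-extraction detail and all names are illustrative assumptions:

```python
import numpy as np

def frame_difference_mask(curr: np.ndarray, prev: np.ndarray) -> np.ndarray:
    """Binary motion mask: 1 where |h_i - h_{i-1}| exceeds
    J = [h_i + h_{i-1}] / 2, 0 elsewhere (the point is suppressed)."""
    curr = curr.astype(np.int32)
    prev = prev.astype(np.int32)
    d = np.abs(curr - prev)  # d_i(x, y)
    j = (curr + prev) // 2   # per-pixel dynamic threshold J
    return (d > j).astype(np.uint8)

def calibration_frame(mask: np.ndarray):
    """Tight bounding box (x0, y0, x1, y1) around the moving-target pixels."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None          # no moving target in this frame pair
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```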
Finally, the data for which moving target positioning is complete are stored in the preprocessing data storage unit, so that the target identification module can read them at any time.
Step three: the target identification module takes the content inside the moving target calibration frame in the preprocessing data storage unit as the image to be detected, identifies the image to be detected with the neural network detection unit after it passes through the pre-processing unit and the quantization unit, and stores the identification result through the post-processing unit.
In view of the structure, parameters and computation amount of the convolutional neural network used by the neural network detection unit, and of the power consumption, detection speed and application scenario required of the whole moving target identification system, the YOLOv2 convolutional neural network is selected as the network for identifying object categories in the image and deployed on the FPGA; the YOLOv2 network is composed of convolutional layers, pooling layers, routing layers and a reordering layer.
Firstly, the target identification module takes the data out of the preprocessing data storage unit into the pre-processing unit, extracts the content inside the moving target calibration frame as the image to be detected, divides each pixel value of the image to be detected by 256 to map it into the interval [0, 1], resizes the image to 416 × 416 while preserving its aspect ratio, fills the remaining area with 0.5 to obtain a 416 × 416 × 3 image array, and stores the processed image array in the on-chip cache.
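The normalization, aspect-preserving resize and 0.5 padding (YOLO-style letterboxing) might look like this in software; nearest-neighbor resizing is used to keep the sketch dependency-free, and all names are illustrative:

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 416) -> np.ndarray:
    """H x W x 3 uint8 image -> size x size x 3 float array in [0, 1],
    aspect ratio preserved, unused area filled with 0.5."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(h * scale), int(w * scale)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)   # nearest-neighbor rows
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)   # nearest-neighbor cols
    resized = img[ys][:, xs].astype(np.float32) / 256.0       # map pixels into [0, 1]
    canvas = np.full((size, size, 3), 0.5, dtype=np.float32)  # 0.5 padding
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```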
Then, the quantization unit performs fixed-point quantization on the input feature map output by the pre-processing unit and on the convolution kernel weights of the YOLOv2 convolutional neural network, and stores the results in the on-chip cache; the quantization is as follows:
xfixed = (int)(xfloat * 2^q)
xfloat = (float)(xfixed * 2^(-q))
where xfixed is the fixed-point number, xfloat is the floating-point number, and q is the fractional bit width of the fixed-point number. Through quantization, the amount of data transferred and the resources consumed by computation are reduced while the accuracy remains essentially unchanged.
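A direct software rendering of the two quantization formulas; the 16-bit word length and q = 8 are assumed example values, not taken from the patent:

```python
import numpy as np

Q = 8  # assumed fractional bit width q

def to_fixed(x_float: np.ndarray, q: int = Q) -> np.ndarray:
    """x_fixed = (int)(x_float * 2^q); astype truncates like the (int) cast."""
    return (x_float * (1 << q)).astype(np.int16)

def to_float(x_fixed: np.ndarray, q: int = Q) -> np.ndarray:
    """x_float = (float)(x_fixed * 2^-q)"""
    return x_fixed.astype(np.float32) / (1 << q)

weights = np.array([0.731, -0.052, 1.25], dtype=np.float32)  # example weights
w_q = to_fixed(weights)    # what would sit in the on-chip cache
print(w_q, to_float(w_q))  # [187 -13 320] -> [0.73046875 -0.05078125 1.25]
```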
The neural network detection unit then takes the data out of the on-chip cache, identifies the target information with the YOLOv2 neural network model, and outputs a 13 × 13 × 425 image array containing the category information of the moving target object. YOLOv2 consists of 32 layers in total, including 23 convolutional layers (conv), 5 max-pooling layers (max), 2 routing layers (route), 1 reordering layer (reorg) and 1 final detection layer (detection). The convolutional layers convolve the input feature map with the corresponding convolution kernels to extract features; the pooling layers down-sample the input feature map by max pooling, reducing its scale while preserving a certain invariance (to rotation, translation, scaling and the like), and generally follow a convolutional layer; the routing layers concatenate the output feature maps of several layers as the input feature map of the next layer, essentially fusing multi-scale feature information; the reordering layer samples the input feature map by position within each neighborhood, the pixels at the same position forming one output sub-map; the detection layer, as the last layer of the YOLOv2 convolutional neural network, is responsible for classifying and outputting the detection results.
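Of these layers, the reordering (reorg) layer is the least standard; its position-wise sampling can be sketched as follows, assuming the stride of 2 used by standard YOLOv2:

```python
import numpy as np

def reorg(fmap: np.ndarray, stride: int = 2) -> np.ndarray:
    """Sample each stride x stride neighborhood by position; pixels at the
    same position form one sub-map: (H, W, C) -> (H/s, W/s, C * s * s)."""
    subs = [fmap[dy::stride, dx::stride, :]  # one sub-map per in-block position
            for dy in range(stride) for dx in range(stride)]
    return np.concatenate(subs, axis=-1)

x = np.random.rand(26, 26, 64).astype(np.float32)
y = reorg(x)  # (13, 13, 256): ready for the routing layer to concatenate
```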
Finally, the post-processing unit restores the 13 × 13 × 425 array to the size of the image to be detected according to its aspect ratio, completing the image identification, and writes the identification result into the display data storage unit.
Step four: the data signal display module takes out, in order, the data stored in the display data storage unit through the display module control unit, and then displays the identification result for the moving target in the video sequence on the VGA display.
Table 1 is the network structure of the YOLOv2 convolutional neural network.