Method and device for detecting indoor falling of old people based on FPGA and deep learning
1. An indoor fall detection method for the elderly based on FPGA and deep learning, characterized by comprising the following steps:
S1, acquiring posture picture data of the human body in preset postures, and identifying the position of the human body frame in each posture picture;
S2, constructing a deep learning network model for the human body frame, and training the deep learning network model with the marked posture pictures to obtain a trained deep learning network model for indoor fall detection of the elderly;
S3, loading the trained deep learning network model into an FPGA platform, and using the FPGA platform to identify the video stream pictures to be detected;
S4, outputting an early warning signal for voice reminding and alarm when the FPGA platform judges that the human body state in a frame of the video stream is a fall.
2. The indoor fall detection method for the elderly based on FPGA and deep learning of claim 1, wherein in step S1, the posture picture data in the preset postures comprises the public human posture data sets MSCOCO, MPII, LSP and FLIC, and further comprises a human posture data set collected by the user;
the position of the human body frame is marked with the semantic segmentation annotation tool labelme, and the human body frame is a rectangular frame containing a human body.
3. The method for detecting indoor falls of elderly people based on FPGA and deep learning of claim 1, wherein step S2 comprises the following steps:
S21, marking the position of the human body frame to obtain the center point coordinates (Xc, Yc), the width W and the height H of the rectangular human body frame;
S22, taking the marked posture pictures as the input of the deep learning network, and performing feature extraction on each posture picture with a convolutional neural network (CNN) to obtain a feature map;
S23, processing the feature map with a fully connected (FC) layer to obtain the coordinates of the rectangular human body frame.
4. The indoor fall detection method for the elderly based on FPGA and deep learning of claim 3, wherein in step S22, the convolutional neural network (CNN) normalizes the input posture picture data and adjusts the image size, and features are extracted from each posture picture with an eight-layer convolutional neural network:
the first layer is a convolution layer;
the second layer is a maximum pooling layer;
the third layer is a convolution layer;
the fourth layer is a maximum pooling layer;
the fifth layer is a convolution layer;
the sixth layer is a convolution layer, and the convolution kernels of the sixth layer and the fifth layer have the same size and step length;
the seventh layer is a convolution layer, and the convolution kernels of the seventh layer and the fifth layer have the same size and step length;
the eighth layer is a maximum pooling layer.
5. The indoor fall detection method for the elderly based on FPGA and deep learning of claim 3, wherein in step S23, the feature map is passed through the fully connected (FC) layer to obtain the four coordinates (X', Y', X'', Y'') of the rectangular human body frame, where (X', Y') represents the center point coordinates, X'' represents the width of the rectangular human body frame, and Y'' represents the height of the rectangular human body frame;
the four coordinates (X', Y', X'', Y'') of the rectangular human body frame are normalized, namely, the horizontal and vertical coordinate values are divided by the width and the height of the image in pixels, respectively.
6. The indoor fall detection method for the elderly based on FPGA and deep learning of claim 5, wherein the basis for judging the fall state of the elderly is as follows:
the width-to-height ratio of the current human body satisfies $X''/Y'' < 1$, and the descending speed of the center of gravity of the human body satisfies $V_y > 5$ m/s;
the descending speed of the center of gravity of the human body is calculated as $V_y = S \times (Y''_t - Y''_{t-1}) \times FP$;
where $S$ is the actual length represented by each pixel, $Y''_{t-1}$ is the coordinate of the center point of the human body in the previous frame of the video stream, $Y''_t$ is the coordinate of the center point of the human body in the current frame of the video stream, and $FP$ is the frame rate of the video.
7. An indoor fall detection device for the elderly based on FPGA and deep learning, characterized by comprising:
an image acquisition module, used for acquiring posture picture data of the human body in preset postures;
a human body frame identification module, used for identifying the position of the human body frame in each posture picture;
a model training module, used for constructing a deep learning network model for the human body frame and training the deep learning network model with the marked posture pictures to obtain a trained deep learning network model for indoor fall detection of the elderly;
a fall detection module, used for loading the trained deep learning network model into an FPGA platform and using the FPGA platform to identify the video stream pictures to be detected;
and an early warning module, used for outputting an early warning signal for voice reminding and alarm when the FPGA platform judges that the human body state in a frame of the video stream is a fall.
8. The indoor fall detection device for the elderly based on FPGA and deep learning of claim 7, wherein in the image acquisition module, the posture picture data in the preset postures comprises the public human posture data sets MSCOCO, MPII, LSP and FLIC, and further comprises a human posture data set collected by the user;
in the human body frame identification module, the position of the human body frame is marked with the semantic segmentation annotation tool labelme, and the human body frame is a rectangular frame containing a human body.
9. The device for detecting indoor fall of the elderly based on FPGA and deep learning of claim 7, wherein the model training module comprises:
a first data processing submodule, used for obtaining the center point coordinates (Xc, Yc), the width W and the height H of the rectangular human body frame after the position of the human body frame is marked;
a second data processing submodule, used for taking the marked posture pictures as the input of the deep learning network and extracting features from each posture picture with a convolutional neural network (CNN) to obtain a feature map;
and a third data processing submodule, used for processing the feature map with a fully connected (FC) layer to obtain the coordinates of the rectangular human body frame.
10. The device for detecting indoor fall of the elderly based on FPGA and deep learning of claim 9, wherein in the second data processing sub-module, normalization processing is performed on the input attitude picture data through the convolutional neural network CNN, the image size is adjusted, and feature extraction is performed on each attitude picture by using eight layers of convolutional neural networks:
the first layer is a convolution layer;
the second layer is a maximum pooling layer;
the third layer is a convolution layer;
the fourth layer is a maximum pooling layer;
the fifth layer is a convolution layer;
the sixth layer is a convolution layer, and the convolution kernels of the sixth layer and the fifth layer have the same size and step length;
the seventh layer is a convolution layer, and the convolution kernels of the seventh layer and the fifth layer have the same size and step length;
the eighth layer is a maximum pooling layer;
in the third data processing submodule, the feature map is passed through the fully connected (FC) layer to obtain the four coordinates (X', Y', X'', Y'') of the rectangular human body frame, where (X', Y') represents the center point coordinates, X'' represents the width of the rectangular human body frame, and Y'' represents the height of the rectangular human body frame;
the four coordinates (X', Y', X'', Y'') of the rectangular human body frame are normalized, namely, the horizontal and vertical coordinate values are divided by the width and the height of the image in pixels, respectively;
in the fall detection module, the basis for judging the fall state of the elderly is as follows:
the width-to-height ratio of the current human body satisfies $X''/Y'' < 1$, and the descending speed of the center of gravity of the human body satisfies $V_y > 5$ m/s;
the descending speed of the center of gravity of the human body is calculated as $V_y = S \times (Y''_t - Y''_{t-1}) \times FP$;
where $S$ is the actual length represented by each pixel, $Y''_{t-1}$ is the coordinate of the center point of the human body in the previous frame of the video stream, $Y''_t$ is the coordinate of the center point of the human body in the current frame of the video stream, and $FP$ is the frame rate of the video.
Background
According to the seventh national census data, the population aged 60 and above in China has reached 260 million, accounting for 18.7 percent of the total population, of which the population aged 65 and above accounts for 13.5 percent; China has entered a deeply aging society. For the elderly, falls caused by limited mobility are a major health hazard. The Technical Guidelines for Fall Intervention in the Elderly published by the Ministry of Health indicate that falls are the leading cause of injury-related death among people over 65 in China. If an elderly person falls and does not receive timely assistance, the consequences may worsen. Detecting falls of the elderly, raising an alarm and notifying rescuers in time therefore has great market value.
In the traditional technology, there are three main types of old people fall detection methods: wearable sensor methods, acoustic signal methods, and video image methods.
The wearable sensor method embeds devices such as acceleration sensors and gyroscopes into wearable equipment and collects parameters such as the angular velocity and acceleration of the human body in real time. When these parameters change sharply, an algorithm judges whether a fall has occurred. The method requires the elderly to wear dedicated detection equipment, which they may forget to put on as memory declines and which can be uncomfortable to wear, so its market acceptance is low.
The acoustic signal method judges the fall state by recognizing the sound made when an elderly person falls. It places high demands on the environment and is difficult to apply accurately in the presence of environmental noise.
The video image method uses image processing techniques to construct a human fall model and detect the fall state. Video image processing algorithms based on deep learning can accurately recognize human postures and behavior. However, deep learning algorithms demand high computing power and are often run in the cloud; transmitting video streams containing human postures to the cloud raises privacy risks and introduces long delays.
Disclosure of Invention
Therefore, the invention provides the old people indoor falling detection method and device based on the FPGA and the deep learning, realizes the falling state detection early warning of the old people, and solves the problems of complexity, low accuracy and efficiency and lack of privacy protection of the traditional detection technology.
In order to achieve the above purpose, the invention provides the following technical scheme: an indoor falling detection method for the old based on FPGA and deep learning comprises the following steps:
S1, acquiring posture picture data of the human body in preset postures, and identifying the position of the human body frame in each posture picture;
S2, constructing a deep learning network model for the human body frame, and training the deep learning network model with the marked posture pictures to obtain a trained deep learning network model for indoor fall detection of the elderly;
S3, loading the trained deep learning network model into an FPGA platform, and using the FPGA platform to identify the video stream pictures to be detected;
S4, outputting an early warning signal for voice reminding and alarm when the FPGA platform judges that the human body state in a frame of the video stream is a fall.
As a preferred scheme of the indoor fall detection method for the elderly based on FPGA and deep learning, in step S1, the posture picture data in the preset postures comprises the public human posture data sets MSCOCO, MPII, LSP and FLIC, and further comprises a human posture data set collected by the user;
the position of the human body frame is marked with the semantic segmentation annotation tool labelme, and the human body frame is a rectangular frame containing a human body.
As a preferred scheme of the elderly indoor fall detection method based on FPGA and deep learning, step S2 includes the following steps:
S21, marking the position of the human body frame to obtain the center point coordinates (Xc, Yc), the width W and the height H of the rectangular human body frame;
S22, taking the marked posture pictures as the input of the deep learning network, and performing feature extraction on each posture picture with a convolutional neural network (CNN) to obtain a feature map;
S23, processing the feature map with a fully connected (FC) layer to obtain the coordinates of the rectangular human body frame.
As a preferred scheme of the indoor fall detection method for the elderly based on FPGA and deep learning, in step S22, the convolutional neural network (CNN) normalizes the input posture picture data and adjusts the image size, and features are extracted from each posture picture with an eight-layer convolutional neural network:
the first layer is a convolution layer;
the second layer is a maximum pooling layer;
the third layer is a convolution layer;
the fourth layer is a maximum pooling layer;
the fifth layer is a convolution layer;
the sixth layer is a convolution layer, and the convolution kernels of the sixth layer and the fifth layer have the same size and step length;
the seventh layer is a convolution layer, and the convolution kernels of the seventh layer and the fifth layer have the same size and step length;
the eighth layer is a maximum pooling layer.
As a preferred scheme of the indoor fall detection method for the elderly based on FPGA and deep learning, in step S23, the feature map is passed through the fully connected (FC) layer to obtain the four coordinates (X', Y', X'', Y'') of the rectangular human body frame, where (X', Y') represents the center point coordinates, X'' represents the width of the rectangular human body frame, and Y'' represents the height of the rectangular human body frame;
the four coordinates (X', Y', X'', Y'') of the rectangular human body frame are normalized, namely, the horizontal and vertical coordinate values are divided by the width and the height of the image in pixels, respectively.
As a preferred scheme of the indoor fall detection method for the elderly based on FPGA and deep learning, the basis for judging the fall state of the elderly is as follows:
the width-to-height ratio of the current human body satisfies $X''/Y'' < 1$, and the descending speed of the center of gravity of the human body satisfies $V_y > 5$ m/s;
the descending speed of the center of gravity of the human body is calculated as $V_y = S \times (Y''_t - Y''_{t-1}) \times FP$;
where $S$ is the actual length represented by each pixel, $Y''_{t-1}$ is the coordinate of the center point of the human body in the previous frame of the video stream, $Y''_t$ is the coordinate of the center point of the human body in the current frame of the video stream, and $FP$ is the frame rate of the video.
The invention also provides an indoor fall detection device for the old based on the FPGA and the deep learning, which comprises:
an image acquisition module, used for acquiring posture picture data of the human body in preset postures;
a human body frame identification module, used for identifying the position of the human body frame in each posture picture;
a model training module, used for constructing a deep learning network model for the human body frame and training the deep learning network model with the marked posture pictures to obtain a trained deep learning network model for indoor fall detection of the elderly;
a fall detection module, used for loading the trained deep learning network model into an FPGA platform and using the FPGA platform to identify the video stream pictures to be detected;
and an early warning module, used for outputting an early warning signal for voice reminding and alarm when the FPGA platform judges that the human body state in a frame of the video stream is a fall.
As a preferred scheme of the indoor fall detection device for the elderly based on FPGA and deep learning, in the image acquisition module, the posture picture data in the preset postures comprises the public human posture data sets MSCOCO, MPII, LSP and FLIC, and further comprises a human posture data set collected by the user;
in the human body frame identification module, the position of the human body frame is marked with the semantic segmentation annotation tool labelme, and the human body frame is a rectangular frame containing a human body.
As a preferred scheme of the indoor fall detection device for the elderly based on FPGA and deep learning, the model training module comprises:
a first data processing submodule, used for obtaining the center point coordinates (Xc, Yc), the width W and the height H of the rectangular human body frame after the position of the human body frame is marked;
a second data processing submodule, used for taking the marked posture pictures as the input of the deep learning network and extracting features from each posture picture with a convolutional neural network (CNN) to obtain a feature map;
and a third data processing submodule, used for processing the feature map with a fully connected (FC) layer to obtain the coordinates of the rectangular human body frame.
As a preferred scheme of the indoor fall detection device for the elderly based on FPGA and deep learning, in the second data processing submodule, the convolutional neural network (CNN) normalizes the input posture picture data and adjusts the image size, and features are extracted from each posture picture with an eight-layer convolutional neural network:
the first layer is a convolution layer;
the second layer is a maximum pooling layer;
the third layer is a convolution layer;
the fourth layer is a maximum pooling layer;
the fifth layer is a convolution layer;
the sixth layer is a convolution layer, and the convolution kernels of the sixth layer and the fifth layer have the same size and step length;
the seventh layer is a convolution layer, and the convolution kernels of the seventh layer and the fifth layer have the same size and step length;
the eighth layer is a maximum pooling layer;
in the third data processing submodule, the feature map is passed through the fully connected (FC) layer to obtain the four coordinates (X', Y', X'', Y'') of the rectangular human body frame, where (X', Y') represents the center point coordinates, X'' represents the width of the rectangular human body frame, and Y'' represents the height of the rectangular human body frame;
the four coordinates (X', Y', X'', Y'') of the rectangular human body frame are normalized, namely, the horizontal and vertical coordinate values are divided by the width and the height of the image in pixels, respectively.
As a preferred scheme of the indoor fall detection device for the elderly based on FPGA and deep learning, in the fall detection module, the basis for judging the fall state of the elderly is as follows:
the width-to-height ratio of the current human body satisfies $X''/Y'' < 1$, and the descending speed of the center of gravity of the human body satisfies $V_y > 5$ m/s;
the descending speed of the center of gravity of the human body is calculated as $V_y = S \times (Y''_t - Y''_{t-1}) \times FP$;
where $S$ is the actual length represented by each pixel, $Y''_{t-1}$ is the coordinate of the center point of the human body in the previous frame of the video stream, $Y''_t$ is the coordinate of the center point of the human body in the current frame of the video stream, and $FP$ is the frame rate of the video.
The invention has the following advantages: posture picture data of the human body in preset postures is acquired, and the position of the human body frame in each posture picture is identified; a deep learning network model for the human body frame is constructed and trained with the marked posture pictures to obtain a trained deep learning network model for indoor fall detection of the elderly; the trained model is loaded into an FPGA platform, which identifies the video stream pictures to be detected; and when the FPGA platform judges that the human body state in a frame of the video stream is a fall, an early warning signal is output for voice reminding and alarm. The invention takes the FPGA as its core, makes full use of the FPGA's high speed, large data-processing capacity and hardware programmability, and implants the deep learning algorithm into the FPGA, which improves the real-time performance and privacy of human fall detection. The invention uses a deep learning neural network model to extract the coordinates of the human body frame automatically, with high accuracy. The invention can effectively detect falls of the elderly, raise an alarm in time, and effectively protect the health of the elderly.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes and the like shown in this specification are only intended to accompany the content disclosed in the specification so that those skilled in the art can understand and read it; they do not limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportion or adjustment of size that does not affect the functions and purposes of the invention shall still fall within the scope of the invention.
Fig. 1 is a schematic diagram of an indoor fall detection process of the elderly based on FPGA and deep learning according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a deep learning network model adopted for elderly indoor fall detection based on FPGA and deep learning in an embodiment of the present invention;
fig. 3 is a schematic diagram of an indoor fall detection device for the elderly based on FPGA and deep learning according to an embodiment of the present invention.
Detailed Description
The present invention is described below by way of specific embodiments; other advantages and effects of the invention will be apparent to those skilled in the art from the content disclosed in this specification. The described embodiments are only some of the embodiments of the invention, and the invention is not limited to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1 and 2, an indoor fall detection method for the elderly based on FPGA and deep learning is provided, which includes the following steps:
S1, acquiring posture picture data of the human body in preset postures, and identifying the position of the human body frame in each posture picture;
S2, constructing a deep learning network model for the human body frame, and training the deep learning network model with the marked posture pictures to obtain a trained deep learning network model for indoor fall detection of the elderly;
S3, loading the trained deep learning network model into an FPGA platform, and using the FPGA platform to identify the video stream pictures to be detected;
S4, outputting an early warning signal for voice reminding and alarm when the FPGA platform judges that the human body state in a frame of the video stream is a fall.
In this embodiment, in step S1, the posture picture data in the preset postures comprises the public human posture data sets MSCOCO, MPII, LSP and FLIC, and further comprises a human posture data set collected by the user;
the position of the human body frame is marked with the semantic segmentation annotation tool labelme, and the human body frame is a rectangular frame containing a human body.
Specifically, the MSCOCO data set is a large image data set developed and maintained by Microsoft, whose tasks include recognition, segmentation and detection. The COCO API provides Matlab, Python and Lua interfaces for loading, parsing and visualizing the image annotation data.
The MPII human posture data set is a benchmark for human pose estimation. It comprises about 25,000 annotated pictures of more than 40,000 people, extracted from YouTube videos. For the test set, body part occlusions and 3D torso and head orientations are also annotated.
LSP is a sports pose data set covering the categories athletics, badminton, baseball, gymnastics, parkour, football, volleyball and tennis. It contains about 2,000 pose annotations, all on pictures of athletes taken from Flickr.
FLIC is drawn from Hollywood movie clips. Although a scene may contain several people, the annotation contains the joint information of only one person, with 11 joints in total.
Specifically, labelme is an image annotation tool developed by the Computer Science and Artificial Intelligence Laboratory (CSAIL) of the Massachusetts Institute of Technology (MIT). It can be used to create customized annotation tasks or to annotate images, and the project source code is open.
In this embodiment, step S2 includes the following steps:
S21, marking the position of the human body frame to obtain the center point coordinates (Xc, Yc), the width W and the height H of the rectangular human body frame;
S22, taking the marked posture pictures as the input of the deep learning network, and performing feature extraction on each posture picture with a convolutional neural network (CNN) comprising 5 convolution layers and 3 pooling layers to obtain a feature map;
S23, processing the feature map with a fully connected (FC) layer to obtain the coordinates of the rectangular human body frame.
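As a minimal sketch of step S21, the following Python code converts a rectangle annotation into the center point coordinates, width and height. It assumes the annotation is stored in the JSON layout that labelme writes for rectangle shapes (a `shapes` list whose entries hold two corner `points`); the function name `rect_to_center_box` and the sample values are illustrative and are not taken from the embodiment.

```python
import json


def rect_to_center_box(shape):
    """Convert a two-corner rectangle into (Xc, Yc, W, H) in pixels."""
    (x1, y1), (x2, y2) = shape["points"]
    w = abs(x2 - x1)
    h = abs(y2 - y1)
    xc = min(x1, x2) + w / 2.0
    yc = min(y1, y2) + h / 2.0
    return xc, yc, w, h


# Illustrative annotation in the assumed labelme layout; the values are made up.
annotation = json.loads("""
{
  "imageWidth": 224,
  "imageHeight": 224,
  "shapes": [
    {"label": "person", "shape_type": "rectangle",
     "points": [[60.0, 20.0], [140.0, 200.0]]}
  ]
}
""")

# Keep only rectangle shapes, since the human body frame is a rectangle.
boxes = [rect_to_center_box(s) for s in annotation["shapes"]
         if s["shape_type"] == "rectangle"]
print(boxes)
```

The corner order does not matter in this sketch, because the width and height are taken as absolute differences.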
In this embodiment, in step S22, the convolutional neural network CNN is used to normalize the input attitude picture data and adjust the image size, and each attitude picture is subjected to feature extraction by using an eight-layer convolutional neural network:
the first layer is a convolution layer;
the second layer is a maximum pooling layer;
the third layer is a convolution layer;
the fourth layer is a maximum pooling layer;
the fifth layer is a convolution layer;
the sixth layer is a convolution layer, and the convolution kernels of the sixth layer and the fifth layer have the same size and step length;
the seventh layer is a convolution layer, and the convolution kernels of the seventh layer and the fifth layer have the same size and step length;
the eighth layer is a maximum pooling layer.
Referring to fig. 2, specifically, the convolutional neural network normalizes the input image data, adjusts the image size to 224 × 224, and extracts features from each image with an 8-layer convolutional neural network:
the first layer is a convolution layer; the 224 × 224 input image passes through a convolution filter of size 9 × 9 with a stride of 2 to produce a 110 × 32 output, which enters the next layer;
the second layer is a maximum pooling layer of size 3 × 3 with a stride of 2, producing a 55 × 32 output that enters the third layer;
the third layer is a convolution layer with a 5 × 5 convolution kernel and a stride of 2, producing a 26 × 86 output;
the fourth layer is a maximum pooling layer of size 3 × 3 with a stride of 2, producing a 13 × 32 output;
the fifth layer is a convolution layer with a 3 × 3 convolution kernel and a stride of 1, producing a 13 × 128 output;
the sixth layer is a convolution layer with the same convolution kernel size and stride as the fifth layer, producing a 13 × 128 output;
the seventh layer is a convolution layer with the same convolution kernel size and stride as the fifth layer, producing a 13 × 128 output;
the eighth layer is a maximum pooling layer of size 3 × 3 with a stride of 2, producing a 13 × 128 feature map.
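The spatial sizes quoted for the first layers can be checked with the usual output-size formula for convolution and pooling. The sketch below is illustrative only: the embodiment does not state the padding or the rounding rule, so the `padding` and `ceil_mode` values are assumptions chosen so that the first five layers reproduce the sizes 110, 55, 26, 13 and 13 given above, and the helper name `out_size` is hypothetical.

```python
def out_size(n, kernel, stride, padding=0, ceil_mode=False):
    """Spatial output size of a convolution or pooling layer along one axis."""
    span = n + 2 * padding - kernel
    # Floor division by default; ceiling division when ceil_mode is set.
    steps = -(-span // stride) if ceil_mode else span // stride
    return steps + 1


# (name, kernel, stride, padding, ceil_mode)
# Kernel and stride come from the text; padding and ceil_mode are assumptions.
layers = [
    ("conv1", 9, 2, 2, False),
    ("pool2", 3, 2, 0, True),
    ("conv3", 5, 2, 0, False),
    ("pool4", 3, 2, 0, True),
    ("conv5", 3, 1, 1, False),
]

size = 224  # input image is 224 x 224
sizes = []
for name, kernel, stride, padding, ceil_mode in layers:
    size = out_size(size, kernel, stride, padding, ceil_mode)
    sizes.append(size)
    print(name, size)
```

Other padding and rounding choices can give the same sizes, so this only shows one consistent reading of the layer list.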
Specifically, a fully connected (FC) output layer is further included, whose output is the center point coordinates (X', Y'), the width X'' and the height Y'' of the rectangular human body frame.
In this embodiment, in step S23, the feature map is passed through the fully connected (FC) layer to obtain the four coordinates (X', Y', X'', Y'') of the rectangular human body frame, where (X', Y') represents the center point coordinates, X'' represents the width of the rectangular human body frame, and Y'' represents the height of the rectangular human body frame;
the four coordinates (X', Y', X'', Y'') of the rectangular human body frame are normalized, namely, the horizontal and vertical coordinate values are divided by the width and the height of the image in pixels, respectively.
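The normalization described above can be sketched as follows. This is an illustrative example rather than the embodiment's code: the function names `normalize_box` and `denormalize_box` and the sample box are assumptions, and the 224 × 224 image size is the one used in this embodiment.

```python
def normalize_box(xc, yc, w, h, image_width, image_height):
    """Divide horizontal values by the image width and vertical values by the height."""
    return (xc / image_width, yc / image_height,
            w / image_width, h / image_height)


def denormalize_box(xn, yn, wn, hn, image_width, image_height):
    """Map normalized coordinates back to pixels."""
    return (xn * image_width, yn * image_height,
            wn * image_width, hn * image_height)


# Sample box in pixels (made-up values) on a 224 x 224 image.
normalized = normalize_box(112, 56, 56, 168, 224, 224)
restored = denormalize_box(*normalized, 224, 224)
print(normalized)
print(restored)
```

Normalized values lie in the range 0 to 1 whenever the frame lies inside the image, which keeps the regression targets independent of the input resolution.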
Specifically, the basis for judging the falling state of the old man is as follows:
the width-to-height ratio (X '/Y') <1 of the current human body, and the gravity center descending speed Vy of the human body is >5 m/s;
the descending speed of the center of gravity of the human body is calculated as $V_y = S \times (Y''_t - Y''_{t-1}) \times FP$;
where $S$ is the actual length represented by each pixel, $Y''_{t-1}$ is the coordinate of the center point of the human body in the previous frame of the video stream, $Y''_t$ is the coordinate of the center point of the human body in the current frame of the video stream, and $FP$ is the frame rate of the video.
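As a non-limiting illustration, the judgment described above may be sketched as follows. The thresholds (ratio below 1, speed above 5 m/s) are the ones stated in this embodiment; the function and parameter names are hypothetical, and the sketch assumes that the center-point coordinate is expressed in pixels along an image axis pointing downward, so that a positive difference between consecutive frames corresponds to descent.

```python
def descent_speed(y_prev: float, y_curr: float,
                  metres_per_pixel: float, fps: float) -> float:
    """Vy = S * (Y_t - Y_{t-1}) * FP, in metres per second."""
    return metres_per_pixel * (y_curr - y_prev) * fps


def is_fall(width: float, height: float,
            y_prev: float, y_curr: float,
            metres_per_pixel: float, fps: float,
            ratio_limit: float = 1.0, speed_limit: float = 5.0) -> bool:
    """Apply both conditions of the embodiment to one pair of consecutive frames."""
    if height <= 0:
        return False  # no valid human body frame in the current picture
    ratio = width / height
    vy = descent_speed(y_prev, y_curr, metres_per_pixel, fps)
    return ratio < ratio_limit and vy > speed_limit


# Center moves down by 20 pixels between frames at 30 frames per second,
# with each pixel assumed to represent 0.01 m.
fast = is_fall(40, 80, 100, 120, metres_per_pixel=0.01, fps=30)
slow = is_fall(40, 80, 100, 101, metres_per_pixel=0.01, fps=30)
```

Both conditions must hold at once, so a fast downward movement alone, or a frame shape alone, does not trigger the alarm in this sketch.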
In this embodiment, the trained neural network model refers to a neural network model trained on the public MSCOCO, MPII, LSP and FLIC human body posture data sets together with a self-built data set. The self-built data set consists of images of different human behaviors collected during actual use of the model; the positions of the human body frames in these images are marked manually, and unsatisfactory images are removed.
Referring to fig. 1, the FPGA-based fall detection for the elderly mainly applies a neural network model trained offline to real-time fall detection. After the FPGA is initialized, the trained deep learning network model parameters are loaded, and the imaging equipment captures images indoors, each image having a size of 224 × 224; the image data is passed as input to the deep learning program framework in the FPGA, the loaded network model parameters are used to perform fall detection, and an alarm is started if a fall is detected in the image currently being processed.
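As a non-limiting illustration, the processing flow described above (infer a human body frame for each picture, judge the falling state, start an alarm) may be modelled in software as follows. This is a host-side sketch only, not the FPGA implementation; `run_detection` and the `infer`, `judge` and `alarm` callables are hypothetical placeholders for the loaded network model, the falling-state judgment and the alarm output.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (center x, center y, width, height)


def run_detection(frames: Iterable[Any],
                  infer: Callable[[Any], Box],
                  judge: Callable[[Box, Box], bool],
                  alarm: Callable[[int], None]) -> int:
    """Process frames in order and return the number of alarms raised."""
    alarms = 0
    prev: Optional[Box] = None
    for index, frame in enumerate(frames):
        box = infer(frame)                      # forward pass of the loaded model
        if prev is not None and judge(prev, box):
            alarm(index)                        # early warning for this frame
            alarms += 1
        prev = box                              # keep the frame for the next comparison
    return alarms


# Stub model: the center ordinate jumps downward between frame 0 and frame 1.
boxes = [(0.5, 0.2, 0.2, 0.6), (0.5, 0.6, 0.2, 0.6), (0.5, 0.6, 0.2, 0.6)]
fired = []
count = run_detection(range(3),
                      infer=lambda i: boxes[i],
                      judge=lambda p, c: c[1] - p[1] > 0.3,
                      alarm=fired.append)
```

Because the judgment compares consecutive frames, the first frame can never raise an alarm by itself in this sketch.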
Specifically, the video acquisition process for indoor fall detection takes the FPGA as its core and uses an SAA7113H video decoding chip, which is initialized and configured through the I2C bus protocol and is externally connected to an analog PAL/NTSC camera.
Specifically, the FPGA chip used is the Zynq-7020 of the Zynq series developed by Xilinx; the video decoding chip used is the SAA7113H, which is controlled by the FPGA, completes its configuration and initialization over the I2C bus, and outputs 8-bit video data compatible with CCIR 656.
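As a non-limiting illustration, an 8-bit CCIR 656 (ITU-R BT.656) stream marks the start and end of each active video line with a four-byte timing reference code FF 00 00 XY. The embodiment does not describe how the FPGA parses this stream; the software sketch below reflects the general structure of the standard, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def decode_xy(xy: int) -> Dict[str, object]:
    """Decode the fourth byte (XY) of a timing reference code."""
    f = (xy >> 6) & 1   # field bit
    v = (xy >> 5) & 1   # 1 during vertical blanking
    h = (xy >> 4) & 1   # 0 = SAV (start of active video), 1 = EAV (end)
    # The low nibble carries protection bits derived from F, V and H.
    expected = (0x80 | (f << 6) | (v << 5) | (h << 4)
                | ((v ^ h) << 3) | ((f ^ h) << 2) | ((f ^ v) << 1) | (f ^ v ^ h))
    return {"field_bit": f, "vblank": bool(v),
            "kind": "EAV" if h else "SAV", "valid": xy == expected}


def find_timing_codes(stream: bytes) -> List[Tuple[int, Dict[str, object]]]:
    """Return (offset, decoded XY) for every FF 00 00 XY sequence in the stream."""
    found = []
    i = 0
    while i + 3 < len(stream):
        if stream[i] == 0xFF and stream[i + 1] == 0x00 and stream[i + 2] == 0x00:
            found.append((i, decode_xy(stream[i + 3])))
            i += 4
        else:
            i += 1
    return found


# One SAV code, two payload bytes, then one EAV code.
codes = find_timing_codes(bytes([0xFF, 0x00, 0x00, 0x80, 0x10, 0x80,
                                 0xFF, 0x00, 0x00, 0x9D]))
```

In hardware the same check would typically be a small state machine on the 8-bit data bus; the Python form is used here only to make the byte layout explicit.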
In this embodiment, the deep learning neural network offers a high detection rate and a high detection speed, but its computational load is large, and real-time detection cannot be achieved with an embedded processor or a server CPU. To reduce the computational load on the FPGA, the imaging algorithm is placed in one FPGA, independent of the fall detection, and the forward-propagation computation of the network model is placed in another FPGA, which accelerates the deep learning network so that the whole fall detection process runs in real time. Compared with a conventional CPU, the FPGA platform offers better concurrency and a higher processing speed; compared with a conventional GPU, it offers finer-grained concurrent operation and higher concurrent execution efficiency; and compared with an application-specific integrated circuit chip, it offers better function customizability.
Example 2
Referring to fig. 3, the present invention further provides an indoor fall detection device for the elderly based on FPGA and deep learning, including:
the image acquisition module 1 is used for acquiring posture picture data of a human body in a preset posture;
the human body frame identification module 2 is used for identifying the position of a human body frame in the gesture picture;
the model training module 3 is used for constructing a deep learning network model of the human body frame, training the deep learning network model by using the marked attitude picture, and obtaining the trained deep learning network model for the indoor falling detection of the old;
the falling detection module 4 is used for loading the trained deep learning network model into the FPGA platform; identifying a video stream picture to be detected by utilizing the FPGA platform;
and the early warning module 5 is used for outputting an early warning signal to carry out voice reminding and alarm when the FPGA platform judges that the human body state of a certain frame of picture in the video stream picture is falling.
In this embodiment, in the image acquisition module 1, the attitude picture data in the preset attitude includes MSCOCO, MPII, LSP, and FLIC public human body attitude data sets, and the attitude picture data in the preset attitude further includes a human body attitude data set collected by the user;
in the human body frame identification module 2, the position of the human body frame is marked using the semantic segmentation annotation tool labelme, and the human body frame is a rectangular frame containing a human body.
In this embodiment, the model training module 3 includes:
the first data processing submodule 31 is configured to obtain coordinates (Xc, Yc) of a center point of the rectangular human body frame, a width W, and a height H after marking the position of the human body frame;
the second data processing submodule 32 is configured to take the labeled posture picture as an input of the deep learning network, and perform feature extraction on each posture picture by using a Convolutional Neural Network (CNN) to obtain a feature map;
and the third data processing submodule 33 is configured to perform operation on the feature map by using the full connection layer FC to obtain coordinates of the rectangular human body frame.
In this embodiment, in the second data processing submodule 32, the input attitude picture data is normalized and the image size is adjusted, and features are extracted from each attitude picture by an eight-layer convolutional neural network CNN:
the first layer is a convolution layer;
the second layer is a maximum pooling layer;
the third layer is a convolution layer;
the fourth layer is a maximum pooling layer;
the fifth layer is a convolution layer;
the sixth layer is a convolution layer, and the convolution kernels of the sixth layer and the fifth layer have the same size and step length;
the seventh layer is a convolution layer, and the convolution kernels of the seventh layer and the fifth layer have the same size and step length;
the eighth layer is a maximum pooling layer;
in the third data processing sub-module 33, the feature map passes through the full connection layer FC to obtain four coordinates (X ', Y', X ", Y") of the rectangular human body frame, where (X ', Y') represents the coordinates of the center point, X "represents the width of the rectangular human body frame, and Y" represents the height of the rectangular human body frame;
the four coordinates (X', Y', X'', Y'') of the rectangular human body frame are normalized, that is, the horizontal and vertical coordinate values are divided by the width and the height of the image in pixels, respectively.
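As a non-limiting illustration, the normalization and size adjustment of the input attitude picture performed in the second data processing submodule 32 may be sketched as follows. The embodiment does not specify the resizing or normalization method; the sketch assumes a grayscale picture given as rows of 0 to 255 values, nearest-neighbour resizing to 224 × 224, and scaling of pixel values to the range 0 to 1. The name `preprocess` is hypothetical.

```python
from typing import List, Sequence


def preprocess(image: Sequence[Sequence[int]], size: int = 224) -> List[List[float]]:
    """Nearest-neighbour resize to size x size and scale pixel values to [0, 1]."""
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    src_h, src_w = len(image), len(image[0])
    out = []
    for r in range(size):
        src_row = image[r * src_h // size]       # nearest source row
        out.append([src_row[c * src_w // size] / 255.0 for c in range(size)])
    return out


# A 2 x 2 checker pattern enlarged to 4 x 4.
small = preprocess([[0, 255], [255, 0]], size=4)
```

Each source pixel becomes a 2 × 2 block in this small example; a production pipeline would more likely use interpolation, which is outside the scope of this sketch.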
In this embodiment, in the fall detection module 4, the basis for judging that the elderly person is in a falling state is:
the width-to-height ratio of the current human body satisfies (X''/Y'') < 1, and the descending speed of the center of gravity of the human body satisfies Vy > 5 m/s;
the descending speed of the center of gravity of the human body is calculated as $V_y = S \times (Y''_t - Y''_{t-1}) \times FP$;
where $S$ is the actual length represented by each pixel, $Y''_{t-1}$ is the coordinate of the center point of the human body in the previous frame of the video stream, $Y''_t$ is the coordinate of the center point of the human body in the current frame of the video stream, and $FP$ is the frame rate of the video.
In summary, the invention acquires the posture picture data of the human body in the preset posture, and identifies the position of the human body frame in the posture picture; constructing a deep learning network model of a human body frame, and training the deep learning network model by using the marked attitude picture to obtain the trained deep learning network model for the indoor falling detection of the old; loading the trained deep learning network model into an FPGA platform; recognizing a video stream picture to be detected by using an FPGA platform; and when the FPGA platform judges that the human body state of a certain frame of picture in the video stream picture falls down, outputting an early warning signal to perform voice reminding and alarming. According to the method, the FPGA is used as a core, the characteristics of high speed, large data processing capacity, hardware programming design and the like of the FPGA are fully utilized, a deep learning algorithm is implanted into the FPGA, and the instantaneity and privacy of human body falling detection are enhanced; the invention adopts the deep learning neural network model to automatically extract the coordinates of the human body frame, and has high accuracy; the invention can effectively solve the problem of detecting the falling of the old people, alarm in time and effectively protect the health of the old people.
It should be noted that, because the contents of information interaction, execution process, and the like between the modules/units of the apparatus are based on the same concept as the method embodiment in embodiment 1 of the present application, the technical effect brought by the contents is the same as the method embodiment of the present application, and specific contents may refer to the description in the foregoing method embodiment of the present application, and are not described herein again.
Example 3
Embodiment 3 of the present invention provides a computer-readable storage medium storing therein a program code for an elderly indoor fall detection method based on FPGA and deep learning, the program code including instructions for executing the elderly indoor fall detection method based on FPGA and deep learning of embodiment 1 or any possible implementation thereof.
The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Example 4
Embodiment 4 of the present invention provides an electronic device, where the electronic device includes a processor, the processor is coupled to a storage medium, and when the processor executes an instruction in the storage medium, the electronic device is enabled to execute the method for detecting an indoor fall of an elderly person based on FPGA and deep learning in embodiment 1 or any possible implementation manner thereof.
Specifically, the processor may be implemented by hardware or software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, which may be integrated in the processor, located external to the processor, or stand-alone.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave).
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.