Intelligent health detection method and device, electronic equipment and readable storage medium
1. An intelligent health detection method, characterized in that the method comprises:
acquiring an audio signal, and preprocessing the audio signal to obtain a detection signal;
converting the detection signal into a digital matrix;
inputting the obtained digital matrix as a detection sample into an intelligent health detection model to obtain a detection result; the intelligent health detection model is obtained by training on training samples using transfer learning and a convolutional neural network;
wherein said converting the detection signal into a digital matrix comprises:
framing and frame-shifting the detection signal to obtain multiple frames of the detection signal;
determining a power spectrum and a periodogram of each frame of the detection signal through a Fourier transform;
applying Mel filtering to the power spectrum and the periodogram of each frame of the detection signal to obtain the Mel spectrum energy of each frame of the detection signal;
performing a discrete cosine transform on the Mel spectrum energy of each frame of the detection signal to obtain the digital matrix;
the intelligent health detection model is obtained by training by adopting the following method: acquiring a training audio sample and a test audio sample, wherein the training audio sample comprises human voice audio, emotion voice audio and cough audio for training, and the test audio sample is the cough audio for testing;
and training the human voice audio, the emotion voice audio, the cough audio for training and the cough audio for testing in sequence, and adjusting the parameters of the intelligent health detection model to obtain the final intelligent health detection model.
2. The method of claim 1, wherein the audio signal is a cough sound;
the acquiring an audio signal and preprocessing the audio signal to obtain a detection signal includes:
acquiring a cough sound signal with the duration of 3-30s through audio acquisition equipment of a detection terminal;
and performing noise cleaning on the cough sound signal to delete invalid, irrelevant, damaged or incomplete signals, and taking the cleaned cough sound signal as a detection signal.
3. The method of claim 1, wherein the shape of the digital matrix is determined according to selected parameters including at least a sampling frequency, a track duration, and a number of coefficients.
4. The method of claim 1, wherein the structure of the intelligent health detection model is formed by connecting a plurality of different convolutional neural networks together; and each convolutional neural network is improved through transfer learning.
5. The method of claim 4, wherein the number and parameters of the fully connected layers and convolutional layers of each convolutional neural network of the intelligent health detection model are determined by training based on transfer learning.
6. An intelligent health detection device, comprising:
an acquisition unit, used for acquiring an audio signal and preprocessing the audio signal to obtain a detection signal;
a signal processing unit, used for converting the detection signal into a digital matrix;
a detection unit, used for inputting the obtained digital matrix as a detection sample into an intelligent health detection model to obtain a detection result; the intelligent health detection model is obtained by training on training samples using transfer learning and a convolutional neural network;
wherein the signal processing unit is specifically configured for: framing and frame-shifting the detection signal to obtain multiple frames of the detection signal;
determining a power spectrum and a periodogram of each frame of the detection signal through a Fourier transform;
applying Mel filtering to the power spectrum and the periodogram of each frame of the detection signal to obtain the Mel spectrum energy of each frame of the detection signal;
performing a discrete cosine transform on the Mel spectrum energy of each frame of the detection signal to obtain the digital matrix;
the intelligent health detection model is obtained by training by adopting the following method: acquiring a training audio sample and a test audio sample, wherein the training audio sample comprises human voice audio, emotion voice audio and cough audio for training, and the test audio sample is the cough audio for testing;
and training the human voice audio, the emotion voice audio, the cough audio for training and the cough audio for testing in sequence, and adjusting the parameters of the intelligent health detection model to obtain the final intelligent health detection model.
7. An electronic device, comprising: a processor; and
a memory arranged to store computer-executable instructions, wherein the executable instructions, when executed, cause the processor to perform the method of any one of claims 1 to 5.
8. A computer-readable storage medium storing one or more programs which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of any one of claims 1 to 5.
Background
As living standards improve, people pay increasing attention to their own health. Following the outbreak and spread of the epidemic, artificial intelligence, which can use information from data sources to make better decisions, has been applied ever more widely across fields and has made daily life more convenient. As artificial intelligence technology continues to develop, more medical applications have also appeared.
In health detection, for example, Chinese patent CN 111629663 A discloses a method for diagnosing respiratory diseases by analyzing cough sounds that carry disease characteristics. That method trains a single neural network, and its model is a logistic regression model; it suffers from low detection precision, a large amount of computation and poor targeting.
Accordingly, an intelligent health detection method that is accurate, computationally light and user-friendly is urgently needed.
Disclosure of Invention
The embodiment of the application provides a health intelligent detection method, a health intelligent detection device, an electronic device and a readable storage medium, so as to overcome or at least partially overcome the defects of the prior art.
In a first aspect, a health intelligent detection method is provided, including:
acquiring an audio signal, and preprocessing the audio signal to obtain a detection signal;
converting the detection signal into a digital matrix;
inputting the obtained digital matrix as a detection sample into an intelligent health detection model to obtain a detection result; the intelligent health detection model is obtained by training on training samples using transfer learning and a convolutional neural network.
Optionally, in the above method, the audio signal is a cough sound;
the acquiring an audio signal and preprocessing the audio signal to obtain a detection signal includes:
acquiring a cough sound signal with the duration of 3-30s through audio acquisition equipment of a detection terminal;
and performing noise cleaning on the cough sound signal to delete invalid, irrelevant, damaged or incomplete signals, and taking the cleaned cough sound signal as a detection signal.
Optionally, in the above method, converting the detection signal into a digital matrix includes:
framing and frame-shifting the detection signal to obtain multiple frames of the detection signal;
determining a power spectrum and a periodogram of each frame of the detection signal through a Fourier transform;
applying Mel filtering to the power spectrum and the periodogram of each frame of the detection signal to obtain the Mel spectrum energy of each frame of the detection signal;
performing a discrete cosine transform on the Mel spectrum energy of each frame of the detection signal to obtain the digital matrix.
Optionally, in the above method, the shape of the digital matrix is determined according to selected parameters, the selected parameters including at least a sampling frequency, a track duration and a number of coefficients.
Optionally, in the above method, the intelligent health detection model is obtained by training using the following method: acquiring a training audio sample and a test audio sample, wherein the training audio sample comprises human voice audio, emotion voice audio and cough audio for training, and the test audio sample is the cough audio for testing;
and training the human voice audio, the emotion voice audio, the cough audio for training and the cough audio for testing in sequence, and adjusting the parameters of the intelligent health detection model to obtain the final intelligent health detection model.
Optionally, in the above method, the structure of the intelligent health detection model is formed by connecting a plurality of different convolutional neural networks, and each convolutional neural network is improved through transfer learning.
Optionally, in the method, the number and parameters of the fully-connected layers and convolutional layers of each convolutional neural network of the health intelligent detection model are determined by training based on transfer learning.
In a second aspect, an intelligent health detection apparatus is provided, the apparatus comprising:
an acquisition unit, used for acquiring an audio signal and preprocessing the audio signal to obtain a detection signal;
a signal processing unit, used for converting the detection signal into a digital matrix;
a detection unit, used for inputting the obtained digital matrix as a detection sample into an intelligent health detection model to obtain a detection result; the intelligent health detection model is obtained by training on training samples using transfer learning and a convolutional neural network.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform any of the methods described above.
In a fourth aspect, this application embodiment also provides a computer-readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform any of the methods described above.
The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects:
The method combines transfer learning with a convolutional neural network to obtain the intelligent health detection model through training; the model can determine whether a person is in a healthy state by testing the person's audio signal. Compared with the prior art, all or some of the components of the convolutional neural network are retrained based on transfer learning, which significantly improves the accuracy of human health detection. Moreover, the intelligent health detection model in the present application is a classification model with a small amount of computation; it can be deployed on a user's mobile terminal, is convenient to use, and greatly improves the user experience.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 shows a schematic flow diagram of a health intelligence detection method according to an embodiment of the present application;
FIG. 2 shows a schematic structural diagram of a health intelligence detection apparatus, according to an embodiment of the present application;
FIG. 3 shows a schematic diagram of MVT data interaction according to one embodiment of the present application;
FIG. 4 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
With the outbreak and spread of novel coronavirus pneumonia (Covid-19), seeing a doctor has become inconvenient. Especially in areas where the epidemic is severe, it is difficult for people to see a doctor promptly after falling ill. The health detection method of the present application is provided to meet people's urgent need to know their condition.
FIG. 1 is a schematic flow chart of a health intelligence detection method according to an embodiment of the present application, and as can be seen from FIG. 1, the method at least includes steps S110 to S130:
Step S110: acquiring an audio signal, and preprocessing the audio signal to obtain a detection signal.
The method detects the health state of a user based on machine learning, and first acquires an audio signal of the user.
The audio signal may be, but is not limited to, a sound signal collected by a smart terminal when the user coughs or breathes. For example, the signal from the microphone of the smart terminal is sampled at 16000 Hz to obtain a time sequence of the audio signal. The signal is mono, and each sampling point is represented by a 16-bit value; for example, the audio signal is (2, 4, 100, 120, 140, 60, -60, -130, …), and the interval between adjacent points is 1/16000 second.
In some embodiments of the present application, noise cleaning may be performed on the cough sound signal to remove invalid, irrelevant, damaged, or incomplete signals, and the cleaned cough sound signal is used as the detection signal. The preprocessing means may follow the prior art and are not limited by the present application.
Step S120: converting the detection signal into a digital matrix.
The detection signal cannot be used directly as input data of the intelligent health detection model; it must first be converted into a digital matrix, and the digital matrix is used as the input data of the model.
In some embodiments of the present application, the conversion may proceed as follows: framing and frame-shifting the detection signal to obtain multiple frames of the detection signal; determining a power spectrum and a periodogram of each frame through a Fourier transform; applying Mel filtering to the power spectrum and the periodogram of each frame to obtain the Mel spectrum energy of each frame; and performing a discrete cosine transform on the Mel spectrum energy of each frame to obtain the digital matrix.
A group of data is taken out of the time sequence according to a certain rule, and each such group is called a frame of data; for example, 512 samples are taken out each time. This process is called framing. The number of samples taken each time can be set according to the amount of computation and is generally 512 or 1024. The frame size determines the frequency resolution of each frame. For example, when the sampling rate is 16000 Hz and 512 samples are taken each time, the resolution is 16000/512 = 31.25 Hz; that is, in the frequency range of 0-8000 Hz, information can be obtained only at the frequency points 31.25 × N Hz, where N is an integer from 1 to 256.
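The resolution arithmetic above can be checked with a short, dependency-free sketch. The function name `bin_frequencies` is hypothetical and is introduced only for illustration; the sampling rate and frame size are the example values from the text.

```python
def bin_frequencies(sample_rate, frame_size):
    """Return the frequency resolution and the frequency of each usable bin."""
    resolution = sample_rate / frame_size      # Hz between adjacent frequency points
    n_bins = frame_size // 2                   # points up to half the sampling rate
    freqs = [resolution * n for n in range(1, n_bins + 1)]
    return resolution, freqs


resolution, freqs = bin_frequencies(16000, 512)
print(resolution, len(freqs), freqs[0], freqs[-1])
```

Doubling the frame size to 1024 halves the spacing between frequency points, at the cost of more computation per frame.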
In some embodiments of the present application, framing into short frames (for example, 20-40 millisecond frames) can minimize spectral leakage in the fast Fourier transform (FFT) or discrete Fourier transform (DFT). The first key parameter is the frame length, i.e., the total number of samples per frame (for example, a 16 kHz signal has 16,000 samples per second, so a 25 millisecond frame has 400 samples).
In some embodiments of the present application, to improve detection accuracy, each frame is taken not from the end of the previous frame but from its middle position. In this embodiment, the second frame starts at the middle of the first frame, that is, at the 257th sample point, and contains the amplitudes of 512 sample points. For convenience of processing, a frame number is assigned to each frame of the audio signal, and the frame numbers increase sequentially.
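The half-frame overlap described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `split_into_frames` and the choice to drop trailing samples that do not fill a frame are assumptions.

```python
def split_into_frames(samples, frame_size=512, hop=256):
    """Split a sample sequence into numbered, overlapping frames.

    Each frame starts `hop` samples after the previous one, so with
    hop = frame_size // 2 every frame begins at the midpoint of the
    previous frame. Trailing samples that do not fill a whole frame
    are dropped (an assumption; padding is another option).
    """
    frames = []
    start = 0
    while start + frame_size <= len(samples):
        frame_number = len(frames)             # sequentially increasing frame number
        frames.append((frame_number, samples[start:start + frame_size]))
        start += hop
    return frames


signal = list(range(2048))                     # stand-in for real audio samples
frames = split_into_frames(signal)
print(len(frames), frames[1][1][0])
```

With zero-based indexing, the second frame starts at index 256, which is the 257th sample point of the text.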
In this application, the frame shift (for example, 10 milliseconds) is typically shorter than the frame length, so that adjacent frames overlap. The MFCC coefficients are finally extracted for each frame. Notation: s_i(n), where n ranges from 1 to F, F is the frame length in samples, and i indexes the frames.
A Fourier transform is applied to the amplitude values of the sample points in each audio signal frame, and the results are combined in time order to form the power spectrum of the audio signal frame. That is, the power spectrum of each frame can be represented by a one-dimensional array (a1, a2, a3, …, a256), whose entries correspond to the amplitudes at 31.25 Hz, 62.5 Hz, 93.75 Hz, …, 8000 Hz, respectively.
The Fourier transform, or discrete Fourier transform, converts the original signal from the time domain to the frequency domain. This generates a power spectrum and a periodogram estimate of the power spectrum (with frequency on the X-axis) for each frame. The DFT of each frame involves the following parameters: h(n) is an N-sample-long analysis window (e.g., a Hanning window), and K is the length of the DFT. The output of the DFT is indexed by k (the frequency axis), which ranges from 1 to K. Finally, the power spectrum estimate is computed from each S_i(k).
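DFT of each frame:
$$S_i(k) = \sum_{n=1}^{N} s_i(n)\, h(n)\, e^{-j 2\pi k n / N}, \quad 1 \le k \le K,$$
where h(n) is the N-sample analysis window and K is the length of the DFT;
periodogram estimate for each s_i(n):
$$P_i(k) = \frac{1}{N} \left| S_i(k) \right|^2 .$$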
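As a minimal sketch of this step, the following dependency-free code computes a windowed DFT of one frame and its periodogram estimate, taken as the squared DFT magnitude divided by the number of samples. The function names, the symmetric Hann window and the zero-based bin index k are assumptions made for illustration; a practical implementation would normally use an FFT routine instead of this direct summation.

```python
import cmath
import math


def hann_window(n_samples):
    """Symmetric Hann window of length n_samples."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def periodogram(frame, n_bins=None):
    """Return |S(k)|^2 / N for k = 0 .. n_bins - 1, with S the DFT of the windowed frame."""
    n_samples = len(frame)
    if n_bins is None:
        n_bins = n_samples // 2 + 1            # bins from 0 Hz up to half the sampling rate
    window = hann_window(n_samples)
    power = []
    for k in range(n_bins):
        s_k = sum(frame[n] * window[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n in range(n_samples))
        power.append(abs(s_k) ** 2 / n_samples)
    return power


# A 1 kHz tone sampled at 16 kHz; 64 samples give 250 Hz per bin.
sample_rate, n = 16000, 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n)]
spectrum = periodogram(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin, peak_bin * sample_rate / n)
```

The peak falls in the bin whose frequency matches the tone, which is a quick sanity check that the frequency axis is scaled correctly.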
A Mel filter bank is then applied to the power spectrum; the number of filters is specified in advance (typically 26-40). Each filter is a vector that is non-zero only over a certain portion of the frequency range. The energy of each filter is obtained by multiplying the filter element-wise with the power spectrum and summing all the resulting coefficients. Positive and negative values represent the concentration of spectral energy (in low or high frequencies). Mathematically, each filter is represented as a vector with K entries, where K is the length of the DFT (the range of input frequencies); it is non-zero only in a specific part of the total frequency range. The main parameters include the number X of filters (typically 26-40) and the lower and upper limit frequencies (e.g., a lower limit of 300 Hz and an upper limit of 8000 Hz; the upper limit is bounded by the audio sampling frequency).
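The filter bank can be sketched as below. The text does not specify the filter shape or the Hz-to-Mel mapping, so the triangular filters and the common formula mel = 2595 · log10(1 + f/700) are assumptions, as are the function names; the filter count and frequency limits are the example values from the text.

```python
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, low_hz, high_hz):
    """Build triangular Mel filters, each a vector with n_fft // 2 + 1 entries."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    # n_filters + 2 edge points equally spaced on the Mel scale
    mel_points = [low_mel + i * (high_mel - low_mel) / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    n_bins = n_fft // 2 + 1
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        filt = [0.0] * n_bins
        for k in range(left, centre):          # rising edge
            filt[k] = (k - left) / (centre - left)
        filt[centre] = 1.0                     # peak of the triangle
        for k in range(centre + 1, right):     # falling edge
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank


def filter_energies(power_spectrum, bank):
    """Multiply each filter with the power spectrum and sum the products."""
    return [sum(f * p for f, p in zip(filt, power_spectrum)) for filt in bank]


bank = mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000,
                      low_hz=300, high_hz=8000)
energies = filter_energies([1.0] * 257, bank)  # flat stand-in power spectrum
print(len(bank), len(bank[0]))
```

Because the edge points are equally spaced on the Mel scale, the filters are narrow at low frequencies and wide at high frequencies.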
The logarithm of each of the X filter energies is then taken, which yields X log filter energies.
The DCT (discrete cosine transform) of the X log filter energies is then taken, which yields X cepstral coefficients. The resulting X cepstral coefficients are the MFCCs (Mel-frequency cepstral coefficients).
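The last two steps can be illustrated with a short sketch. The unnormalised type-2 DCT and the small floor applied before the logarithm are assumptions chosen for simplicity, and the function names are hypothetical.

```python
import math


def dct_type2(values):
    """Unnormalised DCT-II: c[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    n_values = len(values)
    return [sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * n_values))
                for n, x in enumerate(values))
            for k in range(n_values)]


def cepstral_coefficients(filter_energies, floor=1e-10):
    """Take the log of each filter energy, then the DCT of the log energies."""
    log_energies = [math.log(max(e, floor)) for e in filter_energies]
    return dct_type2(log_energies)


flat_energies = [math.e] * 26                  # hypothetical flat filter energies
coeffs = cepstral_coefficients(flat_energies)
print(len(coeffs), round(coeffs[0], 6))
```

For perfectly flat filter energies all of the information ends up in the first coefficient, and the remaining coefficients are zero up to rounding; uneven energies spread across the higher coefficients.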
Step S130: inputting the obtained digital matrix as a detection sample into an intelligent health detection model to obtain a detection result; the intelligent health detection model is obtained by training on training samples using transfer learning and a convolutional neural network.
Finally, the obtained digital matrix is input as a detection sample into the intelligent health detection model to obtain a detection result. The intelligent health detection model is a classification model, generally a binary classification model, and the result it gives may be whether or not the condition is confirmed; for example, for novel coronavirus pneumonia the detection result is either healthy or at risk.
In the present application, the intelligent health detection model is obtained by training on training samples using transfer learning together with a convolutional neural network. Compared with a prior-art detection model obtained by training with a convolutional neural network alone, the detection accuracy is significantly improved.
The intelligent health detection model is based on the components of the convolutional neural network (CNN) algorithm (including backpropagation, max pooling, and rectified linear activation) and on the design concept of transfer learning. The CNN algorithm is a classification algorithm, not a regression algorithm. The CNN performs feature extraction by detecting macro-level features of the data (similar to image classification) and is not limited to a specific physical region of the audio signal. A CNN consists of different types of layers: 1. convolutional layers, 2. pooling layers, 3. fully connected layers (dense layers), 4. dropout layers. Apart from the layer type, the most important parameter is the activation function, which controls the behavior of the learning process. Examples of activation functions include ReLU, Softmax, tanh and Sigmoid. The role of each type of layer in a CNN is as follows:
1. Convolutional layers are mainly used to extract features from the input data. A convolution operation is performed between the input data and a filter of a particular size M × M, and the dot product between the filter and each part of the input it covers is computed.
2. Pooling layers reduce the size and computational complexity of the convolved features. Types of pooling include max pooling, average pooling and sum pooling.
3. Fully connected layers (FCLs), or dense layers, connect the neurons of two different layers; this is where the classification takes place.
4. Dropout layers prevent overfitting by randomly dropping neurons during training.
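The first two layer types can be illustrated with a small, dependency-free sketch that applies one convolution filter, a ReLU activation and max pooling to a toy input. The 5 × 5 input, the 2 × 2 filter and the function names are hypothetical values chosen only to keep the arithmetic easy to follow; a real model would use a deep learning framework and learned filters.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Rectified linear activation: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    rows, cols = len(feature_map) // size, len(feature_map[0]) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]


image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 2],
         [2, 1, 0, 1, 0],
         [0, 2, 1, 0, 1]]
kernel = [[1, -1],
          [-1, 1]]
features = max_pool(relu(conv2d_valid(image, kernel)))
print(features)
```

The convolution turns the 5 × 5 input into a 4 × 4 feature map, and pooling reduces it to 2 × 2, which shows how pooling cuts the amount of data passed to later layers.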
Considerations regarding the training and transfer learning framework: the neural network training framework separates the test data from the training data, specifies X (the input features) and Y (the output classes), runs the neural network on the training data to fit the parameter values, and evaluates the accuracy on the test data. Transfer learning is a general design principle, not a specific algorithm. It is not applied to every layer; rather, it refers to the principle of removing and adding layers within an overall neural network design and of selecting which layers to remove and retrain.
As can be seen from the method shown in FIG. 1, the intelligent health detection model is obtained through training by combining transfer learning with a convolutional neural network, and it can determine whether a person is in a healthy state by testing the person's audio signal. Compared with the prior art, all or some of the components of the convolutional neural network are retrained based on transfer learning, which significantly improves the accuracy of human health detection. Moreover, the intelligent health detection model in the present application is a classification model with a small amount of computation; it can be deployed on a user's mobile terminal, is convenient to use, and greatly improves the user experience.
In some embodiments of the present application, in the above method, the audio signal is a cough sound, and acquiring an audio signal and preprocessing the audio signal to obtain a detection signal includes: acquiring a cough sound signal with a duration of 3-30 s through the audio acquisition equipment of the detection terminal.
In some embodiments, the present application can determine from a person's cough sound whether the user is infected with Covid-19. In this case, the collected audio signal is the user's cough sound, and a cough sound signal with a duration of 3-30 s obtained through the audio acquisition equipment of the detection terminal is usually sufficient.
In some embodiments of the present application, three main improvements are made to the training of the intelligent health detection model: first, the training design, i.e., the correct ordering and correct selection of the training data; second, the concatenation of a plurality of neural networks; and third, a new selection of MFCC parameters optimized for the intelligent health detection model.
For the first item, in some embodiments of the present application, in the above method, the intelligent health detection model is trained by the following method: acquiring a training audio sample and a test audio sample, wherein the training audio sample comprises human voice audio, emotional voice audio and cough audio for training, and the test audio sample is cough audio for testing; and training with the human voice audio, the emotional voice audio, the cough audio for training and the cough audio for testing in sequence, and adjusting the parameters of the intelligent health detection model to obtain the final intelligent health detection model.
An existing CNN (ResNet-50) is redesigned using repeated transfer learning: it is trained first on human speech audio, then on emotional speech audio, and finally on cough audio from patients. This new training order improves accuracy and reduces false negatives. The main reason is that, without transfer learning, the only data available for training comprises thousands of negative results but only a few hundred positive results, which is not enough to extract all the information from the positive data. With transfer learning, an existing (already trained) neural network is retrained on the new diagnostic audio data. The existing network already has a basis for feature extraction, because its parameters have been trained on existing audio or image data (e.g., classifying the sounds of different instruments). If the trained network is trained a second time on audio data similar to the disease audio (sharing similar macroscopic features), more information is extracted from the positive cases and the accuracy of detecting positives improves. For transfer learning, it is important to select the type of audio data closest to the diagnostic audio, so that transfer learning can be applied before retraining on the diagnostic audio.
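The staged schedule can be sketched structurally as follows. This is only an outline of which layers are retrained at each stage, not a working training loop: the layer names, the number of frozen layers (`n_frozen=2`) and the helper `prepare_stage` are hypothetical, and the actual training call is left as a comment because it depends on the chosen deep learning framework.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    trainable: bool = True


def prepare_stage(layers, n_frozen, new_head):
    """Freeze the first n_frozen layers and replace the final layer with a fresh head."""
    kept = [Layer(layer.name, trainable=(i >= n_frozen))
            for i, layer in enumerate(layers[:-1])]
    return kept + [Layer(new_head, trainable=True)]


# Training order from the text: human voice, then emotional voice, then cough audio.
STAGES = [("human voice audio", "voice_head"),
          ("emotional voice audio", "emotion_head"),
          ("cough audio", "cough_head")]

# Stand-in for a pretrained network such as ResNet-50 (layer names are hypothetical).
model = [Layer("conv_block_1"), Layer("conv_block_2"),
         Layer("conv_block_3"), Layer("pretrained_head")]

history = []
for dataset, head in STAGES:
    model = prepare_stage(model, n_frozen=2, new_head=head)
    # A framework-specific call such as train(model, dataset) would go here.
    history.append((dataset, [layer.name for layer in model if layer.trainable]))

for dataset, retrained in history:
    print(dataset, "->", retrained)
```

Each stage starts from the parameters left by the previous stage, which is what lets the final cough stage reuse features learned from the larger voice data sets.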
For the second item, in some embodiments of the present application, in the above method, the structure of the intelligent health detection model is formed by connecting a plurality of different convolutional neural networks, and each convolutional neural network is improved through transfer learning; the number and parameters of the fully connected layers and the convolutional layers of each convolutional neural network of the intelligent health detection model are determined by training based on transfer learning.
In this new design of the neural network layers, the output tensors of a plurality of neural networks (each passed through a GlobalAveragePooling2D layer) are concatenated, and each component neural network can use repeated transfer learning. It should be noted that the concatenation of neural networks is different from transfer learning: it includes an additional layer that takes its input from a plurality of neural networks. For example, one design connects three different neural networks together, each of which is a CNN improved by transfer learning.
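The concatenation step can be illustrated without any framework. In this sketch each branch output is a list of channels (2-D grids); global average pooling reduces every channel to one number, and the pooled vectors of all branches are joined into one feature vector for the additional layer. The branch outputs and function names are hypothetical.

```python
def global_average_pool(feature_maps):
    """Reduce each channel (a 2-D grid) to its mean value."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_maps]


def concatenate_branches(branch_outputs):
    """Pool every branch and join the pooled vectors into one feature vector."""
    merged = []
    for feature_maps in branch_outputs:
        merged.extend(global_average_pool(feature_maps))
    return merged


# Hypothetical outputs of three branch networks with different shapes.
branch_a = [[[1.0, 3.0], [5.0, 7.0]]]          # 1 channel, 2 x 2
branch_b = [[[2.0, 2.0]], [[0.0, 4.0]]]        # 2 channels, 1 x 2
branch_c = [[[6.0]]]                           # 1 channel, 1 x 1
merged = concatenate_branches([branch_a, branch_b, branch_c])
print(merged)
```

Pooling first means the branches may produce feature maps of different spatial sizes and still be joined, since only the channel counts determine the length of the merged vector.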
As for the third item, in some embodiments of the present application, in the above method, the shape of the digital matrix is determined according to selected parameters, which include at least the sampling frequency (sampling rate), the track duration (the track duration for each mfccs matrix), and the number of coefficients (n_mfccs).
The neural network parameters (including the number of dense and convolutional layers, and the selection of the specific neural network layers to be replaced and retrained in transfer learning) and the parameters of the MFCC signal processing stage (including n_mfccs, the sampling rate, and the track duration for each mfccs matrix) are selected so that they are mutually optimized to improve accuracy. The significance lies in the design principles that guide parameter selection. For example, if the disease is not associated with a specific vocal region but affects the entire body (e.g., the respiratory rate), then algorithms that filter specific sound regions (e.g., VAD and cepstral mean normalization) are avoided. This also affects the choice of parameters such as sr (sampling rate), the number of filters (filterbanks), and the DCT algorithm. For example, the DCT step must not discard coefficients (unlike ASR (automatic speech recognition), where the lower 12-13 DCT coefficients are retained and the rest are discarded). On the other hand, if the disease is related to a specific region of the vocal cords (each region corresponding to an energy level), there may be a rule for discarding coefficients. Each new design rule must eventually be tested for each disease, but the existence of these design rules allows more parameter choices, and the complex conditions for their use, to be discovered.
With respect to the above, the main parameters used in the present application include sr (sampling rate), n_mfccs (number of coefficients), n_fft and hop_length, all of which are essential in the step of calculating the MFCCs (e.g., frame rate = sr/hop_length). There are also secondary parameters that can be set, such as the DCT type (type 1, 2 or 3), but these are less emphasized. For Covid-19, the parameters we modified were mainly sr and n_mfccs, but we retained a range of possible parameter choices to be tested for each disease. The parameters are not selected at random but must follow the design principles described above (e.g., matching the parameters to the structural features of the disease, its physiological features, and the features of the machine learning algorithms).
Further, regarding the data input and output formats and the selected MFCC parameters: for each wav file (44.1 kHz), an mfccs matrix is generated from the parameters. The parameters include samplerate = default and n_mfccs = 40, the number of MFCCs to be returned. The resulting mfccs matrix is a numpy array of shape (n_mfccs, T), where T is the track duration in frames. For Covid-19 cough audio, we cut the mfccs matrix to the shape (n_mfccs, 100), i.e., the track duration is cut to 100 frames (100/44.1 = 2.27 seconds). Each output MFCC matrix is stored as a Python tensor.
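Bringing every clip to the same (40, 100) shape can be sketched as below, using plain nested lists in place of numpy arrays. The text only describes cutting long tracks to 100 frames; right-padding shorter tracks with zeros is an assumption added here so that all clips end up with the same shape, and the function name `fix_length` is hypothetical.

```python
def fix_length(matrix, n_frames=100, pad_value=0.0):
    """Crop or right-pad every coefficient row to exactly n_frames columns."""
    return [row[:n_frames] + [pad_value] * max(0, n_frames - len(row))
            for row in matrix]


# Hypothetical MFCC matrices with 40 coefficients and different track durations.
long_clip = [[float(t) for t in range(130)] for _ in range(40)]
short_clip = [[1.0] * 60 for _ in range(40)]

fixed_long = fix_length(long_clip)
fixed_short = fix_length(short_clip)
print(len(fixed_long), len(fixed_long[0]), len(fixed_short), len(fixed_short[0]))
```

A fixed shape matters because the convolutional network expects every input sample to have the same dimensions.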
Implementations of the present application may use any programming language, such as Python or C++. The technical content depends only on the basic algorithms and parameters and not on the specific programming language. Different code can implement identical algorithm steps, such as the FFT and deep learning. In principle, the algorithm steps can be implemented using Python, C++, Java, and other programming languages for machine learning and signal processing (Haskell, Scala, F#, LISP, FORTRAN). The technical task is independent of the programming paradigm: object-oriented programming, functional programming, imperative programming, or other types.
Data input and output format (neural network training): each MFCC matrix is converted from 40 × 100 to 4000 × 1 and written to a single CSV file, in which each row is a single audio file and each row has 4000 columns. Then, for neural network training, each audio file is reshaped back to 40 × 100.
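The flatten-to-CSV and reshape-back round trip can be sketched with the Python standard library. Row-major ordering and the helper names are assumptions; the application only states the 40 × 100 and 4000-column shapes.

```python
import csv
import io
from typing import List

N_MFCC, N_FRAMES = 40, 100


def flatten(matrix: List[List[float]]) -> List[float]:
    """Row-major flatten: 40 x 100 -> 4000 values (ordering is an assumption)."""
    return [value for row in matrix for value in row]


def unflatten(values: List[float], n_rows: int = N_MFCC,
              n_cols: int = N_FRAMES) -> List[List[float]]:
    """Inverse of flatten: 4000 values -> 40 x 100."""
    if len(values) != n_rows * n_cols:
        raise ValueError(f"expected {n_rows * n_cols} values, got {len(values)}")
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def matrices_to_csv(matrices: List[List[List[float]]]) -> str:
    """Write one CSV row of 4000 columns per audio file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for matrix in matrices:
        writer.writerow(flatten(matrix))
    return buffer.getvalue()


def csv_to_matrices(text: str) -> List[List[List[float]]]:
    """Read the CSV back and reshape every row to 40 x 100."""
    reader = csv.reader(io.StringIO(text))
    return [unflatten([float(x) for x in row]) for row in reader if row]
```

Storing one file per row keeps the whole data set in a single table, at the cost of reshaping each row before it is passed to the convolutional layers.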
In some embodiments of the present application, the novel coronavirus pneumonia (Covid-19) cough sound data set originally collected by the inventors comprises 3172 sound files in total, of which 3091 are used for the training set and 81 are used for the test set. Without retraining the intelligent health detection model with transfer learning, and with only a very basic convolutional neural network (CNN) and rectified linear unit (ReLU), the test accuracy is only 0.6. On this basis, the inventors introduce transfer learning into the intelligent health detection model, after which the validation accuracy can be close to 0.97, and the specificity and the sensitivity exceed 0.9. Furthermore, the overall design is more robust to changes in conditions. Some prior art achieves a precision level similar to that of the present application, such as Laguarta, Hueto, and Subirana (2020), but the model design is very complicated, and the theoretical capability is lower than that of the present application.
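For reference, accuracy, sensitivity, and specificity are computed from the counts of true and false positives and negatives. The sketch below uses a small set of made-up labels purely for illustration; it does not reproduce the inventors' data or results, and the function name is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for labels 1 (positive) / 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # true negative rate
    }


# Made-up labels for illustration only (not data from the application).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters for a screening task, because a high accuracy can hide poor detection of the positive class when the classes are imbalanced.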
FIG. 2 shows a schematic structural diagram of an intelligent health detection apparatus according to an embodiment of the present application. As can be seen from FIG. 2, the apparatus 200 comprises:
An obtaining unit 210, configured to obtain an audio signal and preprocess the audio signal to obtain a detection audio signal.
A signal processing unit 220, configured to convert the detection audio signal into a digital matrix.
The detection unit 230 is configured to input the obtained digital matrix as a detection sample into the intelligent health detection model to obtain a detection result; the intelligent health detection model is obtained by training a training sample by adopting transfer learning and a convolutional neural network.
In some embodiments of the present application, in the above apparatus, the audio signal is a cough sound; the obtaining unit 210 is configured to obtain a cough sound signal with a duration of 3 to 30 s through an audio collecting device of the detection terminal, to perform noise cleaning on the cough sound signal so as to delete invalid, irrelevant, damaged, or incomplete signals, and to take the cleaned cough sound signal as the detection signal.
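One possible form of the validity check in the cleaning step is sketched below. Only the 3 to 30 s duration range comes from the text above; the NaN check, the silence threshold `min_peak`, and the function name are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def is_valid_recording(samples: Sequence[float], sr: int,
                       min_s: float = 3.0, max_s: float = 30.0,
                       min_peak: float = 1e-4) -> bool:
    """Return True if the recording passes basic cleaning checks.

    min_peak is a hypothetical silence threshold, not a value from
    the application.
    """
    if sr <= 0 or len(samples) == 0:
        return False                       # empty or malformed input
    duration = len(samples) / sr
    if not (min_s <= duration <= max_s):
        return False                       # outside the 3-30 s range
    if any(math.isnan(s) for s in samples):
        return False                       # damaged samples
    peak = max(abs(s) for s in samples)
    return peak >= min_peak                # reject silent recordings
```

For example, with a toy sampling rate of 100 Hz, `is_valid_recording([0.5] * 400, 100)` accepts a 4-second recording, while a 1-second or all-zero recording is rejected.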
In some embodiments of the present application, in the above apparatus, the signal processing unit 220 is configured to: perform framing and frame shifting on the detection audio signal to obtain a multi-frame detection signal; determine a power spectrum and a periodogram of each frame of the detection signal through Fourier transform; perform Mel filtering on the power spectrum and the periodogram of each frame of the detection signal to obtain the Mel spectrum energy of each frame; and perform a discrete cosine transform on the Mel spectrum energy of each frame to obtain a digital matrix.
In some embodiments of the present application, in the above apparatus, the shape of the digital matrix is determined according to selected parameters including at least a sampling frequency, a track duration, and a number of coefficients.
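The four steps performed by the signal processing unit (framing, power spectrum, Mel filtering, discrete cosine transform) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation of the present application: it uses a naive DFT instead of an FFT, omits windowing, applies the conventional logarithm to the Mel energies before the DCT, and uses small illustrative frame sizes. All function names and default values are hypothetical.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Framing and frame shifting: overlapping frames of frame_len samples."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Periodogram estimate |DFT|^2 / N for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters with edges spaced evenly on the Mel scale."""
    mel_max = hz_to_mel(sr / 2)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in freqs])
    return bank


def dct2(values, n_out):
    """Unnormalized DCT type 2, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values)) for k in range(n_out)]


def mfcc(x, sr, frame_len=64, hop=32, n_filters=10, n_mfcc=5):
    """Return a matrix of shape (n_mfcc, number of frames)."""
    bank = mel_filterbank(n_filters, frame_len, sr)
    columns = []
    for frame in frame_signal(x, frame_len, hop):
        spec = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # assumed log step
        columns.append(dct2(log_energies, n_mfcc))
    return [list(row) for row in zip(*columns)]  # transpose to (n_mfcc, T)
```

Calling `mfcc` on a short test tone, for example 256 samples of a 440 Hz sine at an 8 kHz sampling rate, yields a 5-row matrix with one column per frame. In practice an optimized library routine would replace the naive DFT.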
In some embodiments of the present application, in the above apparatus, the health intelligent detection model is trained by using the following method: acquiring a training audio sample and a test audio sample, wherein the training audio sample comprises human voice audio, emotion voice audio and cough audio for training, and the test audio sample is the cough audio for testing;
and training the human voice audio, the emotion voice audio, the cough audio for training and the cough audio for testing in sequence, and adjusting the parameters of the intelligent health detection model to obtain the final intelligent health detection model.
In some embodiments of the present application, in the above apparatus, the intelligent health detection model is formed by connecting a plurality of different convolutional neural networks, each of which is improved through transfer learning.
In some embodiments of the present application, in the above apparatus, the number and parameters of the fully-connected layers and convolutional layers of each convolutional neural network of the intelligent health detection model are determined by training based on transfer learning.
It can be understood that the intelligent health detection apparatus can implement the steps of the intelligent health detection method provided in the foregoing embodiments, and that the explanations given for the intelligent health detection method also apply to the intelligent health detection apparatus and are not repeated herein.
Software and hardware implementation: a backend network version. One type of implementation runs on a network of multiple computing devices, rather than on a single computing device. For this version, there are three main software modules and one hardware module:
Software module 1: the main code of the algorithm. This refers to the main algorithmic steps: data cleaning, signal processing, and deep learning. This code is the same whether implemented on a single device or over a network.
Software module 2: front-end client software code to be run on the local computing device. The front end includes software for a user interface (UI) and signal recording functionality. This requires attaching some code on top of the main code and using existing techniques. The interface may be implemented in multiple programming languages.
Front-end and back-end processing. There are three major versions: (1) running all algorithms completely in the front end; (2) only recording signals in the front-end client, sending them to the file server, and running the algorithms in the back-end network; (3) running one part of the algorithms on the local device (e.g., signal processing) and another part in the back-end network (e.g., deep learning). None of the three versions changes the algorithmic processing steps in module 1.
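The three versions differ only in where each step of module 1 runs, which can be sketched as a simple assignment of steps to the front end or back end. The names below are hypothetical, and placing data cleaning on the local device in version (3) is an assumption; the text only gives signal processing and deep learning as examples.

```python
from enum import Enum

# The module 1 steps, in the order stated in the description.
PIPELINE = ("data_cleaning", "signal_processing", "deep_learning")


class Deployment(Enum):
    FRONT_END_ONLY = 1   # version (1): everything runs on the local device
    BACK_END_ONLY = 2    # version (2): the client only records and uploads
    SPLIT = 3            # version (3): part local, part in the back-end network


def assign_steps(mode, local_steps=("data_cleaning", "signal_processing")):
    """Map each pipeline step to the front end or back end.

    local_steps applies only to the SPLIT version; its default is an
    assumption made for this sketch.
    """
    if mode is Deployment.FRONT_END_ONLY:
        local = PIPELINE
    elif mode is Deployment.BACK_END_ONLY:
        local = ()
    else:
        local = tuple(step for step in PIPELINE if step in local_steps)
    remote = tuple(step for step in PIPELINE if step not in local)
    return {"front_end": local, "back_end": remote}


for mode in Deployment:
    print(mode.name, assign_steps(mode))
```

In every case the union of the two lists is the full pipeline in its original order, which reflects the statement that the processing steps themselves do not change.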
Software module 3: software content for back-end (network) implementation and processing. This includes: interface (API) architecture code, as well as other components of the overall "framework," including software for building, testing, and managing backend software.
There are many design architectures for developing Web application program interfaces for communication between front-end and back-end processing components. Such architectures include MVC (model-view-controller) and MVT (model-view-template). The MVT architecture is one example and includes a client component (e.g., a Web browser), a server component (view, model, database), and request and response classes (e.g., HTTP responses) for communication. This architecture can be implemented using different programming languages, such as Java and Python. In addition to the "main code" of the algorithm steps, additional code is required to develop this back-end architecture. Some adaptation of the basic algorithm code is also required so that it can communicate with other components of the back-end interface. Referring to FIG. 3, FIG. 3 shows a schematic diagram of data interaction of an MVT architecture according to an embodiment of the present application.
Other components of the back-end framework include: operating systems used by the network hardware, software libraries, software for testing the interfaces, cloud management software, database management software, optimized code (parallel processing), etc. This includes all possible software components that are relevant for the back-end network to finally perform the technical task.
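The division of roles in an MVT-style back end can be sketched without any Web framework. Everything in the sketch is hypothetical: the request and response are plain dictionaries, the database is an in-memory dictionary, and `run_detection` is a placeholder that returns a dummy score rather than calling the real algorithm of module 1.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class DetectionRecord:
    """Model: the data stored for one uploaded recording."""
    recording_id: str
    score: float


# Template: how a result is presented to the client.
RESULT_TEMPLATE = Template("Recording $rid: detection score $score")

# Stand-in for a real database (an assumption for this sketch).
DATABASE = {}


def run_detection(audio: bytes) -> float:
    """Placeholder for the module 1 algorithm; returns a dummy score."""
    return 0.0


def detect_view(request: dict) -> dict:
    """View: receives a request, updates the model, renders the template."""
    if request.get("method") != "POST" or "audio" not in request:
        return {"status": 400, "body": "expected POST with an audio payload"}
    record = DetectionRecord(request.get("id", "unknown"),
                             run_detection(request["audio"]))
    DATABASE[record.recording_id] = record
    body = RESULT_TEMPLATE.substitute(rid=record.recording_id,
                                      score=f"{record.score:.2f}")
    return {"status": 200, "body": body}
```

The view is the only component that touches both the stored data and the presentation, so the algorithm code in module 1 can stay unchanged behind `run_detection`.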
Hardware module: network hardware. This includes a set of computing devices in the network, such as a Graphics Processor (GPU) cluster and a Central Processing Unit (CPU) cluster. If the technique is implemented or completed in part on a network, there may not be a single device storing the data, but multiple devices are required and there are multiple devices performing CPU or GPU processing. The algorithmic task is the same, regardless of which particular hardware implements it, but the speed is affected.
Software and hardware implementation: a single device version.
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to FIG. 4, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory is used for storing programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the intelligent health detection device on the logic level. The processor is used for executing the program stored in the memory and is specifically used for executing the following operations:
acquiring an audio signal, and preprocessing the audio signal to obtain a detection audio signal;
converting the detection audio signal into a digital matrix;
Inputting the obtained digital matrix as a detection sample into a health intelligent detection model to obtain a detection result; the intelligent health detection model is obtained by training a training sample by adopting transfer learning and a convolutional neural network.
The method executed by the intelligent health detection apparatus disclosed in the embodiment of FIG. 2 of the present application may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory, and the processor reads information in the memory and completes the steps of the method in combination with its hardware.
The electronic device may further execute the method executed by the intelligent health detection apparatus in fig. 2, and implement the functions of the intelligent health detection apparatus in the embodiment shown in fig. 2, which are not described herein again in this embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the intelligent health detection apparatus in the embodiment shown in fig. 2, and are specifically configured to perform:
acquiring an audio signal, and preprocessing the audio signal to obtain a detection audio signal;
converting the detection audio signal into a digital matrix;
Inputting the obtained digital matrix as a detection sample into a health intelligent detection model to obtain a detection result; the intelligent health detection model is obtained by training a training sample by adopting transfer learning and a convolutional neural network.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.