Voice noise reduction method, device and equipment
1. A method for speech noise reduction, comprising:
collecting continuous audio signals and dividing the audio signals into a plurality of voice phonemes;
matching the voice phoneme with the phoneme models in a phoneme model library, and taking the phoneme model with the highest matching degree as the selected output phoneme model;
and carrying out waveform modification on the output phoneme model according to the loudness change and the duration of the collected voice phoneme, and then outputting.
2. A method for speech noise reduction, comprising:
collecting continuous audio signals and dividing the audio signals into a plurality of voice phonemes;
matching a temporally leading part of the target voice phoneme with segments of the same length taken from the beginnings of the phoneme models in a phoneme model library, and taking the phoneme model with the highest matching degree as the selected output phoneme model;
carrying out waveform modification on the output phoneme model according to the loudness change and the duration of the collected phoneme and then outputting it;
and predicting a subsequent part of the target voice phoneme based on the output phoneme model, comparing the prediction with the subsequently acquired target voice phoneme, and, if the difference is too large, re-matching the acquired part of the target voice phoneme with segments of the same length in the phoneme model library and taking the phoneme model with the highest matching degree as the newly selected output phoneme model.
3. A method for speech noise reduction according to claim 1 or 2, wherein the phoneme model is built from individual speech phonemes collected in a quiet environment.
4. A method for speech noise reduction according to claim 1 or 2, characterized in that the method further comprises:
acquiring a reference audio signal;
the dividing of the audio signal into a plurality of speech phonemes specifically includes: the acquired continuous audio signal is segmented into a plurality of speech phonemes according to the reference audio signal.
5. The method of claim 4, wherein the reference audio signal is a bone conduction vibration signal.
6. The method of claim 4, wherein the reference audio signal is an electroencephalogram signal or a vibration signal at the throat.
7. A method for speech noise reduction according to claim 1 or 2, characterized in that the method further comprises:
comparing the collected voice phoneme with the output phoneme model, and if the collected voice phoneme has less background noise, or is clearer or more complete, replacing the output phoneme model with the collected voice phoneme.
8. A speech noise reduction apparatus, comprising:
a model bank memory configured to store a phoneme model bank;
a program memory configured to store a noise reduction program;
a processor configured to implement the method of any one of claims 1-7 when executing the noise reduction program.
9. A speech noise reduction apparatus, comprising:
a first audio signal acquisition device configured to acquire an audio signal;
a reference audio signal acquisition device configured to acquire a reference audio signal;
the speech noise reduction apparatus of claim 8, connected to the first audio signal acquisition device and the reference audio signal acquisition device.
10. The speech noise reduction device of claim 9, wherein the reference audio signal acquisition device is a bone conduction vibration sensor.
Background
With the development of artificial intelligence technology, interaction between people and devices has become increasingly frequent, and wearable devices can interact with a user at any time, which attracts a large number of artificial intelligence technologies seeking application scenarios in this field.
To free a person's hands and eyes, voice has become an important input mode in human-computer and human-human interaction. In practical applications, however, many environments are full of noise, which interferes with the collected voice signal and poses great challenges to voice detection and noise reduction.
At present, a large number of microphones with noise reduction functions are available on the market. Their main means are: setting a sensitivity threshold and shielding sounds with lower energy; using the specific position of the sound source to select directivity; or filtering out high- and low-frequency sounds and keeping only sounds in the speech frequency range. In addition, a number of speech noise reduction algorithms, such as LMS adaptive filters, adaptive notch filters, basic spectral subtraction and Wiener filtering, reduce noise based on speech characteristics. However, in an environment with a low signal-to-noise ratio, and especially in a multi-person conversation environment, the voices of different people differ little in the frequency domain, so selective filtering is difficult and such noise reduction means can hardly achieve a good effect.
Disclosure of Invention
The invention aims to provide a voice noise reduction method, a voice noise reduction device and voice noise reduction equipment.
The purpose of the invention can be realized by the following technical scheme:
a method of speech noise reduction, comprising:
collecting continuous audio signals and dividing the audio signals into a plurality of voice phonemes;
matching the voice phoneme with the phoneme models in a phoneme model library, and taking the phoneme model with the highest matching degree as the selected output phoneme model;
and carrying out waveform correction on the output phoneme model according to the loudness change and the duration of the collected voice phoneme, and then outputting it. A voice phoneme model library for the individual is established in advance, the collected audio is divided into a plurality of voice phonemes, and the standard voice phoneme model is played after waveform correction according to the collected voice phonemes, so that the individual's voice can be extracted even in complex voice environments where competing voices have similar strength, achieving the noise reduction effect.
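As an aside, a minimal sketch of how the "highest matching degree" selection could be realized is given below. The patent does not prescribe a similarity measure; cosine similarity of coarse spectral envelopes, the Python/NumPy implementation, and all function names are assumptions made only for illustration.

```python
import numpy as np

def spectral_envelope(phoneme: np.ndarray, n_bins: int = 256) -> np.ndarray:
    # Coarse spectral fingerprint: magnitude spectrum resampled to a fixed
    # number of bins so phonemes of different durations can be compared.
    spectrum = np.abs(np.fft.rfft(phoneme))
    positions = np.linspace(0, len(spectrum) - 1, n_bins)
    return np.interp(positions, np.arange(len(spectrum)), spectrum)

def select_output_model(collected: np.ndarray, model_library: dict) -> str:
    # Return the key of the phoneme model with the highest matching degree,
    # here taken as cosine similarity of spectral envelopes (an assumption).
    probe = spectral_envelope(collected)
    best_key, best_score = None, -np.inf
    for key, model in model_library.items():
        cand = spectral_envelope(model)
        score = float(np.dot(probe, cand) /
                      (np.linalg.norm(probe) * np.linalg.norm(cand) + 1e-12))
        if score > best_score:
            best_key, best_score = key, score
    return best_key
```

The selected model would then be waveform-corrected against the collected phoneme before playback, as described above.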
A method of speech noise reduction, comprising:
collecting continuous audio signals and dividing the audio signals into a plurality of voice phonemes;
matching a temporally leading part of the target voice phoneme with segments of the same length taken from the beginnings of the phoneme models in a phoneme model library, and taking the phoneme model with the highest matching degree as the selected output phoneme model;
carrying out waveform correction on the output phoneme model according to the loudness change and the duration of the collected phoneme and then outputting it;
and predicting a subsequent part of the target voice phoneme based on the output phoneme model, comparing the prediction with the subsequently acquired target voice phoneme, and, if the difference is too large, re-matching the acquired part of the target voice phoneme with segments of the same length in the phoneme model library and taking the phoneme model with the highest matching degree as the newly selected output phoneme model.
The phoneme model is established according to individual voice phonemes collected under a quiet environment.
The method further comprises the following steps:
acquiring a reference audio signal;
the dividing of the audio signal into a plurality of speech phonemes specifically includes: the acquired continuous audio signal is segmented into a plurality of speech phonemes according to the reference audio signal.
The reference audio signal is a bone conduction vibration signal.
The reference audio signal is an electroencephalogram signal or a vibration signal at the throat.
The method further comprises the following steps:
comparing the collected voice phoneme with the output phoneme model, and if the collected voice phoneme has less background noise, or is clearer or more complete, replacing the output phoneme model with the collected voice phoneme.
A speech noise reduction apparatus comprising:
a model bank memory configured to store a phoneme model bank;
a program memory configured to store a noise reduction program;
a processor configured to implement the method described above when executing the noise reduction program.
A speech noise reduction apparatus comprising:
a first audio signal acquisition device configured to acquire an audio signal;
a reference audio signal acquisition device configured to acquire a reference audio signal;
and the speech noise reduction apparatus described above, connected to the first audio signal acquisition device and the reference audio signal acquisition device.
The reference audio signal acquisition device is a bone conduction vibration sensor.
Compared with the prior art, the invention has the following beneficial effects:
1) A voice phoneme model library for an individual is established in advance, the collected audio is divided into a plurality of voice phonemes, and the standard voice phoneme model is played after waveform correction according to the collected voice phonemes, so that the individual's voice can be extracted from complex voice environments where competing voices have similar strength, achieving the noise reduction effect.
2) A reference audio signal is added in the phoneme segmentation process, which effectively improves the phoneme splitting effect and further improves the accuracy and timeliness of the response.
3) Matching after the collection of a phoneme is finished improves accuracy.
4) Matching a partially collected phoneme improves the noise reduction speed.
5) Using the bone conduction vibration signal as the reference signal, in combination with a bone conduction earphone, is low in cost and easy to industrialize.
Drawings
FIG. 1 is a schematic diagram of a noise reduction method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a noise reduction method incorporating a reference audio signal;
fig. 3 is a schematic view of a noise reduction apparatus using bone conduction vibration signals.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
One embodiment of the present application provides an algorithm to implement speech noise reduction, and the implementation principle is specifically shown in fig. 1:
1. firstly, acquiring individual voice phonemes in a quiet environment, and establishing an individual phoneme model based on the individual voice phonemes;
2. then, the collected continuous audio signal is divided into voice phonemes and processed with the voice phoneme as the unit; specifically, each voice phoneme is matched against the phoneme models, and the closest phoneme model is taken as the output phoneme model;
3. in the output stage, waveform correction is performed on the phoneme model so that the output approximates the collected speech; specifically, the output phoneme model is adjusted according to the loudness change and the duration of the currently sampled voice phoneme, so that the output sound is closer to the sound currently uttered by the user (a simplified sketch of this adjustment is given below).
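Continuing the illustration, the sketch below adjusts a selected phoneme model to the duration and loudness of the collected phoneme, as in step 3. A plain linear resampling is assumed for the duration change (a real implementation might prefer a pitch-preserving time stretch); the function names and the RMS-based loudness matching are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def correct_waveform(model_phoneme: np.ndarray,
                     collected_phoneme: np.ndarray) -> np.ndarray:
    # Duration: naively resample the stored model to the collected length.
    target_len = len(collected_phoneme)
    positions = np.linspace(0, len(model_phoneme) - 1, target_len)
    stretched = np.interp(positions,
                          np.arange(len(model_phoneme)),
                          model_phoneme)

    # Loudness: scale so the RMS energy matches the collected phoneme.
    model_rms = np.sqrt(np.mean(stretched ** 2)) + 1e-12
    collected_rms = np.sqrt(np.mean(collected_phoneme ** 2))
    return stretched * (collected_rms / model_rms)
```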
Specifically, another embodiment of the present application provides an intelligent voice noise reduction chip, which includes the following contents:
1. the chip divides the collected continuous audio signals into voice phonemes and processes the voice phonemes as a unit.
2. The chip is internally provided with nonvolatile phoneme model library storage, and the speech output is formed by selecting phonemes from the phoneme model library according to a selection algorithm and splicing after certain processing.
3. In some embodiments, a high-precision mode may be supported: when the collection of a voice phoneme is completed, it is matched against the phoneme models in the phoneme model library, and the phoneme model with the highest matching degree is selected as the output phoneme model.
4. In some embodiments, a high-speed mode may be supported: at the beginning of a voice phoneme, the already acquired part is matched against the initial segments of the phoneme models in the phoneme model library, and the phoneme model with the highest matching degree is selected and output begins immediately. Meanwhile, the selected phoneme model is used to predict the subsequently acquired audio data, the prediction is compared with the acquired data in real time, and when the difference is too large, the selection of the phoneme model is changed in real time to correct the error of the initial model selection (see the sketch after this list).
5. In some embodiments, when phoneme model matching is performed, the currently collected phoneme fragment is compared with the phoneme template; if its background noise, clarity and integrity are better than those of the template in the phoneme template library, the phoneme template is corrected based on the freshly collected phoneme. The longer the device is used, the more accurate, richer and clearer the phoneme library becomes, and the closer it gets to the user's own voice.
6. In some embodiments, as shown in FIG. 2, in a preferred configuration, the segmentation of the voice phonemes and the discrimination of the user's voice from other people's voices or background noise may be performed through a reference signal channel whose signal more accurately marks the time periods during which the user is speaking.
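A minimal sketch of the high-speed mode in item 4 follows, under stated assumptions: mean-squared error over raw samples is used as the matching and comparison measure, and the relative-error threshold is arbitrary; the function names are hypothetical and not taken from the patent.

```python
import numpy as np

def match_on_head(head: np.ndarray, model_library: dict) -> str:
    # Select a model using only the first samples of the phoneme,
    # compared against the initial segment of each stored model.
    best_key, best_err = None, np.inf
    for key, model in model_library.items():
        if len(model) < len(head):
            continue
        err = float(np.mean((model[:len(head)] - head) ** 2))
        if err < best_err:
            best_key, best_err = key, err
    return best_key

def prediction_diverges(model: np.ndarray,
                        collected_so_far: np.ndarray,
                        rel_threshold: float = 0.3) -> bool:
    # Compare the model's predicted continuation with newly collected
    # samples; a large relative error triggers re-selection of the model.
    n = min(len(model), len(collected_so_far))
    err = float(np.mean((model[:n] - collected_so_far[:n]) ** 2))
    ref = float(np.mean(collected_so_far[:n] ** 2)) + 1e-12
    return err / ref > rel_threshold

def high_speed_step(collected_so_far: np.ndarray,
                    head_len: int,
                    model_library: dict) -> str:
    # One pass of the high-speed mode: pick a model from the head, then
    # re-match on all data collected so far if the prediction diverges.
    key = match_on_head(collected_so_far[:head_len], model_library)
    if prediction_diverges(model_library[key], collected_so_far):
        key = match_on_head(collected_so_far, model_library)
    return key
```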
The reference audio signal data has more information that can distinguish the user's own voice from the background sound. Possible reference audio data sources include:
Bone conduction vibration signal: owing to the structure of the human vocal apparatus, when a person speaks, the vocal cord vibration propagates from the mouth as sound and also causes the skull to vibrate. Because of the vibration characteristics of the skull, the vibration caused by external sound is much weaker than the vibration caused by the person's own voice, so the bone conduction vibration signal can serve as a basis for distinguishing the user's own voice from environmental noise, and it provides more accurate segmentation information than identifying the voice directly from the mixed sound collected by the microphone. However, since bone and air transmit sound with different characteristics, the bone conduction vibration cannot simply be collected and transmitted as the voice signal itself, but it is very suitable as a reference for separating the voice signal (a sketch of this reference-based segmentation follows this list).
Electroencephalogram signal: the electroencephalogram signal changes when a person speaks; collected in real time at specific positions, it can be fed into the noise reduction chip as reference audio data.
Vibration signal at the throat: the throat is where vocal cord vibration is most evident, so collecting the vibration signal there yields a reference data stream closest to the actual vocalization of the human body and assists the noise reduction chip in segmenting the signal.
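The sketch below shows, under simplifying assumptions, how a bone conduction reference channel could mark the user's speaking periods for segmentation: the frame energy of the reference is gated against a noise floor estimated from the median frame energy. The frame length, energy ratio and function name are illustrative; the patent does not prescribe this particular detector.

```python
import numpy as np

def speaking_segments(bone_ref: np.ndarray,
                      frame_len: int = 256,
                      energy_ratio: float = 4.0) -> list:
    # (start, end) sample ranges in which the bone conduction reference
    # indicates the user is speaking: frame energy gated against a noise
    # floor estimated from the median frame energy. The reference carries
    # the user's own voice far more strongly than ambient sound, so simple
    # energy gating is enough for a first segmentation.
    n_frames = len(bone_ref) // frame_len
    frames = bone_ref[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    threshold = energy_ratio * (np.median(energy) + 1e-12)
    active = energy > threshold

    segments, start = [], None
    for i, is_on in enumerate(active):
        if is_on and start is None:
            start = i * frame_len
        elif not is_on and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments
```

The returned sample ranges would then be applied to the simultaneously recorded microphone channel, which carries the speech content to be matched against the phoneme model library.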
Another embodiment of the present application further provides a speech noise reduction device whose main body is a noiseless microphone. It uses the intelligent speech noise reduction chip to perform speech noise reduction, establishes a speech phoneme model library for the individual user, and provides upload and download functions. The intelligent noise reduction function is optional and the user can choose whether to use it. A mixing function can also be provided: when intelligent noise reduction is in use, one background sound can be selected from the locally pre-stored background sounds and mixed in real time with the noise-reduced speech for output.
The pre-stored background sound can be uploaded through a data interface of the microphone, or recorded and stored in advance through the microphone.
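For the mixing function, a minimal sketch is given below, assuming float samples in the range [-1, 1]; the background gain and the looping of the background sound are illustrative choices, not specified by the patent.

```python
import numpy as np

def mix_with_background(denoised: np.ndarray,
                        background: np.ndarray,
                        background_gain: float = 0.2) -> np.ndarray:
    # Loop the pre-stored background sound under the noise-reduced speech
    # and clip to the valid sample range.
    reps = int(np.ceil(len(denoised) / len(background)))
    bed = np.tile(background, reps)[:len(denoised)]
    return np.clip(denoised + background_gain * bed, -1.0, 1.0)
```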
The input audio data is a digitized data stream of the collected audio signal. After the data stream enters the chip, the audio segmentation module first segments it into a plurality of voice phoneme fragments according to the voice phoneme characteristics, and the fragments are then passed to the model matching module.
In the high-precision mode, the model matching module normalizes the voice phoneme to a certain degree, attenuates signals outside the speech band, and adjusts the signal amplitude, integrity and the like; it then matches the signal against the models stored in the phoneme model library, finds the phoneme model with the highest matching degree, and outputs it to the waveform correction module. The waveform correction module obtains the adjustment parameters of the current phoneme from the model matching module, adjusts the phoneme model in the reverse direction, and sends it to the output module, which outputs it at a set speed. If no model with a sufficiently high matching degree is found, the currently processed voice phoneme is stored in the phoneme model library as a new phoneme model.
In the high-speed mode, the model matching module obtains the current sampling data from the phoneme segmentation module and buffers part of the data acquired for the current voice phoneme. It matches this data against segments of corresponding length of the models in the phoneme model library, finds the phoneme model with the highest matching degree, and outputs it to the waveform correction module, which completes the waveform correction according to the adjustment made during matching and sends the current latest data to the output module for output at a set speed. At the same time, the next sampling data is predicted from the currently selected model and sent to the comparison module; the comparison module obtains the next data from the input data stream and compares it with the prediction. When the difference is large, a model with a higher matching degree is searched for again in the phoneme model library and the model output is adjusted; when the difference is small, the phoneme model is corrected according to the data quality and the difference.
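The library maintenance described above (storing an unmatched phoneme as a new model, and correcting a template when the freshly collected phoneme is cleaner) could look roughly as follows; the noise-floor estimate based on the quietest 20% of frames is an assumption made for illustration, not the patent's prescribed method.

```python
import numpy as np

def noise_floor(phoneme: np.ndarray, frame_len: int = 128) -> float:
    # Rough noise estimate: mean energy of the quietest ~20% of frames.
    n = len(phoneme) // frame_len
    if n == 0:
        return float(np.mean(phoneme ** 2))
    frames = phoneme[:n * frame_len].reshape(n, frame_len)
    energies = np.sort(np.mean(frames ** 2, axis=1))
    return float(np.mean(energies[:max(1, n // 5)]))

def update_library(key, collected: np.ndarray, model_library: dict,
                   matched: bool) -> None:
    # Store an unmatched phoneme as a new model; otherwise replace the
    # stored template when the fresh phoneme has a lower noise floor.
    if not matched:
        model_library[f"new_{len(model_library)}"] = collected.copy()
    elif noise_floor(collected) < noise_floor(model_library[key]):
        model_library[key] = collected.copy()
```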
The reference audio signal is a bone conduction vibration signal, an electroencephalogram signal or a vibration signal at the throat, and is collected by the reference audio signal acquisition device. An implementation of the noiseless microphone is shown in fig. 3 and described below.
In one embodiment, as shown in fig. 3, the microphone includes an elastic support, a battery support, a microphone body, a forward-extending microphone, and a main circuit board. A bone conduction vibration sensor is provided on the side of the elastic support that rests against the user's head, and the bone conduction signal is transmitted to the main circuit board as the reference audio signal. A lithium battery is arranged in the battery support and is connected to the power module of the main circuit board through wiring in the support to supply power to the circuit. The main body part is connected with a flexible connecting rod; the head of the main body part carries the forward-extending microphone sensor, and the main body part is connected to the main circuit board through the connecting rod. The forward-extending microphone is a voice sensor, and the collected voice signal is transmitted to the main circuit board as the first audio signal.
The main circuit board is the main circuit of the noiseless microphone. The sound signal from the forward-extending microphone is fed to the main control chip, which performs analog-to-digital conversion to obtain audio data; the main control chip can then choose between two data paths according to the user's settings:
the audio data are sent to the intelligent noise reduction chip through a digital interface; the intelligent noise reduction chip performs intelligent noise reduction on the speech with the bone conduction signal as reference and sends the noise-reduced data back to the main control module, which forwards them to the Bluetooth module for audio data transmission, thereby realizing the function of a noiseless voice microphone;
or the audio data are forwarded directly to the Bluetooth module for audio transmission, in which case the microphone behaves as an ordinary microphone.
Under the control of the controller, the Bluetooth module can connect to a smartphone to transmit data or be configured through a smartphone application.
In addition, the noiseless microphone can be combined with a wireless earphone to form a wireless headset by adding a connection between the earpiece and the controller, realizing both audio input and audio output functions.
The use method of the noiseless microphone comprises the following steps:
the noiseless microphone can be connected with the smart phone in the form of a traditional Bluetooth microphone device, and has the functions of providing audio input for the smart phone, supporting network conversation, multi-person network conference and the like.
The smart phone can be provided with a configuration application program for the noiseless microphone, and the configuration application program can change the working mode of the noiseless microphone and switch between the noiseless mode and the common mode so as to deal with different application scenes.
The controller of the noiseless microphone can also be added with an automatic configuration function, and the noiseless mode is automatically started in a noisy environment, so that the noiseless microphone is convenient for a user.
The noiseless microphone can support a sound mixing mode, is connected with the main control chip through the Bluetooth through the smart phone application program, configures whether to use sound mixing, uploads background sound data and configures the selected background sound. Meanwhile, the smart phone can be configured to start recording and sampling the current background sound through the microphone sensor and store the current background sound in a memory on the main control board for selective use.