Echo cancellation time delay estimation method and echo cancellation method

Document No.: 9810  Publication date: 2021-09-17

1. An echo cancellation delay estimation method, characterized in that the echo cancellation delay estimation method comprises the following steps:

setting a buffer area at a direct memory access layer;

adding sampling points for the audio signal to be played in a buffer area at a certain sampling frequency;

adding sampling points for the recorded audio signals in the buffer area at the same sampling frequency;

and obtaining the echo time delay according to the distance between the playback point and the recording point and the audio sampling frequency.

2. The echo cancellation delay estimation method of claim 1, wherein the buffer comprises: a play buffer, a record buffer and a mix buffer.

3. The echo cancellation delay estimation method according to claim 2, wherein the echo cancellation delay estimation method comprises the steps of:

transferring a sampling point from the play buffer, converting it into an analog electrical signal, and driving the loudspeaker at the playback point to emit sound;

the sound propagates through the air to the recording device at the recording point;

the recording device converts the sound into an analog electrical signal and then into a stream of sampling points;

transferring the sampling points containing the echo to the record buffer;

and calculating the echo delay from the sampling rate and the distance between the playback point and the recording point.

4. The echo cancellation delay estimation method according to any one of claims 1 to 3, wherein the playback point and the recording point are connected to the buffer via the same bus.

5. An echo cancellation method, characterized in that the echo cancellation method comprises the steps of:

setting a buffer area at a direct memory access layer;

adding reference sound sampling points for audio signals to be played in a buffer area at a certain sampling frequency;

adding recording sampling points for the recorded audio signals in the buffer area at the same sampling frequency;

obtaining echo time delay according to the distance between the playback point and the recording point and the audio sampling frequency;

obtaining, according to the echo delay, the reference sound sampling point in the buffer area that corresponds to each recording sampling point, thereby obtaining mixed audio consisting of the recorded audio and the corresponding reference audio;

splitting the mixed audio into one recording signal and one reference sound signal;

and executing an echo cancellation processing algorithm to obtain a clean recording signal.

6. The echo cancellation method of claim 5, wherein the buffer comprises: a play buffer, a record buffer and a mix buffer.

7. The method of claim 6, wherein, for each recording sample received in the record buffer, the corresponding reference sound sample is retrieved from the play buffer, and both samples are written together into the mix buffer.

8. The echo cancellation method according to claim 6, wherein said echo cancellation method comprises the steps of:

converting external sound waves received by the recording point recording device into audio sampling point data streams at a fixed rate;

carrying the audio sampling point data stream to a recording buffer area;

generating an interrupt when the audio data in the record buffer fills one frame;

in the interrupt handler, copying the new recording audio frame to the mix buffer, locating the corresponding synchronized reference audio frame in the play buffer, and copying it to the mix buffer as well;

reading the mixed audio frame and splitting it into the one recording signal and the one reference signal required by the echo cancellation processing algorithm;

and executing an echo cancellation processing algorithm to obtain a clean recording signal.

9. The echo cancellation method of claim 5, wherein the playback point and the recording point are connected to the buffer via the same bus.

10. The echo cancellation method according to any one of claims 1-9, wherein said echo cancellation processing algorithm employs the Speex algorithm.

Background

In voice communication, echo interferes with the speaker, and strong echo seriously degrades call quality, so it must be eliminated. Echo is the phenomenon in which a speaker's voice, sent to the other party through a communication device, returns to the speaker's own receiver. Echoes fall into two types: "circuit echo" and "acoustic echo". The former can be eliminated through sound hardware design; the present invention addresses the elimination of acoustic echo. Acoustic echo is formed when the far-end user's voice comes out of the near-end loudspeaker, travels through the air or another propagation medium to the near-end user's microphone, is recorded by that microphone, and is transmitted back to the far-end user's receiver. The echo is especially noticeable when the near-end playback volume is high and the recording and playback devices are close together.

An acoustic echo cancellation (AEC) algorithm requires two input signals: the captured recording signal, which contains the echo, and the reference signal played by the loudspeaker. Based on the correlation between the reference signal and the echo it produces in the recording signal, the algorithm builds a model of the far-end signal, uses the model to estimate the echo, and continuously updates the filter coefficients so that the estimate approaches the real echo. The echo estimate is then aligned with the actual echo in the recording signal and subtracted from it, which removes the echo. The effect of AEC is therefore determined mainly by two factors: how closely the echo estimate matches the actual echo, and how accurately the delay of the echo in the recording signal relative to the original reference signal is estimated. If the two input signals are poorly synchronized, the adaptive filter in the algorithm diverges and echo cancellation degrades. Delay estimation mechanisms can be divided, by real-time capability, into those for DSP-based real-time platforms and those for non-real-time platforms such as Linux or Windows. The former synchronize the signals directly in hardware, as in mobile phones: because an integrated DSP module performs echo cancellation and directly controls ADC/DAC capture and playback in real time, the synchronization problem does not arise. In the latter, the echo cancellation algorithm runs in the application layer and recording and playback run in different threads, so obtaining signal synchronization with low error is much more difficult.
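As a minimal illustration of the adaptive-filter principle described above, the following Python sketch uses a normalized LMS (NLMS) filter. NLMS is chosen here only because it is short; it is not the algorithm used by the invention, and the function name, tap count, step size and synthetic signals are all assumptions made for the example. The sketch presumes that the recording and reference signals are already sample-aligned.

```python
import math


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    `mic` and `ref` are sample-aligned sequences of equal length.
    Returns the residual (recording minus estimated echo).
    """
    w = [0.0] * taps   # adaptive filter coefficients
    x = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for d, r in zip(mic, ref):
        x = [r] + x[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x))        # echo estimate
        e = d - y                                       # recording minus estimate
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]  # coefficient update
        residual.append(e)
    return residual


# Synthetic data (assumed): the "echo" is the reference delayed by two
# samples and attenuated, with no near-end speech present.
n = 2000
ref = [math.sin(0.3 * i) + 0.5 * math.sin(1.1 * i) for i in range(n)]
mic = [0.6 * ref[i - 2] if i >= 2 else 0.0 for i in range(n)]
residual = nlms_echo_cancel(mic, ref)
```

Because the echo path in this toy case fits inside the filter length, the residual shrinks as the coefficients adapt. If the reference were offset by more than the filter length, no coefficient setting could reproduce the echo, which is why the alignment of the two inputs matters.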

The mainstream delay estimation approach on non-real-time platforms is an adaptive delay estimation algorithm based on cross-correlation. Because its accuracy depends heavily on computational complexity, an embedded system with limited computing power can only adopt a compromise, which gives software-based estimation on embedded systems two drawbacks: high computational overhead and low accuracy, resulting in poor AEC performance.
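To illustrate why cross-correlation delay estimation is costly, a brute-force version can be sketched as follows. The function name and the test signal are hypothetical, and practical estimators are more elaborate; the point is only that every candidate lag requires a full pass over the frame, so the work grows with both the frame length and the search range.

```python
import random


def xcorr_delay(ref, mic, max_lag):
    """Return the lag in samples (0..max_lag) that maximizes the cross-correlation.

    `mic` must be at least as long as `ref`.
    Cost: (max_lag + 1) * (len(ref) - max_lag) multiply-adds per estimate.
    """
    window = len(ref) - max_lag
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(ref[i] * mic[i + lag] for i in range(window))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Assumed test signal: noise played back, recorded 12 samples later at half amplitude.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(400)]
mic = [0.0] * 12 + [0.5 * r for r in ref[:-12]]
estimated_lag = xcorr_delay(ref, mic, max_lag=40)
```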

Disclosure of Invention

In view of the above-mentioned shortcomings, the present invention provides an echo cancellation delay estimation method and an echo cancellation method, which generate a recording signal and a reference signal that are precisely aligned in a Direct Memory Access (DMA) layer, thereby greatly improving the echo cancellation effect and basically occupying no additional CPU resources.

In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:

an echo cancellation delay estimation method, comprising the steps of:

setting a buffer area at a direct memory access layer;

adding sampling points for the audio signal to be played in a buffer area at a certain sampling frequency;

adding sampling points for the recorded audio signals in the buffer area at the same sampling frequency;

and obtaining the echo time delay according to the distance between the playback point and the recording point and the audio sampling frequency.

In accordance with one aspect of the invention, the buffer comprises: a play buffer, a record buffer and a mix buffer.

According to one aspect of the invention, the echo cancellation delay estimation method comprises the following steps:

transferring a sampling point from the play buffer, converting it into an analog electrical signal, and driving the loudspeaker at the playback point to emit sound;

the sound propagates through the air to the recording device at the recording point;

the recording device converts the sound into an analog electrical signal and then into a stream of sampling points;

transferring the sampling points containing the echo to the record buffer;

and calculating the echo delay from the sampling rate and the distance between the playback point and the recording point.

According to one aspect of the invention, the playback point and the recording point are connected to the buffer via the same bus.

An echo cancellation method, comprising the steps of:

setting a buffer area at a direct memory access layer;

adding reference sound sampling points for audio signals to be played in a buffer area at a certain sampling frequency;

adding recording sampling points for the recorded audio signals in the buffer area at the same sampling frequency;

obtaining echo time delay according to the distance between the playback point and the recording point and the audio sampling frequency;

obtaining, according to the echo delay, the reference sound sampling point in the buffer area that corresponds to each recording sampling point, thereby obtaining mixed audio consisting of the recorded audio and the corresponding reference audio;

splitting the mixed audio into one recording signal and one reference sound signal;

and executing an echo cancellation processing algorithm to obtain a clean recording signal.

In accordance with one aspect of the invention, the buffer comprises: a play buffer, a record buffer and a mix buffer.

In accordance with one aspect of the invention, for each recording sample received in the record buffer, the corresponding reference sound sample is located in the play buffer, and both samples are then written together into the mix buffer.

In accordance with one aspect of the invention, the echo cancellation method comprises the steps of:

converting external sound waves received by the recording point recording device into audio sampling point data streams at a fixed rate;

carrying the audio sampling point data stream to a recording buffer area;

generating an interrupt when the audio data in the record buffer fills one frame;

in the interrupt handler, copying the new recording audio frame to the mix buffer, locating the corresponding synchronized reference audio frame in the play buffer, and copying it to the mix buffer as well;

reading the mixed audio frame and splitting it into the one recording signal and the one reference signal required by the echo cancellation processing algorithm;

and executing an echo cancellation processing algorithm to obtain a clean recording signal.

According to one aspect of the invention, the playback point and the recording point are connected to the buffer via the same bus.

In accordance with one aspect of the invention, the echo cancellation processing algorithm employs the Speex algorithm.

The implementation of the invention has the advantages that: the echo cancellation method of the invention comprises the following steps:

setting a buffer area at the direct memory access layer; adding reference sound sampling points for the audio signal to be played to the buffer area at a certain sampling frequency; adding recording sampling points for the recorded audio signal to the buffer area at the same sampling frequency; obtaining the echo delay from the distance between the playback point and the recording point and the audio sampling frequency; obtaining, according to the echo delay, the reference sound sampling point in the buffer area that corresponds to each recording sampling point, thereby obtaining mixed audio consisting of the recorded audio and the corresponding reference audio; splitting the mixed audio into one recording signal and one reference sound signal; and executing an echo cancellation processing algorithm to obtain a clean recording signal. The AEC delay is computed with high accuracy: for terminal equipment whose microphone and loudspeaker positions are fixed, the delay can be computed precisely, with an error on the order of one sampling point, far smaller than the error of software estimation. Little CPU is consumed: the only overhead is one additional copy of the recording and of the corresponding reference audio.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a block diagram of an echo cancellation data flow according to the present invention;

FIG. 2 is a schematic diagram of an echo cancellation delay estimation method according to the present invention;

FIG. 3 is a schematic diagram of an echo cancellation method according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one

As shown in fig. 1 and fig. 2, an echo cancellation delay estimation method includes the following steps:

step S1: setting a buffer area at a direct memory access layer;

A buffer area is set at the Direct Memory Access (DMA) layer, the buffer area comprising a play buffer, a record buffer and a mix buffer. In this embodiment, the non-real-time Linux-based embedded system comprises an application layer, a DMA controller, an I2S controller, an I2S bus, a CODEC (coder-decoder), a microphone, a loudspeaker and so on. The buffers reside in the DMA controller and specifically comprise an AEC mixed buffer, a DMA recording buffer and a DMA playing buffer.

Step S2: adding sampling points for the audio signal to be played in a buffer area at a certain sampling frequency;

step S3: adding sampling points for the recorded audio signals in the buffer area at the same sampling frequency;

step S4: and obtaining the echo time delay according to the distance between the playback point and the recording point and the audio sampling frequency.

The echo cancellation delay estimation method specifically includes the following steps:

1) transferring a sampling point from the play buffer, converting it into an analog electrical signal, and driving the loudspeaker at the playback point to emit sound;

2) the sound propagates through the air to the recording device at the recording point;

3) the recording device converts the sound into an analog electrical signal and then into a stream of sampling points;

4) transferring the sampling points containing the echo to the record buffer;

5) calculating the echo delay from the sampling rate and the distance between the playback point and the recording point.

In this embodiment, a parameter called the DMA layer echo delay is defined: the delay between the moment a sampling point is sent from the DMA layer and the moment the DMA layer receives the echo of that sampling point. The echo delay accumulates through the following process:

a. the DMA controller transfers a sampling point from the DMA playing buffer to the I2S sending buffer;

b. the sampling point is transmitted to a CODEC through an I2S bus, converted into an analog electric signal through a DAC and driven to a loudspeaker to sound;

c. sound propagates in air to the microphone;

d. the microphone converts sound into analog electric signals, an ADC (analog to digital converter) of the CODEC converts the analog electric signals into sampling point data streams, and the sampling point data streams are transmitted to an I2S receiving buffer area through an I2S bus;

e. the DMA controller transfers the sampling point containing the echo to the DMA recording buffer.

The time consumed by steps a and e is essentially negligible, while steps b, c and d take longer and their duration can be calculated. Assuming an 8 kHz sampling rate and a microphone-to-loudspeaker separation of 34 cm, the total is 0.125 ms + 1 ms + 0.125 ms = 1.25 ms, where 0.125 ms equals one sampling period at 8 kHz and 1 ms is the time sound takes to travel 34 cm at about 340 m/s. In practice, an accurate value of the DMA layer echo delay can be obtained through testing; in this embodiment it is assumed to be 1.50 ms. If the current buffer position pointers of the recording and playing DMA channels are read from the DMA controller at the same moment, the recording position pointer points to the current recording sampling point of step e, and the playing position pointer points to the current playing sampling point. Because the recording and playing channels share one I2S bus and transfer sampling points synchronously, moving the playing position pointer back by 1.50 ms (that is, 12 sampling points) yields the original sampling point of step a. In this way, a reference audio frame is generated for the recorded audio frame.
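The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, the speed of sound is taken as 340 m/s, and treating steps b and d as one sampling period each is an interpretation of the 0.125 ms terms rather than something the text states outright.

```python
SPEED_OF_SOUND_M_S = 340.0  # assumed value; gives 1 ms for 34 cm


def estimated_delay_ms(sample_rate_hz, distance_m):
    """Estimate the DMA layer echo delay as steps b + c + d."""
    sample_period_ms = 1000.0 / sample_rate_hz               # steps b and d (assumed one period each)
    acoustic_ms = 1000.0 * distance_m / SPEED_OF_SOUND_M_S   # step c: propagation in air
    return sample_period_ms + acoustic_ms + sample_period_ms


def delay_in_samples(delay_ms, sample_rate_hz):
    """Convert a delay in milliseconds to a whole number of sampling points."""
    return round(delay_ms * sample_rate_hz / 1000.0)


calculated_ms = estimated_delay_ms(8000, 0.34)   # about 1.25 ms
trace_back = delay_in_samples(1.50, 8000)        # measured 1.50 ms -> 12 sampling points
```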

Example two

As shown in fig. 1 and 3, an echo cancellation method includes the steps of:

step S10: setting a buffer area at a direct memory access layer;

A buffer area is set at the Direct Memory Access (DMA) layer, the buffer area comprising a play buffer, a record buffer and a mix buffer. In this embodiment, the non-real-time Linux-based embedded system comprises an application layer, a DMA controller, an I2S controller, an I2S bus, a CODEC (coder-decoder), a microphone, a loudspeaker and so on. The buffers reside in the DMA controller and specifically comprise an AEC mixed buffer, a DMA recording buffer and a DMA playing buffer.

Step S20: adding reference sound sampling points for audio signals to be played in a buffer area at a certain sampling frequency;

step S30: adding recording sampling points for the recorded audio signals in the buffer area at the same sampling frequency;

step S40: obtaining echo time delay according to the distance between the playback point and the recording point and the audio sampling frequency;

step S50: obtaining, according to the echo delay, the reference sound sampling point in the buffer area that corresponds to each recording sampling point, thereby obtaining mixed audio consisting of the recorded audio and the corresponding reference audio;

step S60: splitting the mixed audio into one recording signal and one reference sound signal;

step S70: and executing an echo cancellation processing algorithm to obtain a clean recording signal.

In this embodiment, compared with standard ALSA (Advanced Linux Sound Architecture), an AEC mixed buffer is added to the ALSA driver. For each recording sampling point received in the DMA recording buffer, the corresponding reference sound sampling point is located in the DMA playing buffer, and the two sampling points are written together into the AEC mixed buffer. For ease of understanding, this embodiment discusses mono recording and playback at a sampling rate of 8 kHz. The basic data format of an audio frame in the new buffer is: (recording sampling point 1, reference sound sampling point 1, recording sampling point 2, reference sound sampling point 2, ..., recording sampling point n, reference sound sampling point n). Finally, the upper layer reads the AEC mixed buffer to obtain the mixed signal and splits it into one recording signal and one corresponding reference signal, which serve as the inputs of the AEC algorithm.
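The interleaved frame format described above can be sketched in Python as follows. The function names are hypothetical and plain lists stand in for the buffers; the sketch only shows how the mixed frame is laid out and how the upper layer could split it again.

```python
def mix_frames(rec_frame, ref_frame):
    """Interleave as (rec 1, ref 1, rec 2, ref 2, ..., rec n, ref n)."""
    if len(rec_frame) != len(ref_frame):
        raise ValueError("recording and reference frames must have equal length")
    mixed = []
    for rec, ref in zip(rec_frame, ref_frame):
        mixed.extend((rec, ref))
    return mixed


def split_mixed(mixed):
    """Split a mixed frame back into (recording signal, reference signal)."""
    return mixed[0::2], mixed[1::2]


mixed = mix_frames([1, 2, 3], [10, 20, 30])
rec_signal, ref_signal = split_mixed(mixed)
```

Keeping each recording sample next to its reference sample means a single read of the mixed buffer delivers both AEC inputs already paired.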

The basic flow of processing a new audio frame in the recording process is as follows:

step 1, an ADC (analog to digital converter) of a CODEC converts external sound waves received by a microphone into audio sampling point data streams at a fixed speed;

step 2, the data stream is transmitted to an I2S receiving buffer area through an I2S bus;

step 3, the DMA controller carries the data stream to a DMA recording buffer area;

step 4, generating an interrupt when the audio data stream in the DMA recording buffer area is full of one frame;

step 5, in the interrupt handler, copying the new recording audio frame to the AEC mixed buffer, locating the corresponding synchronized reference audio frame in the DMA playing buffer, and copying it to the AEC mixed buffer as well;

step 6, informing ALSA that a new audio frame is ready;

step 7, the upper-layer application reads the mixed audio frame from ALSA and splits it into the one recording signal and the one reference signal required by the AEC algorithm;

step 8, executing an echo cancellation processing algorithm to obtain a clean recording signal;

step 9, end.
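The flow above can be modelled in software as follows. This is a toy simulation, not driver code: the class, its ring buffers and the single shared position counter are assumptions standing in for the DMA recording buffer, the DMA playing buffer and the AEC mixed buffer, and the "interrupt" is simply a method call made once per full frame. The delay_samples parameter is the offset, in sampling points, used to locate the synchronized reference frame in step 5.

```python
class DmaAudioModel:
    """Toy model of the DMA play/record rings and the AEC mixed buffer."""

    def __init__(self, ring_len, frame_len, delay_samples):
        if ring_len < frame_len + delay_samples:
            raise ValueError("ring too short to hold a frame plus the delay")
        self.play_ring = [0] * ring_len
        self.rec_ring = [0] * ring_len
        self.ring_len = ring_len
        self.frame_len = frame_len
        self.delay = delay_samples
        self.pos = 0             # shared position: both channels advance in lockstep
        self.mixed_frames = []   # stands in for the AEC mixed buffer

    def transfer(self, play_sample, rec_sample):
        """One sampling period: one sample played, one sample recorded."""
        slot = self.pos % self.ring_len
        self.play_ring[slot] = play_sample
        self.rec_ring[slot] = rec_sample
        self.pos += 1
        if self.pos % self.frame_len == 0:   # step 4: a full frame triggers the interrupt
            self._on_frame_interrupt()

    def _on_frame_interrupt(self):
        """Step 5: copy the new recording frame with its aligned reference frame."""
        mixed = []
        for n in range(self.pos - self.frame_len, self.pos):
            rec = self.rec_ring[n % self.ring_len]
            ref_n = n - self.delay           # trace back by the echo delay
            ref = self.play_ring[ref_n % self.ring_len] if ref_n >= 0 else 0
            mixed.extend((rec, ref))
        self.mixed_frames.append(mixed)


# Assumed scenario: the recording is a pure echo arriving 3 samples late.
model = DmaAudioModel(ring_len=16, frame_len=8, delay_samples=3)
for n in range(32):
    model.transfer(play_sample=100 + n, rec_sample=100 + n - 3 if n >= 3 else 0)
```

With the delay set correctly, every recording sample in a mixed frame sits next to the played sample that caused it.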

The key point is how to find, in step 5, the reference sound sampling point corresponding to a recording sampling point. To this end, a parameter called the DMA layer echo delay is defined: the delay between the moment a sampling point is sent from the DMA layer and the moment the DMA layer receives the echo of that sampling point. The echo delay accumulates through the following process:

a. the DMA controller transfers a sampling point from the DMA playing buffer to the I2S sending buffer;

b. the sampling point is transmitted to a CODEC through an I2S bus, converted into an analog electric signal through a DAC and driven to a loudspeaker to sound;

c. sound propagates in air to the microphone;

d. the microphone converts sound into analog electric signals, an ADC (analog to digital converter) of the CODEC converts the analog electric signals into sampling point data streams, and the sampling point data streams are transmitted to an I2S receiving buffer area through an I2S bus;

e. the DMA controller transfers the sampling point containing the echo to the DMA recording buffer.

The time consumed by steps a and e is essentially negligible, while steps b, c and d take longer and their duration can be calculated. Assuming an 8 kHz sampling rate and a microphone-to-loudspeaker separation of 34 cm, the total is 0.125 ms + 1 ms + 0.125 ms = 1.25 ms, where 0.125 ms equals one sampling period at 8 kHz and 1 ms is the time sound takes to travel 34 cm at about 340 m/s. In practice, an accurate value of the DMA layer echo delay can be obtained through testing; in this embodiment it is assumed to be 1.50 ms. If the current buffer position pointers of the recording and playing DMA channels are read from the DMA controller at the same moment, the recording position pointer points to the current recording sampling point of step e, and the playing position pointer points to the current playing sampling point. Because the recording and playing channels share one I2S bus and transfer sampling points synchronously, moving the playing position pointer back by 1.50 ms (that is, 12 sampling points) yields the original sampling point of step a. In this way, a reference audio frame is generated for the recorded audio frame.
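The text does not say how the delay is measured in testing. One hypothetical way, sketched below in Python, is to play a single impulse, find where its echo peaks in the recording, and then use the resulting offset to trace the playing position pointer back in a circular buffer. The function names, buffer length and sample values are assumptions made for the example.

```python
def measure_delay_samples(played, recorded):
    """Offset between the largest played sample and the largest recorded sample."""
    play_peak = max(range(len(played)), key=lambda i: abs(played[i]))
    rec_peak = max(range(len(recorded)), key=lambda i: abs(recorded[i]))
    return rec_peak - play_peak


def trace_back(play_pos, delay_samples, ring_len):
    """Index in a circular play buffer of the sample sent `delay_samples` earlier."""
    return (play_pos - delay_samples) % ring_len


# Assumed test capture at 8 kHz: impulse played at index 10, echo peak at index 22.
played = [0.0] * 64
played[10] = 1.0
recorded = [0.01] * 64   # low-level background
recorded[22] = 0.4       # attenuated echo of the impulse

delay = measure_delay_samples(played, recorded)   # in sampling points
delay_ms = delay * 1000.0 / 8000                  # in milliseconds
ref_index = trace_back(play_pos=5, delay_samples=delay, ring_len=64)
```

The modulo in trace_back handles the case where the traced-back position wraps around the start of the circular buffer.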

Based on a non-real-time embedded Linux system and combining software and hardware, the method generates a precisely aligned recording signal and reference signal at the DMA layer, which greatly improves echo cancellation while consuming almost no additional CPU resources. The invention was verified with the Speex echo cancellation algorithm; Speex does not include a delay estimation algorithm yet places high demands on delay accuracy, so the two are strongly complementary.

The implementation of the invention has the advantages that: the echo cancellation method of the invention comprises the following steps:

setting a buffer area at the direct memory access layer; adding reference sound sampling points for the audio signal to be played to the buffer area at a certain sampling frequency; adding recording sampling points for the recorded audio signal to the buffer area at the same sampling frequency; obtaining the echo delay from the distance between the playback point and the recording point and the audio sampling frequency; obtaining, according to the echo delay, the reference sound sampling point in the buffer area that corresponds to each recording sampling point, thereby obtaining mixed audio consisting of the recorded audio and the corresponding reference audio; splitting the mixed audio into one recording signal and one reference sound signal; and executing an echo cancellation processing algorithm to obtain a clean recording signal. The AEC delay is computed with high accuracy: for terminal equipment whose microphone and loudspeaker positions are fixed, the delay can be computed precisely, with an error on the order of one sampling point, far smaller than the error of software estimation. Little CPU is consumed: the only overhead is one additional copy of the recording and of the corresponding reference audio.

The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention disclosed herein are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
