Audio encoding method, apparatus, device, and computer-readable storage medium
1. An audio encoding method applied to a first terminal, the audio encoding method comprising:
sending audio information to be coded of a first terminal to a second terminal;
acquiring voice damage information of the audio information sent by the second terminal;
determining a mean opinion score (MOS) value of the audio information according to the voice damage information;
and coding the audio information according to the coding rate corresponding to the MOS value.
2. The audio encoding method of claim 1, wherein the step of encoding the audio information according to the encoding rate corresponding to the MOS value comprises:
if the MOS value is larger than or equal to a preset threshold value, determining the coding rate according to the current network state;
and coding the audio information according to the determined coding rate.
3. The audio encoding method of claim 1, wherein the step of encoding the audio information according to the encoding rate corresponding to the MOS value comprises:
if the MOS value is smaller than a preset threshold value, determining the coding rate according to a preset coding index, wherein the coding index comprises the MOS value and the coding rate corresponding to the MOS value;
and coding the audio information according to the determined coding rate.
4. The audio encoding method of claim 3, wherein the determining the encoding rate according to the preset encoding index comprises:
determining a first difference value between a preset threshold value and the MOS value of the audio information;
acquiring a second difference value between a preset initial coding rate and a coding rate corresponding to a preset index value in the coding index;
acquiring a third difference value between the preset initial coding rate and a coding rate corresponding to an index value next to the preset index value;
if the second difference is smaller than the first difference and the third difference is larger than the first difference, obtaining the coding rate corresponding to an index value next to the preset index value in the coding index;
if the second difference is greater than or equal to the first difference, or the third difference is less than or equal to the first difference, adding 1 to the preset index value, and returning to the step of obtaining the second difference between the preset initial coding rate and the coding rate corresponding to the preset index value in the coding index until the sum of the preset index value plus one reaches the maximum index value of the coding index.
5. The audio encoding method of claim 4, wherein the determining the encoding rate according to the preset encoding index further comprises:
and if the sum of the preset index value plus one reaches the maximum index value of the coding index, determining the coding rate corresponding to the maximum index value as the coding rate of the audio information.
6. The audio encoding method of claim 1, wherein the step of determining the mean opinion score (MOS) value of the audio information according to the voice damage information comprises:
inputting the voice damage information into a preset model to determine the MOS value of the audio information, wherein the voice damage information comprises at least one of a basic signal-to-noise ratio, synchronous transmission damage information, delay damage information and equipment damage information.
7. The audio encoding method of claim 1, wherein the step of acquiring voice damage information of the audio information sent by the second terminal comprises:
acquiring a real-time transport control protocol (RTCP) packet sent by the second terminal;
and determining the voice damage information of the audio information according to the RTCP packet.
8. An audio encoding apparatus, characterized in that the audio encoding apparatus comprises:
the transmitting module is used for transmitting the audio information to be coded of the first terminal to the second terminal;
the acquisition module is used for acquiring voice damage information of the audio information sent by the second terminal;
the computing module is used for determining a mean opinion score (MOS) value of the audio information according to the voice damage information;
and the coding module is used for coding the audio information according to the coding rate corresponding to the MOS value.
9. An audio encoding device, characterized in that the audio encoding device comprises a memory, a processor and an audio encoding program stored in the memory and executable on the processor, the audio encoding program, when executed by the processor, implementing the steps of the audio encoding method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores an audio encoding program which, when executed by a processor, implements the steps of the audio encoding method according to any one of claims 1 to 7.
Background
In real-world multimedia communication, network conditions differ between environments, so network fluctuation is unavoidable and inevitably affects voice call quality. Network fluctuation affects both the transmission of audio and each stage of its processing, and can cause problems such as voice packet loss and delayed arrival of voice packets, which lower the quality of the voice call.
Disclosure of Invention
The main object of the present invention is to provide an audio encoding method, apparatus, device, and computer-readable storage medium, aiming to solve the problem of low voice call quality.
In order to achieve the above object, the present invention provides an audio encoding method, including:
sending audio information to be coded of a first terminal to a second terminal;
acquiring voice damage information of the audio information sent by the second terminal;
determining a mean opinion score (MOS) value of the audio information according to the voice damage information;
and coding the audio information according to the coding rate corresponding to the MOS value.
In an embodiment, the step of encoding the audio information according to the encoding rate corresponding to the MOS value includes:
if the MOS value is larger than or equal to a preset threshold value, determining the coding rate according to the current network state;
and coding the audio information according to the determined coding rate.
In an embodiment, the step of encoding the audio information according to the encoding rate corresponding to the MOS value includes:
if the MOS value is smaller than a preset threshold value, determining the coding rate according to a preset coding index, wherein the coding index comprises the MOS value and the coding rate corresponding to the MOS value;
and coding the audio information according to the determined coding rate.
In an embodiment, the determining the coding rate according to a preset coding index includes:
determining a first difference value between a preset threshold value and the MOS value of the audio information;
acquiring a second difference value between a preset initial coding rate and a coding rate corresponding to a preset index value in the coding index;
acquiring a third difference value between the preset initial coding rate and a coding rate corresponding to an index value next to the preset index value;
if the second difference is smaller than the first difference and the third difference is larger than the first difference, obtaining the coding rate corresponding to an index value next to the preset index value in the coding index;
if the second difference is greater than or equal to the first difference, or the third difference is less than or equal to the first difference, adding 1 to the preset index value, and returning to the step of obtaining the second difference between the preset initial coding rate and the coding rate corresponding to the preset index value in the coding index until the sum of the preset index value plus one reaches the maximum index value of the coding index.
In an embodiment, the step of determining the coding rate according to a preset coding index further includes:
and if the sum of the preset index value plus one reaches the maximum index value of the coding index, determining the coding rate corresponding to the maximum index value as the coding rate of the audio information.
In an embodiment, the step of determining the mean opinion score (MOS) value of the audio information according to the voice damage information includes:
inputting the voice damage information into a preset model to determine the MOS value of the audio information, wherein the voice damage information comprises at least one of a basic signal-to-noise ratio, synchronous transmission damage information, delay damage information and equipment damage information.
In an embodiment, the step of acquiring the voice damage information of the audio information sent by the second terminal includes:
acquiring a real-time transport control protocol (RTCP) packet sent by the second terminal;
and determining the voice damage information of the audio information according to the RTCP packet.
To achieve the above object, the present invention also provides an audio encoding apparatus, including:
the transmitting module is used for transmitting the audio information to be coded of the first terminal to the second terminal;
the acquisition module is used for acquiring voice damage information of the audio information sent by the second terminal;
the computing module is used for determining a mean opinion score (MOS) value of the audio information according to the voice damage information;
and the coding module is used for coding the audio information according to the coding rate corresponding to the MOS value.
To achieve the above object, the present invention also provides an audio encoding device comprising a memory, a processor, and an audio encoding program stored in the memory and executable on the processor, the audio encoding program, when executed by the processor, implementing the steps of the audio encoding method as described above.
To achieve the above object, the present invention also provides a computer-readable storage medium storing an audio encoding program which, when executed by a processor, implements the steps of the audio encoding method as described above.
The invention provides an audio encoding method, apparatus, device, and computer-readable storage medium, in which audio information to be encoded of a first terminal is sent to a second terminal; voice damage information of the audio information, sent by the second terminal, is acquired; a mean opinion score (MOS) value of the audio information is determined according to the voice damage information; and the audio information is encoded according to the coding rate corresponding to the MOS value. The MOS value is determined from the voice damage information and measures the audio quality of a communication system, so determining the coding rate of the audio information according to the MOS value guarantees the audio quality of the encoded audio information.
Drawings
FIG. 1 is a schematic diagram of a hardware configuration of an audio encoding apparatus according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of an audio encoding method according to the present invention;
FIG. 3 is a detailed flowchart of step S40 of the second embodiment of the audio encoding method according to the present invention;
FIG. 4 is a schematic diagram of a logic structure of an audio encoding apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: sending audio information to be encoded of a first terminal to a second terminal; acquiring voice damage information of the audio information sent by the second terminal; determining a mean opinion score (MOS) value of the audio information according to the voice damage information; and encoding the audio information according to the coding rate corresponding to the MOS value.
The MOS value is determined through the voice damage information, the MOS value can measure the audio quality of a communication system, the coding rate of the audio information is determined according to the MOS value, and the audio quality of the coded audio information is guaranteed.
As an implementation, the audio encoding apparatus may be as shown in fig. 1.
An embodiment of the present invention relates to an audio encoding apparatus including a processor 101 (e.g., a CPU), a memory 102, and a communication bus 103. The communication bus 103 is used to enable connection and communication between these components.
The memory 102 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). As shown in fig. 1, a memory 102, which is a kind of computer-readable storage medium, may include therein an audio encoding program; and the processor 101 may be configured to call the audio encoding program stored in the memory 102 and perform the following operations:
sending audio information to be coded of a first terminal to a second terminal;
acquiring voice damage information of the audio information sent by the second terminal;
determining a mean opinion score (MOS) value of the audio information according to the voice damage information;
and coding the audio information according to the coding rate corresponding to the MOS value.
In one embodiment, the processor 101 may be configured to call an audio encoding program stored in the memory 102 and perform the following operations:
if the MOS value is larger than or equal to a preset threshold value, determining the coding rate according to the current network state;
and coding the audio information according to the determined coding rate.
In one embodiment, the processor 101 may be configured to call an audio encoding program stored in the memory 102 and perform the following operations:
if the MOS value is smaller than a preset threshold value, determining the coding rate according to a preset coding index, wherein the coding index comprises the MOS value and the coding rate corresponding to the MOS value;
and coding the audio information according to the determined coding rate.
In one embodiment, the processor 101 may be configured to call an audio encoding program stored in the memory 102 and perform the following operations:
determining a first difference value between a preset threshold value and the MOS value of the audio information;
acquiring a second difference value between a preset initial coding rate and a coding rate corresponding to a preset index value in the coding index;
acquiring a third difference value between the preset initial coding rate and a coding rate corresponding to an index value next to the preset index value;
if the second difference is smaller than the first difference and the third difference is larger than the first difference, obtaining the coding rate corresponding to an index value next to the preset index value in the coding index;
if the second difference is greater than or equal to the first difference, or the third difference is less than or equal to the first difference, adding 1 to the preset index value, and returning to the step of obtaining the second difference between the preset initial coding rate and the coding rate corresponding to the preset index value in the coding index until the sum of the preset index value plus one reaches the maximum index value of the coding index.
In one embodiment, the processor 101 may be configured to call an audio encoding program stored in the memory 102 and perform the following operations:
and if the sum of the preset index value plus one reaches the maximum index value of the coding index, determining the coding rate corresponding to the maximum index value as the coding rate of the audio information.
In one embodiment, the processor 101 may be configured to call an audio encoding program stored in the memory 102 and perform the following operations:
inputting the voice damage information into a preset model to determine the MOS value of the audio information, wherein the voice damage information comprises at least one of a basic signal-to-noise ratio, synchronous transmission damage information, delay damage information and equipment damage information.
In one embodiment, the processor 101 may be configured to call an audio encoding program stored in the memory 102 and perform the following operations:
acquiring a real-time transport control protocol (RTCP) packet sent by the second terminal;
and determining the voice damage information of the audio information according to the RTCP packet.
Based on the hardware architecture of the audio encoding device, an embodiment of the audio encoding method of the present invention is proposed.
Referring to FIG. 2, FIG. 2 shows a first embodiment of the audio encoding method of the present invention, which includes the following steps:
Step S10, sending the audio information to be encoded of the first terminal to the second terminal.
Specifically, the first terminal sends the audio information to be encoded of the first terminal to the second terminal, and the second terminal determines the voice damage information of the audio information according to the received audio information.
Step S20, acquiring the voice damage information of the audio information sent by the second terminal.
Specifically, the first terminal may acquire the voice damage information of the audio information by receiving a real-time transport control protocol (RTCP) packet sent by the second terminal and determining the voice damage information according to the RTCP packet. The voice damage information may include a basic signal-to-noise ratio, synchronous transmission damage information, delay damage information, or equipment damage information, among others. The basic signal-to-noise ratio is the ratio of the audio information to noise; the synchronous transmission damage information is the audio quality damage caused by packet loss and similar factors during transmission of the audio information; the delay damage information is the audio quality damage caused by network delay; and the equipment damage information is the audio quality damage caused by aging of equipment such as a loudspeaker.
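As a non-limiting illustration of how raw loss and jitter figures might be read from an RTCP packet, the following Python sketch parses the first report block of an RTCP receiver report laid out as in RFC 3550. The names ReportBlock and parse_receiver_report are hypothetical, and the embodiment does not specify which RTCP packet type or fields are used, so the choice of a receiver report is an assumption.

```python
import struct
from typing import NamedTuple


class ReportBlock(NamedTuple):
    """First report block of an RTCP receiver report (RFC 3550 layout)."""
    source_ssrc: int
    fraction_lost: float      # fraction of packets lost since the last report, 0.0 to <1.0
    cumulative_lost: int      # signed 24-bit count of packets lost
    highest_seq: int          # extended highest sequence number received
    jitter: int               # interarrival jitter, in RTP timestamp units
    last_sr: int              # middle 32 bits of the last sender report NTP timestamp
    delay_since_last_sr: int  # in units of 1/65536 seconds


def parse_receiver_report(packet: bytes) -> ReportBlock:
    """Parse the first report block; hypothetical helper, not from the source."""
    if len(packet) < 32:  # 8-byte header plus one 24-byte report block
        raise ValueError("packet too short for a receiver report with one block")
    first, packet_type, _length, _reporter_ssrc = struct.unpack_from("!BBHI", packet, 0)
    version = first >> 6
    report_count = first & 0x1F
    if version != 2 or packet_type != 201 or report_count < 1:
        raise ValueError("not an RTCP receiver report with a report block")
    ssrc, lost_word, highest_seq, jitter, lsr, dlsr = struct.unpack_from("!6I", packet, 8)
    cumulative = lost_word & 0xFFFFFF
    if cumulative & 0x800000:  # sign-extend the 24-bit field
        cumulative -= 1 << 24
    return ReportBlock(ssrc, (lost_word >> 24) / 256.0, cumulative,
                       highest_seq, jitter, lsr, dlsr)
```

How these fields are mapped onto the signal-to-noise, transmission, delay and equipment damage terms is left open by the description; the sketch only shows the extraction step.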
Step S30, determining the mean opinion score (MOS) value of the audio information according to the voice damage information.
Specifically, to determine the MOS value of the audio information according to the voice damage information, a weight value corresponding to each item of voice damage information may be determined, and the MOS value of the audio information may then be determined according to the weight values and the voice damage information.
The MOS value of the audio information may also be determined by inputting the voice damage information into a preset model, which outputs the MOS value. The second terminal may also input the voice damage information into the preset model to determine the MOS value of the audio information. The preset model may be, for example, the E-Model audio quality assessment model.
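As a non-limiting illustration of the preset model, the following Python sketch applies the rating formula and the R-to-MOS conversion of the ITU-T G.107 E-model. Treating the basic signal-to-noise ratio, synchronous transmission damage, delay damage and equipment damage as the E-model terms Ro, Is, Id and Ie-eff is an assumption made for this sketch, as are the function names r_factor and mos_from_r and the sample input values.

```python
def r_factor(ro: float, i_s: float, i_d: float, ie_eff: float, a: float = 0.0) -> float:
    """E-model transmission rating: R = Ro - Is - Id - Ie_eff + A."""
    return ro - i_s - i_d - ie_eff + a


def mos_from_r(r: float) -> float:
    """Convert an E-model rating R into an estimated MOS value."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6
    # The cubic term dips slightly below 1 for very small R; clamping to the
    # 1.0 floor is a choice made in this sketch.
    return max(1.0, mos)


# Hypothetical damage values, for illustration only.
rating = r_factor(ro=95.0, i_s=1.5, i_d=10.0, ie_eff=11.0)
estimated_mos = mos_from_r(rating)
```

With these sample inputs the estimated MOS value falls below the 4.0 threshold used in the embodiment, so the rate selection of step S40 would take the coding index branch.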
Step S40, encoding the audio information according to the coding rate corresponding to the MOS value.
Specifically, the coding rate is determined according to the MOS value, the audio information is encoded at that rate, and the encoded audio information is sent to the second terminal. When the MOS value is greater than or equal to the preset threshold, the current audio quality is good; the coding rate is determined according to the current network state, and the audio information is encoded at the determined coding rate. When the MOS value is smaller than the preset threshold, the current audio quality is poor; a coding rate higher than the one used when the audio quality is good is selected to encode the audio information, until the MOS value is greater than or equal to the preset threshold. The preset threshold may be, for example, 4.0.
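As a non-limiting illustration, the threshold decision of step S40 may be sketched in Python as follows. The function select_coding_rate and its two callables are hypothetical; the description does not say how the coding rate is derived from the current network state, so that part is passed in as a placeholder.

```python
from typing import Callable

MOS_THRESHOLD = 4.0  # example preset threshold given in the description


def select_coding_rate(
    mos: float,
    rate_from_network: Callable[[], float],
    rate_from_index: Callable[[float], float],
    threshold: float = MOS_THRESHOLD,
) -> float:
    """Pick a coding rate (kbps) from the MOS value.

    rate_from_network and rate_from_index are hypothetical callables standing
    in for the two branches described in the text.
    """
    if mos >= threshold:
        # Audio quality is good: follow the current network state.
        return rate_from_network()
    # Audio quality is poor: look up a higher rate in the coding index.
    return rate_from_index(mos)
```

For example, select_coding_rate(3.5, lambda: 12.65, lambda m: 23.85) takes the coding index branch, whereas a MOS value of 4.0 or above takes the network state branch.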
In the technical solution of this embodiment, audio information to be encoded of a first terminal is sent to a second terminal; voice damage information of the audio information, sent by the second terminal, is acquired; a mean opinion score (MOS) value of the audio information is determined according to the voice damage information; and the audio information is encoded according to the coding rate corresponding to the MOS value. The MOS value is determined from the voice damage information and measures the audio quality of a communication system, so determining the coding rate of the audio information according to the MOS value guarantees the audio quality of the encoded audio information.
Referring to FIG. 3, FIG. 3 shows a second embodiment of the audio encoding method of the present invention. Based on the first embodiment, step S40 includes:
Step S41, if the MOS value is smaller than a preset threshold, determining the coding rate according to a preset coding index, wherein the coding index comprises the MOS value and the coding rate corresponding to the MOS value;
Step S42, encoding the audio information according to the determined coding rate.
Specifically, when the MOS value is smaller than the preset threshold, the coding rate of the audio information is determined according to the preset coding index, and the audio information is coded according to the determined coding rate. The coding rate of the audio information determined here is greater than the coding rate determined when the audio quality is good.
The preset coding index includes MOS values of the audio information and the coding rate corresponding to each MOS value, and may be, for example, as shown in the following table:
To determine the coding rate according to the preset coding index, a first difference value between the preset threshold and the MOS value of the audio information is determined, as shown in the following formula:
ΔMOS = 4.0 - MOS;
where ΔMOS is the first difference value, MOS is the current MOS value of the audio information, and 4.0 is the preset threshold.
A second difference value between the preset initial coding rate and the coding rate corresponding to the preset index value in the coding index is then acquired. According to the above table, the requirement that the MOS value be greater than 4.0 can be met only when the coding rate is greater than or equal to 12.65 kbps, that is, when the index i of the coding rate is greater than or equal to 2, so the preset index value may be any index value greater than or equal to 2. The second difference value can be represented by the following formula:
ΔMOS₁(i) = BIT(i).MOS - BIT(0).MOS;
where ΔMOS₁(i) is the second difference value; BIT(i).MOS corresponds to the coding rate with index value i; and BIT(0).MOS corresponds to the initial coding rate, whose index value is 0. As shown in the above table, the preset initial coding rate may be 6.6 kbps.
A third difference value between the preset initial coding rate and the coding rate corresponding to the index value next to the preset index value is acquired, as shown in the following formula:
ΔMOS₂(i+1) = BIT(i+1).MOS - BIT(0).MOS;
where ΔMOS₂(i+1) is the third difference value; BIT(i+1).MOS corresponds to the coding rate with index value i+1; and BIT(0).MOS corresponds to the initial coding rate, whose index value is 0.
If the second difference value is smaller than the first difference value and the third difference value is larger than the first difference value, that is, when ΔMOS₁(i) < ΔMOS and ΔMOS₂(i+1) > ΔMOS, the coding rate corresponding to the index value next to the preset index value in the coding index is obtained; in other words, the coding rate corresponding to BIT(i+1).MOS is used as the coding rate of the audio information.
If the second difference value is greater than or equal to the first difference value, or the third difference value is less than or equal to the first difference value, that is, when ΔMOS₁(i) ≥ ΔMOS or ΔMOS₂(i+1) ≤ ΔMOS, 1 is added to the preset index value, and the step of acquiring the second difference value between the preset initial coding rate and the coding rate corresponding to the preset index value in the coding index is executed again, until the preset index value plus one reaches the maximum index value of the coding index. If the preset index value plus one reaches the maximum index value of the coding index, the coding rate corresponding to the maximum index value is determined as the coding rate of the audio information. As shown in the above table, the maximum index value may be 8.
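As a non-limiting illustration, the index search described above may be sketched in Python as follows. The sketch reads BIT(i).MOS as the MOS value stored in the coding index for the coding rate at index i, which is an assumption. The table CODING_INDEX is also hypothetical: its nine rates are the AMR-WB codec modes, which agree with the 6.6 kbps, 12.65 kbps and maximum index 8 figures in the description, but its MOS values are invented for illustration and are not the values of the table referred to in the text.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    rate_kbps: float  # coding rate at this index
    mos: float        # MOS value the index associates with that rate (assumed)


# Hypothetical coding index; the MOS column is illustrative only.
CODING_INDEX: List[IndexEntry] = [
    IndexEntry(6.60, 3.20), IndexEntry(8.85, 3.65), IndexEntry(12.65, 4.02),
    IndexEntry(14.25, 4.10), IndexEntry(15.85, 4.17), IndexEntry(18.25, 4.23),
    IndexEntry(19.85, 4.28), IndexEntry(23.05, 4.32), IndexEntry(23.85, 4.35),
]


def rate_from_index(mos: float, index: List[IndexEntry] = CODING_INDEX,
                    threshold: float = 4.0, start: int = 2) -> float:
    """Search the coding index for a rate when the MOS value is below threshold."""
    max_index = len(index) - 1
    delta = threshold - mos                  # first difference value
    base = index[0].mos                      # entry for the initial coding rate
    i = start                                # preset index value
    while i + 1 < max_index:
        second = index[i].mos - base         # second difference value
        third = index[i + 1].mos - base      # third difference value
        if second < delta and third > delta:
            return index[i + 1].rate_kbps
        i += 1                               # otherwise advance the preset index
    # The preset index plus one has reached the maximum index value.
    return index[max_index].rate_kbps
```

With this hypothetical table, a measured MOS value of 3.05 selects the entry at index 4. Read literally, a small shortfall such as a MOS value of 3.9 fails the first condition at every index and therefore falls through to the rate at the maximum index.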
In the technical solution of this embodiment, if the MOS value is smaller than the preset threshold, the coding rate is determined according to the preset coding index, and the audio information is encoded according to the determined coding rate. The coding rate is determined through the coding index so that the MOS value of the encoded audio information becomes greater than or equal to the preset threshold, which improves the quality of audio transmission.
Referring to fig. 4, the present invention also provides an audio encoding apparatus including:
a sending module 100, configured to send audio information to be encoded of a first terminal to a second terminal;
an obtaining module 200, configured to obtain voice damage information of the audio information sent by the second terminal;
a calculating module 300, configured to determine a mean opinion score (MOS) value of the audio information according to the voice damage information;
and the encoding module 400 is configured to encode the audio information according to the encoding rate corresponding to the MOS value.
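As a non-limiting illustration of how the four modules may cooperate, the following Python sketch wires them together as plain callables. The class name AudioEncodingApparatus, its field names and the run_once method are hypothetical and are used only to show the order of operations.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AudioEncodingApparatus:
    """Hypothetical wiring of modules 100 to 400; each field is a stand-in."""
    send: Callable[[bytes], None]                     # sending module 100
    acquire: Callable[[], Dict[str, float]]           # obtaining module 200
    compute_mos: Callable[[Dict[str, float]], float]  # calculating module 300
    encode: Callable[[bytes, float], bytes]           # encoding module 400

    def run_once(self, audio: bytes) -> bytes:
        """Run one round: send, acquire damage info, compute MOS, encode."""
        self.send(audio)
        damage_info = self.acquire()
        mos = self.compute_mos(damage_info)
        return self.encode(audio, mos)
```

In this sketch the encoding callable receives the MOS value and is expected to choose the coding rate itself, which keeps the rate selection inside the encoding module as in the description.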
In an embodiment, in terms of encoding the audio information according to the encoding rate corresponding to the MOS value, the encoding module 400 is specifically configured to:
if the MOS value is larger than or equal to a preset threshold value, determining the coding rate according to the current network state;
and coding the audio information according to the determined coding rate.
In an embodiment, in terms of encoding the audio information according to the encoding rate corresponding to the MOS value, the encoding module 400 is specifically configured to:
if the MOS value is smaller than a preset threshold value, determining the coding rate according to a preset coding index, wherein the coding index comprises the MOS value and the coding rate corresponding to the MOS value;
and coding the audio information according to the determined coding rate.
In an embodiment, in determining the coding rate according to a preset coding index, the coding module 400 is specifically configured to:
determining a first difference value between a preset threshold value and the MOS value of the audio information;
acquiring a second difference value between a preset initial coding rate and a coding rate corresponding to a preset index value in the coding index;
acquiring a third difference value between the preset initial coding rate and a coding rate corresponding to an index value next to the preset index value;
if the second difference is smaller than the first difference and the third difference is larger than the first difference, obtaining the coding rate corresponding to an index value next to the preset index value in the coding index;
if the second difference is greater than or equal to the first difference, or the third difference is less than or equal to the first difference, adding 1 to the preset index value, and returning to the step of obtaining the second difference between the preset initial coding rate and the coding rate corresponding to the preset index value in the coding index until the sum of the preset index value plus one reaches the maximum index value of the coding index.
In an embodiment, in determining the coding rate according to a preset coding index, the coding module 400 is specifically configured to:
and if the sum of the preset index value plus one reaches the maximum index value of the coding index, determining the coding rate corresponding to the maximum index value as the coding rate of the audio information.
In an embodiment, in terms of determining the mean opinion score (MOS) value of the audio information according to the voice damage information, the calculating module 300 is specifically configured to:
input the voice damage information into a preset model to determine the MOS value of the audio information, wherein the voice damage information comprises at least one of a basic signal-to-noise ratio, synchronous transmission damage information, delay damage information and equipment damage information.
In an embodiment, in terms of obtaining the voice damage information of the audio information sent by the second terminal, the obtaining module 200 is specifically configured to:
acquiring a real-time transport control protocol (RTCP) packet sent by the second terminal;
and determining the voice damage information of the audio information according to the RTCP packet.
The present invention also provides an audio encoding device comprising a memory, a processor and an audio encoding program stored in the memory and executable on the processor, the audio encoding program, when executed by the processor, implementing the steps of the audio encoding method as described in the above embodiments.
The present invention also provides a computer-readable storage medium storing an audio encoding program which, when executed by a processor, implements the steps of the audio encoding method as described in the above embodiments.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, system, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, system, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, system, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware, although in many cases the former is the better implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a computer-readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) as described above, and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a parking management device, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.