Method, device, equipment and storage medium for evaluating machine translation result
1. A method for evaluating machine translation results, the method comprising:
obtaining target language corpora obtained after a plurality of machine systems respectively translate the same source language corpus;
determining actual matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus and the reference language corpus of each machine system;
determining a first difficulty weight of each semantic unit in the reference language corpus according to the actual matching score of each machine system;
obtaining a second difficulty weight of each semantic unit in a target language corpus of a target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus, wherein the target machine system is any one of the multiple machine systems;
determining an evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight.
2. The method of claim 1, wherein determining an evaluation score for the translation result of the target machine system based on the first difficulty weight and the second difficulty weight comprises:
determining a precision rate parameter and a recall rate parameter based on the first difficulty weight, an actual match score for the target machine system, and the second difficulty weight;
and determining an evaluation score of the translation result of the target machine system according to the precision rate parameter and the recall rate parameter.
3. The method according to claim 1, wherein said determining a first difficulty weight for each semantic unit in the reference language corpus according to the actual matching score of each machine system comprises:
determining, for each machine system and according to the actual matching score of that machine system, the matching score of the semantic unit in the target language corpus of that machine system that has the highest matching degree with a target semantic unit in the reference language corpus, wherein the target semantic unit is any one semantic unit in the reference language corpus;
and determining the first difficulty weight of the target semantic unit according to the matching scores thus obtained for the respective machine systems.
4. The method according to claim 1, wherein obtaining the second difficulty weight of each semantic unit in the target language corpus of the target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus comprises:
if the semantic unit in the target language corpus of the target machine system exists in the reference language corpus, taking the first difficulty weight of the semantic unit in the reference language corpus as the second difficulty weight of the semantic unit in the target language corpus of the target machine system;
and if the semantic unit in the target language corpus of the target machine system does not exist in the reference language corpus, taking the first difficulty weight of the semantic unit in the reference language corpus that has the highest matching degree with that semantic unit as the second difficulty weight of the semantic unit in the target language corpus of the target machine system.
5. The method of claim 2, wherein determining a precision rate parameter and a recall rate parameter based on the first difficulty weight, the actual match score for the target machine system, and the second difficulty weight comprises:
determining the highest matching score of each semantic unit in the target language corpus of the target machine system according to the actual matching score of the target machine system;
determining the precision rate parameter based on the highest matching score of each semantic unit in the target language corpus of the target machine system, the second difficulty weight of each semantic unit and the length of the target language corpus of the target machine system;
determining the highest matching score of each semantic unit in the reference language corpus according to the actual matching score of the target machine system;
and determining the recall rate parameter based on the highest matching score of each semantic unit in the reference language corpus, the first difficulty weight of each semantic unit and the length of the reference language corpus.
6. The method of claim 2, wherein determining an evaluation score for the translation results of the target machine system based on the precision rate parameter and the recall rate parameter comprises:
determining an evaluation score of a translation result of the target machine system according to a preset hyperparameter, the precision rate parameter and the recall rate parameter, wherein the preset hyperparameter is used for indicating the relative weighting between the precision rate parameter and the recall rate parameter.
7. The method according to claim 1, wherein determining the actual matching score between each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus and the reference language corpus of each machine system comprises:
inputting the target language corpus and the reference language corpus of each machine system into a pre-trained word vectorization model respectively to obtain a target language corpus vector and a reference language corpus vector of each machine system;
and determining the matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus vector and the reference language corpus vector of each machine system.
8. An apparatus for evaluating results of machine translation, the apparatus comprising:
an acquisition module, configured to acquire target language corpora obtained after a plurality of machine systems respectively translate the same source language corpus;
the first determining module is used for determining the actual matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus and the reference language corpus of each machine system;
the second determining module is used for determining the first difficulty weight of each semantic unit in the reference language corpus according to the actual matching score of each machine system;
a judging module, configured to obtain a second difficulty weight of each semantic unit in a target language corpus corresponding to a target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus, where the target machine system is any one of the multiple machine systems;
and the third determination module is used for determining the evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method for evaluating a machine translation result according to any one of claims 1-7.
10. A storage medium having stored thereon a computer program for performing, when executed by a processor, the steps of the method of evaluating a machine translation result according to any of claims 1-7.
Background
Machine translation, also known as automatic translation, is the process of using a computer to convert one natural language (the source language) into another (the target language). With the rapid development of economic globalization and the Internet, machine translation plays an increasingly important role in economic and cultural exchange, so methods for evaluating machine translation results have significant research value.
Currently, each semantic unit (such as a word and a phrase) in a machine translation result (target language) is matched with a reference translation, and the matching result of each semantic unit is directly subjected to score integration to obtain an evaluation score of the machine translation result.
However, in the score integration stage every semantic unit is handled with the same evaluation strategy; that is, the prior art does not distinguish how difficult each semantic unit is to translate, which reduces the accuracy of evaluating the machine translation result.
Disclosure of Invention
The present application aims to provide a method, an apparatus, a device and a storage medium for evaluating machine translation results, which can improve the accuracy of evaluating the machine translation results.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a method for evaluating a machine translation result, where the method includes:
obtaining target language corpora obtained after a plurality of machine systems respectively translate the same source language corpus;
determining actual matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus and the reference language corpus of each machine system;
determining a first difficulty weight of each semantic unit in the reference language corpus according to the actual matching score of each machine system;
obtaining a second difficulty weight of each semantic unit in a target language corpus of a target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus, wherein the target machine system is any one of the multiple machine systems;
determining an evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight.
Optionally, the determining an evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight comprises:
determining a precision rate parameter and a recall rate parameter based on the first difficulty weight, an actual match score for the target machine system, and the second difficulty weight;
and determining an evaluation score of the translation result of the target machine system according to the precision rate parameter and the recall rate parameter.
Optionally, the determining, according to the actual matching score of each machine system, a first difficulty weight of each semantic unit in the reference language corpus includes:
determining, for each machine system and according to the actual matching score of that machine system, the matching score of the semantic unit in the target language corpus of that machine system that has the highest matching degree with a target semantic unit in the reference language corpus, wherein the target semantic unit is any one semantic unit in the reference language corpus;
and determining the first difficulty weight of the target semantic unit according to the matching scores thus obtained for the respective machine systems.
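The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the text says the first difficulty weight is determined from the best-match scores but does not give the mapping here, so the rule used below (one minus the mean best-match score across machine systems) and the name `first_difficulty_weights` are hypothetical. Each score matrix is assumed to have one row per target-language semantic unit and one column per reference semantic unit.

```python
from typing import List

Matrix = List[List[float]]  # rows: target-language units, columns: reference units


def first_difficulty_weights(score_matrices: List[Matrix]) -> List[float]:
    """Return one difficulty weight per reference semantic unit.

    Assumed mapping (not stated in the text):
    weight = 1 - mean over machine systems of the best match score.
    """
    num_ref_units = len(score_matrices[0][0])
    weights = []
    for i in range(num_ref_units):
        # Best match for reference unit i inside each system's translation.
        best_per_system = [max(row[i] for row in m) for m in score_matrices]
        mean_best = sum(best_per_system) / len(best_per_system)
        weights.append(1.0 - mean_best)
    return weights


# Two hypothetical machine systems, two reference semantic units.
system_a = [[0.9, 0.2], [0.1, 0.4]]
system_b = [[0.8, 0.3], [0.2, 0.6]]
weights = first_difficulty_weights([system_a, system_b])
```

In this toy input the second reference unit is matched less well by both systems, so it receives the larger weight.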
Optionally, the obtaining, according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus, a second difficulty weight of each semantic unit in the target language corpus of the target machine system includes:
if the semantic unit in the target language corpus of the target machine system exists in the reference language corpus, taking the first difficulty weight of the semantic unit in the reference language corpus as the second difficulty weight of the semantic unit in the target language corpus of the target machine system;
and if the semantic unit in the target language corpus of the target machine system does not exist in the reference language corpus, taking the first difficulty weight of the semantic unit in the reference language corpus that has the highest matching degree with that semantic unit as the second difficulty weight of the semantic unit in the target language corpus of the target machine system.
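The two branches above can be sketched as follows. The sketch assumes that "exists in the reference language corpus" means an exact string match and that, when a unit occurs several times in the reference, its first occurrence is used; neither detail is specified in the text, and the function name `second_difficulty_weights` is hypothetical.

```python
from typing import List


def second_difficulty_weights(
    target_units: List[str],
    reference_units: List[str],
    first_weights: List[float],
    scores: List[List[float]],
) -> List[float]:
    """scores[j][i] is the match score of target unit j against reference unit i."""
    weights = []
    for j, unit in enumerate(target_units):
        if unit in reference_units:
            # Unit appears in the reference: reuse its first difficulty weight.
            i = reference_units.index(unit)
        else:
            # Otherwise fall back to the best-matching reference unit.
            i = max(range(len(reference_units)), key=lambda k: scores[j][k])
        weights.append(first_weights[i])
    return weights


# Hypothetical data: "kitten" is absent from the reference and matches "cat" best.
reference = ["the", "cat", "sleeps"]
target = ["the", "kitten", "sleeps"]
first_w = [0.1, 0.6, 0.3]
scores = [
    [1.0, 0.2, 0.1],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 1.0],
]
second_w = second_difficulty_weights(target, reference, first_w, scores)
```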
Optionally, the determining a precision rate parameter and a recall rate parameter based on the first difficulty weight, the actual match score of the target machine system, and the second difficulty weight comprises:
determining the highest matching score of each semantic unit in the target language corpus of the target machine system according to the actual matching score of the target machine system;
determining the precision rate parameter based on the highest matching score of each semantic unit in the target language corpus of the target machine system, the second difficulty weight of each semantic unit and the length of the target language corpus of the target machine system;
determining the highest matching score of each semantic unit in the reference language corpus according to the actual matching score of the target machine system;
and determining the recall rate parameter based on the highest matching score of each semantic unit in the reference language corpus, the first difficulty weight of each semantic unit and the length of the reference language corpus.
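A minimal sketch of these four steps follows. The text names the inputs of each parameter (highest matching scores, difficulty weights, corpus length) but not how they are combined, so the weighted sum divided by the corpus length used below is an assumption, as is the function name `weighted_precision_recall`.

```python
from typing import List, Tuple


def weighted_precision_recall(
    scores: List[List[float]],
    first_weights: List[float],
    second_weights: List[float],
) -> Tuple[float, float]:
    """scores[j][i]: match score of target unit j against reference unit i."""
    target_len = len(scores)
    reference_len = len(scores[0])
    # Highest score of each target unit against any reference unit (row maximum).
    best_for_target = [max(row) for row in scores]
    # Highest score of each reference unit against any target unit (column maximum).
    best_for_reference = [max(row[i] for row in scores) for i in range(reference_len)]
    # Assumed combination: difficulty-weighted sum normalized by corpus length.
    precision = sum(w * s for w, s in zip(second_weights, best_for_target)) / target_len
    recall = sum(w * s for w, s in zip(first_weights, best_for_reference)) / reference_len
    return precision, recall


# Hypothetical 2 x 2 score matrix with hypothetical difficulty weights.
scores = [[1.0, 0.2], [0.4, 0.5]]
precision, recall = weighted_precision_recall(scores, [0.2, 0.8], [0.2, 0.8])
```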
Optionally, the determining an evaluation score of the translation result of the target machine system according to the precision rate parameter and the recall rate parameter includes:
determining an evaluation score of a translation result of the target machine system according to a preset hyperparameter, the precision rate parameter and the recall rate parameter, wherein the preset hyperparameter is used for indicating the relative weighting between the precision rate parameter and the recall rate parameter.
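One common way to combine a precision value and a recall value under a single weighting hyperparameter is the F-beta measure. The sketch below uses it purely as an assumption; the text does not state which combination is used, and the names `evaluation_score` and `beta` are hypothetical.

```python
def evaluation_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta style combination; beta > 1 gives recall more influence."""
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        # Both parameters are zero: define the score as zero.
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denominator


balanced = evaluation_score(0.5, 0.25)
recall_heavy = evaluation_score(0.5, 0.25, beta=2.0)
```

Because recall is the lower of the two values in this example, giving recall more influence lowers the score.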
Optionally, the determining, according to the target language corpus and the reference language corpus of each machine system, an actual matching score of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus includes:
inputting the target language corpus and the reference language corpus of each machine system into a pre-trained word vectorization model respectively to obtain a target language corpus vector and a reference language corpus vector of each machine system;
and determining the matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus vector and the reference language corpus vector of each machine system.
In a second aspect, an embodiment of the present application further provides an apparatus for evaluating a machine translation result, where the apparatus includes:
an acquisition module, configured to acquire target language corpora obtained after a plurality of machine systems respectively translate the same source language corpus;
the first determining module is used for determining the actual matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus and the reference language corpus of each machine system;
the second determining module is used for determining the first difficulty weight of each semantic unit in the reference language corpus according to the actual matching score of each machine system;
a judging module, configured to obtain a second difficulty weight of each semantic unit in a target language corpus corresponding to a target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus, where the target machine system is any one of the multiple machine systems;
and the third determination module is used for determining the evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight.
Optionally, the third determining module is specifically configured to determine a precision rate parameter and a recall rate parameter based on the first difficulty weight, the actual matching score of the target machine system, and the second difficulty weight; and determine an evaluation score of the translation result of the target machine system according to the precision rate parameter and the recall rate parameter.
Optionally, the second determining module is specifically configured to determine, for each machine system and according to the actual matching score of that machine system, the matching score of the semantic unit in the target language corpus of that machine system that has the highest matching degree with a target semantic unit in the reference language corpus, where the target semantic unit is any one semantic unit in the reference language corpus; and determine the first difficulty weight of the target semantic unit according to the matching scores thus obtained for the respective machine systems.
Optionally, the judging module is specifically configured to, if a semantic unit in the target language corpus of the target machine system exists in the reference language corpus, take the first difficulty weight of the semantic unit in the reference language corpus as the second difficulty weight of the semantic unit in the target language corpus of the target machine system; and if the semantic unit in the target language corpus of the target machine system does not exist in the reference language corpus, take the first difficulty weight of the semantic unit in the reference language corpus that has the highest matching degree with that semantic unit as the second difficulty weight of the semantic unit in the target language corpus of the target machine system.
Optionally, the third determining module is further specifically configured to determine, according to the actual matching score of the target machine system, the highest matching score of each semantic unit in the target language corpus of the target machine system; determine the precision rate parameter based on the highest matching score of each semantic unit in the target language corpus of the target machine system, the second difficulty weight of each semantic unit and the length of the target language corpus of the target machine system; determine the highest matching score of each semantic unit in the reference language corpus according to the actual matching score of the target machine system; and determine the recall rate parameter based on the highest matching score of each semantic unit in the reference language corpus, the first difficulty weight of each semantic unit and the length of the reference language corpus.
Optionally, the third determining module is further specifically configured to determine an evaluation score of the translation result of the target machine system according to a preset hyperparameter, the precision rate parameter, and the recall rate parameter, where the preset hyperparameter is used to indicate the relative weighting between the precision rate parameter and the recall rate parameter.
Optionally, the first determining module is further specifically configured to input the target language corpus and the reference language corpus of each machine system into a pre-trained word vectorization model respectively, so as to obtain a target language corpus vector and a reference language corpus vector of each machine system; and determine the matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus vector and the reference language corpus vector of each machine system.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to perform the steps of the method for evaluating the machine translation result according to the first aspect.
In a fourth aspect, the present application provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the method for evaluating a machine translation result according to the first aspect are performed.
The beneficial effects of the present application are as follows:
Embodiments of the present application provide a method, an apparatus, a device and a storage medium for evaluating a machine translation result, wherein the method comprises the following steps: obtaining target language corpora obtained after a plurality of machine systems respectively translate the same source language corpus; determining actual matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus and the reference language corpus of each machine system; determining a first difficulty weight of each semantic unit in the reference language corpus according to the actual matching score of each machine system; obtaining a second difficulty weight of each semantic unit in the target language corpus of the target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus, wherein the target machine system is any one of the plurality of machine systems; and determining an evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight.
By adopting the evaluation method of the machine translation result provided by the embodiment of the application, after the actual matching score of each machine system is obtained, the difficulty level of each semantic unit can be analyzed, the first difficulty weight of each semantic unit in the reference language corpus and the second difficulty weight of each semantic unit in the target language corpus of the target machine system are determined, namely the difficulty level of each semantic unit in translation is distinguished, and the evaluation score of the translation result of the target machine system is determined by introducing the concept of the difficulty weight in the score integration stage, so that the accuracy of evaluation of the machine translation result can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart of a method for evaluating a machine translation result according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of another method for evaluating a machine translation result according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of another method for evaluating a machine translation result according to an embodiment of the present application;
Fig. 4 is a flowchart illustrating a method for evaluating a machine translation result according to an embodiment of the present disclosure;
Fig. 5 is a flowchart illustrating another method for evaluating machine translation results according to an embodiment of the present disclosure;
Fig. 6 is a schematic flowchart of another method for evaluating a machine translation result according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an apparatus for evaluating a machine translation result according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Before the embodiments of the present application are explained in detail, an application scenario is described first. The application scenario may be one in which the machine translation results (translated texts) submitted in a machine translation competition are evaluated automatically, or any other scenario in which machine translation results are evaluated automatically. The inventors found that, with the development of machine translation technology, self-attention mechanisms and deep neural models have markedly improved translation accuracy, but the translation styles of different machine systems have become very similar. As a result, the translations produced by different machine systems from the same source language corpus are hard to tell apart, and the evaluation scores produced by conventional evaluation methods do not accurately reflect the actual translation performance of each machine system.
The method of the following embodiments analyzes the relationship between a reference translation and the translations produced by a plurality of machine systems from the same source language corpus, to obtain difficulty weights that represent how difficult each semantic unit is to translate. The difficulty weights show which semantic units are easy to translate and which are not; in other words, the technical solution of the present application is difficulty-aware. The reference translation indicates a correct translation result of the source language corpus. It should be noted that the present application does not limit the specific content of the source language corpus or of the reference translation.
Based on the difficulty of each semantic unit, the translation result of any target machine system among the machine systems can be evaluated, which improves the accuracy of evaluating the translation result of the target machine system, so that the final evaluation score more accurately reflects the actual translation performance of the target machine system.
The evaluation method of the machine translation result mentioned in the present application is exemplified below with reference to the drawings. Fig. 1 is a schematic flowchart of a method for evaluating a machine translation result according to an embodiment of the present disclosure.
As shown in fig. 1, the method may include:
S101, obtaining target language corpora obtained after a plurality of machine systems respectively translate the same source language corpus.
The source language corpus contains the content to be translated. It may include a plurality of sentences to be translated, and the minimum unit of each sentence to be translated may be a word or a phrase. Each machine system translates a corpus in a given source language into a corpus in a specified target language; for example, if the source language is Chinese and the target language is English, each machine system translates the Chinese source language corpus into an English target language corpus. It should be noted that the present application does not limit the specific content of the source language corpus.
For example, the source language corpus may be input into a source language translation model corresponding to each machine system, and each source language translation model outputs a target language corpus, where each source language translation model may adopt a deep network structure.
After the target language corpora are output by the source language translation models, the target language corpora can be stored in association with the corresponding machine system and stored in the database in a key-value pair storage mode. When the translation result of the machine system needs to be automatically evaluated, the target language corpora corresponding to the multiple machine systems can be obtained from the database, wherein the multiple machine systems can include the machine system needing to be evaluated and other machine systems.
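The key-value association described above can be pictured with a small in-memory stand-in. This is only an illustrative sketch: the dictionary `corpus_store`, the helper names and the system identifiers are hypothetical, and a real deployment would use whatever database the implementer chooses.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the database:
# key = machine system identifier, value = that system's target language corpus.
corpus_store: Dict[str, List[str]] = {}


def save_translation(system_id: str, sentences: List[str]) -> None:
    """Store a machine system's target language corpus under its identifier."""
    corpus_store[system_id] = list(sentences)


def load_translations(system_ids: List[str]) -> Dict[str, List[str]]:
    """Fetch the target language corpora of the requested machine systems."""
    return {sid: corpus_store[sid] for sid in system_ids}


save_translation("system_1", ["the cat sleeps"])
save_translation("system_2", ["a cat is sleeping"])
batch = load_translations(["system_1", "system_2"])
```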
S102, determining actual matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus and the reference language corpus of each machine system.
Optionally, the reference language corpus may be content prepared by an expert, or may be a target language corpus produced by a machine system whose confidence level meets a preset requirement. It should be noted that the present application does not limit the manner of obtaining the reference language corpus.
The target language corpus of each machine system can be divided into a plurality of semantic units by a word segmentation tool, where the minimum unit of each semantic unit may be a word or a phrase. The reference language corpus can be divided into a plurality of semantic units in the same way; the present application does not limit this.
Optionally, the actual matching score of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus may be determined by cosine similarity, euclidean Distance, information entropy difference, or WMD (Word move's Distance) Distance. For example, assume that there are 3 machine systems, each machine system (e.g., machine system 1, machine system 2, and machine system 3) may include J semantic units in the target language corpus, and the reference language corpus may include I semantic units, and the target language corpus corresponding to one machine system (e.g., machine system 1) is taken as an example for explanation, and the other machine systems are similar. The above mentioned manner (e.g. cosine similarity) may be adopted to determine the actual matching scores of each semantic unit (e.g. semantic unit J) in the target language corpus of the machine system 1 and each semantic unit (e.g. semantic unit 1, semantic unit 2 … semantic unit I) in the reference language corpus, and finally, the actual matching scores corresponding to each machine system may be stored in the database in the form of a matrix, that is, each machine system corresponds to one actual matching score matrix.
It should be noted that the number of semantic units included in the target language corpus of each machine system may be the same (for example, J), or may be different, and the present application does not limit the number.
S103, determining first difficulty weights of semantic units in the reference language corpus according to the actual matching scores of all machine systems.
Taking the reference language corpus as the dimension for explanation, the actual matching score matrix between the target language corpus of each machine system and the reference language corpus can be obtained from the database. Each row of an actual matching score matrix corresponds to one semantic unit in the target language corpus and holds the actual matching scores between that semantic unit and each semantic unit in the reference language corpus. Each column corresponds to one semantic unit in the reference language corpus and holds the actual matching scores between that semantic unit and each semantic unit in the target language corpus.
Continuing with the above example, a first difficulty weight can be determined for each semantic unit in the reference language corpus (e.g., semantic unit 1, semantic unit 2, …, semantic unit I). Taking semantic unit 1 as an example (the other semantic units are handled similarly), the highest actual matching score in the first column is obtained from each of the actual matching score matrices of machine system 1, machine system 2 and machine system 3, and the first difficulty weight of semantic unit 1 in the reference language corpus is determined according to these 3 highest actual matching scores. The first difficulty weight indicates how difficult semantic unit 1 is to translate: the larger the first difficulty weight, the more difficult semantic unit 1 is to translate; the smaller the first difficulty weight, the easier semantic unit 1 is to translate.
S104, obtaining a second difficulty weight of each semantic unit in the target language corpus of the target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus.
The target machine system is any one of the plurality of machine systems, that is, the second difficulty weight of each semantic unit in the target language corpus of each machine system can be determined through this step. Whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus is directly reflected in the actual matching scores of the target machine system, that is, it can be judged according to the actual matching score matrix of the target machine system.
It can be understood that the larger the maximum actual matching score in a row of the actual matching score matrix, the higher the probability that the semantic unit of the target language corpus represented by that row exists in the reference language corpus. In other words, regardless of whether each semantic unit in the target language corpus exists in the reference language corpus, the second difficulty weight of each semantic unit in the target language corpus of the target machine system can be obtained directly from the actual matching score matrix of the target machine system. Specifically, the maximum actual matching score in each row of the actual matching score matrix of the target machine system is determined, and the first difficulty weight of the semantic unit in the reference language corpus represented by the column in which that maximum actual matching score is located may be used directly as the second difficulty weight of the semantic unit in the target language corpus of the target machine system represented by that row.
S105, determining an evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight.
After the first difficulty weight of each semantic unit in the reference language corpus and the second difficulty weight of each semantic unit in the target language corpus are obtained, a recall rate parameter can be obtained according to the first difficulty weight of each semantic unit in the reference language corpus, a precision rate parameter can be obtained according to the second difficulty weight of each semantic unit in the target language corpus, and finally an evaluation score of the translation result of the target machine system can be obtained by combining the recall rate parameter and the precision rate parameter.
Continuing with the above example, by determining the evaluation scores of the translation results of the machine systems 1, 2 and 3 respectively through this step (score integration stage), the machine systems 1, 2 and 3 can be sorted in the order of the evaluation scores from large to small, so that the machine system ranked first can be the machine system with the best translation performance.
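The sorting step in the example above can be sketched as follows. The function name `rank_systems` and the score values are hypothetical and serve only to illustrate ordering machine systems by evaluation score from large to small.

```python
def rank_systems(evaluation_scores: dict[str, float]) -> list[str]:
    """Return machine system identifiers ordered from highest to lowest score."""
    return sorted(evaluation_scores, key=evaluation_scores.get, reverse=True)


# Made-up evaluation scores for three machine systems.
scores = {"machine_system_1": 0.42, "machine_system_2": 0.57, "machine_system_3": 0.31}
ranking = rank_systems(scores)
best_system = ranking[0]
```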
In summary, according to the evaluation method for the machine translation result provided by the application, after the actual matching score of each machine system is obtained, the difficulty level of each semantic unit can be analyzed, the first difficulty weight of each semantic unit in the reference language corpus and the second difficulty weight of each semantic unit in the target language corpus of the target machine system are determined, that is, the difficulty level of each semantic unit in translation is distinguished, and in the score integration stage, the evaluation score of the translation result of the target machine system is determined by introducing the concept of the difficulty weight, so that the accuracy of evaluation of the machine translation result can be improved.
Fig. 2 is a schematic flowchart of another method for evaluating a machine translation result according to an embodiment of the present disclosure. Optionally, as shown in fig. 2, the determining an evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight includes:
S201, determining a precision rate parameter and a recall rate parameter based on the first difficulty weight, the actual matching score of the target machine system and the second difficulty weight.
S202, determining an evaluation score of the translation result of the target machine system according to the precision rate parameter and the recall rate parameter.
The precision rate parameter is used for indicating how many correct semantic units exist in the target language corpus of the target machine system, that is, the precision rate parameter is explained by taking the target language corpus as the dimension. The recall rate parameter is used for indicating how many semantic units in the reference language corpus are translated, that is, the recall rate parameter is explained by taking the reference language corpus as the dimension. When the precision rate parameter is calculated, the second difficulty weight of each semantic unit in the target language corpus of the target machine system and the actual matching score matrix of the target machine system are introduced; when the recall rate parameter is calculated, the first difficulty weight of each semantic unit in the reference language corpus and the actual matching score matrix of the target machine system are introduced.
After the precision rate parameter and the recall rate parameter are obtained, they can be substituted into an evaluation score calculation formula, and the result of the formula is used as the evaluation score of the translation result of the target machine system.
It can be seen that in the process of determining the evaluation score of the translation result of the target machine system, the difficulty weight of the semantic unit is introduced, that is, a lower score is given to the semantic unit with a lower difficulty weight, and a higher score is given to the semantic unit with a higher difficulty weight, so that the correlation coefficient between the automatic scoring result and the evaluation result in a manual mode can be improved, and the expensive cost required by the manual mode is avoided.
Fig. 3 is a flowchart illustrating a further method for evaluating a machine translation result according to an embodiment of the present application. Optionally, as shown in fig. 3, the determining a first difficulty weight of each semantic unit in the corpus of the reference language according to the actual matching score of each machine system includes:
S301, according to the actual matching scores of all the machine systems, respectively determining, in the target language corpus of each machine system, the matching score corresponding to the semantic unit with the highest matching degree with the target semantic unit in the reference language corpus.
S302, determining the first difficulty weight according to the matching scores corresponding to the semantic units with the highest matching degree with the target semantic unit in the reference language corpus.
The target semantic unit is any one semantic unit in the reference language corpus. Assume that the reference language corpus is t = (t_1, …, t_i, …, t_I); the target semantic unit may then be any one of (t_1, …, t_i, …, t_I). The target language corpora corresponding to the K machine systems may be represented by (h_1, …, h_K), and the actual matching score of each machine system may be represented by sim(t, h). The target column corresponding to the target semantic unit in the reference language corpus is determined in the actual matching score matrix of each machine system, and the maximum actual matching score is extracted from that column of each matrix; the larger the matching score, the higher the matching degree. The whole process can be represented by:
$$\max_{h \in \mathbf{h}_k} \mathrm{sim}(t, h)$$
The process of determining the first difficulty weight d(t) corresponding to the target semantic unit in the reference language corpus can be described by the following formula:
$$d(t) = 1 - \frac{1}{K} \sum_{k=1}^{K} \max_{h \in \mathbf{h}_k} \mathrm{sim}(t, h)$$
The average matching score corresponding to the target semantic unit in the reference language corpus can be determined according to the actual matching score matrices of the K machine systems. The average matching score represents how well the target semantic unit is translated: the larger the average matching score, the easier the target semantic unit is to translate. The value of the average matching score lies in the interval [0, 1]. Subtracting the average matching score from 1 directly reflects the difficulty of translating the target semantic unit: the larger the value of d(t), the more difficult the target semantic unit is to translate; the smaller the value of d(t), the easier the target semantic unit is to translate.
Finally, the first difficulty weight d(t) of each semantic unit (t_1, …, t_i, …, t_I) in the reference language corpus can be obtained, and the first difficulty weights may also be stored in a matrix form in association with the semantic units in the reference language corpus.
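The computation of the first difficulty weight described above can be sketched as follows. This is a minimal illustration, assuming each actual matching score matrix is a list of rows (one row per semantic unit of a target language corpus, one column per semantic unit of the reference language corpus); the function name `first_difficulty_weights` and the sample scores are hypothetical.

```python
def first_difficulty_weights(score_matrices: list[list[list[float]]]) -> list[float]:
    """Return d(t) for every reference semantic unit.

    score_matrices holds one actual matching score matrix per machine system.
    Rows are target language units, columns are reference units. The matrices
    may have different numbers of rows but share the same number of columns.
    """
    num_systems = len(score_matrices)
    num_reference_units = len(score_matrices[0][0])
    weights = []
    for i in range(num_reference_units):
        # Highest score in column i of each system's matrix.
        best_per_system = [max(row[i] for row in matrix) for matrix in score_matrices]
        average_match = sum(best_per_system) / num_systems
        weights.append(1.0 - average_match)
    return weights


# Made-up scores for two machine systems and two reference units.
system_1 = [[0.9, 0.2], [0.1, 0.4]]
system_2 = [[0.7, 0.3], [0.2, 0.6], [0.1, 0.1]]
weights = first_difficulty_weights([system_1, system_2])
```

In this made-up example the second reference unit is matched less well by both systems, so it receives the larger first difficulty weight.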
Fig. 4 is a flowchart illustrating a method for evaluating a machine translation result according to another embodiment of the present application. Optionally, as shown in fig. 4, the obtaining a second difficulty weight of each semantic unit in the target language corpus of the target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus includes:
S401, if the semantic unit in the target language corpus of the target machine system exists in the reference language corpus, taking the first difficulty weight of the semantic unit in the reference language corpus as the second difficulty weight of the semantic unit in the target language corpus of the target machine system.
S402, if the semantic unit in the target language corpus of the target machine system does not exist in the reference language corpus, taking the first difficulty weight of the semantic unit with the highest matching degree with the semantic unit in the reference language corpus as the second difficulty weight of the semantic unit in the target language corpus of the target machine system.
The second difficulty weight of each semantic unit in the target language corpus can be determined according to the relationship between the semantic units in the target language corpus of the target machine system and the reference language corpus. That is, the second difficulty weight of a target semantic unit in the target language corpus can be determined by using the first difficulty weights of the semantic units in the reference language corpus, where the target semantic unit is any one semantic unit in the target language corpus.
The second difficulty weight d(h) of the target semantic unit h in the target language corpus of the target machine system may be determined by the following formula:
$$d(h) = \begin{cases} d(t), & \text{if } h \in \mathbf{t} \ (\text{with } t = h) \\ d\left(\arg\max_{t \in \mathbf{t}} \mathrm{sim}(t, h)\right), & \text{otherwise} \end{cases}$$
Here, the condition (if h ∈ t) indicates that the target semantic unit h exists in the reference language corpus t, and (otherwise) indicates that the target semantic unit h does not exist in the reference language corpus t. For example, assume that the target semantic unit h is the English word "the". If the semantic unit "the" exists in the reference language corpus, the first difficulty weight d(t) corresponding to the semantic unit "the" in the reference language corpus t may be used as the second difficulty weight of the target semantic unit h in the target language corpus of the target machine system. If the semantic unit "the" does not exist in the reference language corpus, the maximum matching score contained in the row corresponding to the target semantic unit h can be looked up in the matching score matrix of the target machine system, and the column in which that maximum matching score is located can then be determined; this lookup corresponds to max_{t∈t} sim(t, h). The first difficulty weight d(t) of the semantic unit in the reference language corpus corresponding to that column is used as the second difficulty weight of the target semantic unit h.
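The two cases S401 and S402 can be sketched as follows. This is a minimal illustration; the function name `second_difficulty_weights` and the sample data are hypothetical, and when a unit occurs several times in the reference corpus the sketch simply uses its first occurrence, which is a simplifying assumption.

```python
def second_difficulty_weights(
    target_units: list[str],
    reference_units: list[str],
    score_matrix: list[list[float]],
    first_weights: list[float],
) -> list[float]:
    """Return d(h) for every semantic unit of the target language corpus.

    score_matrix[j][i] is the matching score between target unit j and
    reference unit i; first_weights[i] is d(t) of reference unit i.
    """
    weights = []
    for j, unit in enumerate(target_units):
        if unit in reference_units:
            # S401: the unit exists in the reference corpus, reuse its d(t).
            i = reference_units.index(unit)
        else:
            # S402: use the reference unit with the highest matching score.
            row = score_matrix[j]
            i = max(range(len(row)), key=row.__getitem__)
        weights.append(first_weights[i])
    return weights


# Made-up example: "the" exists in the reference, "feline" does not.
reference = ["the", "cat", "sat"]
target = ["the", "feline"]
matrix = [[1.0, 0.1, 0.2], [0.1, 0.8, 0.3]]
d_t = [0.1, 0.6, 0.4]
d_h = second_difficulty_weights(target, reference, matrix, d_t)
```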
Fig. 5 is a flowchart illustrating another method for evaluating a machine translation result according to an embodiment of the present application. Optionally, as shown in fig. 5, the determining the precision rate parameter and the recall rate parameter based on the first difficulty weight, the actual matching score of the target machine system, and the second difficulty weight includes:
S501, determining the highest matching score of each semantic unit in the target language corpus of the target machine system according to the actual matching score of the target machine system.
S502, determining the precision rate parameter based on the highest matching score of each semantic unit in the target language corpus of the target machine system, the second difficulty weight of each semantic unit and the length of the target language corpus of the target machine system.
The maximum matching score max_{t∈t} sim(t, h) contained in each row (each semantic unit in the target language corpus) can be determined according to the actual matching score matrix of the target machine system. The maximum matching score corresponding to each row is multiplied by the second difficulty weight corresponding to that row, the products are summed, the summation result is divided by the length |h| of the target language corpus of the target machine system, and the divided result is taken as the specific value (P_DA) of the precision rate parameter.
The solution process for the specific value (P_DA) of the precision rate parameter can be represented by the following formula:
$$P_{DA} = \frac{1}{|\mathbf{h}|} \sum_{h \in \mathbf{h}} d(h) \cdot \max_{t \in \mathbf{t}} \mathrm{sim}(t, h)$$
Here, if the target language corpus of the target machine system contains 10 semantic units, the length |h| of the target language corpus of the target machine system is 10, that is, the specific value of |h| is determined by the number of semantic units in the target language corpus.
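Steps S501 and S502 can be sketched as follows, assuming the matching score matrix has one row per semantic unit of the target language corpus. The function name `weighted_precision` and the sample values are hypothetical.

```python
def weighted_precision(score_matrix: list[list[float]], second_weights: list[float]) -> float:
    """Precision rate parameter: mean over rows of d(h) times the row maximum."""
    total = sum(weight * max(row) for row, weight in zip(score_matrix, second_weights))
    # The number of rows equals the length |h| of the target language corpus.
    return total / len(score_matrix)


# Made-up example with two target units and three reference units.
matrix = [[1.0, 0.1, 0.2], [0.1, 0.8, 0.3]]
d_h = [0.1, 0.6]
p_da = weighted_precision(matrix, d_h)
```

With these made-up numbers the sum is 0.1 × 1.0 + 0.6 × 0.8 = 0.58, which is divided by |h| = 2.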
S503, determining the highest matching score of each semantic unit in the reference language corpus according to the actual matching score of the target machine system.
S504, determining the recall rate parameter based on the highest matching score of each semantic unit in the reference language corpus, the first difficulty weight of each semantic unit and the length of the reference language corpus.
By continuing to use the above-mentioned actual matching score matrix of the target machine system, the maximum matching score max_{h∈h} sim(t, h) contained in each column (each semantic unit of the reference language corpus) can be determined. The maximum matching score corresponding to each column is multiplied by the first difficulty weight corresponding to that column, the products are summed, the summation result is divided by the length |t| of the reference language corpus, and the divided result is taken as the specific value (R_DA) of the recall rate parameter.
The solution process for the specific value (R_DA) of the recall rate parameter can be represented by the following formula:
$$R_{DA} = \frac{1}{|\mathbf{t}|} \sum_{t \in \mathbf{t}} d(t) \cdot \max_{h \in \mathbf{h}} \mathrm{sim}(t, h)$$
Here, if the reference language corpus contains I semantic units (t_1, …, t_i, …, t_I), the length |t| of the reference language corpus is equal to I, that is, the specific value of |t| is determined by the number of semantic units in the reference language corpus.
It can be seen that, when the precision rate parameter is determined, the second difficulty weight of each semantic unit in the target language corpus of the target machine system is introduced, and when the recall rate parameter is determined, the first difficulty weight of each semantic unit in the reference language corpus is introduced. That is, the difficulty weight is applied on the basis of the obtained maximum matching score of each semantic unit, so that the degree of attention paid to each semantic unit in the evaluation process can be differentiated, and the evaluation score obtained by using the precision rate parameter and the recall rate parameter can reflect the translation performance of the target machine system more accurately.
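Steps S503 and S504 can be sketched in the same way, this time taking the maximum over each column of the matching score matrix. The function name `weighted_recall` and the sample values are hypothetical.

```python
def weighted_recall(score_matrix: list[list[float]], first_weights: list[float]) -> float:
    """Recall rate parameter: mean over columns of d(t) times the column maximum."""
    num_reference_units = len(first_weights)
    total = 0.0
    for i in range(num_reference_units):
        column_max = max(row[i] for row in score_matrix)
        total += first_weights[i] * column_max
    # The number of columns equals the length |t| of the reference corpus.
    return total / num_reference_units


# Made-up example with two target units and three reference units.
matrix = [[1.0, 0.1, 0.2], [0.1, 0.8, 0.3]]
d_t = [0.1, 0.6, 0.4]
r_da = weighted_recall(matrix, d_t)
```

With these made-up numbers the column maxima are 1.0, 0.8 and 0.3, so the sum is 0.1 + 0.48 + 0.12 = 0.70, which is divided by |t| = 3.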
Optionally, the determining an evaluation score of the translation result of the target machine system according to the precision rate parameter and the recall rate parameter includes: determining an evaluation score of the translation result of the target machine system according to a preset hyper-parameter, the precision rate parameter and the recall rate parameter, wherein the preset hyper-parameter is used for indicating the relative weight between the precision rate parameter and the recall rate parameter.
Specifically, the evaluation score (F_DA) of the translation result of the target machine system can be determined by the following formula:
Wherein β represents the above-mentioned preset hyper-parameter, which can be used to indicate the relative weight between the precision rate parameter and the recall rate parameter. In general, β takes a value of 1, which indicates that the precision rate parameter and the recall rate parameter have the same weight in the process of determining the evaluation score of the translation result of the target machine system. When β is greater than 1, the recall rate parameter is given more consideration when determining the evaluation score, that is, the evaluation score of the translation result of the target machine system is determined more from the dimension of how many semantic units in the reference language corpus are translated. When β is smaller than 1 and larger than 0, the precision rate parameter is given more consideration when determining the evaluation score, that is, the evaluation score is determined more from the dimension of how many correct semantic units exist in the target language corpus of the target machine system. It should be noted that the present application does not limit the specific value of β.
It can be seen that by adjusting the preset hyper-parameter β, the participation degree of the precision rate parameter and the recall rate parameter in the score integration stage can be adjusted, so that the evaluation scores of the translation results of the target machine systems with different dimensions can be conveniently obtained.
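The effect of the hyper-parameter β can be illustrated with the following sketch. It assumes the conventional F-beta combination, F = (1 + β²)·P·R / (β²·P + R), because that form matches the behaviour described above (equal weight at β = 1, recall favoured for β > 1, precision favoured for 0 < β < 1). The exact formula of the application is not reproduced in this text, so this form and the function name `f_score` are assumptions.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Combine precision and recall with an F-beta style formula (assumed form)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both parameters are zero, so the combined score is zero as well.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Made-up parameters: low precision, high recall.
balanced = f_score(0.2, 0.8, beta=1.0)
recall_leaning = f_score(0.2, 0.8, beta=2.0)
precision_leaning = f_score(0.2, 0.8, beta=0.5)
```

Under this assumed form, a system with high recall and low precision scores higher as β grows and lower as β shrinks, which is the adjustment described above.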
Fig. 6 is a flowchart illustrating a further method for evaluating a machine translation result according to an embodiment of the present application. Optionally, as shown in fig. 6, the determining, according to the target language corpus and the reference language corpus of each machine system, an actual matching score between each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus includes:
S601, respectively inputting the target language corpus and the reference language corpus of each machine system into a pre-trained word vectorization model to obtain a target language corpus vector and a reference language corpus vector of each machine system.
The target language corpus and the reference language corpus of each machine system may be segmented in advance by using the jieba segmentation tool or other segmentation tools, and deletion processing may be performed on the target language corpus and the reference language corpus respectively, so as to obtain the semantic units in each target language corpus and the semantic units in the reference language corpus.
And respectively inputting the semantic units in each target language corpus and the semantic units in the reference language corpus into a pre-trained word vectorization model, wherein the pre-trained word vectorization model can encode the semantic units into vector representations in a semantic space, namely, the pre-trained word vectorization model can respectively output target language corpus vectors and reference language corpus vectors, and each target language corpus vector and each reference language corpus vector are composed of a plurality of semantic unit vectors.
S602, determining matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus vector of each machine system and the reference language corpus vector.
The matching score sim(t, h) of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus can be determined by cosine similarity, Euclidean distance, information entropy difference, WMD (Word Mover's Distance), or the like, which is not limited in the present application. Here, cosine similarity is taken as an example for explanation, and it is calculated as follows:
$$\mathrm{sim}(t, h) = \frac{o(t)^{\top} o(h)}{\lVert o(t) \rVert \cdot \lVert o(h) \rVert}$$
wherein o(t) represents a certain semantic unit vector in the reference language corpus vector, and o(h) represents a certain semantic unit vector in the target language corpus vector of the target machine system.
The matching scores of each semantic unit in the target language corpus of the target machine system and each semantic unit in the reference language corpus can be expressed in the form of a matching score matrix, in which the matching score in the j-th row and the i-th column is the result obtained by applying the above formula to the j-th semantic unit vector in the target language corpus vector of the target machine system and the i-th semantic unit vector in the reference language corpus vector. The matching scores of each machine system are obtained in this way from the matching scores of each semantic unit in its target language corpus and each semantic unit in the reference language corpus.
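The construction of the matching score matrix with cosine similarity can be sketched as follows. The semantic unit vectors are assumed to come from some pre-trained word vectorization model; the tiny two-dimensional vectors and the function names `cosine_similarity` and `matching_score_matrix` are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two semantic unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as matching nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def matching_score_matrix(
    target_vectors: list[list[float]], reference_vectors: list[list[float]]
) -> list[list[float]]:
    """Row j, column i holds sim between target unit j and reference unit i."""
    return [
        [cosine_similarity(h, t) for t in reference_vectors]
        for h in target_vectors
    ]


# Made-up vectors: two target units, two reference units.
target_vectors = [[1.0, 0.0], [1.0, 1.0]]
reference_vectors = [[1.0, 0.0], [0.0, 1.0]]
matrix = matching_score_matrix(target_vectors, reference_vectors)
```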
Fig. 7 is a schematic structural diagram of an apparatus for evaluating a machine translation result according to an embodiment of the present application. As shown in fig. 7, the apparatus includes:
an obtaining module 701, configured to obtain target language corpora obtained by translating a same source language corpus by multiple machine systems respectively;
a first determining module 702, configured to determine, according to the target language corpus and the reference language corpus of each machine system, an actual matching score between each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus;
a second determining module 703, configured to determine, according to the actual matching score of each machine system, a first difficulty weight of each semantic unit in the reference language corpus;
a judging module 704, configured to obtain a second difficulty weight of each semantic unit in the target language corpus of the target machine system according to whether each semantic unit in the target language corpus of the target machine system exists in the reference language corpus;
a third determining module 705, configured to determine an evaluation score of the translation result of the target machine system according to the first difficulty weight and the second difficulty weight.
Optionally, the third determining module 705 is specifically configured to determine the precision rate parameter and the recall rate parameter based on the first difficulty weight, the actual matching score of the target machine system, and the second difficulty weight; and determine the evaluation score of the translation result of the target machine system according to the precision rate parameter and the recall rate parameter.
Optionally, the second determining module 703 is specifically configured to determine, according to the actual matching score of each machine system, a matching score corresponding to a semantic unit with the highest matching degree with a target semantic unit in the reference language corpus in the target language corpus of each machine system, where the target semantic unit is any one semantic unit in the reference language corpus; and determining a first difficulty weight according to the matching score corresponding to the semantic unit with the highest matching degree of the target semantic unit in the reference language corpus.
Optionally, the judging module 704 is specifically configured to, if a semantic unit in the target language corpus of the target machine system exists in the reference language corpus, take the first difficulty weight of the semantic unit in the reference language corpus as the second difficulty weight of the semantic unit in the target language corpus of the target machine system; and if the semantic unit in the target language corpus of the target machine system does not exist in the reference language corpus, take the first difficulty weight of the semantic unit with the highest matching degree with the semantic unit in the reference language corpus as the second difficulty weight of the semantic unit in the target language corpus of the target machine system.
Optionally, the third determining module 705 is further specifically configured to determine, according to the actual matching score of the target machine system, the highest matching score of each semantic unit in the target language corpus of the target machine system; determine the precision rate parameter based on the highest matching score of each semantic unit in the target language corpus of the target machine system, the second difficulty weight of each semantic unit and the length of the target language corpus of the target machine system; determine the highest matching score of each semantic unit in the reference language corpus according to the actual matching score of the target machine system; and determine the recall rate parameter based on the highest matching score of each semantic unit in the reference language corpus, the first difficulty weight of each semantic unit and the length of the reference language corpus.
Optionally, the third determining module 705 is further specifically configured to determine an evaluation score of the translation result of the target machine system according to a preset hyper-parameter, the precision rate parameter, and the recall rate parameter, where the preset hyper-parameter is used to indicate the relative weight between the precision rate parameter and the recall rate parameter.
Optionally, the first determining module 702 is further specifically configured to input the target language corpus and the reference language corpus of each machine system into a pre-trained word vectorization model respectively, so as to obtain a target language corpus vector and a reference language corpus vector of each machine system; and determining the matching scores of each semantic unit in the target language corpus of each machine system and each semantic unit in the reference language corpus according to the target language corpus vector and the reference language corpus vector of each machine system.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more microprocessors, or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 8, the electronic device may include: a processor 801, a storage medium 802 and a bus 803, the storage medium 802 storing machine-readable instructions executable by the processor 801, the processor 801 communicating with the storage medium 802 via the bus 803 when the electronic device is operated, the processor 801 executing the machine-readable instructions to perform the steps of the above-described method embodiments. The specific implementation and technical effects are similar, and are not described herein again.
Optionally, the present application further provides a storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the above method embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the indirect coupling or communication connection between devices or units may be electrical, mechanical or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to perform some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that like reference numbers and letters refer to like items in the figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.