Entity recognition model training and entity recognition method and device
1. A training method of an entity recognition model comprises the following steps:
acquiring training data, wherein the training data comprises a plurality of training texts and entity labeling results corresponding to different entity types in the plurality of training texts;
constructing a neural network model comprising a first network layer, a second network layer and a third network layer, wherein the first network layer is used for obtaining a first semantic vector sequence of a training text according to the training text and an industry dictionary corresponding to different entity types, and the second network layer is used for obtaining a second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and a target entity type vector;
and training the neural network model by using a plurality of training texts, industry dictionaries corresponding to different entity types, target entity type vectors and entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model.
2. The method of claim 1, wherein the first network layer deriving a first semantic vector sequence of a training text from the training text and an industry dictionary corresponding to different entity types comprises:
taking the training text as input to obtain an initial semantic vector of each semantic unit in the training text;
matching each semantic unit in the industry dictionaries corresponding to different entity types, and obtaining an identification vector of each semantic unit according to a matching result;
splicing the initial semantic vector of each semantic unit with the identification vector, and obtaining a first semantic vector of each semantic unit according to a splicing result;
and obtaining a first semantic vector sequence of the training text according to the first semantic vector of each semantic unit.
3. The method of claim 2, wherein the matching each semantic unit in an industry dictionary corresponding to a different entity type, and the obtaining an identification vector for each semantic unit according to the matching result comprises:
setting the sequence of the industry dictionaries corresponding to different entity types;
for each semantic unit, matching the semantic unit in industry dictionaries corresponding to different entity types in sequence;
and in the case that the word matched with the semantic unit exists in the industry dictionary, setting the vector corresponding to the industry dictionary position in the identification vector to be 1, and otherwise, setting the vector to be 0.
4. The method of claim 1, wherein the obtaining, by the second network layer, the second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and the target entity type vector comprises:
performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence to obtain a first calculation result of each semantic unit;
performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain a second calculation result of each semantic unit;
splicing the first calculation result and the second calculation result of each semantic unit to obtain a second semantic vector of each semantic unit;
and obtaining a second semantic vector sequence of the training text according to the second semantic vector of each semantic unit.
5. The method according to claim 4, wherein the performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain the second calculation result of each semantic unit comprises:
calculating the similarity between the target entity type vector and the first semantic vector of each semantic unit;
and performing attention calculation according to the calculated similarity and the target entity type vector to obtain a second calculation result of each semantic unit.
6. The method of claim 1, wherein the training the neural network model using a plurality of training texts, an industry dictionary corresponding to different entity types, a target entity type vector, and entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model comprises:
aiming at each training text, taking the training text and industry dictionaries corresponding to different entity types as the input of the first network layer to obtain a first semantic vector sequence output by the first network layer;
taking the first semantic vector sequence and a target entity type vector as the input of the second network layer to obtain a second semantic vector sequence output by the second network layer;
taking the second semantic vector sequence as the input of the third network layer, and obtaining an entity recognition result of the training text according to the output result of the third network layer;
and updating parameters of the neural network model according to the entity recognition result of the training text and the entity labeling result of the corresponding target entity type in the training text until the neural network model converges to obtain the entity recognition model.
7. The method of claim 1, wherein the acquiring training data comprises:
acquiring seed word sets corresponding to different entity types;
expanding the seed word set to obtain an industry dictionary corresponding to different entity types;
and aiming at each word in the industry dictionary, obtaining a text containing the word as a training text, and taking the word as an entity labeling result corresponding to the entity type of the industry dictionary where the word is located in the training text.
8. An entity identification method, comprising:
acquiring a text to be recognized;
inputting the text to be recognized, industry dictionaries corresponding to different entity types and target entity type vectors into an entity recognition model;
extracting an entity corresponding to the target entity type in the text to be recognized according to the output result of the entity recognition model, and taking the entity as the entity recognition result of the text to be recognized;
wherein the entity recognition model is pre-trained according to the method of any one of claims 1-7.
9. The method of claim 8, wherein the inputting a target entity type vector into an entity recognition model comprises:
obtaining a target entity type;
and inputting an entity type vector corresponding to the target entity type into the entity recognition model as the target entity type vector.
10. An apparatus for training an entity recognition model, comprising:
the first acquisition unit is used for acquiring training data, wherein the training data comprises a plurality of training texts and entity labeling results corresponding to different entity types in the plurality of training texts;
the construction unit is used for constructing a neural network model comprising a first network layer, a second network layer and a third network layer, wherein the first network layer is used for obtaining a first semantic vector sequence of a training text according to the training text and an industry dictionary corresponding to different entity types, and the second network layer is used for obtaining a second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and a target entity type vector;
and the training unit is used for training the neural network model by using the plurality of training texts, the industry dictionaries corresponding to different entity types, the target entity type vectors and the entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model.
11. The apparatus according to claim 10, wherein the first network layer constructed by the construction unit specifically performs, when obtaining the first semantic vector sequence of the training text from the training text and an industry dictionary corresponding to different entity types:
taking the training text as input to obtain an initial semantic vector of each semantic unit in the training text;
matching each semantic unit in the industry dictionaries corresponding to different entity types, and obtaining an identification vector of each semantic unit according to a matching result;
splicing the initial semantic vector of each semantic unit with the identification vector, and obtaining a first semantic vector of each semantic unit according to a splicing result;
and obtaining a first semantic vector sequence of the training text according to the first semantic vector of each semantic unit.
12. The apparatus according to claim 11, wherein the first network layer constructed by the construction unit specifically performs, when matching each semantic unit in an industry dictionary corresponding to different entity types and obtaining an identification vector of each semantic unit according to a matching result:
setting the sequence of the industry dictionaries corresponding to different entity types;
for each semantic unit, matching the semantic unit in industry dictionaries corresponding to different entity types in sequence;
and in the case that the word matched with the semantic unit exists in the industry dictionary, setting the vector corresponding to the industry dictionary position in the identification vector to be 1, and otherwise, setting the vector to be 0.
13. The apparatus according to claim 10, wherein the second network layer constructed by the construction unit, when obtaining the second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and the target entity type vector, specifically performs:
performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence to obtain a first calculation result of each semantic unit;
performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain a second calculation result of each semantic unit;
splicing the first calculation result and the second calculation result of each semantic unit to obtain a second semantic vector of each semantic unit;
and obtaining a second semantic vector sequence of the training text according to the second semantic vector of each semantic unit.
14. The apparatus according to claim 13, wherein the second network layer constructed by the construction unit, when performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain the second calculation result of each semantic unit, specifically performs:
calculating the similarity between the target entity type vector and the first semantic vector of each semantic unit;
and performing attention calculation according to the calculated similarity and the target entity type vector to obtain a second calculation result of each semantic unit.
15. The apparatus according to claim 10, wherein the training unit, when training the neural network model using a plurality of training texts, an industry dictionary corresponding to different entity types, a target entity type vector, and entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model, specifically performs:
aiming at each training text, taking the training text and industry dictionaries corresponding to different entity types as the input of the first network layer to obtain a first semantic vector sequence output by the first network layer;
taking the first semantic vector sequence and a target entity type vector as the input of the second network layer to obtain a second semantic vector sequence output by the second network layer;
taking the second semantic vector sequence as the input of the third network layer, and obtaining an entity recognition result of the training text according to the output result of the third network layer;
and updating parameters of the neural network model according to the entity recognition result of the training text and the entity labeling result of the corresponding target entity type in the training text until the neural network model converges to obtain the entity recognition model.
16. The apparatus according to claim 10, wherein the first acquiring unit, when acquiring the training data, specifically performs:
acquiring seed word sets corresponding to different entity types;
expanding the seed word set to obtain an industry dictionary corresponding to different entity types;
and aiming at each word in the industry dictionary, obtaining a text containing the word as a training text, and taking the word as an entity labeling result corresponding to the entity type of the industry dictionary where the word is located in the training text.
17. An entity identification apparatus comprising:
the second acquisition unit is used for acquiring the text to be recognized;
the processing unit is used for inputting the text to be recognized, the industry dictionaries corresponding to different entity types and the target entity type vectors into an entity recognition model;
the recognition unit is used for extracting an entity corresponding to the target entity type in the text to be recognized according to the output result of the entity recognition model and taking the entity as the entity recognition result of the text to be recognized;
wherein the entity recognition model is pre-trained according to the apparatus of any one of claims 10-16.
18. The apparatus according to claim 17, wherein the processing unit, when inputting the target entity type vector into the entity recognition model, specifically performs:
obtaining a target entity type;
and inputting an entity type vector corresponding to the target entity type into the entity recognition model as the target entity type vector.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
Background
When a query term (query) input by a user for retrieval is obtained, in order to more accurately obtain a retrieval requirement of the user, all entities or a specific entity in the query term need to be identified. Because the entities in the query terms correspond to different entity types, the prior art generally adopts a mode of setting a plurality of entity identification models to respectively identify the entities corresponding to different entity types, which leads to the technical problems of complicated identification steps and low identification accuracy.
Disclosure of Invention
According to a first aspect of the present disclosure, there is provided a training method of an entity recognition model, including: acquiring training data, wherein the training data comprises a plurality of training texts and entity labeling results corresponding to different entity types in the plurality of training texts; constructing a neural network model comprising a first network layer, a second network layer and a third network layer, wherein the first network layer is used for obtaining a first semantic vector sequence of a training text according to the training text and an industry dictionary corresponding to different entity types, and the second network layer is used for obtaining a second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and a target entity type vector; and training the neural network model by using a plurality of training texts, industry dictionaries corresponding to different entity types, target entity type vectors and entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model.
According to a second aspect of the present disclosure, there is provided an entity recognition method, including: acquiring a text to be recognized; inputting the text to be recognized, industry dictionaries corresponding to different entity types and target entity type vectors into an entity recognition model; and extracting an entity corresponding to the target entity type in the text to be recognized according to the output result of the entity recognition model, and taking the entity as the entity recognition result of the text to be recognized.
According to a third aspect of the present disclosure, there is provided a training apparatus for an entity recognition model, comprising: the first acquisition unit is used for acquiring training data, wherein the training data comprises a plurality of training texts and entity labeling results corresponding to different entity types in the plurality of training texts; the construction unit is used for constructing a neural network model comprising a first network layer, a second network layer and a third network layer, wherein the first network layer is used for obtaining a first semantic vector sequence of a training text according to the training text and an industry dictionary corresponding to different entity types, and the second network layer is used for obtaining a second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and a target entity type vector; and the training unit is used for training the neural network model by using the plurality of training texts, the industry dictionaries corresponding to different entity types, the target entity type vectors and the entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model.
According to a fourth aspect of the present disclosure, there is provided an entity recognition apparatus comprising: the second acquisition unit is used for acquiring the text to be recognized; the processing unit is used for inputting the text to be recognized, the industry dictionaries corresponding to different entity types and the target entity type vectors into an entity recognition model; and the recognition unit is used for extracting an entity corresponding to the target entity type in the text to be recognized according to the output result of the entity recognition model and taking the entity as the entity recognition result of the text to be recognized.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described above.
According to the technical solution of the present disclosure, the neural network model is trained by introducing the industry dictionaries corresponding to different entity types and the target entity type vectors, so that the neural network model can learn the dependency relationships among different entities in the text and is not limited by entities in the text whose entity types overlap. This achieves the technical effect of recognizing entities of different entity types with a single entity recognition model, and improves the accuracy of the entity recognition model in recognizing the entities corresponding to different entity types in the text.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing the entity recognition model training and entity recognition methods of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. As shown in fig. 1, the method for training an entity recognition model in this embodiment may specifically include the following steps:
S101, acquiring training data, wherein the training data comprises a plurality of training texts and entity labeling results corresponding to different entity types in the plurality of training texts;
S102, constructing a neural network model comprising a first network layer, a second network layer and a third network layer, wherein the first network layer is used for obtaining a first semantic vector sequence of a training text according to the training text and an industry dictionary corresponding to different entity types, and the second network layer is used for obtaining a second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and a target entity type vector;
S103, training the neural network model by using the plurality of training texts, the industry dictionaries corresponding to different entity types, the target entity type vectors and the entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model.
According to the training method of the entity recognition model in this embodiment, the neural network model is trained by introducing the industry dictionaries corresponding to different entity types and the target entity type vectors, so that the neural network model can learn the dependency relationships among different entities in the text and is not limited by entities in the text whose entity types overlap, and the accuracy of the entity recognition model in recognizing the entities corresponding to different entity types in the text is improved.
In the training data obtained by executing S101, the entity labeling result of the training text is an entity corresponding to different entity types in the training text, and the same entity in the training text may correspond to multiple entity types; in this embodiment, all entities corresponding to different entity types in the training text may be labeled, or only entities corresponding to a specific entity type in the training text may be labeled.
The entity type in this embodiment includes at least one of a brand entity type, a category entity type, a color entity type, a crowd entity type, a time entity type, and a style entity type, and the number of the entity types is not limited in this embodiment.
For example, if the training text in this embodiment is "spring of jeep jacket man", the entity labeling result of the training text may be: a "Jeep" corresponding to a brand entity type, a "jacket" corresponding to a category entity type, a "man" corresponding to a crowd entity type, a "spring" corresponding to a time entity type, and a "spring" corresponding to a style entity type, and the like.
In this embodiment, after the training data including the plurality of training texts and the entity labeling results of the plurality of training texts is obtained by executing S101, S102 is executed to construct a neural network model including a first network layer, a second network layer, and a third network layer.
In the neural network model constructed by executing S102 in this embodiment, the first network layer is configured to obtain a first semantic vector sequence of the training text according to the training text and the industry dictionaries corresponding to different entity types.
Specifically, when the first network layer in this embodiment obtains the first semantic vector sequence of the training text according to the training text and the industry dictionaries corresponding to different entity types, an optional implementation manner that can be adopted is as follows: taking the training text as input to obtain an initial semantic vector of each semantic unit in the training text; matching each semantic unit in an industry dictionary corresponding to different entity types, and obtaining an identification vector of each semantic unit according to a matching result; splicing the initial semantic vector and the identification vector of each semantic unit, and obtaining a first semantic vector of each semantic unit according to a splicing result; and obtaining a first semantic vector sequence of the training text according to the first semantic vector of each semantic unit.
The first network layer in this embodiment is composed of a first neural network and a second neural network; the first neural network is a pre-training model, such as an Ernie model, and is used for obtaining initial semantic vectors of semantic units in a training text according to the training text; the second neural network is a recurrent neural network, such as a bidirectional long-short term memory network, and is used for obtaining the first semantic vector of each semantic unit according to the splicing result between the initial semantic vector of each semantic unit in the training text and the identification vector of each semantic unit, and correspondingly obtaining the first semantic vector sequence of the training text.
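The splicing step described above can be illustrated with a minimal Python sketch. It only shows how an initial semantic vector and an identification vector are concatenated before being passed to the recurrent network; the pre-training model and the recurrent network themselves are omitted. The function names and the toy vector values are assumptions made for illustration and do not come from the embodiment.

```python
from typing import List, Sequence


def splice(initial_vector: Sequence[float],
           identification_vector: Sequence[int]) -> List[float]:
    """Concatenate one initial semantic vector with its identification vector."""
    return list(initial_vector) + [float(flag) for flag in identification_vector]


def splice_sequence(initial_vectors: Sequence[Sequence[float]],
                    identification_vectors: Sequence[Sequence[int]]) -> List[List[float]]:
    """Splice the vectors of every semantic unit, keeping their order in the text."""
    if len(initial_vectors) != len(identification_vectors):
        raise ValueError("each semantic unit needs exactly one identification vector")
    return [splice(v, f) for v, f in zip(initial_vectors, identification_vectors)]


# Toy values standing in for the output of the pre-training model (assumed).
initial_vectors = [[0.2, -0.1, 0.7, 0.4], [0.5, 0.3, -0.6, 0.1]]
# One flag per industry dictionary, in the configured dictionary order.
identification_vectors = [[1, 0, 0], [0, 1, 0]]

spliced = splice_sequence(initial_vectors, identification_vectors)
```

In this sketch each spliced vector has the dimension of the initial semantic vector plus the number of industry dictionaries; the spliced sequence would then serve as the input of the recurrent network.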
In this embodiment, there are a plurality of industry dictionaries, each industry dictionary corresponds to a different entity type, and each industry dictionary contains a plurality of words corresponding to its entity type; this embodiment may also update the industry dictionaries periodically.
For example, the industry dictionary corresponding to the brand entity type in this embodiment includes words of different brands; the industry dictionary corresponding to the category entity type includes words of different categories.
In this embodiment, when matching each semantic unit in an industry dictionary corresponding to different entity types and obtaining an identifier vector of each semantic unit according to a matching result, an optional implementation manner that can be adopted is as follows: setting the sequence of the industry dictionaries corresponding to different entity types; for each semantic unit, matching the semantic unit in industry dictionaries corresponding to different entity types in sequence; and in the case that the word matched with the semantic unit exists in the industry dictionary, setting the vector corresponding to the industry dictionary position in the identification vector to be 1, and otherwise, setting the vector to be 0.
For example, suppose there are 3 industry dictionaries in this embodiment, which are, in sequence, industry dictionary 1 corresponding to the brand entity type, industry dictionary 2 corresponding to the category entity type, and industry dictionary 3 corresponding to the time entity type. If a semantic unit included in the training text is "jeep", and only industry dictionary 1 includes the word "jeep", the identification vector obtained for the semantic unit "jeep" in this embodiment is (1, 0, 0).
That is to say, by introducing the industry dictionaries corresponding to different entity types, the first network layer in this embodiment allows the first semantic vector to incorporate entity type information, so that the accuracy of the obtained first semantic vector sequence is improved.
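The example above can be sketched in Python as follows. The sketch assumes that "matching" means exact membership of the semantic unit in a dictionary; the embodiment does not fix the matching rule, so this is only one possible reading. The dictionary contents and the function name are hypothetical.

```python
from typing import List, Sequence, Set


def identification_vector(semantic_unit: str,
                          ordered_dictionaries: Sequence[Set[str]]) -> List[int]:
    """Return one 0/1 flag per industry dictionary, in the configured order.

    A position is 1 when the dictionary at that position contains a word
    matching the semantic unit (exact match is assumed here), otherwise 0.
    """
    return [1 if semantic_unit in dictionary else 0
            for dictionary in ordered_dictionaries]


# Hypothetical dictionary contents, ordered as brand, category, time.
brand_dictionary = {"jeep"}
category_dictionary = {"jacket"}
time_dictionary = {"spring"}
ordered_dictionaries = [brand_dictionary, category_dictionary, time_dictionary]

jeep_vector = identification_vector("jeep", ordered_dictionaries)  # [1, 0, 0]
```

Because every dictionary is checked independently, a semantic unit that appears in several dictionaries would receive several 1 flags, which is consistent with the same entity corresponding to multiple entity types.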
In the neural network model constructed by executing S102, the second network layer is configured to obtain a second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and the target entity type vector; the target entity type vector in this embodiment corresponds to the target entity type to be recognized in the training text.
Specifically, when the second network layer in this embodiment obtains the second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and the target entity type vector, the optional implementation manner that can be adopted is as follows: performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence to obtain a first calculation result of each semantic unit; performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain a second calculation result of each semantic unit; splicing the first calculation result and the second calculation result of each semantic unit to obtain a second semantic vector of each semantic unit; and obtaining a second semantic vector sequence of the training text according to the second semantic vector of each semantic unit.
In this embodiment, when obtaining the target entity type vector, the following method may be adopted: determining a target entity type, the target entity type corresponding to an entity type of an entity to be identified from the training text; and taking an entity type vector corresponding to the target entity type as a target entity type vector.
In this embodiment, when obtaining the entity type vector corresponding to an entity type, an optional implementation manner that may be adopted is: determining description words for the different entity types, for example the description word "brand" for the brand entity type; replacing the entities in the training text with the corresponding description words, for example replacing "jeep" in the training text with "brand", and performing unsupervised learning on the obtained replacement text, for example by using an ERNIE model; and taking the vector corresponding to the description word in the replacement text as the entity type vector of each entity type, for example taking the vector of the description word obtained after a preset number of learning iterations as the entity type vector. This embodiment may also update the entity type vectors periodically according to the above method.
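The replacement step described above can be illustrated with a minimal sketch. It covers only the construction of the replacement text; the subsequent unsupervised learning is not shown. The mapping `DESCRIPTION_WORDS`, the function `replace_entities`, and the whitespace-delimited text are assumptions made for illustration and are not specified by the embodiment.

```python
import re

# Hypothetical mapping from entity type to its description word.
DESCRIPTION_WORDS = {"brand": "brand", "category": "category"}


def replace_entities(text, labeled_entities):
    """Replace each labeled entity in `text` with the description word of its type.

    `labeled_entities` maps an entity string to its entity type.
    """
    # Handle longer entities first so a short entity cannot split a longer one.
    for entity in sorted(labeled_entities, key=len, reverse=True):
        word = DESCRIPTION_WORDS[labeled_entities[entity]]
        # Match the entity only as a whole word.
        text = re.sub(r"\b" + re.escape(entity) + r"\b", word, text)
    return text


replacement_text = replace_entities("jeep jacket man spring", {"jeep": "brand"})
print(replacement_text)  # brand jacket man spring
```

The replacement text would then be fed to the unsupervised learning step, and the vector learned for the description word would serve as the entity type vector.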
The second network layer in this embodiment is composed of a first attention network and a second attention network; the first attention network is used for carrying out attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence to obtain a first calculation result of each semantic unit; the second attention network is used for performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain a second calculation result of each semantic unit.
In this embodiment, when performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain the second calculation result of each semantic unit, an optional implementation manner that can be adopted is as follows: calculating the similarity between the target entity type vector and the first semantic vector of each semantic unit, for example, calculating the cosine similarity; and performing attention calculation according to the calculated similarity and the target entity type vector to obtain a second calculation result of each semantic unit, for example, performing dot multiplication on each similarity and the target entity type vector to obtain the second calculation result.
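As a non-limiting illustration, the two attention calculations and the splicing step may be sketched as follows. The second calculation follows the example given above (cosine similarity multiplied by the target entity type vector). The form of the first attention calculation is not fixed by the embodiment; scaled dot-product self-attention is assumed here. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def self_attention(first_vectors):
    """First calculation result (assumed form: scaled dot-product self-attention)."""
    dim = len(first_vectors[0])
    results = []
    for query in first_vectors:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in first_vectors]
        peak = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        results.append([sum(w / total * key[i] for w, key in zip(weights, first_vectors))
                        for i in range(dim)])
    return results


def type_attention(first_vectors, type_vector):
    """Second calculation result: each unit's similarity scales the type vector."""
    return [[cosine(h, type_vector) * t for t in type_vector] for h in first_vectors]


def second_layer(first_vectors, type_vector):
    """Splice the first and second calculation results of each semantic unit."""
    first = self_attention(first_vectors)
    second = type_attention(first_vectors, type_vector)
    return [a + b for a, b in zip(first, second)]  # list concatenation


second_vectors = second_layer([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
print(len(second_vectors), len(second_vectors[0]))  # 2 4
```

In this sketch, a semantic unit whose first semantic vector points in the same direction as the target entity type vector receives the full type vector in its second calculation result, while an orthogonal unit receives a zero vector.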
That is to say, in this embodiment, by obtaining the target entity type vector, the neural network model can identify the entity corresponding to the target entity type in the training text, and thus accuracy of the neural network model in entity identification is improved.
In the neural network model constructed by executing S102, the third network layer is configured to label, according to the second semantic vector sequence of the training text, the entity in the training text corresponding to the target entity type; the third network layer in this embodiment may be a Conditional Random Field (CRF) model, which identifies the corresponding entity in the training text by means of BIO labeling.
For example, if the training text in this embodiment is "jeep jacket man spring", and if the entity type corresponding to the target entity type vector is a brand entity type, the embodiment marks the brand entity "jeep" in the training text.
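The extraction of entities from a BIO labeling result may be sketched as follows. The function `decode_bio` is hypothetical, and joining semantic units with a space assumes word-level semantic units; neither is prescribed by the embodiment.

```python
def decode_bio(units, tags):
    """Collect entities from semantic units and their B/I/O tags."""
    entities, current = [], []
    for unit, tag in zip(units, tags):
        if tag == "B":
            # A new entity starts; close any entity still open.
            if current:
                entities.append(" ".join(current))
            current = [unit]
        elif tag == "I" and current:
            # Continue the entity that is currently open.
            current.append(unit)
        else:
            # "O" (or a stray "I") ends the current entity, if any.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


print(decode_bio(["jeep", "jacket", "man", "spring"], ["B", "O", "O", "O"]))  # ['jeep']
```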
In this embodiment, after the step S102 of constructing the neural network model including the first network layer, the second network layer, and the third network layer is performed, the step S103 of training the neural network model using a plurality of training texts, an industry dictionary corresponding to different entity types, a target entity type vector, and entity labeling results corresponding to different entity types in the plurality of training texts is performed to obtain an entity recognition model.
By using the entity recognition model obtained by executing the training in S103 in this embodiment, after the text to be recognized, the dictionaries corresponding to different entity types, and the target entity type vector are used as the input of the entity recognition model, the target entity in the text to be recognized can be obtained according to the labeling result output by the entity recognition model.
Specifically, in this embodiment, when executing S103 to train the neural network model using the multiple training texts, the industry dictionaries corresponding to different entity types, the target entity type vector, and the entity labeling results corresponding to different entity types in the multiple training texts, to obtain the entity recognition model, an optional implementation manner that can be adopted is as follows: aiming at each training text, taking the training text and industry dictionaries corresponding to different entity types as the input of a first network layer to obtain a first semantic vector sequence output by the first network layer; taking the first semantic vector sequence and the target entity type vector as the input of a second network layer to obtain a second semantic vector sequence output by the second network layer; taking the second semantic vector sequence as the input of a third network layer, and obtaining an entity recognition result of the training text according to the output result of the third network layer; and updating parameters of the neural network model according to the entity recognition result of the training text and the entity marking result corresponding to the target entity type until the neural network model converges to obtain the entity recognition model.
In addition, after the neural network model in this embodiment finishes labeling the training text, a score corresponding to the training text is also output, so this embodiment may further include the following: determining training texts meeting preset conditions according to the scores corresponding to the training texts; adding the determined training text as a new sample to the training data for training the neural network model.
In this embodiment, when determining the training texts meeting the preset conditions according to the scores corresponding to the training texts, the training texts whose scores are larger than a preset threshold may be selected; alternatively, an uncertainty value of each training text may be obtained based on a calculation method of the information entropy, and the training texts whose uncertainty values are smaller than a preset threshold may be selected.
In this embodiment, the unselected training texts may be directly discarded, or a sample of the unselected training texts may be drawn and manually labeled.
That is to say, this embodiment can also enhance the training data by screening the training texts according to the labeling results, so that the neural network model can be trained with training data of better quality, thereby improving the training quality of the neural network model.
By adopting the method of this embodiment, the neural network model is trained by introducing the industry dictionaries corresponding to different entity types and the target entity type vectors, so that the neural network model can learn the dependency relationship between different entities in the text and is not limited by overlapping entity types of the entities in the text, and the accuracy of the entity recognition model in recognizing the entities corresponding to the different entity types in the text is improved.
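The two screening manners may be sketched as follows. The embodiment does not specify how the information entropy is aggregated over a training text; the mean per-unit entropy of the label distributions is assumed here, and all function names and thresholds are hypothetical.

```python
import math


def entropy(probabilities):
    """Information entropy of one label distribution (natural logarithm)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def text_uncertainty(label_distributions):
    """Assumed aggregation: mean entropy over the semantic units of a text."""
    return sum(entropy(d) for d in label_distributions) / len(label_distributions)


def select_by_score(scored_texts, threshold):
    """Keep the texts whose score is larger than the threshold."""
    return [text for text, score in scored_texts if score > threshold]


def select_by_uncertainty(texts_with_distributions, threshold):
    """Keep the texts whose uncertainty value is smaller than the threshold."""
    return [text for text, dists in texts_with_distributions
            if text_uncertainty(dists) < threshold]


print(select_by_score([("text a", 0.9), ("text b", 0.4)], 0.5))  # ['text a']
```

A text whose per-unit label distributions are close to one-hot has an uncertainty value near zero and is retained, while a text with near-uniform distributions is filtered out.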
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure. As shown in fig. 2, when executing S101 to acquire training data, the present embodiment may specifically include the following steps:
S201, acquiring seed word sets corresponding to different entity types;
S202, expanding the seed word sets to obtain industry dictionaries corresponding to different entity types;
S203, for each word in an industry dictionary, obtaining a text containing the word as a training text, and taking the word as the entity labeling result, in the training text, corresponding to the entity type of the industry dictionary in which the word is located.
That is to say, this embodiment can obtain training data quickly by means of data mining techniques, which avoids dependence on manually labeled data and reduces the training cost of the neural network model.
In this embodiment, a small number of words corresponding to different entity types may be included in different seed word sets obtained by executing S201; in this embodiment, when S202 is executed, the words associated with the words in the seed word set may be added to the seed word set through the encyclopedic knowledge base, so as to obtain the industry dictionaries corresponding to different entity types.
In this embodiment, after the industry dictionaries corresponding to different entity types are obtained in S202, the rationality of each industry dictionary may also be verified, for example, the coverage rate and the matching rate of words included in the industry dictionary are verified.
In this embodiment, when the text including the words in the industry dictionary is obtained by executing S203, text search may be performed based on each word, so that automatic acquisition of the text is realized.
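Steps S201 to S203 may be sketched as follows. The `related_words` table stands in for a lookup in the encyclopedic knowledge base, and the in-memory `corpus` stands in for the text search; both, together with the function names, are assumptions made for illustration only.

```python
def expand_seed_words(seed_words, related_words):
    """S202: add the words associated with each seed word to the seed word set."""
    dictionary = set(seed_words)
    for word in seed_words:
        dictionary.update(related_words.get(word, ()))
    return dictionary


def build_training_data(industry_dictionaries, corpus):
    """S203: pair each dictionary word with the texts that contain it."""
    samples = []
    for entity_type, words in industry_dictionaries.items():
        for word in sorted(words):
            for text in corpus:
                if word in text.split():  # whole-word match on whitespace tokens
                    samples.append({"text": text, "entity": word,
                                    "entity_type": entity_type})
    return samples


seed_sets = {"brand": ["jeep"]}                 # S201
related_words = {"jeep": ["nike"]}              # hypothetical knowledge base
dictionaries = {entity_type: expand_seed_words(seeds, related_words)
                for entity_type, seeds in seed_sets.items()}
corpus = ["jeep jacket man spring", "nike running shoes", "cotton shirt"]
for sample in build_training_data(dictionaries, corpus):
    print(sample)
```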
Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure. As shown in fig. 3, the entity identification method of this embodiment may specifically include the following steps:
S301, acquiring a text to be recognized;
S302, inputting the text to be recognized, the industry dictionaries corresponding to different entity types, and the target entity type vector into an entity recognition model;
S303, extracting, according to the output result of the entity recognition model, the entity corresponding to the target entity type in the text to be recognized as the entity recognition result of the text to be recognized.
According to the entity recognition method of this embodiment, the corresponding entities are extracted from the text to be recognized through the entity recognition model obtained by pre-training; because the entity recognition model can learn the dependency relationship among different entities in the text and is not limited by overlapping entity types of the entities in the text, the accuracy of the obtained entity recognition result is improved.
In this embodiment, the text to be recognized obtained in S301 may be a query word (query) text input by the user during searching.
In this embodiment, when S302 is executed to input the target entity type vector into the entity identification model, the entity type vectors corresponding to all entity types may be respectively used as the target entity type vectors, and multiple entity identification results corresponding to different entity types in the text to be identified are obtained through multiple identifications of the entity identification model.
In this embodiment, when the S302 is executed to input the target entity type vector into the entity identification model, the entity type vector corresponding to the target entity type may also be input into the entity identification model as the target entity type vector according to the obtained target entity type, that is, this embodiment may also achieve the purpose of extracting the entity of the specific entity type in the text to be identified.
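The two input manners described for S302 may be sketched as follows. The function `recognize` is hypothetical, and `toy_model` is merely a stand-in for the trained entity recognition model (it treats the type vector as one-hot over the dictionary order and returns dictionary hits); neither reflects the actual model.

```python
def recognize(text, industry_dictionaries, type_vectors, model, target_types=None):
    """Run the model once per target entity type and collect the results.

    With `target_types` omitted, every known entity type is used in turn.
    """
    types = target_types if target_types is not None else list(type_vectors)
    return {entity_type: model(text, industry_dictionaries, type_vectors[entity_type])
            for entity_type in types}


def toy_model(text, industry_dictionaries, type_vector):
    """Stand-in for a trained model, for demonstration only."""
    names = list(industry_dictionaries)
    target = names[type_vector.index(max(type_vector))]
    return [word for word in text.split() if word in industry_dictionaries[target]]


dictionaries = {"brand": {"jeep"}, "category": {"jacket"}}
type_vectors = {"brand": [1.0, 0.0], "category": [0.0, 1.0]}
query = "jeep jacket man spring"

print(recognize(query, dictionaries, type_vectors, toy_model))
# {'brand': ['jeep'], 'category': ['jacket']}
print(recognize(query, dictionaries, type_vectors, toy_model, target_types=["brand"]))
# {'brand': ['jeep']}
```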
Fig. 4 is a schematic diagram according to a fourth embodiment of the present disclosure. A flow chart of entity identification of the present embodiment is shown in fig. 4: inputting a text to be recognized and the industry dictionaries corresponding to different entity types into the first neural network of the first network layer to obtain a first semantic vector of each semantic unit in the text to be recognized output by the first network layer; inputting the first semantic vector of each semantic unit and the target entity type vector into the second network layer to obtain a second semantic vector of each semantic unit in the text to be recognized output by the second network layer; and inputting the second semantic vector of each semantic unit into the third network layer to obtain the entity labeling result corresponding to the target entity type in the text to be recognized output by the third network layer. In this embodiment, the target entity type is the "brand entity type", and the output of the entity identification model is the labeling result of the brand entity in the text to be recognized, where B represents the start of the brand entity, I represents content inside the brand entity, and O represents content unrelated to the brand entity.
Fig. 5 is a schematic diagram according to a fifth embodiment of the present disclosure. As shown in fig. 5, the training apparatus 500 for entity recognition model of the present embodiment includes:
the first obtaining unit 501 is configured to obtain training data, where the training data includes a plurality of training texts and entity labeling results corresponding to different entity types in the plurality of training texts;
the building unit 502 is configured to build a neural network model including a first network layer, a second network layer, and a third network layer, where the first network layer is configured to obtain a first semantic vector sequence of a training text according to the training text and an industry dictionary corresponding to different entity types, and the second network layer is configured to obtain a second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and a target entity type vector;
the training unit 503 is configured to train the neural network model using the multiple training texts, the industry dictionaries corresponding to different entity types, the target entity type vector, and the entity labeling results corresponding to different entity types in the multiple training texts, so as to obtain an entity recognition model.
In the training data acquired by the first acquiring unit 501, the entity labeling result of the training text is an entity corresponding to different entity types in the training text, and the same entity in the training text may correspond to multiple entity types; in this embodiment, all entities corresponding to different entity types in the training text may be labeled, or only entities corresponding to a specific entity type in the training text may be labeled.
When the first obtaining unit 501 obtains the training data, the following method may be adopted: acquiring seed word sets corresponding to different entity types; expanding the seed word set to obtain an industry dictionary corresponding to different entity types; and aiming at each word in the industry dictionary, obtaining a text containing the word as a training text, and taking the word as an entity labeling result corresponding to the entity type of the industry dictionary where the word is located in the training text.
That is to say, the first obtaining unit 501 can combine with a data mining technology to obtain training data quickly, so that the problem of dependence on manual labeling data is avoided, and the training cost of the neural network model is reduced.
In this embodiment, after the first obtaining unit 501 obtains the training data including a plurality of training texts and the entity labeling results of the plurality of training texts, the constructing unit 502 constructs the neural network model including the first network layer, the second network layer, and the third network layer.
In the neural network model constructed by the construction unit 502, the first network layer is configured to obtain a first semantic vector sequence of the training text according to the training text and the industry dictionaries corresponding to different entity types.
Specifically, when the first network layer constructed by the construction unit 502 obtains the first semantic vector sequence of the training text according to the training text and the industry dictionaries corresponding to different entity types, the optional implementation manner that can be adopted is as follows: taking the training text as input to obtain an initial semantic vector of each semantic unit in the training text; matching each semantic unit in an industry dictionary corresponding to different entity types, and obtaining an identification vector of each semantic unit according to a matching result; splicing the initial semantic vector and the identification vector of each semantic unit, and obtaining a first semantic vector of each semantic unit according to a splicing result; and obtaining a first semantic vector sequence of the training text according to the first semantic vector of each semantic unit.
The first network layer constructed by the construction unit 502 is composed of a first neural network and a second neural network; the first neural network is a pre-training model and is used for obtaining initial semantic vectors of all semantic units in a training text according to the training text; the second neural network is a recurrent neural network and is used for obtaining the first semantic vector of each semantic unit according to the splicing result between the initial semantic vector of each semantic unit in the training text and the identification vector of each semantic unit, and correspondingly obtaining the first semantic vector sequence of the training text.
Each industry dictionary corresponds to a different entity type, and the different industry dictionaries contain a plurality of words corresponding to the different entity types; the construction unit 502 may also update the industry dictionaries in use periodically.
When the first network layer constructed by the construction unit 502 matches each semantic unit in the industry dictionaries corresponding to different entity types and obtains the identification vector of each semantic unit according to the matching result, an optional implementation manner that can be adopted is: setting an order of the industry dictionaries corresponding to different entity types; for each semantic unit, matching the semantic unit against the industry dictionaries corresponding to different entity types in that order; and in the case that a word matching the semantic unit exists in an industry dictionary, setting the element at the position corresponding to that industry dictionary in the identification vector to 1, and otherwise setting it to 0.
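The construction of the identification vector may be sketched as follows. The dictionary contents, the exact-match criterion, and the function name `identification_vector` are assumptions made for illustration only.

```python
def identification_vector(unit, ordered_dictionaries):
    """One element per industry dictionary, in the set order: 1 on a match, else 0."""
    return [1 if unit in words else 0 for _, words in ordered_dictionaries]


# Hypothetical industry dictionaries in a fixed order.
ordered_dictionaries = [
    ("brand", {"jeep"}),
    ("category", {"jacket"}),
    ("season", {"spring"}),
]

for unit in "jeep jacket man spring".split():
    print(unit, identification_vector(unit, ordered_dictionaries))
# jeep [1, 0, 0]
# jacket [0, 1, 0]
# man [0, 0, 0]
# spring [0, 0, 1]
```

Because the order of the dictionaries is fixed, each position of the identification vector always refers to the same entity type, and a semantic unit found in several dictionaries simply has several elements set to 1.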
That is to say, the first network layer constructed by the construction unit 502 enables the first semantic vector to be fused with the entity types by introducing the industry dictionaries corresponding to different entity types, so that the accuracy of the obtained first semantic vector sequence is improved.
In the neural network model constructed by the construction unit 502, the second network layer is configured to obtain a second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and the target entity type vector; the target entity type vector in this embodiment corresponds to the target entity type identified from the training text.
Specifically, when the second network layer constructed by the construction unit 502 obtains the second semantic vector sequence of the training text according to the first semantic vector sequence of the training text and the target entity type vector, the optional implementation manner that can be adopted is as follows: performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence to obtain a first calculation result of each semantic unit; performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain a second calculation result of each semantic unit; splicing the first calculation result and the second calculation result of each semantic unit to obtain a second semantic vector of each semantic unit; and obtaining a second semantic vector sequence of the training text according to the second semantic vector of each semantic unit.
When obtaining the target entity type vector, the constructing unit 502 may adopt the following manner: determining a target entity type, the target entity type corresponding to an entity type of an entity to be identified from the training text; and taking an entity type vector corresponding to the target entity type as a target entity type vector.
When the construction unit 502 obtains the entity type vector corresponding to an entity type, the optional implementation manner that can be adopted is as follows: determining description words of the different entity types; replacing the entities in the training text with the corresponding description words, and performing unsupervised learning on the obtained replacement text; and taking the vector corresponding to the description word in the replacement text as the entity type vector of each entity type. The construction unit 502 may also update the entity type vectors periodically according to the above method.
The second network layer constructed by the construction unit 502 is composed of a first attention network and a second attention network; the first attention network is used for carrying out attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence to obtain a first calculation result of each semantic unit; the second attention network is used for performing attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain a second calculation result of each semantic unit.
When the second network layer constructed by the construction unit 502 performs attention calculation according to the first semantic vector of each semantic unit in the first semantic vector sequence and the target entity type vector to obtain the second calculation result of each semantic unit, the optional implementation manner that can be adopted is as follows: calculating the similarity between the target entity type vector and the first semantic vector of each semantic unit; and performing attention calculation according to the calculated similarity and the target entity type vector to obtain a second calculation result of each semantic unit.
That is to say, the second network layer constructed by the construction unit 502 enables the neural network model to identify the entity corresponding to the target entity type in the training text by obtaining the target entity type vector, thereby improving the accuracy of the neural network model in entity identification.
In the neural network model constructed by the construction unit 502, the third network layer is configured to label, according to the second semantic vector sequence of the training text, the entity corresponding to the target entity type in the training text; the third network layer in this embodiment may be a Conditional Random Field (CRF) model, which identifies the corresponding entity in the training text by means of BIO labeling.
In this embodiment, after the building unit 502 builds the neural network model including the first network layer, the second network layer, and the third network layer, the training unit 503 trains the neural network model using a plurality of training texts, an industry dictionary corresponding to different entity types, a target entity type vector, and entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model.
When the training unit 503 trains the neural network model using a plurality of training texts, an industry dictionary corresponding to different entity types, a target entity type vector, and entity labeling results corresponding to different entity types in the plurality of training texts to obtain an entity recognition model, an optional implementation manner that can be adopted is as follows: aiming at each training text, taking the training text and industry dictionaries corresponding to different entity types as the input of a first network layer to obtain a first semantic vector sequence output by the first network layer; taking the first semantic vector sequence and the target entity type vector as the input of a second network layer to obtain a second semantic vector sequence output by the second network layer; taking the second semantic vector sequence as the input of a third network layer, and obtaining an entity recognition result of the training text according to the output result of the third network layer; and updating parameters of the neural network model according to the entity recognition result of the training text and the entity marking result corresponding to the target entity type until the neural network model converges to obtain the entity recognition model.
In addition, after the neural network model in this embodiment finishes labeling a training text, the neural network model also outputs a score corresponding to the training text, so the training unit 503 may further perform the following: determining the training texts meeting preset conditions according to the scores corresponding to the training texts; and adding the determined training texts as new samples to the training data for training the neural network model.
When determining the training texts meeting the preset conditions according to the scores corresponding to the training texts, the training unit 503 may select the training texts with the scores larger than the preset threshold, and may also obtain the uncertainty values of the training texts based on a calculation method of the information entropy, thereby selecting the training texts with the uncertainty values smaller than the preset threshold.
That is to say, the training unit 503 can also enhance the training data, and filter the training text according to the labeling result, so that the neural network model can be trained by using the training data with better quality, thereby improving the training quality of the neural network model.
Fig. 6 is a schematic diagram according to a sixth embodiment of the present disclosure. As shown in fig. 6, the entity identifying apparatus 600 of the present embodiment includes:
a second obtaining unit 601, configured to obtain a text to be recognized;
the processing unit 602 is configured to input the text to be recognized, an industry dictionary corresponding to different entity types, and a target entity type vector into an entity recognition model;
the identifying unit 603 is configured to extract, according to the output result of the entity identification model, an entity corresponding to the target entity type in the text to be identified, as an entity identification result of the text to be identified.
The text to be recognized acquired by the second acquiring unit 601 may be a query word (query) text input by the user when performing a search.
When the processing unit 602 inputs the target entity type vector into the entity identification model, the entity type vectors corresponding to all entity types may be respectively used as the target entity type vectors, and multiple entity identification results corresponding to different entity types in the text to be identified are obtained through multiple identifications of the entity identification model.
When the target entity type vector is input into the entity identification model, the processing unit 602 may further input the entity type vector corresponding to the target entity type into the entity identification model as the target entity type vector according to the obtained target entity type, that is, this embodiment may also achieve the purpose of extracting the entity of the specific entity type in the text to be identified.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 7 is a block diagram of an electronic device for implementing the entity recognition model training method and the entity recognition method according to an embodiment of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM702, and the RAM703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as training of an entity recognition model and an entity recognition method. For example, in some embodiments, the training of the entity recognition model and the entity recognition method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708.
In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM702 and/or communications unit 709. When loaded into RAM703 and executed by the computing unit 701, may perform one or more of the steps of the method of training an entity recognition model and entity recognition described above. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the training of the entity recognition model and the entity recognition method.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
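As a purely illustrative sketch of this point, the following Python example runs two mutually independent steps either sequentially or in parallel and obtains the same result. It is not the implementation of the present disclosure: the function names, the placeholder "initial semantic vector" (here simply the length of each unit), and the dictionary entries are all hypothetical assumptions chosen only to make the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical industry dictionary entries, used only for this illustration.
INDUSTRY_DICTIONARY = {"aspirin", "ibuprofen"}


def initial_vectors(units):
    # Placeholder for an initial semantic vector: the length of each unit.
    return [[float(len(u))] for u in units]


def identification_vectors(units):
    # 1.0 if the unit matches a dictionary entry, otherwise 0.0.
    return [[1.0 if u in INDUSTRY_DICTIONARY else 0.0] for u in units]


def splice(initial, ident):
    # Concatenate the two vectors of each unit.
    return [a + b for a, b in zip(initial, ident)]


def run_sequential(units):
    # The two independent steps are executed one after the other.
    return splice(initial_vectors(units), identification_vectors(units))


def run_parallel(units):
    # The same two steps are submitted to a thread pool and may run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_initial = pool.submit(initial_vectors, units)
        f_ident = pool.submit(identification_vectors, units)
        return splice(f_initial.result(), f_ident.result())


units = ["take", "aspirin", "daily"]
sequential_result = run_sequential(units)
parallel_result = run_parallel(units)
```

Because neither step depends on the output of the other, the order of execution does not affect the spliced result, which is the condition under which steps may be reordered or parallelized.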
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.