Named entity identification method and device
1. A method for identifying a named entity, the method comprising:
performing name matching between a named entity to be identified and empirical named entities in an entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity, and obtaining N candidate empirical named entities based on the similarity; wherein the empirical named entities comprise named entities that were obtained from historical texts and have been classified into standard named entities, and N is a positive integer;
obtaining specific features of the candidate empirical named entities; wherein the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
recalculating, based on the specific features, the similarity between each candidate empirical named entity and the named entity to be identified, obtaining the candidate empirical named entity whose similarity to the named entity to be identified meets a preset condition, and classifying the named entity to be identified into the standard named entity corresponding to that candidate empirical named entity.
2. The method of claim 1, wherein the preset relevance features comprise at least one or a combination of:
whether the name of the standard named entity corresponding to the candidate empirical named entity is identical to the name of the named entity to be identified;
the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among the empirical named entities corresponding to that standard named entity;
the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
3. The method according to claim 2, wherein, if the preset relevance features include the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among its corresponding empirical named entities, obtaining the specific features of the candidate empirical named entity comprises:
looking up, in the entity knowledge base, the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among its corresponding empirical named entities;
or obtaining the name of the standard named entity corresponding to the candidate empirical named entity, counting the number of empirical named entities corresponding to that standard named entity and the number of those empirical named entities whose name is identical to the name of the standard named entity, and calculating, from the two counts, the proportion of occurrences of the name of the standard named entity among its corresponding empirical named entities.
4. The method according to claim 2, wherein, if the preset relevance features include the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified, obtaining the specific features of the candidate empirical named entity comprises:
acquiring the text to be identified in which the named entity to be identified is recorded;
determining the semantic type of the named entity to be identified with reference to the text to be identified;
obtaining the semantic type of the candidate empirical named entity from the entity knowledge base;
and calculating the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
5. The method according to any of claims 1-4, wherein recalculating the similarity of the candidate empirical named entity to the named entity to be identified comprises:
assigning a preset weight to each specific feature;
and performing a weighted calculation based on the assigned weights to obtain the recalculated similarity between the candidate empirical named entity and the named entity to be identified.
6. The method according to any of claims 1-4, wherein recalculating the similarity of the candidate empirical named entity to the named entity to be identified comprises:
and inputting the specific features into a preset recognition model, and recalculating the similarity between the candidate empirical named entity and the named entity to be recognized.
7. The method of claim 6, further comprising:
acquiring, for named entities in the historical texts, N candidate empirical named entities determined based on the similarity between each such named entity and the empirical named entities in the entity knowledge base, together with the specific features of those candidate empirical named entities;
adding labels to the specific features of the candidate empirical named entities and performing model training on the labelled features to obtain the preset recognition model; wherein the label indicates the standard named entity to which the named entity in the historical text belongs.
8. The method of claim 6, further comprising:
and updating the preset recognition model according to the named entity to be identified and its corresponding standard named entity.
9. The method of claim 6, wherein the pre-set recognition model comprises a linear regression model;
the optimization objective of the linear regression model is to minimize ‖Xw − Y‖;
where X denotes the specific features, Y is 0 or 1, and w denotes the weights of the specific features.
10. A method for identifying a named entity, the method comprising:
acquiring a medical named entity to be identified in a medical text to be identified;
performing name matching between the medical named entity to be identified and empirical medical named entities in a medical entity knowledge base to obtain the similarity between the medical named entity to be identified and each empirical medical named entity, and obtaining N candidate empirical medical named entities based on the similarity; wherein the empirical medical named entities comprise named entities that were obtained from historical medical texts and have been classified into standard medical named entities, and N is a positive integer;
acquiring specific features of the candidate empirical medical named entities; wherein the specific features comprise the similarity between the candidate empirical medical named entity and the medical named entity to be identified, and preset relevance features between the candidate empirical medical named entity and other named entities;
recalculating, based on the specific features, the similarity between each candidate empirical medical named entity and the medical named entity to be identified, obtaining the candidate empirical medical named entity whose similarity to the medical named entity to be identified meets a preset condition, and classifying the medical named entity to be identified into the standard medical named entity corresponding to that candidate empirical medical named entity.
11. A method for identifying a named entity, the method comprising:
acquiring a text to be identified, and acquiring a named entity to be identified from the text to be identified;
performing name matching between the named entity to be identified and empirical named entities in an entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity, and obtaining N candidate empirical named entities based on the similarity; wherein the empirical named entities comprise named entities that were obtained from historical texts and have been classified into standard named entities, and N is a positive integer;
obtaining specific features of the candidate empirical named entities; wherein the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
recalculating, based on the specific features, the similarity between each candidate empirical named entity and the named entity to be identified, obtaining the candidate empirical named entity whose similarity to the named entity to be identified meets a preset condition, and classifying the named entity to be identified into the standard named entity corresponding to that candidate empirical named entity;
and outputting the corresponding relation between the named entity to be identified and the standard named entity according to the classification result.
12. An apparatus for identifying named entities, the apparatus comprising:
a name matching unit, configured to perform name matching between a named entity to be identified and empirical named entities in an entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity;
a first obtaining unit, configured to obtain N candidate empirical named entities based on the similarity; wherein the empirical named entities comprise named entities that were obtained from historical texts and have been classified into standard named entities, and N is a positive integer;
a second obtaining unit, configured to obtain specific features of the candidate empirical named entities; wherein the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
a calculating unit, configured to recalculate, based on the specific features, the similarity between each candidate empirical named entity and the named entity to be identified, to obtain the candidate empirical named entity whose similarity to the named entity to be identified meets a preset condition;
and a classifying unit, configured to classify the named entity to be identified into the standard named entity corresponding to the candidate empirical named entity that meets the preset condition.
13. An apparatus for identifying named entities, the apparatus comprising:
a first obtaining unit, configured to acquire a medical named entity to be identified from a medical text to be identified;
a name matching unit, configured to perform name matching between the medical named entity to be identified and empirical medical named entities in a medical entity knowledge base to obtain the similarity between the medical named entity to be identified and each empirical medical named entity;
a second obtaining unit, configured to obtain N candidate empirical medical named entities based on the similarity; wherein the empirical medical named entities comprise named entities that were obtained from historical medical texts and have been classified into standard medical named entities, and N is a positive integer;
a third obtaining unit, configured to obtain specific features of the candidate empirical medical named entities; wherein the specific features comprise the similarity between the candidate empirical medical named entity and the medical named entity to be identified, and preset relevance features between the candidate empirical medical named entity and other named entities;
a calculating unit, configured to recalculate, based on the specific features, the similarity between each candidate empirical medical named entity and the medical named entity to be identified, to obtain the candidate empirical medical named entity whose similarity to the medical named entity to be identified meets a preset condition;
and a classifying unit, configured to classify the medical named entity to be identified into the standard medical named entity corresponding to the candidate empirical medical named entity that meets the preset condition.
14. An apparatus for identifying named entities, the apparatus comprising:
an acquisition unit, configured to acquire a text to be identified;
a first obtaining unit, configured to obtain a named entity to be identified from the text to be identified;
a name matching unit, configured to perform name matching between the named entity to be identified and empirical named entities in an entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity;
a second obtaining unit, configured to obtain N candidate empirical named entities based on the similarity; wherein the empirical named entities comprise named entities that were obtained from historical texts and have been classified into standard named entities, and N is a positive integer;
a third obtaining unit, configured to obtain specific features of the candidate empirical named entities; wherein the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
a calculating unit, configured to recalculate, based on the specific features, the similarity between each candidate empirical named entity and the named entity to be identified, to obtain the candidate empirical named entity whose similarity to the named entity to be identified meets a preset condition;
a classifying unit, configured to classify the named entity to be identified into the standard named entity corresponding to the candidate empirical named entity that meets the preset condition;
and an output unit, configured to output the correspondence between the named entity to be identified and the standard named entity according to the classification result.
15. A storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform a method of identifying a named entity according to any one of claims 1 to 11.
16. An electronic device, comprising a storage medium and a processor;
the processor is adapted to execute instructions;
the storage medium is adapted to store a plurality of instructions;
the instructions are adapted to be loaded by the processor and to perform a method of identifying a named entity according to any one of claims 1 to 11.
Background
In practice, the same named entity may be expressed in different ways. For example, for an eye drop that relieves eye fatigue, some doctors write "sodium hyaluronate eye drops" in the medical record, some refer to it implicitly by a product brand name, and some write "artificial tears", because sodium hyaluronate eye drops are a kind of artificial tears. This makes named entities difficult to identify, and normalization becomes the key to solving the problem. At present, named entities are mainly normalized as follows: each entity name in the text to be identified is matched one by one against the entity names in an entity knowledge base, the similarity between the two names is calculated, and the entity name in the text is classified into the entity in the knowledge base with the highest similarity. However, two entity names may be very similar, differing by only a single character, and still denote two different entities; for example, penicillin and erythromycin (青霉素 and 红霉素 in Chinese, which differ by one character) are two different drugs. Therefore, prior-art methods that normalize named entities solely by matching the characters in entity names have low accuracy.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for identifying a named entity, which aim to solve the problem of low accuracy in classifying the named entity in the prior art.
In a first aspect, the present invention provides a method for identifying a named entity, where the method includes:
performing name matching between a named entity to be identified and empirical named entities in an entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity, and obtaining N candidate empirical named entities based on the similarity; wherein the empirical named entities comprise named entities that were obtained from historical texts and have been classified into standard named entities, and N is a positive integer;
obtaining specific features of the candidate empirical named entities; wherein the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
recalculating, based on the specific features, the similarity between each candidate empirical named entity and the named entity to be identified, obtaining the candidate empirical named entity whose similarity to the named entity to be identified meets a preset condition, and classifying the named entity to be identified into the standard named entity corresponding to that candidate empirical named entity.
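The first step of this aspect (name matching followed by selection of N candidates) can be illustrated with a minimal sketch. The text does not specify how name similarity is computed, so the character-level ratio from Python's standard `difflib` module is used here purely as an assumption; the knowledge-base contents and the names `KNOWLEDGE_BASE`, `name_similarity` and `top_n_candidates` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical entity knowledge base: (empirical name, standard name) pairs.
KNOWLEDGE_BASE = [
    ("sodium hyaluronate eye drops", "sodium hyaluronate eye drops"),
    ("hyaluronate eye drop", "sodium hyaluronate eye drops"),
    ("artificial tears", "sodium hyaluronate eye drops"),
    ("erythromycin ointment", "erythromycin"),
]


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed measure, not the patent's."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_n_candidates(
    query: str, kb: List[Tuple[str, str]], n: int
) -> List[Tuple[str, str, float]]:
    """Return the N empirical entities whose names are most similar to the query."""
    scored = [(emp, std, name_similarity(query, emp)) for emp, std in kb]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    for emp, std, score in top_n_candidates("sodium hyaluronate eye drop", KNOWLEDGE_BASE, 2):
        print(f"{emp!r} -> {std!r}: {score:.3f}")
```

Each returned tuple carries the name similarity forward, so it can serve as one of the specific features used when the similarity is recalculated.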
Optionally, the preset relevance feature includes at least one or a combination of the following items:
whether the name of the standard named entity corresponding to the candidate empirical named entity is completely the same as the name of the named entity to be identified or not;
the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among the empirical named entities corresponding to that standard named entity;
the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
Optionally, if the preset relevance features include the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among its corresponding empirical named entities, obtaining the specific features of the candidate empirical named entity includes:
looking up, in the entity knowledge base, the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among its corresponding empirical named entities;
or obtaining the name of the standard named entity corresponding to the candidate empirical named entity, counting the number of empirical named entities corresponding to that standard named entity and the number of those empirical named entities whose name is identical to the name of the standard named entity, and calculating, from the two counts, the proportion of occurrences of the name of the standard named entity among its corresponding empirical named entities.
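The counting alternative described above amounts to dividing two counts. The following is a minimal sketch, assuming that the knowledge base can be read as a list of (empirical name, standard name) records with one record per historical occurrence; the function name `standard_name_ratio` and the record layout are hypothetical.

```python
from typing import Iterable, Tuple


def standard_name_ratio(records: Iterable[Tuple[str, str]], standard_name: str) -> float:
    """Proportion of the empirical entities mapped to `standard_name`
    whose own name is identical to `standard_name`."""
    # Empirical names that were classified into this standard entity.
    mapped = [emp for emp, std in records if std == standard_name]
    if not mapped:
        return 0.0  # no evidence for this standard entity (assumed fallback)
    identical = sum(1 for emp in mapped if emp == standard_name)
    return identical / len(mapped)


if __name__ == "__main__":
    history = [
        ("sodium hyaluronate eye drops", "sodium hyaluronate eye drops"),
        ("sodium hyaluronate eye drops", "sodium hyaluronate eye drops"),
        ("artificial tears", "sodium hyaluronate eye drops"),
        ("erythromycin", "erythromycin"),
    ]
    print(standard_name_ratio(history, "sodium hyaluronate eye drops"))
```

A high ratio suggests that the standard name itself is how the entity is usually written, while a low ratio suggests that aliases dominate.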
Optionally, if the preset relevance features include the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified, obtaining the specific features of the candidate empirical named entity includes:
acquiring the text to be identified in which the named entity to be identified is recorded;
determining the semantic type of the named entity to be identified with reference to the text to be identified;
obtaining the semantic type of the candidate empirical named entity from the entity knowledge base;
and calculating the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
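The text does not say how a semantic type is determined from context or how two types are compared. The sketch below is therefore only one possible reading: the type is inferred from hypothetical cue words in the surrounding text, and type similarity is 1.0 for identical types, a hypothetical partial score for related types, and 0.0 otherwise. All names and tables (`TYPE_CUES`, `RELATED`, `infer_semantic_type`, `type_similarity`) are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical context cues used to infer a semantic type from the text.
TYPE_CUES: Dict[str, Tuple[str, ...]] = {
    "drug": ("mg", "dose", "prescribed", "drops"),
    "disease": ("diagnosed", "symptom", "chronic"),
}

# Hypothetical partial-credit scores for related but different types.
RELATED: Dict[FrozenSet[str], float] = {
    frozenset({"drug", "drug_class"}): 0.5,
}


def infer_semantic_type(text: str) -> str:
    """Pick the type whose cue words occur most often in the text."""
    lowered = text.lower()
    counts = {t: sum(lowered.count(c) for c in cues) for t, cues in TYPE_CUES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


def type_similarity(type_a: str, type_b: str) -> float:
    """1.0 for identical known types, a table score for related types, else 0.0."""
    if "unknown" in (type_a, type_b):
        return 0.0
    if type_a == type_b:
        return 1.0
    return RELATED.get(frozenset({type_a, type_b}), 0.0)


if __name__ == "__main__":
    query_type = infer_semantic_type("Prescribed sodium hyaluronate eye drops, 1 dose daily")
    print(query_type, type_similarity(query_type, "drug"))
```

In a real system the cue table would likely be replaced by a trained classifier or a type hierarchy; the point here is only that the feature is a number comparing two type labels.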
Optionally, the recalculating the similarity between the candidate empirical named entity and the named entity to be identified includes:
assigning a preset weight to each specific feature;
and performing a weighted calculation based on the assigned weights to obtain the recalculated similarity between the candidate empirical named entity and the named entity to be identified.
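The weighted recalculation can be sketched as a weighted sum over the specific features. The feature order, the weight values, and the reading of the "preset condition" as "highest score at or above a threshold" are all assumptions made for this example; the names `weighted_similarity` and `pick_candidate` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def weighted_similarity(features: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the specific features of one candidate."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def pick_candidate(
    candidates: Dict[str, Sequence[float]],
    weights: Sequence[float],
    threshold: float,
) -> Optional[str]:
    """Return the standard entity of the best-scoring candidate, or None if
    no candidate reaches the threshold (assumed preset condition)."""
    if not candidates:
        return None
    scores = {name: weighted_similarity(f, weights) for name, f in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


if __name__ == "__main__":
    # Assumed feature order: name similarity, exact-match flag,
    # frequency proportion, semantic-type similarity.
    weights = [0.4, 0.2, 0.2, 0.2]
    candidates = {
        "sodium hyaluronate eye drops": [0.9, 1.0, 0.5, 1.0],
        "erythromycin": [0.95, 0.0, 0.1, 0.0],
    }
    print(pick_candidate(candidates, weights, threshold=0.6))
```

Note that in the sample data the second candidate has the higher raw name similarity but loses after reweighting, which is the situation the relevance features are meant to handle.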
Optionally, the recalculating the similarity between the candidate empirical named entity and the named entity to be identified includes:
and inputting the specific features into a preset recognition model, and recalculating the similarity between the candidate empirical named entity and the named entity to be recognized.
Optionally, the method further includes:
acquiring, for named entities in the historical texts, N candidate empirical named entities determined based on the similarity between each such named entity and the empirical named entities in the entity knowledge base, together with the specific features of those candidate empirical named entities;
adding labels to the specific features of the candidate empirical named entities and performing model training on the labelled features to obtain the preset recognition model; wherein the label indicates the standard named entity to which the named entity in the historical text belongs.
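One way to turn the labelled candidates into training data is shown below. It assumes that each candidate's feature vector receives a target of 1 when the candidate's standard named entity equals the labelled standard named entity of the historical mention, and 0 otherwise; this 0/1 encoding is an interpretation, and the dictionary layout and the name `build_training_rows` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_training_rows(history: List[Dict]) -> Tuple[List[List[float]], List[int]]:
    """Flatten labelled historical mentions into feature rows X and 0/1 targets Y."""
    X: List[List[float]] = []
    Y: List[int] = []
    for mention in history:
        for cand in mention["candidates"]:
            X.append(list(cand["features"]))
            # Target is 1 only for the candidate that maps to the labelled standard entity.
            Y.append(1 if cand["standard"] == mention["standard"] else 0)
    return X, Y


if __name__ == "__main__":
    history = [
        {
            "standard": "sodium hyaluronate eye drops",  # label from the historical text
            "candidates": [
                {"standard": "sodium hyaluronate eye drops", "features": [0.9, 1.0, 0.5]},
                {"standard": "erythromycin", "features": [0.3, 0.0, 0.1]},
            ],
        }
    ]
    X, Y = build_training_rows(history)
    print(X, Y)
```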
Optionally, the method further includes:
and updating the preset recognition model according to the named entity to be identified and its corresponding standard named entity.
Optionally, the preset recognition model includes a linear regression model;
the optimization objective of the linear regression model is to minimize ‖Xw − Y‖;
where X denotes the specific features, Y is 0 or 1, and w denotes the weights of the specific features.
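As an illustration of this objective, the sketch below fits w by plain gradient descent on the mean squared residual, which has the same minimizer as the Euclidean norm of Xw − Y. The choice of a Euclidean norm, the optimizer, the learning rate, and the names `fit_linear_weights` and `predict` are assumptions; the text does not state how the model is fitted.

```python
from typing import List, Sequence


def predict(w: Sequence[float], row: Sequence[float]) -> float:
    """Recalculated similarity for one candidate: the dot product of w and its features."""
    return sum(wj * xj for wj, xj in zip(w, row))


def fit_linear_weights(
    X: List[List[float]], Y: List[float], lr: float = 0.1, epochs: int = 2000
) -> List[float]:
    """Minimize the mean of (Xw - Y)^2 by batch gradient descent.

    The step size is assumed small enough for features scaled to [0, 1].
    """
    m, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        residuals = [predict(w, row) - y for row, y in zip(X, Y)]
        grad = [
            2.0 / m * sum(r * row[j] for r, row in zip(residuals, X))
            for j in range(n_features)
        ]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    Y = [1, 0, 1, 0]
    w = fit_linear_weights(X, Y)
    print([round(v, 4) for v in w])
```

The fitted w then plays the role of the feature weights, so scoring a new candidate is a single dot product with its specific features.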
In a second aspect, the present invention provides a method for identifying a named entity, the method comprising:
acquiring a medical named entity to be identified in a medical text to be identified;
performing name matching between the medical named entity to be identified and empirical medical named entities in a medical entity knowledge base to obtain the similarity between the medical named entity to be identified and each empirical medical named entity, and obtaining N candidate empirical medical named entities based on the similarity; wherein the empirical medical named entities comprise named entities that were obtained from historical medical texts and have been classified into standard medical named entities, and N is a positive integer;
acquiring specific features of the candidate empirical medical named entities; wherein the specific features comprise the similarity between the candidate empirical medical named entity and the medical named entity to be identified, and preset relevance features between the candidate empirical medical named entity and other named entities;
recalculating, based on the specific features, the similarity between each candidate empirical medical named entity and the medical named entity to be identified, obtaining the candidate empirical medical named entity whose similarity to the medical named entity to be identified meets a preset condition, and classifying the medical named entity to be identified into the standard medical named entity corresponding to that candidate empirical medical named entity.
In a third aspect, the present invention provides a method for identifying a named entity, where the method includes:
acquiring a text to be identified, and acquiring a named entity to be identified from the text to be identified;
performing name matching between the named entity to be identified and empirical named entities in an entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity, and obtaining N candidate empirical named entities based on the similarity; wherein the empirical named entities comprise named entities that were obtained from historical texts and have been classified into standard named entities, and N is a positive integer;
obtaining specific features of the candidate empirical named entities; wherein the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
recalculating, based on the specific features, the similarity between each candidate empirical named entity and the named entity to be identified, obtaining the candidate empirical named entity whose similarity to the named entity to be identified meets a preset condition, and classifying the named entity to be identified into the standard named entity corresponding to that candidate empirical named entity;
and outputting the corresponding relation between the named entity to be identified and the standard named entity according to the classification result.
In a fourth aspect, the present invention provides an apparatus for identifying a named entity, the apparatus comprising:
a name matching unit, configured to perform name matching between a named entity to be identified and empirical named entities in an entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity;
a first obtaining unit, configured to obtain N candidate empirical named entities based on the similarity; wherein the empirical named entities comprise named entities that were obtained from historical texts and have been classified into standard named entities, and N is a positive integer;
a second obtaining unit, configured to obtain specific features of the candidate empirical named entities; wherein the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
a calculating unit, configured to recalculate, based on the specific features, the similarity between each candidate empirical named entity and the named entity to be identified, to obtain the candidate empirical named entity whose similarity to the named entity to be identified meets a preset condition;
and a classifying unit, configured to classify the named entity to be identified into the standard named entity corresponding to the candidate empirical named entity that meets the preset condition.
Optionally, the preset relevance feature acquired by the second acquiring unit includes at least one or a combination of the following items:
whether the name of the standard named entity corresponding to the candidate empirical named entity is completely the same as the name of the named entity to be identified or not;
the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among the empirical named entities corresponding to that standard named entity;
the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
Optionally, the second obtaining unit is configured to, if the preset relevance features include the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among its corresponding empirical named entities, look up that proportion in the entity knowledge base; or obtain the name of the standard named entity corresponding to the candidate empirical named entity, count the number of empirical named entities corresponding to that standard named entity and the number of those empirical named entities whose name is identical to the name of the standard named entity, and calculate, from the two counts, the proportion of occurrences of the name of the standard named entity among its corresponding empirical named entities.
Optionally, the second obtaining unit is configured to, if the preset relevance features include the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified: acquire the text to be identified in which the named entity to be identified is recorded; determine the semantic type of the named entity to be identified with reference to that text; obtain the semantic type of the candidate empirical named entity from the entity knowledge base; and calculate the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
Optionally, the computing unit includes:
an assignment module, configured to assign a preset weight to each specific feature;
and a first calculation module, configured to perform a weighted calculation based on the assigned weights to obtain the recalculated similarity between the candidate empirical named entity and the named entity to be identified.
Optionally, the computing unit includes:
and the second calculation module is used for inputting the specific characteristics into a preset recognition model, and recalculating the similarity between the candidate empirical named entity and the named entity to be recognized.
Optionally, the apparatus further comprises:
a third obtaining unit, configured to acquire, for named entities in the historical texts, N candidate empirical named entities determined based on the similarity between each such named entity and the empirical named entities in the entity knowledge base, together with the specific features of those candidate empirical named entities;
a training unit, configured to add labels to the specific features of the candidate empirical named entities and perform model training on the labelled features to obtain the preset recognition model; wherein the label indicates the standard named entity to which the named entity in the historical text belongs.
Optionally, the apparatus further comprises:
and an updating unit, configured to update the preset recognition model according to the named entity to be identified and its corresponding standard named entity.
Optionally, the preset recognition model used by the second calculation module includes a linear regression model;
the optimization objective of the linear regression model is to minimize ‖Xw − Y‖;
where X denotes the specific features, Y is 0 or 1, and w denotes the weights of the specific features.
In a fifth aspect, the present invention provides an apparatus for identifying a named entity, the apparatus comprising:
a first obtaining unit, configured to acquire a medical named entity to be identified from a medical text to be identified;
a name matching unit, configured to perform name matching between the medical named entity to be identified and empirical medical named entities in a medical entity knowledge base to obtain the similarity between the medical named entity to be identified and each empirical medical named entity;
a second obtaining unit, configured to obtain N candidate empirical medical named entities based on the similarity; wherein the empirical medical named entities comprise named entities that were obtained from historical medical texts and have been classified into standard medical named entities, and N is a positive integer;
a third obtaining unit, configured to obtain specific features of the candidate empirical medical named entities; wherein the specific features comprise the similarity between the candidate empirical medical named entity and the medical named entity to be identified, and preset relevance features between the candidate empirical medical named entity and other named entities;
a calculating unit, configured to recalculate, based on the specific features, the similarity between each candidate empirical medical named entity and the medical named entity to be identified, to obtain the candidate empirical medical named entity whose similarity to the medical named entity to be identified meets a preset condition;
and a classifying unit, configured to classify the medical named entity to be identified into the standard medical named entity corresponding to the candidate empirical medical named entity that meets the preset condition.
In a sixth aspect, the present invention provides an apparatus for identifying a named entity, the apparatus comprising:
the acquisition unit is used for acquiring a text to be recognized;
the first acquisition unit is used for acquiring the named entity to be identified from the text to be identified;
the name matching unit is used for carrying out name matching on the named entity to be identified and an empirical named entity in an entity knowledge base to obtain the similarity between the named entity to be identified and the empirical named entity;
a second obtaining unit, configured to obtain N candidate empirical named entities based on the similarity; the empirical named entities comprise named entities that are obtained from historical texts and classified into standard named entities; N is a positive integer;
the third acquisition unit is used for acquiring the specific features of the candidate empirical named entity; the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
the calculating unit is used for recalculating, based on the specific features, the similarity between the candidate empirical named entity and the named entity to be identified, to obtain the candidate empirical named entity whose similarity with the named entity to be identified meets a preset condition;
the classifying unit is used for classifying the named entity to be identified into a standard named entity corresponding to the candidate empirical named entity meeting the preset condition;
and the output unit is used for outputting the corresponding relation between the named entity to be identified and the standard named entity according to the classification result.
In a seventh aspect, the present invention provides a storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method for identifying a named entity according to any of the first to third aspects.
In an eighth aspect, the present invention provides an electronic device comprising a storage medium and a processor;
the processor is adapted to execute instructions;
the storage medium is adapted to store a plurality of instructions;
the instructions are adapted to be loaded by the processor and to perform a method of identifying a named entity as claimed in any of the first to third aspects.
By means of the above technical solution, the named entity identification method and device provided by the invention first perform name matching between the named entity to be identified and the empirical named entities in the entity knowledge base, and screen N candidate empirical named entities out of all the empirical named entities, thereby achieving a coarse identification of the named entity along the name dimension. Then, by combining other features of the N candidate empirical named entities, the candidate that best matches the named entity to be identified is determined, and the named entity to be identified is classified into the standard named entity corresponding to the selected candidate, thereby achieving a fine identification of the named entity along non-name dimensions. That is, compared with the prior art, which identifies a named entity from the name dimension only, the invention identifies the named entity from both the name dimension and non-name dimensions, and thereby improves the accuracy of named entity classification.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart illustrating a method for identifying a named entity according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating another method for identifying a named entity according to an embodiment of the present invention;
FIG. 3 is a flow chart illustrating a further method for identifying a named entity according to an embodiment of the present invention;
FIG. 4 illustrates an exemplary diagram for annotating named entities provided by an embodiment of the present invention;
FIG. 5 illustrates an exemplary diagram of another annotated named entity provided by embodiments of the present invention;
FIG. 6 is a block diagram illustrating an apparatus for identifying a named entity according to an embodiment of the present invention;
FIG. 7 is a block diagram illustrating another named entity recognition apparatus provided by the present invention;
FIG. 8 is a block diagram illustrating an apparatus for identifying a named entity according to an embodiment of the present invention;
FIG. 9 is a block diagram illustrating a still further apparatus for identifying a named entity according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the invention provides a method for identifying a named entity, the method comprising the following steps:
101. Perform name matching between a named entity to be identified and the empirical named entities in an entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity, and obtain N candidate empirical named entities based on the similarity.
Named entities include person names, organization names, place names, product names, disease names and other entities identified by a name. In a broader sense, entities also include numbers, dates, currencies, addresses, and the like. The entity knowledge base comprises standard named entities and the empirical named entities classified into those standard named entities. The empirical named entities are named entities that were obtained from historical texts and classified into standard named entities, that is, named entities accumulated through experience; standard named entities are the official named entities published by an authority in the relevant field.
Illustratively, the standard named entity is "Beijing university college of court", one experience named entity corresponding to the standard named entity is "Beijing university college of court", and the named entity to be recognized in the text to be recognized is "Beijing university college of court".
When identifying the standard named entity to which a named entity to be identified in the text belongs, name matching can be performed between the named entity to be identified and the empirical named entities in the entity knowledge base, and the similarity between the two names is calculated. N empirical named entities are then selected from the large number of empirical named entities according to the similarity, the N empirical named entities are reordered using the reordering algorithm of the following steps, and the empirical named entity that meets the preset condition in the final ordering is determined to be the one closest to the named entity to be identified. Since, in general, the higher the name similarity, the closer two named entities are, the N empirical named entities selected according to the similarity may be the N empirical named entities ranked highest by similarity. N is a positive integer whose value may be determined empirically; a reasonable value is the smallest N, for example 100, that still preserves the quality of the second-stage ordering.
In addition to the names of named entities, the entity knowledge base includes other features of the named entities, such as the proportion of times the name of a standard named entity appears among its corresponding empirical named entities, and the semantic type of a named entity. Semantic types include manufacturer names, brand names, product specifications and the like, which help describe the characteristics of a named entity. Before name matching is performed, a query index of the entity knowledge base can be built, so that the name of a named entity can be found quickly in an entity knowledge base that also stores a large amount of other feature information.
When the similarity between two entity names is calculated, the two names can be matched by unigram matching, bigram matching, trigram matching, longest-common-substring matching and the like, and the similarity can be calculated from features such as N-gram matching and character-level edit distance. In addition, the similarity may be the calculated similarity value itself, or a score determined from the similarity value according to a preset scoring rule.
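The name-matching step can be illustrated with a minimal sketch. It is only an assumption of how the similarity might be computed: character bigram overlap (Jaccard) and a normalized edit distance are averaged with equal weights. The function names and the 0.5/0.5 weighting are hypothetical and are not taken from this embodiment.

```python
def char_ngrams(text, n):
    """Return the set of character n-grams of a name."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of character n-grams (bigrams by default)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        # Names shorter than n have no n-grams; fall back to exact match.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def edit_distance(a, b):
    """Character-level Levenshtein distance (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # deletion
                               current[j - 1] + 1,                     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a, b):
    """Assumed scoring rule: equal-weight mix of the two features."""
    longest = max(len(a), len(b)) or 1
    edit_similarity = 1.0 - edit_distance(a, b) / longest
    return 0.5 * ngram_similarity(a, b) + 0.5 * edit_similarity


def top_n_candidates(query, empirical_names, n=100):
    """Keep the N empirical names ranked highest by name similarity."""
    scored = [(name_similarity(query, name), name) for name in empirical_names]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


candidates = top_n_candidates(
    "aspirin", ["aspirin tablets", "ibuprofen", "aspirin"], n=2)
```

In this toy call the exact match is ranked first and "aspirin tablets" second, while "ibuprofen" is dropped because only N = 2 candidates are kept.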
102. Specific features of the candidate empirically named entities are obtained.
The specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities.
The preset relevance characteristics of the named entity comprise at least one or a combination of the following items:
(1) Whether the name of the standard named entity corresponding to the candidate empirical named entity is identical to the name of the named entity to be identified.
Because the named entity to be identified is ultimately classified into a standard named entity, it is more likely to belong to a standard named entity whose name is exactly the same as its own, so this feature can be added as one basis for the decision.
(2) The proportion of times the name of the standard named entity corresponding to the candidate empirical named entity appears among the empirical named entities corresponding to that standard named entity.
Because the entity knowledge base records not only the names of named entities but also other features that distinguish them, empirical named entities that share the same name but differ in other features may have been classified into different standard named entities when the entity knowledge base was built. In that case, the candidates obtained by name matching alone may include candidate empirical named entities with the same name but different other features. To further determine which candidate the named entity to be identified is closer to, the proportion of times the name of the corresponding standard named entity appears among its corresponding empirical named entities can be used as a basis for the decision.
If the preset relevance features include this proportion, obtaining the specific features of the candidate empirical named entity comprises: looking up in the entity knowledge base the name of the standard named entity corresponding to the candidate empirical named entity, together with the proportion of times that name appears among the corresponding empirical named entities; or obtaining the name of the standard named entity corresponding to the candidate empirical named entity, counting the number of empirical named entities corresponding to that standard named entity and the number of those whose name is identical to the name of the standard named entity, and calculating the proportion from these counts.
For example, suppose a standard named entity has the name A and corresponds to 10 empirical named entities, of which 8 have the name A and 2 have the name B; then the name of the standard named entity accounts for 80% of its empirical named entities and the other name B accounts for 20%. As another example, suppose another standard named entity has the name B and corresponds to 10 empirical named entities, of which 7 have the name B and 3 have the name A; then the name of that standard named entity accounts for 70% and the other name A accounts for 30%. Thus, when the name of a candidate empirical named entity is A, the named entity to be identified is more likely to belong to the first standard named entity.
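The counting in this example can be reproduced with a short sketch. The function name and the list-based representation of the knowledge base are assumptions made for illustration only.

```python
from collections import Counter


def name_proportion(name, empirical_names):
    """Share of a standard entity's empirical named entities that carry `name`."""
    if not empirical_names:
        return 0.0
    return Counter(empirical_names)[name] / len(empirical_names)


# Hypothetical knowledge-base contents matching the example in the text.
first_standard = ["A"] * 8 + ["B"] * 2    # standard named entity named A
second_standard = ["B"] * 7 + ["A"] * 3   # standard named entity named B

proportion_in_first = name_proportion("A", first_standard)    # 0.8
proportion_in_second = name_proportion("A", second_standard)  # 0.3
```

Because the name A accounts for a larger share under the first standard named entity, a candidate named A points more strongly to the first one.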
(3) Similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
Because attributes such as manufacturer, specification and brand may differ between named entities, a named entity to be identified and an empirical named entity with similar names may still denote two different entities. To identify named entities more accurately, the semantic type can be added as a basis for the decision.
If the preset relevance features include the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified, obtaining the specific features of the candidate empirical named entity comprises: acquiring the text to be recognized in which the named entity to be identified is recorded; determining the semantic type of the named entity to be identified in combination with the text to be recognized; obtaining the semantic type of the candidate empirical named entity from the entity knowledge base; and calculating the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
When the semantic type of the named entity to be identified is determined in combination with the text to be recognized, it can be determined from the context in that text. For example, a hospital pharmacy may stock several brands of the same drug, and when prescribing, a doctor writes the brand in addition to the name of the drug itself, so the semantic type (a particular brand) can be determined from the context on the prescription.
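The embodiment does not prescribe how the semantic type similarity is calculated. The following sketch is therefore only one possible assumption: semantic types are represented as attribute dictionaries (brand, manufacturer, specification), and the similarity is the fraction of shared attributes whose values agree. The function name, the representation and the sample values are hypothetical.

```python
def semantic_type_similarity(query_types, candidate_types):
    """Fraction of attributes present on both sides whose values match."""
    shared = set(query_types) & set(candidate_types)
    if not shared:
        return 0.0  # nothing to compare
    matches = sum(
        1 for key in shared
        if query_types[key].strip().lower() == candidate_types[key].strip().lower()
    )
    return matches / len(shared)


# Attributes read from the context of the text to be recognized (hypothetical).
query = {"brand": "BrandX", "specification": "100 mg"}
# Attributes stored for a candidate in the entity knowledge base (hypothetical).
candidate = {"brand": "brandx", "specification": "50 mg", "manufacturer": "M"}

similarity = semantic_type_similarity(query, candidate)  # 0.5
```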
103. Based on the specific features, recalculate the similarity between the candidate empirical named entity and the named entity to be identified, obtain the candidate empirical named entity whose similarity with the named entity to be identified meets a preset condition, and classify the named entity to be identified into the standard named entity corresponding to that candidate empirical named entity.
The similarity between the candidate empirical named entity and the named entity to be identified can be recalculated from the specific features using a variety of algorithms, which are described as follows:
Method one: assign a preset weight to each specific feature, and perform a weighted calculation based on the assigned weights to obtain the recalculated similarity between the candidate empirical named entity and the named entity to be identified.
Specifically, a reasonable weight may be set for each specific feature based on historical experience. For example, if historical experience shows that the feature "whether the name of the standard named entity corresponding to the candidate empirical named entity is identical to the name of the named entity to be identified" is more important than the other features, so that assigning it a higher weight yields a more accurate final result, then a reasonable weight for this feature can be determined through a large number of tests.
If the result of "whether the name of the standard named entity corresponding to the candidate empirical named entity is identical to the name of the named entity to be identified" is yes, the feature value may be set to 1; if the result is no, it may be set to 0. In this case, the weighting formula may be: final similarity = preliminarily calculated similarity × weight 1 + (1 or 0) × weight 2 + frequency proportion × weight 3 + semantic type similarity × weight 4.
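The weighted calculation of method one can be sketched as follows. The feature names and the four weight values are hypothetical; in practice the weights would be set from historical experience or tests as described above.

```python
# Hypothetical weights for the four specific features.
WEIGHTS = {
    "name_similarity": 0.4,       # preliminarily calculated similarity
    "exact_name_match": 0.3,      # 1 if identical to the standard name, else 0
    "frequency_proportion": 0.2,  # share of the standard name among empirical names
    "semantic_similarity": 0.1,   # semantic type similarity
}


def weighted_similarity(features, weights=WEIGHTS):
    """Final similarity as the weighted sum of the specific features."""
    return sum(weight * features[name] for name, weight in weights.items())


candidate_features = {
    "name_similarity": 0.5,
    "exact_name_match": 1.0,
    "frequency_proportion": 0.8,
    "semantic_similarity": 1.0,
}
final_similarity = weighted_similarity(candidate_features)  # about 0.76
```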
Method two: input the specific features into a preset recognition model to recalculate the similarity between the candidate empirical named entity and the named entity to be identified.
The preset recognition model is a machine learning model trained on historical texts, including but not limited to a linear regression model or a support vector machine model. Once the preset recognition model has been obtained, the specific features obtained for a named entity to be identified can be input into it directly, and the final similarity is output automatically.
After the similarity between each of the N candidate empirical named entities and the named entity to be identified has been recalculated, the N candidates can be reordered to obtain the candidate empirical named entity that meets the preset condition; the standard named entity corresponding to that candidate is the standard named entity to which the named entity to be identified is mapped. The preset condition may be that the similarity is the highest, or that the similarity is greater than a preset threshold. That is, after the N candidate empirical named entities are reordered, the candidate ranked first by similarity may be taken as the final candidate, or one of the candidates whose similarity is greater than the preset threshold may be taken as the final candidate.
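The reordering and the two preset conditions can be sketched as follows. This is a minimal illustration under assumptions: candidates are given as hypothetical (standard named entity, recalculated similarity) pairs, and when a threshold is used the sketch picks the highest-scoring candidate above it, although the text allows any candidate above the threshold.

```python
def select_standard_entity(rescored, threshold=None):
    """Reorder candidates by recalculated similarity and apply the preset condition.

    rescored:  list of (standard_entity, similarity) pairs.
    threshold: None selects the highest similarity; otherwise the top
               candidate is returned only if it exceeds the threshold.
    """
    ranked = sorted(rescored, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return None
    best_entity, best_score = ranked[0]
    if threshold is None or best_score > threshold:
        return best_entity
    return None  # no candidate meets the preset condition


rescored = [("standard B", 0.41), ("standard A", 0.76)]
mapped = select_standard_entity(rescored)                  # "standard A"
unmapped = select_standard_entity(rescored, threshold=0.8) # None
```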
The named entity identification method provided by this embodiment of the invention first performs name matching between the named entity to be identified and the empirical named entities in the entity knowledge base, and screens N candidate empirical named entities out of all the empirical named entities, thereby achieving a coarse identification of the named entity along the name dimension. Then, by combining other features of the N candidate empirical named entities, the candidate that best matches the named entity to be identified is determined, and the named entity to be identified is classified into the standard named entity corresponding to the finally selected candidate, thereby achieving a fine identification of the named entity along non-name dimensions. That is, compared with the prior art, which identifies a named entity from the name dimension only, the invention identifies the named entity from both the name dimension and non-name dimensions, and thereby improves the accuracy of named entity classification.
Optionally, before the preset recognition model is used to recalculate the similarity and reorder the N candidate empirical named entities, the model must first be trained. The training method is as follows:
Acquire the N candidate empirical named entities determined from the similarity between a named entity in a historical text and the empirical named entities in the entity knowledge base, together with the specific features of each candidate empirical named entity; add labels to the specific features of the candidates and perform model training on the labelled features to obtain the preset recognition model; the label indicates the standard named entity to which the named entity in the historical text belongs.
The historical texts are a large collection of texts. For each named entity in each historical text, N candidate empirical named entities (preferably the top N) are preliminarily determined, and the M specific features (M being a positive integer) of each candidate are obtained. A label indicating the standard named entity to which the named entity in the historical text belongs is added manually to the M specific features as a whole. A machine learning algorithm is then applied so that the computer iteratively learns and adjusts the weight of each specific feature, and when the recognition accuracy reaches a preset threshold, the final preset recognition model is obtained.
In the above embodiments, the preset recognition model may be a linear regression model, a support vector machine model, or another machine learning model.
The following takes a linear regression model as an example to explain its specific parameters:
the optimization goal of the linear regression model is to minimize ‖Xw − Y‖;
where X is the specific features, Y is 0 or 1, and w is the weights of the specific features. The value of w at which ‖Xw − Y‖ is smallest is taken as the weights of the preset recognition model.
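A minimal sketch of fitting such weights is given below. It assumes that Y is 1 when a candidate corresponds to the labelled standard named entity and 0 otherwise, and it uses plain gradient descent on the mean squared error, which has the same minimizer as ‖Xw − Y‖. The training rows, the learning rate and the number of epochs are hypothetical, and gradient descent is a substitute chosen for illustration, not a solver named in this embodiment.

```python
def predict(weights, features):
    """Recalculated similarity: dot product of weights and specific features."""
    return sum(w * x for w, x in zip(weights, features))


def fit_linear_weights(X, Y, learning_rate=0.1, epochs=2000):
    """Least-squares fit of w by batch gradient descent."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        gradient = [0.0] * n_features
        for row, target in zip(X, Y):
            error = predict(weights, row) - target
            for j, value in enumerate(row):
                gradient[j] += 2.0 * error * value / n_samples
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Each row: name similarity, exact name match, frequency proportion,
# semantic type similarity (hypothetical labelled training data).
X = [
    [0.9, 1.0, 0.8, 1.0],
    [0.8, 0.0, 0.2, 0.0],
    [0.7, 1.0, 0.7, 1.0],
    [0.9, 0.0, 0.3, 0.5],
]
Y = [1, 0, 1, 0]

w = fit_linear_weights(X, Y)
scores = [predict(w, row) for row in X]
```

After training, the rows labelled 1 score higher than the rows labelled 0, so reordering the candidates by the model output places the correctly labelled candidates first.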
In addition, in order to further improve the accuracy of the preset recognition model for recognizing the named entity, after the standard named entity corresponding to the named entity to be recognized is determined, the preset recognition model can be updated according to the named entity to be recognized and the corresponding standard named entity.
In a specific embodiment, after a certain number of results are recognized by using the preset recognition model, the preset recognition model may be updated based on the results.
The embodiments of the invention can be applied to named entities in various fields; wherever the names of named entities are diverse, accurate classification of the named entities can be achieved. As shown in FIG. 2, the implementation of the invention is described below taking the medical field, in which named entity names are diverse, as an example:
201. Acquire the medical named entity to be identified from the medical text to be identified.
The manner of acquiring the medical text to be recognized includes but is not limited to: receiving medical texts to be identified, which are input by a user; or receiving medical texts to be identified sent by other equipment; or acquiring the medical text to be identified from the database.
Ways to obtain the medical named entity to be identified include, but are not limited to: receiving a medical named entity to be identified input by a user (or sent by other equipment); or, extracting the medical named entity to be recognized from the medical text to be recognized by performing semantic analysis on the medical text to be recognized. For example, matching the characters of the medical text to be recognized with a preset medical word bank, and if the matching is successful, taking the successfully matched words as the medical named entities to be recognized.
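The lexicon-matching example above can be sketched as a longest-match scan over the characters of the text. The function name, the case-insensitive comparison and the sample lexicon are assumptions; the sketch also ignores word boundaries, matching at the character level as the example describes.

```python
def extract_entities(text, lexicon):
    """Scan the text and collect the longest lexicon term at each position."""
    terms = sorted((t for t in lexicon if t), key=len, reverse=True)
    lowered = text.lower()
    found = []
    position = 0
    while position < len(text):
        for term in terms:
            if lowered.startswith(term.lower(), position):
                found.append(text[position:position + len(term)])
                position += len(term)  # skip past the matched term
                break
        else:
            position += 1  # no term starts here; move on one character
    return found


medical_lexicon = {"aspirin", "aspirin tablets", "headache"}
entities = extract_entities(
    "Patient takes aspirin tablets for headache.", medical_lexicon)
```

Sorting the terms by length makes the scan prefer "aspirin tablets" over the shorter "aspirin" when both match at the same position.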
202. Perform name matching between the medical named entity to be identified and the empirical medical named entities in the medical entity knowledge base to obtain the similarity between the medical named entity to be identified and each empirical medical named entity, and obtain N candidate empirical medical named entities based on the similarity.
The empirical medical named entities comprise named entities that are obtained from historical medical texts and classified into standard medical named entities; N is a positive integer. Medical named entities include drug names, disease names, symptom names, names of treatment methods, and the like. The medical entity knowledge base may be the Unified Medical Language System (UMLS). Medical texts include prescriptions, laboratory test reports, examination reports, medical papers, medical journals, and the like.
203. Specific characteristics of candidate empirical medical named entities are obtained.
The specific features comprise the similarity between the candidate empirical medical named entity and the medical named entity to be identified, and preset relevance features between the candidate empirical medical named entity and other named entities.
204. Based on the specific features, recalculate the similarity between the candidate empirical medical named entity and the medical named entity to be identified, obtain the candidate empirical medical named entity whose similarity with the medical named entity to be identified meets a preset condition, and classify the medical named entity to be identified into the standard medical named entity corresponding to that candidate empirical medical named entity.
The named entity identification method provided by this embodiment of the invention first performs name matching between the medical named entity to be identified and the empirical medical named entities in the medical entity knowledge base, and screens N candidate empirical medical named entities out of all the empirical medical named entities, thereby achieving a coarse identification of the medical named entity along the name dimension. Then, by combining other features of the N candidate empirical medical named entities, the candidate that best matches the medical named entity to be identified is determined, and the medical named entity to be identified is classified into the standard medical named entity corresponding to the finally selected candidate, thereby achieving a fine identification of the medical named entity along non-name dimensions. That is, compared with the prior art, which identifies a medical named entity from the name dimension only, the invention identifies the medical named entity from both the name dimension and non-name dimensions, and thereby improves the accuracy of medical named entity classification.
Further, according to the above method embodiment, another embodiment of the present invention further provides a method for identifying a named entity, as shown in FIG. 3, where the method includes:
301. Collect a text to be recognized, and obtain the named entity to be identified from the text to be recognized.
The manner of collecting the text to be recognized includes but is not limited to: receiving a text to be recognized input by a user; or, receiving texts to be recognized sent by other equipment; or, the text to be recognized is obtained from the database.
The manner of obtaining the named entity to be identified from the text to be recognized includes, but is not limited to: receiving a named entity to be identified that is input by a user (or sent by other equipment); or extracting the named entity to be identified from the text to be recognized by performing semantic analysis on the text. For example, the characters of the text to be recognized are matched against a preset lexicon, and if the matching succeeds, the successfully matched words are taken as the named entities to be identified.
302. Perform name matching between the named entity to be identified and the empirical named entities in the entity knowledge base to obtain the similarity between the named entity to be identified and each empirical named entity, and obtain N candidate empirical named entities based on the similarity.
The empirical named entities comprise named entities that are obtained from historical texts and classified into standard named entities; N is a positive integer.
303. Obtain specific features of the candidate empirical named entities.
The specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities.
304. Based on the specific features, recalculate the similarity between the candidate empirical named entity and the named entity to be identified, obtain the candidate empirical named entity whose similarity with the named entity to be identified meets a preset condition, and classify the named entity to be identified into the standard named entity corresponding to that candidate empirical named entity.
305. Output the correspondence between the named entity to be identified and the standard named entity according to the classification result.
When the correspondence between the named entity to be identified and the standard named entity is output, the correspondence alone may be output, or the correspondence may be marked in the text to be recognized. The correspondence can be marked in the text to be recognized in various ways. For example, as shown in FIG. 4, when the text to be recognized is output after the named entity to be identified has been classified, an annotation may be added beside the named entity to be identified to indicate the standard named entity corresponding to it. As another example, as shown in FIG. 5, a note stating the correspondence between the named entity to be identified and the standard named entity may be added around the content of the text to be recognized (for example, below it).
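The two marking styles can be sketched as follows. The bracket and arrow notations, the function names and the sample mapping are assumptions made for illustration; the actual layouts of FIG. 4 and FIG. 5 are not reproduced here.

```python
import re


def annotate_inline(text, mapping):
    """Add the standard named entity in brackets beside each recognized entity."""
    if not mapping:
        return text
    # Longer surface forms first so that overlapping names match greedily.
    surfaces = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in surfaces))
    return pattern.sub(lambda m: f"{m.group(0)} [{mapping[m.group(0)]}]", text)


def annotate_below(text, mapping):
    """Append the correspondences as notes below the text content."""
    notes = [f"{surface} -> {standard}"
             for surface, standard in mapping.items() if surface in text]
    return text + "\n" + "\n".join(notes) if notes else text


classification = {"ASA": "aspirin"}  # hypothetical classification result
inline = annotate_inline("Take ASA daily.", classification)
below = annotate_below("Take ASA daily.", classification)
```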
The method for identifying the named entity provided by the embodiment of the invention not only can improve the accuracy of the classification of the named entity, but also can enable a user to quickly know the standard meaning of the named entity to be identified by outputting the corresponding relation between the named entity to be identified and the standard named entity to the user.
Further, according to the above method embodiment, another embodiment of the present invention further provides an apparatus for identifying a named entity, as shown in fig. 6, where the apparatus includes:
a name matching unit 41, configured to perform name matching on a named entity to be identified and the empirical named entities in an entity knowledge base, so as to obtain a similarity between the named entity to be identified and each empirical named entity;
a first obtaining unit 42, configured to obtain N candidate empirical named entities based on the similarity; the empirical named entities comprise named entities that were obtained from historical texts and classified into standard named entities; N is a positive integer;
a second obtaining unit 43, configured to obtain specific features of the candidate empirical named entities; the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
the calculating unit 44 is configured to recalculate, based on the specific features, the similarity between the candidate empirical named entity and the named entity to be identified, and to obtain the candidate empirical named entity whose similarity with the named entity to be identified meets a preset condition;
and the classifying unit 45 is configured to classify the named entity to be identified into the standard named entity corresponding to the candidate empirical named entity meeting the preset condition.
Optionally, the preset relevance feature acquired by the second acquiring unit 43 includes at least one or a combination of the following items:
whether the name of the standard named entity corresponding to the candidate empirical named entity is completely the same as the name of the named entity to be identified or not;
the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among the empirical named entities corresponding to that standard named entity;
similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
Optionally, the second obtaining unit 43 is configured to, if the preset relevance features include the proportion of occurrences of the name of the standard named entity corresponding to the candidate empirical named entity among its corresponding empirical named entities, look up that proportion in the entity knowledge base; or obtain the name of the standard named entity corresponding to the candidate empirical named entity, count the number of empirical named entities corresponding to that standard named entity and the number of those empirical named entities whose name is identical to the obtained name of the standard named entity, and calculate the proportion from the counted numbers.
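As a minimal sketch of the counting alternative described above, and on one reading of it, the proportion may be computed as the number of empirical named entities whose name equals the standard name divided by the number of all empirical named entities classified into that standard named entity. The representation of the entity knowledge base as a list of (empirical name, standard name) pairs and the function name are assumptions of this sketch.

```python
def name_frequency_ratio(knowledge_base, standard_name):
    """Share of empirical names mapped to `standard_name` that equal it exactly."""
    # All empirical named entities classified into this standard named entity.
    mentions = [empirical for empirical, standard in knowledge_base
                if standard == standard_name]
    if not mentions:
        return 0.0  # nothing recorded for this standard entity
    identical = sum(1 for empirical in mentions if empirical == standard_name)
    return identical / len(mentions)


# Hypothetical knowledge base entries: (empirical name, standard name).
knowledge_base = [
    ("myocardial infarction", "myocardial infarction"),
    ("heart attack", "myocardial infarction"),
    ("MI", "myocardial infarction"),
    ("myocardial infarction", "myocardial infarction"),
    ("panic attack", "panic disorder"),
]
print(name_frequency_ratio(knowledge_base, "myocardial infarction"))
```

Where the proportion is stored in the entity knowledge base in advance, the first alternative reduces to a lookup, and this computation would only be needed when the knowledge base is built or updated.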
Optionally, the second obtaining unit 43 is configured to, if the preset relevance features include the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified: obtain the text to be recognized in which the named entity to be identified is recorded; determine the semantic type of the named entity to be identified in combination with the text to be recognized; obtain the semantic type of the candidate empirical named entity from the entity knowledge base; and calculate the similarity between the semantic type of the candidate empirical named entity and the semantic type of the named entity to be identified.
Optionally, as shown in fig. 7, the calculating unit 44 includes:
an assigning module 441, configured to assign a preset weight to the specific feature;
a first calculating module 442, configured to perform a weighted calculation based on the assigned weights, so as to obtain the recalculated similarity between the candidate empirical named entity and the named entity to be identified.
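The weighted calculation performed by the assigning module 441 and the first calculating module 442 may, for example, be sketched as a weighted sum of the specific features. The feature names and weight values below are hypothetical and are chosen only for illustration; the embodiment does not prescribe them.

```python
def weighted_similarity(features, weights):
    """Recalculate similarity as the weighted sum of the specific features."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical specific features of one candidate empirical named entity.
features = {
    "name_similarity": 0.8,   # similarity from the name matching step
    "exact_match": 1.0,       # standard name identical to the mention
    "frequency_ratio": 0.5,   # proportion of occurrences of the standard name
    "type_similarity": 1.0,   # semantic type similarity
}
# Hypothetical preset weights, here chosen to sum to 1.
weights = {
    "name_similarity": 0.4,
    "exact_match": 0.2,
    "frequency_ratio": 0.1,
    "type_similarity": 0.3,
}
print(round(weighted_similarity(features, weights), 2))
```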
Optionally, as shown in fig. 7, the calculating unit 44 includes:
a second calculating module 443, configured to input the specific features into a preset recognition model, so as to recalculate the similarity between the candidate empirical named entity and the named entity to be identified.
Optionally, as shown in fig. 7, the apparatus further includes:
a third obtaining unit 46, configured to obtain N candidate empirical named entities and the specific features of those candidates, where the N candidate empirical named entities are determined based on the similarity between a named entity in the historical text and the empirical named entities in the entity knowledge base;
the training unit 47 is configured to perform model training on the labeled specific features of the candidate empirical named entities, so as to obtain the preset recognition model; the label indicates the standard named entity to which the named entity in the historical text belongs.
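One possible way of preparing the labeled training samples is sketched below. It assumes that each candidate is a dictionary holding its standard named entity and its feature vector, and that the label is encoded as 1 when the candidate corresponds to the standard named entity of the historical mention and 0 otherwise; both the data layout and the function name are assumptions of this sketch.

```python
def label_candidates(gold_standard, candidates):
    """Return (feature_vector, label) rows for one historical named entity."""
    rows = []
    for candidate in candidates:
        # Label 1: the candidate maps to the standard entity recorded for
        # the historical mention; label 0: it maps to a different one.
        label = 1 if candidate["standard"] == gold_standard else 0
        rows.append((candidate["features"], label))
    return rows


# Hypothetical candidates for a historical mention known to belong to
# "myocardial infarction"; features are [name similarity, type similarity].
candidates = [
    {"standard": "myocardial infarction", "features": [0.9, 1.0]},
    {"standard": "panic disorder", "features": [0.5, 1.0]},
]
print(label_candidates("myocardial infarction", candidates))
```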
Optionally, as shown in fig. 7, the apparatus further includes:
and the updating unit 48 is configured to update the preset recognition model according to the named entity to be identified and the corresponding standard named entity.
Optionally, the preset recognition model used by the second calculation module 443 includes a linear regression model;
the optimization goal of the linear regression model is to minimize $\lVert Xw - Y \rVert$;
where X denotes the specific features, Y is 0 or 1, and w denotes the weights of the specific features.
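As an illustration only, the weights w may be fitted as follows, assuming that the norm is the Euclidean norm and that plain gradient descent on the mean squared residual is used; the embodiment states the optimization goal but not the solver, so the solver, the learning rate, and the sample data are assumptions of this sketch.

```python
def predict(w, row):
    """Score of one candidate: dot product of weights and specific features."""
    return sum(wj * xj for wj, xj in zip(w, row))


def fit_linear(X, Y, lr=0.1, epochs=2000):
    """Fit w by gradient descent on the mean of (Xw - Y)^2 (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        residuals = [predict(w, row) - y for row, y in zip(X, Y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
                for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Hypothetical rows: [name similarity, semantic type similarity].
X = [[0.9, 1.0], [0.9, 0.0], [0.5, 1.0], [0.4, 0.0]]
Y = [1, 0, 0, 0]  # 1 marks the candidate with the correct standard entity
w = fit_linear(X, Y)
scores = [predict(w, row) for row in X]
print(w, scores)
```

With this sample data the correctly labeled candidate receives the highest score, so selecting the candidate with the largest recalculated similarity recovers it.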
The named entity identification apparatus provided by the embodiment of the invention first performs name matching between the named entity to be identified and the empirical named entities in the entity knowledge base and screens out N candidate empirical named entities from all the empirical named entities, thereby achieving coarse identification of the named entity along the name dimension. It then combines other features of the N candidate empirical named entities to determine the candidate that best matches the named entity to be identified, and classifies the named entity to be identified into the standard named entity corresponding to the finally selected candidate, thereby achieving fine identification along non-name dimensions. That is, compared with the prior art, which identifies a named entity from the name dimension only, the invention identifies the named entity from both the name dimension and non-name dimensions, which improves the accuracy of named entity classification.
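The coarse-then-fine procedure summarized above may be sketched as follows. This is a simplified illustration under stated assumptions: the name similarity is taken from `difflib.SequenceMatcher` in the Python standard library, the only non-name feature is an exact semantic type match, and the weights (0.7 and 0.3), the threshold, and the knowledge base entries are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity in [0, 1]; stands in for the name matching step."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify(mention, mention_type, knowledge_base, n=2, threshold=0.6):
    """Return the standard entity for `mention`, or None if no candidate qualifies."""
    # Coarse step (name dimension): keep the N most similar empirical entities.
    ranked = sorted(
        ((name_similarity(mention, entry["name"]), entry) for entry in knowledge_base),
        key=lambda pair: pair[0],
        reverse=True,
    )[:n]
    # Fine step (non-name dimension): recalculate similarity with the type feature.
    best_score, best_entry = 0.0, None
    for similarity, entry in ranked:
        type_match = 1.0 if entry["type"] == mention_type else 0.0
        score = 0.7 * similarity + 0.3 * type_match
        if score > best_score:
            best_score, best_entry = score, entry
    if best_entry is None or best_score < threshold:
        return None  # preset condition not met
    return best_entry["standard"]


knowledge_base = [
    {"name": "cold", "standard": "common cold", "type": "disease"},
    {"name": "cold", "standard": "low temperature", "type": "environment"},
    {"name": "gold", "standard": "gold", "type": "substance"},
]
print(classify("cold", "disease", knowledge_base))
print(classify("cold", "environment", knowledge_base))
```

In this example both "cold" entries tie on the name dimension, and only the semantic type separates them, which is the situation in which the non-name features change the classification result.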
Further, according to the above method embodiment, another embodiment of the present invention further provides an apparatus for identifying a named entity, as shown in fig. 8, where the apparatus includes:
a first obtaining unit 51, configured to obtain a medical named entity to be identified in a medical text to be identified;
the name matching unit 52 is configured to perform name matching on the medical named entity to be identified and an empirical medical named entity in a medical entity knowledge base, so as to obtain similarity between the medical named entity to be identified and the empirical medical named entity;
a second obtaining unit 53, configured to obtain N candidate empirical medical named entities based on the similarity; the empirical medical named entities comprise named entities that were obtained from historical medical texts and classified into standard medical named entities; N is a positive integer;
a third obtaining unit 54, configured to obtain specific features of the candidate empirical medical named entities; the specific features comprise the similarity between the candidate empirical medical named entity and the medical named entity to be identified, and preset relevance features between the candidate empirical medical named entity and other named entities;
the calculating unit 55 is configured to recalculate, based on the specific features, the similarity between the candidate empirical medical named entity and the medical named entity to be identified, and to obtain the candidate empirical medical named entity whose similarity with the medical named entity to be identified meets a preset condition;
the classifying unit 56 is configured to classify the medical named entity to be identified into the standard medical named entity corresponding to the candidate empirical medical named entity meeting the preset condition.
Further, according to the above method embodiment, another embodiment of the present invention further provides an apparatus for identifying a named entity, as shown in fig. 9, where the apparatus includes:
the acquisition unit 61 is used for acquiring texts to be recognized;
a first obtaining unit 62, configured to obtain a named entity to be recognized from the text to be recognized;
a name matching unit 63, configured to perform name matching on the named entity to be identified and the empirical named entities in an entity knowledge base, so as to obtain a similarity between the named entity to be identified and each empirical named entity;
a second obtaining unit 64, configured to obtain N candidate empirical named entities based on the similarity; the empirical named entities comprise named entities that were obtained from historical texts and classified into standard named entities; N is a positive integer;
a third obtaining unit 65, configured to obtain specific features of the candidate empirical named entities; the specific features comprise the similarity between the candidate empirical named entity and the named entity to be identified, and preset relevance features between the candidate empirical named entity and other named entities;
a calculating unit 66, configured to recalculate, based on the specific features, the similarity between the candidate empirical named entity and the named entity to be identified, and to obtain the candidate empirical named entity whose similarity with the named entity to be identified meets a preset condition;
the classifying unit 67 is configured to classify the named entity to be identified into a standard named entity corresponding to the candidate empirical named entity meeting the preset condition;
and the output unit 68 is used for outputting the corresponding relation between the named entity to be identified and the standard named entity according to the classification result.
Further, another embodiment of the present invention also provides a storage medium storing a plurality of instructions adapted to be loaded by a processor and to execute the method for identifying a named entity as described above.
With the instructions stored in the storage medium provided by the embodiment of the invention, name matching is first performed between the named entity to be identified and the empirical named entities in the entity knowledge base, and N candidate empirical named entities are screened out from all the empirical named entities, thereby achieving coarse identification of the named entity along the name dimension. Other features of the N candidate empirical named entities are then combined to determine the candidate that best matches the named entity to be identified, and the named entity to be identified is classified into the standard named entity corresponding to the finally selected candidate, thereby achieving fine identification along non-name dimensions. That is, compared with the prior art, which identifies a named entity from the name dimension only, the invention identifies the named entity from both the name dimension and non-name dimensions, which improves the accuracy of named entity classification.
Further, another embodiment of the present invention also provides an electronic device including a storage medium and a processor;
the processor is adapted to execute instructions;
the storage medium is adapted to store a plurality of instructions;
the instructions are adapted to be loaded by the processor and to perform the method of identifying a named entity as described above.
The electronic device provided by the embodiment of the invention first performs name matching between the named entity to be identified and the empirical named entities in the entity knowledge base and screens out N candidate empirical named entities from all the empirical named entities, thereby achieving coarse identification of the named entity along the name dimension. It then combines other features of the N candidate empirical named entities to determine the candidate that best matches the named entity to be identified, and classifies the named entity to be identified into the standard named entity corresponding to the finally selected candidate, thereby achieving fine identification along non-name dimensions. That is, compared with the prior art, which identifies a named entity from the name dimension only, the invention identifies the named entity from both the name dimension and non-name dimensions, which improves the accuracy of named entity classification.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the related features of the method and apparatus described above may be referred to one another. In addition, "first", "second", and the like in the above embodiments serve to distinguish the embodiments and do not indicate their relative merits.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any references above to specific languages are provided for disclosure of enablement and practice of the present invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the named entity identification method and apparatus according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.