Multi-feature fusion supply chain management entity knowledge extraction method and system
1. A method for extracting knowledge of a supply chain management entity with multi-feature fusion is characterized by comprising the following steps:
converting a text sentence into character-level vector representation and radical-level vector representation based on a character embedding layer and a radical embedding layer obtained by pre-training;
merging the radical-level feature vector and the character-level feature vector, and inputting the merged vector into a convolutional layer to obtain a local context feature vector;
based on a Bi-LSTM model, obtaining a context feature vector from the character-level feature vector, and inputting the context feature vector into a convolutional layer to obtain a context salient feature vector;
combining the context feature vector, the local context feature vector and the context salient feature vector, and outputting after passing through a three-layer Bi-LSTM model to obtain a hidden layer vector;
and constructing a weight connection graph of the relation among the entities, extracting the node characteristics of the region, and performing entity prediction by combining the entities and the weight connection graph.
2. The method of claim 1, wherein converting the text sentence into a character-level vector representation and a radical-level vector representation based on a pre-trained character embedding layer and a radical embedding layer comprises:
for a text sentence, the Chinese character sequence is T1 = {s1, s2, …, sn}, where si is a character in the text sentence; obtaining the feature vector representation c1 of the Chinese character sequence based on the pre-trained character embedding layer;
extracting the radical of each Chinese character to form a radical sequence R1 = {t1, t2, …, tn}, where ti is a radical in the radical sequence; obtaining the feature vector representation r1 of the radical sequence R1 based on the pre-trained radical embedding layer.
3. The method for knowledge extraction of a multi-feature fused supply chain management entity according to claim 2, wherein the obtaining context feature vectors from the character-level feature vectors based on the Bi-LSTM model comprises:
inputting the feature vector representation c1 of the Chinese character sequence into a Dropout layer and a Bi-LSTM model in sequence;
combining the final states of the forward and backward outputs according to the calculation formula of each cell in the LSTM to generate the context feature vector c2;
The formula for each cell in LSTM is as follows:
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$;
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$;
$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$;
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$;
$h_t = o_t \tanh(c_t)$;
wherein f_t denotes the forget gate output at time t, i_t the input gate output at time t, c_t the cell state at time t, o_t the output gate output at time t, x_t the input at time t, and h_t the hidden-layer output at time t; σ denotes the sigmoid activation function, tanh denotes the hyperbolic tangent activation function, and W and b are learnable parameters.
4. The method of claim 3, wherein the inputting the context feature vector into a convolutional layer to obtain a context salient feature vector comprises:
based on a convolution operation, passing the context feature vector c2 through a convolutional layer to output the context salient feature vector c3;
The convolution operation is represented as:
$Y_{ij} = \sum_{u} \sum_{v} W_{uv} X_{i-u+1,\, j-v+1}$
wherein W_{uv} is the convolution kernel parameter, X_{i-u+1, j-v+1} is the input data, and Y_{ij} is the output data.
5. The method of claim 4, wherein the merging the radical-level feature vector and the character-level feature vector and inputting the merged vector into a convolutional layer to obtain a local context feature vector comprises:
merging the feature vector representation r1 of the radical sequence R1 with the feature vector representation c1 of the Chinese character sequence, processing the merged vector through a Dropout layer and a convolutional layer, and extracting the local context feature vector representation w1.
6. The method of claim 5, wherein the combining the context feature vector, the local context feature vector and the context salient feature vector, and outputting the combined result after passing through a three-layer Bi-LSTM model to obtain a hidden-layer vector comprises:
merging the context feature vector c2, the context salient feature vector c3 and the local context feature vector representation w1, inputting the merged vector into a three-layer Bi-LSTM model, and outputting the entity hidden-layer representation sequence E = {e1, e2, …, en};
wherein, in the three-layer Bi-LSTM model, a Dropout layer is added before each Bi-LSTM layer to prevent overfitting.
7. The method for extracting knowledge of supply chain management entities with multi-feature fusion as claimed in claim 1, wherein the constructing a weight connection graph of relationships among the entities, extracting regional node features, and performing entity prediction by combining the entities and the weight connection graph comprises:
respectively constructing a relation weight connection graph for each relation among the entities;
using characters as nodes and using the relationship between the characters as an adjacency matrix to construct a graph structure;
extracting hidden layer characteristics of the region nodes based on the Bi-GCN;
the expression of Bi-GCN is as follows:
wherein A is the relational adjacency matrix, l is the layer index, h_v^l is the hidden-layer vector representation of node v at layer l, W^l and b^l are the learnable parameters of the l-th layer, and tanh denotes the hyperbolic tangent activation function;
respectively substituting the extracted hidden layer characteristics into each relation weight connection diagram, and extracting to obtain hidden layer vector representation of each relation among each entity based on weighted Bi-GCN, wherein the expression of the weighted Bi-GCN is as follows:
wherein l is the layer index, h_{e_i}^l is the hidden-layer vector representation of node e_i at layer l of the GCN, P_r(e_i, v) denotes the probability between node e_i and node v under relation r, W_r and b_r are the weight and bias of the GCN under relation r, V is the set of all characters in the sentence, and R is the set of all relations;
and taking the obtained hidden-layer vector representation, performing entity prediction by using a CRF, and obtaining the loss value eloss by using a classification loss function.
8. A system for multi-feature fused supply chain management entity knowledge extraction, characterized in that it implements the method for multi-feature fused supply chain management entity knowledge extraction according to any one of claims 1 to 7 and comprises:
a vector acquisition module to:
converting a text sentence into character-level vector representation and radical-level vector representation based on a character embedding layer and a radical embedding layer obtained by pre-training; merging the radical-level feature vector and the character-level feature vector, and inputting the merged vector into a convolutional layer to obtain a local context feature vector;
based on a Bi-LSTM model, obtaining a context feature vector from the character-level feature vector, and inputting the context feature vector into a convolutional layer to obtain a context salient feature vector;
combining the context feature vector, the local context feature vector and the context salient feature vector, and outputting after passing through a three-layer Bi-LSTM model to obtain a hidden layer vector;
and the prediction module is used for constructing a weight connection graph of the relationship among the entities, extracting the node characteristics of the region and performing entity prediction by combining the entities and the weight connection graph.
9. The system for multi-feature fused supply chain management entity knowledge extraction according to claim 8, wherein the vector acquisition module is configured to:
for a text sentence, the Chinese character sequence is T1 = {s1, s2, …, sn}, where si is a character in the text sentence; obtaining the feature vector representation c1 of the Chinese character sequence based on the pre-trained character embedding layer;
extracting the radical of each Chinese character to form a radical sequence R1 = {t1, t2, …, tn}, where ti is a radical in the radical sequence; obtaining the feature vector representation r1 of the radical sequence R1 based on the pre-trained radical embedding layer;
inputting the feature vector representation c1 of the Chinese character sequence into a Dropout layer and a Bi-LSTM model in sequence;
combining the final states of the forward and backward outputs according to the calculation formula of each cell in the LSTM to generate the context feature vector c2;
The formula for each cell in LSTM is as follows:
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$;
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$;
$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$;
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$;
$h_t = o_t \tanh(c_t)$;
f_t denotes the forget gate output at time t, i_t the input gate output at time t, c_t the cell state at time t, o_t the output gate output at time t, x_t the input at time t, and h_t the hidden-layer output at time t; σ denotes the sigmoid activation function, tanh denotes the hyperbolic tangent activation function, and W and b are learnable parameters;
based on a convolution operation, passing the context feature vector c2 through a convolutional layer to output the context salient feature vector c3;
The convolution operation is represented as:
$Y_{ij} = \sum_{u} \sum_{v} W_{uv} X_{i-u+1,\, j-v+1}$
wherein W_{uv} is the convolution kernel parameter, X_{i-u+1, j-v+1} is the input data, and Y_{ij} is the output data;
merging the feature vector representation r1 of the radical sequence R1 with the feature vector representation c1 of the Chinese character sequence, processing the merged vector through a Dropout layer and a convolutional layer, and extracting the local context feature vector representation w1;
merging the context feature vector c2, the context salient feature vector c3 and the local context feature vector representation w1, inputting the merged vector into a three-layer Bi-LSTM model, and outputting the entity hidden-layer representation sequence E = {e1, e2, …, en}.
10. The system for multi-feature fused supply chain management entity knowledge extraction according to claim 9, wherein the prediction module is configured to:
respectively constructing a relation weight connection graph for each relation among the entities;
using characters as nodes and using the relationship between the characters as an adjacency matrix to construct a graph structure;
extracting hidden layer characteristics of the region nodes based on the Bi-GCN;
the expression of Bi-GCN is as follows:
wherein A is the relational adjacency matrix, l is the layer index, h_v^l is the hidden-layer vector representation of node v at layer l, W^l and b^l are the learnable parameters of the l-th layer, and tanh denotes the hyperbolic tangent activation function;
respectively substituting the extracted hidden layer characteristics into each relation weight connection diagram, and extracting to obtain hidden layer vector representation of each relation among each entity based on weighted Bi-GCN, wherein the expression of the weighted Bi-GCN is as follows:
wherein l is the layer index, h_{e_i}^l is the hidden-layer vector representation of node e_i at layer l of the GCN, P_r(e_i, v) denotes the probability between node e_i and node v under relation r, W_r and b_r are the weight and bias of the GCN under relation r, V is the set of all characters in the sentence, and R is the set of all relations;
and taking the obtained hidden-layer vector representation, performing entity prediction by using a CRF, and obtaining the loss value eloss by using a classification loss function.
Background
At present, domestic supply chain management relies mainly on manual management, but because the tasks are often enormous and the knowledge involved is diverse, errors occur easily. To solve this problem, a supply chain management knowledge base needs to be established to assist management, and knowledge extraction is one of the key technologies required for knowledge base construction. Because the supply chain management knowledge base required in China is a Chinese-language knowledge base, Chinese knowledge extraction is more difficult than English: compared with English, Chinese lexical units have fuzzy boundaries, complex structures and varied forms of expression, and potentially erroneous vocabulary can interfere with recognition. Compared with public data sets, the corpus in the supply chain management field is smaller in scale but contains more specialized terminology and a more complex knowledge structure, so common knowledge extraction methods cannot achieve good results. How to extract knowledge more effectively from corpora in the supply chain management field has therefore become one of the difficulties in knowledge base construction.
Disclosure of Invention
The invention provides a multi-feature fusion supply chain management entity knowledge extraction method and system, and solves the problem that the corpus in the field of supply chain management is small in scale yet rich in specialized terms and complex in knowledge structure, so that common knowledge extraction methods cannot achieve a good effect.
In order to solve the above problems, the present invention provides a method that performs multi-feature extraction on a corpus in the field of supply chain management and combines multiple features to achieve a better knowledge extraction effect. The invention extracts radical-level features through a radical embedding layer, combines the radical features with the character features, and inputs the combined features into a CNN to extract local context features. Chinese characters are pictographic, so similar characters often carry similar meanings, and that similarity is often reflected in their radicals. Using radical features helps to identify characters that appear only in the test set but not in the training set, which improves generalization. Local context features are also important for knowledge extraction in the supply chain management field. For example, "vendor selection" is an entity that appears frequently in the corpus, and "vendor" plays a decisive role in determining that "selection" is a noun rather than a verb, which shows the importance of extracting local context features. The invention inputs the character features into a Bi-LSTM to extract context features, so that each character can capture long-distance dependency information. The context salient features are extracted by feeding the context features into a CNN, thereby combining local context information and long-distance dependency information. The invention then merges the local context features, the context features and the context salient features and inputs them into a stacked Bi-LSTM, which extracts global context features and fuses the three kinds of features more effectively. The hidden-layer vectors output by the stacked Bi-LSTM are fed into a Bi-GCN, which encodes the entity relation information in the corpus and constructs a weight connection graph of the relations among all entities, thereby obtaining an entity relation adjacency matrix, extracting regional node features, and updating the global context features. Finally, the entity prediction result is output through a CRF.
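As an illustration of the radical extraction step described above, the following is a minimal sketch of a character-to-radical lookup that could precede the embedding layers. The radical table, the function name to_radical_sequence, and the example characters are hypothetical illustrations only; a real system would rely on a complete Chinese radical dictionary.

```python
# Minimal sketch (assumption): map each character of a sentence to its radical
# before the character/radical embedding layers. The table is a tiny sample,
# not a complete radical dictionary.
RADICAL_TABLE = {
    "供": "亻", "应": "广", "商": "口",   # characters of "供应商" (vendor)
    "选": "辶", "择": "扌",               # characters of "选择" (selection)
}

def to_radical_sequence(sentence: str, unk: str = "<unk>") -> list:
    """Return the radical sequence R1 = {t1, ..., tn} for a character sequence T1."""
    return [RADICAL_TABLE.get(ch, unk) for ch in sentence]

print(to_radical_sequence("供应商选择"))  # ['亻', '广', '口', '辶', '扌']
```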
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for extracting knowledge of a multi-feature fused supply chain management entity comprises the following steps:
converting a text sentence into character-level vector representation and radical-level vector representation based on a character embedding layer and a radical embedding layer obtained by pre-training; merging the radical-level feature vector and the character-level feature vector, and inputting the merged vector into a convolutional layer to obtain a local context feature vector;
based on a Bi-LSTM (Bidirectional Long short term Memory) model, acquiring a context feature vector from the character-level feature vector, and inputting the context feature vector into a convolutional layer to acquire a context salient feature vector;
combining the context feature vector, the local context feature vector and the context salient feature vector, and outputting a hidden layer vector through a three-layer Bi-LSTM model;
and constructing a weight connection graph of the relation among the entities, extracting the node characteristics of the region, and performing entity prediction by combining the entities and the weight connection graph.
Optionally, converting the text sentence into a character-level vector representation and a radical-level vector representation based on the pre-trained character embedding layer and the radical embedding layer, including:
for a text sentence, the Chinese character sequence is T1 = {s1, s2, …, sn}, where si is a character in the text sentence; obtaining the feature vector representation c1 of the Chinese character sequence based on the pre-trained character embedding layer;
extracting the radical of each Chinese character to form a radical sequence R1 = {t1, t2, …, tn}, where ti is a radical in the radical sequence; obtaining the feature vector representation r1 of the radical sequence R1 based on the pre-trained radical embedding layer.
Optionally, the obtaining a context feature vector from the character-level feature vector based on the Bi-LSTM model includes:
inputting the feature vector representation c1 of the Chinese character sequence into a Dropout layer and a Bi-LSTM model in sequence;
combining the final states of the forward and backward outputs according to the calculation formula of each cell in the LSTM to generate the context feature vector c2;
The formula for each cell in LSTM is as follows:
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$;
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$;
$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$;
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$;
$h_t = o_t \tanh(c_t)$;
wherein f_t denotes the forget gate output at time t, i_t the input gate output at time t, c_t the cell state at time t, o_t the output gate output at time t, x_t the input at time t, and h_t the hidden-layer output at time t; σ denotes the sigmoid activation function, tanh denotes the hyperbolic tangent activation function, and W and b are learnable parameters.
Optionally, the inputting the context feature vector into a convolutional layer to obtain a context salient feature vector includes:
based on a convolution operation, passing the context feature vector c2 through a convolutional layer to output the context salient feature vector c3;
The convolution operation is represented as:
$Y_{ij} = \sum_{u} \sum_{v} W_{uv} X_{i-u+1,\, j-v+1}$
wherein W_{uv} is the convolution kernel parameter, X_{i-u+1, j-v+1} is the input data, and Y_{ij} is the output data.
Optionally, the merging the radical-level feature vector and the character-level feature vector and inputting the merged vector into a convolutional layer to obtain a local context feature vector includes:
merging the feature vector representation r1 of the radical sequence R1 with the feature vector representation c1 of the Chinese character sequence, processing the merged vector through a Dropout layer and a convolutional layer, and extracting the local context feature vector representation w1.
Optionally, the merging the context feature vector, the local context feature vector, and the context salient feature vector, and outputting a hidden layer vector through a three-layer Bi-LSTM model includes:
merging the context feature vector c2, the context salient feature vector c3 and the local context feature vector representation w1, inputting the merged vector into a three-layer Bi-LSTM model, and outputting the entity hidden-layer representation sequence E = {e1, e2, …, en};
wherein, in the three-layer Bi-LSTM model, a Dropout layer is added before each Bi-LSTM layer to prevent overfitting.
Optionally, the constructing a weight connection graph of relationships among the entities, extracting regional node features, and performing entity prediction by combining the entities and the weight connection graph includes:
respectively constructing a relation weight connection graph for each relation among the entities;
using characters as nodes and using the relationship between the characters as an adjacency matrix to construct a graph structure;
extracting hidden layer characteristics of the region nodes based on the Bi-GCN;
the expression of Bi-GCN is as follows:
wherein A is the relational adjacency matrix, l is the layer index, h_v^l is the hidden-layer vector representation of node v at layer l, W^l and b^l are the learnable parameters of the l-th layer, and tanh denotes the hyperbolic tangent activation function;
respectively substituting the extracted hidden layer characteristics into each relation weight connection diagram, and extracting to obtain hidden layer vector representation of each relation among each entity based on weighted Bi-GCN, wherein the expression of the weighted Bi-GCN is as follows:
wherein l is the layer index, h_{e_i}^l is the hidden-layer vector representation of node e_i at layer l of the GCN, P_r(e_i, v) denotes the probability between node e_i and node v under relation r, W_r and b_r are the weight and bias of the GCN under relation r, V is the set of all characters in the sentence, and R is the set of all relations;
and taking the obtained hidden-layer vector representation, performing entity prediction by using a CRF, and obtaining the loss value eloss by using a classification loss function.
The invention also provides a system for extracting knowledge of a supply chain management entity with multi-feature fusion, which is used to implement the above method for knowledge extraction on data sets in the supply chain management knowledge field, and which comprises:
a vector acquisition module to:
converting a text sentence into character-level vector representation and radical-level vector representation based on a character embedding layer and a radical embedding layer obtained by pre-training; merging the radical-level feature vector and the character-level feature vector, and inputting the merged vector into a convolutional layer to obtain a local context feature vector;
based on a Bi-LSTM model, obtaining a context feature vector from the character-level feature vector, and inputting the context feature vector into a convolutional layer to obtain a context salient feature vector;
combining the context feature vector, the local context feature vector and the context salient feature vector, and outputting a hidden layer vector through a three-layer Bi-LSTM model;
and the prediction module is used for constructing a weight connection graph of the relationship among the entities, extracting the node characteristics of the region and performing entity prediction by combining the entities and the weight connection graph.
Optionally, the vector obtaining module is configured to:
for a text sentence, the Chinese character sequence is T1 = {s1, s2, …, sn}, where si is a character in the text sentence; obtaining the feature vector representation c1 of the Chinese character sequence based on the pre-trained character embedding layer;
extracting the radical of each Chinese character to form a radical sequence R1 = {t1, t2, …, tn}, where ti is a radical in the radical sequence; obtaining the feature vector representation r1 of the radical sequence R1 based on the pre-trained radical embedding layer;
inputting the feature vector representation c1 of the Chinese character sequence into a Dropout layer and a Bi-LSTM model in sequence;
combining the final states of the forward and backward outputs according to the calculation formula of each cell in the LSTM to generate the context feature vector c2;
The formula for each cell in LSTM is as follows:
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$;
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$;
$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$;
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$;
$h_t = o_t \tanh(c_t)$;
f_t denotes the forget gate output at time t, i_t the input gate output at time t, c_t the cell state at time t, o_t the output gate output at time t, x_t the input at time t, and h_t the hidden-layer output at time t; σ denotes the sigmoid activation function, tanh denotes the hyperbolic tangent activation function, and W and b are learnable parameters;
based on a convolution operation, passing the context feature vector c2 through a convolutional layer to output the context salient feature vector c3;
The convolution operation is represented as:
$Y_{ij} = \sum_{u} \sum_{v} W_{uv} X_{i-u+1,\, j-v+1}$
wherein W_{uv} is the convolution kernel parameter, X_{i-u+1, j-v+1} is the input data, and Y_{ij} is the output data;
merging the feature vector representation r1 of the radical sequence R1 with the feature vector representation c1 of the Chinese character sequence, processing the merged vector through a Dropout layer and a convolutional layer, and extracting the local context feature vector representation w1;
merging the context feature vector c2, the context salient feature vector c3 and the local context feature vector representation w1, inputting the merged vector into a three-layer Bi-LSTM model, and outputting the entity hidden-layer representation sequence E = {e1, e2, …, en}.
Optionally, the prediction module is to:
respectively constructing a relation weight connection graph for each relation among the entities;
using characters as nodes and using the relationship between the characters as an adjacency matrix to construct a graph structure;
extracting hidden layer characteristics of the region nodes based on the Bi-GCN;
the expression of Bi-GCN is as follows:
wherein A is the relational adjacency matrix, l is the layer index, h_v^l is the hidden-layer vector representation of node v at layer l, W^l and b^l are the learnable parameters of the l-th layer, and tanh denotes the hyperbolic tangent activation function;
respectively substituting the extracted hidden layer characteristics into each relation weight connection diagram, and extracting to obtain hidden layer vector representation of each relation among each entity based on weighted Bi-GCN, wherein the expression of the weighted Bi-GCN is as follows:
wherein l is the layer index, h_{e_i}^l is the hidden-layer vector representation of node e_i at layer l of the GCN, P_r(e_i, v) denotes the probability between node e_i and node v under relation r, W_r and b_r are the weight and bias of the GCN under relation r, V is the set of all characters in the sentence, and R is the set of all relations;
and taking the obtained hidden-layer vector representation, performing entity prediction by using a CRF, and obtaining the loss value eloss by using a classification loss function.
Compared with the prior art, the invention has the following beneficial effects:
By means of the extracted radical features, and by combining the three features of the context feature vector, the local context feature vector and the context salient feature vector, a vector representation of each Chinese character in a text sentence can be obtained; relation weight propagation is further considered on the basis of the relation weight connection graph, providing richer features for each character. On this basis, semantic reasoning can be performed better on Chinese characters that do not appear in the training set, which reduces the difficulty of knowledge extraction on data sets in the supply chain management knowledge field, improves the knowledge extraction effect, and achieves the expected effect of knowledge extraction on data sets in this field.
Drawings
FIG. 1 is a flow chart of a method for extracting knowledge of a multi-feature converged supply chain management entity provided by the present invention;
FIG. 2 is a schematic model diagram of a method for extracting knowledge of a multi-feature fused supply chain management entity according to the present invention;
FIG. 3 is a block diagram of Stack Bi-LSTM in FIG. 2;
FIG. 4 is a flowchart illustrating step S1 of the method for extracting knowledge of a supply chain management entity with multi-feature fusion according to the present invention;
FIG. 5 is a flowchart illustrating step S2 of the method for extracting knowledge of a supply chain management entity with multi-feature fusion according to the present invention;
FIG. 6 is a flowchart illustrating step S3 of the method for extracting knowledge of a supply chain management entity with multi-feature fusion according to the present invention;
FIG. 7 is a flowchart illustrating step S5 of the method for extracting knowledge of a supply chain management entity with multi-feature fusion according to the present invention;
FIG. 8 is a block diagram of a system for knowledge extraction of a multi-feature converged supply chain management entity according to the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to facilitate understanding of the technical solution of the present invention, a brief introduction is made to the application scenario of the present invention:
chinese characters are pictographic characters, so similar characters often contain similar meanings, and the similarity is often reflected in the aspect of radicals. The use of radical features helps to identify characters that appear only in the test set, but not in the training set, improving generalization. Local context features are also important in supply chain management domain knowledge extraction. For example, "vendor selection" is an entity that frequently appears in a corpus, and "vendor" plays a decisive role in that "selection" is a noun rather than a verb, which proves the importance of extracting local context features.
Based on this, the invention inputs the character features into the Bi-LSTM to extract context features, so that each character can capture long-distance dependency information; the context salient features are extracted by feeding the context features into the CNN, thereby combining local context information and long-distance dependency information; in addition, the local context features, the context features and the context salient features are combined and then input into the stacked Bi-LSTM, which extracts the global context features and fuses the three kinds of features more effectively; the invention also feeds the hidden-layer vectors output by the stacked Bi-LSTM into the Bi-GCN, which is used to encode the entity relation information in the corpus and to construct a weight connection graph of the relations among all entities, thereby obtaining an entity relation adjacency matrix, extracting regional node features and updating the global context features. Finally, the entity prediction result is output through the CRF.
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
Referring to fig. 1 to fig. 3, an embodiment of the present invention provides a method for extracting knowledge of a supply chain management entity with multi-feature fusion, including the following steps:
S1, converting the text sentence into a character-level vector representation and a radical-level vector representation based on the character embedding layer and the radical embedding layer obtained by pre-training;
S2, merging the radical-level feature vector and the character-level feature vector and inputting the merged vector into a convolutional layer to obtain a local context feature vector;
S3, acquiring a context feature vector from the character-level feature vector based on the Bi-LSTM model; inputting the context feature vector into a convolutional layer to obtain a context salient feature vector;
S4, combining the context feature vector, the local context feature vector and the context salient feature vector, and outputting a hidden-layer vector through a three-layer Bi-LSTM model;
S5, constructing a weight connection graph of the relations among the entities, extracting regional node features, and performing entity prediction by combining the entities and the weight connection graph.
Referring to fig. 4, specifically, in step S1, converting the text sentence into a character-level vector representation and a radical-level vector representation based on the pre-trained character embedding layer and radical embedding layer includes:
S11, for a text sentence, the Chinese character sequence is T1 = {s1, s2, …, sn}, where si is a character in the text sentence; obtaining the feature vector representation c1 of the Chinese character sequence based on the pre-trained character embedding layer;
S12, extracting the radical of each Chinese character to form a radical sequence R1 = {t1, t2, …, tn}, where ti is a radical in the radical sequence; obtaining the feature vector representation r1 of the radical sequence R1 based on the pre-trained radical embedding layer.
In step S11, the sentence input by the user is first converted by the character embedding layer into the feature vector representation c1 of the Chinese character sequence; based on this representation c1, the character features of the sentence input by the user can be extracted.
In step S12, the sentence input by the user is converted by the radical embedding layer into the feature vector representation r1 of the radical sequence R1; based on this representation r1, the radical features of the sentence input by the user can be extracted.
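For concreteness, a minimal sketch of the two embedding lookups in steps S11 and S12 is given below, assuming PyTorch nn.Embedding layers; the vocabulary sizes, embedding dimensions, and variable names are illustrative assumptions rather than values fixed by the invention (in practice the embedding layers would be loaded with the pre-trained weights).

```python
import torch
import torch.nn as nn

# Assumed toy sizes; a real system would derive these from the corpus and
# load pre-trained embedding weights.
CHAR_VOCAB, RADICAL_VOCAB, CHAR_DIM, RADICAL_DIM = 5000, 300, 100, 50

char_embedding = nn.Embedding(CHAR_VOCAB, CHAR_DIM)          # character embedding layer
radical_embedding = nn.Embedding(RADICAL_VOCAB, RADICAL_DIM)  # radical embedding layer

# One sentence of n = 6 characters, already mapped to integer ids (batch of 1).
char_ids = torch.randint(0, CHAR_VOCAB, (1, 6))        # T1 = {s1, ..., sn}
radical_ids = torch.randint(0, RADICAL_VOCAB, (1, 6))  # R1 = {t1, ..., tn}

c1 = char_embedding(char_ids)        # character-level feature vectors, shape (1, 6, 100)
r1 = radical_embedding(radical_ids)  # radical-level feature vectors, shape (1, 6, 50)
print(c1.shape, r1.shape)
```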
Referring to fig. 5, in step S2, merging the radical-level feature vector and the character-level feature vector and inputting the merged vector into a convolutional layer to obtain a local context feature vector includes:
S21, merging the feature vector representation r1 of the radical sequence R1 with the feature vector representation c1 of the Chinese character sequence;
S22, processing the merged vector through a Dropout layer and a convolutional layer, and extracting the local context feature vector representation w1.
Here, Dropout means that, during the training of a deep neural network, neural units are temporarily dropped from the network with a certain probability. Because the dropping is temporary and random, each mini-batch effectively trains a different sub-network under stochastic gradient descent. In each training batch with Dropout, overfitting can be significantly reduced by ignoring half of the feature detectors (setting half of the hidden-layer node values to 0). This reduces the co-adaptation between feature detectors (hidden nodes), in which some detectors only work when other specific detectors are present.
It will be appreciated that Dropout makes the activation of each neuron stop working with a certain probability p during forward propagation, which improves the generalization of the model because it becomes less dependent on particular local features.
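The following sketch illustrates steps S21 and S22 under the assumption that the convolutional layer is a standard one-dimensional convolution over the character axis; the kernel size, dropout rate, and channel sizes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

batch, seq_len, char_dim, rad_dim, out_dim = 1, 6, 100, 50, 128

c1 = torch.randn(batch, seq_len, char_dim)  # character-level feature vectors
r1 = torch.randn(batch, seq_len, rad_dim)   # radical-level feature vectors

merged = torch.cat([c1, r1], dim=-1)        # S21: merge radical and character features
merged = nn.Dropout(p=0.5)(merged)          # Dropout before the convolutional layer

# S22: a width-3 convolution over neighbouring characters extracts local context.
conv = nn.Conv1d(char_dim + rad_dim, out_dim, kernel_size=3, padding=1)
w1 = conv(merged.transpose(1, 2)).transpose(1, 2)  # local context features, (batch, seq_len, out_dim)
print(w1.shape)                             # torch.Size([1, 6, 128])
```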
Referring to fig. 6, in step S3, obtaining a context feature vector from the character-level feature vector based on the Bi-LSTM model includes:
S31, inputting the feature vector representation c1 of the Chinese character sequence into a Dropout layer and a Bi-LSTM model in sequence;
S32, combining the final states of the forward and backward outputs according to the calculation formula of each cell in the LSTM (Long Short-Term Memory network) to generate the context feature vector c2;
The formula for each cell in LSTM is as follows:
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$;
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$;
$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$;
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$;
$h_t = o_t \tanh(c_t)$;
wherein f_t denotes the forget gate output at time t, i_t the input gate output at time t, c_t the cell state at time t, o_t the output gate output at time t, x_t the input at time t, and h_t the hidden-layer output at time t; σ denotes the sigmoid activation function, tanh denotes the hyperbolic tangent activation function, and W and b are learnable parameters.
In the foregoing steps, the LSTM is used to learn the feature vectors of the Chinese character sequence of the text sentence input by the user and to generate the context feature vector. This replaces manual feature engineering, and the extracted features better fit the current semantics, so that knowledge extraction problems in different fields can be handled.
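A minimal sketch of steps S31 and S32 follows, assuming PyTorch's nn.LSTM with bidirectional=True so that the forward and backward states are concatenated at every position; the hidden size and dropout rate are assumptions.

```python
import torch
import torch.nn as nn

batch, seq_len, char_dim, hidden = 1, 6, 100, 128

c1 = torch.randn(batch, seq_len, char_dim)  # character-level feature vectors
c1 = nn.Dropout(p=0.5)(c1)                  # Dropout layer before the Bi-LSTM

# Bi-LSTM: forward and backward passes are concatenated at every position, so c2
# combines left-to-right and right-to-left context for each character.
bilstm = nn.LSTM(char_dim, hidden, batch_first=True, bidirectional=True)
c2, _ = bilstm(c1)                          # context feature vector
print(c2.shape)                             # torch.Size([1, 6, 256])
```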
In step S3, inputting the context feature vector into the convolutional layer to obtain a context salient feature vector, including:
S33, based on a convolution operation, passing the context feature vector c2 through a convolutional layer to output the context salient feature vector c3.
Convolution operations are used extensively in convolutional neural networks; a convolution is a multiply-accumulate operation between the input data and the convolution kernel parameters, and is expressed as:
$Y_{ij} = \sum_{u} \sum_{v} W_{uv} X_{i-u+1,\, j-v+1}$
wherein W_{uv} is the convolution kernel parameter, X_{i-u+1, j-v+1} is the input data, and Y_{ij} is the output data.
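As a numerical check of the formula above, the sketch below implements Y_ij = Σ_u Σ_v W_uv · X_{i-u+1, j-v+1} literally with loops (0-based indexing, valid positions only) and compares it with PyTorch's F.conv2d applied to a flipped kernel, since conv2d computes cross-correlation rather than true convolution; the input and kernel sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
H, W_in, kH, kW = 5, 5, 3, 3           # arbitrary input and kernel sizes
x = torch.randn(H, W_in)               # input data X
w = torch.randn(kH, kW)                # convolution kernel parameters W

# Literal implementation of Y_ij = sum_u sum_v W_uv * X_{i-u+1, j-v+1}
# (written 0-based; only positions where every X index is valid are kept).
y = torch.zeros(H - kH + 1, W_in - kW + 1)
for i in range(kH - 1, H):
    for j in range(kW - 1, W_in):
        s = torch.tensor(0.0)
        for u in range(kH):
            for v in range(kW):
                s = s + w[u, v] * x[i - u, j - v]
        y[i - (kH - 1), j - (kW - 1)] = s

# F.conv2d computes cross-correlation, so flipping the kernel gives true convolution.
y_ref = F.conv2d(x.view(1, 1, H, W_in), torch.flip(w, dims=[0, 1]).view(1, 1, kH, kW))
print(torch.allclose(y, y_ref.view_as(y), atol=1e-5))  # True
```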
Referring to fig. 7, in step S4, merging the context feature vector, the local context feature vector, and the context salient feature vector, and outputting a hidden layer vector through a three-layer Bi-LSTM model includes:
S41, merging the context feature vector c2, the context salient feature vector c3 and the local context feature vector representation w1;
S42, inputting the merged vector into a three-layer Bi-LSTM model, and outputting the entity hidden-layer representation sequence E = {e1, e2, …, en};
wherein, in the three-layer Bi-LSTM model, a Dropout layer is added before each Bi-LSTM layer to prevent overfitting.
The context salient feature vector extracted by the convolutional neural network lays the foundation for this step, in which the context feature vector c2, the context salient feature vector c3 and the local context feature vector are merged and fed into the Bi-LSTM model for training, so that the sentence-level information implied between the characters of the text sentence can be captured.
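A sketch of step S4 follows, under the assumption that "three-layer Bi-LSTM with a Dropout layer in front of each layer" means three stacked bidirectional LSTMs, each preceded by its own Dropout; all dimensions and the class name StackedBiLSTM are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StackedBiLSTM(nn.Module):
    """Three Bi-LSTM layers, each preceded by a Dropout layer (assumed structure)."""
    def __init__(self, in_dim, hidden, num_layers=3, p=0.5):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = in_dim
        for _ in range(num_layers):
            self.layers.append(nn.ModuleDict({
                "drop": nn.Dropout(p),
                "lstm": nn.LSTM(dim, hidden, batch_first=True, bidirectional=True),
            }))
            dim = 2 * hidden                     # a Bi-LSTM doubles the feature size

    def forward(self, x):
        for layer in self.layers:
            x, _ = layer["lstm"](layer["drop"](x))
        return x                                 # entity hidden sequence E = {e1, ..., en}

# c2, c3 and w1 merged along the feature axis (illustrative dimensions).
batch, seq_len = 1, 6
c2 = torch.randn(batch, seq_len, 256)
c3 = torch.randn(batch, seq_len, 256)
w1 = torch.randn(batch, seq_len, 128)
E = StackedBiLSTM(in_dim=256 + 256 + 128, hidden=128)(torch.cat([c2, c3, w1], dim=-1))
print(E.shape)                                   # torch.Size([1, 6, 256])
```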
In step S5, constructing a weight connection graph of the relationship between the entities, extracting the regional node features, and performing entity prediction by combining the entities and the weight connection graph, includes:
S51, constructing a relation weight connection graph for each relation among the entities;
specifically, assuming there are k relationships, there are k weight connection graphs.
S52, constructing a graph structure by taking the characters as nodes and the relationship between the characters as an adjacency matrix;
S53, extracting hidden-layer features of the regional nodes based on the Bi-GCN;
in this step, since an entity relation involves a head entity and a tail entity, the hidden-layer features of the regional nodes are extracted based on the Bi-GCN (bidirectional graph convolutional network).
The expression of Bi-GCN is as follows:
wherein A is the relational adjacency matrix, l is the layer index, h_v^l is the hidden-layer vector representation of node v at layer l, W^l and b^l are the learnable parameters of the l-th layer, and tanh denotes the hyperbolic tangent activation function;
S54, substituting the extracted hidden-layer features into each relation weight connection graph respectively, and extracting the hidden-layer vector representation of each relation among the entities based on the weighted Bi-GCN;
the expression for weighted Bi-GCN is as follows:
wherein l is the layer index, h_{e_i}^l is the hidden-layer vector representation of node e_i at layer l of the GCN, P_r(e_i, v) denotes the probability between node e_i and node v under relation r, W_r and b_r are the weight and bias of the GCN under relation r, V is the set of all characters in the sentence, and R is the set of all relations.
S55, taking the obtained hidden-layer vector representation, performing entity prediction through a CRF, and obtaining the loss value eloss by using a classification loss function.
A CRF (Conditional Random Field) is a conditional probability distribution model of a set of output random variables given a set of input random variables, characterized by the assumption that the output random variables form a Markov random field.
The loss function measures the degree of disagreement between the model's predicted values and the true values. If the loss function is very small, the machine learning model is very close to the real data distribution and performs well; if the loss function is large, the model deviates considerably from the real data distribution and performs poorly. The loss value eloss obtained from the classification loss function is used to check the accuracy of the prediction result.
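Because the Bi-GCN and weighted Bi-GCN equations are not reproduced in this text, the sketch below shows one common form of a bidirectional GCN layer as an assumption, not the invention's exact formulation: the adjacency matrix and its transpose are propagated separately with their own learnable weights and biases, passed through tanh, and concatenated; per-relation weighting is illustrated by using a probability matrix P_r as the adjacency. The class name BiGCNLayer and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class BiGCNLayer(nn.Module):
    """One bidirectional GCN layer (assumed form: propagate A and A^T, then concatenate)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fwd = nn.Linear(in_dim, out_dim)  # W, b for the outgoing-edge direction
        self.bwd = nn.Linear(in_dim, out_dim)  # W, b for the incoming-edge direction

    def forward(self, h, adj):
        # h: (n, in_dim) node features; adj: (n, n) relation adjacency or P_r weights.
        h_fwd = torch.tanh(adj @ self.fwd(h))      # aggregate along outgoing edges
        h_bwd = torch.tanh(adj.t() @ self.bwd(h))  # aggregate along incoming edges
        return torch.cat([h_fwd, h_bwd], dim=-1)

n, dim = 6, 256
h = torch.randn(n, dim)              # hidden vectors E = {e1, ..., en} from the stacked Bi-LSTM
P_r = torch.rand(n, n)               # weight connection graph for one relation r
h_r = BiGCNLayer(dim, 128)(h, P_r)   # per-relation hidden representation of the nodes
print(h_r.shape)                     # torch.Size([6, 256])
```

In a full pipeline, one such weighted graph would be built for each relation, the per-relation outputs combined, and the resulting hidden-layer vectors fed to the CRF for entity prediction, as described above.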
The embodiment of the invention can better deduce the semantics of the Chinese characters which do not appear in the training set but appear in the testing set by means of the extracted radical characteristics. By combining the three features of the context feature vector, the local context feature vector and the context salient feature vector, the vector representation of each Chinese character in the text can be obtained. The relationship weight connection graph can further consider relationship weight propagation, and provides more sufficient characteristics for each character.
Referring to fig. 8, based on the foregoing embodiment, the present invention further provides a system for extracting knowledge of a supply chain management entity with multi-feature fusion, which is used to implement the above method for knowledge extraction on data sets in the supply chain management knowledge field, and which includes:
a vector obtaining module 10, configured to:
converting a text sentence into character-level vector representation and radical-level vector representation based on a character embedding layer and a radical embedding layer obtained by pre-training; merging the radical-level feature vector and the character-level feature vector, and inputting the merged vector into a convolutional layer to obtain a local context feature vector;
based on a Bi-LSTM model, acquiring a context feature vector from the character-level feature vector, and inputting the context feature vector into the convolutional layer to acquire a context salient feature vector;
combining the context feature vector, the local context feature vector and the context salient feature vector, and outputting a hidden layer vector through a three-layer Bi-LSTM model;
and the prediction module 20 is configured to construct a weight connection graph of the relationship between the entities, extract the regional node features, and perform entity prediction by combining the entities and the weight connection graph.
Specifically, the vector obtaining module 10 is configured to:
for a text sentence, the Chinese character sequence is T1 = {s1, s2, …, sn}, where si is a character in the text sentence; obtaining the feature vector representation c1 of the Chinese character sequence based on the pre-trained character embedding layer;
extracting the radical of each Chinese character to form a radical sequence R1 = {t1, t2, …, tn}, where ti is a radical in the radical sequence; obtaining the feature vector representation r1 of the radical sequence R1 based on the pre-trained radical embedding layer;
inputting the feature vector representation c1 of the Chinese character sequence into a Dropout layer and a Bi-LSTM model in sequence;
combining the final states of the forward and backward outputs according to the calculation formula of each cell in the LSTM to generate the context feature vector c2;
The formula for each cell in LSTM is as follows:
$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$;
$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$;
$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$;
$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$;
$h_t = o_t \tanh(c_t)$;
f_t denotes the forget gate output at time t, i_t the input gate output at time t, c_t the cell state at time t, o_t the output gate output at time t, x_t the input at time t, and h_t the hidden-layer output at time t; σ denotes the sigmoid activation function, tanh denotes the hyperbolic tangent activation function, and W and b are learnable parameters;
based on a convolution operation, passing the context feature vector c2 through a convolutional layer to output the context salient feature vector c3;
The convolution operation is represented as:
$Y_{ij} = \sum_{u} \sum_{v} W_{uv} X_{i-u+1,\, j-v+1}$
wherein W_{uv} is the convolution kernel parameter, X_{i-u+1, j-v+1} is the input data, and Y_{ij} is the output data;
merging the feature vector representation r1 of the radical sequence R1 with the feature vector representation c1 of the Chinese character sequence, processing the merged vector through a Dropout layer and a convolutional layer, and extracting the local context feature vector representation w1;
merging the context feature vector c2, the context salient feature vector c3 and the local context feature vector representation w1, inputting the merged vector into a three-layer Bi-LSTM model, and outputting the entity hidden-layer representation sequence E = {e1, e2, …, en}.
Further, in this embodiment, the prediction module 20 is configured to:
respectively constructing a relation weight connection graph for each relation among the entities;
using characters as nodes and using the relationship between the characters as an adjacency matrix to construct a graph structure;
extracting hidden layer characteristics of the region nodes based on the Bi-GCN;
the expression of Bi-GCN is as follows:
wherein A is the relational adjacency matrix, l is the layer index, h_v^l is the hidden-layer vector representation of node v at layer l, W^l and b^l are the learnable parameters of the l-th layer, and tanh denotes the hyperbolic tangent activation function;
respectively substituting the extracted hidden layer characteristics into each relation weight connection diagram, extracting and obtaining hidden layer vector representation of each relation among each entity based on the weighted Bi-GCN, wherein the expression of the weighted Bi-GCN is as follows:
wherein l is the layer index, h_{e_i}^l is the hidden-layer vector representation of node e_i at layer l of the GCN, P_r(e_i, v) denotes the probability between node e_i and node v under relation r, W_r and b_r are the weight and bias of the GCN under relation r, V is the set of all characters in the sentence, and R is the set of all relations;
and taking the obtained hidden-layer vector representation, performing entity prediction by using a CRF, and obtaining the loss value eloss by using a classification loss function.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.