Semantic matching method, apparatus and medium
1. A semantic matching method, comprising:
performing word segmentation and splicing processing on the input first text and the input second text to obtain a first word sequence;
providing the first word sequence to an embedding network, and converting the first word sequence into a first word vector through the embedding network;
providing the first word vector to a transformation network, wherein the transformation network comprises first to Nth transformation layers connected in series, where N is an integer greater than 1, the first transformation layer receives the first word vector as its input vector and each of the other transformation layers receives, as its input vector, the feature vector generated by the previous transformation layer connected in series thereto, each transformation layer performs feature extraction on its input vector and generates a feature vector, and each transformation layer has a classification network corresponding thereto; and
starting from the first transformation layer, performing the following operations layer by layer until a semantic matching result of the first text and the second text is generated:
providing the feature vector generated by the transformation layer to the corresponding classification network;
generating a semantic matching prediction result based on the received feature vector by using the classification network corresponding to the transformation layer; and
generating the semantic matching result of the first text and the second text based on the semantic matching prediction result in a case where the semantic matching prediction result satisfies a predetermined condition.
2. The method of claim 1, wherein the semantic matching prediction result comprises a probability value indicating whether the first and second texts match, and the predetermined condition comprises: the probability value is greater than a predetermined threshold.
3. The method of claim 1, wherein, in a case where the semantic matching prediction result of the classification network corresponding to the i-th transformation layer satisfies the predetermined condition, operations of the other transformation layers and their corresponding classification networks in the transformation network are stopped, where i is an integer greater than or equal to 1.
4. The method of claim 1, wherein each of the classification networks comprises: a fully connected layer, a classification transformation layer, and a normalization layer, and
wherein generating, by the classification network, the semantic matching prediction result based on the received feature vector comprises:
receiving, by the fully connected layer, the feature vector output by the transformation layer corresponding to the classification network, and converting, by the fully connected layer, the feature vector into a feature vector whose dimension corresponds to the number of categories of the semantic matching prediction result;
providing the feature vector output by the fully connected layer to the classification transformation layer, and outputting a transformed feature vector by the classification transformation layer; and
providing the transformed feature vector to the normalization layer, performing normalization on elements therein by the normalization layer, and taking the normalized feature vector as the semantic matching prediction result.
5. The method of claim 1, wherein each network is trained by:
training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer by using a first training data set; and
training the classification networks corresponding to the first to (N-1)th transformation layers by using a second training data set while keeping fixed the trained parameters of the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer.
6. The method of claim 5, wherein the first training data set comprises a plurality of training data, each training data comprising a third text, a fourth text, and a true semantic matching result of the third text and the fourth text, and wherein training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer using the first training data set comprises:
performing word segmentation and splicing processing on the third text and the fourth text of each training data in at least a part of the training data in the first training data set to obtain a second word sequence;
providing the second word sequence to the embedding network, and converting the second word sequence into a second word vector through the embedding network;
providing the second word vector to the transformation network, and providing the feature vector output by the Nth transformation layer in the transformation network to the classification network corresponding thereto;
calculating a first loss function between the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer and the true semantic matching result; and
training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer based on the first loss function.
7. The method of claim 5, wherein the second training data set comprises a plurality of training data, each training data comprising a fifth text and a sixth text, and wherein training the classification networks corresponding to the first to (N-1)th transformation layers using the second training data set comprises:
performing word segmentation and splicing on the fifth text and the sixth text of each training data in at least a part of the training data in the second training data set to obtain a third word sequence;
providing the third word sequence to the embedding network, and converting the third word sequence into a third word vector through the embedding network;
providing the third word vector to the transformation network, and providing the feature vectors output by the first to (N-1)th transformation layers in the transformation network to the classification networks corresponding thereto, respectively;
calculating a second loss function between a plurality of semantic matching prediction results output by the plurality of classification networks corresponding to the first to (N-1)th transformation layers and the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer; and
training the classification networks corresponding to the first to (N-1)th transformation layers based on the second loss function.
8. The method of claim 7, wherein calculating the second loss function between the plurality of semantic matching prediction results output by the plurality of classification networks corresponding to the first to (N-1)th transformation layers and the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer comprises:
calculating a KL divergence between the semantic matching prediction result output by each of the plurality of classification networks corresponding to the first to (N-1)th transformation layers and the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer; and
taking the sum of all the calculated KL divergences as the second loss function.
9. A semantic matching method, comprising:
receiving an input user question;
determining, according to the method of any one of claims 1 to 8, a semantic matching result between the user question and each of at least a part of standard questions in a standard question-and-answer library; and
displaying the standard question that best matches the semantics of the user question.
10. A semantic matching apparatus comprising:
a memory for storing a computer program thereon;
a processor for performing the following processing when executing the computer program:
performing word segmentation and splicing processing on the input first text and the input second text to obtain a first word sequence;
providing the first word sequence to an embedding network, and converting the first word sequence into a first word vector through the embedding network;
providing the first word vector to a transformation network, wherein the transformation network comprises first to Nth transformation layers connected in series, where N is an integer greater than 1, the first transformation layer receives the first word vector as its input vector and each of the other transformation layers receives, as its input vector, the feature vector generated by the previous transformation layer connected in series thereto, each transformation layer performs feature extraction on its input vector and generates a feature vector, and each transformation layer has a classification network corresponding thereto; and
starting from the first transformation layer, performing the following operations layer by layer until a semantic matching result of the first text and the second text is generated:
providing the feature vector generated by the transformation layer to the corresponding classification network;
generating a semantic matching prediction result based on the received feature vector by using the classification network corresponding to the transformation layer; and
generating the semantic matching result of the first text and the second text based on the semantic matching prediction result in a case where the semantic matching prediction result satisfies a predetermined condition.
11. The apparatus of claim 10, wherein the semantic matching prediction result comprises a probability value indicating whether the first and second texts match, and the predetermined condition comprises: the probability value is greater than a predetermined threshold.
12. The apparatus of claim 10, wherein each of the classification networks comprises: a fully connected layer, a classification transformation layer, and a normalization layer, and
wherein generating, by the classification network corresponding to the transformation layer, the semantic matching prediction result based on the received feature vector comprises:
receiving, by the fully connected layer, the feature vector output by the transformation layer corresponding to the classification network, and converting, by the fully connected layer, the feature vector into a feature vector whose dimension corresponds to the number of categories of the semantic matching prediction result;
providing the feature vector output by the fully connected layer to the classification transformation layer, and outputting a transformed feature vector by the classification transformation layer; and
providing the transformed feature vector to the normalization layer, performing normalization on elements therein by the normalization layer, and taking the normalized feature vector as the semantic matching prediction result.
13. The apparatus of claim 10, wherein each network is trained by:
training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer by using a first training data set; and
training the classification networks corresponding to the first to (N-1)th transformation layers by using a second training data set while keeping fixed the trained parameters of the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer.
14. A semantic matching apparatus comprising:
a word segmentation and splicing unit configured to perform word segmentation and splicing processing on an input first text and an input second text to obtain a first word sequence;
an embedding unit comprising an embedding network, the embedding unit converting the first word sequence into a first word vector through the embedding network;
a transformation unit comprising a transformation network, wherein the transformation network comprises first to Nth transformation layers connected in series, where N is an integer greater than 1, the first transformation layer receives the first word vector as its input vector and each of the other transformation layers receives, as its input vector, the feature vector generated by the previous transformation layer connected in series thereto, and each transformation layer performs feature extraction on its input vector and generates a feature vector; and
a classification unit comprising N classification networks in one-to-one correspondence with the transformation layers, each classification network being configured to receive the feature vector from its corresponding transformation layer and generate a semantic matching prediction result of the first text and the second text corresponding to that transformation layer;
wherein the transformation network and the N classification networks are configured to perform the following operations layer by layer, starting from the first transformation layer, until a semantic matching result of the first text and the second text is generated:
providing the feature vector generated by the transformation layer to the corresponding classification network;
generating a semantic matching prediction result based on the received feature vector by using the classification network corresponding to the transformation layer; and
generating the semantic matching result of the first text and the second text based on the semantic matching prediction result in a case where the semantic matching prediction result satisfies a predetermined condition.
15. A semantic matching device comprising:
a receiving device for receiving the input user question;
the semantic matching apparatus according to any one of claims 10 to 14, configured to determine, for each of at least a part of standard questions in a standard question-and-answer library, a semantic matching result with the user question; and
display means for displaying a standard question that best matches the semantics of the user question.
16. A computer-readable medium, on which a computer program is stored which, when executed by a processor, performs the method of any one of claims 1 to 9.
Background
For the problem of semantically matching a user question against a standard question library in a search scenario, the technology has evolved from unsupervised learning to supervised learning and from traditional machine learning to deep learning. The earliest algorithms, such as TF-IDF (term frequency-inverse document frequency), LD (Levenshtein distance), and LCS (longest common subsequence), obtain the degree of semantic matching by calculating the degree of vocabulary overlap between two sentences. However, since these algorithms determine the degree of semantic matching based on word overlap and co-occurrence information, they do not mine enough of the semantic information of the sentences themselves and cannot achieve a deep understanding of the user question. Moreover, when the user question is a long sentence, they cannot locate its keywords or key phrases.
In recent years, deep learning algorithms have made breakthrough progress, and the application of deep learning to semantic matching tasks is receiving more and more attention. Although a semantic matching model based on deep learning can mine deeper semantic information, such a model introduces too many parameters (for example, more than 100 million parameters), so that it can hardly be brought online for real service.
Disclosure of Invention
In view of the above circumstances, it is desirable to provide a new semantic matching method, apparatus, and medium that can accelerate prediction while ensuring that model performance does not decrease significantly, thereby addressing the problems of expensive computing resources and insufficient memory.
According to an aspect of the present disclosure, there is provided a semantic matching method, including: performing word segmentation and splicing processing on an input first text and an input second text to obtain a first word sequence; providing the first word sequence to an embedding network, and converting the first word sequence into a first word vector through the embedding network; providing the first word vector to a transformation network, wherein the transformation network comprises first to Nth transformation layers connected in series, where N is an integer greater than 1, the first transformation layer receives the first word vector as its input vector and each of the other transformation layers receives, as its input vector, the feature vector generated by the previous transformation layer connected in series thereto, each transformation layer performs feature extraction on its input vector and generates a feature vector, and each transformation layer has a classification network corresponding thereto; and starting from the first transformation layer, performing the following operations layer by layer until a semantic matching result of the first text and the second text is generated: providing the feature vector generated by the transformation layer to the corresponding classification network; generating a semantic matching prediction result based on the received feature vector by using the classification network corresponding to the transformation layer; and generating the semantic matching result of the first text and the second text based on the semantic matching prediction result in a case where the semantic matching prediction result satisfies a predetermined condition.
Further, in a method according to an embodiment of the present disclosure, the semantic matching prediction result includes a probability value indicating whether the first text and the second text match, and the predetermined condition includes: the probability value is greater than a predetermined threshold.
In addition, in the method according to the embodiment of the present disclosure, in a case where the semantic matching prediction result of the classification network corresponding to the i-th transformation layer satisfies the predetermined condition, the operations of the other transformation layers and their corresponding classification networks in the transformation network are stopped, where i is an integer greater than or equal to 1.
Further, in a method according to an embodiment of the present disclosure, each of the classification networks includes: a fully connected layer, a classification transformation layer, and a normalization layer, and generating, by the classification network, the semantic matching prediction result based on the received feature vector includes: receiving, by the fully connected layer, the feature vector output by the transformation layer corresponding to the classification network, and converting, by the fully connected layer, the feature vector into a feature vector whose dimension corresponds to the number of categories of the semantic matching prediction result; providing the feature vector output by the fully connected layer to the classification transformation layer, and outputting a transformed feature vector by the classification transformation layer; and providing the transformed feature vector to the normalization layer, performing normalization on elements therein by the normalization layer, and taking the normalized feature vector as the semantic matching prediction result.
Further, in a method according to an embodiment of the present disclosure, each network is trained by: training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer by using a first training data set; and training the classification networks corresponding to the first to (N-1)th transformation layers by using a second training data set while keeping fixed the trained parameters of the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer.
In addition, in the method according to the embodiment of the present disclosure, the first training data set includes a plurality of training data, each of which includes a third text, a fourth text, and a true semantic matching result of the third text and the fourth text, and training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer using the first training data set includes: performing word segmentation and splicing processing on the third text and the fourth text of each training data in at least a part of the training data in the first training data set to obtain a second word sequence; providing the second word sequence to the embedding network, and converting the second word sequence into a second word vector through the embedding network; providing the second word vector to the transformation network, and providing the feature vector output by the Nth transformation layer in the transformation network to the classification network corresponding thereto; calculating a first loss function between the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer and the true semantic matching result; and training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer based on the first loss function.
In addition, in the method according to the embodiment of the present disclosure, the second training data set includes a plurality of training data, each of which includes a fifth text and a sixth text, and training the classification networks corresponding to the first to (N-1)th transformation layers using the second training data set includes: performing word segmentation and splicing on the fifth text and the sixth text of each training data in at least a part of the training data in the second training data set to obtain a third word sequence; providing the third word sequence to the embedding network, and converting the third word sequence into a third word vector through the embedding network; providing the third word vector to the transformation network, and providing the feature vectors output by the first to (N-1)th transformation layers in the transformation network to the classification networks corresponding thereto, respectively; calculating a second loss function between the plurality of semantic matching prediction results output by the plurality of classification networks corresponding to the first to (N-1)th transformation layers and the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer; and training the classification networks corresponding to the first to (N-1)th transformation layers based on the second loss function.
In addition, in a method according to an embodiment of the present disclosure, calculating the second loss function between the plurality of semantic matching prediction results output by the plurality of classification networks corresponding to the first to (N-1)th transformation layers and the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer includes: calculating a KL divergence between the semantic matching prediction result output by each of the plurality of classification networks corresponding to the first to (N-1)th transformation layers and the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer; and taking the sum of all the calculated KL divergences as the second loss function.
According to another aspect of the present disclosure, there is provided a semantic matching method, including: receiving an input user question; determining, according to the method described hereinabove, a semantic matching result between the user question and each of at least a part of the standard questions in a standard question-and-answer library; and displaying the standard question that best matches the semantics of the user question.
According to another aspect of the present disclosure, there is provided a semantic matching apparatus including: a memory for storing a computer program thereon; and a processor for, when executing the computer program, performing the following: performing word segmentation and splicing processing on an input first text and an input second text to obtain a first word sequence; providing the first word sequence to an embedding network, and converting the first word sequence into a first word vector through the embedding network; providing the first word vector to a transformation network, wherein the transformation network comprises first to Nth transformation layers connected in series, where N is an integer greater than 1, the first transformation layer receives the first word vector as its input vector and each of the other transformation layers receives, as its input vector, the feature vector generated by the previous transformation layer connected in series thereto, each transformation layer performs feature extraction on its input vector and generates a feature vector, and each transformation layer has a classification network corresponding thereto; and starting from the first transformation layer, performing the following operations layer by layer until a semantic matching result of the first text and the second text is generated: providing the feature vector generated by the transformation layer to the corresponding classification network; generating a semantic matching prediction result based on the received feature vector by using the classification network corresponding to the transformation layer; and generating the semantic matching result of the first text and the second text based on the semantic matching prediction result in a case where the semantic matching prediction result satisfies a predetermined condition.
Further, in an apparatus according to an embodiment of the present disclosure, the semantic matching prediction result includes a probability value indicating whether the first text and the second text match, and the predetermined condition includes: the probability value is greater than a predetermined threshold.
Further, in an apparatus according to an embodiment of the present disclosure, each of the classification networks includes: a fully connected layer, a classification transformation layer, and a normalization layer, and generating, by the classification network corresponding to the transformation layer, the semantic matching prediction result based on the received feature vector includes: receiving, by the fully connected layer, the feature vector output by the transformation layer corresponding to the classification network, and converting, by the fully connected layer, the feature vector into a feature vector whose dimension corresponds to the number of categories of the semantic matching prediction result; providing the feature vector output by the fully connected layer to the classification transformation layer, and outputting a transformed feature vector by the classification transformation layer; and providing the transformed feature vector to the normalization layer, performing normalization on elements therein by the normalization layer, and taking the normalized feature vector as the semantic matching prediction result.
In addition, in an apparatus according to an embodiment of the present disclosure, each network is trained by: training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer by using a first training data set; and training the classification networks corresponding to the first to (N-1)th transformation layers by using a second training data set while keeping fixed the trained parameters of the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer.
According to another aspect of the present disclosure, there is provided a semantic matching apparatus including: a word segmentation and splicing unit configured to perform word segmentation and splicing processing on an input first text and an input second text to obtain a first word sequence; an embedding unit comprising an embedding network, the embedding unit converting the first word sequence into a first word vector through the embedding network; a transformation unit comprising a transformation network, wherein the transformation network comprises first to Nth transformation layers connected in series, where N is an integer greater than 1, the first transformation layer receives the first word vector as its input vector and each of the other transformation layers receives, as its input vector, the feature vector generated by the previous transformation layer connected in series thereto, and each transformation layer performs feature extraction on its input vector and generates a feature vector; and a classification unit comprising N classification networks in one-to-one correspondence with the transformation layers, each classification network being configured to receive the feature vector from its corresponding transformation layer and generate a semantic matching prediction result of the first text and the second text corresponding to that transformation layer; wherein the transformation network and the N classification networks are configured to perform the following operations layer by layer, starting from the first transformation layer, until a semantic matching result of the first text and the second text is generated: providing the feature vector generated by the transformation layer to the corresponding classification network; generating a semantic matching prediction result based on the received feature vector by using the classification network corresponding to the transformation layer; and generating the semantic matching result of the first text and the second text based on the semantic matching prediction result in a case where the semantic matching prediction result satisfies a predetermined condition.
In addition, according to another aspect of the present disclosure, there is provided a semantic matching apparatus including: a receiving device for receiving the input user question; semantic matching means as described hereinbefore for determining a semantic matching result with said user question for each of at least a part of standard questions in a standard question-and-answer library; and a display means for displaying a standard question that best matches the semantics of the user question.
According to yet another aspect of the present disclosure, a computer-readable medium is provided, having stored thereon a computer program which, when executed by a processor, performs the method as described hereinabove.
With the semantic matching method, apparatus, and medium according to the embodiments of the present disclosure, once the prediction result output by the classification network corresponding to a shallow transformation layer satisfies a predetermined condition, for example, once its confidence is high enough, the processing of the subsequent deeper transformation layers is not performed. With this arrangement, when the user question is simple, a satisfactory semantic matching prediction result from a classification network corresponding to a shallow transformation layer can be output early, without having to use the semantic matching prediction result of the classification network corresponding to the last transformation layer. This reduces the computational burden of the semantic matching model and improves the inference speed of the model, while the performance of the semantic matching computation does not decrease significantly.
Drawings
FIG. 1 is a flow chart illustrating a process of a semantic matching method according to an embodiment of the present disclosure;
FIG. 2 is a schematic block diagram illustrating a semantic matching model according to an embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a training process in a first phase according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a training process in a second phase according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating a first example of an application scenario to which a semantic matching method according to an embodiment of the present disclosure is applied;
FIG. 6 is a diagram illustrating a second example of an application scenario to which a semantic matching method according to an embodiment of the present disclosure is applied;
FIG. 7 is a functional block diagram illustrating a configuration of a semantic matching apparatus according to an embodiment of the present disclosure; and
FIG. 8 is a schematic diagram of an architecture of an exemplary computing device, according to an embodiment of the present disclosure.
Detailed Description
Various preferred embodiments of the present invention will be described below with reference to the accompanying drawings. The following description with reference to the accompanying drawings is provided to assist in understanding the exemplary embodiments of the invention as defined by the claims and their equivalents. It includes various specific details to assist understanding, but they are to be construed as merely illustrative. Accordingly, those skilled in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present invention. Also, in order to make the description more clear and concise, a detailed description of functions and configurations well known in the art will be omitted.
Artificial Intelligence (AI) is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Therefore, the research in this field will relate to natural language, i.e. the language that people use daily, so it is closely related to the research of linguistics. For example, natural language processing techniques may include text processing, semantic understanding, semantic matching, and the like.
First, a semantic matching method according to an embodiment of the present disclosure will be described with reference to fig. 1. As shown in fig. 1, the method includes the following steps.
First, in step S101, word segmentation and concatenation processing is performed on an input first text and an input second text to obtain a first word sequence.
For example, when the semantic matching method according to the embodiment of the present disclosure is applied to semantic matching of a user question against a standard question library in a search scenario, the first text may be a user question (query) and the second text may be a standard question. Assume that the first text is represented as $T_1 = (w_1^1, w_2^1, \ldots, w_P^1)$ and the second text is represented as $T_2 = (w_1^2, w_2^2, \ldots, w_M^2)$, where P and M denote the word sequence lengths of the first text and the second text, respectively, $w_1^1, \ldots, w_P^1$ represent the words obtained after performing word segmentation processing on the first text, and $w_1^2, \ldots, w_M^2$ represent the words obtained after performing word segmentation processing on the second text.
Then, in order to construct a standard input provided to the subsequent network, the two word sequences of the first text and the second text are concatenated into a first word sequence, which may be represented, for example, as $[\mathrm{CLS}], w_1^1, \ldots, w_P^1, [\mathrm{SEP}], w_1^2, \ldots, w_M^2, [\mathrm{SEP}]$, wherein the [CLS] flag is placed at the head of the first sentence and the [SEP] flag is used to separate the two input sentences; since both the first text and the second text are input, a [SEP] flag is also added between the first text and the second text.
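For illustration only, the word segmentation and splicing step may be sketched in Python as follows; the `segment` function and the example texts are placeholders assumed for this sketch, not components specified by the present disclosure:

```python
# Illustrative sketch of step S101 (word segmentation and splicing).
def segment(text: str) -> list[str]:
    # Placeholder word segmenter: here we simply split on whitespace;
    # any segmentation tool appropriate for the language could be used.
    return text.split()

def splice(first_text: str, second_text: str) -> list[str]:
    """Build the first word sequence: [CLS] w1..wP [SEP] w'1..w'M [SEP]."""
    first_words = segment(first_text)    # length P
    second_words = segment(second_text)  # length M
    return ["[CLS]"] + first_words + ["[SEP]"] + second_words + ["[SEP]"]

words = splice("how do I reset my password", "password reset steps")
print(len(words))  # P + M + 3 tokens in total
```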
Next, the process proceeds to step S102. In step S102, the first word sequence is provided to an embedding network, and the first word sequence is converted into a first word vector through the embedding network. Through the processing of step S102, the character-type data of the input text can be converted into numerical-type data. For example, assuming that the first word sequence is converted into a first word vector $e$ through the embedding network, the following formula holds:
$e = (e_1, e_2, \ldots, e_{P+M+3})$    (1)
wherein $e_i$ represents the embedding vector of the i-th word and is a vector of dimension $1 \times d$; since one [CLS] flag and two [SEP] flags are added, the embedding vector of the whole concatenated word sequence has dimension $(P+M+3) \times d$.
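Similarly, the embedding lookup of formula (1) may be sketched as follows; the random lookup table and the dimension d are illustrative assumptions (in practice, the embedding network's parameters are learned):

```python
import numpy as np

# Illustrative sketch of step S102: each token of the spliced word
# sequence is mapped to a 1 x d embedding vector via a lookup table.
words = ["[CLS]", "how", "to", "reset", "[SEP]", "reset", "steps", "[SEP]"]
d = 8                                           # embedding dimension (illustrative)
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
table = np.random.randn(len(vocab), d)          # learned parameters in practice

e = np.stack([table[vocab[w]] for w in words])  # first word vector
print(e.shape)                                  # (P + M + 3, d)
```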
Then, in step S103, the first word vector is provided to a transformation network. FIG. 2 illustrates a semantic matching model according to an embodiment of the present disclosure. As shown in FIG. 2, the semantic matching model includes the embedding network 201 described above, as well as a transformation network 202 and a plurality of classification networks 203_1, 203_2, …, 203_N to be described herein. The transformation network 202 includes first to Nth transformation layers 202_1, 202_2, …, 202_N connected in series, where N is an integer greater than 1. The first transformation layer 202_1 receives the first word vector as its input vector, and each of the other transformation layers 202_2, …, 202_N receives, as its input vector, the feature vector generated by the previous transformation layer connected in series thereto; each transformation layer performs feature extraction on its input vector and generates a feature vector, and each transformation layer has a classification network corresponding thereto. In FIG. 2, it can be seen that the output of each transformation layer is connected to its corresponding classification network. Moreover, the network structures of the plurality of classification networks 203_1, 203_2, …, 203_N are identical; only the parameters of the respective networks differ from one another.
The transformation network 202 is used for extracting semantic information layer by layer from the input first word vector. For example, as one possible implementation, each transformation layer in the transformation network 202 may obtain feature information in the input word vector through a self-attention mechanism.
The first step in computing self-attention is to create three vectors based on the input vector of the transformation layer: a Q vector, a K vector, and a V vector. These vectors are obtained by multiplying the input vector by three transformation matrices. The (i+1)-th transformation layer will be described as an example. Suppose that the feature vector output by the i-th transformation layer is $H_i$; this feature vector is input to the next transformation layer, i.e., the (i+1)-th transformation layer.
In the (i+1)-th transformation layer, the $Q_i$ vector, $K_i$ vector, and $V_i$ vector are created from the input vector $H_i$ (of dimension $(P+M+3) \times d_k$, for example) by the following formula:
$Q_i = H_i W_i^Q, \quad K_i = H_i W_i^K, \quad V_i = H_i W_i^V$    (2)
wherein $W_i^Q$, $W_i^K$, and $W_i^V$ represent the parameter matrices corresponding to the $Q_i$, $K_i$, and $V_i$ vectors, respectively. For example, $W_i^Q$, $W_i^K$, and $W_i^V$ may each be of dimension $d_k \times d_k$, in which case the $Q_i$, $K_i$, and $V_i$ vectors are of dimension $(P+M+3) \times d_k$.
The second step in calculating self-attention is to calculate an attention score. Assuming that a self-attention score is to be calculated for the feature of the first word, the feature of every word in the input vector needs to be scored against it. This score determines how much attention is paid to other words when transforming a word at a certain position. For example, this score is calculated as the dot product of the $Q_i$ vector and the $K_i$ vector.
The third step in calculating self-attention is to divide the score by the square root of the dimension $d_k$ of the $K_i$ vector. This makes the gradient update process more stable.
The fourth step in calculating self-attention is then to operate on the results by a Softmax function. The Softmax function normalizes the scores so that they are both positive numbers and add up to 1.
The fifth step in calculating self-attention is to multiply the $V_i$ vector by the result of the Softmax function. This is done in order to keep the feature values of the words to be focused on essentially unchanged, while masking out irrelevant words (e.g., by multiplying them by a very small number).
Suppose $H_{i+1}$ represents the feature vector output by the (i+1)-th transformation layer; the processes of the second through fifth steps described above can then be implemented by the following formula:
$H_{i+1} = \mathrm{Softmax}\left(\dfrac{Q_i K_i^T}{\sqrt{d_k}}\right) V_i$    (3)
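As a numerical illustration of formulas (2) and (3), a single-head self-attention step can be sketched as follows; all dimensions and the random inputs are assumptions made for demonstration only:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return np.exp(x) / np.exp(x).sum(axis=axis, keepdims=True)

L, dk = 10, 16                # sequence length (P + M + 3) and d_k (illustrative)
H_i = np.random.randn(L, dk)  # feature vector output by the i-th layer
W_Q, W_K, W_V = (np.random.randn(dk, dk) for _ in range(3))  # W_i^Q, W_i^K, W_i^V

Q, K, V = H_i @ W_Q, H_i @ W_K, H_i @ W_V  # formula (2)
scores = Q @ K.T / np.sqrt(dk)             # steps 2 and 3: scaled dot product
H_next = softmax(scores) @ V               # steps 4 and 5: formula (3)
print(H_next.shape)                        # (L, d_k)
```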
referring back to fig. 1, after step S103, the process proceeds to step S104. In step S104, the following operations are performed layer by layer from the first transformation layer until semantic matching results of the first text and the second text are generated: providing the feature vectors generated by the transform layer to the corresponding classification network; generating a semantic matching prediction result based on the received feature vector by using the classification network corresponding to the conversion layer; and generating semantic matching results of the first text and the second text based on the semantic matching prediction results under the condition that the semantic matching prediction results meet preset conditions.
As described above, the network structures of the plurality of classification networks corresponding to the first to Nth transformation layers are the same; only the specific network parameters differ. Specifically, each of the plurality of classification networks may include: a fully connected layer, a classification transformation layer, and a normalization layer.
The classification network generates the semantic matching prediction result based on the feature vector it receives through the following steps.
First, the feature vector output by the transformation layer corresponding to the classification network is provided to the fully connected layer, and the fully connected layer converts it into a feature vector whose dimension corresponds to the number of categories of the semantic matching prediction result. Suppose that the feature vector output by the i-th transformation layer is $H_i$ and the vector output by the fully connected layer is $Y$; then the following formula holds:
$Y = W^Y H_i + b^Y$    (4)
wherein $W^Y$ represents the corresponding parameter matrix and $b^Y$ represents a constant.
For example, if the semantic matching prediction result output from the classification network is set to a 2-class prediction result, the feature vector Y output from the fully-connected layer is a 1 × 2-dimensional vector.
Then, the feature vector output by the fully connected layer is provided to the classification transformation layer, and the classification transformation layer outputs a transformed feature vector. For example, the classification transformation layer here may have a network structure similar to that of the first to Nth transformation layers described above, but with different network parameters. Denoting the feature vector output by the classification transformation layer as $Y'$, the following formula holds:
$Y' = \mathrm{Transformer}(Y)$    (5)
For example, in the case where the prediction result is a 2-class result, $Y' = (y_0', y_1')$, where $y_0'$ and $y_1'$ are not necessarily numbers in the range of 0 to 1, and their sum is not necessarily equal to 1.
Finally, the transformed feature vector is provided to the normalization layer, the normalization layer performs normalization on its elements, and the normalized feature vector is taken as the semantic matching prediction result. The normalization is performed by a Softmax function. For example, denoting the feature vector output by the normalization layer as $\hat{Y} = (\hat{y}_0, \hat{y}_1)$, the following formula holds:
$\hat{Y} = \mathrm{Softmax}(Y')$    (6)
In particular,
$\hat{y}_0 = \dfrac{e^{y_0'}}{e^{y_0'} + e^{y_1'}}, \quad \hat{y}_1 = \dfrac{e^{y_1'}}{e^{y_0'} + e^{y_1'}}$    (7)
wherein, unlike $y_0'$ and $y_1'$, $\hat{y}_0$ and $\hat{y}_1$ are numbers in the range of 0 to 1 and their sum equals 1.
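A minimal sketch of one classification network (formulas (4) to (7)) is given below; the use of the [CLS] feature as input and the simple linear stand-in for the classification transformation layer are assumptions of this sketch, not a definitive implementation:

```python
import torch
import torch.nn as nn

class ClassificationNetwork(nn.Module):
    """Fully connected layer -> classification transformation layer -> Softmax."""
    def __init__(self, d_model: int = 768, num_classes: int = 2):
        super().__init__()
        self.fc = nn.Linear(d_model, num_classes)  # formula (4): W^Y, b^Y
        # Stand-in for the classification transformation layer of formula (5);
        # its transformer-like structure is not reproduced in this sketch.
        self.transform = nn.Linear(num_classes, num_classes)
        self.softmax = nn.Softmax(dim=-1)           # formulas (6)-(7)

    def forward(self, h_cls: torch.Tensor) -> torch.Tensor:
        y = self.fc(h_cls)            # dimension = number of categories
        y_prime = self.transform(y)   # transformed feature vector Y'
        return self.softmax(y_prime)  # elements in [0, 1], summing to 1

probs = ClassificationNetwork()(torch.randn(1, 768))
print(probs)  # e.g. tensor([[P0, P1]]) with P0 + P1 = 1
```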
Additionally, the semantic matching prediction result may include a probability value indicating whether the first text and the second text match, and the predetermined condition includes: the probability value is greater than a predetermined threshold. The closer the probability value output by the classification network is to 1, the higher its confidence is considered. For example, the predetermined threshold may be set to 0.8, 0.7, etc.
FIG. 2 shows an example in which a 2-class result is output as the semantic matching prediction result. The classification network outputs a prediction result consisting of the two elements $P_0$ and $P_1$, where $P_0$ represents the probability that the first text and the second text do not semantically match, $P_1$ represents the probability that the first text and the second text semantically match, and the sum of $P_0$ and $P_1$ is 1. In this case, as long as one of $P_0$ and $P_1$ is greater than the predetermined threshold, the semantic matching prediction result is considered to satisfy the predetermined condition and can then be output as the final semantic matching prediction result.
However, in the present disclosure, the semantic matching prediction result output by the classification network is not limited to a 2-class result. For example, the semantic matching prediction result output by the classification network may also be a 3-class result. In this case, the classification network may output a prediction result consisting of the three elements $P_0$, $P_1$, and $P_2$, where $P_0$ represents the probability that the first text and the second text do not semantically match, $P_1$ represents the probability that the first text and the second text semantically match, $P_2$ represents the probability that it cannot be determined whether the first text and the second text semantically match, and the sum of $P_0$, $P_1$, and $P_2$ is 1. Of course, other classification results may also be used, such as 4-class results, 5-class results, etc., depending on the application scenario.
In addition, it should be noted here that, when the semantic matching prediction result of the classification network corresponding to the i-th transformation layer satisfies the predetermined condition, the operations of the other transformation layers and their corresponding classification networks in the transformation network are stopped, where i is an integer greater than or equal to 1. In other words, once the prediction result output by the classification network corresponding to a shallow transformation layer satisfies the predetermined condition, for example, once its confidence is high enough, the processing of the subsequent deeper transformation layers is not performed. With this arrangement, when the user question is simple, a satisfactory semantic matching prediction result from a classification network corresponding to a shallow transformation layer can be output early, without having to use the semantic matching prediction result of the classification network corresponding to the last transformation layer. This reduces the computational burden of the semantic matching model and improves the inference speed of the model, while the performance of the semantic matching computation does not decrease significantly. It is this property that enables a semantic matching model according to the present disclosure to actually be brought online for service.
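The layer-by-layer early-exit behavior described above can be sketched as follows; `layers` and `heads` stand for the N trained transformation layers and their corresponding classification networks, and the threshold value, batch size of 1, and use of the [CLS] feature are assumptions of this sketch:

```python
import torch

def predict_with_early_exit(x, layers, heads, threshold: float = 0.8):
    """Run the transformation layers one by one and exit as soon as some
    classification network's prediction satisfies the predetermined condition.
    `x` is the first word vector of shape (1, seq_len, d_model)."""
    h = x
    probs = None
    for layer, head in zip(layers, heads):
        h = layer(h)                        # feature extraction by this layer
        probs = head(h[:, 0])               # prediction from the [CLS] feature
        if probs.max().item() > threshold:  # predetermined condition met:
            return probs                    # deeper layers are skipped
    return probs                            # fall back to the N-th layer's result
```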
In the above, referring to the flowchart of FIG. 1 and in conjunction with the model structure diagram of FIG. 2, the specific process of the semantic matching method according to the embodiment of the present disclosure has been described in detail. The method described above is performed with the embedding network, the transformation network, and the classification networks all trained. Next, the training process of each network included in the semantic matching model will be specifically described.
The training process of each network included in the semantic matching model according to an embodiment of the present disclosure may include two stages.
In the first stage, the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer are trained using a first training data set.
In the second stage, with the trained parameters of the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer kept fixed, the classification networks corresponding to the first to (N-1)th transformation layers are trained using a second training data set.
Referring back to FIG. 2, it can be seen that the plurality of classification networks 203_1, 203_2, …, 203_N is divided into two parts. The network 203_N serves as the classification network of the first part; this classification network may be called, for example, the teacher classification network, and its parameters are adjusted in the first-stage training. In addition, the classification networks 203_1, 203_2, …, 203_(N-1) serve as the classification networks of the second part; these may be called, for example, the student classification networks, and their parameters are adjusted in the second-stage training.
These two training stages can be regarded as a knowledge-distillation-based training process. Knowledge distillation refers to training a small model using the knowledge learned by a large model, so that the small model acquires the generalization ability of the large model. Generalization ability refers to the adaptability of a machine learning algorithm to fresh samples. The purpose of learning is to learn the rules behind the data, so that the trained network can also give appropriate outputs for data outside the learning set that follow the same rules; this ability is called generalization ability. In the present disclosure, since the teacher classification network is attached to the last transformation layer, it necessarily corresponds to a large network. In contrast, since a student classification network is attached to each of the other transformation layers, each of them corresponds to a small network. The trained teacher classification network is used, together with data sets related to the semantic matching task, to train the student classification networks; that is, the student classification networks are used to distill the probability distribution of the teacher classification network.
Next, the training process of the first stage will be described with reference to FIG. 3. The first training data set may include a plurality of training data, each training data including a third text, a fourth text, and a true semantic matching result of the third text and the fourth text. As shown in FIG. 3, training the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer using the first training data set includes the following steps.
First, in step S301, for each training data in at least a part of the training data in the first training data set, word segmentation and concatenation processing is performed on the third text and the fourth text to obtain a second word sequence. This step is similar to step S101 described above with reference to FIG. 1, except that the input data differ. In FIG. 1, the actual user question and standard question are input, whereas in FIG. 3 user questions and standard questions for training are input. Unlike the actual user question and standard question, each training pair of user question and standard question is accompanied by a true semantic matching result that serves as the ground truth.
Then, in step S302, the second word sequence is provided to the embedding network and converted into a second word vector through the embedding network. This step is similar in processing to step S102 described above with reference to FIG. 1.
Next, in step S303, the second word vector is provided to the transformation network, and the feature vector output by the nth transformation layer in the transformation network is provided to the classification network corresponding thereto.
Then, in step S304, a first loss function between the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer and the true semantic matching result is calculated.
Next, in step S305, the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer are trained based on the first loss function.
Specifically, for each training data in at least a part of the training data in the first training data set, the processes of steps S301 to S305 described above are repeatedly performed to continuously adjust, based on the first loss function, the network parameters of the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer. The first-stage training process ends when the first loss function converges, and the network parameters of the embedding network, the transformation network, and the classification network corresponding to the Nth transformation layer are then fixed for the subsequent second-stage training process.
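One possible shape of a single first-stage training step is sketched below; `model` returning the Nth layer's [CLS] feature, `teacher_head` outputting pre-Softmax logits, and the choice of cross-entropy for the first loss function are assumptions of this sketch, not limitations of the disclosure:

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # one possible form of the first loss function

def first_stage_step(model, teacher_head, optimizer, tokens, labels):
    h_n = model(tokens)               # embedding network + N transformation layers
    logits = teacher_head(h_n)        # teacher classification network (pre-Softmax)
    loss = criterion(logits, labels)  # first loss vs. true matching results
    optimizer.zero_grad()
    loss.backward()                   # gradients flow into the embedding network,
    optimizer.step()                  # the transformation network, and the teacher head
    return loss.item()
```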
Next, the training process of the second stage will be described with reference to FIG. 4. The second training data set includes a plurality of training data, each training data including a fifth text and a sixth text. As shown in FIG. 4, training the classification networks corresponding to the first to (N-1)th transformation layers using the second training data set includes the following steps.
First, in step S401, for each training data in at least a part of the training data in the second training data set, word segmentation and concatenation processing is performed on the fifth text and the sixth text to obtain a third word sequence. This step is similar to step S101 described above with reference to FIG. 1, except that the input data differ. In FIG. 1, the actual user question and standard question are input, whereas in FIG. 4 user questions and standard questions for training are input. As in the first stage, each training pair of user question and standard question has a semantic matching result that serves as the ground truth. However, unlike the first-stage training in FIG. 3, the ground truth here is not a true semantic matching result, but the semantic matching prediction result output by the trained teacher classification network.
Then, in step S402, the third word sequence is provided to the embedding network and converted into a third word vector through the embedding network. This step is similar in processing to step S102 described above with reference to FIG. 1.
Next, in step S403, the third word vector is provided to the transformation network, and the feature vectors output by the first to (N-1)th transformation layers in the transformation network are provided to the classification networks corresponding thereto, respectively.
Then, in step S404, a second loss function between the semantic matching prediction results output by the classification networks corresponding to the first to (N-1)th transformation layers and the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer is calculated.
For example, as one possible implementation, calculating the second loss function between the semantic matching prediction results output by the classification networks corresponding to the first to (N-1)th transformation layers and the semantic matching prediction result output by the classification network corresponding to the Nth transformation layer may include the following steps.
First, the KL divergence (also called relative entropy) between the semantic matching prediction result output by each of the classification networks corresponding to the first to (N-1)-th transformation layers and the semantic matching prediction result output by the classification network corresponding to the N-th transformation layer is calculated. Taking 2-class classification as an example, let $p_i^s$ denote the prediction result of the i-th student classification network:

$$p_i^s = \mathrm{Student\_Classifier}_i(H_i) = (P_0, P_1) \tag{7}$$

where $P_0$ and $P_1$ are the predicted probabilities that the user question and the standard question do not match and match semantically, respectively. In addition, let $p^t$ be the prediction result of the teacher classification network; similarly,

$$p^t = \mathrm{Teacher\_Classifier}(H_N) \tag{8}$$

where Teacher_Classifier denotes the teacher classification network, Student_Classifier_i denotes the i-th student classification network, $H_N$ denotes the feature vector output by the N-th transformation layer, and $H_i$ denotes the feature vector output by the i-th transformation layer. The KL divergence can then be used to measure the difference between $p_i^s$ and $p^t$:

$$D_{\mathrm{KL}}\!\left(p^t \,\middle\|\, p_i^s\right) = \sum_{j} p_j^t \log \frac{p_j^t}{p_{i,j}^s} \tag{9}$$

Then, the sum of all calculated KL divergences is taken as the second loss function. Since there are (N-1) student classification networks in total in the semantic matching model described above, the second loss function is the sum of the KL divergences of all the student classification networks. Let $L_{\mathrm{KD}}$ denote the second loss function; then:

$$L_{\mathrm{KD}} = \sum_{i=1}^{N-1} D_{\mathrm{KL}}\!\left(p^t \,\middle\|\, p_i^s\right) \tag{10}$$
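As a sketch, the second loss function of formulas (9) and (10) can be written in PyTorch as follows; the tensors are assumed to already be softmax probability outputs of the classification networks, and the function name is illustrative only.

```python
import torch
import torch.nn.functional as F

def second_loss(student_probs, teacher_probs):
    """Sum over the N-1 students of KL(p^t || p_i^s), cf. formulas (9)-(10).

    student_probs: list of (batch, 2) probability tensors, one per student
    teacher_probs: (batch, 2) probability tensor from the teacher network
    """
    # F.kl_div expects log-probabilities as input and probabilities as
    # target, and computes target * (log target - input) summed over
    # elements, i.e. KL(target || input distribution).
    return sum(F.kl_div(p_s.log(), teacher_probs, reduction="batchmean")
               for p_s in student_probs)

# Toy usage with random 2-class distributions for three student networks.
teacher = torch.softmax(torch.randn(4, 2), dim=-1)
students = [torch.softmax(torch.randn(4, 2), dim=-1) for _ in range(3)]
print(second_loss(students, teacher))
```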
Next, in step S405, the classification networks corresponding to the first to (N-1)-th transformation layers are trained based on the second loss function.
Specifically, for at least a part of the training data in the second training data set, the processes of steps S401 to S405 described above are repeated for each piece of training data, so that the network parameters of the classification networks corresponding to the first to (N-1)-th transformation layers are continuously adjusted based on the second loss function. The second-stage training process ends when the second loss function converges.
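A compact, self-contained sketch of this second stage follows, under the same toy assumptions as before: the transformation layers and the teacher classifier are frozen, and only the N-1 student classification networks are updated by the distillation loss. All sizes and names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, DIM = 4, 256
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
     for _ in range(N)])
teacher = nn.Linear(DIM, 2)               # classifier of layer N, frozen
students = nn.ModuleList([nn.Linear(DIM, 2) for _ in range(N - 1)])

for p in list(layers.parameters()) + list(teacher.parameters()):
    p.requires_grad_(False)               # first-stage parameters stay fixed

optimizer = torch.optim.Adam(students.parameters(), lr=1e-4)
x = torch.randn(8, 32, DIM)               # stand-in for third word vectors

for step in range(50):                    # repeat until the second loss converges
    feats, h = [], x
    with torch.no_grad():                 # frozen transformation network
        for layer in layers:
            h = layer(h)
            feats.append(h[:, 0])         # per-layer [CLS] feature
        p_t = F.softmax(teacher(feats[-1]), dim=-1)
    loss = sum(F.kl_div(F.log_softmax(students[i](feats[i]), dim=-1),
                        p_t, reduction="batchmean")
               for i in range(N - 1))     # second loss: sum of KL divergences
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```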
In the above, the training process of each network in the semantic matching model according to the embodiment of the present disclosure is described in detail with reference to fig. 3 and 4. Next, an application scenario of the semantic matching method according to an embodiment of the present disclosure will be described with reference to fig. 5 and 6.
Fig. 5 is a schematic diagram illustrating a search scenario of a browser to which the semantic matching method according to an embodiment of the present disclosure is applied. Specifically, in a search scenario, a question-and-answer library including a plurality of standard questions and their associated answers may be stored in advance. When a user enters a user question (e.g., "how to play royal glory") in the search box 501, the user question (as the first text) and each of the plurality of standard questions (as the second text) in the question-and-answer library may be processed according to the semantic matching method described above with reference to figs. 1 to 4 to generate a semantic matching prediction result for the user question and each standard question. Alternatively, a subset of the standard questions may be screened out of the question-and-answer library in advance through other processing for semantic matching with the user question. Then, the standard question with the highest semantic matching degree is selected as the standard question corresponding to the user question and displayed in block 502. Here, when the semantic matching prediction result includes a probability value, the larger the probability value indicating that the two texts match semantically, the higher their matching degree can be considered. The user can directly browse the answer corresponding to the question by clicking on block 502.
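The retrieval logic of this scene can be sketched as follows. Here `match_probability` is a hypothetical stand-in for the full model of figs. 1 to 4; a simple string-similarity placeholder is used so that the sketch runs on its own.

```python
import difflib

def match_probability(user_q: str, standard_q: str) -> float:
    # Placeholder stand-in for the semantic matching model of figs. 1-4.
    return difflib.SequenceMatcher(None, user_q, standard_q).ratio()

def best_standard_question(user_q, qa_library, threshold=0.7):
    """Score the user question against every standard question and return
    the best-matching (question, answer) pair, or None below the threshold."""
    prob, best = max((match_probability(user_q, q), (q, a))
                     for q, a in qa_library)
    return best if prob > threshold else None

qa_library = [("why is the sky blue", "Because of Rayleigh scattering ..."),
              ("how to play royal glory", "Pick a lane and ...")]
print(best_standard_question("why the sky is blue", qa_library))
```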
In addition, in fig. 5 it can be seen that the standard question best matching the user question is displayed after the user inputs a question and before the search button 503 is clicked. After the user inputs the question, other questions suggested based on the user question, such as "how the royal glory skin looks" and "royal glory white and angel relationship", are displayed in block 504. When the user wishes to switch the entered question to one of these suggested questions, the switch may be performed by clicking on the corresponding question in block 504. After the user question is switched, the standard question that best matches the switched user question, together with its answer, may be correspondingly displayed.
Fig. 6 is another schematic diagram illustrating a search scenario of a browser to which the semantic matching method according to an embodiment of the present disclosure is applied. In fig. 6, the user enters the user question "why the sky is blue" in search box 601; based on a process similar to that described above with reference to fig. 5, the standard question in the question-and-answer library that best matches the user question, "why the sky is blue, what rationale", is determined and displayed in block 602. The user may directly browse the answer corresponding to the question by clicking on block 602.
It can be seen that, unlike the interface shown in fig. 5, fig. 6 shows the interface after the user clicks the search button. The web page search results associated with the user question are displayed in block 603. At this point, the standard question and its answer that best match the user's question are still displayed in block 602.
Therefore, in a search scenario, as long as the user has completed inputting the user's question, the standard question and its answer that most closely match it are displayed regardless of whether the user has clicked the search button.
When the semantic matching method is applied to a browser scene, high-quality question-answer pairs can be matched to user questions; the method is a key scheme for directly retrieving and recalling question-answer pairs. On the service criteria evaluation set, compared with the prior art, applying the semantic matching method according to the present disclosure significantly improves each classification evaluation index: for example, AUC (Area Under Curve) increases by 8%, ACC (accuracy) increases by 7.7%, and TOP1 recall (which reflects the proportion of correctly judged positive cases to total positive cases) increases by 33.4%. After the service was deployed online, question-answer exposure PV (page views) rose by 23% and click PV rose by 15%. Based on these indexes, it can be seen that in a search scene applying the semantic matching method according to the present disclosure, question-answer pairs can be recalled with high quality and user requirements can be met accurately. Moreover, on the service standard evaluation set, the recall rate is significantly improved without reducing the accuracy rate.
Although fig. 5 and fig. 6 illustrate a case where the semantic matching method according to the embodiment of the present disclosure is applied to a search scene of a browser, those skilled in the art will understand that the semantic matching method according to the embodiment of the present disclosure may also be similarly applied to any other scene where semantic analysis and matching of text are required.
In the above, the semantic matching method according to the embodiment of the present disclosure has been described in detail with reference to figs. 1 to 6. With this semantic matching method, once the prediction result output by the classification network corresponding to a shallow transformation layer meets a predetermined condition (for example, its confidence is high enough), the processing of the deeper transformation layers is not performed. With this arrangement, when the user question is simple, a satisfactory semantic matching prediction result can be output early by the classification network corresponding to a shallow transformation layer, without waiting for the semantic matching prediction result of the classification network corresponding to the last transformation layer. This reduces the computational burden of the semantic matching model and improves its inference speed, without significantly degrading semantic matching performance.
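Before turning to the apparatus, the early-exit inference just summarized can be sketched as follows, using the toy module shapes from the earlier sketches and the confidence rule of claim 2 (exit as soon as the maximum class probability exceeds a threshold); all names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, DIM = 4, 256
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
     for _ in range(N)])
classifiers = nn.ModuleList([nn.Linear(DIM, 2) for _ in range(N)])

def early_exit_predict(word_vec, threshold=0.8):
    """Run transformation layers in series; after each layer, the paired
    classification network predicts match probabilities, and the deeper
    layers are skipped as soon as the confidence exceeds the threshold."""
    h = word_vec
    for i, (layer, clf) in enumerate(zip(layers, classifiers), start=1):
        h = layer(h)
        probs = F.softmax(clf(h[:, 0]), dim=-1)  # [CLS] feature of layer i
        if probs.max().item() > threshold or i == N:
            return probs, i                      # exit at layer i

probs, exit_layer = early_exit_predict(torch.randn(1, 32, DIM))
print(f"exited at layer {exit_layer}, match probability {probs[0, 1]:.3f}")
```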
Next, a semantic matching apparatus according to an embodiment of the present disclosure will be described with reference to fig. 7. As shown in fig. 7, the semantic matching apparatus 700 includes: a word segmentation and concatenation unit 701, an embedding unit 702, a transformation unit 703 and a classification unit 704.
The word segmentation and concatenation unit 701 is configured to perform word segmentation and concatenation processing on the input first text and the input second text to obtain a first word sequence. For example, when the semantic matching method according to the embodiment of the present disclosure is applied to semantic matching between a user question and a standard question library in a search scene, the first text may be the user question (query) and the second text may be a standard question (question). Suppose that the first text is represented as $Q = (q_1, q_2, \ldots, q_P)$ and the second text as $T = (t_1, t_2, \ldots, t_M)$, where P and M denote the word sequence lengths of the user question and the standard question, respectively; $q_1, \ldots, q_P$ represent the words obtained by performing word segmentation processing on the first text by the word segmentation and concatenation unit 701, and $t_1, \ldots, t_M$ represent the words obtained by performing word segmentation processing on the second text. Then, in order to construct a standard input for the subsequent networks, the word segmentation and concatenation unit 701 concatenates the two word sequences of the first text and the second text into the first word sequence, which may be represented, for example, as $([\mathrm{CLS}], q_1, \ldots, q_P, [\mathrm{SEP}], t_1, \ldots, t_M, [\mathrm{SEP}])$. The [CLS] mark is placed at the head of the first sentence, and the [SEP] mark is used to separate two input sentences; since both the first text and the second text are input, a [SEP] mark is added between the first text and the second text.
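For illustration, a minimal sketch of this unit follows, with whitespace splitting standing in for a real word segmenter:

```python
def segment_and_concat(first_text: str, second_text: str, tokenize=str.split):
    """Build the first word sequence [CLS] q1..qP [SEP] t1..tM [SEP].

    `tokenize` is a placeholder; a deployed system would use a proper
    Chinese word segmenter or subword tokenizer instead of str.split.
    """
    return (["[CLS]"] + tokenize(first_text)
            + ["[SEP]"] + tokenize(second_text) + ["[SEP]"])

print(segment_and_concat("how to play royal glory", "royal glory tutorial"))
# ['[CLS]', 'how', 'to', 'play', 'royal', 'glory',
#  '[SEP]', 'royal', 'glory', 'tutorial', '[SEP]']
```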
The embedding unit 702 comprises an embedding network and converts the first word sequence into a first word vector via the embedding network. Character-type data of an input text can be converted into numerical-type data by processing performed by the embedding unit 702.
The transformation unit 703 comprises a transformation network further comprising first to nth transformation layers connected in series, wherein the first transformation layer receives the first word vector as an input vector and the other transformation layers receive, as their input vectors, feature vectors generated by a previous transformation layer connected in series thereto, each transformation layer performs feature extraction on the input vectors and generates feature vectors, and each transformation layer has a classification network corresponding thereto.
The transformation network is used for extracting semantic information layer by layer from the input first word vector. For example, as one possible implementation, each transformation layer in the transformation network may obtain feature information in the input word vector through a self-attention mechanism. The specific processing has been described above.
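As a sketch of the self-attention computation each transformation layer may use (single head, with only the three projection matrices, omitting the feed-forward and residual parts of a full layer):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention: every position attends to every
    other position of the input sequence to extract contextual features."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(1, 32, 256)                      # (batch, sequence, dim)
w_q, w_k, w_v = (torch.randn(256, 256) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # torch.Size([1, 32, 256])
```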
The classification unit 704 includes N classification networks, one in one-to-one correspondence with each transformation layer, and each classification network is configured to receive the feature vector from its corresponding transformation layer and generate a semantic matching prediction result of the first text and the second text for that transformation layer.
Wherein the transformation network and the N classification networks are configured to: starting from the first conversion layer, carrying out the following operations layer by layer until semantic matching results of the first text and the second text are generated: providing the feature vectors generated by the transform layer to the corresponding classification network; generating a semantic matching prediction result based on the received feature vector by using the classification network corresponding to the conversion layer; and generating semantic matching results of the first text and the second text based on the semantic matching prediction results under the condition that the semantic matching prediction results meet preset conditions.
As described above, the network structures of the plurality of classification networks corresponding to the first to N-th transformation layers are the same; only the specific network parameters differ. Specifically, each of the plurality of classification networks may include: a full connection layer, a classification conversion layer, and a normalization layer.
The classification unit 704 may be further configured to: provide the feature vector output by the transformation layer corresponding to the classification network to the full connection layer, which converts it into a feature vector whose dimension corresponds to the number of categories of the semantic matching prediction result; provide the feature vector output by the full connection layer to the classification conversion layer, which outputs a transformed feature vector; and provide the transformed feature vector to the normalization layer, which performs normalization on the elements therein, the normalized feature vector being taken as the semantic matching prediction result.
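A sketch of one such classification network follows. The disclosure does not name the activation used in the classification conversion layer, so tanh here is an assumption, as are the hidden dimension and class count.

```python
import torch
import torch.nn as nn

class ClassificationNetwork(nn.Module):
    def __init__(self, hidden_dim=256, num_classes=2):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_classes)  # full connection layer

    def forward(self, feature):
        x = self.fc(feature)           # -> dimension = number of categories
        x = torch.tanh(x)              # classification conversion layer (assumed tanh)
        return torch.softmax(x, -1)    # normalization layer -> prediction result

probs = ClassificationNetwork()(torch.randn(1, 256))
print(probs)                           # e.g. tensor([[0.38, 0.62]], ...)
```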
Additionally, the semantic matching prediction result comprises a probability value indicating whether the first text and the second text match, and the predetermined condition comprises: the probability value is greater than a predetermined threshold. The closer the probability value output by the classification network is to 1, the higher its confidence is considered. For example, the predetermined threshold may be set to 0.8, 0.7, etc.
In addition, it is noted that the transformation network and the classification networks may be further configured to stop the operations of the other transformation layers and their corresponding classification networks in the transformation network when the semantic matching prediction result of the classification network corresponding to the i-th transformation layer meets a predetermined condition, where i is an integer greater than or equal to 1. In other words, once the prediction result output by the classification network corresponding to a shallow transformation layer meets a predetermined condition (for example, its confidence is high enough), the processing of the deeper transformation layers is not performed. With this arrangement, when the user question is simple, a satisfactory semantic matching prediction result can be output early by the classification network corresponding to a shallow transformation layer, without using the semantic matching prediction result of the classification network corresponding to the last transformation layer. This reduces the computational burden of the semantic matching model and improves its inference speed without significantly degrading semantic matching performance, enabling the semantic matching model according to the present disclosure to be truly deployed online.
In addition, the specific training procedures of the embedding network, the transformation network, and the classification network used by the embedding unit 702, the transformation unit 703, and the classification unit 704 in performing the processing have been described in detail above. For the sake of brevity, no further description is provided herein.
Furthermore, methods or apparatus in accordance with embodiments of the present disclosure may also be implemented by way of the architecture of the computing device 800 shown in fig. 8. As shown in fig. 8, the computing device 800 may include a bus 810, one or more CPUs 820, a read-only memory (ROM) 830, a random access memory (RAM) 840, a communication port 850 connected to a network, input/output components 860, a hard disk 870, and the like. A storage device in the computing device 800, such as the ROM 830 or the hard disk 870, may store various data or files used in the processing and/or communication of the semantic matching method provided by the present disclosure, as well as program instructions executed by the CPU. Of course, the architecture shown in fig. 8 is merely exemplary, and one or more components of the computing device shown in fig. 8 may be omitted as needed when implementing different devices.
In addition, according to another aspect of the present disclosure, there is provided a semantic matching apparatus, which may include: a receiving device for receiving an input user question; the semantic matching device described above, for determining the semantic matching result between each of at least a part of the standard questions in a standard question-answer library and the user question; and a display device for displaying the standard question that best matches the user question.
Embodiments of the present disclosure may also be implemented as a computer-readable storage medium. A computer readable storage medium according to an embodiment of the present disclosure has computer readable instructions stored thereon. The computer readable instructions, when executed by a processor, may perform the semantic matching method according to embodiments of the present disclosure described with reference to the above figures. The computer-readable storage medium includes, but is not limited to, volatile memory and/or non-volatile memory, for example. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc.
Additionally, embodiments of the disclosure may also be implemented as a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the semantic matching method.
Thus far, the semantic matching method and apparatus according to the embodiments of the present disclosure have been described in detail with reference to figs. 1 to 8. With this semantic matching method and apparatus, once the prediction result output by the classification network corresponding to a shallow transformation layer meets a predetermined condition (for example, its confidence is high enough), the processing of the deeper transformation layers is not performed. With this arrangement, when the user question is simple, a satisfactory semantic matching prediction result can be output early by the classification network corresponding to a shallow transformation layer, without using the semantic matching prediction result of the classification network corresponding to the last transformation layer. This reduces the computational burden of the semantic matching model and improves its inference speed, without significantly degrading semantic matching performance.
It should be noted that, in the present specification, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Finally, it should be noted that the series of processes described above includes not only processes performed in time series in the order described herein, but also processes performed in parallel or separately, rather than in time series.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and may also be implemented by software entirely. With this understanding in mind, the technical solutions of the present invention may be embodied in whole or in part in the form of a software product, which can be stored in a storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The present invention has been described in detail, and the principle and embodiments of the present invention are explained herein by using specific examples, which are only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.