Method and device for determining text relevance, readable medium and electronic equipment
1. A method for determining text relevance, the method comprising:
acquiring a text to be searched and a text to be matched corresponding to the text to be searched;
dividing the text to be matched into a plurality of target texts according to preset text dividing elements, wherein the preset text dividing elements are used for representing different dimensions of the text to be matched;
and acquiring the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts.
2. The method according to claim 1, wherein the obtaining the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts comprises:
dividing the text to be searched and each target text according to a word division mode to obtain a word division text to be searched corresponding to the text to be searched and a target word division text corresponding to each target text, wherein the word division text comprises one or more words;
and acquiring the correlation between the text to be searched and the text to be matched according to the to-be-searched word division text and the plurality of target word division texts.
3. The method according to claim 2, wherein the obtaining the correlation between the text to be searched and the text to be matched according to the to-be-searched word division text and the plurality of target word division texts comprises:
obtaining a plurality of word matching matrices corresponding to the plurality of target word division texts according to the to-be-searched word division text and the plurality of target word division texts;
and acquiring the correlation between the text to be searched and the text to be matched according to the word matching matrixes.
4. The method according to claim 3, wherein the obtaining a plurality of word matching matrices corresponding to the plurality of target word division texts according to the to-be-searched word division text and the plurality of target word division texts comprises:
for each target word division text, obtaining the similarity between the word vector corresponding to each word in the to-be-searched word division text and the word vector corresponding to each word in the target word division text, and obtaining the word matching matrix corresponding to the target word division text according to the similarity.
5. The method according to claim 3, wherein the obtaining the correlation between the text to be searched and the text to be matched according to the plurality of word matching matrices comprises:
obtaining a word feature matching vector corresponding to each word matching matrix through a pre-trained vector obtaining model;
and acquiring the correlation between the text to be searched and the text to be matched according to the word feature matching vectors.
6. The method according to claim 5, wherein the obtaining the correlation between the text to be searched and the text to be matched according to the plurality of word feature matching vectors comprises:
acquiring a preset splicing rule;
according to the preset splicing rule, splicing the word feature matching vectors to obtain a plurality of target word feature matching vectors;
obtaining a matching value corresponding to each target word feature matching vector;
and determining the correlation between the text to be searched and the text to be matched according to the maximum matching value.
7. The method of claim 6, wherein the word feature matching vector comprises a first word feature matching vector and a second word feature matching vector, and wherein the target word feature matching vector comprises a first target word feature matching vector and a second target word feature matching vector; the splicing the plurality of word feature matching vectors according to the preset splicing rule to obtain a plurality of target word feature matching vectors comprises:
and according to the preset splicing rule, splicing the plurality of first word feature matching vectors to obtain a plurality of first target word feature matching vectors, and splicing the plurality of second word feature matching vectors to obtain a plurality of second target word feature matching vectors.
8. The method according to claim 7, wherein before said obtaining the matching value corresponding to each of the target word feature matching vectors, the method further comprises:
performing maximum pooling on each first target word feature matching vector to obtain a first pooled word feature matching vector corresponding to the first target word feature matching vector;
performing maximum pooling on each second target word feature matching vector, and then performing average pooling to obtain second pooled word feature matching vectors corresponding to the second target word feature matching vectors;
splicing the plurality of first pooling word feature matching vectors and the plurality of second pooling word feature matching vectors to obtain a plurality of target pooling word feature matching vectors;
the obtaining of the matching value corresponding to each target word feature matching vector includes:
for each target pooling word feature matching vector, inputting the target pooling word feature matching vector into a pre-trained fully connected layer to obtain a matching value corresponding to the target pooling word feature matching vector.
9. The method of claim 6, wherein the obtaining the preset splicing rule comprises:
determining a service scene corresponding to the text to be matched;
and determining a preset splicing rule corresponding to the service scene according to a preset splicing rule incidence relation, wherein the splicing rule incidence relation comprises the corresponding relation between different service scenes and the preset splicing rule.
10. The method according to any one of claims 2 to 9, wherein before the obtaining the correlation between the text to be searched and the text to be matched according to the to-be-searched word division text and the plurality of target word division texts, the method further comprises:
dividing the text to be searched and each target text respectively according to a character division mode to obtain a to-be-searched character division text corresponding to the text to be searched and a target character division text corresponding to each target text, wherein the character division text comprises one or more characters;
the obtaining the correlation between the text to be searched and the text to be matched according to the to-be-searched word division text and the plurality of target word division texts comprises:
acquiring the correlation between the text to be searched and the text to be matched according to the to-be-searched word division text, the plurality of target word division texts, the to-be-searched character division text and the plurality of target character division texts.
11. An apparatus for determining text relevance, the apparatus comprising:
the text acquisition module is used for acquiring a text to be searched and a text to be matched corresponding to the text to be searched;
the first text division module is used for dividing the text to be matched into a plurality of target texts according to preset text division elements, wherein the preset text division elements are used for representing different dimensions of the text to be matched;
and the correlation obtaining module is used for obtaining the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts.
12. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1-10.
13. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 10.
Background
In a search system, it is particularly important to measure the correlation between a keyword input by a user and the search content. At present, the keyword may be segmented into words, and the correlation between the keyword and the search content may be calculated from the word vectors of the segmented words. However, the search content may comprise many parts; for music, for example, the search content may include a singer, a song title, lyrics, and the like.
In the related art, when the search content comprises many parts, the parts are first spliced together, and the correlation between the keyword and the spliced search content is then calculated. However, when the matching rules differ, the calculated correlation between the keyword and the search content also differs, so the accuracy of the determined text correlation is low.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a method of determining text relevance, the method comprising:
acquiring a text to be searched and a text to be matched corresponding to the text to be searched;
dividing the text to be matched into a plurality of target texts according to preset text dividing elements, wherein the preset text dividing elements are used for representing different dimensions of the text to be matched;
and acquiring the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts.
In a second aspect, the present disclosure provides an apparatus for determining text relevance, the apparatus comprising:
the text acquisition module is used for acquiring a text to be searched and a text to be matched corresponding to the text to be searched;
the first text division module is used for dividing the text to be matched into a plurality of target texts according to preset text division elements, wherein the preset text division elements are used for representing different dimensions of the text to be matched;
and the correlation obtaining module is used for obtaining the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect of the present disclosure.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the method of the first aspect of the present disclosure.
According to the above technical solution, a text to be searched and a text to be matched corresponding to the text to be searched are acquired; the text to be matched is divided into a plurality of target texts according to preset text dividing elements, wherein the preset text dividing elements are used for representing different dimensions of the text to be matched; and the correlation between the text to be searched and the text to be matched is acquired according to the text to be searched and the plurality of target texts. That is, when the text to be matched includes multiple dimensions, the dimensions do not need to be spliced. Instead, the text to be matched is divided into a plurality of target texts according to the preset text dividing elements, and the correlation between the text to be searched and the text to be matched is obtained according to the text to be searched and the plurality of target texts. Because the target texts of different dimensions are independent of one another, the calculated correlation is not affected when the matching rules differ, which improves the accuracy of the determined text correlation.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow diagram illustrating a method of determining text relevance in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a second method of determining text relevance in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a third method of determining text relevance in accordance with one illustrative embodiment;
FIG. 4 is a block diagram illustrating a method of determining text relevance in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating an apparatus for determining text relevance in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a second apparatus for determining text relevance in accordance with an illustrative embodiment;
FIG. 7 is a block diagram illustrating a third apparatus for determining text relevance in accordance with an illustrative embodiment;
FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
First, an application scenario of the present disclosure will be explained. Currently, there are three types of methods for acquiring text relevance: (1) manually constructing textual and lexical features and training a traditional machine learning model on them; (2) obtaining a vector representation of a text based on a bag-of-words model, reducing its dimensionality with a word-hashing method to obtain a 128-dimensional text embedding, and calculating the cosine similarity of the two text embeddings to obtain a similarity score, such as word2vec; (3) on the basis of mode (2), performing deep interactive calculation on the embeddings of the two texts and the embeddings of each of their words, such as the current BERT.
The inventor of the present disclosure finds that, in an actual usage scenario, the above-mentioned manner of acquiring text relevance has two problems: (1) the keywords input by the user are not standard, so that the word segmentation accuracy is low, and the accuracy of an acquisition mode of text correlation depending on word segmentation results is low; (2) the content to be retrieved comprises a plurality of dimensions (fields), which may include, for example, singers, song titles, lyrics, etc., in the case of music. In the related art, when the search content includes multiple dimensions, the multiple dimensions in the search content are spliced, and then the correlation between the keyword and the spliced search content is calculated, but when the matching rules are different, the correlation between the calculated keyword and the search content is also different, for example, the correlation is 3 (highest) when the keyword is matched to a singer, the correlation is 2 when the song name is matched, and the correlation is 1 when the song lyrics are matched, so that the accuracy of the determined text correlation is low.
To solve the above problems, the present disclosure provides a method, an apparatus, a readable medium and an electronic device for determining text relevance. The text to be matched is divided into a plurality of target texts according to the preset text dividing elements, and the correlation between the text to be searched and the text to be matched is obtained according to the text to be searched and the plurality of target texts. Because the target texts of different dimensions are independent of one another, the calculated correlation is not affected when the matching rules differ, which improves the accuracy of the determined text relevance.
The present disclosure is described below with reference to specific examples.
FIG. 1 is a flow diagram illustrating a method of determining text relevance, according to an example embodiment, which may include, as shown in FIG. 1:
S101, obtaining a text to be searched and a text to be matched corresponding to the text to be searched.
The text to be searched may be a keyword input by a user, and the text to be matched may be a text corresponding to a recall result obtained by querying with the text to be searched. For example, taking a music player as an example, if the text to be searched is "Zhou Jielun Qilixiang", the pieces of music returned by the music player for that search are the texts to be matched.
It should be noted that the text to be matched may also be any preset text for which a correlation with the text to be searched needs to be determined, which is not limited in this disclosure.
S102, dividing the text to be matched into a plurality of target texts according to preset text dividing elements.
The preset text dividing elements corresponding to different text types may be different, and the preset text dividing elements may be used to represent different dimensions (fields) of the text to be matched. For example, for a music text, the preset text dividing elements may include singers, song titles, lyrics, lyricists, composers, and the like; for a paper text, the preset text dividing elements may include titles, abstracts, body text, and the like.
S103, obtaining the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts.
In this step, after the text to be matched is divided into a plurality of target texts, the text to be searched and each target text may be divided according to a word division mode to obtain a to-be-searched word division text corresponding to the text to be searched and a target word division text corresponding to each target text, where a word division text includes one or more words. The correlation between the text to be searched and the text to be matched is then obtained according to the to-be-searched word division text and the plurality of target word division texts.
For each target word division text, a word matching matrix corresponding to that target word division text may be obtained according to the to-be-searched word division text and the target word division text, and the correlation between the text to be searched and the text to be matched is obtained according to the plurality of word matching matrices.
With this method, the text to be matched is divided into a plurality of target texts according to the preset text dividing elements, and the correlation between the text to be searched and the text to be matched is obtained according to the text to be searched and the plurality of target texts. Because the target texts of different dimensions are independent of one another, the calculated correlation is not affected when the matching rules differ, which improves the accuracy of the determined text correlation.
FIG. 2 is a flow diagram illustrating a second method of determining text relevance, according to an example embodiment, which may include, as shown in FIG. 2:
S201, obtaining a text to be searched and a text to be matched corresponding to the text to be searched.
The text to be searched may be a keyword input by a user, and the text to be matched may be a text corresponding to a recall result obtained by querying with the text to be searched. For example, taking a music player as an example, if the text to be searched is "Zhou Jielun Qilixiang", the pieces of music returned by the music player for that search are the texts to be matched.
It should be noted that the text to be matched may also be any preset text for which a correlation with the text to be searched needs to be determined, which is not limited in this disclosure.
S202, dividing the text to be matched into a plurality of target texts according to preset text dividing elements.
The preset text dividing elements corresponding to different text types may be different, and the preset text dividing elements may be used to represent different dimensions (fields) of the text to be matched. For example, for a music text, the preset text dividing elements may include singers, song titles, lyrics, lyricists, composers, and the like; for a paper text, the preset text dividing elements may include titles, abstracts, body text, and the like.
In this step, after the text to be searched and the text to be matched corresponding to the text to be searched are obtained, the preset text dividing elements may be obtained according to the type of the text to be matched, and the text to be matched may then be divided into a plurality of target texts according to the preset text dividing elements. For example, if the text to be searched is "Zhou Jielun Qilixiang" and the text to be matched that a search engine returns for it is "Qilixiang (Live) - Zhou Jielun (Jay Chou) lyrics: Fang Wenshan music: Zhou Jielun the sparrows outside the window chatter on the telegraph pole, you say this line has a real feeling of summer ……", the text to be matched can be divided into 5 target texts: the 1st target text is "Qilixiang (Live)", the 2nd target text is "Zhou Jielun (Jay Chou)", the 3rd target text is "lyrics: Fang Wenshan", the 4th target text is "music: Zhou Jielun", and the 5th target text is "the sparrows outside the window chatter on the telegraph pole, you say this line has a real feeling of summer ……".
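The division in step S202 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the text to be matched is already available as a record keyed by field name; the field names, the `PRESET_DIVISION_ELEMENTS` table and the function `divide_text_to_be_matched` are hypothetical and are not part of the disclosure.

```python
# Hypothetical sketch: field names and the record layout are assumptions.
PRESET_DIVISION_ELEMENTS = {
    "music": ["song_title", "singer", "lyricist", "composer", "lyrics"],
    "paper": ["title", "abstract", "body"],
}


def divide_text_to_be_matched(record: dict, text_type: str) -> list:
    """Return one (element, target text) pair per preset dividing element.

    Elements that are missing or empty in the record are skipped, so every
    returned target text corresponds to a dimension that is actually present.
    """
    elements = PRESET_DIVISION_ELEMENTS[text_type]
    return [(name, record[name]) for name in elements if record.get(name)]


record = {
    "song_title": "Qilixiang (Live)",
    "singer": "Zhou Jielun (Jay Chou)",
    "lyricist": "lyrics: Fang Wenshan",
    "composer": "music: Zhou Jielun",
    "lyrics": "the sparrows outside the window chatter on the telegraph pole",
}
target_texts = divide_text_to_be_matched(record, "music")
```

Keeping the element name next to each target text lets later steps apply a splicing rule per dimension without re-parsing the text.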
S203, dividing the text to be searched and each target text according to a word division mode to obtain a word division text to be searched corresponding to the text to be searched and a target word division text corresponding to each target text.
The word segmentation text may include one or more words, and different words may be separated by a preset symbol, for example, the preset symbol may be "##".
In this step, after the text to be searched and the plurality of target texts corresponding to the text to be matched are obtained, the text to be searched may be divided according to the word division manner to obtain a word division text to be searched corresponding to the text to be searched, and each target text is divided to obtain a target word division text corresponding to each target text. The word division mode may be any word division mode in the prior art, and the present disclosure does not limit this.
For example, if the text to be searched is "Zhou Jielun Qilixiang", it may be divided according to the word division mode to obtain the to-be-searched word division text "Zhou Jielun##Qilixiang"; if the text to be searched is "pop hunna corrvette", it may be divided according to the word division mode to obtain the to-be-searched word division text "pop hunn##a corrv##ette". If the 1st target text is "Qilixiang (Live)", it may be divided according to the word division mode to obtain the target word division text "Qilixiang##(Live)" corresponding to the 1st target text; if the 5th target text is "the sparrows outside the window chatter on the telegraph pole, you say this line has a real feeling of summer ……", it may be divided according to the word division mode to obtain the target word division text "the sparrows##outside the window##chatter##on the telegraph pole##you##say##this line##has##a real##feeling##of summer##……". The 2nd, 3rd and 4th target texts may be divided in the same way as the 1st and 5th target texts, which is not described here again.
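As a rough illustration of step S203, the sketch below joins the words produced by a word division mode with the preset symbol "##". The disclosure leaves the word division mode open, so the greedy longest-match divider and its `LEXICON` are stand-in assumptions chosen only so that the example reproduces the division of "Zhou Jielun Qilixiang" given above.

```python
SEPARATOR = "##"  # preset symbol used in the examples of this step

# Hypothetical lexicon of multi-token words; a real system would use a
# proper word segmenter instead.
LEXICON = {"Zhou Jielun", "Qilixiang"}


def divide_into_words(text: str, lexicon=LEXICON, max_span: int = 3) -> list:
    """Greedy longest-match division over whitespace-separated tokens."""
    tokens = text.split()
    words, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first; a single token always succeeds.
        for span in range(min(max_span, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + span])
            if span == 1 or candidate in lexicon:
                words.append(candidate)
                i += span
                break
    return words


def to_word_division_text(text: str) -> str:
    """Build the word division text by joining words with the separator."""
    return SEPARATOR.join(divide_into_words(text))


query_division = to_word_division_text("Zhou Jielun Qilixiang")
target_division = to_word_division_text("Qilixiang (Live)")
```

Splitting a word division text on `SEPARATOR` recovers the word list needed for the word matching matrices in the next step.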
S204, obtaining the correlation between the text to be searched and the text to be matched according to the to-be-searched word division text and the plurality of target word division texts.
In this step, after the word segmentation text to be searched and the target word segmentation text corresponding to each target text are obtained, a plurality of word matching matrices corresponding to the target word segmentation text may be obtained according to the word segmentation text to be searched and the plurality of target word segmentation texts, and the correlation between the text to be searched and the text to be matched is obtained according to the plurality of word matching matrices.
The word vector corresponding to each word in the to-be-searched word division text and the word vector corresponding to each word in each target word division text may be obtained by word embedding. Then, for each target word division text, the similarity between the word vector corresponding to each word in the to-be-searched word division text and the word vector corresponding to each word in the target word division text may be obtained, and the word matching matrix corresponding to the target word division text may be obtained according to the similarity.
Illustratively, take the to-be-searched word division text "Zhou Jielun##Qilixiang" and the 1st target word division text "Qilixiang##(Live)" from step S203. First, the word vectors corresponding to "Qilixiang" and "(Live)" in the target word division text and the word vectors corresponding to "Zhou Jielun" and "Qilixiang" in the to-be-searched word division text may be obtained. Then four similarities may be obtained: between the word vector of "Zhou Jielun" and that of "Qilixiang", between the word vector of "Zhou Jielun" and that of "(Live)", between the word vector of "Qilixiang" and that of "Qilixiang", and between the word vector of "Qilixiang" and that of "(Live)". The word matching matrix corresponding to the target word division text "Qilixiang##(Live)" is obtained from these four similarities and may be a 2-by-2 matrix.
It should be noted that the above example only covers the 1st target word division text; a word matching matrix also needs to be obtained for each of the 2nd, 3rd, 4th and 5th target word division texts. In addition, the similarity may include dot product similarity and cosine similarity, in which case 2 word matching matrices are obtained for the target word division text "Qilixiang##(Live)", one per similarity, so that the correlation between the text to be searched and the text to be matched obtained from the word matching matrices is more accurate. The 2 word matching matrices are processed in the same way as a single word matching matrix, and the present disclosure takes a single word matching matrix as an example for explanation.
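The construction of the word matching matrices can be sketched as follows. The 2-dimensional word vectors in `EMBEDDINGS` are made-up placeholder values, not trained embeddings, and the helper names are hypothetical; the sketch only shows how one matrix per similarity (dot product and cosine) could be filled in.

```python
import math


def dot(u, v):
    """Dot product similarity of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def word_matching_matrix(query_vectors, target_vectors, similarity):
    """Rows follow the query words, columns follow the target words."""
    return [[similarity(q, t) for t in target_vectors] for q in query_vectors]


# Placeholder embeddings (assumed values, for illustration only).
EMBEDDINGS = {
    "Zhou Jielun": [1.0, 0.0],
    "Qilixiang": [0.0, 2.0],
    "(Live)": [1.0, 1.0],
}
query_vectors = [EMBEDDINGS[w] for w in ["Zhou Jielun", "Qilixiang"]]
target_vectors = [EMBEDDINGS[w] for w in ["Qilixiang", "(Live)"]]

dot_matrix = word_matching_matrix(query_vectors, target_vectors, dot)
cos_matrix = word_matching_matrix(query_vectors, target_vectors, cosine)
```

Both matrices are 2 by 2 here because the query and the 1st target word division text each contain two words.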
After the plurality of word matching matrices corresponding to the target word division texts are obtained, a word feature matching vector corresponding to each word matching matrix may be obtained through a pre-trained vector acquisition model, and the correlation between the text to be searched and the text to be matched is obtained according to the plurality of word feature matching vectors. The vector acquisition model may be obtained by training a convolutional neural network model with an existing model training method, which is not described here again; the convolutional neural network model may comprise, for example, two 1 × 1 convolutional layers and two 3 × 3 convolutional layers.
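To make the role of the convolutional layers concrete, the sketch below applies a stack of single-channel convolutions to a word matching matrix. It is only an approximation under stated assumptions: the kernels are untrained placeholders, the layer order (two 1 × 1 layers followed by two 3 × 3 layers), the zero padding and the absence of activation functions are all assumptions, and the output is left as a feature map because the disclosure does not state how it is turned into a vector.

```python
def conv2d_same(matrix, kernel):
    """Single-channel 2-D cross-correlation with zero padding ("same" size).

    `kernel` is a square k x k list of lists with k odd.
    """
    k = len(kernel)
    pad = k // 2
    rows, cols = len(matrix), len(matrix[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    # Positions outside the matrix count as zero padding.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += kernel[i][j] * matrix[rr][cc]
            out[r][c] = total
    return out


def vector_acquisition_model(matching_matrix, kernels):
    """Apply the convolution kernels one after another."""
    features = matching_matrix
    for kernel in kernels:
        features = conv2d_same(features, kernel)
    return features


# Untrained placeholder kernels: two 1 x 1 layers, then two 3 x 3 layers.
IDENTITY_1X1 = [[1.0]]
MEAN_3X3 = [[1.0 / 9.0] * 3 for _ in range(3)]
PLACEHOLDER_KERNELS = [IDENTITY_1X1, IDENTITY_1X1, MEAN_3X3, MEAN_3X3]

features = vector_acquisition_model([[0.0, 1.0], [4.0, 2.0]], PLACEHOLDER_KERNELS)
```

In a trained model the kernels would be learned, and several output channels could be produced per matrix.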
Further, after the plurality of word feature matching vectors are obtained, a preset splicing rule may be obtained, the plurality of word feature matching vectors are spliced according to the preset splicing rule to obtain a plurality of target word feature matching vectors, a matching value corresponding to each target word feature matching vector is obtained, and the correlation between the text to be searched and the text to be matched is determined according to the maximum matching value.
The preset splicing rule may be determined according to the service scene corresponding to the text to be matched. In a possible implementation, the service scene corresponding to the text to be matched may be determined, and the preset splicing rule corresponding to the service scene is determined through a preset splicing rule association relationship, where the splicing rule association relationship comprises the correspondence between different service scenes and preset splicing rules. Illustratively, taking the text to be matched as a music text as an example, the preset splicing rule may be: the singer, the song title, the splicing of the singer and the song title, the splicing of the singer and the lyrics, and the splicing of the song title and the lyrics.
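The lookup of a preset splicing rule and the splicing itself can be sketched as below. The association table `SPLICING_RULES_BY_SCENE`, the field names and the use of plain list concatenation as the splicing operation are assumptions made for illustration; the rule for the music scene mirrors the five combinations listed above.

```python
# Hypothetical association between service scenes and splicing rules.
# Each rule entry names the fields whose vectors are spliced together.
SPLICING_RULES_BY_SCENE = {
    "music": [
        ("singer",),
        ("song_title",),
        ("singer", "song_title"),
        ("singer", "lyrics"),
        ("song_title", "lyrics"),
    ],
}


def get_preset_splicing_rule(scene: str) -> list:
    """Look up the preset splicing rule for a service scene."""
    return SPLICING_RULES_BY_SCENE[scene]


def splice(field_vectors: dict, rule: list) -> list:
    """Return one target vector per rule entry by concatenating, in order,
    the word feature matching vectors of the fields named in that entry."""
    targets = []
    for fields in rule:
        spliced = []
        for name in fields:
            spliced.extend(field_vectors[name])
        targets.append(spliced)
    return targets


# Placeholder word feature matching vectors, one per field.
field_vectors = {
    "singer": [0.9, 0.1],
    "song_title": [0.8],
    "lyrics": [0.2, 0.3],
}
targets = splice(field_vectors, get_preset_splicing_rule("music"))
```

Each of the five target vectors is later scored separately, so a match on the singer alone and a match on the singer plus the song title remain distinguishable.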
It should be noted that, in order to improve the accuracy of the determined text relevance, when the vector acquisition model is used to acquire the word feature matching vector corresponding to each word matching matrix, the channel may be expanded, and for each word matching matrix, 2 word feature matching vectors corresponding to the word matching matrix may be acquired through the vector acquisition model. The word feature matching vector may include a first word feature matching vector and a second word feature matching vector, and the target word feature matching vector may include a first target word feature matching vector and a second target word feature matching vector. When the plurality of word feature matching vectors are spliced, the plurality of first word feature matching vectors can be spliced according to the preset splicing rule to obtain a plurality of first target word feature matching vectors, and the plurality of second word feature matching vectors are spliced to obtain a plurality of second target word feature matching vectors.
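The following sketch illustrates, under stated assumptions, how a convolutional model with 1 × 1 and 3 × 3 kernels can turn one matching matrix into 2 feature outputs. The layer order, the absence of activation functions, and the fixed placeholder kernels are assumptions; a trained vector acquisition model would use learned weights, and its exact structure is not specified in this disclosure.

```python
def conv2d_same(matrix, kernel):
    """Stride-1 2-D convolution with zero padding; output keeps the input shape."""
    k = len(kernel)  # square kernel of odd size (1 or 3 here)
    pad = k // 2
    rows, cols = len(matrix), len(matrix[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < rows and 0 <= c < cols:
                        total += kernel[di][dj] * matrix[r][c]
            out[i][j] = total
    return out

def extract_features(matching_matrix, layers, heads):
    """Apply the shared layers in order, then one kernel per output channel."""
    x = matching_matrix
    for kernel in layers:
        x = conv2d_same(x, kernel)
    return [conv2d_same(x, kernel) for kernel in heads]

# Placeholder kernels standing in for trained weights.
IDENTITY_1X1 = [[1.0]]
IDENTITY_3X3 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
MEAN_3X3 = [[1.0 / 9.0] * 3 for _ in range(3)]

matrix = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.2]]
# Two 1 x 1 layers, one 3 x 3 layer, and a final 3 x 3 layer with 2 output channels.
first, second = extract_features(
    matrix,
    layers=[IDENTITY_1X1, IDENTITY_1X1, IDENTITY_3X3],
    heads=[IDENTITY_3X3, MEAN_3X3],
)
```

Here `first` and `second` play the role of the first and second word feature matching vectors of one word matching matrix; they are kept as 2-D maps so that the later pooling step has something to pool over.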
Further, in order to reduce information loss, after the plurality of first target word feature matching vectors and the plurality of second target word feature matching vectors are obtained, the following processing may be performed: for each first target word feature matching vector, maximum pooling is performed on the first target word feature matching vector to obtain a first pooled word feature matching vector corresponding to the first target word feature matching vector; for each second target word feature matching vector, maximum pooling and then average pooling are performed on the second target word feature matching vector to obtain a second pooled word feature matching vector corresponding to the second target word feature matching vector; the plurality of first pooled word feature matching vectors and the plurality of second pooled word feature matching vectors are spliced to obtain a plurality of target pooled word feature matching vectors; and for each target pooled word feature matching vector, the target pooled word feature matching vector is input into a pre-trained full-connection layer to obtain a matching value corresponding to the target pooled word feature matching vector. Then, the correlation between the text to be searched and the text to be matched is determined according to the maximum matching value.
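The two pooling paths can be sketched as follows. This is one possible reading, not a definitive implementation: it assumes each target feature is a 2-D map whose rows follow the words to be searched, that maximum pooling takes the largest value of the whole map, and that "maximum pooling and then average pooling" takes the maximum of each row and then averages those maxima. The function names and numeric values are hypothetical.

```python
def global_max_pool(feature_map):
    """Maximum pooling: the largest value anywhere in the map."""
    return max(max(row) for row in feature_map)

def max_then_average_pool(feature_map):
    """Maximum of each row (one row per word to be searched), then the mean."""
    row_maxima = [max(row) for row in feature_map]
    return sum(row_maxima) / len(row_maxima)

def target_pooled_vector(first_map, second_map):
    """Splice the first and second pooled results into one vector."""
    return [global_max_pool(first_map), max_then_average_pool(second_map)]

# Placeholder feature maps, for illustration only.
first_map = [[0.1, 0.9], [0.3, 0.2]]
second_map = [[0.5, 1.0], [0.0, 0.5]]
pooled = target_pooled_vector(first_map, second_map)
```

Each pooled result is a single number here because the sketch uses one channel per path; with more channels each pooled result would itself be a vector.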
The following describes the step S204 by taking the text to be matched as a music text as an example.
If the preset splicing rule specifies the singer, the song name, the splicing of the singer and the song name, the splicing of the singer and the lyrics, and the splicing of the song name and the lyrics, then the first word feature matching vectors comprise a first singer word feature matching vector, a first name word feature matching vector and a first lyric word feature matching vector, and the second word feature matching vectors comprise a second singer word feature matching vector, a second name word feature matching vector and a second lyric word feature matching vector. Splicing according to the preset splicing rule yields 5 first target word feature matching vectors: the first singer word feature matching vector (A1 word feature matching vector), the first name word feature matching vector (B1 word feature matching vector), the first singer word feature matching vector + the first name word feature matching vector (C1 word feature matching vector), the first singer word feature matching vector + the first lyric word feature matching vector (D1 word feature matching vector), and the first name word feature matching vector + the first lyric word feature matching vector (E1 word feature matching vector); and 5 second target word feature matching vectors: the second singer word feature matching vector (A2 word feature matching vector), the second name word feature matching vector (B2 word feature matching vector), the second singer word feature matching vector + the second name word feature matching vector (C2 word feature matching vector), the second singer word feature matching vector + the second lyric word feature matching vector (D2 word feature matching vector), and the second name word feature matching vector + the second lyric word feature matching vector (E2 word feature matching vector).
Then, the 5 first target word feature matching vectors may be subjected to maximum pooling processing to obtain 5 first pooled word feature matching vectors: an A1 pooled word feature matching vector, a B1 pooled word feature matching vector, a C1 pooled word feature matching vector, a D1 pooled word feature matching vector and an E1 pooled word feature matching vector; and the 5 second target word feature matching vectors are subjected to maximum pooling and then average pooling to obtain 5 second pooled word feature matching vectors: an A2 pooled word feature matching vector, a B2 pooled word feature matching vector, a C2 pooled word feature matching vector, a D2 pooled word feature matching vector and an E2 pooled word feature matching vector. Next, the A1 pooled word feature matching vector is spliced with the A2 pooled word feature matching vector to obtain an A12 target pooled word feature matching vector, and the B12, C12, D12 and E12 target pooled word feature matching vectors are obtained in the same way by splicing the B1 and B2, the C1 and C2, the D1 and D2, and the E1 and E2 pooled word feature matching vectors, respectively.
And finally, respectively inputting the A12 target pooled word feature matching vector, the B12 target pooled word feature matching vector, the C12 target pooled word feature matching vector, the D12 target pooled word feature matching vector and the E12 target pooled word feature matching vector into the full connection layer to obtain a matching value corresponding to each target pooled word feature matching vector, and determining the correlation between the text to be searched and the text to be matched according to the maximum matching value. The larger the matching value is, the stronger the correlation between the text to be searched and the text to be matched is, and the smaller the matching value is, the weaker the correlation between the text to be searched and the text to be matched is.
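The final scoring step can be sketched as a single-output full-connection layer applied to each target pooled vector, followed by taking the maximum. The pooled values, the weights and the bias below are placeholders standing in for real features and trained parameters; they are assumptions for illustration only.

```python
def dense(vector, weights, bias):
    """Single-output full-connection layer: weighted sum plus bias."""
    return sum(w * x for w, x in zip(weights, vector)) + bias

# Hypothetical target pooled word feature matching vectors.
pooled = {
    "A12": [0.9, 0.75],
    "B12": [0.2, 0.25],
    "C12": [1.0, 0.5],
    "D12": [0.9, 0.5],
    "E12": [0.5, 0.25],
}
WEIGHTS, BIAS = [0.5, 0.5], 0.0  # placeholders for trained parameters

# One matching value per target pooled vector; the largest one is kept.
scores = {name: dense(vector, WEIGHTS, BIAS) for name, vector in pooled.items()}
best = max(scores, key=scores.get)
relevance = scores[best]
```

Taking the maximum means that a strong match in any one combination of dimensions is enough to mark the text to be matched as strongly correlated.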
By adopting the method, the text to be matched is divided into the plurality of target texts according to the preset text dividing elements, the text to be searched and each target text are divided according to the word dividing mode to obtain the plurality of word dividing texts to be searched corresponding to the text to be searched and the plurality of target word dividing texts corresponding to each target text, and the correlation between the text to be searched and the text to be matched is obtained according to the plurality of word dividing texts to be searched and the plurality of target word dividing texts.
FIG. 3 is a flow diagram illustrating a third method of determining text relevance according to an example embodiment, which may include, as shown in FIG. 3:
S301, obtaining a text to be searched and a text to be matched corresponding to the text to be searched.
The text to be searched can be a keyword input by a user, and the text to be matched can be a text corresponding to a recall result obtained after the text to be searched is queried. For example, taking a music player as an example, if the text to be searched is "zhou jenlong qilix", a plurality of pieces of music searched by the music player are the text to be matched.
It should be noted that the text to be matched may also be any preset text for which a correlation with the text to be searched needs to be determined, which is not limited in this disclosure.
S302, dividing the text to be matched into a plurality of target texts according to preset text dividing elements.
The preset text division elements corresponding to different text types may be different, and the preset text division elements may be used to represent different dimensions (domains) of the text to be matched. For example, for a music text, the preset text division elements may include singers, song names, lyrics, lyricists, composers, and the like; for a paper text, the preset text division elements may include titles, abstracts, body text, and the like.
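A minimal sketch of this division step is given below. It assumes that the text to be matched is already available as a structured record with one field per dimension; the names `PRESET_ELEMENTS` and `divide_by_elements` and the field keys are hypothetical.

```python
# Hypothetical preset text division elements for each text type.
PRESET_ELEMENTS = {
    "music": ["singer", "song_name", "lyrics"],
    "paper": ["title", "abstract", "body"],
}

def divide_by_elements(record, text_type):
    """Return one target text per preset element that is present in the record."""
    elements = PRESET_ELEMENTS[text_type]
    return {element: record[element] for element in elements if element in record}

record = {
    "singer": "Popp Hunna",
    "song_name": "adderall",
    "lyrics": "corvette corvette",
    "album": "n/a",  # not a preset element, so it is ignored
}
target_texts = divide_by_elements(record, "music")
```

Each entry of `target_texts` is then segmented and matched on its own, which keeps the dimensions independent of one another.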
S303, dividing the text to be searched and each target text according to a word division mode to obtain a word division text to be searched corresponding to the text to be searched and a target word division text corresponding to each target text.
The word segmentation text may include one or more words, and different words may be separated by a preset symbol, for example, the preset symbol may be "##".
S304, the text to be searched and each target text are divided according to a character dividing mode to obtain a character dividing text to be searched corresponding to the text to be searched and a target character dividing text corresponding to each target text.
The character division text may include one or more characters, and different characters may be separated by a preset symbol; for example, the preset symbol may be a space.
In this step, after the text to be searched and the plurality of target texts corresponding to the text to be matched are obtained, the text to be searched may be divided according to the character division manner to obtain a character division text to be searched corresponding to the text to be searched, and each target text is divided to obtain a target character division text corresponding to each target text. The character division mode may be any character division mode in the prior art, and the present disclosure does not limit this. For example, if the text to be searched is "pop hunna corrvette", the text to be searched can be divided into "po op ph … … et tt te" according to the character division manner, and different character division texts to be searched are separated by spaces.
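The example above is consistent with splitting the text into overlapping two-character pieces after lowercasing and removing spaces. The sketch below implements that reading; the function name `char_divide` and the parameter `n` are assumptions, and the disclosure allows any character division mode.

```python
def char_divide(text, n=2):
    """Lowercase, drop whitespace, and join overlapping n-character pieces with spaces."""
    compact = "".join(text.lower().split())
    pieces = [compact[i:i + n] for i in range(len(compact) - n + 1)]
    return " ".join(pieces)

divided = char_divide("pop hunna corrvette")
print(divided)  # po op ph hu un nn na ac co or rr rv ve et tt te
```

Because the pieces overlap, a single wrong character only changes the pieces that contain it, and the remaining pieces can still be matched.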
It should be noted that, in the present disclosure, the execution order of the step S303 and the step S304 is not limited, and the step S303 may be executed first and then the step S304 is executed, or the step S304 may be executed first and then the step S303 is executed.
S305, obtaining the correlation between the text to be searched and the text to be matched according to the word division text to be searched, the target word division texts, the character division text to be searched and the target character division texts.
In this step, after the word division text to be searched, the target word division texts, the character division text to be searched and the target character division texts are obtained, a plurality of word matching matrices corresponding to the target word division texts may be obtained first; the manner of obtaining the word matching matrices may refer to step S204 and is not described herein again. Then, a character vector corresponding to each character in the character division text to be searched and a character vector corresponding to each character in each target character division text can be obtained in a word embedding mode. Next, for each target character division text, the similarity between the character vector corresponding to each character in the character division text to be searched and the character vector corresponding to each character in the target character division text can be obtained, and a character matching matrix corresponding to the target character division text is obtained according to the similarity.
After a plurality of word matching matrices corresponding to the target word division text and a plurality of character matching matrices corresponding to the target character division text are obtained, a first word feature matching vector and a second word feature matching vector corresponding to each word matching matrix and a first character feature matching vector and a second character feature matching vector corresponding to each character matching matrix can be obtained through a pre-trained vector obtaining model.
After obtaining a plurality of first word feature matching vectors, a plurality of second word feature matching vectors, a plurality of first character feature matching vectors and a plurality of second character feature matching vectors, a preset splicing rule can be obtained, the plurality of first word feature matching vectors are spliced according to the preset splicing rule to obtain a plurality of first target word feature matching vectors, the plurality of second word feature matching vectors are spliced to obtain a plurality of second target word feature matching vectors, the plurality of first character feature matching vectors are spliced to obtain a plurality of first target character feature matching vectors, and the plurality of second character feature matching vectors are spliced to obtain a plurality of second target character feature matching vectors.
After obtaining a plurality of first target word feature matching vectors, a plurality of second target word feature matching vectors, a plurality of first target character feature matching vectors and a plurality of second target character feature matching vectors, performing maximum pooling processing on the first target word feature matching vectors aiming at each first target word feature matching vector to obtain first pooled word feature matching vectors corresponding to the first target word feature matching vectors; performing maximum pooling on each first target character feature matching vector to obtain a first pooled character feature matching vector corresponding to the first target character feature matching vector; performing maximum pooling on each second target word feature matching vector, and then performing average pooling to obtain second pooled word feature matching vectors corresponding to the second target word feature matching vectors; and aiming at each second target character feature matching vector, performing maximum pooling on the second target character feature matching vector and then performing average pooling to obtain a second pooled character feature matching vector corresponding to the second target character feature matching vector.
After the plurality of first pooled word feature matching vectors, the plurality of first pooled character feature matching vectors, the plurality of second pooled word feature matching vectors and the plurality of second pooled character feature matching vectors are obtained, the plurality of first pooled word feature matching vectors and the plurality of second pooled word feature matching vectors can be spliced to obtain a plurality of target pooled word feature matching vectors, and the plurality of first pooled character feature matching vectors and the plurality of second pooled character feature matching vectors are spliced to obtain a plurality of target pooled character feature matching vectors. Finally, inputting the target pooling word feature matching vector into a pre-trained full-connection layer aiming at each target pooling word feature matching vector to obtain a matching value corresponding to the target pooling word feature matching vector; and inputting the target pooled character feature matching vector into the full-connection layer aiming at each target pooled character feature matching vector to obtain a matching value corresponding to the target pooled character feature matching vector. And then, according to the maximum matching value, determining the correlation between the text to be searched and the text to be matched.
The following describes the step S305 with the text to be matched as a music text.
Fig. 4 is a block diagram illustrating a method for determining text relevance according to an exemplary embodiment. As shown in Fig. 4, the query "pop Hunna corrvette" is the text to be searched, and "adderall (corrvette) Popp Hunna" is the text to be matched. After the text to be matched is divided according to the preset text division elements, 3 target texts are obtained: Title: adderall; Extra: corvette corvette; Author: popp Hunna. The text to be searched is divided according to the word division mode to obtain the word division text to be searched, "pop hunn ## a corrv ## ette", and is divided according to the character division mode to obtain the character division text to be searched, "po op ph … … et tt te". The 1st target text is divided according to the word division mode to obtain the target word division text "adder ## all", and according to the character division mode to obtain the target character division text "ad dd de er ra al ll". The 2nd target text is divided according to the word division mode to obtain the target word division text "corv ## ette", and according to the character division mode to obtain the target character division text "co or rv … … et tt te". The 3rd target text is divided according to the word division mode to obtain the target word division text "popp hunn ## a", and according to the character division mode to obtain the target character division text "po op pp … … nn na".
After the word division text to be searched ("pop hunn ## a corrv ## ette"), the character division text to be searched ("po op ph … … et tt te"), the plurality of target word division texts (the name word division text "adder ## all", the lyric word division text "corrv ## ette" and the singer word division text "popp hunn ## a") and the plurality of target character division texts (the name character division text "ad dd de er ra al ll", the lyric character division text "co or rv … … et tt te" and the singer character division text "po op pp … … nn na") are obtained, the following matching matrices can be obtained: a name word matching matrix corresponding to the name word division text, a lyric word matching matrix corresponding to the lyric word division text, a singer word matching matrix corresponding to the singer word division text, a name character matching matrix corresponding to the name character division text, a lyric character matching matrix corresponding to the lyric character division text, and a singer character matching matrix corresponding to the singer character division text.
Then, the following can be obtained through the pre-trained vector acquisition model: a first name word feature matching vector and a second name word feature matching vector corresponding to the name word matching matrix; a first lyric word feature matching vector and a second lyric word feature matching vector corresponding to the lyric word matching matrix; a first singer word feature matching vector and a second singer word feature matching vector corresponding to the singer word matching matrix; a first name character feature matching vector and a second name character feature matching vector corresponding to the name character matching matrix; a first lyric character feature matching vector and a second lyric character feature matching vector corresponding to the lyric character matching matrix; and a first singer character feature matching vector and a second singer character feature matching vector corresponding to the singer character matching matrix.
If the preset splicing rule specifies the singer (Author), the name (Title), the splicing of the name (Title) and the singer (Author), the splicing of the singer (Author) and the lyrics (Extra), and the splicing of the name (Title) and the lyrics (Extra), then the first word feature matching vectors comprise: a first singer word feature matching vector, a first name word feature matching vector and a first lyric word feature matching vector; the second word feature matching vectors comprise: a second singer word feature matching vector, a second name word feature matching vector and a second lyric word feature matching vector; the first character feature matching vectors comprise: a first singer character feature matching vector, a first name character feature matching vector and a first lyric character feature matching vector; and the second character feature matching vectors comprise: a second singer character feature matching vector, a second name character feature matching vector and a second lyric character feature matching vector.
Splicing according to the preset splicing rule yields 5 first target word feature matching vectors: the first singer word feature matching vector (A1 word feature matching vector), the first name word feature matching vector (B1 word feature matching vector), the first name word feature matching vector + the first singer word feature matching vector (C1 word feature matching vector), the first singer word feature matching vector + the first lyric word feature matching vector (D1 word feature matching vector), and the first name word feature matching vector + the first lyric word feature matching vector (E1 word feature matching vector); 5 second target word feature matching vectors: the second singer word feature matching vector (A2 word feature matching vector), the second name word feature matching vector (B2 word feature matching vector), the second name word feature matching vector + the second singer word feature matching vector (C2 word feature matching vector), the second singer word feature matching vector + the second lyric word feature matching vector (D2 word feature matching vector), and the second name word feature matching vector + the second lyric word feature matching vector (E2 word feature matching vector); 5 first target character feature matching vectors: the first singer character feature matching vector (A1 character feature matching vector), the first name character feature matching vector (B1 character feature matching vector), the first name character feature matching vector + the first singer character feature matching vector (C1 character feature matching vector), the first singer character feature matching vector + the first lyric character feature matching vector (D1 character feature matching vector), and the first name character feature matching vector + the first lyric character feature matching vector (E1 character feature matching vector); and 5 second target character feature matching vectors: the second singer character feature matching vector (A2 character feature matching vector), the second name character feature matching vector (B2 character feature matching vector), the second name character feature matching vector + the second singer character feature matching vector (C2 character feature matching vector), the second singer character feature matching vector + the second lyric character feature matching vector (D2 character feature matching vector), and the second name character feature matching vector + the second lyric character feature matching vector (E2 character feature matching vector).
In Fig. 4, Title comprises 4 vectors: the B1 word feature matching vector, the B2 word feature matching vector, the B1 character feature matching vector and the B2 character feature matching vector; Author comprises 4 vectors: the A1 word feature matching vector, the A2 word feature matching vector, the A1 character feature matching vector and the A2 character feature matching vector; Title + Author comprises 4 vectors: the C1 word feature matching vector, the C2 word feature matching vector, the C1 character feature matching vector and the C2 character feature matching vector; Title + Extra comprises 4 vectors: the E1 word feature matching vector, the E2 word feature matching vector, the E1 character feature matching vector and the E2 character feature matching vector; and Author + Extra comprises 4 vectors: the D1 word feature matching vector, the D2 word feature matching vector, the D1 character feature matching vector and the D2 character feature matching vector.
Then, the 5 first target word feature matching vectors may be subjected to maximum pooling processing to obtain 5 first pooled word feature matching vectors: an A1 pooled word feature matching vector, a B1 pooled word feature matching vector, a C1 pooled word feature matching vector, a D1 pooled word feature matching vector and an E1 pooled word feature matching vector; the 5 second target word feature matching vectors are subjected to maximum pooling and then average pooling to obtain 5 second pooled word feature matching vectors: an A2 pooled word feature matching vector, a B2 pooled word feature matching vector, a C2 pooled word feature matching vector, a D2 pooled word feature matching vector and an E2 pooled word feature matching vector; the 5 first target character feature matching vectors are subjected to maximum pooling to obtain 5 first pooled character feature matching vectors: an A1 pooled character feature matching vector, a B1 pooled character feature matching vector, a C1 pooled character feature matching vector, a D1 pooled character feature matching vector and an E1 pooled character feature matching vector; and the 5 second target character feature matching vectors are subjected to maximum pooling and then average pooling to obtain 5 second pooled character feature matching vectors: an A2 pooled character feature matching vector, a B2 pooled character feature matching vector, a C2 pooled character feature matching vector, a D2 pooled character feature matching vector and an E2 pooled character feature matching vector. In Fig. 4, Global Max Pooling denotes the maximum pooling processing, and Query Pooling denotes the maximum pooling processing followed by the average pooling processing.
Further, the A1 pooled word feature matching vector is spliced with the A2 pooled word feature matching vector to obtain an A12 target pooled word feature matching vector, and the B12, C12, D12 and E12 target pooled word feature matching vectors are obtained in the same way by splicing the B1 and B2, the C1 and C2, the D1 and D2, and the E1 and E2 pooled word feature matching vectors, respectively. Likewise, the A1 pooled character feature matching vector is spliced with the A2 pooled character feature matching vector to obtain an A12 target pooled character feature matching vector, and the B12, C12, D12 and E12 target pooled character feature matching vectors are obtained in the same way by splicing the B1 and B2, the C1 and C2, the D1 and D2, and the E1 and E2 pooled character feature matching vectors, respectively.
Finally, the A12, B12, C12, D12 and E12 target pooled word feature matching vectors and the A12, B12, C12, D12 and E12 target pooled character feature matching vectors are respectively input into the full-connection layer to obtain a matching value corresponding to each target pooled word feature matching vector and a matching value corresponding to each target pooled character feature matching vector, and the correlation between the text to be searched and the text to be matched is determined according to the maximum matching value. The larger the matching value is, the stronger the correlation between the text to be searched and the text to be matched is; the smaller the matching value is, the weaker the correlation is. In Fig. 4, Dense Layer denotes the full-connection layer, and score denotes the matching value.
By adopting the method, the text to be matched is divided into a plurality of target texts according to the preset text division elements; the text to be searched and each target text are divided according to the word division mode and the character division mode to obtain the word division text to be searched and the character division text to be searched corresponding to the text to be searched, as well as the target word division text and the target character division text corresponding to each target text; and the correlation between the text to be searched and the text to be matched is obtained according to the word division text to be searched, the target word division texts, the character division text to be searched and the target character division texts. Because the target texts of different dimensions are independent of one another, the correlations calculated from the text to be searched and the different target texts do not distort the calculation result of the overall correlation even when the matching rules of the dimensions differ, thereby improving the accuracy of the determined text relevance. Moreover, because the character division mode and the word division mode are fused, when the text to be searched is non-standard or individual characters are misspelled, the correlation between the text to be searched and the text to be matched can still be obtained from the matching values calculated from the remaining correct characters, so that the accuracy of the determined text relevance can be further improved.
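The robustness to misspelling can be illustrated with the misspelled "corrvette" and the correctly spelled "corvette" from the example of Fig. 4. The sketch below is a deliberate simplification: it replaces the embedding-based similarity of the disclosure with exact matching of two-character pieces, and the helper name `bigrams` is hypothetical.

```python
def bigrams(text):
    """Overlapping two-character pieces of the text, ignoring case and spaces."""
    compact = "".join(text.lower().split())
    return [compact[i:i + 2] for i in range(len(compact) - 1)]

query_pieces = bigrams("corrvette")        # misspelled text to be searched
target_pieces = set(bigrams("corvette"))   # correctly spelled target text

# Pieces of the misspelled query that still find an exact counterpart.
matched = [piece for piece in query_pieces if piece in target_pieces]
unmatched = [piece for piece in query_pieces if piece not in target_pieces]
```

Only the piece containing the doubled letter fails to match, so the character-level path still produces a usable matching value where a word-level comparison of the whole word would not.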
Fig. 5 is a block diagram illustrating an apparatus for determining text relevance according to an example embodiment, which may include, as shown in fig. 5:
the text acquiring module 501 is configured to acquire a text to be searched and a text to be matched corresponding to the text to be searched;
a first text division module 502, configured to divide the text to be matched into a plurality of target texts according to preset text division elements, where the preset text division elements are used to represent different dimensions of the text to be matched;
a correlation obtaining module 503, configured to obtain, according to the text to be searched and the plurality of target texts, a correlation between the text to be searched and the text to be matched.
Accordingly, the correlation obtaining module 503 is further configured to:
dividing the text to be searched and each target text according to a word division mode to obtain a word division text to be searched corresponding to the text to be searched and a target word division text corresponding to each target text, wherein the word division text comprises one or more words;
and obtaining the correlation between the text to be searched and the text to be matched according to the text to be searched and the target word division texts.
Accordingly, the correlation obtaining module 503 is further configured to:
obtaining a plurality of word matching matrixes corresponding to the target word division text according to the to-be-searched word division text and the target word division texts;
and acquiring the correlation between the text to be searched and the text to be matched according to a plurality of word matching matrixes.
Accordingly, the correlation obtaining module 503 is further configured to:
and for each target word division text, obtaining the similarity between the word vector corresponding to each word in the word division text to be searched and the word vector corresponding to each word in the target word division text, and obtaining the word matching matrix corresponding to the target word division text according to the similarity.
Accordingly, the correlation obtaining module 503 is further configured to:
obtain a word feature matching vector corresponding to each word matching matrix through a pre-trained vector obtaining model;
and obtain the correlation between the text to be searched and the text to be matched according to the plurality of word feature matching vectors.
Accordingly, the correlation obtaining module 503 is further configured to:
acquire a preset splicing rule;
splice the plurality of word feature matching vectors according to the preset splicing rule to obtain a plurality of target word feature matching vectors;
obtain a matching value corresponding to each target word feature matching vector;
and determine the correlation between the text to be searched and the text to be matched according to the maximum matching value.
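The splice, score and take-the-maximum flow described above can be sketched as follows. The representation of a splicing rule as groups of dimension names, the function names, and the mean-based scoring function are all assumptions made for illustration; in the disclosure the matching value comes from a pre-trained model, not from a mean.

```python
from typing import Callable, Dict, List, Sequence


def splice(
    feature_vectors: Dict[str, List[float]],
    rule: Sequence[Sequence[str]],
) -> List[List[float]]:
    """Concatenate the per-dimension vectors named in each group of the rule."""
    spliced = []
    for group in rule:
        vector: List[float] = []
        for name in group:
            vector.extend(feature_vectors[name])
        spliced.append(vector)
    return spliced


def relevance(
    feature_vectors: Dict[str, List[float]],
    rule: Sequence[Sequence[str]],
    score: Callable[[List[float]], float],
) -> float:
    """Score every spliced vector and keep the maximum matching value."""
    return max(score(vector) for vector in splice(feature_vectors, rule))


def mean_score(vector: List[float]) -> float:
    # Stand-in for the pre-trained scoring step; an assumption of this sketch.
    return sum(vector) / len(vector)


features = {
    "title": [0.9, 0.7],
    "category": [0.2, 0.4],
    "description": [0.5, 0.1],
}
rule = [["title", "category"], ["description"]]
result = relevance(features, rule, mean_score)
```

With these toy values the first group scores about 0.55 and the second about 0.3, so the maximum, about 0.55, is taken as the basis for the correlation.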
Correspondingly, the word feature matching vectors include first word feature matching vectors and second word feature matching vectors, and the target word feature matching vectors include first target word feature matching vectors and second target word feature matching vectors; the correlation obtaining module 503 is further configured to:
according to the preset splicing rule, splice the plurality of first word feature matching vectors to obtain a plurality of first target word feature matching vectors, and splice the plurality of second word feature matching vectors to obtain a plurality of second target word feature matching vectors.
Accordingly, fig. 6 is a block diagram illustrating a second apparatus for determining text relevance according to an example embodiment, which may include, as shown in fig. 6:
a first pooling vector obtaining module 504, configured to perform, for each first target word feature matching vector, maximum pooling on the first target word feature matching vector to obtain a first pooling word feature matching vector corresponding to the first target word feature matching vector;
a second pooling vector obtaining module 505, configured to perform maximum pooling on each second target word feature matching vector, and then perform average pooling to obtain a second pooling word feature matching vector corresponding to the second target word feature matching vector;
a third pooling vector obtaining module 506, configured to splice the plurality of first pooling word feature matching vectors and the plurality of second pooling word feature matching vectors to obtain a plurality of target pooling word feature matching vectors;
the correlation obtaining module 503 is further configured to:
for each target pooling word feature matching vector, input the target pooling word feature matching vector into a pre-trained fully connected layer to obtain a matching value corresponding to the target pooling word feature matching vector.
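The pooling and scoring steps handled by modules 504 to 506 and the correlation obtaining module 503 can be sketched as follows. The disclosure does not specify vector shapes or pooling windows, so this sketch assumes one-dimensional vectors and a window of 2; the weights and bias stand in for a pre-trained fully connected layer and are made up.

```python
from typing import List


def max_pool(vector: List[float], window: int) -> List[float]:
    """Non-overlapping maximum pooling over a 1-D vector."""
    return [max(vector[i:i + window]) for i in range(0, len(vector), window)]


def avg_pool(vector: List[float], window: int) -> List[float]:
    """Non-overlapping average pooling over a 1-D vector."""
    chunks = (vector[i:i + window] for i in range(0, len(vector), window))
    return [sum(chunk) / len(chunk) for chunk in chunks]


def fully_connected(vector: List[float], weights: List[float], bias: float) -> float:
    """Single-output linear layer; weights must match the input length."""
    if len(vector) != len(weights):
        raise ValueError("weight length must equal input length")
    return sum(w * x for w, x in zip(weights, vector)) + bias


first_target = [0.1, 0.8, 0.3, 0.5]
second_target = [0.2, 0.6, 0.4, 1.0]

# First branch: maximum pooling only.
first_pooled = max_pool(first_target, 2)
# Second branch: maximum pooling followed by average pooling.
second_pooled = avg_pool(max_pool(second_target, 2), 2)
# Splice the two pooled vectors into one target pooling vector.
target_pooled = first_pooled + second_pooled

# Hypothetical parameters standing in for the pre-trained layer.
matching_value = fully_connected(target_pooled, weights=[0.5, 0.5, 1.0], bias=0.0)
```

In a real system the fully connected layer would be trained jointly with the vector obtaining model; the fixed weights here only show how a spliced, pooled vector is reduced to a single matching value.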
Accordingly, the correlation obtaining module 503 is further configured to:
determine a service scene corresponding to the text to be matched;
and determine the preset splicing rule corresponding to the service scene according to a preset splicing rule incidence relation, where the splicing rule incidence relation includes the correspondence between different service scenes and preset splicing rules.
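The lookup of a preset splicing rule by service scene amounts to a table lookup, sketched below. The scene names, the rule contents and the function name are hypothetical; the disclosure only states that the incidence relation maps service scenes to preset splicing rules.

```python
from typing import Dict, List

SplicingRule = List[List[str]]

# Hypothetical incidence relation between service scenes and splicing rules.
SPLICING_RULE_INCIDENCE_RELATION: Dict[str, SplicingRule] = {
    "e_commerce_search": [["title", "category"], ["description"]],
    "video_search": [["title"], ["title", "description"]],
}


def get_preset_splicing_rule(service_scene: str) -> SplicingRule:
    """Return the splicing rule registered for the given service scene."""
    try:
        return SPLICING_RULE_INCIDENCE_RELATION[service_scene]
    except KeyError:
        raise ValueError(f"no splicing rule registered for scene {service_scene!r}") from None


rule_for_scene = get_preset_splicing_rule("video_search")
```

Keeping the rules in a table of this kind lets a new service scene be supported by adding an entry rather than changing the matching logic.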
Accordingly, fig. 7 is a block diagram illustrating a third apparatus for determining text relevance according to an example embodiment, which may include, as shown in fig. 7:
a second text division module 507, configured to divide the text to be searched and each target text according to a character division manner, to obtain a character division text to be searched corresponding to the text to be searched and a target character division text corresponding to each target text, where the character division text includes one or more characters;
the correlation obtaining module 503 is further configured to:
obtain the correlation between the text to be searched and the text to be matched according to the word division text to be searched, the plurality of target word division texts, the character division text to be searched and the plurality of target character division texts.
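The difference between the word division mode and the character division mode can be sketched as follows. Whitespace splitting is used here only as a stand-in for word division; for languages written without spaces a dedicated word segmenter would be needed, which this sketch does not attempt. The function names are hypothetical.

```python
from typing import List


def word_division(text: str) -> List[str]:
    """Stand-in word division: split on whitespace (an assumption of this sketch)."""
    return text.split()


def character_division(text: str) -> List[str]:
    """Character division: every non-whitespace character becomes one unit."""
    return [ch for ch in text if not ch.isspace()]


query = "red shoes"
query_words = word_division(query)
query_characters = character_division(query)
```

Both divisions of the same text are kept, so matching can draw on word-level and character-level evidence together.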
With the above device, the text to be matched is divided into a plurality of target texts according to the preset text dividing elements, and the correlation between the text to be searched and the text to be matched is obtained according to the text to be searched and the plurality of target texts. Because the target texts of different dimensions are independent of one another, the calculated correlation is not affected even when the matching rules differ between dimensions, which improves the accuracy of the determined text correlation.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Referring now to FIG. 8, shown is a schematic diagram of an electronic device 800 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 may include a processing apparatus (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the electronic device 800. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage means 808 including, for example, magnetic tape, hard disk, etc.; and a communication means 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer means may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a text to be searched and a text to be matched corresponding to the text to be searched; divide the text to be matched into a plurality of target texts according to preset text dividing elements; and obtain the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of the module does not constitute a limitation on the module itself under a certain condition, for example, the text acquiring module may also be described as a "module that acquires a text to be searched and a text to be matched corresponding to the text to be searched".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides, in accordance with one or more embodiments of the present disclosure, a method of determining text relevance, comprising: acquiring a text to be searched and a text to be matched corresponding to the text to be searched; dividing the text to be matched into a plurality of target texts according to preset text dividing elements, wherein the preset text dividing elements are used for representing different dimensions of the text to be matched; and acquiring the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts.
According to one or more embodiments of the present disclosure, example 2 provides the method of example 1, wherein the obtaining of the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts comprises: dividing the text to be searched and each target text according to a word division mode to obtain a word division text to be searched corresponding to the text to be searched and a target word division text corresponding to each target text, wherein each word division text comprises one or more words; and obtaining the correlation between the text to be searched and the text to be matched according to the word division text to be searched and the plurality of target word division texts.
According to one or more embodiments of the present disclosure, example 3 provides the method of example 2, wherein the obtaining of the correlation between the text to be searched and the text to be matched according to the word division text to be searched and the plurality of target word division texts includes: obtaining a word matching matrix corresponding to each target word division text according to the word division text to be searched and the plurality of target word division texts, so as to obtain a plurality of word matching matrices; and obtaining the correlation between the text to be searched and the text to be matched according to the plurality of word matching matrices.
According to one or more embodiments of the present disclosure, example 4 provides the method of example 3, wherein the obtaining of the plurality of word matching matrices according to the word division text to be searched and the plurality of target word division texts includes: for each target word division text, obtaining the similarity between the word vector corresponding to each word in the word division text to be searched and the word vector corresponding to each word in the target word division text, and obtaining the word matching matrix corresponding to the target word division text according to the similarity.
According to one or more embodiments of the present disclosure, example 5 provides the method of example 3, wherein the obtaining of the correlation between the text to be searched and the text to be matched according to the plurality of word matching matrices includes: obtaining a word feature matching vector corresponding to each word matching matrix through a pre-trained vector obtaining model; and obtaining the correlation between the text to be searched and the text to be matched according to the plurality of word feature matching vectors.
According to one or more embodiments of the present disclosure, example 6 provides the method of example 5, wherein the obtaining of the correlation between the text to be searched and the text to be matched according to the plurality of word feature matching vectors includes: acquiring a preset splicing rule; splicing the plurality of word feature matching vectors according to the preset splicing rule to obtain a plurality of target word feature matching vectors; obtaining a matching value corresponding to each target word feature matching vector; and determining the correlation between the text to be searched and the text to be matched according to the maximum matching value.
According to one or more embodiments of the present disclosure, example 7 provides the method of example 6, wherein the word feature matching vectors comprise first word feature matching vectors and second word feature matching vectors, and the target word feature matching vectors comprise first target word feature matching vectors and second target word feature matching vectors; and the splicing of the plurality of word feature matching vectors according to the preset splicing rule to obtain a plurality of target word feature matching vectors comprises: according to the preset splicing rule, splicing the plurality of first word feature matching vectors to obtain a plurality of first target word feature matching vectors, and splicing the plurality of second word feature matching vectors to obtain a plurality of second target word feature matching vectors.
According to one or more embodiments of the present disclosure, example 8 provides the method of example 7, wherein before the obtaining of the matching value corresponding to each target word feature matching vector, the method further includes: performing maximum pooling on each first target word feature matching vector to obtain a first pooling word feature matching vector corresponding to the first target word feature matching vector; performing maximum pooling and then average pooling on each second target word feature matching vector to obtain a second pooling word feature matching vector corresponding to the second target word feature matching vector; and splicing the plurality of first pooling word feature matching vectors and the plurality of second pooling word feature matching vectors to obtain a plurality of target pooling word feature matching vectors; and the obtaining of the matching value corresponding to each target word feature matching vector includes: for each target pooling word feature matching vector, inputting the target pooling word feature matching vector into a pre-trained fully connected layer to obtain a matching value corresponding to the target pooling word feature matching vector.
According to one or more embodiments of the present disclosure, example 9 provides the method of example 6, wherein the acquiring of the preset splicing rule includes: determining a service scene corresponding to the text to be matched; and determining the preset splicing rule corresponding to the service scene according to a preset splicing rule incidence relation, wherein the splicing rule incidence relation comprises the correspondence between different service scenes and preset splicing rules.
According to one or more embodiments of the present disclosure, example 10 provides the method of any one of examples 2 to 9, wherein before the obtaining of the correlation between the text to be searched and the text to be matched according to the word division text to be searched and the plurality of target word division texts, the method further includes: dividing the text to be searched and each target text according to a character division mode to obtain a character division text to be searched corresponding to the text to be searched and a target character division text corresponding to each target text, wherein each character division text comprises one or more characters; and the obtaining of the correlation between the text to be searched and the text to be matched according to the word division text to be searched and the plurality of target word division texts comprises: obtaining the correlation between the text to be searched and the text to be matched according to the word division text to be searched, the plurality of target word division texts, the character division text to be searched and the plurality of target character division texts.
Example 11 provides, in accordance with one or more embodiments of the present disclosure, an apparatus to determine text relevance, comprising: the text acquisition module is used for acquiring a text to be searched and a text to be matched corresponding to the text to be searched; the first text division module is used for dividing the text to be matched into a plurality of target texts according to preset text division elements, wherein the preset text division elements are used for representing different dimensions of the text to be matched; and the correlation obtaining module is used for obtaining the correlation between the text to be searched and the text to be matched according to the text to be searched and the plurality of target texts.
Example 12 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 1-10, in accordance with one or more embodiments of the present disclosure.
Example 13 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-10.
The foregoing description is only of the preferred embodiments of the disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.