Target crowd search intention identification method and device, electronic equipment and medium
1. A method for identifying a search intention of a target crowd is characterized by comprising the following steps:
when a search request is obtained, obtaining search features and statistical features of search words in the search request within a preset time period, wherein the statistical features are used for representing the distribution condition of the search words within the preset time period;
determining candidate search terms from the search terms according to the search characteristics of each search term;
and judging whether the candidate search word has the intention of searching the target crowd or not according to the statistical characteristics of the candidate search word.
2. The method of claim 1, wherein after said determining candidate search terms from said search terms based on search characteristics of each of said search terms, said method further comprises:
when the search word is not a candidate search word, judging whether the search word has the intention of searching a target crowd or not based on the associated search word of the search word or the target user list, wherein the user in the target user list belongs to the target crowd.
3. The method of claim 2, wherein the determining whether the search term has a search target crowd intent based on an associated search term or a target user list of the search term comprises:
determining a rewriting word corresponding to the search word based on a mapping relation between each search word and its rewriting word, wherein the rewriting word is a term used to replace the search word;
and when the rewritten word has the intention of the search target crowd, determining that the search word has the intention of the search target crowd.
4. The method of claim 2, wherein the determining whether the search term has a search target crowd intent based on an associated search term or a target user list of the search term comprises:
acquiring the target user list;
calculating a first text similarity between the search word and each user identifier in the target user list;
and judging whether the search word has the intention of a search target crowd or not according to the first text similarity.
5. The method of claim 2, wherein the determining whether the search term has a search target crowd intent based on an associated search term or a target user list of the search term comprises:
determining a target search word with the correlation degree larger than a correlation degree threshold value from historical search words with the intention of searching target crowd;
calculating a second text similarity between the search term and the target search term;
and judging whether the search word has the intention of the search target crowd or not according to the second text similarity.
6. The method of claim 5, wherein said calculating a second text similarity between the search term and the target search term comprises:
determining an edit distance between the search word and the target search word, wherein the edit distance is the minimum number of edit operations required to turn the search word into the target search word;
and calculating the second text similarity according to the edit distance, the number of characters in the search word and the number of characters in the target search word.
7. The method of claim 1, wherein said determining whether the candidate search term has a search target crowd intent according to the statistical characteristics of the candidate search term comprises:
and when the statistical characteristics of the candidate search words meet preset conditions, determining that the candidate search words have the intention of searching target people.
8. An apparatus for identifying a search intention of a target population, comprising:
the search processing module is configured to obtain search features and statistical features of search words in a search request within a preset time period when the search request is obtained, wherein the statistical features are used for representing the distribution condition of the search words within the preset time period;
the determining module is configured to determine candidate search terms from the search terms according to the search characteristics of each search term;
and the judging module is configured to judge whether the candidate search word has the intention of a search target crowd according to the statistical characteristics of the candidate search word.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of identifying a target population search intent according to any one of claims 1-7.
10. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the method of identifying a target population search intent according to any of claims 1-7.
Background
In the search field, an accurate search intention helps filter recalled results, so search intention identification plays an important role. In some search scenarios, users search for a target crowd, for example searching for users with a large number of fans on a social platform; it is therefore necessary to identify the search intention directed at the target crowd.
In the related art, a common method for identifying a user search intention is search intention identification based on machine learning, and mainly includes labeling a search intention based on an existing search word, training a machine learning model by using a labeled sample, and predicting the search intention of the search word by using the trained model.
However, search intention recognition based on machine learning recognizes the search intention by learning the semantic information of a text. The user names of the target crowd generally carry little semantic information, so the recognition accuracy is low.
Disclosure of Invention
According to a first aspect of the embodiments of the present disclosure, there is provided a method for identifying a search intention of a target population, including:
when a search request is obtained, obtaining search features and statistical features of search words in the search request within a preset time period, wherein the statistical features are used for representing the distribution condition of the search words within the preset time period;
determining candidate search terms from the search terms according to the search characteristics of each search term;
and judging whether the candidate search word has the intention of searching the target crowd or not according to the statistical characteristics of the candidate search word.
In a possible implementation manner of the embodiment of the first aspect of the present disclosure, after determining, according to a search feature of each search term, a candidate search term from the search terms, the method further includes:
when the search word is not a candidate search word, judging whether the search word has the intention of searching a target crowd or not based on the associated search word of the search word or the target user list, wherein the user in the target user list belongs to the target crowd.
In a possible implementation manner of the embodiment of the first aspect of the present disclosure, the determining, based on the associated search term of the search term or the target user list, whether the search term has a search target crowd intention includes:
determining a rewriting word corresponding to the search word based on a mapping relation between each search word and its rewriting word, wherein the rewriting word is a term used to replace the search word;
and when the rewritten word has the intention of the search target crowd, determining that the search word has the intention of the search target crowd.
In a possible implementation manner of the embodiment of the first aspect of the present disclosure, the determining, based on the associated search term of the search term or the target user list, whether the search term has a search target crowd intention includes:
acquiring the target user list;
calculating a first text similarity between the search word and each user identifier in the target user list;
and judging whether the search word has the intention of a search target crowd or not according to the first text similarity.
In a possible implementation manner of the embodiment of the first aspect of the present disclosure, the determining, based on the associated search term of the search term or the target user list, whether the search term has a search target crowd intention includes:
determining a target search word with the correlation degree larger than a correlation degree threshold value from historical search words with the intention of searching target crowd;
calculating a second text similarity between the search term and the target search term;
and judging whether the search word has the intention of the search target crowd or not according to the second text similarity.
In one possible implementation manner of the embodiment of the first aspect of the present disclosure, the calculating a second text similarity between the search term and the target search term includes:
determining an edit distance between the search word and the target search word, wherein the edit distance is the minimum number of edit operations required to turn the search word into the target search word;
and calculating the second text similarity according to the edit distance, the number of characters in the search word and the number of characters in the target search word.
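The following is a minimal sketch of one way the second text similarity could be computed from the edit distance and the two character counts. The normalization `1 - distance / max(length)` is an assumption chosen for illustration; the disclosure only states which quantities enter the calculation. The function names `edit_distance` and `second_text_similarity` are hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `source` into `target` (Levenshtein distance)."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def second_text_similarity(search_word: str, target_word: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer word."""
    longest = max(len(search_word), len(target_word))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(search_word, target_word) / longest
```

Under this assumed formula the similarity lies between 0 and 1, and a search word that differs from a target search word by a single character in four scores 0.75.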
In a possible implementation manner of the embodiment of the first aspect of the present disclosure, the determining, according to the statistical feature of the candidate search term, whether the candidate search term has an intention of a search target crowd includes:
and when the statistical characteristics of the candidate search words meet preset conditions, determining that the candidate search words have the intention of searching target people.
In a possible implementation manner of the embodiment of the first aspect of the present disclosure, the search feature includes a number of searches and a click volume of search results, and the determining, according to the search feature of each search term, a candidate search term from the search terms includes:
taking the search word of which the number of searches and the click volume are both greater than the corresponding threshold values as the candidate search word.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for identifying a search intention of a target group, including:
the search processing module is configured to obtain search features and statistical features of search words in a search request within a preset time period when the search request is obtained, wherein the statistical features are used for representing the distribution condition of the search words within the preset time period;
the determining module is configured to determine candidate search terms from the search terms according to the search characteristics of each search term;
and the judging module is configured to judge whether the candidate search word has the intention of a search target crowd according to the statistical characteristics of the candidate search word.
In a possible implementation manner of the embodiment of the second aspect of the present disclosure, the determining module is further configured to determine, when the search word is not a candidate search word, whether the search word has an intention of searching a target crowd based on an associated search word of the search word or a target user list, where a user in the target user list belongs to the target crowd.
In one possible implementation manner of the embodiment of the second aspect of the present disclosure, the determining module is configured to:
determining a rewriting word corresponding to the search word based on a mapping relation between each search word and its rewriting word, wherein the rewriting word is a term used to replace the search word;
and when the rewritten word has the intention of the search target crowd, determining that the search word has the intention of the search target crowd.
In one possible implementation manner of the embodiment of the second aspect of the present disclosure, the determining module is configured to:
acquiring the target user list;
calculating a first text similarity between the search word and each user identifier in the target user list;
and judging whether the search word has the intention of a search target crowd or not according to the first text similarity.
In one possible implementation manner of the embodiment of the second aspect of the present disclosure, the determining module is configured to:
determining a target search word with the correlation degree larger than a correlation degree threshold value from historical search words with the intention of searching target crowd;
calculating a second text similarity between the search term and the target search term;
and judging whether the search word has the intention of the search target crowd or not according to the second text similarity.
In one possible implementation manner of the embodiment of the second aspect of the present disclosure, the determining module is configured to:
determining an edit distance between the search word and the target search word, wherein the edit distance is the minimum number of edit operations required to turn the search word into the target search word;
and calculating the second text similarity according to the edit distance, the number of characters in the search word and the number of characters in the target search word.
In one possible implementation manner of the embodiment of the second aspect of the present disclosure, the determining module is configured to:
and when the statistical characteristics of the candidate search words meet preset conditions, determining that the candidate search words have the intention of searching target people.
In one possible implementation manner of the embodiment of the second aspect of the present disclosure, the search feature includes a number of searches and a click volume of search results, and the determining module is configured to:
taking the search word of which the number of searches and the click volume are both greater than the corresponding threshold values as the candidate search word.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method for identifying the search intention of the target people group according to the embodiment of the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method for identifying a target group search intention as described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, which, when executed by a processor of an electronic device, enables the electronic device to perform the method for identifying a target group search intention as described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects: when the search request is obtained, the search characteristics and the statistical characteristics of each search word in the search request within a preset time period are obtained, candidate search words are determined from each search word according to the search characteristics of each search word, and whether the candidate search words have the intention of a search target crowd or not is judged according to the statistical characteristics of the candidate search words. Therefore, whether the search word has the intention of the search target crowd or not is determined according to the search characteristics and the statistical characteristics of the search word in the preset time period, and the search accuracy is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method for identifying a target population search intent, according to an exemplary embodiment.
FIG. 2 illustrates a schematic diagram of a process for identifying a large V search intention, according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating another method for identifying a search intention of a target group of people according to an exemplary embodiment.
Fig. 4 is a flowchart illustrating another method for identifying a search intention of a target group of people according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating another method for identifying a search intention of a target group of people according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating another method for identifying a search intention of a target group of people according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating an apparatus for identifying search intent of a target population according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a method for identifying a target group search intention according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps.
In step 101, when a search request is acquired, search features and statistical features of search terms in the search request within a preset time period are acquired.
In the present disclosure, the target crowd may refer to users whose number of fans exceeds a threshold, users whose attention (follow) count exceeds a threshold, users whose number of likes or reposts exceeds a certain number, and so on.
When the search intention of the target population is identified, at least one search word can be obtained according to the obtained search request, for example, a query sentence input by a user is subjected to word segmentation to obtain a plurality of search words.
In the disclosure, the search features and the statistical features of each search term in the search request within the preset time period may be computed from the historical search records of that period. The search features measure how users triggered the search term and interacted with its search results within the preset time period; the statistical features represent the distribution of the search term within the preset time period.
The search features corresponding to each search term may include the number of searches of the search term within the preset time period, the click volume of the search results, and the like. The click volume of the search results can be understood as the total number of clicks by users on the search results under the search term within the preset time period.
The statistical features corresponding to each search term may include: the Gini coefficient of the click rate distribution under the search term within the preset time period, the Gini coefficient of the attention rate distribution under the search term within the preset time period, the maximum click rate under the search term within the preset time period, the maximum attention rate under the search term within the preset time period, and the like.
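The distribution coefficients listed above (denoted Ctr_gini and Ftr_gini in the later examples) are Gini coefficients, which measure how concentrated a distribution is: the value is 0 when it is perfectly even and approaches 1 when a single item accounts for almost everything. The following is a minimal sketch, assuming that the click rate distribution under a search term is represented as a list of non-negative per-result values; the disclosure does not specify this representation, and `gini_coefficient` is a hypothetical helper name.

```python
from typing import Sequence


def gini_coefficient(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    0 means the values are perfectly even; the maximum for n values
    is (n - 1) / n, reached when one value holds the whole total.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    # Standard formula on ascending data:
    # G = 2 * sum(rank * x_rank) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

For instance, four results with equal values give a coefficient of 0, while four results in which only one has a non-zero value give 0.75. A search term whose clicks concentrate on a single result would therefore presumably have a high coefficient.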
It should be noted that the preset time period may be determined as needed, for example, the last 3 days, the last 7 days, the last 10 days, and the like, which is not limited by the present disclosure.
In step 102, candidate search terms are determined from the search terms according to the search characteristics of each search term.
The larger the search features of a search term, such as its number of searches, the more accurately the users' click and attention behavior reflects the intention of the search term. Based on this, in the present disclosure, candidate search terms may be determined from the search terms according to their search features.
Optionally, the search feature may include a number of searches, the number of searches for each search word may be compared with a corresponding threshold, and if the number of searches for a search word is greater than the corresponding threshold, the search word may be considered as a candidate search word.
Optionally, the search feature may include the click volume of the search results. The click volume of each search term may be compared with a corresponding threshold, and if the click volume of a search term is greater than the corresponding threshold, the search term may be considered a candidate search term.
Alternatively, the search features may include both the number of searches and the click volume of the search results. The number of searches of each search term may be compared with a corresponding threshold, and the click volume may be compared with a corresponding threshold. If the number of searches and the click volume are both greater than their corresponding thresholds, the search term may be considered a candidate search term.
The number of searches and the click volume of the search results each correspond to a threshold, and the two thresholds may be the same or different.
For example, when the preset time period is the last 7 days, the search features of a search term include the total search volume Pv_7d of the search term in the last 7 days and the total click volume Click_7d of users under the search term in the last 7 days. When Pv_7d of the search term is greater than or equal to 100 and Click_7d is greater than or equal to 10, the search term may be determined to be a candidate search term.
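The screening in this example can be sketched as follows. The thresholds 100 and 10 are the example values given above; the `SearchFeatures` record and the `select_candidates` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

PV_7D_THRESHOLD = 100    # example threshold for the 7-day search volume
CLICK_7D_THRESHOLD = 10  # example threshold for the 7-day click volume


@dataclass
class SearchFeatures:
    term: str
    pv_7d: int     # total searches of the term in the last 7 days
    click_7d: int  # total clicks on results under the term in the last 7 days


def select_candidates(features: List[SearchFeatures]) -> List[str]:
    """Keep the terms whose search volume and click volume both reach
    their thresholds; the result may be empty."""
    return [
        f.term
        for f in features
        if f.pv_7d >= PV_7D_THRESHOLD and f.click_7d >= CLICK_7D_THRESHOLD
    ]
```

A term with Pv_7d = 99 is dropped even if its click volume is high, because both conditions must hold in this variant.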
It is understood that the search features of every search term in the search request may fail to satisfy the corresponding conditions, in which case the number of candidate search terms is zero. That is, the number of candidate search terms may be zero, or one or more.
In this embodiment, candidate search terms with rich history feedback can be screened from each search term in the search request according to the search characteristics of the search terms, so that a part of search terms without rich history feedback can be filtered.
In step 103, it is determined whether the candidate search term has the intention of the search target crowd according to the statistical characteristics of the candidate search term.
Because the statistical characteristics are used for representing the distribution condition of the search words in the preset time period, in order to improve the accuracy of the search intention, after the candidate search words are obtained, whether the candidate search words have the intention of searching the target crowd can be further judged according to the statistical characteristics corresponding to the candidate search words.
As an implementation manner, if the statistical features of a candidate search term satisfy the preset conditions, the candidate search term may be considered to have the intention of the search target crowd. If there are multiple statistical features, the statistical features of the candidate search term satisfying the preset conditions may mean either that any one of the statistical features satisfies its corresponding preset condition, or that each of the statistical features satisfies its corresponding preset condition.
For example, the statistical features include the Gini coefficient of the click rate distribution, the Gini coefficient of the attention rate distribution, the maximum click rate, and the maximum attention rate. If the Gini coefficient of the click rate distribution, the Gini coefficient of the attention rate distribution, the maximum click rate and the maximum attention rate of the search word are all greater than the corresponding thresholds, the search word may be considered to have the intention of the search target crowd.
It should be noted that the Gini coefficient of the click rate distribution of the search term, the Gini coefficient of the attention rate distribution, the maximum click rate, and the maximum attention rate each correspond to one threshold, that is, 4 thresholds in total, and these thresholds may be set as needed.
For example, when the preset time period is the last 7 days, the statistical features of the search term include: the Gini coefficient Ctr_gini of the click rate distribution, the Gini coefficient Ftr_gini of the attention rate distribution, the maximum click rate Max_Ctr, and the maximum attention rate Max_Ftr. Assuming that the thresholds corresponding to these statistical features are 0.8, 0.7, 0.28, and 0.3, respectively, if Ctr_gini of candidate search word A is greater than or equal to 0.8, Ftr_gini is greater than or equal to 0.7, Max_Ctr is greater than or equal to 0.28, and Max_Ftr is greater than or equal to 0.3, candidate search word A may be considered to have the target crowd search intention.
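The threshold comparison in this example can be sketched as follows. The four threshold values are the example values given above; the dictionary layout, the function name `has_target_crowd_intent` and the `require_all` switch (covering both the "each feature" and the "any one feature" readings of the preset conditions) are assumptions made for illustration.

```python
from typing import Dict

# Example thresholds from the text; in practice they may be set as needed.
THRESHOLDS: Dict[str, float] = {
    "Ctr_gini": 0.8,
    "Ftr_gini": 0.7,
    "Max_Ctr": 0.28,
    "Max_Ftr": 0.3,
}


def has_target_crowd_intent(stats: Dict[str, float], require_all: bool = True) -> bool:
    """Compare each statistical feature with its threshold.

    require_all=True:  every feature must reach its threshold.
    require_all=False: one feature reaching its threshold is enough.
    """
    checks = [stats[name] >= limit for name, limit in THRESHOLDS.items()]
    return all(checks) if require_all else any(checks)
```

With `require_all=True`, a candidate whose Ctr_gini is below 0.8 is rejected regardless of its other three features.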
It can be understood that, if a plurality of candidate search terms are determined from the search terms, whether each candidate search term has the intention of the search target crowd can be determined according to the statistical characteristics of each candidate search term.
In this disclosure, if the statistical characteristics of the candidate search term do not satisfy the preset conditions, it may be determined that the candidate search term does not have the intention of the search target population. If there are a plurality of statistical features, the condition that the statistical features do not satisfy the preset condition can be understood as that any one of the statistical features does not satisfy the corresponding preset condition.
For example, the thresholds of Ctr_gini, Ftr_gini, Max_Ctr, and Max_Ftr are 0.8, 0.7, 0.28, and 0.3, respectively. If Ctr_gini of search word B is less than 0.8, search word B may be considered not to have the intention of the search target crowd.
If there are multiple candidate search terms in the search request, the search request may be considered to have the intent of the search target crowd when at least one candidate search term has the intent of the search target crowd.
The method for identifying the target crowd search intention can be applied to multiple applications such as short video applications and social software, and can determine whether the search intention of the search request is the intention of the search target crowd when the user initiates the search request in the applications.
Taking a short video application as an example, searching for an author who publishes short videos may be referred to as a user search. User search intentions may be classified into a large V search intention (a large V being an author with a large number of fans, such as an author with more than 100,000 fans), a general user search intention, a similar user search intention (e.g., searching for authors related to a certain game), and the like. In a short video application, users' video consumption shows an obvious head effect, so searches for head authors (denoted as large V) are also the most concentrated part of user search behavior, and accurate large V intention identification is important for improving the accuracy of user search. When a user inputs a search word in the short video application to initiate a search request, whether the search intention of the search word is a large V search intention can be determined based on the method for identifying the target crowd search intention, thereby improving the accuracy of user search in the short video application.
According to the method for identifying the target crowd search intention, when the search request is obtained, the search characteristics and the statistical characteristics of all search words in the search request within the preset time period are obtained, the candidate search words are determined from all the search words according to the search characteristics of the search words, and whether the candidate search words have the target crowd search intention or not is judged according to the statistical characteristics of the candidate search words. Therefore, whether the search word has the intention of the search target crowd or not can be determined according to the search characteristics and the statistical characteristics of the search word in the preset time period, and the search accuracy is improved.
In order to improve the accuracy of the intention identification, in an embodiment of the present disclosure, the above-mentioned thresholds used for determining whether a candidate search word has the intention of the search target population may be generated by clustering the candidate search words according to their statistical characteristics and deriving the thresholds from the clustering result.
In one embodiment, the candidate search terms may be clustered according to the statistical characteristics corresponding to the candidate search terms to generate a plurality of clusters, and the cluster in which the Gini coefficient of the click rate distribution, the Gini coefficient of the attention rate distribution, the maximum click rate, the maximum attention rate, and the like in the statistical characteristics are largest may be used as the search target crowd intention region. After the search target crowd intention region is obtained, the boundary of the region can be used as the threshold corresponding to each statistical feature, and whether a candidate search word has the search target crowd intention is judged by using the obtained thresholds.
Alternatively, the candidate search terms may be clustered using the K-Means clustering algorithm.
For example, the statistical features include Ctr_gini, Ftr_gini, Max_Ctr and Max_Ftr, and the candidate search words are clustered with the K-Means algorithm according to these four features, where the number of clusters at which the silhouette coefficient is maximum can be selected as the actual number of clusters.
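As an illustrative sketch only, the selection of the number of clusters by the silhouette coefficient may look as follows. The function names, the farthest-first initialization and the Euclidean distance are assumptions made for this example and are not specified by the disclosure; each point is a tuple (Ctr_gini, Ftr_gini, Max_Ctr, Max_Ftr).

```python
import math

def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def init_centers(points, k):
    """Farthest-first initialization (an assumption; keeps the sketch deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Plain Lloyd iteration; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster became empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; -1.0 if fewer than two clusters are in use."""
    used = set(labels)
    if len(used) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        def mean_dist(cluster):
            ds = [dist(p, q) for j, q in enumerate(points)
                  if labels[j] == cluster and j != i]
            return sum(ds) / len(ds) if ds else None
        a = mean_dist(labels[i])
        if a is None:  # singleton cluster contributes a score of 0
            continue
        b = min(mean_dist(c) for c in used if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

def best_k(points, k_values):
    """Return the number of clusters whose clustering has the largest silhouette."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))
```

In practice a library implementation of K-Means and of the silhouette coefficient would normally be used; the pure-Python version above only makes the selection rule explicit.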
In another embodiment, a large number of search terms can be obtained from historical records, and thresholds corresponding to the Gini coefficient of the click rate distribution, the Gini coefficient of the attention rate distribution, the maximum click rate, the maximum attention rate and the like in the statistical characteristics are determined offline from these search terms. An online real-time search system can then judge whether a search term has the intention of searching the target population based on the thresholds generated offline. In the following, taking large V search intention identification as an example, the description is made with reference to fig. 2, which shows a schematic diagram of an identification process of a large V search intention according to an exemplary embodiment.
As shown in the offline processing of fig. 2: in step 201, a large number of search terms and the search features and statistical features of each search term are obtained; in step 202, the search terms whose search features, such as the number of searches, are greater than the corresponding thresholds are screened out from the large number of search terms; in step 203, the screened search terms are clustered based on the statistical characteristics Ctr_gini, Ftr_gini, Max_Ctr and Max_Ftr; in step 204, the region in which the statistical characteristics are largest is selected, and the region boundary is taken as the respective threshold of the 4 statistical characteristics, so that 4 thresholds are generated for real-time processing.
As shown in the real-time processing of fig. 2, when a search occurs: in step 205, the search features and statistical features of the search term are obtained; in step 206, it is determined whether the search features of the search term, such as the number of searches, are greater than the corresponding thresholds. If not, other schemes are adopted for processing; if yes, step 207 is executed to determine whether the statistical characteristics of the search term are all greater than the corresponding thresholds. If yes, the search has a large V search intention; otherwise, the search does not have a large V search intention.
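The real-time branch of fig. 2 (steps 205 to 207) may be sketched as below. The class name, field names and all threshold values are hypothetical and chosen only for illustration; the thresholds are assumed to have been produced by the offline processing.

```python
from dataclasses import dataclass

@dataclass
class TermFeatures:
    """Features of one search term (field names are illustrative)."""
    search_count: int
    ctr_gini: float
    ftr_gini: float
    max_ctr: float
    max_ftr: float

def has_large_v_intent(features, search_count_threshold, stat_thresholds):
    """Return True/False for a candidate term, or None when the term is not a
    candidate and must be handled by another scheme (step 206, 'no' branch)."""
    # Step 206: the search feature must exceed its threshold.
    if features.search_count <= search_count_threshold:
        return None
    # Step 207: every statistical feature must exceed its own threshold.
    return all(getattr(features, name) > limit
               for name, limit in stat_thresholds.items())

# Hypothetical thresholds, standing in for the 4 values generated offline.
THRESHOLDS = {"ctr_gini": 0.8, "ftr_gini": 0.8, "max_ctr": 0.5, "max_ftr": 0.2}
```

Returning None for non-candidates keeps the "other schemes" branch distinct from a negative decision, so that the caller can fall back to the rewritten-word or user-list checks.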
In the embodiment of the disclosure, a plurality of candidate search terms are clustered according to the statistical characteristics corresponding to the candidate search terms to generate a search target crowd intention region, and the thresholds corresponding to the statistical characteristics, such as the Gini coefficient of the click rate distribution, the Gini coefficient of the attention rate distribution, the maximum click rate and the maximum attention rate, are generated according to the region boundary of the search target crowd intention region. Because the thresholds are derived by clustering the screened candidate search words on their statistical characteristics, the accuracy of the thresholds is improved, and accordingly the identification accuracy of the intention of the search target population is improved.
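One possible reading of "using the region boundary as the threshold" is sketched below. Two assumptions are made that the disclosure does not fix: the intention region is taken to be the cluster with the largest sum of centroid coordinates, and its boundary is taken to be the per-feature minimum inside that cluster.

```python
def thresholds_from_clusters(points, labels):
    """Derive one threshold per statistical feature from a clustering result.

    points: list of (Ctr_gini, Ftr_gini, Max_Ctr, Max_Ftr) tuples
    labels: cluster label of each point
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    def centroid_sum(members):
        # Sum of the centroid coordinates (assumed ranking criterion).
        return sum(sum(col) / len(members) for col in zip(*members))

    intent_region = max(clusters.values(), key=centroid_sum)
    # Lower boundary of the region in each feature dimension (assumed rule).
    return tuple(min(col) for col in zip(*intent_region))
```

Under this reading a strict "greater than" comparison would exclude the boundary point itself, so an implementation might compare with "greater than or equal to", or place the threshold slightly below the boundary.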
In practical application, in some scenarios, a search word that is determined not to be a candidate search word according to its search features may still have the intention of searching the target population. For example, in the scenario of searching for a large V in a short video application, the user may mistype the search word, or may perform a fuzzy search because the real name of the large V is not known. In order to further improve the accuracy of the search, in the disclosure, after the candidate search word is determined from each search word according to the search feature of each search word, when the search word is not a candidate search word, whether the search word has the intention of searching a target population may be determined based on the associated search word of the search word or a target user list, where a user in the target user list belongs to the target population.
In one embodiment of the present disclosure, when a search word is not a candidate search word, whether the search word has the intention of the search target crowd may be determined based on a rewritten word of the search word.
Fig. 3 is a flowchart illustrating another method for identifying a search intention of a target group according to an exemplary embodiment, which is described below with reference to fig. 3. As shown in fig. 3, the method for identifying the search intention of the target population includes:
in step 301, when a search request is obtained, search features and statistical features of search terms in the search request within a preset time period are obtained.
In step 302, candidate search terms are determined from the search terms according to the search characteristics of each search term.
In step 303, it is determined whether the candidate search term has the intention of the search target population according to the statistical characteristics of the candidate search term.
In the present disclosure, steps 301 to 303 are similar to steps 101 to 103, and thus are not described herein again.
In step 304, when the search term is not a candidate search term, the rewritten term corresponding to the search term is determined based on the mapping relationship between each search term and the rewritten term.
In the present disclosure, if a user does not click on a search result after searching based on a certain search word, searches for another search word in a short time (e.g., 30 seconds), and then clicks on the search result, and the texts of the two search words have an overlapping portion, the other search word may be referred to as a rewritten word.
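The rule above can be illustrated with a short sketch that mines rewritten words from a time-ordered search log of a single user. The event layout and the definition of "overlapping portion" as sharing at least one character are assumptions made for this example.

```python
from collections import Counter, defaultdict

def mine_rewrites(events, max_gap_seconds=30):
    """Count rewritten words per search word.

    events: time-ordered list of (timestamp_seconds, query, clicked) tuples
            for one user (hypothetical log format).
    Returns a mapping: search word -> Counter of rewritten words.
    """
    rewrites = defaultdict(Counter)
    for (t1, q1, clicked1), (t2, q2, clicked2) in zip(events, events[1:]):
        if (not clicked1                      # first search: no result clicked
                and clicked2                  # second search: a result clicked
                and q1 != q2
                and t2 - t1 <= max_gap_seconds
                and set(q1) & set(q2)):       # texts overlap (assumed: share a character)
            rewrites[q1][q2] += 1
    return rewrites
```

The resulting counts can serve as the mapping relationship between search words and rewritten words that is consulted during online search.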
When the search term is not a candidate search term, for example, when the number of searches for the search term is less than or equal to the corresponding threshold, the rewritten words of the search term may be counted from the historical search records within a preset time period, and a rewritten word may be considered as an associated search term of the search term. Alternatively, the rewritten words of each search word may be counted from the historical browsing records in advance to establish the correspondence between search words and rewritten words, and the rewritten word of the current search word is determined according to this correspondence during online real-time search. If the correspondence does not contain the current search word, the rewritten word corresponding to the search word with the highest similarity to the current search word can be used as the rewritten word of the current search word.
It should be noted that there may be one rewrite word for each search word, or there may be a plurality of rewrite words.
In order to reduce the calculation amount and improve the accuracy of intention recognition, the rewritten word with the largest proportion among the rewritten words of the search term can be obtained, which may be called the top1 rewritten word. The proportion refers to the ratio of the number of rewrites to that rewritten word to the total number of rewrites of the search term within a preset time period. For example, the top1 rewritten word of the search term C refers to the rewritten word with the largest proportion among all rewritten words of the search term C.
In the method, the top1 rewritten words of the search words are obtained, whether the search words have the intention of the search target crowd is judged based on the top1 rewritten words of the search words, the calculation amount can be reduced, and the identification accuracy and the search accuracy are improved.
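The selection of the top1 rewritten word may be sketched as follows, assuming the rewrite counts of one search word within the preset time period are already available as a mapping from rewritten word to count (the function name is illustrative).

```python
def top1_rewrite(rewrite_counts):
    """Return (rewritten word, proportion) for the most frequent rewritten word,
    or None when the search word has no recorded rewritten word."""
    if not rewrite_counts:
        return None
    total = sum(rewrite_counts.values())
    # On a tie, the first rewritten word encountered is kept.
    word, count = max(rewrite_counts.items(), key=lambda item: item[1])
    return word, count / total
```

For instance, with counts {"john smith": 3, "jon smyth": 1} the top1 rewritten word is "john smith" with proportion 0.75.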
In step 305, when the rewritten word has a search target crowd intention, it is determined that the search word has the search target crowd intention.
In this disclosure, the search feature and the statistical feature of each rewrite word of the search word may be obtained, and whether each rewrite word has the intention of the search target crowd is determined based on the search feature and the statistical feature of each rewrite word. When any rewriting word of the search word has the intention of the search target crowd, the search word can be judged to have the intention of the search target crowd, otherwise, the search word can be judged not to have the intention of the search target crowd, or other schemes are adopted for judgment.
In order to reduce the amount of calculation, alternatively, only whether the top1 rewritten word of the search word has the search target crowd intention may be determined, and when the top1 rewritten word has the search target crowd intention, the search word may be determined to have the search target crowd intention. If the top1 rewritten word does not have the search target crowd intention, the search word can be judged not to have the search target crowd intention, or other schemes can be adopted for judgment.
According to the method for identifying the target crowd search intention, when the search word is not a candidate search word, whether the search word has the search target crowd intention can be indirectly judged by means of the rewritten word of the search word, so that misidentification of the target crowd search intention caused by a wrong or inaccurate input of the target user's name can be avoided, the accuracy of identifying the target crowd search intention is improved, and the search accuracy is further improved.
In one embodiment of the disclosure, when the search word is not a candidate search word, whether the search word has the intention of the search target crowd can be indirectly determined based on the browsing records of the user. Fig. 4 is a flowchart illustrating another method for identifying a search intention of a target group according to an exemplary embodiment, which is described below with reference to fig. 4.
As shown in fig. 4, the method for identifying the search intention of the target population includes:
in step 401, when a search request is obtained, search features and statistical features of search terms in the search request within a preset time period are obtained.
In step 402, candidate search terms are determined from the search terms according to the search characteristics of each search term.
In step 403, it is determined whether the candidate search word has the intention of the search target crowd according to the statistical characteristics of the candidate search word.
In the present disclosure, steps 401 to 403 are similar to steps 101 to 103, and therefore are not described herein again.
In step 404, when the search term is not a candidate search term, a list of target users is obtained.
In the present disclosure, historical browsing records within a preset time period can be obtained, all browsed users are obtained according to the historical browsing records, and users meeting a condition, such as users whose number of fans is greater than a set threshold, are screened out to obtain a target user list used for real-time searching. The users in the target user list belong to the target group, and the target user list may include target user identifications, such as user names and other information.
In real-time searching, when the search word is not a candidate search word, a target user list can be obtained.
Taking short video application as an example, history records of videos browsed by a user in the last 7 days can be obtained, a list of authors of the videos is obtained, and a part of the list, which is a large V, is screened out and is used as a large V list browsed by the user recently and stored in an online cache for real-time search.
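Building the recently browsed large V list may be sketched as below. The record layout, the fan-count lookup and the value of `min_fans` are hypothetical; the disclosure only states that users whose number of fans exceeds a set threshold are kept.

```python
SECONDS_PER_DAY = 86400

def recent_target_users(history, fan_counts, now, window_days=7, min_fans=1_000_000):
    """Return the authors browsed within the window whose fan count exceeds min_fans.

    history:    list of (timestamp_seconds, author_name) browsing records
    fan_counts: mapping author_name -> number of fans
    now:        current time in seconds, on the same clock as the timestamps
    """
    cutoff = now - window_days * SECONDS_PER_DAY
    selected = []
    for timestamp, author in history:
        if (timestamp >= cutoff
                and fan_counts.get(author, 0) > min_fans
                and author not in selected):   # keep each author once, in browsing order
            selected.append(author)
    return selected
```

The returned list is what would be stored in the online cache and matched against the search word in real time.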
In step 405, a first text similarity between the search term and each user identification in the list of target users is calculated.
In the present disclosure, a text similarity between the search term and each user identifier in the target user list may be calculated, and for convenience of distinction, the text similarity is referred to as a first text similarity.
When the first text similarity is calculated, the first text similarity may be calculated according to the vector corresponding to the search term and the vector corresponding to the user identifier.
When calculating the first text similarity, the calculation may also be performed using the edit distance ratio. The editing distance represents the minimum number of operations required by one text to obtain another text through editing operations such as adding, deleting and modifying characters. In the present disclosure, the edit distance between the search term and the user identifier may represent that the search term is subjected to editing operations such as adding, deleting, and modifying characters, so as to obtain the minimum number of operations required by the user identifier. The calculation formula of the first text similarity is as follows:
wherein q represents a search term, d represents a user identifier, r (q, d) represents a first text similarity, L (q, d) represents an edit distance between the search term q and the user identifier d, q.size represents the size of the search term q, i.e., the number of contained characters, d.size represents the size of the user identifier d, i.e., the number of contained characters, and max (q.size, d.size) represents taking the maximum value of q.size and d.size.
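A form consistent with the symbol definitions above, taken here as an assumption, is r(q, d) = 1 - L(q, d) / max(q.size, d.size), so that identical texts have similarity 1 and texts with nothing in common have similarity close to 0. A minimal sketch under that assumption, using the standard Levenshtein edit distance (function names are illustrative):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def text_similarity(q, d):
    """Assumed form of r(q, d): 1 - L(q, d) / max(q.size, d.size)."""
    longest = max(len(q), len(d))
    if longest == 0:
        return 1.0  # two empty texts are treated as identical
    return 1.0 - edit_distance(q, d) / longest
```

For example, "abcd" and "abcf" differ by one substitution over four characters, which gives a similarity of 0.75.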
In step 406, whether the search word has the intention of the search target crowd is judged according to the first text similarity.
In this embodiment, when the first text similarity between the search word and at least one user identifier is greater than the corresponding similarity threshold, this indicates that the similarity between the search word and the user identifier is relatively high, and the search word can be considered to have the intention of the search target crowd.
For example, the similarity threshold is 0.5, and when the text similarities between the search term and one or more user identifiers are greater than 0.5, the search term may be considered to have the intention of the search target population.
It should be noted that the similarity threshold may be set as needed, and the disclosure does not limit this.
Taking a short video application as an example, in one search request, if the search feature of a search word, such as the number of searches, is not greater than the corresponding threshold, but the text similarity between the search word and one or more large V names in the large V list recently browsed by the user is greater than the similarity threshold of 0.5, the search word may also be considered to have a large V search intention.
According to the method for identifying the target crowd search intention, when the search word is not the candidate search word, whether the search word has the target crowd search intention or not can be judged by means of the text similarity between the search word and the user identification of the user belonging to the target crowd, so that the situation that the target crowd search intention is identified incorrectly due to the fact that the input name of the target user is incorrect or inaccurate can be avoided, the accuracy of identifying the target crowd search intention is improved, and the search accuracy is further improved.
In one embodiment of the present disclosure, when the search word is not a candidate search word, it may also be determined whether the search word has the search target crowd intention based on a text similarity between the search word and the search word having the search target crowd intention. Referring to fig. 5, fig. 5 is a flowchart illustrating another method for identifying a search intention of a target group according to an exemplary embodiment.
As shown in fig. 5, the method for identifying the search intention of the target population includes:
in step 501, when a search request is obtained, search features and statistical features of search terms in the search request within a preset time period are obtained.
In step 502, candidate search terms are determined from the search terms according to the search characteristics of each search term.
In step 503, it is determined whether the candidate search term has the intention of the search target population according to the statistical characteristics of the candidate search term.
In the present disclosure, steps 501-503 are similar to steps 101-103, and therefore are not described herein again.
In step 504, when the search word is not a candidate search word, a target search word with a correlation degree greater than a correlation degree threshold value with the search word is determined from the historical search words with the intention of the search target crowd.
In this embodiment, a plurality of search terms having the intention of the search target crowd within a preset time period may be obtained in advance. When the search word is not a candidate search word, the correlation between the search word and the search word with the intention of the search target crowd can be calculated, and the search word with the correlation larger than the threshold value of the correlation is screened out.
The correlation threshold may be set as needed, which is not limited in this disclosure.
In step 505, a second text similarity between the search term and the target search term is calculated.
In the present disclosure, when calculating the second text similarity, the second text similarity may be calculated according to the vector corresponding to the search term and the vector corresponding to the target search term.
Or, an editing distance between the search word and the target search word may also be determined, where the editing distance is the minimum number of operations required for performing an editing operation on the search word to obtain the target search word, a maximum number of characters is determined from the number of characters of the search word and the number of characters of the target search word, and the second text similarity is calculated according to the maximum number of characters and the editing distance. That is, the calculation can be performed using the above-described calculation formula of the first text similarity.
In the method, the second text similarity is calculated by using the editing distance, and the method is convenient and simple.
In step 506, whether the search word has the intention of the search target crowd is judged according to the second text similarity.
In this embodiment, when the second text similarity between the search term and one or more target search terms is greater than the corresponding similarity threshold, the search term may be considered to have the target crowd search intention.
The similarity threshold corresponding to the second text similarity may be the same as or different from the similarity threshold corresponding to the first text similarity, and may be set as required.
Taking a short video application as an example, search terms with a large V search intention can be collected daily, and these search terms can be treated as user names to establish an inverted index. In an online search request, if a search word is not a candidate search word, the search word can be used to retrieve related search words from the inverted index, and the text similarity is calculated. If a related search word is recalled for the search word and the text similarity between them is greater than the threshold of 0.5, the search word can be considered to have a large V search intention.
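The inverted-index lookup may be sketched as follows. Indexing by single characters, measuring correlation as the number of shared distinct characters, and the value of `min_shared` are assumptions made for illustration; the similarity uses the assumed form 1 - L / max described for the first text similarity.

```python
from collections import defaultdict

def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(q, t):
    """Assumed second text similarity: 1 - edit distance / longer length."""
    longest = max(len(q), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(q, t) / longest

def build_index(intent_terms):
    """Inverted index: character -> historical search terms containing it."""
    index = defaultdict(set)
    for term in intent_terms:
        for ch in set(term):
            index[ch].add(term)
    return index

def has_intent_via_index(query, index, min_shared=2, sim_threshold=0.5):
    """Recall related terms (step 504), then compare text similarity (steps 505-506)."""
    shared = defaultdict(int)
    for ch in set(query):
        for term in index.get(ch, ()):
            shared[term] += 1
    recalled = [term for term, n in shared.items() if n >= min_shared]
    return any(similarity(query, term) > sim_threshold for term in recalled)
```

The recall step keeps the number of edit-distance computations small, since only terms that share enough characters with the query are compared.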
According to the method for identifying the target crowd search intention, when the search word is not a candidate search word, whether the search word has the search target crowd intention can be judged by using historical search words that are related to the search word and have the search target crowd intention, so that misidentification of the target crowd search intention caused by a wrong or inaccurate input of the target user's name can be avoided, the accuracy of identifying the target crowd search intention is improved, and the search accuracy is further improved.
Fig. 6 is a flowchart illustrating another method for identifying a search intention of a target group of people according to an exemplary embodiment. The method for identifying the search intention of the target group according to the embodiment of the present disclosure is further described below with reference to fig. 6.
As shown in fig. 6, the method for identifying the search intention of the target population includes:
in step 601, search characteristics and statistical characteristics of search terms in the search request are obtained.
When a user initiates a search, the search features and statistical features of the search terms in the search request within a preset time period can be obtained, as described in the above embodiments, which is not repeated here.
In step 602, it is determined whether the search characteristics of the search term satisfy corresponding preset conditions. If the preset condition is met, executing step 603; otherwise, step 604 is performed.
For example, the search feature includes a search frequency, and if the search frequency is greater than a corresponding threshold, the search feature may be considered to satisfy a corresponding preset condition. For another example, the search feature includes a click amount of the search result, and if the click amount of the search result is greater than the corresponding threshold, the search feature may be considered to satisfy the corresponding preset condition. For another example, the search feature includes a search frequency and a click amount of the search result, and if the search frequency is greater than the corresponding threshold and the click amount of the search result is also greater than the corresponding threshold, the search feature may be considered to satisfy the corresponding preset condition.
In step 603, it is determined whether the statistical characteristics of the search term satisfy the corresponding preset conditions.
For example, the statistical features include: the Gini coefficient of the click rate distribution, the Gini coefficient of the attention rate distribution, the maximum click rate, and the maximum attention rate. If the Gini coefficient of the click rate distribution, the Gini coefficient of the attention rate distribution, the maximum click rate and the maximum attention rate are all greater than the corresponding thresholds, the statistical characteristics of the search term can be considered to meet the corresponding preset conditions.
In step 604, it is determined whether the search word has the intention of the search target population based on the rewritten word of the search word.
When the search characteristics of the search terms do not meet the corresponding preset conditions, namely the search terms are not candidate search terms, whether the search terms have the intention of the search target crowd or not can be judged based on the rewritten terms of the search terms. The judging process can refer to the embodiment shown in fig. 3.
If the rewritten word has the intention of the search target crowd, the search word can be considered to have the intention of the search target crowd. Otherwise, step 605 is executed.
In step 605, based on the target user list, it is determined whether the search word has the intention of searching the target population.
When it is determined that the search word does not have the search target crowd intention based on the rewritten word of the search word, it may be determined whether the search word has the search target crowd intention based on the target user list. The judging process can refer to the embodiment shown in fig. 4.
If the first text similarity between the search word and the user identifier in the target user list is greater than the corresponding similarity threshold, the search word can be considered to have the intention of searching the target crowd. Otherwise, step 606 is performed.
In step 606, it is determined whether the search word has the search target crowd intention based on the search word having the search target crowd intention.
When it is determined that the search word does not have the search target crowd intention based on the target user list, the determination may be made based on the search word having the search target crowd intention. The judging process can refer to the embodiment shown in fig. 5.
When the second text similarity between the search word and the target search word is greater than the corresponding similarity threshold, the search word can be considered to have the intention of the search target crowd; otherwise, the search word does not have the intention of the search target crowd.
In the embodiment of the disclosure, when the search characteristics of the search word do not satisfy the corresponding preset conditions, the rewritten word can be used to judge whether the search word has the intention of the search target crowd. When it is determined that the search word does not have the search target crowd intention based on the rewritten word, whether the search word has the search target crowd intention may be determined based on the target user list. When it is determined that the search word does not have the search target crowd intention based on the target user list, the determination can further be made based on historical search words having the search target crowd intention, so that the search accuracy is greatly improved.
It should be noted that, when the search term is not a candidate search term, it may be determined whether the search term has the search target crowd intention based on the target user list, or the determination may be performed based on the search term having the search target crowd intention.
Alternatively, in practical applications, when the search term is not a candidate search term, at least one of step 604, step 605 and step 606 may be used for the determination.
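The overall flow of fig. 6 (steps 602 to 606) may be summarized by the following sketch, in which every check is passed in as a callable. The function and parameter names are illustrative; the fallbacks correspond, in order, to the rewritten-word check, the target-user-list check and the historical-search-word check, and any subset of them may be supplied.

```python
def identify_intent(term, is_candidate, passes_statistics, fallbacks):
    """Decide whether a search term has the search target crowd intention.

    is_candidate:      callable for step 602 (search features meet the condition)
    passes_statistics: callable for step 603 (statistical features meet the condition)
    fallbacks:         ordered callables for steps 604, 605 and 606
    """
    if is_candidate(term):
        return passes_statistics(term)
    # any() evaluates lazily, so later fallbacks run only if earlier ones fail.
    return any(check(term) for check in fallbacks)
```

Passing the fallbacks as an ordered list makes it straightforward to enable only some of steps 604 to 606, or to reorder them, without changing the control flow.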
Fig. 7 is a block diagram illustrating an apparatus for identifying search intent of a target population according to an exemplary embodiment. Referring to fig. 7, the apparatus 700 includes: an obtaining module 710, a determining module 720, and a judging module 730.
The obtaining module 710 is configured to, when a search request is obtained, obtain search features and statistical features of search terms in the search request within a preset time period, where the statistical features are used to characterize a distribution situation of the search terms within the preset time period;
the determining module 720 is configured to determine candidate search terms from the search terms according to the search features of each search term;
the judging module 730 is configured to judge whether the candidate search word has the intention of the search target crowd according to the statistical characteristics of the candidate search word.
In a possible implementation manner of the embodiment of the present disclosure, the judging module 730 is further configured to judge, when the search term is not a candidate search term, whether the search term has an intention of searching a target group based on an associated search term of the search term or a target user list, where a user in the target user list belongs to the target group.
In a possible implementation manner of the embodiment of the present disclosure, the judging module 730 is configured to:
determining a rewriting word corresponding to the search word based on a mapping relation between each search word and the rewriting word, wherein the rewriting word is a participle for replacing the search word;
and when the rewritten word has the intention of the search target crowd, determining that the search word has the intention of the search target crowd.
In a possible implementation manner of the embodiment of the present disclosure, the judging module 730 is configured to:
acquiring the target user list;
calculating a first text similarity between the search word and each user identifier in the target user list;
and judging whether the search word has the intention of a search target crowd or not according to the first text similarity.
In a possible implementation manner of the embodiment of the present disclosure, the judging module 730 is configured to:
determining, from historical search words having the intention of searching the target crowd, a target search word whose correlation degree with the search word is greater than a correlation degree threshold;
calculating a second text similarity between the search term and the target search term;
and judging whether the search word has the intention of the search target crowd or not according to the second text similarity.
In a possible implementation manner of the embodiment of the present disclosure, the judging module 730 is configured to:
determining an editing distance between the search word and the target search word, wherein the editing distance is the minimum operation times required for editing the search word to obtain the target search word;
and calculating the second text similarity according to the editing distance, the number of the characters of the search word and the number of the characters of the target search word.
In a possible implementation manner of the embodiment of the present disclosure, the judging module 730 is configured to:
and when the statistical characteristics of the candidate search words meet preset conditions, determining that the candidate search words have the intention of searching the target crowd.
In a possible implementation manner of this disclosure, the search feature includes a number of searches and a click rate of a search result, and the determining module 720 is configured to:
and taking the search word of which the search times and the click rate are both larger than the corresponding threshold values as the candidate search word.
In practical use, the apparatus for identifying the target people search intention provided by the embodiments of the present disclosure may be configured in any electronic device to execute the aforementioned method for identifying the target people search intention.
According to the device for identifying the target crowd search intention, when the search request is obtained, the search characteristics and the statistical characteristics of all search words in the search request within the preset time period are obtained, the candidate search words are determined from all the search words according to the search characteristics of all the search words, and whether the candidate search words have the target crowd search intention is judged according to the statistical characteristics of the candidate search words. Therefore, whether the search word has the intention of the search target crowd or not is determined according to the search characteristics and the statistical characteristics of the search word in the preset time period, and the search accuracy is improved.
FIG. 8 is a block diagram illustrating an electronic device 800 for information querying, according to an example embodiment.
As shown in fig. 8, the electronic device 800 includes:
a memory 810 and a processor 820, a bus 830 connecting different components (including the memory 810 and the processor 820), wherein the memory 810 stores a computer program, and when the processor 820 executes the program, the method for identifying the search intention of the target people group according to the embodiment of the disclosure is realized.
Bus 830 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
The electronic device 800 typically includes a variety of electronic device readable media. Such media may be any available media that is accessible by electronic device 800 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 810 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 840 and/or cache memory 850. The electronic device 800 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 860 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, and commonly referred to as a "hard drive"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 830 by one or more data media interfaces. Memory 810 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 880 having a set (at least one) of program modules 870 may be stored, for example, in memory 810. Such program modules 870 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 870 generally perform the functions and/or methodologies of the embodiments described in this disclosure.
The electronic device 800 may also communicate with one or more external devices 890 (e.g., keyboard, pointing device, display 891, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 892. In addition, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 893. As shown, the network adapter 893 communicates with the other modules of the electronic device 800 over the bus 830. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processor 820 executes various functional applications and data processing by executing programs stored in the memory 810.
It should be noted that, for the implementation process and the technical principle of the electronic device of this embodiment, reference may be made to the foregoing explanation of the method for identifying the target crowd search intention according to the embodiments of the present disclosure; details are not repeated here.
The electronic device provided by the embodiments of the present disclosure can execute the method for identifying the target crowd search intention described above. When a search request is obtained, the search features and the statistical features of each search word in the search request within a preset time period are obtained; candidate search words are determined from the search words according to the search features of each search word; and whether each candidate search word has the intention of searching the target crowd is judged according to the statistical features of that candidate search word. Whether a search word has the intention of searching the target crowd is thus determined from both its search features and its statistical features within the preset time period, which improves search accuracy.
In order to implement the above embodiments, the present disclosure also provides a storage medium.
When the instructions stored in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method for identifying the target crowd search intention described above.
In order to implement the above embodiments, the present disclosure also provides a computer program product which, when executed by a processor of an electronic device, enables the electronic device to execute the method for identifying the target crowd search intention described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.