Method, apparatus and storage medium for determining concept in information title
1. A method for determining concepts in an information title, comprising:
performing dependency syntax analysis on the information title, and determining a dependency syntax analysis result, wherein the dependency syntax analysis result comprises at least two tuples, and the tuples comprise a dependency relationship, a core word and a dependency word;
if a preset keyword is not a dependency word in a compound noun, determining at least one keyword of the information title according to the preset keyword and a preset dependency relationship;
establishing a keyword list according to the preset keywords and the keywords;
and combining the words in the keyword list to form a concept according to the word sequence in the information title.
2. The method according to claim 1, wherein the determining at least one keyword of the information title according to the preset keyword and a preset dependency relationship comprises:
determining a target tuple in the dependency syntax analysis result according to the preset keyword, wherein a core word in the target tuple is the preset keyword; if the dependency relationship corresponding to the target tuple meets a first preset dependency relationship, acquiring a dependency word in the target tuple;
determining the dependency words in the target tuple as new preset keywords, taking a second preset dependency relationship which is subordinate to the first preset dependency relationship as a new first preset dependency relationship, and performing recursive matching according to the steps to complete at least one complete preset dependency relationship sequence, wherein each complete preset dependency relationship sequence consists of a plurality of preset dependency relationships, and two adjacent preset dependency relationships in the sequence have a subordinate relationship therebetween, and the dependency word corresponding to the first preset dependency relationship is a core word corresponding to the second preset dependency relationship;
and determining the at least one keyword according to the dependency words in the target tuple.
3. The method according to claim 1, wherein the establishing a keyword list according to the preset keyword and the keyword comprises:
establishing an initial keyword list according to the preset keywords and the keywords;
and carrying out duplication elimination processing on the words in the initial keyword list to determine a keyword list.
4. The method according to any of claims 1-3, wherein after said determining at least one keyword of said information title, the method further comprises:
judging whether the dependency relationship corresponding to the tuple of the preset keyword is a direct object or not;
and if the dependency relationship corresponding to the tuple where the preset keyword is located is the direct object, determining a supplementary keyword according to the core word in the tuple where the preset keyword is located.
5. The method according to claim 4, wherein the determining at least one supplementary keyword according to the core word in the tuple in which the preset keyword is located comprises:
judging whether a core word in a tuple of the preset keyword is a preset predicate verb or not;
if the core word in the tuple where the preset keyword is located is a preset predicate verb, determining a supplementary tuple in the dependency syntax analysis result according to the core word in the tuple where the preset keyword is located, wherein the core word in the supplementary tuple is the core word in the tuple where the preset keyword is located;
and if the dependency relationship corresponding to the supplementary tuple meets a supplementary preset dependency relationship and the dependency word in the supplementary tuple is in front of the core word in the supplementary tuple in the information title, determining the dependency word in the supplementary tuple as a supplementary keyword.
6. The method of claim 5, further comprising:
if the core word in the tuple where the preset keyword is located is not the preset predicate verb and the next word adjacent to the core word in the supplementary tuple in the information title is in the keyword list, performing word segmentation on the core word in the supplementary tuple and the adjacent next word to generate a word segmentation result;
and if the core word in the supplementary tuple and the next adjacent word form a word, determining the core word in the supplementary tuple as a supplementary keyword.
7. The method of claim 5, further comprising:
and if the core word in the tuple where the preset keyword is located is the first verb, determining the core word in the tuple where the preset keyword is located as the supplementary keyword.
8. An apparatus for determining a concept in an information title, applied to a display device, the apparatus comprising:
the analysis module is used for carrying out dependency syntax analysis on the information title and determining a dependency syntax analysis result, wherein the dependency syntax analysis result comprises at least two tuples, and the tuples comprise a dependency relationship, a core word and a dependency word;
the processing module is used for determining at least one keyword of the information title according to the preset keyword and a preset dependency relationship when the preset keyword is not a dependency word in the compound noun;
the processing module is also used for establishing a keyword list according to the preset keywords and the keywords;
and the determining module is used for combining the words in the keyword list to form a concept according to the word sequence in the information title.
9. An apparatus for determining concepts in an information title, comprising a memory and a processor; wherein
the memory is configured to store a computer program;
the processor is configured to read the computer program stored in the memory and execute the method for determining the concept in an information title according to any one of claims 1 to 7 according to the computer program in the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein computer program instructions which, when executed, implement the method for determining concepts in an information title according to any one of claims 1-7.
Background
With the continuous development of the Internet, users can obtain the information they need through electronic devices, for example by querying for it directly or by receiving information pushed by a client.
Currently, because the title of a piece of information reflects its content, information is usually located by searching titles. However, titles vary widely, and some are long and complicated. Such a title contains too many information points, so the probability that the corresponding information is retrieved is low.
Disclosure of Invention
The embodiments of the application provide a method, an apparatus and a storage medium for determining concepts in information titles, which combine keywords in an information title into a concept of the information; using the concept to assist retrieval can improve the probability that the information is retrieved.
In a first aspect, an embodiment of the present application provides a method for determining a concept in an information title, including:
performing dependency syntax analysis on the information title, and determining a dependency syntax analysis result, wherein the dependency syntax analysis result comprises at least two tuples, and each tuple comprises a dependency relationship, a core word and a dependency word;
if a preset keyword is not a dependency word in a compound noun, determining at least one keyword of the information title according to the preset keyword and a preset dependency relationship;
establishing a keyword list according to the preset keyword and the at least one keyword; and
combining the words in the keyword list to form a concept according to the word order in the information title.
In some possible implementations, the determining at least one keyword of the information title according to the preset keyword and a preset dependency relationship includes:
determining a target tuple in the dependency syntax analysis result according to the preset keyword, wherein a core word in the target tuple is the preset keyword; and if the dependency relationship corresponding to the target tuple meets a first preset dependency relationship, acquiring a dependency word in the target tuple.
Determining the dependency words in the target tuple as new preset keywords, taking a second preset dependency relationship which is subordinate to the first preset dependency relationship as a new first preset dependency relationship, and performing recursive matching according to the steps to complete at least one complete preset dependency relationship sequence, wherein each complete preset dependency relationship sequence consists of a plurality of preset dependency relationships, and two adjacent preset dependency relationships in the sequence have a subordinate relationship therebetween, and the dependency word corresponding to the first preset dependency relationship is a core word corresponding to the second preset dependency relationship.
And determining the at least one keyword according to the dependency words in the target tuple.
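The recursive matching described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: it assumes each tuple has already been converted to the form (relation, core word, dependency word), and the names `match_sequence` and `find_keywords`, the sample parse and the sample relation sequences are all hypothetical.

```python
from typing import List, Sequence, Tuple

Dep = Tuple[str, str, str]  # (relation, core word, dependency word)


def match_sequence(parse: Sequence[Dep], keyword: str,
                   relations: Sequence[str]) -> List[List[str]]:
    """Return every chain of dependency words that completes `relations`.

    At each level the current keyword must be the core word of a tuple whose
    relation equals the next relation in the sequence; that tuple's dependency
    word then becomes the keyword for the following level.
    """
    if not relations:
        return [[]]  # the whole sequence has been matched
    chains = []
    for rel, core, dep in parse:
        if core == keyword and rel == relations[0]:
            for tail in match_sequence(parse, dep, relations[1:]):
                chains.append([dep] + tail)
    return chains


def find_keywords(parse: Sequence[Dep], preset_keyword: str,
                  preset_sequences: Sequence[Sequence[str]]) -> List[str]:
    """Collect dependency words from all completely matched sequences."""
    found = []
    for seq in preset_sequences:
        for chain in match_sequence(parse, preset_keyword, seq):
            found.extend(chain)
    return found


# Hand-written parse and relation sequences (assumptions for illustration).
parse = [("acl", "movie", "starring"),
         ("nsubj", "starring", "A"),
         ("amod", "movie", "new")]
sequences = [["acl", "nsubj"], ["amod"]]
keywords = find_keywords(parse, "movie", sequences)
```

A sequence that is only partially matched contributes no keywords in this sketch, which reflects one reading of "complete preset dependency relationship sequence".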
In some possible implementation manners, the establishing a keyword list according to the preset keyword and the keyword includes:
and establishing an initial keyword list according to the preset keywords and the keywords.
And carrying out duplication elimination processing on the words in the initial keyword list to determine a keyword list.
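Building the keyword list, removing duplicates and combining the words by title order can be sketched as below. The function names `build_keyword_list` and `form_concept`, the separator parameter and the sample title are assumptions made for illustration only.

```python
from typing import Iterable, List


def build_keyword_list(preset_keywords: Iterable[str],
                       keywords: Iterable[str]) -> List[str]:
    """Merge both groups and drop duplicates, keeping first occurrences."""
    initial = list(preset_keywords) + list(keywords)
    return list(dict.fromkeys(initial))


def form_concept(title_tokens: List[str], keyword_list: List[str],
                 sep: str = " ") -> str:
    """Join the listed words in the order in which they occur in the title."""
    wanted = set(keyword_list)
    # dict.fromkeys keeps one copy of each title token, in title order.
    ordered = [tok for tok in dict.fromkeys(title_tokens) if tok in wanted]
    return sep.join(ordered)


# Hand-written sample (an assumption for illustration only).
title_tokens = ["A", "starring", "new", "movie"]
keyword_list = build_keyword_list(["movie"], ["starring", "A", "new", "A"])
concept = form_concept(title_tokens, keyword_list)
```

For languages written without spaces between words, `sep=""` could be passed so that the words are concatenated directly.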
In some possible implementations, after the determining at least one keyword of the information title, the method further includes:
judging whether the dependency relationship corresponding to the tuple in which the preset keyword is located is a direct object. Here, the preset keyword is the initial preset keyword, not a new preset keyword determined from a dependency word in a target tuple.
And if the dependency relationship corresponding to the tuple where the preset keyword is located is the direct object, determining a supplementary keyword according to the core word in the tuple where the preset keyword is located.
In some possible implementation manners, the determining at least one supplementary keyword according to the core word in the tuple where the preset keyword is located includes:
and judging whether the core word in the tuple of the preset keyword is a preset predicate verb or not.
If the core word in the tuple where the preset keyword is located is the preset predicate verb, determining a supplementary tuple in the dependency syntax analysis result according to the core word in the tuple where the preset keyword is located, wherein the core word in the supplementary tuple is the core word in the tuple where the preset keyword is located.
And if the dependency relationship corresponding to the supplementary tuple meets a supplementary preset dependency relationship and the dependency word in the supplementary tuple is in front of the core word in the supplementary tuple in the information title, determining the dependency word in the supplementary tuple as a supplementary keyword.
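The supplementary-keyword steps above can be sketched as follows. The sketch assumes index-style tuples (relation, core position, dependency position) with 1-based positions and 0 for the virtual root, and it reads "the tuple where the preset keyword is located" as the tuple in which the preset keyword is the dependency word. The set of preset predicate verbs, the set of supplementary preset relations and the sample data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

# (relation, core position, dependency position); positions are 1-based.
IndexDep = Tuple[str, int, int]


def supplementary_keywords(tokens: Sequence[str],
                           parse: Sequence[IndexDep],
                           preset_keyword: str,
                           predicate_verbs: Set[str],
                           supplementary_relations: Set[str]) -> List[str]:
    """Find words that depend on the keyword's verb and precede it."""
    found = []
    for rel, head, dep in parse:
        # Keep only tuples where the preset keyword is a direct object.
        if tokens[dep - 1] != preset_keyword or rel != "dobj":
            continue
        # The core word must be one of the preset predicate verbs.
        if head == 0 or tokens[head - 1] not in predicate_verbs:
            continue
        # Supplementary tuples share the same core word (the verb).
        for rel2, head2, dep2 in parse:
            if (head2 == head and rel2 in supplementary_relations
                    and dep2 < head2):  # dependent precedes the verb
                found.append(tokens[dep2 - 1])
    return found


# Hand-written sample (an assumption for illustration only).
tokens = ["A", "directed", "movie"]
parse = [("ROOT", 0, 2), ("nsubj", 2, 1), ("dobj", 2, 3)]
extra = supplementary_keywords(tokens, parse, "movie",
                               predicate_verbs={"directed"},
                               supplementary_relations={"nsubj"})
```

Comparing positions rather than strings makes the "in front of the core word" test a simple integer comparison.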
In some possible implementations, the method further includes:
and if the core word in the tuple where the preset keyword is located is not the preset predicate verb and the next word adjacent to the core word in the supplementary tuple in the information title is in the keyword list, performing word segmentation on the core word in the supplementary tuple and the adjacent next word to generate a word segmentation result.
And if the core word in the supplementary tuple and the next adjacent word form a word, determining the core word in the supplementary tuple as a supplementary keyword.
In some possible implementations, the method further includes:
and if the core word in the tuple where the preset keyword is located is the first verb, determining the core word in the tuple where the preset keyword is located as the supplementary keyword.
In a second aspect, an embodiment of the present application provides an apparatus for determining a concept in an information title, including:
the analysis module is used for carrying out dependency syntax analysis on the information title and determining a dependency syntax analysis result, wherein the dependency syntax analysis result comprises at least two tuples, and the tuples comprise dependency relationships, core words and dependency words.
And the processing module is used for determining at least one keyword of the information title according to the preset keyword and a preset dependency relationship when the preset keyword is not a dependency word in the compound noun.
The processing module is further used for establishing a keyword list according to the preset keywords and the keywords.
And the determining module is used for combining the words in the keyword list to form a concept according to the word sequence in the information title.
In some possible implementations, the processing module is specifically configured to: determine, according to the preset keyword, a target tuple in the dependency syntax analysis result, where a core word in the target tuple is the preset keyword; if the dependency relationship corresponding to the target tuple meets a first preset dependency relationship, acquire the dependency word in the target tuple; determine the dependency word in the target tuple as a new preset keyword, take a second preset dependency relationship which is subordinate to the first preset dependency relationship as a new first preset dependency relationship, and perform recursive matching according to the above steps to complete at least one complete preset dependency relationship sequence, wherein each complete preset dependency relationship sequence consists of a plurality of preset dependency relationships, two adjacent preset dependency relationships in the sequence have a subordinate relationship, and the dependency word corresponding to the first preset dependency relationship is the core word corresponding to the second preset dependency relationship; and determine the at least one keyword according to the dependency words in the target tuple.
In some possible implementation manners, the processing module is specifically configured to establish an initial keyword list according to the preset keyword and the keyword; and carrying out duplication elimination processing on the words in the initial keyword list to determine a keyword list.
In some possible implementation manners, the apparatus further includes a supplement module, where the supplement module is configured to determine whether a dependency relationship corresponding to a tuple in which the preset keyword is located is a direct object; and when the dependency relationship corresponding to the tuple where the preset keyword is located is the direct object, determining a supplementary keyword according to the core word in the tuple where the preset keyword is located.
In some possible implementation manners, the supplement module is specifically configured to determine whether a core word in a tuple in which the preset keyword is located is a preset predicate verb; and when the core word in the tuple where the preset keyword is located is a preset predicate verb, determining a supplementary tuple in the dependency syntax analysis result according to the core word in the tuple where the preset keyword is located, wherein the core word in the supplementary tuple is the core word in the tuple where the preset keyword is located. And when the dependent relationship corresponding to the supplementary tuple meets a supplementary preset dependent relationship and the dependent word in the supplementary tuple is in front of the core word in the supplementary tuple in the information title, determining the dependent word in the supplementary tuple as a supplementary keyword.
In some possible implementation manners, the supplementary module is specifically configured to perform word segmentation on a core word in the supplementary tuple and a next word adjacent to the core word in the supplementary tuple when the core word in the tuple in which the preset keyword is located is not a preset predicate verb and the next word adjacent to the core word in the supplementary tuple in the information title is in the keyword list, so as to generate a word segmentation result. And when the core word in the supplementary tuple and the next adjacent word form a word in the word segmentation processing result, determining the core word in the supplementary tuple as a supplementary keyword.
In some possible implementation manners, the supplement module is specifically configured to determine, when a core word in a tuple in which the preset keyword is located is a first verb, the core word in the tuple in which the preset keyword is located as the supplement keyword.
In a third aspect, an embodiment of the present application further provides an apparatus for determining a concept in an information title, which may include a memory and a processor, wherein
the memory is used for storing a computer program.
The processor is configured to read the computer program stored in the memory and to execute, according to the computer program, any of the methods for determining a concept in an information title according to the first aspect of the application.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing computer program instructions which, when executed, implement any of the methods for determining a concept in an information title according to the first aspect of the present application.
In a fifth aspect, embodiments of the present application further provide a computer program product comprising a computer program which, when executed by a processor, performs any of the methods for determining a concept in an information title according to the first aspect of the present application.
The application provides a method, an apparatus and a storage medium for determining concepts in an information title. Dependency syntax analysis is performed on the information title to determine a dependency syntax analysis result, which comprises at least two tuples, each comprising a dependency relationship, a core word and a dependency word; if a preset keyword is not a dependency word in a compound noun, at least one keyword of the information title is determined according to the preset keyword and a preset dependency relationship; a keyword list is established according to the preset keyword and the at least one keyword; and the words in the keyword list are combined to form a concept according to the word order in the information title. By processing the dependency syntax analysis result and forming the keywords in the information title into a concept, the technical solution allows information with a long and complicated title to be searched for by its concept, which improves the probability that the information is retrieved.
These and other aspects of the present application will be more readily apparent from the following description of the embodiment(s).
Drawings
In order to more clearly illustrate the embodiments of the present application or the implementation manner in the related art, a brief description will be given below of the drawings required for the description of the embodiments or the related art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art according to the drawings.
FIG. 1 is a schematic view of an application scenario of a method for determining a concept in an information title according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for determining concepts in information titles according to an embodiment of the present application;
FIG. 3a is a schematic diagram of a dependency tree according to an embodiment of the present application;
FIG. 3b is a tree-shaped schematic diagram of word segmentation position numbers according to an embodiment of the present application;
FIG. 3c is a schematic tree diagram of words in an information title provided by an embodiment of the present application;
FIG. 4a is a schematic diagram of another dependency tree provided in the embodiment of the present application;
FIG. 4b is a schematic diagram of a tree of another dependency relationship provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a dependency tree for each word in a candidate word list according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a method for determining a supplementary keyword according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating a method for determining concepts in a video title according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of a method for determining a video concept according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for determining concepts in information titles according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another apparatus for determining concepts in information titles according to an embodiment of the present application.
Detailed Description
To make the objects, embodiments and advantages of the present application clearer, exemplary embodiments of the present application will be described more clearly and completely below in conjunction with the attached drawings in exemplary embodiments of the present application, and it is obvious that the exemplary embodiments described are only a part of the embodiments of the present application, not all of the embodiments.
All other embodiments that a person skilled in the art can derive from the exemplary embodiments described herein without inventive effort fall within the scope of the appended claims. In addition, while the disclosure herein is presented in terms of one or more exemplary examples, it should be appreciated that each aspect of the disclosure may also separately constitute a complete embodiment.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
Furthermore, the terms "comprises" and "comprising," as well as any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module" as used herein refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
The technical solution provided by the embodiments of the application can be applied to natural language analysis scenarios, for example human-computer interaction. FIG. 1 is a schematic view of an application scenario of a method for determining a concept in an information title provided in an embodiment of the present application. A user may speak, to the control device 101, an instruction to be executed by the display device 102. The display device 102 may collect the user's voice data in real time through the control device 101, recognize the instruction contained in the voice data through a controller in the display device 102, and execute the instruction directly once it is recognized. Throughout the process the user does not physically operate the display device 102 but simply speaks the instruction. The control device 101 may be a remote controller. Communication between the remote controller and the display device 102 includes infrared protocol communication, Bluetooth protocol communication, and other short-range communication methods, and the display device 102 is controlled wirelessly or by wire.
For example, the user may say "play a football match video" or "a recipe for Kung Pao chicken" to the control device 101. The display device 102 receives the voice input by the user, performs semantic parsing on the voice through the controller, and searches the video library, through a search algorithm, for a video resource corresponding to the user's utterance, thereby displaying the retrieved video resource through the display device 102. The processing of the received voice may include noise reduction, text preprocessing, service positioning, error correction, intent and slot analysis, and the like, and the processing manner is not specifically limited in the embodiments of the present application.
Illustratively, the user may also enter text information directly on the display device 102 via the control device 101. For example, the user inputs text information such as "vegetable salad tutorial" or "news linkage" on the display device 102 through the remote controller. The controller in the display device 102 retrieves the corresponding video asset from the video library according to the text information input by the user, and displays the retrieved video asset through the display device 102.
In another application scenario provided by the present application, the display device 102 may push the corresponding video resource for the user according to the search record of the user on the video resource within a period of time.
In these application scenarios, semantic parsing and media asset retrieval are challenging because the ways users phrase requests and the sentence patterns of media asset titles vary widely.
Because media asset titles are long and varied, it is difficult for them to be retrieved. Of the billions of media assets, only 0.1% have an opportunity to be presented to the user, and the others are almost never retrieved. This not only wastes resources, but also increases the time consumed by information retrieval and the computational burden on the server.
Practical tests have found that even if the user reads these titles aloud in front of the TV, the TV does not return the media assets with those titles. This is because the semantic engine performs semantic analysis on the user request (text preprocessing, service positioning, error correction, and intent and slot analysis), which transforms the text used to query media assets, so that the media assets finally retrieved by the information retrieval algorithm of the service processing are not those whose titles the user read. As a result, only a small number of media assets are retrieved frequently, and the probability that the large remainder is retrieved is small.
In order to solve the problem that the information title is long and complex and the probability of information retrieval is low, dependency syntax analysis can be performed on the information title, a plurality of keywords can be determined in the information title according to the dependency syntax analysis result, and the keywords can be combined to form a concept. The concept corresponding to the information is short compared with the title of the information, and the meaning of the information can be accurately expressed, so that the probability of information being searched can be improved when the information is searched according to the concept of the information.
Illustratively, dependency syntax analysis identifies the dependencies between the words in a sentence. In dependency syntax theory, a "dependency" is a directed, asymmetric relationship between words in which one word governs and the other is governed. The governing component is called the governor (head), and the governed component is called the subordinate (dependent). Dependency syntax rests on a common basic assumption: a syntactic structure essentially consists of dependency (modification) relationships between words. A dependency relationship connects two words, a core word (head) and a dependency word (dependent).
Each tuple in the result list of the dependency syntax analysis (dependency_part) is one dependency relationship connecting two words. A tuple contains three elements: the first is the dependency relationship, the second is the core word, and the third is the dependency word.
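The tuple layout can be sketched as follows. This is a minimal illustration, assuming the parser reports 1-based token positions with 0 standing for the virtual root, as in index-style output such as ('nsubj', 4, 3). The names `DepTuple` and `to_word_tuples` and the sample parse are hypothetical; the sample is hand-written and is not the output of any real parser.

```python
from typing import List, NamedTuple, Tuple


class DepTuple(NamedTuple):
    """One dependency: (relation, core word, dependency word)."""
    relation: str
    core: str
    dependent: str


def to_word_tuples(tokens: List[str],
                   index_tuples: List[Tuple[str, int, int]]) -> List[DepTuple]:
    """Replace 1-based token positions with the words themselves.

    Position 0 denotes the virtual root and is rendered as 'ROOT'.
    """
    def word(pos: int) -> str:
        return "ROOT" if pos == 0 else tokens[pos - 1]

    return [DepTuple(rel, word(head), word(dep))
            for rel, head, dep in index_tuples]


# Hand-written sample (an assumption for illustration only).
tokens = ["new", "salad", "recipe"]
index_tuples = [("ROOT", 0, 3), ("amod", 3, 1), ("compound:nn", 3, 2)]
word_tuples = to_word_tuples(tokens, index_tuples)
```

Keeping the positions alongside the words is useful wherever a later step depends on word order in the title.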
By way of example, a concept is a collection of information or entities, such as a love picture, an ancient drama, an inspirational song, a trembling spirit, a nearby food, a mosquito trap, a star feature, a starring actor for a television show, and the like. The embodiments of the present application do not limit this in any way.
Hereinafter, the determination method of the concept in the information title provided in the present application will be described in detail by specific examples. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart illustrating a method for determining a concept in an information title according to an embodiment of the present application. The method for determining the concept in the information title may be executed by software and/or hardware means, for example, the hardware means may be a device for determining the concept in the information title, and the device for determining the concept in the information title may be a terminal or a processing chip in the terminal. For example, referring to fig. 2, the method for determining the concept in the information title may include:
S201, performing dependency syntax analysis on the information title, and determining a dependency syntax analysis result.
The dependency syntax analysis result comprises at least two tuples, and each tuple comprises a dependency relationship, a core word and a dependency word.
Exemplary dependency relationships may include: compound noun modifier nmod (noun compound modifier), such as (Pudong, Shanghai); associative modifier assmod (associative modifier), such as NP | QP (lesson, special area); noun compound form nn (noun compound modifier); adjectival modifier amod (adjectival modifier), such as (case, new); nominal subject nsubj (nominal subject); direct object dobj, such as (promulgated, document); negation modifier neg (negative modifier), such as (met, not); adverbial modifier advmod (adverbial modifier); and dependent dep. The present application is described by taking the above dependency relationships as examples, but is not limited to them.
For example, word segmentation of the information title "05 year of the music awarding evening from a host, and also BC" can result in word_tokenize [ '05 year', 'A', 'host', 'music', 'awarding', 'evening', 'data,', 'still', 'B', 'C' ], where A, B and C represent the names of different persons. The dependency syntax analysis result of the information title may be dependency_parse: [ ('ROOT', 0, 12), ('nmod', 3, 1), ('case', 1, 2), ('nsubj', 4, 3), ('acl', 8, 4), ('mark', 4, 5), ('compound:nn', 8, 6), ('compound:nn', 8, 7), ('dep', 12, 8), ('punct', 12, 9), ('dep', 12, 10), ('dep', 12, 11) ]. Each tuple is written as (dependency relationship, position number of the core word, position number of the dependency word).
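The tuple structure described above can be held in a small data structure. The following is a minimal sketch, not the implementation of the present application: the names Dep, children_by_head and index are illustrative assumptions, the relation labels are written without inner spaces, and treating position number 0 as a virtual root is an interpretation of the ('ROOT', 0, 12) tuple.

```python
from collections import defaultdict
from typing import NamedTuple


class Dep(NamedTuple):
    """One tuple of the dependency syntax analysis result (illustrative name)."""
    relation: str   # dependency relationship, e.g. 'nsubj'
    head: int       # position number of the core word (0 assumed to be a virtual root)
    dependent: int  # position number of the dependency word


# Parse result copied from the example title above.
dependency_parse = [
    ('ROOT', 0, 12), ('nmod', 3, 1), ('case', 1, 2), ('nsubj', 4, 3),
    ('acl', 8, 4), ('mark', 4, 5), ('compound:nn', 8, 6),
    ('compound:nn', 8, 7), ('dep', 12, 8), ('punct', 12, 9),
    ('dep', 12, 10), ('dep', 12, 11),
]

deps = [Dep(*t) for t in dependency_parse]


def children_by_head(deps):
    """Group tuples by core word so that the dependents of a word can be looked up."""
    index = defaultdict(list)
    for d in deps:
        index[d.head].append(d)
    return index


index = children_by_head(deps)
# Dependency words attached to the word at position 8:
print([(d.relation, d.dependent) for d in index[8]])
# [('acl', 4), ('compound:nn', 6), ('compound:nn', 7)]
```

Grouping by core word is only a convenience for the later matching steps, which repeatedly ask for the tuples whose core word is a given word.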
Illustratively, the dependency syntax analysis result of the information title may also be represented by:
For example, after the dependency syntax analysis result is determined, it may be determined whether the dependency syntax analysis result contains a preset keyword. If it does not, the information title does not need to be processed. If it does, it is determined whether the preset keyword is a dependency word in a compound noun, and if the preset keyword is not a dependency word in a compound noun, the following S202 is executed:
S202, if the preset keyword is not a dependency word in a compound noun, determining at least one keyword of the information title according to the preset keyword and the preset dependency relationship.
Illustratively, the predetermined keywords may be movies, television shows, novels, comics, cartoons, recipes, applications, cartoons, symptoms, strategies, specials, methods, hazards, manifestations, efficacies, formulas, side effects, and the like. The present application is described by taking the preset keywords as examples, but the present application is not limited to the preset keywords.
Illustratively, a compound noun (a noun combination form) consists of two nouns: one is the modifying noun, i.e., the dependency word, and the other is the central noun, i.e., the core word.
When at least one keyword of the information title is determined according to the preset keyword and the preset dependency relationship, a target tuple can be determined in the dependency syntax analysis result according to the preset keyword, and a core word in the target tuple is the preset keyword; and if the dependency relationship corresponding to the target tuple meets the first preset dependency relationship, acquiring the dependency word in the target tuple. Determining the dependency words in the target tuple as new preset keywords, taking a second preset dependency relationship which is subordinate to the first preset dependency relationship as a new first preset dependency relationship, and performing recursive matching to complete at least one complete preset dependency relationship sequence according to the steps, wherein each complete preset dependency relationship sequence consists of a plurality of preset dependency relationships, a subordinate relationship exists between two adjacent preset dependency relationships in the sequence, and the dependency word corresponding to the first preset dependency relationship is a core word corresponding to the second preset dependency relationship; and determining at least one keyword according to the dependency words in the target tuple.
For example, a complete preset dependency relationship sequence may consist of a first preset dependency relationship alone, of a first and a second preset dependency relationship, or of a first, a second and a third preset dependency relationship. The number of preset dependency relationships in a complete preset dependency relationship sequence is not limited in this embodiment.
For example, the complete preset dependency relationship sequences may be as follows. Sequences consisting of a first preset dependency relationship only: 'nmod:assmod'; 'compound:nn'; 'dobj'; 'amod'; 'nmod'. Sequences consisting of a first preset dependency relationship followed by a second preset dependency relationship: 'nmod:assmod' + 'compound:nn'; 'nmod:assmod' + 'amod'; 'amod' + 'nsubj'; 'compound:nn' + 'compound:nn'; 'compound:nn' + 'nmod:assmod'; 'dobj' + 'nsubj'; 'compound:nn' + 'nmod'; 'compound:nn' + 'amod'; 'compound:nn' + 'advmod'; 'nmod' + 'nmod:assmod'; 'amod' + 'neg'. In a complete preset dependency relationship sequence, two adjacent preset dependency relationships have a subordinate relationship, i.e., the second preset dependency relationship is subordinate to the first preset dependency relationship.
It is understood that, when determining at least one keyword according to the dependency words in the target tuple, the number of determined keywords is different because the number of preset dependencies in the complete preset dependency relationship sequence is different. For example, if the complete sequence of predetermined dependencies includes a first predetermined dependency and a second predetermined dependency, two keywords can be determined according to the above method.
In the embodiment of the application, at least one target tuple can be determined according to a preset keyword and a preset dependency relationship, the dependency word in each target tuple is determined as the keyword, and each target tuple meets the corresponding preset dependency relationship, so that the determined keyword is more accurate, and meanwhile, the keyword with higher information relevance can be reserved in the information title.
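The recursive matching described above can be sketched as follows. This is a hedged illustration only: the function names match and is_prefix_of_preset are assumptions, PRESET_SEQUENCES holds just a subset of the sequences listed in this description, and partial_parse contains only the three tuples that the onion egg cake example later in this description cites explicitly, with the preset keyword at position 16.

```python
# Subset of the complete preset dependency relationship sequences listed above.
PRESET_SEQUENCES = {
    ('nmod:assmod',),
    ('compound:nn',),
    ('amod',),
    ('nmod:assmod', 'compound:nn'),
    ('nmod:assmod', 'amod'),
    ('amod', 'nsubj'),
    ('compound:nn', 'compound:nn'),
}


def is_prefix_of_preset(path):
    """True if `path` is the beginning of at least one preset sequence."""
    return any(seq[:len(path)] == path for seq in PRESET_SEQUENCES)


def match(parse, head, path=()):
    """Return position numbers of dependency words reached from `head`
    along relation paths that form a complete preset sequence."""
    found = []
    for relation, core, dependent in parse:
        if core != head:
            continue
        new_path = path + (relation,)
        if not is_prefix_of_preset(new_path):
            continue
        # The dependency word becomes the new keyword for the next order.
        deeper = match(parse, dependent, new_path)
        if new_path in PRESET_SEQUENCES or deeper:
            found.append(dependent)
            found.extend(deeper)
    return found


# Only the tuples cited in the text for the onion egg cake example.
partial_parse = [
    ('compound:nn', 12, 10),
    ('compound:nn', 12, 11),
    ('nmod:assmod', 16, 12),
]

print(sorted(match(partial_parse, 16)))  # [10, 11, 12]
```

Positions 10, 11 and 12 are the words "onion", "egg" and "cake" in that example. The recursion stops on its own because a path is extended only while it is still the beginning of some preset sequence.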
S203, establishing a keyword list according to the preset keyword and the keywords.
When a keyword list is established according to preset keywords and keywords, an initial keyword list can be established according to the preset keywords and the keywords; and carrying out duplication elimination processing on the words in the initial keyword list to determine a keyword list.
In the embodiment of the application, the words in the initial keyword list are subjected to duplication elimination, so that the uniqueness of the words in the keyword list can be ensured, and the concept formed by combining the words in the keyword list is more accurate.
S204, combining the words in the keyword list to form a concept according to the word order in the information title.
Illustratively, the word order in the information title is the front-to-back order of the words in the information title. For example, in the word segmentation result word_tokenize [ '05', 'a', 'host', 'music', 'award', 'evening', 'also', 'B', 'C' ], the words are numbered 1 to 12 from front to back.
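Steps S203 and S204 can be sketched in a few lines. This is a minimal illustration under stated assumptions: build_concept is a hypothetical helper name, the token list is the one given for the chronic gastritis example later in this description, and joining the words with a space is a choice made for the English rendering only.

```python
def build_concept(tokens, keyword_positions):
    """Order the keywords by their position in the title, remove repeated
    words, and join the remaining words into one concept string."""
    ordered = sorted(keyword_positions)
    # Position numbers are 1-based, list indices are 0-based.
    words = [tokens[p - 1] for p in ordered]
    # dict.fromkeys removes duplicates while keeping the first occurrence.
    unique_words = list(dict.fromkeys(words))
    return ' '.join(unique_words)


tokens = ['chronic', 'gastritis', 'is', 'what', 'symptom']
# Initial keyword list as position numbers, with one repeated entry.
print(build_concept(tokens, [5, 2, 1, 2]))  # chronic gastritis symptom
```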
Therefore, in the method for determining the concept in the information title provided by the embodiment of the application, the dependency syntax analysis result is determined by performing dependency syntax analysis on the information title, the dependency syntax analysis result comprises at least two tuples, and each tuple comprises a dependency relationship, a core word and a dependency word; if the preset keyword is not a dependency word in a compound noun, at least one keyword of the information title is determined according to the preset keyword and a preset dependency relationship; a keyword list is established according to the preset keyword and the keywords; and the words in the keyword list are combined to form a concept according to the word order in the information title. The information can then be retrieved by its corresponding concept, which improves the probability of the information being retrieved.
To facilitate understanding of the method for determining the concept in the information title provided in the embodiment of the present application, the technical solution of the present application is described in detail below through a specific example.
Illustratively, the information is titled "the best new way of eating streaky pork, bright color, fragrant smell, and fat but not greasy taste". The following steps are performed for the information title:
Step 1, judging whether the information title contains a preset keyword, and if not, exiting; if yes, executing step 2. Here, the information title contains the preset keyword 'way'.
Step 2, judging, through dependency syntax analysis, whether the preset keyword is the modifying noun of a compound noun in the sentence. If so, exiting, that is, the phrase ending with the preset keyword in the information title cannot be used as a concept; if not, executing step 3 below. Judging whether the preset keyword is the modifying noun of a compound noun in the sentence means judging whether the preset keyword is the dependency word of the two words connected by the dependency relationship 'compound:nn'. The implementation is as follows: judging whether the third element of any tuple in the dependency_parse result list is the word segmentation position number of the preset keyword in the sentence, and if so, judging whether the first element of that tuple is the dependency relationship 'compound:nn'.
Illustratively, for the above information title, the dependency syntax analysis result is: dependency_parse [ ('ROOT', 0, 9), ('nsubj', 3, 1), ('advmod', 3, 2), ('amod', 6, 3), ('mark', 3, 4), ('amod', 6, 5), ('nmod:topic', 9, 6), ('punct', 9, 7), ('nsubj', 9, 8), ('punct', 9, 10), ('nsubj', 12, 11), ('conj', 9, 12), ('punct', 9, 13), ('dep', 19, 14), ('advmod:rcomp', 14, 15), ('dobj', 14, 16), ('advmod', 19, 17), ('neg', 19, 18), ('conj', 9, 19) ]. The word segmentation result list word_tokenize has 19 entries, of which the words at positions 1, 3, 5 and 6 are 'streaky pork', 'good eating', 'new' and 'way', respectively. After the sentence is segmented, the preset keyword 'way' is the 6th word segment, so its position number is 6. According to the dependency syntax analysis result, a tree diagram centered on the preset keyword 'way' can be determined. Fig. 3a is a schematic diagram of a dependency tree according to an embodiment of the present application. Fig. 3a can be simplified to a diagram containing only the word segmentation position numbers, as in Fig. 3b, which is a tree diagram of the word segmentation position numbers provided in the embodiment of the present application. Fig. 3c is a schematic tree diagram of the words in an information title provided in this application. According to step 2, the preset keyword in the information title is the third element of the tuple ('nmod:topic', 9, 6), and the dependency relationship of this tuple is not a compound noun relationship, so step 3 below is executed.
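The check in step 2 can be sketched as follows, using the parse result of the streaky pork title. The function names is_compound_dependent and relation_of are illustrative assumptions, and relation labels are written without inner spaces.

```python
# Parse result of the streaky pork title, as (relation, core word, dependency word).
dependency_parse = [
    ('ROOT', 0, 9), ('nsubj', 3, 1), ('advmod', 3, 2), ('amod', 6, 3),
    ('mark', 3, 4), ('amod', 6, 5), ('nmod:topic', 9, 6), ('punct', 9, 7),
    ('nsubj', 9, 8), ('punct', 9, 10), ('nsubj', 12, 11), ('conj', 9, 12),
    ('punct', 9, 13), ('dep', 19, 14), ('advmod:rcomp', 14, 15),
    ('dobj', 14, 16), ('advmod', 19, 17), ('neg', 19, 18), ('conj', 9, 19),
]


def is_compound_dependent(parse, keyword_pos):
    """True if the keyword is the dependency word (third element) of a
    tuple whose relation (first element) is 'compound:nn'."""
    return any(rel == 'compound:nn' and dep == keyword_pos
               for rel, _core, dep in parse)


def relation_of(parse, keyword_pos):
    """Relation of the tuple in which the keyword is the dependency word."""
    return next(rel for rel, _core, dep in parse if dep == keyword_pos)


print(relation_of(dependency_parse, 6))            # nmod:topic
print(is_compound_dependent(dependency_parse, 6))  # False
```

Because the result is False for position 6, processing continues with step 3.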
Step 3, judging whether the preset keyword is the core word of some dependency relationship, and if not, exiting; if yes, executing step 4. Illustratively, this can be determined by traversing each tuple in the dependency_parse result list and checking whether the second element of the tuple is the word segmentation position number of the preset keyword in the sentence. In this information title, 6 is the second element of the tuples ('amod', 6, 3) and ('amod', 6, 5) in the dependency_parse result list.
Step 4, judging whether the dependency relationship is a preset first-order dependency relationship, and if not, exiting; if yes, executing step 5. According to step 3, the dependency relationship 'amod' is a preset first-order dependency relationship in the above embodiment.
Step 5, traversing the dependency_parse result list and judging whether there are one or more dependency relationships whose core word is the dependency word of the above dependency relationship, and if not, exiting; if yes, executing step 6. According to step 3, the dependency words of the dependency relationship 'amod' are 3 and 5. There are three dependency relationships with 3 as the core word, namely 'nsubj' in ('nsubj', 3, 1), 'advmod' in ('advmod', 3, 2), and 'mark' in ('mark', 3, 4); there is no dependency relationship with 5 as the core word.
Step 6, judging whether these dependency relationships are second-order dependency relationships of the preset first-order dependency relationship, and if yes, executing step 8; if not, executing step 7. According to step 5, among the dependency relationships 'nsubj', 'advmod' and 'mark' of the tuples with 3 as the core word, there is a second-order dependency relationship of the preset first-order dependency relationship 'amod', so this branch path executes step 8. Since no dependency relationship with 5 as the core word exists, that branch path executes step 7. This can be represented by Fig. 4a, which is another dependency tree diagram provided in this embodiment of the present application. Fig. 4a can be simplified to the tree diagram shown in Fig. 4b, which is a tree diagram of another dependency relationship provided in this embodiment of the present application.
Step 7, judging whether the first-order dependency relationship can be used independently as a concept, and if so, executing step 8; if not, exiting. According to step 6, the dependency relationship 'amod' is a preset dependency relationship that can be used independently as a concept, so step 8 is executed.
Step 8, putting all the words on the preset dependency relationship paths obtained in steps 6 and 7 into a candidate word list. The candidate word list comprises zero, one or more dependency words of second-order dependency relationships, together with the core word and the dependency word(s) of the first-order dependency relationship, where the dependency word of the first-order dependency relationship is the word that connects to the second-order dependency relationship. For this information title, the dependency word of the second-order dependency relationship 'nsubj' is 1, the core word of the first-order dependency relationship is 6, and its dependency words are 3 and 5. This is shown in Fig. 5, which is a schematic tree diagram of the dependency relationships between the words in a candidate word list according to an embodiment of the present application.
Step 9, removing repeated words from the candidate word list, and combining the words in the candidate word list into a concept according to the order of the words in the sentence. According to step 8, the words at positions 1, 3, 5 and 6, in their order of appearance in the sentence, are [ 'streaky pork', 'good eating', 'new', 'way' ], and they are combined into the concept 'streaky pork good eating new way'.
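Steps 3 to 9 can be traced on the streaky pork title with the following sketch. It is an illustration under stated assumptions rather than the implementation of the present application: candidate_positions is a hypothetical name, the tables FIRST_ORDER and SECOND_ORDER are built from the preset sequences listed earlier in this description, and every first-order relationship in that list is assumed to be usable on its own as a concept (step 7), since each also appears there as a single-relationship sequence.

```python
dependency_parse = [
    ('ROOT', 0, 9), ('nsubj', 3, 1), ('advmod', 3, 2), ('amod', 6, 3),
    ('mark', 3, 4), ('amod', 6, 5), ('nmod:topic', 9, 6), ('punct', 9, 7),
    ('nsubj', 9, 8), ('punct', 9, 10), ('nsubj', 12, 11), ('conj', 9, 12),
    ('punct', 9, 13), ('dep', 19, 14), ('advmod:rcomp', 14, 15),
    ('dobj', 14, 16), ('advmod', 19, 17), ('neg', 19, 18), ('conj', 9, 19),
]

FIRST_ORDER = {'nmod:assmod', 'compound:nn', 'dobj', 'amod', 'nmod'}
# Second-order relationships allowed under each first-order relationship.
SECOND_ORDER = {
    'nmod:assmod': {'compound:nn', 'amod'},
    'amod': {'nsubj', 'neg'},
    'compound:nn': {'compound:nn', 'nmod:assmod', 'nmod', 'amod', 'advmod'},
    'dobj': {'nsubj'},
    'nmod': {'nmod:assmod'},
}
# Assumption: all first-order relationships may stand alone (step 7).
STANDALONE_FIRST_ORDER = FIRST_ORDER


def candidate_positions(parse, keyword_pos):
    """Return the sorted, de-duplicated position numbers of the candidate words."""
    candidates = []
    for rel1, core1, dep1 in parse:
        # Steps 3-4: the keyword must be the core word of a first-order tuple.
        if core1 != keyword_pos or rel1 not in FIRST_ORDER:
            continue
        # Steps 5-6: second-order tuples whose core word is dep1.
        second = [dep2 for rel2, core2, dep2 in parse
                  if core2 == dep1 and rel2 in SECOND_ORDER.get(rel1, set())]
        # Step 7: without a second-order match the first order must stand alone.
        if not second and rel1 not in STANDALONE_FIRST_ORDER:
            continue
        # Step 8: collect every word on the matched path.
        candidates += second + [dep1, keyword_pos]
    # Step 9: remove repetitions and restore the word order of the title.
    return sorted(set(candidates))


print(candidate_positions(dependency_parse, 6))  # [1, 3, 5, 6]
```

The returned position numbers are the four words that step 9 combines into the concept.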
In summary, the method provided by the embodiment of the present application conforms to the maximum matching principle, and performs matching according to the domination relationship, so that a short and accurate concept can be formed for lengthy and complex information titles, thereby improving the probability of information being retrieved.
The following describes in detail the method for determining the concept in the information title provided in the above embodiments of the present application with reference to specific steps. In the following application embodiments, specific methods can be referred to the above embodiments, and are not described herein again.
Illustratively, the information is titled "10 minutes for a different nutritional breakfast,", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 3), ('nsubj', 3, 1), ('mark:clf', 1, 2), ('neg', 5, 4), ('amod', 8, 5), ('mark', 5, 6), ('compound:nn', 8, 7), ('dobj', 3, 8), ('punct', 3, 9), ('compound:nn', 12, 10), ('compound:nn', 12, 11), ('nmod:assmod', 16, 12), ('case', 12, 13), ('amod', 16, 14), ... ]. The word segmentation result is title_word_tokenize [ ..., 'same', 'nutrient', 'breakfast', 'onion', 'egg', 'cake', 'simple', 'good eating', 'practice' ]. The preset keyword is "practice", and according to the tuples ('compound:nn', 12, 10), ('compound:nn', 12, 11) and ('nmod:assmod', 16, 12), the words in the keyword list are determined to be "onion", "egg", "cake" and "practice". The concept of this information is "onion egg cake practice".
Illustratively, the information is titled "a 10-year-old woman is in a school bus and disturbs students being driven off the bus, which is the correct way to cure the bear children", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 9), ('nummod', 5, 1), ('mark:clf', 1, 2), ('compound:nn', 4, 3), ('compound:nn', 5, 4), ('nmod:topic', 9, 5), ('compound:nn', 7, 6), ('nsubjpass', 9, 7), ('auxpass', 9, 8), ('dep', 18, 10), ('dobj', 10, 11), ('advmod', 18, 12), ('cop', 18, 13), ... ]. The word segmentation result is title_word_tokenize [ '10', 'old', 'daughter', 'at school', 'at car', 'disturbance', 'classmate', 'being', 'getting down', 'this', 'just', 'being', 'treating bear', 'child', 'of', 'correct', 'practice', etc. ]. The preset keyword is "practice", and according to the tuples ('amod', 15, 14) and ('nmod:assmod', 18, 15), the words in the keyword list are determined to be "treating bear", "child" and "practice". The concept of the information is "treating bear child practice".
Illustratively, the information is titled "new balcony planting method, which replaces soil, a very recent practice", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 2), ('nsubj', 2, 1), ('amod', 4, 3), ('dobj', 2, 4), ('punct', 2, 5), ('case', 7, 6), ('nmod:prep', 8, 7), ('conj', 2, 8), ('dobj', 8, 9), ('punct', 2, 10), ('nmod:assmod', 13, 11), ('case', 11, 12), ('conj', 2, 13) ]. The word segmentation result is title_word_tokenize [ 'balcony', 'plant', 'new', 'method', 'use', 'it', 'instead', 'soil', 'very often', 'of', 'do' ]. The preset keyword is "method", and according to the tuples ('nsubj', 2, 1) and ('dobj', 2, 4), the words in the keyword list are determined to be "balcony", "planting" and "method". The concept of this information is "balcony planting method".
Illustratively, the information is titled "cling film also has a correct use method? However, people who have done mistakes and understood them all the time have already benefited", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 3), ('nsubj', 3, 1), ('advmod', 3, 2), ('amod', 6, 4), ('compound:nn', 6, 5), ('dobj', 3, 6), ('punct', 3, 7), ('ROOT', 0, 5), ('advmod', 5, 1), ('dep', 3, 2), ('nsubj', 5, 3), ('advmod', 5, 4), ('punct', 5, 6), ('dep', 11, 7), ('dobj', 7, 8), ('advmod', 11, 9), ... ]. The word segmentation result is title_word_tokenize [ 'cling film', 'also', 'with', 'correct', 'use', 'method', '?', 'but', 'good', 'human', 'good', 'bad', 'understandable', 'human', 'early', 'good', 'benefit' ]. The preset keyword is "method", and according to the tuples ('dobj', 3, 6), ('amod', 6, 4), ('compound:nn', 6, 5) and ('advmod', 5, 1), the words in the keyword list are determined to be "cling film", "correct", "use" and "method". The concept of the information is "cling film correct use method".
Illustratively, the information is titled "what symptom is chronic gastritis", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 5), ('amod', 2, 1), ('nsubj', 5, 2), ('cop', 5, 3), ('compound:nn', 5, 4) ]. The word segmentation result is title_word_tokenize [ 'chronic', 'gastritis', 'is', 'what', 'symptom' ]. The preset keyword is "symptom", and according to the tuples ('amod', 2, 1) and ('nsubj', 5, 2), the words in the keyword list are determined to be "chronic", "gastritis" and "symptom". The concept of the information is "chronic gastritis symptom".
According to the embodiment, the method for determining the concept in the information title provided by the application can accurately determine the main vocabulary in the information title and form the concept of the information, and the concept of the information can accurately represent the meaning of the information, so that the information can be retrieved.
In the above embodiment, after determining at least one keyword of the information title, it may be further determined whether the information title contains a supplementary keyword. Fig. 6 is a flowchart illustrating a method for determining a supplemental keyword according to an embodiment of the present application. As shown in fig. 6, the method for determining the supplementary keyword includes:
S601, judging whether the dependency relationship corresponding to the tuple where the preset keyword is located is a direct object.
The direct object is represented by the 'dobj' dependency relationship, whose core word is a predicate or a preposition.
For example, judging whether the dependency relationship corresponding to the tuple where the preset keyword is located is a direct object means judging whether the preset keyword is the dependency word in the direct object dependency relationship 'dobj', that is, whether the preset keyword is a dependency word of the predicate verb of the sentence. The implementation is as follows: traversing each tuple in the dependency_parse result list and judging whether there is a tuple whose first element is 'dobj' and whose third element is the word segmentation position number of the preset keyword in the sentence. For example, in ('dobj', a, b), b is the preset keyword and a is the predicate verb.
S602, if the dependency relationship corresponding to the tuple where the preset keyword is located is the direct object, determining a supplementary keyword according to the core word in the tuple where the preset keyword is located.
When at least one supplementary keyword is determined according to the core word in the tuple where the preset keyword is located, it can be judged whether the core word in the tuple where the preset keyword is located is a preset predicate verb. If it is, a supplementary tuple is determined in the dependency syntax analysis result according to that core word, the core word in the supplementary tuple being the core word in the tuple where the preset keyword is located. If the dependency relationship corresponding to the supplementary tuple satisfies a supplementary preset dependency relationship, and the dependency word in the supplementary tuple precedes the core word in the supplementary tuple in the information title, the dependency word in the supplementary tuple is determined as the supplementary keyword. Here, the predicate verb is found through the direct object dependency relationship 'dobj', and 'dep' is one supplementary preset dependency relationship.
For example, judging whether the core word in the tuple where the preset keyword is located is a preset predicate verb means judging whether the predicate verb is 'present' and whether the preset keyword is one of 'expression', 'method', 'doing', 'action', and the like. The embodiments of the present application are described only by taking the preset predicate verb 'present' as an example, and are not limited thereto.
In the embodiment of the application, the supplementary tuple is determined according to whether the core word in the tuple where the preset keyword is located is the preset predicate verb and according to the supplementary preset dependency relationship, and the dependency word in the supplementary tuple is determined as the supplementary keyword. This avoids the problem that the formed concept is incomplete because the words forming the concept lie on both sides of the predicate verb, which further improves the accuracy of the formed concept and the probability of the information being retrieved.
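S601 and S602 can be sketched as follows, using the anal fissure example given later in this description. This is a hedged illustration: supplementary_keywords is a hypothetical name, the two preset sets contain only the single values that this description names, and the additional check that the preset keyword belongs to a given list is omitted for brevity.

```python
PRESET_PREDICATE_VERBS = {'present'}   # the only preset predicate verb named in the text
SUPPLEMENTARY_RELATIONS = {'dep'}      # the supplementary preset relationship named in the text


def supplementary_keywords(parse, tokens, keyword_pos):
    """Return position numbers of supplementary keywords for the keyword."""
    # S601: look for a tuple ('dobj', a, b) in which b is the preset keyword.
    verbs = [core for rel, core, dep in parse
             if rel == 'dobj' and dep == keyword_pos]
    if not verbs:
        return []
    verb_pos = verbs[0]
    # S602: the core word must be a preset predicate verb (positions are 1-based).
    if tokens[verb_pos - 1] not in PRESET_PREDICATE_VERBS:
        return []
    # Supplementary tuples share the verb as core word, use a supplementary
    # relationship, and have a dependency word located before the verb.
    return [dep for rel, core, dep in parse
            if core == verb_pos and rel in SUPPLEMENTARY_RELATIONS
            and dep < verb_pos]


parse = [
    ('ROOT', 0, 3), ('punct', 3, 1), ('dep', 3, 2),
    ('det', 6, 4), ('compound:nn', 6, 5), ('dobj', 3, 6),
]
tokens = [',', 'anal fissure', 'present', 'which', 'prevention', 'method']

positions = supplementary_keywords(parse, tokens, 6)
print(positions, [tokens[p - 1] for p in positions])  # [2] ['anal fissure']
```

Position 2 lies on the other side of the predicate verb from the preset keyword, which is the situation the supplementary step is meant to cover.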
Illustratively, if the core word in the tuple where the preset keyword is located is not the preset predicate verb, and the next word adjacent to the core word in the supplementary tuple in the information title is in the keyword list, word segmentation is performed on the core word in the supplementary tuple together with the adjacent next word to generate a word segmentation result; if the core word in the supplementary tuple and the adjacent next word form one word, the core word in the supplementary tuple is determined as the supplementary keyword. For example, if the predicate verb is 'sunning' and the adjacent next word 'out' is in the keyword list, word segmentation is performed on 'sunning' and 'out'; if the word segmentation result shows that 'sunning' and 'out' form one word, 'sunning' is determined as a supplementary keyword.
In the embodiment of the application, when the core word in the tuple where the preset keyword is located is not the preset predicate verb, word segmentation is performed on the core word in the supplementary tuple and the adjacent next word, so that the core word in the supplementary tuple is determined as the supplementary keyword. This avoids omitting important words in the information title and improves the accuracy of the determined keywords.
In another possible implementation, if the core word in the tuple where the preset keyword is located is the first verb, the core word in the tuple where the preset keyword is located is determined as the supplementary keyword, which avoids missing keywords and improves the accuracy of the determined concept. For example, if the core word in the tuple where the preset keyword is located is the verb 'lose weight', 'lose weight' is directly determined as the supplementary keyword.
Therefore, according to the method for determining the supplementary keyword provided by the embodiment of the application, whether the dependency relationship corresponding to the tuple where the preset keyword is located is a direct object is judged; if it is, the supplementary keyword is determined according to the core word in the tuple where the preset keyword is located, so that keywords in the information title are not omitted and the formed concept is more accurate.
The method for determining the supplementary keyword provided in the above embodiments of the present application is described in detail below with reference to specific steps. In the following embodiments of the present application, the method for determining the supplemental keyword according to the dependency parsing result can be referred to the above embodiments, and the embodiments of the present application are not described herein again.
Illustratively, the information is titled "what prevention method is for anal fissure", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 3), ('punct', 3, 1), ('dep', 3, 2), ('det', 6, 4), ('compound:nn', 6, 5), ('dobj', 3, 6) ]. The word segmentation result is title_word_tokenize [ ',', 'anal fissure', 'present', 'which', 'prevention', 'method' ]. The preset keyword is "method", and by the method described in the above embodiment, the words in the keyword list can be determined to be "anal fissure", "prevention" and "method". The concept of the information is "anal fissure prevention method".
Illustratively, the information is titled "aunt 01 has these 3 abnormal manifestations, suggesting that the big disease has come on line, neglecting regret a later year again and again", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 3), ('punct', 3, 1), ('dep', 3, 2), ('det', 8, 4), ('dep', 4, 5), ('mark:clf', 5, 6), ('amod', 8, 7), ('dobj', 3, 8), ('punct', 3, 9), ('conj', 3, 10), ('nsubj', 13, 11), ('advmod', 13, 12), ('ccomp', 10, 13), ('punct', 13, 14), ('dep', 20, 15), ..., ('conj', 13, 20) ]. The word segmentation result is title_word_tokenize [ '01', 'aunt', 'this', '3', 'abnormal', 'manifestation', 'implication', 'big disease', 'already', 'upper line', ',', 'one', 'again', 'ignore', 'late year', 'post remorse', 'Mo and' ]. The preset keyword is "manifestation", and by the method described in the above embodiment, the words in the keyword list can be determined to be "aunt", "abnormal" and "manifestation". The concept of the information is "aunt abnormal manifestation".
Illustratively, the information is titled "01 fanciful project, D5 most organized shot gather, full feeling of tension", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 3), ('punct', 3, 1), ('dep', 3, 2), ('det', 8, 4), ('dep', 4, 5), ('mark:clf', 5, 6), ('amod', 8, 7), ('dobj', 3, 8), ('punct', 3, 9), ('conj', 3, 10), ('nsubj', 13, 11), ('advmod', 13, 12), ('ccomp', 10, 13), ('punct', 13, 14), ('dep', 20, 15), ... ]. The word segmentation result is title_word_tokenize [ '01', 'creature', 'plan', 'D', '5', 'big', 'most', 'stubbing', 'shot', 'aggregate', 'tension', 'rest', 'is', 'full', 'of', 'feeling' ]. The preset keyword is "shot", and by the method described in the above embodiment, the words in the keyword list can be determined to be "stubbing", "shot" and "aggregate". The concept of the information is "stubbing shot aggregate".
Illustratively, the information is titled "01 dancing with earth, peasant dance with heartsound", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 13), ('punct', 13, 1), ('dep', 13, 2), ('compound:nn', 4, 3), ('dobj', 2, 4), ('punct', 13, 5), ('compound:nn', 8, 6), ('compound:nn', 8, 7), ('dep', 13, 8), ('punct', 13, 9), ('punct', 13, 10), ('punct', 13, 11), ('nsubj', 13, 12) ]. The word segmentation result is title_word_tokenize [ '01', 'ground', 'gas', 'dance', 'peasant', 'heart', 'dance', 'body', 'word in', 'attack', 'look', etc. ]. The preset keyword is "dance", and by the method described in the above embodiment, the words in the keyword list can be determined to be "ground", "gas" and "dance". The concept of the information is "ground gas dance".
Illustratively, the information is titled "01 recent bubble gum practice! Successful foaming using only non-Newtonian fluids! No borax", and the dependency syntax analysis processing result is title_dependency_parse [ ('ROOT', 0, 3), ('punct', 3, 1), ('advmod', 3, 2), ('compound:nn', 5, 4), ('dobj', 3, 5), ('punct', 3, 6), ('ROOT', 0, 3), ('advmod', 3, 1), ('xcomp', 3, 2), ('advmod', 5, 4), ('ccomp', 3, 5), ('dobj', 5, 6), ('punct', 3, 7), ('advmod', 10, 8), ('aux:modal', 10, 9), ('conj', 3, 10), ... ]. The word segmentation result is title_word_tokenize [ '01', 'latest', 'foaming', 'glue', 'practice', '!', 'only', 'need', 'use', 'non', 'Newton', 'fluid', ',', 'just', 'can', 'succeed', 'foam', 'grow up', '!', 'no', 'borax' ]. The preset keyword is "practice", and by the method described in the above embodiment, the words in the keyword list can be determined to be "foaming", "glue" and "practice". The concept of the information is "foaming glue practice".
Illustratively, the title of the information is "01 child's wisdom: educational video with hands-on ability enjoyed by children! To follow a learning bar!", and the dependency syntax analysis result is title_dependency_parse [('ROOT', 0, 3), ('punct', 3, 1), ('compound:nn', 3, 2), ('parataxis:prnmod', 3, 4), ('nsubj', 6, 5), ('dep', 4, 6), ('mark', 6, 7), ('acl', 6, 8), ('compound:nn', 11, 9), ('compound:nn', 11, 10), ('dobj', 8, 11), ('punct', 6, 12), ('ROOT', 0, 6), ('xcomp', 2, 1), ('dep', 6, 2), ('adv', 4, 3), ('ccomp', 2, 4), …]. The word segmentation result is title_word_tokenize [',', 'child', 'wisdom', ':', 'kids', 'likes', 'hands', 'abilities', 'wisdom', 'video', '!', 'to', 'follow', 'together', 'learn', 'Bar', '!']. The preset keyword is "video", and by the method described in the above embodiment, the words in the keyword list can be determined as "hands-on", "ability", "intelligence" and "video"; the concept of the information is "hands-on ability intelligence video".
Illustratively, the title of the information is "'X Intelligence' Yihang 184 aircraft publishes manned test video", and the dependency syntax analysis result is title_dependency_parse [('ROOT', 0, 9), ('punct', 9, 1), ('nmod:assmod', 3, 2), ('dep', 8, 3), ('punct', 8, 4), ('dep', 6, 5), ('nmod:assmod', 8, 6), ('dep', 8, 7), ('nsubj', 9, 8), ('ccomp', 9, 10), ('compound:nn', 12, 11), ('dobj', 10, 12)]. The word segmentation result is title_word_tokenize ['"', 'X', 'intelligence', '"', 'hundred', 'aviation', '184', 'aircraft', 'public', 'manned', 'test', 'video']. The preset keyword is "star", and by the method described in the above embodiment, the words in the keyword list can be determined as "weight loss", "big" and "star"; the concept of the information is "weight loss star".
Illustratively, the title of the information is "10 years ago, it is clear that a small hole in the sink has such a large effect, is a true severity, a quick test", and the dependency syntax analysis result is title_dependency_parse [('ROOT', 0, 18), ('dep', 5, 1), ('mark:clf', 1, 2), ('aux:asp', 5, 3), ('advmod', 5, 4), ('dep', 18, 5), ('punct', 5, 6), ('nmod', 10, 7), ('case', 7, 8), ('case', 7, 9), ('dep', 11, 10), ('conj', 5, 11), ('vmod', 13, 12), ('amncod', 14, 13), ('dobj', 11, 14), …, ('advmod', 21, 20), ('conj', 18, 21)]. The word segmentation result is title_word_tokenize ['10', 'year', 'just', 'clear', 'sink', 'up', 'small hole', 'so', 'big', 'action', 'is', 'true', 'harsh', 'fast', 'try', etc.]. The preset keyword is "action", and by the method described in the above embodiment, the words in the keyword list can be determined as "aperture", "large" and "effect"; the concept of the information is "aperture large effect".
Illustratively, the title of the information is "10 minute fat burning and sweating female dancing, abdomen contracting and leg slimming, …", and the dependency syntax analysis result is title_dependency_parse [('ROOT', 0, 3), ('dep', 3, 1), ('mark:clf', 1, 2), ('nmod:ass', 6, 4), ('compound:nn', 6, 5), ('dobj', 3, 6), ('punct', 3, 7), ('nsubj', 9, 8), ('conj', 3, 9), ('dobj', 9, 10), ('punct', 3, 11), ('nsubj', 15, 12), ('mark:clf', 12, 13), ('advadvj', 15, 14), ('modj', 3, 15), …]. The word segmentation result is title_word_tokenize ['10', 'minute', 'burning', 'fat sweat', 'woman group', 'dance', 'abdomen', 'thin', 'leg', '3', 'head', 'connection', 'jumping', 'super', 'fat losing', 'body losing']. The preset keyword is "action", and by the method described in the above embodiment, the words in the keyword list can be determined as "burn", "fat sweating", "female group" and "dance"; the concept of the information is "fat burning sweating female group dance".
Illustratively, the title of the information is "# earn money, a story of a 90 back minor who cannot spend money on chaffy dish stores, and the understanders all have business minds", and the dependency syntax analysis result is title_dependency_parse [('ROOT', 0, 2), ('punct', 2, 1), ('punct', 2, 3), ('nummod', 6, 4), ('mark:clf', 4, 5), ('advmod:loc', 10, 6), ('case', 6, 7), ('nsubj', 10, 8), ('neg', 10, 9), ('conj', 2, 10), ('nummod', 13, 11), ('mark:clf', 11, 12), ('doj', 10, 13), …, ('punct', 10, 19), ('acl', 22, 20), ('mark', 20, 21), ('nsubj', 24, 22), ('advmod', 24, 23), ('conj', 10, 24), ('compound:nn', 26, 25), ('dobj', 24, 26), ('punct', 10, 27), ('punct', 10, 28), ('conj', 10, 29)]. The word segmentation result is title_word_tokenize ['#', 'earn', ',', 'one', '90', 'back', 'small part', 'not', 'flower', 'one', 'part', 'money', 'fire', 'pot', 'shop', 'story', 'see', 'person', 'all', 'business', 'head', 'and', '#', 'thinking']. The preset keyword is "effect", and the words in the keyword list can be determined as "fire", "pot", "store" and "story"; the concept of the information is "fire pot store story".
According to this embodiment, supplementing the candidate word list avoids missing some of the keywords in the information title, so that the determined concept of the information is more accurate.
To facilitate understanding of the technical solution provided by the embodiment of the present application, the following takes as an example a scenario in which a user searches for a video through the apparatus shown in fig. 1. Referring to fig. 7, fig. 7 is a schematic flowchart of a method for determining a concept in a video title provided by the embodiment of the present application. As shown in fig. 7, the method for determining a concept in a video title may include the following steps:
S701, receiving a video query request.
The video query request may include text information.
Illustratively, the user may input the video query request through the device shown in fig. 1. For example, the user may say "I want to watch a video of a football game" or "do it with a chicken bouillon" to the remote controller to input a video query request by voice. It is understood that, after receiving the video query request input by voice, the display device transmits the voice to the controller, and the controller generates the corresponding text information by processing the received voice. The processing of the received voice may be noise reduction, error correction, and the like; the processing manner is not specifically limited in the embodiment of the present application.
In another possible implementation, the video query request received by the controller may be text information that the user inputs directly on the display device. For example, the user inputs text information such as "vegetable salad practice" or "news simulcast" on the display device through the remote controller.
Upon receiving the video query request, the following S702 may be performed:
S702, obtaining concepts corresponding to a plurality of videos in the video library.
In this embodiment of the present application, the concept corresponding to each of the plurality of videos in the video library may be obtained by performing dependency parsing on the original title of the video, determining at least one keyword in the original title according to a preset keyword in the original title, and recombining all the keywords according to their word order in the original title.
S703, retrieving the concepts corresponding to the videos in the video library according to the video query request.
Illustratively, when the concepts corresponding to the videos in the video library are retrieved according to the video query request, the text of the concept corresponding to each video may be matched against the text information in the video query request, and the successfully matched videos are retrieved. Whether a match is successful may be determined by the matching degree: a video whose matching degree is greater than a matching degree threshold is determined to be successfully matched. For example, if the matching degree between the text of the concept corresponding to a video and the text information is greater than 70%, the video is determined to be successfully matched. The embodiment of the present application does not specifically limit the matching degree threshold. In addition, text matching is used here only as an example; other matching methods may be selected according to actual situations, which is not limited in the embodiment of the present application.
S704, outputting the retrieved video.
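The threshold comparison in S703 can be sketched as follows. This is only an illustration: the text does not say how the matching degree is computed, so `difflib.SequenceMatcher` is an assumed stand-in, and the names `matching_degree`, `retrieve` and the sample video library are hypothetical.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.70  # the 70% figure used as an example in the text


def matching_degree(query_text: str, concept_text: str) -> float:
    """Similarity in [0, 1]. SequenceMatcher is an assumed measure,
    not one named by the source."""
    return SequenceMatcher(None, query_text, concept_text).ratio()


def retrieve(query_text, video_concepts, threshold=MATCH_THRESHOLD):
    """Return the ids of videos whose concept text matches the query text
    with a matching degree greater than the threshold."""
    return [
        video_id
        for video_id, concept in video_concepts.items()
        if matching_degree(query_text, concept) > threshold
    ]


# Hypothetical video library: video id -> concept text
video_concepts = {
    "v1": "learning music video",
    "v2": "fire pot store story",
}
hits = retrieve("learning music videos", video_concepts)
print(hits)
```

Any other similarity measure could be substituted for `matching_degree` without changing the threshold logic.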
Illustratively, a video that is successfully matched with the text information may be output through the display device. It can be understood that when the display device displays the videos successfully matched, the original titles of the respective videos can be correspondingly displayed.
With the method for determining the concept in the information title provided by the embodiment of the application, when videos are queried, a video query request including text information is received; concepts corresponding to a plurality of videos in a video library are obtained; the concepts corresponding to the plurality of videos in the video library are retrieved according to the video query request; and the retrieved video is output. Because the concept corresponding to each video is a combination of some of the words in its original title, a long and complex original title is simplified into a concept, so that the video can be found when the concepts in the video library are searched according to the video query request. This increases the probability of the video being retrieved and effectively improves the exposure rate of the video.
On the basis of the above embodiment, when the concept corresponding to each of the plurality of videos in the video library is obtained, the concept of the video needs to be determined according to the original title of each video in the video library. Referring to fig. 8, fig. 8 is a schematic flowchart illustrating a method for determining a video concept according to an embodiment of the present application. As shown in fig. 8, the method of determining a video concept includes:
S801, acquiring original titles of the videos in the video library.
Illustratively, the original title of each video in the video library is the title attached to the video by its publisher. For example, a video may be a clip excerpted from the cartoon "Cornan"; for a video about a football game, the original title of the video may be "the most exciting end stop in the football game, and such a stop is really inexplicable in buy years!".
S802, performing dependency syntax analysis on the original title of each video, and determining a dependency syntax analysis result.
According to the embodiment, the dependency syntax analysis result includes at least two tuples, and each tuple includes a dependency relationship, a core word and a dependency word.
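One possible in-memory representation of such tuples is sketched below. It assumes, based on the examples in this description, that each tuple is ordered as (dependency relationship, core word position, dependency word position), that positions are 1-based indices into the word segmentation result, and that 0 denotes the virtual root. The names `Dep`, `to_deps` and `word_at`, and the three-word toy title, are hypothetical.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    relation: str   # dependency relationship, e.g. "compound:nn"
    head: int       # position of the core word (0 is assumed to be the root)
    dependent: int  # position of the dependency word


def to_deps(raw):
    """Convert raw (relation, head, dependent) triples into Dep records."""
    return [Dep(*triple) for triple in raw]


def word_at(tokens, position):
    """Map a 1-based position from the parse onto the segmentation result."""
    return tokens[position - 1]


# Toy data, not taken from the source
tokens = ["music", "teaching", "video"]
raw = [("ROOT", 0, 3), ("compound:nn", 3, 1), ("compound:nn", 3, 2)]
deps = to_deps(raw)

for d in deps:
    if d.head > 0:
        print(d.relation, word_at(tokens, d.head), "<-", word_at(tokens, d.dependent))
```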
For example, the preset keyword may be a service keyword, which is a keyword for indicating the meaning of each video. The preset key words can be movies, television shows, novels, comics, cartoons, recipes, applications, cartoons, symptoms, strategies, specials, methods, hazards, manifestations, efficacy, formulations, side effects, etc. The present application is described by taking the preset keywords as examples, but the present application is not limited to the preset keywords. The specific preset keywords can be set according to actual conditions.
After the processing result of the dependency syntax analysis is obtained, the following S803 may be executed:
S803, determining at least one keyword according to at least a dependency syntax analysis result, and establishing a keyword list.
The method for determining at least one keyword according to at least the dependency parsing result is the same as the above embodiment, and details are not repeated again in this embodiment of the present application.
For example, when the keyword list is established, an initial keyword list may be established according to a preset keyword and the determined at least one keyword, and the keyword list may be determined through a deduplication process. After the initial keyword list is established, the method for determining the supplementary keyword described in the above embodiment may also be used to determine the supplementary keyword in the original title of the video.
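A minimal sketch of the deduplication step, assuming the keyword list is an ordinary list of words and that the first occurrence of each word should be kept. The function name `build_keyword_list` is hypothetical.

```python
def build_keyword_list(preset_keyword, keywords, supplementary=()):
    """Build the initial list from the preset keyword, the determined keywords
    and any supplementary keywords, then drop duplicates while keeping the
    first occurrence of each word."""
    initial = [preset_keyword, *keywords, *supplementary]
    # dict preserves insertion order, so this is an order-preserving dedup
    return list(dict.fromkeys(initial))


keyword_list = build_keyword_list("video", ["music", "video", "learning"], ["music"])
print(keyword_list)
```

The order of this list is not yet the order of the concept; the words are reordered by their position in the title when the concept is formed.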
S804, combining the words in the keyword list to form a concept according to the word sequence in the original title of each video.
For example, the original title of the video is "01 learn music basic phonetic symbol teaching video color learning English children's song!", and the dependency syntax analysis result is title_dependency_parse [('ROOT', 0, 9), ('dep', 3, 1), ('compound:nn', 3, 2), ('nmod:assmod', 8, 3), ('case', 3, 4), ('amod', 6, 5), ('compound:nn', 8, 6), ('compound:nn', 8, 7), ('nsubj', 9, 8), ('dobj', 9, 10), ('conj', 9, 11), ('compound:nn', 13, 12), ('compound:nn', 14, 13), ('dobj', 11, 14), ('punct', 9, 15)]. The word segmentation result is title_word_tokenize ['01', 'learning', 'music', 'basic', 'phonetic symbol', 'teaching', 'video', 'recognition', 'color', 'learning', 'english', 'children', …, '!']. The preset keyword is "music", and according to the tuples ('compound:nn', 3, 2) and ('nmod:assmod', 8, 3), the words added to the keyword list are determined to be "learning", "music" and "video"; that is, the concept of the video is "learning music video".
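The combination in S804 can be sketched as a sort by title position. The positions below are inferred from the two tuples cited in the example (position 2 for "learning", 3 for "music", 8 for "video") and are an assumption, as are the function name `form_concept` and the space separator.

```python
def form_concept(position_to_word, separator=" "):
    """Join the keyword-list entries in ascending order of their position
    in the original title."""
    return separator.join(word for _, word in sorted(position_to_word.items()))


# Keyword list entries keyed by their (assumed) 1-based title position
keyword_positions = {3: "music", 2: "learning", 8: "video"}
concept = form_concept(keyword_positions)
print(concept)
```

For a title in a language written without spaces, an empty separator could be passed instead.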
In summary, the method for determining concepts in information titles provided by the embodiment of the present application can extract part of keywords in a lengthy and complicated original title of a video when the original title of the video is processed, and combine the keywords into a concept of the video, so that when a search is performed according to the concept of the video, the probability of the video being searched can be increased.
Illustratively, in another embodiment of the present application, when the concept in the title of the video is determined, after the concept of each video is determined, the video can be pushed to the user according to the concept of the video, so that the video with a long and complicated title can be pushed to the user, and the probability of pushing the video is improved.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 9 is a schematic structural diagram of a device 90 for determining a concept in an information title according to an embodiment of the present application. The apparatus for determining concepts in information titles is applied to a display device. As shown in fig. 9, the apparatus 90 for determining a concept in an information title provided in the embodiment of the present application includes:
the analysis module 901 is configured to perform dependency parsing on the information title, and determine a dependency parsing result, where the dependency parsing result includes at least two tuples, and the tuples include a dependency relationship, a core word, and a dependency word;
a processing module 902, configured to determine at least one keyword of an information title according to a preset keyword and a preset dependency relationship when the preset keyword is not a dependency word in a compound noun;
the processing module 902 is further configured to establish a keyword list according to preset keywords and keywords;
a determining module 903, configured to combine words in the keyword list to form a concept according to the word order in the information title.
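The condition checked by the processing module 902 can be sketched as below. This assumes that "a dependency word in a compound noun" means the dependent position of a tuple whose relationship is `compound:nn`, which is an interpretation rather than something the text states; the function name `is_compound_dependent` is hypothetical. The parse is the one from the "learn music" example in this description.

```python
def is_compound_dependent(parse, keyword_position, compound_relation="compound:nn"):
    """True if the word at keyword_position is the dependency word of a
    tuple whose relationship is the (assumed) compound-noun relation."""
    return any(
        relation == compound_relation and dependent == keyword_position
        for relation, _core, dependent in parse
    )


parse = [
    ("ROOT", 0, 9), ("dep", 3, 1), ("compound:nn", 3, 2), ("nmod:assmod", 8, 3),
    ("case", 3, 4), ("amod", 6, 5), ("compound:nn", 8, 6), ("compound:nn", 8, 7),
    ("nsubj", 9, 8), ("dobj", 9, 10), ("conj", 9, 11), ("compound:nn", 13, 12),
    ("compound:nn", 14, 13), ("dobj", 11, 14), ("punct", 9, 15),
]

# Position 3 is the core word of ('compound:nn', 3, 2), not a dependency word in it
print(is_compound_dependent(parse, 3))
# Position 6 is the dependency word of ('compound:nn', 8, 6)
print(is_compound_dependent(parse, 6))
```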
In some possible implementation manners, the processing module 902 is specifically configured to: determine, according to a preset keyword, a target tuple in the dependency parsing result, where the core word in the target tuple is the preset keyword; if the dependency relationship corresponding to the target tuple meets the first preset dependency relationship, acquire the dependency word in the target tuple; determine the dependency word in the target tuple as a new preset keyword, take a second preset dependency relationship which is subordinate to the first preset dependency relationship as a new first preset dependency relationship, and perform recursive matching according to the above steps to complete at least one complete preset dependency relationship sequence, where each complete preset dependency relationship sequence consists of a plurality of preset dependency relationships, two adjacent preset dependency relationships in the sequence have a subordinate relationship, and the dependency word corresponding to the first preset dependency relationship is the core word corresponding to the second preset dependency relationship; and determine at least one keyword according to the dependency words in the target tuple.
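The recursive matching can be sketched as follows, under stated assumptions: each preset dependency relationship sequence is modelled as a list of relation labels, tuples are (relationship, core word position, dependency word position), and only paths that complete the whole sequence contribute keywords. The function names, the sequence `["compound:nn", "amod"]` and the toy parse are hypothetical.

```python
def match_sequence(parse, head, sequence):
    """Return every path of dependency-word positions that completes
    `sequence`, starting from the word at position `head`."""
    if not sequence:
        return [[]]  # nothing left to match: the sequence is complete
    first, rest = sequence[0], sequence[1:]
    paths = []
    for relation, core, dependent in parse:
        if core == head and relation == first:
            # The dependency word becomes the new preset keyword and the
            # subordinate relationship becomes the new first relationship.
            for tail in match_sequence(parse, dependent, rest):
                paths.append([dependent] + tail)
    return paths


def find_keywords(parse, preset_position, preset_sequences):
    """Collect the positions of all words on completed sequences."""
    found = set()
    for sequence in preset_sequences:
        for path in match_sequence(parse, preset_position, sequence):
            found.update(path)
    return sorted(found)


# Toy parse, not taken from the source
parse = [("ROOT", 0, 4), ("dep", 4, 1), ("amod", 3, 2), ("compound:nn", 4, 3)]
positions = find_keywords(parse, 4, [["compound:nn", "amod"]])
print(positions)
```

In this toy parse the preset keyword at position 4 has a `compound:nn` dependent at position 3, which in turn has an `amod` dependent at position 2, so both positions are collected.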
In some possible implementation manners, the processing module 902 is specifically configured to establish an initial keyword list according to preset keywords and keywords; and performing duplicate removal processing on the words in the initial keyword list to determine the keyword list.
In some possible implementation manners, the apparatus further includes a supplement module 904, where the supplement module 904 is configured to determine whether the dependency relationship corresponding to the tuple in which the preset keyword is located is a direct object; and when the dependency relationship corresponding to the tuple in which the preset keyword is located is a direct object, determine the supplementary keyword according to the core word in the tuple in which the preset keyword is located.
In some possible implementation manners, the supplementing module 904 is specifically configured to determine whether a core word in a tuple where a preset keyword is located is a preset predicate verb; and when the core word in the tuple where the preset keyword is located is the preset predicate verb, determining a supplementary tuple in the dependency syntax analysis result according to the core word in the tuple where the preset keyword is located, wherein the core word in the supplementary tuple is the core word in the tuple where the preset keyword is located. And when the dependence relationship corresponding to the supplementary tuple meets the supplementary preset dependence relationship and the dependence word in the supplementary tuple is in front of the core word in the supplementary tuple in the information title, determining the dependence word in the supplementary tuple as a supplementary keyword.
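The predicate-verb branch of the supplement module can be sketched as below. The sketch assumes that the direct-object relationship is labelled `dobj` (as in the example parses), that positions are 1-based indices into the word segmentation result, and that the set of preset predicate verbs and the set of supplementary preset dependency relationships are supplied by the caller. All names and the toy data are hypothetical.

```python
def supplementary_keywords(parse, tokens, preset_position,
                           predicate_verbs, supplementary_relations):
    """Return positions of supplementary keywords for the preset keyword."""
    result = []
    for relation, core, dependent in parse:
        # The preset keyword must be the direct object of its core word.
        if dependent != preset_position or relation != "dobj":
            continue
        # The core word must be one of the preset predicate verbs.
        if tokens[core - 1] not in predicate_verbs:
            continue
        # Supplementary tuples share that core word; keep dependency words
        # that satisfy the relationship and precede the core word in the title.
        for s_relation, s_core, s_dependent in parse:
            if (s_core == core
                    and s_relation in supplementary_relations
                    and s_dependent < s_core):
                result.append(s_dependent)
    return result


# Toy data, not taken from the source
tokens = ["kids", "like", "video"]
parse = [("ROOT", 0, 2), ("nsubj", 2, 1), ("dobj", 2, 3)]
extra = supplementary_keywords(parse, tokens, 3, {"like"}, {"nsubj"})
print([tokens[p - 1] for p in extra])
```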
In some possible implementations, the supplementing module 904 is specifically configured to perform word segmentation on the core word in the supplementary tuple and an adjacent next word when the core word in the tuple where the preset keyword is located is not the preset predicate verb and the next word adjacent to the core word in the supplementary tuple is in the keyword list in the information title, so as to generate a word segmentation result. And when the core word in the supplementary tuple and the next adjacent word form a word, determining the core word in the supplementary tuple as the supplementary keyword.
In some possible implementation manners, the supplement module 904 is specifically configured to determine, when a core word in a tuple where the preset keyword is located is a first verb, the core word in the tuple where the preset keyword is located as the supplement keyword.
It should be noted that the apparatus provided in this embodiment may be used to execute the method for determining the concept in the information title, and the implementation manner and the technical effect are similar, which are not described herein again.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And the modules can be realized in the form that software is called by a processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the processing module may be a processing element separately set up, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and a function of the processing module may be called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element may be an integrated circuit having signal processing capabilities. In implementation, the steps of the method or the modules may be implemented by hardware integrated logic circuits in a processor element or instructions in software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when a module is implemented in the form of program code scheduled by a processing element, the processing element may be a general purpose processor, such as a CPU or another processor that can invoke the program code. As another example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SOC).
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs. The procedures or functions according to the embodiments of the present application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer program can be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer program can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Fig. 10 is a schematic structural diagram of another apparatus 100 for determining a concept in an information title provided in an embodiment of the present application. For example, referring to fig. 10, the apparatus 100 for determining a concept in an information title may include a processor 1001 and a memory 1002, wherein:
the memory 1002 is used for storing computer programs.
The processor 1001 is configured to read the computer program stored in the memory 1002, and execute the technical solution of the method for determining the concept in the information title in any of the embodiments according to the computer program in the memory 1002.
Alternatively, the memory 1002 may be separate or integrated with the processor 1001. When the memory 1002 is a device independent of the processor 1001, the apparatus 100 for determining a concept in a title of information may further include: a bus for connecting the memory 1002 and the processor 1001.
Optionally, this embodiment further includes: a communication interface, which may be connected to the processor 1001 through a bus. The processor 1001 may control the communication interface to implement the functions of the above-described determination device 100 of the concept in the information title.
The apparatus 100 for determining a concept in an information title shown in this embodiment of the present application may execute the technical solution of the method for determining a concept in an information title in any embodiment described above. Its implementation principle and beneficial effects are similar to those of the method for determining a concept in an information title; reference may be made to the description of that method, and details are not described herein again.
The present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for determining a concept in an information title as described in any of the above method embodiments is implemented.
The present application further provides a computer program product, which includes a computer program, where the computer program is stored in a computer-readable storage medium, and at least one processor can read the computer program from the computer-readable storage medium, and when the computer program is executed by the at least one processor, the at least one processor can implement the method for determining a concept in an information title as described in any one of the method embodiments above.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.