Data information storage device for search

Document No. 7673, published 2021-09-17

1. A computer-based retrieval term candidate extraction method, comprising:

a subject term extraction step of extracting a subject term associated with a phrase in a document, which is a term included in a certain page of the document, from subject term storage means for storing the subject term associated with the phrase in the document; and

a search term candidate extraction step of extracting a search term candidate for a certain page of the document from the subject term extracted in the subject term extraction step.

2. A method for storing data retrieval information using a computer, comprising:

a phrase extraction step of extracting phrases in the document, that is, phrases included in a certain page of the document;

a keyword extraction step of extracting a plurality of keywords associated with the material phrase extracted in the phrase extraction step from keyword storage means (5) that stores phrases that are keywords associated with the material phrase;

a subject term extraction step of extracting subject terms associated with a plurality of keywords extracted in the keyword extraction step from subject term storage means (9) for storing subject terms associated with the keywords;

a retrieval term candidate extraction step of extracting a retrieval term candidate for a certain page of the document from the subject term extracted in the subject term extraction step and the plurality of keywords extracted in the keyword extraction step;

a search term candidate display step of displaying, on a display unit (15), a candidate for a search term extracted in the search term candidate extraction step;

a search term input step of receiving an input indicating a search term among the search term candidates displayed on the display unit (15); and

a document search information storage step of storing the search term input in the search term input step in association with information on a certain page of the document.

Background

Japanese patent application laid-open No. 2019-16355 discloses a search information management device, a search information management method, and a search information management program. As this shows, it is common to associate search terms with various documents so that the documents can be searched, and a user can find suitable material by using those search terms. However, the search terms attached to a document are not necessarily the ones actually used in searches, so it is desirable to suggest search terms that are actually used in searches and that follow the intention of the user.

Documents of the prior art

Patent document

Patent document 1: japanese patent laid-open publication No. 2019-16355

Disclosure of Invention

Problems to be solved by the invention

The invention aims to provide a system that can appropriately suggest search term candidates for each page of a document. The invention also aims to provide a search material information storage device that can store information on each page in association with the search terms for that page, so that each page of the material can be searched effectively.

Means for solving the problems

The invention is based essentially on the following knowledge: after words included in each page of the document are extracted as keywords and subject words related to the keywords are extracted, the subject words with high scores are displayed, and candidates of search words suitable for each page of the document can be suggested.

The present invention relates to a search document information storage device.

The device is a computer-based processing device comprising: phrase extraction means 3, keyword storage means 5, keyword extraction means 7, subject term storage means 9, subject term extraction means 11, search term candidate extraction means 13, search term candidate display means 17, search term input means 19, and material search information storage means 21. Each means is implemented on a computer through cooperation of hardware and software.

The phrase extracting means 3 is a means for extracting phrases in the material, which are included in a certain page of the material.

The keyword storage means 5 is means for storing an expression which becomes a keyword associated with a phrase in a document.

The keyword extraction means 7 is means for extracting a plurality of keywords related to the phrases in the material from the keyword storage means 5 by using the phrases in the material extracted by the phrase extraction means 3.

The topic word storage means 9 is means for storing topic words associated with keywords.

The topic word extraction means 11 is means for extracting topic words associated with the keywords from the topic word storage means 9 using the plurality of keywords extracted by the keyword extraction means 7.

The search term candidate extracting means 13 is means for extracting a candidate for a search term of a certain page of the document from the subject word extracted by the subject word extracting means 11 and the plurality of keywords extracted by the keyword extracting means 7.

The search term candidate display means 17 is means for displaying the candidate of the search term extracted by the search term candidate extraction means 13 on the display unit 15.

The search term input means 19 is for receiving an input indicating a search term among the search term candidates displayed on the display unit 15.

The material search information storage means 21 is means for storing information on a certain page of the material in association with the search term inputted by the search term input means 19.

The search data information storage device may further include a classification word storage means 25 and a classification word extraction means 27.

The classification word storage means 25 is means for storing classification words associated with subject words.

The classification word extracting means 27 is means for extracting a classification word associated with the subject word from the classification word storage means 25 by using the subject word extracted by the subject word extracting means 11.

Then, the search term candidate display means 17 of the search data information storage device further extracts the classification word extracted by the classification word extraction means 27 as a candidate for the search term.

In the retrieval material information storage device, a keyword storage means 5 can store a plurality of keywords in association with individual keyword scores,

the keyword extraction means 7 can extract a plurality of keywords and extract a plurality of keyword scores at the same time.

In the data information storage device for searching,

the subject word storage means 9 may also associate and store the subject words with the subject word scores of the individuals,

the topic word extraction means 11 may also use a predetermined number of keywords (one or more) with higher scores from the plurality of keywords extracted by the keyword extraction means 7 as topic word strong candidates, extract topic words associated with the predetermined number of keywords (one or more) from the topic word storage means 9,

the search term candidate extraction means 13 may extract a predetermined number of keywords (one or more) having a high score among the plurality of keywords extracted by the keyword extraction means 7 as candidates for the search term, and extract a predetermined number of subject words (one or more) as candidates for the search term from the subject words extracted by the subject word extraction means 11 using the scores of the keywords and the scores of the subject words.

In the data information storage device for searching,

the search term candidate display means 17 may display, on the display unit 15, the predetermined number (one or more) of keywords and the predetermined number (one or more) of subject words extracted as candidates for the search term as candidates for the search term, and may display, as pre-candidates for the search term, those keywords extracted by the keyword extraction means 7 and those subject words extracted by the subject word extraction means 11 that were not extracted as candidates for the search term,

the search term input means 19 may,

when it receives an input designating a pre-candidate for the search term as a search term, use that pre-candidate as a search term, and

use as search terms the displayed candidates for the search term other than those for which it receives an input indicating that they are not to be search terms.

The present invention also provides a search data information storage program for causing a computer to function as the following means, and a computer-readable recording medium storing the search data information storage program.

The means comprises:

a phrase extraction means 3 for extracting phrases in the document, i.e., phrases contained in a certain page of the document;

a keyword storage means 5 for storing a term which is a keyword related to a phrase in the document;

a keyword extraction means 7 for extracting a plurality of keywords related to the phrases in the material from the keyword storage means 5 by using the phrases in the material extracted by the phrase extraction means 3;

a subject term storage means 9 for storing a subject term associated with the keyword;

a subject term extracting means 11 for extracting a subject term associated with the keyword from the subject term storing means 9 by using the plurality of keywords extracted by the keyword extracting means 7;

a search term candidate extraction means 13 for extracting a search term candidate for a page of the document from the subject term extracted by the subject term extraction means 11 and the plurality of keywords extracted by the keyword extraction means 7;

a search term candidate display means 17 for displaying the candidate of the search term extracted by the search term candidate extraction means 13 on the display unit 15;

a search term input means 19 for receiving an input indicating a search term among the search term candidates displayed on the display unit 15; and

a document search information storage means 21 for storing the search term inputted by the search term input means 19 in association with information on a certain page of the document.

Effects of the invention

The invention can provide a system capable of appropriately suggesting search term candidates for each page of a document. The invention can also provide a search material information storage device that can store information on each page in association with the search terms for that page, so that each page of the material can be searched effectively.

Drawings

Fig. 1 is a block diagram illustrating a search material information storage device according to the present invention.

Fig. 2 is a block diagram showing a basic configuration of a computer.

Fig. 3 is a conceptual diagram showing an example of the system of the present invention.

Fig. 4 is an example of one page of presentation material.

Fig. 5 is a conceptual diagram showing an example of storage of the keyword storage means.

Fig. 6 is a conceptual diagram showing an example of storage of the subject word storage means.

Fig. 7 is a conceptual diagram showing an example of storage in the classification word storage means.

Fig. 8 is a conceptual diagram showing extracted classification words, subject words, keywords, and phrases in the material.

Fig. 9 is an example of a display screen.

Fig. 10 is a flowchart for explaining an example of use of the search material information storage device according to the present invention.

Fig. 11 is a conceptual diagram for explaining an example of use of the search material information storage device of the present invention.

Detailed Description

The following description will explain embodiments of the present invention by using the drawings. The present invention is not limited to the embodiments described below, and includes modifications within the scope of knowledge of those skilled in the art from the embodiments described below.

Fig. 1 is a block diagram illustrating a search material information storage device according to the present invention. The apparatus is a processing apparatus by a computer. The computer may be any one of or a combination of two or more of a mobile terminal device, a desktop personal computer, and a server. These are generally connected to each other in such a manner that information can be transmitted and received via the internet (intranet) or the like. It is also possible to provide a part of the functions of any one computer and share the functions with a plurality of computers.

Fig. 2 is a block diagram showing a basic structure of a computer. As shown in this figure, the computer includes an input unit 31, an output unit 33, a control unit 35, an arithmetic unit 37, and a storage unit 39, and each element is connected via a bus 41 or the like so as to be able to transmit and receive information. For example, the storage unit may store a control program, and may store various kinds of information. When predetermined information is input from the input unit, the control unit reads the control program stored in the storage unit. Then, the control unit appropriately reads the information stored in the storage unit and transmits the information to the arithmetic unit. The control unit transmits appropriate input information to the arithmetic unit. The arithmetic unit performs arithmetic processing using the received various information and stores the result in the storage unit. The control unit reads the calculation result stored in the storage unit and outputs the result from the output unit. Various processes are carried out in this manner. The execution of these various processes is performed by various means.

Fig. 3 is a conceptual diagram showing an example of the system of the present invention. As shown in fig. 3, the system of the present invention (system including the apparatus of the present invention) may be a system including a mobile terminal 45 connected to the internet or intranet 43 and a server 47 connected to the internet or intranet 43. Of course, a single computer or mobile terminal device may also function as the apparatus of the present invention, and multiple servers may also be present.

The search material information storage device 1 stores information for reading each page of the presentation material (for example, an identification number and a page number of the presentation material) in association with one or more search terms for that page, so that a user can easily search for the information required. The search material information storage device 1 may include a storage unit (storage device) of a terminal device or of a computer (or server). The search material information storage device may include a database and database management software. In the search material information storage device, the pages of the presentation material may be ranked or scored for each search term. Consider, for example, the case where multiple pages are stored in association with the search term "diabetes". In this case, information on the highest-ranked page, the next-highest-ranked page, and so on for the search term "diabetes" may be stored in the storage unit.
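The association between search terms and ranked pages described above can be sketched as follows. This is a minimal illustration only, assuming an in-memory dictionary in place of the database; the class name `PageIndex`, the material identifiers, and the score values are hypothetical and do not come from the specification.

```python
from collections import defaultdict


class PageIndex:
    """Toy stand-in for the storage unit (hypothetical; not the patented device)."""

    def __init__(self):
        # search term -> list of (score, material_id, page_number)
        self._pages_by_term = defaultdict(list)

    def store(self, search_term, material_id, page_number, score=0.0):
        """Associate one page of one material with a search term."""
        self._pages_by_term[search_term].append((score, material_id, page_number))

    def lookup(self, search_term):
        """Return (material_id, page_number) pairs, highest-scored page first."""
        entries = self._pages_by_term.get(search_term, [])
        ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
        return [(material_id, page) for _, material_id, page in ranked]


index = PageIndex()
index.store("diabetes", "deck-001", 3, score=0.4)  # illustrative values
index.store("diabetes", "deck-002", 7, score=0.9)
index.store("obesity", "deck-001", 5, score=0.7)
```

Here `index.lookup("diabetes")` returns the page of `deck-002` before the page of `deck-001`, mirroring the highest-ranked and next-highest-ranked pages mentioned in the text.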

As shown in fig. 1, the search material information storage device 1 includes phrase extraction means 3, keyword storage means 5, keyword extraction means 7, subject term storage means 9, subject term extraction means 11, search term candidate extraction means 13, search term candidate display means 17, search term input means 19, and material search information storage means 21. Each means is implemented on a computer, and each process is achieved by cooperation of hardware and software.

The phrase extraction means 3 is means for extracting phrases in the material, that is, phrases included in a certain page of the material. An example of the material is so-called presentation material. The format of the presentation material is not particularly limited. Examples of presentation software and formats include Microsoft (registered trademark) PowerPoint (registered trademark), Kingsoft (registered trademark) WPS Office (registered trademark), Apache (registered trademark) OpenOffice Impress (registered trademark), Keynote (registered trademark), Lotus Freelance (registered trademark), Illustrator (registered trademark), PDF (registered trademark), and Prezi (registered trademark). Examples of the material include data created with any of these presentation software products. Presentation software is software that can display the contents of each page on a display unit such as a screen.

Fig. 4 is an example of a page of presentation material. As shown in fig. 4, the presentation material includes a plurality of pieces of text entered by its creator. A user can recognize this text visually. The computer, on the other hand, stores the entered text together with input information related to the text (character size, character color, presence or absence of animation). A preferable example of the phrase extraction means 3 gives a score (points) to a piece of text based on this input information when extracting the text. For example, larger characters are given higher scores, since large text often represents the main content of the presentation material. Likewise, text in red or text with animation attached is given a high score, because such text is likely to be important content of the presentation material. The phrase extraction means 3 stores in advance a score for each effect associated with text, reads the score associated with the text when extracting phrases, and adds it to or multiplies it by other scores when calculating the scores described later.

On the other hand, the phrase extraction means 3 itself is a known technique. The presentation material contains a plurality of pieces of text. The presentation material is stored in, for example, a storage unit in a server (or a computer). The phrase extraction means 3 reads each page of the stored presentation material and reads the text contained in each page. Subsequently, the phrase extraction means 3 analyzes the part of speech of the read text. For this purpose, the storage unit stores, for example, a part-of-speech database that holds various terms and their parts of speech. Scores for the various terms as search terms may also be stored in the storage unit according to the application. For example, if the search material information storage device is used by a drug manufacturer, an MR (medical representative), or an MS (pharmaceutical wholesaler's sales representative), disease names can be assigned higher scores than general terms. In addition, the names of drugs or active ingredients, although less important than disease names, may be assigned higher scores than general terms. The phrase extraction means 3 can then extract the terms (particularly nouns) included in the text and select one or more of them as the phrases in the material by using their frequency or the term scores stored in the storage unit. For example, suppose the phrase extraction means 3 extracts terms A, B, and C from a certain page, term C appears twice, terms A and B appear once each, and the scores of terms A, B, and C stored in the storage unit are 5, 50, and 40, respectively. The scores of terms A, B, and C are then 5, 50, and 80, respectively. When the number of phrases to be extracted from the material is set to 2, the phrase extraction means 3 extracts terms C and B as the phrases in the material. The extracted phrases in the material (terms C and B) are then stored in the storage unit in association with information, such as a page number, from which the page can be read. Thus, terms C and B can be read together with the page.

Another example of the phrase extraction means 3 identifies the part of a presentation page that uses the largest font. A predetermined coefficient is then given to the phrases in the material included in the part that uses the largest font. This coefficient (first coefficient: $a_1$) can be stored in the storage unit. The phrase extraction means 3 stores the first coefficient in the storage unit together with the phrases in the material included in the part that uses the largest font. The phrase extraction means 3 may also store a coefficient that depends on the font size (second coefficient: $a_2$) in the storage unit together with the phrases in the material.
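The worked example with terms A, B, and C can be reproduced with a short sketch. It assumes that the page score of a term is its frequency multiplied by its stored score, which is what the figures 5, 50, and 80 imply; the function name `extract_phrases` and the variable names are hypothetical.

```python
from collections import Counter

# Per-term scores held in the storage unit (values from the example in the text).
STORED_SCORES = {"A": 5, "B": 50, "C": 40}


def extract_phrases(nouns_on_page, stored_scores, limit):
    """Score each term as frequency x stored score and keep the top `limit`."""
    counts = Counter(nouns_on_page)
    scored = {term: counts[term] * stored_scores.get(term, 0) for term in counts}
    ranked = sorted(scored.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Term C appears twice; terms A and B appear once each.
page_nouns = ["A", "C", "B", "C"]
top_phrases = extract_phrases(page_nouns, STORED_SCORES, limit=2)
print(top_phrases)  # [('C', 80), ('B', 50)]
```

With the extraction count set to 2, terms C and B are selected, matching the result stated in the text.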

The keyword storage means 5 is means for storing terms that serve as keywords associated with phrases in the material. The keyword storage means 5 can be realized by a storage unit and an element (for example, a control program) for reading information from the storage unit. Rather than being the many phrases in the material themselves, the keywords are terms related to those phrases that serve as search terms for easily finding each page. This reduces the number of search terms stored in association with each page and thereby enables quick searching. However, a phrase in the material may also serve directly as a keyword. The keyword may be referred to as a first conversion associated with a phrase in the material. The keyword may be a term that is selected for a plurality of types of phrases in the material and is suitable for use in a search.

The phrases in the material are the terms contained in the presentation. Therefore, a phrase in the material does not necessarily match a search term and may not be suitable as one. For example, suppose the presentation contains the terms "ob gene" or "ob/ob mouse". These relate to the obesity gene (and to obesity and obese experimental animals). Therefore, the keyword storage means 5 stores "obesity gene" (and "obesity" and "obesity experimental animal") as keywords in association with the phrases "ob gene" and "ob/ob mouse".

Because the keyword storage means 5 is present, the search terms stored in association with each page are unified terms. Therefore, the related pages can be read quickly when searching.

Fig. 5 is a conceptual diagram showing an example of storage in the keyword storage means. As shown in fig. 5, the keyword storage means stores one or more keywords in association with each of a plurality of phrases in the material, and also stores a score (score $b_1$) in association with each keyword. The score is preferably input in advance so that terms more suitable for use in searches have higher scores.
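A keyword storage of this kind can be pictured as a lookup table from phrases in the material to scored keywords. The sketch below uses the "ob gene" example from the text; the table layout, the function name `extract_keywords`, and the numeric scores are assumptions made for illustration, since the specification gives no concrete values.

```python
# Keywords shared by both example phrases; the scores (b1) are illustrative only.
_OBESITY_KEYWORDS = [
    ("obesity gene", 0.9),
    ("obesity", 0.8),
    ("obesity experimental animal", 0.6),
]

# phrase in the material -> list of (keyword, b1)
KEYWORD_STORE = {
    "ob gene": _OBESITY_KEYWORDS,
    "ob/ob mouse": _OBESITY_KEYWORDS,
}


def extract_keywords(phrases, store):
    """Collect the keywords stored for the given phrases.

    When several phrases lead to the same keyword, the highest b1 is kept.
    Phrases with no entry in the store contribute nothing.
    """
    keywords = {}
    for phrase in phrases:
        for keyword, b1 in store.get(phrase, []):
            keywords[keyword] = max(b1, keywords.get(keyword, 0.0))
    return keywords


found = extract_keywords(["ob gene", "ob/ob mouse", "unlisted phrase"], KEYWORD_STORE)
```

Both phrases resolve to the same three unified keywords, which is the effect the keyword storage means is intended to have on the stored search terms.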

The keyword extraction means 7 is means for extracting a plurality of keywords related to the phrases in the material from the keyword storage means 5 by using the phrases in the material extracted by the phrase extraction means 3. The keyword storage means 5 stores terms that serve as keywords related to phrases in the material. Therefore, the keyword extraction means 7 can use a phrase in the material to read, from the keyword storage means 5, the terms that serve as keywords related to that phrase. Usually, a plurality of phrases are extracted from a page, and a plurality of terms (each of which may be assigned a score) are commonly stored as keywords for each phrase in the material. Therefore, a plurality of keywords related to the page are usually extracted. Needless to say, a phrase in the material may itself be a keyword; that is, the phrase in the material can be extracted directly as a keyword. The keyword extraction means 7 can evaluate the score of each keyword by using the coefficients of the phrase in the material stored in the storage unit and the score of the keyword. An example of a keyword score is $a_1 \times a_2 \times b_1$. To calculate the score, a control program for performing this calculation is stored in the storage unit; the control unit reads the control program, reads each coefficient and score stored in the storage unit, causes the arithmetic unit to calculate $a_1 \times a_2 \times b_1$, and stores the result in the storage unit.

In addition, a coefficient for the frequency of occurrence of the phrase in the material (coefficient $a_{21}$) or an additional coefficient applied when a specific keyword is extracted from a plurality of phrases in the material (coefficient $a_{22}$) may be stored in the storage unit in advance, and the keyword score may be obtained as $a_1 \times a_2 \times a_{21} \times a_{22} \times b_1$ and stored in the storage unit. A larger coefficient may also be given to text in an emphasis color on a page. In this case, means for analyzing the text color on the page and a storage unit storing a coefficient for each color may be provided, and the coefficient for the color may be read from the storage unit by using the analyzed text color. Basically, subject words and classification words are handled in the same way as keywords: coefficients and scores for various elements are stored in advance, these are read and multiplied or added to obtain scores, and the scores of the individual terms are stored and compared to determine which candidates take priority.
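The product of coefficients described above can be sketched as a small data class. The field names follow the symbols in the text (largest-font coefficient, font-size coefficient, frequency coefficient, additional coefficient, stored keyword score); the class name `KeywordFactors`, the default of 1.0 for an unused coefficient, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class KeywordFactors:
    """Coefficients and score that are multiplied into one keyword score."""

    a1: float = 1.0   # coefficient for text in the largest font
    a2: float = 1.0   # coefficient according to font size
    a21: float = 1.0  # coefficient for frequency of occurrence
    a22: float = 1.0  # additional coefficient for a specific keyword
    b1: float = 1.0   # score stored with the keyword

    def score(self) -> float:
        """Return a1 x a2 x a21 x a22 x b1."""
        return prod((self.a1, self.a2, self.a21, self.a22, self.b1))


# Illustrative values only; the specification does not give concrete numbers.
factors = KeywordFactors(a1=2.0, a2=1.5, a21=2.0, a22=1.0, b1=4.0)
print(factors.score())  # 24.0
```

Leaving a coefficient at its default of 1.0 reduces the product to the shorter form that uses only the first coefficient, the second coefficient, and the keyword score.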

The subject word storage means 9 is means for storing subject words associated with keywords. The subject word storage means 9 can be realized by a storage unit and an element (for example, a control program) for reading information from the storage unit.

For example, the subject word storage means may store the subject word "obesity" in association with keywords such as "obesity gene", "obesity", and "obesity experimental animal". The subject word may be a term that further unifies a plurality of keywords, or a higher-level concept. Using subject words allows searches to be performed more quickly. Examples of subject words are disease names, drug names, active ingredient names, and pharmaceutical company names. That is, the subject word can be referred to as a second conversion associated with a phrase in the material. The subject word may be a term that is selected for a plurality of types of keywords and is suitable for search. In addition, the subject term may also be a message-related person.

The subject word extraction means 11 is means for extracting subject words associated with the keywords from the subject word storage means 9 by using the plurality of keywords extracted by the keyword extraction means 7.

The subject word storage means 9 stores subject words associated with the keywords. Therefore, the subject word extraction means 11 can use the plurality of keywords extracted by the keyword extraction means 7 to extract the subject words associated with those keywords from the subject word storage means 9.

Fig. 6 is a conceptual diagram showing an example of storage in the subject word storage means. As shown in fig. 6, the subject word storage means stores one or more subject words in association with each of a plurality of keywords, and stores a score in association with each subject word. This score is preferably input in advance so that terms more suitable for use in searches have higher scores.

The search term candidate extracting means 13 is means for extracting a candidate for a search term of a certain page of the document from the subject word extracted by the subject word extracting means 11 and the plurality of keywords extracted by the keyword extracting means 7.

For example, one or more subject words associated with a certain page are stored in the storage unit. A plurality of keywords associated with that page are also stored.

For example, when the control program is set so that all subject words become candidates for the search term and a certain number of keywords (for example, four, depending on the size of the display on the display unit) become candidates for the search term, the search term candidate extraction means 13 makes all the read subject words candidates for the search term and makes four of the keywords candidates for the search term.

The keyword storage means 5 may store a plurality of keywords in association with the scores of the individual keywords, and the keyword extraction means 7 may extract a plurality of keywords and the scores of the individual keywords. In this case, for example, a keyword having a high score is extracted as a candidate for the search term.

The topic word storage means 9 may store topic words in association with the scores of the individual topic words, and the topic word extraction means 11 may extract topic words associated with a predetermined number of topic word strong candidates from the topic word storage means 9 by using a predetermined number of keywords (one or more) having higher scores from the plurality of keywords extracted by the keyword extraction means 7 as topic word strong candidates.
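The use of high-scoring keywords as strong candidates for the subject word lookup can be sketched as follows. The store contents, the scores, and the names `SUBJECT_STORE` and `extract_subject_words` are hypothetical; the sketch only illustrates taking a predetermined number of top keywords and reading their associated subject words.

```python
# keyword -> list of (subject word, subject word score); illustrative contents.
SUBJECT_STORE = {
    "obesity gene": [("obesity", 0.9)],
    "obesity": [("obesity", 0.9)],
    "insulin": [("diabetes", 0.8)],
}


def extract_subject_words(keyword_scores, store, strong_count):
    """Look up subject words for the `strong_count` highest-scored keywords.

    Returns a dict: subject word -> (subject word score, originating keywords).
    """
    strong = sorted(keyword_scores, key=keyword_scores.get, reverse=True)[:strong_count]
    subjects = {}
    for keyword in strong:
        for subject, score in store.get(keyword, []):
            entry = subjects.setdefault(subject, (score, []))
            entry[1].append(keyword)  # remember which keyword led to this subject word
    return subjects


keyword_scores = {"obesity gene": 24.0, "obesity": 12.0, "insulin": 3.0}
subjects = extract_subject_words(keyword_scores, SUBJECT_STORE, strong_count=2)
```

With two strong candidates, only "obesity gene" and "obesity" are used for the lookup, so "diabetes" is not returned even though its keyword was extracted from the page.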

The search data information storage device may further comprise a classification word storage means 25 and a classification word extraction means 27.

The classification word storage means 25 is means for storing classification words associated with subject words.

The classification word extraction means 27 is means for extracting a classification word associated with the subject word from the classification word storage means 25 by using the subject word extracted by the subject word extraction means 11. The classification word may be referred to as a third conversion related to the phrases in the material. The classification word may be a word that is selected for a plurality of types of subject words and is suitable for classified search. A classification word may also indicate the audience that is interested in the material. For example, if a page of the material concerns a certain diabetes drug for MRs, the classification words may be "MR", "diabetes", and "medicine" (which may, for example, be stored in association with a subject word). If a page of the material is accounting information for bank clerks, the classification words may be "bank clerk" and "accounting". In addition, the classification word may be information related to the artifact. Then, the search term candidate display means 17 of the search data information storage device further extracts the classification word extracted by the classification word extraction means 27 as one of the candidates for the search term. Fig. 7 is a conceptual diagram showing an example of storage in the classification word storage means. The classification word storage means stores one or more classification words in association with each of a plurality of subject words, and stores a score in association with each classification word. The score is preferably input in advance so that terms more suitable for use in searches have higher scores.

Fig. 8 is a conceptual diagram showing extracted classification words, subject words, keywords, and phrases in the material.

The search term candidate extraction means 13 may extract a predetermined number (one or more) of high-scoring keywords from the plurality of keywords extracted by the keyword extraction means 7 as candidates for the search term. The search term candidate extraction means 13 may also extract a predetermined number (one or more) of subject words as candidates for the search term from the subject words extracted by the subject word extraction means 11, using the scores of the keywords and the scores of the subject words. For example, the subject word storage means 9 stores subject words in association with their individual scores, and the keyword storage means 5 stores a plurality of keywords in association with their individual scores. Each subject word has keywords from which it originates; that is, the subject word is read by using those keywords, and a subject word is often associated with one or more keywords. The search term candidate extraction means 13 reads the score of a certain subject word from the subject word storage means 9, and also reads from the keyword storage means 5 the score of each keyword from which that subject word was extracted. Next, when a subject word has a plurality of originating keywords, for example, the search term candidate extraction means 13 causes the arithmetic unit to sum the scores of those keywords and to multiply the score of the subject word by the score of the keyword (or by the total score of the keywords). In this manner, the total score of the subject word is obtained and stored in the storage unit. The search term candidate extraction means 13 reads the total scores of the plurality of subject words, causes the arithmetic unit to compare them, and extracts the predetermined number (one or more) of subject words with the highest total scores. In this way, the search term candidate extraction means 13 can extract the predetermined number of subject words when the number of subject words to be extracted is fixed.
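The total score of a subject word, as described above, is the subject word score multiplied by the sum of the scores of its originating keywords. A minimal sketch under that reading follows; the function name `rank_subject_words` and all numeric values are hypothetical.

```python
def rank_subject_words(subject_scores, origins, keyword_scores, limit):
    """Rank subject words by (subject score) x (sum of originating keyword scores)."""
    totals = {}
    for subject, score in subject_scores.items():
        keyword_sum = sum(keyword_scores[keyword] for keyword in origins[subject])
        totals[subject] = score * keyword_sum
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Illustrative values: "obesity" originates from two keywords, "diabetes" from one.
keyword_scores = {"obesity gene": 20, "obesity": 10, "insulin": 40}
subject_scores = {"obesity": 2, "diabetes": 1}
origins = {"obesity": ["obesity gene", "obesity"], "diabetes": ["insulin"]}

top_subjects = rank_subject_words(subject_scores, origins, keyword_scores, limit=1)
print(top_subjects)  # [('obesity', 60)]
```

In this example "obesity" totals 2 x (20 + 10) = 60 and "diabetes" totals 1 x 40 = 40, so "obesity" is extracted when the predetermined number is one.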

The search term candidate display means 17 is means for displaying the candidate of the search term extracted by the search term candidate extraction means 13 on the display unit 15.

When a predetermined number (one or more) of keywords and a predetermined number (one or more) of subject terms extracted as candidates for the search term are displayed on the display unit 15, the search term candidate display means 17 may also display on the display unit 15, as pre-candidates for the search term, those keywords extracted by the keyword extraction means 7 and those subject terms extracted by the subject term extraction means 11 that were not extracted as candidates for the search term.

In that case, when the search term input means 19 receives an input designating a pre-candidate as a search term, it may treat that pre-candidate as a search term. It may also treat as search terms the displayed candidates other than those for which it received an input indicating that they are not to be used as search terms.

The material search information storage means 21 is means for storing the search term inputted by the search term input means 19 in association with information on a certain page of the material.

The apparatus of the present invention may further display candidates for a content type according to the kind of presentation material, and store the content type in association with each page of the presentation material (or with the presentation material itself). In this case, the apparatus reads the format of the presentation material stored in the storage unit (PowerPoint (registered trademark), PDF (registered trademark), Word (registered trademark), etc.) and then reads the text contained in the material in that format. The apparatus includes a content analysis term database in which content analysis terms are stored, and analyzes the content type using the terms stored in that database. For example, when the material is a PDF (registered trademark) and the text "attached file" appears at its beginning, "attached file" is extracted as a candidate for the content type of the material. The candidate "attached file" is then displayed on the display unit, and when the user inputs a confirmation, "attached file" is stored as the content type in association with the material.
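The content type analysis above might look like the following sketch. The layout of the content analysis term database, the function names, and the prefix-matching rule are assumptions made for illustration; the text only states that a term found at the beginning of the material yields a content type candidate.

```python
from typing import Dict, Optional

# Hypothetical content analysis term database:
# file format -> {leading text -> content type}.
CONTENT_ANALYSIS_TERMS: Dict[str, Dict[str, str]] = {
    "pdf": {"attached file": "attached file"},
}


def content_type_candidate(file_format: str, text: str) -> Optional[str]:
    """Return a content type candidate when the text starts with a registered term."""
    terms = CONTENT_ANALYSIS_TERMS.get(file_format.lower(), {})
    head = text.lstrip().lower()
    for term, content_type in terms.items():
        if head.startswith(term):
            return content_type
    return None


def store_content_type(store: Dict[str, str], material_id: str,
                       candidate: Optional[str], confirmed: bool) -> None:
    """Store the content type only after the user has confirmed the candidate."""
    if candidate is not None and confirmed:
        store[material_id] = candidate
```

The candidate is stored only when `confirmed` is true, mirroring the requirement that the user confirm the displayed content type before it is associated with the material.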

Fig. 9 is an example of a display screen. In this example, a certain page of the presentation material is displayed in the upper half of the display screen. Below it, the candidates for the search term are displayed, and adopt and not-adopt icons (check boxes) are displayed alongside each candidate. In the example of Fig. 9, the candidates are arranged from the left in the order of in-document phrases, subject terms, and keywords; the display unit may thus also display the phrases in the document. In the example of Fig. 9, the adopt check box is checked for the terms extracted by the search term candidate extraction means 13. A confirmation button is provided at the lower part of the display screen. When the user uses the confirmation button to input a confirmation instruction to the computer (terminal device), the candidates for the search term are confirmed. On receiving this input from the computer, the device 1 stores the confirmed search terms (and the score of each search term) in the storage unit in association with the page of the presentation material.

The search term input means 19 is means for receiving an input designating a search term from among the search term candidates displayed on the display unit 15. In the example of Fig. 9, the check boxes function as the search term input means 19. To change a candidate from the adopted state to the not-adopted state, the user, for example, checks the not-adopt check box. On receiving this input, the device 1 sets the designated candidate to the not-adopted state. When the user then inputs a confirmation instruction to the computer (terminal device) using the confirmation button, the candidate is not adopted as a search term. Alternatively, the device 1 may reduce the score of the candidate that was not adopted (for example, by half) and still store it as a search term associated with the page. For terms that the search term candidate extraction means 13 did not extract as search term candidates, the not-adopt check box is checked (or neither check box is checked). To change such a term from the not-adopted state to the adopted state, the user, for example, checks the adopt check box. On receiving this input, the device 1 sets the designated term to the adopted state. When the user then inputs a confirmation instruction to the computer (terminal device) using the confirmation button, the term is adopted as a search term; that is, it is stored in association with the page as a search term for that page. In this case, the search term selected by the user may be stored with a bonus added to, or multiplied into, its score.
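The handling of adopted and not-adopted candidates at confirmation time can be sketched as below. The `Candidate` class, the `confirm` function, and the default factor and bonus values are hypothetical; the text only gives halving as an example reduction and leaves the size of the user-selection bonus open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    term: str
    score: float
    adopted: bool          # current state of the check box
    auto_extracted: bool   # True if the extraction means proposed the term


def confirm(candidates: List[Candidate],
            rejected_factor: float = 0.5,
            user_bonus: float = 1.0,
            keep_rejected: bool = False) -> Dict[str, float]:
    """Return the terms and scores to store for the page when the user confirms."""
    stored = {}
    for c in candidates:
        if c.adopted:
            # A term the user adopted by hand may receive a bonus (added here).
            stored[c.term] = c.score if c.auto_extracted else c.score + user_bonus
        elif c.auto_extracted and keep_rejected:
            # Optionally keep a rejected proposal at a reduced score.
            stored[c.term] = c.score * rejected_factor
    return stored
```

Setting `keep_rejected=True` corresponds to the variant in which a rejected candidate is still stored with its score reduced, for example by half.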

Fig. 10 is a flowchart for explaining an example of use of the search material information storage device according to the present invention, that is, a search material information storage method using that device. In the figure, S denotes a step.

The user creates a presentation material (S101). The user's terminal device or computer then stores the presentation material in its storage unit (or in the storage unit of the server).

The apparatus 1 extracts, for each page of the presentation material, the phrases included in the page, that is, the in-document phrases (S102). In doing so, the apparatus 1 may assign a score to each in-document phrase. For example, bonus values may be registered in advance for cases where a phrase appears frequently in the document or is accompanied by bold type, colored type, animation, or the like, and the registered bonus information may be used to assign a score to the phrase. The device 1 may also have a dictionary of in-document phrases in which each phrase is stored in association with a score, so that the device 1 can read the score of a phrase from the dictionary. The score of an in-document phrase may then be obtained by combining the dictionary score with the bonus (for example, by addition or multiplication). When the number of in-document phrases to be used is set in advance, the phrases with the highest scores can be used as the in-document phrases.
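A minimal sketch of the phrase scoring in step S102 follows. It assumes the additive variant; the function name, the bonus values, the default dictionary score of 1.0, and the representation of emphasized phrases as a set are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Set


def score_phrases(page_phrases: List[str],
                  emphasized: Set[str],
                  dictionary_scores: Dict[str, float],
                  frequency_bonus: float = 1.0,
                  emphasis_bonus: float = 2.0,
                  max_phrases: int = 3) -> Dict[str, float]:
    """Score the phrases on a page and keep the max_phrases highest-scoring ones."""
    counts = Counter(page_phrases)
    scores = {}
    for phrase, count in counts.items():
        # Dictionary score, or an assumed default of 1.0 for unlisted phrases.
        score = dictionary_scores.get(phrase, 1.0)
        # Bonus for each repeated appearance on the page.
        score += frequency_bonus * (count - 1)
        # Bonus for bold, colored, or animated text.
        if phrase in emphasized:
            score += emphasis_bonus
        scores[phrase] = score
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:max_phrases])
```

A multiplicative combination would replace the `+=` updates with `*=` and use factors greater than 1 as bonuses.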

Using the one or more extracted in-document phrases, the apparatus 1 extracts from the storage unit a plurality of keywords associated with those phrases (S103). The storage unit stores the terms that serve as keywords in association with in-document phrases, so the apparatus 1 can use an in-document phrase to read the related keywords from the storage unit. At this time, a score as a search term may be given to each keyword. A keyword that is selected from several different in-document phrases is highly likely to be a suitable search term, so a bonus may be applied in that case as well. For example, bonus values for frequently selected keywords may be registered in advance, the bonus corresponding to the number of times a keyword was selected may be read, and the bonus may be added to or multiplied into the score. In this way, the plurality of keywords and the score of each keyword are determined.
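The keyword lookup and repetition bonus of step S103 can be illustrated as follows. The store layout (phrase mapped to a list of keywords), the bonus table keyed by repetition count, and the default base score are assumptions; the additive variant is shown.

```python
from collections import Counter
from typing import Dict, List


def extract_keywords(phrases: List[str],
                     keyword_store: Dict[str, List[str]],
                     base_scores: Dict[str, float],
                     repeat_bonus: Dict[int, float]) -> Dict[str, float]:
    """Look up keywords for each phrase and add a bonus for repeated selection."""
    hits = Counter()
    for phrase in phrases:
        # set() so that one phrase counts a given keyword at most once.
        for keyword in set(keyword_store.get(phrase, [])):
            hits[keyword] += 1
    return {
        keyword: base_scores.get(keyword, 1.0) + repeat_bonus.get(count, 0.0)
        for keyword, count in hits.items()
    }
```

The same lookup pattern applies to the extraction of subject terms from keywords, with the subject term storage in place of the keyword storage.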

The apparatus 1 uses the plurality of keywords to extract from the storage unit the subject terms associated with the keywords (S104). This process is the same as the keyword extraction process.

The apparatus 1 may use the extracted subject terms to extract from the storage unit the classification terms associated with the subject terms (S105). This step is optional.

The apparatus 1 extracts candidates for the search term of a certain page of the document from the subject terms and the plurality of keywords (and the classification terms) (S106). The apparatus 1 may store in advance a control instruction for extracting candidates for the search term, and extract the candidates for a certain page of the document from the subject terms and the plurality of keywords (and the classification terms) based on that control instruction. An example of a control instruction is to extract, as candidates for the search term, the four highest-scoring keywords and the two highest-scoring subject terms (and all the classification terms). In this way, candidates for the search term for a page of the presentation material are extracted automatically. The storage unit may store the extracted candidates directly as the search terms of that page.
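The example control instruction (four keywords, two subject terms, all classification terms) could be applied as in the sketch below. Representing the control instruction as a dictionary, the function names, and the output ordering are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical encoding of the example control instruction.
CONTROL_INSTRUCTION = {"keywords": 4, "subject_terms": 2}


def top_terms(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


def extract_candidates(keyword_scores: Dict[str, float],
                       subject_scores: Dict[str, float],
                       classification_terms: List[str],
                       instruction: Dict[str, int] = CONTROL_INSTRUCTION) -> List[str]:
    """Select search term candidates for a page according to the control instruction."""
    selected = top_terms(subject_scores, instruction["subject_terms"])
    selected += top_terms(keyword_scores, instruction["keywords"])
    selected += classification_terms
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(selected))
```

Changing the counts in `CONTROL_INSTRUCTION` changes how many keywords and subject terms are proposed without touching the selection logic.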

Next, when the user is to confirm or decide the search terms, the device 1 may display the extracted candidates for the search term on the display unit (S107). In this case, the target page of the presentation material (reduced in size) and the subject terms and keywords (and classification terms) that are not candidates for the search term may be displayed on the display unit together. This allows the user to select the search terms.

When the user confirms the candidates as they are, the terminal device receives the confirmation input, and the candidates for the search term extracted by the apparatus 1 are stored in the storage unit, unchanged, as the search terms associated with that page of the presentation material (S111).

On the other hand, when the terminal device receives an input rejecting a candidate for the search term, or an input adopting a term that is not a candidate for the search term, the corrected set of candidates is set in the storage unit as the candidates for the search terms associated with that page (S121).

If the user confirms after the correction, the terminal device receives the confirmation input and stores the corrected candidates in the storage unit as the search terms associated with that page of the presentation material (S122).
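Steps S111, S121, and S122 can be summarized in a short sketch. The storage is modeled as a plain dictionary keyed by a page identifier, and the function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, List


def store_search_terms(storage: Dict[str, List[str]],
                       page_id: str,
                       candidates: Iterable[str],
                       rejected: Iterable[str] = (),
                       added: Iterable[str] = ()) -> List[str]:
    """Apply the user's corrections (S121), then store the result on confirmation."""
    rejected_set = set(rejected)
    # Drop the candidates the user rejected.
    terms = [t for t in candidates if t not in rejected_set]
    # Append the non-candidate terms the user adopted.
    terms += [t for t in added if t not in terms]
    # Store with no corrections (S111) or with corrections applied (S122).
    storage[page_id] = terms
    return terms
```

Calling the function with empty `rejected` and `added` arguments corresponds to direct confirmation (S111); passing corrections corresponds to the path through S121 and S122.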

The present invention also provides a search data information storage program for causing a computer to function as the means described below, and a computer-readable recording medium on which that program is recorded.

The means comprise:

a phrase extraction means 3 for extracting phrases in the document, i.e., phrases contained in a certain page of the document;

a keyword storage means 5 for storing a term which is a keyword related to a phrase in the document;

a keyword extraction means 7 for extracting a plurality of keywords related to the phrases in the material from the keyword storage means 5 by using the phrases in the material extracted by the phrase extraction means 3;

a subject term storage means 9 for storing a subject term associated with the keyword;

a subject term extracting means 11 for extracting a subject term associated with the keyword from the subject term storing means 9 by using the plurality of keywords extracted by the keyword extracting means 7;

a search term candidate extraction means 13 for extracting a search term candidate for a page of the document from the subject term extracted by the subject term extraction means 11 and the plurality of keywords extracted by the keyword extraction means 7;

a search term candidate display means 17 for displaying the candidate of the search term extracted by the search term candidate extraction means 13 on the display unit 15;

search term input means 19 for receiving an input indicating a search term among the search term candidates displayed on the display unit 15; and

a document search information storage means 21 for storing the search term input via the search term input means 19 in association with information on a certain page of the document.
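The chain formed by means 3 through 13 can be pictured with the following sketch. Substring matching against a list of known phrases stands in for the phrase extraction means, and the dictionary-based stores and function names are assumptions; scoring, display, and user input (means 15 to 21) are omitted.

```python
from typing import Dict, List


def lookup(terms: List[str], store: Dict[str, List[str]]) -> List[str]:
    """Collect the entries associated with the given terms, without duplicates."""
    found: List[str] = []
    for term in terms:
        for related in store.get(term, []):
            if related not in found:
                found.append(related)
    return found


def candidates_for_page(page_text: str,
                        known_phrases: List[str],
                        keyword_store: Dict[str, List[str]],
                        subject_store: Dict[str, List[str]]) -> List[str]:
    """Chain phrase, keyword, and subject term extraction for one page."""
    # Means 3: phrases found on the page (substring match is an assumption).
    phrases = [p for p in known_phrases if p in page_text]
    # Means 7 reading from means 5.
    keywords = lookup(phrases, keyword_store)
    # Means 11 reading from means 9.
    subjects = lookup(keywords, subject_store)
    # Means 13: subject terms first, then keywords.
    return subjects + [k for k in keywords if k not in subjects]
```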

Fig. 11 is a conceptual diagram (block diagram) for explaining a use example of the search material information storage device of the present invention. In this example, the basic Database (DB) includes a content DB, a customer DB, a record DB, and a DB storing other information. These databases are then interfaced to an engine called the Interactive-PRO framework. The engine can transmit and receive information with various terminal devices (such as tablet computers, mobile terminal devices and mobile phones) through an Application Programming Interface (API). In addition, the engine can transmit and receive information with the control program or application program of the client, HTML data, animation data, PowerPoint data, PDF data, document data, and database management software. In addition, the engine can synchronize with a server (cloud) and send and receive information. On the other hand, in the example of fig. 11, the server can transmit and receive information to and from various databases and software including a BI (business intelligence) of a customer, a CRM (customer relationship management), and a DWH (data warehouse).

Industrial applicability of the invention

The present invention can be used in the information providing industry.

Description of the reference numerals

1 data information storage device for search

3 phrase extraction means

5 keyword storage means

7 keyword extraction means

9 subject term storage means

11 subject term extraction means

13 search term candidate extraction means

15 display unit

17 search term candidate display means

19 search term input means

21 data search information storage means

23 data information storage means for search

25 classified term storage means

27 classified term extracting means
