Knowledge-graph-based question answering method and device, and computer-readable storage medium
1. A knowledge-graph-based question answering method, characterized by comprising the following steps:
acquiring a user question and retrieval data, wherein the retrieval data comprises a target entity in the user question;
retrieving, according to the retrieval data, index data of a plurality of SPO triples comprising the target entity from an index table of a pre-constructed knowledge graph, and acquiring the plurality of SPO triples from the knowledge graph according to the index data, wherein, for each SPO triple, the S-tuple and the O-tuple are adjacent nodes in the knowledge graph, and the P-tuple is the edge representing the relationship between the S-tuple and the O-tuple in the knowledge graph;
determining a matching degree between each SPO triple and the user question, and outputting the SPO triple with the highest matching degree with the user question as target question-answer data;
wherein the index table of the knowledge graph comprises: a node-to-node-ID mapping table, a relationship-to-relationship-ID mapping table, a node index table, and a data storage table; the node index table comprises, for each node in the knowledge graph, the node ID of the node and an index position interval corresponding to the node ID; and the data storage table comprises, for the index position interval corresponding to each node ID, a data pair at each index position of the interval, the data pair comprising an adjacent node ID of the node and the relationship ID between the node and that adjacent node.
2. The method of claim 1, wherein the retrieval data further comprises one or a combination of the following:
retrieving a target relationship in the user question;
retrieving a target adjacent node that has a preset connection direction with the target entity;
retrieving a target adjacent node type of the target entity;
retrieving a relationship connecting a first target entity and a second target entity;
retrieving a common adjacent node connecting a third target entity and a fourth target entity.
3. The method of claim 1, wherein retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
looking up, in the data storage table, the adjacent node ID at each index position in the index position interval corresponding to the target node ID, together with the relationship ID between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the relationship IDs between the target node and those adjacent nodes.
4. The method of claim 2, wherein the data storage table comprises a first data storage table and a second data storage table;
the first data storage table is obtained by sorting, within the index position interval corresponding to each node ID, the data pairs in ascending order of adjacent node ID, and then assigning the sorted data pairs one-to-one to the index positions in index-position order;
and the second data storage table is obtained by sorting, within the index position interval corresponding to each node ID, the data pairs in ascending order of relationship ID, and then assigning the sorted data pairs one-to-one to the index positions in index-position order.
5. The method of claim 4, wherein, if the retrieval data further comprises a target relationship in the user question,
retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring a target relationship ID of the target relationship from the relationship-to-relationship-ID mapping table;
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
determining, within the index position interval corresponding to the target node ID in the second data storage table, a first target index position interval in which the target relationship ID is located;
for each first target index position in the first target index position interval, retrieving the adjacent node ID at that index position, the target relationship ID representing the relationship between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the relationship IDs between the target node and those adjacent nodes.
6. The method of claim 4, wherein, if the retrieval data further comprises retrieving a target adjacent node having a preset connection direction with the target entity, the preset connection direction comprises a forward connection direction and a reverse connection direction, the forward connection direction indicating that the target entity is the S-tuple and the reverse connection direction indicating that the target entity is the O-tuple;
the second data storage table comprises, in correspondence with the relationship IDs, a first connection symbol indicating that a relationship ID is in the forward connection direction and a second connection symbol indicating that a relationship ID is in the reverse connection direction;
retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
if the preset connection direction is the forward connection direction, searching the second data storage table, within the index position interval corresponding to the target node ID, for entries carrying the first connection symbol in descending order of relationship ID, to obtain the adjacent node ID at each index position and the relationship ID between the target node and that adjacent node; or,
if the preset connection direction is the reverse connection direction, searching the second data storage table, within the index position interval corresponding to the target node ID, for entries carrying the second connection symbol in ascending order of relationship ID, to obtain the adjacent node ID at each index position and the relationship ID between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the relationship IDs between the target node and those adjacent nodes.
7. The method of claim 4, wherein the index table of the knowledge graph further comprises a node type index table, the node type index table comprising, for each node in the knowledge graph, a subtree having that node as its root node, the subtree comprising the node ID, the starting child node type ID of its starting child node, the ending node type ID of the node at which the subtree ends, and the node type ID range from the starting child node type ID to the ending node type ID;
if the retrieval data further comprises a target adjacent node type of the target entity, retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node type index table, the node type ID range corresponding to the target adjacent node type;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
determining, within the index position interval corresponding to the target node ID in the first data storage table, a second target index position interval that satisfies the node type ID range;
for each second target index position in the second target index position interval, retrieving from the data storage table the target adjacent node ID at that index position and the relationship ID between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the relationship IDs between the target node and those adjacent nodes.
8. The method of claim 4, wherein, if the retrieval data comprises retrieving a relationship connecting a first target entity and a second target entity,
retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring, from the node-to-node-ID mapping table, a first target node ID of the first target entity and a second target node ID of the second target entity;
acquiring, from the node index table, a first index position interval corresponding to the first target node ID and a second index position interval corresponding to the second target node ID;
acquiring a first position number, namely the number of index positions in the first index position interval, and a second position number, namely the number of index positions in the second index position interval, and, if the first position number is smaller than the second position number, looking up in the first data storage table, for each index position in the first index position interval, the first target adjacent node ID at that index position;
determining, among the first target adjacent node IDs, those equal to the second target node ID, to obtain the relationship ID between the first target node ID and the second target node ID;
and obtaining index data of the SPO triple comprising the first target node ID and the second target node ID according to the first target node ID, the second target node ID and the relationship ID between them.
9. The method of claim 4, wherein, if the retrieval data comprises retrieving a common adjacent node connecting a third target entity and a fourth target entity,
retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring, from the node-to-node-ID mapping table, a third target node ID of the third target entity and a fourth target node ID of the fourth target entity;
acquiring, from the node index table, a third index position interval corresponding to the third target node ID and a fourth index position interval corresponding to the fourth target node ID;
retrieving from the first data storage table, over each third index position in the third index position interval and each fourth index position in the fourth index position interval, a common adjacent node ID, namely an adjacent node ID shared by the third target node and the fourth target node;
determining, according to the common adjacent node ID, the relationship ID between the third target node ID and the common adjacent node ID, and the relationship ID between the fourth target node ID and the common adjacent node ID;
obtaining index data of the SPO triple comprising the third target node ID and the common adjacent node ID according to the third target node ID, the common adjacent node ID and the relationship ID between them, and obtaining index data of the SPO triple comprising the fourth target node ID and the common adjacent node ID according to the fourth target node ID, the common adjacent node ID and the relationship ID between them.
10. A knowledge-graph-based question answering device, comprising:
an acquisition module, configured to acquire a user question and retrieval data, wherein the retrieval data comprises a target entity in the user question;
a retrieval module, configured to retrieve, according to the retrieval data, index data of a plurality of SPO triples comprising the target entity from an index table of a pre-constructed knowledge graph, and to acquire the plurality of SPO triples from the knowledge graph according to the index data, wherein, for each SPO triple, the S-tuple and the O-tuple are adjacent nodes in the knowledge graph, and the P-tuple is the edge representing the relationship between the S-tuple and the O-tuple in the knowledge graph;
a determining module, configured to determine a matching degree between each SPO triple and the user question, and to output the SPO triple with the highest matching degree with the user question as target question-answer data;
wherein the index table of the knowledge graph comprises: a node-to-node-ID mapping table, a relationship-to-relationship-ID mapping table, a node index table, and a data storage table; the node index table comprises, for each node in the knowledge graph, the node ID of the node and an index position interval corresponding to the node ID; and the data storage table comprises, for the index position interval corresponding to each node ID, a data pair at each index position of the interval, the data pair comprising an adjacent node ID of the node and the relationship ID between the node and that adjacent node.
11. A knowledge-graph-based question answering device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method of any one of claims 1-9.
12. A computer-readable storage medium having computer program instructions stored thereon, wherein the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 9.
Background
Knowledge-graph-based question answering is a question answering approach that answers factual questions using the structured data of a knowledge graph. A knowledge graph is a structured form of knowledge organization consisting of triples, each of which expresses a fact. An SPO triple is a Subject-Predicate-Object triple: the S-tuple represents an entity name, the P-tuple represents an attribute name of that entity, and the O-tuple represents the corresponding attribute value.
In knowledge-graph-based question answering, the currently popular Information Extraction (IE)-based method mainly comprises the following steps: extracting key entities from the user query using techniques such as entity linking; constructing, from the extracted key entities, a batch of SPO triples (subgraphs) that contain candidate answers; constructing features to compute the matching degree between each SPO triple (subgraph) and the user query; and finally, after ranking and similar processing, selecting the SPO triple that best satisfies the conditions.
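The disclosure does not specify how the matching degree is computed. As a purely illustrative stand-in (not the method of the disclosure; practical systems typically use learned features or models), the sketch below scores each candidate triple by Jaccard token overlap between the question and the triple's S and P elements, and keeps the highest-scoring candidate. The function names `tokenize`, `matching_degree` and `best_triple` are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    # Lower-cased word tokens; a production system would use a tokenizer
    # suited to the question language (e.g. a Chinese word segmenter).
    return set(re.findall(r"\w+", text.lower()))

def matching_degree(question: str, triple: tuple[str, str, str]) -> float:
    """Jaccard overlap between the question and the S and P elements.

    The O element is excluded because it is normally the answer and is not
    expected to appear in the question.
    """
    s, p, _ = triple
    q_tokens = tokenize(question)
    t_tokens = tokenize(f"{s} {p}")
    if not q_tokens or not t_tokens:
        return 0.0
    return len(q_tokens & t_tokens) / len(q_tokens | t_tokens)

def best_triple(question: str, triples: list[tuple[str, str, str]]) -> tuple[str, str, str]:
    # Rank candidates by matching degree; max() keeps the first one on ties.
    return max(triples, key=lambda t: matching_degree(question, t))

candidates = [
    ("Inception", "director", "Christopher Nolan"),
    ("Inception", "release year", "2010"),
]
print(best_triple("Who directed Inception", candidates))
```

Token overlap ignores morphology ("directed" vs "director") and word order, which is why feature-based or neural matching is normally used in practice.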
At present, the mature industry solution is to construct subgraphs by querying a graph database or a traditional database. According to our survey, this subgraph construction approach is popular in knowledge-graph question answering competitions and is a general solution; however, retrieval for complex queries is very time-consuming, and online real-time retrieval of complex subgraphs cannot be achieved.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a knowledge-graph-based question answering method, apparatus, and computer-readable storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a knowledge-graph-based question answering method, comprising: acquiring a user question and retrieval data, wherein the retrieval data comprises a target entity in the user question;
retrieving, according to the retrieval data, index data of a plurality of SPO triples comprising the target entity from an index table of a pre-constructed knowledge graph, and acquiring the plurality of SPO triples from the knowledge graph according to the index data, wherein, for each SPO triple, the S-tuple and the O-tuple are adjacent nodes in the knowledge graph, and the P-tuple is the edge representing the relationship between the S-tuple and the O-tuple in the knowledge graph;
determining a matching degree between each SPO triple and the user question, and outputting the SPO triple with the highest matching degree with the user question as target question-answer data;
wherein the index table of the knowledge graph comprises: a node-to-node-ID mapping table, a relationship-to-relationship-ID mapping table, a node index table, and a data storage table; the node index table comprises, for each node in the knowledge graph, the node ID of the node and an index position interval corresponding to the node ID; and the data storage table comprises, for the index position interval corresponding to each node ID, a data pair at each index position of the interval, the data pair comprising an adjacent node ID of the node and the relationship ID between the node and that adjacent node.
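The four index structures described above resemble a compressed sparse row (CSR) adjacency layout: the node index table plays the role of a row-offset array and the data storage table the role of the column/value arrays. The following is a minimal sketch under illustrative assumptions not stated in the disclosure: IDs are assigned in first-seen order, index position intervals are half-open `[start, end)`, and each edge is stored under both of its endpoints so that either entity can reach it (edge direction is not recorded). The function name `build_index` is hypothetical.

```python
from collections import defaultdict

def build_index(triples):
    """Build the node/relationship mapping tables, node index table and data storage table."""
    node_to_id: dict[str, int] = {}
    rel_to_id: dict[str, int] = {}
    adjacency = defaultdict(list)  # node ID -> [(adjacent node ID, relationship ID), ...]
    for s, p, o in triples:
        sid = node_to_id.setdefault(s, len(node_to_id))
        oid = node_to_id.setdefault(o, len(node_to_id))
        rid = rel_to_id.setdefault(p, len(rel_to_id))
        # Store the edge under both endpoints so either entity can reach it.
        adjacency[sid].append((oid, rid))
        adjacency[oid].append((sid, rid))

    node_index: dict[int, tuple[int, int]] = {}  # node ID -> [start, end) index positions
    data_storage: list[tuple[int, int]] = []     # index position -> data pair
    for nid in range(len(node_to_id)):
        start = len(data_storage)
        data_storage.extend(adjacency[nid])
        node_index[nid] = (start, len(data_storage))
    return node_to_id, rel_to_id, node_index, data_storage

triples = [
    ("Inception", "director", "Christopher Nolan"),
    ("Inception", "release year", "2010"),
    ("Interstellar", "director", "Christopher Nolan"),
]
node_to_id, rel_to_id, node_index, data_storage = build_index(triples)
print(node_index)
print(data_storage)
```

With this layout, all neighbors of a node are one contiguous slice of a flat array, which is what makes the interval-based lookups in the following options cheap.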
Optionally, the retrieval data further comprises one or a combination of the following:
retrieving a target relationship in the user question;
retrieving a target adjacent node that has a preset connection direction with the target entity;
retrieving a target adjacent node type of the target entity;
retrieving a relationship connecting a first target entity and a second target entity;
retrieving a common adjacent node connecting a third target entity and a fourth target entity.
Optionally, retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
looking up, in the data storage table, the adjacent node ID at each index position in the index position interval corresponding to the target node ID, together with the relationship ID between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the relationship IDs between the target node and those adjacent nodes.
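A minimal sketch of this basic lookup on hypothetical toy tables, assuming half-open `[start, end)` index position intervals. Each data pair in the target node's interval yields one piece of index data. Because edge direction is not recorded in this simple layout, results are given as (target node ID, relationship ID, adjacent node ID) rather than strictly in S-P-O order; the function name `neighbor_index_data` is hypothetical.

```python
# Hypothetical toy tables in the mapping / node index / data storage layout.
node_to_id = {"Inception": 0, "Christopher Nolan": 1, "2010": 2, "Interstellar": 3}
node_index = {0: (0, 2), 1: (2, 4), 2: (4, 5), 3: (5, 6)}        # node ID -> [start, end)
data_storage = [(1, 0), (2, 1), (0, 0), (3, 0), (0, 1), (1, 0)]  # (adjacent ID, relationship ID)

def neighbor_index_data(entity: str) -> list[tuple[int, int, int]]:
    """Return (target node ID, relationship ID, adjacent node ID) for every edge of the entity."""
    target_nid = node_to_id.get(entity)  # node-to-node-ID mapping table
    if target_nid is None:
        return []  # entity is not in the knowledge graph
    start, end = node_index[target_nid]  # index position interval of the target node
    # One data pair per index position in the interval.
    return [(target_nid, rel_id, adj_id) for adj_id, rel_id in data_storage[start:end]]

print(neighbor_index_data("Christopher Nolan"))
```

The cost is one hash lookup plus a slice proportional to the node's degree, with no graph traversal.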
Optionally, the data storage table comprises a first data storage table and a second data storage table;
the first data storage table is obtained by sorting, within the index position interval corresponding to each node ID, the data pairs in ascending order of adjacent node ID, and then assigning the sorted data pairs one-to-one to the index positions in index-position order;
and the second data storage table is obtained by sorting, within the index position interval corresponding to each node ID, the data pairs in ascending order of relationship ID, and then assigning the sorted data pairs one-to-one to the index positions in index-position order.
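Because every node keeps the same index position interval in both tables, the two tables can be derived from a single unsorted data storage table by sorting each interval twice, and one node index table serves both. A minimal sketch assuming half-open `[start, end)` intervals; the secondary sort keys used to break ties are an additional assumption, and `build_sorted_tables` is a hypothetical name.

```python
def build_sorted_tables(node_index, data_storage):
    """Return (first_table, second_table) derived from an unsorted data storage table."""
    first_table = list(data_storage)
    second_table = list(data_storage)
    for start, end in node_index.values():
        block = data_storage[start:end]
        # First table: ascending adjacent node ID (ties broken by relationship ID).
        first_table[start:end] = sorted(block)
        # Second table: ascending relationship ID (ties broken by adjacent node ID).
        second_table[start:end] = sorted(block, key=lambda pair: (pair[1], pair[0]))
    return first_table, second_table

node_index = {0: (0, 3), 1: (3, 4)}
data_storage = [(5, 0), (3, 2), (4, 1), (0, 2)]  # (adjacent node ID, relationship ID)
first_table, second_table = build_sorted_tables(node_index, data_storage)
print(first_table)
print(second_table)
```

Keeping two sorted copies doubles storage but lets neighbor-ID queries and relationship-ID queries each use binary search on their own table.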
Optionally, if the retrieval data further comprises a target relationship in the user question,
retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring a target relationship ID of the target relationship from the relationship-to-relationship-ID mapping table;
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
determining, within the index position interval corresponding to the target node ID in the second data storage table, a first target index position interval in which the target relationship ID is located;
for each first target index position in the first target index position interval, retrieving the adjacent node ID at that index position, the target relationship ID representing the relationship between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the relationship IDs between the target node and those adjacent nodes.
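Since each interval of the second data storage table is sorted by relationship ID, the first target index position interval can be located with two binary searches instead of a scan of the whole interval. A minimal sketch on hypothetical toy data, assuming integer relationship IDs and half-open intervals; `lower_bound` and `neighbors_by_relation` are hypothetical helper names.

```python
def lower_bound(table, lo, hi, value, field):
    """First position in [lo, hi) whose table[pos][field] >= value.

    Requires table[lo:hi] to be sorted ascending on `field`.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if table[mid][field] < value:
            lo = mid + 1
        else:
            hi = mid
    return lo

REL = 1  # field holding the relationship ID in an (adjacent node ID, relationship ID) pair

def neighbors_by_relation(target_nid, target_rid, node_index, second_table):
    """Return (target node ID, target relationship ID, adjacent node ID) index data."""
    start, end = node_index[target_nid]
    # First target index position interval: positions whose relationship ID == target_rid.
    lo = lower_bound(second_table, start, end, target_rid, REL)
    hi = lower_bound(second_table, lo, end, target_rid + 1, REL)  # integer IDs assumed
    return [(target_nid, target_rid, adj) for adj, _ in second_table[lo:hi]]

node_index = {0: (0, 4), 1: (4, 5)}
second_table = [(7, 0), (3, 1), (9, 1), (2, 3), (0, 1)]  # sorted by relationship ID per interval
print(neighbors_by_relation(0, 1, node_index, second_table))
```

Lookup cost is logarithmic in the node's degree plus the number of matching edges, which matters for high-degree hub entities.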
Optionally, if the retrieval data further comprises retrieving a target adjacent node having a preset connection direction with the target entity, the preset connection direction comprises a forward connection direction and a reverse connection direction, the forward connection direction indicating that the target entity is the S-tuple and the reverse connection direction indicating that the target entity is the O-tuple;
the second data storage table comprises, in correspondence with the relationship IDs, a first connection symbol indicating that a relationship ID is in the forward connection direction and a second connection symbol indicating that a relationship ID is in the reverse connection direction;
retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
if the preset connection direction is the forward connection direction, searching the second data storage table, within the index position interval corresponding to the target node ID, for entries carrying the first connection symbol in descending order of relationship ID, to obtain the adjacent node ID at each index position and the relationship ID between the target node and that adjacent node; or,
if the preset connection direction is the reverse connection direction, searching the second data storage table, within the index position interval corresponding to the target node ID, for entries carrying the second connection symbol in ascending order of relationship ID, to obtain the adjacent node ID at each index position and the relationship ID between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the relationship IDs between the target node and those adjacent nodes.
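The disclosure does not say how the connection symbols are encoded. One encoding consistent with the search orders above, offered only as an assumption, is to store each relationship as a signed code: `rid + 1` for the forward direction (first connection symbol) and `-(rid + 1)` for the reverse direction (second connection symbol). Sorting an interval ascending by this code places all reverse entries at the start and all forward entries at the end, so a forward search runs from large to small codes and a reverse search from small to large, each stopping at the sign change. The function name `directed_neighbors` is hypothetical.

```python
def directed_neighbors(target_nid, direction, node_index, second_table):
    """Return (S ID, relationship ID, O ID) index data for one connection direction.

    second_table holds (adjacent node ID, signed code) pairs, sorted ascending by
    code within each interval; code = rid + 1 (forward) or -(rid + 1) (reverse).
    """
    start, end = node_index[target_nid]
    result = []
    if direction == "forward":  # target entity is the S-tuple
        pos = end - 1
        while pos >= start and second_table[pos][1] > 0:  # codes from large to small
            adj, code = second_table[pos]
            result.append((target_nid, code - 1, adj))
            pos -= 1
    elif direction == "reverse":  # target entity is the O-tuple
        pos = start
        while pos < end and second_table[pos][1] < 0:  # codes from small to large
            adj, code = second_table[pos]
            result.append((adj, -code - 1, target_nid))
            pos += 1
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return result

node_index = {0: (0, 4)}
second_table = [(5, -2), (6, -1), (7, 1), (8, 3)]  # hypothetical signed codes
print(directed_neighbors(0, "forward", node_index, second_table))
print(directed_neighbors(0, "reverse", node_index, second_table))
```

With this encoding a single sorted interval answers both directions, and each search touches only the entries of the requested direction.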
Optionally, the index table of the knowledge graph further comprises a node type index table, the node type index table comprising, for each node in the knowledge graph, a subtree having that node as its root node, the subtree comprising the node ID, the starting child node type ID of its starting child node, the ending node type ID of the node at which the subtree ends, and the node type ID range from the starting child node type ID to the ending node type ID;
if the retrieval data further comprises a target adjacent node type of the target entity, retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node type index table, the node type ID range corresponding to the target adjacent node type;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
determining, within the index position interval corresponding to the target node ID in the first data storage table, a second target index position interval that satisfies the node type ID range;
for each second target index position in the second target index position interval, retrieving from the data storage table the target adjacent node ID at that index position and the relationship ID between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the relationship IDs between the target node and those adjacent nodes.
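For a type constraint to reduce to one contiguous slice of the first data storage table, node IDs must be ordered consistently with node type IDs. The sketch below is a guess at one way to achieve this, not the disclosed scheme: the type tree is numbered in depth-first preorder, so every subtree occupies a contiguous type ID range, and each node ID is packed as `type_id * TYPE_STRIDE + local_index`. `TYPE_STRIDE`, the packing, the toy type tree and all function names are hypothetical. The type ID range then maps to a node ID range that two binary searches can locate.

```python
TYPE_STRIDE = 1_000_000  # hypothetical: node ID = type ID * TYPE_STRIDE + local index

def type_ranges(children, root):
    """Number a type tree in DFS preorder; map each type to (first, last) type ID of its subtree."""
    ranges = {}
    next_id = 0

    def visit(type_name):
        nonlocal next_id
        first = next_id
        next_id += 1
        for child in children.get(type_name, []):
            visit(child)
        ranges[type_name] = (first, next_id - 1)

    visit(root)
    return ranges

def lower_bound(table, lo, hi, value, field):
    # First position in [lo, hi) with table[pos][field] >= value.
    while lo < hi:
        mid = (lo + hi) // 2
        if table[mid][field] < value:
            lo = mid + 1
        else:
            hi = mid
    return lo

def neighbors_of_type(target_nid, type_range, node_index, first_table):
    """Return (target node ID, relationship ID, adjacent node ID) for neighbors in type_range."""
    first_type, last_type = type_range
    lo_id = first_type * TYPE_STRIDE
    hi_id = (last_type + 1) * TYPE_STRIDE  # exclusive upper node ID
    start, end = node_index[target_nid]
    # Second target index position interval inside the target node's interval.
    lo = lower_bound(first_table, start, end, lo_id, 0)
    hi = lower_bound(first_table, lo, end, hi_id, 0)
    return [(target_nid, rid, adj) for adj, rid in first_table[lo:hi]]

children = {"Thing": ["Person", "Work"], "Person": ["Director", "Actor"], "Work": ["Film"]}
ranges = type_ranges(children, "Thing")

def node_id(type_name, local):
    return ranges[type_name][0] * TYPE_STRIDE + local

film = node_id("Film", 0)
node_index = {film: (0, 3)}
first_table = [(node_id("Director", 0), 0), (node_id("Actor", 0), 1), (node_id("Actor", 1), 1)]
print(neighbors_of_type(film, ranges["Actor"], node_index, first_table))
```

Querying a parent type such as "Person" automatically includes all of its subtypes, because their IDs fall inside the parent's preorder range.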
Optionally, if the retrieval data comprises retrieving a relationship connecting a first target entity and a second target entity,
retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring, from the node-to-node-ID mapping table, a first target node ID of the first target entity and a second target node ID of the second target entity;
acquiring, from the node index table, a first index position interval corresponding to the first target node ID and a second index position interval corresponding to the second target node ID;
acquiring a first position number, namely the number of index positions in the first index position interval, and a second position number, namely the number of index positions in the second index position interval, and, if the first position number is smaller than the second position number, looking up in the first data storage table, for each index position in the first index position interval, the first target adjacent node ID at that index position;
determining, among the first target adjacent node IDs, those equal to the second target node ID, to obtain the relationship ID between the first target node ID and the second target node ID;
and obtaining index data of the SPO triple comprising the first target node ID and the second target node ID according to the first target node ID, the second target node ID and the relationship ID between them.
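A minimal sketch of the two-entity case on hypothetical toy data. Two liberties are taken and should be read as assumptions: because the first data storage table is sorted by adjacent node ID, the sketch binary-searches the smaller interval for the other node ID instead of checking every position, and when the first interval is the larger one it simply swaps the roles of the two nodes. It returns relationship IDs only, since the toy layout here does not record which node is the S-tuple. `relations_between` is a hypothetical name.

```python
def lower_bound(table, lo, hi, value, field):
    # First position in [lo, hi) with table[pos][field] >= value.
    while lo < hi:
        mid = (lo + hi) // 2
        if table[mid][field] < value:
            lo = mid + 1
        else:
            hi = mid
    return lo

def relations_between(nid_a, nid_b, node_index, first_table):
    """Relationship IDs of all edges between two nodes."""
    a_start, a_end = node_index[nid_a]
    b_start, b_end = node_index[nid_b]
    # Search in the interval with fewer index positions (the lower-degree node).
    if a_end - a_start > b_end - b_start:
        nid_a, nid_b = nid_b, nid_a
        a_start, a_end = b_start, b_end
    lo = lower_bound(first_table, a_start, a_end, nid_b, 0)
    hi = lower_bound(first_table, lo, a_end, nid_b + 1, 0)  # integer node IDs assumed
    return [rid for _, rid in first_table[lo:hi]]

node_index = {0: (0, 3), 1: (3, 4), 2: (4, 6)}
# (adjacent node ID, relationship ID), sorted by adjacent node ID within each interval
first_table = [(1, 4), (2, 0), (2, 5), (0, 4), (0, 0), (0, 5)]
print(relations_between(0, 2, node_index, first_table))
```

Scanning the lower-degree endpoint keeps the cost bounded by the smaller of the two neighbor lists even when one entity is a hub.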
Optionally, if the retrieval data comprises retrieving a common adjacent node connecting a third target entity and a fourth target entity,
retrieving, from the index table of the pre-constructed knowledge graph, the index data of the plurality of SPO triples comprising the target entity comprises:
acquiring, from the node-to-node-ID mapping table, a third target node ID of the third target entity and a fourth target node ID of the fourth target entity;
acquiring, from the node index table, a third index position interval corresponding to the third target node ID and a fourth index position interval corresponding to the fourth target node ID;
retrieving from the first data storage table, over each third index position in the third index position interval and each fourth index position in the fourth index position interval, a common adjacent node ID, namely an adjacent node ID shared by the third target node and the fourth target node;
determining, according to the common adjacent node ID, the relationship ID between the third target node ID and the common adjacent node ID, and the relationship ID between the fourth target node ID and the common adjacent node ID;
obtaining index data of the SPO triple comprising the third target node ID and the common adjacent node ID according to the third target node ID, the common adjacent node ID and the relationship ID between them, and obtaining index data of the SPO triple comprising the fourth target node ID and the common adjacent node ID according to the fourth target node ID, the common adjacent node ID and the relationship ID between them.
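Because both intervals of the first data storage table are sorted by adjacent node ID, the common adjacent nodes can be found with a single linear merge, in the style of a sorted-list intersection, rather than by comparing every pair of positions. A minimal sketch on hypothetical toy data, assuming at most one edge between any pair of nodes; `common_neighbors` is a hypothetical name.

```python
def common_neighbors(nid_c, nid_d, node_index, first_table):
    """Merge-intersect two intervals sorted by adjacent node ID.

    Returns (common adjacent node ID, relationship ID from nid_c, relationship ID from nid_d).
    Assumes at most one edge between any pair of nodes.
    """
    i, i_end = node_index[nid_c]
    j, j_end = node_index[nid_d]
    result = []
    while i < i_end and j < j_end:
        adj_c, rid_c = first_table[i]
        adj_d, rid_d = first_table[j]
        if adj_c < adj_d:
            i += 1
        elif adj_c > adj_d:
            j += 1
        else:  # shared adjacent node found
            result.append((adj_c, rid_c, rid_d))
            i += 1
            j += 1
    return result

node_index = {0: (0, 3), 1: (3, 6)}
first_table = [(2, 0), (5, 1), (9, 2), (1, 3), (5, 4), (9, 0)]
for common, rid_c, rid_d in common_neighbors(0, 1, node_index, first_table):
    # Index data for the two triples through each common adjacent node.
    print((0, rid_c, common), (1, rid_d, common))
```

The merge visits each position of the two intervals at most once, so the cost is linear in the sum of the two degrees.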
According to a second aspect of the embodiments of the present disclosure, there is provided a knowledge-graph-based question answering apparatus, comprising: an acquisition module, configured to acquire a user question and retrieval data, wherein the retrieval data comprises a target entity in the user question;
a retrieval module, configured to retrieve, according to the retrieval data, index data of a plurality of SPO triples comprising the target entity from an index table of a pre-constructed knowledge graph, and to acquire the plurality of SPO triples from the knowledge graph according to the index data, wherein, for each SPO triple, the S-tuple and the O-tuple are adjacent nodes in the knowledge graph, and the P-tuple is the edge representing the relationship between the S-tuple and the O-tuple in the knowledge graph;
a determining module, configured to determine a matching degree between each SPO triple and the user question, and to output the SPO triple with the highest matching degree with the user question as target question-answer data;
wherein the index table of the knowledge graph comprises: a node-to-node-ID mapping table, a relationship-to-relationship-ID mapping table, a node index table, and a data storage table; the node index table comprises, for each node in the knowledge graph, the node ID of the node and an index position interval corresponding to the node ID; and the data storage table comprises, for the index position interval corresponding to each node ID, a data pair at each index position of the interval, the data pair comprising an adjacent node ID of the node and the relationship ID between the node and that adjacent node.
Optionally, the retrieval data further comprises one or a combination of the following:
retrieving a target relationship in the user question;
retrieving a target adjacent node that has a preset connection direction with the target entity;
retrieving a target adjacent node type of the target entity;
retrieving a relationship connecting a first target entity and a second target entity;
retrieving a common adjacent node connecting a third target entity and a fourth target entity.
Optionally, the retrieving module retrieves index data of a plurality of SPO triples including the target entity from an index table of a pre-constructed knowledge graph in the following manner:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
searching the data storage table, within the index position interval corresponding to the target node ID, for the adjacent node ID at each index position and the ID of the relation between the target node and that adjacent node;
and obtaining the index data of the SPO triples of the target entity from the adjacent node IDs of the target node and the IDs of the relations between the target node and its adjacent nodes.
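The four retrieval steps above can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the tables are plain Python lists and dicts rather than the serialized binary tables stored on disk, the example values follow the worked example of Zhang San, Li Si, Wang Wu and Zhao Liu given later in this description (with each node's pairs sorted by adjacent node ID), the sign convention for reverse triples is the one described later (a negative relation ID means the indexed node is the O tuple), and the names `NODE_TO_ID`, `T_I`, `T_SN`, `index_interval` and `retrieve_spo_index` are hypothetical.

```python
# Hypothetical in-memory index tables for the worked example (node IDs 0..3).
NODE_TO_ID = {"Zhang San": 0, "Li Si": 1, "Wang Wu": 2, "Zhao Liu": 3}

# Node index table T_i: d_i is the end index position of node i's interval.
T_I = [4, 7, 10, 13]

# First data storage table: (adjacent node ID, relation ID) pairs, sorted by
# adjacent node ID within each node's interval. A negative relation ID marks
# a triple in which the indexed node is the O tuple.
T_SN = [
    (3, 1), (5, 11), (12, 96), (500, -8), (1033, 1009),  # node 0
    (7, 8), (9, 4), (130, -76),                          # node 1
    (23, 8), (56, 4), (99, -6),                          # node 2
    (0, -1), (300, -48), (688, 80),                      # node 3
]


def index_interval(node_id: int) -> range:
    """Index positions (d_{i-1}, d_i] of a node, with d_{-1} = -1."""
    start = T_I[node_id - 1] + 1 if node_id > 0 else 0
    return range(start, T_I[node_id] + 1)


def retrieve_spo_index(entity: str) -> list[tuple[int, int, int]]:
    """Return (S, P, O) ID triples for every edge incident to `entity`."""
    target = NODE_TO_ID[entity]            # node -> node ID
    triples = []
    for pos in index_interval(target):     # O(1) lookup, then O(H) scan
        neighbor, rel = T_SN[pos]
        if rel > 0:                        # forward: target is the S tuple
            triples.append((target, rel, neighbor))
        else:                              # reverse: target is the O tuple
            triples.append((neighbor, -rel, target))
    return triples


print(retrieve_spo_index("Zhao Liu"))  # [(0, 1, 3), (300, 48, 3), (3, 80, 688)]
```

Only the node's own contiguous interval is touched, which is what keeps the lookup cost proportional to the node's degree.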
Optionally, the data storage table comprises a first data storage table and a second data storage table;
the first data storage table is obtained by sorting, within the index position interval corresponding to each node ID, the data pairs in ascending order of adjacent node ID, and then assigning the data pairs one-to-one to the index positions in order;
and the second data storage table is obtained by sorting, within the index position interval corresponding to each node ID, the data pairs in ascending order of relation ID, and then assigning the data pairs one-to-one to the index positions in order.
Optionally, if the retrieval data further includes a target relationship in the user question,
the retrieving module retrieves index data of a plurality of SPO triples including the target entity from an index table of a pre-constructed knowledge graph in the following manner:
acquiring a target relation ID of the target relation from the relation-to-relation-ID mapping table;
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
determining, from the second data storage table and within the index position interval corresponding to the target node ID, a first target index position interval in which the target relation ID is located;
for each first target index position, retrieving the adjacent node ID at that index position together with the target relation ID, which represents the relation between the target node and that adjacent node;
and obtaining the index data of the SPO triple of the target entity according to the adjacent node ID of the target node and the relation ID of the target node and the adjacent node.
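Because the second data storage table keeps each node's pairs sorted by relation ID, the first target index position interval can be located by binary search rather than by scanning the whole interval. The following is a minimal sketch, assuming the same hypothetical in-memory tables as the worked example in this description and assuming the target relation ID has already been obtained from the relation-to-relation-ID mapping table; `T_SR`, `SR_REL` and `retrieve_by_relation` are hypothetical names. Passing the negated relation ID would select triples in which the target entity is the O tuple.

```python
from bisect import bisect_left, bisect_right

# Hypothetical node index table: end index position of each node (IDs 0..3).
T_I = [4, 7, 10, 13]

# Second data storage table: (adjacent node ID, relation ID) pairs sorted by
# relation ID within each node's index position interval.
T_SR = [
    (500, -8), (3, 1), (5, 11), (12, 96), (1033, 1009),  # node 0
    (130, -76), (9, 4), (7, 8),                          # node 1
    (99, -6), (56, 4), (23, 8),                          # node 2
    (300, -48), (0, -1), (688, 80),                      # node 3
]
# Parallel array of relation IDs so bisect can search table positions directly.
SR_REL = [rel for _, rel in T_SR]


def retrieve_by_relation(node_id: int, rel_id: int) -> list[tuple[int, int]]:
    """Return the pairs of `node_id` whose relation ID equals `rel_id`."""
    start = T_I[node_id - 1] + 1 if node_id > 0 else 0
    end = T_I[node_id] + 1  # exclusive end of the node's interval
    # Binary search narrows the interval to the run holding rel_id.
    lo = bisect_left(SR_REL, rel_id, start, end)
    hi = bisect_right(SR_REL, rel_id, lo, end)
    return T_SR[lo:hi]


print(retrieve_by_relation(1, 4))  # [(9, 4)]
```

The search costs O(log H) per node instead of O(H), at the price of keeping the relation-sorted copy of the data.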
Optionally, if the retrieval data further includes retrieving a target adjacent node that has a preset connection direction with the target entity, the preset connection direction is either a forward connection direction, in which the target entity is the S tuple, or a reverse connection direction, in which the target entity is the O tuple;
the second data storage table stores, together with each relation ID, either a first connection symbol indicating that the relation ID has the forward connection direction or a second connection symbol indicating that it has the reverse connection direction;
the retrieval module retrieves index data comprising a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge graph in the following manner:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
if the preset connection direction is the forward connection direction, searching the second data storage table, within the index position interval corresponding to the target node ID and in descending order of relation ID, for the adjacent node ID at each index position and the ID of the relation between the target node and that adjacent node; or,
if the preset connection direction is the reverse connection direction, searching the second data storage table, within the index position interval corresponding to the target node ID and in ascending order of relation ID, for the adjacent node ID at each index position and the ID of the relation between the target node and that adjacent node;
and obtaining the index data of the SPO triple of the target entity according to the adjacent node ID of the target node and the relation ID of the target node and the adjacent node.
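The direction-dependent scan can be sketched as below. This is a minimal sketch under one stated assumption: the connection symbol is modeled as the sign of the relation ID (positive for forward, negative for reverse), as in the worked example later in this description, and relation IDs start from 1 so no ID is zero. Since relation IDs are sorted in ascending order, forward entries sit at the end of each interval and reverse entries at the start. The table values and the name `retrieve_by_direction` are hypothetical.

```python
# Hypothetical node index table and second data storage table (sorted by
# relation ID within each node's interval); node IDs 0..3.
T_I = [4, 7, 10, 13]
T_SR = [
    (500, -8), (3, 1), (5, 11), (12, 96), (1033, 1009),  # node 0
    (130, -76), (9, 4), (7, 8),                          # node 1
    (99, -6), (56, 4), (23, 8),                          # node 2
    (300, -48), (0, -1), (688, 80),                      # node 3
]


def retrieve_by_direction(node_id: int, forward: bool) -> list[tuple[int, int]]:
    """Return the forward (S-side) or reverse (O-side) pairs of `node_id`."""
    start = T_I[node_id - 1] + 1 if node_id > 0 else 0
    end = T_I[node_id] + 1  # exclusive
    result = []
    if forward:
        # Positive IDs sort last: scan relation IDs from large to small and
        # stop at the first reverse (negative) entry.
        for pos in range(end - 1, start - 1, -1):
            if T_SR[pos][1] < 0:
                break
            result.append(T_SR[pos])
    else:
        # Negative IDs sort first: scan relation IDs from small to large and
        # stop at the first forward (positive) entry.
        for pos in range(start, end):
            if T_SR[pos][1] > 0:
                break
            result.append(T_SR[pos])
    return result


print(retrieve_by_direction(3, forward=False))  # [(300, -48), (0, -1)]
```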
Optionally, the index table of the knowledge graph further includes a node type index table, which includes, for each node in the knowledge graph, an entry for the subtree rooted at that node, the entry comprising the node ID, the type ID of the starting child node, the type ID of the ending node of the subtree, and the node type ID range from the starting child node type ID to the ending node type ID;
if the retrieval data further includes a target adjacent node type of the target entity, the retrieval module retrieves index data including a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge graph in the following manner:
acquiring a target node ID of the target entity from the node-to-node-ID mapping table;
acquiring the node type ID range from the node type index table;
acquiring, from the node index table, the index position interval corresponding to the target node ID;
determining, from the first data storage table and within the index position interval corresponding to the target node ID, a second target index position interval that satisfies the node type ID range;
for each second target index position in the second target index position interval, retrieving from the data storage table the target adjacent node ID at that index position and the ID of the relation between the target node and that adjacent node;
and obtaining the index data of the SPO triple of the target entity according to the adjacent node ID of the target node and the relation ID of the target node and the adjacent node.
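Because the first data storage table is sorted by adjacent node ID, a type filter can be applied by binary search, provided that the type range translates into a contiguous range of node IDs. That is an assumption of this sketch: the description does not spell out here how node IDs relate to node type IDs, so the sketch simply takes an inclusive node ID range `[id_lo, id_hi]` as given. The tables, the range values in the usage, and the names `SN_NODE` and `retrieve_by_type_range` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical node index table and first data storage table (sorted by
# adjacent node ID within each node's interval); node IDs 0..3.
T_I = [4, 7, 10, 13]
T_SN = [
    (3, 1), (5, 11), (12, 96), (500, -8), (1033, 1009),  # node 0
    (7, 8), (9, 4), (130, -76),                          # node 1
    (23, 8), (56, 4), (99, -6),                          # node 2
    (0, -1), (300, -48), (688, 80),                      # node 3
]
SN_NODE = [node for node, _ in T_SN]  # parallel array of adjacent node IDs


def retrieve_by_type_range(node_id: int, id_lo: int, id_hi: int) -> list[tuple[int, int]]:
    """Return pairs of `node_id` whose adjacent node ID lies in [id_lo, id_hi]."""
    start = T_I[node_id - 1] + 1 if node_id > 0 else 0
    end = T_I[node_id] + 1  # exclusive
    lo = bisect_left(SN_NODE, id_lo, start, end)
    hi = bisect_right(SN_NODE, id_hi, lo, end)
    return T_SN[lo:hi]


# Assume, hypothetically, that nodes of the wanted type have IDs 0..100.
print(retrieve_by_type_range(0, 0, 100))  # [(3, 1), (5, 11), (12, 96)]
```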
Optionally, if the retrieval data includes retrieving the relationship connecting a first target entity and a second target entity among the target entities,
the retrieval module retrieves index data comprising a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge graph in the following manner:
acquiring, from the node-to-node-ID mapping table, a first target node ID of the first target entity and a second target node ID of the second target entity;
acquiring, from the node index table, a first index position interval corresponding to the first target node ID and a second index position interval corresponding to the second target node ID;
acquiring the number of positions in the first index position interval (the first position number) and the number of positions in the second index position interval (the second position number); if the first position number is smaller than the second position number, searching the first data storage table, for each index position in the first index position interval, for the first target adjacent node ID at that index position;
determining, among the first target adjacent node IDs, the one equal to the second target node ID, thereby obtaining the relation ID between the first target node ID and the second target node ID;
and obtaining the index data of the SPO triple comprising the first target node ID, the second target node ID and the relation ID between them.
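The "scan the shorter interval" strategy above can be sketched as follows. This is a minimal sketch, assuming the hypothetical in-memory tables of the worked example and the sign convention described later (a negative relation ID means the indexed node is the O tuple). When the second node's interval is scanned instead, the relation sign is flipped so the result is always expressed from the first node's point of view; the names `index_interval` and `relation_between` are hypothetical.

```python
# Hypothetical node index table and first data storage table; node IDs 0..3.
T_I = [4, 7, 10, 13]
T_SN = [
    (3, 1), (5, 11), (12, 96), (500, -8), (1033, 1009),  # node 0
    (7, 8), (9, 4), (130, -76),                          # node 1
    (23, 8), (56, 4), (99, -6),                          # node 2
    (0, -1), (300, -48), (688, 80),                      # node 3
]


def index_interval(node_id: int) -> range:
    """Index positions (d_{i-1}, d_i] of a node, with d_{-1} = -1."""
    start = T_I[node_id - 1] + 1 if node_id > 0 else 0
    return range(start, T_I[node_id] + 1)


def relation_between(first_id: int, second_id: int) -> list[int]:
    """Relation IDs linking the two nodes, signed from first_id's point of view."""
    first, second = index_interval(first_id), index_interval(second_id)
    # Scan whichever node has fewer adjacent entries.
    if len(first) <= len(second):
        scan_id, other_id, flip = first_id, second_id, False
    else:
        scan_id, other_id, flip = second_id, first_id, True
    rels = []
    for pos in index_interval(scan_id):
        neighbor, rel = T_SN[pos]
        if neighbor == other_id:
            rels.append(-rel if flip else rel)
    return rels


print(relation_between(0, 3))  # [1], i.e. the triple 0-1-3
```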
Optionally, if the retrieval data includes retrieving a common adjacent node of a third target entity and a fourth target entity among the target entities,
the retrieval module retrieves index data comprising a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge graph in the following manner:
acquiring, from the node-to-node-ID mapping table, a third target node ID of the third target entity and a fourth target node ID of the fourth target entity;
acquiring, from the node index table, a third index position interval corresponding to the third target node ID and a fourth index position interval corresponding to the fourth target node ID;
for each third index position in the third index position interval and each fourth index position in the fourth index position interval, retrieving from the first data storage table the common adjacent node IDs, that is, the adjacent node IDs shared by the third target node and the fourth target node;
determining the relation ID between the third target node ID and each common adjacent node ID, and the relation ID between the fourth target node ID and each common adjacent node ID;
obtaining the index data of the SPO triples comprising the third target node ID, the common adjacent node ID and the relation ID between them, and the index data of the SPO triples comprising the fourth target node ID, the common adjacent node ID and the relation ID between them.
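Because both intervals in the first data storage table are sorted by adjacent node ID, the common adjacent nodes can be found with a single two-pointer merge instead of comparing every pair of positions. The sketch below uses a hypothetical toy graph in the spirit of the "works in which actor A and actor B both appeared" query: actor A (node 0) and actor B (node 1) both have relation 1 ("starred in", hypothetical) to work X (node 2), and actor A also has it to work Y (node 3). It assumes at most one edge between any two nodes; `common_neighbors` is a hypothetical name.

```python
# Hypothetical toy graph: nodes 0 (actor A), 1 (actor B), 2 (work X), 3 (work Y).
T_I = [1, 2, 4, 5]  # end index position of each node's interval
T_SN = [
    (2, 1), (3, 1),      # node 0: actor A
    (2, 1),              # node 1: actor B
    (0, -1), (1, -1),    # node 2: work X
    (0, -1),             # node 3: work Y
]


def index_interval(node_id: int) -> range:
    """Index positions (d_{i-1}, d_i] of a node, with d_{-1} = -1."""
    start = T_I[node_id - 1] + 1 if node_id > 0 else 0
    return range(start, T_I[node_id] + 1)


def common_neighbors(third_id: int, fourth_id: int) -> list[tuple[int, int, int]]:
    """Return (common node ID, relation from third, relation from fourth)."""
    a, b = index_interval(third_id), index_interval(fourth_id)
    i, j = a.start, b.start
    result = []
    while i < a.stop and j < b.stop:
        node_a, rel_a = T_SN[i]
        node_b, rel_b = T_SN[j]
        if node_a == node_b:          # shared adjacent node found
            result.append((node_a, rel_a, rel_b))
            i += 1
            j += 1
        elif node_a < node_b:         # advance the pointer with the smaller ID
            i += 1
        else:
            j += 1
    return result


print(common_neighbors(0, 1))  # [(2, 1, 1)]: both actors starred in work X
```

The merge runs in O(H3 + H4) time for nodes with H3 and H4 adjacent entries.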
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the knowledge-graph based question-answering method provided by the first aspect of the present disclosure.
The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects: an index table of the knowledge graph is constructed in advance, comprising a node-to-node-ID mapping table, a relation-to-relation-ID mapping table, a node index table and a data storage table. Once the user question and the retrieval data are obtained, the index data of the SPO triples of the target entity is retrieved from this index table, so that the adjacent nodes of the target entity in the knowledge graph and the relations between the target entity and those adjacent nodes can be located quickly, the SPO triples of the target entity can be constructed quickly, and the user question can be answered quickly.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a method of knowledge-graph based question answering in accordance with an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of knowledge-graph based question answering in accordance with an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a method of knowledge-graph based question answering in accordance with an exemplary embodiment.
FIG. 4 is a flow diagram illustrating a method of knowledge-graph based question answering in accordance with an exemplary embodiment.
FIG. 5 is a flow diagram illustrating a method of knowledge-graph based question answering in accordance with an exemplary embodiment.
FIG. 6 is a flow diagram illustrating a method of knowledge-graph based question answering in accordance with an exemplary embodiment.
FIG. 7 is a block diagram illustrating a knowledge-graph based question answering apparatus in accordance with an exemplary embodiment.
FIG. 8 is a block diagram illustrating an apparatus for knowledge-graph based question answering in accordance with an exemplary embodiment.
FIG. 9 is an exemplary diagram illustrating a node index table in accordance with an illustrative embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The present disclosure is applicable to knowledge-graph-based question-answering scenarios in which knowledge-graph data is stored in some form in a database and subgraphs of particular patterns are retrieved and constructed through the database's retrieval functions. The databases commonly used for this fall into two categories: traditional relational databases such as MySQL and SQLite, and dedicated graph databases, such as Neo4j and Nebula, that are specifically optimized for storing and retrieving graph-structured data. Relational-database-based retrieval queries the database for triples that meet certain conditions and pieces the corresponding subgraphs together from those triples. This approach is often used in question-answering competitions, where the candidate subgraph types only need to cover as many question types as possible and the time spent constructing subgraphs matters little. Graph-database-based retrieval, by contrast, supports certain types of path retrieval and can retrieve subgraphs directly. Taking the query "works in which actor A and actor B both appeared" as an example, after the key entities <actor A> and <actor B> are extracted, the one-hop subgraph of the two entities that satisfies the condition can be retrieved on Neo4j with a Cypher statement.
Compared with traditional relational databases, graph databases are optimized to some extent for retrieving graph-structured data and paths, and they have therefore been adopted in some industrial knowledge-graph applications.
At present, knowledge-graph question answering is mainly implemented with information-extraction methods. In such methods, retrieving SPO triples, i.e., subgraphs, for a query about a single entity is fast and takes little time.
In the question-answering scenario, the feasibility of building an online real-time question-answering system on graph-database retrieval has been studied. Graph databases are strong at subgraph retrieval and respond quickly to simple subgraph queries. For subgraph retrieval involving multiple entities, however, graph-database latency, typically several hundred milliseconds or more, still falls short of what online real-time requests require. In real question-answering scenarios, users often ask complex questions involving several entities, and retrieving such multi-entity subgraphs from a graph database is time-consuming, which makes online performance requirements hard to meet.
The basic idea of a graph database is to abstract graph data to a certain degree and support a variety of operations such as adding, deleting, modifying and retrieving data. To meet the needs of many different business scenarios, its functions are generalized rather than specifically optimized for the characteristics of question-answering scenarios.
In practical applications, the knowledge-graph data used in a question-answering scenario is essentially fixed, so there is little need to delete or modify it, and for a mature knowledge graph the incremental data is negligible. The question-answering function only needs to retrieve subgraphs; no other modification of the graph data is required. What is needed, therefore, is a simplified "graph database": an index over the knowledge-graph data that focuses on the retrieval requirements and trades thorough offline computation and well-designed indexing for reduced online retrieval time. The technical solution of the present disclosure optimizes the indexing and storage of the graph so as to reduce the retrieval time needed to construct subgraphs and thus meet the requirements of online real-time requests.
In view of the above, the knowledge-graph-based question-answering method of the present disclosure optimizes the retrieval of knowledge-graph data by constructing an index over it. That is, an index table of the knowledge graph is constructed in advance, comprising a node-to-node-ID mapping table, a relation-to-relation-ID mapping table, a node index table and a data storage table. After the user question and the retrieval data are obtained, the index data of the SPO triples of the target entity is retrieved from the index table of the knowledge graph, so that the adjacent nodes of the target entity in the knowledge graph and the relations between the target entity and those adjacent nodes can be located quickly, the SPO triples of the target entity can be constructed quickly, and the user question can be answered quickly.
Fig. 1 is a flowchart of a knowledge-graph-based question-answering method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps.
In step S11, a user question is obtained, and retrieval data is obtained, the retrieval data including a target entity in the user question.
In one embodiment, the retrieval data may be data in which the target entities have been tagged after entity mining on the user question, for example by entity linking or other techniques.
In step S12, index data of a plurality of SPO triples including the target entity is retrieved from an index table of a pre-constructed knowledge graph, and the plurality of SPO triples are obtained from the knowledge graph according to the index data.
In step S13, the matching degree of each SPO triplet with the user question is determined, and the SPO triplet with the highest matching degree with the user question is output as the target question-answer data.
For each SPO triple, the S tuple and the O tuple are adjacent nodes in the knowledge graph, and the P tuple is the edge representing the relationship between the S tuple and the O tuple in the knowledge graph.
In one embodiment, the index data of the plurality of SPO triples of the target entity may be retrieved from an index table of a pre-constructed knowledge graph, for example, as follows.
In a real-time knowledge-graph question-answering scenario, to serve users' online real-time requests by quickly retrieving the SPO triples of the target entity for each query, a knowledge-graph index can be built in advance comprising: a node-to-node-ID mapping table, a relation-to-relation-ID mapping table, a node index table, and a data storage table.
The node index table includes, for each node in the knowledge graph, the node ID of the node and the index position interval corresponding to that node ID; the data storage table includes, for the index position interval corresponding to each node ID, a data pair at each index position of the node, the data pair comprising the adjacent node ID of the node and the ID of the relation between the node and that adjacent node.
Based on this pre-constructed index table, the adjacent nodes of the target entity in the knowledge graph and the relations between the target entity and those adjacent nodes can be located quickly; the SPO triples of the target entity can then be constructed quickly, so that the user question can be answered quickly.
For example, an index table of the knowledge graph can be constructed in advance in the following manner:
For each node in the knowledge graph (both S tuples and O tuples), a mapping between the node and a node identification ID is established, yielding the node-to-node-ID mapping table $T_n$; the node IDs in $T_n$ are, for example, integers starting from 0:
(node) Zhang San -> 0 (node ID)
(node) Li Si -> 1 (node ID)
(node) Wang Wu -> 2 (node ID)
(node) Zhao Liu -> 3 (node ID)
………
For each relation edge connecting nodes in the knowledge graph, a mapping between the relation and a relation ID is established, yielding the relation-to-relation-ID mapping table $T_r$; the relation IDs in $T_r$ are, for example, integers starting from 1:
(relation) wife -> 1 (relation ID)
(relation) partner -> 2 (relation ID)
………
The node index table $T_i$ may be represented, for example, as shown in FIG. 9.
In the example of the node index table $T_i$ shown in FIG. 9, 0 is the node ID of Zhang San, and 661 is the end index position of the index position interval (1, 661) corresponding to Zhang San's node ID 0. Because Zhang San's node ID is the first in the node-to-node-ID mapping, only its end index position 661 needs to be recorded.
Accordingly, the index position interval corresponding to Li Si's node ID (1) is (662, 1023), and the index position interval corresponding to Wang Wu's node ID (2) is (1024, 1323).
Mapping identifiers to integers has two advantages: each integer occupies the same amount of space in memory and on disk, which simplifies storage; and integers can be compared and a list of integers can be sorted, so a required integer, and hence the entity, relation or attribute value it stands for, can be found quickly.
To establish the node-to-node-ID and relation-to-relation-ID mappings, the number of occurrences of each node and each relation in the SPO triples of the knowledge graph is first counted. Each node and its degree (i.e., the number of nodes associated with it through SPO triples of the knowledge graph) is recorded as $(node_i, count_i)$; each relation, i.e., each kind of relation edge in the knowledge graph, and the number of times it appears in the graph is recorded as $(edge_m, count_m)$.
After the relation-count pairs $(edge_m, count_m)$ are obtained, the relations can be assigned IDs: the 2-tuples $(edge_m, count_m)$ of all relations are sorted by $count_m$, in either ascending or descending order, and each relation's integer ID is its position in the sorted order, starting from the natural number 1. For example, the relation 2-tuple list {(nationality, 100), (occupation, 68), (gender, 235), (age, 98), (date of birth, 197), (place of birth, 155)} sorted in ascending order of occurrences gives {(occupation, 68), (age, 98), (nationality, 100), (place of birth, 155), (date of birth, 197), (gender, 235)}. Alternatively, the relation 2-tuple list {(father, 6), (partner, 13), (wife, 25)} sorted in descending order of occurrences gives {(wife, 25), (partner, 13), (father, 6)}, and the corresponding IDs are: wife-1, partner-2, father-3.
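The frequency-based ID assignment described above can be sketched as follows. This is a minimal sketch: `count_relations` and `assign_relation_ids` are hypothetical helpers, ties keep their original order (Python's sort is stable), and the usage reproduces the descending example from the text.

```python
from collections import Counter


def count_relations(triples):
    """Count how often each relation P occurs in (S, P, O) triples."""
    return list(Counter(p for _, p, _ in triples).items())


def assign_relation_ids(counts, descending=True):
    """Rank (relation, count) pairs by count and number them from 1."""
    ranked = sorted(counts, key=lambda item: item[1], reverse=descending)
    return {rel: rank for rank, (rel, _) in enumerate(ranked, start=1)}


print(assign_relation_ids([("father", 6), ("partner", 13), ("wife", 25)]))
# {'wife': 1, 'partner': 2, 'father': 3}
```

Numbering from 1 rather than 0 also matters for the reverse-storage convention, where a relation ID is negated: every ID then has a distinct negative.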
Once the nodes and their degrees $(node_i, count_i)$ are obtained, the knowledge graph is still stored and retrieved on the basis of <head entity S>-<relation P>-<tail entity/attribute value O> triples, except that entities, attribute values and relations are all represented by IDs. The storage of knowledge-graph data is explained step by step below, taking the storage of a single node in the knowledge graph as an example.
Assume that node $a$ has the following triples: $a\text{-}p_0\text{-}b_0, \ldots, a\text{-}p_i\text{-}b_i, b_{i+1}\text{-}p_{i+1}\text{-}a, \ldots, b_{i+j}\text{-}p_{i+j}\text{-}a$. They can be represented as the list $\{(b_0, p_0), \ldots, (b_i, p_i), (b_{i+1}, -p_{i+1}), \ldots, (b_{i+j}, -p_{i+j})\}$, where a negative sign in front of a relation ID indicates that the triple is stored in reverse, i.e., node $a$ is its tail. Viewed another way, this representation is centered on a node: it stores the node's adjacent nodes and the corresponding edge relations.
To improve retrieval efficiency, the list can be converted into sorted lists. Two sorted lists are involved: one sorted in ascending order of the edge relation ID $p_i$, and the other sorted in ascending order of the adjacent node ID $b_i$.
In the question-answering scenario, both edge-based and node-based indexing are required, so both lists are kept; they are denoted $E_a$ and $N_a$ respectively. Although they are sorted differently, they contain the same triples, so $E_a$ and $N_a$ have the same length. Take the entity <Zhang San> in FIG. 9 as an example (for ease of description, Zhang San is treated as belonging to the type "actor" and not to the type "singer"). Its ID is 0, and assume that it has the triples 0-1-3, 0-11-5, 0-96-12, 500-8-0 and 0-1009-1033. The list $E_0$ indexed by edge and the list $N_0$ indexed by node are then:
$E_0 = \{(500, -8), (3, 1), (5, 11), (12, 96), (1033, 1009)\}$
$N_0 = \{(3, 1), (5, 11), (12, 96), (500, -8), (1033, 1009)\}$
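Building the two per-node lists from raw ID triples can be sketched as below. This is a minimal sketch with the hypothetical helper `adjacency_lists`: a triple in which the node is the tail is stored with a negated relation ID, and the two sorts produce the edge-indexed list $E_a$ and the node-indexed list $N_a$ (a secondary sort key is added only to make ties deterministic). The usage reproduces $E_0$ and $N_0$ for Zhang San.

```python
def adjacency_lists(node, triples):
    """Build (E_a, N_a) for `node` from (S, P, O) ID triples."""
    pairs = []
    for s, p, o in triples:
        if s == node:
            pairs.append((o, p))       # forward: node is the S tuple
        elif o == node:
            pairs.append((s, -p))      # reverse: node is the O tuple
    e_list = sorted(pairs, key=lambda pair: (pair[1], pair[0]))  # by relation ID
    n_list = sorted(pairs, key=lambda pair: (pair[0], pair[1]))  # by adjacent node ID
    return e_list, n_list


triples = [(0, 1, 3), (0, 11, 5), (0, 96, 12), (500, 8, 0), (0, 1009, 1033)]
E0, N0 = adjacency_lists(0, triples)
print(E0)  # [(500, -8), (3, 1), (5, 11), (12, 96), (1033, 1009)]
print(N0)  # [(3, 1), (5, 11), (12, 96), (500, -8), (1033, 1009)]
```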
Assume that the graph has $\Omega + \Psi$ nodes (assuming that each entity has only one entity type), where $\Omega$ is the number of entities and $\Psi$ the number of attribute values in the graph; node IDs start at 0 and end at $\Omega + \Psi - 1$. Depending on whether the entries are sorted by relation or by node, the data storage table can be expressed as $T_{sr} = E_0 : E_1 : \cdots : E_{\Omega+\Psi-1}$ or $T_{sn} = N_0 : N_1 : \cdots : N_{\Omega+\Psi-1}$,
where ":" denotes concatenation; that is, the lists of all nodes are joined into one large list.
The data storage tables $T_{sr}$ and $T_{sn}$ join the adjacency lists $E_a$ and $N_a$ of the different nodes end to end. The advantage is that the entire graph data can be represented uniformly by a single list (or, for edge-based and node-based indexing respectively, by the two lists $T_{sr}$ and $T_{sn}$).
This makes it easy to serialize the entire graph into binary data and store it directly on disk. However, to read the adjacency list of a given node, such as node $i$, the start and end positions of its adjacent nodes in the data storage tables $T_{sr}$ and $T_{sn}$ must be known, which requires constructing the node index table $T_i$.
The node index table is a list of length $\Omega + \Psi$, expressed as $\{d_0, d_1, \ldots, d_{\Omega+\Psi-1}\}$. The element $d_i$ at position $i$ is the end position of node $i$'s adjacent-node data in the data storage tables $T_{sr}$ and $T_{sn}$; $d_{i-1}$ is the end position for node $i-1$, so the next position, $d_{i-1} + 1$, is the start position of node $i$'s adjacent-node data in $T_{sr}$ and $T_{sn}$.
In fact, $d_i$ equals the length of the sub-list $E_0 : E_1 : \cdots : E_i$ of $T_{sr}$ minus 1 (minus 1 because positions are numbered from 0, not from 1). Likewise, since $E_a$ and $N_a$ have the same length, $d_i$ also equals the length of the sub-list $N_0 : N_1 : \cdots : N_i$ of $T_{sn}$ minus 1.
Thus, the interval in which the adjacent-node data of node $i$ is stored can be written as $(d_{i-1}, d_i]$. For node $i = 0$, the index $d_{-1}$ does not exist; since the whole list starts at index position 0, the interval for $i = 0$ is $(-1, d_0]$, i.e., $d_{-1} = -1$ by definition.
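The concatenation and the rule $d_i = |L_0 : \cdots : L_i| - 1$ amount to a running sum of list lengths, which can be sketched as follows. This is a minimal sketch with hypothetical helper names `build_tables` and `interval`; the per-node lists in the usage are the relation-sorted lists ($E_a$) of the four example entities, derived from their triples by the sorting rule defined above.

```python
from itertools import accumulate


def build_tables(per_node_lists):
    """Concatenate per-node lists (the ":" operator) and compute T_i."""
    table = [pair for pairs in per_node_lists for pair in pairs]
    # d_i = length of L_0 : L_1 : ... : L_i, minus 1 (positions start at 0).
    node_index = [total - 1 for total in accumulate(len(pairs) for pairs in per_node_lists)]
    return table, node_index


def interval(node_index, i):
    """Positions (d_{i-1}, d_i] of node i, with d_{-1} = -1."""
    prev_end = node_index[i - 1] if i > 0 else -1
    return range(prev_end + 1, node_index[i] + 1)


E = [
    [(500, -8), (3, 1), (5, 11), (12, 96), (1033, 1009)],  # node 0
    [(130, -76), (9, 4), (7, 8)],                          # node 1
    [(99, -6), (56, 4), (23, 8)],                          # node 2
    [(300, -48), (0, -1), (688, 80)],                      # node 3
]
T_sr, T_i = build_tables(E)
print(T_i)                                   # [4, 7, 10, 13]
print([T_sr[p] for p in interval(T_i, 3)])   # [(300, -48), (0, -1), (688, 80)]
```

A node with no adjacent entries gets $d_i = d_{i-1}$ and therefore an empty interval, so no special case is needed.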
Taking FIG. 9 as an example, assume that the entities Li Si (ID 1), Wang Wu (ID 2) and Zhao Liu (ID 3) have the following adjacent-node lists (again, only a few triples are used for ease of discussion):
$E_1 = \{(130, -76), (9, 4), (7, 8)\}$, $N_1 = \{(7, 8), (9, 4), (130, -76)\}$
$E_2 = \{(99, -6), (56, 4), (23, 8)\}$, $N_2 = \{(23, 8), (56, 4), (99, -6)\}$
$E_3 = \{(300, -48), (0, -1), (688, 80)\}$, $N_3 = \{(0, -1), (300, -48), (688, 80)\}$
The data storage tables and the node index table formed from the entities Zhang San (ID 0), Li Si (ID 1), Wang Wu (ID 2) and Zhao Liu (ID 3) are:
$T_{sr} = \{(500, -8), (3, 1), (5, 11), (12, 96), (1033, 1009), (130, -76), (9, 4), (7, 8), (99, -6), (56, 4), (23, 8), (300, -48), (0, -1), (688, 80)\}$
$T_{sn} = \{(3, 1), (5, 11), (12, 96), (500, -8), (1033, 1009), (7, 8), (9, 4), (130, -76), (23, 8), (56, 4), (99, -6), (0, -1), (300, -48), (688, 80)\}$
The four entities occupy the index position intervals [0, 4], [5, 7], [8, 10] and [11, 13]; omitting the start index position of each interval gives $T_i = \{4, 7, 10, 13\}$.
The node index table and the data storage table are serialized directly into binary data and stored on disk. The space occupied by the core data is calculated as follows. A single ID is stored as a 4-byte integer (supporting up to $2^{31}$ nodes), and each entry stores only an edge relation and an adjacent node, occupying 8 bytes. Each triple in the graph is stored twice, once centered on its head node and once centered on its tail node. If the total number of triples is $\Delta$, the data storage table therefore occupies $16\Delta$ bytes. The size of the node index table equals the number of nodes; at 4 bytes per integer ID, it occupies $4(\Omega + \Psi)$ bytes.
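The storage arithmetic above can be expressed as a small helper. This is a minimal sketch under stated assumptions: 4-byte IDs, 8 bytes per (adjacent node, relation) entry, each triple stored at both its head and its tail, and, by default, both data storage tables $T_{sr}$ and $T_{sn}$ kept on disk (`num_tables=2`). The helper name and the triple and node counts in the usage are hypothetical.

```python
def storage_bytes(num_triples, num_nodes, id_bytes=4, num_tables=2):
    """Estimate disk usage of the data storage tables plus the node index table."""
    entry_bytes = 2 * id_bytes                  # (adjacent node ID, relation ID)
    per_table = 2 * num_triples * entry_bytes   # each triple stored at head and tail
    node_index = num_nodes * id_bytes           # one end position per node
    return num_tables * per_table + node_index


# Hypothetical graph with 1,000,000 triples and 100,000 nodes.
print(storage_bytes(1_000_000, 100_000))  # 32400000
```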
TABLE 1 statistics of data occupancy
Thus, for example, when the index data of the SPO triples of the target entity is retrieved from the index table of the pre-constructed knowledge graph, all adjacent nodes of the target entity are retrieved. Looking up node $i$ in the node index table $T_i$ to obtain the start and end index positions $(d_{i-1}, d_i]$ of its data in the data storage tables $T_{sn}$ and $T_{sr}$ takes $O(1)$ time. The time needed to iterate over all adjacent nodes in that index position interval depends on the number of adjacent nodes; denoting this number by $H$, it takes $O(H)$ time. Because the adjacent nodes are stored contiguously, the scan is fast. Specifically, the target node ID of the target entity is first obtained from the node-to-node-ID mapping table.
Then, according to the target node ID, the index position interval $(d_{i-1}, d_i]$ corresponding to the target node ID is obtained from the node index table; within that interval, the adjacent node ID at each index position and the ID of the relation between the target node and that adjacent node are retrieved from the data storage table, and the index data of the SPO triples of the target entity is obtained from the adjacent node IDs of the target node and the IDs of the relations between the target node and its adjacent nodes.
After the index data of the SPO triples of the target entity is obtained, the SPO triples can be fetched from the knowledge graph according to that index data, and the query is matched against them to determine the matching degree of each SPO triple with the user question. The SPO triple with the highest matching degree is then output as the target question-answer data.
In an exemplary embodiment of the present disclosure, an index table of the knowledge graph is constructed in advance, comprising a mapping table of nodes to node IDs, a mapping table of relationships to relationship IDs, a node index table, and a data storage table. After the user question and the retrieval data are acquired, the index data of the SPO triples of the target entity is retrieved from this index table, so that the adjacent nodes of the target entity in the knowledge graph, and the relationships between the target entity and those nodes, can be located quickly. The SPO triples of the target entity can thus be constructed quickly and matched against the query to obtain the target question-answer data, so that the user question is answered quickly.
In this disclosure, when SPO triples are retrieved from the knowledge graph according to the retrieval data, the retrieval data includes the target entity in the user question and may further include one or a combination of the following:
Retrieving the target relationship in the user question, i.e., retrieving the adjacent nodes connected to the target node by the target relationship named in the user question. For example, for the query "Who is the [partner] of <Zhang San>?", the adjacent nodes of the target node <Zhang San> that satisfy the target relationship [partner] are retrieved.
Retrieving a target adjacent node having a preset connection direction with the target entity, where the preset connection direction is either the forward connection direction, in which the target entity is the S tuple, or the reverse connection direction, in which the target entity is the O tuple. That is, the adjacent nodes of the target entity in the specified direction are retrieved, namely the nodes associated through triples in which the current node is the head entity (or the tail entity). For example, a triple in the "out" direction is <Zhang San> - <wife> - <answer>, and a triple in the "in" direction is <answer> - <partner> - <Zhang San>.
Retrieving the target adjacent node type of the target entity, i.e., retrieving the adjacent nodes of a specified entity type. For example, the retrieval may ask which person entities are directly connected to the target entity <Zhang San> by a triple.
Retrieving whether a directly connected relationship exists between two target nodes, i.e., whether some triple in the graph associates the two nodes. For the query "What is the relationship between Deng and Sun?", the search checks for a first target node <Deng> and a second target node <Sun>, and for a triple <Deng> - <?p> - <Sun> or <Sun> - <?p> - <Deng>.
Retrieving a common adjacent node of two target nodes, i.e., whether some node is associated with the two given target entities through two different triples. For the query "Which actors have appeared in both Work A and Work B?", the entity type to be found is designated "actor", and the common adjacent entities of "Work A" and "Work B" are sought through "<Work A> - <?p1> - <answer>, <Work B> - <?p2> - <answer>"; the nodes found include <Ge>, among others.
Here "<answer>" denotes a common adjacent node that exists in the subgraph but does not itself appear in the query.
Thus, the correct subgraph is obtained by combining the basic retrieval operations described above.
The basic retrieval operations involved are illustrated in the examples below.
Fig. 2 is a flow chart illustrating a knowledge-graph-based question-answering method according to an exemplary embodiment. As shown in Fig. 2, the method includes the following steps.
In step S21, a user question and retrieval data are acquired; the retrieval data includes the target entity in the user question and also the target relationship in the user question.
In step S22, index data of a plurality of SPO triples including the target entity is retrieved from an index table of a pre-constructed knowledge graph, and the plurality of SPO triples are obtained from the knowledge graph according to the index data.
In step S23, the matching degree of each SPO triplet with the user question is determined, and the SPO triplet with the highest matching degree with the user question is output as the target question-answer data.
In one embodiment, the index data comprising the plurality of SPO triples of the target entity may be retrieved from an index table of a pre-constructed knowledge-graph, for example, by:
The target relationship ID of the target relationship is obtained from the mapping table of relationships to relationship IDs and denoted $r_i$. The target node ID of the target entity is obtained from the mapping table of nodes to node IDs and denoted $i$.
According to the target node ID $i$, the index position interval of the target node in the data storage tables $T_{sn}$ and $T_{sr}$, i.e., the start and end index positions $(d_{i-1}, d_i)$, is read from the node index table $T_i$ in $O(1)$ time. With the interval $(d_{i-1}, d_i)$ in hand, the data storage table $T_{sr}$ is searched according to the target relationship ID $r_i$. Because $T_{sr}$ is built by sorting the data pairs in ascending order of relationship ID and assigning them to index positions one by one in index-position order, the relationship IDs within each interval of $T_{sr}$ are sorted and comparable by size.
Therefore, a binary search over the index position interval $(d_{i-1}, d_i)$ can locate the first target index position section in which the target relationship ID lies. A single search takes $\log_2(d_i - d_{i-1})$ steps; denoting the number of adjacent nodes by $H$, the total cost is $O(\log H)$.
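Because the relation IDs inside one node's interval of $T_{sr}$ are sorted, the sub-interval for a target relation can be located with two binary searches bounded to $(d_{i-1}, d_i)$. A minimal Python sketch using the standard `bisect` module, assuming $T_{sr}$ is held as two parallel lists (relation IDs and adjacent node IDs); the list and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def neighbours_with_relation(sr_rels, sr_nodes, start, end, rel_id):
    """Adjacent node IDs of one node that are linked to it by relation rel_id.

    sr_rels[start:end] holds the node's relation IDs sorted ascending, and
    sr_nodes holds the matching adjacent node IDs at the same positions.
    """
    lo = bisect_left(sr_rels, rel_id, start, end)  # first position holding rel_id
    hi = bisect_right(sr_rels, rel_id, lo, end)    # one past the last such position
    return sr_nodes[lo:hi]
```

Each call costs about $2\log_2(d_i - d_{i-1})$ comparisons plus the size of the returned slice.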
In step S23, the matching degree of each SPO triplet with the user question is determined, and the SPO triplet with the highest matching degree with the user question is output as the target question-answer data.
In an exemplary embodiment of the present disclosure, after the index position interval $(d_{i-1}, d_i)$ of the target node is obtained from the node index table $T_i$, the index data of the SPO triples of the target node need not be retrieved position by position across $(d_{i-1}, d_i)$, because $T_{sr}$ is built by sorting the data pairs in ascending order of relationship ID and assigning them to index positions one by one in index-position order. Instead, the target index position interval of the target relationship can be located directly from the value of the target relationship ID and the ordering of the relationship IDs in $T_{sr}$. The adjacent nodes that satisfy both the target entity and the target relationship in the knowledge graph can thus be obtained quickly, so that the SPO triples of the target entity are constructed quickly and the user question is answered in real time.
Fig. 3 is a flow chart illustrating a method of knowledge-graph based question answering according to an exemplary embodiment, and as shown in fig. 3, the method of knowledge-graph based question answering includes the following steps.
In step S31, a user question and retrieval data are acquired; the retrieval data includes the target entity in the user question and a target adjacent node having a preset connection direction with the target entity.
The preset connection direction includes a forward connection direction and a reverse connection direction, the forward connection direction may represent that the target entity is an S-tuple, and the reverse connection direction may represent that the target entity is an O-tuple.
In step S32, index data of a plurality of SPO triples including the target entity is retrieved from an index table of a pre-constructed knowledge graph, and the plurality of SPO triples are obtained from the knowledge graph according to the index data.
In step S33, the matching degree of each SPO triplet with the user question is determined, and the SPO triplet with the highest matching degree with the user question is output as the target question-answer data.
In one embodiment, the index data comprising the plurality of SPO triples of the target entity may be retrieved from an index table of a pre-constructed knowledge-graph, for example, by:
The target node ID $i$ of the target entity is obtained from the mapping table of nodes to node IDs. According to $i$, the index position interval $(d_{i-1}, d_i)$ corresponding to the target node is read from the node index table, and the data storage table $T_{sr}$ is then searched.
The pre-stored $T_{sr}$ marks each relationship ID with either a first connection symbol, indicating that the relationship is in the forward connection direction, or a second connection symbol, indicating that it is in the reverse connection direction.
The first connection symbol may be a plus sign "+"; in practice it can be omitted, so that the number itself denotes the forward direction by default. The second connection symbol may be a minus sign "-", which is placed before the entry of a target adjacent node in the reverse connection direction.
Thus, if the preset connection direction is the forward connection direction, the relationship IDs within the index position interval of the target node ID are searched in descending order, i.e., from the end of that interval of $T_{sr}$ backwards, until all target adjacent nodes in the forward direction have been found. The number of searches depends on the number of forward edges; denoting it $H_b$, the time complexity is $O(H_b)$. This yields the adjacent node ID at each index position and the relationship ID between the target node and that adjacent node.
If the preset connection direction is the reverse connection direction, the relationship IDs within the index position interval of the target node ID are searched in ascending order, i.e., from the start of that interval of $T_{sr}$ forwards, until all target adjacent nodes in the reverse direction have been found. The number of searches depends on the number of reverse edges; denoting it $H_f$, the time complexity is $O(H_f)$. This likewise yields the adjacent node ID at each index position and the relationship ID between the target node and that adjacent node.
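The sign convention can be sketched as follows, assuming (hypothetically) that relation IDs start at 1 so that 0 is never used, that forward edges are stored with a positive ID and reverse edges with a negative ID, and that each node's slice of $T_{sr}$ is sorted by signed relation ID in ascending order. Forward neighbours are then read from the end of the slice backwards, and reverse neighbours from the start forwards. Names are illustrative only.

```python
def directional_neighbours(sr_rels, sr_nodes, start, end, forward):
    """Return (relation ID without sign, adjacent node ID) pairs in one direction.

    sr_rels[start:end] holds signed relation IDs sorted ascending:
    negative = reverse direction (target entity is the O tuple),
    positive = forward direction (target entity is the S tuple).
    """
    result = []
    if forward:
        k = end - 1
        while k >= start and sr_rels[k] > 0:  # scan backwards: O(H_b)
            result.append((sr_rels[k], sr_nodes[k]))
            k -= 1
    else:
        k = start
        while k < end and sr_rels[k] < 0:     # scan forwards: O(H_f)
            result.append((-sr_rels[k], sr_nodes[k]))
            k += 1
    return result
```

Both loops stop at the first entry of the other sign, so only the edges in the requested direction are visited.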
In an exemplary embodiment of the present disclosure, after the index position interval $(d_{i-1}, d_i)$ of the target node is obtained from the node index table $T_i$, the connection symbols stored with the relationship IDs in $T_{sr}$ (the first symbol for the forward direction, the second for the reverse direction) allow the target index positions to be located directly from the connection symbol and the ordering of the relationship IDs when the index data of the SPO triples of the target node is retrieved. The adjacent nodes that satisfy the target entity and the preset connection direction in the knowledge graph can thus be obtained quickly, so that the SPO triples of the target entity are constructed quickly and the user question is answered in real time.
Fig. 4 is a flow chart illustrating a method of knowledge-graph based question answering according to an exemplary embodiment, and as shown in fig. 4, the method of knowledge-graph based question answering includes the following steps.
In step S41, a user question and retrieval data are acquired; the retrieval data includes the target entity in the user question and also the target adjacent node type of the target entity.
In step S42, index data of a plurality of SPO triples including the target entity is retrieved from an index table of a pre-constructed knowledge graph, and the plurality of SPO triples are obtained from the knowledge graph according to the index data.
In step S43, the matching degree of each SPO triplet with the user question is determined, and the SPO triplet with the highest matching degree with the user question is output as the target question-answer data.
The index table of the knowledge graph further comprises a node type index table. For each node, taking the subtree rooted at that node, the node type index table includes the node ID, the type ID of the starting child node of the subtree, the ending node type ID (that of the node itself), and the node type ID range from the starting child node type ID to the ending node type ID.
In one embodiment, the node type index table may be constructed, for example, as follows:
First, IDs are assigned to the nodes. Each node is initially a binary tuple $(node_i, count_i)$, to which an entity type number is then added. Nodes comprise entities and attribute values, but attribute values have no entity type, so they are assigned the type number $\xi$: in the entity type index table $T_t$, the number of entity types is $\xi$, the numbers $0, 1, \ldots, \xi - 1$ denote the different entity types, and the number $\xi$ is otherwise unused. If the type of attribute values is also considered, $T_t$ expands to $T_t = \{s_0, s_1, \ldots, s_{\xi-1}, \xi\}$. Because an entity can belong to several types at once, the first step is to merge entity types. For example, "Zhang San" has the tags "person" and "singer", which can be merged into "singer": in the type hierarchy, "singer" is a node in the subtree rooted at "person", so entities of type "singer" are returned whenever entities of type "person" are retrieved, and the root "person" can be dropped. However, an entity may also have several mutually exclusive type tags that cannot be merged, such as "Zhang San" being both an "actor" (type number 0) and a "singer" (type number 1). In this case the entity is split by entity type into separate triples $(node_i, count_i, type_{i1}), \ldots, (node_i, count_i, type_{im})$, e.g., (Zhang San, 100, 0) and (Zhang San, 100, 1). Node IDs are then assigned by sorting on two keys: the node type in ascending order as the primary key and the total count in descending order as the secondary key. Counting from 0, the integer ID of each node (entity or attribute value) is its position in this sorted order.
For example, given the $(node_i, count_i, type_i)$ tuple list {(Zhang San, 661, 0), (Zhang San, 661, 1), (Wang Wu, 300, 1), (Li Si, 362, 1), (Work A, 13, 3), (Work B, 5, 4), (Work C, 10, 5), (China, 213, 6), (Zhao Liu, 65, 2)}, the two-key sort yields {(Zhang San, 661, 0), (Zhang San, 661, 1), (Li Si, 362, 1), (Wang Wu, 300, 1), (Zhao Liu, 65, 2), (Work A, 13, 3), (Work B, 5, 4), (Work C, 10, 5), (China, 213, 6)}. The resulting entity-to-ID mapping is Zhang San-0, Zhang San-1, Li Si-2, Wang Wu-3, Zhao Liu-4, Work A-5, Work B-6, Work C-7, China-8. The single entity Zhang San corresponds to two IDs because it has two mutually exclusive entity types. The mapping table of nodes to IDs is denoted $T_n$. Unlike the mapping table $T_r$ of relationships to IDs, $T_n$ is not a one-to-one mapping between nodes and IDs; rather, each (node, entity type) pair corresponds one-to-one to an ID.
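The two-key sort that assigns node IDs can be sketched in Python as below. The input reuses the example tuples from the text (with names romanized), and the function name is hypothetical. Sorting with the key (type number, negated count) realizes the primary ascending order on type and the secondary descending order on count.

```python
def assign_node_ids(nodes):
    """Map (node, entity type number) -> integer ID.

    nodes: iterable of (node, count, type_number) tuples.  Sorting uses the
    type number ascending as the primary key and the count descending as the
    secondary key; the position in the sorted order becomes the ID.
    """
    ordered = sorted(nodes, key=lambda n: (n[2], -n[1]))
    return {(node, type_no): node_id
            for node_id, (node, _count, type_no) in enumerate(ordered)}

example = [("Zhang San", 661, 0), ("Zhang San", 661, 1), ("Wang Wu", 300, 1),
           ("Li Si", 362, 1), ("Work A", 13, 3), ("Work B", 5, 4),
           ("Work C", 10, 5), ("China", 213, 6), ("Zhao Liu", 65, 2)]
t_n = assign_node_ids(example)
```

Here `t_n[("Li Si", 1)]` is 2 and `t_n[("China", 6)]` is 8, matching the mapping listed above; the dictionary key includes the type number because one entity may receive several IDs.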
When numbering types, take the type "person" as an example: its type numbers range over 0, 1, 2 (covering the types "actor", "singer", and "person"), i.e., the interval $[0, 2]$. Entity IDs are assigned after sorting by entity type in ascending order, which guarantees that entities of the same type receive consecutive IDs. For example, the person entities above (Zhang San-0, Zhang San-1, Li Si-2, Wang Wu-3, Zhao Liu-4) are numbered 0 to 4, and the work entities (Work A-5, Work B-6, Work C-7) are numbered 5 to 7. Let $e_i$ be the largest ID among the entities whose entity type number is $i$. Recording these values in the entity type index table updates it to $T_t = \{(s_0, e_0), (s_1, e_1), \ldots, (s_{\xi-1}, e_{\xi-1}), (\xi, e_\xi)\}$. From the updated table, the largest node ID for entity type number $i$ is $e_i$, and the range of type numbers covered by entity type $i$ is $[s_i, i]$. Hence $s_i - 1$ is the last entity type before this range, and the maximum node ID of that type is $e_{s_i - 1}$ (taken as $-1$ when $s_i = 0$).
Because IDs are assigned contiguously by type, the maximum ID of the preceding entity type immediately precedes the first ID of the next, so the node IDs corresponding to entity type number $i$ lie in the range $(e_{s_i - 1}, e_i]$.
Taking {(Zhang San, 661, 0), (Zhang San, 661, 1), (Li Si, 362, 1), (Wang Wu, 300, 1), (Zhao Liu, 65, 2), (Work A, 13, 3), (Work B, 5, 4), (Work C, 10, 5), (China, 213, 6)} as an example, the original entity type index table is $T_t = \{0, 1, 0, 3, 4, 3, 0\}$. The entity type numbered 5 is "work", and its value in the index table is 3, which means that the three entity types 3, 4, and 5 ("movie", "music", and "work") all belong to the "work" type. By the principle above, the updated entity type index table is $T_t = \{(0, 0), (1, 3), (0, 4), (3, 5), (4, 6), (3, 7), (0, 8)\}$, which can also be written as:
Entity type: actor (position 0 of $T_t$); starting entity type number $s_0 = 0$; maximum entity ID $e_0 = 0$
Entity type: singer (position 1 of $T_t$); starting entity type number $s_1 = 1$; maximum entity ID $e_1 = 3$
Entity type: person (position 2 of $T_t$); starting entity type number $s_2 = 0$; maximum entity ID $e_2 = 4$
Entity type: movie (position 3 of $T_t$); starting entity type number $s_3 = 3$; maximum entity ID $e_3 = 5$
Entity type: music (position 4 of $T_t$); starting entity type number $s_4 = 4$; maximum entity ID $e_4 = 6$
Entity type: work (position 5 of $T_t$); starting entity type number $s_5 = 3$; maximum entity ID $e_5 = 7$
Entity type: default (position 6 of $T_t$); starting entity type number $s_6 = 0$; maximum entity ID $e_6 = 8$
Take the entity type "work" as an example: $i = 5$, $s_i = 3$, $e_i = 7$, and its type number range is 3 to 5 (3, 4, 5), i.e., $[s_i, i]$. Thus all entities whose type numbers lie in 3 to 5 are entities of type "work". The preceding entity type number is 2 (i.e., $s_i - 1$), whose maximum entity ID is 4, i.e., $e_{s_i - 1} = e_2 = 4$. The maximum entity ID for entity type number 5 is 7 ($e_5 = 7$). The entity type "work" (type number 5) therefore corresponds to the entity ID range $(4, 7]$, i.e., Work A-5, Work B-6, and Work C-7. In general, given an entity type number $i$ and the entity type index table $T_t$, the corresponding entity ID range can be obtained.
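The range lookup can be sketched in Python as below, using the example's starting type numbers $s_i$ and deriving each $e_i$ from the sorted node types. Treating $e_{s_i - 1}$ as $-1$ when $s_i = 0$, and letting a type with no nodes inherit the previous maximum, are assumptions made so that the sketch is defined for every type; the function names are hypothetical.

```python
from bisect import bisect_right

def max_ids(node_types, num_types):
    """e_t for t = 0..num_types-1: largest node ID whose type number is <= t.

    node_types[k] is the type number of node ID k; because IDs were assigned
    with the type as the primary ascending key, this list is sorted.  When
    type t has nodes, the result is the largest ID of type t itself.
    """
    return [bisect_right(node_types, t) - 1 for t in range(num_types)]

def id_range(i, s, e):
    """Inclusive node ID range (first, last) for entity type number i."""
    previous_max = e[s[i] - 1] if s[i] > 0 else -1  # e_{s_i - 1}
    return previous_max + 1, e[i]

s = [0, 1, 0, 3, 4, 3, 0]                 # starting type numbers s_i from T_t
node_types = [0, 1, 1, 1, 2, 3, 4, 5, 6]  # types of node IDs 0..8 in the example
e = max_ids(node_types, len(s))           # [0, 3, 4, 5, 6, 7, 8]
```

For the "work" type, `id_range(5, s, e)` gives `(5, 7)`, the inclusive form of $(4, 7]$.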
Thus, when the retrieval data includes the target adjacent node type of the target entity, the index data of the SPO triples including the target entity can be retrieved from the index table of the pre-constructed knowledge graph, for example, as follows:
The target node ID $i$ of the target entity is obtained from the mapping table of nodes to node IDs, and the node ID range of the target adjacent node type, $(e_{s_k - 1}, e_k]$ for type number $k$, is obtained from the node type index table. According to $i$, the index position interval $(d_{i-1}, d_i)$ is obtained from the node index table, and from it the adjacent node list $N_i$ of node $i$ is read from the data storage table $T_{sn}$. Nodes in $N_i$ whose IDs fall within $(e_{s_k - 1}, e_k]$ are then searched for. For example, two binary searches determine the range of qualifying adjacent nodes in $N_i$, taking $2\log_2(d_i - d_{i-1})$ steps. Assuming the number of adjacent entities of this type is $H_{Type}$, iterating over the adjacent nodes in that range takes $H_{Type}$ iterations, i.e., $O(H_{Type})$.
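Because each node's slice of $T_{sn}$ is sorted by adjacent node ID, the neighbours of a given type form one contiguous run that two binary searches can find. A minimal Python sketch, assuming the inclusive node ID range of the target type has already been obtained from the node type index table; the function and parameter names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def neighbours_of_type(sn_nodes, start, end, first_id, last_id):
    """Adjacent node IDs of one node whose IDs fall in [first_id, last_id].

    sn_nodes[start:end] is the node's adjacency list N_i, sorted ascending.
    Two binary searches (about 2*log2(end - start) steps) bound the run,
    then the H_Type matching IDs are copied out.
    """
    lo = bisect_left(sn_nodes, first_id, start, end)
    hi = bisect_right(sn_nodes, last_id, lo, end)
    return sn_nodes[lo:hi]
```

For example, with `N_i = [0, 2, 3, 5, 6, 7, 8]` and the "work" range 5 to 7, the call returns `[5, 6, 7]`.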
In an exemplary embodiment of the present disclosure, $T_{sn}$ is built by sorting the data pairs within the index position interval of each node ID in ascending order of adjacent node ID and assigning them to index positions one by one in index-position order. Therefore, after the index position interval $(d_{i-1}, d_i)$ of the target node is obtained from the node index table $T_i$, the positions within it that satisfy the node type ID range can be determined directly in $T_{sn}$. The adjacent nodes that satisfy the target entity and the target adjacent node type in the knowledge graph can thus be obtained quickly, so that the SPO triples of the target entity are constructed quickly and the user question is answered in real time.
Fig. 5 is a flow chart illustrating a method of knowledge-graph based question answering according to an exemplary embodiment, and as shown in fig. 5, the method of knowledge-graph based question answering includes the following steps.
In step S51, a user question and retrieval data are acquired; the retrieval data includes two or more target entities in the user question and further includes the relationship connecting a first target entity and a second target entity among them.
In step S52, index data of a plurality of SPO triples including the target entity is retrieved from an index table of a pre-constructed knowledge graph, and the plurality of SPO triples are obtained from the knowledge graph according to the index data.
In step S53, the matching degree of each SPO triplet with the user question is determined, and the SPO triplet with the highest matching degree with the user question is output as the target question-answer data.
In one embodiment, with the first target node ID of the first target entity denoted $i$ and the second target node ID of the second target entity denoted $j$, the node index table $T_i$ is searched to obtain the first index position interval $(d_{i-1}, d_i)$ for the first target node ID and the second index position interval $(d_{j-1}, d_j)$ for the second target node ID. The interval with fewer adjacent nodes is taken (assume $d_i - d_{i-1} < d_j - d_{j-1}$), and the adjacent node list $N_i$ of node $i$ is read from the data storage table $T_{sn}$. A binary search in $N_i$ for node $j$ then yields the relationship between the two nodes, with time complexity $O(\log(d_i - d_{i-1}))$.
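The lookup in the shorter list can be sketched in Python with `bisect`. The sketch assumes the data pairs of $T_{sn}$ are held as parallel lists of adjacent node IDs and relationship IDs (sorted by adjacent node ID within each interval) and that $T_i$ is the list of end positions $d_n$; all names are hypothetical, and sign-encoded directions are ignored for simplicity.

```python
from bisect import bisect_left, bisect_right

def relations_between(t_i, sn_nodes, sn_rels, i, j):
    """Relation IDs on edges joining nodes i and j (empty if not directly connected).

    t_i[n] is d_n, the end of node n's interval; sn_nodes/sn_rels are the data
    pairs, sorted by adjacent node ID within each interval.
    """
    def interval(n):
        return (t_i[n - 1] if n > 0 else 0), t_i[n]

    (si, ei), (sj, ej) = interval(i), interval(j)
    if ei - si <= ej - sj:
        start, end, target = si, ei, j   # N_i is shorter: look for node j in it
    else:
        start, end, target = sj, ej, i   # N_j is shorter: look for node i in it
    lo = bisect_left(sn_nodes, target, start, end)  # O(log(shorter list length))
    hi = bisect_right(sn_nodes, target, lo, end)
    return sn_rels[lo:hi]
```

Searching the shorter list keeps the cost logarithmic in the smaller degree, which matters when one entity has millions of neighbours.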
In step S53, the matching degree of each SPO triplet with the user question is determined, and the SPO triplet with the highest matching degree with the user question is output as the target question-answer data.
In exemplary embodiments of the present disclosure, the numbers of positions in the first index position interval $(d_{i-1}, d_i)$ for the first target node ID and in the second index position interval $(d_{j-1}, d_j)$ for the second target node ID are compared, and the interval with fewer positions is used. Within it, the adjacent node ID at each index position and the relationship ID between the target node and that adjacent node can be looked up quickly in the data storage table $T_{sn}$, so that the SPO triples containing both the first and the second target entities are obtained quickly and the user question is answered in real time.
Fig. 6 is a flow chart illustrating a method of knowledge-graph based question answering according to an exemplary embodiment, and as shown in fig. 6, the method of knowledge-graph based question answering includes the following steps.
In step S61, a user question and retrieval data are acquired; the retrieval data includes two or more target entities in the user question and further includes the common adjacent node of a third target entity and a fourth target entity among them.
In the present disclosure, a common adjacent node is a node that is associated with two given entities through two different SPO triples. For example, for the query "Which actors have appeared in both Work A and Work B?", the entity type to be found is designated "actor", and the node must be associated with both target entities "Work A" and "Work B". From the resulting SPO triples "<Work A> - <?p1> - <answer>, <Work B> - <?p2> - <answer>", the adjacent entity node found is <Ge>. The entity node <Ge> is therefore a common adjacent node of the third and fourth target entities, Work A and Work B.
In step S62, index data of a plurality of SPO triples including the target entity is retrieved from an index table of a pre-constructed knowledge graph, and the plurality of SPO triples are obtained from the knowledge graph according to the index data.
In step S63, the matching degree of each SPO triplet with the user question is determined, and the SPO triplet with the highest matching degree with the user question is output as the target question-answer data.
Thus, when the retrieval data includes the common adjacent node of the third target entity and the fourth target entity, the index data of the SPO triples including the target entities can be retrieved from the index table of the pre-constructed knowledge graph, for example, as follows:
For example, with the third target node ID of the third target entity denoted $i$ and the fourth target node ID of the fourth target entity denoted $j$, the node index table $T_i$ is searched to obtain the third index position interval $(d_{i-1}, d_i)$ for the third target node ID and the fourth index position interval $(d_{j-1}, d_j)$ for the fourth target node ID, and the common adjacent nodes of the two nodes are retrieved. For nodes $i$ and $j$, this reduces to finding the common elements of two sorted arrays in the data storage table $T_{sn}$, which takes at most $(d_i - d_{i-1}) + (d_j - d_{j-1})$ iterations.
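The merge step can be sketched as the classic two-pointer intersection of sorted arrays. This Python sketch assumes each adjacency list holds distinct node IDs; the function name is hypothetical.

```python
def common_neighbours(sn_nodes, si, ei, sj, ej):
    """Common adjacent node IDs of two nodes by merging their sorted lists.

    sn_nodes[si:ei] and sn_nodes[sj:ej] are the sorted adjacency lists of the
    two nodes; each loop iteration advances at least one pointer, so at most
    (ei - si) + (ej - sj) iterations are needed.
    """
    a, b = si, sj
    common = []
    while a < ei and b < ej:
        if sn_nodes[a] == sn_nodes[b]:
            common.append(sn_nodes[a])
            a += 1
            b += 1
        elif sn_nodes[a] < sn_nodes[b]:
            a += 1
        else:
            b += 1
    return common
```

With `sn_nodes = [1, 4, 6, 9, 2, 4, 9, 11]`, the intervals `(0, 4)` and `(4, 8)` share the nodes `[4, 9]`.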
In practical question-answering scenarios, queries with constraints are common, such as "the role played by Li A in Movie Work D", "women whose constellation is Gemini", and "singers who live in Chicago". The subgraph paths they may involve are, respectively, "<Movie Work D> - <?p1> - <answer>, <answer> - <?p2> - <Li A>", "<answer> - <?p1> - <constellation>, <answer> - <?p2> - <Gemini>, <answer> - <?p3> - <woman>", and "<answer> - <?p1> - <Chicago>, <answer> - <?p2> - <singer>". Retrieving the common nodes of two or more entities to construct a candidate subgraph is therefore a common requirement. In such constraint relationships, the two entities often differ greatly in size. For example, when looking for people of Chinese nationality in "Movie Work One", because a very large number of people have Chinese nationality, the number of nodes associated with the entity "China" (possibly millions) may be far larger than the number associated with "Movie Work One" (possibly only tens).
Suppose the two nodes whose common nodes are to be computed are $i$ and $j$, and node $i$ has far more adjacent nodes than node $j$. Write $\mu = d_i - d_{i-1}$ and $\nu = d_j - d_{j-1}$ for their numbers of adjacent nodes, with $\nu \ll \mu$. Ordinarily, iteratively finding the common nodes takes at most $\mu + \nu$ steps. The optimization is to partition the adjacent node list of $i$ in the manner of building a binary tree and to search it hierarchically.
Each split extracts the middle element of a list, leaving two smaller lists. Splitting repeatedly turns the whole list into a full binary tree whose leaves are sublists and whose non-leaf nodes are the middle elements extracted at each split. Suppose the constructed binary tree has depth $k + 1$. Each leaf then represents a sublist of length about $\mu/2^k$ (the $2^k$ leaves share the $\mu - (2^k - 1)$ remaining elements, so their lengths differ by at most one). In fact, for a sorted list this tree need not be built explicitly: it suffices to extract all non-leaf nodes of the binary tree and record their positions and IDs in one sorted list. To find the common nodes of nodes $i$ and $j$, each adjacent node $\sigma$ of node $j$ is searched for in two steps. The first step searches the sorted list representing the binary tree for the sublist in which $\sigma$ could lie. The second step checks whether $\sigma$ is present in that sublist, which takes at most about $\mu/2^k$ searches. Because the adjacent nodes $\sigma$ of node $j$ are also sorted, the first step over all $\nu$ adjacent nodes takes at most $\nu + 2^k - 1$ searches in total, where $2^k - 1$ is the number of entries in the sorted list, i.e., the number of non-leaf nodes of the binary tree. The maximum number of searches for the common nodes of the two nodes is therefore $\nu\mu/2^k + \nu + 2^k - 1$, where $\nu + 2^k - 1$ counts the first-step searches and $\nu\mu/2^k$ the second-step searches over all adjacent nodes $\sigma$ of node $j$. The depth of the full binary tree can be set freely; if the depth $k + 1$ is chosen so that $2^k = \lambda\nu$ with $\lambda\nu < \mu$, the number of searches becomes $\mu/\lambda + (\lambda + 1)\nu - 1$. With suitable values of $\lambda$ and $k$, this can be far smaller than $\mu + \nu$.
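The two-step search can be sketched in Python as below. Following the text, the tree is never materialized: only the positions of its non-leaf (pivot) elements are collected, in sorted order, and each adjacent node $\sigma$ of node $j$ first walks the pivot list and then scans a single leaf sublist. The function names, the early exit inside a leaf, and the requirement that both lists hold distinct sorted IDs are assumptions of this sketch, which also evaluates the bound $\nu\mu/2^k + \nu + 2^k - 1$.

```python
def tree_pivots(lo, hi, levels, out):
    """Append, in sorted order, the positions of the non-leaf nodes obtained by
    repeatedly extracting the middle element of big[lo:hi] for `levels` levels."""
    if levels == 0 or lo >= hi:
        return
    mid = (lo + hi) // 2
    tree_pivots(lo, mid, levels - 1, out)
    out.append(mid)
    tree_pivots(mid + 1, hi, levels - 1, out)

def common_by_partition(big, small, k):
    """Common IDs of two sorted lists of distinct IDs, where len(small) << len(big).

    The binary tree over `big` has depth k + 1, i.e. at most 2**k - 1 pivots.
    """
    pivots = []
    tree_pivots(0, len(big), k, pivots)
    common, p = [], 0
    for sigma in small:  # small is sorted, so p never moves backwards
        # Step 1: advance along the sorted pivot list (about nu + 2**k - 1 steps in total).
        while p < len(pivots) and big[pivots[p]] < sigma:
            p += 1
        if p < len(pivots) and big[pivots[p]] == sigma:
            common.append(sigma)
            continue
        # Step 2: sigma can only lie in the leaf sublist between pivots p-1 and p.
        start = pivots[p - 1] + 1 if p > 0 else 0
        end = pivots[p] if p < len(pivots) else len(big)
        for pos in range(start, end):
            if big[pos] >= sigma:  # sorted leaf: stop at the first value >= sigma
                if big[pos] == sigma:
                    common.append(sigma)
                break
    return common

def max_search_count(mu, nu, k):
    """Upper bound nu*mu/2**k + nu + 2**k - 1 on the number of searches."""
    return nu * mu / 2 ** k + nu + 2 ** k - 1
```

With $\mu = 1024$, $\nu = 8$ and $k = 5$ (so $2^k = \lambda\nu$ with $\lambda = 4$), `max_search_count(1024, 8, 5)` gives 295.0, which equals $\mu/\lambda + (\lambda + 1)\nu - 1$ and compares with $\mu + \nu = 1032$.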
The optimization above is, in essence, a hierarchical search of the adjacent node list $N_i$ of node $i$. In the default case no split is made ($k = 0$), so there is only one list, which corresponds to plain iterative traversal. At the other extreme, the depth of the binary tree is maximized, with $2^k \approx \mu$ (i.e., $k \approx \log_2 \mu$), so that each sublist contains at most one or two nodes. Each check of whether an adjacent node $\sigma$ of node $j$ is in $N_i$ then amounts to one binary search, i.e., each adjacent node $\sigma$ of node $j$ is tested by bisection for being a common node of $i$ and $j$, and the number of searches is $\nu\log_2\mu$. The partitioned retrieval idea thus unifies the iterative-traversal and binary-search approaches.
In an exemplary embodiment of the present disclosure, when the retrieval data includes a common adjacent node between a third target entity and a fourth target entity among the target entities to be connected, the data storage table $T_{sn}$ can be searched, over the first index position interval $(d_{i-1}, d_i)$ corresponding to the third target node ID and the second index position interval $(d_{j-1}, d_j)$ corresponding to the fourth target node ID, for the common adjacent node IDs shared by the third target node and the fourth target node. The index data of the SPO triples in the knowledge graph that include the third target node ID and a common adjacent node ID, and of those that include the fourth target node ID and a common adjacent node ID, can thus be obtained quickly, so that user questions are answered in real time. Through the above exemplary embodiments, the present disclosure optimizes the subgraph retrieval part and greatly improves retrieval speed compared with graph-database-based retrieval. In our investigations with a graph database, some single-hop queries from a single entity took hundreds of milliseconds, and single-hop queries from two or more entities took several hundred milliseconds or more. With the present disclosure, the retrieval time for the N-entity, M-hop patterns (N = 1, 2, 3; M = 1, 2) is typically shortened to between several milliseconds and several tens of milliseconds. Normally, mining all subgraphs covering the N-entity, M-hop patterns (N = 1, 2, 3; M = 1, 2) takes the subgraph module less than 100 milliseconds in total.
FIG. 7 is a block diagram of a knowledge-graph based question answering apparatus 700, according to an exemplary embodiment. Referring to fig. 7, the apparatus includes an acquisition module 701, a retrieval module 702, and a determination module 703.
an acquisition module 701, configured to acquire a user question and retrieval data, where the retrieval data includes a target entity in the user question;
a retrieving module 702, configured to retrieve, according to the retrieval data, index data of a plurality of SPO triples including the target entity from an index table of a pre-constructed knowledge graph, and obtain, according to the index data, the plurality of SPO triples from the knowledge graph, where, for each SPO triplet, the S-tuple and the O-tuple are nodes in the knowledge graph, the S-tuple and the O-tuple are adjacent nodes, and the P-tuple is an edge in the knowledge graph, which represents a relationship between the S-tuple and the O-tuple;
a determining module 703, configured to determine a matching degree between each SPO triplet and the user question, and output the SPO triplet with the highest matching degree with the user question as target question-answer data;
wherein the index table of the knowledge graph comprises: a mapping table of node to node identification IDs, a mapping table of relationship to relationship IDs, a node index table, and a data storage table; the node index table includes, for each node in the knowledge graph, the node ID of the node and the index position interval corresponding to that node ID; and the data storage table includes, for the index position interval corresponding to each node ID, a data pair at each index position of the node, the data pair comprising the ID of an adjacent node of the node and the ID of the relationship between the node and that adjacent node.
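The four tables can be pictured as a compressed-adjacency layout. The sketch below builds them in memory from a list of (S, P, O) string triples; the class name `GraphIndex`, the half-open `[start, end)` convention for index position intervals, and the choice to store every triple under both its S node and its O node are assumptions made for illustration, not details taken from this disclosure.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (S, P, O)


class GraphIndex:
    """Hypothetical in-memory layout of the four index tables."""

    def __init__(self, triples: List[Triple]) -> None:
        self.node_id: Dict[str, int] = {}  # mapping table: node -> node ID
        self.rel_id: Dict[str, int] = {}   # mapping table: relationship -> relation ID
        for s, p, o in triples:
            for n in (s, o):
                self.node_id.setdefault(n, len(self.node_id))
            self.rel_id.setdefault(p, len(self.rel_id))
        # Gather (adjacent node ID, relation ID) pairs for every node
        adj: List[List[Tuple[int, int]]] = [[] for _ in self.node_id]
        for s, p, o in triples:
            si, oi, ri = self.node_id[s], self.node_id[o], self.rel_id[p]
            adj[si].append((oi, ri))  # assumption: stored under the S node
            adj[oi].append((si, ri))  # and under the O node
        # Node index table: node ID -> [start, end) index position interval
        self.interval: Dict[int, Tuple[int, int]] = {}
        # Data storage table: one (adjacent node ID, relation ID) per position
        self.storage: List[Tuple[int, int]] = []
        for nid, pairs in enumerate(adj):
            start = len(self.storage)
            self.storage.extend(pairs)
            self.interval[nid] = (start, len(self.storage))
```

Because each node's pairs occupy one contiguous run, the node index table only needs two integers per node.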
Optionally, the retrieval data further includes one or a combination of several of the following data:
retrieving a target relationship in the user question;
retrieving a target adjacent node that has a preset connection direction with the target entity;
retrieving a target adjacent node type of the target entity;
retrieving a relationship between a first target entity and a second target entity among the target entities to be connected;
retrieving a common adjacent node between a third target entity and a fourth target entity among the target entities to be connected.
Optionally, the retrieving module 702 retrieves index data including a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge graph in the following manner:
acquiring a target node ID of the target entity from the mapping table from the node to the node identification ID;
acquiring an index position interval corresponding to the target node ID from the node index table according to the target node ID;
retrieving from the data storage table, for each index position in the index position interval corresponding to the target node ID, the adjacent node ID at that index position and the relation ID between the target node and that adjacent node;
and obtaining the index data of the SPO triple of the target entity according to the adjacent node ID of the target node and the relation ID of the target node and the adjacent node.
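The four retrieval steps above amount to two lookups and one contiguous scan. A minimal sketch, assuming the tables are plain Python dicts and lists with half-open `[start, end)` intervals; the function name `retrieve_spo_index` and the (target node ID, relation ID, adjacent node ID) output shape are illustrative choices. Note that this basic table does not record which end of the edge is the S tuple.

```python
from typing import Dict, List, Tuple


def retrieve_spo_index(
    entity: str,
    node_id: Dict[str, int],
    interval: Dict[int, Tuple[int, int]],
    storage: List[Tuple[int, int]],
) -> List[Tuple[int, int, int]]:
    target = node_id[entity]       # node -> node ID mapping table
    start, end = interval[target]  # node index table
    result: List[Tuple[int, int, int]] = []
    for pos in range(start, end):  # data storage table, one pair per position
        adj_id, rel_id = storage[pos]
        result.append((target, rel_id, adj_id))  # index data of one triple
    return result
```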
Optionally, the data storage table comprises a first data storage table and a second data storage table;
the first data storage table is obtained by sorting, within the index position interval corresponding to each node ID, the data pairs in ascending order of adjacent node ID, and then assigning the sorted data pairs one-to-one to the index positions in index-position order;
and the second data storage table is obtained by sorting, within the index position interval corresponding to each node ID, the data pairs in ascending order of relation ID, and then assigning the sorted data pairs one-to-one to the index positions in index-position order.
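Both tables keep the node index table unchanged and only reorder pairs inside each node's interval. A sketch under the same assumptions as before (pairs are (adjacent node ID, relation ID), intervals are half-open); the secondary sort key used to break ties is our own choice, since the text specifies only the primary key.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (adjacent node ID, relation ID)


def build_sorted_tables(
    interval: Dict[int, Tuple[int, int]],
    storage: List[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    first = list(storage)
    second = list(storage)
    for start, end in interval.values():
        block = storage[start:end]
        # First table: ascending adjacent node ID (ties broken by relation ID)
        first[start:end] = sorted(block, key=lambda p: (p[0], p[1]))
        # Second table: ascending relation ID (ties broken by adjacent node ID)
        second[start:end] = sorted(block, key=lambda p: (p[1], p[0]))
    return first, second
```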
Optionally, if the retrieval data further includes a target relationship in the user question,
the retrieving module 702 retrieves index data including a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge graph in the following manner:
acquiring a target relation ID of the target relation from the mapping table of the relation-to-relation ID;
acquiring a target node ID of the target entity from the mapping table from the node to the node identification ID;
acquiring an index position interval corresponding to the target node ID from the node index table according to the target node ID;
determining, from the second data storage table and within the index position interval corresponding to the target node ID, a first target index position interval in which the target relation ID is located;
for each first target index position in the first target index position interval, retrieving the adjacent node ID at that index position and the target relation ID representing the relationship between the target node and the adjacent node;
and obtaining the index data of the SPO triple of the target entity according to the adjacent node ID of the target node and the relation ID of the target node and the adjacent node.
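Because the second data storage table is sorted by relation ID inside each interval, the first target index position interval can be found with two binary searches instead of a scan. This sketch assumes, hypothetically, that the second table is stored column-wise as two parallel lists `second_adj` and `second_rel`, which lets Python's `bisect` functions search a sub-range directly; the function name is also illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


def retrieve_by_relation(
    target: int,
    rel: int,
    interval: Dict[int, Tuple[int, int]],
    second_adj: List[int],  # adjacent node IDs, second-table order
    second_rel: List[int],  # relation IDs, ascending within each interval
) -> List[Tuple[int, int, int]]:
    start, end = interval[target]
    # First target index position interval: [lo, hi) where second_rel == rel
    lo = bisect_left(second_rel, rel, start, end)
    hi = bisect_right(second_rel, rel, start, end)
    return [(target, rel, second_adj[pos]) for pos in range(lo, hi)]
```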
Optionally, if the retrieval data further includes a target adjacent node having a preset connection direction with the target entity, where the preset connection direction includes a forward connection direction and a reverse connection direction, the forward connection direction indicating that the target entity is the S tuple and the reverse connection direction indicating that the target entity is the O tuple;
the second data storage table includes, in correspondence with the relation IDs, a first connection symbol indicating that a relation ID is in the forward connection direction and a second connection symbol indicating that a relation ID is in the reverse connection direction;
the retrieving module 702 retrieves index data comprising a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge-graph in the following manner:
acquiring a target node ID of the target entity from the mapping table from the node to the node identification ID;
acquiring an index position interval corresponding to the target node ID from the node index table according to the target node ID;
if the preset connection direction is the forward connection direction, searching the second data storage table, within the index position interval corresponding to the target node ID and in descending order of relation ID, for the adjacent node ID at each index position and the relation ID between the target node and the adjacent node; or,
if the preset connection direction is the reverse connection direction, searching the second data storage table, within the index position interval corresponding to the target node ID and in ascending order of relation ID, for the adjacent node ID at each index position and the relation ID between the target node and the adjacent node;
and obtaining the index data of the SPO triple of the target entity according to the adjacent node ID of the target node and the relation ID of the target node and the adjacent node.
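The text does not specify how the connection symbols are encoded. One encoding that is consistent with "forward: search from large to small, reverse: search from small to large" is to store the relation ID as a signed value, positive for forward and negative for reverse, and sort each interval by that value; forward entries then sit at the end of the interval and reverse entries at the start. The sketch below is built entirely on that assumption; `encode`, `retrieve_by_direction`, and the +1 offset are hypothetical.

```python
from typing import Dict, List, Tuple


def encode(rel_id: int, forward: bool) -> int:
    """Hypothetical connection symbol: the sign of the stored value.
    Forward (target is S) -> positive; reverse (target is O) -> negative.
    The +1 offset keeps relation ID 0 distinguishable from its reverse."""
    return rel_id + 1 if forward else -(rel_id + 1)


def retrieve_by_direction(
    target: int,
    forward: bool,
    interval: Dict[int, Tuple[int, int]],
    second: List[Tuple[int, int]],  # (signed relation value, adjacent node ID), ascending per interval
) -> List[Tuple[int, int, int]]:
    start, end = interval[target]
    out: List[Tuple[int, int, int]] = []
    if forward:
        # Positive values are at the end: search from large to small
        pos = end - 1
        while pos >= start and second[pos][0] > 0:
            signed, adj = second[pos]
            out.append((target, signed - 1, adj))  # (S, P, O), target is S
            pos -= 1
    else:
        # Negative values are at the start: search from small to large
        pos = start
        while pos < end and second[pos][0] < 0:
            signed, adj = second[pos]
            out.append((adj, -signed - 1, target))  # (S, P, O), target is O
            pos += 1
    return out
```

Each scan stops at the first entry of the other direction, so only the requested half of the interval is visited.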
Optionally, the index table of the knowledge graph further includes a node type index table, which includes, for each node in the knowledge graph, a subtree rooted at that node, the subtree comprising the node ID, the starting child node type ID of its starting child node, the ending node type ID at which the subtree ends, and the node type ID range from the starting child node type ID to the ending node type ID;
if the retrieval data further includes a target adjacent node type of the target entity, the retrieving module 702 retrieves, from an index table of a pre-constructed knowledge graph, index data including a plurality of SPO triples of the target entity in the following manner:
acquiring a target node ID of the target entity from the mapping table from the node to the node identification ID;
acquiring the node type ID range from the node type index table;
acquiring an index position interval corresponding to the target node ID from the node index table according to the target node ID;
determining a second target index position interval meeting the node type ID range from the first data storage table according to the index position interval corresponding to the target node ID;
for each second target index position in the second target index position interval, retrieving a target adjacent node ID corresponding to the index position and the target relation ID representing the target node and the adjacent node relation from the data storage table;
and obtaining the index data of the SPO triple of the target entity according to the adjacent node ID of the target node and the relation ID of the target node and the adjacent node.
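Since the first data storage table is sorted by adjacent node ID, a type filter reduces to a range search whenever a node type ID range corresponds to a contiguous range of node IDs. The sketch assumes, hypothetically, that node IDs were allocated in type order so that the node type ID range has already been resolved to a node ID range `[id_lo, id_hi]`, and that the first table is stored as parallel lists `first_adj` and `first_rel`.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


def retrieve_by_type(
    target: int,
    id_lo: int,  # hypothetical: smallest node ID of the requested type range
    id_hi: int,  # hypothetical: largest node ID of the requested type range
    interval: Dict[int, Tuple[int, int]],
    first_adj: List[int],  # adjacent node IDs, ascending within each interval
    first_rel: List[int],  # relation IDs in first-table order
) -> List[Tuple[int, int, int]]:
    start, end = interval[target]
    # Second target index position interval: positions with id_lo <= adj <= id_hi
    lo = bisect_left(first_adj, id_lo, start, end)
    hi = bisect_right(first_adj, id_hi, start, end)
    return [(target, first_rel[p], first_adj[p]) for p in range(lo, hi)]
```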
Optionally, if the retrieval data includes a relationship between a first target entity and a second target entity among the target entities to be connected,
the retrieving module 702 retrieves index data comprising a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge-graph in the following manner:
acquiring a first target node ID of the first target entity from the mapping table from the node to the node identification ID, and acquiring a second target node ID of the second target entity;
acquiring a first index position interval corresponding to the first target node ID from the node index table according to the first target node ID, and acquiring a second index position interval corresponding to the second target node ID from the node index table according to the second target node ID;
acquiring a first position number, i.e., the number of index positions in the first index position interval, and a second position number, i.e., the number of index positions in the second index position interval; if the first position number is smaller than the second position number, retrieving from the first data storage table, for each index position in the first index position interval, the first target adjacent node ID at that index position;
determining, among the first target adjacent node IDs, those equal to the second target node ID, to obtain the relation ID between the first target node ID and the second target node ID;
and obtaining index data of the SPO triple including the first target node ID and the second target node ID according to the first target node ID, the second target node ID and the relation ID between them.
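The steps above scan the node with fewer index positions and look for the other node among its adjacent nodes. In this sketch the per-position scan is replaced by a binary search, which is our choice and is possible because the first data storage table is sorted by adjacent node ID; a linear scan returns the same relation IDs. The parallel-list layout and the function name `relation_between` are assumptions.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


def relation_between(
    a: int,
    b: int,
    interval: Dict[int, Tuple[int, int]],
    first_adj: List[int],  # adjacent node IDs, ascending within each interval
    first_rel: List[int],  # relation IDs in first-table order
) -> List[int]:
    def size(n: int) -> int:
        s, e = interval[n]
        return e - s

    # Search inside the interval of the node with fewer index positions
    scan, other = (a, b) if size(a) <= size(b) else (b, a)
    start, end = interval[scan]
    lo = bisect_left(first_adj, other, start, end)
    hi = bisect_right(first_adj, other, start, end)
    # Several relations may link the same pair of nodes
    return [first_rel[p] for p in range(lo, hi)]
```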
Optionally, if the retrieval data includes a common adjacent node between a third target entity and a fourth target entity among the target entities to be connected,
the retrieving module 702 retrieves index data comprising a plurality of SPO triples of the target entity from an index table of a pre-constructed knowledge-graph in the following manner:
acquiring a third target node ID of the third target entity from the mapping table from the node to the node identification ID, and acquiring a fourth target node ID of the fourth target entity;
acquiring a third index position interval corresponding to the third target node ID from the node index table according to the third target node ID, and acquiring a fourth index position interval corresponding to the fourth target node ID from the node index table according to a fourth target node ID;
retrieving from the first data storage table, over each third index position in the third index position interval and each fourth index position in the fourth index position interval, the common adjacent node IDs, i.e., the adjacent node IDs shared by the third target node and the fourth target node;
determining a relationship ID of the third target node ID and the common adjacent node ID according to the common adjacent node ID, and determining a relationship ID of the fourth target node ID and the common adjacent node ID;
obtaining index data of the SPO triple including the third target node ID and the common adjacent node ID according to the third target node ID, the common adjacent node ID and the relation ID between them, and obtaining index data of the SPO triple including the fourth target node ID and the common adjacent node ID according to the fourth target node ID, the common adjacent node ID and the relation ID between them.
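Because both intervals of the first data storage table are sorted by adjacent node ID, the common adjacent nodes can be found with a single two-pointer merge costing at most μ + v steps; the hierarchical partition search described earlier in this specification could be substituted for the merge. This sketch uses a parallel-list layout and the hypothetical name `common_adjacent`, and it emits one (node ID, relation ID, common adjacent node ID) entry per matching data pair, since a node may reach the same neighbor through several relations.

```python
from typing import Dict, List, Tuple


def common_adjacent(
    c: int,
    d: int,
    interval: Dict[int, Tuple[int, int]],
    first_adj: List[int],  # adjacent node IDs, ascending within each interval
    first_rel: List[int],  # relation IDs in first-table order
) -> List[Tuple[int, int, int]]:
    i, ei = interval[c]
    j, ej = interval[d]
    out: List[Tuple[int, int, int]] = []
    while i < ei and j < ej:
        x, y = first_adj[i], first_adj[j]
        if x < y:
            i += 1
        elif x > y:
            j += 1
        else:
            # x is a common adjacent node: gather every pair holding it
            i2 = i
            while i2 < ei and first_adj[i2] == x:
                i2 += 1
            j2 = j
            while j2 < ej and first_adj[j2] == x:
                j2 += 1
            out.extend((c, first_rel[p], x) for p in range(i, i2))
            out.extend((d, first_rel[q], x) for q in range(j, j2))
            i, j = i2, j2
    return out
```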
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
The present disclosure also provides a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, implement the steps of the knowledge-graph based question-answering method provided by the present disclosure.
FIG. 8 is a block diagram illustrating an apparatus 800 for knowledge-graph based question answering in accordance with an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 8, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power component 806 provides power to the various components of device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800; the sensor assembly 814 may also detect a change in the position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned knowledge-graph based question-answering method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.