Feed generation method, information recommendation method, device and equipment
1. A feed generation method, comprising:
constructing, according to text description information and image description information of objects in a set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes are used for representing texts taken from the text description information or images taken from the image description information, and the connecting lines are used for representing association relationships between the nodes;
processing a first vectorized representation of the nodes and a second vectorized representation of the association relationships between the nodes by using a feed prediction model component to obtain a feed prediction result; and
determining a feed of the set based on the feed prediction result.
2. The feed generation method according to claim 1, wherein,
the feed prediction model component comprises an encoding component whose input is the first vectorized representation and the second vectorized representation, and a decoding component whose input is an output of the encoding component.
3. The feed generation method according to claim 2, wherein,
the encoding component includes a graph convolutional neural network structure for extracting features from the relationship graph.
4. The feed generation method according to claim 3, wherein the graph convolutional neural network structure includes a predetermined number of layers of feature extraction units, and, for two adjacent layers of feature extraction units, an input of the subsequent feature extraction unit depends on an output of the preceding feature extraction unit, a matrix composed of feature vectors of at least some of the nodes in the relationship graph, and a matrix of the association relationships between the nodes.
5. The feed generation method according to claim 4, wherein,
a residual connection is provided between two adjacent layers of feature extraction units.
6. The feed generation method according to claim 2, wherein,
the decoding component includes an attention structure for determining weight assignments for different portions of the output of the encoding component at decoding time and a recurrent neural network structure for generating a decoding result based on the weight assignments.
7. The feed generation method according to claim 2, wherein the step of constructing a relationship graph composed of nodes and connecting lines between the nodes includes:
constructing, according to the text description information of the objects in the set, a first relationship graph composed of text nodes and connecting lines between the text nodes, wherein the text nodes are used for representing texts taken from the text description information; and
constructing, according to the image description information of the objects in the set, a second relationship graph composed of image nodes and connecting lines between the image nodes, wherein the image nodes are used for representing images taken from the image description information.
8. The feed generation method of claim 7, wherein the step of constructing a first relationship graph composed of text nodes and connecting lines between the text nodes includes:
extracting one or more texts from the text description information as text nodes; and
connecting, by a connecting line, two text nodes that appear in the text description information of the same object, and/or connecting, by a connecting line, two text nodes whose correlation is greater than a first threshold.
9. The feed generation method of claim 8, further comprising:
determining a weight of the connecting line, wherein the weight is positively correlated with the number of times the texts represented by the two text nodes appear together in the text description information of the same object, and/or the weight is positively correlated with the correlation.
10. The feed generation method of claim 7, wherein the step of constructing a second relationship graph composed of image nodes and connecting lines between the image nodes includes:
extracting one or more images from the image description information as image nodes; and
connecting, by a connecting line, two image nodes whose corresponding objects have the same text in their text description information, and/or connecting, by a connecting line, two image nodes whose similarity is greater than a second threshold.
11. The feed generation method of claim 10, further comprising:
determining the weight of the connecting line, wherein the weight is positively correlated with the number of the same texts, and/or the weight is positively correlated with the similarity.
12. The feed generation method according to claim 7, wherein,
the encoding component includes a first graph convolutional neural network structure for extracting features from the first relationship graph and a second graph convolutional neural network structure for extracting features from the second relationship graph.
13. The feed generation method according to claim 12, wherein,
the encoding component further comprises an information fusion component for integrating the output of the first graph convolutional neural network structure and the output of the second graph convolutional neural network structure,
the output of the information fusion component serves as the input of the decoding component.
14. The feed generation method of claim 1, wherein the step of constructing a relationship graph composed of nodes and connecting lines between the nodes comprises:
constructing, according to the text description information and the image description information of the objects in the set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes in the relationship graph are divided into text nodes and image nodes, the text nodes are used for representing texts taken from the text description information, and the image nodes are used for representing images taken from the image description information.
15. The feed generation method of claim 14, wherein the step of constructing a relationship graph composed of nodes and connecting lines between the nodes comprises:
extracting one or more texts from the text description information as text nodes;
extracting one or more images from the image description information as image nodes;
and, for two objects whose text description information contains the same text, connecting the image node corresponding to each of the two objects to the text node corresponding to that same text by a connecting line.
16. The feed generation method of claim 1, further comprising:
obtaining the first vectorized representation of the nodes; and/or
obtaining the second vectorized representation of the association relationships.
17. The feed generation method of claim 16, wherein obtaining the first vectorized representation of the node comprises:
in the case where the node is used for representing a text taken from the text description information, taking a word vector of the text as the first vectorized representation of the node; and/or
in the case where the node is used for representing an image taken from the image description information, using a feature extraction component to extract features of the image as the first vectorized representation of the node.
18. The feed generation method according to claim 17, wherein the feature extraction component comprises a residual network and a multi-layer perceptron, an output of the residual network being an input of the multi-layer perceptron.
19. The feed generation method according to any one of claims 1 to 18, wherein
the object is a commodity, the text description information is a commodity title, and the image description information is a commodity picture.
20. An information recommendation method, comprising:
determining a set of a plurality of objects having the same theme recommended to a user;
determining a feed of the set using the feed generation method of any one of claims 1 to 19;
outputting the feed; and
outputting the objects in the set in response to an operation of the user on the feed.
21. An information generating method, comprising:
constructing, according to first type description information and second type description information of objects in a set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes are used for representing first type information taken from the first type description information or second type information taken from the second type description information, and the connecting lines are used for representing association relationships between the nodes;
processing a first vectorized representation of the nodes and a second vectorized representation of the association relationships between the nodes by using an information prediction model component to obtain an information prediction result; and
determining information for the set based on the information prediction result.
22. A feed generation apparatus comprising:
a building module for building, according to text description information and/or image description information of objects in a set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes are texts taken from the text description information or images taken from the image description information, and the connecting lines are used for representing association relationships between the nodes;
a feed prediction module for processing a first vectorized representation of the nodes and a second vectorized representation of the association relationships between the nodes by using a feed prediction model component to obtain a feed prediction result; and
a determination module to determine a feed of the set based on the feed prediction result.
23. An information recommendation apparatus comprising:
the information recommendation module is used for determining a set of a plurality of objects with the same theme recommended to a user;
a feed generation module for determining a feed of the set using the feed generation method of any one of claims 1 to 19;
a first output module for outputting the feed; and
a second output module for outputting the objects in the set in response to an operation of the user on the feed.
24. An information generating apparatus comprising:
a building module for building, according to first type description information and second type description information of objects in a set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes are information taken from the first type description information or information taken from the second type description information, and the connecting lines are used for representing association relationships between the nodes;
an information prediction module for processing a first vectorized representation of the nodes and a second vectorized representation of the association relationships between the nodes by using an information prediction model component to obtain an information prediction result; and
a determination module to determine information of the set based on the information prediction result.
25. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1 to 21.
26. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-21.
Background
For a set comprising a plurality of objects, how to present a feed of the set to a user before the user browses the objects, so that the user can quickly understand the objects in the set through the feed, is a problem that urgently needs to be solved in the field of information recommendation.
Take topic recommendation in the field of commodity information recommendation as an example. Topic recommendation is a currently popular mode of information recommendation in which the system recommends, at one time, a plurality of pieces of information sharing a common topic. For example, assuming that a woman is interested in skirts, the system may recommend a topic about skirts that contains a selection of various skirts, so that the user can compare and purchase commodities in a more targeted way. In addition, commodities that go together can also be combined into a topic; for example, a topic related to "storage" may include commodities with the same function, such as vases, storage racks, photo walls, and curio shelves.
Topic recommendation can save user time and improve the probability that information is accepted by the user, and is therefore a very important function. For example, topic recommendation for commodities saves the user the time of searching for and matching related commodities, while allowing merchants to sell more commodities at one time.
Topic recommendation requires a feed that gives users a quick sense of the objects within the topic. That is, the user should be able to learn roughly what the topic contains from the feed alone, without further clicks.
A good feed should be consistent with the information within the topic. For example, if the recommended topic is "skirt", the generated feed cannot be about coats. Conventional feed generation methods use only the product title as an input source, and product titles contain many noise words, which reduces the accuracy of the generated feed.
Therefore, there is still a need for an information generation scheme that can improve the consistency between the generated feed and the information within the topic.
Disclosure of Invention
One technical problem to be solved by the present disclosure is to provide an information generation scheme capable of improving consistency between generated information and object information in a set, for example, a feed generation scheme capable of improving consistency between a generated feed and the information in a topic.
According to a first aspect of the present disclosure, there is provided a feed generation method including: constructing, according to text description information and image description information of objects in a set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes are used for representing texts taken from the text description information or images taken from the image description information, and the connecting lines are used for representing association relationships between the nodes; processing a first vectorized representation of the nodes and a second vectorized representation of the association relationships between the nodes by using a feed prediction model component to obtain a feed prediction result; and determining a feed of the set based on the feed prediction result.
According to a second aspect of the present disclosure, there is provided an information recommendation method including: determining a set of a plurality of objects having the same topic to be recommended to a user; determining a feed of the set using the feed generation method of the first aspect; outputting the feed; and outputting the objects in the set in response to an operation of the user on the feed.
According to a third aspect of the present disclosure, there is provided an information generating method including: constructing, according to first type description information and second type description information of objects in a set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes are used for representing first type information taken from the first type description information or second type information taken from the second type description information, and the connecting lines are used for representing association relationships between the nodes; processing a first vectorized representation of the nodes and a second vectorized representation of the association relationships between the nodes by using an information prediction model component to obtain an information prediction result; and determining information of the set based on the information prediction result.
According to a fourth aspect of the present disclosure, there is provided a feed generation apparatus including: a building module for building, according to text description information and/or image description information of objects in a set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes are texts taken from the text description information or images taken from the image description information, and the connecting lines are used for representing association relationships between the nodes; a feed prediction module for processing a first vectorized representation of the nodes and a second vectorized representation of the association relationships between the nodes by using a feed prediction model component to obtain a feed prediction result; and a determination module for determining a feed of the set based on the feed prediction result.
According to a fifth aspect of the present disclosure, there is provided an information recommendation apparatus including: an information recommendation module for determining a set of a plurality of objects having the same topic to be recommended to a user; a feed generation module for determining a feed of the set using the feed generation method of the first aspect; a first output module for outputting the feed; and a second output module for outputting the objects in the set in response to an operation of the user on the feed.
According to a sixth aspect of the present disclosure, there is provided an information generating apparatus including: a building module for building, according to first type description information and/or second type description information of objects in a set, a relationship graph composed of nodes and connecting lines between the nodes, wherein the nodes are first type information taken from the first type description information or second type information taken from the second type description information, and the connecting lines are used for representing association relationships between the nodes; an information prediction module for processing a first vectorized representation of the nodes and a second vectorized representation of the association relationships between the nodes by using an information prediction model component to obtain an information prediction result; and a determination module for determining information of the set based on the information prediction result.
According to a seventh aspect of the present disclosure, there is provided a computing device comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of the first to third aspects as described above.
According to an eighth aspect of the present disclosure, there is provided a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method of any of the first to third aspects described above.
Thus, taking feed generation as an example, introducing image information enriches the information sources, so that a more informative feed can be generated from the additional source; and, based on a relationship graph constructed from both the text description information and the image description information of the objects, a feed with high relevance and attractiveness can be generated.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 shows a schematic flow diagram of text graph construction.
Fig. 2 shows a schematic flow diagram of image graph construction.
Fig. 3 shows a schematic flow diagram of heterogeneous graph construction.
Fig. 4 shows a schematic flow diagram for generating a feed based on the Dual-Graph2seq model.
Fig. 5 shows a schematic flow diagram for generating a feed based on the Hetero-Graph2seq model.
Fig. 6 shows a schematic flow diagram of an information generation method according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of the structure of a feed generation apparatus according to an embodiment of the present disclosure.
Fig. 8 is a block diagram illustrating a structure of an information recommendation apparatus according to an embodiment of the present disclosure.
Fig. 9 shows a block diagram of the structure of an information generating apparatus according to an embodiment of the present disclosure.
Fig. 10 shows a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The following takes a topic recommendation scenario as an example to schematically illustrate the process of generating a feed of a set. The feed is used to characterize the commonality of the objects in the set and can be regarded as a summary of the set. The presentation form of the feed may include, but is not limited to, text, pictures, video, and other information formats.
I. Construction of the relationship graph
Generally, the text description information of an object contains many noise words, which hinders the generation of a highly consistent feed, whereas the image description information of an object is highly related to the object itself. The present disclosure therefore introduces the image description information of the objects and uses it, together with the text description information, as the information source for determining the feed of a plurality of objects belonging to the same topic. The image description information may refer to still picture information of the object or video information of the object. Alternatively, the image description information may also refer to description information obtained by recognizing an image using OCR (Optical Character Recognition) technology.
A plurality of objects belonging to the same topic refers to a set of objects that share a common topic and are suitable for being recommended to a user at one time. Determining a feed of a plurality of objects belonging to the same topic means determining a feed of the set. The topic may characterize the category of the objects or the function of the objects. An object represents information recommended to the user, which may be, but is not limited to, commodities or reading materials (such as articles and news).
In order to find the commonalities of the objects in the set, a relationship graph representing the association relationships between the objects in the set can be constructed according to the text description information and the image description information of the objects in the set. The text description information refers to description information of the object in text form, and the image description information refers to description information of the object in image form. Taking a commodity as an example of the object, the text description information may refer to a commodity title, and the image description information may refer to, but is not limited to, a commodity picture and/or a commodity video.
First, the process of constructing the relationship graph is schematically described.
A relationship graph composed of nodes and connecting lines between the nodes can be constructed according to the text description information and the image description information of the objects in the set, wherein the nodes are used for representing texts taken from the text description information or images taken from the image description information, and the connecting lines are used for representing association relationships between the nodes.
As an example, the following three types of relationship graphs can be constructed.
1. Text graph
A relationship graph (for convenience of distinction, may be referred to as a first relationship graph) composed of text nodes and connecting lines between the text nodes may be constructed according to the text description information of the objects in the set.
The nodes in the first relationship graph are all text type nodes (i.e. text nodes), so the first relationship graph may also be referred to as a text graph. The text nodes are used for representing the text taken from the text description information.
Fig. 1 shows a schematic flow diagram of text graph construction.
As shown in Fig. 1, one or more texts may first be extracted from the text description information of the objects in the set as text nodes. For example, texts whose frequency of occurrence in the text description information is greater than a preset value may be extracted as text nodes; alternatively, the importance of each text may be computed in a predetermined manner, and texts whose importance is greater than a preset value, or whose importance ranks near the top, may be extracted as text nodes.
Taking a commodity as the object and a commodity title as the text description information as an example, commodity titles contain a large number of irrelevant noise words. To prevent these noise words from affecting feed generation, keywords can be extracted from the commodity titles as text nodes; for example, words with a word frequency greater than 1 across all commodity titles in the set can be used as keywords.
After the text nodes are obtained, the association relationships among the text nodes can be mined, and text nodes having an association relationship are connected by connecting lines. Whether an association relationship exists between text nodes can be judged in various ways. For example, two text nodes appearing in the text description information of the same object may be connected by a connecting line; that is, if the texts represented by two text nodes both appear in the text description information of one object, the two text nodes may be connected by an edge. As another example, two text nodes whose correlation is greater than a first threshold may also be connected by a connecting line. The manner of calculating the correlation between text nodes is not described in detail in the present disclosure; for example, the correlation may be determined by calculating the text similarity between the texts represented by the text nodes.
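The keyword rule in the example above (word frequency greater than 1 across all commodity titles in the set) can be sketched as follows. This is only an illustrative sketch: the function name extract_keywords, the whitespace tokenization, the lowercasing, and the sample titles are assumptions and are not specified by the present disclosure.

```python
from collections import Counter


def extract_keywords(titles, min_count=2):
    """Return words that occur at least `min_count` times over all titles.

    Assumption: titles are tokenized by whitespace and lowercased; a real
    system would likely use a proper word segmenter.
    """
    counts = Counter(word for title in titles for word in title.lower().split())
    return sorted(word for word, count in counts.items() if count >= min_count)


# Hypothetical commodity titles of one set (topic).
titles = ["red summer skirt", "long summer skirt sale", "pleated skirt"]
keywords = extract_keywords(titles)
print(keywords)
```

Words such as "red" or "sale", which occur only once in the set, are treated as noise and do not become text nodes.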
In this way, a text graph consisting of text nodes and connecting lines between the text nodes can be constructed.
Optionally, a weight of each connecting line can also be determined, the weight representing the closeness of the association relationship between the text nodes at the two ends of the connecting line. Where connecting lines are determined according to whether the texts represented by the text nodes appear in the text description information of the same object, the weight may be positively correlated with the number of times the texts represented by the two text nodes appear together in the text description information of the same object; that is, the higher the co-occurrence frequency of two texts, the closer their relationship. Where connecting lines are determined according to the correlation between text nodes, the weight may be positively correlated with the magnitude of the correlation.
2. Image graph
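As an illustration of the co-occurrence variant described above, the following sketch connects two text nodes whenever their texts appear in the title of the same object and uses the co-occurrence count as the weight of the connecting line. The function name build_text_graph, the whitespace tokenization, and the choice of the raw count as the weight are assumptions; the disclosure only requires the weight to be positively correlated with the co-occurrence count.

```python
from collections import Counter
from itertools import combinations


def build_text_graph(titles, keywords):
    """Return {(kw_a, kw_b): weight} for keywords co-occurring in a title.

    The weight is the number of titles in which both keywords appear.
    Edge keys are ordered alphabetically so each pair is stored once.
    """
    keyword_set = set(keywords)
    edge_weights = Counter()
    for title in titles:
        present = sorted(keyword_set & set(title.lower().split()))
        for kw_a, kw_b in combinations(present, 2):
            edge_weights[(kw_a, kw_b)] += 1
    return dict(edge_weights)


# Hypothetical commodity titles and previously selected keywords.
titles = ["red summer skirt", "long summer skirt sale", "pleated skirt"]
text_edges = build_text_graph(titles, ["skirt", "summer"])
print(text_edges)
```

A threshold on a text similarity score could be used in place of, or in addition to, the co-occurrence test, corresponding to the correlation-based variant in the text.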
A relationship graph (for convenience of distinction, may be referred to as a second relationship graph) composed of image nodes and connecting lines between the image nodes may be constructed according to the image description information of the objects in the set.
The nodes in the second relational graph are all image nodes, so the second relational graph can also be called an image graph. Wherein the image nodes are used for representing the images taken from the image description information.
Fig. 2 shows a schematic flow diagram of image graph construction.
As shown in Fig. 2, one or more images may first be extracted from the image description information of the objects in the set as image nodes. The image description information of an object may include one or more images of the object. The images that come first in the arrangement order of the images in the image description information may be extracted as image nodes; alternatively, the importance of each image may be computed in a predetermined manner, and images whose importance is greater than a predetermined value, or whose importance ranks near the top, may be extracted as image nodes. An image may be a still picture or a video.
After the image nodes are obtained, the association relationships among the image nodes can be mined, and image nodes having an association relationship are connected by connecting lines. Whether an association relationship exists between image nodes can be judged in various ways. For example, two image nodes whose corresponding objects have the same text in their text description information may be connected by a connecting line; that is, if the text description information of the objects corresponding to two image nodes contains the same text, the two image nodes may be connected by an edge. As another example, two image nodes whose similarity is greater than a second threshold may also be connected by a connecting line. The manner of calculating the similarity between image nodes is not described in detail in the present disclosure; for example, the similarity may be determined by calculating the image similarity between the images represented by the image nodes.
In this way, an image graph consisting of image nodes and connecting lines between the image nodes can be constructed.
Optionally, the weight of the connecting line can also be determined, and the weight is used for representing the intimacy degree of the incidence relation of the image nodes at the two ends of the connecting line. Taking the example of determining the connection line between the image nodes according to whether the text description information of the object corresponding to the image node has the same text or not, the weight may be positively correlated with the number of the same text, that is, if the number of the same text in the text description information of the object corresponding to the two image nodes is more, the relationship between the two image nodes is more intimate, and the weight is larger. Taking the example of determining the connection line between the image nodes according to the similarity between the image nodes, the weight may be positively correlated with the magnitude of the similarity.
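The two ways of connecting image nodes described above can be sketched as follows. This is a hedged illustration only: the function names, the dictionary layout of each object, the use of cosine similarity over precomputed feature vectors, and the threshold value 0.9 are assumptions, since the disclosure does not fix the similarity measure or the second threshold.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_image_graph(objects, sim_threshold=0.9):
    """Connect image nodes that share title keywords or look similar.

    `objects` is a list of dicts with keys 'image_id', 'keywords' (set of
    texts from the object's title) and 'feature' (image feature vector).
    Each edge stores both candidate weights: the number of shared texts
    and the image similarity.
    """
    edges = {}
    for a, b in combinations(objects, 2):
        shared = len(a["keywords"] & b["keywords"])
        sim = cosine(a["feature"], b["feature"])
        if shared > 0 or sim > sim_threshold:
            edges[(a["image_id"], b["image_id"])] = {
                "shared_texts": shared,
                "similarity": sim,
            }
    return edges


# Hypothetical objects with toy two-dimensional image features.
objects = [
    {"image_id": "I_A", "keywords": {"skirt", "summer"}, "feature": [1.0, 0.0]},
    {"image_id": "I_B", "keywords": {"skirt"}, "feature": [0.0, 1.0]},
    {"image_id": "I_C", "keywords": {"vase"}, "feature": [0.0, 2.0]},
]
image_edges = build_image_graph(objects)
print(sorted(image_edges))
```

In this toy input, I_A and I_B are connected because their titles share a keyword, while I_B and I_C are connected only because their feature vectors point in the same direction.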
3. Heterogeneous graph
A relationship graph composed of nodes and connecting lines among the nodes is constructed according to both the text description information and the image description information of the objects in the set. Unlike the text graph and the image graph, the relationship graph constructed here includes two types of nodes, namely text nodes and image nodes. Accordingly, the relationship graph constructed here may be referred to as a heterogeneous graph. The text nodes are used for representing texts taken from the text description information, and the image nodes are used for representing images taken from the image description information.
Fig. 3 shows a schematic construction flow diagram of the heterogeneous graph.
As shown in fig. 3, first, one or more texts may be extracted from the text description information of the objects in the set as text nodes; and extracting one or more images from the image description information of the objects in the set as image nodes. The order of generating the text nodes and the image nodes is not limited in the disclosure. For the generation manner of the text node and the image node, the above related description may be referred to, and details are not repeated here.
After the text nodes and the image nodes are obtained, the association relationship between the nodes can be mined, and the nodes with the association relationship are connected by using connecting lines.
As an example, the image nodes corresponding to two objects having the same text in the text description information may be connected with the text nodes corresponding to the same text by using connecting lines. Thus, in a heterogeneous graph, text nodes act as bridges to explicitly build relationships between image nodes.
Taking the case where the object is a commodity, the text description information is the commodity title, and the image description information is the commodity picture as an example, if the title of commodity A and the title of commodity B share a keyword W, the picture node I_A of commodity A is connected to the keyword node W, and the picture node I_B of commodity B is also connected to W. In the heterogeneous graph, the pictures of the commodities and the keywords of the commodities are both nodes of the graph, and the text information (the keywords) serves as a bridge that explicitly models the relationship between the pictures of the commodities.
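As a non-limiting illustration, the bridging rule of the commodity example can be sketched as follows. The function name `build_hetero_graph` and the node labels prefixed with `I_` and `T_` are hypothetical and are used only to make the two node types visible.

```python
from collections import defaultdict


def build_hetero_graph(keywords_by_object):
    """Connect image nodes to the text nodes of keywords they share.

    Returns a set of (image_node, text_node) edges. A keyword becomes a
    bridge only when at least two objects have it in their descriptions.
    """
    objects_by_keyword = defaultdict(set)
    for obj, keywords in keywords_by_object.items():
        for keyword in keywords:
            objects_by_keyword[keyword].add(obj)

    edges = set()
    for keyword, objects in objects_by_keyword.items():
        if len(objects) >= 2:
            for obj in objects:
                # Image node of the object <-> text node of the keyword.
                edges.add((f"I_{obj}", f"T_{keyword}"))
    return edges


hetero_edges = build_hetero_graph({"A": {"W", "red"}, "B": {"W", "blue"}})
```

Here the keyword W is shared by commodities A and B, so both picture nodes are attached to the same text node, while the unshared keywords produce no edges.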
Vectorization representation of available information in relational graph
The available information in the relationship graph includes the nodes in the graph and the association relationships between the nodes. A vectorized representation of the available information is a vectorized representation of the features of that information.
Vectorized representations of nodes in the relationship graph (which may be referred to as first vectorized representations for ease of distinction) may be obtained, and vectorized representations of associations between nodes in the relationship graph (which may be referred to as second vectorized representations for ease of distinction) may be obtained.
The first vectorized representation may be used to characterize the features extracted from the nodes, i.e., it may be considered a vectorized representation of the features of the nodes. The second vectorized representation may be used to characterize the features extracted from the association relationships, i.e., it may be considered a vectorized representation of the features of the association relationships between the nodes. The second vectorized representation may indicate whether nodes have an association relationship and how close that relationship is. The process of obtaining the second vectorized representation is not the focus of the present disclosure and is not described here. The following is merely an exemplary illustration of how the vectorized representations of text nodes and image nodes may be obtained.
1. Text node
In the case where a node is used to characterize a text taken from the text description information (i.e., the node is a text node), the word vector of the text may be taken as the first vectorized representation of the node. In addition, other text information of the object may be added to the vectorized representation of the node in other ways to obtain better performance. That is, the first vectorized representation may include, besides the word vector of the text characterized by the text node, word vectors of other text information (e.g., context information).
2. Image node
In the case where a node is used to characterize an image taken from the image description information (i.e., the node is an image node), a feature extraction component may be used to extract features of the image as the first vectorized representation of the node.
As an example, the feature extraction component may include a residual network (ResNet) and a multi-layer perceptron (MLP), the output of the residual network being the input of the multi-layer perceptron. For example, ResNet50 may be used as the feature extractor to extract features of the image (e.g., a picture of a commodity) characterized by an image node. Specifically, a ResNet50 network pre-trained on ImageNet (a picture collection) may be used, and the parameters of the ResNet50 network may be re-adjusted (e.g., fine-tuned) according to the category of the commodity. The 2048-dimensional features after the last pooling layer are then extracted for all pictures. The extracted features may then be fed into the MLP, and in subsequent training only the MLP may be trained:
$\mathrm{MLP}(X_i) = W_2 \cdot \mathrm{ReLU}(W_1 X_i)$
In the above formula, $W_1$ and $W_2$ are trainable parameter matrices, $X_i$ is the feature extracted using ResNet50, and ReLU is the activation function.
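As a non-limiting illustration, the two-layer projection W2*ReLU(W1*Xi) can be evaluated as follows. The three-dimensional input and the small matrices are toy stand-ins chosen for readability; in the example above the input would be a 2048-dimensional ResNet50 feature, and the parameter values would be learned rather than fixed.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in vector]


def mlp(x, w1, w2):
    """Compute W2 * ReLU(W1 * x); bias terms are omitted, as in the formula."""
    return matvec(w2, relu(matvec(w1, x)))


# Toy values (assumptions): a 3-d image feature and 2x3 / 2x2 parameter matrices.
x_i = [1.0, -2.0, 0.5]
w1 = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
w2 = [[1.0, 1.0], [0.5, -1.0]]
node_vector = mlp(x_i, w1, w2)
```

The hidden activation here is [2.0, 0.0] after ReLU, so the second hidden unit contributes nothing to the resulting node vector.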
Third, feed generation mode
The first vectorized representation of the nodes and the second vectorized representation of the association relationships between the nodes may be processed using a pre-trained feed prediction model component to obtain a feed prediction result. A feed of the set may be determined based on the feed prediction result. The feed prediction result may be the prediction result with the highest probability output by the feed prediction model component. The prediction result output by the feed prediction model component may be used directly as the feed of the set, or it may be adjusted and the adjusted result used as the feed of the set. The feed is used to characterize the commonality of the objects in the set and can be regarded as a summary of the set. The presentation form of the feed may include, but is not limited to, text, pictures, video, and other information formats. For example, the generated feed may be regarded as a title of the set, and the presentation form of the title may include, but is not limited to, text, pictures, videos, and the like.
The feed prediction model component can employ, but is not limited to, a sequence-to-sequence model architecture. Thus, the feed prediction model component may comprise an encoding component and a decoding component, the input of the encoding component being the first vectorized representation and the second vectorized representation, and the output of the encoding component being the input of the decoding component.
The decoding component may employ an attention-based recurrent neural network (RNN) structure, i.e., the decoding component may comprise an attention structure for determining, at decoding time, the weights assigned to different parts of the output of the encoding component, and a recurrent neural network structure for generating decoding results based on those weights. The decoding principle of the decoding component is not the focus of the present disclosure and is therefore not described in detail.
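As a non-limiting illustration of how an attention structure may assign weights to different parts of the encoder output, the following sketch uses dot-product scoring followed by a softmax. The scoring function is an assumption of this sketch; the disclosure does not specify one, and the recurrent part of the decoder is omitted.

```python
import math


def attention_context(decoder_state, encoder_outputs):
    """Return (weights, context) for one decoding step.

    Scores are dot products between the decoder state and each encoder
    output vector (an assumed scoring function); weights are their softmax.
    """
    scores = [sum(s * e for s, e in zip(decoder_state, enc))
              for enc in encoder_outputs]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_outputs[0])
    context = [sum(w * enc[d] for w, enc in zip(weights, encoder_outputs))
               for d in range(dim)]
    return weights, context


weights, context = attention_context([1.0, 0.0], [[2.0, 0.0], [2.0, 4.0]])
```

Both encoder vectors receive the same score in this toy input, so the context vector is their plain average.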
In order to accurately extract the features of the nodes and of the association relationships between the nodes in the relationship graph, the encoding component may adopt a graph convolutional neural network (GCN) structure and use the GCN to extract features from the relationship graph.
The graph convolutional neural network structure includes a predetermined number of layers of feature extraction units. The input of the subsequent feature extraction unit in the two adjacent layers of feature extraction units depends on the output of the previous feature extraction unit, a matrix formed by feature vectors of at least part of nodes in the relational graph and a matrix of incidence relation among the nodes.
As an example, the graph convolutional neural network structure includes l + 1 layers of feature extraction units, where l is an integer greater than or equal to 0, and the (l + 1)-th layer feature extraction unit may extract features in the following manner:
$H^{l+1} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{l}\,W^{l+1}\right)$
where $\tilde{A} = A + I$, $A$ is the matrix representing the association relationships between nodes, $I$ is the identity matrix, $H^{l}$ denotes the features extracted by the l-th layer, the input layer is $H^{0} = X$, $X$ is the matrix formed by the feature vectors of at least some of the nodes in the relationship graph, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $W^{l+1}$ is the parameter of the (l + 1)-th layer, and $\sigma$ is the activation function.
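As a non-limiting illustration, one layer of feature extraction can be sketched as follows, assuming the widely used symmetric-normalized propagation rule H^{l+1} = σ(D̃^{-1/2} Ã D̃^{-1/2} H^l W^{l+1}) with Ã = A + I and D̃ the degree matrix of Ã. The choice of ReLU for σ, the helper names, and the toy two-node graph are assumptions of this sketch.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for adjacency matrix A."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]  # diagonal of D~
    return [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j])
             for j in range(n)] for i in range(n)]


def gcn_layer(a_hat, h, w):
    """One propagation step with ReLU as the activation function."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Toy graph: two connected nodes, one-hot input features, identity weights.
adj = [[0.0, 1.0], [1.0, 0.0]]
x = [[1.0, 0.0], [0.0, 1.0]]
w_1 = [[1.0, 0.0], [0.0, 1.0]]
a_hat = normalize_adjacency(adj)
h_1 = gcn_layer(a_hat, x, w_1)
```

After one layer each node's feature is an equal mix of its own feature and its neighbour's, which is the smoothing behaviour that deeper stacks amplify.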
To alleviate the over-smoothing problem in GCNs, residual connections may be added between layers; that is, two adjacent layers of feature extraction units are joined by a residual connection. The specific form of the residual connection between layers is not described in detail in the present disclosure.
As an example, when residual connections are added, the output of the encoding component can be expressed as:
$g_{out} = \tanh(W_o\, g^{l+1})$;
$g^{l+1} = H^{l+1} + g^{l}$,
where $g^{l+1}$ is the representation obtained via the (l + 1)-th encoding layer and $g^{l}$ is the representation obtained via the l-th encoding layer; $g^{l+1}$ is then fed into a fully connected layer, and the activation function tanh is applied to obtain the representation $g_{out}$ for the decoding end.
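As a non-limiting illustration, the residual step g^{l+1} = H^{l+1} + g^l and the output projection g_out = tanh(W_o g^{l+1}) can be sketched as follows. Applying W_o to each node's row separately, and the toy values, are assumptions of this sketch.

```python
import math


def residual_step(h_next, g_prev):
    """g^{l+1} = H^{l+1} + g^{l}, element-wise over node-by-feature matrices."""
    return [[h + g for h, g in zip(h_row, g_row)]
            for h_row, g_row in zip(h_next, g_prev)]


def encoder_output(g_last, w_o):
    """g_out = tanh(W_o g^{l+1}), applied to each node's feature vector."""
    return [[math.tanh(sum(w * g for w, g in zip(w_row, g_row)))
             for w_row in w_o] for g_row in g_last]


# Toy values (assumptions): one node with two features.
g_prev = [[1.0, 0.0]]
h_next = [[0.5, 0.5]]
w_o = [[1.0, 0.0], [1.0, -3.0]]
g_next = residual_step(h_next, g_prev)
g_out = encoder_output(g_next, w_o)
```

The addition keeps the previous layer's representation in the sum, so a node's own signal is not fully replaced by the neighbourhood average.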
As described above, the constructed relationship graph can be either a homogeneous graph (a text graph or an image graph) or a heterogeneous graph. The heterogeneous graph contains both image nodes and text nodes, and for the heterogeneous graph the cross-modal information of the image nodes and the text nodes can be extracted directly using a GCN.
In order to be able to use information from both images and text when the constructed relationship graphs are homogeneous graphs, namely a text graph and an image graph, the encoding component may comprise an information fusion component in addition to a first graph convolutional neural network structure and a second graph convolutional neural network structure. The first graph convolutional neural network structure is used for extracting features from the first relationship graph (namely the text graph), and the second graph convolutional neural network structure is used for extracting features from the second relationship graph (namely the image graph). The information fusion component is used for integrating the output of the first graph convolutional neural network structure and the output of the second graph convolutional neural network structure. The output of the information fusion component serves as the input of the decoding component.
In other words, the present disclosure proposes feed prediction model components with two different model structures, one for homogeneous graphs (text graphs and image graphs) and one for heterogeneous graphs. For ease of distinction, these two feed prediction model components may be referred to as the Dual-Graph2seq model and the Hetero-Graph2seq model, respectively.
1. Dual-Graph2seq model
The Dual-Graph2seq model includes an encoding component and a decoding component. The encoding component comprises a first graph convolutional neural network structure, a second graph convolutional neural network structure, and an information fusion component. For the graph convolutional neural network structures and the decoding component, the above description may be referred to, and details are not repeated here.
FIG. 4 shows a schematic flow diagram for generating a feed based on the Dual-Graph2seq model.
As shown in fig. 4, the text graph and the image graph may each be encoded using a graph encoding technique, the encoding results may be fused, and the fused encoding information may be input to the decoding component.
The present disclosure may use a gating (Adaptive Gate) mechanism to control the fusion of the encoded information of the text graph and the image graph. Alternatively, the encoding result of the text graph and the encoding result of the image graph may be directly concatenated, and an attention mechanism may then be used to obtain a context vector for decoding. The fusion of the encoded information of the text graph and the image graph may also be realized by using a multi-head attention mechanism after graph encoding. The encoding result of the text graph and the encoding result of the image graph may also be processed using a pooling method to obtain a context vector for decoding.
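As a non-limiting illustration of the gating option, the following sketch mixes a text-graph encoding and an image-graph encoding with a single scalar gate. The specific gate form (a sigmoid over a linear score, then a convex combination) and all parameter names are assumptions of this sketch; the disclosure does not fix the form of the gate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(text_vec, image_vec, gate_w_text, gate_w_image, bias=0.0):
    """Fuse two encodings as gate * text + (1 - gate) * image.

    The gate is a scalar in (0, 1) computed from both encodings; the gate
    weights and bias would be learned in practice.
    """
    score = (sum(w * t for w, t in zip(gate_w_text, text_vec))
             + sum(w * v for w, v in zip(gate_w_image, image_vec))
             + bias)
    gate = sigmoid(score)
    return [gate * t + (1.0 - gate) * v for t, v in zip(text_vec, image_vec)]


# With all-zero gate parameters the gate is 0.5, i.e. a plain average.
fused = gated_fusion([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

A large positive bias would push the gate toward the text encoding and a large negative bias toward the image encoding, which is how such a gate can adapt the mix per input.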
2. Hetero-Graph2seq model
The Hetero-Graph2seq model includes an encoding component and a decoding component. The encoding component comprises a graph convolutional neural network structure. For the graph convolutional neural network structure and the decoding component, the above description may be referred to, and details are not repeated here.
FIG. 5 shows a schematic flow diagram for generating a feed based on the Hetero-Graph2seq model.
As shown in fig. 5, the cross-modal information in the images and texts in the heterogeneous graph can be directly encoded by graph encoding technology (such as GCN), and the encoding result can be used as the input of the decoding component. Wherein the decoding component can decode based on the attention mechanism.
In summary, if text information is used alone, the model tends to overfit to noisy words and generate a title inconsistent with the theme. Image information, by contrast, is relatively clean and highly correlated with the theme, so adding image information may to some extent counteract overfitting. Therefore, introducing image information enriches the information sources and makes it possible to generate a more informative feed from the additional information sources.
In addition, in order to find the commonalities of the objects in the set, the present disclosure constructs a relationship graph based on both the text description information and the image description information of the objects, and extracts features using graph encoding techniques, so as to generate a highly relevant and attractive feed.
By way of example, while generating the feed, a degree of contribution of the objects in the set to the generation of the feed may also be output; e.g., the proportion of the generated feed that comes from one or more objects in the set may be output. By presenting this proportional relationship between the output feed and the objects in the set, the user can further understand how closely the objects in the set relate to the feed.
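As a non-limiting illustration, one simple way to express such a proportion is a token-overlap heuristic, sketched below. The disclosure does not specify how the degree of contribution is computed, so the heuristic, the function name `contribution_shares`, and the sample tokens are all assumptions of this sketch.

```python
def contribution_shares(feed_tokens, tokens_by_object):
    """Share of feed tokens attributable to each object.

    A feed token counts toward an object when it appears in that object's
    text description; the counts are then normalized to sum to 1.
    """
    counts = {obj: sum(1 for token in feed_tokens if token in tokens)
              for obj, tokens in tokens_by_object.items()}
    total = sum(counts.values())
    if total == 0:
        return {obj: 0.0 for obj in counts}
    return {obj: count / total for obj, count in counts.items()}


shares = contribution_shares(
    ["summer", "linen", "dress"],
    {"A": {"summer", "dress"}, "B": {"linen"}},
)
```

In this toy input two of the three feed tokens come from object A and one from object B, so A is shown as contributing about two thirds of the feed.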
The present disclosure also provides an information recommendation method, including: determining a set of a plurality of objects having the same theme to be recommended to a user; determining a feed of the set using the feed generation method mentioned above; outputting the feed; and, in response to an operation of the user on the feed, outputting the objects in the set.
The present disclosure may also be implemented as an information generating method that may determine information of a set according to first type description information and second type description information of objects in the set. The determined information may be used to characterize a characteristic common to the objects in the set. That is, the determined information may be considered an information representation of a characteristic common to the objects in the set. For example, the information determined herein may refer to information that summarizes the objects in the set, i.e., the feed mentioned above. The user can thus quickly learn about the objects in the set through the determined information.
As an example, a relationship graph composed of nodes and connecting lines between the nodes may be constructed according to first type description information and second type description information of objects in the set, where the nodes are used to represent first type information taken from the first type description information or second type information taken from the second type description information, and the connecting lines are used to represent association relationships between the nodes; the first vectorized representation of the nodes and the second vectorized representation of the association relationships between the nodes are processed using an information prediction model component to obtain an information prediction result; and the information of the set is determined based on the information prediction result.
The first type description information and the second type description information refer to information having different formats. For example, the first type of description information may refer to description information in a text form (i.e., the text description information mentioned above), and the second type of description information may refer to description information in an image form (i.e., the image description information mentioned above). The description information in the form of an image may be further subdivided into picture description information and video description information. Thus, the first type of description information and the second type of description information can be any two of a plurality of types of description information such as texts, pictures, videos and the like.
The information format of the first type information is the same as that of the first type description information; taking the first type description information being text description information as an example, the first type information is a text. Correspondingly, the information format of the second type information is the same as that of the second type description information; taking the second type description information being image description information as an example, the second type information is an image.
For a specific implementation process of the information generation method, reference may be made to the above related description, and an implementation flow of the information generation method is schematically described below by taking the first type of description information as text description information, the second type of description information as image description information, and the generated information as a text.
FIG. 6 shows a schematic flow diagram of a text generation method according to one embodiment of the present disclosure.
As shown in fig. 6, a relationship diagram composed of nodes and connecting lines between the nodes may be first constructed according to the text description information and the image description information of the objects in the set. The nodes in the relational graph are used for representing texts taken from the text description information or images taken from the image description information, and the connecting lines in the relational graph are used for representing the incidence relation among the nodes. The construction process of the relationship diagram can be referred to the above related description, and is not repeated here.
After the relationship graph is constructed, the first vectorized representation of the nodes and the second vectorized representation of the association relationships between the nodes may be processed using a text prediction model component to obtain a text prediction result. Regarding the structure and prediction principle of the text prediction model component, reference may be made to the above-mentioned feed prediction model component; that is, the feed prediction model component may be regarded as a special text prediction model component.
Finally, the text of the set may be determined based on the text prediction results. After the text of the collection is determined, the text may be presented to the user to facilitate the user's quick understanding of the objects in the collection through the text.
Fig. 7 shows a block diagram of the structure of a feed generation apparatus according to an embodiment of the present disclosure. The functional blocks of the feed generation apparatus can be implemented by hardware, software, or a combination of hardware and software implementing the principles of the present disclosure. It will be appreciated by those skilled in the art that the functional blocks described in fig. 7 may be combined or divided into sub-blocks to implement the principles of the invention described above. Thus, the description herein may support any possible combination, or division, or further definition of the functional modules described herein.
In the following, the functional modules that the feed generation apparatus may have and the operations that each functional module may perform are briefly described; for details, refer to the related descriptions above in conjunction with fig. 1 to fig. 5, which are not repeated here.
Referring to fig. 7, the feed generation apparatus 700 includes a construction module 710, a feed prediction module 720, and a determination module 730.
The construction module 710 is configured to construct a relationship graph composed of nodes and connecting lines between the nodes according to the text description information and/or the image description information of the objects in the set, where the nodes are texts from the text description information or images from the image description information, and the connecting lines are used to represent association relationships between the nodes.
The construction module 710 may construct a first relationship graph composed of text nodes and connecting lines between the text nodes according to the text description information of the objects in the set, where the text nodes are used to represent texts taken from the text description information; and the construction module 710 may construct a second relationship graph composed of image nodes and connecting lines between the image nodes according to the image description information of the objects in the set, the image nodes being used to characterize the image taken from the image description information.
In constructing the first relationship graph, the construction module 710 may extract one or more texts from the text description information as text nodes; two text nodes appearing in the text description information of the same object are connected by a connecting line, and/or two text nodes with the relevance larger than a first threshold value are connected by a connecting line.
In constructing the second relationship graph, the construction module 710 may extract one or more images from the image description information as image nodes; and connecting two image nodes with the same text in the text description information by using a connecting line, and/or connecting two image nodes with the similarity larger than a second threshold value by using a connecting line.
The feed generation apparatus 700 may also include a weight determination module. The weight determination module may be configured to determine the weights of the connecting lines in the first relationship graph and/or the second relationship graph. For the first relationship graph, the weight is positively correlated with the number of times the texts represented by the two text nodes appear in the text description information of the same object, and/or the weight is positively correlated with the correlation. For the second relationship graph, the weight is positively correlated with the number of identical texts, and/or the weight is positively correlated with the similarity.
The construction module 710 may further construct a relationship graph (i.e., the above-mentioned heterogeneous graph) composed of nodes and connecting lines between the nodes according to the text description information and the image description information of the objects in the set, where the nodes in the relationship graph are divided into text nodes and image nodes, the text nodes are used for representing texts taken from the text description information, and the image nodes are used for representing images taken from the image description information.
When constructing a heterogeneous graph, the construction module 710 may extract one or more texts from the text description information as text nodes; extract one or more images from the image description information as image nodes; and connect, using connecting lines, the image nodes corresponding to two objects having the same text in their text description information with the text node corresponding to that same text.
The feed prediction module 720 is configured to process the first vectorized representation of the nodes and the second vectorized representation of the association relationships between the nodes using a feed prediction model component to obtain a feed prediction result. For the structure of the feed prediction model component, see the above description, which is not repeated here.
The determination module 730 is configured to determine a feed of the set based on the feed prediction result. For example, the feed prediction module 720 may output the text with the highest prediction probability, and the determination module 730 may directly use the text output by the feed prediction module 720 as the feed of the set, or may adjust the text output by the feed prediction module 720 and use the adjusted text as the feed of the set.
The feed generation apparatus 700 may further include a first acquisition module and/or a second acquisition module. The first acquisition module is used for acquiring the first vectorized representation of the nodes, and the second acquisition module is used for acquiring the second vectorized representation of the association relationships among the nodes.
In the case where a node is used to characterize a text taken from the text description information, the first acquisition module may take the word vector of the text as the first vectorized representation of the node.
In the case where the node is used to characterize an image taken from the image description information, the first obtaining module may extract features of the image as a first vectorized representation of the node using the feature extraction component. For the structure of the feature extraction component, reference may be made to the above-mentioned related description, and details are not repeated here.
Fig. 8 is a block diagram illustrating a structure of an information recommendation apparatus according to an embodiment of the present disclosure. Wherein the functional blocks of the information recommendation device can be implemented by hardware, software, or a combination of hardware and software implementing the principles of the present disclosure. It will be appreciated by those skilled in the art that the functional blocks described in fig. 8 may be combined or divided into sub-blocks to implement the principles of the invention described above. Thus, the description herein may support any possible combination, or division, or further definition of the functional modules described herein.
In the following, functional modules that the information recommendation apparatus can have and operations that each functional module can perform are briefly described, and details related thereto may be referred to the above description, and are not repeated here.
Referring to fig. 8, the information recommendation apparatus 800 includes an information recommendation module 810, a feed generation module 820, a first output module 830, and a second output module 840.
The information recommendation module 810 is used to determine a set of multiple objects having the same theme for recommendation to a user. The set refers to a collection of objects that have a common theme and are suitable for being recommended to the user at one time. The theme may characterize the category of the objects, or may characterize functions of the objects. An object represents information recommended to the user, and the information can take various forms, including but not limited to commodities and reading materials (such as articles, news, and the like).
The feed generation module 820 is used to determine a feed of the set using the feed generation method mentioned above. The feed generation module 820 corresponds to the feed generation apparatus mentioned above; for the structure of the feed generation module 820 and the process of generating the feed, refer to the above description in conjunction with fig. 7.
The first output module 830 is used for outputting the feed. The user may perform a click operation on the feed to access the objects in the set to which the feed corresponds. The second output module 840 may output the objects in the set in response to a user operation on the feed.
Fig. 9 shows a block diagram of the structure of an information generating apparatus according to an embodiment of the present disclosure. Wherein the functional blocks of the information generating apparatus may be implemented by hardware, software, or a combination of hardware and software implementing the principles of the present disclosure. It will be appreciated by those skilled in the art that the functional blocks described in fig. 9 may be combined or divided into sub-blocks to implement the principles of the invention described above. Thus, the description herein may support any possible combination, or division, or further definition of the functional modules described herein.
In the following, functional modules that the information generating apparatus may have and operations that each functional module may perform are briefly described, and for details related thereto, reference may be made to the related description above in conjunction with fig. 1 to 6, which is not described herein again.
Referring to fig. 9, the information generating apparatus 900 includes a constructing module 910, an information predicting module 920, and a determining module 930.
The constructing module 910 is configured to construct a relationship graph composed of nodes and connection lines between the nodes according to the first type description information and the second type description information of the objects in the set, where a node is the first type information extracted from the first type description information or the second type information extracted from the second type description information, and a connection line is used to represent an association relationship between nodes.
The constructing module 910 may construct a first relationship graph composed of first-type nodes (e.g., text nodes) and connecting lines between the first-type nodes according to the first type description information of the objects in the set, where the first-type nodes are used to represent the first type information taken from the first type description information; and the constructing module 910 may construct a second relationship graph composed of second-type nodes (e.g., image nodes) and connecting lines between the second-type nodes according to the second type description information of the objects in the set, where the second-type nodes are used to represent the second type information taken from the second type description information.
In constructing the first relationship graph, the constructing module 910 may extract one or more pieces of first-type information from the first-type description information as first-type nodes. It may then connect, by a connecting line, two first-type nodes that appear in the first-type description information of the same object, and/or connect, by a connecting line, two first-type nodes whose correlation is greater than a first threshold.
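The two connection rules for the first relationship graph can be illustrated with a minimal sketch. It assumes that the first-type information has already been extracted as a list of terms per object. The function name `build_text_graph`, the optional `correlation` callable, and the default threshold value are hypothetical and are not taken from the disclosure.

```python
from itertools import combinations


def build_text_graph(object_terms, correlation=None, first_threshold=0.5):
    """Return (nodes, edges) of an undirected graph over text nodes.

    object_terms: dict mapping an object id to the terms (first-type
        information) extracted from that object's description.
    correlation: optional, hypothetical callable (term_a, term_b) -> float.
    first_threshold: illustrative value only; the source does not fix one.
    """
    nodes = sorted({t for terms in object_terms.values() for t in terms})
    edges = set()
    # Rule 1: connect two terms that appear in the description of the same object.
    for terms in object_terms.values():
        for a, b in combinations(sorted(set(terms)), 2):
            edges.add((a, b))
    # Rule 2 (optional): connect two terms whose correlation exceeds the threshold.
    if correlation is not None:
        for a, b in combinations(nodes, 2):
            if correlation(a, b) > first_threshold:
                edges.add((a, b))
    return nodes, edges


object_terms = {
    "item_1": ["dress", "summer", "floral"],
    "item_2": ["dress", "beach"],
}
nodes, edges = build_text_graph(object_terms)
```

Each edge is stored as an alphabetically ordered pair so that the same connecting line is never recorded twice.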
In constructing the second relationship graph, the constructing module 910 may extract one or more pieces of second-type information from the second-type description information as second-type nodes. It may then connect, by a connecting line, two second-type nodes whose corresponding objects have the same first-type information in their first-type description information, and/or connect, by a connecting line, two second-type nodes whose similarity is greater than a second threshold.
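A comparable sketch for the second relationship graph follows. It assumes each image node comes with the terms of its object and with a feature vector, and it uses cosine similarity as the similarity measure. The disclosure does not name a measure, so cosine similarity, the function names, and the threshold value are all assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_image_graph(image_terms, image_features, second_threshold=0.9):
    """Return the set of undirected edges between image nodes.

    image_terms: dict image id -> terms of the object the image belongs to.
    image_features: dict image id -> feature vector (hypothetical input).
    second_threshold: illustrative value only.
    """
    edges = set()
    for a, b in combinations(sorted(image_terms), 2):
        # Rule 1: the two objects share at least one piece of first-type information.
        shares_term = bool(set(image_terms[a]) & set(image_terms[b]))
        # Rule 2: the two images are more similar than the second threshold.
        similar = cosine(image_features[a], image_features[b]) > second_threshold
        if shares_term or similar:
            edges.add((a, b))
    return edges


image_terms = {
    "img_1": ["dress", "summer"],
    "img_2": ["dress", "beach"],
    "img_3": ["shoes"],
}
image_features = {
    "img_1": [1.0, 0.0],
    "img_2": [0.0, 1.0],
    "img_3": [0.99, 0.05],
}
image_edges = build_image_graph(image_terms, image_features)
```

Here `img_1` and `img_2` are connected because both objects carry the term "dress", while `img_1` and `img_3` are connected only because their feature vectors are nearly parallel.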
The information generating apparatus 900 may further include a weight determining module. The weight determining module may be configured to determine the weights of the connecting lines in the first relationship graph and/or the second relationship graph. For the first relationship graph, the weight of a connecting line is positively correlated with the number of times the first-type information represented by the two connected first-type nodes appears together in the first-type description information of the same object, and/or the weight is positively correlated with the correlation between the two nodes. For the second relationship graph, the weight of a connecting line is positively correlated with the number of pieces of identical first-type information shared by the two corresponding objects, and/or the weight is positively correlated with the similarity between the two nodes.
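The weighting can be sketched as follows. The disclosure only requires that a weight be positively correlated with the relevant count, so using the raw count itself as the weight is an assumption made here for simplicity. Both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(object_terms):
    """Weight of a text-text edge: number of objects whose description
    contains both terms (raw count, one possible monotonic choice)."""
    counts = Counter()
    for terms in object_terms.values():
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return dict(counts)


def shared_term_weights(image_terms):
    """Weight of an image-image edge: number of terms the two objects share."""
    weights = {}
    for a, b in combinations(sorted(image_terms), 2):
        shared = len(set(image_terms[a]) & set(image_terms[b]))
        if shared:
            weights[(a, b)] = shared
    return weights


text_weights = cooccurrence_weights({
    "item_1": ["dress", "summer"],
    "item_2": ["dress", "summer", "beach"],
})
image_weights = shared_term_weights({
    "img_1": ["dress", "summer"],
    "img_2": ["dress", "summer", "beach"],
    "img_3": ["shoes"],
})
```

Any other monotonically increasing function of the count, such as a logarithm or a normalized frequency, would satisfy the same positive-correlation requirement.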
The constructing module 910 may further construct a relationship graph (e.g., the above-mentioned heteromorphic graph) composed of nodes and connecting lines between the nodes according to the first-type description information and the second-type description information of the objects in the set, where the nodes in the relationship graph are divided into first-type nodes and second-type nodes (e.g., the above-mentioned text nodes and image nodes), the first-type nodes are used to represent the first-type information extracted from the first-type description information, and the second-type nodes are used to represent the second-type information extracted from the second-type description information.
In constructing the heteromorphic graph, the constructing module 910 may extract one or more pieces of first-type information from the first-type description information as first-type nodes, and extract one or more pieces of second-type information from the second-type description information as second-type nodes. When the first-type description information of two objects contains the same first-type information, the constructing module 910 may connect, by connecting lines, the second-type node of each of the two objects with the first-type node corresponding to that shared first-type information.
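The connection rule for the graph with two node types can be sketched as below, under the assumption that each object contributes one image and a list of terms. The input layout and the name `build_heterogeneous_graph` are hypothetical.

```python
from collections import defaultdict


def build_heterogeneous_graph(objects):
    """Return (text_nodes, image_nodes, edges) for a graph with two node types.

    objects: dict object id -> {"terms": [...], "image": image id}
        (hypothetical layout; one image per object is an assumption).
    An image node is linked to a text node only when the term is shared
    by the descriptions of at least two objects.
    """
    term_to_images = defaultdict(set)
    for obj in objects.values():
        for term in obj["terms"]:
            term_to_images[term].add(obj["image"])
    text_nodes = sorted(term_to_images)
    image_nodes = sorted({obj["image"] for obj in objects.values()})
    edges = set()
    for term, images in term_to_images.items():
        if len(images) >= 2:
            # Connect every image whose object carries the shared term to that term.
            for image in images:
                edges.add((image, term))
    return text_nodes, image_nodes, edges


objects = {
    "item_1": {"terms": ["dress", "summer"], "image": "img_1"},
    "item_2": {"terms": ["dress", "beach"], "image": "img_2"},
}
text_nodes, image_nodes, hetero_edges = build_heterogeneous_graph(objects)
```

In this example only "dress" is shared, so both image nodes are attached to the "dress" text node and the other text nodes remain unconnected.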
The information prediction module 920 is configured to process the first vector quantization representation of the nodes and the second vector quantization representation of the association relationships between the nodes by using the information prediction model component, so as to obtain an information prediction result. For the structure of the information prediction model component, reference may be made to the related description above, which is not repeated here.
The determining module 930 is configured to determine the information of the set based on the information prediction result. Taking a text as an example of the information prediction result, the information prediction module 920 may output the text with the largest prediction probability, and the determining module 930 may use the text output by the information prediction module 920 directly as the summary of the set, or may adjust that text and use the adjusted text as the summary of the set.
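The determination step can be sketched as follows, assuming the prediction result is available as a mapping from candidate texts to predicted probabilities. The function name `determine_summary` and the optional `adjust` callable are hypothetical, and the disclosure does not specify what form the adjustment takes.

```python
def determine_summary(candidates, adjust=None):
    """Pick the candidate text with the largest predicted probability.

    candidates: dict mapping a candidate text to its predicted probability.
    adjust: optional, hypothetical callable applied to the selected text.
    """
    if not candidates:
        raise ValueError("no candidate texts were predicted")
    best_text = max(candidates, key=candidates.get)
    return adjust(best_text) if adjust is not None else best_text


candidates = {
    "summer floral dresses": 0.62,
    "beach wear": 0.30,
    "shoes": 0.08,
}
summary = determine_summary(candidates)
adjusted = determine_summary(candidates, adjust=str.title)
```

Here `summary` is the highest-probability text used directly, and `adjusted` is the same text after a simple capitalization step that stands in for whatever adjustment an implementation applies.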
The information generating apparatus 900 may further include a first obtaining module and/or a second obtaining module. The first obtaining module is configured to obtain the first vector quantization representation of the nodes, and the second obtaining module is configured to obtain the second vector quantization representation of the association relationships between the nodes.
Taking text description information as an example of the first-type description information, where a node represents a text taken from the text description information, the first obtaining module may use a word vector of the text as the first vector quantization representation of the node.
Taking image description information as an example of the second-type description information, where a node represents an image taken from the image description information, the first obtaining module may use a feature extraction component to extract features of the image as the first vector quantization representation of the node. For the structure of the feature extraction component, reference may be made to the related description above, which is not repeated here.
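One way to picture the two representations is as a node feature matrix and an adjacency matrix. The sketch below assumes that node vectors are looked up from a precomputed table and that the association relationships are encoded as a symmetric weighted adjacency matrix. Both the encoding and the function names are assumptions, and the zero-vector fallback is a choice made only for this example.

```python
def node_feature_matrix(nodes, embeddings, dim):
    """First representation: one row vector per node, in node order.

    embeddings: dict node -> vector (e.g. word vectors or image features).
    Nodes missing from the table get a zero vector (assumption of this sketch).
    """
    return [list(embeddings.get(node, [0.0] * dim)) for node in nodes]


def adjacency_matrix(nodes, edges, weights=None):
    """Second representation: symmetric matrix of connecting-line weights.

    weights: optional dict (node_a, node_b) -> weight; defaults to 1.0 per edge.
    """
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for a, b in edges:
        w = 1.0 if weights is None else weights.get((a, b), 1.0)
        i, j = index[a], index[b]
        matrix[i][j] = w
        matrix[j][i] = w  # undirected graph, so the matrix is symmetric
    return matrix


graph_nodes = ["beach", "dress", "summer"]
graph_edges = {("beach", "dress"), ("dress", "summer")}
table = {"beach": [0.1, 0.2], "dress": [0.3, 0.4]}
X = node_feature_matrix(graph_nodes, table, dim=2)
A = adjacency_matrix(graph_nodes, graph_edges)
```

The pair `(X, A)` is the kind of input a graph-based encoder typically consumes. The actual model structure is described earlier in the disclosure and is not reproduced here.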
FIG. 10 shows a schematic structural diagram of a computing device that can be used to implement the above-described method according to an embodiment of the present disclosure.
Referring to fig. 10, the computing device 1000 includes a memory 1010 and a processor 1020.
The processor 1020 may be a multi-core processor or may include multiple processors. In some embodiments, the processor 1020 may include a general-purpose host processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, the processor 1020 may be implemented using custom circuits, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 1010 may include various types of storage units, such as system memory, Read Only Memory (ROM), and a persistent storage device. The ROM may store static data or instructions that are needed by the processor 1020 or other modules of the computer. The persistent storage device may be a read-write storage device. It may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is used as the persistent storage device. In other embodiments, the persistent storage device may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as a dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 1010 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic and/or optical disks, among others. In some embodiments, the memory 1010 may include a removable storage device that is readable and/or writable, such as a Compact Disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 1010 has stored thereon executable code which, when executed by the processor 1020, may cause the processor 1020 to perform the above-mentioned feed generation method, information recommendation method, or information generation method.
The summary generation method, the information recommendation method, the information generation method, the summary generation apparatus, the information recommendation apparatus, the information generation apparatus, and the computing device according to the present disclosure have been described in detail above with reference to the drawings.
Furthermore, the method according to the present disclosure may also be implemented as a computer program or computer program product comprising computer program code instructions for performing the steps defined in the above-mentioned method of the present disclosure.
Alternatively, the present disclosure may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the various steps of the above-described method according to the present disclosure.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.