Picture category identification method and device, electronic device and storage device
1. A picture category identification method, characterized by comprising the following steps:
obtaining a target picture;
obtaining, according to the target picture, text blocks containing coordinate information and text content;
generating a graph structure according to the text blocks;
obtaining vector feature information of the graph structure according to the graph structure;
inputting the vector feature information into a vector classification model to obtain the category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information.
2. The method according to claim 1, wherein obtaining vector feature information of the graph structure according to the graph structure comprises:
obtaining text block feature information of the graph structure according to the graph structure;
and converting the text block feature information of the graph structure into vector feature information.
3. The method according to claim 1, wherein generating a graph structure according to the text blocks comprises:
converting the text content in each text block into vector feature information of a node, as the feature information of the node of the graph structure corresponding to that text block;
using the relative position information and the relative width and height information between two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks;
and generating the graph structure according to the feature information of the nodes and the feature information of the edges between the nodes.
4. The method according to claim 3, wherein using the relative position information and the relative width and height information between the two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks comprises:
obtaining two rectangles corresponding to the text blocks according to the coordinate information of the text blocks corresponding to the two nodes;
normalizing the widths and heights of the two rectangles to obtain two normalized rectangles;
and taking the relative position information between corresponding vertices of the two rectangles, together with the width and height information of the normalized rectangles, as the feature information of the edge between the nodes corresponding to the two text blocks.
5. The method according to claim 3 or 4, wherein obtaining text block feature information of the graph structure according to the graph structure comprises:
merging the feature information of each node in the graph structure with the feature information of all edges contained in the graph structure to obtain the text block feature information of the graph structure.
6. The method according to claim 3, wherein generating the graph structure according to the feature information of the nodes and the feature information of the edges between the nodes comprises:
generating a fully connected graph according to the feature information of the nodes and the feature information of the edges between the nodes.
7. The method according to claim 1, wherein inputting the vector feature information into a vector classification model to obtain the category of the target picture comprises:
inputting the vector feature information into the vector classification model, the vector classification model outputting a probability value that the vector feature information belongs to each category;
obtaining the category corresponding to the vector feature information according to the probability values that the vector feature information belongs to the respective categories;
and taking the category corresponding to the vector feature information as the category of the target picture.
8. The method according to claim 7, wherein obtaining the category corresponding to the vector feature information according to the probability values that the vector feature information belongs to the respective categories comprises:
selecting the maximum probability value from the probability values that the vector feature information belongs to the respective categories;
determining whether the maximum probability value is greater than or equal to a preset probability threshold;
and if so, taking the picture category corresponding to the maximum probability value as the category corresponding to the vector feature information.
9. The method of claim 1, wherein the target picture is a document picture containing text information.
10. An apparatus for identifying a picture category, comprising:
a target picture obtaining unit, used for obtaining a target picture;
a text block obtaining unit, used for obtaining, according to the target picture, text blocks containing coordinate information and text content;
a graph structure generating unit, used for generating a graph structure according to the text blocks;
a vector feature information obtaining unit, used for obtaining vector feature information of the graph structure according to the graph structure;
a target picture category obtaining unit, used for inputting the vector feature information into a vector classification model to obtain the category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information.
11. An electronic device, comprising:
a processor; and
a memory for storing a program of a picture category identification method, wherein after the device is powered on and the program is run by the processor, the following steps are performed:
obtaining a target picture;
obtaining, according to the target picture, text blocks containing coordinate information and text content;
generating a graph structure according to the text blocks;
obtaining vector feature information of the graph structure according to the graph structure;
inputting the vector feature information into a vector classification model to obtain the category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information.
12. A storage device, characterized in that it stores a program of a picture category identification method, the program being executed by a processor to perform the following steps:
obtaining a target picture;
obtaining, according to the target picture, text blocks containing coordinate information and text content;
generating a graph structure according to the text blocks;
obtaining vector feature information of the graph structure according to the graph structure;
inputting the vector feature information into a vector classification model to obtain the category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information.
13. An image searching method, comprising:
obtaining a picture to be searched;
generating a graph structure of the picture to be searched according to the picture to be searched;
obtaining vector feature information of the graph structure;
obtaining the category of the picture to be searched by using a vector classification model;
and outputting the information data of the picture to be searched according to the category of the picture to be searched.
14. An identity card picture identification method, characterized by comprising the following steps:
obtaining an identity card picture to be identified;
generating a graph structure of the identity card picture according to the identity card picture;
obtaining the category of the identity card picture by using a vector classification model;
outputting, to a user, confirmation information for confirming whether the category of the identity card picture is correct;
obtaining confirmation information input by the user;
and adjusting parameters of the vector classification model according to the confirmation information input by the user.
15. A bill picture processing method, characterized by comprising the following steps:
obtaining a bill picture to be identified;
generating a graph structure of the bill picture according to the bill picture;
obtaining the category of the bill picture by using a vector classification model;
and if the category of the bill picture is invoice, summing the amounts of the bill picture to obtain the total invoice amount.
16. A picture category identification method, characterized by comprising the following steps:
obtaining a target picture;
obtaining, according to the target picture, text blocks containing coordinate information and text content;
generating a graph structure of the target picture according to the target picture;
obtaining vector feature information of the graph structure according to the graph structure;
inputting the vector feature information into a vector classification model to obtain a candidate category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information;
determining whether the candidate category of the target picture matches the text blocks; and if so, determining the candidate category of the target picture as the category of the target picture.
17. A method for constructing a knowledge graph, comprising:
during training of a vector classification model for picture classification, acquiring training features of the vector classification model;
obtaining entity information of the picture and the relations between entities according to the training features;
and constructing a picture knowledge graph according to the entity information and the relations between the entities.
Background
Currently, optical character recognition (OCR) technology provides character recognition and information structuring services for specific types of document pictures, but it requires knowing in advance which type of document a document picture belongs to. In real scenarios, many different types of document data are mixed together; labeling the document types manually incurs extra cost and is error-prone, so users want a scheme that classifies such data automatically.
In the prior art, document pictures are classified by using a CNN (convolutional neural network) to extract pixel features of a document picture and then inputting those pixel features into a classifier to determine the category of the document picture. The disadvantage of this approach is that pixel-level features are strongly affected by lighting and shadows, blur, hue, shooting angle and the like, so a large amount of training data is often required.
Disclosure of Invention
The application provides a picture category identification method and device, an electronic device, and a storage device, aiming to solve the prior-art problem that classifying document pictures by pixel features suffers heavy interference and therefore requires a large amount of training data.
The application provides a picture category identification method, which comprises the following steps:
obtaining a target picture;
obtaining, according to the target picture, text blocks containing coordinate information and text content;
generating a graph structure according to the text blocks;
obtaining vector feature information of the graph structure according to the graph structure;
inputting the vector feature information into a vector classification model to obtain the category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information.
Optionally, obtaining the vector feature information of the graph structure according to the graph structure includes:
obtaining text block feature information of the graph structure according to the graph structure;
and converting the text block feature information of the graph structure into vector feature information.
Optionally, generating a graph structure according to the text blocks includes:
converting the text content in each text block into vector feature information of a node, as the feature information of the node of the graph structure corresponding to that text block;
using the relative position information and the relative width and height information between two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks;
and generating the graph structure according to the feature information of the nodes and the feature information of the edges between the nodes.
Optionally, using the relative position information and the relative width and height information between the two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks includes:
obtaining two rectangles corresponding to the text blocks according to the coordinate information of the text blocks corresponding to the two nodes;
normalizing the widths and heights of the two rectangles to obtain two normalized rectangles;
and taking the relative position information between corresponding vertices of the two rectangles, together with the width and height information of the normalized rectangles, as the feature information of the edge between the nodes corresponding to the two text blocks.
Optionally, obtaining the text block feature information of the graph structure according to the graph structure includes:
merging the feature information of each node in the graph structure with the feature information of all edges contained in the graph structure to obtain the text block feature information of the graph structure.
Optionally, generating the graph structure according to the feature information of the nodes and the feature information of the edges between the nodes includes:
generating a fully connected graph according to the feature information of the nodes and the feature information of the edges between the nodes.
Optionally, inputting the vector feature information into a vector classification model to obtain the category of the target picture includes:
inputting the vector feature information into the vector classification model, the vector classification model outputting a probability value that the vector feature information belongs to each category;
obtaining the category corresponding to the vector feature information according to the probability values that the vector feature information belongs to the respective categories;
and taking the category corresponding to the vector feature information as the category of the target picture.
Optionally, obtaining the category corresponding to the vector feature information according to the probability values that the vector feature information belongs to the respective categories includes:
selecting the maximum probability value from the probability values that the vector feature information belongs to the respective categories;
determining whether the maximum probability value is greater than or equal to a preset probability threshold;
and if so, taking the picture category corresponding to the maximum probability value as the category corresponding to the vector feature information.
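The threshold decision above can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: it assumes the vector classification model emits one raw score per category that is turned into probabilities with a softmax, and the category names and the threshold value of 0.5 are hypothetical. The text does not say what happens when the maximum probability falls below the threshold; returning `None` (leaving the picture unclassified) is likewise an assumption.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw per-category scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide_category(
    probs: Sequence[float],
    categories: Sequence[str],
    threshold: float = 0.5,  # hypothetical preset probability threshold
) -> Optional[str]:
    """Pick the most probable category if it reaches the threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])  # index of the maximum probability
    if probs[best] >= threshold:
        return categories[best]
    return None  # assumption: below-threshold pictures stay unclassified


categories = ["id_card", "invoice", "wedding_card"]  # hypothetical labels
print(decide_category(softmax([2.0, 0.5, 0.1]), categories))  # id_card
```

For example, probabilities of 0.7, 0.2 and 0.1 over the three categories yield `"id_card"`, while 0.4, 0.35 and 0.25 yield `None` at a threshold of 0.5.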
Optionally, the target picture is a document picture containing text information.
The present application further provides an apparatus for identifying a picture category, including:
a target picture obtaining unit, used for obtaining a target picture;
a text block obtaining unit, used for obtaining, according to the target picture, text blocks containing coordinate information and text content;
a graph structure generating unit, used for generating a graph structure according to the text blocks;
a vector feature information obtaining unit, used for obtaining vector feature information of the graph structure according to the graph structure;
a target picture category obtaining unit, used for inputting the vector feature information into a vector classification model to obtain the category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information.
The present application further provides an electronic device, comprising:
a processor; and
a memory for storing a program of a picture category identification method, wherein after the device is powered on and the program is run by the processor, the following steps are performed:
obtaining a target picture;
obtaining, according to the target picture, text blocks containing coordinate information and text content;
generating a graph structure according to the text blocks;
obtaining vector feature information of the graph structure according to the graph structure;
inputting the vector feature information into a vector classification model to obtain the category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information.
The present application also provides a storage device storing a program of a picture category identification method, the program being executed by a processor to perform the following steps:
obtaining a target picture;
obtaining, according to the target picture, text blocks containing coordinate information and text content;
generating a graph structure according to the text blocks;
obtaining vector feature information of the graph structure according to the graph structure;
inputting the vector feature information into a vector classification model to obtain the category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information.
The application provides an image searching method, which comprises the following steps:
obtaining a picture to be searched;
generating a graph structure of the picture to be searched according to the picture to be searched;
obtaining vector feature information of the graph structure;
obtaining the category of the picture to be searched by using a vector classification model;
and outputting the information data of the picture to be searched according to the category of the picture to be searched.
The application provides an identity card picture identification method, which includes the following steps:
obtaining an identity card picture to be identified;
generating a graph structure of the identity card picture according to the identity card picture;
obtaining the category of the identity card picture by using a vector classification model;
outputting, to a user, confirmation information for confirming whether the category of the identity card picture is correct;
obtaining confirmation information input by the user;
and adjusting parameters of the vector classification model according to the confirmation information input by the user.
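One way to picture the parameter adjustment is a single gradient step on a cross-entropy loss, using the label the user confirmed or corrected. The sketch below assumes, purely for illustration, that the vector classification model is a linear softmax classifier; the class `LinearClassifier`, the function `apply_confirmation`, the learning rate, and the rule of reinforcing a confirmed prediction are all hypothetical and are not specified in the source.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Hypothetical stand-in for the vector classification model: one linear layer + softmax."""

    def __init__(self, num_features: int, num_classes: int) -> None:
        self.weights = [[0.0] * num_features for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def predict_proba(self, x: Sequence[float]) -> List[float]:
        logits = [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)

    def sgd_step(self, x: Sequence[float], label: int, lr: float) -> None:
        """One gradient-descent step on the cross-entropy loss for a single example."""
        probs = self.predict_proba(x)  # computed once, before any parameter changes
        for k, row in enumerate(self.weights):
            grad = probs[k] - (1.0 if k == label else 0.0)  # dLoss/dlogit_k
            for j, xj in enumerate(x):
                row[j] -= lr * grad * xj
            self.bias[k] -= lr * grad


def apply_confirmation(
    model: LinearClassifier,
    x: Sequence[float],
    predicted: int,
    is_correct: bool,
    corrected_label: Optional[int] = None,
    lr: float = 0.1,
) -> None:
    """Hypothetical feedback rule: reinforce a confirmed label or learn the user's correction."""
    if is_correct:
        target = predicted
    elif corrected_label is not None:
        target = corrected_label
    else:
        return  # the user rejected the label without a correction: nothing to learn from
    model.sgd_step(x, target, lr)
```

A production system would more likely collect such feedback and retrain in batches; the single online step only shows how confirmation information can flow back into the model parameters.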
The application provides a bill picture processing method, which includes the following steps:
obtaining a bill picture to be identified;
generating a graph structure of the bill picture according to the bill picture;
obtaining the category of the bill picture by using a vector classification model;
and if the category of the bill picture is invoice, summing the amounts of the bill picture to obtain the total invoice amount.
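A minimal sketch of the invoice total, assuming each bill picture has already been classified and that an amount string has already been extracted from its text blocks. The extraction step is not described in the source, so the `(category, amount)` input pairs and the `"invoice"` label are hypothetical. The sketch sums over all bills classified as invoices.

```python
from decimal import Decimal
from typing import Iterable, Tuple


def total_invoice_amount(bills: Iterable[Tuple[str, str]]) -> Decimal:
    """Sum the amounts of all bills whose predicted category is "invoice".

    Each item is a hypothetical (category, amount) pair: the category from the
    vector classification model and the amount string read from the bill's text blocks.
    """
    total = Decimal("0")
    for category, amount in bills:
        if category == "invoice":
            total += Decimal(amount)  # Decimal avoids binary rounding of currency values
    return total


bills = [("invoice", "120.50"), ("receipt", "30.00"), ("invoice", "79.50")]
print(total_invoice_amount(bills))  # 200.00
```

`Decimal` is used instead of `float` so that currency amounts such as 120.50 add up exactly.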
The application provides a picture category identification method, which comprises the following steps:
obtaining a target picture;
obtaining, according to the target picture, text blocks containing coordinate information and text content;
generating a graph structure of the target picture according to the target picture;
obtaining vector feature information of the graph structure according to the graph structure;
inputting the vector feature information into a vector classification model to obtain a candidate category of the target picture, wherein the vector classification model is used for obtaining, according to input vector feature information, the category of the picture corresponding to that vector feature information;
determining whether the candidate category of the target picture matches the text blocks; and if so, determining the candidate category of the target picture as the category of the target picture.
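The source does not define how a candidate category is matched against the text blocks. One plausible reading, sketched below purely as an assumption, is a keyword check: each category lists keywords that should appear in at least one text block of a picture of that category. The keyword table, the `min_hits` parameter and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical rule table: keywords expected on a picture of each category.
REQUIRED_KEYWORDS: Dict[str, List[str]] = {
    "id_card": ["Name", "Citizen ID Number"],
    "invoice": ["Invoice", "Amount"],
}


def matches_text_blocks(candidate: str, block_texts: Sequence[str], min_hits: int = 1) -> bool:
    """True if at least `min_hits` keywords of the candidate occur in some text block.

    Categories without an entry in the table have no keywords and are never matched.
    """
    keywords = REQUIRED_KEYWORDS.get(candidate, [])
    hits = sum(1 for kw in keywords if any(kw in text for text in block_texts))
    return hits >= min_hits


def confirm_category(candidate: str, block_texts: Sequence[str]) -> Optional[str]:
    """Accept the model's candidate category only if it is consistent with the OCR text."""
    return candidate if matches_text_blocks(candidate, block_texts) else None
```

The check acts as a cheap consistency filter on top of the classifier rather than replacing it: the graph-based model proposes a category, and the text blocks must not contradict it.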
The application provides a method for constructing a knowledge graph, which includes the following steps:
during training of a vector classification model for picture classification, acquiring training features of the vector classification model;
obtaining entity information of the picture and the relations between entities according to the training features;
and constructing a picture knowledge graph according to the entity information and the relations between the entities.
Compared with the prior art, the present application has the following advantages:
The picture category identification method provided by the application comprises: obtaining a target picture; obtaining, according to the target picture, text blocks containing coordinate information and text content; generating a graph structure according to the text blocks; obtaining vector feature information of the graph structure according to the graph structure; and inputting the vector feature information into a vector classification model to obtain the category of the target picture, the vector classification model being used for obtaining, according to input vector feature information, the category of the corresponding picture. In this method, the target picture is first converted into text blocks; a graph structure is then generated from the text blocks; finally, vector feature information of the graph structure is obtained and the document picture is classified according to it. Because the vector feature information is derived from the text blocks rather than from pixels, it is not affected by lighting and shadows, blur, hue, shooting angle and the like, and a large amount of training data is not needed. This solves the prior-art problem that classifying document pictures by pixel features suffers heavy interference and therefore requires a large amount of training data.
Drawings
Fig. 1A is an application scenario diagram of a method for identifying a picture category according to a first embodiment of the present application.
Fig. 1 is a flowchart of a method for identifying a picture category according to a first embodiment of the present application.
Fig. 2 is a schematic diagram of obtaining two rectangles according to the coordinate information of the text blocks corresponding to two nodes, according to the first embodiment of the present application.
Fig. 3 is a schematic diagram of a fully connected graph including 3 nodes.
Fig. 4 is a schematic diagram of an apparatus for identifying a picture category according to a second embodiment of the present application.
Fig. 5 is a schematic diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many forms other than those described herein, and should not be construed as limited to the embodiments set forth herein.
In order to show the present application more clearly, an application scenario of the method for identifying a picture category provided in the first embodiment of the present application is briefly introduced first.
The method for identifying a picture category provided by the first embodiment of the application can be applied to a scenario in which a client interacts with a server. For example, as shown in Fig. 1A, when the category of a target picture needs to be identified, a connection is usually established between the client and the server, and the client then sends the target picture to the server. After receiving the target picture, the server obtains, according to the target picture, text blocks containing coordinate information and text content; generates a graph structure according to the text blocks; obtains vector feature information of the graph structure according to the graph structure; and inputs the vector feature information into a vector classification model to obtain the category of the target picture. The server then provides the category of the target picture to the client, which receives it.
A first embodiment of the present application provides a method for identifying a picture category, which is described below with reference to fig. 1.
As shown in fig. 1, in step S101, a target picture is obtained.
The target picture is the picture whose category is to be identified. The target picture may be a document picture containing text information, such as an identity card picture or a wedding card picture. It may be obtained by scanning an original (e.g., the original of an identity card) or by photographing the original.
The picture category identification method can run on a server. In that case, the target picture may be obtained from a client: when a user needs the category of a target picture, the user uploads the target picture to the server through the client, and the server receives the target picture sent by the client. The picture category identification method can also run on the client.
As shown in fig. 1, in step S102, a text block containing coordinate information and text content is obtained according to the target picture.
In a specific implementation, general optical character recognition (OCR) technology may be used to obtain, from the target picture, text blocks containing coordinate information and text content. OCR here refers to technology that locates text in the target picture, yielding its coordinate information, and recognizes the text content.
A text block refers to a rectangle containing text content.
The coordinate information of a text block refers to the coordinates, in the target picture, of the four vertices of that rectangle.
For example, the text content of one text block is "Shanghai city XXX", and the coordinates of the four vertices of the rectangle containing the text content are: (86,162), (337,162), (337,182), (86,182).
Multiple text blocks containing coordinate information and text content can be obtained from one target picture. For example, 21 text blocks, denoted text block 0 through text block 20, can be obtained from one identity card picture, and the information of each text block includes its coordinate information and its text content.
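A text block as described above can be represented with a small data class. This is a sketch under two assumptions not stated in the source: the four vertices form an axis-aligned rectangle, and picture coordinates have y increasing downward, so the minimum x and y give the upper-left vertex. The names `TextBlock`, `x`, `y`, `width` and `height` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class TextBlock:
    """One OCR result: the recognized text and the four vertices of its rectangle."""

    text: str
    vertices: List[Point]  # four (x, y) corners in picture coordinates, y pointing down

    @property
    def x(self) -> float:
        """x coordinate of the upper-left vertex."""
        return min(px for px, _ in self.vertices)

    @property
    def y(self) -> float:
        """y coordinate of the upper-left vertex."""
        return min(py for _, py in self.vertices)

    @property
    def width(self) -> float:
        return max(px for px, _ in self.vertices) - self.x

    @property
    def height(self) -> float:
        return max(py for _, py in self.vertices) - self.y


block = TextBlock("Shanghai city XXX", [(86, 162), (337, 162), (337, 182), (86, 182)])
print(block.x, block.y, block.width, block.height)  # 86 162 251 20
```

For the "Shanghai city XXX" example, the block's upper-left vertex is (86, 162), its width is 251 and its height is 20.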
As shown in fig. 1, in step S103, a graph structure is generated from the text blocks.
Generating a graph structure according to the text blocks includes:
converting the text content in each text block into vector feature information of a node, as the feature information of the node of the graph structure corresponding to that text block;
using the relative position information and the relative width and height information between two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks;
and generating the graph structure according to the feature information of the nodes and the feature information of the edges between the nodes.
In a specific implementation, the text content in a text block can be converted into the vector feature information of a node by content embedding. Content embedding is a sentence-vector technique, i.e., a method of representing a segment of text as vector feature information.
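The application does not name a specific content embedding model. To keep the example self-contained, the sketch below swaps in a simple hashed character-bigram embedding as a stand-in for a trained sentence-vector model: it maps any text to a fixed-length unit vector but captures no semantics. The function name `embed_text` and the dimension of 16 are hypothetical choices.

```python
import hashlib
import math
from typing import List


def embed_text(text: str, dim: int = 16, n: int = 2) -> List[float]:
    """Map text to a unit-length vector by hashing its character n-grams into `dim` buckets.

    A stand-in for a trained sentence-vector (content embedding) model: it is
    deterministic and fixed-length, but it does not capture meaning.
    """
    vec = [0.0] * dim
    padded = "^" + text + "$"  # boundary markers so even one-character texts yield n-grams
    for i in range(len(padded) - n + 1):
        gram = padded[i : i + n]
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
        vec[int.from_bytes(digest, "little") % dim] += 1.0  # count the n-gram in its bucket
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec
```

Character n-grams need no word segmentation, so the same stand-in also applies to Chinese text. Unlike Python's built-in `hash()`, which is randomized per process for strings, `hashlib` gives the same vector on every run.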
The relative position information between two text blocks refers to the position of one text block relative to the other, expressed in relative coordinates. Because the position of a photographed document, of whatever type, within the target picture is not fixed, absolute coordinates such as x1, x2, y1 and y2 are discarded, and relative coordinates such as x2-x1 and y2-y1 are used to represent the relative position of two text blocks.
The relative width and height information between two text blocks refers to the widths and heights obtained by normalizing them with the width or height of one of the text blocks as the denominator. Because the size of a document (e.g., an identity card) in the target picture cannot be known in advance when the picture is taken, absolute values such as w1, h1, w2 and h2 are meaningless. Instead, the widths and heights of the two text blocks are normalized by using the width or height of one text block as the denominator, and the resulting relative proportions are used as feature information.
Using the relative position information and the relative width and height information between two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks includes:
obtaining two rectangles corresponding to the text blocks according to the coordinate information of the text blocks corresponding to the two nodes;
normalizing the widths and heights of the two rectangles to obtain two normalized rectangles;
and taking the relative position information between corresponding vertices of the two rectangles, together with the width and height information of the normalized rectangles, as the feature information of the edge between the nodes corresponding to the two text blocks.
As shown in Fig. 2, two rectangles are obtained according to the coordinate information of the text blocks corresponding to the two nodes: rectangle 1, with width w1 and height h1, and rectangle 2, with width w2 and height h2. The widths and heights of both rectangles are then normalized using the height h1 of rectangle 1 as the denominator, yielding two normalized rectangles. The relative position between corresponding vertices of the two rectangles, together with the normalized widths and heights, is used as the feature information of the edge between the nodes corresponding to the two text blocks. This feature information may be a 5-dimensional vector; the normalized height of rectangle 1, h1/h1, always equals 1 and is therefore omitted. The 5-dimensional vector consists of: the normalized width w1/h1 of rectangle 1; the normalized width w2/h1 and height h2/h1 of rectangle 2; and the x component (x2-x1)/h1 and y component (y2-y1)/h1 of the relative position between the upper-left vertices of rectangle 1 and rectangle 2, where (x1, y1) and (x2, y2) are the upper-left vertices of rectangle 1 and rectangle 2 respectively.
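The 5-dimensional edge feature can be computed directly from the formulas in the preceding paragraph. In this sketch each rectangle is given by its upper-left vertex and its width and height; the `Rect` type and the `edge_features` function are illustrative names, not from the source, and the second rectangle in the example is hypothetical.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Text block rectangle: upper-left vertex (x, y), width w, height h."""

    x: float
    y: float
    w: float
    h: float


def edge_features(r1: Rect, r2: Rect) -> List[float]:
    """5-d edge feature; every length is divided by the height h1 of rectangle 1."""
    h1 = r1.h
    if h1 <= 0:
        raise ValueError("rectangle 1 must have a positive height")
    return [
        r1.w / h1,           # w1/h1 (h1/h1 = 1 is omitted)
        r2.w / h1,           # w2/h1
        r2.h / h1,           # h2/h1
        (r2.x - r1.x) / h1,  # (x2 - x1)/h1, x offset between upper-left vertices
        (r2.y - r1.y) / h1,  # (y2 - y1)/h1, y offset between upper-left vertices
    ]


r1 = Rect(86, 162, 251, 20)   # the "Shanghai city XXX" block
r2 = Rect(86, 202, 120, 20)   # hypothetical block one line below it
print(edge_features(r1, r2))  # [12.55, 6.0, 1.0, 0.0, 2.0]
```

Because every term is a coordinate difference or a length divided by h1, translating or uniformly scaling the whole picture leaves the feature unchanged, which is the motivation given for using relative coordinates and relative widths and heights.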
Obtaining the text block feature information of the graph structure according to the graph structure includes:
merging the feature information of each node in the graph structure with the feature information of all edges contained in the graph structure to obtain the text block feature information of the graph structure.
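The text does not specify how node and edge features are merged. One plausible reading, sketched below under that assumption, is to collect all node features and all edge features of one graph into a single record. The field names `x`, `edge_index` and `edge_attr` borrow the convention of PyTorch Geometric's `Data` object, but the code is plain Python and the layout is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def merge_graph_features(
    node_feats: Sequence[Sequence[float]],
    edge_feats: Dict[Tuple[int, int], Sequence[float]],
) -> Dict[str, List[List]]:
    """Merge per-node and per-edge features of one graph into a single record.

    node_feats[i] is the feature vector of node i; edge_feats[(s, t)] is the
    feature vector of the edge from node s to node t.
    """
    n = len(node_feats)
    for s, t in edge_feats:
        if not (0 <= s < n and 0 <= t < n):
            raise ValueError(f"edge ({s}, {t}) refers to a missing node")
    keys = sorted(edge_feats)  # deterministic edge order: by (source, target)
    return {
        "x": [[float(v) for v in f] for f in node_feats],                 # one row per node
        "edge_index": [[s for s, _ in keys], [t for _, t in keys]],       # 2 x E endpoints
        "edge_attr": [[float(v) for v in edge_feats[k]] for k in keys],   # one row per edge
    }
```

Keeping the edge endpoints next to the edge features means the merged record still describes the graph completely, so a later embedding step can recover which nodes each edge connects.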
In the first embodiment of the present application, generating a graph structure according to the feature information of the nodes and the feature information of the edges between the nodes may include generating a fully connected graph according to the feature information of the nodes and the feature information of the edges between the nodes.
For example, fig. 3 is a schematic diagram of a fully connected graph containing 3 nodes.
Besides a fully connected graph, a graph with another structure may also be generated according to the feature information of the nodes and the feature information of the edges between the nodes.
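A fully connected graph over the text blocks can be sketched as below. Two points are assumptions not stated in the text: the text-to-vector step is replaced by a toy hashed bag-of-characters encoder (`encode_text`, hypothetical), and every ordered pair of blocks gets its own edge, because the edge feature normalized by h1 differs by direction. Under that assumption, the 3-node graph of fig. 3 has 6 directed edges.

```python
import itertools
import zlib
from typing import Dict, List, NamedTuple, Sequence, Tuple


class TextBlock(NamedTuple):
    """Hypothetical OCR text block: content plus upper-left vertex, width and height."""
    text: str
    x: float
    y: float
    w: float
    h: float


def encode_text(text: str, dim: int = 8) -> List[float]:
    """Toy stand-in for the text-to-vector step: hashed bag of characters, L1-normalized.

    zlib.crc32 is used instead of hash() because str hashes vary between runs.
    """
    vec = [0.0] * dim
    for ch in text:
        vec[zlib.crc32(ch.encode("utf-8")) % dim] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def edge_features(a: TextBlock, b: TextBlock) -> List[float]:
    """[w1/h1, w2/h1, h2/h1, (x2-x1)/h1, (y2-y1)/h1] for the edge a -> b."""
    if a.h <= 0:
        raise ValueError("text block height must be positive")
    s = a.h
    return [a.w / s, b.w / s, b.h / s, (b.x - a.x) / s, (b.y - a.y) / s]


def build_fully_connected_graph(
    blocks: Sequence[TextBlock],
) -> Tuple[List[List[float]], Dict[Tuple[int, int], List[float]]]:
    """One node per text block and one directed edge for every ordered pair of blocks."""
    nodes = [encode_text(b.text) for b in blocks]
    edges = {
        (i, j): edge_features(blocks[i], blocks[j])
        for i, j in itertools.permutations(range(len(blocks)), 2)
    }
    return nodes, edges
```

For n text blocks this yields n(n-1) directed edges, so the edge count grows quadratically; that cost is one reason the text leaves room for graphs with other structures.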
As shown in fig. 1, in step S104, vector feature information of a graph structure is obtained according to the graph structure.
Obtaining the vector feature information of the graph structure according to the graph structure includes:
obtaining text block feature information of the graph structure according to the graph structure;
and converting the text block feature information of the graph structure into vector feature information.
In a specific implementation, Graph Embedding may be used to convert the text block feature information of the graph structure into vector feature information. Graph embedding is a family of techniques for representing graph-structured data as vector features.
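The text does not name a particular graph embedding method. As a hedged stand-in, the sketch below performs one untrained round of mean message passing followed by mean pooling. This readout shows how a variable-size graph becomes a fixed-length vector; a learned graph embedding would replace the plain averages with trained transformations. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def mean_rows(rows: Sequence[Sequence[float]], width: int) -> List[float]:
    """Column-wise mean of equal-length rows; zeros if there are no rows."""
    if not rows:
        return [0.0] * width
    return [sum(col) / len(rows) for col in zip(*rows)]


def embed_graph(
    node_feats: Sequence[Sequence[float]],
    edge_feats: Dict[Tuple[int, int], Sequence[float]],
    edge_dim: int,
) -> List[float]:
    """Untrained stand-in for Graph Embedding: one mean message-passing round plus mean pooling.

    Node i receives the message concat(x_j, e_ji) from each edge j -> i; its new
    state is concat(x_i, mean of its messages). The graph vector is the mean of
    all node states, so its length (2 * node_dim + edge_dim) is independent of
    the number of nodes and edges.
    """
    if not node_feats:
        raise ValueError("graph must contain at least one node")
    node_dim = len(node_feats[0])
    incoming: Dict[int, List[List[float]]] = {i: [] for i in range(len(node_feats))}
    for (src, dst), e in edge_feats.items():
        if len(e) != edge_dim:
            raise ValueError(f"edge ({src}, {dst}) has {len(e)} features, expected {edge_dim}")
        incoming[dst].append(list(node_feats[src]) + list(e))
    states = [
        list(node_feats[i]) + mean_rows(incoming[i], node_dim + edge_dim)
        for i in range(len(node_feats))
    ]
    return mean_rows(states, 2 * node_dim + edge_dim)
```

Passing `edge_dim` explicitly keeps the output length fixed even for a single-node graph with no edges, which a downstream classifier with a fixed input size requires.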
In a prior-art scheme for identifying the category of a document picture, keywords are calibrated for each document type, OCR is performed on the document picture, and the keyword matching rate of the OCR result is calculated to determine the category. The drawback of this scheme is that document pictures are hard to distinguish when their keywords are not sufficiently distinct, for example when 80% of the keywords in the text content of two document pictures overlap. Compared with this keyword-based prior art, the first embodiment of the present application introduces the relative positions and the relative widths and heights of the text blocks when building the graph. As a result, knowledge of both the keywords and the document layout can be learned while obtaining the vector feature information of the graph structure, so data with the same keywords but different layouts can be distinguished and the discrimination is stronger.
As shown in fig. 1, in step S105, the vector feature information is input into a vector classification model to obtain the category of the target picture; the vector classification model is used to obtain, from the input vector feature information, the category of the picture corresponding to that information.
The vector classification model is a target vector classification model obtained by training an original vector classification model on a sample set consisting of the vector feature information corresponding to pictures of certain categories and the category labels of that vector feature information. For example, if the training samples comprise marriage certificate pictures, identity card pictures and passport pictures, the vector classification model supports the marriage certificate, identity card and passport categories. To add a picture category supported by the vector classification model, the vector feature information corresponding to pictures of that category and its category labels are added and merged with the original sample set, and a new vector classification model is trained on the merged set.
Inputting the vector feature information into a vector classification model to obtain the category of the target picture includes:
inputting the vector feature information into the vector classification model, which outputs the probability that the vector feature information belongs to each category;
obtaining the category corresponding to the vector feature information according to the probability that it belongs to each category;
and taking the category corresponding to the vector feature information as the category of the target picture.
Obtaining the category corresponding to the vector feature information according to the probability that it belongs to each category includes:
selecting the maximum probability value among the probabilities that the vector feature information belongs to each category;
determining whether the maximum probability value is greater than or equal to a preset probability threshold;
and if so, taking the picture category corresponding to the maximum probability value as the category corresponding to the vector feature information.
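The selection rule above (take the most probable category, accept it only if it reaches the preset threshold) can be sketched as follows. The function name and the 0.5 default are illustrative assumptions; the 50% value matches the worked example in this description. `None` stands for "a category the model does not support".

```python
from typing import Mapping, Optional


def pick_category(probs: Mapping[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the category with the highest probability if it reaches the threshold, else None."""
    if not probs:
        return None
    best = max(probs, key=lambda label: probs[label])  # first label wins on ties
    return best if probs[best] >= threshold else None


print(pick_category({"identity_card": 0.70, "marriage_certificate": 0.10, "passport": 0.10}))
print(pick_category({"identity_card": 0.10, "marriage_certificate": 0.15, "passport": 0.15}))
```

The first call prints `identity_card` and the second prints `None`, mirroring the two cases in the worked example.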
In a specific implementation, a Sigmoid classifier may be used as the vector classification model to classify the vector feature information. The Sigmoid function is commonly used as an activation function in neural networks and maps any real-valued input into the interval (0, 1).
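A Sigmoid classifier of this kind can be sketched as a one-vs-rest linear layer followed by a per-class sigmoid. The weights and biases are hypothetical placeholders for trained parameters. Because each class is scored independently, the per-class probabilities need not sum to 1, which is consistent with the 70%, 10%, 10% figures in the example below.

```python
import math
from typing import Dict, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return ez / (1.0 + ez)


def sigmoid_classifier(
    vec: Sequence[float],
    weights: Dict[str, Sequence[float]],
    biases: Dict[str, float],
) -> Dict[str, float]:
    """Score each category independently: p_k = sigmoid(w_k . vec + b_k)."""
    return {
        label: sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + biases[label])
        for label, w in weights.items()
    }
```

Independent sigmoid outputs combine naturally with the threshold rule described above: a picture whose scores are low for every class can be rejected as an unsupported category rather than being forced into the least-bad class, as a softmax would do.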
For example, suppose that, for the input vector feature information, the vector classification model outputs a probability of 70% for the identity card category, 10% for the marriage certificate category and 10% for the passport category. The maximum probability, 70%, is selected from the three values and compared with the preset probability threshold (for example, 50%). Since 70% exceeds the threshold, the picture category corresponding to the maximum probability, identity card, is taken as the category corresponding to the vector feature information; that is, the target picture is classified as an identity card. If instead the model outputs 10% for the identity card category, 15% for the marriage certificate category and 15% for the passport category, the maximum probability is 15%. Since 15% is below the 50% threshold, the target picture is determined to belong to a category that the vector classification model does not support.
In the picture category identification method provided by the first embodiment of the present application, the target picture is first converted into text blocks; a graph structure is then generated from the text blocks; the vector feature information of the graph structure is obtained from the graph structure; and the document picture is classified according to that vector feature information. Because the vector feature information is derived from text blocks, it is not affected by lighting and shadows, blur, color tone, shooting angle and similar factors, and no large amount of training data is needed. This solves the prior-art problem that classifying document pictures by pixel features suffers from many interference factors and therefore requires a large amount of training data. In addition, compared with the keyword-based prior art, the first embodiment introduces the relative positions and the relative widths and heights of the text blocks when building the graph, so knowledge of both keywords and document layout can be learned while obtaining the vector feature information of the graph structure; data with the same keywords but different layouts can thus be distinguished, and the discrimination is stronger.
Corresponding to the method for identifying the picture category provided in the first embodiment of the present application, a second embodiment of the present application also provides an apparatus for identifying the picture category.
As shown in fig. 4, the apparatus for identifying picture categories includes:
a target picture obtaining unit 401, configured to obtain a target picture;
a text block obtaining unit 402, configured to obtain a text block containing coordinate information and text content according to the target picture;
a graph structure generating unit 403, configured to generate a graph structure according to the text block;
a vector feature information obtaining unit 404, configured to obtain vector feature information of a graph structure according to the graph structure;
a category obtaining unit 405 of the target picture, configured to input the vector feature information into a vector classification model, so as to obtain a category of the target picture; the vector classification model is used for obtaining the category of the picture corresponding to the vector characteristic information according to the input vector characteristic information.
Optionally, the vector feature information obtaining unit is specifically configured to:
obtain text block feature information of the graph structure according to the graph structure;
and convert the text block feature information of the graph structure into vector feature information.
Optionally, the graph structure generating unit is specifically configured to:
convert the text content in a text block into vector feature information of a node, as the feature information of the node of the graph structure corresponding to that text block;
use the relative position information and the relative width and height information between two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks;
and generate a graph structure according to the feature information of the nodes and the feature information of the edges between the nodes.
Optionally, the graph structure generating unit is specifically configured to:
obtain the two rectangles corresponding to the two text blocks according to the coordinate information of the text blocks corresponding to the two nodes;
normalize the widths and heights of the two rectangles to obtain two normalized rectangles;
and take the relative position information between corresponding vertices of the two rectangles, together with the width and height information of the normalized rectangles, as the feature information of the edge between the nodes corresponding to the two text blocks.
Optionally, the vector feature information obtaining unit is specifically configured to:
merge the feature information of each node in the graph structure with the feature information of all edges contained in the graph structure to obtain the text block feature information of the graph structure.
Optionally, the graph structure generating unit is specifically configured to:
generate a fully connected graph according to the feature information of the nodes and the feature information of the edges between the nodes.
Optionally, the category obtaining unit of the target picture is specifically configured to:
input the vector feature information into the vector classification model, which outputs the probability that the vector feature information belongs to each category;
obtain the category corresponding to the vector feature information according to the probability that it belongs to each category;
and take the category corresponding to the vector feature information as the category of the target picture.
Optionally, the category obtaining unit of the target picture is specifically configured to:
select the maximum probability value among the probabilities that the vector feature information belongs to each category;
determine whether the maximum probability value is greater than or equal to a preset probability threshold;
and if so, take the picture category corresponding to the maximum probability value as the category corresponding to the vector feature information.
Optionally, the target picture is a document picture containing text information.
It should be noted that, for the detailed description of the apparatus provided in the second embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, and details are not described here again.
Corresponding to the method for identifying the picture category provided in the first embodiment of the present application, a third embodiment of the present application also provides an electronic device.
As shown in fig. 5, the electronic device includes:
a processor 501; and
a memory 502 for storing a program of a picture category identification method, wherein after the device is powered on and the program of the picture category identification method is executed by the processor, the following steps are executed:
obtaining a target picture;
obtaining a text block containing coordinate information and character content according to the target picture;
generating a graph structure according to the text blocks;
obtaining vector characteristic information of the graph structure according to the graph structure;
inputting the vector characteristic information into a vector classification model to obtain the category of the target picture; the vector classification model is used for obtaining the category of the picture corresponding to the vector characteristic information according to the input vector characteristic information.
Optionally, obtaining the vector feature information of the graph structure according to the graph structure includes:
obtaining text block feature information of the graph structure according to the graph structure;
and converting the text block feature information of the graph structure into vector feature information.
Optionally, generating a graph structure according to the text blocks includes:
converting the text content in a text block into vector feature information of a node, as the feature information of the node of the graph structure corresponding to that text block;
using the relative position information and the relative width and height information between two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks;
and generating a graph structure according to the feature information of the nodes and the feature information of the edges between the nodes.
Optionally, using the relative position information and the relative width and height information between two text blocks as the feature information of the edge between the nodes corresponding to the two text blocks includes:
obtaining the two rectangles corresponding to the two text blocks according to the coordinate information of the text blocks corresponding to the two nodes;
normalizing the widths and heights of the two rectangles to obtain two normalized rectangles;
and taking the relative distance between corresponding vertices of the two rectangles, together with the width and height information of the normalized rectangles, as the feature information of the edge between the nodes corresponding to the two text blocks.
Optionally, obtaining the text block feature information of the graph structure according to the graph structure includes:
merging the feature information of each node in the graph structure with the feature information of all edges contained in the graph structure to obtain the text block feature information of the graph structure.
Optionally, generating a graph structure according to the feature information of the nodes and the feature information of the edges between the nodes includes:
generating a fully connected graph according to the feature information of the nodes and the feature information of the edges between the nodes.
Optionally, inputting the vector feature information into a vector classification model to obtain the category of the target picture includes:
inputting the vector feature information into the vector classification model, which outputs the probability that the vector feature information belongs to each category;
obtaining the category corresponding to the vector feature information according to the probability that it belongs to each category;
and taking the category corresponding to the vector feature information as the category of the target picture.
Optionally, obtaining the category corresponding to the vector feature information according to the probability that it belongs to each category includes:
selecting the maximum probability value among the probabilities that the vector feature information belongs to each category;
determining whether the maximum probability value is greater than or equal to a preset probability threshold;
and if so, taking the picture category corresponding to the maximum probability value as the category corresponding to the vector feature information.
Optionally, the target picture is a document picture containing text information.
It should be noted that, for the detailed description of the electronic device provided in the third embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, and details are not repeated here.
In accordance with the method for identifying a picture category provided in the first embodiment of the present application, a fourth embodiment of the present application further provides a storage device storing a program of the method for identifying a picture category, where the program is executed by a processor to perform the following steps:
obtaining a target picture;
obtaining a text block containing coordinate information and character content according to the target picture;
generating a graph structure according to the text blocks;
obtaining vector characteristic information of the graph structure according to the graph structure;
inputting the vector characteristic information into a vector classification model to obtain the category of the target picture; the vector classification model is used for obtaining the category of the picture corresponding to the vector characteristic information according to the input vector characteristic information.
It should be noted that, for the detailed description of the storage device provided in the fourth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, and details are not described here again.
Corresponding to the method for identifying the picture category provided in the first embodiment of the present application, a fifth embodiment of the present application provides a picture searching method, including:
obtaining a picture to be searched;
generating a graph structure of the picture to be searched according to the picture to be searched;
obtaining vector feature information of the graph structure;
obtaining the category of the picture to be searched by using a vector classification model;
and outputting the information data of the picture to be searched according to the category of the picture to be searched.
It should be noted that, for the picture searching method provided in the fifth embodiment of the present application, reference may be made to the relevant description of the first embodiment of the present application, and only a brief description is made here. After the category of the picture to be searched is obtained, the information data of the picture to be searched can be obtained according to the identification information of the picture to be searched.
Corresponding to the method for identifying the picture category provided in the first embodiment of the present application, a sixth embodiment of the present application provides a method for identifying an identity card picture, including:
obtaining an identity card picture to be identified;
generating a graph structure of the identity card picture according to the identity card picture;
obtaining the category of the identity card picture by using a vector classification model;
outputting to a user confirmation information asking whether the category of the identity card picture is correct;
obtaining the confirmation information input by the user;
and adjusting parameters of the vector classification model according to the confirmation information input by the user.
It should be noted that, for the identity card picture identification method provided in the sixth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, and details are not repeated here.
Corresponding to the method for identifying the picture category provided in the first embodiment of the present application, a seventh embodiment of the present application provides a method for processing bill pictures, including:
obtaining a bill picture to be identified;
generating a graph structure of the bill picture according to the bill picture;
obtaining the category of the bill picture by using a vector classification model;
and if the category of the bill picture is invoice, summing the amounts on the invoice bill pictures to obtain the total invoice amount.
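The final step above (keep only bills classified as invoices and add up their amounts) can be sketched as follows. The `BillResult` record, the `"invoice"` label and the assumption that each bill's amount has already been extracted from its text blocks are all hypothetical; `Decimal` is used so that currency amounts add without binary floating-point rounding.

```python
from decimal import Decimal
from typing import Iterable, NamedTuple


class BillResult(NamedTuple):
    """Hypothetical per-picture result: model category plus amount read from the text blocks."""
    category: str
    amount: Decimal


def total_invoice_amount(bills: Iterable[BillResult], invoice_label: str = "invoice") -> Decimal:
    """Sum the amounts of the bills whose predicted category is the invoice label."""
    return sum((b.amount for b in bills if b.category == invoice_label), Decimal("0"))


bills = [
    BillResult("invoice", Decimal("120.50")),
    BillResult("receipt", Decimal("30")),
    BillResult("invoice", Decimal("79.50")),
]
print(total_invoice_amount(bills))
```

Here the receipt is skipped and the sketch prints `200.00`.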
It should be noted that, for the bill picture processing method provided in the seventh embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, and details are not repeated here.
Corresponding to the method for identifying the picture category provided in the first embodiment of the present application, an eighth embodiment of the present application provides a method for identifying a picture category, including:
obtaining a target picture;
obtaining a text block containing coordinate information and character content according to the target picture;
generating a graph structure of the target picture according to the target picture;
obtaining vector characteristic information of the graph structure according to the graph structure;
inputting the vector characteristic information into a vector classification model to obtain a candidate category of the target picture; the vector classification model is used for obtaining the category of the picture corresponding to the vector characteristic information according to the input vector characteristic information;
determining whether the candidate category of the target picture matches the text blocks; and if so, determining the candidate category of the target picture as the category of the target picture.
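The text does not say how a candidate category is checked against the text blocks. One hypothetical approach, sketched below, is a per-category keyword list: the candidate is accepted only if enough of its keywords occur in the recognized text. The keyword table and the `min_hits` parameter are illustrative assumptions, not part of the described method.

```python
from typing import Dict, Iterable, Optional, Set


def verify_candidate(
    candidate: str,
    block_texts: Iterable[str],
    category_keywords: Dict[str, Set[str]],
    min_hits: int = 1,
) -> Optional[str]:
    """Return the candidate category if its keywords appear often enough in the text blocks, else None."""
    keywords = category_keywords.get(candidate, set())
    joined = "\n".join(block_texts)  # search across all recognized text blocks
    hits = sum(1 for kw in keywords if kw in joined)
    return candidate if hits >= min_hits else None
```

This check is the keyword matching that the description attributes to the prior art, used here only as a second opinion on the graph-based prediction rather than as the primary classifier.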
It should be noted that, for the picture category identification method provided in the eighth embodiment of the present application, reference may be made to the related description of the first embodiment of the present application, and details are not repeated here.
Corresponding to the method for identifying the picture category provided in the first embodiment of the present application, a ninth embodiment of the present application provides a method for constructing a knowledge graph, including:
in the training process of a vector classification model for picture classification, acquiring training characteristics of the vector classification model;
obtaining entity information of the picture and the relations between the entities according to the training features;
and constructing a picture knowledge graph according to the entity information and the relation between the entities.
In the training process of a vector classification model for picture classification, the training features of the vector classification model may be obtained. The pictures may be, for example, identity cards, invoices or vehicle information. Entity information and the relations between entities can be obtained from the training features, and a picture knowledge graph is finally constructed from the entity information and those relations. For example, if a picture includes attributes such as a unified social credit code, a vehicle brand, a vehicle usage time, a vehicle usage status and a vehicle license plate, this indicates that the picture corresponds to two entities: a "legal person" and a "vehicle".
The "legal person" and "vehicle" entities may be summarized from the picture content by the expert who constructs the knowledge graph, based on that expert's knowledge, or derived by a machine from an initial knowledge base. It is also known that the "legal person" has the attribute "unified social credit code", and that the "vehicle" has attributes such as "brand", "time of use", "license plate" and "use status".
Then, from a picture that includes attributes such as the unified social credit code, the vehicle license plate and the vehicle's time of use, it can be inferred that a "possession" relation exists between the "legal person" and the "vehicle".
In summary, the two entities "legal person" and "vehicle" are obtained from attributes included in the picture, such as the unified social credit code, the vehicle license plate and the vehicle usage time. Because the "unified social credit code" attribute corresponds one-to-one to a legal person entity, it can serve as the primary key of the "legal person" entity. Similarly, the "vehicle" has attributes such as license plate, time of use and use status, and because a license plate corresponds one-to-one to a specific vehicle, it can serve as the primary key of the "vehicle" entity.
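The primary-key idea above can be sketched as a small in-memory knowledge graph. Entities are keyed by (entity type, primary-key value), so the same legal person or vehicle seen in several pictures is merged rather than duplicated. The class names, the `"possesses"` relation label and the placeholder field values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (entity type, primary-key value)
VEHICLE_ATTRS = ("license plate", "brand", "time of use", "use status")


@dataclass
class Entity:
    kind: str          # entity type, e.g. "legal person" or "vehicle"
    primary_key: str   # attribute that identifies the entity one-to-one
    attributes: Dict[str, str] = field(default_factory=dict)

    @property
    def key(self) -> Key:
        return (self.kind, self.attributes[self.primary_key])


@dataclass
class KnowledgeGraph:
    entities: Dict[Key, Entity] = field(default_factory=dict)
    relations: List[Tuple[Key, str, Key]] = field(default_factory=list)

    def add_entity(self, e: Entity) -> Key:
        # Entities sharing a primary-key value are merged instead of duplicated.
        existing = self.entities.setdefault(e.key, Entity(e.kind, e.primary_key))
        existing.attributes.update(e.attributes)
        return e.key

    def add_relation(self, head: Key, relation: str, tail: Key) -> None:
        if (head, relation, tail) not in self.relations:
            self.relations.append((head, relation, tail))


def ingest_picture(kg: KnowledgeGraph, fields: Dict[str, str]) -> None:
    """Add the legal person, the vehicle and their possession relation found in one picture."""
    code = fields["unified social credit code"]
    person = Entity("legal person", "unified social credit code", {"unified social credit code": code})
    vehicle = Entity("vehicle", "license plate", {k: fields[k] for k in VEHICLE_ATTRS if k in fields})
    kg.add_relation(kg.add_entity(person), "possesses", kg.add_entity(vehicle))
```

Ingesting the same picture twice leaves two entities and one relation, which is exactly what using the one-to-one attributes as primary keys is meant to guarantee.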
Although the present application has been disclosed above by way of preferred embodiments, they are not intended to limit the present application. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application; therefore, the scope of protection of the present application shall be defined by the appended claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.