Feature quantization model training, feature quantization and data query methods and systems
1. A feature quantization model training method, comprising the following steps:
acquiring a plurality of source data domains;
acquiring feature information and labeling information of each source data domain;
and training a feature quantization model according to the feature information and the labeling information of all the source data domains to obtain a common feature quantization model, wherein, in the training process, common feature information and domain-specific feature information are separated from the feature information of the source data domains, the common feature information being feature information shared by the plurality of source data domains.
2. The feature quantization model training method according to claim 1, wherein training the feature quantization model according to the feature information and the labeling information of all the source data domains to obtain the common feature quantization model comprises:
training the feature quantization model according to the feature information and the labeling information of all the source data domains to obtain the common feature quantization model and a domain-specific feature quantization model of each source data domain.
3. The feature quantization model training method of claim 2, wherein the feature quantization model is trained using a deep neural network algorithm.
4. The feature quantization model training method of claim 1 or 2, wherein the training of the feature quantization model comprises:
adjusting the feature quantization model such that, over all the source data domains, Ex(L(F0(X), Y)) takes a minimum value;
wherein X represents the feature information of all the source data domains, Y represents the labeling information of all the source data domains, F0 represents the common feature quantization model, F0(X) represents the feature quantization code obtained by processing the feature information X with F0, L(F0(X), Y) represents a loss function between the feature quantization code and the labeling information Y, and Ex(L(F0(X), Y)) represents the mathematical expectation of the loss function L over the feature information X.
5. The feature quantization model training method of claim 4, wherein the training of the feature quantization model further comprises:
adjusting the feature quantization model such that, for any source data domain k, Ex(L(φ(F0(x), Fk(x)), y)) takes a minimum value; and, for any source data domain k, Ex(L(φ(F0(x), Fk(x)), y)) < Ex(L(φ(F0(x), Fp(x)), y)), where p is not equal to k;
wherein x represents the feature information of the source data domain k, y represents the labeling information of the source data domain k, F0 represents the common feature quantization model, F0(x) represents the feature quantization code obtained by processing the feature information x with F0, Fk represents the domain-specific feature quantization model of the source data domain k, Fk(x) represents the feature quantization code obtained by processing the feature information x with Fk, Fp represents the domain-specific feature quantization model of the source data domain p, Fp(x) represents the feature quantization code obtained by processing the feature information x with Fp, φ(F0(x), Fk(x)) represents a fusion processing of F0(x) and Fk(x), φ(F0(x), Fp(x)) represents a fusion processing of F0(x) and Fp(x), L(φ(F0(x), Fk(x)), y) and L(φ(F0(x), Fp(x)), y) represent a loss function between the fused feature quantization code and the labeling information y, Ex() represents the mathematical expectation function, k = 1, 2, …, K, p = 1, 2, …, K, and K is the number of the source data domains.
6. The feature quantization model training method of claim 5, wherein the fusion processing is performed by addition or linear concatenation.
7. A feature quantization method, comprising:
performing feature quantization on a target data set by using the common feature quantization model to obtain a feature quantization code of the target data set, wherein the common feature quantization model is obtained by training using the feature quantization model training method according to any one of claims 1 to 6.
8. A data query method, applied to a server, the method comprising:
receiving a target feature quantization code of target query data sent by a client;
comparing the target feature quantization code with a feature quantization code of a target data set to obtain a query result matched with the target feature quantization code, wherein the feature quantization code of the target data set is obtained by the feature quantization method of claim 7;
and returning the query result to the client.
9. The data query method of claim 8, wherein the feature quantization code of the target data set is obtained by performing feature quantization on the target data set in advance using the common feature quantization model, and is stored.
10. A data query method, applied to a client, the method comprising:
acquiring input target query data;
performing feature quantization calculation on the target query data according to a common feature quantization model to obtain a target feature quantization code of the target query data, wherein the common feature quantization model is obtained by training according to the feature quantization model training method of any one of claims 1 to 6;
sending the target feature quantization code to a server;
and receiving a query result returned by the server for the target feature quantization code.
11. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the feature quantization model training method according to any one of claims 1 to 6.
12. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the feature quantization method according to claim 7.
13. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the data query method according to claim 8 or 9.
14. An electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the data query method according to claim 10.
15. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the feature quantization model training method according to any one of claims 1 to 6; or the computer program, when executed by a processor, implements the steps of the feature quantization method according to claim 7; or the computer program, when executed by a processor, implements the steps of the data query method according to claim 8 or 9; or the computer program, when executed by a processor, implements the steps of the data query method according to claim 10.
Background
Feature quantization is an important technology in artificial-intelligence-related fields such as computer vision and data mining. The goal of feature quantization is to output a compact feature code (feature quantization code) that condenses the original information (features of the original image, video, text, or other data) while preserving the expressive power of the original features to the greatest extent. The significance of feature quantization is that, for large-scale data sets (such as the massive image data in an image search system), a specific task (such as image search) can be completed with lower storage and computational complexity by using the quantized compact feature codes. For example, in the field of image search, mainstream image feature dimensionality is usually in the tens of thousands; representative visual features include the Vector of Locally Aggregated Descriptors (VLAD), the Fisher Vector, and the globally average-pooled feature vector of a deep network. When performing image search and similar operations, such high-dimensional features incur extremely high storage cost and computational complexity. Feature quantization can greatly reduce the storage requirement and the run-time computational complexity with essentially no loss of precision. In particular, for a million-scale image data set, the features of the entire data set after feature quantization usually occupy only a few gigabytes (GB) and can easily be read into the memory of a single server, thereby avoiding the time-consuming input/output (I/O) costs of multi-machine communication and memory-to-external-storage transfers in cloud services.
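As a rough, illustrative calculation (the feature dimensionality and code size here are assumptions chosen for illustration, not values fixed by this disclosure), storing one million images with 10,000-dimensional float32 features versus 1 KB quantization codes gives:

```latex
10^{6} \times 10^{4} \times 4\,\mathrm{B} = 4 \times 10^{10}\,\mathrm{B} \approx 40\,\mathrm{GB} \quad \text{(raw features)}
10^{6} \times 1\,\mathrm{KB} \approx 1\,\mathrm{GB} \quad \text{(quantization codes)}
```

which is why the quantized features of a million-scale data set fit comfortably in the memory of a single server.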
Conventional feature quantization algorithms include K-means clustering and the like. These algorithms are usually unsupervised, and the distance or similarity between features is typically computed with the standard Euclidean distance or cosine similarity. In recent years, feature quantization algorithms based on labeling information have drawn increasing attention and shown stronger performance in practical applications. Common forms of labeling information include semantic tags (e.g., one or more labels for the semantic categories of an image) and similarity tags (e.g., a value specifying whether two images are similar, or even a specific similarity value). However, when a feature quantization algorithm is applied to a particular target data domain, a common problem is a lack of labeling information. On the one hand, acquiring labeling information often requires manual annotation, which is expensive; on the other hand, labeling information for certain vertical-domain applications, such as the fine-grained recognition problem, is inherently sparse. This makes the performance of the feature quantization algorithm difficult to guarantee.
Disclosure of Invention
The embodiments of the present invention provide feature quantization model training, feature quantization, and data query methods and systems, to solve the problem that the performance of a feature quantization algorithm is difficult to guarantee when the labeling information of a target data domain is insufficient.
To solve the above technical problem, the present invention is implemented as follows:
in a first aspect, an embodiment of the present invention provides a method for training a feature quantization model, including:
acquiring a plurality of source data domains;
acquiring feature information and labeling information of each source data domain;
and training a feature quantization model according to the feature information and the labeling information of all the source data domains to obtain a common feature quantization model, wherein, in the training process, common feature information and domain-specific feature information are separated from the feature information of the source data domains, the common feature information being feature information shared by the plurality of source data domains.
Optionally, training the feature quantization model according to the feature information and the labeling information of all the source data domains to obtain the common feature quantization model includes:
training the feature quantization model according to the feature information and the labeling information of all the source data domains to obtain the common feature quantization model and a domain-specific feature quantization model of each source data domain.
Optionally, a deep neural network algorithm is used to train the common feature quantization model and the domain-specific feature quantization models.
Optionally, training the feature quantization model includes:
adjusting the feature quantization model such that, over all the source data domains, Ex(L(F0(X), Y)) takes a minimum value;
wherein X represents the feature information of all the source data domains, Y represents the labeling information of all the source data domains, F0 represents the common feature quantization model, F0(X) represents the feature quantization code obtained by processing the feature information X with F0, L(F0(X), Y) represents a loss function between the feature quantization code obtained by processing the feature information X with F0 and the labeling information Y, and Ex(L(F0(X), Y)) represents the mathematical expectation of the loss function L over the feature information X.
Optionally, training the feature quantization model further includes:
adjusting the feature quantization model such that, for any source data domain k, Ex(L(φ(F0(x), Fk(x)), y)) takes a minimum value; and, for any source data domain k, Ex(L(φ(F0(x), Fk(x)), y)) < Ex(L(φ(F0(x), Fp(x)), y)), where p is not equal to k;
wherein x represents the feature information of the source data domain k, y represents the labeling information of the source data domain k, F0 represents the common feature quantization model, F0(x) represents the feature quantization code obtained by processing the feature information x with F0, Fk represents the domain-specific feature quantization model of the source data domain k, Fk(x) represents the feature quantization code obtained by processing the feature information x with Fk, Fp represents the domain-specific feature quantization model of the source data domain p, Fp(x) represents the feature quantization code obtained by processing the feature information x with Fp, φ(F0(x), Fk(x)) represents a fusion processing of F0(x) and Fk(x), φ(F0(x), Fp(x)) represents a fusion processing of F0(x) and Fp(x), L(φ(F0(x), Fk(x)), y) and L(φ(F0(x), Fp(x)), y) represent a loss function between the fused feature quantization code and the labeling information y, Ex() represents the mathematical expectation function, k = 1, 2, …, K, p = 1, 2, …, K, and K is the number of the source data domains.
Optionally, the fusion processing is performed by addition or linear concatenation.
In a second aspect, an embodiment of the present invention provides a feature quantization method, including:
performing feature quantization on a target data set using the common feature quantization model to obtain a feature quantization code of the target data set, wherein the common feature quantization model is obtained by training using the feature quantization model training method of the first aspect.
In a third aspect, an embodiment of the present invention provides a data query method, applied to a server, the method including:
receiving a target feature quantization code of target query data sent by a client;
comparing the target feature quantization code with the feature quantization code of a target data set to obtain a query result matched with the target feature quantization code, wherein the feature quantization code of the target data set is obtained by the feature quantization method of the second aspect;
and returning the query result to the client.
Optionally, the feature quantization code of the target data set is obtained by performing feature quantization on the target data set in advance using the common feature quantization model, and is stored.
In a fourth aspect, an embodiment of the present invention provides a data query method, applied to a client, the method including:
acquiring input target query data;
performing feature quantization calculation on the target query data according to the common feature quantization model to obtain a target feature quantization code of the target query data, wherein the common feature quantization model is obtained by training using the feature quantization model training method of the first aspect;
sending the target feature quantization code to a server;
and receiving a query result returned by the server for the target feature quantization code.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the feature quantization model training method of the first aspect.
In a sixth aspect, an embodiment of the present invention provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the feature quantization method of the second aspect.
In a seventh aspect, an embodiment of the present invention provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the data query method of the third aspect.
In an eighth aspect, an embodiment of the present invention provides an electronic device, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the data query method of the fourth aspect.
In a ninth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the feature quantization model training method of the first aspect; or the computer program, when executed by a processor, implements the steps of the feature quantization method of the second aspect; or the computer program, when executed by a processor, implements the steps of the data query method of the third aspect; or the computer program, when executed by a processor, implements the steps of the data query method of the fourth aspect.
In the embodiments of the present invention, the common feature quantization model is obtained by training with the rich labeling information of a plurality of source data domains, and can be used for feature quantization of a target data domain whose labeling information is insufficient, thereby improving the feature quantization performance of the feature quantization model in data domains with insufficient labeling information.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic diagram of a feature quantization method in the related art;
FIG. 2 is a schematic flowchart of a feature quantization model training method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a feature quantization model training method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a feature quantization method according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a data query method applied to a server according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a data query method applied to a client according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a feature quantization model training system according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a feature quantization system according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data query system according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a data query system according to another embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an electronic device according to another embodiment of the present invention;
FIG. 13 is a schematic structural diagram of an electronic device according to yet another embodiment of the present invention;
FIG. 14 is a schematic structural diagram of an electronic device according to yet another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to FIG. 1, which is a schematic diagram of a feature quantization method in the related art: in the related art, feature information must first be extracted from a data set (also called a data domain); key parameters of the feature quantization model are then tuned and optimized based on the labeling information of the data set; finally, the extracted feature information is quantized with the resulting feature quantization model.
To solve the above problem, referring to FIG. 2, an embodiment of the present invention provides a feature quantization model training method, including:
step 21: acquiring a plurality of source data fields;
in the embodiment of the present invention, a data field may also be referred to as a data set, and one data field includes a plurality of data. For example, the data field is an image database including a plurality of images.
The multiple source data fields have a degree of relatedness, e.g., there are multiple identical semantic category labels.
Step 22: acquiring characteristic information and marking information of each source data domain;
the characteristic information may be set as desired, for example, in the image data set, and the characteristic information may include a descriptor of image visual information, etc.
Step 23: and training the characteristic quantization model according to the characteristic information and the labeling information of all the source data fields to obtain a public characteristic quantization model, wherein in the training process, public characteristic information and field-specific characteristic information are separated from the characteristic information of the source data fields, and the public characteristic information is the common characteristic information of the source data fields.
The public characteristic information is cross-domain invariant public information and contains knowledge of a plurality of data domains. For example, different cameras have different postures, and the postures of the photographed human faces or human bodies are correspondingly different, but there are some common features in the images, for example, the topological structure of the human faces, that is, the topological structure of the human faces, is common feature information.
In the embodiments of the present invention, the common feature quantization model is obtained by training with the rich labeling information of a plurality of source data domains, and can be used for feature quantization of a target data domain whose labeling information is insufficient, thereby improving the feature quantization performance of the feature quantization model in data domains with insufficient labeling information.
Take image feature quantization for a semantic retrieval task as an example. When a specific feature quantization model is applied to a target data domain, the existing approach optimizes the key parameters of the model based on the semantic labeling information of that target data domain. When the semantic labeling information is deficient, the existing approach cannot guarantee the feature quantization performance of the model in the target data domain. In the embodiment of the present invention, several existing related source data domains with abundant labeling information are borrowed, their labeling information is reused in training to obtain a common feature quantization model, and the common feature quantization model is used to quantize the features of the target data domain, thereby improving the feature quantization performance on the target data set.
Of course, it should be noted that, in the embodiment of the present invention, the data domain is not limited to image data sets; the data in a data domain includes, but is not limited to, images, videos, audio, and other data forms.
In the embodiment of the present invention, optionally, training the feature quantization model according to the feature information and the labeling information of all the source data domains to obtain the common feature quantization model includes:
training the feature quantization model according to the feature information and the labeling information of all the source data domains to obtain the common feature quantization model and a domain-specific feature quantization model of each source data domain.
The domain-specific feature information is feature information specific to a particular data domain.
Referring to FIG. 3, which is a schematic diagram of the feature quantization model training method in an embodiment of the present invention: K data sets (also referred to as data domains) are used for training the feature quantization model. During training, feature information is first obtained for each data set; the feature quantization model is then trained according to the labeling information and the feature information of all the data sets. In the training process, the feature information of the data sets is decomposed into common feature information and domain-specific feature information, finally yielding K+1 models: one common feature quantization model and K domain-specific feature quantization models.
Suppose K source data domains (data sets) are given, denoted <Xk, Yk>, where k = 1, 2, …, K. Xk and Yk respectively represent the feature information and the labeling information of the k-th data set (both typically in matrix form). For convenience of discussion, the symbols x and y are used hereinafter to represent the feature information and the labeling information of a data set, respectively. In the embodiment of the present invention, K+1 models in total, F0, F1, …, FK, are generated by machine learning, where F0 is shared by all K data domains and Fk is specific to the k-th data domain. Fk(x) represents the feature quantization code obtained by processing the feature information x with Fk. φ(Fi(x), Fj(x)) represents a fusion of Fi(x) and Fj(x) (e.g., by simple addition or linear concatenation). L(Fk(x), y) represents a loss function between the feature quantization code Fk(x) and the labeling information y (e.g., L may be a 0-1 classification loss function); a smaller loss function value is desired. Ex(L(Fk(x), y)) represents the mathematical expectation of the loss function L over x.
To obtain the above models, a model learning process is carried out over all the source data domains <Xk, Yk>. The specific optimization targets of the learning process include the following (a code sketch follows the list):
1) For all k = 1, 2, …, K, Ex(L(F0(x), y)) should take a minimum value. This ensures that the common feature quantization model achieves excellent feature quantization performance;
2) For any k = 1, 2, …, K, Ex(L(φ(F0(x), Fk(x)), y)) should take a minimum value. This ensures the complementarity of the domain-specific feature quantization model and the common feature quantization model;
3) For any k = 1, 2, …, K, Ex(L(φ(F0(x), Fk(x)), y)) < Ex(L(φ(F0(x), Fp(x)), y)), where p is not equal to k. This ensures the optimality of the domain-specific feature quantization model for its particular data domain.
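The following is a minimal training sketch of these targets, assuming PyTorch, small fully connected quantization heads, linear concatenation as the fusion φ, and cross-entropy as the loss L; the layer sizes, the auxiliary classifier, and the zero-padding used to score the common code alone are illustrative assumptions, not the disclosed implementation. Target 3) is a constraint on the trained models and is only noted in a comment.

```python
import torch
import torch.nn as nn

K, FEAT_DIM, CODE_DIM, NUM_CLASSES = 3, 512, 64, 10

def make_head():
    # A small trainable quantization head; the disclosure only requires
    # trainable models (e.g., shallow convolutional networks).
    return nn.Sequential(nn.Linear(FEAT_DIM, CODE_DIM), nn.Tanh())

F0 = make_head()                                    # common model F0
Fk = nn.ModuleList(make_head() for _ in range(K))   # domain-specific models Fk
classifier = nn.Linear(2 * CODE_DIM, NUM_CLASSES)   # scores a fused code against y
loss_fn = nn.CrossEntropyLoss()                     # stand-in for L(., y)

def fuse(a, b):
    # phi: linear concatenation (addition is the other option named above).
    return torch.cat([a, b], dim=1)

def training_step(batches, optimizer):
    # batches: list of K (x, y) pairs, one batch per source data domain.
    total = 0.0
    for k, (x, y) in enumerate(batches):
        common = F0(x)
        # Target 1): the common code alone should predict y well.
        # Zero-padding lets the same classifier score the common code.
        total = total + loss_fn(classifier(fuse(common, torch.zeros_like(common))), y)
        # Target 2): the fused common + domain-specific code should also
        # predict y well, which enforces complementarity.
        total = total + loss_fn(classifier(fuse(common, Fk[k](x))), y)
        # Target 3) (not optimized here): fusing with the wrong domain's
        # head, fuse(common, Fk[p](x)) with p != k, should give higher loss.
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return float(total)

params = list(F0.parameters()) + list(Fk.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
batches = [(torch.randn(8, FEAT_DIM), torch.randint(0, NUM_CLASSES, (8,))) for _ in range(K)]
print(training_step(batches, optimizer))
```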
That is, training the feature quantization model includes:
adjusting the feature quantization model such that, over all the source data domains, Ex(L(F0(X), Y)) takes a minimum value;
wherein X represents the feature information of all the source data domains, Y represents the labeling information of all the source data domains, F0 represents the common feature quantization model, F0(X) represents the feature quantization code obtained by processing the feature information X with F0, L(F0(X), Y) represents a loss function between the feature quantization code obtained by processing the feature information X with F0 and the labeling information Y, and Ex(L(F0(X), Y)) represents the mathematical expectation of the loss function L over the feature information X.
Further, training the feature quantization model further includes:
adjusting the feature quantization model such that, for any source data domain k, Ex(L(φ(F0(x), Fk(x)), y)) takes a minimum value; and, for any source data domain k, Ex(L(φ(F0(x), Fk(x)), y)) < Ex(L(φ(F0(x), Fp(x)), y)), where p is not equal to k;
wherein x represents the feature information of the source data domain k, y represents the labeling information of the source data domain k, F0 represents the common feature quantization model, F0(x) represents the feature quantization code obtained by processing the feature information x with F0, Fk represents the domain-specific feature quantization model of the source data domain k, Fk(x) represents the feature quantization code obtained by processing the feature information x with Fk, Fp represents the domain-specific feature quantization model of the source data domain p, Fp(x) represents the feature quantization code obtained by processing the feature information x with Fp, φ(F0(x), Fk(x)) represents a fusion processing of F0(x) and Fk(x), φ(F0(x), Fp(x)) represents a fusion processing of F0(x) and Fp(x), L(φ(F0(x), Fk(x)), y) and L(φ(F0(x), Fp(x)), y) represent a loss function between the fused feature quantization code and the labeling information y, Ex() represents the mathematical expectation function, k = 1, 2, …, K, p = 1, 2, …, K, and K is the number of the source data domains.
In the embodiment of the present invention, optionally, the fusion processing is performed by addition or linear concatenation.
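For concreteness, a minimal sketch of the two fusion options named above; the 64-dimensional codes are arbitrary placeholders:

```python
import torch

def fuse_add(a, b):
    # Additive fusion: both codes must have the same length.
    return a + b

def fuse_concat(a, b):
    # Linear concatenation: the fused code length is the sum of both lengths.
    return torch.cat([a, b], dim=-1)

a, b = torch.randn(4, 64), torch.randn(4, 64)
assert fuse_add(a, b).shape == (4, 64)
assert fuse_concat(a, b).shape == (4, 128)
```

Additive fusion keeps the code length unchanged, while linear concatenation doubles it; the choice trades storage against expressiveness.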
In the embodiment of the present invention, for each source data domain, fusing the outputs of the domain-specific feature quantization model and the common feature quantization model ensures that the feature quantization performance in that data domain is improved compared with using the common feature quantization model alone.
In the embodiment of the present invention, if the domain-specific feature quantization models of different data domains are swapped and fused with the common feature quantization model, the actual effect is approximately equivalent to introducing random noise or suffering a severe overfitting phenomenon.
In the embodiment of the present invention, optionally, a deep neural network algorithm is adopted to train the feature quantization model. For example, the feature quantization model may be built from multiple convolution, pooling, or non-linear activation network layers.
In the embodiment of the present invention, the feature information of each source data domain may be extracted in various ways; for example, a deep neural network algorithm may be used to extract the feature information of each source data domain.
In the embodiment of the present invention, optionally, the common feature quantization model and the domain-specific feature quantization models use a locality-sensitive hashing (LSH) algorithm or a K-means algorithm. Further optionally, if the data set is an image data set, the common feature quantization model and the domain-specific feature quantization models use a locality-sensitive hashing algorithm.
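As one classic instantiation of locality-sensitive hashing (the disclosure names only the algorithm family, so the random-hyperplane variant and the parameters below are assumptions), a sketch:

```python
import numpy as np

def lsh_quantize(features, n_bits=64, seed=0):
    # Random-hyperplane LSH: project the features onto n_bits random
    # directions and keep only the signs, yielding an n_bits binary code.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_bits))
    return (features @ planes > 0).astype(np.uint8)

codes = lsh_quantize(np.random.randn(1000, 512))  # 1000 codes of 64 bits each
print(codes.shape)  # (1000, 64)
```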
In the embodiment of the present invention, optionally, if the data set is an image data set, the following configuration may be adopted for an image retrieval task: 1) image features are extracted with a pre-trained neural network (such as ResNet50); 2) shallow convolutional networks are adopted for the common feature quantization model and the domain-specific feature quantization models; 3) the common feature quantization model and the domain-specific feature quantization models are fused by linear concatenation.
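A sketch of this optional configuration, assuming torchvision's pretrained ResNet50 as the feature extractor; the shallow convolutional heads below are stand-ins for the (unspecified) disclosed heads:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Pretrained backbone for feature extraction (point 1 above).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()  # keep the 2048-d globally pooled feature
backbone.eval()

def shallow_conv_head(code_dim=64):
    # Shallow convolutional quantization head (point 2 above); the feature
    # vector is treated as a 1-channel sequence purely for illustration.
    return nn.Sequential(
        nn.Conv1d(1, 4, kernel_size=9, stride=4), nn.ReLU(),
        nn.AdaptiveAvgPool1d(code_dim), nn.Flatten(),
        nn.Linear(4 * code_dim, code_dim), nn.Tanh(),
    )

common_head, domain_head = shallow_conv_head(), shallow_conv_head()

with torch.no_grad():
    images = torch.randn(2, 3, 224, 224)       # a stand-in image batch
    feats = backbone(images).unsqueeze(1)      # (2, 1, 2048)
    # Fusion by linear concatenation (point 3 above).
    code = torch.cat([common_head(feats), domain_head(feats)], dim=1)
print(code.shape)  # torch.Size([2, 128])
```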
In the embodiment of the present invention, the feature quantization model training method may be executed by a server.
Referring to FIG. 4, an embodiment of the present invention further provides a feature quantization method, including:
Step 41: performing feature quantization on a target data set using the common feature quantization model to obtain a feature quantization code of the target data set, wherein the common feature quantization model is obtained by the above feature quantization model training method.
In the embodiments of the present invention, the common feature quantization model is obtained by training with the rich labeling information of a plurality of source data domains, and can be used for feature quantization of a target data domain whose labeling information is insufficient, thereby improving the feature quantization performance of the feature quantization model in data domains with insufficient labeling information.
Referring to FIG. 5, an embodiment of the present invention further provides a data query method, applied to a server, the data query method including:
Step 51: receiving a target feature quantization code of target query data sent by a client;
Step 52: comparing the target feature quantization code with the feature quantization code of a target data set to obtain a query result matched with the target feature quantization code, wherein the feature quantization code of the target data set is obtained by the above feature quantization method;
Step 53: returning the query result to the client.
Optionally, the feature quantization code of the target data set is obtained by performing feature quantization on the target data set in advance using the common feature quantization model, and is stored.
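A minimal sketch of the comparison in step 52, assuming binary quantization codes compared by Hamming distance with top-k retrieval; the disclosure itself only requires "comparing", so the metric is an assumption:

```python
import numpy as np

def hamming_query(target_code, dataset_codes, top_k=10):
    # Count differing bits between the client's code and every stored code,
    # then return the indices of the top_k closest dataset entries.
    dists = np.count_nonzero(dataset_codes != target_code, axis=1)
    return np.argsort(dists)[:top_k]

dataset_codes = np.random.randint(0, 2, size=(100000, 64), dtype=np.uint8)
result = hamming_query(dataset_codes[42], dataset_codes, top_k=5)
print(result)  # index 42 appears first (distance 0)
```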
Referring to FIG. 6, an embodiment of the present invention further provides a data query method, applied to a client, the data query method including:
Step 61: acquiring input target query data;
Step 62: performing feature quantization calculation on the target query data according to the common feature quantization model to obtain a target feature quantization code of the target query data, wherein the common feature quantization model is obtained by training using the above feature quantization model training method.
Referring to FIG. 7, an embodiment of the present invention further provides a feature quantization model training system 70, including:
a first obtaining module 71, configured to obtain a plurality of source data domains;
a second obtaining module 72, configured to obtain feature information and labeling information of each source data domain;
and a training module 73, configured to train the feature quantization model according to the feature information and the labeling information of all the source data domains to obtain a common feature quantization model, wherein, in the training process, common feature information and domain-specific feature information are separated from the feature information of the plurality of source data domains, the common feature information being feature information shared by the plurality of source data domains.
Optionally, the training module 73 is configured to train the feature quantization model according to the feature information and the labeling information of all the source data domains to obtain the common feature quantization model and a domain-specific feature quantization model of each source data domain.
Optionally, the training module 73 is configured to train the feature quantization model using a deep neural network algorithm.
Optionally, the training module 73 is configured to adjust the feature quantization model such that, over all the source data domains, Ex(L(F0(X), Y)) takes a minimum value;
wherein X represents the feature information of all the source data domains, Y represents the labeling information of all the source data domains, F0 represents the common feature quantization model, F0(X) represents the feature quantization code obtained by processing the feature information X with F0, L(F0(X), Y) represents a loss function between the feature quantization code and the labeling information Y, and Ex(L(F0(X), Y)) represents the mathematical expectation of the loss function L over the feature information X.
Optionally, the training module 73 is configured to adjust the feature quantization model such that, for any source data domain k, Ex(L(φ(F0(x), Fk(x)), y)) takes a minimum value; and, for any source data domain k, Ex(L(φ(F0(x), Fk(x)), y)) < Ex(L(φ(F0(x), Fp(x)), y)), where p is not equal to k;
wherein x represents the feature information of the source data domain k, y represents the labeling information of the source data domain k, F0 represents the common feature quantization model, F0(x) represents the feature quantization code obtained by processing the feature information x with F0, Fk represents the domain-specific feature quantization model of the source data domain k, Fk(x) represents the feature quantization code obtained by processing the feature information x with Fk, Fp represents the domain-specific feature quantization model of the source data domain p, Fp(x) represents the feature quantization code obtained by processing the feature information x with Fp, φ(F0(x), Fk(x)) represents a fusion processing of F0(x) and Fk(x), φ(F0(x), Fp(x)) represents a fusion processing of F0(x) and Fp(x), L(φ(F0(x), Fk(x)), y) and L(φ(F0(x), Fp(x)), y) represent a loss function between the fused feature quantization code and the labeling information y, Ex() represents the mathematical expectation function, k = 1, 2, …, K, p = 1, 2, …, K, and K is the number of the source data domains.
Optionally, the training module 73 is configured to perform the fusion processing by addition or linear concatenation.
Referring to FIG. 8, an embodiment of the present invention further provides a feature quantization system 80, including:
a feature quantization module 81, configured to perform feature quantization on a target data set using the common feature quantization model to obtain a feature quantization code of the target data set, wherein the common feature quantization model is obtained by training using the above feature quantization model training method.
The feature quantization system 80 may be a server.
Referring to FIG. 9, an embodiment of the present invention further provides a data query system 90, including:
a receiving module 91, configured to receive a target feature quantization code of target query data sent by a client;
a query module 92, configured to compare the target feature quantization code with the feature quantization code of a target data set to obtain a query result matched with the target feature quantization code, wherein the feature quantization code of the target data set is obtained by the above feature quantization method;
and a sending module 93, configured to return the query result to the client.
The data query system 90 may be a server.
Optionally, the feature quantization code of the target data set is obtained by performing feature quantization on the target data set in advance using the common feature quantization model, and is stored.
Referring to FIG. 10, an embodiment of the present invention further provides a data query system 100, including:
an obtaining module 101, configured to obtain input target query data;
a calculating module 102, configured to perform feature quantization calculation on the target query data according to the common feature quantization model to obtain a target feature quantization code of the target query data, wherein the common feature quantization model is obtained by training using the above feature quantization model training method;
a sending module 103, configured to send the target feature quantization code to a server;
and a receiving module 104, configured to receive a query result returned by the server for the target feature quantization code.
The data query system 100 may be a client.
Referring to FIG. 11, an embodiment of the present invention further provides an electronic device 110, including a processor 111, a memory 112, and a computer program stored in the memory 112 and executable on the processor 111. The computer program, when executed by the processor 111, implements the processes of the above feature quantization model training method embodiment and can achieve the same technical effects, which are not repeated here.
Optionally, the electronic device 110 is a server.
Referring to FIG. 12, an embodiment of the present invention further provides an electronic device 120, including a processor 121, a memory 122, and a computer program stored in the memory 122 and executable on the processor 121. The computer program, when executed by the processor 121, implements the processes of the above feature quantization method embodiment and can achieve the same technical effects, which are not repeated here.
Optionally, the electronic device 120 is a server.
Referring to FIG. 13, an embodiment of the present invention further provides an electronic device 130, including a processor 131, a memory 132, and a computer program stored in the memory 132 and executable on the processor 131. The computer program, when executed by the processor 131, implements the processes of the above data query method embodiment applied to the server and can achieve the same technical effects, which are not repeated here.
Optionally, the electronic device 130 is a server.
Referring to FIG. 14, an embodiment of the present invention further provides an electronic device 140, including a processor 141, a memory 142, and a computer program stored in the memory 142 and executable on the processor 141. The computer program, when executed by the processor 141, implements the processes of the above data query method embodiment applied to the client and can achieve the same technical effects, which are not repeated here.
Optionally, the electronic device 140 is a client.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. The computer program, when executed by a processor, implements the processes of the above feature quantization model training method embodiment, of the above feature quantization method embodiment, of the above data query method embodiment applied to the server, or of the above data query method embodiment applied to the client, and can achieve the same technical effects, which are not repeated here. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a" does not preclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although the former is in many cases the better implementation. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), including several instructions for causing a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods of the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to these embodiments, which are illustrative rather than restrictive; those of ordinary skill in the art may make various changes and modifications without departing from the spirit and scope of the invention as defined by the appended claims.