Greedy gradient integration method and system for suppressing language bias
1. A greedy gradient integration method for suppressing language bias, comprising the steps of:
modeling the visual question-answering task as an additive model, wherein the additive model is decomposed into a base model and bias models combined in a generalized additive manner;
optimizing each part of the additive model one by one, first optimizing the bias models by minimizing a binary cross-entropy loss: substituting the m-th (m ∈ N*) bias model function into the binary cross-entropy loss function, and taking the negative gradient of the binary cross-entropy loss function to obtain the optimization direction of the (m+1)-th bias model function;
and after all bias models have been optimized, optimizing the base model by using the negative gradients of all bias models as supervision.
2. The greedy gradient integration method of claim 1, wherein the additive model is represented by the function:

f(X; θ) + Σ_{i=1}^{M} h_i(B_i; φ_i)

wherein f(X; θ) represents the base model, h_i(B_i; φ_i) represents the i-th bias model, X represents the input variable, θ represents the parameters of the base model, φ_i represents the parameters of the i-th bias model, B_i denotes the separated bias feature, and M (M ∈ N* and m ≤ M) represents the total number of bias models used to supervise the optimization direction of the model.
3. The greedy gradient integration method of claim 2, wherein substituting the m-th bias model function into the binary cross-entropy loss function yields:

L_m = -Σ_i [ y_{m,i}·log σ(h_{m,i}) + (1 - y_{m,i})·log(1 - σ(h_{m,i})) ]

and taking the negative gradient of the substituted binary cross-entropy loss function yields:

-∂L_m/∂h_{m,i} = y_{m,i} - σ(h_{m,i})

wherein σ(·) represents the Sigmoid function, h_{m,i} denotes the output of the m-th bias model h_m(B_m; φ_m) for the i-th answer candidate, and y_{m,i} represents the label (0 or 1) of the i-th answer candidate for the m-th model.
4. The greedy gradient integration method of claim 3, wherein the base model is optimized by using the negative gradients of all bias models as supervision:

θ* = argmin_θ L(σ(f(X; θ)), Ŷ)

wherein L(·,·) denotes the binary cross-entropy loss function, Ŷ represents the label given by the negative gradients of all bias models, σ(f(X; θ)) represents the prediction of the base model function, and the base model function takes its loss minimum by using the negative gradients of all bias model functions as labels.
5. The greedy gradient integration method of claim 1, wherein the bias models comprise a long-tail distribution bias model and a question-answer bias model.
6. A greedy gradient integration system for suppressing language bias, comprising:
a module 1, which models the visual question-answering task as an additive model, wherein the additive model is decomposed into a base model and bias models combined in a generalized additive manner;
a module 2, which optimizes each part of the additive model one by one, first optimizing the bias models by minimizing a binary cross-entropy loss: substituting the m-th (m ∈ N*) bias model function into the binary cross-entropy loss function, and taking the negative gradient of the binary cross-entropy loss function to obtain the optimization direction of the (m+1)-th bias model function;
and a module 3, which, after all bias models have been optimized, optimizes the base model by using the negative gradients of all bias models as supervision.
7. The greedy gradient integration system of claim 6, wherein the additive model is represented by the function:

f(X; θ) + Σ_{i=1}^{M} h_i(B_i; φ_i)

wherein f(X; θ) represents the base model, h_i(B_i; φ_i) represents the i-th bias model, X represents the input variable, θ represents the parameters of the base model, φ_i represents the parameters of the i-th bias model, B_i denotes the separated bias feature, and M (M ∈ N* and m ≤ M) represents the total number of bias models used to supervise the optimization direction of the model.
8. The greedy gradient integration system of claim 7, wherein substituting the m-th bias model function into the binary cross-entropy loss function yields:

L_m = -Σ_i [ y_{m,i}·log σ(h_{m,i}) + (1 - y_{m,i})·log(1 - σ(h_{m,i})) ]

and taking the negative gradient of the substituted binary cross-entropy loss function yields:

-∂L_m/∂h_{m,i} = y_{m,i} - σ(h_{m,i})

wherein σ(·) represents the Sigmoid function, h_{m,i} denotes the output of the m-th bias model h_m(B_m; φ_m) for the i-th answer candidate, and y_{m,i} represents the label (0 or 1) of the i-th answer candidate for the m-th model.
9. The greedy gradient integration system of claim 8, wherein the base model is optimized by using the negative gradients of all bias models as supervision:

θ* = argmin_θ L(σ(f(X; θ)), Ŷ)

wherein L(·,·) denotes the binary cross-entropy loss function, Ŷ represents the label given by the negative gradients of all bias models, σ(f(X; θ)) represents the prediction of the base model function, and the base model function takes its loss minimum by using the negative gradients of all bias model functions as labels.
10. The greedy gradient integration system of claim 7, wherein the bias models comprise a long-tail distribution bias model and a question-answer bias model.
Background
Visual Question Answering (VQA) aims to answer questions posed in natural language about a given image. It is an important research direction in the multi-modal field and has broad research and application value in improving human-computer interaction, helping visually impaired people obtain visual information, and advancing high-level AI.
Due to the inevitable imbalance introduced during data collection and the strong correlations produced by recurring question-answer patterns, deep-learning-based visual question-answering models generally tend to capture the mapping between questions and answers while ignoring the image information when answering, which severely harms the robustness and generalization ability of the model. This bias is referred to as "language bias" (Language Bias) in visual question-answering systems. For a trustworthy and generalizable visual question-answering system, the urgent problem to be solved is how to balance the influence of language and visual features on the final decision, so that image information is fully used to predict answers and language bias is suppressed.
Disclosure of Invention
The object of the invention is to provide a greedy gradient integration algorithm that addresses the problem of language bias in visual question-answering systems.
The invention provides a greedy gradient integration method for suppressing language bias, comprising the following steps: modeling the visual question-answering task as an additive model, wherein the additive model is decomposed into a base model and bias models combined in a generalized additive manner; optimizing each part of the additive model one by one, first optimizing the bias models by minimizing a binary cross-entropy loss: substituting the m-th (m ∈ N*) bias model function into the binary cross-entropy loss function, and taking the negative gradient of the binary cross-entropy loss function to obtain the optimization direction of the (m+1)-th bias model function; and after all bias models have been optimized, optimizing the base model by using the negative gradients of all bias models as supervision.
In the above greedy gradient integration method, the additive model is represented by the function:

f(X; θ) + Σ_{i=1}^{M} h_i(B_i; φ_i)

wherein f(X; θ) represents the base model, h_i(B_i; φ_i) represents the i-th bias model, X represents the input variable, θ represents the parameters of the base model, B_i denotes the separated bias feature, M (M ∈ N* and m ≤ M) represents the total number of bias models used to supervise the optimization direction of the model, and φ_i represents the parameters of the i-th bias model.
In the above greedy gradient integration method, substituting the m-th bias model function into the binary cross-entropy loss function yields:

L_m = -Σ_i [ y_{m,i}·log σ(h_{m,i}) + (1 - y_{m,i})·log(1 - σ(h_{m,i})) ]

and taking the negative gradient of the substituted binary cross-entropy loss function yields:

-∂L_m/∂h_{m,i} = y_{m,i} - σ(h_{m,i})

wherein σ(·) represents the Sigmoid function, h_{m,i} denotes the output of the m-th bias model h_m(B_m; φ_m) for the i-th answer candidate, and y_{m,i} represents the label (0 or 1) of the i-th answer candidate for the m-th model.
In the above greedy gradient integration method, optimizing the base model with the negative gradients of all bias models as supervision is specifically expressed as:

θ* = argmin_θ L(σ(f(X; θ)), Ŷ)

As shown in the above formula, Ŷ represents the label given by the negative gradients of all bias models, σ(f(X; θ)) represents the prediction of the base model function, and the base model function takes its loss minimum by using the negative gradients of all bias model functions as labels.
In the above greedy gradient integration method, the bias models comprise a long-tail distribution bias model and a question-answer bias model.
The invention also provides a greedy gradient integration system for suppressing language bias, comprising: a module 1, which models the visual question-answering task as an additive model, wherein the additive model is decomposed into a base model and bias models combined in a generalized additive manner; a module 2, which optimizes each part of the additive model one by one, first optimizing the bias models by minimizing a binary cross-entropy loss: substituting the m-th (m ∈ N*) bias model function into the binary cross-entropy loss function, and taking the negative gradient of the binary cross-entropy loss function to obtain the optimization direction of the (m+1)-th bias model function; and a module 3, which, after all bias models have been optimized, optimizes the base model by using the negative gradients of all bias models as supervision.
In the above greedy gradient integration system, the additive model is represented by the function:

f(X; θ) + Σ_{i=1}^{M} h_i(B_i; φ_i)

wherein f(X; θ) represents the base model, h_i(B_i; φ_i) represents the i-th bias model, X represents the input variable, θ represents the parameters of the base model, φ_i represents the parameters of the i-th bias model, B_i denotes the separated bias feature, and M (M ∈ N* and m ≤ M) represents the total number of bias models used to supervise the optimization direction of the model.
In the above greedy gradient integration system, substituting the m-th bias model function into the binary cross-entropy loss function yields:

L_m = -Σ_i [ y_{m,i}·log σ(h_{m,i}) + (1 - y_{m,i})·log(1 - σ(h_{m,i})) ]

and taking the negative gradient of the substituted binary cross-entropy loss function yields:

-∂L_m/∂h_{m,i} = y_{m,i} - σ(h_{m,i})

wherein σ(·) represents the Sigmoid function, h_{m,i} denotes the output of the m-th bias model h_m(B_m; φ_m) for the i-th answer candidate, and y_{m,i} represents the label (0 or 1) of the i-th answer candidate for the m-th model.
In the above greedy gradient integration system, optimizing the base model with the negative gradients of all bias models as supervision is specifically expressed as:

θ* = argmin_θ L(σ(f(X; θ)), Ŷ)

As shown in the above formula, Ŷ represents the label given by the negative gradients of all bias models, σ(f(X; θ)) represents the prediction of the base model function, and the base model function takes its loss minimum by using the negative gradients of all bias model functions as labels.
In the above greedy gradient integration system, the bias models comprise a long-tail distribution bias model and a question-answer bias model.
The invention is described in detail below with reference to the drawings and specific examples, but the invention is not limited thereto.
Drawings
FIG. 1 is a flow chart of a greedy gradient integration method according to an embodiment of the invention.
FIG. 2 is a flow diagram of a greedy gradient integration method according to an embodiment of the invention.
Detailed Description
The invention will be described in detail below with reference to the drawings, which are provided for purposes of illustration:
the present specification discloses one or more embodiments that incorporate the features of this invention. The disclosed embodiments are merely illustrative. The scope of the invention is not limited to the disclosed embodiments. The invention is defined by the appended claims.
References in the specification to "one embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not intended to refer to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Where certain terms are used in the specification and following claims to refer to particular components or features, those skilled in the art will understand that various terms or numbers may be used by a skilled user or manufacturer to refer to the same component or feature. This specification and the claims that follow do not intend to distinguish between components or features that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "including, but not limited to". In addition, the term "connected" is intended to encompass any direct or indirect electrical connection; an indirect electrical connection includes connection through other means.
It should be noted that in the description of the present invention, the terms "lateral", "longitudinal", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
FIG. 1 is a flow chart of a greedy gradient integration method according to an embodiment of the invention. The greedy gradient integration method comprises the following steps: modeling the visual question-answering task as an additive model, wherein the additive model is decomposed into a base model and bias models combined in a generalized additive manner. The additive model can be represented by the function:

f(X; θ) + Σ_{i=1}^{M} h_i(B_i; φ_i)

wherein f(X; θ) represents the base model, h_i(B_i; φ_i) represents the i-th bias model, X represents the input variable, θ represents the parameters of the base model, φ_i represents the parameters of the i-th bias model, B_i denotes the separated bias feature, and M (M ∈ N* and m ≤ M) represents the total number of bias models used to supervise the optimization direction of the model.
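By way of illustration only, the generalized additive decomposition above might be organized as in the following PyTorch sketch; the class name AdditiveVQAModel and its arguments are hypothetical, and the base and bias networks are assumed to output scores over the C answer candidates.

```python
# Hypothetical sketch of the additive model f(X; theta) + sum_i h_i(B_i; phi_i).
import torch
import torch.nn as nn

class AdditiveVQAModel(nn.Module):
    def __init__(self, base_model: nn.Module, bias_models: list):
        super().__init__()
        self.base_model = base_model                     # f(X; theta)
        self.bias_models = nn.ModuleList(bias_models)    # h_i(B_i; phi_i)

    def forward(self, X, bias_features):
        # bias_features[i] corresponds to the separated bias feature B_i.
        logits = self.base_model(X)
        for h_i, B_i in zip(self.bias_models, bias_features):
            logits = logits + h_i(B_i)
        return logits  # C-dimensional scores over the answer candidates
```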
After the modeling is completed, each part of the additive model is optimized one by one. In the invention, in order to use the negative gradients of all bias models as labels, the bias models must be optimized first, and the loss of each bias model is minimized using a binary cross-entropy loss function.
the binary cross entropy loss function is:
pi=σ(zi)
wherein, P represents a C-dimensional vector, and Y represents a C-dimensional vector; p is a radical ofiRepresenting the i-th dimension, y, of the vector PiRepresenting the ith dimension, z, of the vector YiIs the output of the model for the ith answer candidate.
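For illustration, the loss above can be written out element-wise as in the following sketch; the function name, the small epsilon for numerical stability, and the sum-then-mean reduction are assumptions of the sketch rather than details taken from the foregoing description.

```python
# Element-wise binary cross-entropy over the C answer candidates:
# L(P, Y) = -sum_i [ y_i*log(p_i) + (1 - y_i)*log(1 - p_i) ],  p_i = sigmoid(z_i).
import torch

def binary_cross_entropy(z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    p = torch.sigmoid(z)                      # p_i = sigma(z_i)
    eps = 1e-12                               # numerical stability (added here)
    loss = -(y * torch.log(p + eps) + (1.0 - y) * torch.log(1.0 - p + eps))
    return loss.sum(dim=-1).mean()            # sum over candidates, mean over the batch
```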
Substituting the m-th (m ∈ N*) bias model function into the binary cross-entropy loss function gives:

L_m = -Σ_{i=1}^{C} [ y_{m,i}·log σ(h_{m,i}) + (1 - y_{m,i})·log(1 - σ(h_{m,i})) ]

and taking the negative gradient of the substituted binary cross-entropy loss function gives:

-∂L_m/∂h_{m,i} = y_{m,i} - σ(h_{m,i})

where σ(·) represents the Sigmoid function, h_{m,i} denotes the output of the m-th bias model h_m(B_m; φ_m) for the i-th answer candidate, and y_{m,i} represents the label (0 or 1) of the i-th answer candidate for the m-th model.
This negative gradient gives the optimization direction of the (m+1)-th bias model function.
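A minimal sketch of this step follows: for a binary cross-entropy with a Sigmoid, the gradient with respect to the logit is σ(z) - y, so the negative gradient y_{m,i} - σ(h_{m,i}) can be materialized directly and reused as the supervision for the next model. Clamping the result to [0, 1] so that it remains a valid binary cross-entropy target is an assumption of the sketch.

```python
# Negative gradient of the BCE loss w.r.t. the m-th bias model's output,
# used as the pseudo-label that supervises the next model in the sequence.
import torch

def negative_gradient_label(bias_logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # dL/dz_i = sigma(z_i) - y_i, hence the negative gradient is y_i - sigma(z_i).
    pseudo_label = y - torch.sigmoid(bias_logits)
    return pseudo_label.clamp(0.0, 1.0).detach()   # detached: used only as a label
```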
since there are many deviation models, it is necessary to determine whether all the deviation models are optimized every time one deviation model is optimized. After optimizing all deviation models, the basic model is optimized by taking the negative gradients of all deviation models as supervision, and the following formula is adopted:
as shown in the above-mentioned formula,represents the Sigmoid function labeled with all bias model negative gradients, σ (f (X; θ)) represents the base model function,and taking the loss minimum value by taking all the negative gradients of the deviation model functions as labels for the basic model function.
FIG. 2 is a flow chart of a greedy gradient integration method according to an embodiment of the invention. In visual question answering, two main types of bias features give rise to language bias: the long-tail distribution of answers, and the strong association between question semantics and answers. Therefore, in this embodiment the bias models are instantiated as a long-tail distribution bias model and a question-answer bias model, which cover the main existing types of bias; the invention is not limited thereto, however, and bias feature types may be added or removed as needed.
Specifically, the overall process flow follows the embodiment of FIG. 1: the visual question-answering task is still modeled as an additive model that is decomposed into a base model and bias models combined in a generalized additive manner. In this embodiment, the bias models can be expressed as follows.
this formula represents the long tail distribution deviation model, where tiRepresentative problem qiType of question, question type reference VQA (image)Problem(s)And answer labels) The 65 types of data sets are determined by the first few words of the question.
The second formula represents the question-answer bias model, wherein c_q is a fully connected classifier used to obtain the prediction confidence over the answers, a real-valued vector of dimension C.
First, the statistical prediction of the long-tail distribution bias model is taken as the prediction of the first bias model, and the second bias model is optimized with a loss L1 that is supervised by the negative gradient of the loss of this first prediction, wherein y is the ground-truth answer label. The predictions of the two bias models are then added together, and the base model is further optimized with a loss L2, again using gradient integration as its supervision.
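A minimal sketch of the two losses L1 and L2 described above is given below; it follows the supervision scheme just described, with illustrative names (dist_logits, question_logits, base_logits) and the same clamped pseudo-label assumption as before.

```python
# Hypothetical sketch of the two losses L1 and L2 used in this embodiment;
# it follows the described supervision scheme, not a verbatim implementation.
import torch
import torch.nn.functional as F

def gge_pseudo_label(y, accumulated_bias_logits):
    # Negative gradient of the BCE loss w.r.t. the accumulated bias logits,
    # clamped to [0, 1] (assumption) so it can be reused as a BCE target.
    return (y - torch.sigmoid(accumulated_bias_logits)).clamp(0.0, 1.0).detach()

def gge_losses(dist_logits, question_logits, base_logits, y):
    # Stage 1: the question-answer bias model is supervised by the negative
    # gradient of the loss of the long-tail distribution bias prediction.
    y1 = gge_pseudo_label(y, dist_logits)
    L1 = F.binary_cross_entropy_with_logits(question_logits, y1)
    # Stage 2: the base model is supervised by the negative gradient taken
    # after the two bias model predictions are added together.
    y2 = gge_pseudo_label(y, dist_logits + question_logits)
    L2 = F.binary_cross_entropy_with_logits(base_logits, y2)
    return L1, L2
```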
The model is optimized with mini-batch stochastic gradient descent (batch SGD), optimizing L1 and L2 once within each batch. At test time, only the base model obtained through greedy gradient integration is used for answer prediction. The greedy gradient integration provided by the invention imposes its constraint at the level of supervision, is independent of the choice of base model, and therefore generalizes to a variety of base models.
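The following sketch illustrates, under the same assumptions, a batch-SGD loop that optimizes L1 and L2 once per batch and uses only the base model at test time; dataloader, base_model, question_branch and get_dist_logits are placeholders rather than components defined by the invention, and gge_losses refers to the sketch above.

```python
# Illustrative batch-SGD loop and test-time use; all names are placeholders.
import torch

def train_one_epoch(base_model, question_branch, get_dist_logits,
                    dataloader, optimizer, gge_losses):
    for images, questions, q_types, y in dataloader:
        dist_logits = get_dist_logits(q_types)          # long-tail bias prediction
        question_logits = question_branch(questions)    # question-answer bias
        base_logits = base_model(images, questions)     # f(X; theta)
        L1, L2 = gge_losses(dist_logits, question_logits, base_logits, y)
        optimizer.zero_grad()
        (L1 + L2).backward()      # L1 and L2 are optimized once within each batch
        optimizer.step()

@torch.no_grad()
def predict(base_model, images, questions):
    # At test time, only the base model obtained by greedy gradient integration
    # is used for answer prediction.
    return torch.sigmoid(base_model(images, questions))
```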
The invention designs a greedy-strategy gradient integration algorithm that uses the negative gradient of the bias-feature prediction loss as a pseudo-label to supervise the prediction of the main model, training a more robust visual question-answering model. For the two main biases generally faced by this task, the long-tail label distribution of the training samples is first used as the first bias model, and the gradient integration method is used to balance the samples and focus on tail-label samples; the question-answer branch is then used to model the second bias, and the gradient integration method is applied again to encourage the model to concentrate on questions that are difficult to answer with a language model alone, thereby forcing the model to refer to more visual information for answer reasoning. Through these two rounds of learning, a visual question-answering model that is more robust to language priors is obtained. In terms of technical effect, the influence of various bias features is removed, robustness to language priors is improved, and the generalization ability of the prediction model is enhanced. The label distribution features and the question language features are modeled independently and suppressed one by one through the greedy gradient integration strategy; as a result, the visual question-answering system is forced to attend to the visual information and can provide retrospective visual evidence for the predicted answers.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.