Causal inference method and system for cascade medical observation data
1. A method of causal inference of cascade medical observation data, comprising:
acquiring cascade medical observation data, and extracting a first variable and a second variable from the cascade medical observation data;
establishing an improved cascade nonlinear additive noise model by taking the cause in the causal relationship, the intermediate variable corresponding to each depth in the cascade structure and the result in the causal relationship as parameters;
constructing a marginal log-likelihood function for the improved cascade nonlinear additive noise model;
performing an adversarial variational decomposition of the marginal log-likelihood function, and optimizing by means of an approximate posterior distribution to obtain a variational lower bound corresponding to the marginal log-likelihood function;
taking the first variable as the cause in the causal relationship and the second variable as the result in the causal relationship, and maximizing the variational lower bound by using a preset adversarial training model to obtain a first variational lower bound value; taking the second variable as the cause in the causal relationship and the first variable as the result in the causal relationship, and maximizing the variational lower bound by using the adversarial training model to obtain a second variational lower bound value;
and comparing the first variational lower bound value with the second variational lower bound value to obtain a comparison result, and determining the causal direction of the cascade medical observation data according to the comparison result.
2. The method of causal inference of cascade medical observation data according to claim 1, wherein said improved cascade nonlinear additive noise model is expressed as:
$$Z_1 = f_1(X; \theta) + \varepsilon_1$$
$$Z_T = f_T(Z_{pa(T)}; \theta) + \varepsilon_T$$
$$Y = f_{T+1}(Z_{pa(y)}; \theta) + \varepsilon_y$$
wherein $T$ represents the depth of the cascade structure, $X$ represents the cause in the causal relationship, $Z_T$ represents the intermediate variable corresponding to each depth in the cascade structure, $Y$ represents the result in the causal relationship, $f = \{f_1, f_2, \ldots, f_T\}$ represents a set of nonlinear functions, $\theta$ represents a parameter in the causal relationship, $\varepsilon_T$ represents the additive noise corresponding to each depth in the cascade structure, $Z_{pa(T)}$ represents the intermediate variable corresponding to the depth preceding $Z_T$ in the cascade structure, $Z_{pa(y)}$ represents the intermediate variable corresponding to the last depth in the cascade structure, and $\varepsilon_y$ represents the additive noise from $Z_{pa(y)}$ to $Y$.
3. The method of causal inference of cascade medical observation data according to claim 2, wherein said marginal log-likelihood function is expressed as:
$$\log p_\theta(X, Y) = \sum_{i=1}^{m} \log \int p_\theta(x_i, y_i, z)\, dz$$
wherein $p_\theta(\cdot)$ represents a likelihood function, $x_i$ represents the $i$-th data point in $X$, $y_i$ represents the $i$-th data point in $Y$, and $z$ represents an intermediate variable, with $i = 1, 2, 3, \ldots, m$, where $m$ represents the number of data points.
4. The method of claim 3, wherein performing the adversarial variational decomposition of the marginal log-likelihood function and optimizing by means of an approximate posterior distribution to obtain the variational lower bound corresponding to the marginal log-likelihood function comprises:
decomposing the marginal log-likelihood function by using the Markov condition to obtain an expression of the decomposed marginal log-likelihood function:
decomposing $p_\theta(y_i \mid z_{pa(y)})$ and $p_\theta(z_t \mid z_{pa(t)})$ in the above expression respectively, and rewriting the function $f_{T+1}(Z_{pa(y)})$ as $f(x, \varepsilon)$, to obtain an expression of the rewritten marginal log-likelihood function:
wherein $\varepsilon_y$ represents the additive noise of the result variable, and $\varepsilon$ represents the additive noise of the intermediate variables;
introducing a parameter $\phi$ and using a simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ to approximate the posterior distribution $p_\theta(\varepsilon \mid x_i, y_i)$, and further decomposing the marginal log-likelihood function to obtain an expression of the further decomposed marginal log-likelihood function:
defining the first term in the above expression as the variational lower bound, wherein, when $q_\phi(\varepsilon \mid x_i, y_i) = p_\theta(\varepsilon \mid x_i, y_i)$, the KL divergence in the expression is 0 and the marginal log-likelihood function is equal to its corresponding variational lower bound; and decomposing the variational lower bound to obtain an expression of the decomposed variational lower bound corresponding to the marginal log-likelihood function:
rewriting the last term in the above expression, constructing a discriminative network model $T(X, Y; \varepsilon)$, and implicitly expressing $\log q_\phi(\varepsilon \mid x_i, y_i) - \log p_\theta(\varepsilon)$ as the optimal value of the discriminative network model $T(X, Y; \varepsilon)$, so that the KL divergence is bypassed by the adversarial strategy of the discriminative network, thereby obtaining an expression of the variational lower bound corresponding to the marginal log-likelihood function:
wherein $T^*(X, Y; \varepsilon)$ represents the optimal value of the discriminative network model $T(X, Y; \varepsilon)$.
5. The method of claim 4, wherein the adversarial training model employs a variational auto-encoder with a discriminative network, comprising an encoder module, a decoder module, and a discriminator module.
6. The method of causal inference of cascade medical observation data of claim 5, wherein said encoder module expresses the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ as an encoding network, the encoding network adopts three fully connected layers with ReLU nonlinear functions and an output layer without nonlinear processing as its network structure, and the cause in the causal relationship and a preset random variable are mapped into additive noise through the encoding network.
7. The method of causal inference of cascade medical observation data of claim 6, wherein said decoder module expresses the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$ as a decoding network, the decoding network adopts the same network structure as the encoding network, the cause in the causal relationship and the additive noise output by the encoding network are reconstructed into a result through the decoding network, the reconstruction error between the reconstructed result and the result in the causal relationship is calculated, and the expectation in the variational lower bound is estimated by a Monte Carlo method.
8. The method of causal inference of cascade medical observation data according to claim 7, wherein the discriminator module expresses the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ and the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$ as a discriminative network, the discriminative network uses two fully connected networks and one output layer without nonlinear function processing as its network structure, the discriminative network discriminates whether the additive noise comes from the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ or from the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$, and the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ is thereby brought close to the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$.
9. The method of causal inference of cascade medical observation data according to claim 8, wherein the objective function of the discriminative network is:
wherein $\sigma(t) = (1 + e^{-t})^{-1}$ represents the Sigmoid function, and $T^*(X, Y; \varepsilon)$ represents the optimal value of the objective function.
10. A causal inference system for cascading medical observation data, comprising:
a data acquisition module, configured to acquire cascade medical observation data and extract a first variable and a second variable from the cascade medical observation data;
a model establishing module, configured to establish an improved cascade nonlinear additive noise model by taking the cause in the causal relationship, the intermediate variable corresponding to each depth in the cascade structure and the result in the causal relationship as parameters;
a function constructing module, configured to construct a marginal log-likelihood function for the improved cascade nonlinear additive noise model;
a function decomposition module, configured to perform an adversarial variational decomposition of the marginal log-likelihood function and optimize by means of an approximate posterior distribution to obtain a variational lower bound corresponding to the marginal log-likelihood function;
a parameter solving module, configured to take the first variable as the cause in the causal relationship and the second variable as the result in the causal relationship, and maximize the variational lower bound by using a preset adversarial training model to obtain a first variational lower bound value; and to take the second variable as the cause in the causal relationship and the first variable as the result in the causal relationship, and maximize the variational lower bound by using the adversarial training model to obtain a second variational lower bound value;
and a direction determining module, configured to compare the first variational lower bound value with the second variational lower bound value to obtain a comparison result, and determine the causal direction of the cascade medical observation data according to the comparison result.
Background
With the advent of the big data era, large amounts of data are generated in every field, and studying the causal relationships within such data is important. Causal inference is widely applied in biomedicine: biologists study the causal links between certain diseases and genes from observed disease gene data; the molecular factors causing adverse drug reactions are inferred from comprehensive medical and biological information about drugs; and causal molecular interactions are discovered using genetic data. Causal inference is also widely applied in other fields, for example predicting economic models with causal networks and studying the performance of the TCP network protocol through causal graph models.
Causal inference has become a research hotspot in many fields and has produced many results. However, existing causal inference methods do not consider that, in real data, the cause variable may have no direct causal influence on the effect variable: the causal effect may pass through intermediate variables, so that the initial cause has an indirect, nonlinear causal influence on the final effect. Existing methods therefore perform unsatisfactorily on data with a cascade structure. In addition, although causal inference has achieved much in medicine, there is currently no method for studying such indirect, cascade-structured medical data starting from observed data.
In view of this, a technical problem to be solved by those skilled in the art is to provide a causal inference method that infers, from observation data, the causal direction of indirect medical observation data having a cascade structure, thereby improving the accuracy of causal direction identification and addressing the failure of existing causal inference methods to consider medical observation data with a cascade structure.
Disclosure of Invention
In order to solve the technical problems, the invention provides a causal inference method and a causal inference system for cascade medical observation data, which can well identify the causal direction of the medical observation data with a cascade structure and obviously improve the accuracy of causal direction identification.
The invention provides a causal inference method of cascade medical observation data, which comprises the following steps:
acquiring cascade medical observation data, and extracting a first variable and a second variable from the cascade medical observation data;
establishing an improved cascade nonlinear additive noise model by taking the cause in the causal relationship, the intermediate variable corresponding to each depth in the cascade structure and the result in the causal relationship as parameters;
constructing a marginal log-likelihood function for the improved cascade nonlinear additive noise model;
performing an adversarial variational decomposition of the marginal log-likelihood function, and optimizing by means of an approximate posterior distribution to obtain a variational lower bound corresponding to the marginal log-likelihood function;
taking the first variable as the cause in the causal relationship and the second variable as the result in the causal relationship, and maximizing the variational lower bound by using a preset adversarial training model to obtain a first variational lower bound value; taking the second variable as the cause in the causal relationship and the first variable as the result in the causal relationship, and maximizing the variational lower bound by using the adversarial training model to obtain a second variational lower bound value;
and comparing the first variational lower bound value with the second variational lower bound value to obtain a comparison result, and determining the causal direction of the cascade medical observation data according to the comparison result.
Preferably, the expression of the improved cascade nonlinear additive noise model is as follows:
$$Z_1 = f_1(X; \theta) + \varepsilon_1$$
$$Z_T = f_T(Z_{pa(T)}; \theta) + \varepsilon_T$$
$$Y = f_{T+1}(Z_{pa(y)}; \theta) + \varepsilon_y$$
wherein $T$ represents the depth of the cascade structure, $X$ represents the cause in the causal relationship, $Z_T$ represents the intermediate variable corresponding to each depth in the cascade structure, $Y$ represents the result in the causal relationship, $f = \{f_1, f_2, \ldots, f_T\}$ represents a set of nonlinear functions, $\theta$ represents a parameter in the causal relationship, $\varepsilon_T$ represents the additive noise corresponding to each depth in the cascade structure, $Z_{pa(T)}$ represents the intermediate variable corresponding to the depth preceding $Z_T$ in the cascade structure, $Z_{pa(y)}$ represents the intermediate variable corresponding to the last depth in the cascade structure, and $\varepsilon_y$ represents the additive noise from $Z_{pa(y)}$ to $Y$.
Preferably, the expression of the marginal log-likelihood function is:
$$\log p_\theta(X, Y) = \sum_{i=1}^{m} \log \int p_\theta(x_i, y_i, z)\, dz$$
wherein $p_\theta(\cdot)$ represents a likelihood function, $x_i$ represents the $i$-th data point in $X$, $y_i$ represents the $i$-th data point in $Y$, and $z$ represents an intermediate variable, with $i = 1, 2, 3, \ldots, m$, where $m$ represents the number of data points.
Preferably, performing the adversarial variational decomposition of the marginal log-likelihood function and optimizing by means of an approximate posterior distribution to obtain the variational lower bound corresponding to the marginal log-likelihood function includes:
decomposing the marginal log-likelihood function by using the Markov condition to obtain an expression of the decomposed marginal log-likelihood function:
decomposing $p_\theta(y_i \mid z_{pa(y)})$ and $p_\theta(z_t \mid z_{pa(t)})$ in the above expression respectively, and rewriting the function $f_{T+1}(Z_{pa(y)})$ as $f(x, \varepsilon)$, to obtain an expression of the rewritten marginal log-likelihood function:
wherein $\varepsilon_y$ represents the additive noise of the result variable, and $\varepsilon$ represents the additive noise of the intermediate variables;
introducing a parameter $\phi$ and using a simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ to approximate the posterior distribution $p_\theta(\varepsilon \mid x_i, y_i)$, and further decomposing the marginal log-likelihood function to obtain an expression of the further decomposed marginal log-likelihood function:
defining the first term in the above expression as the variational lower bound, wherein, when $q_\phi(\varepsilon \mid x_i, y_i) = p_\theta(\varepsilon \mid x_i, y_i)$, the KL divergence in the above expression is 0 and the marginal log-likelihood function is equal to its corresponding variational lower bound; and decomposing the variational lower bound to obtain the decomposed expression of the variational lower bound corresponding to the marginal log-likelihood function:
rewriting the last term in the above expression, constructing a discriminative network model $T(X, Y; \varepsilon)$, and implicitly expressing $\log q_\phi(\varepsilon \mid x_i, y_i) - \log p_\theta(\varepsilon)$ as the optimal value of the discriminative network model $T(X, Y; \varepsilon)$, so that the KL divergence is bypassed by the adversarial strategy of the discriminative network, thereby obtaining an expression of the variational lower bound corresponding to the marginal log-likelihood function:
wherein $T^*(X, Y; \varepsilon)$ represents the optimal value of the discriminative network model $T(X, Y; \varepsilon)$.
Preferably, the adversarial training model adopts a variational auto-encoder with a discriminative network, and comprises an encoder module, a decoder module and a discriminator module.
Preferably, the encoder module expresses the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ as an encoding network. The encoding network adopts three fully connected layers with ReLU nonlinear functions and an output layer without nonlinear processing as its network structure, and the cause in the causal relationship and a preset random variable are mapped into additive noise through the encoding network.
Preferably, the decoder module expresses the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$ as a decoding network. The decoding network adopts the same network structure as the encoding network; the cause in the causal relationship and the additive noise output by the encoding network are reconstructed into a result through the decoding network, the reconstruction error between the reconstructed result and the result in the causal relationship is calculated, and the expectation in the variational lower bound is estimated using the Monte Carlo method.
Preferably, the discriminator module expresses the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ and the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$ as a discriminative network. The discriminative network uses two fully connected networks and one output layer without nonlinear function processing as its network structure, discriminates whether the additive noise comes from the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ or from the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$, and thereby brings the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ close to the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$.
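The encoder and decoder structure described above (three fully connected layers with ReLU followed by an output layer without nonlinearity) can be sketched as a plain forward pass. This is only a minimal illustration under assumptions, not the implementation of the invention: the layer widths, the random initialization, the standard normal choice for the preset random variable, and the helper names `init_mlp` and `mlp_forward` are all hypothetical, and no training is shown.

```python
import random

def init_mlp(sizes, seed=0):
    """Create weights for fully connected layers; `sizes` lists layer widths (assumed values)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers

def mlp_forward(layers, x):
    """Apply ReLU after every layer except the last, which is a linear output layer."""
    h = list(x)
    for k, (w, b) in enumerate(layers):
        h = [sum(wi * hi for wi, hi in zip(row, h)) + bk for row, bk in zip(w, b)]
        if k < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h

# Encoder: maps (cause x, preset random variable u) to an additive noise value.
encoder = init_mlp([2, 16, 16, 16, 1])          # three ReLU layers + linear output
rng = random.Random(1)
x_i = 0.7                                        # hypothetical cause value
u = rng.gauss(0.0, 1.0)                          # preset random variable (assumed N(0, 1))
eps = mlp_forward(encoder, [x_i, u])[0]

# Decoder with the same structure: reconstructs the result from (x, eps).
decoder = init_mlp([2, 16, 16, 16, 1], seed=2)
y_rec = mlp_forward(decoder, [x_i, eps])[0]
```

In practice such networks would be built and trained with an automatic differentiation library; the pure-Python version is used here only to make the layer structure explicit.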
Preferably, the objective function of the discriminative network is:
wherein $\sigma(t) = (1 + e^{-t})^{-1}$ represents the Sigmoid function, and $T^*(X, Y; \varepsilon)$ represents the optimal value of the objective function.
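A logistic objective of the form $q \log \sigma(T) + p \log(1 - \sigma(T))$ is maximized, for fixed density values $q$ and $p$, at $T = \log q - \log p$, which is why the optimal discriminator output can stand in for the log-density difference inside the KL term. The sketch below checks this numerically at a single point. The density values `q_val` and `p_val` and the helper names are hypothetical, and the pointwise objective is an assumed simplification rather than the exact objective function of the invention.

```python
import math

def sigmoid(t):
    """Sigmoid function sigma(t) = (1 + e^{-t})^{-1}."""
    return 1.0 / (1.0 + math.exp(-t))

def pointwise_objective(t, q_val, p_val):
    """Logistic objective at one point: q * log(sigma(t)) + p * log(1 - sigma(t))."""
    return q_val * math.log(sigmoid(t)) + p_val * math.log(1.0 - sigmoid(t))

def best_logit_on_grid(q_val, p_val, lo=-5.0, hi=5.0, steps=10001):
    """Grid search for the value t that maximizes the pointwise objective."""
    best_t, best_v = lo, -math.inf
    for k in range(steps):
        t = lo + (hi - lo) * k / (steps - 1)
        v = pointwise_objective(t, q_val, p_val)
        if v > best_v:
            best_t, best_v = t, v
    return best_t

q_val, p_val = 0.6, 0.2                          # hypothetical density values at one noise value
t_star = best_logit_on_grid(q_val, p_val)        # numerically optimal discriminator output
log_ratio = math.log(q_val) - math.log(p_val)    # log q - log p
```

Up to the grid resolution, `t_star` agrees with `log_ratio`, so averaging the optimal discriminator output over samples from the simple distribution gives an estimate of the KL term without requiring a closed-form density.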
In another aspect, the invention provides a causal inference system for cascading medical observation data, comprising:
a data acquisition module, configured to acquire cascade medical observation data and extract a first variable and a second variable from the cascade medical observation data;
a model establishing module, configured to establish an improved cascade nonlinear additive noise model by taking the cause in the causal relationship, the intermediate variable corresponding to each depth in the cascade structure and the result in the causal relationship as parameters;
a function constructing module, configured to construct a marginal log-likelihood function for the improved cascade nonlinear additive noise model;
a function decomposition module, configured to perform an adversarial variational decomposition of the marginal log-likelihood function and optimize by means of an approximate posterior distribution to obtain a variational lower bound corresponding to the marginal log-likelihood function;
a parameter solving module, configured to take the first variable as the cause in the causal relationship and the second variable as the result in the causal relationship, and maximize the variational lower bound by using a preset adversarial training model to obtain a first variational lower bound value; and to take the second variable as the cause in the causal relationship and the first variable as the result in the causal relationship, and maximize the variational lower bound by using the adversarial training model to obtain a second variational lower bound value;
and a direction determining module, configured to compare the first variational lower bound value with the second variational lower bound value to obtain a comparison result, and determine the causal direction of the cascade medical observation data according to the comparison result.
The invention has at least the following beneficial effects:
By taking the cause in the causal relationship, the intermediate variable corresponding to each depth in the cascade structure, and the result in the causal relationship as parameters, the method establishes an improved cascade nonlinear additive noise model that better matches medical observation data with a cascade structure and improves the accuracy of identifying the causal direction of cascade medical data. In addition, the variational lower bound corresponding to the marginal log-likelihood function is maximized by a preset adversarial training model, and the KL divergence is bypassed by an adversarial strategy rather than an approximation formula, which allows the additive noise to follow a wider range of distributions and thereby improves the inference capability of the model.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart illustrating a method for causal inference of cascade medical observation data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a cascade structure of an improved cascade nonlinear additive noise model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an adversarial training model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a causal inference system for cascade medical observation data according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a causal inference method and a causal inference system for cascade medical observation data, which can well identify the causal direction of the medical observation data with a cascade structure and obviously improve the accuracy of causal direction identification.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In one aspect, the present invention provides a method for causal inference of cascade medical observation data, referring to fig. 1, where the method includes:
S110, acquiring cascade medical observation data, and extracting a first variable and a second variable from the cascade medical observation data.
In the embodiment of the invention, the acquired cascade medical observation data comprise observed data serving as the initial cause and observed data serving as the final result. Unobserved intermediate variables and additive noise also exist between the initial cause and the final result; only the data serving as the initial cause and the final result are observed, and these are extracted as the first variable and the second variable. The first variable and the second variable have a causal relationship whose correct direction is unknown: it may be first variable → second variable, that is, the first variable is the cause and the second variable is the result; or it may be second variable → first variable, that is, the second variable is the cause and the first variable is the result.
S120, establishing an improved cascade nonlinear additive noise model by taking the cause in the causal relationship, the intermediate variable corresponding to each depth in the cascade structure and the result in the causal relationship as parameters.
It should be noted that the ANM (Additive Noise Model) is a common causal discovery model for a pair of variables under nonlinear conditions, and can be expressed as $y = f(x) + \varepsilon$, where the noise $\varepsilon$ is independent of $x$. The CANM (Cascade Additive Noise Model, i.e., cascade nonlinear additive noise model) is a model proposed for studying indirect, nonlinear causal relationships between variables; it mainly applies the ANM to data containing intermediate variables.
In the embodiment of the invention, the improved cascade nonlinear additive noise model can be regarded as a combination of a plurality of ANM models, each direct causal influence follows the ANM model, and unobserved intermediate variables and potential noises exist between causes and results in the causal relationship, so that the model can be better matched with medical observation data with a cascade structure.
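As a rough illustration of data that follows such a cascade of ANM steps, the sketch below simulates pairs (x, y) in which every direct causal step adds independent noise and the intermediate variables are discarded. The use of `tanh` for every nonlinear function, the Gaussian noise, and the function name `simulate_canm` are assumptions made only for this example.

```python
import math
import random

def simulate_canm(m, depth, noise_std=0.1, seed=0):
    """Draw m (x, y) pairs from a cascade with `depth` unobserved intermediate variables.

    Each step is z_t = tanh(z_{t-1}) + eps_t (tanh is an assumed stand-in for f_t),
    and the final step is y = tanh(z_T) + eps_y. Only x and y are returned.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(m):
        x = rng.gauss(0.0, 1.0)                          # cause
        z = x
        for _ in range(depth):                           # unobserved intermediate variables
            z = math.tanh(z) + rng.gauss(0.0, noise_std)
        y = math.tanh(z) + rng.gauss(0.0, noise_std)     # result
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate_canm(m=500, depth=2)
```

Only `xs` and `ys` would be available to the inference method, which mirrors the setting in which the intermediate variables and the noise are never observed.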
S130, constructing a marginal log-likelihood function for the improved cascade nonlinear additive noise model.
S140, performing an adversarial variational decomposition of the marginal log-likelihood function, and optimizing by means of an approximate posterior distribution to obtain a variational lower bound corresponding to the marginal log-likelihood function.
In the embodiment of the invention, an adversarial variational decomposition is performed on the marginal log-likelihood function: a simple distribution is used to approximate the true posterior distribution, which is difficult to solve; the variational lower bound corresponding to the marginal log-likelihood function is obtained by optimization with an adversarial method; and maximizing the marginal log-likelihood is thereby converted into maximizing the variational lower bound.
S150, taking the first variable as the cause in the causal relationship and the second variable as the result in the causal relationship, and maximizing the variational lower bound by using a preset adversarial training model to obtain a first variational lower bound value; and taking the second variable as the cause in the causal relationship and the first variable as the result in the causal relationship, and maximizing the variational lower bound by using the adversarial training model to obtain a second variational lower bound value.
In the embodiment of the invention, it is first assumed that the correct causal direction in the cascade medical observation data is first variable → second variable; the first variable is then taken as the cause in the causal relationship and the second variable as the result, and the variational lower bound corresponding to the marginal log-likelihood function is maximized by the preset adversarial training model to obtain the first variational lower bound value. It is then assumed that the correct causal direction is second variable → first variable; the second variable is taken as the cause and the first variable as the result, and the second variational lower bound value is obtained in the same way. When the variational lower bound is maximized by the preset adversarial training model, the KL divergence is bypassed by an adversarial strategy rather than an approximation formula, which allows the additive noise to follow a wider range of distributions and thereby improves the inference capability of the model.
S160, comparing the first variational lower bound value with the second variational lower bound value to obtain a comparison result, and determining the causal direction of the cascade medical observation data according to the comparison result.
In the embodiment of the invention, after the first variational lower bound value and the second variational lower bound value are obtained, the two values are compared. If the first variational lower bound value is greater than the second variational lower bound value, the correct causal direction in the cascade medical observation data is determined to be first variable → second variable; otherwise, the correct causal direction is determined to be second variable → first variable.
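The comparison rule of step S160 amounts to a single branch, which can be written as follows. The function name `infer_direction` and the two numeric lower bound values are hypothetical; in practice the values would come from training the adversarial model once per candidate direction.

```python
def infer_direction(elbo_first_to_second, elbo_second_to_first):
    """Return the causal direction whose maximized variational lower bound is larger."""
    if elbo_first_to_second > elbo_second_to_first:
        return "first -> second"
    return "second -> first"   # the "otherwise" branch, which also covers ties

# Hypothetical lower bound values for the two candidate directions.
direction = infer_direction(-1.32, -1.87)
```

Following the wording of the step, a tie falls into the "otherwise" branch; a practical implementation might instead report ties as undecided, which is a design choice the text does not address.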
As can be seen from the above, the causal inference method for cascade medical observation data provided in the embodiment of the present invention establishes an improved cascade nonlinear additive noise model by taking the cause in the causal relationship, the intermediate variable corresponding to each depth in the cascade structure, and the result in the causal relationship as parameters. The model better matches medical observation data with a cascade structure and improves the accuracy of identifying the causal direction of cascade medical data. Meanwhile, the variational lower bound corresponding to the marginal log-likelihood function is maximized by a preset adversarial training model, and the KL divergence is bypassed by an adversarial strategy rather than an approximation formula, which allows the additive noise to follow a wider range of distributions and thereby improves the inference capability of the model. Compared with the prior art, the method can identify the causal direction of medical observation data with a cascade structure and improves the accuracy of causal direction identification.
Referring to FIG. 2, as a preferred embodiment of the present invention, the expression of the improved cascade nonlinear additive noise model is:
$$Z_1 = f_1(X; \theta) + \varepsilon_1$$
$$Z_T = f_T(Z_{pa(T)}; \theta) + \varepsilon_T$$
$$Y = f_{T+1}(Z_{pa(y)}; \theta) + \varepsilon_y$$
wherein $T$ represents the depth of the cascade structure, $X$ represents the cause in the causal relationship, $Z_T$ represents the intermediate variable corresponding to each depth in the cascade structure, $Y$ represents the result in the causal relationship, $f = \{f_1, f_2, \ldots, f_T\}$ represents a set of nonlinear functions, $\theta$ represents a parameter in the causal relationship, $\varepsilon_T$ represents the additive noise corresponding to each depth in the cascade structure, $Z_{pa(T)}$ represents the intermediate variable corresponding to the depth preceding $Z_T$ in the cascade structure, $Z_{pa(y)}$ represents the intermediate variable corresponding to the last depth in the cascade structure, and $\varepsilon_y$ represents the additive noise from $Z_{pa(y)}$ to $Y$.
In the present example, it is assumed that there is no confounding in the causal mechanism and that the data generation process follows the nonlinear additive noise assumption, so that the cause $X$ in the causal relationship, the additive noise $\varepsilon_T$ corresponding to each depth in the cascade structure, and the additive noise $\varepsilon_y$ from $Z_{pa(y)}$ to $Y$ are mutually independent.
Further, in the above embodiment, the expression of the marginal log-likelihood function is:
$$\log p_\theta(X, Y) = \sum_{i=1}^{m} \log \int p_\theta(x_i, y_i, z)\, dz$$
wherein $p_\theta(\cdot)$ represents a likelihood function, $x_i$ represents the $i$-th data point in $X$, $y_i$ represents the $i$-th data point in $Y$, and $z$ represents an intermediate variable, with $i = 1, 2, 3, \ldots, m$, where $m$ represents the number of data points.
In the embodiment of the invention, the data are denoted $D$, where the cause $X$ comprises $m$ data points $x_i$ and the result $Y$ comprises $m$ data points $y_i$; an expression of the marginal log-likelihood function of the data $D$ can then be obtained.
Further, in the above embodiment, the step S140 includes:
decomposing the edge log-likelihood function by using a Markov condition to obtain an expression of the decomposed edge log-likelihood function:
for p in the above expression respectivelyθ(yi|zpa(y)) And pθ(zt|zpa(t)) Decomposing and dividing the function fT+1(Zpa(y)) is rewritten to f (x, epsilon) to obtain the expression of the edge log-likelihood function after rewriting:
in the formula (I), the compound is shown in the specification,additive noise representing the resulting variable; ε represents the additive noise of the intermediate variables.
In the embodiment of the invention, the edge log-likelihood function is first decomposed by using the Markov condition, and $p_\theta(y_i \mid z_{pa(y)})$ and $p_\theta(z_t \mid z_{pa(t)})$ are then decomposed by using the independence between the cause and the additive noise. At the same time, because the last unobserved intermediate variable $Z_{pa(y)}$ contains all of the additive noise $\varepsilon_T$ and all of the influence of the cause $X$ on the result $Y$, the function $f_{T+1}(Z_{pa(y)})$ can be rewritten as $f(x, \varepsilon)$, from which the expression of the rewritten edge log-likelihood function is obtained.
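introducing a parameter $\phi$ and using a simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ to approximate the posterior distribution $p_\theta(\varepsilon \mid x_i, y_i)$, and further decomposing the edge log-likelihood function to obtain an expression of the further decomposed edge log-likelihood function:
$$\log p_\theta(x_i, y_i) = \mathcal{L}(\theta, \phi; x_i, y_i) + \mathrm{KL}\big(q_\phi(\varepsilon \mid x_i, y_i)\,\|\,p_\theta(\varepsilon \mid x_i, y_i)\big)$$
defining the first term $\mathcal{L}(\theta, \phi; x_i, y_i)$ in the above expression as the variational lower bound, so that when $q_\phi(\varepsilon \mid x_i, y_i) = p_\theta(\varepsilon \mid x_i, y_i)$ the KL divergence in the expression is 0 and the edge log-likelihood function is equal to its corresponding variational lower bound, and decomposing the variational lower bound to obtain an expression of the variational lower bound corresponding to the edge log-likelihood function:
$$\mathcal{L}(\theta, \phi; x_i, y_i) = \log p_\theta(x_i) + \mathbb{E}_{q_\phi(\varepsilon \mid x_i, y_i)}\big[\log p_\theta(y_i \mid x_i, \varepsilon)\big] - \mathrm{KL}\big(q_\phi(\varepsilon \mid x_i, y_i)\,\|\,p_\theta(\varepsilon)\big)$$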
In the embodiment of the invention, a simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ with parameter $\phi$ is used to approximate the true posterior distribution $p_\theta(\varepsilon \mid x_i, y_i)$ over the parameter $\theta$, which is difficult to solve. $q_\phi(\varepsilon \mid x_i, y_i)$ and $p_\theta(\varepsilon \mid x_i, y_i)$ are trained adversarially to jointly optimize the variational lower bound (ELBO) corresponding to the edge log-likelihood of the data $D$, where the edge log-likelihood of $D$ is the sum of the edge log-likelihoods of the individual data points $(x_i, y_i)$. Because $q_\phi(\varepsilon \mid x_i, y_i)$ should approximate $p_\theta(\varepsilon \mid x_i, y_i)$ as closely as possible, $\mathrm{KL}\big(q_\phi(\varepsilon \mid x_i, y_i)\,\|\,p_\theta(\varepsilon \mid x_i, y_i)\big)$ must be minimized. Since the edge log-likelihood of the data $D$ is fixed, this is equivalent to maximizing the term $\mathcal{L}(\theta, \phi; x_i, y_i)$, which is called the variational lower bound. When $q_\phi(\varepsilon \mid x_i, y_i) = p_\theta(\varepsilon \mid x_i, y_i)$, the edge log-likelihood function is equal to its corresponding variational lower bound, so maximizing the edge log-likelihood is equivalent to maximizing the corresponding variational lower bound.
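The relationship between the edge log-likelihood, the variational lower bound and the KL divergence can be checked numerically on a toy case. The sketch below assumes a latent noise variable that takes only three values, so every quantity is a finite sum; the probability tables and the function name `evidence_decomposition` are hypothetical, and conditioning on $x_i$ is left implicit.

```python
import math


def evidence_decomposition(q, prior, likelihood):
    """Return (log evidence, ELBO, KL gap) for a discrete latent noise variable.

    q          : approximate posterior q(eps | x_i, y_i)
    prior      : prior p(eps)
    likelihood : p(y_i | x_i, eps) for one fixed data point
    """
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
    # ELBO = E_q[log p(y_i | x_i, eps)] - KL(q || prior)
    elbo = sum(qe * (math.log(l) + math.log(p) - math.log(qe))
               for qe, l, p in zip(q, likelihood, prior))
    # Gap = KL(q || true posterior), always non-negative
    gap = sum(qe * math.log(qe / pe) for qe, pe in zip(q, posterior))
    return math.log(evidence), elbo, gap


prior = [0.5, 0.3, 0.2]           # hypothetical p(eps)
likelihood = [0.10, 0.60, 0.30]   # hypothetical p(y_i | x_i, eps)
q = [0.2, 0.5, 0.3]               # hypothetical approximate posterior

log_evidence, elbo, gap = evidence_decomposition(q, prior, likelihood)
# log_evidence equals elbo + gap, so the ELBO never exceeds the log evidence.
```

Replacing `q` with the exact posterior drives the gap to zero, which is the case in which the lower bound and the log-likelihood coincide.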
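rewriting the last term of the variational lower bound, $\mathrm{KL}\big(q_\phi(\varepsilon \mid x_i, y_i)\,\|\,p_\theta(\varepsilon)\big)$, as $\mathbb{E}_{q_\phi(\varepsilon \mid x_i, y_i)}\big[\log q_\phi(\varepsilon \mid x_i, y_i) - \log p_\theta(\varepsilon)\big]$, constructing a discriminator network model $T(X, Y; \varepsilon)$, implicitly expressing $\log q_\phi(\varepsilon \mid x_i, y_i) - \log p_\theta(\varepsilon)$ as the optimal value of the discriminator network model $T(X, Y; \varepsilon)$, and bypassing the KL divergence by using the adversarial strategy of the discriminator network, to further obtain an expression of the variational lower bound corresponding to the edge log-likelihood function:
$$\mathcal{L}(\theta, \phi; x_i, y_i) = \log p_\theta(x_i) + \mathbb{E}_{q_\phi(\varepsilon \mid x_i, y_i)}\big[\log p_\theta(y_i \mid x_i, \varepsilon)\big] - \mathbb{E}_{q_\phi(\varepsilon \mid x_i, y_i)}\big[T^*(x_i, y_i; \varepsilon)\big]$$
where $T^*(X, Y; \varepsilon)$ represents the optimal value of the discriminator network model $T(X, Y; \varepsilon)$.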
In the embodiment of the present invention, the integral in the KL divergence has no closed-form analytical solution except for a small number of distributions. The last term of the variational lower bound, $\mathrm{KL}\big(q_\phi(\varepsilon \mid x_i, y_i)\,\|\,p_\theta(\varepsilon)\big)$, is therefore rewritten as $\mathbb{E}_{q_\phi(\varepsilon \mid x_i, y_i)}\big[\log q_\phi(\varepsilon \mid x_i, y_i) - \log p_\theta(\varepsilon)\big]$, and $\log q_\phi(\varepsilon \mid x_i, y_i) - \log p_\theta(\varepsilon)$ is expressed implicitly as the optimal value of the constructed discriminator network model $T(X, Y; \varepsilon)$. Bypassing the KL divergence with the adversarial strategy of the discriminator network allows a wider range of distributions to serve as the prior of the latent noise and makes the mapping from the data $D$ to the additive noise more flexible. The expression of the variational lower bound corresponding to the edge log-likelihood function can therefore be further obtained.
Referring to fig. 3, as a preferred embodiment of the present invention, the adversarial training model employs a variational autoencoder with a discriminator network, which includes an encoder module, a decoder module and a discriminator module.
In the embodiment of the invention, the adversarial training model consists of an encoder module, a decoder module and a discriminator module. The three modules are processed alternately to optimize the variational lower bound corresponding to the edge log-likelihood function until convergence, and the first variational lower bound value and the second variational lower bound value are obtained by solving, so as to determine the causal direction between causal data with intermediate variables.
Further, in the above embodiment, the encoder module expresses the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ as an encoding network. The encoding network adopts three fully connected layers with ReLU nonlinear functions and one output layer without nonlinear processing as its network structure, and maps the cause in the causal relationship and a preset random variable to additive noise.
In the embodiment of the invention, the encoder module expresses the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ as an encoding network, namely the encoder. Because the encoder is a model that maps the cascade medical observation data to the additive noise $\varepsilon$, three fully connected layers with the ReLU nonlinear function and one output layer without nonlinear processing are used as its network structure. The encoder encodes the cascade medical observation data together with a random variable $u$ ($u \sim N(0, 1)$) into the additive noise $\varepsilon$ without using reparameterization, so that the mapping from the cascade medical observation data to the additive noise is more flexible and the model can learn more complex probability distributions.
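The encoder structure described above (three fully connected ReLU layers followed by a linear output layer, fed with the data and a random variable $u$) can be sketched as a forward pass in plain Python. This is an illustrative sketch only: the hidden width of 8, the random weight initialization, the scalar inputs, and the helper names `init_layer`, `dense`, `build_encoder` and `encode` are assumptions, and no training is shown.

```python
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, inputs, relu):
    """Apply one fully connected layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def build_encoder(hidden=8, seed=0):
    """Three hidden layers plus one output layer; input is (x, y, u)."""
    rng = random.Random(seed)
    sizes = [3, hidden, hidden, hidden, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def encode(encoder, x, y, u):
    """Map one data point and one random draw u to a noise sample eps."""
    h = [x, y, u]
    for layer in encoder[:-1]:
        h = dense(layer, h, relu=True)             # three ReLU layers
    return dense(encoder[-1], h, relu=False)[0]    # output layer, no nonlinearity


rng = random.Random(1)
encoder = build_encoder()
# Different draws of u give different eps for the same (x, y): the distribution
# of eps is defined implicitly by the network rather than by a fixed density.
eps_samples = [encode(encoder, 0.3, -0.1, rng.gauss(0.0, 1.0)) for _ in range(4)]
```

Feeding the random variable in as an input, instead of sampling from a parameterized Gaussian at the output, is what lets the noise distribution take a form that has no explicit density.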
Further, in the above embodiment, the decoder module expresses the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$ as a decoding network. The decoding network adopts the same network structure as the encoding network; the cause in the causal relationship and the additive noise output by the encoding network are reconstructed into a result through the decoding network, the reconstruction error between the reconstructed result and the result in the causal relationship is calculated, and the expectation in the variational lower bound is estimated by the Monte Carlo method.
In the embodiment of the invention, the decoder module expresses the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$ as a decoding network, namely the decoder. Because the decoder corresponds to the mapping from the cause sample $x_i$ and the additive noise $\varepsilon$ to the reconstructed result $y'_i$, the same network structure as the encoding network is adopted. In the decoder, the additive noise $\varepsilon$ and the cause sample $x_i$ are used to reconstruct the result sample $y_i$, giving the reconstructed result variable $y'_i$. The reconstruction error $\varepsilon_i$ is then estimated by comparing the result variable $y_i$ in the causal relationship with the result variable $y'_i$ reconstructed by the decoder, and the expectation in the variational lower bound, $\mathbb{E}_{q_\phi(\varepsilon \mid x_i, y_i)}\big[\log p_\theta(y_i \mid x_i, \varepsilon)\big]$, can be estimated by the Monte Carlo method.
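A Monte Carlo estimate of the reconstruction expectation can be sketched as follows. The sketch assumes a Gaussian likelihood with unit variance for the reconstruction error, which the source does not specify; the functions `decode` and `sample_eps` are hypothetical stand-ins for the trained decoder and encoder.

```python
import math
import random


def gaussian_log_likelihood(y, y_hat, sigma=1.0):
    """Log density of y under N(y_hat, sigma^2); an assumed noise model."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - y_hat) ** 2 / (2 * sigma ** 2)


def mc_reconstruction_term(x, y, sample_eps, decode, n_samples=100):
    """Monte Carlo estimate of E_q[log p(y | x, eps)] for one data point."""
    total = 0.0
    for _ in range(n_samples):
        eps = sample_eps(x, y)          # eps drawn from the approximate posterior
        y_hat = decode(x, eps)          # reconstructed result y'
        total += gaussian_log_likelihood(y, y_hat)
    return total / n_samples


rng = random.Random(0)


def decode(x, eps):
    """Hypothetical decoder: a fixed nonlinear function plus the noise."""
    return math.tanh(x) + eps


def sample_eps(x, y):
    """Hypothetical encoder: the exact residual plus a small random jitter."""
    return (y - math.tanh(x)) + 0.1 * rng.gauss(0.0, 1.0)


estimate = mc_reconstruction_term(0.5, 0.7, sample_eps, decode)
```

The average over sampled noise values replaces the intractable integral over $\varepsilon$; more samples reduce the variance of the estimate at a proportional computational cost.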
Furthermore, in the above embodiment, the discriminator module expresses the relationship between the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ and the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$ as a discriminator network. The discriminator network uses two fully connected layers and one output layer without nonlinear function processing as its network structure; it discriminates whether the additive noise comes from the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ or from the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$, and drives the simple distribution $q_\phi(\varepsilon \mid x_i, y_i)$ toward the posterior distribution $p_\theta(y'_i \mid x_i, \varepsilon)$.
In the embodiment of the invention, a discriminator $T(x_i, y_i; \varepsilon)$ between $q_\phi(\varepsilon \mid x_i, y_i)$ and $p_\theta(\varepsilon)$ is defined in the discriminator module. The network structure of the discriminator consists of two fully connected layers and one output layer without nonlinear function processing. The discriminator network distinguishes, as well as possible, whether the additive noise $\varepsilon$ is drawn from the current inference model $q_\phi(\varepsilon \mid x_i, y_i)$ or independently from $p_\theta(\varepsilon)$, while forcing $q_\phi(\varepsilon \mid x_i, y_i)$ toward the distribution $p_\theta(\varepsilon)$.
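Further, in the above embodiment, the objective function of the discriminator network is:
$$\max_{T}\; \mathbb{E}_{p_D(x, y)}\, \mathbb{E}_{q_\phi(\varepsilon \mid x, y)}\big[\log \sigma\big(T(x, y; \varepsilon)\big)\big] + \mathbb{E}_{p_D(x, y)}\, \mathbb{E}_{p_\theta(\varepsilon)}\big[\log\big(1 - \sigma\big(T(x, y; \varepsilon)\big)\big)\big]$$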
wherein $\sigma(t) = (1 + e^{-t})^{-1}$ represents the sigmoid function, and $T^*(X, Y; \varepsilon)$ represents the optimal value of the objective function.
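Why the optimal discriminator value can stand in for the log density ratio can be illustrated at a single noise value. The sketch below assumes the standard logistic form of the adversarial objective, $q \log \sigma(T) + p \log(1 - \sigma(T))$, evaluated pointwise; the density values and the grid search in `best_logit` are hypothetical illustration devices, not part of the described system.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def pointwise_objective(t, q_density, p_density):
    """Integrand of the assumed discriminator objective at one value of eps."""
    return q_density * math.log(sigmoid(t)) + p_density * math.log(1.0 - sigmoid(t))


def best_logit(q_density, p_density, lo=-5.0, hi=5.0, steps=10001):
    """Grid search for the logit T that maximizes the pointwise objective."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: pointwise_objective(t, q_density, p_density))


q_density, p_density = 0.6, 0.2      # hypothetical q(eps | x, y) and p(eps) at one eps
t_star = best_logit(q_density, p_density)
log_ratio = math.log(q_density / p_density)
# t_star matches log_ratio up to the grid resolution, so averaging T* over
# samples from q estimates KL(q || p) without evaluating either density.
```

Setting the derivative of the pointwise objective to zero gives $\sigma(T) = q / (q + p)$, that is $T = \log q - \log p$, which is the quantity the grid search recovers.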
In the embodiment of the invention, the encoder module, the decoder module and the discriminator module are processed alternately to optimize the variational lower bound corresponding to the edge log-likelihood function until convergence, and the causal direction between causal data with intermediate variables is determined from the variational lower bound.
In another aspect, embodiments of the present invention provide a causal inference system for cascade medical observation data. The system described below and the method described above may be referred to in correspondence with each other.
Referring to fig. 4, the system includes:
the data acquisition module 410 is used for acquiring cascade medical observation data and extracting a first variable and a second variable from the cascade medical observation data;
a model establishing module 420, configured to establish an improved cascade nonlinear additive noise model with a cause in the causal relationship, an intermediate variable corresponding to each depth in the cascade structure, and a result in the causal relationship as parameters;
a function constructing module 430, configured to construct an edge log-likelihood function for the improved cascaded nonlinear additive noise model;
the function decomposition module 440 is configured to perform adversarial variational decomposition on the edge log-likelihood function, and optimize by using an approximate posterior distribution method to obtain a variational lower bound corresponding to the edge log-likelihood function;
the parameter solving module 450 is configured to take the first variable as the cause in the causal relationship and the second variable as the result in the causal relationship, and solve the maximized variational lower bound by using a preset adversarial training model to obtain a first variational lower bound value; and to take the second variable as the cause in the causal relationship and the first variable as the result in the causal relationship, and solve the maximized variational lower bound by using the adversarial training model to obtain a second variational lower bound value;
the direction determining module 460 is configured to compare the first variation lower bound value with the second variation lower bound value to obtain a comparison result, and determine a causal direction of the cascade medical observation data according to the comparison result.
As can be seen from the above, the causal inference system for cascade medical observation data provided in the embodiment of the present invention establishes an improved cascade nonlinear additive noise model by taking the cause in the causal relationship, the intermediate variable corresponding to each depth in the cascade structure, and the result in the causal relationship as parameters, so that the model better matches medical observation data with a cascade structure and the accuracy of identifying the causal direction of cascade medical data is improved. Meanwhile, the maximized variational lower bound corresponding to the edge log-likelihood function is solved by a preset adversarial training model, and the KL divergence is bypassed by an adversarial strategy rather than by an approximate formula, which allows the additive noise to follow a wider range of distributions and thereby improves the inference capability of the model. Compared with the prior art, the causal direction of medical observation data with a cascade structure can be identified well, and the accuracy of causal direction identification is significantly improved.
The following describes practical application results of the method and system for causal inference of cascade medical observation data disclosed by the embodiments of the present invention through specific implementation cases.
Taking the causal direction of "insulin content-food consumption-body weight" as an example, in the correct causal relationship, insulin content is the initial cause, body weight is the final result, and food consumption is an intermediate variable between the initial cause and the final result.
First, insulin content data and body weight data are extracted from medical observation data.
Secondly, the insulin content is taken as the cause X and the body weight as the result Y; X and Y are input, together with a random variable u, into the encoder module, where they pass through three fully connected layers with the ReLU nonlinear function and one output layer without nonlinear processing.
The output of the encoder module and the user-defined noise ε are input into the discriminator, where they pass through two fully connected layers and one output layer without nonlinear function processing, so that the discriminator objective function attains its optimal value.
The output of the encoder module and the insulin content X are input together into the decoder module, where they pass through three fully connected layers with the ReLU nonlinear function and one output layer without nonlinear processing. The reconstruction error is calculated, and the encoder module, the discriminator module and the decoder module are processed alternately until the ELBO converges, after which the forward variational lower bound value is calculated.
Thirdly, the inputs are exchanged: the body weight is taken as the cause X and the insulin content as the result Y, and the reverse variational lower bound value is calculated by the same method.
Finally, the forward variational lower bound value is compared with the reverse variational lower bound value. The forward variational lower bound value is larger than the reverse variational lower bound value, so the correct causal direction is inferred: "insulin content-food consumption-body weight".
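The final comparison step can be sketched as a small decision function. The numeric bound values below are hypothetical placeholders rather than results from the described experiment, and the handling of a tie as "undetermined" is an assumption that the source does not address.

```python
def infer_direction(forward_bound, reverse_bound,
                    first="insulin content", second="body weight"):
    """Pick the causal direction whose variational lower bound is larger.

    forward_bound : bound obtained with `first` as cause and `second` as result
    reverse_bound : bound obtained with the two roles exchanged
    """
    if forward_bound > reverse_bound:
        return f"{first} -> {second}"
    if reverse_bound > forward_bound:
        return f"{second} -> {first}"
    return "undetermined"   # assumed behaviour for equal bounds


# Hypothetical bound values, for illustration only.
direction = infer_direction(forward_bound=-1.20, reverse_bound=-1.55)
```

The comparison uses only the two converged bound values, so the intermediate variable (food consumption in the example) never has to be observed.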
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.