Tumor microenvironment and tumor gene mutation detection system, method and equipment
1. A tumor microenvironment and tumor gene mutation detection system is characterized in that the system comprises an image scanning device and an upper computer, the upper computer comprises a data processing module, a tumor microenvironment detection module and a tumor gene mutation detection module, wherein,
the image scanning device is used for capturing a panoramic scanning image of an ex vivo tumor sample; the ex vivo tumor sample comprises a stained sample and an unstained sample;
the data processing module is used for preprocessing the panoramic scanning image to obtain a first training atlas;
the tumor microenvironment detection module is used for inputting the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain a biomarker distribution prediction model, and obtaining a biomarker distribution prediction atlas according to the biomarker distribution prediction model;
the tumor gene mutation detection module is used for determining a second training atlas according to the biomarker distribution prediction atlas; inputting the second training atlas into a pre-constructed gene mutation detection training model for iterative training to obtain a gene mutation detection model, and performing gene mutation detection by using the model.
2. The system of claim 1, wherein the biomarker distribution prediction training model comprises a generator, a discriminator, and an optimizer, the first training atlas comprises a stained real training atlas and an unstained training atlas to be tested,
the generator is used for segmenting the training atlas to be tested to obtain a biomarker distribution prediction atlas, and calculating a prediction loss value of the generator according to the biomarker distribution prediction atlas, the real training atlas and a prediction loss function;
the discriminator is used for discriminating between the biomarker distribution prediction atlas and the real training atlas, and calculating a discrimination loss value of the discriminator according to the biomarker distribution prediction atlas, the real training atlas and a discrimination loss function;
and the optimizer is used for respectively adjusting the parameters of the generator and the discriminator according to the prediction loss value, the discrimination loss value and the back propagation algorithm until the iterative training is finished to obtain the biomarker distribution prediction model.
3. The system according to claim 2, wherein the biomarker distribution prediction training model further comprises a feature extraction module, and the feature extraction module is configured to perform one or more of rotation, cropping, or flipping on the images in the real training image set and the training image set to be tested to obtain a multi-scale image.
4. A method for detecting tumor microenvironment and tumor gene mutation, comprising:
acquiring a panoramic scanning image of an ex vivo tumor sample; the ex vivo tumor sample comprises a stained sample and an unstained sample;
preprocessing the panoramic scanning image to obtain a first training atlas;
inputting the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain a biomarker distribution prediction model;
obtaining a biomarker distribution prediction atlas according to the biomarker distribution prediction model;
determining a second training atlas according to the biomarker distribution prediction atlas;
inputting the second training atlas into a pre-constructed gene mutation detection training model for iterative training to obtain a gene mutation detection model, and performing gene mutation detection by using the model.
5. The method according to claim 4, wherein the first training atlas comprises a stained real training atlas and an unstained training atlas to be tested, the biomarker distribution prediction training model comprises a generator and a discriminator, and the inputting the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain the biomarker distribution prediction model comprises:
inputting the training atlas to be tested into the generator to obtain a biomarker distribution prediction atlas;
calculating a prediction loss value of the generator according to the biomarker distribution prediction atlas, the real training atlas and a prediction loss function;
calculating a discrimination loss value of the discriminator according to the biomarker distribution prediction atlas, the real training atlas and a discrimination loss function;
and respectively adjusting the parameters of the generator and the discriminator according to the prediction loss value, the discrimination loss value and a back propagation algorithm until the iterative training is finished to obtain a biomarker distribution prediction model.
6. The method of claim 5, further comprising:
inputting the biomarker distribution prediction atlas and the real training atlas into the discriminator to discriminate to obtain a confidence map; the confidence map is used for indicating the confidence of the biomarker distribution prediction atlas;
screening, according to the confidence map, a region image set whose confidence is greater than a preset confidence threshold from the biomarker distribution prediction atlas;
and determining the region image set obtained by current training as the input of the next training.
7. The method of claim 4, wherein determining a second training atlas from the biomarker distribution prediction atlas comprises:
determining a probability distribution value of a target marker in the biomarker distribution prediction map set;
comparing the probability distribution value of the target marker with a preset probability distribution threshold value;
and determining the image of which the probability distribution value of the target marker is greater than the preset probability distribution threshold value as the image of a second training atlas.
8. The method of claim 4, wherein preprocessing the panoramic scan image to obtain a first training atlas comprises:
removing noise from the panoramic scanning image to obtain a denoised image set;
performing artifact removal on the denoised image set to obtain an artifact-removed image set;
performing image blocking on the artifact-removed image set to obtain m block image sets of size n × n, wherein m ≥ 1 and n ≥ 1;
and carrying out color gamut standardization processing on the block image set to obtain a first training atlas.
9. The method of claim 8, wherein the gamut normalizing the set of block images comprises:
selecting an image with a standardized color gamut as a template image;
decomposing the pictures in the block image set into a non-negative and sparse staining intensity matrix;
fusing the staining intensity matrix with the color deviation of the template image to obtain a block image with a standardized color gamut;
converting the color distribution, in RGB space, of each image in the color-gamut-standardized block image set into a spectral density by using the following formula:
OD = -log10(I);
converting the spectral density value into a set standard value by using a color deconvolution transformation formula, wherein the formula is as follows:
OD = V·S ⇒ S = V⁻¹·OD;
wherein OD represents the spectral density, I represents the color vector in RGB space, V is the staining vector matrix, and S is the saturation matrix of each stain, i.e. the set standard value.
10. An apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to implement the tumor microenvironment and tumor gene mutation detection method of any one of claims 4 to 9.
Background
With tumor incidence and mortality rising year by year, cancer has become one of the major threats to human health. With the development of medical technology, a new cancer treatment modality, tumor immunotherapy, has attracted extensive basic and clinical research. According to the related literature, the response rate of tumor immunotherapy is related to various factors such as the tumor microenvironment and tumor gene mutation status. Rapidly, efficiently and accurately analyzing the tumor microenvironment and judging the tumor gene mutation type is therefore of great significance for guiding tumor immunotherapy.
At present, wet experiments are the main means for analyzing the tumor microenvironment and detecting tumor gene mutations. Research on the tumor microenvironment generally involves detecting an ex vivo tumor sample and qualitatively and quantitatively analyzing cell biomarkers at the microscopic level. The mIHC (multiplex immunohistochemical / multispectral immunofluorescence staining) technique is commonly used to simultaneously perform microscopic fluorescence imaging of multiple cell-expressed proteins of the tumor microenvironment in tumor samples. The mutation status of tumor genes is detected and analyzed by gene sequencing.
However, the mIHC method and the gene sequencing method have limitations such as a long experimental period and high cost. Color cross-talk easily occurs among multiple mIHC staining reagents, which affects the accuracy of characterization. Moreover, mIHC results usually require manual interpretation by a pathologist, depend to a great extent on the interpreter's expertise, and carry a certain subjectivity.
Therefore, the present invention provides a method for detecting the tumor microenvironment and tumor gene mutations by deep learning, which assists pathologists in efficiently and rapidly observing and evaluating the expression distribution of cell biomarkers in the tumor microenvironment and in analyzing tumor gene mutation status, so as to better guide tumor immunotherapy.
Disclosure of Invention
In order to solve the problems in the prior art, namely the problems of long experimental period, high cost, low detection efficiency and large error of a wet experimental method, the invention provides a tumor microenvironment and tumor gene mutation detection system, method and equipment.
In a first aspect of the present invention, a tumor microenvironment and tumor gene mutation detection system is provided, the system includes an image scanning device and an upper computer, the upper computer includes a data processing module, a tumor microenvironment detection module and a tumor gene mutation detection module,
the image scanning device is used for capturing a panoramic scanning image of an ex vivo tumor sample; the ex vivo tumor sample comprises a stained sample and an unstained sample;
the data processing module is used for preprocessing the panoramic scanning image to obtain a first training atlas;
the tumor microenvironment detection module is used for inputting the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain a biomarker distribution prediction model, and obtaining a biomarker distribution prediction atlas according to the biomarker distribution prediction model;
the tumor gene mutation detection module is used for determining a second training atlas according to the biomarker distribution prediction atlas; inputting the second training atlas into a pre-constructed gene mutation detection training model for iterative training to obtain a gene mutation detection model, and performing gene mutation detection by using the model.
Optionally, the biomarker distribution prediction training model comprises a generator, a discriminator and an optimizer, the first training atlas comprises a stained real training atlas and an unstained training atlas to be tested,
the generator is used for segmenting the training atlas to be tested to obtain a biomarker distribution prediction atlas, and calculating a prediction loss value of the generator according to the biomarker distribution prediction atlas, the real training atlas and a prediction loss function;
the discriminator is used for discriminating between the biomarker distribution prediction atlas and the real training atlas, and calculating a discrimination loss value of the discriminator according to the biomarker distribution prediction atlas, the real training atlas and a discrimination loss function;
and the optimizer is used for respectively adjusting the parameters of the generator and the discriminator according to the prediction loss value, the discrimination loss value and the back propagation algorithm until the iterative training is finished to obtain the biomarker distribution prediction model.
Optionally, the biomarker distribution prediction training model further includes a feature extraction module, and the feature extraction module is configured to perform one or more operations of rotation, cropping, or flipping on images in the real training image set and the training image set to be tested to obtain a multi-scale image.
In a second aspect, the present application provides a method for detecting the tumor microenvironment and tumor gene mutations, the method comprising:
acquiring a panoramic scanning image of an ex vivo tumor sample; the ex vivo tumor sample comprises a stained sample and an unstained sample;
preprocessing the panoramic scanning image to obtain a first training atlas;
inputting the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain a biomarker distribution prediction model;
obtaining a biomarker distribution prediction atlas according to the biomarker distribution prediction model;
determining a second training atlas according to the biomarker distribution prediction atlas;
inputting the second training atlas into a pre-constructed gene mutation detection training model for iterative training to obtain a gene mutation detection model, and performing gene mutation detection by using the model.
Optionally, the first training atlas includes a stained real training atlas and an unstained training atlas to be tested, the biomarker distribution prediction training model includes a generator and a discriminator, and the inputting the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain the biomarker distribution prediction model includes:
inputting the training atlas to be tested into the generator to obtain a biomarker distribution prediction atlas;
calculating a prediction loss value of the generator according to the biomarker distribution prediction atlas, the real training atlas and a prediction loss function;
calculating a discrimination loss value of the discriminator according to the biomarker distribution prediction atlas, the real training atlas and a discrimination loss function;
and respectively adjusting the parameters of the generator and the discriminator according to the prediction loss value, the discrimination loss value and a back propagation algorithm until the iterative training is finished to obtain a biomarker distribution prediction model.
Optionally, the method further comprises:
inputting the biomarker distribution prediction atlas and the real training atlas into the discriminator to discriminate to obtain a confidence map; the confidence map is used for indicating the confidence of the biomarker distribution prediction atlas;
screening, according to the confidence map, a region image set whose confidence is greater than a preset confidence threshold from the biomarker distribution prediction atlas;
and determining the region image set obtained by current training as the input of the next training.
Optionally, the determining a second training atlas from the biomarker distribution prediction atlas comprises:
determining a probability distribution value of a target marker in the biomarker distribution prediction map set;
comparing the probability distribution value of the target marker with a preset probability distribution threshold value;
and determining the image of which the probability distribution value of the target marker is greater than the preset probability distribution threshold value as the image of a second training atlas.
Optionally, the preprocessing the panoramic scanned image to obtain a first training atlas includes:
removing noise from the panoramic scanning image to obtain a denoised image set;
performing artifact removal on the denoised image set to obtain an artifact-removed image set;
performing image blocking on the artifact-removed image set to obtain m block image sets of size n × n, wherein m ≥ 1 and n ≥ 1;
and carrying out color gamut standardization processing on the block image set to obtain a first training atlas.
Optionally, the performing color gamut normalization processing on the set of block images includes:
selecting an image with a standardized color gamut as a template image;
decomposing the pictures in the block image set into a non-negative and sparse staining intensity matrix;
fusing the staining intensity matrix with the color deviation of the template image to obtain a block image with a standardized color gamut;
converting the color distribution, in RGB space, of each image in the color-gamut-standardized block image set into a spectral density by using the following formula:
OD = -log10(I);
converting the spectral density value into a set standard value by using a color deconvolution transformation formula, wherein the formula is as follows:
OD = V·S ⇒ S = V⁻¹·OD;
wherein OD represents the spectral density, I represents the color vector in RGB space, V is the staining vector matrix, and S is the saturation matrix of each stain, i.e. the set standard value.
in a third aspect of the invention, there is provided an apparatus comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to implement the tumor microenvironment and tumor gene mutation detection method of the second aspect.
In a fourth aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for execution by a computer to implement the tumor microenvironment and tumor gene mutation detection method of the second aspect.
The invention has the following beneficial effects: a panoramic scanning image of an ex vivo tumor sample is obtained, and the expression and distribution information of biomarkers of tumor cells, cell nuclei and tumor-infiltrating lymphocytes in the tumor microenvironment, such as PanCK, DAPI, CD3 and CD20, is directly predicted based on the panoramic scanning image, the biomarker distribution prediction model and the gene mutation detection model; meanwhile, mutations of tumor genes frequently occurring in colon cancer patients, such as APC, TP53 and KRAS, are detected. Compared with the traditional wet experimental method, the experimental cost is reduced, the experimental period is shortened, and the subjectivity of result interpretation is reduced.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a tumor microenvironment and a tumor gene mutation detection system according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a biomarker distribution prediction training model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a method for detecting the tumor microenvironment and tumor gene mutations according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram comparing images before and after color gamut normalization according to an embodiment of the present application;
FIG. 5 is a schematic diagram showing the comparison of the prediction result of the biomarker distribution prediction model and the real image according to the embodiment of the present application;
FIG. 6 is a schematic diagram illustrating comparison between predicted results and actual results of the gene mutation detection model according to an embodiment of the present application;
FIG. 7 is a receiver operating characteristic (ROC) curve of the gene mutation detection results;
FIG. 8 is a block diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
The invention provides a tumor microenvironment and tumor gene mutation detection system, as shown in fig. 1, the system comprises an image scanning device 100 and an upper computer 200, the upper computer comprises a data processing module 201, a tumor microenvironment detection module 202 and a tumor gene mutation detection module 203, wherein,
The image scanning device 100 is used for capturing a panoramic scanning image of an ex vivo tumor sample; the ex vivo tumor samples include stained and unstained samples.
In the embodiment of the present application, the image scanning apparatus 100 may perform image scanning using a 40 × magnification panoramic scanner.
The data processing module 201 is configured to perform preprocessing on the panoramic scanned image to obtain a first training atlas.
The tumor microenvironment detection module 202 is configured to input the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain a biomarker distribution prediction model, and obtain a biomarker distribution prediction atlas according to the biomarker distribution prediction model.
The tumor gene mutation detection module 203 is configured to determine a second training atlas according to the biomarker distribution prediction atlas; inputting the second training atlas into a pre-constructed gene mutation detection training model for iterative training to obtain a gene mutation detection model, and performing gene mutation detection by using the model.
Optionally, as shown in fig. 2, the biomarker distribution prediction training model comprises a generator, a discriminator and an optimizer, and the first training atlas comprises a stained real training atlas and an unstained training atlas to be tested,
the generator is used for segmenting the training atlas to be tested to obtain a biomarker distribution prediction atlas, and calculating a prediction loss value of the generator according to the biomarker distribution prediction atlas, the real training atlas and a prediction loss function;
the discriminator is used for discriminating between the biomarker distribution prediction atlas and the real training atlas, and calculating a discrimination loss value of the discriminator according to the biomarker distribution prediction atlas, the real training atlas and a discrimination loss function;
and the optimizer is used for respectively adjusting the parameters of the generator and the discriminator according to the prediction loss value, the discrimination loss value and the back propagation algorithm until the iterative training is finished to obtain the biomarker distribution prediction model.
Optionally, the biomarker distribution prediction training model further includes a feature extraction module, and the feature extraction module is configured to perform one or more operations of rotation, cropping, or flipping on images in the real training image set and the training image set to be tested to obtain a multi-scale image.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the method embodiment, and will not be described herein again.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Based on the same inventive concept, the invention provides a tumor microenvironment and a tumor gene mutation detection method, and in order to more clearly explain the tumor microenvironment and the tumor gene mutation detection method provided by the invention, details of each step in the embodiment of the invention are expanded below with reference to fig. 3.
Step S301: acquiring a panoramic scanning image of an ex vivo tumor sample; the ex vivo tumor samples include stained and unstained samples.
In this step, the ex-vivo tumor sample is a sample obtained by full-sectioning, and the ex-vivo tumor sample is scanned by a scanning device with a magnification of 40 × to obtain a panoramic scanning image. The magnification can be set according to specific requirements, and is not intended to limit the scope of the present invention.
In one example, the stained sample may be obtained by staining an ex vivo tumor sample by H & E (Hematoxylin-eosin staining) technique. The true distribution of each biomarker in ex vivo tumor samples was represented by staining.
Step S302: and preprocessing the panoramic scanning image to obtain a first training atlas.
Optionally, the preprocessing the panoramic scanned image to obtain a first training atlas includes:
and removing noise from the panoramic scanning image to obtain a noise-removed image set.
In this step, a median filter of size N × N may be used to remove noise from the panoramic scanning image; in one example, N is set to 5, resulting in a denoised image set.
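The N × N median filtering described above can be sketched in a few lines of numpy (a minimal illustrative version with N = 5; the function name and toy image are not from the patent):

```python
import numpy as np

def median_filter(img: np.ndarray, n: int = 5) -> np.ndarray:
    """Remove noise with an n x n median filter; borders are handled
    by reflecting the image edges before sliding the window."""
    pad = n // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            # Median over each n x n neighbourhood
            out[i, j] = np.median(padded[i:i + n, j:j + n])
    return out

img = np.zeros((9, 9))
img[4, 4] = 255.0          # a single bright noise pixel
filtered = median_filter(img, 5)
```

An isolated noise pixel is smaller than half the window population, so the median suppresses it entirely while flat regions are left unchanged.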
Artifact removal is then performed on the denoised image set to obtain an artifact-removed image set.
In one example, the specific operations are: calculating the mean value and the minimum value over all image blocks at each pixel position on the basis of the denoised image set; obtaining an average image and a minimum projection image from the mean and minimum values; subtracting the sensor offset from the average image and generating a binary mask using an edge-preserving smoothing method and a thresholding operation; applying the binary mask to the average image to obtain an artifact region and a non-artifact region; replacing the pixel values of the artifact region with the mean of the non-artifact region pixels, and then applying a Gaussian blur to generate a background average estimate of the image; subtracting the background average estimate from the average image to obtain the final artifact estimate; and subtracting the artifact estimate from each denoised image block to remove the artifact.
Image blocking is performed on the artifact-removed image set to obtain m block image sets of size n × n, wherein m ≥ 1 and n ≥ 1.
An n × n sliding window may be used to perform non-overlapping cropping on the artifact-removed image set to obtain m block image sets of size n × n, wherein m ≥ 1; in one example, n is 512.
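The non-overlapping n × n cropping can be illustrated with a short numpy sketch (the helper name is hypothetical; discarding the ragged right/bottom remainder is one possible convention, not specified by the patent):

```python
import numpy as np

def tile_image(img: np.ndarray, n: int = 512) -> list:
    """Non-overlapping n x n cropping with a sliding window; any
    partial tiles at the right/bottom border are discarded."""
    h, w = img.shape[:2]
    return [img[i:i + n, j:j + n]
            for i in range(0, h - n + 1, n)
            for j in range(0, w - n + 1, n)]

# A 1100 x 1300 slide yields 2 rows x 2 columns of full 512 x 512 tiles
blocks = tile_image(np.zeros((1100, 1300, 3)), n=512)
```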
And carrying out color gamut standardization processing on the block image set to obtain a first training atlas.
Optionally, the performing color gamut normalization processing on the set of block images includes:
selecting an image with a standardized color gamut as a template image;
decomposing the pictures in the block image set into a non-negative and sparse staining intensity matrix;
fusing the staining intensity matrix with the color deviation of the template image to obtain a block image with a standardized color gamut.
In order to further unify the color distribution and adjust the contrast of the color-gamut-standardized block images so as to better suit model training, the invention adopts a spectrum-based color gamut normalization operation as follows:
converting the color distribution, in RGB space, of each image in the color-gamut-standardized block image set into a spectral density by using the following formula:
OD = -log10(I)    (1);
converting the spectral density value into a set standard value by using a color deconvolution transformation formula, wherein the formula is as follows:
OD = V·S ⇒ S = V⁻¹·OD    (2);
wherein OD represents the spectral density, I represents the color vector in RGB space, V is the staining vector matrix, and S is the saturation matrix of each stain, i.e. the set standard value.
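Formulas (1) and (2) can be sketched in numpy as below. The staining vector matrix V shown is a commonly cited illustrative H&E/DAB matrix (columns are the per-stain RGB optical-density vectors), not the patent's calibrated values:

```python
import numpy as np

# Illustrative staining vector matrix V; columns = OD vectors of
# hematoxylin, eosin and DAB (an assumption for demonstration only).
V = np.array([[0.65, 0.07, 0.27],
              [0.70, 0.99, 0.57],
              [0.29, 0.11, 0.78]])

def rgb_to_od(I: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Formula (1): OD = -log10(I), with I scaled into (0, 1]."""
    return -np.log10(np.clip(I / 255.0, eps, 1.0))

def color_deconvolve(I: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Formula (2): OD = V S  =>  S = V^-1 OD, giving the per-stain
    saturation matrix S (the set standard values)."""
    od = rgb_to_od(I).reshape(-1, 3).T      # shape 3 x num_pixels
    return np.linalg.inv(V) @ od

pixels = np.full((4, 4, 3), 200.0)          # a pale, uniform RGB patch
S = color_deconvolve(pixels, V)             # one row per stain
```

Each column of S is the stain saturation of one pixel; re-applying V to a standardized S and inverting formula (1) would reconstruct a color-normalized RGB image.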
Referring to fig. 4, a schematic comparison of before and after gamut normalization is shown.
Step S303: inputting the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain the biomarker distribution prediction model.
In this step, the first training atlas includes a stained real training atlas and an unstained training atlas to be tested. The stained real training set presents the real expression profile of the multiple biomarkers of the tumor sample. And detecting and analyzing the tumor microenvironment through the expression distribution of the biomarkers.
In one example, the biomarkers are PanCK, for characterizing tumor cells, and CD3, CD20, etc., for characterizing immune lymphocytes.
In the embodiment of the application, the biomarker distribution prediction training model is a semi-supervised neural network model constructed based on an adversarial learning strategy. As shown in fig. 2, it comprises a generator and a discriminator; specifically, the generator is constructed with an encoder-decoder structure and the discriminator with a deep convolutional neural network.
In addition, the biomarker distribution prediction training model further comprises a feature extraction module, which extracts multi-scale features of the first training image set; specifically, one or more operations of rotation, cropping or flipping may be performed on the images in the real training image set and the training image set to be tested to obtain images with multi-scale features. Extracting multi-scale features improves the generalization performance of the model.
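A minimal sketch of the rotation/cropping/flipping augmentation described above (the exact set of views is an assumption; the patent does not fix the number or scale of the variants):

```python
import numpy as np

def augment(img: np.ndarray) -> list:
    """Return rotation, flip and crop variants of one image, as a
    feature extraction module might (illustrative selection)."""
    h, w = img.shape[:2]
    views = [np.rot90(img, k) for k in range(1, 4)]            # 90/180/270 degree rotations
    views.append(np.flipud(img))                               # vertical flip
    views.append(np.fliplr(img))                               # horizontal flip
    views.append(img[h // 4: 3 * h // 4, w // 4: 3 * w // 4])  # half-scale centre crop
    return views

views = augment(np.zeros((64, 64, 3)))
```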
Optionally, inputting the first training atlas into a pre-constructed biomarker distribution prediction training model for iterative training to obtain a biomarker distribution prediction model includes the following steps:
and inputting the training atlas to be tested into the generator to obtain a biomarker distribution prediction atlas.
And calculating the prediction loss value of the generator according to the biomarker distribution atlas, the real training atlas and the prediction loss function.
The embodiment of the application constructs the biomarker distribution prediction training model based on an adversarial learning strategy and a semi-supervised learning mode, so the prediction loss function consists of a segmentation term loss function, an adversarial loss term function and a semi-supervised loss function.
In one example, the predictive loss function is formulated as follows:
L_seg = L_ce + λ_adv·L_adv + λ_semi·L_semi (3);
wherein L_seg is the prediction loss function, λ_adv and λ_semi represent the weights of the adversarial loss term function and the semi-supervised loss function, respectively, L_adv and L_semi respectively represent the adversarial loss term function and the semi-supervised loss function, and L_ce represents the segmentation term loss function.
Specifically, the segmentation term loss function is calculated as follows:
L_ce = -Σ_(h,w) Σ_(c∈C) Y_n(h,w,c) · log S(X_n)(h,w,c) (4);
the opposition loss term function is calculated as follows:
the classifier is obfuscated by maximizing the loss value of the countering loss term function so that the image of the biomarker prediction distribution atlas generated by the generator approximates a real image. When calculating the function of the resisting loss term, the function of the resisting loss term can be properly reduced so as to avoid that the function of the resisting loss term excessively modifies the prediction result to cause gradient disappearance.
The semi-supervised loss function is calculated as follows:
L_semi = -Σ_(h,w) Σ_(c∈C) I(D(S(X_n))(h,w) > T_semi) · Y_n(h,w,c) · log S(X_n)(h,w,c) (6);
wherein, in formula (4) to formula (6), X_n is an image of the first training atlas with size H × W × 3, and Y_n is the self-learned label of the tumor sample.
S(·) is defined as the generator and D(·) is defined as the discriminator; the biomarker prediction distribution map generated by the generator has size H × W × C, where H and W represent the height and width of the image, respectively, and C is the number of classes of the biomarker.
In equation (6) above, I(·) represents an indicator function used to indicate which regions in the images of the biomarker prediction distribution atlas have confidence above a set threshold, and T_semi represents the threshold used to control the confidence selection. In one example, the set threshold may be 0.5.
It should be noted that, in the training phase, the invention regards the self-learned label Y_n and the indicator function I(·) as constant terms; therefore, the semi-supervised loss function of equation (6) can be regarded as a spatial cross-entropy loss function screened through a mask.
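Putting formula (3) through formula (6) together, the generator objective can be sketched in NumPy as follows; the function name and the λ weight values are illustrative assumptions, not values given by the source:

```python
import numpy as np

def generator_loss(pred, label, conf_map,
                   lam_adv=0.01, lam_semi=0.1, t_semi=0.5, eps=1e-8):
    """Sketch of Eq. (3): L_seg = L_ce + lam_adv*L_adv + lam_semi*L_semi.
    pred: (H, W, C) softmax output S(X_n); label: (H, W, C) one-hot labels;
    conf_map: (H, W) discriminator confidence D(S(X_n)).
    The lambda weights are illustrative, not specified by the source."""
    l_ce = -np.mean(np.sum(label * np.log(pred + eps), axis=-1))   # Eq. (4)
    l_adv = -np.mean(np.log(conf_map + eps))                       # Eq. (5)
    # Eq. (6): self-learned one-hot labels, masked by I(D(S(X_n)) > T_semi)
    self_label = np.eye(pred.shape[-1])[pred.argmax(axis=-1)]
    mask = (conf_map > t_semi)[..., None]
    l_semi = -np.sum(mask * self_label * np.log(pred + eps)) / max(mask.sum(), 1)
    return l_ce + lam_adv * l_adv + lam_semi * l_semi
```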
And calculating the discrimination loss value of the discriminator according to the biomarker distribution atlas, the real training atlas and a discrimination loss function.
In the embodiment of the present application, the discrimination loss function L_D is specifically formulated as follows:
L_D = -Σ_(h,w) [(1 - y_n) · log(1 - D(S(X_n))(h,w)) + y_n · log D(X_n)(h,w)] (7);
wherein X_n is an image of the first training atlas, Y_n is the self-learned label of the tumor sample, and y_n represents the true label of the tumor sample: when the input to the discriminator is the biomarker prediction distribution atlas from the generator output, y_n = 0; when the input to the discriminator is from the real training atlas, y_n = 1.
S(·) is defined as the generator and D(·) as the discriminator; the confidence map generated by the discriminator has size H × W × 1, where H and W respectively represent the height and width of the image.
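A minimal sketch of the discrimination loss of formula (7), computed as a spatial binary cross-entropy over the confidence map (the function name is an assumption):

```python
import numpy as np

def discriminator_loss(conf_map, y_n, eps=1e-8):
    """Spatial binary cross-entropy of Eq. (7) over the (H, W) confidence map:
    y_n = 0 when the input came from the generator, y_n = 1 for a real
    stained training image."""
    return -np.mean((1 - y_n) * np.log(1 - conf_map + eps)
                    + y_n * np.log(conf_map + eps))
```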
And respectively adjusting the parameters of the generator and the discriminator according to the prediction loss value, the discrimination loss value and a back propagation algorithm until the iterative training is finished to obtain a biomarker distribution prediction model.
The embodiment of the application adopts an iterative training mode, and the output result of each training is used as the input of the next training. The specific process is as follows:
inputting the biomarker distribution prediction atlas and the real training atlas into the discriminator to discriminate to obtain a confidence map; the confidence map is used to indicate the confidence of the biomarker distribution prediction atlas. The confidence map reveals which regions in the prediction are close to the distribution of the true biomarkers.
And screening the region image set with the confidence coefficient larger than a preset confidence coefficient threshold value in the biomarker distribution prediction image set according to the confidence map.
And determining the region image set obtained by current training as the input of the next training.
In addition, in the iterative training process, an Adam optimizer can be adopted for iterative optimization training. The final output layer of the biomarker distribution prediction training model is processed by a softmax function, which maps the outputs of a plurality of neurons into the (0, 1) interval and finally yields the probability distribution over the biomarker categories. The softmax function is shown as formula (8):
y_i = e^(z_i) / Σ_(j=1..C) e^(z_j) (8);
wherein y_i is the probability that the predicted object belongs to the i-th class, i and j respectively index the biomarker classes, C is the number of classes, and z_i and z_j respectively represent the output values of the i-th and j-th classes in the network.
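The softmax mapping of formula (8) can be implemented as follows; subtracting the row maximum is a standard numerical-stability step not mentioned in the source, and it leaves the result unchanged:

```python
import numpy as np

def softmax(z):
    """Eq. (8): y_i = exp(z_i) / sum_j exp(z_j), computed along the last axis.
    The maximum is subtracted first for numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```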
As shown in fig. 5, a comparative schematic of the biomarker distribution prediction model prediction results is shown.
In a specific embodiment of the present application, an iterative training process of a generator of a biomarker distribution prediction training model is given:
the first step is as follows: setting the initial training times as 1, setting the threshold value of the total training times as 1000 and the threshold value of the confidence coefficient as 0.5, and recording the current training times of the model.
The second step is that: extracting the un-dyed training atlas to be tested and the corresponding real label thereof through a feature extraction module to obtain a multi-scale feature image.
The third step: and segmenting the multi-scale characteristic image through a generator to output a biomarker distribution prediction image.
And fourthly, calculating the prediction loss values of the biomarker distribution prediction image and the dyed real image.
And fifthly, adjusting the parameters of the generator through a back propagation algorithm according to the predicted loss value.
Sixthly, inputting the biomarker distribution prediction image and the real image output by the generator into a discriminator to obtain a confidence map;
seventhly, screening a region image with the confidence coefficient larger than the preset confidence coefficient from the biomarker distribution prediction image as an input image for next training according to the confidence map;
eighthly, adding 1 to the training times every time the training is finished, and judging whether the current training times are greater than the set threshold value 1000; and under the condition that the training times do not reach the set threshold value, the second step to the seventh step are repeatedly executed until the training times are reached.
In the embodiment of the application, through multiple rounds of segmentation prediction and parameter adjustment, the generator gradually reduces the prediction loss value until it stabilizes or falls below a preset threshold, which indicates that the generator is sufficiently accurate and can identify the distribution of different cell biomarkers from a tumor sample without labeling and staining.
Therefore, the in vitro tumor sample can be accurately predicted by using the trained biomarker distribution prediction model.
Step S204: and obtaining a biomarker distribution prediction atlas according to the biomarker distribution prediction model.
Step S205: determining a second training atlas from the biomarker distribution prediction atlas.
Optionally, the determining a second training atlas from the biomarker distribution prediction atlas comprises:
determining a probability distribution value of a target marker in the biomarker distribution prediction map set;
comparing the probability distribution value of the target marker with a preset probability distribution threshold value;
and determining the image of which the probability distribution value of the target marker is greater than the preset probability distribution threshold value as the image of a second training atlas.
In one example, for the detection of tumor cells, the tumor cell marker PanCK can be screened: image patches with a PanCK distribution greater than 50% are selected as the images of the second training atlas.
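The screening of the second training atlas can be sketched as follows; `screen_patches` and the per-patch pixel-fraction criterion are illustrative assumptions:

```python
import numpy as np

def screen_patches(pred_patches, target_class, threshold=0.5):
    """Keep the patches whose fraction of pixels assigned to the target marker
    exceeds the threshold; the 0.5 default mirrors the 50% PanCK example."""
    kept = []
    for p in pred_patches:                  # each p: (H, W, C) probability map
        frac = float(np.mean(p.argmax(axis=-1) == target_class))
        if frac > threshold:
            kept.append(p)
    return kept
```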
Step S206: inputting the second training atlas into a pre-constructed gene mutation detection training model for iterative training to obtain a gene mutation detection model, and performing gene mutation detection by using the model.
In this step, the gene mutation detection training model is optimized by minimizing a binary cross-entropy loss function, which is as follows:
loss(x_s, y_s) = -[y_s·log(x_s) + (1 - y_s)·log(1 - x_s)] (9);
wherein x_s represents the second training atlas and y_s represents the image labels of the second training atlas.
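A minimal sketch of formula (9); interpreting x_s as the model's predicted mutation probability for a second-training-atlas image is an assumption made for the example, since log(x_s) requires x_s in (0, 1):

```python
import math

def bce_loss(x_s, y_s, eps=1e-8):
    """Eq. (9) for a single sample: x_s is taken to be the predicted mutation
    probability (an interpretation) and y_s the 0/1 label."""
    return -(y_s * math.log(x_s + eps) + (1 - y_s) * math.log(1 - x_s + eps))
```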
The gene mutation detection training model is also trained in an iterative manner; for the training process, refer to the training process of the generator and discriminator described above, which is not repeated here. The difference is that this model outputs the classes of gene mutations.
As shown in FIG. 6, a comparison diagram of the prediction results of the training model for detecting gene mutation is shown.
As shown in FIG. 7, which presents the receiver operating characteristic (ROC) curves of the tumor gene mutation detection performance, the predicted AUC values of APC, TP53 and KRAS are 0.76, 0.79 and 0.77, respectively.
When a new sample is detected, the H&E image of a sample slice is first processed using the cell biomarker prediction module, and image patches with a PanCK distribution greater than 50% are screened as tumor-region image patches to obtain a tumor-region image patch set. The trained tumor gene mutation detection model then detects this image patch set to obtain the tumor gene mutation detection result.
It should be noted that, the tumor microenvironment and tumor gene mutation detection system provided in the above embodiments is only exemplified by the division of the above functional modules, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the modules or steps in the embodiments of the present invention are further decomposed or combined, for example, the modules in the above embodiments may be combined into one module, or further separated into a plurality of sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An apparatus of a third embodiment of the invention comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the processor for execution by the processor to implement a tumor microenvironment and a tumor genetic mutation detection method.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement a tumor microenvironment and a tumor gene mutation detection method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Referring now to FIG. 8, therein is shown a block diagram of a computer system of a server that may be used to implement embodiments of the method, system, and apparatus of the present application. The server shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the computer system includes a Central Processing Unit (CPU)801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for system operation are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An Input/Output (I/O) interface 805 is also connected to bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 808 including a hard disk and the like; and a communication section 809 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is mounted on the storage section 808 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 801. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.