Screening method of plasma molecular marker for diagnosing cervical cancer based on metabonomics
1. A metabonomics-based screening method for plasma molecular markers for diagnosing cervical cancer, characterized by comprising the following steps:
S1, selecting a sample
Selecting a fasting blood sample from a cervical cancer patient, and a fasting blood sample from a healthy volunteer for comparison;
S2, performing plasma metabonomics analysis
Taking a plasma sample stored at -80 °C and thawing it slowly at 4 °C; mixing 100 μL of the sample with 400 μL of a precooled methanol/acetonitrile solution (1:1, v/v); vortexing for 60 s, standing at -20 °C for 1 h, centrifuging at 4 °C for 20 min, freeze-drying the supernatant, and performing liquid chromatography analysis; randomly inserting quality control (QC) samples among the samples to be tested, separating by ultra-high-performance liquid chromatography (UHPLC), and performing mass spectrometry in both positive and negative electrospray ionization modes using a mass spectrometer;
processing the liquid chromatography-mass spectrometry (LC-MS) raw data using XCMS to align peaks, correct retention times and extract peak areas; identifying metabolite structures by accurate mass matching (within 25 ppm) and database-based MS/MS spectral matching; annotating the peaks, then performing Pareto scaling using SIMCA, followed by univariate statistical analysis, multivariate statistical analysis, enrichment analysis and joint pathway analysis using the MetaboAnalyst tool; selecting differentially expressed metabolites based on the criteria of a p value less than 0.05 and an absolute fold change greater than 1.2; performing Spearman correlation analysis using the R packages Hmisc and corrplot;
S3, transcriptomics analysis
S3-1, analyzing differential gene expression in the transcriptome data of healthy volunteers and cervical cancer patients in the GEO database using GEO2R;
S3-2, analyzing differential gene expression in the transcriptome data of healthy volunteers and cervical cancer patients in TCGA using the DESeq2 package in R;
S3-3, performing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis using the clusterProfiler package;
S4, regression analysis and AUC calculation
Establishing a Lasso regression model with 10-fold cross-validation to distinguish healthy volunteers from cervical cancer patients, and a Lasso model with 3-fold cross-validation to distinguish patients at stage IIA1 or below from patients at stage IIA2 or above; matching the best-fit values to each metabolite individually, and then fitting a generalized linear model (GLM); performing the regression analysis in R using glmnet, constructing ROC curves, and calculating the AUC of each model using pROC.
2. The metabonomics-based screening method for plasma molecular markers for diagnosing cervical cancer according to claim 1, wherein the screening criteria for differentially expressed genes in step S3-1 are p < 0.05 and an absolute fold change greater than 3/2.
3. The metabonomics-based screening method for plasma molecular markers for diagnosing cervical cancer according to claim 1, wherein the screening criteria for differentially expressed genes in step S3-2 are p < 0.05 and an absolute fold change greater than 6/5.
4. The metabonomics-based screening method for plasma molecular markers for diagnosing cervical cancer according to claim 1, wherein the inclusion criteria for patient samples in step S1 are: a, patients diagnosed with cervical cancer by biopsy; b, patients with definite pathological results and detailed clinical staging; c, patients who have not received chemoradiotherapy or any other treatment;
the exclusion criteria for patient samples in step S1 are: d, patients with insufficient basic clinical information; e, patients under 18 or over 85 years of age; f, patients without pathologically confirmed results; g, patients with metabolic disease.
Background
Cervical cancer (CC) is a significant global public health problem. It is one of the most common malignancies endangering women's health, ranking second in incidence among women, and it accounts for approximately 7.5% of all cancer deaths in women. The latest national cancer statistics published by the National Cancer Center of China (NCCC) show that cervical cancer accounts for 6.25% of the incidence of the ten most common female malignant tumors, ranking 6th; the proportion is higher in rural areas, at 6.34%; deaths from cervical cancer account for about 3.96% of all tumor deaths in women, ranking 8th.
Current screening and treatment of cervical cancer have certain shortcomings, such as the low specificity of HPV testing and the low sensitivity of liquid-based thin-layer cytology. Finding new diagnostic and therapeutic targets for cervical cancer patients and exploring targeted drugs with definite efficacy are therefore of great significance for early diagnosis and for improving the prognosis of cervical cancer.
Over the last decade, advances in the field of metabolomics have demonstrated that cancer is a metabolic disease and have led to renewed interest in metabolism as a target for cancer detection and therapy. Early studies showed that the serum, tumor tissue and stool of cervical cancer patients contain a distinctive set of molecules, ranging from amino acids to nucleic acids. It was previously reported that 5 candidate metabolites for diagnosing cervical cancer were screened by integrating metabolomics and transcriptomics data. Khan et al. selected 7 metabolites as biomarkers for the early detection of cervical intraepithelial neoplasia (CIN) and CC. However, there is currently no study on the dynamic monitoring of CC.
Disclosure of Invention
The invention aims to obtain plasma molecular markers for the diagnosis and typing of cervical cancer by screening with a metabonomics-based machine learning method; these molecular markers help doctors diagnose and treat the disease. To achieve this purpose, the invention adopts the following technical scheme:
A metabonomics-based screening method for plasma molecular markers for diagnosing cervical cancer, comprising the following steps:
S1, selecting a sample
Selecting a fasting blood sample from a cervical cancer patient, and a fasting blood sample from a healthy volunteer for comparison;
S2, performing plasma metabonomics analysis
Taking a plasma sample stored at -80 °C and thawing it slowly at 4 °C; mixing 100 μL of the sample with 400 μL of a precooled methanol/acetonitrile solution (1:1, v/v); vortexing for 60 s, standing at -20 °C for 1 h, centrifuging at 4 °C for 20 min, freeze-drying the supernatant, and performing liquid chromatography analysis; randomly inserting quality control (QC) samples among the samples to be tested, separating by ultra-high-performance liquid chromatography (UHPLC), and performing mass spectrometry in both positive and negative electrospray ionization modes using a mass spectrometer;
processing the liquid chromatography-mass spectrometry (LC-MS) raw data using XCMS to align peaks, correct retention times and extract peak areas; identifying metabolite structures and annotating peaks by accurate mass matching (within 25 ppm) and database-based MS/MS spectral matching, then performing Pareto scaling using SIMCA, followed by univariate statistical analysis, multivariate statistical analysis, enrichment analysis and joint pathway analysis using the MetaboAnalyst tool; selecting differentially expressed metabolites based on a p value less than 0.05 and an absolute fold change greater than 1.2; performing Spearman correlation analysis using the R packages Hmisc and corrplot;
S3, transcriptomics analysis
S3-1, analyzing differential gene expression in the transcriptome data of healthy volunteers and cervical cancer patients in the GEO database using GEO2R;
S3-2, analyzing differential gene expression in the transcriptome data of healthy volunteers and cervical cancer patients in the TCGA database using the DESeq2 package in R;
S3-3, performing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis using the clusterProfiler package;
S4, regression analysis and AUC calculation
Establishing a Lasso regression model with 10-fold cross-validation to distinguish healthy volunteers from cervical cancer patients, and a Lasso model with 3-fold cross-validation to distinguish patients at stage IIA1 or below from patients at stage IIA2 or above; matching the best-fit values to each metabolite individually, and then fitting a generalized linear model (GLM); performing the regression analysis in R using glmnet, constructing ROC curves, and calculating the AUC of each model using pROC.
As an improvement, the screening criteria for differentially expressed genes in step S3-1 are p < 0.05 and an absolute fold change greater than 3/2.
As an improvement, the screening criteria for differentially expressed genes in step S3-2 are p < 0.05 and an absolute fold change greater than 6/5.
As an improvement, the inclusion criteria for patient samples in step S1 are: a, patients diagnosed with cervical cancer by biopsy; b, patients with definite pathological results and detailed clinical staging; c, patients who have not received chemoradiotherapy or any other treatment;
the exclusion criteria for patient samples in step S1 are: d, patients with insufficient basic clinical information; e, patients under 18 or over 85 years of age; f, patients without pathologically confirmed results; g, patients with metabolic disease.
The invention has the advantages that:
On the basis of metabonomics, the invention screens out metabolic molecular markers capable of distinguishing cervical cancer patients from healthy people, and molecular markers capable of distinguishing early-stage patients from middle- and late-stage patients. Compared with existing cervical cancer diagnosis methods, the method has the following advantages:
1. The molecular markers screened by the invention are easy to sample, and samples can be collected under non-invasive conditions, so that cervical cancer can be monitored and diagnosed simply and non-invasively.
2. The invention discloses a novel metabolic marker for molecular typing that can distinguish early-stage cervical cancer patients from middle- and late-stage patients.
3. The tumor marker screened by the invention has higher performance and can be applied to cervical cancer typing.
Drawings
FIG. 1 shows the metabolic expression profiles of CC patients and healthy volunteers; (A) PCA and (B) OPLS-DA score plots of the metabolomics data, with ellipses showing 95% confidence intervals; (C) heat map of differentially expressed metabolites based on scaled peak intensities; (D) SMPDB (Small Molecule Pathway Database) enrichment analysis of the differential metabolites; (E) correlation plot of the differentially expressed metabolites;
FIG. 2 shows the screening of metabolites by LASSO regression analysis; (A) ln(λ) on the x-axis and binomial deviance on the y-axis; (B) LASSO coefficient profiles of the 51 metabolites versus ln(λ); (C-E) ROC curves for the training cohort, test cohort and validation cohort, with AUCs of 0.795 (95% CI: 0.98-1), 1 (95% CI: 1-1) and 1 (95% CI: 1-1), respectively; (F-J) peak intensity plots for the 5 metabolic markers, namely cyclohexylamine, L-carnitine, Val-Thr, sinigrin and 5,6,7,8-tetrahydro-2-naphthoic acid; the significance of the difference between healthy volunteers and CC patients was evaluated by Student's t test;
FIG. 3 shows the MS/MS spectra of cyclohexylamine (AF), L-carnitine, Val-Thr, sinigrin and 5,6,7,8-tetrahydro-2-naphthoic acid (top) matched with those of the standard compounds (bottom);
FIG. 4 shows the metabolic expression profiles of the ≤IIA1 and ≥IIA2 groups of CC patients; (A-B) GO and KEGG enrichment analysis of DEGs; (C) KEGG enrichment analysis of differentially expressed metabolites; (D) KEGG enrichment analysis of differentially expressed metabolites and genes; (E) network analysis of DEGs and differentially expressed metabolites; (F) correlation of differentially expressed metabolites with clinical indices;
FIG. 5 shows (A) the expression abundance of TMAO; (B) the ROC curve of TMAO in the training set; (C) the ROC curve of TMAO in the validation set; (D) survival curves of patients with high and low TMAO expression; (E) risk analysis of TMAO expression.
Detailed Description
The present invention will be described in detail and specifically with reference to the following examples so as to facilitate the understanding of the present invention, but the following examples do not limit the scope of the present invention.
Example 1
This embodiment discloses a metabonomics-based screening method for plasma molecular markers for diagnosing cervical cancer, comprising the following steps:
S1, selecting a sample
Selecting a fasting blood sample from a cervical cancer patient, and a fasting blood sample from a healthy volunteer for comparison. Inclusion criteria for patient samples were: a, patients diagnosed with cervical cancer by biopsy; b, patients with definite pathological results and detailed clinical staging; c, patients who had not received chemoradiotherapy or any other treatment. Exclusion criteria for patient samples were: d, patients with insufficient basic clinical information; e, patients under 18 or over 85 years of age; f, patients without pathologically confirmed results; g, patients with metabolic disease.
S2, performing plasma metabonomics analysis
Taking a plasma sample stored at -80 °C and thawing it slowly at 4 °C; mixing 100 μL of the sample with 400 μL of a precooled methanol/acetonitrile solution (1:1, v/v); vortexing for 60 s, standing at -20 °C for 1 h, centrifuging at 4 °C for 20 min, freeze-drying the supernatant, and performing liquid chromatography analysis; quality control (QC) samples are randomly inserted among the samples to be tested, and after separation by ultra-high-performance liquid chromatography (UHPLC), mass spectrometry is performed in both positive and negative electrospray ionization modes using a TripleTOF 5600 mass spectrometer (AB SCIEX).
Processing the liquid chromatography-mass spectrometry (LC-MS) raw data using XCMS to align peaks, correct retention times and extract peak areas; identifying metabolite structures by accurate mass matching (within 25 ppm) and database-based MS/MS spectral matching; annotating the peaks, then performing Pareto scaling using SIMCA, followed by univariate statistical analysis, multivariate statistical analysis, enrichment analysis and joint pathway analysis using the MetaboAnalyst tool; selecting differentially expressed metabolites based on a p value less than 0.05 and an absolute fold change greater than 1.2; performing Spearman correlation analysis in R using Hmisc and corrplot.
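The scaling and selection criteria of step S2 can be illustrated with a minimal Python sketch. It is not the SIMCA or MetaboAnalyst implementation. The function names and toy peak areas are hypothetical, the p value is assumed to be computed elsewhere, and the reading of "absolute fold change greater than 1.2" as a ratio above 1.2 or below 1/1.2 is an assumption.

```python
import math
import statistics


def pareto_scale(values):
    """Mean-center a feature and divide by the square root of its sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / math.sqrt(sd) for v in values]


def is_differential(case_mean, control_mean, p_value,
                    p_cutoff=0.05, fc_cutoff=1.2):
    """Apply the p < 0.05 and fold-change > 1.2 criteria.

    Assumption: a decrease counts when the ratio is below 1 / fc_cutoff.
    """
    fold_change = case_mean / control_mean
    changed = fold_change > fc_cutoff or fold_change < 1.0 / fc_cutoff
    return p_value < p_cutoff and changed


# Toy peak areas for one feature (illustrative values only).
peak_areas = [100.0, 120.0, 80.0, 140.0, 60.0]
scaled = pareto_scale(peak_areas)
```

Pareto scaling divides by the square root of the standard deviation rather than the standard deviation itself, so large peaks are down-weighted less aggressively than under unit-variance scaling.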
S3, transcriptomics analysis
S3-1, analyzing differential gene expression in the transcriptome data of healthy volunteers and cervical cancer patients in the GEO database using GEO2R; the screening criteria for differentially expressed genes are p < 0.05 and an absolute fold change greater than 3/2.
S3-2, analyzing differential gene expression in the transcriptome data of healthy volunteers and cervical cancer patients in TCGA using DESeq2 in R; the screening criteria for differentially expressed genes are p < 0.05 and an absolute fold change greater than 6/5.
S3-3, performing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis using the clusterProfiler package.
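Enrichment analysis of the kind performed in step S3-3 is commonly an over-representation test based on the hypergeometric distribution. The following Python sketch shows that test for a single term. It is an illustration under that assumption, not clusterProfiler's code, and the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, term_size, list_size, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(N, K, n).

    N = background genes, K = genes annotated to the term,
    n = genes in the differential list.
    """
    upper = min(term_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200 differential genes, 8 of which fall in a
# 100-gene term, against a background of 20000 genes.
p = enrichment_p_value(20000, 100, 200, 8)
```

In practice the p values of all tested terms would also be adjusted for multiple testing, which this sketch omits.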
S4, regression analysis and AUC calculation
Establishing a Lasso regression model with 10-fold cross-validation to distinguish healthy volunteers from cervical cancer patients, and a Lasso model with 3-fold cross-validation to distinguish patients at stage IIA1 or below from patients at stage IIA2 or above; matching the best-fit values to each metabolite individually, and then fitting a generalized linear model (GLM); performing the regression analysis in R using glmnet, constructing ROC curves, and calculating the AUC of each model using pROC.
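The AUC calculation in step S4 can be illustrated independently of pROC. The sketch below computes the empirical AUC as the fraction of positive/negative pairs that the model ranks correctly, which is equal to the trapezoidal area under the empirical ROC curve. The function name and the coding of labels (1 for patients, 0 for healthy volunteers) are assumptions made for this example.

```python
def roc_auc(labels, scores):
    """Empirical AUC: probability that a positive outscores a negative.

    Ties count as one half, which equals the trapezoidal area
    under the empirical ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predicted probabilities (illustrative values only).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Of the four positive/negative pairs in the toy data, three are ranked correctly, so the AUC here is 0.75.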
87 CC patients and 34 healthy volunteers were recruited in the S1 sample selection step of Example 1, and their plasma samples were subjected to plasma metabonomics analysis in step S2. The clinical details of the two groups are shown in Table 1:
TABLE 1 basic clinical characteristics of the patients enrolled
Principal component analysis (PCA) was performed on the metabolomics data from healthy volunteers and CC patients; principal component 1 (PC1) accounted for 70.3% and principal component 2 (PC2) for 12.4% of the variance (FIG. 1A), indicating a clear separation between the two groups. In the OPLS-DA score plot, the predictive component (x-axis) and the orthogonal component (y-axis) accounted for 4.8% and 13.5%, respectively (FIG. 1B), indicating a significant difference in metabolism between CC patients and healthy volunteers. Differential analysis identified 51 differentially expressed metabolites (FIG. 1C), which were enriched in carnitine metabolism, lipid metabolism and amino acid metabolism (FIG. 1D). Various amino acids and their derivatives appeared to be correlated with one another (FIG. 1E).
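The correlation plot of FIG. 1E is based on Spearman correlation, which is the Pearson correlation of the rank-transformed values. A minimal Python sketch follows. It is not the Hmisc implementation, the function names are hypothetical, and it assumes that neither input is constant (otherwise the denominator is zero).

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed intensity distributions that are typical of peak-area data.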
Transcriptome data from a GEO dataset, comprising tissue specimens from 24 normal subjects and 28 CC patients, were integrated. The network of differential metabolites and genes suggests that the differential metabolites may result from abnormal tumor metabolism.
All samples were randomly divided into training and validation sets at a 7:3 ratio and then modeled using the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation; the dashed lines mark the two candidate optimal λ values, namely the λ giving the minimum error (λ = 0.049) and the one-standard-error (1-SE) λ (λ = 0.136) (FIG. 2A). The five metabolites contributing most to the final model (cyclohexylamine, L-carnitine, Val-Thr, sinigrin and 5,6,7,8-tetrahydro-2-naphthoic acid) were combined to form the prediction model (FIG. 2B). The structures of the five metabolites were confirmed against standard compounds (FIG. 3). The five metabolites performed equally well in the training cohort, the test cohort and another independent validation cohort (including 45 CC patients and 7 persons) (FIGS. 2C-2E). The peak intensities of the metabolites differed significantly between the two groups (FIGS. 2F-2J). Low plasma concentrations of L-carnitine increase fatigue and exhaustion in cancer patients. By contrast, the natural product sinigrin, found mainly in cruciferous vegetables, was elevated in the plasma of CC patients. Sinigrin has anticancer therapeutic activity. We speculate that this elevation is due to the dynamics of the gut microbiota affecting metabolism. The relationship of the other three metabolites to cancer remains to be further investigated.
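The two λ values reported above correspond to the usual selection rule for cross-validated LASSO: the λ with the lowest mean cross-validation error, and the largest λ whose error is within one standard error of that minimum. The Python sketch below illustrates the rule. The function name is hypothetical and the grid is made up for illustration; it is not the data behind FIG. 2.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambda_min minimizes the mean CV deviance; lambda_1se is the largest
    lambda whose mean deviance is within one SE of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    within = [lam for lam, m in zip(lambdas, cv_mean) if m <= threshold]
    return lambdas[best], max(within)


# Toy grid (illustrative numbers, not the values behind FIG. 2).
lambdas = [0.40, 0.20, 0.10, 0.05, 0.025]
cv_mean = [1.10, 0.80, 0.62, 0.60, 0.66]
cv_se = [0.05, 0.05, 0.04, 0.04, 0.05]
lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
```

The 1-SE value is never smaller than the minimum-error value, so it yields a sparser model with fewer retained metabolites at a small cost in cross-validated error.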
According to the FIGO staging criteria, stages IIA1 and IIA2 are distinguished by the maximum tumor size. The survival outcomes of IIA1 and IIA2 remain controversial. However, surgery is more appropriate for patients at stage ≤IIA1 because their tumors are smaller. Differential molecular analysis of patients at stage IIA1 or below versus patients at stage IIA2 or above may help physicians define treatment strategies regarding additional chemotherapy and radiotherapy. The CC patients were divided into two groups, ≤IIA1 and ≥IIA2, and an integrated analysis was performed by combining metabonomics with the TCGA transcriptome data. GO enrichment of differentially expressed genes (DEGs) showed pathways associated with tumor progression (FIG. 4A). Whether DEGs or differentially expressed metabolites were used, the enriched pathways revealed metabolic abnormalities, particularly in amino acid and lipid metabolism (FIGS. 4B-4D). The DEGs and differentially expressed metabolites formed a network centered on ornithine and palmitic acid, which promote tumor cell proliferation and migration (FIG. 4E). Correlation analysis between the differentially expressed metabolites and clinical indices showed that tumor volume is correlated with several metabolites, such as TMAO, as well as with the tumor biomarker CA125 and the like (FIG. 4F). This indicates that there is a significant metabolic difference between the ≤IIA1 and ≥IIA2 groups.
Further modeling and screening with LASSO was used to find metabolites that significantly distinguish the ≤IIA1 and ≥IIA2 groups; the 87 patients were randomly divided into a training set and a test set at a ratio of 7:3. Trimethylamine N-oxide (TMAO) was finally selected, and its expression abundance differed significantly between the ≤IIA1 and ≥IIA2 groups (FIG. 5A). The model constructed with TMAO performed well in distinguishing the two groups, with an AUC of 0.869 in the training set and 0.738 in the test set (FIGS. 5B-5C). When the patients were grouped by the TMAO cut-off value of the model, survival differed between the high- and low-TMAO groups, and the low-abundance TMAO group had a better prognosis (FIG. 5D). Risk analysis also showed that TMAO is a risk factor for the progression of cervical cancer (FIG. 5E).
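The text does not state how the TMAO cut-off value was derived. One common choice, shown here purely as an assumption, is the threshold that maximizes Youden's J (sensitivity + specificity - 1). The Python sketch below illustrates that choice and the subsequent high/low grouping. The function names and toy abundances are hypothetical, and treating higher values as the positive (≥IIA2) class is also an assumption.

```python
def youden_cutoff(labels, values):
    """Return the threshold maximizing sensitivity + specificity - 1.

    A sample is called positive when its value is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for y, v in zip(labels, values) if y == 1 and v >= cut)
        tn = sum(1 for y, v in zip(labels, values) if y == 0 and v < cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def split_by_cutoff(values, cutoff):
    """Label each sample 'high' or 'low' relative to the cutoff."""
    return ["high" if v >= cutoff else "low" for v in values]


# Toy abundances (illustrative values only): 0 = <=IIA1, 1 = >=IIA2.
labels = [0, 0, 1, 1]
abundance = [1.0, 2.0, 3.0, 4.0]
cutoff, j_stat = youden_cutoff(labels, abundance)
groups = split_by_cutoff(abundance, cutoff)
```

The resulting high and low groups are what a survival comparison such as the one in FIG. 5D would then be run on.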
The embodiments of the present invention have been described in detail above, but they are merely examples, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions made by those skilled in the art are also within the scope of the present invention. Accordingly, all equivalent alterations and modifications made without departing from the spirit and scope of the invention are intended to be included within the scope of the invention.