Method for tracing producing areas of soybean and soybean oil by characterizing triglyceride based on LC-Q-TOF-MS
1. A method for tracing the origin of soybean and soybean oil based on LC-Q-TOF-MS characterization of triglyceride is characterized by comprising the following steps:
s1, preparing a standard sample: respectively squeezing soybean samples in different producing areas of a definite region to obtain soybean oil samples, and gradually diluting the soybean oil samples by a diluting solvent by 100-200 times to obtain standard samples, wherein the standard samples comprise at least two soybean samples in different producing areas;
s2, acquiring mass spectrum data of the standard sample: respectively acquiring mass spectrum data information of the standard samples in different regions by using a liquid chromatogram-quadrupole time-of-flight mass spectrometer to obtain IDA-MS high-resolution mass spectrum data of triglyceride compounds of the standard samples;
s3, determination of a targeting compound: matching the triglyceride compound mass spectrum data in the IDA-MS high-resolution mass spectrum data obtained in the step S2 with standard triglyceride compound data in a lipid compound database to determine the triglyceride compound targeting compound of the standard sample;
s4, establishing a soybean and soybean oil producing area traceability identification model: analyzing the IDA-MS high-resolution mass spectrum data through analysis software to obtain triglyceride compound marker observed peak data of soybean oil in the standard sample, and processing the triglyceride compound marker observed peak data in the standard sample in different areas in one or more ways of a principal component analysis method, a partial least square method-discriminant analysis method and an orthogonal partial least square method regression analysis method to obtain characteristic distribution rules of soybean oil in different producing areas in the standard sample, so as to construct a traceability identification model of soybean and soybean oil producing areas based on lipidomics;
s5, result prediction: squeezing a soybean sample to be detected to obtain a soybean oil sample, diluting the soybean oil sample by a diluent by 100-200 times step by step to obtain the sample to be detected, collecting IDA-MS high-resolution mass spectrum data of the sample to be detected by a liquid chromatogram-quadrupole flight time mass spectrometer, introducing the IDA-MS high-resolution mass spectrum data into the soybean and soybean oil production area traceability identification model, and predicting the production area traceability result.
2. The method for tracing the origin of soybean and soybean oil according to claim 1, wherein the diluting solvent is a methanol-ethyl acetate mixed solution in which the ratio of methanol to ethyl acetate is 1:1, and the soybean oil sample is diluted 200-fold stepwise with the methanol-ethyl acetate mixed solution.
3. The method for tracing origins of soybean and soybean oil according to claim 1, wherein the step of collecting mass spectrum data of the standard sample in step S2 is as follows:
placing the diluted standard sample into an injector of the liquid chromatogram-quadrupole time-of-flight mass spectrometer, performing separation analysis on the standard sample through a liquid chromatograph in the liquid chromatogram-quadrupole time-of-flight mass spectrometer, then performing mass spectrum data acquisition on the standard sample through a mass spectrometer in the liquid chromatogram-quadrupole time-of-flight mass spectrometer, respectively obtaining primary mass spectrum information and secondary mass spectrum information of the standard sample through primary TOF-MS scanning and secondary IDA-MS scanning of the mass spectrometer, wherein the secondary mass spectrum information is IDA-MS high-resolution mass spectrum data, determining a triglyceride compound target compound of the standard sample through the IDA-MS high-resolution mass spectrum data, and performing directional quantitative processing analysis on the IDA-MS high-resolution mass spectrum data through analysis software, so as to construct a traceability identification model of soybean and soybean oil producing areas.
4. The method for tracing the origin of soybean and soybean oil according to claim 3, wherein the liquid chromatography conditions of the liquid chromatograph in the liquid chromatography-quadrupole time-of-flight mass spectrometer are as follows: the flow rate is 0.5 μL/min, the column temperature is 40 ℃, gradient elution is carried out on an Xbridge BEH C18 chromatographic column, and the injection volume is 2 μL; phase A of the mobile phase is isopropanol and phase B is acetonitrile, and the content of phase B over time is as follows: 0 min, 70% B; 0-5 min, 70-65% B; 5-8 min, 65% B; 10-10.5 min, 65-70% B; 10.5-15 min, 70% B;
the quadrupole time-of-flight mass spectrometry conditions of the mass spectrometer in the liquid chromatography-quadrupole time-of-flight mass spectrometer are as follows: data are acquired in positive ion mode; the ion source is a combined ESI and APCI source; in the positive ion scanning mode, the APCI source is connected to an automatic calibration system; the primary TOF-MS scan has an accurate mass range of 100-2000 Da, a data acquisition time of 100 ms, a DP of 100 V and a CE of 10 V, where DP is the declustering potential and CE is the collision energy; the secondary IDA-MS scan has an accurate mass range of 50-2000 Da, a DP of 100 V and a CE of 35 ± 15 V; the mass spectrometer operates in high-sensitivity mode with a data acquisition time of 50 ms and a signal threshold of 100 cps, data are acquired 6 times in each cycle, and dynamic background subtraction is applied.
5. The method for tracing origins of soybean and soybean oil according to claim 1, wherein said step S4 further comprises a blind sample verification step comprising: selecting a plurality of soybean verification samples in the same region, collecting IDA-MS high-resolution mass spectrum data of the soybean verification samples with definite regions through a liquid chromatogram-quadrupole time-of-flight mass spectrometer, introducing the IDA-MS high-resolution mass spectrum data into the soybean and soybean oil production area traceability identification model in the step S4, and verifying the production area traceability accuracy of the soybean verification samples.
6. The method for tracing the origins of soybean and soybean oil in accordance with claim 3, wherein the method for determining the target compound of step S3 comprises: according to the molecular weight range of the target object in the IDA-MS high-resolution mass spectrum data, dividing the IDA-MS high-resolution mass spectrum data in the soybean oil sample into 3 regions: a first region with the molecular weight of 800-1000, a second region with the molecular weight of 550-800 and a third region with the molecular weight of less than 550, wherein the first region and the second region are soybean oil and fat metabolism characteristic regions, animal and plant triglyceride compounds with the molecular weight range of 700-950 in a lipid compound database and soybean oil triglyceride compounds in the first region and the second region are matched and screened, and triglyceride compounds in 114 soybean oils with the molecular weight range of 766-920Da are determined as targeting compounds.
7. The method of claim 1, wherein in step S4, the triglyceride compound marker observed peak data in the standard sample are processed by the orthogonal partial least squares regression analysis method to construct a lipidomics-based OPLS-DA soybean and soybean oil origin tracing identification model.
8. The method of claim 1, wherein in step S4, the partial least squares discriminant analysis is performed on the triglyceride compound marker observed peak data in the standard sample to construct a PLS-DA soybean and soybean oil production area traceability system model based on lipidomics.
9. The method for tracing the origin of soybean and soybean oil according to claim 6, wherein said step S4 further comprises a step of optimizing the soybean and soybean oil origin tracing identification model, said optimization step comprising: determining, through the VIP values of the soybean and soybean oil origin tracing identification model, the triglyceride compounds with a large contribution among the targeting compounds; and, according to the Hotelling's T2 and DModX indexes, deleting outlier samples that exceed the 99% confidence interval and deleting all soybean oil samples from regions with few samples in the identification model, so as to optimize the soybean and soybean oil origin tracing identification model.
10. The method for tracing the origin of soybean and soybean oil according to claim 1, wherein in step S1, during preparation of the standard samples, the soybean samples come from n different regions and are divided into n × (n-1)/2 groups, each group consisting of soybean samples from two different regions, and steps S2, S3 and S4 are carried out on each group to establish a two-country soybean and soybean oil origin tracing identification model for identifying which of the two countries or regions in that model a soybean or soybean oil sample comes from.
Background
The field of food quality and safety testing involves two major problems: the risk posed by toxic and harmful substances, and the authenticity of products. For harmful substances in food, a large number of standards and published detection methods exist at home and abroad. Food authenticity, by contrast, has drawn the attention of consumers at home and abroad over the last ten years and has gradually become both a focus and a difficulty in food quality testing. Current techniques for testing food authenticity mainly include fingerprint techniques such as ultraviolet and infrared spectroscopy; atomic spectroscopy such as atomic absorption, emission and fluorescence; isotope mass spectrometry; high-resolution mass spectrometry; nuclear magnetic resonance; Raman spectroscopy; and omics techniques, which emerged in the 1990s. Foodomics covers genomics and epigenomics, transcriptomics, proteomics, metabolomics and lipidomics, of which proteomics, metabolomics and lipidomics are commonly used in food inspection to assess the authenticity of functional components, nutrient content and the geographical origin of food. For origin tracing and authenticity identification of food, isotope mass spectrometry and omics are two relatively reliable techniques, but few methods and patents apply omics to tracing the origin of soybeans.
The patent application published as CN104360004A discloses a method for identifying the authenticity of edible bird's nest (cubilose) by LC-Q-TOF combined with statistical analysis. A formic acid solution is added to the sample to be tested, which is then boiled in a water bath, cooled and filtered through a membrane to obtain a treated sample; mass spectrum information of the treated sample is collected with a liquid chromatography-quadrupole time-of-flight mass spectrometer and characteristic compounds are extracted; and the characteristic compound information is loaded into a bird's nest authenticity identification model for prediction. The sample is judged genuine when the accuracy is 80% or more and counterfeit otherwise. That application also establishes the bird's nest authenticity identification model, and because only one model is needed for identification, detection is simple and easy to operate. The method serves to identify bird's nest products; because bird's nest and soybean differ in composition, it cannot extract the targeting compounds of soybean and therefore cannot be used to trace the origin of soybeans.
Disclosure of Invention
The invention aims to solve the above technical problem by providing a method for tracing the origin of soybean and soybean oil based on LC-Q-TOF-MS characterization of triglycerides, in which the origin of soybean and soybean oil samples is identified by a lipidomics approach. Triglyceride mass spectrum data of soybean standard samples of known region and different origins are acquired with a liquid chromatography-quadrupole time-of-flight mass spectrometer (LC-Q-TOF-MS), and MarkerView Peaks data of the triglycerides are obtained by analysis with MasterView software. The triglyceride MarkerView Peaks data of the soybean oil are then processed by multivariate statistical methods such as principal component analysis, partial least squares-discriminant analysis and orthogonal partial least squares regression analysis to obtain the characteristic distribution patterns of soybean oil from different origins and to establish multi-country and two-country soybean and soybean oil origin tracing identification models. These models are then used to trace the origin of a soybean sample to be tested, which improves the accuracy of origin tracing for soybean and soybean oil.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a method for tracing the origin of soybean and soybean oil based on LC-Q-TOF-MS characterization of triglyceride comprises the following steps:
s1, preparing a standard sample: respectively squeezing soybean samples in different producing areas of a definite region to obtain soybean oil samples, and gradually diluting the soybean oil samples by a diluting solvent by 100-200 times to obtain standard samples, wherein the standard samples comprise at least two soybean samples in different producing areas;
s2, acquiring mass spectrum data of the standard sample: respectively acquiring mass spectrum data information of the standard samples in different regions by using a liquid chromatogram-quadrupole time-of-flight mass spectrometer to obtain IDA-MS high-resolution mass spectrum data of triglyceride compounds of the standard samples;
s3, determination of a targeting compound: matching the triglyceride compound mass spectrum data in the IDA-MS high-resolution mass spectrum data obtained in the step S2 with standard triglyceride compound data in a lipid compound database to determine the triglyceride compound targeting compound of the standard sample;
s4, establishing a soybean and soybean oil producing area traceability identification model: analyzing the IDA-MS high-resolution mass spectrum data through analysis software to obtain triglyceride compound marker observed peak data of soybean oil in the standard sample, and processing the triglyceride compound marker observed peak data in the standard sample in different areas in one or more ways of a principal component analysis method, a partial least square method-discriminant analysis method and an orthogonal partial least square method regression analysis method to obtain characteristic distribution rules of soybean oil in different producing areas in the standard sample, so as to construct a traceability identification model of soybean and soybean oil producing areas based on lipidomics; the marker observation peak data is MarkerView peaks data, the MarkerView peaks data of triglyceride of a standard sample is obtained through analysis of MasterView analysis software, and the soybean origin tracing identification model judges and predicts the production areas of soybeans and soybean oil in different areas through multivariate statistical analysis;
s5, result prediction: squeezing a soybean sample to be detected to obtain a soybean oil sample, diluting the soybean oil sample by a diluent by 100-200 times step by step to obtain the sample to be detected, collecting IDA-MS high-resolution mass spectrum data of the sample to be detected by a liquid chromatogram-quadrupole flight time mass spectrometer, introducing the IDA-MS high-resolution mass spectrum data into the soybean and soybean oil production area traceability identification model, and predicting the production area traceability result.
Further preferably, in step S3, the lipid compound database is LIPID MAPS Lipidomics Gateway, published in the United States; the standard triglyceride compound data are the 6899 triacylglycerol compounds in that database; and 116 triglyceride compounds common to the soybean oil samples are determined as the triglyceride targeting compounds of the standard sample by qualitative analysis with PeakView software.
Preferably, the diluting solvent is a methanol-ethyl acetate mixed solution in which the ratio of methanol to ethyl acetate is 1:1, and the soybean oil sample is diluted 200-fold stepwise with the methanol-ethyl acetate mixed solution.
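The matching of step S3 is carried out in PeakView against LIPID MAPS; the patent does not state the matching tolerance. As a minimal sketch of the general idea only, the following assumes that observed m/z values are matched to reference values within a mass error expressed in ppm. The 5 ppm tolerance, the names `LipidEntry` and `match_peaks`, and all numeric values are hypothetical and are not taken from the patent or from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipidEntry:
    name: str   # placeholder label, not a real database identifier
    mz: float   # reference m/z for the adduct of interest


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_peaks(observed, database, tol_ppm=5.0):
    """Map each observed m/z to its closest entry if within tol_ppm."""
    matches = {}
    for mz in observed:
        best = min(database, key=lambda e: abs(ppm_error(mz, e.mz)), default=None)
        if best is not None and abs(ppm_error(mz, best.mz)) <= tol_ppm:
            matches[mz] = best
    return matches


# Made-up reference and observed values, for illustration only.
toy_db = [LipidEntry("TG_A", 800.0000), LipidEntry("TG_B", 850.0000)]
hits = match_peaks([800.0020, 850.0100, 900.0000], toy_db, tol_ppm=5.0)
```

In this toy run only the first peak (2.5 ppm from `TG_A`) is retained; the second lies about 11.8 ppm from `TG_B` and the third has no nearby entry.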
Preferably, the step of acquiring mass spectrum data of the standard sample in step S2 is as follows:
placing the diluted standard sample into an injector of the liquid chromatogram-quadrupole time-of-flight mass spectrometer, performing separation analysis on the standard sample through a liquid chromatograph in the liquid chromatogram-quadrupole time-of-flight mass spectrometer, then performing mass spectrum data acquisition on the standard sample through a mass spectrometer in the liquid chromatogram-quadrupole time-of-flight mass spectrometer, respectively obtaining primary mass spectrum information and secondary mass spectrum information of the standard sample through primary TOF-MS scanning and secondary IDA-MS scanning of the mass spectrometer, wherein the secondary mass spectrum information is IDA-MS high-resolution mass spectrum data, determining a triglyceride compound target compound of the standard sample through the IDA-MS high-resolution mass spectrum data, and performing directional quantitative processing analysis on the IDA-MS high-resolution mass spectrum data through analysis software, so as to construct a traceability identification model of soybean and soybean oil producing areas.
Preferably, the liquid chromatography conditions of the liquid chromatograph in the liquid chromatography-quadrupole time-of-flight mass spectrometer are as follows: the flow rate is 0.5 μL/min, the column temperature is 40 ℃, gradient elution is carried out on an Xbridge BEH C18 chromatographic column, and the injection volume is 2 μL; phase A of the mobile phase is isopropanol and phase B is acetonitrile, and the content of phase B over time is as follows: 0 min, 70% B; 0-5 min, 70-65% B; 5-8 min, 65% B; 10-10.5 min, 65-70% B; 10.5-15 min, 70% B.
The quadrupole time-of-flight mass spectrometry conditions of the mass spectrometer in the liquid chromatography-quadrupole time-of-flight mass spectrometer are as follows: data are acquired in positive ion mode; the ion source is a combined ESI and APCI source; in the positive ion scanning mode, the APCI source is connected to an automatic calibration system; the primary TOF-MS scan has an accurate mass range of 100-2000 Da, a data acquisition time of 100 ms, a DP of 100 V and a CE of 10 V, where DP is the declustering potential and CE is the collision energy; the secondary IDA-MS scan has an accurate mass range of 50-2000 Da, a DP of 100 V and a CE of 35 ± 15 V; the mass spectrometer operates in high-sensitivity mode with a data acquisition time of 50 ms and a signal threshold of 100 cps, data are acquired 6 times in each cycle, and dynamic background subtraction is applied.
Preferably, in the liquid chromatography-quadrupole time-of-flight mass spectrometer, the liquid chromatograph is a Shimadzu LC20AD liquid chromatograph, the mass spectrometer is a Triple TOF 5600+ mass spectrometer, and the automatic calibration system is a CDS system. The quadrupole time-of-flight mass spectrometry conditions further include: calibration once every 10 samples; APCI positive ion calibration solution flow rate, 0.3 mL/min; curtain gas pressure, 40 psi; ion source nebulizer gas pressure, 50 psi; ion source auxiliary gas pressure, 50 psi; ion source temperature, 500 ℃. All IDA-MS high-resolution mass spectrum data are acquired with Analyst TF 1.6 software (AB Sciex), processed qualitatively and quantitatively with PeakView and MasterView software, and imported into SIMCA 14.0 software (Umetrics, Sweden) for principal component analysis, partial least squares-discriminant analysis and orthogonal partial least squares-discriminant analysis, giving the distribution patterns of triglycerides and metabolites of soybeans from different origins and thereby a lipidomics-based soybean and soybean oil origin tracing identification model.
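The gradient program can be read as a piecewise-linear function of time. The sketch below transcribes the stated breakpoints and interpolates between them. The text gives no segment for 8-10 min, so holding 65% B over that interval is an assumption of this sketch, as are the names `GRADIENT` and `percent_b`.

```python
# (time in min, % B) breakpoints transcribed from the stated gradient.
# ASSUMPTION: 65 % B is held from 8 to 10 min, which the text leaves unspecified.
GRADIENT = [
    (0.0, 70.0),
    (5.0, 65.0),
    (8.0, 65.0),
    (10.0, 65.0),
    (10.5, 70.0),
    (15.0, 70.0),
]


def percent_b(t_min: float) -> float:
    """Linearly interpolate the acetonitrile (phase B) percentage at t_min."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-15 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole program")


def percent_a(t_min: float) -> float:
    """Isopropanol (phase A) makes up the remainder of the mobile phase."""
    return 100.0 - percent_b(t_min)
```

For example, `percent_b(2.5)` evaluates to 67.5, halfway along the 70-65% B ramp.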
Preferably, the step S4 further includes a blind sample verification step, where the blind sample verification step includes: selecting a plurality of soybean verification samples in the same region, collecting IDA-MS high-resolution mass spectrum data of the soybean verification samples with definite regions through a liquid chromatogram-quadrupole time-of-flight mass spectrometer, introducing the IDA-MS high-resolution mass spectrum data into the soybean and soybean oil production area traceability identification model in the step S4, and verifying the production area traceability accuracy of the soybean verification samples.
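Blind sample verification compares the origin predicted by the model with the known origin of each verification sample. A minimal sketch of that bookkeeping follows, assuming accuracy is defined as the fraction of correctly assigned samples (the patent does not give a formula); the function names and the toy labels are hypothetical.

```python
from collections import Counter


def blind_accuracy(true_origins, predicted_origins):
    """Fraction of verification samples assigned to their known origin."""
    if not true_origins or len(true_origins) != len(predicted_origins):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(true_origins, predicted_origins))
    return correct / len(true_origins)


def per_origin_accuracy(true_origins, predicted_origins):
    """Accuracy broken down by known origin."""
    totals = Counter(true_origins)
    hits = Counter(t for t, p in zip(true_origins, predicted_origins) if t == p)
    return {origin: hits[origin] / n for origin, n in totals.items()}


# Made-up labels, for illustration only.
known = ["Brazil", "Brazil", "USA", "USA", "Canada"]
predicted = ["Brazil", "USA", "USA", "USA", "Canada"]
overall = blind_accuracy(known, predicted)
by_origin = per_origin_accuracy(known, predicted)
```

Reporting accuracy per origin as well as overall helps when sample counts differ widely between origins, as they do in the sample set described later.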
Preferably, the method for determining the target compound of step S3 includes: according to the molecular weight range of the target object in the IDA-MS high-resolution mass spectrum data, dividing the IDA-MS high-resolution mass spectrum data in the soybean oil sample into 3 regions: a first region with the molecular weight of 800-1000, a second region with the molecular weight of 550-800 and a third region with the molecular weight of less than 550, wherein the first region and the second region are soybean oil and fat metabolism characteristic regions, animal and plant triglyceride compounds with the molecular weight range of 700-950 in a lipid compound database and soybean oil triglyceride compounds in the first region and the second region are matched and screened, and triglyceride compounds in 114 soybean oils with the molecular weight range of 766-920Da are determined as targeting compounds.
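The three molecular weight regions and the 766-920 Da target window amount to simple range checks. The sketch below is one hedged reading of them: the text assigns 800 to both the first and second regions, so placing 800 in the first region is an assumption here, and the function names are hypothetical.

```python
def mass_region(mw: float):
    """Return the region label for a molecular weight, or None above 1000."""
    if mw < 550:
        return "third"
    if mw < 800:   # ASSUMPTION: the shared boundary 800 goes to the first region
        return "second"
    if mw <= 1000:
        return "first"
    return None


def in_target_window(mw: float, low: float = 766.0, high: float = 920.0) -> bool:
    """True if mw falls in the 766-920 Da window of the targeting compounds."""
    return low <= mw <= high
```

A peak with a molecular weight of 879.7, for instance, falls in the first region and inside the target window, while one at 600 falls in the second region but outside the window.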
Preferably, in step S4, the triglyceride compound marker observed peak data in the standard sample are subjected to orthogonal partial least squares regression analysis to construct a lipidomics-based OPLS-DA soybean and soybean oil origin tracing identification model.
Preferably, in step S4, partial least squares discriminant analysis is performed on the triglyceride compound marker observed peak data in the standard sample, so as to construct a PLS-DA soybean and soybean oil production area traceability identification model based on lipidomics.
Preferably, step S4 further includes a step of optimizing the soybean and soybean oil origin tracing identification model, the optimization step comprising: determining, through the VIP values of the soybean and soybean oil origin tracing identification model, the triglyceride compounds with a large contribution among the targeting compounds; and, according to the Hotelling's T2 and DModX indexes, deleting outlier samples that exceed the 99% confidence interval and deleting all soybean oil samples from regions with few samples in the identification model, so as to optimize the soybean and soybean oil origin tracing identification model.
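Two parts of this optimization are easy to illustrate: ranking compounds by VIP value and dropping origins represented by too few samples. The patent gives neither a VIP cutoff nor a minimum sample count, so the threshold of 1.0 (a common convention) and the `min_samples` parameter below are assumptions, as are the function names and toy values. The Hotelling's T2 and DModX outlier tests, which the patent performs in SIMCA, are not reproduced here.

```python
from collections import Counter


def select_by_vip(vip_scores, threshold=1.0):
    """Keep compounds whose VIP exceeds threshold, highest first."""
    kept = [(name, vip) for name, vip in vip_scores.items() if vip > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def drop_small_groups(sample_origins, min_samples):
    """Remove every sample whose origin has fewer than min_samples members."""
    counts = Counter(sample_origins.values())
    return {s: o for s, o in sample_origins.items() if counts[o] >= min_samples}


# Made-up VIP values and sample labels, for illustration only.
toy_vip = {"TG_A": 1.8, "TG_B": 0.6, "TG_C": 1.2}
important = select_by_vip(toy_vip)

toy_samples = {"s1": "Brazil", "s2": "Brazil", "s3": "Mexico"}
kept_samples = drop_small_groups(toy_samples, min_samples=2)
```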
Preferably, in the preparation of the standard sample in step S1, the soybean sample originates from at least three different countries or regions, so that the step S4 establishes a multi-country soybean and soybean oil production area traceability identification model.
More preferably, in the standard sample preparation of step S1, the soybean samples come from five countries: Brazil, Russia, the United States, Canada and Argentina.
Preferably, in the standard sample preparation of step S1, the soybean samples come from n different regions and are divided into n × (n-1)/2 groups, each group consisting of soybean samples from two different origins, and steps S2, S3 and S4 are carried out on each group to establish a two-country soybean and soybean oil origin tracing identification model for identifying which of the two countries or regions in that model a soybean or soybean oil sample comes from.
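The n × (n-1)/2 groups are simply all unordered pairs of origins. A short sketch using the five countries named above shows the enumeration; the function name `pairwise_groups` is hypothetical.

```python
from itertools import combinations


def pairwise_groups(origins):
    """All unordered pairs of distinct origins, one per two-country model."""
    return list(combinations(sorted(set(origins)), 2))


groups = pairwise_groups(["Brazil", "Russia", "USA", "Canada", "Argentina"])
n = 5
expected_count = n * (n - 1) // 2
```

With five origins this gives 10 pairs, one candidate two-country model each.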
Advantageous effects:
In the method for tracing the origin of soybean and soybean oil based on LC-Q-TOF-MS characterization of triglycerides, IDA-MS high-resolution mass spectrum data of soybeans are acquired with a liquid chromatography-quadrupole time-of-flight mass spectrometer, and MarkerView Peaks data of the triglyceride compounds are obtained with analysis software. The MarkerView Peaks data of the triglyceride compounds in the soybean oil samples are processed by multivariate statistical methods such as principal component analysis, partial least squares-discriminant analysis and orthogonal partial least squares regression analysis to obtain the characteristic distribution patterns of soybeans from different origins and to construct a multivariate discriminant prediction model. Combining the multi-country and two-country soybean and soybean oil origin tracing identification models further improves the accuracy of origin tracing, and a two-country model can also be established on its own to improve the accuracy of tracing the origin of soybeans and soybean oil between the two countries concerned.
Drawings
FIG. 1 shows the IDA-MS high resolution mass spectrum of the soybean oil sample measured by the present invention;
FIG. 2 shows a multi-national OPLS-DA soybean and soybean oil origin tracing and identification model before optimization according to the present invention;
FIG. 3 shows a PLS-DA soybean and soybean oil source tracing identification model of the present invention;
FIG. 4 is a graph showing the VIP value distribution of triglyceride compounds in the tracing identification model of soybean and soybean oil production area for OPLS-DA of the present invention;
FIG. 5 shows the optimized multi-national OPLS-DA soybean and soybean oil production area traceability identification model of the present invention;
FIG. 6 shows the Brazil-United States two-country OPLS-DA soybean and soybean oil origin tracing identification model of the present invention;
FIG. 7 shows the Russia-United States two-country OPLS-DA soybean and soybean oil origin tracing identification model of the present invention;
FIG. 8 shows the Canada-United States two-country OPLS-DA soybean and soybean oil origin tracing identification model of the present invention;
FIG. 9 shows the Argentina-United States two-country OPLS-DA soybean and soybean oil origin tracing identification model of the present invention;
FIG. 10 shows a PCA identification model and an OPLS-DA identification model of triglyceride compounds of a soybean oil sample constructed based on three solvents;
FIG. 11 shows the lipidomics PCA origin identification models of soybean oil triglycerides for the three solvent dilution methods.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
The technical solution of the present invention is described in detail with specific examples below.
Experimental apparatus: a Triple TOF 5600+ high-resolution mass spectrometer (AB Sciex, USA); an LC20AD high performance liquid chromatograph (Shimadzu, Japan); an Xbridge BEH C18 column (100 mm × 2.1 mm, 3 μm; Waters, USA); a VORTEX 4 vortex mixer (IKA, Germany); and an oil press.
Reagents: methanol, acetonitrile, acetone and ethyl acetate (chromatographically pure; Thermo Fisher, USA).
Sources of soybean and soybean oil samples: the soybean standard samples comprise 807 soybean samples collected through the relevant directly affiliated customs offices nationwide and purchased abroad, distributed by origin as follows: 152 United States samples, 423 Brazil samples, 96 Canada samples, 68 Argentina samples, 14 yerba mate samples, 25 Russia samples and 29 China samples. There are 334 soybean oil samples: 36 United States samples, 226 Brazil samples, 7 Ukraine samples, 46 Argentina samples, 1 Mexico sample and 16 Russia samples.
The method for tracing the producing areas of soybean and soybean oil by characterizing triglyceride based on LC-Q-TOF-MS comprises the following steps:
s1, preparing a standard sample: respectively squeezing soybean samples in different producing areas of a definite region to obtain soybean oil samples, and gradually diluting the soybean oil samples by a diluting solvent by 100-200 times to obtain standard samples, wherein the standard samples comprise at least two soybean samples in different producing areas;
s2, acquiring mass spectrum data of the standard sample: respectively acquiring mass spectrum data information of the standard samples in different regions by using a liquid chromatogram-quadrupole time-of-flight mass spectrometer to obtain IDA-MS high-resolution mass spectrum data of triglyceride compounds of the standard samples;
s3, determination of a targeting compound: matching the triglyceride compound mass spectrum data in the IDA-MS high-resolution mass spectrum data obtained in the step S2 with standard triglyceride compound data in a lipid compound database to determine the triglyceride compound targeting compound of the standard sample;
s4, establishing a soybean and soybean oil producing area traceability identification model: analyzing the IDA-MS high-resolution mass spectrum data through analysis software to obtain triglyceride compound marker observed peak data of soybean oil in the standard sample, and processing the triglyceride compound marker observed peak data in the standard sample in different areas in one or more ways of a principal component analysis method, a partial least square method-discriminant analysis method and an orthogonal partial least square method regression analysis method to obtain characteristic distribution rules of soybean oil in different producing areas in the standard sample, so as to construct a traceability identification model of soybean and soybean oil producing areas based on lipidomics; the marker observation peak data is MarkerView peaks data, the MarkerView peaks data of triglyceride of a standard sample is obtained through analysis of MasterView analysis software, and the soybean origin tracing identification model judges and predicts the production areas of soybeans and soybean oil in different areas through multivariate statistical analysis;
s5, result prediction: squeezing a soybean sample to be detected to obtain a soybean oil sample, diluting the soybean oil sample step by step 100-200 times with the diluting solvent to obtain the sample to be detected, collecting IDA-MS high-resolution mass spectrum data of the sample to be detected with the liquid chromatography-quadrupole time-of-flight mass spectrometer, importing the IDA-MS high-resolution mass spectrum data into the soybean and soybean oil producing area traceability identification model, and predicting the producing area traceability result.
Step S4 further includes a blind sample verification step, which includes: selecting a plurality of soybean verification samples of known origin from the same region, collecting IDA-MS high-resolution mass spectrum data of these verification samples with the liquid chromatography-quadrupole time-of-flight mass spectrometer, importing the IDA-MS high-resolution mass spectrum data into the soybean and soybean oil producing area traceability identification model of step S4, and verifying the accuracy of the producing area assignment for the verification samples. According to the blind sample verification result, a multi-country soybean and soybean oil producing area traceability identification model, a two-country soybean and soybean oil producing area traceability identification model, or a combination of the two is selected and constructed for soybean samples of different producing areas. When both identification models are constructed, their identification results can be integrated to ensure identification accuracy.
Multi-country OPLS-DA soybean and soybean oil producing area tracing identification model example 1
Soybean samples from different producing areas of known origin are physically pressed with an oil press to obtain soybean oil samples. A methanol-ethyl acetate mixed solution, with a methanol to ethyl acetate ratio of 1:1, is selected as the diluting solvent, and the soybean oil samples are diluted step by step 200 times with this mixed solution.
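The step-by-step dilution can be illustrated with a short arithmetic sketch. The split into a 10-fold step followed by a 20-fold step and the 50 μL aliquot are assumptions chosen only for illustration; the text specifies the overall 200-fold dilution but not the individual steps or volumes.

```python
from math import prod

def stepwise_dilution(step_folds, aliquot_ul):
    """Return (overall fold, solvent volume in uL to add at each step).

    step_folds: dilution factor of each successive step (hypothetical split).
    aliquot_ul: volume carried into each step, in microlitres (hypothetical).
    """
    # Diluting an aliquot f-fold means adding (f - 1) aliquot volumes of solvent.
    solvent_per_step = [aliquot_ul * (fold - 1) for fold in step_folds]
    return prod(step_folds), solvent_per_step

# Assumed two-step scheme: 10-fold, then 20-fold, giving 200-fold overall.
overall, solvent = stepwise_dilution([10, 20], aliquot_ul=50)
print(overall, solvent)  # 200 [450, 950]
```

Any other combination of steps whose product lies in the 100-200 range stated in step S1 would fit the same function.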
The diluted standard sample is placed in the sample injector of the liquid chromatography-quadrupole time-of-flight mass spectrometer. The standard sample is first separated by the liquid chromatograph and then subjected to mass spectrum data acquisition by the mass spectrometer. Primary and secondary mass spectrum information of the standard sample are obtained by primary TOF-MS scanning and secondary IDA-MS scanning, respectively; the secondary mass spectrum information is the IDA-MS high-resolution mass spectrum data. An IDA-MS high-resolution mass spectrum of a soybean oil sample is shown in figure 1. The triglyceride compound targeting compounds of the standard sample are determined from the IDA-MS high-resolution mass spectrum data as follows. According to the molecular weight range of the target compounds, the IDA-MS high-resolution mass spectrum data of the soybean oil sample are divided into 3 regions: a first region with a molecular weight of 800-1000, a second region with a molecular weight of 550-800, and a third region with a molecular weight of less than 550. The first region and the second region are the characteristic lipid metabolism regions of soybean oil. Animal- and plant-derived triglyceride compounds with a molecular weight of 700-950 in the lipid compound database are matched and screened against the soybean oil triglyceride compounds in the first and second regions, and 114 soybean oil triglyceride compounds with a molecular weight of 766-920 Da are determined as the lipidomic analysis target compounds, which are applied to the study of the lipidomics-based origin identification model for soybean oil.
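The matching and screening step can be sketched as a mass-window filter followed by a mass-accuracy comparison against database entries. This is a minimal illustration and not the PeakView workflow itself: the 5 ppm tolerance, the entry names `TG_a` to `TG_c` and the observed values are assumptions, while the 766-920 Da window and the three reference masses are taken from the text.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of an observed value relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6

def match_peaks(observed, reference, low=766.0, high=920.0, tol_ppm=5.0):
    """Pair observed masses inside [low, high] with reference entries within tol_ppm."""
    matches = []
    for mz in observed:
        if not (low <= mz <= high):
            continue  # outside the target window
        for name, ref_mz in reference.items():
            if abs(ppm_error(mz, ref_mz)) <= tol_ppm:
                matches.append((mz, name))
    return matches

# Hypothetical entry names; the masses are three of the marker values listed in the text.
reference = {"TG_a": 873.6967, "TG_b": 875.7123, "TG_c": 877.7280}
# Hypothetical observed values: two close hits, one outside the window, one too far off.
observed = [873.6975, 875.7100, 540.5000, 877.7400]
matched = match_peaks(observed, reference)
print(matched)
```

With these inputs only the first two observed values are retained; 540.5000 falls below the window and 877.7400 deviates by more than the assumed tolerance.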
The lipid compound database is LIPID MAPS Lipidomics Gateway, a lipid compound database published in the United States. The standard triglyceride compound data are the 6899 triglyceride compounds of Triradylglycerides in LIPID MAPS Lipidomics Gateway, and 116 common triglyceride compounds in a soybean oil sample are determined to be the triglyceride compound targeting compounds of the standard sample through qualitative analysis with PeakView software.
The liquid chromatography conditions of the liquid chromatograph in the liquid chromatography-quadrupole time-of-flight mass spectrometer are as follows: the flow rate is 0.5 μL/min, the column temperature is 40 ℃, gradient elution is carried out on an Xbridge BEH C18 chromatographic column, and the injection volume is 2 μL; phase A of the mobile phase is isopropanol and phase B of the mobile phase is acetonitrile, wherein the content of phase B in the mobile phase in the different time periods is as follows: 0 min, 70% B; 0-5 min, 70-65% B; 5-8 min, 65% B; 10-10.5 min, 65-70% B; 10.5-15 min, 70% B.
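The gradient program can be written out as a small lookup, which makes the composition at any time point explicit. This is a sketch under two assumptions: the change within a ramp segment is linear, and the 8-10 min interval, which the listed program does not specify, is returned as undefined rather than guessed.

```python
# (start_min, end_min, %B at start, %B at end), exactly as listed in the text.
GRADIENT = [
    (0.0, 5.0, 70.0, 65.0),
    (5.0, 8.0, 65.0, 65.0),
    (10.0, 10.5, 65.0, 70.0),
    (10.5, 15.0, 70.0, 70.0),
]

def percent_b(t_min):
    """Return %B (acetonitrile) at time t_min, or None if the program does not cover it."""
    for start, end, b_start, b_end in GRADIENT:
        if start <= t_min <= end:
            frac = (t_min - start) / (end - start)
            return b_start + frac * (b_end - b_start)  # assumed linear ramp
    return None

for t in (0.0, 2.5, 6.0, 9.0, 10.25, 12.0):
    b = percent_b(t)
    print(t, b, None if b is None else 100.0 - b)  # time, %B, %A (isopropanol)
```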
The mass spectrometry conditions of the quadrupole time-of-flight mass spectrometer in the liquid chromatography-quadrupole time-of-flight mass spectrometer are as follows: the mass spectrometer collects data in positive ion mode, and the ion source is a combined ESI and APCI source; in the positive ion scanning mode, the APCI source is connected to the automatic correction system. Primary TOF-MS scan, accurate mass range: 100-2000 Da, data acquisition time of 100 ms, DP of 100 V and CE of 10 V, wherein DP is the declustering potential and CE is the collision energy. Secondary IDA-MS scan, accurate mass range: 50-2000 Da, DP: 100 V, CE: 35 ± 15 V. The mass spectrometer operates in high-sensitivity mode, the data acquisition time is 50 ms, the signal threshold is 100 cps, data are acquired 6 times in each cycle, and dynamic background subtraction is used.
In the liquid chromatography-quadrupole time-of-flight mass spectrometer, the liquid chromatograph is a Shimadzu LC20AD liquid chromatograph, the mass spectrometer is a Triple TOF 5600+ mass spectrometer, and the automatic correction system is a CDS system. The conditions of the quadrupole time-of-flight mass spectrometer further include: calibration once every 10 samples; APCI positive ion calibration solution flow rate: 0.3 mL/min; curtain gas pressure: 40 psi; ion source nebulizing gas pressure: 50 psi; ion source auxiliary gas pressure: 50 psi; ion source temperature: 500 ℃. All IDA-MS high-resolution mass spectrum data collected by the mass spectrometer are acquired with Analyst TF 1.6 software (AB Sciex), processed and analysed qualitatively and quantitatively with PeakView and MasterView software, and then imported into SIMCA 14.0 software (Umetrics, Sweden). Orthogonal partial least squares regression analysis is carried out on the triglyceride compound marker observed peak data of the standard samples to obtain the distribution rules of triglycerides and metabolites of soybeans from different producing areas, thereby constructing a lipidomics-based OPLS-DA soybean and soybean oil producing area traceability identification model.
In this embodiment, Brazilian, American, Chinese, Argentine, Canadian, Uruguayan and Russian soybean samples of known origin are selected to prepare the standard samples. IDA-MS high-resolution mass spectrum data of the soybeans from the different regions are collected with the liquid chromatography-quadrupole time-of-flight mass spectrometer, and triglyceride compound MarkerView peaks data of the soybean samples from the different regions are obtained after qualitative and quantitative processing and analysis with PeakView and MasterView software. The triglyceride compound marker observed peak data of the standard samples from the different regions are then processed by orthogonal partial least squares regression analysis to construct the OPLS-DA soybean and soybean oil producing area traceability identification model.
As can be seen from fig. 2, in the multi-country OPLS-DA soybean and soybean oil producing area traceability identification model, soybean samples of Brazilian, Russian, Argentine and U.S. origin can be significantly distinguished. However, because the samples of U.S., Canadian and Argentine origin overlap in their distribution, which affects the accuracy of the model in prediction and identification, the model requires further optimization. The optimization steps include: determining, from the VIP values of the identification model, the triglyceride compounds among the target compounds that contribute most; deleting outlier samples beyond the 99% confidence interval according to the Hotelling's T2 and DModX indexes; and deleting all soybean oil samples from regions represented by only a small number of samples. As shown in fig. 4, the VIP values of the identification model are used to determine which variables contribute most. The triglyceride compounds with molecular weights of 873.6967, 875.7123, 851.7123, 877.7280, 853.7280, 865.7280, 913.7280, 829.7280, 915.7436, 767.6184, 835.6810, 879.7436, 917.7593, 859.7749, 855.7436, 895.779, 825.6967, 921.7906, 887.8062 and 869.7593 are finally determined to contribute most to the multi-country OPLS-DA soybean and soybean oil producing area traceability identification model, and the model is optimized according to the Hotelling's T2 and DModX indexes and by deleting the small number of Chinese and Uruguayan samples. As shown in fig. 5, the optimized multi-country OPLS-DA soybean and soybean oil producing area traceability identification model can significantly distinguish samples from different countries, especially samples of Brazilian, Russian, U.S. and other origins.
After the multi-country OPLS-DA soybean and soybean oil producing area traceability identification model is established, blind sample verification needs to be carried out on the model. The blind sample verification steps comprise: selecting a plurality of soybean verification samples of known origin from the same region, collecting IDA-MS high-resolution mass spectrum data of these verification samples with the liquid chromatography-quadrupole time-of-flight mass spectrometer, importing the IDA-MS high-resolution mass spectrum data into the multi-country soybean and soybean oil producing area traceability identification model, and verifying the accuracy of the producing area assignment for the verification samples.
After blind sample verification is completed, a soybean sample to be detected is squeezed to obtain a soybean oil sample, the soybean oil sample is diluted step by step 100-200 times with the diluting solvent to obtain the sample to be detected, IDA-MS high-resolution mass spectrum data of the sample to be detected are collected with the liquid chromatography-quadrupole time-of-flight mass spectrometer, the data are imported into the soybean and soybean oil producing area traceability identification model, and the producing area traceability result is predicted.
Multi-national PLS-DA soybean and soybean oil producing area tracing and identifying model example 2
This example describes only the differences from the above example. In this example, partial least squares discriminant analysis is performed on the triglyceride compound marker observed peak data of the standard samples to construct a lipidomics-based PLS-DA soybean and soybean oil producing area traceability identification model. As shown in fig. 3, the PLS-DA soybean and soybean oil producing area traceability identification model can significantly distinguish Brazilian from non-Brazilian soybean samples; in particular, soybean samples of U.S. origin are distributed throughout the non-Brazilian soybean region and are clearly distinguished from the Brazilian soybean region.
Two-country soybean and soybean oil producing area tracing and identifying model example 1
In this embodiment, in the process of preparing the standard sample in step S1, assuming that the soybean samples originate from n different regions, the soybean samples are divided into n × (n-1)/2 groups, each group of the soybean samples consists of two soybean samples with different origins, and each group of the soybean samples is used to establish a two-country soybean and soybean oil origin traceability identification model according to steps S2, S3 and S4, so as to identify the countries or regions of the corresponding soybeans and soybean oil in the two-country soybean and soybean oil origin traceability identification model.
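The grouping into n × (n-1)/2 two-country sets is simply the list of all unordered pairs of origins. A minimal sketch follows, using the seven origins named in the multi-country example; the function name `pairwise_groups` is illustrative only.

```python
from itertools import combinations

def pairwise_groups(origins):
    """Return every unordered pair of origins, one pair per two-country model."""
    return list(combinations(origins, 2))

origins = ["Brazil", "United States", "China", "Argentina",
           "Canada", "Uruguay", "Russia"]
groups = pairwise_groups(origins)

n = len(origins)
print(len(groups), n * (n - 1) // 2)  # 21 21
print(groups[0])                      # ('Brazil', 'United States')
```

Each pair then goes through steps S2, S3 and S4 to build its own two-country model.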
In this embodiment, 104 Brazilian soybean samples and 106 American soybean samples of known origin are selected to prepare the standard samples. IDA-MS high-resolution mass spectrum data of the soybean oil samples from the two regions are collected with the liquid chromatography-quadrupole time-of-flight mass spectrometer, and triglyceride compound MarkerView peaks data of the soybean samples from the different producing areas are obtained after qualitative and quantitative processing and analysis with PeakView and MasterView software. The triglyceride compound marker observed peak data of the standard samples from the two regions are imported into SIMCA 14.1 software and processed by orthogonal partial least squares regression analysis, so that a Brazil-U.S. two-country OPLS-DA soybean and soybean oil producing area traceability identification model is established. As can be seen from fig. 6, the U.S. and Brazilian soybean oil samples can be significantly distinguished in this model. To further verify the determination accuracy of the two-country pressed soybean oil producing area traceability identification model, 24 U.S. pressed soybean oil samples and 40 Brazilian pressed soybean oil samples are selected for blind sample verification. The verification result shows that the determination accuracy is 83.7% for soybean oil samples of Brazilian origin and 82.9% for samples of U.S. origin.
Two-country soybean and soybean oil producing area tracing and identifying model example 2
In this embodiment, Russian soybean samples and American soybean samples of known origin are selected to prepare the standard samples. IDA-MS high-resolution mass spectrum data of the soybean oil samples from the two regions are collected with the liquid chromatography-quadrupole time-of-flight mass spectrometer, and triglyceride compound MarkerView peaks data of the soybean samples from the different regions are obtained after qualitative and quantitative processing and analysis with PeakView and MasterView software. The triglyceride compound marker observed peak data of the standard samples from the two regions are imported into SIMCA 14.1 software and processed by orthogonal partial least squares regression analysis, so that a Russia-U.S. two-country OPLS-DA soybean and soybean oil producing area traceability identification model is established. As can be seen from fig. 7, the U.S. and Russian soybean oil samples can be significantly distinguished in this model. To further verify the determination accuracy of the two-country pressed soybean oil producing area traceability identification model, 24 U.S. pressed soybean oil samples and 5 Russian pressed soybean oil samples are selected for blind sample verification. The verification results show that the determination accuracy is 100% for both the samples of U.S. origin and the samples of Russian origin.
Two-country soybean and soybean oil producing area tracing and identifying model example 3
In this embodiment, Canadian soybean samples and U.S. soybean samples of known origin are selected to prepare the standard samples. IDA-MS high-resolution mass spectrum data of the soybean oil samples from the two regions are collected with the liquid chromatography-quadrupole time-of-flight mass spectrometer, and triglyceride compound MarkerView peaks data of the soybean samples from the different producing areas are obtained after qualitative and quantitative processing and analysis with PeakView and MasterView software. The triglyceride compound marker observed peak data of the standard samples from the two regions are imported into SIMCA 14.1 software and processed by orthogonal partial least squares regression analysis, so that a Canada-U.S. two-country OPLS-DA soybean and soybean oil producing area traceability identification model is established. As can be seen from fig. 8, the U.S. and Canadian soybean oil samples can be significantly distinguished in this model. To further verify the determination accuracy of the two-country pressed soybean oil producing area traceability identification model, 24 U.S. pressed soybean oil samples and 15 Canadian pressed soybean oil samples are selected for blind sample verification. The verification result shows that 17 of the 24 U.S. pressed soybean oil samples and 8 of the 15 Canadian pressed soybean oil samples are accurately determined.
Two-country soybean and soybean oil producing area tracing and identifying model example 4
In this embodiment, Argentine soybean samples and American soybean samples of known origin are selected to prepare the standard samples. IDA-MS high-resolution mass spectrum data of the soybean oil samples from the two regions are collected with the liquid chromatography-quadrupole time-of-flight mass spectrometer, and triglyceride compound MarkerView peaks data of the soybean samples from the different producing areas are obtained after qualitative and quantitative processing and analysis with PeakView and MasterView software. The triglyceride compound marker observed peak data of the standard samples from the two regions are imported into SIMCA 14.1 software and processed by orthogonal partial least squares regression analysis, so that an Argentina-U.S. two-country OPLS-DA soybean and soybean oil producing area traceability identification model is established. As can be seen from fig. 9, in the score plot of this model the Argentine pressed soybean oil samples are relatively concentrated, while the U.S. pressed soybean oil samples are more dispersed and some of them fall within the Argentine distribution region. To further verify the determination accuracy of the U.S.-Argentina pressed soybean oil producing area traceability identification model, 15 Argentine pressed soybean oil samples and 24 U.S. pressed soybean oil samples are selected for blind sample verification. The results show that 8 of the 15 Argentine pressed soybean oil samples and 16 of the 24 U.S. pressed soybean oil samples are accurately determined.
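The blind sample verification counts reported in two-country examples 3 and 4 can be converted to per-origin accuracies with simple arithmetic. The sketch below uses only the counts stated in those examples; the dictionary labels and the function name are illustrative.

```python
def accuracy_percent(correct, total):
    """Share of blind samples assigned to the correct origin, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

# (correctly determined, total) blind samples as reported in examples 3 and 4.
blind_results = {
    ("Canada-U.S. model", "U.S."): (17, 24),
    ("Canada-U.S. model", "Canada"): (8, 15),
    ("Argentina-U.S. model", "Argentina"): (8, 15),
    ("Argentina-U.S. model", "U.S."): (16, 24),
}

summary = {key: round(accuracy_percent(*counts), 1)
           for key, counts in blind_results.items()}
for (model, origin), pct in summary.items():
    print(f"{model}, {origin} samples: {pct}%")
```

These work out to roughly 70.8% and 53.3% for the Canada-U.S. model and 53.3% and 66.7% for the Argentina-U.S. model, noticeably lower than the accuracies reported for the Brazil-U.S. and Russia-U.S. models.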
Comparative example 1
In this comparative example, in order to verify the influence of different dilution solvents on the clustering effect of the triglyceride lipidomics-based origin traceability identification model for soybean oil samples, soybean samples of known origin are squeezed to obtain soybean oil samples, and acetone is used as the diluting solvent to dilute the soybean oil samples. IDA-MS high-resolution mass spectrum data of the triglyceride compounds in the soybean oil samples are then collected with the liquid chromatography-quadrupole time-of-flight mass spectrometer (LC-Q-TOF-MS), and the triglyceride high-resolution mass spectrum data are processed with two multivariate statistical analysis models, principal component analysis (PCA) and orthogonal partial least squares regression analysis (OPLS-DA).
Comparative example 2
In this comparative example, soybean samples of known origin are selected and squeezed to obtain soybean oil samples, and isopropanol is used as the diluting solvent to dilute the soybean oil samples. IDA-MS high-resolution mass spectrum data of the triglyceride compounds in the soybean oil samples are then collected with the liquid chromatography-quadrupole time-of-flight mass spectrometer (LC-Q-TOF-MS), and the triglyceride high-resolution mass spectrum data are processed with two multivariate statistical analysis models, principal component analysis (PCA) and orthogonal partial least squares regression analysis (OPLS-DA).
Multi-country OPLS-DA soybean and soybean oil producing area traceability identification model example 1, comparative example 1 and comparative example 2 are compared to analyse the effect of different dilution solvents on origin tracing and identification based on soybean oil triglycerides. As shown in fig. 10, the PCA and OPLS-DA models constructed after diluting a soybean oil sample with acetone (acetone), methanol-ethyl acetate mixed solvent (EA-MEOH) and isopropanol (isopropanol) show that the sample aggregation effect is similar when the soybean oil sample is diluted with acetone or with the methanol-ethyl acetate mixed solvent, whereas the aggregation effect of the isopropanol-diluted soybean oil sample differs significantly from that of the other two dilution methods. In particular, the OPLS-DA model shows that, for the same sample, the aggregation region of the isopropanol-diluted soybean oil sample is clearly separated from those of the methanol-ethyl acetate-diluted and acetone-diluted soybean oil samples. To further examine the influence of the different solvents on the lipidomics analysis of triglycerides in soybean oil, a PCA-class model is constructed, as shown in fig. 11, to assess the origin tracing and identification results of the soybean oil samples diluted with the three solvents.
As can be seen from the PCA-class origin tracing and identification model in fig. 11, the model built on isopropanol-diluted soybean oil can only distinguish pressed soybean oil from finished soybean oil and cannot significantly identify soybean oil samples from different countries. The soybean oil samples diluted with the methanol-ethyl acetate mixed solution allow the sample origin to be identified for both pressed soybean oil and finished soybean oil without distinction. The acetone-diluted soybean oil samples also allow the origin of the soybean oil sample to be identified, but the sample clusters are highly dispersed, and acetone has a large effect on the liquid chromatographic column and is therefore unsuitable as a diluent for soybean oil samples. The methanol-ethyl acetate mixed solution is thus significantly better than the other two diluting solvents as the diluent for soybean oil samples.
The embodiment of the method for tracing the producing areas of the soybeans and the soybean oil by characterizing the triglyceride based on LC-Q-TOF-MS provided by the invention is explained in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the core concepts of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.