Molecular marker located on soybean chromosome 1 and related to high oil content and application thereof
1. A molecular marker associated with high oil content on soybean chromosome 1, wherein the molecular marker is SNP1, SNP1 is located in the 39.67 Mb-41.16 Mb region of soybean chromosome 1, and the nucleotide at position 40386604 of chromosome Gm01 is T or C.
2. The molecular marker of claim 1, wherein the upstream primer for amplifying SNP1 has the nucleotide sequence shown in SEQ ID NO.4 or SEQ ID NO.5, and the downstream primer for amplifying SNP1 has the nucleotide sequence shown in SEQ ID NO.6.
3. The molecular marker of claim 1, wherein the nucleotide at position 40780703 of SNP1 is C or G.
4. The molecular marker of claim 3, wherein the upstream primer for amplifying SNP1 of claim 3 has the nucleotide sequence shown in SEQ ID NO.7 or SEQ ID NO.8, and the downstream primer for amplifying SNP1 of claim 3 has the nucleotide sequence shown in SEQ ID NO.9.
5. The molecular marker of claim 1, wherein the nucleotide at position 41034358 of SNP1 is A or G.
6. The molecular marker of claim 5, wherein the upstream primer for amplifying SNP1 of claim 5 has the nucleotide sequence shown in SEQ ID NO.16 or SEQ ID NO.17, and the downstream primer for amplifying SNP1 of claim 5 has the nucleotide sequence shown in SEQ ID NO.18.
7. Use of the molecular marker of any one of claims 1 to 6 in the preparation of a kit for identifying high-oil soybean, wherein the molecular marker is amplified using any one of primer sets (a) to (c):
(a) SEQ ID NO.4 or SEQ ID NO.5 and SEQ ID NO. 6;
(b) SEQ ID NO.7 or SEQ ID NO.8 and SEQ ID NO. 9;
(c) SEQ ID NO.16 or SEQ ID NO.17 and SEQ ID NO. 18.
8. A method for identifying high-oil soybean, comprising the following steps:
(1) extracting DNA from the soybean to be tested;
(2) performing a PCR reaction using SEQ ID NO.4 or SEQ ID NO.5 and SEQ ID NO.6, wherein the soybean to be tested is a high-oil soybean if it has the CC genotype and a low-oil soybean if it has the TT genotype.
9. A method for identifying high-oil soybean, comprising the following steps:
(1) extracting DNA from the soybean to be tested;
(2) performing a PCR reaction using SEQ ID NO.7 or SEQ ID NO.8 and SEQ ID NO.9, wherein the soybean variety to be tested is a high-oil soybean if it has the CC genotype and a low-oil soybean if it has the GG genotype.
10. A method for identifying high-oil soybean, comprising the following steps:
(1) extracting DNA from the soybean to be tested;
(2) performing a PCR reaction using SEQ ID NO.16 or SEQ ID NO.17 and SEQ ID NO.18, wherein the soybean variety to be tested is a high-oil soybean if it has the AA genotype and a low-oil soybean if it has the GG genotype.
Background
Soybean is rich in nutrients and contains about 20% oil. Eating soybeans supplies needed nutrients and can help prevent cardiovascular disease. Soybean is also an important oil crop whose seeds are processed into edible oil; soybean oil consists mainly of five fatty acids that can help prevent heart disease, cancer and other diseases. As living standards rise, more and more people pay attention to the health and nutritional value of their food, so demand for soybean is large, yet China depends heavily on imported soybeans. There is therefore an urgent national need to raise soybean oil content and breed high-oil soybean varieties.
The oil content of soybean seeds is a quality trait. It is a relatively complex quantitative trait controlled by multiple genes and has always been constrained by genetic background and breeding method. Conventional breeding is too slow, and advances in science and technology have made molecular marker-assisted selection available.
Disclosure of Invention
The invention aims to screen high-oil, high-quality soybean varieties quickly and accurately, and provides a molecular marker located on soybean chromosome 1 and associated with high oil content, wherein the molecular marker is SNP1, SNP1 is located in the 39.67 Mb-41.16 Mb region of soybean chromosome 1, and the nucleotide at position 40386604 of chromosome Gm01 is T or C.
In one embodiment, the nucleotide sequence of the upstream primer for amplifying SNP1 is set forth as SEQ ID NO.4 or as SEQ ID NO.5, and the nucleotide sequence of the downstream primer for amplifying SNP1 is set forth as SEQ ID NO. 6.
In one embodiment, the 40780703 nucleotide position of the sequence of the SNP1 is a C or a G.
In one embodiment, the nucleotide sequence of the upstream primer for amplifying SNP1 is set forth as SEQ ID NO.7 or as SEQ ID NO.8, and the nucleotide sequence of the downstream primer for amplifying SNP1 is set forth as SEQ ID NO. 9.
In one embodiment, the nucleotide at position 41034358 of SNP1 is A or G.
In one embodiment, the upstream primer for amplifying SNP1 has the nucleotide sequence shown in SEQ ID NO.16 or SEQ ID NO.17, and the downstream primer for amplifying SNP1 has the nucleotide sequence shown in SEQ ID NO.18.
The invention also provides an application of the molecular marker in preparing a kit for identifying soybean with high oil content, wherein any one of the primer groups (a) to (c) is used for amplifying the SNP1 molecular marker:
(a) SEQ ID NO.4 or SEQ ID NO.5 and SEQ ID NO. 6;
(b) SEQ ID NO.7 or SEQ ID NO.8 and SEQ ID NO. 9;
(c) SEQ ID NO.16 or SEQ ID NO.17 and SEQ ID NO. 18.
The invention also provides a method for identifying the soybean with high oil content, which comprises the following specific steps:
(1) extracting DNA of the soybean to be detected;
(2) performing a PCR reaction using SEQ ID NO.4 or SEQ ID NO.5 and SEQ ID NO.6, wherein the soybean to be tested is a high-oil soybean if it has the CC genotype and a low-oil soybean if it has the TT genotype.
The invention also provides a method for identifying the soybean with high oil content, which comprises the following specific steps:
(1) extracting DNA of the soybean to be detected;
(2) performing a PCR reaction using SEQ ID NO.7 or SEQ ID NO.8 and SEQ ID NO.9, wherein the soybean variety to be tested is a high-oil soybean if it has the CC genotype and a low-oil soybean if it has the GG genotype.
The invention also provides a method for identifying the soybean with high oil content, which comprises the following specific steps:
(1) extracting DNA of the soybean to be detected;
(2) performing a PCR reaction using SEQ ID NO.16 or SEQ ID NO.17 and SEQ ID NO.18, wherein the soybean variety to be tested is a high-oil soybean if it has the AA genotype and a low-oil soybean if it has the GG genotype.
Beneficial effects: This study combined whole-genome resequencing of a 643-accession resource population with seed storage-compound phenotypes measured in three replicates over two years. A hierarchical evaluation method was used to screen SNP sites extremely significantly associated with oil content, and these were validated with KASP, an SNP genotyping technology, in 162 non-sequenced resource materials with extreme oil content. Molecular markers associated with oil content were developed from the typing results and phenotypic data, providing a fast and accurate way to screen high-oil, high-quality varieties early in production.
Drawings
Fig. 1 shows histograms of oil content and BLUP values for the resequenced resource materials in 2018 and 2019: A, oil content distribution in 2018; B, oil content distribution in 2019; C, distribution of the two-year oil BLUP values. The abscissa is the group and the ordinate is the frequency;
FIG. 2 shows the distribution of SNP sites over the 20 chromosomes; the abscissa is the chromosome and the ordinate is the number of SNPs;
FIG. 3 shows the distribution over the 20 chromosomes of SNP sites associated with oil content; the abscissa is the chromosome and the ordinate is the number of SNPs;
FIG. 4 shows, for SNP sites associated with oil content, the difference between the phenotypic effect value of the mutant-genome allele and that of the reference-genome allele; the abscissa is the group and the ordinate is the difference in phenotypic effect;
FIG. 5 shows the mean phenotype of the high-oil favorable haplotype and the low-oil haplotype at SNP sites associated with oil content; the abscissa is the group and the ordinate is the oil content;
FIG. 6 shows KASP genotyping of SNP markers in 162 soybean extreme oil resource materials.
Detailed Description of Embodiments
The meta-QTL (MQTL) for soybean oil content are described in Qi et al. (2018), "Meta-analysis and transcriptome profiling reveal hub genes for soybean seed storage composition during seed development".
Example 1.
Experimental population: 643 sequenced core germplasm accessions of soybean from Northeast China were used as the experimental population. In 2018 and 2019 they were planted at the Sunny Farm test field of Jilin Academy of Agricultural Sciences and Northeast Agricultural University in three replicates, in 1 m rows with 20 seeds per row and a sowing depth of 3-4 cm, under uniform field management. All traits were assessed after harvest on 5 plants of uniform vigor per line. At the vegetative stage, the youngest leaves at the top of the plant were sampled for DNA extraction, and at harvest 5 plants per line were randomly threshed to measure oil content.
I. Measurement and processing of seed oil content: The oil content of the experimental and validation materials was measured by a mass method with a FOSS grain analyzer (Infratec 1241). The analyzer performs full-spectrum scanning by near-infrared transmission, and comparing the rich spectral information against a calibration database yields high-precision oil content phenotypes. Grain moisture was kept within the safe range during measurement; the seeds of each plant were measured three times, and the mean of the three readings was taken as the final oil content. Phenotypic data were processed in Microsoft Office Excel 2013 to obtain means, and the seed oil data from three replicates over two years were analyzed in SPSS (significance tests, frequency distribution histograms, mean calculation, etc.). Best linear unbiased predictions (BLUPs) were calculated in R.
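The R model behind the BLUPs is not stated. As a minimal sketch only, the function below assumes a balanced one-way random-effects model (oil = overall mean + line effect + error) in which all year-replicate observations of a line count as replicates. Each line's BLUP is its deviation from the grand mean, shrunk by the factor σg²/(σg² + σe²/r), with the variance components estimated by ANOVA. The function name, data layout and example values are hypothetical. A real analysis would more likely fit year and replicate terms with a mixed-model package.

```python
from statistics import mean


def blup_one_way(data: dict[str, list[float]]) -> dict[str, float]:
    """Shrinkage BLUPs of line effects under y_ij = mu + g_i + e_ij.

    Sketch only: assumes a balanced design (r observations per line) and
    ANOVA (method-of-moments) variance components; a negative genetic
    variance estimate is truncated to zero.
    """
    if len(data) < 2:
        raise ValueError("need at least two lines")
    r = len(next(iter(data.values())))
    if r < 2 or any(len(v) != r for v in data.values()):
        raise ValueError("this sketch needs a balanced design with r >= 2")
    k = len(data)
    line_means = {name: mean(obs) for name, obs in data.items()}
    grand = mean(line_means.values())  # equals the overall mean when balanced
    ss_between = r * sum((m - grand) ** 2 for m in line_means.values())
    ss_within = sum((y - line_means[name]) ** 2
                    for name, obs in data.items() for y in obs)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (r - 1))
    var_g = max((ms_between - ms_within) / r, 0.0)  # genetic variance
    var_e = ms_within                                 # residual variance
    shrink = var_g / (var_g + var_e / r) if var_g > 0 else 0.0
    return {name: shrink * (m - grand) for name, m in line_means.items()}


# Hypothetical oil contents (%) for three lines, three observations each.
oil = {"line1": [21.0, 21.2, 20.8], "line2": [19.0, 19.4, 19.2],
       "line3": [20.0, 20.1, 19.9]}
print(blup_one_way(oil))
```

Shrinkage pulls lines with noisy or few observations toward the population mean. This is why a BLUP histogram tends to be narrower-peaked than the raw yearly means.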
Table 1 shows that the oil content trait of the population varied little between the two years. The coefficients of variation were between 4% and 5%, the lowest being 4.73%, and the standard deviations of oil content were small, 0.98 and 0.99 in the two years. Kurtosis and skewness analysis indicated that oil content, the quality trait studied, showed a highly skewed distribution in this population. BLUP analysis was performed on the 2018 and 2019 oil phenotypes. Histograms with fitted normal curves were plotted in SPSS for the yearly means (Fig. 1A and B) and for the BLUP values (Fig. 1C). Both the yearly data and the BLUP values show continuous distributions with a clear trend, and the normal curves indicate that seed oil content is approximately normally distributed. In addition, the peak of the BLUP distribution is lower than those of the yearly data, and the BLUP values are distributed more widely and evenly. This distribution matches the genetic characteristics of a quantitative trait, so the BLUP values are better suited for subsequent analysis.
Table 1 Descriptive statistics of oil content in the soybean sequencing materials, 2018 and 2019
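The statistics summarized in Table 1 (mean, standard deviation, coefficient of variation, skewness, kurtosis) can be reproduced with a short helper. This is a sketch that assumes the bias-adjusted G1/G2 skewness and excess-kurtosis estimators. We assume, but have not confirmed, that these match the SPSS output. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def describe(values: list[float]) -> dict[str, float]:
    """Mean, SD, CV (%), skewness and excess kurtosis of one trait.

    Skewness and kurtosis use the bias-adjusted (G1/G2) estimators; that
    these match the SPSS figures in Table 1 is an assumption of this sketch.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    m2 = sum((x - mu) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skew = g1 * sqrt(n * (n - 1)) / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return {"mean": mu, "sd": sd, "cv_percent": 100.0 * sd / mu,
            "skewness": skew, "kurtosis": kurt}
```

A coefficient of variation of 4.73% with a standard deviation of about 0.98 implies a mean oil content of roughly 20.7%, consistent with the approximately 20% oil content cited in the Background.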
DNA extraction: 3-4 g of fresh soybean leaves are placed in a 1.5 mL centrifuge tube with three sterilized 3 mm steel balls. The tube is immersed in liquid nitrogen, and the frozen leaf tissue is shaken to a powder in a tissue grinder. Adding 650-. An equal volume of chloroform is added and mixed, the tube is centrifuged at 12,000 rpm for 20 min, and the supernatant is transferred to a new centrifuge tube. Next, 700 μL of chloroform is added and mixed, the tube is again centrifuged at 12,000 rpm for 20 min, and the supernatant is dripped slowly and steadily into a centrifuge tube containing isopropanol pre-cooled to -20 °C, where it is held for 20 min. After centrifugation at 8,000 rpm for 10 min, the supernatant is poured into a waste container, and the DNA pellet at the bottom is washed with absolute ethanol and then with 75% ethanol. The open tube is dried in a clean bench, and sterilized water is added. DNA quality is measured with a spectrophotometer (NanoDrop) and DNA concentration by agarose gel electrophoresis, and the DNA is diluted to a working concentration of 20 ng/μL.
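Diluting each extract to the 20 ng/μL working concentration is a C1V1 = C2V2 calculation. The helper below is a small illustration; its name and the 50 μL final volume in the example are hypothetical choices.

```python
def dilution_volumes(stock_ng_per_ul: float, final_ul: float,
                     target_ng_per_ul: float = 20.0) -> tuple[float, float]:
    """Return (uL of DNA stock, uL of water) so that C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the working concentration")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul


print(dilution_volumes(100.0, 50.0))  # (10.0, 40.0)
```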
II. Hierarchical evaluation of SNP sites in the resequenced resource materials: 643 soybean accessions were resequenced, yielding 53,946 SNP sites across the 20 chromosomes after quality control, an average of 2,697 per chromosome. In quality control, SNPs with a minor allele frequency (MAF) below 0.05 were removed, and the heterozygosity rate was kept below 10%. Chr18 had the most SNPs (4,462) and Chr11 the fewest (862); the numbers on the remaining chromosomes are shown in FIG. 2.
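A per-SNP quality filter of this kind can be sketched as follows. The sketch assumes that SNPs with MAF < 0.05 or a heterozygosity rate of 10% or more are discarded. It also assumes that genotypes are coded as the count of non-reference alleles (0, 1, 2), with None marking a missing call. Both the coding and the function name are hypothetical, not taken from the source pipeline.

```python
from typing import Optional, Sequence


def snp_passes_qc(genotypes: Sequence[Optional[int]], min_maf: float = 0.05,
                  max_het: float = 0.10) -> bool:
    """Keep an SNP if MAF >= min_maf and heterozygosity < max_het.

    Genotypes are the count of non-reference alleles (0 = hom. ref,
    1 = het, 2 = hom. alt); None marks a missing call and is ignored.
    """
    calls = [g for g in genotypes if g is not None]
    if not calls:
        return False
    alt_freq = sum(calls) / (2 * len(calls))
    maf = min(alt_freq, 1.0 - alt_freq)
    het_rate = sum(1 for g in calls if g == 1) / len(calls)
    return maf >= min_maf and het_rate < max_het
```

As an arithmetic check on the reported figures, 53,946 SNPs over 20 chromosomes gives 53,946 / 20 ≈ 2,697 per chromosome.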
III. Mining of important alleles: (1) The materials are classified by phenotype. For soybean oil content, the mean and standard deviation of all data are calculated, and the mean plus one standard deviation and the mean minus one standard deviation are used as cutoffs. Materials above the mean plus one standard deviation are taken as high-oil materials, and materials below the mean minus one standard deviation as low-oil materials.
(2) For each SNP site under study, the sequencing results are tallied by reference-genome allele or mutant-genome allele. Combining these with the oil content classification (mean ± one standard deviation), the materials at the two extremes, above the upper cutoff and below the lower cutoff, are selected, and the counts are entered into the 2 × 2 contingency table shown as Table 2 for the chi-square test:
Table 2 Chi-square analysis of the 2 × 2 contingency table
Note: i is A, C, T or G; a11 is the number of i alleles in the high-oil materials, a21 the number of non-i alleles in the high-oil materials, a12 the number of i alleles in the low-oil materials, and a22 the number of non-i alleles in the low-oil materials; C1 is the total for the high-oil materials and C2 the total for the low-oil materials; R1 is the total number of i alleles, R2 the total number of non-i alleles, and n the total number of materials.
(3) Null hypothesis H0: oil content is independent of the i allele; alternative hypothesis HA: the two variables are associated. The χ² statistic is obtained with the following formula. When χ² < χ²α, H0 is accepted; when χ² ≥ χ²α, H0 is rejected and HA is accepted. Each SNP site under study is therefore judged by comparing its χ² value with the critical value at α = 0.001.
(4) Steps (2) and (3) are repeated to test independence for every SNP site for the oil trait and determine its influence on the trait. In this experiment α = 0.001 is the threshold: an SNP site with P < 0.001 is judged a significant site affecting the trait and is carried forward, whereas a site with P ≥ 0.001 is dropped.
(5) The significant sites selected in step (4) are then checked for phenotypic effect value, which is calculated for each significant site across all materials with the following formula:
Note: Rate of change is the effect value; A allele is the allele of the SNP site corresponding to the reference genome, and Value A is the mean oil content of samples carrying A; B allele is the allele of the SNP site corresponding to the mutant genome, and Value B is the mean oil content of samples carrying B.
Results: In the chi-square analysis, the mean of the oil phenotype plus one standard deviation was used as the standard, and the oil phenotypes and sequencing results of the materials at the two extremes were evaluated hierarchically, with a significance level of α = 0.001 as the threshold. SNP sites with P < 0.001 were considered extremely significantly associated with oil content; sites with P > 0.001 were not selected because their effects are relatively small. This yielded 9,211 sites extremely significantly associated with oil content, the most (1,448) on chromosome 9; the numbers on the other chromosomes are shown in FIG. 3. The SNP mutations were further restricted to those in the high and low phenotypic groups, and the extremely significant sites were compared with the soybean oil MQTL results to find coinciding intervals. After this comparison, 193 associated SNP sites remained, the most (73) on chromosome 20.
The key SNP sites obtained by chi-square testing and MQTL comparison differ in their effects before and after mutation: some increase oil content and others decrease it. If the mean oil content of materials carrying an allele is higher than the mean oil content of all resources, the allele increases oil content. Conversely, if the mean oil content of materials carrying the mutant allele is lower than the mean of all resources, that allele reduces oil content. The SNP sites differ in their effect values on oil content (FIG. 4). Points in the upper part of the figure indicate that the phenotypic effect value of the mutant allele minus that of the reference-genome allele is positive, i.e. oil content increases after mutation. Points in the lower part indicate that this difference is negative, i.e. oil content decreases after mutation.
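The direction rule above compares the mean oil content of an allele's carriers with the mean of all resources. A minimal sketch follows; the function name and data layout (material → allele, material → oil %) are hypothetical.

```python
from statistics import mean


def allele_effect(allele_of: dict[str, str], oil: dict[str, float],
                  allele: str) -> tuple[float, str]:
    """Mean oil of the allele's carriers minus the population mean, with its direction."""
    carriers = [oil[m] for m in oil if allele_of.get(m) == allele]
    if not carriers:
        raise ValueError(f"no material carries allele {allele!r}")
    diff = mean(carriers) - mean(oil.values())
    if diff > 0:
        return diff, "increases oil content"
    if diff < 0:
        return diff, "decreases oil content"
    return diff, "no effect"
```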
The 28 SNP sites at 39.67-41.16 Mb on chromosome 1 have a high-oil proportion of 60%-69.23% and phenotypic effect rates of 1.03%-2.83%; the 32 SNP sites at 45.78-46.05 Mb on chromosome 2, 64.66%-72.31% and 0.96%-2.21%; the SNP site at 38904164 on chromosome 3, 61.54% and 2.17%; the 22 SNP sites at 3.79-40.12 Mb on chromosome 5, 60%-73.85% and 1.39%-2.28%; the 4 SNP sites at 6.67-38.79 Mb on chromosome 6, 60%-69.23% and 1.47%-2.26%; the 6 SNP sites on chromosome 11, 60%-63.08% and 1.47%-1.58%; the 15 SNP sites at 20.17-20.33 Mb on chromosome 13, 72.31%-78.46% and 1.59%-2.40%; the SNP sites at 30524559 and 31571725 on chromosome 14, 76.92%, with phenotypic effect rates of 1.92% and 2.24% respectively; the SNP site at 31468397 on chromosome 16, 61.54% and 2.37%; the 9 SNP sites at 50.33-55.08 Mb on chromosome 18, 61.54%-76.92% and 1.92%-2.63%; and the 73 SNP sites at 13.66-41.76 Mb on chromosome 20, 60%-75.38% and 1.48%-2.63% (Table 3). The soybean chromosome information above is from the Phytozome website (organism alias Org_Gmax).
Table 3 SNP sites associated with oil content after screening
IV. Haplotype analysis of SNP sites associated with oil content: Haplotypes were analyzed for the 193 variant sites, with adjacent sites grouped for joint analysis into 44 groups, each producing different haplotypes. The proportion of each haplotype among the 643 sequenced materials and its mean phenotype were calculated. For 17 groups of sites, the mean oil content of the best and worst haplotypes differed markedly, separating oil content well (FIG. 5).
For the group at 40386604 on chromosome 1, the high-oil favorable haplotype Hap_1 (TGATTCTAGTCGTTC) and the low-oil haplotype Hap_9 (CAGCGACTAGTAGAG) were identified. Hap_1 accounts for 47.83% of materials, with oil content mainly about 20%-23%, and Hap_9 for 8.36%, with oil content mainly about 20%-21%, a clear difference. For the group at 40780703 on chromosome 1, the high-oil favorable haplotype Hap_1 (GAAGAAAG) accounts for 48.33%, with oil content mainly about 20%-23%, and the low-oil haplotype Hap_9 (CGGCCACC) for 1.34%, with oil content mainly about 20.5%-21%, a clear difference. For the group at 40884357 on chromosome 1, the high-oil favorable haplotype Hap_1 (AAC) accounts for 48.66%, with oil content mainly about 20%-23%, and the low-oil haplotype Hap_5 (GAT) for 2.01%, with oil content mainly 19%-20%, a clear difference. High-oil favorable and low-oil haplotypes with clearly different oil content distributions were also obtained for the groups at 40959307, 41034358, 41037022 and 41156486 on chromosome 1, and for sites on chromosomes 2, 5, 6, 11 and 13.
V. Validation population: 162 core non-sequenced soybean resource materials with extreme oil content from Northeast China (Table 4) were used to validate the important alleles mined above. Planting, management, sampling and harvesting were the same as for the experimental materials.
Marker development for the SNP sites: The KASP reaction system consists of the primer mix, Master Mix and sample DNA. For each SNP site obtained by hierarchical evaluation, 50 bp of sequence upstream and downstream was extracted by local BLAST, and KASP primers were designed with Primer 5.0. The primers for each site consist of two allele-specific forward primers (F1/F2), each carrying a different allele and fluorescent tag, and one common reverse primer (R). The primer mix contains 46 μL ddH2O and 12 μL each of the forward primers (100 μmol·L-1) and the reverse primer (100 μmol·L-1); the Master Mix is from LGC. Fluorescent tag FAM: GAAGGTGACCAAGTTCATGCT (SEQ ID NO.79); fluorescent tag HEX: GAAGGTCGGAGTCAACGGATT (SEQ ID NO.80). Primer sequence information is shown in Table 5.
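In the sequence listing, SEQ ID NO.1 and SEQ ID NO.2 begin with the FAM (SEQ ID NO.79) and HEX (SEQ ID NO.80) tails respectively. They then share the same sequence up to the 3'-terminal base (T versus C), which is the usual layout of a KASP allele-specific primer pair. The helpers below illustrate this structure and the ±50 bp flank extraction. They are hypothetical and do not reproduce the Primer 5.0 design step.

```python
FAM_TAIL = "GAAGGTGACCAAGTTCATGCT"  # SEQ ID NO.79
HEX_TAIL = "GAAGGTCGGAGTCAACGGATT"  # SEQ ID NO.80


def flanks(chrom_seq: str, pos1: int, width: int = 50) -> tuple[str, str, str]:
    """(upstream, SNP base, downstream) around a 1-based SNP position."""
    i = pos1 - 1
    return chrom_seq[max(0, i - width):i], chrom_seq[i], chrom_seq[i + 1:i + 1 + width]


def tailed_forward_primers(core: str, allele_fam: str, allele_hex: str) -> tuple[str, str]:
    """F1/F2: tail + shared core + allele-specific 3' base."""
    core = core.upper()
    return FAM_TAIL + core + allele_fam.upper(), HEX_TAIL + core + allele_hex.upper()


def split_tail(primer: str) -> tuple[str, str]:
    """Identify the fluorescent tail of a listed primer and return (tag, rest)."""
    seq = "".join(primer.split()).upper()  # drop the listing's spaces
    for tag, tail in (("FAM", FAM_TAIL), ("HEX", HEX_TAIL)):
        if seq.startswith(tail):
            return tag, seq[len(tail):]
    raise ValueError("no FAM/HEX tail found")


seq1 = "gaaggtgacc aagttcatgc taatagtagc cggctctcat"  # SEQ ID NO.1
seq2 = "gaaggtcgga gtcaacggat taatagtagc cggctctcac"  # SEQ ID NO.2
print(split_tail(seq1), split_tail(seq2))
```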
The KASP reaction components are added to a 384-well plate and run on a Roche LightCycler 480 II real-time fluorescent quantitative PCR instrument, and the endpoint fluorescence signal is read after the reaction ends. PCR program: 95 °C for 15 min; 10 cycles of 95 °C for 20 s and 65 °C for 25 s, decreasing by 0.8 °C per cycle; 35 cycles of 95 °C for 10 s and 57 °C for 1 min; hold at 4 °C.
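The touchdown program can be expanded cycle by cycle to check the annealing temperatures and run time. This sketch assumes that the first touchdown cycle anneals at 65 °C and each later cycle 0.8 °C lower, ending at 57.8 °C. Ramp times and the untimed 4 °C hold are ignored. The function name is hypothetical.

```python
def kasp_program() -> list[tuple[str, float, int]]:
    """Expand the PCR program into (step, temperature in C, seconds) tuples.

    Assumes the first touchdown cycle anneals at 65 C and each later one
    0.8 C lower; the final 4 C hold is untimed and omitted.
    """
    steps = [("hot start", 95.0, 15 * 60)]
    for k in range(10):  # touchdown phase
        steps.append(("denature", 95.0, 20))
        steps.append(("anneal/extend", round(65.0 - 0.8 * k, 1), 25))
    for _ in range(35):  # amplification phase
        steps.append(("denature", 95.0, 10))
        steps.append(("anneal/extend", 57.0, 60))
    return steps


program = kasp_program()
anneal = [t for name, t, _ in program if name == "anneal/extend"]
print(len(anneal), anneal[:10])
print(sum(sec for *_, sec in program) / 60, "min excluding ramps")
```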
Table 4 Variety names and oil contents of the 162 non-sequenced soybean materials with extreme oil content
Table 5 Primer sequence information
KASP typing validation: The typing results obtained on the Roche LightCycler 480 II are exported to Excel for analysis, and the site coincidence rate is calculated as follows:
(1) For the non-sequenced materials with extreme soybean oil content, the numbers and distribution of the alleles corresponding to the reference genome and the mutant genome at each primer's SNP site are counted in the high-oil and low-oil materials, and a 2 × 2 coincidence-rate table is constructed as shown in Table 6:
Table 6 2 × 2 table of coincidence rates
Note: x and y are the KASP genotypes for the primers designed at the SNP site; a is the number of x alleles in the typing results of the non-sequenced high-oil materials, b the number of x alleles in the non-sequenced low-oil materials, c the number of y alleles in the non-sequenced high-oil materials, and d the number of y alleles in the non-sequenced low-oil materials; M is the total number of non-sequenced high-oil materials and N the total number of non-sequenced low-oil materials.
(2) Null hypothesis H0: oil content is independent of the x/y allele; alternative hypothesis HA: the two variables are associated. The coincidence rates P1 and P2 are obtained with the following formula. If P1 < Pα or P2 < Pα, H0 is accepted; if P1 ≥ Pα and P2 ≥ Pα, H0 is rejected and HA is accepted. The result for each primer is therefore judged by comparing the calculated P1 and P2 with the threshold Pα = 60%.
(3) Steps (1) and (2) are repeated to test independence for the typing results of all primers for the oil content trait and verify their influence on the trait. All results obtained in step (3) are further checked for phenotypic effect, which is calculated for the major-effect sites across all materials with the following formula:
To validate the favorable alleles mined at the SNP sites, 21 SNP sites associated with oil content were selected and checked for specificity according to KASP primer design principles, and 21 primer sets carrying fluorescent tags were finally designed (sequence information in Table 5). The 162 resource materials with extreme oil content were then analyzed on the KASP platform to identify the key SNP sites.
All 21 markers associated with oil content were typed successfully. FIG. 6 shows the KASP typing result of one successful marker, with two different homozygous genotypes (CC, TT) and the heterozygous genotype (CT). The typing results and coincidence rates of the oil-associated markers were as follows. For marker Gm01_40386604, 62 high-oil materials had the CC genotype and 51 low-oil materials had the TT genotype, accounting for 76.92% and 55.95% of the high-oil and low-oil materials respectively, with a phenotypic effect value of 5.42%. For marker Gm01_40780703, 78 high-oil materials had the CC genotype and 65 low-oil materials had the GG genotype, accounting for 94.87% and 72.62% of the high-oil and low-oil materials respectively, with a phenotypic effect value of 10.17%. For marker Gm01_41034358, 52 high-oil materials had the AA genotype and 51 low-oil materials had the GG genotype, accounting for 64.10% and 59.52% of the high-oil and low-oil materials respectively, with a phenotypic effect value of 4.86%.
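The formula for the coincidence rates is missing from the text. One reading consistent with Table 6 and the percentages above is P1 = a/M (share of high-oil materials with the high-oil genotype x) and P2 = d/N (share of low-oil materials with the low-oil genotype y). The sketch below encodes that assumed reading and the 60% decision rule of step (2). The names and example counts are hypothetical.

```python
def coincidence_rates(a: int, d: int, m_high: int, n_low: int) -> tuple[float, float]:
    """Assumed form: P1 = a / M and P2 = d / N (see Table 6 for symbols)."""
    if m_high <= 0 or n_low <= 0:
        raise ValueError("group sizes must be positive")
    return a / m_high, d / n_low


def marker_validated(a: int, d: int, m_high: int, n_low: int,
                     threshold: float = 0.60) -> bool:
    """Reject H0 (marker associated with oil) only if both rates reach the threshold."""
    p1, p2 = coincidence_rates(a, d, m_high, n_low)
    return p1 >= threshold and p2 >= threshold


print(coincidence_rates(70, 60, 80, 80), marker_validated(70, 60, 80, 80))
```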
In conclusion, the molecular marker is SNP1, which lies at 39.67 Mb-41.16 Mb on soybean chromosome 1. The nucleotide at position 40386604 of chromosome Gm01 is T or C, the nucleotide at position 40780703 is C or G, and the nucleotide at position 41034358 is A or G.
Example 2.
I. A kit for screening high-oil soybean, comprising primer sets (a) to (c):
(a) SEQ ID NO.4 or SEQ ID NO.5 and SEQ ID NO. 6;
(b) SEQ ID NO.7 or SEQ ID NO.8 and SEQ ID NO. 9;
(c) SEQ ID NO.16 or SEQ ID NO.17 and SEQ ID NO. 18.
II. Screening method:
A sample of unknown soybean oil content is selected, and PCR amplification is performed with the kit for screening high-oil soybean described above, using the program: 95 °C for 15 min; 10 cycles of 95 °C for 20 s and 65 °C for 25 s, decreasing by 0.8 °C per cycle; 35 cycles of 95 °C for 10 s and 57 °C for 1 min; hold at 4 °C.
The method for identifying high-oil soybean comprises the following steps:
(1) extracting DNA of the soybean to be detected;
(2) and carrying out PCR reaction by using SEQ ID NO.4 or SEQ ID NO.5 and SEQ ID NO.6, wherein the soybean to be detected is a soybean with high oil content when the soybean to be detected is CC genotype, and the soybean to be detected is a soybean with low protein content when the soybean is TT genotype. And carrying out PCR reaction by using SEQ ID NO.7 or SEQ ID NO.8 and SEQ ID NO.9, wherein if the soybean of the variety to be detected is detected to be CC genotype, the soybean of the variety to be detected is soybean with high oil content, and if the soybean of the variety to be detected is detected to be GG genotype, the soybean of the variety to be detected is soybean with low oil content. And carrying out PCR reaction by using SEQ ID NO.16 or SEQ ID NO.17 and SEQ ID NO.18, and detecting that the soybean of the variety to be detected is AA genotype, wherein the soybean of the variety to be detected is soybean with high oil content, and if the soybean of the variety to be detected is GG genotype, the soybean is soybean with low oil content.
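The per-marker decision rule of step (2) amounts to a lookup table. In this illustrative sketch (`CALL_RULES` and `classify` are hypothetical names), heterozygous or unexpected genotypes are returned as "not assigned" because the method does not state how to call them, and the three per-marker calls are reported separately rather than merged, since no combination rule is given.

```python
# High-oil and low-oil homozygous genotypes for each primer set, per step (2).
CALL_RULES = {
    "a": {"CC": "high oil", "TT": "low oil"},  # SEQ ID NO.4/5 + 6, Gm01_40386604
    "b": {"CC": "high oil", "GG": "low oil"},  # SEQ ID NO.7/8 + 9, Gm01_40780703
    "c": {"AA": "high oil", "GG": "low oil"},  # SEQ ID NO.16/17 + 18, Gm01_41034358
}


def classify(genotypes: dict[str, str]) -> dict[str, str]:
    """Map each primer set's genotype call to a per-marker oil-content call."""
    calls = {}
    for primer_set, genotype in genotypes.items():
        # Genotypes outside the stated rule (e.g. heterozygotes) stay unassigned.
        calls[primer_set] = CALL_RULES[primer_set].get(genotype.upper(), "not assigned")
    return calls


print(classify({"a": "CC", "b": "CC", "c": "AA"}))
# {'a': 'high oil', 'b': 'high oil', 'c': 'high oil'}
```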
Results: a sample of previously unknown oil content was measured to have an oil content above 20%, and it was typed as CC with primer set (a), CC with primer set (b), and AA with primer set (c); the high oil content of this soybean was therefore consistent with the genotypes detected by the markers. The low-oil samples were likewise consistent with the genotypes detected by the markers.
SEQUENCE LISTING
<110> Northeast Agricultural University
<120> molecular marker related to high oil content and located on soybean chromosome 1 and application thereof
<160> 80
<170> PatentIn version 3.5
<210> 1
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 1
gaaggtgacc aagttcatgc taatagtagc cggctctcat 40
<210> 2
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 2
gaaggtcgga gtcaacggat taatagtagc cggctctcac 40
<210> 3
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 3
agccttcaca ccgggggcac 20
<210> 4
<211> 39
<212> DNA
<213> Artificial Synthesis
<400> 4
gaaggtgacc aagttcatgc tttgcttatc acgcttatt 39
<210> 5
<211> 39
<212> DNA
<213> Artificial Synthesis
<400> 5
gaaggtcgga gtcaacggat tttgcttatc acgcttatc 39
<210> 6
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 6
tgcaaaaggt aaaacctagg 20
<210> 7
<211> 42
<212> DNA
<213> Artificial Synthesis
<400> 7
gaaggtgacc aagttcatgc taaatttgct ttgttctctg ag 42
<210> 8
<211> 42
<212> DNA
<213> Artificial Synthesis
<400> 8
gaaggtcgga gtcaacggat taaatttgct ttgttctctg ac 42
<210> 9
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 9
gccacccaat tggagacttg tagag 25
<210> 10
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 10
gaaggtgacc aagttcatgc taaaaaaaaa gtgtggttca 40
<210> 11
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 11
gaaggtcgga gtcaacggat taaaaaaaaa gtgtggttcg 40
<210> 12
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 12
ggcaccttcg tcgctgttag 20
<210> 13
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 13
gaaggtgacc aagttcatgc tttagttgga attttattaa 40
<210> 14
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 14
gaaggtcgga gtcaacggat tttagttgga attttattag 40
<210> 15
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 15
ataaaaaata atcccatggc cgaaa 25
<210> 16
<211> 42
<212> DNA
<213> Artificial Synthesis
<400> 16
gaaggtgacc aagttcatgc tggactgcat gctagacaca ta 42
<210> 17
<211> 42
<212> DNA
<213> Artificial Synthesis
<400> 17
gaaggtcgga gtcaacggat tggactgcat gctagacaca tg 42
<210> 18
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 18
tttttcgact tgtgaggcat a 21
<210> 19
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 19
gaaggtgacc aagttcatgc ttgagagaaa tgaaggagaa 40
<210> 20
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 20
gaaggtcgga gtcaacggat ttgagagaaa tgaaggagag 40
<210> 21
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 21
accaaagcat ttctcatgta a 21
<210> 22
<211> 42
<212> DNA
<213> Artificial Synthesis
<400> 22
gaaggtgacc aagttcatgc tatggttcat taagtaagag aa 42
<210> 23
<211> 42
<212> DNA
<213> Artificial Synthesis
<400> 23
gaaggtcgga gtcaacggat tatggttcat taagtaagag ag 42
<210> 24
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 24
ccactatttg cttcagacgg ggtat 25
<210> 25
<211> 41
<212> DNA
<213> Artificial Synthesis
<400> 25
gaaggtgacc aagttcatgc ttccaacgtt tgaattaggg a 41
<210> 26
<211> 41
<212> DNA
<213> Artificial Synthesis
<400> 26
gaaggtcgga gtcaacggat ttccaacgtt tgaattaggg t 41
<210> 27
<211> 23
<212> DNA
<213> Artificial Synthesis
<400> 27
caatttctcg gaaaattatg aca 23
<210> 28
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 28
gaaggtgacc aagttcatgc tatctccttc tgatcctcat 40
<210> 29
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 29
gaaggtcgga gtcaacggat tatctccttc tgatcctcac 40
<210> 30
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 30
gagagagaga aaaaaaaaag cagtt 25
<210> 31
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 31
gaaggtgacc aagttcatgc tattctgcac acagccctca 40
<210> 32
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 32
gaaggtcgga gtcaacggat tattctgcac acagccctcg 40
<210> 33
<211> 23
<212> DNA
<213> Artificial Synthesis
<400> 33
gcaactgcag ataaagtgac ttc 23
<210> 34
<211> 41
<212> DNA
<213> Artificial Synthesis
<400> 34
gaaggtgacc aagttcatgc tatgactcag gttcttccgt g 41
<210> 35
<211> 41
<212> DNA
<213> Artificial Synthesis
<400> 35
gaaggtcgga gtcaacggat tatgactcag gttcttccgt a 41
<210> 36
<211> 27
<212> DNA
<213> Artificial Synthesis
<400> 36
atgaatgaca ctgatgtcta aaagaaa 27
<210> 37
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 37
gaaggtgacc aagttcatgc taaggagtgt aaaggggagg 40
<210> 38
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 38
gaaggtcgga gtcaacggat taaggagtgt aaaggggagc 40
<210> 39
<211> 28
<212> DNA
<213> Artificial Synthesis
<400> 39
ttttaacttt tatcaacctt aagaattt 28
<210> 40
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 40
gaaggtgacc aagttcatgc taacccctct gttattccat 40
<210> 41
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 41
gaaggtcgga gtcaacggat taacccctct gttattccac 40
<210> 42
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 42
gctaaaactt caacatcaag c 21
<210> 43
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 43
gaaggtgacc aagttcatgc tggatggaaa aaagggtggc 40
<210> 44
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 44
gaaggtcgga gtcaacggat tggatggaaa aaagggtggt 40
<210> 45
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 45
tgtggctcca catgatttag g 21
<210> 46
<211> 41
<212> DNA
<213> Artificial Synthesis
<400> 46
gaaggtgacc aagttcatgc tgagaacatt atagtgtgca c 41
<210> 47
<211> 41
<212> DNA
<213> Artificial Synthesis
<400> 47
gaaggtcgga gtcaacggat tgagaacatt atagtgtgca g 41
<210> 48
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 48
aagcctcttt gatagcctta a 21
<210> 49
<211> 42
<212> DNA
<213> Artificial Synthesis
<400> 49
gaaggtgacc aagttcatgc taatgatgga aatgacttgg tt 42
<210> 50
<211> 42
<212> DNA
<213> Artificial Synthesis
<400> 50
gaaggtcgga gtcaacggat taatgatgga aatgacttgg tg 42
<210> 51
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 51
ttagcccaaa aactcttagg 20
<210> 52
<211> 39
<212> DNA
<213> Artificial Synthesis
<400> 52
gaaggtgacc aagttcatgc tttctaatga tggggtagt 39
<210> 53
<211> 39
<212> DNA
<213> Artificial Synthesis
<400> 53
gaaggtcgga gtcaacggat tttctaatga tggggtagc 39
<210> 54
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 54
ttctgcattt gatacagcat cagca 25
<210> 55
<211> 41
<212> DNA
<213> Artificial Synthesis
<400> 55
gaaggtgacc aagttcatgc tatggcacac ccgtgtttct c 41
<210> 56
<211> 41
<212> DNA
<213> Artificial Synthesis
<400> 56
gaaggtcgga gtcaacggat tatggcacac ccgtgtttct t 41
<210> 57
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 57
cacaacatca gtggcagtga agaag 25
<210> 58
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 58
gaaggtgacc aagttcatgc tcaatggctt tgtagacccc 40
<210> 59
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 59
gaaggtcgga gtcaacggat tcaatggctt tgtagaccct 40
<210> 60
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 60
gcagaatgct tgccagacac t 21
<210> 61
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 61
gaaggtgacc aagttcatgc taaaacagag gactctgttt 40
<210> 62
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 62
gaaggtcgga gtcaacggat taaaacagag gactctgttc 40
<210> 63
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 63
accaagcaca tcataaaggg aagcc 25
<210> 64
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 64
gaaggtgacc aagttcatgc ttaccctttg caagagctaa 40
<210> 65
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 65
gaaggtcgga gtcaacggat ttaccctttg caagagctac 40
<210> 66
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 66
tggaagatgt ggatgctgtc 20
<210> 67
<211> 39
<212> DNA
<213> Artificial Synthesis
<400> 67
gaaggtgacc aagttcatgc tttgaatccg aattgcaat 39
<210> 68
<211> 39
<212> DNA
<213> Artificial Synthesis
<400> 68
gaaggtcgga gtcaacggat tttgaatccg aattgcaac 39
<210> 69
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 69
ttctcgttcc atgtcttttg aaacc 25
<210> 70
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 70
gaaggtgacc aagttcatgc tttgaatccg aattgcaata 40
<210> 71
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 71
gaaggtcgga gtcaacggat tttgaatccg aattgcaatt 40
<210> 72
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 72
tttctcgttc catgtctttt gaaac 25
<210> 73
<211> 44
<212> DNA
<213> Artificial Synthesis
<400> 73
gaaggtgacc aagttcatgc tcctagatcc tcatttcaac tcac 44
<210> 74
<211> 44
<212> DNA
<213> Artificial Synthesis
<400> 74
gaaggtcgga gtcaacggat tcctagatcc tcatttcaac tcag 44
<210> 75
<211> 23
<212> DNA
<213> Artificial Synthesis
<400> 75
agcagttgaa gtgcttattc aaa 23
<210> 76
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 76
gaaggtgacc aagttcatgc tgtgaatacc ttggatatgg 40
<210> 77
<211> 40
<212> DNA
<213> Artificial Synthesis
<400> 77
gaaggtcgga gtcaacggat tgtgaatacc ttggatatgt 40
<210> 78
<211> 25
<212> DNA
<213> Artificial Synthesis
<400> 78
ctatgtttgt tcctatctca agtcc 25
<210> 79
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 79
gaaggtgacc aagttcatgc t 21
<210> 80
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 80
gaaggtcgga gtcaacggat t 21
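In this listing each <211> field should equal the number of residues in the matching <400> sequence, so the lengths can be cross-checked mechanically. The function below is an illustrative sketch (the name `check_sequence_listing` and the parsing approach are assumptions); it handles only the plain-text ST.25 layout used here, not the XML-based ST.26 format.

```python
import re


def check_sequence_listing(text: str) -> list[tuple[int, int, int]]:
    """Return (SEQ ID, declared <211> length, counted length) for each mismatch."""
    mismatches = []
    # Split into entries at every line starting with <210>; the header
    # (<110> to <170>) before the first <210> is discarded.
    entries = re.split(r"^<210>\s*", text, flags=re.MULTILINE)[1:]
    for entry in entries:
        seq_id = int(entry.split()[0])
        length_field = re.search(r"^<211>\s*(\d+)", entry, flags=re.MULTILINE)
        if length_field is None or "<400>" not in entry:
            raise ValueError(f"SEQ ID {seq_id}: missing <211> or <400> field")
        declared = int(length_field.group(1))
        # Sequence lines follow the "<400> n" line; keep only residue letters,
        # dropping spaces and the trailing running count.
        seq_lines = entry.split("<400>", 1)[1].splitlines()[1:]
        counted = sum(len(re.sub(r"[^a-z]", "", line.lower())) for line in seq_lines)
        if counted != declared:
            mismatches.append((seq_id, declared, counted))
    return mismatches


SAMPLE = """<210> 3
<211> 20
<212> DNA
<213> Artificial Synthesis
<400> 3
agccttcaca ccgggggcac 20
<210> 79
<211> 21
<212> DNA
<213> Artificial Synthesis
<400> 79
gaaggtgacc aagttcatgc t 21
"""
print(check_sequence_listing(SAMPLE))  # []
```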