Method and system for designing primer based on characteristic sequence
1. A method for designing a primer based on a characteristic sequence, characterized by comprising the following steps:
(1) downloading the required genome sequences from the Internet, wherein the target sequences are positive sequences and other homologous sequences are negative sequences, the positive sequences are combined into a FASTA file named positives.fa, and the negative sequences are combined into a FASTA file named negatives.fa;
(2) dividing the positive sequence into fragments suitable for comparison, and recording the division positions;
(3) using BLAST to build a database from the negative sequences, then aligning the fragmented positive sequences against the negative sequences for screening to obtain specific sequences with low identity, and dividing the specific sequences into two levels, wherein the first level consists of specific sequences whose aligned length is less than 300 bp and whose identity over the aligned region is less than 70%, and the second level consists of specific sequences whose aligned length is less than 200 bp although the identity over the aligned region is more than 70%;
(4) using primer3 to design primers on the secondary specific sequence obtained in step (3), to obtain the sequences, positions and lengths of an upstream primer, a downstream primer and a probe;
(5) aligning the primers and probes obtained in step (4) against the negative sequence database built in step (3) using BLAST alignment software, retaining the primers and probes that produce no alignment, and then aligning the retained primers and probes against the NCBI nt database to obtain specific primers and probes;
(6) collating the results of step (5), and outputting the upstream and downstream primers, the probes and the sequence lengths.
2. The method for designing a primer based on a characteristic sequence according to claim 1, wherein: the split size of the positive sequence in step (2) is 300 bp to 1000 bp.
3. The method for designing a primer based on a characteristic sequence according to claim 2, wherein: the split size of the positive sequence in step (2) is 500 bp.
4. The method for designing a primer based on a characteristic sequence according to any one of claims 1 to 3, wherein: in step (4), the lengths of the upstream primer, the downstream primer and the probe are 18-25 bp, the GC content is controlled between 40% and 60%, and the Tm range is set to 57-60 ℃.
5. The method for designing a primer based on a characteristic sequence according to claim 4, wherein: in step (4), the lengths of the upstream primer, the downstream primer and the probe are 20 bp, and the Tm value is 60 ℃.
6. A system for designing primers based on a characteristic sequence, comprising a memory, a processor coupled to the memory, and a computer program stored on the memory and executable on the processor, wherein: the processor, when executing the computer program, performs the method of designing a primer according to any one of claims 1 to 5.
7. The system for designing primers based on characteristic sequences of claim 6, wherein said computer program is written in the Python language.
Background
For a long time, the cause of a patient's illness has usually been diagnosed on the basis of clinical characteristics, and therapeutic agents have likewise been selected according to clinical features. However, for some complex symptoms, judgment based on clinical features alone is highly inaccurate. PCR was invented by chemists in the 1980s, and DNA amplification has since become a foundation of biological research, as it remains today. To better determine the cause of disease, techniques such as qPCR (quantitative PCR) and dPCR (digital PCR) have been developed on the basis of PCR.
At present, Primer Premier, developed by the Premier company of Canada, is the most widely used primer design software, and related tools such as Primer-BLAST, Primer3Plus and Premier 5 are used by a large number of bioinformatics workers for primer design. The design process starts with obtaining the target sequence; the primer design result is then obtained from the supplied target sequence after suitable primer design parameters have been selected.
The existing analysis technology has the following defects:
1. the target sequence must be acquired manually;
2. position information errors are easily introduced while acquiring the target sequence;
3. primers cannot be designed in batches;
4. the design results are fragmented, so designing many primers is time-consuming and laborious;
5. the resulting primer sequences still have to be aligned manually against the NCBI database.
Disclosure of Invention
The invention aims to provide a method for designing a primer based on a characteristic sequence, which can automatically acquire a target sequence, record the position of the target sequence and the position of the primer, and complete the primer design from genome to target sequence to primer fragment in one step.
The technical scheme adopted by the invention is as follows: a method for designing a primer based on a characteristic sequence, characterized by comprising the following steps:
(1) downloading the required genome sequences from the Internet (from the official NCBI website or by other means), wherein the target sequences are positive sequences and other homologous sequences are negative sequences, the positive sequences are combined into a FASTA file named positives.fa, and the negative sequences are combined into a FASTA file named negatives.fa;
(2) dividing the positive sequence into fragments suitable for comparison, and recording the division positions;
(3) using BLAST to build a database from the negative sequences, then aligning the fragmented positive sequences against the negative sequences for screening to obtain specific sequences with low identity, and dividing the specific sequences into two levels, wherein the first level consists of specific sequences whose aligned length is less than 300 bp and whose identity over the aligned region is less than 70%, and the second level consists of specific sequences whose aligned length is less than 200 bp although the identity over the aligned region is more than 70%;
(4) using primer3 to design primers on the secondary specific sequence obtained in step (3), to obtain the sequences and positions of an upstream primer, a downstream primer and a probe;
(5) aligning the primers and probes obtained in step (4) against the negative sequence database built in step (3) using BLAST alignment software, retaining the primers and probes that produce no alignment, and then aligning the retained primers and probes against the NCBI nt database to obtain specific primers and probes;
(6) collating the results of step (5), and outputting the upstream and downstream primers, the probes and the sequence lengths.
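As an illustration of the two-level screening in step (3), the following is a minimal sketch that assumes the alignment was written in BLAST tabular format (-outfmt 6). The function names are hypothetical. Two choices are assumptions not stated in the text: a fragment is judged by its highest-bitscore hit, and an identity of exactly 70% is treated as second level. Fragments with no hit at all do not appear in the tabular output, and how they are handled is not specified here.

```python
from typing import Dict, Optional

# Column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def classify_hit(pident: float, length: int) -> Optional[int]:
    """Return 1 or 2 for the two specificity levels, or None if the hit fails both."""
    if pident < 70.0 and length < 300:
        return 1
    if pident >= 70.0 and length < 200:  # assumption: exactly 70% counts as level 2
        return 2
    return None


def classify_fragments(tabular_text: str) -> Dict[str, Optional[int]]:
    """Classify each query fragment by its highest-bitscore hit (an assumption)."""
    best: Dict[str, Dict[str, str]] = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.split("\t")))
        query = row["qseqid"]
        if query not in best or float(row["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = row
    return {q: classify_hit(float(r["pident"]), int(r["length"]))
            for q, r in best.items()}


# Toy alignment rows, invented purely for illustration.
example = "\n".join([
    "frag1\tneg1\t65.0\t250\t80\t4\t1\t250\t10\t259\t1e-20\t90.0",
    "frag2\tneg1\t85.0\t150\t20\t2\t1\t150\t10\t159\t1e-30\t120.0",
    "frag3\tneg2\t95.0\t480\t24\t0\t1\t480\t10\t489\t0.0\t800.0",
])
levels = classify_fragments(example)
print(levels)
```

In this toy input, frag1 falls in the first level, frag2 in the second level, and frag3 is rejected because it aligns over a long region with high identity.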
Preferably, the split size of the positive sequence in step (2) is 300 bp to 1000 bp, and the split size should be kept the same throughout a given primer design run.
Preferably, the split size of the positive sequence in step (2) is 500 bp.
Preferably, the lengths of the upstream primer, the downstream primer and the probe in step (4) are 18-25 bp, the GC content is controlled between 40% and 60%, and the Tm range is set to 57-60 ℃.
Preferably, the lengths of the upstream primer, the downstream primer and the probe in step (4) are 20 bp, and the Tm value is 60 ℃.
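The splitting and position recording of step (2) can be sketched as follows. This is a minimal illustration, not the program of the invention: the names Fragment and split_sequence are hypothetical, coordinates are assumed to be 0-based with an exclusive end, and the 250 bp step is borrowed from Example 1 as a default.

```python
from typing import Iterator, NamedTuple


class Fragment(NamedTuple):
    seq_id: str  # identifier of the source positive sequence
    start: int   # 0-based start of the window in the source sequence
    end: int     # 0-based exclusive end of the window
    seq: str     # the fragment itself


def split_sequence(seq_id: str, seq: str,
                   window: int = 500, step: int = 250) -> Iterator[Fragment]:
    """Yield overlapping windows of `seq`, recording where each one came from."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    start = 0
    while start < len(seq):
        end = min(start + window, len(seq))
        yield Fragment(seq_id, start, end, seq[start:end])
        if end == len(seq):
            break  # the last window reached the end of the sequence
        start += step


# A 1200 bp toy sequence, invented purely for illustration.
frags = list(split_sequence("toy_contig", "ACGT" * 300))
for f in frags:
    print(f"{f.seq_id}:{f.start}-{f.end} ({len(f.seq)} bp)")
```

Because each fragment carries its own start and end, a primer position reported relative to a fragment can later be mapped back to the genome by adding the fragment's start offset.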
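One way the preferred ranges could be passed to Primer3 is through its Boulder-IO input format, in which each line is TAG=VALUE and a line containing only "=" ends the record. The sketch below only builds the text of such a record. The function name primer3_record is hypothetical, and applying the same ranges to the probe through the PRIMER_INTERNAL tags is an assumption based on the text giving one set of ranges for primers and probe alike.

```python
def primer3_record(seq_id: str, template: str) -> str:
    """Build one Boulder-IO record for primer3_core from the preferred ranges."""
    tags = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        "PRIMER_TASK": "generic",
        "PRIMER_PICK_LEFT_PRIMER": 1,     # upstream primer
        "PRIMER_PICK_INTERNAL_OLIGO": 1,  # probe
        "PRIMER_PICK_RIGHT_PRIMER": 1,    # downstream primer
    }
    # Same ranges for the primers ("PRIMER") and the probe ("PRIMER_INTERNAL").
    for prefix in ("PRIMER", "PRIMER_INTERNAL"):
        tags.update({
            f"{prefix}_MIN_SIZE": 18,
            f"{prefix}_OPT_SIZE": 20,
            f"{prefix}_MAX_SIZE": 25,
            f"{prefix}_MIN_TM": 57.0,
            f"{prefix}_OPT_TM": 60.0,
            f"{prefix}_MAX_TM": 60.0,
            f"{prefix}_MIN_GC": 40.0,
            f"{prefix}_MAX_GC": 60.0,
        })
    lines = [f"{tag}={value}" for tag, value in tags.items()]
    return "\n".join(lines) + "\n=\n"  # a lone "=" terminates the record


# Toy template, invented purely for illustration.
record = primer3_record("frag2", "ACGT" * 50)
print(record)
```

The resulting text could be supplied to the primer3_core program on standard input, one record per specific sequence, which is one way batch design could be arranged.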
The invention also discloses a system for designing primers based on the characteristic sequences, which comprises a memory, a processor connected with the memory, and a computer program stored on the memory and capable of running on the processor, and is characterized in that: the processor executes the method for designing the primer when running the computer program.
Preferably, the computer program is written in the Python language.
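Since BLAST is an external program, a Python implementation would presumably call it as a subprocess. The following sketch only assembles the argument lists for the database-building and alignment steps and does not run them. The helper names and the file names negatives_db, fragments.fa and hits.tsv are hypothetical, and the use of the blastn-short task for primer-length queries is an assumption rather than something stated in the text.

```python
from typing import List


def makeblastdb_cmd(fasta: str, db_name: str) -> List[str]:
    """Argument list that builds a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query: str, db_name: str, out_path: str,
               short_query: bool = False) -> List[str]:
    """Argument list that aligns `query` against `db_name` in tabular format."""
    cmd = ["blastn", "-query", query, "-db", db_name,
           "-outfmt", "6", "-out", out_path]
    if short_query:
        # Assumption: primers and probes are short, so use the short-query task.
        cmd += ["-task", "blastn-short"]
    return cmd


build = makeblastdb_cmd("negatives.fa", "negatives_db")
screen = blastn_cmd("fragments.fa", "negatives_db", "hits.tsv")
print(" ".join(build))
print(" ".join(screen))
```

Each list could then be passed to subprocess.run(cmd, check=True) on a machine where the BLAST+ programs are installed.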
The invention has the following beneficial effects:
1. the invention provides a function for automatically screening out specific sequences from homologous microorganisms, which solves the problem that the target sequence is difficult to obtain in actual operation;
2. the position of the primer fragment in the gene can be automatically recorded, and the function of designing the primer sequence in batches is provided;
3. all steps of primer design are realized at one time, and all primer fragments meeting the specification are obtained;
4. provides a recommended sequence for the user, and reduces the trouble of the user in selecting the final primer.
Drawings
FIG. 1 shows a positive sequence fragment.
FIG. 2 shows the first-level specific sequences.
FIG. 3 shows the second-level specific sequences.
FIG. 4 shows the sequence of the upstream primer.
FIG. 5 shows the sequence of the downstream primer.
FIG. 6 shows a probe sequence.
FIG. 7 shows the primer and probe sequences outputted after sorting.
FIG. 8 is a flow chart of the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description of the embodiments, structures, features and effects according to the present invention will be made with reference to the accompanying drawings and preferred embodiments.
Example 1
Taking Cryptococcus neoformans as an example, the method for designing the primer based on the characteristic sequence comprises the following steps:
(1) downloading genome sequences of Cryptococcus from the official NCBI website or by other means, and collating them into positive and negative sequences, wherein the Cryptococcus neoformans sequences are compiled into positives.fa and the sequences of other Cryptococcus species are compiled into negatives.fa;
(2) dividing the positive sequence into fragments suitable for alignment and recording the split positions; as shown in FIG. 1, the positive sequence is divided into fragments 500 bp long by a sliding-window method (with a step of 250 bp);
(3) using BLAST to build a database from the negative sequences, then aligning the fragmented positive sequences against the negative sequences for screening to obtain specific sequences with low identity, and dividing the specific sequences into two levels, wherein the first level consists of specific sequences with identity lower than 70% and aligned length less than 300 bp (shown in FIG. 2), and the second level consists of specific sequences with identity higher than 70% and aligned length less than 200 bp (shown in FIG. 3);
(4) using primer3 to design primers on the specific sequences obtained in step (3) to obtain the sequences and positions of the upstream primer, the downstream primer and the probe, the primer design following these principles: 1. the length of the primers and the probe is set to 18-25 bp; 2. the Tm range is 57-60 ℃; 3. the GC content is 40-60%; 4. the bases are distributed as evenly as possible within the primer; 5. the primers do not form complementary sequences with one another; 6. a single strand of the amplification product cannot form a secondary structure;
(5) aligning the primers and probes obtained in step (4) against the negative sequence database built in step (3) using BLAST alignment software, retaining the primers and probes that produce no alignment, and then aligning the retained primers and probes against the NCBI nt database to obtain specific primers and probes, as shown in FIGS. 4-6;
(6) collating the results of step (5), and outputting the upstream and downstream primers, the probes and the sequence lengths, as shown in FIG. 7.
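Steps (5) and (6) can be sketched in the same spirit. The code below is only an illustration under stated assumptions: the BLAST result is in tabular format (-outfmt 6), each oligo was submitted under an identifier of the hypothetical form <set_id>_F, <set_id>_R or <set_id>_P, and a whole primer and probe set is discarded if any of its three oligos produced a hit. The column names of the output table are likewise invented, and the sequences are placeholders, not designed primers.

```python
import csv
import io
from typing import Dict, Iterable, List, Set


def ids_with_hits(tabular_text: str) -> Set[str]:
    """Collect query IDs that appear in BLAST tabular output (-outfmt 6)."""
    return {line.split("\t")[0]
            for line in tabular_text.splitlines() if line.strip()}


def specific_sets(candidates: Iterable[Dict[str, str]],
                  hit_ids: Set[str]) -> List[Dict[str, str]]:
    """Keep a primer and probe set only if none of its three oligos had a hit."""
    kept = []
    for cand in candidates:
        oligo_ids = {f"{cand['set_id']}_{role}" for role in ("F", "R", "P")}
        if not oligo_ids & hit_ids:
            kept.append(cand)
    return kept


def to_table(sets: List[Dict[str, str]]) -> str:
    """Collate the surviving sets into a tab-separated table."""
    fields = ["set_id", "forward", "reverse", "probe", "product_length"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(sets)
    return buf.getvalue()


# Placeholder candidates and one toy hit row, invented purely for illustration.
candidates = [
    {"set_id": "s1", "forward": "ACGTACGTACGTACGTACGT", "reverse": "TGCATGCATGCATGCATGCA",
     "probe": "GATCGATCGATCGATCGATC", "product_length": "120"},
    {"set_id": "s2", "forward": "CATGCATGCATGCATGCATG", "reverse": "GTACGTACGTACGTACGTAC",
     "probe": "CTAGCTAGCTAGCTAGCTAG", "product_length": "150"},
]
hits = "s2_R\tneg1\t100.0\t20\t0\t0\t1\t20\t500\t519\t0.01\t40.1"
kept = specific_sets(candidates, ids_with_hits(hits))
table = to_table(kept)
print(table)
```

The same filter could be applied a second time to the result of the alignment against the NCBI nt database before the table is written.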
Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.