Fusion protein, base editing tool and application thereof

Document No. 3296; publication date: 2021-09-17

1. A fusion protein comprising an SsiCas9n polypeptide, wherein the SsiCas9n polypeptide is:

(a) amino acids 2-1122 of the SsiCas9D9A nickase; or

(b) An amino acid sequence shown as SEQ ID NO. 1; or

(c) An amino acid sequence having a sequence identity of 90% or more to the amino acid sequence shown in SEQ ID NO.1, and having the functions of the amino acid sequence defined in (a).

2. The fusion protein of claim 1, further comprising a deaminase ancAPOBEC1 polypeptide having an amino acid sequence of:

(d) an amino acid sequence shown as SEQ ID NO. 3; or

(e) An amino acid sequence having a sequence identity of 90% or more to the amino acid sequence shown in SEQ ID NO.3, and having a deaminase function possessed by the amino acid or coding DNA sequence defined in (d).

3. The fusion protein of claim 1, further comprising a uracil glycosylase inhibitor having an amino acid sequence:

(f) an amino acid sequence shown as SEQ ID NO. 4; or

(g) An amino acid sequence having more than 90% sequence identity with the amino acid sequence shown in SEQ ID No.4, and having the function of the uracil glycosylase inhibitor possessed by the amino acid or coding DNA sequence defined in (f).

4. The fusion protein of claim 1, wherein the fusion protein further comprises a nuclear localization signal peptide, wherein the nuclear localization signal polypeptide fragment is preferably located at the N-terminus and/or the C-terminus of the fusion protein, and the amino acid sequence of the nuclear localization signal polypeptide fragment is preferably as shown in SEQ ID No. 9.

5. The fusion protein of any one of claims 1 to 4, wherein the fusion protein comprises a nuclear localization signal polypeptide, the deaminase of claim 2, a first linker, the SsiCas9n polypeptide of claim 1, a second linker, the inhibitor of claim 3, and a nuclear localization signal polypeptide, and wherein the amino acid sequence of the fusion protein is preferably:

(h) the amino acid sequence shown as SEQ ID NO. 5; or

(i) An amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.5 and having the function of the amino acid sequence defined in (h), preferably having a cytosine deaminase function, more preferably having a cytosine base editor function, and more preferably being capable of recognizing NHAAAA as PAM.

6. A gene encoding the fusion protein according to any one of claims 1 to 5.

7. A composition comprising a gRNA and the fusion protein of any one of claims 1 to 5,

wherein the gRNA is a chimeric non-naturally occurring guide polynucleotide;

the gRNA/Cas complex is capable of recognizing and binding the target sequence and of nicking, unwinding or cleaving it, in whole or in part.

8. A recombinant vector, recombinant bacterium or cell line comprising the gene of claim 6.

9. Use of the fusion protein according to any one of claims 1 to 5 or the gene according to claim 6 or the composition according to claim 7 or the recombinant vector, recombinant bacterium or cell line according to claim 8 for gene editing.

10. A method for gene editing, comprising in vivo or in vitro gene editing using the fusion protein of any one of claims 1 to 5, the gene of claim 6, the composition of claim 7, or the recombinant vector, recombinant bacterium, or cell line of claim 8.

Background

The CRISPR/Cas9 system is a natural defense system that bacteria use against phage DNA injection and plasmid transfer. Since its discovery it has been widely developed into DNA editing systems and platforms that rely on guide RNA (gRNA) targeting, used mainly for targeted genome editing, transcriptional regulation, epigenetic editing, and the like. The main principle of action is that the tracrRNA portion of the gRNA recruits the Cas9 protein, and gRNA binding switches Cas9 from an inactive conformation to one capable of recognizing DNA. In the classic CRISPR/Cas9 system, the first 20 bases of the crRNA give Cas9 its target-sequence specificity. The gRNA-Cas9 complex searches the DNA for the protospacer adjacent motif (PAM) recognized by the Cas9 protein; the PAM of the classic SpCas9 is NGG. Once a PAM site is recognized, Cas9 partially melts the DNA, and the gRNA invades and pairs with the DNA to form an RNA-DNA hybrid. When the gRNA is fully complementary to the target DNA, the HNH domain of Cas9 adopts a stable, active conformation and cleaves the target strand. At the same time, a larger conformational change brings the non-target strand into the RuvC domain, which cleaves it [1]. D10 in the RuvC domain and H840 in the HNH domain are critical to the cleavage activity of their respective domains: introducing either the D10A or the H840A mutation turns Cas9 into a Cas9 nickase (Cas9n) with only single-strand cleavage activity, and introducing both mutations yields dCas9, which retains targeted DNA binding activity but has no endonuclease activity.

Based on Cas9n and dCas9, a series of genome and epigenome editing tools have been developed. The basic strategy is to attach catalytic enzymes or epigenetic factors with specific functions to the ends of Cas9n or dCas9 and to use their gRNA-guided targeting activity to deliver these factors to specific genomic sites, thereby achieving site-specific gene editing, epigenetic editing, transcriptional activation or repression, and the like. One of the most classical site-directed editing tools is the single base editor (base editor): a DNA deaminase is linked to the N-terminus of the Cas9n protein and carried by Cas9n to the target DNA sequence under gRNA guidance, where it deaminates a specific nucleotide; the nicking activity of Cas9n (D10A) then makes a single-strand nick on the strand complementary to the deaminated strand, and precise base replacement is completed by base repair and DNA replication. The first cytosine base editor (CBE) was reported by the David Liu laboratory at Harvard University, which fused rat APOBEC1 cytosine deaminase to the dCas9 protein. To raise editing efficiency, they also fused the uracil DNA glycosylase inhibitor protein UGI to the construct, which keeps cells from converting uracil back to cytosine. So that cells would preferentially use the deaminated DNA strand as the repair template, the David Liu laboratory further replaced dCas9 with a Cas9n that cleaves only the strand complementary to the deaminated strand, which greatly increased CBE editing efficiency and enabled efficient replacement of C/G by T/A (C/G-to-T/A) [2]. The David Liu laboratory subsequently invented the adenine base editor (ABE), which replaces A/T with G/C at the target site (A/T-to-G/C) [3]. That editor uses TadA*, a variant capable of deaminating DNA adenine obtained by directed evolution of the RNA adenine deaminase TadA; fusing a TadA-TadA* dimer to the Cas9n protein yielded ABE7.10, which has high adenine editing activity [4].

Since then, many laboratories have modified and optimized base editors, including by combining and optimizing different deaminases and Cas9 proteins, producing editors of different types and characteristics and greatly improving editing efficiency and editing range. The most important of these is the fourth-generation base editor ancBE4max invented by the David Liu laboratory, which greatly improves product purity and editing efficiency by replacing rat APOBEC1 with ancAPOBEC1, fusing two UGIs, lengthening the linkers between APOBEC1 and Cas9n and between Cas9n and UGI, optimizing the nuclear localization signal (NLS), and so on [5]. The PAM recognized by ancBE4max is NGG, the corresponding editing window is positions 4-8 from the 5' end of the gRNA range, and its Cas9n is derived from Streptococcus pyogenes (SpCas9; 1369 amino acids in total). However, the editing window and the PAM restriction of ancBE4max (which mainly recognizes NGG PAMs) greatly limit the range of the genome that can be targeted.

To address this, scientists combined deaminases with a series of SpCas9 mutants obtained by protein engineering and directed evolution, yielding base editors with various targeting properties and PAM specificities. These include xCas9 [6] and SpCas9-NG [7], which recognize NGN, and the Cas9 variant SpRY, which has almost no PAM restriction [8]. Researchers have also combined deaminases with Cas9 homologues from different species, e.g. Nme2Cas9 [9], SaCas9 [10], St1Cas9 [11], xCas9 [12] and the like, thereby obtaining novel editors with different editing characteristics, targeting-sequence lengths, recognition windows, and so on.

The editing windows of classical SpCas9-based editors are mainly positions 4-8, and all of these editors show PAM preferences or low targeting efficiency at some sites. Moreover, the expression plasmid of the classical base editor far exceeds the packaging capacity of adeno-associated virus (AAV), which hinders clinical research and application. Developing novel base editors with different editing windows, different PAM recognition and smaller expression plasmids is therefore key to current gene editing research and clinical application.

References:

1. Jiang, F. and J.A. Doudna, CRISPR-Cas9 Structures and Mechanisms. Annu Rev Biophys, 2017. 46: p. 505-529.

2. Komor, A.C., et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 2016. 533(7603): p. 420-424.

3. Gaudelli, N.M., et al., Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature, 2017. 551(7681): p. 464-471.

4. Gaudelli, N.M., et al., Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature, 2017. 551: p. 464-471.

5. Koblan, L.W., et al., Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nature Biotechnology, 2018. 36.

6. Hu, J.H., et al., Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 2018.

7. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science (New York, N.Y.), 2018. 361(6408): p. 1259.

8. Walton, R.T., et al., Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science. 368.

9. Edraki, A., et al., A Compact, High-Accuracy Cas9 with a Dinucleotide PAM for In Vivo Genome Editing. Mol Cell, 2019. 73(4): p. 714-726.e4.

10. Nishimasu, H., et al., Crystal Structure of Staphylococcus aureus Cas9. Cell, 2015. 162(5): p. 1113-26.

11. Zhang, Y., et al., Catalytic-state structure and engineering of Streptococcus thermophilus Cas9. Nature Catalysis, 2020. 3(10): p. 813-823.

12. Hu, J.H., et al., Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 2018. 556(7699): p. 57-63.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a novel cytosine base editor that recognizes NHAAAA as its PAM (protospacer adjacent motif) sequence and has a different editing window, thereby widening the targeting range of single base editors.

The technical scheme adopted by the invention is as follows:

In a first aspect of the invention, a fusion protein is provided, the fusion protein comprising an SsiCas9n polypeptide, the amino acid sequence of the SsiCas9n polypeptide being:

(a) amino acids 2-1122 of the SsiCas9D9A nickase; or

(b) An amino acid sequence shown as SEQ ID NO. 1; or

(c) An amino acid sequence having a sequence identity of 90% or more to the amino acid sequence shown in SEQ ID NO.1, and having the functions of the amino acid sequence defined in (a).

In some preferred embodiments of the invention, the SsiCas9n polypeptide is capable of recognizing NHAAAA as PAM, where N represents any base.

In some preferred embodiments of the invention, the amino acid sequence of the SsiCas9n polypeptide is capable of causing single-stranded DNA cleavage at the complementary strand of the targeting sequence as a Cas9 nickase.

In some embodiments of the invention, the fusion protein further comprises a deaminase ancAPOBEC1 polypeptide having an amino acid sequence of:

(d) an amino acid sequence shown as SEQ ID NO. 3; or

(e) An amino acid sequence having a sequence identity of 90% or more to the amino acid sequence shown in SEQ ID NO.3, and having the function defined in (d), preferably having a cytosine deaminase function.

In some embodiments of the invention, the fusion protein further comprises a Uracil Glycosylase Inhibitor (UGI) having an amino acid sequence of:

(f) an amino acid sequence shown as SEQ ID NO. 4; or

(g) An amino acid sequence having more than 90% sequence identity with the amino acid sequence shown in SEQ ID No.4, and having the amino acid function defined in (f), preferably having a uracil DNA glycosylase inhibitor function.

In some embodiments of the invention, the fusion protein further comprises a nuclear localization signal peptide, preferably, the nuclear localization signal polypeptide fragment is located at the N-terminal and/or C-terminal of the fusion protein, and the amino acid sequence of the nuclear localization signal polypeptide fragment is shown as SEQ ID No. 9.

In some embodiments of the invention, the fusion protein comprises a nuclear localization signal polypeptide, the deaminase described above, a first linker, the SsiCas9n polypeptide described above, a second linker, the uracil glycosylase inhibitor described above, and a nuclear localization signal polypeptide.

In some embodiments of the invention, the fusion protein comprises, in order from the N-terminus to the C-terminus, a BPNLS, an ancAPOBEC1 polypeptide fragment, a first linker, a polypeptide fragment consisting of amino acids 2-1122 from the N-terminus of the SsiCas9D9A nickase, a second linker, two UGI polypeptides, and a BPNLS polypeptide sequence.

In some embodiments of the present invention, the first linker is preferably a 32aa linker, and the second linker is preferably a 10aa linker.

In some preferred embodiments of the invention, the amino acid sequence of the fusion protein is:

(h) the amino acid sequence shown as SEQ ID NO. 5; or

(i) An amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.5 and having the function of the amino acid sequence defined in (h), preferably having a cytosine deaminase function, more preferably having a cytosine base editor function, and more preferably being capable of recognizing NHAAAA as PAM.

The invention also provides a nucleic acid molecule capable of encoding the SsiCas9D9A nickase of the first aspect of the invention, wherein the sequence of the nucleic acid molecule is as follows:

(j) the sequence shown as SEQ ID NO.2, which is a DNA coding sequence codon-optimized for eukaryotic expression; or

(k) a DNA coding sequence corresponding to an amino acid sequence having more than 90% sequence identity to the amino acid sequence shown in SEQ ID No.1 and having the functions defined in (a) or (j); or

(l) a DNA sequence that differs from the DNA sequence shown as SEQ ID NO.2 only by synonymous codons.

In a second aspect of the invention, there is provided a gene encoding a fusion protein according to the first aspect of the invention.

In some embodiments of the invention, the sequence of the gene is:

(m) the sequence shown as SEQ ID NO. 6; or

(n) a DNA coding sequence corresponding to an amino acid sequence having more than 90% sequence identity to the amino acid sequence shown in SEQ ID No.5, and having the function defined in (h) or (m); or

(o) a DNA sequence that differs from the DNA sequence shown in SEQ ID NO.6 only by synonymous codons.

In a third aspect of the invention, a composition is provided that includes a gRNA and a fusion protein of the first aspect of the invention,

wherein the gRNA is a chimeric non-naturally occurring guide polynucleotide;

the gRNA/Cas complex is capable of recognizing and binding the target sequence and of nicking, unwinding or cleaving it, in whole or in part.

In some preferred embodiments of the invention, the gRNA expression element consists, in order, of a U6 promoter, restriction sites for inserting the gRNA targeting sequence, an Ssi-specific scaffold, and a termination signal.

In some embodiments of the invention, the scaffold is designed according to the tandem repeat sequence of Streptococcus sinensis, and its sequence is:

(p) a DNA sequence shown as SEQ ID NO. 8; or

(q) a DNA sequence having a sequence similarity of 80% or more to SEQ ID NO.8 and having the function of the DNA sequence defined in (p).

In some preferred embodiments of the invention, the sequence of the gRNA is:

(r) a DNA sequence shown as SEQ ID NO. 7; or

(s) a DNA sequence having a sequence similarity of 80% or more to SEQ ID NO.7 and having the function of the DNA sequence defined in (r).

In some preferred embodiments of the invention, the gRNA expression vector further includes a coding sequence for an EGFP tag; more preferably, the gRNA targets a specific site.

Here, the SsiCas9 coding sequence is a eukaryotic codon-optimized coding sequence of a Cas9 protein homolog. It recognizes NHAAAA as the PAM sequence, which differs from the PAMs recognized by previously reported base editors, and the designed gRNA is 20 nt long. Ssi-ancBE4max converts base C at positions 3-12 from the 5' end of the target sequence into base T, so it can target sites that reported cytosine base editors cannot reach, expanding the targetable range of single base editors across the genome and providing more options for their application.
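The targeting rule described above (a 20 nt protospacer followed by an NHAAAA PAM, with editable cytosines at positions 3-12) can be illustrated with a short Python sketch. This is only an illustration, not part of the disclosed method: it assumes the standard IUPAC meaning of H (A, C or T), scans a single strand only, and the function name `find_sites` and the example PAM are hypothetical.

```python
import re

# IUPAC codes in the PAM: N = any base, H = A, C or T (assumed standard meaning).
PAM_REGEX = "[ACGT][ACT]AAAA"
SPACER_LEN = 20            # gRNA length stated in the text
WINDOW = range(3, 13)      # editing window: positions 3-12 from the 5' end

def find_sites(seq):
    """Scan one DNA strand for 20 nt protospacers followed by NHAAAA.

    Returns a list of (start index, protospacer, PAM, positions of C in the window),
    where positions are 1-based from the 5' end of the protospacer.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (SPACER_LEN, PAM_REGEX))
    sites = []
    for match in pattern.finditer(seq):
        spacer, pam = match.group(1), match.group(2)
        editable = [pos for pos in WINDOW if spacer[pos - 1] == "C"]
        sites.append((match.start(), spacer, pam, editable))
    return sites

# Hypothetical example: a 20 nt protospacer followed by an illustrative ATAAAA PAM.
print(find_sites("GGTGGGCAAGAGTTTCTGCCACATAAAAGG"))
```

To cover both strands, the same function could be applied to the reverse complement of the input sequence.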

In a fourth aspect of the invention there is provided a recombinant vector, recombinant bacterium or cell line comprising a gene according to the second aspect of the invention.

In some embodiments of the invention, the cell is a eukaryotic cell or a prokaryotic cell.

In some preferred embodiments of the invention, the cell is a mouse cell or a human cell.

In some preferred embodiments of the invention, the cell is a human embryonic kidney cell.

In some more preferred embodiments of the invention, the cell is an HEK293T cell.

In a fifth aspect of the invention, there is provided use of the fusion protein according to the first aspect of the invention, the gene according to the second aspect of the invention, the composition according to the third aspect of the invention, or the recombinant vector, recombinant bacterium or cell line according to the fourth aspect of the invention in gene editing.

In a sixth aspect of the present invention, there is provided a method for gene editing, in particular in vivo or in vitro gene editing using the fusion protein of the first aspect of the present invention or the gene of the second aspect of the present invention or the composition of the third aspect of the present invention or the recombinant vector, recombinant bacterium or cell line of the fourth aspect of the present invention.

The beneficial effects of the invention are as follows:

The invention provides a fusion protein (base editor) derived from Streptococcus sinensis and a novel base editing tool, in particular a novel cytosine base editor (CBE) named SsiCas9-ancBE4max, obtained by combining SsiCas9, which recognizes NHAAAA, with BE4max. The editing tool includes a scaffold sequence designed according to the tandem repeat sequence of Streptococcus sinensis, and the designed targeting gRNA is 20 nt long. The tool recognizes NHAAAA as its PAM and efficiently converts cytosine at positions 3-12 from the 5' end of the target sequence into thymine (C-to-T), widening the targeting range and the application range of base editing.

In addition, because the editing window is positions 3-12 from the 5' end and the recognized PAM is NHAAAA, the tool expands the genome targeting range of base editing and offers a further choice of tool for base editing and gene correction. The base editor also reduces the size of the expression plasmid, so that it better fits the packaging capacity of adeno-associated virus (AAV), and therefore has good prospects for gene therapy and industrial application.

Drawings

FIG. 1 is a schematic diagram of the domains of the Ssi protein.

FIG. 2 is a schematic diagram of the protein domains of Ssi-ancBE4max.

FIG. 3 is a schematic map of the plasmid structure of Ssi-ancBE4max.

FIG. 4 is a schematic map of the plasmid structure of the gRNA of the Ssi-ancBE4max system.

FIG. 5 shows the experimental results for Ssi-ancBE4max in Example 3 of the present invention, wherein A in FIG. 5 is the result of editing at Ssi2, B in FIG. 5 is the result of editing at Ssi6, C in FIG. 5 is the result of editing at Ssi8, and D in FIG. 5 is the result of editing at Ssi10.

FIG. 6 is a statistical heat map of the editing efficiency of the Ssi-ancBE4max editing system in HEK293T cells. The dashed box is the edit window schematic.

Detailed Description

The concept and technical effects of the present invention will be clearly and completely described below in conjunction with the embodiments to fully understand the objects, features and effects of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and those skilled in the art can obtain other embodiments without inventive effort based on the embodiments of the present invention, and all embodiments are within the protection scope of the present invention.

Example 1

The amino acid sequence of SsiCas9, a Cas9 protein homolog from Streptococcus sinensis, was aligned with that of SpCas9 and its functional domains were assigned (see FIG. 1). The functional site of the RuvC domain of SsiCas9 (aspartic acid at position 9, D9) was identified and mutated to alanine (A), giving the SsiCas9D9A nickase, whose amino acid sequence is shown in SEQ ID NO.1.

The prokaryotic codons of Streptococcus sinensis SsiCas9D9A were optimized for eukaryotic expression, giving a coding DNA sequence of SsiCas9D9A suitable for expression in eukaryotic cells, shown in SEQ ID NO.2. The optimized SsiCas9D9A was produced by whole-gene synthesis by a commercial company. The construction strategy was to replace the SpCas9 D10A of ancBE4max with SsiCas9D9A, starting from ancBE4max that had itself been whole-gene synthesized by a commercial company. The XTEN linker-SpCas9 D10A-10aa linker-UGI portion of ancBE4max was excised with the endonuclease BamHI and replaced by XTEN linker-SsiCas9 D9A-10aa linker-UGI, which was included, with BamHI cleavage sites at both ends, when SsiCas9D9A was synthesized by the commercial company; this sequence is shown in SEQ ID NO.10.

The plasmid ancBE4max (vector pCMV) was digested with the restriction enzyme BamHI (R0136L) in a 37 °C water bath for 2 h. The digestion system (50 μl) was: 10× buffer, 5 μl; vector, 5 μg; BamHI, 3 μl; ddH2O to 50 μl. Complete digestion was confirmed by gel electrophoresis, after which the linearized vector was purified with a clean-up kit (AxyPrep PCR clean kit) and eluted with 15 μl ddH2O. The synthesized XTEN linker-SsiCas9 D9A-10aa linker-UGI was amplified by PCR to introduce protective bases outside the restriction sites at both ends, using PCR primers synthesized by Jinzhi Biotechnology limited with the following sequences:

Ssi PCR for:5’-agcggaggatcctctggcagcgagacacca-3’(SEQ ID NO.11);

Ssi PCR rev:5’-cctccggatcctccgctcagcatcttgatctta-3’(SEQ ID NO.12).

The fragment was amplified by PCR and purified with a clean-up kit (AxyPrep PCR clean kit). The purified PCR product was digested with BamHI, using the digestion system described above.

The purified XTEN linker-SsiCas9 D9A-10aa linker-UGI was ligated with the BamHI-linearized vector pCMV_ancBE4max to obtain the primary ligation product. The ligation system (10 μl) was: purified linearized vector pCMV_ancBE4max, 1 μl (50 ng); BamHI digestion product of XTEN linker-SsiCas9 D9A-10aa linker-UGI, 1 μl (100 ng); T4 DNA Ligase Buffer, 1 μl; T4 DNA Ligase, 1 μl; ddH2O, 6 μl. Ligation was carried out at 16 °C for 2 h. The ligation product was transformed and plated, and single colonies were picked, grown with shaking, and sequenced to identify correct clones. The protein and DNA sequences of the constructed SsiCas9-ancBE4max are shown in SEQ ID NO.5 and SEQ ID NO.6, respectively. From the N-terminus to the C-terminus, the protein is a sequential fusion of BPNLS, an ancAPOBEC1 polypeptide fragment, a 32aa linker, a polypeptide fragment consisting of amino acids 2-1122 from the N-terminus of the SsiCas9D9A nickase, a 10aa linker, two UGI polypeptides, and a BPNLS polypeptide sequence. The amino acid sequence of the BPNLS nuclear localization signal polypeptide fragment is shown in SEQ ID NO.9, that of the ancAPOBEC1 polypeptide in SEQ ID NO.3, and that of the UGI polypeptide in SEQ ID NO.4.

The schematic diagram of the successfully constructed plasmid domain is shown in FIG. 2, and the map of the plasmid structure is shown in FIG. 3.

Positively identified single clones were expanded in liquid culture, plasmids were extracted following the kit protocol (TIANGEN: TIANPure Midi Plasmid Kit), and the concentration was measured to ensure a sufficient amount for transfection, free of contaminants such as salts and proteins.

Example 2

2.1 Vector construction of the SsiCas9-ancBE4max system gRNA plasmid

pGL3-U6-sgRNA (Addgene #51133) was used as the expression backbone to construct a gRNA expression vector for the SsiCas9 editing system. A scaffold sequence suited to SsiCas9 was designed according to the tandem repeat sequence of Streptococcus sinensis, and the SpCas9 scaffold of pGL3-U6-sgRNA (Addgene #51133) was replaced with this SsiCas9 gRNA scaffold. The complete, successfully constructed plasmid, named pGL3-U6-Ssi, is shown in SEQ ID NO.7, and a schematic of its structure is shown in FIG. 4. The targeting gRNA sequence is inserted at two BsaI restriction sites. The plasmid was produced by whole-gene synthesis by a commercial company.

2.2 Construction of the SsiCas9-ancBE4max system targeting gRNA plasmid

gRNAs were designed, and two complementary oligos were synthesized for each. The upstream oligo is 5'-accg-20nt-3' and the downstream oligo is 5'-aaac-20nt-3', where the downstream 20 nt is the reverse complement of the upstream 20 nt, and the upstream 20 nt is the sequence immediately 5' of NHAAAA (20nt-NHAAAA) on the DNA strand carrying the PAM. The synthesized upstream and downstream oligos were annealed with the following program: 95 °C for 5 min; 95 °C to 85 °C at -2 °C/s; 85 °C to 25 °C at -0.1 °C/s; hold at 4 °C. The annealed product was ligated into the pGL3-U6-Ssi gRNA vector linearized with BsaI (NEB: R0539L).
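The oligo design rule can be sketched in Python as follows. This is a minimal illustration assuming only the rule stated above (an ACCG overhang on the upstream oligo, an AAAC overhang on the downstream oligo, and a reverse-complemented 20 nt spacer); the function names are hypothetical, and the check uses the sgSsi-1 pair listed in Table 1.

```python
# Translation table for complementing DNA bases (case preserved).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def design_oligos(spacer):
    """Build the upstream/downstream oligo pair for a 20 nt spacer.

    The spacer is the 20 nt immediately 5' of the NHAAAA PAM on the PAM strand.
    """
    spacer = spacer.lower()
    if len(spacer) != 20 or set(spacer) - set("acgt"):
        raise ValueError("spacer must be exactly 20 nt of A/C/G/T")
    upstream = "ACCG" + spacer                        # 5'-accg-20nt-3'
    downstream = "AAAC" + reverse_complement(spacer)  # 5'-aaac-(reverse complement)-3'
    return upstream, downstream

# Reproduces the sgSsi-1 pair from Table 1 (SEQ ID NO.13 and SEQ ID NO.14).
print(design_oligos("tgggcaagagtttctgccac"))
```

Once annealed, the two oligos leave 4 nt single-stranded overhangs (ACCG and AAAC) for ligation into the BsaI-linearized vector.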

The linearization digestion system was: pGL3-U6-Ssi gRNA vector, 2 μg; buffer (NEB: R0539L), 6 μL; BsaI, 2 μL; ddH2O to 60 μL. Digestion was carried out overnight at 37 °C. The ligation system was: T4 ligation buffer (NEB: M0202L), 1 μL; linearized vector, 20 ng; annealed oligo fragment (10 μM), 5 μL; T4 DNA ligase (NEB: M0202L), 0.5 μL; ddH2O to 10 μL. Ligation was carried out overnight at 16 °C. The ligated vector was transformed, and clones were picked and identified. Positive clones were expanded, the plasmid was extracted (Axygen: AP-MN-P-250G), and the concentration was determined.

Human endogenous genes including EMX1, RUNX1, DNMT1, AARSD1, GMPR2, ABCD3 and NFYB were selected, 10 gRNAs were designed in total, and 20 oligos were synthesized; the sequences are shown in Table 1.

TABLE 1 Oligos sequences

sgSsi-1 for 5’-ACCGtgggcaagagtttctgccac-3’(SEQ ID NO.13)
sgSsi-1 rev 5’-AAACgtggcagaaactcttgccca-3’(SEQ ID NO.14)
sgSsi-2 for 5’-ACCGctgcgttcctagaaccacag-3’(SEQ ID NO.15)
sgSsi-2 rev 5’-AAACctgtggttctaggaacgcag-3’(SEQ ID NO.16)
sgSsi-3 for 5’-ACCGaatgctggctacagatgtcc-3’(SEQ ID NO.17)
sgSsi-3 rev 5’-AAACggacatctgtagccagcatt-3’(SEQ ID NO.18)
sgSsi-4 for 5’-ACCGctcatatgtcacttacctct-3’(SEQ ID NO.19)
sgSsi-4 rev 5’-AAACagaggtaagtgacatatgag-3’(SEQ ID NO.20)
sgSsi-5 for 5’-ACCGgagacaggatctcactgtgt-3’(SEQ ID NO.21)
sgSsi-5 rev 5’-AAACacacagtgagatcctgtctc-3’(SEQ ID NO.22)
sgSsi-6 for 5’-ACCGtgctctaggtggtgttaatg-3’(SEQ ID NO.23)
sgSsi-6 rev 5’-AAACcattaacaccacctagagca-3’(SEQ ID NO.24)
sgSsi-7 for 5’-ACCGcagcaacatgaacaactgaa-3’(SEQ ID NO.25)
sgSsi-7 rev 5’-AAACttcagttgttcatgttgctg-3’(SEQ ID NO.26)
sgSsi-8 for 5’-ACCGaagagccaagtcttactgta-3’(SEQ ID NO.27)
sgSsi-8 rev 5’-AAACtacagtaagacttggctctt-3’(SEQ ID NO.28)
sgSsi-9 for 5’-ACCGctgacaagtactagcttatg-3’(SEQ ID NO.29)
sgSsi-9 rev 5’-AAACcataagctagtacttgtcag-3’(SEQ ID NO.30)
sgSsi-10 for 5’-ACCGttcctcatagcaacatcact-3’(SEQ ID NO.31)
sgSsi-10 rev 5’-AAACagtgatgttgctatgaggaa-3’(SEQ ID NO.32)

Example 3

HEK293T cells were transfected with the base editing system consisting of the SsiCas9-ancBE4max plasmid and pGL3-U6-Ssi gRNA plasmid constructed in the above example as follows:

3.1HEK293T cells (from ATCC) were recovered and cultured in 10cm dishes (Corning,430167) in DMEM (HyClone, SH30243.01) mixed with 10% fetal bovine serum (HyClone, SV 30087). The culture temperature was 37 ℃ and the carbon dioxide concentration was 5%. After multiple passages when the cell density was 90%, the cells were plated into 24-well plates.

3.2 the HEK293T cells are recovered for three generations and then the cell state is observed, the cells with good state are paved into a 24-well plate, after the paved cells are cultured for 18-24h, the cells are transfected when the cell concentration is 80%, and the dosage of each component in the transfection process is as follows: SsiCas9-ancBE4max plasmid 1 μ g, pGL3-U6-Ssi gRNA plasmid: mu.g, EZTrans transfection reagent (Liji organism) 4.5. mu.l.

3.3 the specific transfection procedure (as high efficiency version procedure of EZ Trans transfection reagent for Prunus hainanensis organisms) is:

3.3.1 configuration reagent a: for each well of cells, 1.5. mu.g of plasmid DNA (1. mu.g of SsiCas9-ancBE4max plasmid + 0.5. mu.g of pGL3-U6-Ssi gRNA plasmid) was diluted to 50. mu.l of serum-free double-antibody-free high-glucose DMEM medium (or OPTI-MEM medium) and mixed well.

3.3.2 configuration B reagent: for each well of cells, 4.5. mu.l of EZ Trans transfection reagent (EZ Trans: plasmid DNA ═ 2:1) was diluted to 50. mu.l of serum-free and diabody-free high-glucose DMEM medium (or OPTI-MEMI medium), and gently mixed. This step does not allow the dilution of plasmid and EZ Trans transfection reagents with serum-containing media, because serum contains large amounts of negatively charged proteins that can interfere with the adsorption of nucleic acids by the transfection reagents, thereby affecting transfection efficiency.

3.3.3 Reagent A and reagent B were each left to stand for 5 min; reagent B was then promptly added to reagent A and the two were mixed gently. The order of mixing must not be reversed.

3.3.4 The mixture was left to stand at room temperature for 15 min to form EZ Trans-DNA complexes. The prepared EZ Trans-DNA transfection complexes were added dropwise and evenly to the culture dish containing the cells, and the dish was gently rocked to disperse the complexes evenly.

3.3.5 The cells were cultured in an incubator at 37 °C and 5% CO2 for 4-6 h, after which the medium containing the EZ Trans-DNA complexes was removed and replaced with fresh medium, and the cells were cultured for 3 days.

3.4 After 3 days of culture, the transfected cells were dissociated with trypsin and sorted by flow cytometry to collect GFP-positive cells (top 15% by FITC fluorescence intensity). Genomic DNA was extracted from the collected cells by the phenol-chloroform method.

3.5 PCR primers were designed and synthesized 100-130 bp upstream and downstream of each selected endogenous gene target site and diluted to 10 μM with water. Each genomic target-site fragment was PCR-amplified using a Vazyme high-fidelity polymerase kit (Vazyme, p501-d2). The PCR products were gel-purified with the AxyPrep DNA gel recovery kit (Axygen, AP-GX-250G) to remove non-specific bands. The PCR primer sequences are shown in Table 2.

TABLE 2 PCR primer sequences

3.6 Amplification of the target fragments was first checked by gel electrophoresis. Successfully amplified fragments were subjected to Sanger sequencing, and the sequencing results were analyzed for the expected base point mutations (C-to-T or G-to-A) at the target sites.
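The per-well amounts given in steps 3.2 and 3.3 can be scaled to several wells when reagent A and reagent B are prepared as master mixes. The following Python sketch is only an illustration of that arithmetic: the function name `transfection_mix`, the dictionary keys and the `overage` parameter (an extra fraction to cover pipetting loss) are assumptions introduced here and are not part of the original procedure.

```python
# Per-well amounts taken from steps 3.2 and 3.3 (24-well plate format).
EDITOR_PLASMID_UG = 1.0   # SsiCas9-ancBE4max plasmid, ug per well
GRNA_PLASMID_UG = 0.5     # pGL3-U6-Ssi gRNA plasmid, ug per well
EZ_TRANS_UL = 4.5         # EZ Trans transfection reagent, ul per well
MEDIUM_UL = 50.0          # serum-free medium per well, for reagent A and for reagent B


def transfection_mix(n_wells, overage=0.1):
    """Scale the per-well amounts to n_wells.

    `overage` is a hypothetical safety margin (0.1 = 10% extra volume);
    it is an assumption of this sketch, not a value from the protocol.
    """
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    if overage < 0:
        raise ValueError("overage must not be negative")
    factor = n_wells * (1.0 + overage)
    return {
        "reagent_A": {
            "editor_plasmid_ug": round(EDITOR_PLASMID_UG * factor, 2),
            "grna_plasmid_ug": round(GRNA_PLASMID_UG * factor, 2),
            "medium_ul": round(MEDIUM_UL * factor, 2),
        },
        "reagent_B": {
            "ez_trans_ul": round(EZ_TRANS_UL * factor, 2),
            "medium_ul": round(MEDIUM_UL * factor, 2),
        },
    }


if __name__ == "__main__":
    # Example: amounts for 4 wells with 10% overage.
    for reagent, parts in transfection_mix(4).items():
        print(reagent, parts)
```

Each reagent is kept as a separate entry because step 3.3.3 requires reagent B to be added to reagent A, not the reverse.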

The sequencing results are shown in FIG. 5, in which panels A, B, C and D show the editing results at Ssi2, Ssi6, Ssi8 and Ssi10, respectively. In the left part of each panel, the first column is a schematic representation of the target DNA sequence and the second column is the PAM sequence; the right part is a statistical chart of the C-to-T editing efficiency at each position within the gRNA range for the corresponding target site. As can be seen from FIG. 5, the gene editing tool SsiCas9-ancBE4max obtained in Example 1 produces efficient C-to-T conversion at all four sites. Furthermore, a total of 10 endogenous human genomic sites were tested in HEK293T cells (see FIG. 6). SsiCas9-ancBE4max produced efficient C-to-T conversion at all of them, and the editing window was mainly at positions 3-12 within the gRNA sequence, which broadens the targeting range of the base editor.
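The position-wise C-to-T efficiency summarized above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the original: the input is a set of individual reads (for example, sequenced clones) that cover the protospacer exactly and contain no indels, whereas the original work analyzed Sanger traces; positions are numbered from 1 at the 5' (PAM-distal) end of the protospacer; and the function names and the toy 10-nt sequences are hypothetical. If the sequenced strand is the one opposite to the protospacer, the same edit appears as G-to-A, so the reads must first be oriented like the protospacer.

```python
from collections import Counter
from typing import Dict, Sequence


def c_to_t_efficiency(protospacer: str, reads: Sequence[str]) -> Dict[int, float]:
    """Return {position: fraction of reads showing T} for every C in the protospacer.

    Positions are 1-based and counted from the 5' end of the protospacer
    (an assumed numbering convention).
    """
    ref = protospacer.upper()
    edited = Counter()
    usable = 0
    for read in reads:
        read = read.upper()
        if len(read) != len(ref):
            continue  # skip reads that do not span the protospacer exactly
        usable += 1
        for pos, (ref_base, read_base) in enumerate(zip(ref, read), start=1):
            if ref_base == "C" and read_base == "T":
                edited[pos] += 1
    if usable == 0:
        raise ValueError("no usable reads")
    return {
        pos: edited[pos] / usable
        for pos, base in enumerate(ref, start=1)
        if base == "C"
    }


def in_editing_window(position: int, start: int = 3, end: int = 12) -> bool:
    """True if the position lies in the 3-12 window reported in the text."""
    return start <= position <= end


if __name__ == "__main__":
    toy_protospacer = "GACCTGCATG"  # hypothetical sequence, not a real target site
    toy_reads = ["GACCTGCATG", "GATCTGCATG", "GATTTGCATG", "GACCTGTATG"]
    for pos, eff in c_to_t_efficiency(toy_protospacer, toy_reads).items():
        print(pos, eff, in_editing_window(pos))
```

Only cytosines in the reference are reported, so other substitutions in a read are ignored by this sketch.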

The present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art. Furthermore, the embodiments of the present invention and the features of the embodiments may be combined with each other without conflict.

SEQUENCE LISTING

<110> Guangzhou university

<120> fusion protein, base editing tool and application thereof

<130>

<160> 52

<170> PatentIn version 3.5

<210> 1

<211> 1121

<212> PRT

<213> Artificial sequence

<400> 1

Asn Gly Lys Ile Leu Gly Leu Ala Ile Gly Val Ala Ser Val Gly Val

1 5 10 15

Gly Ile Leu Asp Lys Lys Thr Gly Glu Ile Ile His Ala Ser Ser Arg

20 25 30

Ile Phe Pro Ala Ala Thr Ala Asp Ser Asn Val Glu Arg Arg Gly Phe

35 40 45

Arg Gln Gly Arg Arg Leu Gly Arg Arg Lys Lys His Arg Lys Val Arg

50 55 60

Leu Ala Asp Leu Phe Ser Asp Thr Gly Leu Ile Thr Asp Phe Ser Lys

65 70 75 80

Val Ser Ile Asn Leu Asn Pro Tyr Glu Leu Arg Ile Lys Gly Leu Asn

85 90 95

Glu Lys Leu Thr Asn Glu Glu Leu Phe Ile Ala Leu Lys Asn Ile Val

100 105 110

Lys Arg Arg Gly Ile Ser Tyr Leu Asp Asp Ala Asn Glu Asp Gly Glu

115 120 125

Ser Ser Ser Ser Glu Tyr Gly Lys Ala Val Glu Glu Asn Arg Lys Leu

130 135 140

Leu Ala Asp Lys Thr Pro Gly Gln Ile Gln Leu Glu Arg Phe Glu Lys

145 150 155 160

Tyr Gly Gln Val Arg Gly Asp Phe Thr Ile Glu Glu Asn Gly Glu Lys

165 170 175

His Arg Leu Leu Asn Val Phe Ser Thr Ser Ala Tyr Lys Lys Glu Ala

180 185 190

Glu Arg Ile Leu Thr Lys Gln Gln Asp Tyr Asn Gln Asp Ile Thr Asp

195 200 205

Glu Phe Ile Gln Ala Tyr Leu Thr Ile Leu Thr Gly Lys Arg Lys Tyr

210 215 220

Tyr His Gly Pro Gly Asn Glu Lys Ser Arg Thr Asp Tyr Gly Arg Phe

225 230 235 240

Arg Thr Asp Gly Thr Thr Leu Asp Asn Ile Phe Gly Ile Leu Ile Gly

245 250 255

Lys Cys Thr Phe Tyr Pro Glu Glu Tyr Arg Ala Ala Lys Ala Ser Tyr

260 265 270

Thr Ala Gln Glu Phe Asn Leu Leu Asn Asp Leu Asn Asn Leu Thr Val

275 280 285

Pro Thr Glu Thr Lys Lys Leu Ser Glu Glu Gln Lys Arg Gln Ile Ile

290 295 300

Glu Tyr Ala Lys Gly Ala Lys Thr Leu Gly Ala Ala Thr Leu Leu Lys

305 310 315 320

Tyr Ile Ala Lys Leu Val Asp Gly Ser Val Glu Asp Ile Lys Gly Tyr

325 330 335

Arg Ile Asp Lys Ser Glu Lys Pro Glu Met His Thr Phe Asp Ile Tyr

340 345 350

Arg Lys Met Gln Thr Leu Glu Thr Val Asp Val Glu Lys Leu Ser Arg

355 360 365

Glu Val Leu Asp Glu Leu Ala His Ile Leu Thr Leu Asn Thr Glu Arg

370 375 380

Glu Gly Ile Glu Glu Ala Ile Lys Val Ser Phe Ile Lys Arg Glu Phe

385 390 395 400

Glu Gln Asp Gln Ile Ala Glu Leu Val Ser Phe Arg Lys Ser Asn Ser

405 410 415

Ser Leu Phe Gly Lys Gly Trp His Asn Phe Ser Ile Lys Leu Met Thr

420 425 430

Glu Leu Ile Pro Glu Leu Tyr Glu Thr Ser Glu Glu Gln Met Thr Ile

435 440 445

Leu Thr Arg Leu Gly Lys Gln Lys Thr Lys Ala Arg Ser Lys Arg Thr

450 455 460

Lys Tyr Ile Asp Glu Lys Glu Leu Thr Asp Glu Ile Tyr Asn Pro Val

465 470 475 480

Val Ala Lys Ser Val Arg Gln Ala Ile Lys Ile Ile Asn Leu Ala Thr

485 490 495

Lys Lys Tyr Gly Val Phe Asp Asn Ile Val Ile Glu Met Ala Arg Glu

500 505 510

Asn Asn Glu Glu Asp Ala Lys Lys Asp Tyr Val Lys Arg Gln Lys Ala

515 520 525

Asn Glu Asp Glu Lys Asn Ala Ala Met Glu Lys Ala Ala His Gln Tyr

530 535 540

Asn Gly Lys Lys Glu Leu Pro Asp Asn Val Phe His Gly His Lys Glu

545 550 555 560

Leu Ala Thr Lys Ile Arg Leu Trp His Gln Gln Gly Glu Lys Cys Leu

565 570 575

Tyr Thr Gly Lys Asn Ile Pro Ile Ser Asp Leu Ile His Asn Gln Tyr

580 585 590

Lys Tyr Glu Ile Asp His Ile Leu Pro Leu Ser Leu Ser Phe Asp Asp

595 600 605

Ser Leu Ala Asn Lys Val Leu Val Leu Ala Thr Ala Asn Gln Glu Lys

610 615 620

Gly Gln Arg Thr Pro Phe Gln Ala Leu Asp Ser Met Asp Asp Ala Trp

625 630 635 640

Ser Tyr Arg Glu Phe Lys Ala Tyr Val Arg Gly Ala Arg Ala Leu Ser

645 650 655

Asn Lys Lys Lys Asp Tyr Leu Leu Asn Glu Glu Asp Ile Asn Lys Ile

660 665 670

Glu Val Lys Gln Lys Phe Ile Glu Arg Asn Leu Val Asp Thr Arg Tyr

675 680 685

Ser Ser Arg Val Val Leu Asn Ala Leu Gln Asp Phe Tyr Lys Leu Asn

690 695 700

Asp Phe Asp Thr Lys Ile Ser Val Val Arg Gly Gln Phe Thr Ser Gln

705 710 715 720

Leu Arg Arg Lys Trp Arg Ile Asp Lys Ser Arg Glu Thr Tyr His His

725 730 735

His Ala Val Asp Ala Leu Ile Ile Ala Ala Ser Ser Gln Leu Arg Leu

740 745 750

Trp Lys Lys Gln Gly Asn Pro Leu Ile Ser Tyr Lys Glu Asn Gln Phe

755 760 765

Val Asp Ser Glu Thr Gly Glu Ile Ile Ser Leu Thr Asp Asp Glu Tyr

770 775 780

Lys Glu Leu Val Phe Arg Ala Pro Tyr Asp His Phe Val Asp Thr Val

785 790 795 800

Ser Ser Lys Lys Phe Glu Asp Arg Ile Leu Phe Ser Tyr Gln Val Asp

805 810 815

Ser Lys Tyr Asn Arg Lys Ile Ser Asp Ala Thr Ile Tyr Ser Thr Arg

820 825 830

Lys Ala Lys Leu Gly Lys Asp Lys Ser Glu Glu Thr Tyr Val Leu Gly

835 840 845

Lys Ile Lys Asp Ile Tyr Thr Gln Thr Gly Tyr Asp Ala Phe Ile Lys

850 855 860

Leu Tyr Lys Lys Asp Lys Ser Lys Phe Leu Met Tyr His Lys Asp Pro

865 870 875 880

Ile Thr Phe Glu Lys Val Ile Glu Glu Ile Leu Lys Thr Tyr Pro Asp

885 890 895

Lys Glu Ile Asn Glu Lys Gly Lys Glu Val Ala Cys Asn Pro Phe Glu

900 905 910

Lys Tyr Arg Gln Glu Asn Gly Pro Leu Arg Lys Tyr Ser Lys Lys Gly

915 920 925

Lys Gly Pro Glu Ile Lys Ser Leu Lys Tyr Tyr Asp Asn Lys Leu Gly

930 935 940

Asn His Ile Asp Ile Thr Pro Asp Asn Ser Glu Asn Gln Val Ile Leu

945 950 955 960

Gln Ser Leu Lys Pro Trp Arg Thr Asp Val Tyr Phe Asn His Lys Thr

965 970 975

Lys Ile Tyr Glu Leu Met Gly Leu Lys Tyr Ser Asp Leu Ser Phe Glu

980 985 990

Lys Gly Ser Gly Lys Tyr Arg Ile Ser Leu Asp Lys Tyr Asn Val Ile

995 1000 1005

Lys Lys Lys Glu Gly Val His Lys Glu Ser Glu Phe Lys Phe Thr

1010 1015 1020

Leu Tyr Lys Asn Asp Leu Ile Leu Ile Lys Asp Leu Glu Lys Ser

1025 1030 1035

Glu Gln Gln Leu Phe Arg Tyr Asn Ser Arg Asn Asp Thr Ser Lys

1040 1045 1050

His Tyr Val Glu Leu Lys Pro Tyr Asp Lys Ala Lys Phe Glu Gly

1055 1060 1065

Asn Gln Pro Leu Met Ala Leu Phe Gly Asn Val Ala Lys Gly Gly

1070 1075 1080

Gln Cys Leu Lys Gly Leu Asn Lys Ala Asn Ile Ser Ile Tyr Lys

1085 1090 1095

Val Gln Thr Asp Val Leu Gly Asn Lys Arg Phe Ile Lys Lys Glu

1100 1105 1110

Gly Asp Ala Pro Lys Leu Glu Phe

1115 1120

<210> 2

<211> 3363

<212> DNA

<213> Artificial sequence

<400> 2

aacggcaaga tcctgggact ggccatcgga gttgcatctg ttggagtggg catcctggac 60

aagaagaccg gcgagatcat ccacgccagc agcagaatct tccccgccgc cacagccgat 120

agcaacgtgg aacggagggg cttcagacag ggaagacggc tgggccgtag aaaaaaacac 180

agaaaggtgc ggttggccga tctgttcagc gacaccggcc tgataacaga cttctctaaa 240

gtgtctatca acctgaaccc ctacgagctg cggatcaagg gcctcaatga gaaactgaca 300

aacgaggaac tgttcatcgc cctgaagaac atcgtgaaga gaagaggcat cagctacctg 360

gatgacgcca atgaggacgg cgagagctcc tctagcgagt acggcaaggc tgtggaagaa 420

aaccgaaagt tgctggccga caagactcct ggccagatcc agctggaacg cttcgaaaag 480

tacggacagg tccgaggaga tttcaccatc gaggaaaacg gcgaaaagca tagactgctg 540

aacgtgttca gcaccagcgc ctataagaaa gaagccgagc ggattctgac caagcagcaa 600

gattacaacc aagacatcac cgacgagttc atccaggcct acctgacaat cctgacggga 660

aagagaaagt actaccatgg ccccggcaac gagaagtcta gaaccgacta cggccggttc 720

aggaccgatg gcaccaccct ggacaacatc tttggcatcc tgatcggcaa atgtacattc 780

tacccagagg agtaccgggc ggccaaggcc tcttacaccg cccaggagtt taacctcctg 840

aatgacctga acaatctgac agttccaacc gagacaaaga aactgagcga ggaacagaag 900

cggcaaatca tcgagtacgc caagggagcc aagacacttg gagccgccac cctgctcaag 960

tacatcgcca agctggtgga cggctctgtg gaggatatca agggctatag aattgataaa 1020

agcgagaaac ctgagatgca cacattcgat atctacagaa agatgcagac actggaaacc 1080

gtggatgtgg aaaagctgtc acgcgaggtg ctggatgagc tggcccatat cctgacactg 1140

aataccgaga gagaaggtat cgaggaggcc atcaaggtca gctttatcaa gagagagttc 1200

gaacaggacc agatcgccga gctggtcagc ttccggaagt ccaactctag cctgtttggc 1260

aagggctggc acaacttcag tatcaaactg atgacagaac tgatccccga gctgtatgag 1320

accagcgaag agcagatgac catcctgacc agactgggaa agcaaaagac aaaggctaga 1380

agcaagcgca caaagtacat cgacgagaag gagctgaccg acgagatcta caaccccgtg 1440

gtggccaaga gcgtgagaca ggccattaag atcatcaacc tggccaccaa gaagtacggc 1500

gtgttcgaca acatcgtgat cgagatggcc agagagaaca acgaggagga tgccaagaaa 1560

gattacgtga aaagacaaaa agctaatgag gacgaaaaga acgccgctat ggaaaaggct 1620

gcccaccagt acaacggcaa gaaggagctg cccgataacg tgtttcacgg ccacaaggaa 1680

ctggccacaa agatcagact gtggcaccag cagggcgaga agtgcctgta caccggcaaa 1740

aacatcccta tctctgatct gatccacaac cagtataagt acgagatcga ccacatcctg 1800

cctctgtcac tgagcttcga cgacagcctg gccaataagg tgctggtgct cgctaccgcc 1860

aaccaggaga agggccaaag aacacctttc caggccctcg acagcatgga cgatgcgtgg 1920

tcctatagag aatttaaggc ctacgtgcgg ggcgccagag ccctgagcaa caagaaaaaa 1980

gattacctgc tgaatgaaga ggacatcaac aagatcgaag tgaagcagaa attcatcgag 2040

aggaaccttg tggacactcg gtactcctct agagtggtcc tgaacgccct gcaggacttc 2100

tacaagctga atgatttcga caccaagatc agcgtggtga gaggccagtt caccagccag 2160

ctgagacgga aatggagaat cgacaagagc agagaaacct accaccacca cgccgtggac 2220

gctctgatca ttgccgctag ctcgcagctg agactgtgga agaagcaggg caacccactg 2280

atcagctaca aggaaaacca gttcgtcgac tccgaaaccg gagaaattat cagcctcaca 2340

gatgatgaat acaaggaact ggtgttccgg gctccatacg accacttcgt ggacacagtg 2400

agcagcaaaa agtttgaaga cagaatcctt ttctcctacc aggtggattc caaatacaac 2460

cggaaaatca gcgacgccac catttactct accagaaagg ccaagctggg caaagacaag 2520

agcgaggaaa cctacgtgct gggcaagata aaggacatct acacccagac cggctacgat 2580

gccttcatca agctgtacaa gaaggacaag tccaaatttc tgatgtacca caaggatcct 2640

atcacctttg agaaggtgat cgaggaaatc ctgaagacct accccgacaa ggaaatcaac 2700

gagaagggca aggaagtggc atgcaaccct tttgaaaaat atagacagga gaatggacct 2760

ctgagaaagt attctaagaa aggtaagggc cctgagatca agagcctgaa gtactacgac 2820

aacaaactcg gcaaccacat cgacataacc cctgacaaca gcgaaaatca ggtgatcctc 2880

cagtccctga aaccttggcg gaccgacgtg tacttcaacc acaaaaccaa gatttatgag 2940

ctgatgggcc tgaagtacag cgacctgagc ttcgagaagg gcagcggcaa gtaccggatt 3000

agcctggaca aatataacgt gatcaagaaa aaggagggcg tgcacaagga aagcgagttc 3060

aagttcacac tgtacaagaa cgacctgatc ctaatcaagg atctggaaaa gagcgagcag 3120

cagctgttta gatacaacag ccggaacgat acatccaagc actacgtgga gctgaagcct 3180

tacgacaagg ccaaattcga gggaaatcaa cctctgatgg ccctgttcgg caatgtggcc 3240

aagggaggcc agtgcctgaa gggcctgaac aaagccaaca tcagcatcta caaggtgcag 3300

accgacgtgc tgggcaacaa gcggttcatc aagaaagaag gcgacgctcc taagctggaa 3360

ttt 3363

<210> 3

<211> 228

<212> PRT

<213> Artificial sequence

<400> 3

Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg

1 5 10 15

Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg

20 25 30

Lys Glu Thr Cys Leu Leu Tyr Glu Ile Lys Trp Gly Thr Ser His Lys

35 40 45

Ile Trp Arg His Ser Ser Lys Asn Thr Thr Lys His Val Glu Val Asn

50 55 60

Phe Ile Glu Lys Phe Thr Ser Glu Arg His Phe Cys Pro Ser Thr Ser

65 70 75 80

Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser

85 90 95

Lys Ala Ile Thr Glu Phe Leu Ser Gln His Pro Asn Val Thr Leu Val

100 105 110

Ile Tyr Val Ala Arg Leu Tyr His His Met Asp Gln Gln Asn Arg Gln

115 120 125

Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Gln Ile Met Thr

130 135 140

Ala Pro Glu Tyr Asp Tyr Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro

145 150 155 160

Gly Lys Glu Ala His Trp Pro Arg Tyr Pro Pro Leu Trp Met Lys Leu

165 170 175

Tyr Ala Leu Glu Leu His Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu

180 185 190

Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala

195 200 205

Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala

210 215 220

Thr Gly Leu Lys

225

<210> 4

<211> 190

<212> PRT

<213> Artificial sequence

<400> 4

Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val

1 5 10 15

Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile

20 25 30

Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu

35 40 45

Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr

50 55 60

Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile

65 70 75 80

Lys Met Leu Ser Gly Gly Ser Gly Gly Ser Gly Gly Ser Thr Asn Leu

85 90 95

Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile Gln Glu

100 105 110

Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys

115 120 125

Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp

130 135 140

Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp

145 150 155 160

Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu

165 170 175

Ser Gly Gly Ser Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu

180 185 190

<210> 5

<211> 1595

<212> PRT

<213> Artificial sequence

<400> 5

Pro Lys Lys Lys Arg Lys Val Ser Ser Glu Thr Gly Pro Val Ala Val

1 5 10 15

Asp Pro Thr Leu Arg Arg Arg Ile Glu Pro His Glu Phe Glu Val Phe

20 25 30

Phe Asp Pro Arg Glu Leu Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile

35 40 45

Lys Trp Gly Thr Ser His Lys Ile Trp Arg His Ser Ser Lys Asn Thr

50 55 60

Thr Lys His Val Glu Val Asn Phe Ile Glu Lys Phe Thr Ser Glu Arg

65 70 75 80

His Phe Cys Pro Ser Thr Ser Cys Ser Ile Thr Trp Phe Leu Ser Trp

85 90 95

Ser Pro Cys Gly Glu Cys Ser Lys Ala Ile Thr Glu Phe Leu Ser Gln

100 105 110

His Pro Asn Val Thr Leu Val Ile Tyr Val Ala Arg Leu Tyr His His

115 120 125

Met Asp Gln Gln Asn Arg Gln Gly Leu Arg Asp Leu Val Asn Ser Gly

130 135 140

Val Thr Ile Gln Ile Met Thr Ala Pro Glu Tyr Asp Tyr Cys Trp Arg

145 150 155 160

Asn Phe Val Asn Tyr Pro Pro Gly Lys Glu Ala His Trp Pro Arg Tyr

165 170 175

Pro Pro Leu Trp Met Lys Leu Tyr Ala Leu Glu Leu His Ala Gly Ile

180 185 190

Leu Gly Leu Pro Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln

195 200 205

Leu Thr Phe Phe Thr Ile Ala Leu Gln Ser Cys His Tyr Gln Arg Leu

210 215 220

Pro Pro His Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Gly Ser Ser

225 230 235 240

Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr

245 250 255

Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser Asn Gly Lys Ile Leu

260 265 270

Gly Leu Ala Ile Gly Val Ala Ser Val Gly Val Gly Ile Leu Asp Lys

275 280 285

Lys Thr Gly Glu Ile Ile His Ala Ser Ser Arg Ile Phe Pro Ala Ala

290 295 300

Thr Ala Asp Ser Asn Val Glu Arg Arg Gly Phe Arg Gln Gly Arg Arg

305 310 315 320

Leu Gly Arg Arg Lys Lys His Arg Lys Val Arg Leu Ala Asp Leu Phe

325 330 335

Ser Asp Thr Gly Leu Ile Thr Asp Phe Ser Lys Val Ser Ile Asn Leu

340 345 350

Asn Pro Tyr Glu Leu Arg Ile Lys Gly Leu Asn Glu Lys Leu Thr Asn

355 360 365

Glu Glu Leu Phe Ile Ala Leu Lys Asn Ile Val Lys Arg Arg Gly Ile

370 375 380

Ser Tyr Leu Asp Asp Ala Asn Glu Asp Gly Glu Ser Ser Ser Ser Glu

385 390 395 400

Tyr Gly Lys Ala Val Glu Glu Asn Arg Lys Leu Leu Ala Asp Lys Thr

405 410 415

Pro Gly Gln Ile Gln Leu Glu Arg Phe Glu Lys Tyr Gly Gln Val Arg

420 425 430

Gly Asp Phe Thr Ile Glu Glu Asn Gly Glu Lys His Arg Leu Leu Asn

435 440 445

Val Phe Ser Thr Ser Ala Tyr Lys Lys Glu Ala Glu Arg Ile Leu Thr

450 455 460

Lys Gln Gln Asp Tyr Asn Gln Asp Ile Thr Asp Glu Phe Ile Gln Ala

465 470 475 480

Tyr Leu Thr Ile Leu Thr Gly Lys Arg Lys Tyr Tyr His Gly Pro Gly

485 490 495

Asn Glu Lys Ser Arg Thr Asp Tyr Gly Arg Phe Arg Thr Asp Gly Thr

500 505 510

Thr Leu Asp Asn Ile Phe Gly Ile Leu Ile Gly Lys Cys Thr Phe Tyr

515 520 525

Pro Glu Glu Tyr Arg Ala Ala Lys Ala Ser Tyr Thr Ala Gln Glu Phe

530 535 540

Asn Leu Leu Asn Asp Leu Asn Asn Leu Thr Val Pro Thr Glu Thr Lys

545 550 555 560

Lys Leu Ser Glu Glu Gln Lys Arg Gln Ile Ile Glu Tyr Ala Lys Gly

565 570 575

Ala Lys Thr Leu Gly Ala Ala Thr Leu Leu Lys Tyr Ile Ala Lys Leu

580 585 590

Val Asp Gly Ser Val Glu Asp Ile Lys Gly Tyr Arg Ile Asp Lys Ser

595 600 605

Glu Lys Pro Glu Met His Thr Phe Asp Ile Tyr Arg Lys Met Gln Thr

610 615 620

Leu Glu Thr Val Asp Val Glu Lys Leu Ser Arg Glu Val Leu Asp Glu

625 630 635 640

Leu Ala His Ile Leu Thr Leu Asn Thr Glu Arg Glu Gly Ile Glu Glu

645 650 655

Ala Ile Lys Val Ser Phe Ile Lys Arg Glu Phe Glu Gln Asp Gln Ile

660 665 670

Ala Glu Leu Val Ser Phe Arg Lys Ser Asn Ser Ser Leu Phe Gly Lys

675 680 685

Gly Trp His Asn Phe Ser Ile Lys Leu Met Thr Glu Leu Ile Pro Glu

690 695 700

Leu Tyr Glu Thr Ser Glu Glu Gln Met Thr Ile Leu Thr Arg Leu Gly

705 710 715 720

Lys Gln Lys Thr Lys Ala Arg Ser Lys Arg Thr Lys Tyr Ile Asp Glu

725 730 735

Lys Glu Leu Thr Asp Glu Ile Tyr Asn Pro Val Val Ala Lys Ser Val

740 745 750

Arg Gln Ala Ile Lys Ile Ile Asn Leu Ala Thr Lys Lys Tyr Gly Val

755 760 765

Phe Asp Asn Ile Val Ile Glu Met Ala Arg Glu Asn Asn Glu Glu Asp

770 775 780

Ala Lys Lys Asp Tyr Val Lys Arg Gln Lys Ala Asn Glu Asp Glu Lys

785 790 795 800

Asn Ala Ala Met Glu Lys Ala Ala His Gln Tyr Asn Gly Lys Lys Glu

805 810 815

Leu Pro Asp Asn Val Phe His Gly His Lys Glu Leu Ala Thr Lys Ile

820 825 830

Arg Leu Trp His Gln Gln Gly Glu Lys Cys Leu Tyr Thr Gly Lys Asn

835 840 845

Ile Pro Ile Ser Asp Leu Ile His Asn Gln Tyr Lys Tyr Glu Ile Asp

850 855 860

His Ile Leu Pro Leu Ser Leu Ser Phe Asp Asp Ser Leu Ala Asn Lys

865 870 875 880

Val Leu Val Leu Ala Thr Ala Asn Gln Glu Lys Gly Gln Arg Thr Pro

885 890 895

Phe Gln Ala Leu Asp Ser Met Asp Asp Ala Trp Ser Tyr Arg Glu Phe

900 905 910

Lys Ala Tyr Val Arg Gly Ala Arg Ala Leu Ser Asn Lys Lys Lys Asp

915 920 925

Tyr Leu Leu Asn Glu Glu Asp Ile Asn Lys Ile Glu Val Lys Gln Lys

930 935 940

Phe Ile Glu Arg Asn Leu Val Asp Thr Arg Tyr Ser Ser Arg Val Val

945 950 955 960

Leu Asn Ala Leu Gln Asp Phe Tyr Lys Leu Asn Asp Phe Asp Thr Lys

965 970 975

Ile Ser Val Val Arg Gly Gln Phe Thr Ser Gln Leu Arg Arg Lys Trp

980 985 990

Arg Ile Asp Lys Ser Arg Glu Thr Tyr His His His Ala Val Asp Ala

995 1000 1005

Leu Ile Ile Ala Ala Ser Ser Gln Leu Arg Leu Trp Lys Lys Gln

1010 1015 1020

Gly Asn Pro Leu Ile Ser Tyr Lys Glu Asn Gln Phe Val Asp Ser

1025 1030 1035

Glu Thr Gly Glu Ile Ile Ser Leu Thr Asp Asp Glu Tyr Lys Glu

1040 1045 1050

Leu Val Phe Arg Ala Pro Tyr Asp His Phe Val Asp Thr Val Ser

1055 1060 1065

Ser Lys Lys Phe Glu Asp Arg Ile Leu Phe Ser Tyr Gln Val Asp

1070 1075 1080

Ser Lys Tyr Asn Arg Lys Ile Ser Asp Ala Thr Ile Tyr Ser Thr

1085 1090 1095

Arg Lys Ala Lys Leu Gly Lys Asp Lys Ser Glu Glu Thr Tyr Val

1100 1105 1110

Leu Gly Lys Ile Lys Asp Ile Tyr Thr Gln Thr Gly Tyr Asp Ala

1115 1120 1125

Phe Ile Lys Leu Tyr Lys Lys Asp Lys Ser Lys Phe Leu Met Tyr

1130 1135 1140

His Lys Asp Pro Ile Thr Phe Glu Lys Val Ile Glu Glu Ile Leu

1145 1150 1155

Lys Thr Tyr Pro Asp Lys Glu Ile Asn Glu Lys Gly Lys Glu Val

1160 1165 1170

Ala Cys Asn Pro Phe Glu Lys Tyr Arg Gln Glu Asn Gly Pro Leu

1175 1180 1185

Arg Lys Tyr Ser Lys Lys Gly Lys Gly Pro Glu Ile Lys Ser Leu

1190 1195 1200

Lys Tyr Tyr Asp Asn Lys Leu Gly Asn His Ile Asp Ile Thr Pro

1205 1210 1215

Asp Asn Ser Glu Asn Gln Val Ile Leu Gln Ser Leu Lys Pro Trp

1220 1225 1230

Arg Thr Asp Val Tyr Phe Asn His Lys Thr Lys Ile Tyr Glu Leu

1235 1240 1245

Met Gly Leu Lys Tyr Ser Asp Leu Ser Phe Glu Lys Gly Ser Gly

1250 1255 1260

Lys Tyr Arg Ile Ser Leu Asp Lys Tyr Asn Val Ile Lys Lys Lys

1265 1270 1275

Glu Gly Val His Lys Glu Ser Glu Phe Lys Phe Thr Leu Tyr Lys

1280 1285 1290

Asn Asp Leu Ile Leu Ile Lys Asp Leu Glu Lys Ser Glu Gln Gln

1295 1300 1305

Leu Phe Arg Tyr Asn Ser Arg Asn Asp Thr Ser Lys His Tyr Val

1310 1315 1320

Glu Leu Lys Pro Tyr Asp Lys Ala Lys Phe Glu Gly Asn Gln Pro

1325 1330 1335

Leu Met Ala Leu Phe Gly Asn Val Ala Lys Gly Gly Gln Cys Leu

1340 1345 1350

Lys Gly Leu Asn Lys Ala Asn Ile Ser Ile Tyr Lys Val Gln Thr

1355 1360 1365

Asp Val Leu Gly Asn Lys Arg Phe Ile Lys Lys Glu Gly Asp Ala

1370 1375 1380

Pro Lys Leu Glu Phe Ser Gly Gly Ser Gly Gly Ser Gly Gly Ser

1385 1390 1395

Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu

1400 1405 1410

Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu

1415 1420 1425

Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala

1430 1435 1440

Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp

1445 1450 1455

Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn

1460 1465 1470

Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Gly Gly Ser

1475 1480 1485

Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly

1490 1495 1500

Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu

1505 1510 1515

Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val

1520 1525 1530

His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu

1535 1540 1545

Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln

1550 1555 1560

Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser

1565 1570 1575

Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Pro Lys Lys Lys Arg

1580 1585 1590

Lys Val

1595

<210> 6

<211> 4785

<212> DNA

<213> Artificial sequence

<400> 6

ccaaagaaga agcggaaagt cagcagtgaa accggaccag tggcagtgga cccaaccctg 60

aggagacgga ttgagcccca tgaatttgaa gtgttctttg acccaaggga gctgaggaag 120

gagacatgcc tgctgtacga gatcaagtgg ggcacaagcc acaagatctg gcgccacagc 180

tccaagaaca ccacaaagca cgtggaagtg aatttcatcg agaagtttac ctccgagcgg 240

cacttctgcc cctctaccag ctgttccatc acatggtttc tgtcttggag cccttgcggc 300

gagtgttcca aggccatcac cgagttcctg tctcagcacc ctaacgtgac cctggtcatc 360

tacgtggccc ggctgtatca ccacatggac cagcagaaca ggcagggcct gcgcgatctg 420

gtgaattctg gcgtgaccat ccagatcatg acagccccag agtacgacta ttgctggcgg 480

aacttcgtga attatccacc tggcaaggag gcacactggc caagataccc acccctgtgg 540

atgaagctgt atgcactgga gctgcacgca ggaatcctgg gcctgcctcc atgtctgaat 600

atcctgcgga gaaagcagcc ccagctgaca tttttcacca ttgctctgca gtcttgtcac 660

tatcagcggc tgcctcctca tattctgtgg gctacaggcc tgaagtctgg aggatctagc 720

ggaggatcct ctggcagcga gacaccagga acaagcgagt cagcaacacc agagagcagt 780

ggcggcagca gcggcggcag caacggcaag atcctgggac tggccatcgg agttgcatct 840

gttggagtgg gcatcctgga caagaagacc ggcgagatca tccacgccag cagcagaatc 900

ttccccgccg ccacagccga tagcaacgtg gaacggaggg gcttcagaca gggaagacgg 960

ctgggccgta gaaaaaaaca cagaaaggtg cggttggccg atctgttcag cgacaccggc 1020

ctgataacag acttctctaa agtgtctatc aacctgaacc cctacgagct gcggatcaag 1080

ggcctcaatg agaaactgac aaacgaggaa ctgttcatcg ccctgaagaa catcgtgaag 1140

agaagaggca tcagctacct ggatgacgcc aatgaggacg gcgagagctc ctctagcgag 1200

tacggcaagg ctgtggaaga aaaccgaaag ttgctggccg acaagactcc tggccagatc 1260

cagctggaac gcttcgaaaa gtacggacag gtccgaggag atttcaccat cgaggaaaac 1320

ggcgaaaagc atagactgct gaacgtgttc agcaccagcg cctataagaa agaagccgag 1380

cggattctga ccaagcagca agattacaac caagacatca ccgacgagtt catccaggcc 1440

tacctgacaa tcctgacggg aaagagaaag tactaccatg gccccggcaa cgagaagtct 1500

agaaccgact acggccggtt caggaccgat ggcaccaccc tggacaacat ctttggcatc 1560

ctgatcggca aatgtacatt ctacccagag gagtaccggg cggccaaggc ctcttacacc 1620

gcccaggagt ttaacctcct gaatgacctg aacaatctga cagttccaac cgagacaaag 1680

aaactgagcg aggaacagaa gcggcaaatc atcgagtacg ccaagggagc caagacactt 1740

ggagccgcca ccctgctcaa gtacatcgcc aagctggtgg acggctctgt ggaggatatc 1800

aagggctata gaattgataa aagcgagaaa cctgagatgc acacattcga tatctacaga 1860

aagatgcaga cactggaaac cgtggatgtg gaaaagctgt cacgcgaggt gctggatgag 1920

ctggcccata tcctgacact gaataccgag agagaaggta tcgaggaggc catcaaggtc 1980

agctttatca agagagagtt cgaacaggac cagatcgccg agctggtcag cttccggaag 2040

tccaactcta gcctgtttgg caagggctgg cacaacttca gtatcaaact gatgacagaa 2100

ctgatccccg agctgtatga gaccagcgaa gagcagatga ccatcctgac cagactggga 2160

aagcaaaaga caaaggctag aagcaagcgc acaaagtaca tcgacgagaa ggagctgacc 2220

gacgagatct acaaccccgt ggtggccaag agcgtgagac aggccattaa gatcatcaac 2280

ctggccacca agaagtacgg cgtgttcgac aacatcgtga tcgagatggc cagagagaac 2340

aacgaggagg atgccaagaa agattacgtg aaaagacaaa aagctaatga ggacgaaaag 2400

aacgccgcta tggaaaaggc tgcccaccag tacaacggca agaaggagct gcccgataac 2460

gtgtttcacg gccacaagga actggccaca aagatcagac tgtggcacca gcagggcgag 2520

aagtgcctgt acaccggcaa aaacatccct atctctgatc tgatccacaa ccagtataag 2580

tacgagatcg accacatcct gcctctgtca ctgagcttcg acgacagcct ggccaataag 2640

gtgctggtgc tcgctaccgc caaccaggag aagggccaaa gaacaccttt ccaggccctc 2700

gacagcatgg acgatgcgtg gtcctataga gaatttaagg cctacgtgcg gggcgccaga 2760

gccctgagca acaagaaaaa agattacctg ctgaatgaag aggacatcaa caagatcgaa 2820

gtgaagcaga aattcatcga gaggaacctt gtggacactc ggtactcctc tagagtggtc 2880

ctgaacgccc tgcaggactt ctacaagctg aatgatttcg acaccaagat cagcgtggtg 2940

agaggccagt tcaccagcca gctgagacgg aaatggagaa tcgacaagag cagagaaacc 3000

taccaccacc acgccgtgga cgctctgatc attgccgcta gctcgcagct gagactgtgg 3060

aagaagcagg gcaacccact gatcagctac aaggaaaacc agttcgtcga ctccgaaacc 3120

ggagaaatta tcagcctcac agatgatgaa tacaaggaac tggtgttccg ggctccatac 3180

gaccacttcg tggacacagt gagcagcaaa aagtttgaag acagaatcct tttctcctac 3240

caggtggatt ccaaatacaa ccggaaaatc agcgacgcca ccatttactc taccagaaag 3300

gccaagctgg gcaaagacaa gagcgaggaa acctacgtgc tgggcaagat aaaggacatc 3360

tacacccaga ccggctacga tgccttcatc aagctgtaca agaaggacaa gtccaaattt 3420

ctgatgtacc acaaggatcc tatcaccttt gagaaggtga tcgaggaaat cctgaagacc 3480

taccccgaca aggaaatcaa cgagaagggc aaggaagtgg catgcaaccc ttttgaaaaa 3540

tatagacagg agaatggacc tctgagaaag tattctaaga aaggtaaggg ccctgagatc 3600

aagagcctga agtactacga caacaaactc ggcaaccaca tcgacataac ccctgacaac 3660

agcgaaaatc aggtgatcct ccagtccctg aaaccttggc ggaccgacgt gtacttcaac 3720

cacaaaacca agatttatga gctgatgggc ctgaagtaca gcgacctgag cttcgagaag 3780

ggcagcggca agtaccggat tagcctggac aaatataacg tgatcaagaa aaaggagggc 3840

gtgcacaagg aaagcgagtt caagttcaca ctgtacaaga acgacctgat cctaatcaag 3900

gatctggaaa agagcgagca gcagctgttt agatacaaca gccggaacga tacatccaag 3960

cactacgtgg agctgaagcc ttacgacaag gccaaattcg agggaaatca acctctgatg 4020

gccctgttcg gcaatgtggc caagggaggc cagtgcctga agggcctgaa caaagccaac 4080

atcagcatct acaaggtgca gaccgacgtg ctgggcaaca agcggttcat caagaaagaa 4140

ggcgacgctc ctaagctgga atttagcggc gggagcggcg ggagcggggg gagcactaat 4200

ctgagcgaca tcattgagaa ggagactggg aaacagctgg tcattcagga gtccatcctg 4260

atgctgcctg aggaggtgga ggaagtgatc ggcaacaagc cagagtctga catcctggtg 4320

cacaccgcct acgacgagtc cacagatgag aatgtgatgc tgctgacctc tgacgccccc 4380

gagtataagc cttgggccct ggtcatccag gattctaacg gcgagaataa gatcaagatg 4440

ctgagcggag gatccggagg atctggaggc agcaccaacc tgtctgacat catcgagaag 4500

gagacaggca agcagctggt catccaggag agcatcctga tgctgcccga agaagtcgaa 4560

gaagtgatcg gaaacaagcc tgagagcgat atcctggtcc ataccgccta cgacgagagt 4620

accgacgaaa atgtgatgct gctgacatcc gacgccccag agtataagcc ctgggctctg 4680

gtcatccagg attccaacgg agagaacaaa atcaaaatgc tgtctggcgg ctcaaaaaga 4740

accgccgacg gcagcgaatt cgagcccaag aagaagagga aagtc 4785

<210> 7

<211> 4937

<212> DNA

<213> Artificial sequence

<400> 7

gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60

ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120

aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180

atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240

cgaaacaccg tgagaccgag agagggtctc agtttttgta ctctcaagaa attgcagaag 300

ctacaaagat aaggcttcat gccgaaatca acaccctgtc tcttggcggg gtgttttttt 360

ttttaaagaa ttctcgacct cgagacaaat ggcagtattc atccacaatt ttaaaagaaa 420

aggggggatt ggggggtaca gtgcagggga aagaatagta gacataatag caacagacat 480

acaaactaaa gaattacaaa aacaaattac aaaaattcaa aattttcggg tttattacag 540

ggacagcaga gatccacttt ggccgcggct cgagggggtt ggggttgcgc cttttccaag 600

gcagccctgg gtttgcgcag ggacgcggct gctctgggcg tggttccggg aaacgcagcg 660

gcgccgaccc tgggactcgc acattcttca cgtccgttcg cagcgtcacc cggatcttcg 720

ccgctaccct tgtgggcccc ccggcgacgc ttcctgctcc gcccctaagt cgggaaggtt 780

ccttgcggtt cgcggcgtgc cggacgtgac aaacggaagc cgcacgtctc actagtaccc 840

tcgcagacgg acagcgccag ggagcaatgg cagcgcgccg accgcgatgg gctgtggcca 900

atagcggctg ctcagcaggg cgcgccgaga gcagcggccg ggaaggggcg gtgcgggagg 960

cggggtgtgg ggcggtagtg tgggccctgt tcctgcccgc gcggtgttcc gcattctgca 1020

agcctccgga gcgcacgtcg gcagtcggct ccctcgttga ccgaatcacc gacctctctc 1080

cccaggggga tccatggtga gcaagggcga ggagctgttc accggggtgg tgcccatcct 1140

ggtcgagctg gacggcgacg taaacggcca caagttcagc gtgtccggcg agggcgaggg 1200

cgatgccacc tacggcaagc tgaccctgaa gttcatctgc accaccggca agctgcccgt 1260

gccctggccc accctcgtga ccaccctgac ctacggcgtg cagtgcttca gccgctaccc 1320

cgaccacatg aagcagcacg acttcttcaa gtccgccatg cccgaaggct acgtccagga 1380

gcgcaccatc ttcttcaagg acgacggcaa ctacaagacc cgcgccgagg tgaagttcga 1440

gggcgacacc ctggtgaacc gcatcgagct gaagggcatc gacttcaagg aggacggcaa 1500

catcctgggg cacaagctgg agtacaacta caacagccac aacgtctata tcatggccga 1560

caagcagaag aacggcatca aggtgaactt caagatccgc cacaacatcg aggacggcag 1620

cgtgcagctc gccgaccact accagcagaa cacccccatc ggcgacggcc ccgtgctgct 1680

gcccgacaac cactacctga gcacccagtc cgccctgagc aaagacccca acgagaagcg 1740

cgatcacatg gtcctgctgg agttcgtgac cgccgccggg atcactctcg gcatggacga 1800

gctgtacaag taaagcggcc gcgactctag atcataatca gccataccac atttgtagag 1860

gttttacttg ctttaaaaaa cctcccacac ctccccctga acctgaaaca taaaatgaat 1920

gcaattgttg ttgttaactt gtttattgca gcttataatg gttacaaata aagcaatagc 1980

atcacaaatt tcacaaataa agcatttttt tcactgcatt ctagttgtgg tttgtccaaa 2040

ctcatcaatg tatcttagtc gaccgatgcc cttgagagcc ttcaacccag tcagctcctt 2100

ccggtgggcg cggggcatga ctatcgtcgc cgcacttatg actgtcttct ttatcatgca 2160

actcgtagga caggtgccgg cagcgctctt ccgcttcctc gctcactgac tcgctgcgct 2220

cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata cggttatcca 2280

cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga 2340

accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct gacgagcatc 2400

acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa agataccagg 2460

cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat 2520

acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca cgctgtaggt 2580

atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa ccccccgttc 2640

agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg gtaagacacg 2700

acttatcgcc actggcagca gccactggta acaggattag cagagcgagg tatgtaggcg 2760

gtgctacaga gttcttgaag tggtggccta actacggcta cactagaaga acagtatttg 2820

gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc tcttgatccg 2880

gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag attacgcgca 2940

gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac gctcagtgga 3000

acgaaaactc acgttaaggg attttggtca tgagattatc aaaaaggatc ttcacctaga 3060

tccttttaaa ttaaaaatga agttttaaat caatctaaag tatatatgag taaacttggt 3120

ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgt ctatttcgtt 3180

catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggag ggcttaccat 3240

ctggccccag tgctgcaatg ataccgcggg acccacgctc accggctcca gatttatcag 3300

caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaact ttatccgcct 3360

ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgcca gttaatagtt 3420

tgcgcaacgt tgttgccatt gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg 3480

cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatccccc atgttgtgca 3540

aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt 3600

tatcactcat ggttatggca gcactgcata attctcttac tgtcatgcca tccgtaagat 3660

gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgt atgcggcgac 3720

cgagttgctc ttgcccggcg tcaatacggg ataataccgc gccacatagc agaactttaa 3780

aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt 3840

tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagca tcttttactt 3900

tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa 3960

gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattat tgaagcattt 4020

atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaa aataaacaaa 4080

taggggttcc gcgcacattt ccccgaaaag tgccacctga cgcgccctgt agcggcgcat 4140

taagcgcggc gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccctag 4200

cgcccgctcc tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc 4260

aagctctaaa tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc 4320

ccaaaaaact tgattagggt gatggttcac gtagtgggcc atcgccctga tagacggttt 4380

ttcgcccttt gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa 4440

caacactcaa ccctatctcg gtctattctt ttgatttata agggattttg ccgatttcgg 4500

cctattggtt aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat 4560

taacgcttac aatttgccat tcgccattca ggctgcgcaa ctgttgggaa gggcgatcgg 4620

tgcgggcctc ttcgctatta cgccagccca agctaccatg ataagtaagt aatattaagg 4680

tacgggaggt acttggagcg gccgcaataa aatatcttta ttttcattac atctgtgtgt 4740

tggttttttg tgtgaatcga tagtactaac atacgctctc catcaaaaca aaacgaaaca 4800

aaacaaacta gcaaaatagg ctgtccccag tgcaagtgca ggtgccagaa catttctcta 4860

tcgataggta ccgattagtg aacggatctc gacggtatcg atcacgagac tagcctcgag 4920

cggccgcccc cttcacc 4937

<210> 8

<211> 86

<212> DNA

<213> Artificial sequence

<400> 8

gtttttgtac tctcaagaaa ttgcagaagc tacaaagata aggcttcatg ccgaaatcaa 60

caccctgtct cttggcgggg tgtttt 86

<210> 9

<211> 7

<212> PRT

<213> Artificial sequence

<400> 9

Pro Lys Lys Lys Arg Lys Val

1 5

<210> 10

<211> 3743

<212> DNA

<213> Artificial sequence

<400> 10

agcggaggat cctctggcag cgagacacca ggaacaagcg agtcagcaac accagagagc 60

agtggcggca gcagcggcgg cagcaacggc aagatcctgg gactggccat cggagttgca 120

tctgttggag tgggcatcct ggacaagaag accggcgaga tcatccacgc cagcagcaga 180

atcttccccg ccgccacagc cgatagcaac gtggaacgga ggggcttcag acagggaaga 240

cggctgggcc gtagaaaaaa acacagaaag gtgcggttgg ccgatctgtt cagcgacacc 300

ggcctgataa cagacttctc taaagtgtct atcaacctga acccctacga gctgcggatc 360

aagggcctca atgagaaact gacaaacgag gaactgttca tcgccctgaa gaacatcgtg 420

aagagaagag gcatcagcta cctggatgac gccaatgagg acggcgagag ctcctctagc 480

gagtacggca aggctgtgga agaaaaccga aagttgctgg ccgacaagac tcctggccag 540

atccagctgg aacgcttcga aaagtacgga caggtccgag gagatttcac catcgaggaa 600

aacggcgaaa agcatagact gctgaacgtg ttcagcacca gcgcctataa gaaagaagcc 660

gagcggattc tgaccaagca gcaagattac aaccaagaca tcaccgacga gttcatccag 720

gcctacctga caatcctgac gggaaagaga aagtactacc atggccccgg caacgagaag 780

tctagaaccg actacggccg gttcaggacc gatggcacca ccctggacaa catctttggc 840

atcctgatcg gcaaatgtac attctaccca gaggagtacc gggcggccaa ggcctcttac 900

accgcccagg agtttaacct cctgaatgac ctgaacaatc tgacagttcc aaccgagaca 960

aagaaactga gcgaggaaca gaagcggcaa atcatcgagt acgccaaggg agccaagaca 1020

cttggagccg ccaccctgct caagtacatc gccaagctgg tggacggctc tgtggaggat 1080

atcaagggct atagaattga taaaagcgag aaacctgaga tgcacacatt cgatatctac 1140

agaaagatgc agacactgga aaccgtggat gtggaaaagc tgtcacgcga ggtgctggat 1200

gagctggccc atatcctgac actgaatacc gagagagaag gtatcgagga ggccatcaag 1260

gtcagcttta tcaagagaga gttcgaacag gaccagatcg ccgagctggt cagcttccgg 1320

aagtccaact ctagcctgtt tggcaagggc tggcacaact tcagtatcaa actgatgaca 1380

gaactgatcc ccgagctgta tgagaccagc gaagagcaga tgaccatcct gaccagactg 1440

ggaaagcaaa agacaaaggc tagaagcaag cgcacaaagt acatcgacga gaaggagctg 1500

accgacgaga tctacaaccc cgtggtggcc aagagcgtga gacaggccat taagatcatc 1560

aacctggcca ccaagaagta cggcgtgttc gacaacatcg tgatcgagat ggccagagag 1620

aacaacgagg aggatgccaa gaaagattac gtgaaaagac aaaaagctaa tgaggacgaa 1680

aagaacgccg ctatggaaaa ggctgcccac cagtacaacg gcaagaagga gctgcccgat 1740

aacgtgtttc acggccacaa ggaactggcc acaaagatca gactgtggca ccagcagggc 1800

gagaagtgcc tgtacaccgg caaaaacatc cctatctctg atctgatcca caaccagtat 1860

aagtacgaga tcgaccacat cctgcctctg tcactgagct tcgacgacag cctggccaat 1920

aaggtgctgg tgctcgctac cgccaaccag gagaagggcc aaagaacacc tttccaggcc 1980

ctcgacagca tggacgatgc gtggtcctat agagaattta aggcctacgt gcggggcgcc 2040

agagccctga gcaacaagaa aaaagattac ctgctgaatg aagaggacat caacaagatc 2100

gaagtgaagc agaaattcat cgagaggaac cttgtggaca ctcggtactc ctctagagtg 2160

gtcctgaacg ccctgcagga cttctacaag ctgaatgatt tcgacaccaa gatcagcgtg 2220

gtgagaggcc agttcaccag ccagctgaga cggaaatgga gaatcgacaa gagcagagaa 2280

acctaccacc accacgccgt ggacgctctg atcattgccg ctagctcgca gctgagactg 2340

tggaagaagc agggcaaccc actgatcagc tacaaggaaa accagttcgt cgactccgaa 2400

accggagaaa ttatcagcct cacagatgat gaatacaagg aactggtgtt ccgggctcca 2460

tacgaccact tcgtggacac agtgagcagc aaaaagtttg aagacagaat ccttttctcc 2520

taccaggtgg attccaaata caaccggaaa atcagcgacg ccaccattta ctctaccaga 2580

aaggccaagc tgggcaaaga caagagcgag gaaacctacg tgctgggcaa gataaaggac 2640

atctacaccc agaccggcta cgatgccttc atcaagctgt acaagaagga caagtccaaa 2700

tttctgatgt accacaagga tcctatcacc tttgagaagg tgatcgagga aatcctgaag 2760

acctaccccg acaaggaaat caacgagaag ggcaaggaag tggcatgcaa cccttttgaa 2820

aaatatagac aggagaatgg acctctgaga aagtattcta agaaaggtaa gggccctgag 2880

atcaagagcc tgaagtacta cgacaacaaa ctcggcaacc acatcgacat aacccctgac 2940

aacagcgaaa atcaggtgat cctccagtcc ctgaaacctt ggcggaccga cgtgtacttc 3000

aaccacaaaa ccaagattta tgagctgatg ggcctgaagt acagcgacct gagcttcgag 3060

aagggcagcg gcaagtaccg gattagcctg gacaaatata acgtgatcaa gaaaaaggag 3120

ggcgtgcaca aggaaagcga gttcaagttc acactgtaca agaacgacct gatcctaatc 3180

aaggatctgg aaaagagcga gcagcagctg tttagataca acagccggaa cgatacatcc 3240

aagcactacg tggagctgaa gccttacgac aaggccaaat tcgagggaaa tcaacctctg 3300

atggccctgt tcggcaatgt ggccaaggga ggccagtgcc tgaagggcct gaacaaagcc 3360

aacatcagca tctacaaggt gcagaccgac gtgctgggca acaagcggtt catcaagaaa 3420

gaaggcgacg ctcctaagct ggaatttagc ggcgggagcg gcgggagcgg ggggagcact 3480

aatctgagcg acatcattga gaaggagact gggaaacagc tggtcattca ggagtccatc 3540

ctgatgctgc ctgaggaggt ggaggaagtg atcggcaaca agccagagtc tgacatcctg 3600

gtgcacaccg cctacgacga gtccacagat gagaatgtga tgctgctgac ctctgacgcc 3660

cccgagtata agccttgggc cctggtcatc caggattcta acggcgagaa taagatcaag 3720

atgctgagcg gaggatccgg agg 3743

<210> 11

<211> 30

<212> DNA

<213> Artificial sequence

<400> 11

agcggaggat cctctggcag cgagacacca 30

<210> 12

<211> 33

<212> DNA

<213> Artificial sequence

<400> 12

cctccggatc ctccgctcag catcttgatc tta 33

<210> 13

<211> 24

<212> DNA

<213> Artificial sequence

<400> 13

accgtgggca agagtttctg ccac 24

<210> 14

<211> 24

<212> DNA

<213> Artificial sequence

<400> 14

aaacgtggca gaaactcttg ccca 24

<210> 15

<211> 24

<212> DNA

<213> Artificial sequence

<400> 15

accgctgcgt tcctagaacc acag 24

<210> 16

<211> 24

<212> DNA

<213> Artificial sequence

<400> 16

aaacctgtgg ttctaggaac gcag 24

<210> 17

<211> 24

<212> DNA

<213> Artificial sequence

<400> 17

accgaatgct ggctacagat gtcc 24

<210> 18

<211> 24

<212> DNA

<213> Artificial sequence

<400> 18

aaacggacat ctgtagccag catt 24

<210> 19

<211> 24

<212> DNA

<213> Artificial sequence

<400> 19

accgctcata tgtcacttac ctct 24

<210> 20

<211> 24

<212> DNA

<213> Artificial sequence

<400> 20

aaacagaggt aagtgacata tgag 24

<210> 21

<211> 24

<212> DNA

<213> Artificial sequence

<400> 21

accggagaca ggatctcact gtgt 24

<210> 22

<211> 24

<212> DNA

<213> Artificial sequence

<400> 22

aaacacacag tgagatcctg tctc 24

<210> 23

<211> 24

<212> DNA

<213> Artificial sequence

<400> 23

accgtgctct aggtggtgtt aatg 24

<210> 24

<211> 24

<212> DNA

<213> Artificial sequence

<400> 24

aaaccattaa caccacctag agca 24

<210> 25

<211> 24

<212> DNA

<213> Artificial sequence

<400> 25

accgcagcaa catgaacaac tgaa 24

<210> 26

<211> 24

<212> DNA

<213> Artificial sequence

<400> 26

aaacttcagt tgttcatgtt gctg 24

<210> 27

<211> 24

<212> DNA

<213> Artificial sequence

<400> 27

accgaagagc caagtcttac tgta 24

<210> 28

<211> 24

<212> DNA

<213> Artificial sequence

<400> 28

aaactacagt aagacttggc tctt 24

<210> 29

<211> 24

<212> DNA

<213> Artificial sequence

<400> 29

accgctgaca agtactagct tatg 24

<210> 30

<211> 24

<212> DNA

<213> Artificial sequence

<400> 30

aaaccataag ctagtacttg tcag 24

<210> 31

<211> 24

<212> DNA

<213> Artificial sequence

<400> 31

accgttcctc atagcaacat cact 24

<210> 32

<211> 24

<212> DNA

<213> Artificial sequence

<400> 32

aaacagtgat gttgctatga ggaa 24

<210> 33

<211> 19

<212> DNA

<213> Artificial sequence

<400> 33

ctgacctggc agataccac 19

<210> 34

<211> 20

<212> DNA

<213> Artificial sequence

<400> 34

ccacaggact taggaacgac 20

<210> 35

<211> 23

<212> DNA

<213> Artificial sequence

<400> 35

cccttgaaaa gtgcagtgtg tcg 23

<210> 36

<211> 23

<212> DNA

<213> Artificial sequence

<400> 36

ggcaattccc tttgaaagac tgc 23

<210> 37

<211> 21

<212> DNA

<213> Artificial sequence

<400> 37

ccgaggtact gttgctgctt c 21

<210> 38

<211> 22

<212> DNA

<213> Artificial sequence

<400> 38

gagatggcaa gcctttgttg cg 22

<210> 39

<211> 22

<212> DNA

<213> Artificial sequence

<400> 39

gatgctcatt ggtagctcgt gc 22

<210> 40

<211> 25

<212> DNA

<213> Artificial sequence

<400> 40

ctatctgtcc atccatgcat ttgcc 25

<210> 41

<211> 20

<212> DNA

<213> Artificial sequence

<400> 41

cctactgcgg atgccttctt 20

<210> 42

<211> 21

<212> DNA

<213> Artificial sequence

<400> 42

ttagcttggt gtggcagcat g 21

<210> 43

<211> 25

<212> DNA

<213> Artificial sequence

<400> 43

caagtcattg tgatgactga ggagc 25

<210> 44

<211> 19

<212> DNA

<213> Artificial sequence

<400> 44

ggccagccta tgatgggcc 19

<210> 45

<211> 25

<212> DNA

<213> Artificial sequence

<400> 45

ggatgctgtg atgactgaga cgtag 25

<210> 46

<211> 28

<212> DNA

<213> Artificial sequence

<400> 46

tggacatttt gagtttgaaa aggctgtg 28

<210> 47

<211> 24

<212> DNA

<213> Artificial sequence

<400> 47

caggcgtgct gtaatacatg aacc 24

<210> 48

<211> 26

<212> DNA

<213> Artificial sequence

<400> 48

gtcaccatag gataggaagt cagcag 26

<210> 49

<211> 18

<212> DNA

<213> Artificial sequence

<400> 49

gtcccactgc accagcag 18

<210> 50

<211> 32

<212> DNA

<213> Artificial sequence

<400> 50

cctattctat ctgagggagg acatgattga ag 32

<210> 51

<211> 26

<212> DNA

<213> Artificial sequence

<400> 51

ctctgcctgg aagaataatg agaacc 26

<210> 52

<211> 23

<212> DNA

<213> Artificial sequence

<400> 52

ccaggatggt gtttgtgaga tgg 23
