Fusion protein, base editing tool and application thereof
1. A fusion protein comprising an SsiCas9n polypeptide, wherein the SsiCas9n polypeptide is:
(a) SsiCas9D9A nickase amino acid sequence 2-1122; or
(b) An amino acid sequence shown as SEQ ID NO. 1; or
(c) An amino acid sequence having a sequence identity of 90% or more to the amino acid sequence shown in SEQ ID NO.1, and having the functions of the amino acid sequence defined in (a).
2. The fusion protein of claim 1, further comprising a deaminase ancAPOBEC1 polypeptide having an amino acid sequence of:
(d) an amino acid sequence shown as SEQ ID NO. 3; or
(e) An amino acid sequence having a sequence identity of 90% or more to the amino acid sequence shown in SEQ ID NO.3, and having a deaminase function possessed by the amino acid or coding DNA sequence defined in (d).
3. The fusion protein of claim 1, further comprising a uracil glycosylase inhibitor having an amino acid sequence:
(f) an amino acid sequence shown as SEQ ID NO. 4; or
(g) An amino acid sequence having more than 90% sequence identity with the amino acid sequence shown in SEQ ID No.4, and having the function of the uracil glycosylase inhibitor possessed by the amino acid or coding DNA sequence defined in (f).
4. The fusion protein of claim 1, wherein the fusion protein further comprises a nuclear localization signal peptide, wherein the nuclear localization signal polypeptide fragment is preferably located at the N-terminus and/or the C-terminus of the fusion protein, and the amino acid sequence of the nuclear localization signal polypeptide fragment is preferably as shown in SEQ ID No. 9.
5. The fusion protein of any one of claims 1 to 4, wherein the fusion protein comprises a nuclear localization signal polypeptide, the deaminase of claim 2, a first linker, the SsiCas9n polypeptide of claim 1, a second linker, the inhibitor of claim 3, and a nuclear localization signal polypeptide, and wherein the amino acid sequence of the fusion protein is preferably:
(h) the amino acid sequence shown as SEQ ID NO. 5; or
(i) An amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.5 and having the function of the amino acid sequence defined in (h), preferably having a cytosine deaminase function, more preferably having a cytosine base editor function, and more preferably being capable of recognizing NHAAAA as PAM.
6. A gene encoding the fusion protein according to any one of claims 1 to 5.
7. A composition comprising a gRNA and the fusion protein of any one of claims 1 to 5,
wherein the gRNA is a chimeric non-naturally occurring guide polynucleotide;
the gRNA/Cas complex is capable of recognizing, binding and nicking or unwinding, cleaving, in whole or in part, the target sequence.
8. A recombinant vector, recombinant bacterium or cell line comprising the gene of claim 6.
9. Use of the fusion protein according to any one of claims 1 to 5 or the gene according to claim 6 or the composition according to claim 7 or the recombinant vector, recombinant bacterium or cell line according to claim 8 for gene editing.
10. A method for gene editing, comprising in vivo or in vitro gene editing using the fusion protein of any one of claims 1 to 5, the gene of claim 6, the composition of claim 7, or the recombinant vector, recombinant bacterium, or cell line of claim 8.
Background
The CRISPR/Cas9 system is a natural defense system that bacteria use against phage DNA injection and plasmid transfer. Since its discovery it has been widely developed into DNA editing systems and platforms that rely on guide RNA (gRNA) targeting, and it is mainly used for targeted genome editing, transcriptional regulation, epigenetic editing, and the like. The main principle of action of the Cas9 system is that the tracrRNA portion of the gRNA recruits the Cas9 protein, and binding of the gRNA changes Cas9 from an inactive conformation to a conformation capable of recognizing DNA. In the classic CRISPR/Cas9 system, the first 20 bases of the crRNA give Cas9 its target sequence specificity. The gRNA-Cas9 complex searches the DNA for the protospacer adjacent motif (PAM) recognized by the Cas9 protein; the PAM of the classic SpCas9 is NGG. After a PAM site is recognized, Cas9 partially melts the DNA, and the gRNA invades and pairs with the DNA to form an RNA-DNA hybrid. Once the gRNA is fully complementary to the target DNA, the HNH domain of Cas9 adopts a stable, active conformation and cleaves the target strand. At the same time, a larger conformational change brings the non-target strand into the RuvC domain, which cleaves it[1]. D10 in the RuvC domain and H840 in the HNH domain are critical to the cleavage activity of the respective domains. Introducing either the D10A or the H840A mutation converts Cas9 into a Cas9 nickase (Cas9n) with only single-strand cleavage activity, and introducing both mutations yields dCas9, which retains targeted DNA binding activity but has no endonuclease activity.
Based on Cas9n and dCas9, a series of genome and epigenome editing tools have been developed. The basic strategy is to attach catalytic enzymes or epigenetic factors with specific functions to the termini of Cas9n or dCas9 and to use their targeting activity to deliver those factors to specific genomic sites under gRNA guidance, thereby achieving site-specific gene editing, epigenetic editing, transcriptional activation or repression, and the like. One of the most classical site-directed editing tools is the single base editing tool (base editor): a DNA deaminase is linked to the N-terminus of the Cas9n protein and carried by Cas9n to the target DNA sequence under gRNA guidance, where it deaminates a specific nucleotide. The nicking activity of Cas9n (D10A) then introduces a single-strand nick on the strand complementary to the deaminated strand, and precise base replacement is completed by base repair and DNA replication. The first cytosine base editor (CBE) was reported by the David Liu laboratory at Harvard University, which fused the rat APOBEC1 cytosine deaminase to the dCas9 protein. To raise editing efficiency, they fused the uracil DNA glycosylase inhibitor protein UGI to the editor to keep cells from converting uracil back to cytosine. To make cells preferentially use the deaminated DNA strand as the repair template, the David Liu laboratory further replaced dCas9 with Cas9n, which nicks only the strand complementary to the deaminated strand, thereby greatly increasing CBE editing efficiency and enabling efficient replacement of C/G by T/A (C/G-to-T/A)[2]. Thereafter, the David Liu laboratory developed the adenine base editor (ABE), which replaces A/T with G/C at the target site (A/T-to-G/C)[3].
For this base editor, a TadA capable of deaminating DNA adenine was obtained by directed evolution of the RNA adenine deaminase TadA, and ABE7.10, which has highly efficient adenine editing activity, was obtained by fusing a TadA/TadA dimer to the Cas9n protein[4].
Since then, many laboratories have modified and optimized base editors, including combining and optimizing different deaminases and Cas9 proteins, producing base editors of different types and characteristics and greatly improving editing efficiency and editing range. The most important of these is the fourth-generation base editor ancBE4max developed by the David Liu laboratory, which greatly improves product purity and editing efficiency by replacing rat APOBEC1 with ancAPOBEC1, fusing two UGIs, lengthening the linkers between APOBEC1-Cas9n and Cas9n-UGI, optimizing the nuclear localization signal (NLS), and the like[5]. The PAM recognized by ancBE4max is NGG, the corresponding editing window is positions 4-8 counted from the 5' end of the gRNA range, and its Cas9n is derived from Streptococcus pyogenes (SpCas9; 1369 amino acids in total). However, the editing window and PAM restriction of ancBE4max (it primarily recognizes NGG PAM sequences) greatly limit the range of the genome that can be targeted.
Scientists therefore developed a series of SpCas9 mutants by protein engineering and directed evolution and combined them with deaminases, obtaining a series of base editors with various targeting properties and PAM recognition. These include xCas9, which can recognize NGN[6], SpCas9-NG[7], and the Cas9 variant SpRY, which has almost no PAM restriction[8]. The scientific community has also tried combining Cas9 homologues from different species with deaminases, e.g. Nme2Cas9[9], SaCas9[10], St1Cas9[11], xCas9[12], and the like, thereby obtaining novel editors with different editing characteristics, targeting sequences of different lengths, different recognition windows, and so on.
The editing windows of classical SpCas9-based editors are mainly positions 4-8, and all of these editors show PAM preference or low targeting efficiency at some sites. Moreover, the expression plasmid of the classical base editor far exceeds the packaging capacity of adeno-associated virus (AAV), which hinders clinical research and application. Developing novel base editors with different editing windows, different PAM recognition and smaller expression plasmids is therefore key to current gene editing research and clinical application.
References:
1. Jiang, F. and J.A. Doudna, CRISPR-Cas9 Structures and Mechanisms. Annu Rev Biophys, 2017. 46: p. 505-529.
2. Komor, A.C., et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 2016. 533(7603): p. 420-424.
3. Gaudelli, N.M., et al., Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature, 2017. 551(7681): p. 464-471.
4. Gaudelli, N.M., et al., Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature, 2017. 551(7681): p. 464-471.
5. Koblan, L.W., et al., Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nature Biotechnology, 2018. 36.
6. Hu, J.H., et al., Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 2018.
7. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science (New York, N.Y.), 2018. 361(6408): p. 1259.
8. Walton, R.T., et al., Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science. 368.
9. Edraki, A., et al., A Compact, High-Accuracy Cas9 with a Dinucleotide PAM for In Vivo Genome Editing. Mol Cell, 2019. 73(4): p. 714-726.e4.
10. Nishimasu, H., et al., Crystal Structure of Staphylococcus aureus Cas9. Cell, 2015. 162(5): p. 1113-26.
11. Zhang, Y., et al., Catalytic-state structure and engineering of Streptococcus thermophilus Cas9. Nature Catalysis, 2020. 3(10): p. 813-823.
12. Hu, J.H., et al., Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 2018. 556(7699): p. 57-63.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a novel cytosine base editor that recognizes the PAM (protospacer adjacent motif) sequence NHAAAA and has a changed editing window, thereby widening the targeting range of single base editors.
The technical scheme adopted by the invention is as follows:
In a first aspect of the invention, a fusion protein is provided, the fusion protein comprising an SsiCas9n polypeptide, the amino acid sequence of the SsiCas9n polypeptide being:
(a) SsiCas9D9A nickase amino acid sequence 2-1122; or
(b) An amino acid sequence shown as SEQ ID NO. 1; or
(c) An amino acid sequence having a sequence identity of 90% or more to the amino acid sequence shown in SEQ ID NO.1, and having the functions of the amino acid sequence defined in (a).
In some preferred embodiments of the invention, the SsiCas9n polypeptide is capable of recognizing NHAAAA as PAM, where N represents any base and H represents A, C or T.
In some preferred embodiments of the invention, the amino acid sequence of the SsiCas9n polypeptide is capable of causing single-stranded DNA cleavage at the complementary strand of the targeting sequence as a Cas9 nickase.
In some embodiments of the invention, the fusion protein further comprises a deaminase ancAPOBEC1 polypeptide having an amino acid sequence of:
(d) an amino acid sequence shown as SEQ ID NO. 3; or
(e) An amino acid sequence having a sequence identity of 90% or more to the amino acid sequence shown in SEQ ID NO.3, and having the function defined in (d), preferably having a cytosine deaminase function.
In some embodiments of the invention, the fusion protein further comprises a Uracil Glycosylase Inhibitor (UGI) having an amino acid sequence of:
(f) an amino acid sequence shown as SEQ ID NO. 4; or
(g) An amino acid sequence having more than 90% sequence identity with the amino acid sequence shown in SEQ ID No.4, and having the amino acid function defined in (f), preferably having a uracil DNA glycosylase inhibitor function.
In some embodiments of the invention, the fusion protein further comprises a nuclear localization signal peptide, preferably, the nuclear localization signal polypeptide fragment is located at the N-terminal and/or C-terminal of the fusion protein, and the amino acid sequence of the nuclear localization signal polypeptide fragment is shown as SEQ ID No. 9.
In some embodiments of the invention, the fusion protein comprises a nuclear localization signal polypeptide, the deaminase described above, a first linker, the SsiCas9n polypeptide described above, a second linker, the uracil glycosylase inhibitor described above, and a nuclear localization signal polypeptide.
In some embodiments of the invention, the fusion protein comprises, in order from N-terminus to C-terminus, a BPNLS, an ancAPOBEC1 polypeptide fragment, a first linker, a polypeptide fragment consisting of amino acids 2-1122 from the N-terminus of the SsiCas9D9A nickase, a second linker, two UGI polypeptides, and a BPNLS polypeptide sequence.
In some embodiments of the present invention, the first linker is preferably a 32 aa linker, and the second linker is preferably a 10 aa linker.
In some preferred embodiments of the invention, the amino acid sequence of the fusion protein is:
(h) the amino acid sequence shown as SEQ ID NO. 5; or
(i) An amino acid sequence having a sequence similarity of 80% or more to SEQ ID NO.5 and having the function of the amino acid sequence defined in (h), preferably having a cytosine deaminase function, more preferably having a cytosine base editor function, and more preferably being capable of recognizing NHAAAA as PAM.
The invention also provides a nucleic acid molecule capable of encoding the SsiCas9D9A nickase of the first aspect of the invention, wherein the sequence of the nucleic acid molecule is as follows:
(j) the sequence shown as SEQ ID NO.2, a codon-optimized DNA coding sequence suitable for eukaryotic expression; or
(k) A DNA coding sequence corresponding to an amino acid sequence having more than 90% sequence identity to the amino acid sequence shown in SEQ ID No.1 and having the functions defined in (a) or (j); or
(l) A DNA sequence that differs from the DNA sequence shown as SEQ ID NO.2 only by synonymous codons.
In a second aspect of the invention, there is provided a gene encoding a fusion protein according to the first aspect of the invention.
In some embodiments of the invention, the sequence of the gene is:
(m) the sequence shown as SEQ ID NO. 6; or
(n) a DNA coding sequence corresponding to an amino acid sequence having more than 90% sequence identity to the amino acid sequence shown in SEQ ID No.5, and having the function defined in (h) or (m); or
(o) a DNA sequence that differs from the DNA sequence shown in SEQ ID NO.6 only by synonymous codons.
In a third aspect of the invention, a composition is provided that includes a gRNA and a fusion protein of the first aspect of the invention,
wherein the gRNA is a chimeric non-naturally occurring guide polynucleotide;
the gRNA/Cas complex is capable of recognizing, binding and nicking or unwinding, cleaving, in whole or in part, the target sequence.
In some preferred embodiments of the invention, the gRNA expression element consists of U6 promoter, gRNA targeting sequence insertion cleavage site, scaffold (Ssi specific), and termination signal, in that order.
In some embodiments of the invention, the scaffold is designed according to the tandem repeat sequence of Streptococcus sinensis, and its sequence is:
(p) a DNA sequence shown as SEQ ID NO. 8; or
(q) a DNA sequence having a sequence similarity of 80% or more to SEQ ID NO.8 and having the function of the DNA sequence defined in (p).
In some preferred embodiments of the invention, the sequence of the gRNA is:
(r) a DNA sequence shown as SEQ ID NO. 7; or
(s) a DNA sequence having a sequence similarity of 80% or more to SEQ ID NO.7 and having the function of the DNA sequence defined in (r).
In some preferred embodiments of the invention, the gRNA expression vector further includes a coding sequence comprising an EGFP tag, more preferably a gRNA that targets a specific site.
The SsiCas9 coding sequence used here is a eukaryotic codon-optimized coding sequence of the Cas9 protein homolog SsiCas9. It recognizes NHAAAA as the PAM sequence, which differs from the PAM sequences of reported base editors, and the designed gRNA is 20 nt long. Ssi-ancBE4max can convert base C at positions 3-12 from the 5' end of the target sequence into base T and can target sites that reported cytosine base editors cannot, thereby expanding the targetable range of single base editors across the genome and providing more options for their application.
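The targeting rule described above (a 20 nt target followed by an NHAAAA PAM, with editable cytosines at positions 3-12) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not part of the disclosed method: the function name `find_target_sites` is hypothetical, only the given strand is scanned (the opposite strand would need the reverse complement), and the example PAM `ATAAAA` appended to the sgSsi-1 target is invented for demonstration rather than taken from the genome.

```python
import re

# IUPAC codes: N = any base, H = A, C or T (i.e. not G).
PAM_REGEX = "[ACGT][ACT]AAAA"
PROTOSPACER_LEN = 20        # gRNA targeting length stated in the text
WINDOW = range(3, 13)       # positions 3-12, counted from the 5' end


def find_target_sites(seq):
    """Return (start, protospacer, pam, editable_c_positions) for every
    20-nt target directly followed by an NHAAAA PAM on the given strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (PROTOSPACER_LEN, PAM_REGEX))
    sites = []
    for match in pattern.finditer(seq):
        protospacer, pam = match.group(1), match.group(2)
        # 1-based positions of cytosines that fall inside the editing window.
        editable = [i for i in WINDOW if protospacer[i - 1] == "C"]
        sites.append((match.start(), protospacer, pam, editable))
    return sites


if __name__ == "__main__":
    # sgSsi-1 target from Table 1 plus a hypothetical PAM and flanking bases.
    demo = "GG" + "TGGGCAAGAGTTTCTGCCAC" + "ATAAAA" + "GG"
    for site in find_target_sites(demo):
        print(site)
```

A PAM with G at the second position (for example `AGAAAA`) is rejected by this sketch, since H excludes G.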
In a fourth aspect of the invention there is provided a recombinant vector, recombinant bacterium or cell line comprising a gene according to the second aspect of the invention.
In some embodiments of the invention, the cell is a eukaryotic cell or a prokaryotic cell.
In some preferred embodiments of the invention, the cell is a mouse cell or a human cell.
In some preferred embodiments of the invention, the cell is a human embryonic kidney cell.
In some more preferred embodiments of the invention, the cell is an HEK293T cell.
In a fifth aspect of the invention, there is provided use of a fusion protein according to the first aspect of the invention, a gene according to the second aspect of the invention, a composition according to the third aspect of the invention, or a recombinant vector, recombinant bacterium or cell line according to the fourth aspect of the invention for gene editing.
In a sixth aspect of the present invention, there is provided a method for gene editing, in particular in vivo or in vitro gene editing using the fusion protein of the first aspect of the present invention or the gene of the second aspect of the present invention or the composition of the third aspect of the present invention or the recombinant vector, recombinant bacterium or cell line of the fourth aspect of the present invention.
The beneficial effects of the invention are:
The invention provides a fusion protein (base editor) derived from Streptococcus sinensis and a novel base editing tool, in particular a novel cytosine base editor (CBE) named SsiCas9-ancBE4max, obtained by combining SsiCas9, which recognizes NHAAAA, with BE4max. The editing tool includes a scaffold sequence designed according to the tandem repeat sequence of Streptococcus sinensis, and the designed targeting gRNA is 20 nt long. The tool recognizes NHAAAA as PAM and efficiently converts cytosine at positions 3-12 from the 5' end of the target sequence into thymine (C-to-T), widening the targeting range and the application range of base editing.
The protein size of the base editing tool also suits the packaging requirements of adeno-associated virus (AAV), giving it good application prospects. Because it efficiently induces C-to-T conversion at positions 3-12 from the 5' end of the target sequence and recognizes NHAAAA as PAM, the base editing tool provided by the invention expands the genome targeting range of base editing and offers a further choice of tool for base editing and gene correction. The base editor provided by the invention reduces the size of the expression plasmid of the base editing tool, making it better suited to the packaging capacity of AAV, and has good prospects for gene therapy and industrialization.
Drawings
FIG. 1 is a schematic diagram of the domain of Ssi protein.
FIG. 2 is a schematic diagram of the protein domains of Ssi-ancBE4max.
FIG. 3 is a schematic map of the plasmid structure of Ssi-ancBE4max.
FIG. 4 is a schematic map of the plasmid structure of a gRNA of the Ssi-ancBE4max system.
FIG. 5 shows the Ssi-ancBE4max experimental results of Example 3 of the present invention, wherein A in FIG. 5 is the result of editing at Ssi2, B in FIG. 5 is the result of editing at Ssi6, C in FIG. 5 is the result of editing at Ssi8, and D in FIG. 5 is the result of editing at Ssi10.
FIG. 6 is a statistical heat map of the editing efficiency of the Ssi-ancBE4max editing system in HEK293T cells. The dashed box indicates the editing window.
Detailed Description
The concept and technical effects of the present invention will be clearly and completely described below in conjunction with the embodiments to fully understand the objects, features and effects of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and those skilled in the art can obtain other embodiments without inventive effort based on the embodiments of the present invention, and all embodiments are within the protection scope of the present invention.
Example 1
The Cas9 protein homolog SsiCas9 from Streptococcus sinensis was aligned with SpCas9 at the amino acid sequence level, and the functional domains of SsiCas9 were assigned (see FIG. 1). The functional site of the RuvC domain of SsiCas9 (aspartic acid at position 9, D9) was identified and mutated to alanine (A), yielding the SsiCas9D9A nickase, whose amino acid sequence is shown in SEQ ID NO.1.
The prokaryotic codons of Streptococcus sinensis SsiCas9D9A were optimized for eukaryotic expression, giving a coding DNA sequence for SsiCas9D9A suitable for expression in eukaryotic cells, shown in SEQ ID NO.2. The optimized SsiCas9D9A was produced by whole-gene synthesis by a commercial company. The construction strategy was to replace SpCas9 D10A in ancBE4max with SsiCas9D9A, where ancBE4max itself was also obtained by whole-gene synthesis from a commercial company. The XTEN linker-SpCas9 D10A-10aa linker-UGI portion of ancBE4max was excised with the endonuclease BamHI and replaced with the corresponding XTEN linker-SsiCas9 D9A-10aa linker-UGI portion, which was included (with BamHI cleavage sites at both ends of the sequence) when SsiCas9D9A was synthesized by the commercial company, as shown in SEQ ID NO.10.
The plasmid ancBE4max (vector pCMV) was digested with the restriction enzyme BamHI (R0136L) in a water bath at 37 °C for 2 h. The digestion system (50 μl) was: 10× Buffer: 5 μl; vector: 5 μg; BamHI: 3 μl; ddH2O: to 50 μl. Complete digestion was checked by gel electrophoresis. After digestion, the linearized vector was purified using a clean-up kit (AxyPrep PCR clean kit) and eluted with 15 μl ddH2O. The synthesized XTEN linker-SsiCas9 D9A-10aa linker-UGI was amplified by PCR, introducing protective bases outside the restriction sites at both ends, using PCR primers synthesized by Jinzhi Biotechnology Limited. The primer sequences are as follows:
Ssi PCR for: 5’-agcggaggatcctctggcagcgagacacca-3’ (SEQ ID NO.11);
Ssi PCR rev: 5’-cctccggatcctccgctcagcatcttgatctta-3’ (SEQ ID NO.12).
The fragment was amplified by PCR and purified using a clean-up kit (AxyPrep PCR clean kit). The purified PCR product was digested with BamHI using the digestion system described above.
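As a quick sanity check on the primer design described above, the following Python sketch locates the BamHI recognition site (GGATCC) in each primer and counts the protective bases 5' of it. The helper name `protective_bases` is hypothetical and the sketch only inspects the primer strings quoted in the text; it says nothing about how the authors designed or validated the primers.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence

# Primer sequences as quoted in the text (SEQ ID NO.11 and NO.12).
PRIMERS = {
    "Ssi PCR for": "agcggaggatcctctggcagcgagacacca",
    "Ssi PCR rev": "cctccggatcctccgctcagcatcttgatctta",
}


def protective_bases(primer, site=BAMHI_SITE):
    """Return the number of bases 5' of the first occurrence of the site,
    or None if the primer does not contain the site."""
    index = primer.upper().find(site)
    return None if index < 0 else index


# Map each primer name to the count of protective bases before GGATCC.
report = {name: protective_bases(seq) for name, seq in PRIMERS.items()}

if __name__ == "__main__":
    for name, count in report.items():
        print(name, "protective bases before BamHI site:", count)
```

For the two primers above this reports 6 and 5 protective bases, respectively.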
The purified XTEN linker-SsiCas9 D9A-10aa linker-UGI fragment was ligated to the BamHI-linearized vector pCMV_ancBE4max to obtain the primary ligation product. The ligation system (10 μl) was: purified linearized vector pCMV_ancBE4max: 1 μl (50 ng); BamHI digestion product of XTEN linker-SsiCas9 D9A-10aa linker-UGI: 1 μl (100 ng); T4 DNA Ligase Buffer: 1 μl; T4 DNA Ligase: 1 μl; ddH2O: 6 μl. Ligation was carried out at 16 °C for 2 h. The ligation product was transformed and plated, and single colonies were picked, grown with shaking and sequenced to identify correct clones. The protein and DNA sequences of the constructed SsiCas9-ancBE4max are shown in SEQ ID NO.5 and SEQ ID NO.6, respectively. From N-terminus to C-terminus, the fusion consists of BPNLS, the ancAPOBEC1 polypeptide fragment, a 32aa linker, a polypeptide fragment consisting of amino acids 2-1122 from the N-terminus of the SsiCas9D9A nickase, a 10aa linker, two UGI polypeptides, and a BPNLS polypeptide sequence. The amino acid sequence of the BPNLS nuclear localization signal polypeptide fragment is shown in SEQ ID NO.9, that of the ancAPOBEC1 polypeptide in SEQ ID NO.3, that of the UGI polypeptide in SEQ ID NO.4, and that of SsiCas9-ancBE4max in SEQ ID NO.5; the DNA coding sequence corresponding to the amino acid sequence of SsiCas9-ancBE4max is shown in SEQ ID NO.6.
The schematic diagram of the successfully constructed plasmid domain is shown in FIG. 2, and the map of the plasmid structure is shown in FIG. 3.
Positive clones were expanded in liquid culture, plasmids were extracted according to the kit instructions (TIANGEN: TIANPure Midi Plasmid Kit), and the concentration was measured to ensure a sufficient amount for transfection, free of contaminants such as salts and proteins.
Example 2
2.1 vector construction of SsiCas9-ancBE4max System gRNA plasmid
pGL3-U6-sgRNA (Addgene #51133) was used as the expression backbone to construct a gRNA expression vector suitable for the SsiCas9 gRNA editing system. A scaffold sequence suitable for the SsiCas9 gRNA system was designed according to the tandem repeat sequence from Streptococcus sinensis, and the scaffold of pGL3-U6-sgRNA (Addgene #51133; suitable for SpCas9) was replaced with the SsiCas9 gRNA scaffold (suitable for SsiCas9). The complete, successfully constructed plasmid is shown in SEQ ID NO.7 and is named pGL3-U6-Ssi; a schematic of the plasmid structure is shown in FIG. 4. The targeting gRNA sequence is inserted at two BsaI restriction sites, and the plasmid was obtained by whole-gene synthesis from a commercial company.
2.2 construction of SsiCas9-ancBE4max System Targeted gRNA plasmid
gRNAs were designed and two complementary oligos were synthesized for each. The upstream oligo is 5'-accg-20nt-3' and the downstream oligo is 5'-aaac-20nt-3', where the 20 nt of the downstream oligo is the reverse complement of the 20 nt of the upstream oligo; on the DNA strand carrying the PAM, the target has the form 20nt-NHAAAA. The synthesized upstream and downstream oligos were annealed with the following program: 95 °C for 5 min; 95 °C to 85 °C at -2 °C/s; 85 °C to 25 °C at -0.1 °C/s; hold at 4 °C. The annealed oligos were ligated into the pGL3-U6-Ssi gRNA vector linearized with BsaI (NEB: R0539L).
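The oligo design rule above can be sketched in Python as follows. The function names `reverse_complement` and `design_oligos` are hypothetical; the overhangs (ACCG, AAAC) and the 20 nt length come from the text, and the example target is the sgSsi-1 sequence listed in Table 1.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(target_20nt):
    """Return (upstream, downstream) oligos for a 20-nt target:
    upstream   = ACCG + target
    downstream = AAAC + reverse complement of target"""
    if len(target_20nt) != 20:
        raise ValueError("target must be exactly 20 nt")
    if set(target_20nt.upper()) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, T")
    target = target_20nt.lower()
    upstream = "ACCG" + target
    downstream = "AAAC" + reverse_complement(target)
    return upstream, downstream


if __name__ == "__main__":
    up, down = design_oligos("tgggcaagagtttctgccac")  # sgSsi-1 target
    print("for: 5'-%s-3'" % up)
    print("rev: 5'-%s-3'" % down)
```

Run on the sgSsi-1 and sgSsi-10 targets, this reproduces the corresponding oligo pairs listed in Table 1.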
The linearization digestion system was: pGL3-U6-Ssi gRNA vector 2 μg; buffer (NEB: R0539L) 6 μL; BsaI 2 μL; ddH2O to 60 μL. Digestion was carried out overnight at 37 °C. The ligation system was: T4 ligation buffer (NEB: M0202L) 1 μL; linearized vector 20 ng; annealed oligo fragment (10 μM) 5 μL; T4 DNA ligase (NEB: M0202L) 0.5 μL; ddH2O to 10 μL. Ligation was carried out overnight at 16 °C. The ligated vector was transformed, and clones were picked and identified. Positive clones were expanded, the plasmid was extracted (Axygen: AP-MN-P-250G), and the concentration was determined.
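Because the digestion and ligation systems above specify DNA by mass and water "to" a final volume, the water volume depends on the concentration of the DNA stock. The following Python sketch shows that arithmetic; the function name `water_to_volume` is hypothetical, and the stock concentration of 500 ng/μL used in the example is an assumed value for illustration only, not a figure from the text.

```python
def water_to_volume(total_ul, fixed_volumes_ul, dna_mass_ng=0.0,
                    dna_conc_ng_per_ul=None):
    """Return (dna_volume_ul, water_volume_ul) for a reaction topped up
    with water to total_ul.

    fixed_volumes_ul: volumes of buffer, enzyme, etc. given directly in uL.
    dna_mass_ng / dna_conc_ng_per_ul: DNA specified by mass and stock
    concentration (the concentration is user-supplied, not from the text).
    """
    dna_ul = 0.0
    if dna_mass_ng:
        if not dna_conc_ng_per_ul or dna_conc_ng_per_ul <= 0:
            raise ValueError("a positive DNA stock concentration is required")
        dna_ul = dna_mass_ng / dna_conc_ng_per_ul
    water_ul = total_ul - sum(fixed_volumes_ul) - dna_ul
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return dna_ul, water_ul


if __name__ == "__main__":
    # BsaI digestion: 2 ug vector, 6 uL buffer, 2 uL BsaI, water to 60 uL.
    # The 500 ng/uL stock concentration is an assumption for this example.
    print(water_to_volume(60, [6, 2], dna_mass_ng=2000, dna_conc_ng_per_ul=500))
```

A more dilute stock simply shifts volume from water to DNA; the helper raises an error if the components no longer fit in the stated total.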
Human endogenous genes including EMX1, RUNX1, DNMT1, AARSD1, GMPR2, ABCD3 and NFYB were selected, 10 gRNAs were designed in total, and 20 oligos were synthesized; the sequences are shown in Table 1.
TABLE 1 Oligo sequences
sgSsi-1 for: 5’-ACCGtgggcaagagtttctgccac-3’ (SEQ ID NO.13)
sgSsi-1 rev: 5’-AAACgtggcagaaactcttgccca-3’ (SEQ ID NO.14)
sgSsi-2 for: 5’-ACCGctgcgttcctagaaccacag-3’ (SEQ ID NO.15)
sgSsi-2 rev: 5’-AAACctgtggttctaggaacgcag-3’ (SEQ ID NO.16)
sgSsi-3 for: 5’-ACCGaatgctggctacagatgtcc-3’ (SEQ ID NO.17)
sgSsi-3 rev: 5’-AAACggacatctgtagccagcatt-3’ (SEQ ID NO.18)
sgSsi-4 for: 5’-ACCGctcatatgtcacttacctct-3’ (SEQ ID NO.19)
sgSsi-4 rev: 5’-AAACagaggtaagtgacatatgag-3’ (SEQ ID NO.20)
sgSsi-5 for: 5’-ACCGgagacaggatctcactgtgt-3’ (SEQ ID NO.21)
sgSsi-5 rev: 5’-AAACacacagtgagatcctgtctc-3’ (SEQ ID NO.22)
sgSsi-6 for: 5’-ACCGtgctctaggtggtgttaatg-3’ (SEQ ID NO.23)
sgSsi-6 rev: 5’-AAACcattaacaccacctagagca-3’ (SEQ ID NO.24)
sgSsi-7 for: 5’-ACCGcagcaacatgaacaactgaa-3’ (SEQ ID NO.25)
sgSsi-7 rev: 5’-AAACttcagttgttcatgttgctg-3’ (SEQ ID NO.26)
sgSsi-8 for: 5’-ACCGaagagccaagtcttactgta-3’ (SEQ ID NO.27)
sgSsi-8 rev: 5’-AAACtacagtaagacttggctctt-3’ (SEQ ID NO.28)
sgSsi-9 for: 5’-ACCGctgacaagtactagcttatg-3’ (SEQ ID NO.29)
sgSsi-9 rev: 5’-AAACcataagctagtacttgtcag-3’ (SEQ ID NO.30)
sgSsi-10 for: 5’-ACCGttcctcatagcaacatcact-3’ (SEQ ID NO.31)
sgSsi-10 rev: 5’-AAACagtgatgttgctatgaggaa-3’ (SEQ ID NO.32)
Example 3
HEK293T cells were transfected with the base editing system consisting of the SsiCas9-ancBE4max plasmid and pGL3-U6-Ssi gRNA plasmid constructed in the above example as follows:
3.1 HEK293T cells (from ATCC) were thawed and cultured in 10 cm dishes (Corning, 430167) in DMEM (HyClone, SH30243.01) supplemented with 10% fetal bovine serum (HyClone, SV30087). The culture temperature was 37 °C and the carbon dioxide concentration was 5%. After several passages, when the cell density reached 90%, the cells were plated into 24-well plates.
3.2 After thawing, the HEK293T cells were passaged three times and their condition was checked; cells in good condition were plated into a 24-well plate. After 18-24 h of culture, the cells were transfected at a confluence of 80%. The amounts of each component per transfection were: SsiCas9-ancBE4max plasmid: 1 μg; pGL3-U6-Ssi gRNA plasmid: 0.5 μg; EZ Trans transfection reagent (Liji Bio): 4.5 μl.
3.3 the specific transfection procedure (as high efficiency version procedure of EZ Trans transfection reagent for Prunus hainanensis organisms) is:
3.3.1 configuration reagent a: for each well of cells, 1.5. mu.g of plasmid DNA (1. mu.g of SsiCas9-ancBE4max plasmid + 0.5. mu.g of pGL3-U6-Ssi gRNA plasmid) was diluted to 50. mu.l of serum-free double-antibody-free high-glucose DMEM medium (or OPTI-MEM medium) and mixed well.
3.3.2 configuration B reagent: for each well of cells, 4.5. mu.l of EZ Trans transfection reagent (EZ Trans: plasmid DNA ═ 2:1) was diluted to 50. mu.l of serum-free and diabody-free high-glucose DMEM medium (or OPTI-MEMI medium), and gently mixed. This step does not allow the dilution of plasmid and EZ Trans transfection reagents with serum-containing media, because serum contains large amounts of negatively charged proteins that can interfere with the adsorption of nucleic acids by the transfection reagents, thereby affecting transfection efficiency.
3.3.3 Reagent A and reagent B were each left to stand for 5 min, after which reagent B was promptly added to reagent A and mixed gently. The order of mixing must not be reversed.
3.3.4 The mixture was left to stand at room temperature for 15 min to form EZ Trans-DNA complexes. The prepared EZ Trans-DNA transfection complex was added dropwise and evenly to the culture dish containing the cells, and the dish was rocked gently to disperse the complex evenly.
3.3.5 The cells were cultured in an incubator at 37 ℃ and 5% CO2 for 4-6 h, after which the medium containing the EZ Trans-DNA complex was removed and replaced with fresh medium, and the cells were cultured for a further 3 days.
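The per-well amounts in steps 3.2 and 3.3.1-3.3.2 can be scaled to several wells with a small calculation. The following is a minimal sketch only: the function name `master_mix` and the 10% pipetting overage are illustrative assumptions and are not part of the described procedure.

```python
# Per-well amounts taken from steps 3.2 and 3.3.1-3.3.2 (24-well plate).
EDITOR_PLASMID_UG = 1.0   # SsiCas9-ancBE4max plasmid
GRNA_PLASMID_UG = 0.5     # pGL3-U6-Ssi gRNA plasmid
EZTRANS_UL = 4.5          # EZ Trans transfection reagent
DILUENT_UL = 50.0         # serum-free medium for each of reagent A and reagent B


def master_mix(n_wells: int, overage: float = 0.1) -> dict[str, float]:
    """Scale per-well amounts to n_wells.

    `overage` is an assumed extra fraction to cover pipetting loss
    (hypothetical default of 10%; set to 0.0 for exact amounts).
    """
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    if overage < 0:
        raise ValueError("overage must not be negative")
    factor = n_wells * (1.0 + overage)
    return {
        "editor_plasmid_ug": round(EDITOR_PLASMID_UG * factor, 2),
        "grna_plasmid_ug": round(GRNA_PLASMID_UG * factor, 2),
        "reagent_a_medium_ul": round(DILUENT_UL * factor, 2),
        "eztrans_ul": round(EZTRANS_UL * factor, 2),
        "reagent_b_medium_ul": round(DILUENT_UL * factor, 2),
    }


if __name__ == "__main__":
    for name, amount in master_mix(n_wells=4, overage=0.0).items():
        print(f"{name}: {amount}")
```

Reagent A and reagent B are kept as separate entries because the procedure dilutes them separately before combining them.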
3.4 After 3 days of culture, the transfected cells were digested with trypsin, GFP-positive cells (top 15% by FITC fluorescence intensity) were collected by flow sorting, and genomic DNA was extracted from the collected cells by the phenol-chloroform method.
3.5 PCR primers were designed 100-130 bp upstream and 100-130 bp downstream of each selected endogenous target site, synthesized, and diluted to 10 μM with water. Each genomic target-site fragment was PCR-amplified with a Vazyme high-fidelity polymerase kit (Vazyme, P501-d2). The PCR products were excised from the gel and recovered with the AxyPrep DNA gel recovery kit (Axygen, AP-GX-250G) to remove non-specific bands. The PCR primer sequences are shown in Table 2.
TABLE 2 PCR primer sequences
3.6 Successful amplification of the target fragment was first checked by gel electrophoresis. Successfully amplified fragments were subjected to Sanger sequencing, and the sequencing results were analyzed for specific base point mutations (C-to-T or G-to-A) at the target site.
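The check in step 3.6 for C-to-T or G-to-A point mutations can be illustrated on base-called sequences. This is a minimal sketch, assuming the reference and the sequenced fragment are already aligned and of equal length. The function names and both sequences are hypothetical and are not taken from the sequence listing. Mixed peaks in a Sanger trace are not handled here.

```python
def base_substitutions(reference: str, observed: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, observed base) for each mismatch."""
    if len(reference) != len(observed):
        raise ValueError("sequences must be pre-aligned to equal length")
    ref, obs = reference.upper(), observed.upper()
    return [(i + 1, r, o) for i, (r, o) in enumerate(zip(ref, obs)) if r != o]


def is_c_to_t_edit(ref_base: str, obs_base: str) -> bool:
    """True for C-to-T, or for G-to-A (the same edit read on the opposite strand)."""
    return (ref_base, obs_base) in {("C", "T"), ("G", "A")}


# Hypothetical 20-nt target and a hypothetical edited read, for illustration only.
reference = "GACCTGCATCGGATCCAGTT"
observed = "GATTTGCATCGGATCCAGTT"

edits = [
    (pos, r, o)
    for pos, r, o in base_substitutions(reference, observed)
    if is_c_to_t_edit(r, o)
]
print(edits)  # [(3, 'C', 'T'), (4, 'C', 'T')]
```

G-to-A is accepted alongside C-to-T because a cytosine edit on one strand appears as G-to-A when the fragment is sequenced from the complementary strand.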
The sequencing results are shown in FIG. 5, in which panels A, B, C and D show the editing results at Ssi2, Ssi6, Ssi8 and Ssi10, respectively. In the left part of each panel, the first column is a schematic representation of the target DNA sequence and the second column is the PAM sequence; the right part of each panel is a statistical chart of the C-to-T editing efficiency at each position within the gRNA range for the corresponding target site. As FIG. 5 shows for these 4 editing sites, the gene editing tool SsiCas9-ancBE4max obtained in Example 1 produces efficient C-to-T conversion. In addition, a total of 10 endogenous human genomic sites were tested in HEK293T cells (see FIG. 6). SsiCas9-ancBE4max produced efficient C-to-T conversion at all of them, with editing occurring mainly at positions 3-12 of the gRNA sequence, which widens the targeting range of the base editor.
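A per-position C-to-T efficiency of the kind plotted in the statistical charts can be computed as the percentage of reads carrying T at each C of the protospacer. The sketch below is illustrative only. It assumes reads that are already aligned to the protospacer and positions counted from its 5' end, a convention the text does not state. The function name, protospacer and reads are hypothetical and do not reproduce any data from FIG. 5 or FIG. 6.

```python
from collections import Counter


def c_to_t_efficiency(protospacer: str, reads: list[str]) -> dict[int, float]:
    """Percent of reads with T at each C of the protospacer (1-based positions)."""
    protospacer = protospacer.upper()
    # Keep only reads that cover the whole protospacer.
    usable = [r.upper() for r in reads if len(r) == len(protospacer)]
    efficiency: dict[int, float] = {}
    for idx, base in enumerate(protospacer):
        if base != "C":
            continue
        calls = Counter(read[idx] for read in usable)
        total = sum(calls.values())
        efficiency[idx + 1] = 100.0 * calls["T"] / total if total else 0.0
    return efficiency


# Hypothetical protospacer and four hypothetical aligned reads.
protospacer = "GACCTGCATCGGATCCAGTT"
reads = [
    "GATTTGCATCGGATCCAGTT",
    "GACTTGCATCGGATCCAGTT",
    "GACCTGCATCGGATCCAGTT",
    "GACTTGTATCGGATCCAGTT",
]

efficiency = c_to_t_efficiency(protospacer, reads)
in_window = {pos: pct for pos, pct in efficiency.items() if 3 <= pos <= 12}
print(efficiency)
print(in_window)
```

Filtering to positions 3-12 mirrors the editing range reported in the text; the window bounds would need adjusting if positions are counted from the PAM-proximal end instead.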
The present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art. Furthermore, the embodiments of the present invention and the features of the embodiments may be combined with each other without conflict.
SEQUENCE LISTING
<110> Guangzhou university
<120> fusion protein, base editing tool and application thereof
<130>
<160> 52
<170> PatentIn version 3.5
<210> 1
<211> 1121
<212> PRT
<213> Artificial sequence
<400> 1
Asn Gly Lys Ile Leu Gly Leu Ala Ile Gly Val Ala Ser Val Gly Val
1 5 10 15
Gly Ile Leu Asp Lys Lys Thr Gly Glu Ile Ile His Ala Ser Ser Arg
20 25 30
Ile Phe Pro Ala Ala Thr Ala Asp Ser Asn Val Glu Arg Arg Gly Phe
35 40 45
Arg Gln Gly Arg Arg Leu Gly Arg Arg Lys Lys His Arg Lys Val Arg
50 55 60
Leu Ala Asp Leu Phe Ser Asp Thr Gly Leu Ile Thr Asp Phe Ser Lys
65 70 75 80
Val Ser Ile Asn Leu Asn Pro Tyr Glu Leu Arg Ile Lys Gly Leu Asn
85 90 95
Glu Lys Leu Thr Asn Glu Glu Leu Phe Ile Ala Leu Lys Asn Ile Val
100 105 110
Lys Arg Arg Gly Ile Ser Tyr Leu Asp Asp Ala Asn Glu Asp Gly Glu
115 120 125
Ser Ser Ser Ser Glu Tyr Gly Lys Ala Val Glu Glu Asn Arg Lys Leu
130 135 140
Leu Ala Asp Lys Thr Pro Gly Gln Ile Gln Leu Glu Arg Phe Glu Lys
145 150 155 160
Tyr Gly Gln Val Arg Gly Asp Phe Thr Ile Glu Glu Asn Gly Glu Lys
165 170 175
His Arg Leu Leu Asn Val Phe Ser Thr Ser Ala Tyr Lys Lys Glu Ala
180 185 190
Glu Arg Ile Leu Thr Lys Gln Gln Asp Tyr Asn Gln Asp Ile Thr Asp
195 200 205
Glu Phe Ile Gln Ala Tyr Leu Thr Ile Leu Thr Gly Lys Arg Lys Tyr
210 215 220
Tyr His Gly Pro Gly Asn Glu Lys Ser Arg Thr Asp Tyr Gly Arg Phe
225 230 235 240
Arg Thr Asp Gly Thr Thr Leu Asp Asn Ile Phe Gly Ile Leu Ile Gly
245 250 255
Lys Cys Thr Phe Tyr Pro Glu Glu Tyr Arg Ala Ala Lys Ala Ser Tyr
260 265 270
Thr Ala Gln Glu Phe Asn Leu Leu Asn Asp Leu Asn Asn Leu Thr Val
275 280 285
Pro Thr Glu Thr Lys Lys Leu Ser Glu Glu Gln Lys Arg Gln Ile Ile
290 295 300
Glu Tyr Ala Lys Gly Ala Lys Thr Leu Gly Ala Ala Thr Leu Leu Lys
305 310 315 320
Tyr Ile Ala Lys Leu Val Asp Gly Ser Val Glu Asp Ile Lys Gly Tyr
325 330 335
Arg Ile Asp Lys Ser Glu Lys Pro Glu Met His Thr Phe Asp Ile Tyr
340 345 350
Arg Lys Met Gln Thr Leu Glu Thr Val Asp Val Glu Lys Leu Ser Arg
355 360 365
Glu Val Leu Asp Glu Leu Ala His Ile Leu Thr Leu Asn Thr Glu Arg
370 375 380
Glu Gly Ile Glu Glu Ala Ile Lys Val Ser Phe Ile Lys Arg Glu Phe
385 390 395 400
Glu Gln Asp Gln Ile Ala Glu Leu Val Ser Phe Arg Lys Ser Asn Ser
405 410 415
Ser Leu Phe Gly Lys Gly Trp His Asn Phe Ser Ile Lys Leu Met Thr
420 425 430
Glu Leu Ile Pro Glu Leu Tyr Glu Thr Ser Glu Glu Gln Met Thr Ile
435 440 445
Leu Thr Arg Leu Gly Lys Gln Lys Thr Lys Ala Arg Ser Lys Arg Thr
450 455 460
Lys Tyr Ile Asp Glu Lys Glu Leu Thr Asp Glu Ile Tyr Asn Pro Val
465 470 475 480
Val Ala Lys Ser Val Arg Gln Ala Ile Lys Ile Ile Asn Leu Ala Thr
485 490 495
Lys Lys Tyr Gly Val Phe Asp Asn Ile Val Ile Glu Met Ala Arg Glu
500 505 510
Asn Asn Glu Glu Asp Ala Lys Lys Asp Tyr Val Lys Arg Gln Lys Ala
515 520 525
Asn Glu Asp Glu Lys Asn Ala Ala Met Glu Lys Ala Ala His Gln Tyr
530 535 540
Asn Gly Lys Lys Glu Leu Pro Asp Asn Val Phe His Gly His Lys Glu
545 550 555 560
Leu Ala Thr Lys Ile Arg Leu Trp His Gln Gln Gly Glu Lys Cys Leu
565 570 575
Tyr Thr Gly Lys Asn Ile Pro Ile Ser Asp Leu Ile His Asn Gln Tyr
580 585 590
Lys Tyr Glu Ile Asp His Ile Leu Pro Leu Ser Leu Ser Phe Asp Asp
595 600 605
Ser Leu Ala Asn Lys Val Leu Val Leu Ala Thr Ala Asn Gln Glu Lys
610 615 620
Gly Gln Arg Thr Pro Phe Gln Ala Leu Asp Ser Met Asp Asp Ala Trp
625 630 635 640
Ser Tyr Arg Glu Phe Lys Ala Tyr Val Arg Gly Ala Arg Ala Leu Ser
645 650 655
Asn Lys Lys Lys Asp Tyr Leu Leu Asn Glu Glu Asp Ile Asn Lys Ile
660 665 670
Glu Val Lys Gln Lys Phe Ile Glu Arg Asn Leu Val Asp Thr Arg Tyr
675 680 685
Ser Ser Arg Val Val Leu Asn Ala Leu Gln Asp Phe Tyr Lys Leu Asn
690 695 700
Asp Phe Asp Thr Lys Ile Ser Val Val Arg Gly Gln Phe Thr Ser Gln
705 710 715 720
Leu Arg Arg Lys Trp Arg Ile Asp Lys Ser Arg Glu Thr Tyr His His
725 730 735
His Ala Val Asp Ala Leu Ile Ile Ala Ala Ser Ser Gln Leu Arg Leu
740 745 750
Trp Lys Lys Gln Gly Asn Pro Leu Ile Ser Tyr Lys Glu Asn Gln Phe
755 760 765
Val Asp Ser Glu Thr Gly Glu Ile Ile Ser Leu Thr Asp Asp Glu Tyr
770 775 780
Lys Glu Leu Val Phe Arg Ala Pro Tyr Asp His Phe Val Asp Thr Val
785 790 795 800
Ser Ser Lys Lys Phe Glu Asp Arg Ile Leu Phe Ser Tyr Gln Val Asp
805 810 815
Ser Lys Tyr Asn Arg Lys Ile Ser Asp Ala Thr Ile Tyr Ser Thr Arg
820 825 830
Lys Ala Lys Leu Gly Lys Asp Lys Ser Glu Glu Thr Tyr Val Leu Gly
835 840 845
Lys Ile Lys Asp Ile Tyr Thr Gln Thr Gly Tyr Asp Ala Phe Ile Lys
850 855 860
Leu Tyr Lys Lys Asp Lys Ser Lys Phe Leu Met Tyr His Lys Asp Pro
865 870 875 880
Ile Thr Phe Glu Lys Val Ile Glu Glu Ile Leu Lys Thr Tyr Pro Asp
885 890 895
Lys Glu Ile Asn Glu Lys Gly Lys Glu Val Ala Cys Asn Pro Phe Glu
900 905 910
Lys Tyr Arg Gln Glu Asn Gly Pro Leu Arg Lys Tyr Ser Lys Lys Gly
915 920 925
Lys Gly Pro Glu Ile Lys Ser Leu Lys Tyr Tyr Asp Asn Lys Leu Gly
930 935 940
Asn His Ile Asp Ile Thr Pro Asp Asn Ser Glu Asn Gln Val Ile Leu
945 950 955 960
Gln Ser Leu Lys Pro Trp Arg Thr Asp Val Tyr Phe Asn His Lys Thr
965 970 975
Lys Ile Tyr Glu Leu Met Gly Leu Lys Tyr Ser Asp Leu Ser Phe Glu
980 985 990
Lys Gly Ser Gly Lys Tyr Arg Ile Ser Leu Asp Lys Tyr Asn Val Ile
995 1000 1005
Lys Lys Lys Glu Gly Val His Lys Glu Ser Glu Phe Lys Phe Thr
1010 1015 1020
Leu Tyr Lys Asn Asp Leu Ile Leu Ile Lys Asp Leu Glu Lys Ser
1025 1030 1035
Glu Gln Gln Leu Phe Arg Tyr Asn Ser Arg Asn Asp Thr Ser Lys
1040 1045 1050
His Tyr Val Glu Leu Lys Pro Tyr Asp Lys Ala Lys Phe Glu Gly
1055 1060 1065
Asn Gln Pro Leu Met Ala Leu Phe Gly Asn Val Ala Lys Gly Gly
1070 1075 1080
Gln Cys Leu Lys Gly Leu Asn Lys Ala Asn Ile Ser Ile Tyr Lys
1085 1090 1095
Val Gln Thr Asp Val Leu Gly Asn Lys Arg Phe Ile Lys Lys Glu
1100 1105 1110
Gly Asp Ala Pro Lys Leu Glu Phe
1115 1120
<210> 2
<211> 3363
<212> DNA
<213> Artificial sequence
<400> 2
aacggcaaga tcctgggact ggccatcgga gttgcatctg ttggagtggg catcctggac 60
aagaagaccg gcgagatcat ccacgccagc agcagaatct tccccgccgc cacagccgat 120
agcaacgtgg aacggagggg cttcagacag ggaagacggc tgggccgtag aaaaaaacac 180
agaaaggtgc ggttggccga tctgttcagc gacaccggcc tgataacaga cttctctaaa 240
gtgtctatca acctgaaccc ctacgagctg cggatcaagg gcctcaatga gaaactgaca 300
aacgaggaac tgttcatcgc cctgaagaac atcgtgaaga gaagaggcat cagctacctg 360
gatgacgcca atgaggacgg cgagagctcc tctagcgagt acggcaaggc tgtggaagaa 420
aaccgaaagt tgctggccga caagactcct ggccagatcc agctggaacg cttcgaaaag 480
tacggacagg tccgaggaga tttcaccatc gaggaaaacg gcgaaaagca tagactgctg 540
aacgtgttca gcaccagcgc ctataagaaa gaagccgagc ggattctgac caagcagcaa 600
gattacaacc aagacatcac cgacgagttc atccaggcct acctgacaat cctgacggga 660
aagagaaagt actaccatgg ccccggcaac gagaagtcta gaaccgacta cggccggttc 720
aggaccgatg gcaccaccct ggacaacatc tttggcatcc tgatcggcaa atgtacattc 780
tacccagagg agtaccgggc ggccaaggcc tcttacaccg cccaggagtt taacctcctg 840
aatgacctga acaatctgac agttccaacc gagacaaaga aactgagcga ggaacagaag 900
cggcaaatca tcgagtacgc caagggagcc aagacacttg gagccgccac cctgctcaag 960
tacatcgcca agctggtgga cggctctgtg gaggatatca agggctatag aattgataaa 1020
agcgagaaac ctgagatgca cacattcgat atctacagaa agatgcagac actggaaacc 1080
gtggatgtgg aaaagctgtc acgcgaggtg ctggatgagc tggcccatat cctgacactg 1140
aataccgaga gagaaggtat cgaggaggcc atcaaggtca gctttatcaa gagagagttc 1200
gaacaggacc agatcgccga gctggtcagc ttccggaagt ccaactctag cctgtttggc 1260
aagggctggc acaacttcag tatcaaactg atgacagaac tgatccccga gctgtatgag 1320
accagcgaag agcagatgac catcctgacc agactgggaa agcaaaagac aaaggctaga 1380
agcaagcgca caaagtacat cgacgagaag gagctgaccg acgagatcta caaccccgtg 1440
gtggccaaga gcgtgagaca ggccattaag atcatcaacc tggccaccaa gaagtacggc 1500
gtgttcgaca acatcgtgat cgagatggcc agagagaaca acgaggagga tgccaagaaa 1560
gattacgtga aaagacaaaa agctaatgag gacgaaaaga acgccgctat ggaaaaggct 1620
gcccaccagt acaacggcaa gaaggagctg cccgataacg tgtttcacgg ccacaaggaa 1680
ctggccacaa agatcagact gtggcaccag cagggcgaga agtgcctgta caccggcaaa 1740
aacatcccta tctctgatct gatccacaac cagtataagt acgagatcga ccacatcctg 1800
cctctgtcac tgagcttcga cgacagcctg gccaataagg tgctggtgct cgctaccgcc 1860
aaccaggaga agggccaaag aacacctttc caggccctcg acagcatgga cgatgcgtgg 1920
tcctatagag aatttaaggc ctacgtgcgg ggcgccagag ccctgagcaa caagaaaaaa 1980
gattacctgc tgaatgaaga ggacatcaac aagatcgaag tgaagcagaa attcatcgag 2040
aggaaccttg tggacactcg gtactcctct agagtggtcc tgaacgccct gcaggacttc 2100
tacaagctga atgatttcga caccaagatc agcgtggtga gaggccagtt caccagccag 2160
ctgagacgga aatggagaat cgacaagagc agagaaacct accaccacca cgccgtggac 2220
gctctgatca ttgccgctag ctcgcagctg agactgtgga agaagcaggg caacccactg 2280
atcagctaca aggaaaacca gttcgtcgac tccgaaaccg gagaaattat cagcctcaca 2340
gatgatgaat acaaggaact ggtgttccgg gctccatacg accacttcgt ggacacagtg 2400
agcagcaaaa agtttgaaga cagaatcctt ttctcctacc aggtggattc caaatacaac 2460
cggaaaatca gcgacgccac catttactct accagaaagg ccaagctggg caaagacaag 2520
agcgaggaaa cctacgtgct gggcaagata aaggacatct acacccagac cggctacgat 2580
gccttcatca agctgtacaa gaaggacaag tccaaatttc tgatgtacca caaggatcct 2640
atcacctttg agaaggtgat cgaggaaatc ctgaagacct accccgacaa ggaaatcaac 2700
gagaagggca aggaagtggc atgcaaccct tttgaaaaat atagacagga gaatggacct 2760
ctgagaaagt attctaagaa aggtaagggc cctgagatca agagcctgaa gtactacgac 2820
aacaaactcg gcaaccacat cgacataacc cctgacaaca gcgaaaatca ggtgatcctc 2880
cagtccctga aaccttggcg gaccgacgtg tacttcaacc acaaaaccaa gatttatgag 2940
ctgatgggcc tgaagtacag cgacctgagc ttcgagaagg gcagcggcaa gtaccggatt 3000
agcctggaca aatataacgt gatcaagaaa aaggagggcg tgcacaagga aagcgagttc 3060
aagttcacac tgtacaagaa cgacctgatc ctaatcaagg atctggaaaa gagcgagcag 3120
cagctgttta gatacaacag ccggaacgat acatccaagc actacgtgga gctgaagcct 3180
tacgacaagg ccaaattcga gggaaatcaa cctctgatgg ccctgttcgg caatgtggcc 3240
aagggaggcc agtgcctgaa gggcctgaac aaagccaaca tcagcatcta caaggtgcag 3300
accgacgtgc tgggcaacaa gcggttcatc aagaaagaag gcgacgctcc taagctggaa 3360
ttt 3363
<210> 3
<211> 228
<212> PRT
<213> Artificial sequence
<400> 3
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg
20 25 30
Lys Glu Thr Cys Leu Leu Tyr Glu Ile Lys Trp Gly Thr Ser His Lys
35 40 45
Ile Trp Arg His Ser Ser Lys Asn Thr Thr Lys His Val Glu Val Asn
50 55 60
Phe Ile Glu Lys Phe Thr Ser Glu Arg His Phe Cys Pro Ser Thr Ser
65 70 75 80
Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser
85 90 95
Lys Ala Ile Thr Glu Phe Leu Ser Gln His Pro Asn Val Thr Leu Val
100 105 110
Ile Tyr Val Ala Arg Leu Tyr His His Met Asp Gln Gln Asn Arg Gln
115 120 125
Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Gln Ile Met Thr
130 135 140
Ala Pro Glu Tyr Asp Tyr Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro
145 150 155 160
Gly Lys Glu Ala His Trp Pro Arg Tyr Pro Pro Leu Trp Met Lys Leu
165 170 175
Tyr Ala Leu Glu Leu His Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu
180 185 190
Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala
195 200 205
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala
210 215 220
Thr Gly Leu Lys
225
<210> 4
<211> 190
<212> PRT
<213> Artificial sequence
<400> 4
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val
1 5 10 15
Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile
20 25 30
Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu
35 40 45
Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr
50 55 60
Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile
65 70 75 80
Lys Met Leu Ser Gly Gly Ser Gly Gly Ser Gly Gly Ser Thr Asn Leu
85 90 95
Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile Gln Glu
100 105 110
Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys
115 120 125
Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp
130 135 140
Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp
145 150 155 160
Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu
165 170 175
Ser Gly Gly Ser Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu
180 185 190
<210> 5
<211> 1595
<212> PRT
<213> Artificial sequence
<400> 5
Pro Lys Lys Lys Arg Lys Val Ser Ser Glu Thr Gly Pro Val Ala Val
1 5 10 15
Asp Pro Thr Leu Arg Arg Arg Ile Glu Pro His Glu Phe Glu Val Phe
20 25 30
Phe Asp Pro Arg Glu Leu Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile
35 40 45
Lys Trp Gly Thr Ser His Lys Ile Trp Arg His Ser Ser Lys Asn Thr
50 55 60
Thr Lys His Val Glu Val Asn Phe Ile Glu Lys Phe Thr Ser Glu Arg
65 70 75 80
His Phe Cys Pro Ser Thr Ser Cys Ser Ile Thr Trp Phe Leu Ser Trp
85 90 95
Ser Pro Cys Gly Glu Cys Ser Lys Ala Ile Thr Glu Phe Leu Ser Gln
100 105 110
His Pro Asn Val Thr Leu Val Ile Tyr Val Ala Arg Leu Tyr His His
115 120 125
Met Asp Gln Gln Asn Arg Gln Gly Leu Arg Asp Leu Val Asn Ser Gly
130 135 140
Val Thr Ile Gln Ile Met Thr Ala Pro Glu Tyr Asp Tyr Cys Trp Arg
145 150 155 160
Asn Phe Val Asn Tyr Pro Pro Gly Lys Glu Ala His Trp Pro Arg Tyr
165 170 175
Pro Pro Leu Trp Met Lys Leu Tyr Ala Leu Glu Leu His Ala Gly Ile
180 185 190
Leu Gly Leu Pro Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln
195 200 205
Leu Thr Phe Phe Thr Ile Ala Leu Gln Ser Cys His Tyr Gln Arg Leu
210 215 220
Pro Pro His Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Gly Ser Ser
225 230 235 240
Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr
245 250 255
Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser Asn Gly Lys Ile Leu
260 265 270
Gly Leu Ala Ile Gly Val Ala Ser Val Gly Val Gly Ile Leu Asp Lys
275 280 285
Lys Thr Gly Glu Ile Ile His Ala Ser Ser Arg Ile Phe Pro Ala Ala
290 295 300
Thr Ala Asp Ser Asn Val Glu Arg Arg Gly Phe Arg Gln Gly Arg Arg
305 310 315 320
Leu Gly Arg Arg Lys Lys His Arg Lys Val Arg Leu Ala Asp Leu Phe
325 330 335
Ser Asp Thr Gly Leu Ile Thr Asp Phe Ser Lys Val Ser Ile Asn Leu
340 345 350
Asn Pro Tyr Glu Leu Arg Ile Lys Gly Leu Asn Glu Lys Leu Thr Asn
355 360 365
Glu Glu Leu Phe Ile Ala Leu Lys Asn Ile Val Lys Arg Arg Gly Ile
370 375 380
Ser Tyr Leu Asp Asp Ala Asn Glu Asp Gly Glu Ser Ser Ser Ser Glu
385 390 395 400
Tyr Gly Lys Ala Val Glu Glu Asn Arg Lys Leu Leu Ala Asp Lys Thr
405 410 415
Pro Gly Gln Ile Gln Leu Glu Arg Phe Glu Lys Tyr Gly Gln Val Arg
420 425 430
Gly Asp Phe Thr Ile Glu Glu Asn Gly Glu Lys His Arg Leu Leu Asn
435 440 445
Val Phe Ser Thr Ser Ala Tyr Lys Lys Glu Ala Glu Arg Ile Leu Thr
450 455 460
Lys Gln Gln Asp Tyr Asn Gln Asp Ile Thr Asp Glu Phe Ile Gln Ala
465 470 475 480
Tyr Leu Thr Ile Leu Thr Gly Lys Arg Lys Tyr Tyr His Gly Pro Gly
485 490 495
Asn Glu Lys Ser Arg Thr Asp Tyr Gly Arg Phe Arg Thr Asp Gly Thr
500 505 510
Thr Leu Asp Asn Ile Phe Gly Ile Leu Ile Gly Lys Cys Thr Phe Tyr
515 520 525
Pro Glu Glu Tyr Arg Ala Ala Lys Ala Ser Tyr Thr Ala Gln Glu Phe
530 535 540
Asn Leu Leu Asn Asp Leu Asn Asn Leu Thr Val Pro Thr Glu Thr Lys
545 550 555 560
Lys Leu Ser Glu Glu Gln Lys Arg Gln Ile Ile Glu Tyr Ala Lys Gly
565 570 575
Ala Lys Thr Leu Gly Ala Ala Thr Leu Leu Lys Tyr Ile Ala Lys Leu
580 585 590
Val Asp Gly Ser Val Glu Asp Ile Lys Gly Tyr Arg Ile Asp Lys Ser
595 600 605
Glu Lys Pro Glu Met His Thr Phe Asp Ile Tyr Arg Lys Met Gln Thr
610 615 620
Leu Glu Thr Val Asp Val Glu Lys Leu Ser Arg Glu Val Leu Asp Glu
625 630 635 640
Leu Ala His Ile Leu Thr Leu Asn Thr Glu Arg Glu Gly Ile Glu Glu
645 650 655
Ala Ile Lys Val Ser Phe Ile Lys Arg Glu Phe Glu Gln Asp Gln Ile
660 665 670
Ala Glu Leu Val Ser Phe Arg Lys Ser Asn Ser Ser Leu Phe Gly Lys
675 680 685
Gly Trp His Asn Phe Ser Ile Lys Leu Met Thr Glu Leu Ile Pro Glu
690 695 700
Leu Tyr Glu Thr Ser Glu Glu Gln Met Thr Ile Leu Thr Arg Leu Gly
705 710 715 720
Lys Gln Lys Thr Lys Ala Arg Ser Lys Arg Thr Lys Tyr Ile Asp Glu
725 730 735
Lys Glu Leu Thr Asp Glu Ile Tyr Asn Pro Val Val Ala Lys Ser Val
740 745 750
Arg Gln Ala Ile Lys Ile Ile Asn Leu Ala Thr Lys Lys Tyr Gly Val
755 760 765
Phe Asp Asn Ile Val Ile Glu Met Ala Arg Glu Asn Asn Glu Glu Asp
770 775 780
Ala Lys Lys Asp Tyr Val Lys Arg Gln Lys Ala Asn Glu Asp Glu Lys
785 790 795 800
Asn Ala Ala Met Glu Lys Ala Ala His Gln Tyr Asn Gly Lys Lys Glu
805 810 815
Leu Pro Asp Asn Val Phe His Gly His Lys Glu Leu Ala Thr Lys Ile
820 825 830
Arg Leu Trp His Gln Gln Gly Glu Lys Cys Leu Tyr Thr Gly Lys Asn
835 840 845
Ile Pro Ile Ser Asp Leu Ile His Asn Gln Tyr Lys Tyr Glu Ile Asp
850 855 860
His Ile Leu Pro Leu Ser Leu Ser Phe Asp Asp Ser Leu Ala Asn Lys
865 870 875 880
Val Leu Val Leu Ala Thr Ala Asn Gln Glu Lys Gly Gln Arg Thr Pro
885 890 895
Phe Gln Ala Leu Asp Ser Met Asp Asp Ala Trp Ser Tyr Arg Glu Phe
900 905 910
Lys Ala Tyr Val Arg Gly Ala Arg Ala Leu Ser Asn Lys Lys Lys Asp
915 920 925
Tyr Leu Leu Asn Glu Glu Asp Ile Asn Lys Ile Glu Val Lys Gln Lys
930 935 940
Phe Ile Glu Arg Asn Leu Val Asp Thr Arg Tyr Ser Ser Arg Val Val
945 950 955 960
Leu Asn Ala Leu Gln Asp Phe Tyr Lys Leu Asn Asp Phe Asp Thr Lys
965 970 975
Ile Ser Val Val Arg Gly Gln Phe Thr Ser Gln Leu Arg Arg Lys Trp
980 985 990
Arg Ile Asp Lys Ser Arg Glu Thr Tyr His His His Ala Val Asp Ala
995 1000 1005
Leu Ile Ile Ala Ala Ser Ser Gln Leu Arg Leu Trp Lys Lys Gln
1010 1015 1020
Gly Asn Pro Leu Ile Ser Tyr Lys Glu Asn Gln Phe Val Asp Ser
1025 1030 1035
Glu Thr Gly Glu Ile Ile Ser Leu Thr Asp Asp Glu Tyr Lys Glu
1040 1045 1050
Leu Val Phe Arg Ala Pro Tyr Asp His Phe Val Asp Thr Val Ser
1055 1060 1065
Ser Lys Lys Phe Glu Asp Arg Ile Leu Phe Ser Tyr Gln Val Asp
1070 1075 1080
Ser Lys Tyr Asn Arg Lys Ile Ser Asp Ala Thr Ile Tyr Ser Thr
1085 1090 1095
Arg Lys Ala Lys Leu Gly Lys Asp Lys Ser Glu Glu Thr Tyr Val
1100 1105 1110
Leu Gly Lys Ile Lys Asp Ile Tyr Thr Gln Thr Gly Tyr Asp Ala
1115 1120 1125
Phe Ile Lys Leu Tyr Lys Lys Asp Lys Ser Lys Phe Leu Met Tyr
1130 1135 1140
His Lys Asp Pro Ile Thr Phe Glu Lys Val Ile Glu Glu Ile Leu
1145 1150 1155
Lys Thr Tyr Pro Asp Lys Glu Ile Asn Glu Lys Gly Lys Glu Val
1160 1165 1170
Ala Cys Asn Pro Phe Glu Lys Tyr Arg Gln Glu Asn Gly Pro Leu
1175 1180 1185
Arg Lys Tyr Ser Lys Lys Gly Lys Gly Pro Glu Ile Lys Ser Leu
1190 1195 1200
Lys Tyr Tyr Asp Asn Lys Leu Gly Asn His Ile Asp Ile Thr Pro
1205 1210 1215
Asp Asn Ser Glu Asn Gln Val Ile Leu Gln Ser Leu Lys Pro Trp
1220 1225 1230
Arg Thr Asp Val Tyr Phe Asn His Lys Thr Lys Ile Tyr Glu Leu
1235 1240 1245
Met Gly Leu Lys Tyr Ser Asp Leu Ser Phe Glu Lys Gly Ser Gly
1250 1255 1260
Lys Tyr Arg Ile Ser Leu Asp Lys Tyr Asn Val Ile Lys Lys Lys
1265 1270 1275
Glu Gly Val His Lys Glu Ser Glu Phe Lys Phe Thr Leu Tyr Lys
1280 1285 1290
Asn Asp Leu Ile Leu Ile Lys Asp Leu Glu Lys Ser Glu Gln Gln
1295 1300 1305
Leu Phe Arg Tyr Asn Ser Arg Asn Asp Thr Ser Lys His Tyr Val
1310 1315 1320
Glu Leu Lys Pro Tyr Asp Lys Ala Lys Phe Glu Gly Asn Gln Pro
1325 1330 1335
Leu Met Ala Leu Phe Gly Asn Val Ala Lys Gly Gly Gln Cys Leu
1340 1345 1350
Lys Gly Leu Asn Lys Ala Asn Ile Ser Ile Tyr Lys Val Gln Thr
1355 1360 1365
Asp Val Leu Gly Asn Lys Arg Phe Ile Lys Lys Glu Gly Asp Ala
1370 1375 1380
Pro Lys Leu Glu Phe Ser Gly Gly Ser Gly Gly Ser Gly Gly Ser
1385 1390 1395
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu
1400 1405 1410
Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu
1415 1420 1425
Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala
1430 1435 1440
Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp
1445 1450 1455
Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn
1460 1465 1470
Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Gly Gly Ser
1475 1480 1485
Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly
1490 1495 1500
Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu
1505 1510 1515
Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val
1520 1525 1530
His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu
1535 1540 1545
Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln
1550 1555 1560
Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser
1565 1570 1575
Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Pro Lys Lys Lys Arg
1580 1585 1590
Lys Val
1595
<210> 6
<211> 4785
<212> DNA
<213> Artificial sequence
<400> 6
ccaaagaaga agcggaaagt cagcagtgaa accggaccag tggcagtgga cccaaccctg 60
aggagacgga ttgagcccca tgaatttgaa gtgttctttg acccaaggga gctgaggaag 120
gagacatgcc tgctgtacga gatcaagtgg ggcacaagcc acaagatctg gcgccacagc 180
tccaagaaca ccacaaagca cgtggaagtg aatttcatcg agaagtttac ctccgagcgg 240
cacttctgcc cctctaccag ctgttccatc acatggtttc tgtcttggag cccttgcggc 300
gagtgttcca aggccatcac cgagttcctg tctcagcacc ctaacgtgac cctggtcatc 360
tacgtggccc ggctgtatca ccacatggac cagcagaaca ggcagggcct gcgcgatctg 420
gtgaattctg gcgtgaccat ccagatcatg acagccccag agtacgacta ttgctggcgg 480
aacttcgtga attatccacc tggcaaggag gcacactggc caagataccc acccctgtgg 540
atgaagctgt atgcactgga gctgcacgca ggaatcctgg gcctgcctcc atgtctgaat 600
atcctgcgga gaaagcagcc ccagctgaca tttttcacca ttgctctgca gtcttgtcac 660
tatcagcggc tgcctcctca tattctgtgg gctacaggcc tgaagtctgg aggatctagc 720
ggaggatcct ctggcagcga gacaccagga acaagcgagt cagcaacacc agagagcagt 780
ggcggcagca gcggcggcag caacggcaag atcctgggac tggccatcgg agttgcatct 840
gttggagtgg gcatcctgga caagaagacc ggcgagatca tccacgccag cagcagaatc 900
ttccccgccg ccacagccga tagcaacgtg gaacggaggg gcttcagaca gggaagacgg 960
ctgggccgta gaaaaaaaca cagaaaggtg cggttggccg atctgttcag cgacaccggc 1020
ctgataacag acttctctaa agtgtctatc aacctgaacc cctacgagct gcggatcaag 1080
ggcctcaatg agaaactgac aaacgaggaa ctgttcatcg ccctgaagaa catcgtgaag 1140
agaagaggca tcagctacct ggatgacgcc aatgaggacg gcgagagctc ctctagcgag 1200
tacggcaagg ctgtggaaga aaaccgaaag ttgctggccg acaagactcc tggccagatc 1260
cagctggaac gcttcgaaaa gtacggacag gtccgaggag atttcaccat cgaggaaaac 1320
ggcgaaaagc atagactgct gaacgtgttc agcaccagcg cctataagaa agaagccgag 1380
cggattctga ccaagcagca agattacaac caagacatca ccgacgagtt catccaggcc 1440
tacctgacaa tcctgacggg aaagagaaag tactaccatg gccccggcaa cgagaagtct 1500
agaaccgact acggccggtt caggaccgat ggcaccaccc tggacaacat ctttggcatc 1560
ctgatcggca aatgtacatt ctacccagag gagtaccggg cggccaaggc ctcttacacc 1620
gcccaggagt ttaacctcct gaatgacctg aacaatctga cagttccaac cgagacaaag 1680
aaactgagcg aggaacagaa gcggcaaatc atcgagtacg ccaagggagc caagacactt 1740
ggagccgcca ccctgctcaa gtacatcgcc aagctggtgg acggctctgt ggaggatatc 1800
aagggctata gaattgataa aagcgagaaa cctgagatgc acacattcga tatctacaga 1860
aagatgcaga cactggaaac cgtggatgtg gaaaagctgt cacgcgaggt gctggatgag 1920
ctggcccata tcctgacact gaataccgag agagaaggta tcgaggaggc catcaaggtc 1980
agctttatca agagagagtt cgaacaggac cagatcgccg agctggtcag cttccggaag 2040
tccaactcta gcctgtttgg caagggctgg cacaacttca gtatcaaact gatgacagaa 2100
ctgatccccg agctgtatga gaccagcgaa gagcagatga ccatcctgac cagactggga 2160
aagcaaaaga caaaggctag aagcaagcgc acaaagtaca tcgacgagaa ggagctgacc 2220
gacgagatct acaaccccgt ggtggccaag agcgtgagac aggccattaa gatcatcaac 2280
ctggccacca agaagtacgg cgtgttcgac aacatcgtga tcgagatggc cagagagaac 2340
aacgaggagg atgccaagaa agattacgtg aaaagacaaa aagctaatga ggacgaaaag 2400
aacgccgcta tggaaaaggc tgcccaccag tacaacggca agaaggagct gcccgataac 2460
gtgtttcacg gccacaagga actggccaca aagatcagac tgtggcacca gcagggcgag 2520
aagtgcctgt acaccggcaa aaacatccct atctctgatc tgatccacaa ccagtataag 2580
tacgagatcg accacatcct gcctctgtca ctgagcttcg acgacagcct ggccaataag 2640
gtgctggtgc tcgctaccgc caaccaggag aagggccaaa gaacaccttt ccaggccctc 2700
gacagcatgg acgatgcgtg gtcctataga gaatttaagg cctacgtgcg gggcgccaga 2760
gccctgagca acaagaaaaa agattacctg ctgaatgaag aggacatcaa caagatcgaa 2820
gtgaagcaga aattcatcga gaggaacctt gtggacactc ggtactcctc tagagtggtc 2880
ctgaacgccc tgcaggactt ctacaagctg aatgatttcg acaccaagat cagcgtggtg 2940
agaggccagt tcaccagcca gctgagacgg aaatggagaa tcgacaagag cagagaaacc 3000
taccaccacc acgccgtgga cgctctgatc attgccgcta gctcgcagct gagactgtgg 3060
aagaagcagg gcaacccact gatcagctac aaggaaaacc agttcgtcga ctccgaaacc 3120
ggagaaatta tcagcctcac agatgatgaa tacaaggaac tggtgttccg ggctccatac 3180
gaccacttcg tggacacagt gagcagcaaa aagtttgaag acagaatcct tttctcctac 3240
caggtggatt ccaaatacaa ccggaaaatc agcgacgcca ccatttactc taccagaaag 3300
gccaagctgg gcaaagacaa gagcgaggaa acctacgtgc tgggcaagat aaaggacatc 3360
tacacccaga ccggctacga tgccttcatc aagctgtaca agaaggacaa gtccaaattt 3420
ctgatgtacc acaaggatcc tatcaccttt gagaaggtga tcgaggaaat cctgaagacc 3480
taccccgaca aggaaatcaa cgagaagggc aaggaagtgg catgcaaccc ttttgaaaaa 3540
tatagacagg agaatggacc tctgagaaag tattctaaga aaggtaaggg ccctgagatc 3600
aagagcctga agtactacga caacaaactc ggcaaccaca tcgacataac ccctgacaac 3660
agcgaaaatc aggtgatcct ccagtccctg aaaccttggc ggaccgacgt gtacttcaac 3720
cacaaaacca agatttatga gctgatgggc ctgaagtaca gcgacctgag cttcgagaag 3780
ggcagcggca agtaccggat tagcctggac aaatataacg tgatcaagaa aaaggagggc 3840
gtgcacaagg aaagcgagtt caagttcaca ctgtacaaga acgacctgat cctaatcaag 3900
gatctggaaa agagcgagca gcagctgttt agatacaaca gccggaacga tacatccaag 3960
cactacgtgg agctgaagcc ttacgacaag gccaaattcg agggaaatca acctctgatg 4020
gccctgttcg gcaatgtggc caagggaggc cagtgcctga agggcctgaa caaagccaac 4080
atcagcatct acaaggtgca gaccgacgtg ctgggcaaca agcggttcat caagaaagaa 4140
ggcgacgctc ctaagctgga atttagcggc gggagcggcg ggagcggggg gagcactaat 4200
ctgagcgaca tcattgagaa ggagactggg aaacagctgg tcattcagga gtccatcctg 4260
atgctgcctg aggaggtgga ggaagtgatc ggcaacaagc cagagtctga catcctggtg 4320
cacaccgcct acgacgagtc cacagatgag aatgtgatgc tgctgacctc tgacgccccc 4380
gagtataagc cttgggccct ggtcatccag gattctaacg gcgagaataa gatcaagatg 4440
ctgagcggag gatccggagg atctggaggc agcaccaacc tgtctgacat catcgagaag 4500
gagacaggca agcagctggt catccaggag agcatcctga tgctgcccga agaagtcgaa 4560
gaagtgatcg gaaacaagcc tgagagcgat atcctggtcc ataccgccta cgacgagagt 4620
accgacgaaa atgtgatgct gctgacatcc gacgccccag agtataagcc ctgggctctg 4680
gtcatccagg attccaacgg agagaacaaa atcaaaatgc tgtctggcgg ctcaaaaaga 4740
accgccgacg gcagcgaatt cgagcccaag aagaagagga aagtc 4785
<210> 7
<211> 4937
<212> DNA
<213> Artificial sequence
<400> 7
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg tgagaccgag agagggtctc agtttttgta ctctcaagaa attgcagaag 300
ctacaaagat aaggcttcat gccgaaatca acaccctgtc tcttggcggg gtgttttttt 360
ttttaaagaa ttctcgacct cgagacaaat ggcagtattc atccacaatt ttaaaagaaa 420
aggggggatt ggggggtaca gtgcagggga aagaatagta gacataatag caacagacat 480
acaaactaaa gaattacaaa aacaaattac aaaaattcaa aattttcggg tttattacag 540
ggacagcaga gatccacttt ggccgcggct cgagggggtt ggggttgcgc cttttccaag 600
gcagccctgg gtttgcgcag ggacgcggct gctctgggcg tggttccggg aaacgcagcg 660
gcgccgaccc tgggactcgc acattcttca cgtccgttcg cagcgtcacc cggatcttcg 720
ccgctaccct tgtgggcccc ccggcgacgc ttcctgctcc gcccctaagt cgggaaggtt 780
ccttgcggtt cgcggcgtgc cggacgtgac aaacggaagc cgcacgtctc actagtaccc 840
tcgcagacgg acagcgccag ggagcaatgg cagcgcgccg accgcgatgg gctgtggcca 900
atagcggctg ctcagcaggg cgcgccgaga gcagcggccg ggaaggggcg gtgcgggagg 960
cggggtgtgg ggcggtagtg tgggccctgt tcctgcccgc gcggtgttcc gcattctgca 1020
agcctccgga gcgcacgtcg gcagtcggct ccctcgttga ccgaatcacc gacctctctc 1080
cccaggggga tccatggtga gcaagggcga ggagctgttc accggggtgg tgcccatcct 1140
ggtcgagctg gacggcgacg taaacggcca caagttcagc gtgtccggcg agggcgaggg 1200
cgatgccacc tacggcaagc tgaccctgaa gttcatctgc accaccggca agctgcccgt 1260
gccctggccc accctcgtga ccaccctgac ctacggcgtg cagtgcttca gccgctaccc 1320
cgaccacatg aagcagcacg acttcttcaa gtccgccatg cccgaaggct acgtccagga 1380
gcgcaccatc ttcttcaagg acgacggcaa ctacaagacc cgcgccgagg tgaagttcga 1440
gggcgacacc ctggtgaacc gcatcgagct gaagggcatc gacttcaagg aggacggcaa 1500
catcctgggg cacaagctgg agtacaacta caacagccac aacgtctata tcatggccga 1560
caagcagaag aacggcatca aggtgaactt caagatccgc cacaacatcg aggacggcag 1620
cgtgcagctc gccgaccact accagcagaa cacccccatc ggcgacggcc ccgtgctgct 1680
gcccgacaac cactacctga gcacccagtc cgccctgagc aaagacccca acgagaagcg 1740
cgatcacatg gtcctgctgg agttcgtgac cgccgccggg atcactctcg gcatggacga 1800
gctgtacaag taaagcggcc gcgactctag atcataatca gccataccac atttgtagag 1860
gttttacttg ctttaaaaaa cctcccacac ctccccctga acctgaaaca taaaatgaat 1920
gcaattgttg ttgttaactt gtttattgca gcttataatg gttacaaata aagcaatagc 1980
atcacaaatt tcacaaataa agcatttttt tcactgcatt ctagttgtgg tttgtccaaa 2040
ctcatcaatg tatcttagtc gaccgatgcc cttgagagcc ttcaacccag tcagctcctt 2100
ccggtgggcg cggggcatga ctatcgtcgc cgcacttatg actgtcttct ttatcatgca 2160
actcgtagga caggtgccgg cagcgctctt ccgcttcctc gctcactgac tcgctgcgct 2220
cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata cggttatcca 2280
cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga 2340
accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct gacgagcatc 2400
acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa agataccagg 2460
cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat 2520
acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca cgctgtaggt 2580
atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa ccccccgttc 2640
agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg gtaagacacg 2700
acttatcgcc actggcagca gccactggta acaggattag cagagcgagg tatgtaggcg 2760
gtgctacaga gttcttgaag tggtggccta actacggcta cactagaaga acagtatttg 2820
gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc tcttgatccg 2880
gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag attacgcgca 2940
gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac gctcagtgga 3000
acgaaaactc acgttaaggg attttggtca tgagattatc aaaaaggatc ttcacctaga 3060
tccttttaaa ttaaaaatga agttttaaat caatctaaag tatatatgag taaacttggt 3120
ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgt ctatttcgtt 3180
catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggag ggcttaccat 3240
ctggccccag tgctgcaatg ataccgcggg acccacgctc accggctcca gatttatcag 3300
caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaact ttatccgcct 3360
ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgcca gttaatagtt 3420
tgcgcaacgt tgttgccatt gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg 3480
cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatccccc atgttgtgca 3540
aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt 3600
tatcactcat ggttatggca gcactgcata attctcttac tgtcatgcca tccgtaagat 3660
gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgt atgcggcgac 3720
cgagttgctc ttgcccggcg tcaatacggg ataataccgc gccacatagc agaactttaa 3780
aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt 3840
tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagca tcttttactt 3900
tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa 3960
gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattat tgaagcattt 4020
atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaa aataaacaaa 4080
taggggttcc gcgcacattt ccccgaaaag tgccacctga cgcgccctgt agcggcgcat 4140
taagcgcggc gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccctag 4200
cgcccgctcc tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc 4260
aagctctaaa tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc 4320
ccaaaaaact tgattagggt gatggttcac gtagtgggcc atcgccctga tagacggttt 4380
ttcgcccttt gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa 4440
caacactcaa ccctatctcg gtctattctt ttgatttata agggattttg ccgatttcgg 4500
cctattggtt aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat 4560
taacgcttac aatttgccat tcgccattca ggctgcgcaa ctgttgggaa gggcgatcgg 4620
tgcgggcctc ttcgctatta cgccagccca agctaccatg ataagtaagt aatattaagg 4680
tacgggaggt acttggagcg gccgcaataa aatatcttta ttttcattac atctgtgtgt 4740
tggttttttg tgtgaatcga tagtactaac atacgctctc catcaaaaca aaacgaaaca 4800
aaacaaacta gcaaaatagg ctgtccccag tgcaagtgca ggtgccagaa catttctcta 4860
tcgataggta ccgattagtg aacggatctc gacggtatcg atcacgagac tagcctcgag 4920
cggccgcccc cttcacc 4937
<210> 8
<211> 86
<212> DNA
<213> Artificial sequence
<400> 8
gtttttgtac tctcaagaaa ttgcagaagc tacaaagata aggcttcatg ccgaaatcaa 60
caccctgtct cttggcgggg tgtttt 86
<210> 9
<211> 7
<212> PRT
<213> Artificial sequence
<400> 9
Pro Lys Lys Lys Arg Lys Val
1               5
<210> 10
<211> 3743
<212> DNA
<213> Artificial sequence
<400> 10
agcggaggat cctctggcag cgagacacca ggaacaagcg agtcagcaac accagagagc 60
agtggcggca gcagcggcgg cagcaacggc aagatcctgg gactggccat cggagttgca 120
tctgttggag tgggcatcct ggacaagaag accggcgaga tcatccacgc cagcagcaga 180
atcttccccg ccgccacagc cgatagcaac gtggaacgga ggggcttcag acagggaaga 240
cggctgggcc gtagaaaaaa acacagaaag gtgcggttgg ccgatctgtt cagcgacacc 300
ggcctgataa cagacttctc taaagtgtct atcaacctga acccctacga gctgcggatc 360
aagggcctca atgagaaact gacaaacgag gaactgttca tcgccctgaa gaacatcgtg 420
aagagaagag gcatcagcta cctggatgac gccaatgagg acggcgagag ctcctctagc 480
gagtacggca aggctgtgga agaaaaccga aagttgctgg ccgacaagac tcctggccag 540
atccagctgg aacgcttcga aaagtacgga caggtccgag gagatttcac catcgaggaa 600
aacggcgaaa agcatagact gctgaacgtg ttcagcacca gcgcctataa gaaagaagcc 660
gagcggattc tgaccaagca gcaagattac aaccaagaca tcaccgacga gttcatccag 720
gcctacctga caatcctgac gggaaagaga aagtactacc atggccccgg caacgagaag 780
tctagaaccg actacggccg gttcaggacc gatggcacca ccctggacaa catctttggc 840
atcctgatcg gcaaatgtac attctaccca gaggagtacc gggcggccaa ggcctcttac 900
accgcccagg agtttaacct cctgaatgac ctgaacaatc tgacagttcc aaccgagaca 960
aagaaactga gcgaggaaca gaagcggcaa atcatcgagt acgccaaggg agccaagaca 1020
cttggagccg ccaccctgct caagtacatc gccaagctgg tggacggctc tgtggaggat 1080
atcaagggct atagaattga taaaagcgag aaacctgaga tgcacacatt cgatatctac 1140
agaaagatgc agacactgga aaccgtggat gtggaaaagc tgtcacgcga ggtgctggat 1200
gagctggccc atatcctgac actgaatacc gagagagaag gtatcgagga ggccatcaag 1260
gtcagcttta tcaagagaga gttcgaacag gaccagatcg ccgagctggt cagcttccgg 1320
aagtccaact ctagcctgtt tggcaagggc tggcacaact tcagtatcaa actgatgaca 1380
gaactgatcc ccgagctgta tgagaccagc gaagagcaga tgaccatcct gaccagactg 1440
ggaaagcaaa agacaaaggc tagaagcaag cgcacaaagt acatcgacga gaaggagctg 1500
accgacgaga tctacaaccc cgtggtggcc aagagcgtga gacaggccat taagatcatc 1560
aacctggcca ccaagaagta cggcgtgttc gacaacatcg tgatcgagat ggccagagag 1620
aacaacgagg aggatgccaa gaaagattac gtgaaaagac aaaaagctaa tgaggacgaa 1680
aagaacgccg ctatggaaaa ggctgcccac cagtacaacg gcaagaagga gctgcccgat 1740
aacgtgtttc acggccacaa ggaactggcc acaaagatca gactgtggca ccagcagggc 1800
gagaagtgcc tgtacaccgg caaaaacatc cctatctctg atctgatcca caaccagtat 1860
aagtacgaga tcgaccacat cctgcctctg tcactgagct tcgacgacag cctggccaat 1920
aaggtgctgg tgctcgctac cgccaaccag gagaagggcc aaagaacacc tttccaggcc 1980
ctcgacagca tggacgatgc gtggtcctat agagaattta aggcctacgt gcggggcgcc 2040
agagccctga gcaacaagaa aaaagattac ctgctgaatg aagaggacat caacaagatc 2100
gaagtgaagc agaaattcat cgagaggaac cttgtggaca ctcggtactc ctctagagtg 2160
gtcctgaacg ccctgcagga cttctacaag ctgaatgatt tcgacaccaa gatcagcgtg 2220
gtgagaggcc agttcaccag ccagctgaga cggaaatgga gaatcgacaa gagcagagaa 2280
acctaccacc accacgccgt ggacgctctg atcattgccg ctagctcgca gctgagactg 2340
tggaagaagc agggcaaccc actgatcagc tacaaggaaa accagttcgt cgactccgaa 2400
accggagaaa ttatcagcct cacagatgat gaatacaagg aactggtgtt ccgggctcca 2460
tacgaccact tcgtggacac agtgagcagc aaaaagtttg aagacagaat ccttttctcc 2520
taccaggtgg attccaaata caaccggaaa atcagcgacg ccaccattta ctctaccaga 2580
aaggccaagc tgggcaaaga caagagcgag gaaacctacg tgctgggcaa gataaaggac 2640
atctacaccc agaccggcta cgatgccttc atcaagctgt acaagaagga caagtccaaa 2700
tttctgatgt accacaagga tcctatcacc tttgagaagg tgatcgagga aatcctgaag 2760
acctaccccg acaaggaaat caacgagaag ggcaaggaag tggcatgcaa cccttttgaa 2820
aaatatagac aggagaatgg acctctgaga aagtattcta agaaaggtaa gggccctgag 2880
atcaagagcc tgaagtacta cgacaacaaa ctcggcaacc acatcgacat aacccctgac 2940
aacagcgaaa atcaggtgat cctccagtcc ctgaaacctt ggcggaccga cgtgtacttc 3000
aaccacaaaa ccaagattta tgagctgatg ggcctgaagt acagcgacct gagcttcgag 3060
aagggcagcg gcaagtaccg gattagcctg gacaaatata acgtgatcaa gaaaaaggag 3120
ggcgtgcaca aggaaagcga gttcaagttc acactgtaca agaacgacct gatcctaatc 3180
aaggatctgg aaaagagcga gcagcagctg tttagataca acagccggaa cgatacatcc 3240
aagcactacg tggagctgaa gccttacgac aaggccaaat tcgagggaaa tcaacctctg 3300
atggccctgt tcggcaatgt ggccaaggga ggccagtgcc tgaagggcct gaacaaagcc 3360
aacatcagca tctacaaggt gcagaccgac gtgctgggca acaagcggtt catcaagaaa 3420
gaaggcgacg ctcctaagct ggaatttagc ggcgggagcg gcgggagcgg ggggagcact 3480
aatctgagcg acatcattga gaaggagact gggaaacagc tggtcattca ggagtccatc 3540
ctgatgctgc ctgaggaggt ggaggaagtg atcggcaaca agccagagtc tgacatcctg 3600
gtgcacaccg cctacgacga gtccacagat gagaatgtga tgctgctgac ctctgacgcc 3660
cccgagtata agccttgggc cctggtcatc caggattcta acggcgagaa taagatcaag 3720
atgctgagcg gaggatccgg agg 3743
<210> 11
<211> 30
<212> DNA
<213> Artificial sequence
<400> 11
agcggaggat cctctggcag cgagacacca 30
<210> 12
<211> 33
<212> DNA
<213> Artificial sequence
<400> 12
cctccggatc ctccgctcag catcttgatc tta 33
<210> 13
<211> 24
<212> DNA
<213> Artificial sequence
<400> 13
accgtgggca agagtttctg ccac 24
<210> 14
<211> 24
<212> DNA
<213> Artificial sequence
<400> 14
aaacgtggca gaaactcttg ccca 24
<210> 15
<211> 24
<212> DNA
<213> Artificial sequence
<400> 15
accgctgcgt tcctagaacc acag 24
<210> 16
<211> 24
<212> DNA
<213> Artificial sequence
<400> 16
aaacctgtgg ttctaggaac gcag 24
<210> 17
<211> 24
<212> DNA
<213> Artificial sequence
<400> 17
accgaatgct ggctacagat gtcc 24
<210> 18
<211> 24
<212> DNA
<213> Artificial sequence
<400> 18
aaacggacat ctgtagccag catt 24
<210> 19
<211> 24
<212> DNA
<213> Artificial sequence
<400> 19
accgctcata tgtcacttac ctct 24
<210> 20
<211> 24
<212> DNA
<213> Artificial sequence
<400> 20
aaacagaggt aagtgacata tgag 24
<210> 21
<211> 24
<212> DNA
<213> Artificial sequence
<400> 21
accggagaca ggatctcact gtgt 24
<210> 22
<211> 24
<212> DNA
<213> Artificial sequence
<400> 22
aaacacacag tgagatcctg tctc 24
<210> 23
<211> 24
<212> DNA
<213> Artificial sequence
<400> 23
accgtgctct aggtggtgtt aatg 24
<210> 24
<211> 24
<212> DNA
<213> Artificial sequence
<400> 24
aaaccattaa caccacctag agca 24
<210> 25
<211> 24
<212> DNA
<213> Artificial sequence
<400> 25
accgcagcaa catgaacaac tgaa 24
<210> 26
<211> 24
<212> DNA
<213> Artificial sequence
<400> 26
aaacttcagt tgttcatgtt gctg 24
<210> 27
<211> 24
<212> DNA
<213> Artificial sequence
<400> 27
accgaagagc caagtcttac tgta 24
<210> 28
<211> 24
<212> DNA
<213> Artificial sequence
<400> 28
aaactacagt aagacttggc tctt 24
<210> 29
<211> 24
<212> DNA
<213> Artificial sequence
<400> 29
accgctgaca agtactagct tatg 24
<210> 30
<211> 24
<212> DNA
<213> Artificial sequence
<400> 30
aaaccataag ctagtacttg tcag 24
<210> 31
<211> 24
<212> DNA
<213> Artificial sequence
<400> 31
accgttcctc atagcaacat cact 24
<210> 32
<211> 24
<212> DNA
<213> Artificial sequence
<400> 32
aaacagtgat gttgctatga ggaa 24
<210> 33
<211> 19
<212> DNA
<213> Artificial sequence
<400> 33
ctgacctggc agataccac 19
<210> 34
<211> 20
<212> DNA
<213> Artificial sequence
<400> 34
ccacaggact taggaacgac 20
<210> 35
<211> 23
<212> DNA
<213> Artificial sequence
<400> 35
cccttgaaaa gtgcagtgtg tcg 23
<210> 36
<211> 23
<212> DNA
<213> Artificial sequence
<400> 36
ggcaattccc tttgaaagac tgc 23
<210> 37
<211> 21
<212> DNA
<213> Artificial sequence
<400> 37
ccgaggtact gttgctgctt c 21
<210> 38
<211> 22
<212> DNA
<213> Artificial sequence
<400> 38
gagatggcaa gcctttgttg cg 22
<210> 39
<211> 22
<212> DNA
<213> Artificial sequence
<400> 39
gatgctcatt ggtagctcgt gc 22
<210> 40
<211> 25
<212> DNA
<213> Artificial sequence
<400> 40
ctatctgtcc atccatgcat ttgcc 25
<210> 41
<211> 20
<212> DNA
<213> Artificial sequence
<400> 41
cctactgcgg atgccttctt 20
<210> 42
<211> 21
<212> DNA
<213> Artificial sequence
<400> 42
ttagcttggt gtggcagcat g 21
<210> 43
<211> 25
<212> DNA
<213> Artificial sequence
<400> 43
caagtcattg tgatgactga ggagc 25
<210> 44
<211> 19
<212> DNA
<213> Artificial sequence
<400> 44
ggccagccta tgatgggcc 19
<210> 45
<211> 25
<212> DNA
<213> Artificial sequence
<400> 45
ggatgctgtg atgactgaga cgtag 25
<210> 46
<211> 28
<212> DNA
<213> Artificial sequence
<400> 46
tggacatttt gagtttgaaa aggctgtg 28
<210> 47
<211> 24
<212> DNA
<213> Artificial sequence
<400> 47
caggcgtgct gtaatacatg aacc 24
<210> 48
<211> 26
<212> DNA
<213> Artificial sequence
<400> 48
gtcaccatag gataggaagt cagcag 26
<210> 49
<211> 18
<212> DNA
<213> Artificial sequence
<400> 49
gtcccactgc accagcag 18
<210> 50
<211> 32
<212> DNA
<213> Artificial sequence
<400> 50
cctattctat ctgagggagg acatgattga ag 32
<210> 51
<211> 26
<212> DNA
<213> Artificial sequence
<400> 51
ctctgcctgg aagaataatg agaacc 26
<210> 52
<211> 23
<212> DNA
<213> Artificial sequence
<400> 52
ccaggatggt gtttgtgaga tgg 23