Gene encoding the Cry toxin binding region of rice leaf roller cadherin, the encoded protein, and applications thereof
1. A gene encoding the Cry toxin binding region of rice leaf roller cadherin, wherein the nucleotide sequence of the gene is shown as SEQ ID NO. 14.
2. A protein encoded by the gene encoding the Cry toxin binding region of rice leaf roller cadherin as claimed in claim 1, wherein the amino acid sequence of the protein is shown as SEQ ID NO. 15.
3. Use of the gene encoding the Cry toxin binding region of rice leaf roller cadherin as claimed in claim 1 for predicting resistance of rice leaf rollers to Cry toxins.
4. Use of the encoded protein of claim 2 in binding the Cry1Ac toxin.
5. Use of the encoded protein of claim 2 for screening Bt proteins active against rice leaf rollers.
6. The use of claim 3, wherein the Cry toxin is the Cry1Ac toxin.
Background
The rice leaf roller, Cnaphalocrocis medinalis, is a major rice pest in China and other Asian countries (Khan et al, 1988; Riley et al, 1995; Yang et al, 2018a). Its larvae fold rice leaves and feed on them, causing defoliation and severe loss of rice yield (Padmavathi et al, 2013; Senthil-Nathan, 2019). At present, Cnaphalocrocis medinalis is controlled mainly with chemical pesticides, but their use is discouraged because of environmental impact, pest resistance and related problems (Li et al, 2016; Xiao and Wu, 2019). Biopesticides are prepared from microorganisms or natural products, are environmentally friendly, and are an effective alternative to chemical pesticides (Chandler et al, 2011). Bacillus thuringiensis (Bt) is currently the most widely used biopesticide; during sporulation it produces insecticidal crystal proteins (Cry toxins), which are highly active against specific insects and are not harmful to vertebrates or the environment (Bravo et al, 2011; Mendelsohn et al, 2003; Sanahuja et al, 2011). Cry toxins are therefore used in a variety of ways, including transgenic crops and spray formulations, to control pests.
The mode of action of Cry toxins has been widely studied but remains controversial (Soberon et al, 2009; Vachon et al, 2012). Two models are currently used to explain it: the "perforation model" (Bravo et al, 2004) and the "signaling model" (Zhang et al, 2006). In both models, the Bt toxin is ingested by the target insect, solubilized in the alkaline midgut environment, and then activated by the insect's digestive proteases. The activated toxins then bind to different protein receptors located on the brush border membrane vesicles (BBMV) of the midgut cells (Knowles and Ellar, 1987; Jurat-Fuentes and Crickmore, 2017). In the perforation model, the activated toxins oligomerize after binding to specific cadherins. The toxin oligomers then bind to another receptor protein, such as aminopeptidase N (APN) or alkaline phosphatase (ALP), and insert into the midgut epithelial cell membrane to form pores, leading to insect death (Pardo-Lopez et al, 2013; Adang et al, 2012, 2014). Recently, ABC transporters have emerged as key functional receptors for Cry toxins and are believed to promote Cry toxin oligomerization and pore formation (Ockel ). In the signaling model, after the toxin binds to cadherin, an Mg2+-dependent cAMP signaling pathway is activated, leading to apoptosis and insect death (Zhang et al, 2005; Zhang et al, 2006). Both models hold that binding of Cry toxins to cadherin triggers a complex multi-step process. Furthermore, cadherin mutations have been shown to be associated with resistance to Cry toxins in several lepidopteran pests (Fabrick et al, 2007; Gomez et al, 2001). In the Heliothis virescens YHD2 resistant strain, the truncated protein expressed by the cadherin gene no longer contains the Cry1Ac toxin binding region (Gahan et al, 2001).
Therefore, cadherins are crucial receptors for studying the molecular basis of Bt insecticidal activity and resistance (Gahan et al, 2001; Fabrick and Tabashnik, 2007; Zhang et al, 2012).
Bioinformatic prediction shows that the cadherin structure comprises an amino-terminal signal peptide (SIG), an extracellular domain containing 9-14 cadherin repeats (CR), a membrane-proximal extracellular domain (MPED), a transmembrane domain (TMD), and a cytoplasmic domain (CYT). In cadherins from lepidopteran insects, the Cry toxin binding region (TBR) is located predominantly in the 6 CR domains (e.g., CR6-11/CR7-12) and the MPED near the cell membrane (Fabrick and Wu, 2015). The inventors previously showed that a diamondback moth cadherin fragment (T1202-I1447, PxCad-TBR) and its homologous fragment from Helicoverpa armigera (T1217-L1461, HaCad-TBR) bind the Cry1Ac toxin (Gao et al, 2019). Heterologously expressed cadherin fragments containing the TBR bind Cry toxins with high affinity, promoting oligomerization and enhancing Cry toxicity in certain lepidopteran insects (Chen et al, 2007; Fabrick et al, 2009; Pacheco et al, 2009b; Park et al, 2009; Park et al, 2019).
Cadherins are important receptors for understanding the insecticidal action of Cry toxins and the mechanisms of resistance to them, and they are of great significance for studying the interaction between Cry toxins and insect cadherins and for designing specific insecticidal proteins. To date, there have been no reports on the cadherin gene of the rice leaf roller, in particular on the gene encoding its toxin binding region.
Disclosure of Invention
To address the above problems, the application for the first time clones the full-length cDNA sequence of the cadherin gene from rice leaf roller larvae, determines its nucleotide and amino acid sequences, locates on that basis the gene encoding CmCad-CR6-MPED (G759-L1575), the Cry1Ac toxin binding region of rice leaf roller cadherin, and further develops applications of this gene.
Specifically, this is achieved through the following technical scheme:
First, the application identifies the gene encoding the Cry toxin binding region of rice leaf roller cadherin, the nucleotide sequence of which is shown as SEQ ID NO. 14; the gene is located at positions 2398-4848 of the full-length sequence of the rice leaf roller cadherin gene (shown as SEQ ID NO. 10).
Second, the application identifies the protein encoded by the gene for the Cry toxin binding region of rice leaf roller cadherin, the amino acid sequence of which is shown as SEQ ID NO. 15; this protein segment is located at positions 759-1575 of the amino acid sequence of rice leaf roller cadherin (shown as SEQ ID NO. 11).
Third, the application provides use of the gene encoding the Cry toxin binding region of rice leaf roller cadherin in predicting resistance of the rice leaf roller to Cry toxins: if the nucleotide sequence of this gene in the rice leaf roller under test differs from SEQ ID NO. 14, this indicates that the receptor carries a gene mutation that reduces binding capacity, and the insect is judged to have developed resistance to the Cry toxin.
Fourth, the application provides use of the protein encoded by the gene for the Cry toxin binding region of rice leaf roller cadherin in binding the Cry1Ac toxin.
Fifth, the application provides use of the protein encoded by the gene for the Cry toxin binding region of rice leaf roller cadherin in screening Bt proteins active against the rice leaf roller: if a Bt protein to be screened has high affinity for the encoded protein CmCad-CR6-MPED, the Bt protein is expected to insert into the rice leaf roller midgut BBMV and form pores in the epithelial cell membrane, causing cell death, so that Bt insecticidal proteins highly toxic to the rice leaf roller can be rapidly screened or modified using the encoded protein CmCad-CR6-MPED.
Having obtained the full-length sequence of a rice leaf roller cadherin for the first time, the application predicts the CR6-MPED region (G759-L1575) as the candidate Cry1Ac toxin binding region of CmCad. The CmCad-CR6-MPED fragment was cloned into the pET26b(+) vector, and the recombinant protein was expressed in Escherichia coli BL21(DE3) and purified. The purified recombinant protein CmCad-CR6-MPED was determined to be about 120 kDa in size. Binding of the toxin binding region CmCad-CR6-MPED of rice leaf roller cadherin to the Cry1Ac toxin was then confirmed, which supports the possibility that CmCad acts as a functional receptor for the Cry1Ac toxin. Locating the binding region helps in studying the mode of action of Bt toxins and the mechanism of resistance to them, and in designing, on that basis, more effective and specific insecticidal proteins for controlling lepidopteran agricultural pests (in particular the rice leaf roller).
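The comparison described in the third point above (a test sequence checked against SEQ ID NO. 14) can be sketched as follows. This is a minimal illustration, not the applicant's procedure: the function name `differing_codons` and the toy sequences are hypothetical, the two sequences are assumed to be already aligned, in frame and of equal length, and every differing codon is simply reported without judging whether it alters binding.

```python
def differing_codons(reference, test, first_residue=759):
    """List codons that differ between a reference TBR sequence and a test sequence.

    Both sequences are assumed to be in frame and of equal length.
    Returns (residue_number, reference_codon, test_codon) tuples, where
    residue_number is counted on the full-length cadherin (G759 is codon 1
    of the binding-region gene).
    """
    reference, test = reference.lower(), test.lower()
    if len(reference) != len(test):
        raise ValueError("sequences differ in length; align them first")
    if len(reference) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    changes = []
    for i in range(0, len(reference), 3):
        ref_codon, test_codon = reference[i:i + 3], test[i:i + 3]
        if ref_codon != test_codon:
            changes.append((first_residue + i // 3, ref_codon, test_codon))
    return changes


# Toy 4-codon sequences (hypothetical, not taken from SEQ ID NO. 14).
toy_reference = "ggtacccgagaa"
toy_sample = "ggtacccaagaa"
print(differing_codons(toy_reference, toy_sample))
```

An empty list means the test sequence is identical to the reference over the compared region; any entry flags a codon that would need follow-up, for example a binding assay.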
Drawings
FIG. 1 is a detailed alignment of the cadherin amino acid sequences of Ostrinia nubilalis, O. furnacalis, Diatraea saccharalis and Chilo suppressalis of the lepidopteran family Pyralidae. The conserved regions used for designing the degenerate primer pair are indicated by red dashed boxes. The cadherin sequences used for the analysis were from the NCBI database, with the following accession numbers: O. nubilalis (AAY44392.1), O. furnacalis (ABS59299.1), D. saccharalis (AFI81418.1) and C. suppressalis (AAM78590.1).
FIG. 2 shows the structural features and phylogenetic analysis of rice leaf roller cadherin. A: schematic diagram of the conserved domains of the CmCad protein, showing the signal peptide (SIG), cadherin repeats (CR), membrane-proximal extracellular region (MPED), transmembrane region (TMD) and cytoplasmic region (CYT). The amino acid numbers above the structure mark the signal sequence and the boundaries of each region. B: phylogenetic relationship of CmCad with 34 other lepidopteran cadherins. The full-length amino acid sequence of each insect cadherin was retrieved from the NCBI database. The bootstrap value (%) is displayed on each branch. White letters represent the insect family. The cadherin sequences used include Cnaphalocrocis medinalis (QNS31153.1), Vanessa tameamea (XP_026498507.1), Danaus plexippus (OWR42519.1), Bicyclus anynana (XP_023948291.1), Pieris rapae (XP_022113402.1), Leptidea sinapis (VVD00118.1), Zerene cesonia (XP_038217094.1), Papilio polytes (XP_013137775.1), Papilio xuthus (KPI99469.1), Papilio machaon (XP_014361099.1), Eombyxx mandarin (XP _014361099.1), Bombyx mori (NP _014361099.1), Manduca segeta (AAM 014361099.1), AFLymantria dispari (AAL 014361099.1), Lymantria xylaria 36ylina (AAM 014361099.1), Lymantria 014361099.1), Lymanthali 014361099.1), Helnichia 014361099.1), Haynanthus 014361099.1, Happy strain 014361099.1, Hazaria 014361099.1, Hazari 014361099.1, Hazaria 014361099.1, Hazara 014361099.1, Hazaria 014361099.1, Hazara 014361099.1, Hazaria 014361099.1, Hazara 014361099.1, Hazaria 014361099.1, Hazara 014361099.1, Hazaria 014361099.1, Hazara, Spodoptera exigua (AEB97395.1), Spodoptera litura (XP_022826291.1), Mythimna separata (AEI61920.1), Sesamia inferens (AEL22856.1) and Sesamia nonagrioides (ABV74206.1).
FIG. 3 shows SDS-PAGE and Western blotting analysis of the expressed CmCad-CR6-MPED recombinant protein. A: schematic diagram of the domain structure of the CmCad-CR6-MPED fragment. B: induced protein expression from the pET26b recombinant plasmid and Ni column purification. Lane M: protein molecular weight marker; lane 1: uninduced whole cells; lane 2: IPTG-induced whole cells; lane 3: IPTG-induced periplasmic protein; lane 4: IPTG-induced inclusion bodies; lane 5: purified CmCad-CR6-MPED recombinant protein. C: Western blotting analysis of the expressed CmCad-CR6-MPED recombinant protein with an anti-His antibody. Lane M: protein molecular weight marker; lane 6: empty pET26b vector control probed with the anti-His antibody; lane 7: purified CmCad-CR6-MPED recombinant protein probed with the anti-His antibody.
FIG. 4 shows ligand blotting analysis of binding of the expressed CmCad-CR6-MPED recombinant protein to the Cry1Ac toxin. A: biotin-labeled Cry1Ac toxin detected with streptavidin; unlabeled Cry1Ac was used as a control. B: ligand blotting of the CmCad-CR6-MPED recombinant protein detected with biotin-labeled Cry1Ac toxin; the empty pET26b vector was used as a negative control in the ligand blotting assay.
Detailed Description
To facilitate understanding of the technical solutions of the present invention, the invention is further described below with reference to the accompanying drawings and specific examples, which are for illustration only and are not intended to limit the scope of the claims of the present invention. Unless otherwise specified, the experimental methods described in the following examples are conventional methods, and the reagents and biomaterials are commercially available.
Example 1 Amplification of the full-length sequence of the cadherin gene of Cnaphalocrocis medinalis
1.1 Amplification of the conserved region sequence of the cadherin gene of Cnaphalocrocis medinalis
Rice leaf rollers were collected in Nanjing, Jiangsu Province (118°52′E, 32°01′N). After several generations of indoor rearing, 4th-instar larvae of uniform age were selected, total RNA was extracted with the Invitrogen TRIzol kit, and the total RNA was reverse-transcribed with the Invitrogen SuperScript II Reverse Transcriptase kit to obtain the cDNA template. The rice leaf roller belongs to the lepidopteran family Pyralidae, and the cadherin amino acid sequences of 4 insects of the same family (O. nubilalis, O. furnacalis, D. saccharalis and C. suppressalis) were aligned using the bioinformatics software DNAMAN version 9.0 (FIG. 1). Based on the conserved regions of these 4 insect cadherins, a pair of degenerate primers was designed to amplify the conserved region of the rice leaf roller cadherin: the upstream primer CmCad-con-F (nucleotide sequence shown as SEQ ID No.1) and the downstream primer CmCad-con-R (nucleotide sequence shown as SEQ ID No.2). PCR amplification was then performed with the cDNA as template and I-5 2× High-Fidelity Master Mix (MCLAB, USA). The IUB codes used in the degenerate primers are: R = A/G; Y = C/T; H = A/C/T; N = A/C/G/T.
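To make the IUB degenerate codes concrete, the sketch below expands the two primers from the sequence listing (SEQ ID No.1 and SEQ ID No.2) into regular expressions, counts how many distinct oligonucleotides each primer represents, and checks them against the two ends of the conserved-region product (SEQ ID No.3). The helper names are illustrative only; only the four IUB codes listed in the text are handled.

```python
import re

# The four plain bases plus the IUB codes listed in the text (R, Y, H, N).
IUB = {"a": "a", "c": "c", "g": "g", "t": "t",
       "r": "ag", "y": "ct", "h": "act", "n": "acgt"}
COMPLEMENT = str.maketrans("acgt", "tgca")


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer.lower():
        count *= len(IUB[base])
    return count


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    parts = []
    for base in primer.lower():
        options = IUB[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


def reverse_complement(seq):
    return seq.lower().translate(COMPLEMENT)[::-1]


forward = "athathgayatgaaygayaay"   # SEQ ID No.1 (CmCad-con-F)
reverse = "rtgyttrttngtnccnggn"     # SEQ ID No.2 (CmCad-con-R)
product_head = "ataatcgacatgaatgacaac"  # first 21 nt of SEQ ID No.3
product_tail = "gcccggcaccaacaagcac"    # last 19 nt of SEQ ID No.3

print(degeneracy(forward), degeneracy(reverse))
print(bool(primer_regex(forward).fullmatch(product_head)))
print(bool(primer_regex(reverse).fullmatch(reverse_complement(product_tail))))
```

The reverse primer is compared with the reverse complement of the product's 3' end because it anneals to the opposite strand.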
The PCR amplification system was as follows: 25 μl of 2× High-Fidelity Master Mix, 2 μl each of the 10 μM CmCad-con-F and CmCad-con-R primers, 2 μl of cDNA template, and ddH2O to a total reaction volume of 50 μl;
PCR amplification conditions: 94 ℃ for 5 min; 30 cycles of 94 ℃ for 30 s, 56 ℃ for 30 s and 72 ℃ for 1 min 30 s; final extension at 72 ℃ for 10 min; hold at 4 ℃.
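The reaction set-up and thermal program above reduce to simple arithmetic, sketched here. Two readings are assumptions rather than statements from the source: that 2 μl of each 10 μM primer is added, and that the extension step is 1 min 30 s. Ramping time between temperatures is ignored.

```python
# Reaction composition in microlitres; "2 ul of each primer" is an assumed reading.
TOTAL_UL = 50
components_ul = {
    "2x High Fidelity Master Mix": 25,
    "CmCad-con-F primer (10 uM stock)": 2,
    "CmCad-con-R primer (10 uM stock)": 2,
    "cDNA template": 2,
}
water_ul = TOTAL_UL - sum(components_ul.values())

# Final primer concentration after dilution into the 50 ul reaction (C1*V1 = C2*V2).
final_primer_um = 10 * 2 / TOTAL_UL

# Thermal program in seconds; the 90 s extension is an assumed reading of the text.
INITIAL_DENATURATION_S = 5 * 60
CYCLE_STEPS_S = (30, 30, 90)   # 94 C, 56 C, 72 C
CYCLES = 30
FINAL_EXTENSION_S = 10 * 60
program_s = INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S

print(f"ddH2O to add: {water_ul} ul")
print(f"final concentration of each primer: {final_primer_um} uM")
print(f"programmed time excluding ramping: {program_s / 60:.0f} min")
```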
After the PCR reaction, agarose gel electrophoresis showed a single PCR product of 2667 bp, and the purified PCR product was sent to Nanjing Tsingke Biotechnology for sequencing. Alignment of the deduced amino acid sequence showed that the obtained sequence is highly conserved relative to the cadherins of the 4 Pyralidae insects, indicating that the desired conserved region had been obtained. The nucleotide sequence of the conserved region is shown as SEQ ID No.3 (2667 bp), and the corresponding sequence of 889 amino acid residues is shown as SEQ ID No.4.
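The correspondence between the 2667 bp conserved region and its 889 residues can be checked by direct translation. The sketch below builds the standard genetic code table and translates the first 30 nucleotides of SEQ ID No.3 from the sequence listing; the result can be compared with the first ten residues of SEQ ID No.4 (Ile Ile Asp Met Asn Asp Asn Met Pro Leu). The helper `translate` is illustrative and handles only unambiguous bases.

```python
# Standard genetic code, codons enumerated in t, c, a, g order at each position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string; spaces are ignored, '*' marks a stop."""
    dna = dna.lower().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


seq3_start = "ataatcgaca tgaatgacaa catgccgttg"  # first 30 nt of SEQ ID No.3
print(translate(seq3_start))   # one-letter code for the first ten residues
print(2667 // 3, 2667 % 3)     # residue count and remainder for the full region
```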
1.2 obtaining the full-length sequence of the cadherin gene of rice leaf roller
A RACE cDNA library was constructed with the SMART RACE cDNA amplification kit (Clontech), using the RNA obtained in step 1.1 as template and following the kit instructions. The products, 5' RACE Ready cDNA and 3' RACE Ready cDNA, were used as templates for the 5' RACE and 3' RACE reactions, respectively.
Two specific primers (primer CmCad-5'-R, nucleotide sequence shown as SEQ ID No.5, and primer CmCad-3'-F, nucleotide sequence shown as SEQ ID No.6) were designed from the nucleotide sequence of the conserved region obtained in step 1.1 (SEQ ID No.3). The 5' RACE PCR was performed with the Universal Primer A Mix (UPM, nucleotide sequence shown in SEQ ID No.7) supplied with the SMART RACE kit and primer CmCad-5'-R. The 3' RACE PCR was performed with primer UPM and primer CmCad-3'-F. PCR conditions for 5' RACE: 5 cycles of 94 ℃ for 30 s and 72 ℃ for 3 min; 5 cycles of 94 ℃ for 30 s and 70 ℃ for 3 min; 25 cycles of 94 ℃ for 30 s, 68 ℃ for 30 s and 72 ℃ for 3 min. For 3' RACE, the extension time in each step was 2 min, and the other steps were identical to those of 5' RACE. The resulting PCR products were cloned into the pClone007 vector (Tsingke) and sequenced by Nanjing Tsingke Biotechnology.
The 5' and 3' nucleotide sequences obtained were assembled with the nucleotide sequence of the conserved region from step 1.1 to give the full-length cDNA sequence of the rice leaf roller cadherin gene. A pair of primers (CmCad-full-F, nucleotide sequence shown as SEQ ID No.8, and CmCad-full-R, nucleotide sequence shown as SEQ ID No.9) was then designed to verify the open reading frame (ORF) of the rice leaf roller cadherin. The resulting 5,455 bp full-length CmCad cDNA sequence (SEQ ID No.10) was submitted to the NCBI database (GenBank accession No. MN796259.1).
Sequence analysis showed that the cloned cDNA sequence contains a 5,175 bp ORF, a 123 bp 5' untranslated region (UTR) and a 157 bp 3'-UTR that includes a 26 bp poly(A) tail. A classical polyadenylation signal sequence, ATTAAA, is located 20 bp upstream of the poly(A) tail. The CmCad cDNA encodes 1725 amino acids (SEQ ID No.11), with a predicted molecular weight of 192.85 kDa and a pI of 4.06.
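The figures in this paragraph are mutually consistent, and together with the residue range G759-L1575 they reproduce the cDNA positions 2398-4848 quoted in the disclosure. A short sketch of that bookkeeping follows; `residues_to_cdna` is an illustrative helper that assumes 1-based, inclusive coordinates and that the ORF begins immediately after the 123 bp 5'-UTR.

```python
UTR5_BP = 123
ORF_BP = 5175
UTR3_BP = 157   # includes the 26 bp poly(A) tail

cdna_bp = UTR5_BP + ORF_BP + UTR3_BP
codons = ORF_BP // 3


def residues_to_cdna(first_residue, last_residue, utr5_bp=UTR5_BP):
    """Map a residue range onto 1-based, inclusive cDNA coordinates."""
    start = utr5_bp + 3 * (first_residue - 1) + 1
    end = utr5_bp + 3 * last_residue
    return start, end


tbr_start, tbr_end = residues_to_cdna(759, 1575)
print(cdna_bp, codons)
print(tbr_start, tbr_end, tbr_end - tbr_start + 1)
```

The binding-region gene therefore spans 2451 nucleotides, three for each of the 817 residues from G759 to L1575.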
All primers used for amplification were synthesized by Nanjing Tsingke Biotechnology, except for the UPM, which was supplied with the kit.
Example 2 Structural and evolutionary analysis of the CmCad protein of Cnaphalocrocis medinalis
The secondary structural features of CmCad of Cnaphalocrocis medinalis were analyzed using the ISREC ProfileScan server (https://myhits.isb-sib.ch/cgi-bin/PFSCAN). The phylogenetic tree was constructed by the neighbor-joining method using MEGA 7.0 software; the numbers at the nodes are bootstrap values from 1000 replicates.
The predicted CmCad structure consists of a signal peptide (SIG), 11 cadherin repeats (CR), the membrane-proximal extracellular region (MPED), the transmembrane region (TMD), and the cytoplasmic region (CYT), an arrangement typical of insect midgut cadherins (FIG. 2A).
To study the evolutionary relationships of cadherins from different insect species, a phylogenetic tree was constructed using the neighbor-joining method (FIG. 2B). The results showed that the cadherins of 3 Nymphalidae, 3 Pieridae, 3 Papilionidae, 2 Bombycidae, 2 Lymantriidae, 2 shankhoviridae, 6 Pyralidae and 9 Noctuidae species (Trichoplusia forms a separate cluster) clustered according to family. CmCad clustered with the cadherins of the European corn borer (O. nubilalis) and the Asian corn borer (O. furnacalis) (FIG. 2B).
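MEGA performs the distance calculation and bootstrap resampling internally. Purely to illustrate what those two steps involve, the sketch below computes a pairwise p-distance and draws one bootstrap replicate by resampling alignment columns with replacement. The p-distance is an assumption chosen for simplicity (the distance model used in MEGA is not stated in the text), and the toy alignment is hypothetical rather than real cadherin sequence.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing residues over aligned, gap-free columns."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Hypothetical toy alignment; these are not real cadherin sequences.
toy = {"taxon_A": "MKV-LDNA", "taxon_B": "MRVALDNA", "taxon_C": "MKVALENS"}
rng = random.Random(0)  # fixed seed so the run is repeatable
replicate = bootstrap_replicate(toy, rng)

print(p_distance(toy["taxon_A"], toy["taxon_B"]))
print({name: len(seq) for name, seq in replicate.items()})
```

In a full analysis, a tree would be rebuilt from each of the 1000 replicates, and the bootstrap value at a node would be the percentage of replicate trees that recover that grouping.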
Example 3 construction, expression and purification of cadherin Cry toxin binding regions of rice leaf rollers
In lepidopteran insects, the Cry toxin TBR of cadherin is located predominantly in the 6 CR domains (e.g., CR6-11/CR7-12) and the MPED near the cell membrane. In the rice leaf roller this corresponds to CR6-MPED, 817 amino acids in total. The truncated CmCad-CR6-MPED fragment was constructed, expressed and purified following a published method (J. Zhong, X. Hu, X. Zhang, Y. Liu, C. Xu, C. Zhang, M. Lin, X. Liu. Broad specificity immunoassay for detection of Bacillus thuringiensis Cry toxins through engineering of a single chain variable fragment with ... International Journal of Biological Macromolecules, 2018, 107: 920-928). Specifically, the CmCad-CR6-MPED fragment was amplified using a forward primer (CmCad-CR6-MPED-F, nucleotide sequence shown in SEQ ID No.12) and a reverse primer (CmCad-CR6-MPED-R, nucleotide sequence shown in SEQ ID No.13) containing NcoI and NotI restriction sites, respectively; the primers were synthesized by Nanjing Tsingke Biotechnology.
Amplification was performed with I-5 2× PCR Master Mix, using the pClone007-CmCad plasmid as template: 98 ℃ for 2 min; 30 cycles of 98 ℃ for 20 s, 58 ℃ for 20 s and 72 ℃ for 1 min; and a final extension at 72 ℃ for 5 min.
The PCR product was purified using an agarose gel DNA extraction kit (Axygen) and subcloned into the pET-26b(+) vector (Novagen) at the NcoI and NotI sites. The ligation product was transformed into E. coli BL21(DE3) competent cells (Takara Shuzo Co., Ltd.). The nucleotide sequence of the recombinant plasmid from successful transformants was verified by sequencing (Nanjing Tsingke Biotechnology).
E. coli BL21(DE3) carrying the sequence-verified pET-26b-CmCad-CR6-MPED plasmid was cultured in 2×TY medium containing 50 μg/ml kanamycin (Solarbio) at 37 ℃ and 250 rpm to the late logarithmic growth phase (OD600 0.5-0.6); 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG) was then added, and the culture was grown at 30 ℃ and 220 rpm for 14 h to induce expression of CmCad-CR6-MPED. The induced cells were collected by centrifugation at 8,000 ×g for 15 min, and the pellet was resuspended in PBS (pH 7.4) and lysed by sonication. The cell lysate was centrifuged at 12,000 ×g for 30 min, and the inclusion bodies in the pellet were dissolved in 8 M urea and incubated overnight at 4 ℃ with slow rotation on a Labquake Shaker Rotisserie (Thermo Fisher Scientific). The solubilized recombinant protein was purified on a HiTrap Ni2+ column (GE Healthcare). CmCad-CR6-MPED was eluted with 100 mM imidazole and then dialyzed stepwise against PBS containing 6 M, 4 M and 2 M urea and finally against urea-free PBS to remove residual urea and imidazole. The purity of CmCad-CR6-MPED was assessed by 10% SDS-PAGE, and the protein concentration was determined with a Bradford protein assay kit (Solarbio).
Western blotting was used to detect expression of the CmCad-CR6-MPED recombinant protein. μg of the purified protein was separated by 10% SDS-PAGE and then electrotransferred to a polyvinylidene fluoride (PVDF) membrane (Millipore Corp.). After transfer, the PVDF membrane was blocked in blocking buffer (PBST, i.e. PBS containing 0.1% Tween-20, supplemented with 3% BSA) for 1 h at room temperature and then incubated with anti-His tag antibody (1:7,500 dilution; CWBIO) for 1 h. The blot was washed 5 times for 10 min each with washing buffer (PBST, PBS containing 0.1% Tween-20), developed with ChromoSensor reagent (GenScript Inc.) for 2-3 min at room temperature, and photographed with a high-resolution digital camera.
Based on the homology of the Cry1 toxin binding regions described for lepidopteran cadherins, this example predicted the CR6-MPED region (G759-L1575) as the candidate Cry1Ac toxin binding region of CmCad (FIG. 3A). The CmCad-CR6-MPED fragment was cloned into the pET26b(+) vector and expressed in E. coli BL21(DE3) under IPTG induction, and the induced protein was examined by SDS-PAGE (FIG. 3B, lanes 1-4). Because the expressed CmCad-CR6-MPED recombinant protein carries a His tag, it was purified on a HiTrap Ni2+ column; the purified protein was approximately 120 kDa (FIG. 3B, lane 5). Western blotting showed that the empty pET26b vector control did not react with the anti-His tag antibody (FIG. 3C, lane 6), whereas the purified recombinant protein CmCad-CR6-MPED was detected by the anti-His tag antibody at about 120 kDa (FIG. 3C, lane 7), consistent with the SDS-PAGE result. These results indicate that the band of about 120 kDa is the expressed recombinant protein CmCad-CR6-MPED, whose nucleotide sequence is shown in SEQ ID NO.14 and whose amino acid sequence is shown in SEQ ID NO.15.
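For orientation, a back-of-envelope mass for the 817-residue fragment can be derived from figures already given for the full-length protein (192.85 kDa for 1725 residues). This is only a rough sketch under the assumption that the fragment has the same average residue mass as the whole protein; vector-encoded residues such as the His tag are not counted, and apparent sizes on SDS-PAGE commonly differ from calculated masses, so the number is not a substitute for the measured value of about 120 kDa.

```python
FULL_LENGTH_KDA = 192.85       # predicted mass of full-length CmCad (from the text)
FULL_LENGTH_RESIDUES = 1725    # residues in full-length CmCad (from the text)
FRAGMENT_RESIDUES = 1575 - 759 + 1   # G759 to L1575, inclusive

# Assumption: the fragment shares the full-length protein's mean residue mass.
mean_residue_kda = FULL_LENGTH_KDA / FULL_LENGTH_RESIDUES
fragment_kda = mean_residue_kda * FRAGMENT_RESIDUES

print(FRAGMENT_RESIDUES)
print(f"mean residue mass: {mean_residue_kda * 1000:.1f} Da")
print(f"rough fragment mass: {fragment_kda:.1f} kDa")
```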
Example 4 binding of Cnaphalocrocis medinalis cadherin Cry toxin binding regions to Cry1Ac
Cry1Ac toxin was purchased from Envirotest-China (EnviroLogix Inc.), dissolved in sodium carbonate buffer (CBS, pH 9.6), aliquoted, and stored at -80 ℃ until use. The soluble Cry toxin was biotinylated using the EZ-Link Sulfo-NHS-SS-Biotinylation kit (Thermo Fisher Scientific) according to the kit instructions. Specifically, μL of 10 mM biotin was used to label 1 mg of the Cry toxin in a reaction with a final volume of 1 mL. The reaction was rotated slowly at room temperature for 2 h, and excess biotin reagent was then removed with a desalting column. Biotin labeling efficiency was checked by Western blotting with horseradish peroxidase (HRP)-conjugated streptavidin (Thermo Fisher Scientific). The concentration of the biotinylated Cry toxin was determined by the Bradford method.
Ligand blotting was used to detect binding between Cry1Ac and CmCad-CR6-MPED. The procedure was essentially the same as for Western blotting, except that after blocking, the PVDF membrane was incubated with 20 nM biotin-labeled Cry toxin for 1 h at room temperature. After 5 washes with PBST, the PVDF membrane was incubated with streptavidin-HRP (1:7,500 dilution) for 1 h. Color development was the same as for Western blotting.
To further confirm binding of the CmCad-CR6-MPED recombinant protein to the Cry1Ac toxin, ligand blotting analysis was performed on the expressed recombinant protein using biotin-labeled Cry1Ac. As shown in FIG. 4A, Western blotting with streptavidin-HRP confirmed that the Cry1Ac toxin was biotin-labeled. The ligand blotting analysis confirmed binding of the Cry1Ac toxin to the CmCad-CR6-MPED recombinant protein (FIG. 4B).
The Cry1Ac toxin binding region of rice leaf roller cadherin is located in CmCad-CR6-MPED (G759-L1575). Resistance is mainly associated with reduced binding of Cry toxins to receptor proteins on the insect BBMV; therefore, when a receptor mutation reduces binding capacity, the insect can be judged to have developed some resistance to the Cry toxin. The Cry1Ac toxin binding region of rice leaf roller cadherin can thus be used to detect, at the molecular level, whether the insect population has developed resistance.
In addition, the application defines the region of rice leaf roller cadherin that binds the Cry1Ac toxin. If a Bt protein has high affinity for this binding region, it is expected to insert into the rice leaf roller midgut BBMV and form pores in the epithelial cell membrane, causing cell death; Bt insecticidal proteins highly toxic to the rice leaf roller can therefore be rapidly screened or modified by this method.
Sequence listing
<110> Jiangsu Academy of Agricultural Sciences
<120> Gene encoding the Cry toxin binding region of rice leaf roller cadherin, the encoded protein, and applications thereof
<141> 2021-06-22
<160> 15
<170> SIPOSequenceListing 1.0
<210> 1
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
athathgaya tgaaygayaa y 21
<210> 2
<211> 19
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
rtgyttrttn gtnccnggn 19
<210> 3
<211> 2667
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
ataatcgaca tgaatgacaa catgccgttg tttgacgagg gcacgctgga gcagaacttg 60
cgcgtgcgcg aggtgtccgc cagcggcgtc gtcatcgggt ccgtgctcgc caccgacatc 120
gacggacccc tctacaacag agtacgctac accatagttc ctcgcaatga cacgccagtt 180
gggttggtga agatagactt caacaacggg cagatcgcgg tggatgagga cggcgccatc 240
gacgcggacg tcccgccgcg ccagtacctg tactacactg tcattgccag cgaccggtgc 300
tacgagaccg accagagcct gtgtccgcca gaccctacct actgggagac gatggaggat 360
atccaaatag aaatcctaga cacgaacaac aaggtacccg aagccgacta cgagaggttc 420
aacgtgacgg tgtacgtgtg ggagaacgcc acgacaggcg acgaggtggt gcagctctac 480
tccagtgacc tcgacagaga cgaaatatac aacacggtgc gatatcagat caactacgcg 540
gtgaacgctc ggctgcggcc gttcttctcg gtggaccagg actcggggct ggtggtggtg 600
gactacacca cggacgaggt gctggaccgc gacggcgacg agcccaaaca caccatcttc 660
ctcaacttca tcgacaactt ccactcggaa ggagatggaa gacgaaatca gtatgatacg 720
caagtggaag tgatcctcct ggatgtgaac gacaacgctc cagaaatgcc ctcgccagaa 780
gaacttttct gggataatgt atccgagaac cttttagagg gtgtgagact atcgccgcac 840
atatacgcgc cggaccgcga cgagccggac acggacaact cgcgcgtcgg ataccgcatc 900
ctcgccctgg ccgtcacgga ccggccgggg ctcgacgtgc ccgacctctt caccatggtg 960
cagatccaga acatcacggg cgagctggag accgcgctcc cactgcgggg ctactggggc 1020
acgtaccaga ttcacatcga ggcgttcgac cacggtcatc cccagcagtt ttcagacgag 1080
gtttacaggc tcacgatcca accgtacaac ttccattcgc cggtattcca gtttcctcta 1140
cacgactcca ccatcagact tgcgacggag cttacaacag agaatggaca gctgacgacc 1200
gcttctggtc agtttctgga ccgaatccac gccaccgacg aagacggcct acacgccggg 1260
aaagtcacct tccaagtgca aggaaacgag gaagcaacag agtatttcaa cgtggtaaat 1320
agtccagatg gtgacaatac tggaaccctt gttctgttga agacattccc agaagagatc 1380
agggaattcc ggataacgat cagggcgaca gatggaggca cagatccagg tccactttca 1440
acggattccg ccttcacggt tatattcgtg ccttcgcgag gagatccggt cttcaatatg 1500
tcatcgactc cagttgcttt cattgagggc attgctggca tggagcagag cttccaacta 1560
ccgcaggcag aagatattaa gaacttcgcg tgtacagacg actgtttcaa catatactac 1620
aggattattg acggtaacaa tgaaggcctg ttcagcctgg aaccgtcaac caacgtgatc 1680
cgactggtgc gcgagttgga ccgagaggcc gccgctacac acacaatcat ggtggccgcc 1740
agcaactcgc ccgacgccac caaccagccg ctgcaggcat ccatcctagt cgtcaacatc 1800
aatgtgcgag aagctaaccc ccggccaata ttcgaacgag aactgtacac tgctggcatc 1860
tctacagccg acagcatcgg cagagagcta ctcactgtta aggcgacgca ctcggaagac 1920
gcgacagtga cgtacaccat agaccaggcc agcatgcagg tggacagcag cctggaggcg 1980
gtgcgcgagt cggccttcgc gctcaacgca gccaccggcg cgctggcgct caacatgcag 2040
cccaccgccg ccatgcacgg catgttcgac ttcctcgtcc tggccactga ccctgctaat 2100
gcaaatgaca cgactcaggt gaaggtctac ctcatttcgt ctcttaaccg tgtgaccttc 2160
atattcgtca acacgctgga agaagtggag gcgcacagag atttcatagc gcagacgttc 2220
accgccggat tcagcatgac gtgcaacatc gacgaggtgg tcccgcacag cgacagcaac 2280
ggcgtcgcgc gcgaggacgt gtccgaggtg cgcggccact tcatccgcgg caacgtgccc 2340
gtgctcgcca ccgagatcga ggagctccgc agcgacacgt tgctgctgcg caacatccag 2400
cacagcctga gcgccaacct gctgctgctg caggactttg tgacggacgc cagccccgac 2460
ggcggcgccg actccgccac caccacgctg tacgtgctcg ccgcgctgtc cgcgctgctg 2520
gccgcgctgt gcctggtgct gctgctcacc ttcttcatca ggacccgcga attgaaccgg 2580
cggctgcaag ctctctcgat gacgaagtac ggctccgtgg actcggggct gaaccgcgtg 2640
gggctcgcgc ccggcaccaa caagcac 2667
<210> 4
<211> 889
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 4
Ile Ile Asp Met Asn Asp Asn Met Pro Leu Phe Asp Glu Gly Thr Leu
1 5 10 15
Glu Gln Asn Leu Arg Val Arg Glu Val Ser Ala Ser Gly Val Val Ile
20 25 30
Gly Ser Val Leu Ala Thr Asp Ile Asp Gly Pro Leu Tyr Asn Arg Val
35 40 45
Arg Tyr Thr Ile Val Pro Arg Asn Asp Thr Pro Val Gly Leu Val Lys
50 55 60
Ile Asp Phe Asn Asn Gly Gln Ile Ala Val Asp Glu Asp Gly Ala Ile
65 70 75 80
Asp Ala Asp Val Pro Pro Arg Gln Tyr Leu Tyr Tyr Thr Val Ile Ala
85 90 95
Ser Asp Arg Cys Tyr Glu Thr Asp Gln Ser Leu Cys Pro Pro Asp Pro
100 105 110
Thr Tyr Trp Glu Thr Met Glu Asp Ile Gln Ile Glu Ile Leu Asp Thr
115 120 125
Asn Asn Lys Val Pro Glu Ala Asp Tyr Glu Arg Phe Asn Val Thr Val
130 135 140
Tyr Val Trp Glu Asn Ala Thr Thr Gly Asp Glu Val Val Gln Leu Tyr
145 150 155 160
Ser Ser Asp Leu Asp Arg Asp Glu Ile Tyr Asn Thr Val Arg Tyr Gln
165 170 175
Ile Asn Tyr Ala Val Asn Ala Arg Leu Arg Pro Phe Phe Ser Val Asp
180 185 190
Gln Asp Ser Gly Leu Val Val Val Asp Tyr Thr Thr Asp Glu Val Leu
195 200 205
Asp Arg Asp Gly Asp Glu Pro Lys His Thr Ile Phe Leu Asn Phe Ile
210 215 220
Asp Asn Phe His Ser Glu Gly Asp Gly Arg Arg Asn Gln Tyr Asp Thr
225 230 235 240
Gln Val Glu Val Ile Leu Leu Asp Val Asn Asp Asn Ala Pro Glu Met
245 250 255
Pro Ser Pro Glu Glu Leu Phe Trp Asp Asn Val Ser Glu Asn Leu Leu
260 265 270
Glu Gly Val Arg Leu Ser Pro His Ile Tyr Ala Pro Asp Arg Asp Glu
275 280 285
Pro Asp Thr Asp Asn Ser Arg Val Gly Tyr Arg Ile Leu Ala Leu Ala
290 295 300
Val Thr Asp Arg Pro Gly Leu Asp Val Pro Asp Leu Phe Thr Met Val
305 310 315 320
Gln Ile Gln Asn Ile Thr Gly Glu Leu Glu Thr Ala Leu Pro Leu Arg
325 330 335
Gly Tyr Trp Gly Thr Tyr Gln Ile His Ile Glu Ala Phe Asp His Gly
340 345 350
His Pro Gln Gln Phe Ser Asp Glu Val Tyr Arg Leu Thr Ile Gln Pro
355 360 365
Tyr Asn Phe His Ser Pro Val Phe Gln Phe Pro Leu His Asp Ser Thr
370 375 380
Ile Arg Leu Ala Thr Glu Leu Thr Thr Glu Asn Gly Gln Leu Thr Thr
385 390 395 400
Ala Ser Gly Gln Phe Leu Asp Arg Ile His Ala Thr Asp Glu Asp Gly
405 410 415
Leu His Ala Gly Lys Val Thr Phe Gln Val Gln Gly Asn Glu Glu Ala
420 425 430
Thr Glu Tyr Phe Asn Val Val Asn Ser Pro Asp Gly Asp Asn Thr Gly
435 440 445
Thr Leu Val Leu Leu Lys Thr Phe Pro Glu Glu Ile Arg Glu Phe Arg
450 455 460
Ile Thr Ile Arg Ala Thr Asp Gly Gly Thr Asp Pro Gly Pro Leu Ser
465 470 475 480
Thr Asp Ser Ala Phe Thr Val Ile Phe Val Pro Ser Arg Gly Asp Pro
485 490 495
Val Phe Asn Met Ser Ser Thr Pro Val Ala Phe Ile Glu Gly Ile Ala
500 505 510
Gly Met Glu Gln Ser Phe Gln Leu Pro Gln Ala Glu Asp Ile Lys Asn
515 520 525
Phe Ala Cys Thr Asp Asp Cys Phe Asn Ile Tyr Tyr Arg Ile Ile Asp
530 535 540
Gly Asn Asn Glu Gly Leu Phe Ser Leu Glu Pro Ser Thr Asn Val Ile
545 550 555 560
Arg Leu Val Arg Glu Leu Asp Arg Glu Ala Ala Ala Thr His Thr Ile
565 570 575
Met Val Ala Ala Ser Asn Ser Pro Asp Ala Thr Asn Gln Pro Leu Gln
580 585 590
Ala Ser Ile Leu Val Val Asn Ile Asn Val Arg Glu Ala Asn Pro Arg
595 600 605
Pro Ile Phe Glu Arg Glu Leu Tyr Thr Ala Gly Ile Ser Thr Ala Asp
610 615 620
Ser Ile Gly Arg Glu Leu Leu Thr Val Lys Ala Thr His Ser Glu Asp
625 630 635 640
Ala Thr Val Thr Tyr Thr Ile Asp Gln Ala Ser Met Gln Val Asp Ser
645 650 655
Ser Leu Glu Ala Val Arg Glu Ser Ala Phe Ala Leu Asn Ala Ala Thr
660 665 670
Gly Ala Leu Ala Leu Asn Met Gln Pro Thr Ala Ala Met His Gly Met
675 680 685
Phe Asp Phe Leu Val Leu Ala Thr Asp Pro Ala Asn Ala Asn Asp Thr
690 695 700
Thr Gln Val Lys Val Tyr Leu Ile Ser Ser Leu Asn Arg Val Thr Phe
705 710 715 720
Ile Phe Val Asn Thr Leu Glu Glu Val Glu Ala His Arg Asp Phe Ile
725 730 735
Ala Gln Thr Phe Thr Ala Gly Phe Ser Met Thr Cys Asn Ile Asp Glu
740 745 750
Val Val Pro His Ser Asp Ser Asn Gly Val Ala Arg Glu Asp Val Ser
755 760 765
Glu Val Arg Gly His Phe Ile Arg Gly Asn Val Pro Val Leu Ala Thr
770 775 780
Glu Ile Glu Glu Leu Arg Ser Asp Thr Leu Leu Leu Arg Asn Ile Gln
785 790 795 800
His Ser Leu Ser Ala Asn Leu Leu Leu Leu Gln Asp Phe Val Thr Asp
805 810 815
Ala Ser Pro Asp Gly Gly Ala Asp Ser Ala Thr Thr Thr Leu Tyr Val
820 825 830
Leu Ala Ala Leu Ser Ala Leu Leu Ala Ala Leu Cys Leu Val Leu Leu
835 840 845
Leu Thr Phe Phe Ile Arg Thr Arg Glu Leu Asn Arg Arg Leu Gln Ala
850 855 860
Leu Ser Met Thr Lys Tyr Gly Ser Val Asp Ser Gly Leu Asn Arg Val
865 870 875 880
Gly Leu Ala Pro Gly Thr Asn Lys His
885
<210> 5
<211> 28
<212> DNA
<213> Artificial Sequence
<400> 5
ggtctggcgg acacaggctc tggtcggt 28
<210> 6
<211> 28
<212> DNA
<213> Artificial Sequence
<400> 6
accgactccg ccaccaccac gctgtacg 28
<210> 7
<211> 45
<212> DNA
<213> Artificial Sequence
<400> 7
ctaatacgac tcactatagg gcaagcagtg gtatcaacgc agagt 45
<210> 8
<211> 18
<212> DNA
<213> Artificial Sequence
<400> 8
atgggggttg acactcgc 18
<210> 9
<211> 25
<212> DNA
<213> Artificial Sequence
<400> 9
tctgttaaat tgtttgttgg tgaag 25
<210> 10
<211> 5455
<212> DNA
<213> Artificial Sequence
<400> 10
ttacatgggg agacaaaaga aaaacataaa aatagatagt gcttgtgata ctttactcgt 60
tggatttgca gtgtttgaac ttttttggat atttttactt taaaagattc cagatcatgc 120
aagatggggg ttgacactcg cctcgcagcg gcggtgctgc tccttaccat agctcctgcc 180
gtatttacac aagagaggcc atcgtgcacg tacatggtcc aaatcccgag gcctgacacc 240
cctgtgttcc ctgatcagga cttcactgga gtaacatgga gccaacggcc gctgatacca 300
gctgattcta gagaggacct gtgcatggac gaatgggtcg tatccgtgag cacgcaggtc 360
atcttcctgg aagaggagat cgagggagag gtcaccatcg cgcggctcaa ctaccagggt 420
accgagacgc cagagatagg agcgtttctg gcaggcagct tgcccaatct gggtcctgtc 480
atacggcggg ttggcaacga gtggcatctt gtggttactc agagacagga cttcgagaat 540
ccaataatga gggattacat gttccgcctg aacattccgg gagagacgct gtcgccctta 600
gtgtctctgg agatcgtgaa catcgacgac aacccgccca tcatcgaggt gttccaggcg 660
tgccaggttg atgaactagg tgagccccgc gccaccgact gcgtgtacac agtgagagac 720
gcggatgggc agatcagcac cagcgtgatg agcttccggg tggagagcaa ccggcccagc 780
gacgagcaga tcttcgtgat gaagggcgcc aatgtcgaaa acgattggtt caccatgacg 840
atgactgtac atattacgga gccgctcaac tttgaaacca acgcgctgca cgtgtttaac 900
gtcattgcta ctgactctcg gccgaaccac cagacggcgt cgatgatgat ccaggtgcag 960
aacgtggagc accggccgcc gcgctgggtg aacatcttct ccgtgcagca gttcgacgag 1020
aagacggtcc agcagttccc tctgcaggcc atcgatggcg acacggggat cgataaacct 1080
atcgattaca agctcatcaa agacccagca gatgacttct tttccctgga ggtgttgccg 1140
gggggccgca gcggcgccat cctgcacgtg gacaagatcg accgggacac gcttatgcgg 1200
gaagtgtttc aggtcaccat cgttgccttc aagtacgaca acgaggcgtt ctcgacggcg 1260
cgcgaggtgg tgatcatcgt gaacgacatc aacgaccagt ggccgctgcc gctgcagacc 1320
acgccctaca ccatctcgat catggaggag acaccgctca ctctcaactt cgccacacca 1380
ttcggtttcc acgatagaga tttgggtgaa aacgctcaat acaccgtaac cctcgaagat 1440
gactacccgc ccggcgcagc ctccgccttc cagataaacc ccaacgtggg gtaccagcag 1500
cagaccttca tcatgagcac cgtcaaccac tccatgctcg acttcgaagt gccggagttc 1560
caaaccatca gaatcaaggt gattgcaacg gacaataaca acacgaactt cgttggcgtg 1620
gcgacggtgg agatcagtct catcaactgg aacgacgagc tgcccatctt cagtgagagc 1680
tcctacaccg cctccttcaa agagacggtg ggcaagggct tcgccgttgc tacaataccg 1740
gctactgata gggacattga cgatcgagtc gagcacagtt tgatgggcaa cgccggcgag 1800
tacctctcca tcgacaaaga cagcggcgcg atcatcgtgt ccgtcgacga agccttcgat 1860
taccacagac agaatgtact ctttgtacag ataagagcgg acgacacgct cggggagccg 1920
tacaacacag ccacgacgca gctagtgatc cagctggagg acgtcaacaa cacacctccc 1980
actttgcggc tgcctcgcgg cagtccaagc gtcgaagaga acgttcctga tggatacatt 2040
ataacccaag agattcacgc cactgaccct gataccacag caaaacttgt gttcgaaatt 2100
gactgggatt ccacctgggc cactaagcaa ggccgtgaga cacctgaaga agaatttaaa 2160
aattgcgtag aaataaaaac attgtaccag aacccagaac agctgggcac cgcctacgga 2220
cagctggtgg tgagggagat ccgtgacggc gtcaccatcg acttcgagga gttcgaggtg 2280
ctgtacctta ccgtgagggt cagggacctc aacactgaac tccaggatga ttacgatgaa 2340
tccacattca cgctaaggat aatcgacatg aatgacaaca tgccgttgtt tgacgagggc 2400
acgctggagc agaacttgcg cgtgcgcgag gtgtccgcca gcggcgtcgt catcgggtcc 2460
gtgctcgcca ccgacatcga cggacccctc tacaacagag tacgctacac catagttcct 2520
cgcaatgaca cgccagttgg gttggtgaag atagacttca acaacgggca gatcgcggtg 2580
gatgaggacg gcgccatcga cgcggacgtc ccgccgcgcc agtacctgta ctacactgtc 2640
attgccagcg accggtgcta cgagaccgac cagagcctgt gtccgccaga ccctacctac 2700
tgggagacga tggaggatat ccaaatagaa atcctagaca cgaacaacaa ggtacccgaa 2760
gccgactacg agaggttcaa cgtgacggtg tacgtgtggg agaacgccac gacaggcgac 2820
gaggtggtgc agctctactc cagtgacctc gacagagacg aaatatacaa cacggtgcga 2880
tatcagatca actacgcggt gaacgctcgg ctgcggccgt tcttctcggt ggaccaggac 2940
tcggggctgg tggtggtgga ctacaccacg gacgaggtgc tggaccgcga cggcgacgag 3000
cccaaacaca ccatcttcct caacttcatc gacaacttcc actcggaagg agatggaaga 3060
cgaaatcagt atgatacgca agtggaagtg atcctcctgg atgtgaacga caacgctcca 3120
gaaatgccct cgccagaaga acttttctgg gataatgtat ccgagaacct tttagagggt 3180
gtgagactat cgccgcacat atacgcgccg gaccgcgacg agccggacac ggacaactcg 3240
cgcgtcggat accgcatcct cgccctggcc gtcacggacc ggccggggct cgacgtgccc 3300
gacctcttca ccatggtgca gatccagaac atcacgggcg agctggagac cgcgctccca 3360
ctgcggggct actggggcac gtaccagatt cacatcgagg cgttcgacca cggtcatccc 3420
cagcagtttt cagacgaggt ttacaggctc acgatccaac cgtacaactt ccattcgccg 3480
gtattccagt ttcctctaca cgactccacc atcagacttg cgacggagct tacaacagag 3540
aatggacagc tgacgaccgc ttctggtcag tttctggacc gaatccacgc caccgacgaa 3600
gacggcctac acgccgggaa agtcaccttc caagtgcaag gaaacgagga agcaacagag 3660
tatttcaacg tggtaaatag tccagatggt gacaatactg gaacccttgt tctgttgaag 3720
acattcccag aagagatcag ggaattccgg ataacgatca gggcgacaga tggaggcaca 3780
gatccaggtc cactttcaac ggattccgcc ttcacggtta tattcgtgcc ttcgcgagga 3840
gatccggtct tcaatatgtc atcgactcca gttgctttca ttgagggcat tgctggcatg 3900
gagcagagct tccaactacc gcaggcagaa gatattaaga acttcgcgtg tacagacgac 3960
tgtttcaaca tatactacag gattattgac ggtaacaatg aaggcctgtt cagcctggaa 4020
ccgtcaacca acgtgatccg actggtgcgc gagttggacc gagaggccgc cgctacacac 4080
acaatcatgg tggccgccag caactcgccc gacgccacca accagccgct gcaggcatcc 4140
atcctagtcg tcaacatcaa tgtgcgagaa gctaaccccc ggccaatatt cgaacgagaa 4200
ctgtacactg ctggcatctc tacagccgac agcatcggca gagagctact cactgttaag 4260
gcgacgcact cggaagacgc gacagtgacg tacaccatag accaggccag catgcaggtg 4320
gacagcagcc tggaggcggt gcgcgagtcg gccttcgcgc tcaacgcagc caccggcgcg 4380
ctggcgctca acatgcagcc caccgccgcc atgcacggca tgttcgactt cctcgtcctg 4440
gccactgacc ctgctaatgc aaatgacacg actcaggtga aggtctacct catttcgtct 4500
cttaaccgtg tgaccttcat attcgtcaac acgctggaag aagtggaggc gcacagagat 4560
ttcatagcgc agacgttcac cgccggattc agcatgacgt gcaacatcga cgaggtggtc 4620
ccgcacagcg acagcaacgg cgtcgcgcgc gaggacgtgt ccgaggtgcg cggccacttc 4680
atccgcggca acgtgcccgt gctcgccacc gagatcgagg agctccgcag cgacacgttg 4740
ctgctgcgca acatccagca cagcctgagc gccaacctgc tgctgctgca ggactttgtg 4800
acggacgcca gccccgacgg cggcgccgac tccgccacca ccacgctgta cgtgctcgcc 4860
gcgctgtccg cgctgctggc cgcgctgtgc ctggtgctgc tgctcacctt cttcatcagg 4920
acccgcgaat tgaaccggcg gctgcaagct ctctcgatga cgaagtacgg ctccgtggac 4980
tcggggctga accgcgtggg gctcgcgccc ggcaccaaca agcacgccgt cgagggctcc 5040
aaccccatct ggaatgaagc catcaaagct ccagacttcg acgccatcag tgacttgagt 5100
ggggactcag acctgatcgg catcgaggac ttgccgcagt tccgcgagga ctacttcccg 5160
cccgcggaca ccaactccgc gactctcatt gcagtccatc caaggggggg agacgaaggc 5220
ctcgccaccc acgaaaacaa cttcggcttc aacaccagcc ccttcagcca ggacttcacc 5280
aacaaacaat ttaacagatg aagaagctcc atactcttaa tcatgttgat tgaagatatt 5340
attaaataaa tatctatgta tgtaaatatt gtaccatatt tgtgtttaat ttattagatt 5400
tgtattaatt aaagtaactt tatttttgaa aaaaaaaaaa aaaaaaaaaa aaaaa 5455
<210> 11
<211> 1725
<212> PRT
<213> Artificial Sequence
<400> 11
Met Gly Val Asp Thr Arg Leu Ala Ala Ala Val Leu Leu Leu Thr Ile
1 5 10 15
Ala Pro Ala Val Phe Thr Gln Glu Arg Pro Ser Cys Thr Tyr Met Val
20 25 30
Gln Ile Pro Arg Pro Asp Thr Pro Val Phe Pro Asp Gln Asp Phe Thr
35 40 45
Gly Val Thr Trp Ser Gln Arg Pro Leu Ile Pro Ala Asp Ser Arg Glu
50 55 60
Asp Leu Cys Met Asp Glu Trp Val Val Ser Val Ser Thr Gln Val Ile
65 70 75 80
Phe Leu Glu Glu Glu Ile Glu Gly Glu Val Thr Ile Ala Arg Leu Asn
85 90 95
Tyr Gln Gly Thr Glu Thr Pro Glu Ile Gly Ala Phe Leu Ala Gly Ser
100 105 110
Leu Pro Asn Leu Gly Pro Val Ile Arg Arg Val Gly Asn Glu Trp His
115 120 125
Leu Val Val Thr Gln Arg Gln Asp Phe Glu Asn Pro Ile Met Arg Asp
130 135 140
Tyr Met Phe Arg Leu Asn Ile Pro Gly Glu Thr Leu Ser Pro Leu Val
145 150 155 160
Ser Leu Glu Ile Val Asn Ile Asp Asp Asn Pro Pro Ile Ile Glu Val
165 170 175
Phe Gln Ala Cys Gln Val Asp Glu Leu Gly Glu Pro Arg Ala Thr Asp
180 185 190
Cys Val Tyr Thr Val Arg Asp Ala Asp Gly Gln Ile Ser Thr Ser Val
195 200 205
Met Ser Phe Arg Val Glu Ser Asn Arg Pro Ser Asp Glu Gln Ile Phe
210 215 220
Val Met Lys Gly Ala Asn Val Glu Asn Asp Trp Phe Thr Met Thr Met
225 230 235 240
Thr Val His Ile Thr Glu Pro Leu Asn Phe Glu Thr Asn Ala Leu His
245 250 255
Val Phe Asn Val Ile Ala Thr Asp Ser Arg Pro Asn His Gln Thr Ala
260 265 270
Ser Met Met Ile Gln Val Gln Asn Val Glu His Arg Pro Pro Arg Trp
275 280 285
Val Asn Ile Phe Ser Val Gln Gln Phe Asp Glu Lys Thr Val Gln Gln
290 295 300
Phe Pro Leu Gln Ala Ile Asp Gly Asp Thr Gly Ile Asp Lys Pro Ile
305 310 315 320
Asp Tyr Lys Leu Ile Lys Asp Pro Ala Asp Asp Phe Phe Ser Leu Glu
325 330 335
Val Leu Pro Gly Gly Arg Ser Gly Ala Ile Leu His Val Asp Lys Ile
340 345 350
Asp Arg Asp Thr Leu Met Arg Glu Val Phe Gln Val Thr Ile Val Ala
355 360 365
Phe Lys Tyr Asp Asn Glu Ala Phe Ser Thr Ala Arg Glu Val Val Ile
370 375 380
Ile Val Asn Asp Ile Asn Asp Gln Trp Pro Leu Pro Leu Gln Thr Thr
385 390 395 400
Pro Tyr Thr Ile Ser Ile Met Glu Glu Thr Pro Leu Thr Leu Asn Phe
405 410 415
Ala Thr Pro Phe Gly Phe His Asp Arg Asp Leu Gly Glu Asn Ala Gln
420 425 430
Tyr Thr Val Thr Leu Glu Asp Asp Tyr Pro Pro Gly Ala Ala Ser Ala
435 440 445
Phe Gln Ile Asn Pro Asn Val Gly Tyr Gln Gln Gln Thr Phe Ile Met
450 455 460
Ser Thr Val Asn His Ser Met Leu Asp Phe Glu Val Pro Glu Phe Gln
465 470 475 480
Thr Ile Arg Ile Lys Val Ile Ala Thr Asp Asn Asn Asn Thr Asn Phe
485 490 495
Val Gly Val Ala Thr Val Glu Ile Ser Leu Ile Asn Trp Asn Asp Glu
500 505 510
Leu Pro Ile Phe Ser Glu Ser Ser Tyr Thr Ala Ser Phe Lys Glu Thr
515 520 525
Val Gly Lys Gly Phe Ala Val Ala Thr Ile Pro Ala Thr Asp Arg Asp
530 535 540
Ile Asp Asp Arg Val Glu His Ser Leu Met Gly Asn Ala Gly Glu Tyr
545 550 555 560
Leu Ser Ile Asp Lys Asp Ser Gly Ala Ile Ile Val Ser Val Asp Glu
565 570 575
Ala Phe Asp Tyr His Arg Gln Asn Val Leu Phe Val Gln Ile Arg Ala
580 585 590
Asp Asp Thr Leu Gly Glu Pro Tyr Asn Thr Ala Thr Thr Gln Leu Val
595 600 605
Ile Gln Leu Glu Asp Val Asn Asn Thr Pro Pro Thr Leu Arg Leu Pro
610 615 620
Arg Gly Ser Pro Ser Val Glu Glu Asn Val Pro Asp Gly Tyr Ile Ile
625 630 635 640
Thr Gln Glu Ile His Ala Thr Asp Pro Asp Thr Thr Ala Lys Leu Val
645 650 655
Phe Glu Ile Asp Trp Asp Ser Thr Trp Ala Thr Lys Gln Gly Arg Glu
660 665 670
Thr Pro Glu Glu Glu Phe Lys Asn Cys Val Glu Ile Lys Thr Leu Tyr
675 680 685
Gln Asn Pro Glu Gln Leu Gly Thr Ala Tyr Gly Gln Leu Val Val Arg
690 695 700
Glu Ile Arg Asp Gly Val Thr Ile Asp Phe Glu Glu Phe Glu Val Leu
705 710 715 720
Tyr Leu Thr Val Arg Val Arg Asp Leu Asn Thr Glu Leu Gln Asp Asp
725 730 735
Tyr Asp Glu Ser Thr Phe Thr Leu Arg Ile Ile Asp Met Asn Asp Asn
740 745 750
Met Pro Leu Phe Asp Glu Gly Thr Leu Glu Gln Asn Leu Arg Val Arg
755 760 765
Glu Val Ser Ala Ser Gly Val Val Ile Gly Ser Val Leu Ala Thr Asp
770 775 780
Ile Asp Gly Pro Leu Tyr Asn Arg Val Arg Tyr Thr Ile Val Pro Arg
785 790 795 800
Asn Asp Thr Pro Val Gly Leu Val Lys Ile Asp Phe Asn Asn Gly Gln
805 810 815
Ile Ala Val Asp Glu Asp Gly Ala Ile Asp Ala Asp Val Pro Pro Arg
820 825 830
Gln Tyr Leu Tyr Tyr Thr Val Ile Ala Ser Asp Arg Cys Tyr Glu Thr
835 840 845
Asp Gln Ser Leu Cys Pro Pro Asp Pro Thr Tyr Trp Glu Thr Met Glu
850 855 860
Asp Ile Gln Ile Glu Ile Leu Asp Thr Asn Asn Lys Val Pro Glu Ala
865 870 875 880
Asp Tyr Glu Arg Phe Asn Val Thr Val Tyr Val Trp Glu Asn Ala Thr
885 890 895
Thr Gly Asp Glu Val Val Gln Leu Tyr Ser Ser Asp Leu Asp Arg Asp
900 905 910
Glu Ile Tyr Asn Thr Val Arg Tyr Gln Ile Asn Tyr Ala Val Asn Ala
915 920 925
Arg Leu Arg Pro Phe Phe Ser Val Asp Gln Asp Ser Gly Leu Val Val
930 935 940
Val Asp Tyr Thr Thr Asp Glu Val Leu Asp Arg Asp Gly Asp Glu Pro
945 950 955 960
Lys His Thr Ile Phe Leu Asn Phe Ile Asp Asn Phe His Ser Glu Gly
965 970 975
Asp Gly Arg Arg Asn Gln Tyr Asp Thr Gln Val Glu Val Ile Leu Leu
980 985 990
Asp Val Asn Asp Asn Ala Pro Glu Met Pro Ser Pro Glu Glu Leu Phe
995 1000 1005
Trp Asp Asn Val Ser Glu Asn Leu Leu Glu Gly Val Arg Leu Ser Pro
1010 1015 1020
His Ile Tyr Ala Pro Asp Arg Asp Glu Pro Asp Thr Asp Asn Ser Arg
1025 1030 1035 1040
Val Gly Tyr Arg Ile Leu Ala Leu Ala Val Thr Asp Arg Pro Gly Leu
1045 1050 1055
Asp Val Pro Asp Leu Phe Thr Met Val Gln Ile Gln Asn Ile Thr Gly
1060 1065 1070
Glu Leu Glu Thr Ala Leu Pro Leu Arg Gly Tyr Trp Gly Thr Tyr Gln
1075 1080 1085
Ile His Ile Glu Ala Phe Asp His Gly His Pro Gln Gln Phe Ser Asp
1090 1095 1100
Glu Val Tyr Arg Leu Thr Ile Gln Pro Tyr Asn Phe His Ser Pro Val
1105 1110 1115 1120
Phe Gln Phe Pro Leu His Asp Ser Thr Ile Arg Leu Ala Thr Glu Leu
1125 1130 1135
Thr Thr Glu Asn Gly Gln Leu Thr Thr Ala Ser Gly Gln Phe Leu Asp
1140 1145 1150
Arg Ile His Ala Thr Asp Glu Asp Gly Leu His Ala Gly Lys Val Thr
1155 1160 1165
Phe Gln Val Gln Gly Asn Glu Glu Ala Thr Glu Tyr Phe Asn Val Val
1170 1175 1180
Asn Ser Pro Asp Gly Asp Asn Thr Gly Thr Leu Val Leu Leu Lys Thr
1185 1190 1195 1200
Phe Pro Glu Glu Ile Arg Glu Phe Arg Ile Thr Ile Arg Ala Thr Asp
1205 1210 1215
Gly Gly Thr Asp Pro Gly Pro Leu Ser Thr Asp Ser Ala Phe Thr Val
1220 1225 1230
Ile Phe Val Pro Ser Arg Gly Asp Pro Val Phe Asn Met Ser Ser Thr
1235 1240 1245
Pro Val Ala Phe Ile Glu Gly Ile Ala Gly Met Glu Gln Ser Phe Gln
1250 1255 1260
Leu Pro Gln Ala Glu Asp Ile Lys Asn Phe Ala Cys Thr Asp Asp Cys
1265 1270 1275 1280
Phe Asn Ile Tyr Tyr Arg Ile Ile Asp Gly Asn Asn Glu Gly Leu Phe
1285 1290 1295
Ser Leu Glu Pro Ser Thr Asn Val Ile Arg Leu Val Arg Glu Leu Asp
1300 1305 1310
Arg Glu Ala Ala Ala Thr His Thr Ile Met Val Ala Ala Ser Asn Ser
1315 1320 1325
Pro Asp Ala Thr Asn Gln Pro Leu Gln Ala Ser Ile Leu Val Val Asn
1330 1335 1340
Ile Asn Val Arg Glu Ala Asn Pro Arg Pro Ile Phe Glu Arg Glu Leu
1345 1350 1355 1360
Tyr Thr Ala Gly Ile Ser Thr Ala Asp Ser Ile Gly Arg Glu Leu Leu
1365 1370 1375
Thr Val Lys Ala Thr His Ser Glu Asp Ala Thr Val Thr Tyr Thr Ile
1380 1385 1390
Asp Gln Ala Ser Met Gln Val Asp Ser Ser Leu Glu Ala Val Arg Glu
1395 1400 1405
Ser Ala Phe Ala Leu Asn Ala Ala Thr Gly Ala Leu Ala Leu Asn Met
1410 1415 1420
Gln Pro Thr Ala Ala Met His Gly Met Phe Asp Phe Leu Val Leu Ala
1425 1430 1435 1440
Thr Asp Pro Ala Asn Ala Asn Asp Thr Thr Gln Val Lys Val Tyr Leu
1445 1450 1455
Ile Ser Ser Leu Asn Arg Val Thr Phe Ile Phe Val Asn Thr Leu Glu
1460 1465 1470
Glu Val Glu Ala His Arg Asp Phe Ile Ala Gln Thr Phe Thr Ala Gly
1475 1480 1485
Phe Ser Met Thr Cys Asn Ile Asp Glu Val Val Pro His Ser Asp Ser
1490 1495 1500
Asn Gly Val Ala Arg Glu Asp Val Ser Glu Val Arg Gly His Phe Ile
1505 1510 1515 1520
Arg Gly Asn Val Pro Val Leu Ala Thr Glu Ile Glu Glu Leu Arg Ser
1525 1530 1535
Asp Thr Leu Leu Leu Arg Asn Ile Gln His Ser Leu Ser Ala Asn Leu
1540 1545 1550
Leu Leu Leu Gln Asp Phe Val Thr Asp Ala Ser Pro Asp Gly Gly Ala
1555 1560 1565
Asp Ser Ala Thr Thr Thr Leu Tyr Val Leu Ala Ala Leu Ser Ala Leu
1570 1575 1580
Leu Ala Ala Leu Cys Leu Val Leu Leu Leu Thr Phe Phe Ile Arg Thr
1585 1590 1595 1600
Arg Glu Leu Asn Arg Arg Leu Gln Ala Leu Ser Met Thr Lys Tyr Gly
1605 1610 1615
Ser Val Asp Ser Gly Leu Asn Arg Val Gly Leu Ala Pro Gly Thr Asn
1620 1625 1630
Lys His Ala Val Glu Gly Ser Asn Pro Ile Trp Asn Glu Ala Ile Lys
1635 1640 1645
Ala Pro Asp Phe Asp Ala Ile Ser Asp Leu Ser Gly Asp Ser Asp Leu
1650 1655 1660
Ile Gly Ile Glu Asp Leu Pro Gln Phe Arg Glu Asp Tyr Phe Pro Pro
1665 1670 1675 1680
Ala Asp Thr Asn Ser Ala Thr Leu Ile Ala Val His Pro Arg Gly Gly
1685 1690 1695
Asp Glu Gly Leu Ala Thr His Glu Asn Asn Phe Gly Phe Asn Thr Ser
1700 1705 1710
Pro Phe Ser Gln Asp Phe Thr Asn Lys Gln Phe Asn Arg
1715 1720 1725
<210> 12
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 12
ggcacgctgg agcagaactt 20
<210> 13
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 13
cagcgtggtg gtggcggagt 20
<210> 14
<211> 2451
<212> DNA
<213> Artificial Sequence
<400> 14
ggcacgctgg agcagaactt gcgcgtgcgc gaggtgtccg ccagcggcgt cgtcatcggg 60
tccgtgctcg ccaccgacat cgacggaccc ctctacaaca gagtacgcta caccatagtt 120
cctcgcaatg acacgccagt tgggttggtg aagatagact tcaacaacgg gcagatcgcg 180
gtggatgagg acggcgccat cgacgcggac gtcccgccgc gccagtacct gtactacact 240
gtcattgcca gcgaccggtg ctacgagacc gaccagagcc tgtgtccgcc agaccctacc 300
tactgggaga cgatggagga tatccaaata gaaatcctag acacgaacaa caaggtaccc 360
gaagccgact acgagaggtt caacgtgacg gtgtacgtgt gggagaacgc cacgacaggc 420
gacgaggtgg tgcagctcta ctccagtgac ctcgacagag acgaaatata caacacggtg 480
cgatatcaga tcaactacgc ggtgaacgct cggctgcggc cgttcttctc ggtggaccag 540
gactcggggc tggtggtggt ggactacacc acggacgagg tgctggaccg cgacggcgac 600
gagcccaaac acaccatctt cctcaacttc atcgacaact tccactcgga aggagatgga 660
agacgaaatc agtatgatac gcaagtggaa gtgatcctcc tggatgtgaa cgacaacgct 720
ccagaaatgc cctcgccaga agaacttttc tgggataatg tatccgagaa ccttttagag 780
ggtgtgagac tatcgccgca catatacgcg ccggaccgcg acgagccgga cacggacaac 840
tcgcgcgtcg gataccgcat cctcgccctg gccgtcacgg accggccggg gctcgacgtg 900
cccgacctct tcaccatggt gcagatccag aacatcacgg gcgagctgga gaccgcgctc 960
ccactgcggg gctactgggg cacgtaccag attcacatcg aggcgttcga ccacggtcat 1020
ccccagcagt tttcagacga ggtttacagg ctcacgatcc aaccgtacaa cttccattcg 1080
ccggtattcc agtttcctct acacgactcc accatcagac ttgcgacgga gcttacaaca 1140
gagaatggac agctgacgac cgcttctggt cagtttctgg accgaatcca cgccaccgac 1200
gaagacggcc tacacgccgg gaaagtcacc ttccaagtgc aaggaaacga ggaagcaaca 1260
gagtatttca acgtggtaaa tagtccagat ggtgacaata ctggaaccct tgttctgttg 1320
aagacattcc cagaagagat cagggaattc cggataacga tcagggcgac agatggaggc 1380
acagatccag gtccactttc aacggattcc gccttcacgg ttatattcgt gccttcgcga 1440
ggagatccgg tcttcaatat gtcatcgact ccagttgctt tcattgaggg cattgctggc 1500
atggagcaga gcttccaact accgcaggca gaagatatta agaacttcgc gtgtacagac 1560
gactgtttca acatatacta caggattatt gacggtaaca atgaaggcct gttcagcctg 1620
gaaccgtcaa ccaacgtgat ccgactggtg cgcgagttgg accgagaggc cgccgctaca 1680
cacacaatca tggtggccgc cagcaactcg cccgacgcca ccaaccagcc gctgcaggca 1740
tccatcctag tcgtcaacat caatgtgcga gaagctaacc cccggccaat attcgaacga 1800
gaactgtaca ctgctggcat ctctacagcc gacagcatcg gcagagagct actcactgtt 1860
aaggcgacgc actcggaaga cgcgacagtg acgtacacca tagaccaggc cagcatgcag 1920
gtggacagca gcctggaggc ggtgcgcgag tcggccttcg cgctcaacgc agccaccggc 1980
gcgctggcgc tcaacatgca gcccaccgcc gccatgcacg gcatgttcga cttcctcgtc 2040
ctggccactg accctgctaa tgcaaatgac acgactcagg tgaaggtcta cctcatttcg 2100
tctcttaacc gtgtgacctt catattcgtc aacacgctgg aagaagtgga ggcgcacaga 2160
gatttcatag cgcagacgtt caccgccgga ttcagcatga cgtgcaacat cgacgaggtg 2220
gtcccgcaca gcgacagcaa cggcgtcgcg cgcgaggacg tgtccgaggt gcgcggccac 2280
ttcatccgcg gcaacgtgcc cgtgctcgcc accgagatcg aggagctccg cagcgacacg 2340
ttgctgctgc gcaacatcca gcacagcctg agcgccaacc tgctgctgct gcaggacttt 2400
gtgacggacg ccagccccga cggcggcgcc gactccgcca ccaccacgct g 2451
<210> 15
<211> 817
<212> PRT
<213> Artificial Sequence
<400> 15
Gly Thr Leu Glu Gln Asn Leu Arg Val Arg Glu Val Ser Ala Ser Gly
1 5 10 15
Val Val Ile Gly Ser Val Leu Ala Thr Asp Ile Asp Gly Pro Leu Tyr
20 25 30
Asn Arg Val Arg Tyr Thr Ile Val Pro Arg Asn Asp Thr Pro Val Gly
35 40 45
Leu Val Lys Ile Asp Phe Asn Asn Gly Gln Ile Ala Val Asp Glu Asp
50 55 60
Gly Ala Ile Asp Ala Asp Val Pro Pro Arg Gln Tyr Leu Tyr Tyr Thr
65 70 75 80
Val Ile Ala Ser Asp Arg Cys Tyr Glu Thr Asp Gln Ser Leu Cys Pro
85 90 95
Pro Asp Pro Thr Tyr Trp Glu Thr Met Glu Asp Ile Gln Ile Glu Ile
100 105 110
Leu Asp Thr Asn Asn Lys Val Pro Glu Ala Asp Tyr Glu Arg Phe Asn
115 120 125
Val Thr Val Tyr Val Trp Glu Asn Ala Thr Thr Gly Asp Glu Val Val
130 135 140
Gln Leu Tyr Ser Ser Asp Leu Asp Arg Asp Glu Ile Tyr Asn Thr Val
145 150 155 160
Arg Tyr Gln Ile Asn Tyr Ala Val Asn Ala Arg Leu Arg Pro Phe Phe
165 170 175
Ser Val Asp Gln Asp Ser Gly Leu Val Val Val Asp Tyr Thr Thr Asp
180 185 190
Glu Val Leu Asp Arg Asp Gly Asp Glu Pro Lys His Thr Ile Phe Leu
195 200 205
Asn Phe Ile Asp Asn Phe His Ser Glu Gly Asp Gly Arg Arg Asn Gln
210 215 220
Tyr Asp Thr Gln Val Glu Val Ile Leu Leu Asp Val Asn Asp Asn Ala
225 230 235 240
Pro Glu Met Pro Ser Pro Glu Glu Leu Phe Trp Asp Asn Val Ser Glu
245 250 255
Asn Leu Leu Glu Gly Val Arg Leu Ser Pro His Ile Tyr Ala Pro Asp
260 265 270
Arg Asp Glu Pro Asp Thr Asp Asn Ser Arg Val Gly Tyr Arg Ile Leu
275 280 285
Ala Leu Ala Val Thr Asp Arg Pro Gly Leu Asp Val Pro Asp Leu Phe
290 295 300
Thr Met Val Gln Ile Gln Asn Ile Thr Gly Glu Leu Glu Thr Ala Leu
305 310 315 320
Pro Leu Arg Gly Tyr Trp Gly Thr Tyr Gln Ile His Ile Glu Ala Phe
325 330 335
Asp His Gly His Pro Gln Gln Phe Ser Asp Glu Val Tyr Arg Leu Thr
340 345 350
Ile Gln Pro Tyr Asn Phe His Ser Pro Val Phe Gln Phe Pro Leu His
355 360 365
Asp Ser Thr Ile Arg Leu Ala Thr Glu Leu Thr Thr Glu Asn Gly Gln
370 375 380
Leu Thr Thr Ala Ser Gly Gln Phe Leu Asp Arg Ile His Ala Thr Asp
385 390 395 400
Glu Asp Gly Leu His Ala Gly Lys Val Thr Phe Gln Val Gln Gly Asn
405 410 415
Glu Glu Ala Thr Glu Tyr Phe Asn Val Val Asn Ser Pro Asp Gly Asp
420 425 430
Asn Thr Gly Thr Leu Val Leu Leu Lys Thr Phe Pro Glu Glu Ile Arg
435 440 445
Glu Phe Arg Ile Thr Ile Arg Ala Thr Asp Gly Gly Thr Asp Pro Gly
450 455 460
Pro Leu Ser Thr Asp Ser Ala Phe Thr Val Ile Phe Val Pro Ser Arg
465 470 475 480
Gly Asp Pro Val Phe Asn Met Ser Ser Thr Pro Val Ala Phe Ile Glu
485 490 495
Gly Ile Ala Gly Met Glu Gln Ser Phe Gln Leu Pro Gln Ala Glu Asp
500 505 510
Ile Lys Asn Phe Ala Cys Thr Asp Asp Cys Phe Asn Ile Tyr Tyr Arg
515 520 525
Ile Ile Asp Gly Asn Asn Glu Gly Leu Phe Ser Leu Glu Pro Ser Thr
530 535 540
Asn Val Ile Arg Leu Val Arg Glu Leu Asp Arg Glu Ala Ala Ala Thr
545 550 555 560
His Thr Ile Met Val Ala Ala Ser Asn Ser Pro Asp Ala Thr Asn Gln
565 570 575
Pro Leu Gln Ala Ser Ile Leu Val Val Asn Ile Asn Val Arg Glu Ala
580 585 590
Asn Pro Arg Pro Ile Phe Glu Arg Glu Leu Tyr Thr Ala Gly Ile Ser
595 600 605
Thr Ala Asp Ser Ile Gly Arg Glu Leu Leu Thr Val Lys Ala Thr His
610 615 620
Ser Glu Asp Ala Thr Val Thr Tyr Thr Ile Asp Gln Ala Ser Met Gln
625 630 635 640
Val Asp Ser Ser Leu Glu Ala Val Arg Glu Ser Ala Phe Ala Leu Asn
645 650 655
Ala Ala Thr Gly Ala Leu Ala Leu Asn Met Gln Pro Thr Ala Ala Met
660 665 670
His Gly Met Phe Asp Phe Leu Val Leu Ala Thr Asp Pro Ala Asn Ala
675 680 685
Asn Asp Thr Thr Gln Val Lys Val Tyr Leu Ile Ser Ser Leu Asn Arg
690 695 700
Val Thr Phe Ile Phe Val Asn Thr Leu Glu Glu Val Glu Ala His Arg
705 710 715 720
Asp Phe Ile Ala Gln Thr Phe Thr Ala Gly Phe Ser Met Thr Cys Asn
725 730 735
Ile Asp Glu Val Val Pro His Ser Asp Ser Asn Gly Val Ala Arg Glu
740 745 750
Asp Val Ser Glu Val Arg Gly His Phe Ile Arg Gly Asn Val Pro Val
755 760 765
Leu Ala Thr Glu Ile Glu Glu Leu Arg Ser Asp Thr Leu Leu Leu Arg
770 775 780
Asn Ile Gln His Ser Leu Ser Ala Asn Leu Leu Leu Leu Gln Asp Phe
785 790 795 800
Val Thr Asp Ala Ser Pro Asp Gly Gly Ala Asp Ser Ala Thr Thr Thr
805 810 815
Leu