A fusion protein, a base editing tool and applications thereof
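Technical Field
The present invention belongs to the technical field of gene editing, and specifically relates to a fusion protein, a base editing tool and applications thereof.
Background Art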
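The CRISPR/Cas9 system is a natural defense system that bacteria use against phage DNA injection and plasmid transfer. Since its discovery it has been extensively developed and exploited, yielding DNA editing systems and platforms that depend on targeting by a guide RNA (gRNA). These are mainly used for targeted genome editing, transcriptional regulation, epigenome editing and similar applications. The Cas9 system works mainly as follows. The tracrRNA in the gRNA recruits the Cas9 protein, and binding to the gRNA switches Cas9 from an inactive conformation to a conformation capable of recognizing DNA. In the classical CRISPR/Cas9 system, the first 20 nucleotides of the crRNA confer target-sequence specificity on Cas9. The gRNA/Cas9 complex searches the DNA for the protospacer adjacent motif (PAM) recognized by Cas9, i.e. specific bases on the target genome; the PAM of the classical SpCas9 is NGG. Once the PAM is recognized, Cas9 locally unwinds the DNA, and the gRNA invades and base-pairs with the DNA to form an RNA-DNA hybrid. When the gRNA is fully complementary to the target DNA, the HNH domain of Cas9 adopts a stable, active conformation and cleaves the target strand. At the same time, a larger conformational change delivers the non-target strand into the RuvC domain, which cleaves it [1]. Residue D10 in the RuvC domain and residue H840 in the HNH domain are essential for the cleavage activity of their respective domains. Introducing a D10A or an H840A mutation converts Cas9 into a Cas9 nickase (Cas9n) with only single-strand cleavage activity. Introducing both mutations yields dCas9, which retains targeted DNA binding but has no endonuclease activity.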
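The mutation rules in the preceding paragraph can be captured in a small lookup. The following is a minimal sketch, assuming the SpCas9 residue numbering used in the text. `classify_spcas9` is a hypothetical helper. It encodes only what is stated above: HNH cleaves the target strand, and RuvC cleaves the non-target strand.

```python
def classify_spcas9(mutations):
    """Classify an SpCas9 variant from its catalytic mutations (hypothetical helper).

    D10A inactivates the RuvC domain, which cleaves the non-target strand;
    H840A inactivates the HNH domain, which cleaves the target strand.
    Returns the variant name and the set of strands it can still cut.
    """
    muts = set(mutations)
    cut = set()
    if "D10A" not in muts:
        cut.add("non-target")  # RuvC still active
    if "H840A" not in muts:
        cut.add("target")      # HNH still active
    if len(cut) == 2:
        name = "Cas9"
    elif len(cut) == 1:
        name = "Cas9n"
    else:
        name = "dCas9"
    return name, cut


for muts in ([], ["D10A"], ["H840A"], ["D10A", "H840A"]):
    print(muts, classify_spcas9(muts))
```

The D10A case is a nickase that cuts only the target strand. This is the Cas9n (D10A) used in the base editors described in the next paragraph.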
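Building on Cas9n and dCas9, a series of genome and epigenome editing tools has been developed. The basic strategy is to attach a catalytic enzyme or epigenetic factor with a particular function to the terminus of Cas9n or dCas9. The targeting activity of Cas9n/dCas9 then delivers that factor, under gRNA guidance, to a specific genomic site, achieving site-specific gene editing, epigenetic editing, or transcriptional activation or repression. The most classical class of such site-specific tools is the base editor. In a base editor, a DNA deaminase is fused to the N-terminus of Cas9n and delivered by Cas9, under the guidance of the gRNA, to the targeted DNA region, where it deaminates specific nucleotides. The cleavage activity of Cas9n (D10A) then nicks the strand complementary to the deaminated strand, and base repair together with DNA replication achieves precise base substitution. The first cytidine base editor (CBE) was reported by David Liu's laboratory at Harvard University, which fused the rat APOBEC1 cytidine deaminase to dCas9. To increase editing efficiency, they also fused the uracil DNA glycosylase inhibitor protein UGI to the editor, preventing cells from reverting the uracil back to cytosine. To make cells preferentially use the deaminated strand as the repair template, the Liu laboratory further replaced dCas9 with a Cas9n that nicks only the strand complementary to the deaminated strand. This greatly increased CBE editing efficiency and enabled efficient C/G-to-T/A substitution [2]. The Liu laboratory subsequently developed the adenine base editor (ABE), which converts A/T to G/C (A/T-to-G/C) at the target site [3]. To build it, the RNA adenosine deaminase TadA was evolved by directed evolution into TadA*, which can deaminate adenine in DNA. A TadA/TadA* heterodimer was then fused to Cas9n to obtain ABE7.10, which has highly efficient adenine editing activity [4].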
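Many laboratories subsequently modified and optimized base editors, for example by combining and optimizing different deaminases and Cas9 proteins. This produced base editors of various types and properties and greatly increased editing efficiency and editing scope. The most important of these is the fourth-generation base editor ancBE4max from David Liu's laboratory. It replaces rat APOBEC1 with ancAPOBEC1, fuses two copies of UGI, lengthens the APOBEC1-Cas9n and Cas9n-UGI linkers, and uses optimized nuclear localization signal (NLS) sequences. Together these changes markedly improved both the product purity and the efficiency of editing [5]. ancBE4max recognizes an NGG PAM, and its editing window is positions 4-8 from the 5’ end of the gRNA target sequence. Its Cas9n is derived from Streptococcus pyogenes (SpCas9; 1368 amino acids in total). However, the narrow editing window of ancBE4max and its PAM restriction (it mainly recognizes NGG PAMs) greatly limit the range of genomic sites it can target.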
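Scientists have therefore combined deaminases with SpCas9 mutants obtained by protein engineering and directed evolution, producing a series of base editors with various targeting properties and PAM specificities. These include xCas9 [6, 12] and SpCas9-NG [7], which recognize NGN PAMs, and SpRY [8], a Cas9 variant that is nearly PAM-unrestricted. Researchers have also combined deaminases with Cas9 orthologs from other species, such as Nme2Cas9 [9], SaCas9 [10] and St1Cas9 [11]. These combinations produced new editors with different editing properties, target-sequence lengths and editing windows.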
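Classical SpCas9-based editors mostly have an editing window of positions 4-8, and all of these editors show PAM preferences or low targeting efficiency at some sites. Moreover, the expression plasmids of classical base editors far exceed the packaging capacity of adeno-associated virus (AAV), which hinders clinical research and application. Developing new base editors with different editing windows, different PAM specificities and smaller expression plasmids is therefore key to applied gene-editing research and clinical use.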
References:
1. Jiang, F. and J.A. Doudna, CRISPR-Cas9 Structures and Mechanisms. Annu Rev Biophys, 2017. 46: p. 505-529.
2. Komor, A.C., et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 2016. 533(7603): p. 420-424.
3. Gaudelli, N.M., et al., Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature, 2017. 551(7681): p. 464-471.
4. Gaudelli, N.M., et al., Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage. Nature, 2017. 551(7681): p. 464-471.
5. Koblan, L.W., et al., Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nature Biotechnology, 2018. 36(9): p. 843-846.
6. Hu, J.H., et al., Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 2018. 556(7699): p. 57-63.
7. Nishimasu, H., et al., Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science, 2018. 361(6408): p. 1259-1262.
8. Walton, R.T., et al., Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science, 2020. 368(6488): p. 290-296.
9. Edraki, A., et al., A Compact, High-Accuracy Cas9 with a Dinucleotide PAM for In Vivo Genome Editing. Mol Cell, 2019. 73(4): p. 714-726.e4.
10. Nishimasu, H., et al., Crystal Structure of Staphylococcus aureus Cas9. Cell, 2015. 162(5): p. 1113-1126.
11. Zhang, Y., et al., Catalytic-state structure and engineering of Streptococcus thermophilus Cas9. Nature Catalysis, 2020. 3(10): p. 813-823.
12. Hu, J.H., et al., Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature, 2018. 556(7699): p. 57-63.
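Summary of the Invention
The object of the present invention is to overcome the shortcomings of the prior art by providing a new cytosine base editor that recognizes the PAM sequence NHAAAA. The new editor changes the editing window of base editors and thereby broadens their targeting range.
The technical solutions adopted by the present invention are as follows: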
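In a first aspect, the present invention provides a fusion protein comprising an SsiCas9n polypeptide, the amino acid sequence of which is:
(a) amino acids 2 to 1122 of the SsiCas9 D9A nickase; or
(b) the amino acid sequence shown in SEQ ID NO.1; or
(c) an amino acid sequence having at least 90% sequence identity to the amino acid sequence shown in SEQ ID NO.1 and having the function of the amino acid sequence defined in (a).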
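In some preferred embodiments of the present invention, the SsiCas9n polypeptide is capable of recognizing NHAAAA as the PAM, where N denotes any base and H denotes A, C or T.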
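To illustrate what an NHAAAA PAM means in practice, the sketch below scans a DNA sequence on both strands for 20-nt protospacers immediately followed (3’) by a PAM matching NHAAAA, using IUPAC codes. This is a minimal sketch under those assumptions. The function names and the synthetic example sequence are hypothetical, and the scan ignores any other sequence requirements SsiCas9 may have.

```python
import re

# IUPAC codes needed for the NHAAAA PAM: N = any base, H = A, C or T
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "H": "[ACT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(seq, pam="NHAAAA", spacer_len=20):
    """List (strand, start, protospacer, pam) for every spacer+PAM hit.

    `start` is the 0-based protospacer start on the strand reported.
    A lookahead is used so that overlapping sites are all found.
    """
    pam_re = "".join(IUPAC[base] for base in pam)
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_re}))")
    hits = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


# Synthetic example: a 20-nt spacer followed by CTAAAA (matches NHAAAA)
print(find_targets("ACGT" * 5 + "CTAAAA"))
# [('+', 0, 'ACGTACGTACGTACGTACGT', 'CTAAAA')]
```

Replacing the PAM with CGAAAA returns no hit, because G is excluded at the H position.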
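In some preferred embodiments of the present invention, the SsiCas9n polypeptide is capable of acting as a Cas9 nickase that causes single-stranded DNA cleavage on the strand complementary to the targeting sequence.
In some embodiments of the present invention, the fusion protein further comprises an ancAPOBEC1 deaminase polypeptide, the amino acid sequence of which is:
(d) the amino acid sequence shown in SEQ ID NO.3; or
(e) an amino acid sequence having at least 90% sequence identity to the amino acid sequence shown in SEQ ID NO.3 and having the function defined in (d), preferably cytosine deaminase activity.
In some embodiments of the present invention, the fusion protein further comprises a uracil glycosylase inhibitor (UGI), the amino acid sequence of which is:
(f) the amino acid sequence shown in SEQ ID NO.4; or
(g) an amino acid sequence having at least 90% sequence identity to the amino acid sequence shown in SEQ ID NO.4 and having the function defined in (f), preferably uracil DNA glycosylase inhibitor activity.
In some embodiments of the present invention, the fusion protein further comprises a nuclear localization signal peptide. Preferably, the nuclear localization signal polypeptide fragment is located at the N-terminus and/or C-terminus of the fusion protein, and its amino acid sequence is shown in SEQ ID NO.9.
In some embodiments of the present invention, the fusion protein comprises a nuclear localization signal polypeptide, the deaminase described above, a first linker, the SsiCas9n polypeptide of the first aspect, a second linker, the inhibitor described above, and a nuclear localization signal polypeptide.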
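In some embodiments of the present invention, the fusion protein comprises, in order from N-terminus to C-terminus: a BPNLS; an ancAPOBEC1 polypeptide fragment; a first linker; a polypeptide fragment consisting of amino acids 2 to 1122 of the SsiCas9 D9A nickase; a second linker; a 2×UGI polypeptide; and a BPNLS polypeptide sequence.
In some embodiments of the present invention, the first linker is preferably a 32-aa linker and the second linker is preferably a 10-aa linker.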
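The domain order above can be cross-checked against SEQ ID NO.5 in the sequence listing by simple arithmetic. The sketch below adds up the segment lengths as they appear in SEQ ID NO.5:

- a 7-aa PKKKRKV stretch at each end;
- the 228-aa ancAPOBEC1 of SEQ ID NO.3;
- the 32-aa linker;
- the 1121-aa SsiCas9n of SEQ ID NO.1;
- the 10-aa linker;
- the 190-aa SEQ ID NO.4 segment. As listed, this segment holds both UGI copies together with their linker residues and the first part of the C-terminal NLS.

These boundaries were read off the listing by hand and are an assumption, not an official annotation.

```python
# Segment lengths (aa), N- to C-terminal, as read off SEQ ID NO.5 (assumed boundaries)
SEGMENTS = [
    ("N-terminal NLS (PKKKRKV)", 7),
    ("ancAPOBEC1 (SEQ ID NO.3)", 228),
    ("32-aa linker", 32),
    ("SsiCas9 D9A nickase aa 2-1122 (SEQ ID NO.1)", 1121),
    ("10-aa linker", 10),
    ("2x UGI with linkers (SEQ ID NO.4)", 190),
    ("C-terminal NLS (PKKKRKV)", 7),
]


def segment_map(segments):
    """Return (name, first, last) 1-based residue ranges for consecutive segments."""
    ranges, pos = [], 1
    for name, length in segments:
        ranges.append((name, pos, pos + length - 1))
        pos += length
    return ranges


ranges = segment_map(SEGMENTS)
protein_len = ranges[-1][2]
print(protein_len)      # 1595, the <211> length of SEQ ID NO.5
print(protein_len * 3)  # 4785, the <211> length of SEQ ID NO.6
for name, first, last in ranges:
    print(f"{first:>5}-{last:<5} {name}")
```

SEQ ID NO.6 is exactly three times the protein length, so the listed DNA covers the coding region with neither an ATG start codon nor a stop codon. The same check applies to SEQ ID NO.2: 3363 = 3 × 1121.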
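In some preferred embodiments of the present invention, the amino acid sequence of the fusion protein is:
(h) the amino acid sequence shown in SEQ ID NO.5; or
(i) an amino acid sequence having at least 80% sequence similarity to SEQ ID NO.5 and having the function of the amino acid sequence defined in (h): preferably cytosine deaminase activity, more preferably cytosine base editor activity, and still more preferably the ability to recognize NHAAAA as the PAM.
The present invention further provides a nucleic acid molecule encoding the SsiCas9 D9A nickase of the first aspect, the sequence of which is:
(j) the sequence shown in SEQ ID NO.2, a codon-optimized DNA coding sequence suitable for expression in eukaryotes; or
(k) a DNA coding sequence corresponding to an amino acid sequence having at least 90% sequence identity to the amino acid sequence shown in SEQ ID NO.1 and having the function defined in (a) or (j); or
(l) a DNA sequence that differs from the DNA sequence shown in SEQ ID NO.2 only by synonymous codons.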
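In a second aspect, the present invention provides a gene encoding the fusion protein of the first aspect.
In some embodiments of the present invention, the sequence of the gene is:
(m) the sequence shown in SEQ ID NO.6; or
(n) a DNA coding sequence corresponding to an amino acid sequence having at least 90% sequence identity to the amino acid sequence shown in SEQ ID NO.5 and having the function defined in (h) or (m); or
(o) a DNA sequence that differs from the DNA sequence shown in SEQ ID NO.6 only by synonymous codons.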
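In a third aspect, the present invention provides a composition comprising a gRNA and the fusion protein of the first aspect,
wherein the gRNA is a chimeric, non-naturally occurring guide polynucleotide;
and the gRNA/Cas complex can fully or partially recognize and bind the target sequence, and can nick, unwind or cleave the target sequence.
In some preferred embodiments of the present invention, the gRNA expression element consists, in order, of a U6 promoter, restriction sites for inserting the gRNA targeting sequence, a scaffold (Ssi-specific) and a termination signal.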
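In some embodiments of the present invention, the scaffold is designed from the CRISPR repeat sequence of Streptococcus sinensis, and its sequence is:
(p) the DNA sequence shown in SEQ ID NO.8; or
(q) a DNA sequence having at least 80% sequence similarity to SEQ ID NO.8 and having the function of the DNA sequence defined in (p).
In some preferred embodiments of the present invention, the sequence of the gRNA is:
(r) the DNA sequence shown in SEQ ID NO.7; or
(s) a DNA sequence having at least 80% sequence similarity to SEQ ID NO.7 and having the function of the DNA sequence defined in (r).
In some preferred embodiments of the present invention, the gRNA expression vector further comprises a coding sequence for an EGFP tag and, more preferably, a gRNA targeting a specific site.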
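In this system, the coding sequence of the Cas9 homolog SsiCas9 is eukaryotic codon-optimized. SsiCas9 recognizes NHAAAA as the PAM, which differs from the PAMs recognized by previously reported base editors. The designed gRNA spacer is 20 nt long. Ssi-ancBE4max converts C to T at positions 3-12 from the 5’ end of the targeting sequence. It can therefore target sites that previously reported cytosine base editors cannot, which extends the genome-wide targeting range of base editors and provides more options for their application.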
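The practical effect of the wider window can be illustrated by listing which cytosines in a 20-nt protospacer fall inside it. The sketch below compares positions 3-12, reported here for Ssi-ancBE4max, with positions 4-8 for ancBE4max. Positions are numbered from 1 at the 5’ (PAM-distal) end. The function name is hypothetical. The example protospacer is the 20-nt part of the sgSsi-2 oligo in Table 1, used only as a sequence example; real editing outcomes depend on sequence context and must be measured.

```python
def editable_cs(protospacer, window):
    """Return 1-based positions of C within an inclusive (start, end) window."""
    start, end = window
    return [i for i, base in enumerate(protospacer.upper(), start=1)
            if start <= i <= end and base == "C"]


SSI_WINDOW = (3, 12)  # Ssi-ancBE4max, as described in this document
SP_WINDOW = (4, 8)    # ancBE4max (SpCas9), as described in the background

spacer = "ctgcgttcctagaaccacag"  # sgSsi-2 spacer from Table 1
print(editable_cs(spacer, SSI_WINDOW))  # [4, 8, 9]
print(editable_cs(spacer, SP_WINDOW))   # [4, 8]
```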
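In a fourth aspect, the present invention provides a recombinant vector, recombinant bacterium or cell line comprising the gene of the second aspect.
In some embodiments of the present invention, the cell is a eukaryotic or prokaryotic cell.
In some preferred embodiments of the present invention, the cell is a mouse cell or a human cell.
In some preferred embodiments of the present invention, the cell is a human embryonic kidney cell.
In some more preferred embodiments of the present invention, the cell is a HEK293T cell.
In a fifth aspect, the present invention provides the use in gene editing of the fusion protein of the first aspect, the gene of the second aspect, the composition of the third aspect, or the recombinant vector, recombinant bacterium or cell line of the fourth aspect.
In a sixth aspect, the present invention provides a gene editing method that performs in vivo or in vitro gene editing using the fusion protein of the first aspect, the gene of the second aspect, the composition of the third aspect, or the recombinant vector, recombinant bacterium or cell line of the fourth aspect.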
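The beneficial effects of the present invention are as follows:
The present invention provides a fusion protein (base editor) derived from Streptococcus sinensis and a new base editing tool. Specifically, SsiCas9, which recognizes NHAAAA, was combined with ancBE4max to obtain a new cytosine base editor (CBE) named SsiCas9-ancBE4max. Testing showed that this editor recognizes NHAAAA as the PAM and efficiently converts C to T within an editing window spanning positions 3-12 from the 5’ end of the targeting sequence. The editing tool includes a scaffold sequence designed from the CRISPR repeat sequence of Streptococcus sinensis and uses a 20-nt targeting gRNA. By achieving C-to-T conversion at such sites, it broadens the targeting range and application scope of base editing.
In addition, the base editing tool of the present invention uses a smaller Cas9 protein. This reduces the size of its expression plasmid and makes it better suited to the packaging capacity of adeno-associated virus (AAV), giving it good prospects for gene therapy and industrial application. By expanding the genomic targeting range of base editing, it also provides a wider choice of tools for base editing and gene correction.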
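Description of the Drawings
Fig. 1 is a schematic diagram of the domains of the SsiCas9 protein.
Fig. 2 is a schematic diagram of the protein domains of Ssi-ancBE4max.
Fig. 3 is a plasmid map of Ssi-ancBE4max.
Fig. 4 is a plasmid map of the gRNA plasmid of the Ssi-ancBE4max system.
Fig. 5 shows the experimental results of Ssi-ancBE4max in Example 3 of the present invention. Panels A, B, C and D show the editing results at the Ssi2, Ssi6, Ssi8 and Ssi10 sites, respectively.
Fig. 6 is a heat map of the editing efficiency of the Ssi-ancBE4max editing system in HEK293T cells. The dashed box indicates the editing window.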
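Detailed Description of the Embodiments
The concept of the present invention and the technical effects it produces are described clearly and completely below with reference to the examples, so that the objects, features and effects of the present invention can be fully understood. The described examples are only some, not all, of the examples of the present invention. All other examples obtained by those skilled in the art on the basis of these examples without creative effort fall within the scope of protection of the present invention.
Example 1
The amino acid sequence of SsiCas9, the Cas9 homolog from Streptococcus sinensis, was aligned with that of SpCas9 to delineate the functional domains of SsiCas9, shown in Fig. 1. The catalytic residue of the SsiCas9 RuvC domain (aspartate 9, D9) was identified and mutated to alanine (A) to obtain the SsiCas9 D9A nickase, whose amino acid sequence is shown in SEQ ID NO.1.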
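SEQ ID NO.1 covers residues 2-1122 of SsiCas9 D9A and does not include the initiator methionine. Residue 9 of the full-length protein therefore corresponds to position 8 of SEQ ID NO.1. The sketch below checks this offset by converting the first row of the three-letter listing to one-letter code. The helper names are hypothetical, and only the first 16 residues of SEQ ID NO.1 are used.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter):
    """Convert space-separated three-letter residues to a one-letter string."""
    return "".join(THREE_TO_ONE[res] for res in three_letter.split())


def residue_at(full_length_pos, listing, listing_start=2):
    """Residue at a full-length position, for a listing that starts at listing_start."""
    return listing[full_length_pos - listing_start]


# First row of SEQ ID NO.1, i.e. full-length residues 2-17
seq1_head = to_one_letter(
    "Asn Gly Lys Ile Leu Gly Leu Ala Ile Gly Val Ala Ser Val Gly Val")
print(seq1_head)                 # NGKILGLAIGVASVGV
print(residue_at(9, seq1_head))  # A, the D9A substitution
```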
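The prokaryotic codons of Streptococcus sinensis SsiCas9 D9A were optimized for eukaryotic expression, giving an SsiCas9 D9A coding sequence suitable for eukaryotic cells, shown in SEQ ID NO.2. The optimized SsiCas9 D9A gene was synthesized in full by a commercial company. The construction strategy was to replace SpCas9 D10A in ancBE4max with SsiCas9 D9A; ancBE4max was likewise synthesized in full by a commercial company. Next, the part of ancBE4max spanning a portion of the XTEN linker, SpCas9 D10A, the 10-aa linker and UGI was excised by BamHI digestion. The commercially synthesized SsiCas9 D9A fragment was designed to restore the excised part of ancBE4max. It therefore comprises part of the XTEN linker, SsiCas9 D9A, the 10-aa linker and UGI (partial XTEN linker-SsiCas9 D9A-10-aa linker-UGI), is flanked at both ends by BamHI sites, and is shown in SEQ ID NO.10.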
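The ancBE4max plasmid (pCMV backbone) was digested with the restriction enzyme BamHI (R0136L) in a 37°C water bath for 2 h. The 50 μl digestion reaction contained 5 μl 10× buffer, 5 μg vector and 3 μl BamHI, made up to 50 μl with ddH2O. Complete digestion was checked by gel electrophoresis. The linearized vector was then purified with a clean-up kit (AxyPrep PCR Clean-up Kit) and eluted in 15 μl ddH2O. The synthesized XTEN linker-SsiCas9 D9A-10-aa linker-UGI fragment was PCR-amplified with protective bases added outside the restriction sites at both ends. The PCR primers were synthesized by GENEWIZ (Jinweizhi Biotechnology Co., Ltd.) and have the following sequences: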
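The 50 μl BamHI digest above fixes the buffer and enzyme volumes, but the water volume depends on the plasmid concentration. A minimal sketch of that calculation follows. The 500 ng/μl plasmid concentration in the example is a hypothetical value, not one given in this document.

```python
def reaction_water_ul(total_ul, dna_ug, dna_ng_per_ul, fixed_ul):
    """Return (DNA volume, ddH2O volume) in μl for a reaction of total_ul.

    fixed_ul: volumes of the other components (buffer, enzyme, ...).
    dna_ng_per_ul: plasmid concentration (hypothetical in the example below).
    """
    dna_ul = dna_ug * 1000 / dna_ng_per_ul
    water_ul = total_ul - dna_ul - sum(fixed_ul)
    if water_ul < 0:
        raise ValueError("plasmid too dilute for this reaction volume")
    return dna_ul, water_ul


# BamHI digest: 50 μl total, 5 μl 10x buffer + 3 μl BamHI, 5 μg vector
print(reaction_water_ul(50, 5, 500, [5, 3]))  # (10.0, 32.0)
```

The same function covers the 60 μL BsaI linearization in Example 2: 2 μg plasmid, 6 μL buffer and 2 μL BsaI.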
Ssi PCR for: 5’-agcggaggatcctctggcagcgagacacca-3’ (SEQ ID NO.11);
Ssi PCR rev: 5’-cctccggatcctccgctcagcatcttgatctta-3’ (SEQ ID NO.12).
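Both primers carry a BamHI recognition site (GGATCC) preceded by a few extra 5’ bases. These are the protective bases mentioned above, which help the enzyme cut close to the end of a PCR product. The sketch below locates the site in each primer and reports the protective bases. It is a plain string search and does not model cutting efficiency.

```python
BAMHI = "GGATCC"

PRIMERS = {
    "Ssi PCR for": "agcggaggatcctctggcagcgagacacca",
    "Ssi PCR rev": "cctccggatcctccgctcagcatcttgatctta",
}


def site_positions(seq, site=BAMHI):
    """0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


for name, primer in PRIMERS.items():
    sites = site_positions(primer)
    print(name, sites, "protective bases:", primer[:sites[0]])
# Ssi PCR for [6] protective bases: agcgga
# Ssi PCR rev [5] protective bases: cctcc
```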
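The insert fragment was amplified by PCR and purified with the clean-up kit (AxyPrep PCR Clean-up Kit). The purified PCR product was then digested with BamHI using the reaction set-up described above.
The purified, BamHI-digested XTEN linker-SsiCas9 D9A-10-aa linker-UGI fragment was ligated to the BamHI-linearized vector pCMV_ancBE4max to obtain the initial ligation product. The 10 μl ligation reaction contained the following components:
- 1 μl (50 ng) purified linearized pCMV_ancBE4max;
- 1 μl (100 ng) BamHI-digested XTEN linker-SsiCas9 D9A-10-aa linker-UGI fragment;
- 1 μl T4 DNA Ligase Buffer;
- 1 μl T4 DNA Ligase;
- 6 μl ddH2O.

Ligation was performed at 16°C for 2 h. The ligation product was transformed and plated, and single colonies were picked, cultured and verified by sequencing. The protein and DNA sequences of the resulting SsiCas9-ancBE4max are shown in SEQ ID NO.5 and SEQ ID NO.6, respectively. From N-terminus to C-terminus, it consists of the following parts fused in sequence: BPNLS, an ancAPOBEC1 polypeptide fragment, a 32-aa linker, a polypeptide fragment consisting of amino acids 2 to 1122 of the SsiCas9 D9A nickase, a 10-aa linker, a 2×UGI polypeptide and a BPNLS polypeptide sequence. The component sequences are as follows:
- BPNLS nuclear localization signal fragment: amino acid sequence shown in SEQ ID NO.9;
- ancAPOBEC1 polypeptide: amino acid sequence shown in SEQ ID NO.3;
- UGI polypeptide: amino acid sequence shown in SEQ ID NO.4;
- SsiCas9-ancBE4max: amino acid sequence shown in SEQ ID NO.5, with the corresponding DNA coding sequence shown in SEQ ID NO.6.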
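The ligation above uses 50 ng of vector and 100 ng of insert. The resulting insert:vector molar ratio depends on the two fragment lengths, which are not stated here. The sketch below shows the standard mass-to-molar conversion, taking about 650 g/mol per base pair of double-stranded DNA. The 5,000 bp vector and 3,500 bp insert lengths are hypothetical placeholders to be replaced by the real sizes.

```python
def pmol_dsdna(ng, bp, g_per_mol_per_bp=650.0):
    """Picomoles of a linear dsDNA fragment from its mass (ng) and length (bp)."""
    return ng * 1000 / (bp * g_per_mol_per_bp)


def insert_to_vector_ratio(vector_ng, vector_bp, insert_ng, insert_bp):
    """Molar ratio of insert to vector molecules in a ligation."""
    return pmol_dsdna(insert_ng, insert_bp) / pmol_dsdna(vector_ng, vector_bp)


# Masses from the protocol; fragment lengths are hypothetical placeholders
ratio = insert_to_vector_ratio(vector_ng=50, vector_bp=5000,
                               insert_ng=100, insert_bp=3500)
print(round(ratio, 2))  # 2.86
```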
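The domain structure of the successfully constructed plasmid is shown in Fig. 2, and the plasmid map in Fig. 3.
Verified positive clones were expanded in liquid culture. Plasmid DNA was extracted following the kit instructions (TIANGEN: TIANpure Midi Plasmid Kit) and quantified, to ensure a sufficient amount for transfection that is free of contaminants such as salts and proteins.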
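Example 2
2.1 Construction of the gRNA plasmid vector for the SsiCas9-ancBE4max system
A gRNA expression vector suited to the SsiCas9 gRNA editing system was constructed on the pGL3-U6-sgRNA (Addgene #51133) expression backbone. A scaffold sequence for the SsiCas9 gRNA system was designed from the CRISPR repeat sequence of Streptococcus sinensis. The SpCas9 scaffold of pGL3-U6-sgRNA (Addgene #51133) was then replaced with this SsiCas9 gRNA scaffold. The complete plasmid, shown in SEQ ID NO.7, was named pGL3-U6-Ssi gRNA; its plasmid map is shown in Fig. 4. The targeting gRNA sequence is inserted via two BsaI sites. The plasmid was synthesized in full by a commercial company.
2.2 Construction of targeting gRNA plasmids for the SsiCas9-ancBE4max system
For each gRNA, a pair of complementary oligos was designed and synthesized. The upper oligo is 5’-accg-20nt-3’ and the lower oligo is 5’-aaac-20nt-3’; the variable 20 nt of the lower oligo is the reverse complement of the variable 20 nt of the upper oligo. The 20 nt of the upper oligo is taken from the strand that carries the PAM, where the genomic sequence reads 20nt-NHAAAA. The upper and lower oligos were annealed with the following program: 95°C for 5 min; 95°C to 85°C at -2°C/s; 85°C to 25°C at -0.1°C/s; hold at 4°C. The annealed oligos were then ligated into the pGL3-U6-Ssi gRNA vector linearized with BsaI (NEB: R0539L).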
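The annealing program above consists of a hold followed by two linear ramps. The short sketch below totals its duration, excluding the open-ended 4°C hold. It assumes the thermocycler achieves the programmed ramp rates exactly.

```python
HOLD_95_S = 5 * 60  # initial 95°C hold, in seconds

# (start °C, end °C, ramp rate in °C per second, magnitude only)
RAMPS = [(95, 85, 2.0), (85, 25, 0.1)]


def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time for a linear temperature ramp at a constant rate."""
    return abs(start_c - end_c) / rate_c_per_s


total_s = HOLD_95_S + sum(ramp_seconds(*r) for r in RAMPS)
print(f"{total_s:.0f} s = {total_s / 60:.1f} min")  # 905 s = 15.1 min
```

Almost all of the time is spent in the slow 85°C to 25°C ramp (600 s).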
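The linearization digest contained 2 μg pGL3-U6-Ssi gRNA, 6 μL buffer (NEB: R0539L) and 2 μL BsaI, made up to 60 μL with ddH2O, and was incubated at 37°C overnight. The ligation reaction contained 1 μL T4 ligase buffer (NEB: M0202L), 20 ng linearized vector, 5 μL annealed oligo fragment (10 μM) and 0.5 μL T4 DNA ligase (NEB: M0202L), made up to 10 μL with ddH2O, and was incubated at 16°C overnight. The ligated vector was transformed, and colonies were picked and verified. Positive clones were expanded, and plasmid DNA was extracted (Axygen: AP-MN-P-250G) and quantified.
Human endogenous genes including EMX1, RUNX1, DNMT1, AARSD1, GMPR2, ABCD3 and NFYB were selected. A total of 10 gRNAs were designed and 20 oligos were synthesized; their sequences are listed in Table 1.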
Table 1 Oligo sequences
sgSsi-1 for: 5’-ACCGtgggcaagagtttctgccac-3’ (SEQ ID NO.13)
sgSsi-1 rev: 5’-AAACgtggcagaaactcttgccca-3’ (SEQ ID NO.14)
sgSsi-2 for: 5’-ACCGctgcgttcctagaaccacag-3’ (SEQ ID NO.15)
sgSsi-2 rev: 5’-AAACctgtggttctaggaacgcag-3’ (SEQ ID NO.16)
sgSsi-3 for: 5’-ACCGaatgctggctacagatgtcc-3’ (SEQ ID NO.17)
sgSsi-3 rev: 5’-AAACggacatctgtagccagcatt-3’ (SEQ ID NO.18)
sgSsi-4 for: 5’-ACCGctcatatgtcacttacctct-3’ (SEQ ID NO.19)
sgSsi-4 rev: 5’-AAACagaggtaagtgacatatgag-3’ (SEQ ID NO.20)
sgSsi-5 for: 5’-ACCGgagacaggatctcactgtgt-3’ (SEQ ID NO.21)
sgSsi-5 rev: 5’-AAACacacagtgagatcctgtctc-3’ (SEQ ID NO.22)
sgSsi-6 for: 5’-ACCGtgctctaggtggtgttaatg-3’ (SEQ ID NO.23)
sgSsi-6 rev: 5’-AAACcattaacaccacctagagca-3’ (SEQ ID NO.24)
sgSsi-7 for: 5’-ACCGcagcaacatgaacaactgaa-3’ (SEQ ID NO.25)
sgSsi-7 rev: 5’-AAACttcagttgttcatgttgctg-3’ (SEQ ID NO.26)
sgSsi-8 for: 5’-ACCGaagagccaagtcttactgta-3’ (SEQ ID NO.27)
sgSsi-8 rev: 5’-AAACtacagtaagacttggctctt-3’ (SEQ ID NO.28)
sgSsi-9 for: 5’-ACCGctgacaagtactagcttatg-3’ (SEQ ID NO.29)
sgSsi-9 rev: 5’-AAACcataagctagtacttgtcag-3’ (SEQ ID NO.30)
sgSsi-10 for: 5’-ACCGttcctcatagcaacatcact-3’ (SEQ ID NO.31)
sgSsi-10 rev: 5’-AAACagtgatgttgctatgaggaa-3’ (SEQ ID NO.32)
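The oligo design rule in 2.2 can be expressed in a few lines. The upper oligo is 5’-ACCG + spacer-3’ and the lower oligo is 5’-AAAC + reverse complement of the spacer-3’, matching the 5’ extensions used in Table 1. The sketch below, with hypothetical function names, regenerates an oligo pair from a 20-nt spacer and reproduces sgSsi-1.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement, preserving upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(spacer, fwd_overhang="ACCG", rev_overhang="AAAC"):
    """Return (upper, lower) cloning oligos, 5'->3', for a 20-nt spacer."""
    if len(spacer) != 20:
        raise ValueError("spacers in this document are 20 nt")
    return fwd_overhang + spacer, rev_overhang + reverse_complement(spacer)


upper, lower = design_oligos("tgggcaagagtttctgccac")
print(upper)  # ACCGtgggcaagagtttctgccac  (sgSsi-1 for, SEQ ID NO.13)
print(lower)  # AAACgtggcagaaactcttgccca  (sgSsi-1 rev, SEQ ID NO.14)
```

Applying the same function to the other nine spacers reproduces the remaining rows of Table 1. This makes it a quick consistency check when adding new targets.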
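Example 3
The base editing system, consisting of the SsiCas9-ancBE4max plasmid and the pGL3-U6-Ssi gRNA plasmid constructed in the above examples, was transfected into HEK293T cells as follows:
3.1 HEK293T cells (from ATCC) were thawed and cultured in 10 cm dishes (Corning, 430167) in DMEM (HyClone, SH30243.01) supplemented with 10% fetal bovine serum (HyClone, SV30087), at 37°C and 5% CO2. After several passages, when the cells reached 90% confluence, they were split into 24-well plates.
3.2 After three passages following thawing, the condition of the HEK293T cells was checked and healthy cells were seeded into 24-well plates. The cells were transfected 18-24 h after seeding, at 80% confluence. The amounts used per well were 1 μg SsiCas9-ancBE4max plasmid, 0.5 μg pGL3-U6-Ssi gRNA plasmid and 4.5 μl EZ Trans transfection reagent (Shanghai Life iLab Biotech).
3.3 The transfection steps, which follow the high-efficiency protocol for EZ Trans transfection reagent from Shanghai Life iLab Biotech, were as follows:
3.3.1 Preparation of reagent A: for each well, dilute 1.5 μg plasmid DNA (1 μg SsiCas9-ancBE4max plasmid + 0.5 μg pGL3-U6-Ssi gRNA plasmid) in 50 μl serum-free, antibiotic-free high-glucose DMEM (or Opti-MEM) and mix.
3.3.2 Preparation of reagent B: for each well, dilute 4.5 μl EZ Trans transfection reagent (EZ Trans:plasmid DNA = 2:1) in 50 μl serum-free, antibiotic-free high-glucose DMEM (or Opti-MEM I) and mix gently. Do not use serum-containing medium to dilute the plasmid or the EZ Trans reagent in this step. Serum contains large amounts of negatively charged proteins that may interfere with binding of the reagent to the nucleic acid and thus reduce transfection efficiency.
3.3.3 Let reagents A and B stand for 5 min, then promptly add all of reagent B to reagent A and mix gently. Do not reverse the order of mixing.
3.3.4 Let the mixture stand at room temperature for 15 min to form EZ Trans-DNA complexes. Add all of the prepared EZ Trans-DNA transfection complexes dropwise and evenly to the wells containing cells, then gently rock or shake the plate to distribute the complexes evenly.
3.3.5 Incubate at 37°C and 5% CO2 for 4-6 h. Remove the medium containing the EZ Trans-DNA complexes, replace it with fresh medium and culture for 3 days.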
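The per-well amounts in 3.2 and 3.3 scale linearly with the number of wells. The sketch below builds a master-mix recipe for n wells. The 10% overage is an illustrative assumption to cover pipetting losses, not a value from this protocol. The per-well amounts work out to 3 μl of reagent per μg of DNA.

```python
# Per-well amounts from steps 3.2 and 3.3 (24-well plate)
PER_WELL = {
    "editor_plasmid_ug": 1.0,   # SsiCas9-ancBE4max
    "grna_plasmid_ug": 0.5,     # pGL3-U6-Ssi gRNA
    "ez_trans_ul": 4.5,         # EZ Trans reagent
    "medium_for_a_ul": 50.0,    # serum-free medium, reagent A
    "medium_for_b_ul": 50.0,    # serum-free medium, reagent B
}


def master_mix(n_wells, overage=0.10):
    """Scale per-well amounts to n_wells; overage is an assumed pipetting margin."""
    factor = n_wells * (1 + overage)
    return {item: round(amount * factor, 2) for item, amount in PER_WELL.items()}


dna_ug = PER_WELL["editor_plasmid_ug"] + PER_WELL["grna_plasmid_ug"]
print(PER_WELL["ez_trans_ul"] / dna_ug)  # 3.0 (μl reagent per μg DNA)
print(master_mix(6))
```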
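3.4 Three days after transfection, the cells were harvested by trypsinization. GFP-positive cells (the top 15% by FITC fluorescence intensity) were collected by flow cytometric sorting, and genomic DNA was extracted from them by the phenol-chloroform method.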
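Sorting the top 15% of cells by FITC intensity amounts to a percentile gate. The sketch below applies such a gate in software to a list of per-cell intensities. It only illustrates the selection rule with made-up intensities; on a real instrument the gate is set on the sorter.

```python
def top_fraction_gate(intensities, fraction=0.15):
    """Return the gate threshold and the events in the top `fraction` by intensity."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(intensities, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    kept = ranked[:n_keep]
    return kept[-1], kept


# Made-up intensities 1..100: the top 15% are the events at or above 86
threshold, kept = top_fraction_gate(range(1, 101))
print(threshold, len(kept))  # 86 15
```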
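3.5 For each selected endogenous target site, PCR primers were designed and synthesized 100-130 bp upstream and downstream of the site, then diluted with water to 10 μM. Each genomic target fragment was amplified with a Vazyme high-fidelity polymerase kit (Vazyme, p501-d2). The PCR products were gel-purified with the AxyPrep DNA Gel Extraction Kit (Axygen, AP-GX-250G) to remove non-specific bands. The PCR primer sequences are listed in Table 2.
Table 2 PCR primer sequences
3.6 Successful amplification of the target fragments was first checked by gel electrophoresis, and the successfully amplified fragments were Sanger-sequenced. The sequencing results were examined for the expected point mutations at the target sites: C-to-T, or G-to-A when read on the opposite strand.
The sequencing results are shown in Fig. 5, where panels A to D show the editing results at the Ssi2, Ssi6, Ssi8 and Ssi10 sites, respectively. In each panel, the left-hand plot shows the targeted DNA sequence in its first column and the PAM sequence in its second column. The right-hand plot shows the C-to-T editing efficiency at each position within the gRNA for the corresponding target site. These four sites show that the gene editing tool SsiCas9-ancBE4max obtained in Example 1 induces efficient C-to-T conversion. In total, 10 endogenous human genomic sites were tested in HEK293T cells, with the results shown in Fig. 6. SsiCas9-ancBE4max induced efficient C-to-T conversion at all of them, with editing concentrated at positions 3-12 within the gRNA sequence, which broadens the targeting range of base editors.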
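Per-position C-to-T efficiencies, like those plotted in Figs. 5 and 6, can be estimated from Sanger chromatograms as the T signal divided by the combined C and T signal at each reference C. The sketch below applies that simple ratio to hypothetical peak heights. It is an assumed method, not necessarily the one used for the figures; dedicated tools such as EditR add background correction.

```python
def c_to_t_efficiency(protospacer, peaks, window=(3, 12)):
    """Estimate C-to-T editing at each reference C inside the window.

    protospacer: 20-nt reference (5'->3'), positions numbered from 1.
    peaks: dict mapping 1-based position -> {"C": height, "T": height}.
    Returns {position: fraction T}.
    """
    start, end = window
    result = {}
    for pos, base in enumerate(protospacer.upper(), start=1):
        if base != "C" or not start <= pos <= end or pos not in peaks:
            continue
        c, t = peaks[pos]["C"], peaks[pos]["T"]
        result[pos] = t / (c + t) if c + t else 0.0
    return result


# Hypothetical peak heights for the sgSsi-2 protospacer
spacer = "ctgcgttcctagaaccacag"
peaks = {4: {"C": 600, "T": 400}, 8: {"C": 300, "T": 700},
         9: {"C": 900, "T": 100}, 15: {"C": 1000, "T": 0}}
print(c_to_t_efficiency(spacer, peaks))  # {4: 0.4, 8: 0.7, 9: 0.1}
```

Position 15 is a C outside the 3-12 window, so it is not reported.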
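The above specific embodiments describe the present invention in detail, but the present invention is not limited to them. Various changes can be made within the knowledge of a person of ordinary skill in the art without departing from the spirit of the present invention. In addition, the examples of the present invention and the features in them may be combined with each other where they do not conflict.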
SEQUENCE LISTING
<110> Guangzhou University
<120> A fusion protein, a base editing tool and applications thereof
<130>
<160> 52
<170> PatentIn version 3.5
<210> 1
<211> 1121
<212> PRT
<213> Artificial Sequence
<400> 1
Asn Gly Lys Ile Leu Gly Leu Ala Ile Gly Val Ala Ser Val Gly Val
1 5 10 15
Gly Ile Leu Asp Lys Lys Thr Gly Glu Ile Ile His Ala Ser Ser Arg
20 25 30
Ile Phe Pro Ala Ala Thr Ala Asp Ser Asn Val Glu Arg Arg Gly Phe
35 40 45
Arg Gln Gly Arg Arg Leu Gly Arg Arg Lys Lys His Arg Lys Val Arg
50 55 60
Leu Ala Asp Leu Phe Ser Asp Thr Gly Leu Ile Thr Asp Phe Ser Lys
65 70 75 80
Val Ser Ile Asn Leu Asn Pro Tyr Glu Leu Arg Ile Lys Gly Leu Asn
85 90 95
Glu Lys Leu Thr Asn Glu Glu Leu Phe Ile Ala Leu Lys Asn Ile Val
100 105 110
Lys Arg Arg Gly Ile Ser Tyr Leu Asp Asp Ala Asn Glu Asp Gly Glu
115 120 125
Ser Ser Ser Ser Glu Tyr Gly Lys Ala Val Glu Glu Asn Arg Lys Leu
130 135 140
Leu Ala Asp Lys Thr Pro Gly Gln Ile Gln Leu Glu Arg Phe Glu Lys
145 150 155 160
Tyr Gly Gln Val Arg Gly Asp Phe Thr Ile Glu Glu Asn Gly Glu Lys
165 170 175
His Arg Leu Leu Asn Val Phe Ser Thr Ser Ala Tyr Lys Lys Glu Ala
180 185 190
Glu Arg Ile Leu Thr Lys Gln Gln Asp Tyr Asn Gln Asp Ile Thr Asp
195 200 205
Glu Phe Ile Gln Ala Tyr Leu Thr Ile Leu Thr Gly Lys Arg Lys Tyr
210 215 220
Tyr His Gly Pro Gly Asn Glu Lys Ser Arg Thr Asp Tyr Gly Arg Phe
225 230 235 240
Arg Thr Asp Gly Thr Thr Leu Asp Asn Ile Phe Gly Ile Leu Ile Gly
245 250 255
Lys Cys Thr Phe Tyr Pro Glu Glu Tyr Arg Ala Ala Lys Ala Ser Tyr
260 265 270
Thr Ala Gln Glu Phe Asn Leu Leu Asn Asp Leu Asn Asn Leu Thr Val
275 280 285
Pro Thr Glu Thr Lys Lys Leu Ser Glu Glu Gln Lys Arg Gln Ile Ile
290 295 300
Glu Tyr Ala Lys Gly Ala Lys Thr Leu Gly Ala Ala Thr Leu Leu Lys
305 310 315 320
Tyr Ile Ala Lys Leu Val Asp Gly Ser Val Glu Asp Ile Lys Gly Tyr
325 330 335
Arg Ile Asp Lys Ser Glu Lys Pro Glu Met His Thr Phe Asp Ile Tyr
340 345 350
Arg Lys Met Gln Thr Leu Glu Thr Val Asp Val Glu Lys Leu Ser Arg
355 360 365
Glu Val Leu Asp Glu Leu Ala His Ile Leu Thr Leu Asn Thr Glu Arg
370 375 380
Glu Gly Ile Glu Glu Ala Ile Lys Val Ser Phe Ile Lys Arg Glu Phe
385 390 395 400
Glu Gln Asp Gln Ile Ala Glu Leu Val Ser Phe Arg Lys Ser Asn Ser
405 410 415
Ser Leu Phe Gly Lys Gly Trp His Asn Phe Ser Ile Lys Leu Met Thr
420 425 430
Glu Leu Ile Pro Glu Leu Tyr Glu Thr Ser Glu Glu Gln Met Thr Ile
435 440 445
Leu Thr Arg Leu Gly Lys Gln Lys Thr Lys Ala Arg Ser Lys Arg Thr
450 455 460
Lys Tyr Ile Asp Glu Lys Glu Leu Thr Asp Glu Ile Tyr Asn Pro Val
465 470 475 480
Val Ala Lys Ser Val Arg Gln Ala Ile Lys Ile Ile Asn Leu Ala Thr
485 490 495
Lys Lys Tyr Gly Val Phe Asp Asn Ile Val Ile Glu Met Ala Arg Glu
500 505 510
Asn Asn Glu Glu Asp Ala Lys Lys Asp Tyr Val Lys Arg Gln Lys Ala
515 520 525
Asn Glu Asp Glu Lys Asn Ala Ala Met Glu Lys Ala Ala His Gln Tyr
530 535 540
Asn Gly Lys Lys Glu Leu Pro Asp Asn Val Phe His Gly His Lys Glu
545 550 555 560
Leu Ala Thr Lys Ile Arg Leu Trp His Gln Gln Gly Glu Lys Cys Leu
565 570 575
Tyr Thr Gly Lys Asn Ile Pro Ile Ser Asp Leu Ile His Asn Gln Tyr
580 585 590
Lys Tyr Glu Ile Asp His Ile Leu Pro Leu Ser Leu Ser Phe Asp Asp
595 600 605
Ser Leu Ala Asn Lys Val Leu Val Leu Ala Thr Ala Asn Gln Glu Lys
610 615 620
Gly Gln Arg Thr Pro Phe Gln Ala Leu Asp Ser Met Asp Asp Ala Trp
625 630 635 640
Ser Tyr Arg Glu Phe Lys Ala Tyr Val Arg Gly Ala Arg Ala Leu Ser
645 650 655
Asn Lys Lys Lys Asp Tyr Leu Leu Asn Glu Glu Asp Ile Asn Lys Ile
660 665 670
Glu Val Lys Gln Lys Phe Ile Glu Arg Asn Leu Val Asp Thr Arg Tyr
675 680 685
Ser Ser Arg Val Val Leu Asn Ala Leu Gln Asp Phe Tyr Lys Leu Asn
690 695 700
Asp Phe Asp Thr Lys Ile Ser Val Val Arg Gly Gln Phe Thr Ser Gln
705 710 715 720
Leu Arg Arg Lys Trp Arg Ile Asp Lys Ser Arg Glu Thr Tyr His His
725 730 735
His Ala Val Asp Ala Leu Ile Ile Ala Ala Ser Ser Gln Leu Arg Leu
740 745 750
Trp Lys Lys Gln Gly Asn Pro Leu Ile Ser Tyr Lys Glu Asn Gln Phe
755 760 765
Val Asp Ser Glu Thr Gly Glu Ile Ile Ser Leu Thr Asp Asp Glu Tyr
770 775 780
Lys Glu Leu Val Phe Arg Ala Pro Tyr Asp His Phe Val Asp Thr Val
785 790 795 800
Ser Ser Lys Lys Phe Glu Asp Arg Ile Leu Phe Ser Tyr Gln Val Asp
805 810 815
Ser Lys Tyr Asn Arg Lys Ile Ser Asp Ala Thr Ile Tyr Ser Thr Arg
820 825 830
Lys Ala Lys Leu Gly Lys Asp Lys Ser Glu Glu Thr Tyr Val Leu Gly
835 840 845
Lys Ile Lys Asp Ile Tyr Thr Gln Thr Gly Tyr Asp Ala Phe Ile Lys
850 855 860
Leu Tyr Lys Lys Asp Lys Ser Lys Phe Leu Met Tyr His Lys Asp Pro
865 870 875 880
Ile Thr Phe Glu Lys Val Ile Glu Glu Ile Leu Lys Thr Tyr Pro Asp
885 890 895
Lys Glu Ile Asn Glu Lys Gly Lys Glu Val Ala Cys Asn Pro Phe Glu
900 905 910
Lys Tyr Arg Gln Glu Asn Gly Pro Leu Arg Lys Tyr Ser Lys Lys Gly
915 920 925
Lys Gly Pro Glu Ile Lys Ser Leu Lys Tyr Tyr Asp Asn Lys Leu Gly
930 935 940
Asn His Ile Asp Ile Thr Pro Asp Asn Ser Glu Asn Gln Val Ile Leu
945 950 955 960
Gln Ser Leu Lys Pro Trp Arg Thr Asp Val Tyr Phe Asn His Lys Thr
965 970 975
Lys Ile Tyr Glu Leu Met Gly Leu Lys Tyr Ser Asp Leu Ser Phe Glu
980 985 990
Lys Gly Ser Gly Lys Tyr Arg Ile Ser Leu Asp Lys Tyr Asn Val Ile
995 1000 1005
Lys Lys Lys Glu Gly Val His Lys Glu Ser Glu Phe Lys Phe Thr
1010 1015 1020
Leu Tyr Lys Asn Asp Leu Ile Leu Ile Lys Asp Leu Glu Lys Ser
1025 1030 1035
Glu Gln Gln Leu Phe Arg Tyr Asn Ser Arg Asn Asp Thr Ser Lys
1040 1045 1050
His Tyr Val Glu Leu Lys Pro Tyr Asp Lys Ala Lys Phe Glu Gly
1055 1060 1065
Asn Gln Pro Leu Met Ala Leu Phe Gly Asn Val Ala Lys Gly Gly
1070 1075 1080
Gln Cys Leu Lys Gly Leu Asn Lys Ala Asn Ile Ser Ile Tyr Lys
1085 1090 1095
Val Gln Thr Asp Val Leu Gly Asn Lys Arg Phe Ile Lys Lys Glu
1100 1105 1110
Gly Asp Ala Pro Lys Leu Glu Phe
1115 1120
<210> 2
<211> 3363
<212> DNA
<213> Artificial Sequence
<400> 2
aacggcaaga tcctgggact ggccatcgga gttgcatctg ttggagtggg catcctggac 60
aagaagaccg gcgagatcat ccacgccagc agcagaatct tccccgccgc cacagccgat 120
agcaacgtgg aacggagggg cttcagacag ggaagacggc tgggccgtag aaaaaaacac 180
agaaaggtgc ggttggccga tctgttcagc gacaccggcc tgataacaga cttctctaaa 240
gtgtctatca acctgaaccc ctacgagctg cggatcaagg gcctcaatga gaaactgaca 300
aacgaggaac tgttcatcgc cctgaagaac atcgtgaaga gaagaggcat cagctacctg 360
gatgacgcca atgaggacgg cgagagctcc tctagcgagt acggcaaggc tgtggaagaa 420
aaccgaaagt tgctggccga caagactcct ggccagatcc agctggaacg cttcgaaaag 480
tacggacagg tccgaggaga tttcaccatc gaggaaaacg gcgaaaagca tagactgctg 540
aacgtgttca gcaccagcgc ctataagaaa gaagccgagc ggattctgac caagcagcaa 600
gattacaacc aagacatcac cgacgagttc atccaggcct acctgacaat cctgacggga 660
aagagaaagt actaccatgg ccccggcaac gagaagtcta gaaccgacta cggccggttc 720
aggaccgatg gcaccaccct ggacaacatc tttggcatcc tgatcggcaa atgtacattc 780
tacccagagg agtaccgggc ggccaaggcc tcttacaccg cccaggagtt taacctcctg 840
aatgacctga acaatctgac agttccaacc gagacaaaga aactgagcga ggaacagaag 900
cggcaaatca tcgagtacgc caagggagcc aagacacttg gagccgccac cctgctcaag 960
tacatcgcca agctggtgga cggctctgtg gaggatatca agggctatag aattgataaa 1020
agcgagaaac ctgagatgca cacattcgat atctacagaa agatgcagac actggaaacc 1080
gtggatgtgg aaaagctgtc acgcgaggtg ctggatgagc tggcccatat cctgacactg 1140
aataccgaga gagaaggtat cgaggaggcc atcaaggtca gctttatcaa gagagagttc 1200
gaacaggacc agatcgccga gctggtcagc ttccggaagt ccaactctag cctgtttggc 1260
aagggctggc acaacttcag tatcaaactg atgacagaac tgatccccga gctgtatgag 1320
accagcgaag agcagatgac catcctgacc agactgggaa agcaaaagac aaaggctaga 1380
agcaagcgca caaagtacat cgacgagaag gagctgaccg acgagatcta caaccccgtg 1440
gtggccaaga gcgtgagaca ggccattaag atcatcaacc tggccaccaa gaagtacggc 1500
gtgttcgaca acatcgtgat cgagatggcc agagagaaca acgaggagga tgccaagaaa 1560
gattacgtga aaagacaaaa agctaatgag gacgaaaaga acgccgctat ggaaaaggct 1620
gcccaccagt acaacggcaa gaaggagctg cccgataacg tgtttcacgg ccacaaggaa 1680
ctggccacaa agatcagact gtggcaccag cagggcgaga agtgcctgta caccggcaaa 1740
aacatcccta tctctgatct gatccacaac cagtataagt acgagatcga ccacatcctg 1800
cctctgtcac tgagcttcga cgacagcctg gccaataagg tgctggtgct cgctaccgcc 1860
aaccaggaga agggccaaag aacacctttc caggccctcg acagcatgga cgatgcgtgg 1920
tcctatagag aatttaaggc ctacgtgcgg ggcgccagag ccctgagcaa caagaaaaaa 1980
gattacctgc tgaatgaaga ggacatcaac aagatcgaag tgaagcagaa attcatcgag 2040
aggaaccttg tggacactcg gtactcctct agagtggtcc tgaacgccct gcaggacttc 2100
tacaagctga atgatttcga caccaagatc agcgtggtga gaggccagtt caccagccag 2160
ctgagacgga aatggagaat cgacaagagc agagaaacct accaccacca cgccgtggac 2220
gctctgatca ttgccgctag ctcgcagctg agactgtgga agaagcaggg caacccactg 2280
atcagctaca aggaaaacca gttcgtcgac tccgaaaccg gagaaattat cagcctcaca 2340
gatgatgaat acaaggaact ggtgttccgg gctccatacg accacttcgt ggacacagtg 2400
agcagcaaaa agtttgaaga cagaatcctt ttctcctacc aggtggattc caaatacaac 2460
cggaaaatca gcgacgccac catttactct accagaaagg ccaagctggg caaagacaag 2520
agcgaggaaa cctacgtgct gggcaagata aaggacatct acacccagac cggctacgat 2580
gccttcatca agctgtacaa gaaggacaag tccaaatttc tgatgtacca caaggatcct 2640
atcacctttg agaaggtgat cgaggaaatc ctgaagacct accccgacaa ggaaatcaac 2700
gagaagggca aggaagtggc atgcaaccct tttgaaaaat atagacagga gaatggacct 2760
ctgagaaagt attctaagaa aggtaagggc cctgagatca agagcctgaa gtactacgac 2820
aacaaactcg gcaaccacat cgacataacc cctgacaaca gcgaaaatca ggtgatcctc 2880
cagtccctga aaccttggcg gaccgacgtg tacttcaacc acaaaaccaa gatttatgag 2940
ctgatgggcc tgaagtacag cgacctgagc ttcgagaagg gcagcggcaa gtaccggatt 3000
agcctggaca aatataacgt gatcaagaaa aaggagggcg tgcacaagga aagcgagttc 3060
aagttcacac tgtacaagaa cgacctgatc ctaatcaagg atctggaaaa gagcgagcag 3120
cagctgttta gatacaacag ccggaacgat acatccaagc actacgtgga gctgaagcct 3180
tacgacaagg ccaaattcga gggaaatcaa cctctgatgg ccctgttcgg caatgtggcc 3240
aagggaggcc agtgcctgaa gggcctgaac aaagccaaca tcagcatcta caaggtgcag 3300
accgacgtgc tgggcaacaa gcggttcatc aagaaagaag gcgacgctcc taagctggaa 3360
ttt 3363
<210> 3
<211> 228
<212> PRT
<213> Artificial Sequence
<400> 3
Ser Ser Glu Thr Gly Pro Val Ala Val Asp Pro Thr Leu Arg Arg Arg
1 5 10 15
Ile Glu Pro His Glu Phe Glu Val Phe Phe Asp Pro Arg Glu Leu Arg
20 25 30
Lys Glu Thr Cys Leu Leu Tyr Glu Ile Lys Trp Gly Thr Ser His Lys
35 40 45
Ile Trp Arg His Ser Ser Lys Asn Thr Thr Lys His Val Glu Val Asn
50 55 60
Phe Ile Glu Lys Phe Thr Ser Glu Arg His Phe Cys Pro Ser Thr Ser
65 70 75 80
Cys Ser Ile Thr Trp Phe Leu Ser Trp Ser Pro Cys Gly Glu Cys Ser
85 90 95
Lys Ala Ile Thr Glu Phe Leu Ser Gln His Pro Asn Val Thr Leu Val
100 105 110
Ile Tyr Val Ala Arg Leu Tyr His His Met Asp Gln Gln Asn Arg Gln
115 120 125
Gly Leu Arg Asp Leu Val Asn Ser Gly Val Thr Ile Gln Ile Met Thr
130 135 140
Ala Pro Glu Tyr Asp Tyr Cys Trp Arg Asn Phe Val Asn Tyr Pro Pro
145 150 155 160
Gly Lys Glu Ala His Trp Pro Arg Tyr Pro Pro Leu Trp Met Lys Leu
165 170 175
Tyr Ala Leu Glu Leu His Ala Gly Ile Leu Gly Leu Pro Pro Cys Leu
180 185 190
Asn Ile Leu Arg Arg Lys Gln Pro Gln Leu Thr Phe Phe Thr Ile Ala
195 200 205
Leu Gln Ser Cys His Tyr Gln Arg Leu Pro Pro His Ile Leu Trp Ala
210 215 220
Thr Gly Leu Lys
225
<210> 4
<211> 190
<212> PRT
<213> Artificial Sequence
<400> 4
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val
1 5 10 15
Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile
20 25 30
Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu
35 40 45
Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr
50 55 60
Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile
65 70 75 80
Lys Met Leu Ser Gly Gly Ser Gly Gly Ser Gly Gly Ser Thr Asn Leu
85 90 95
Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu Val Ile Gln Glu
100 105 110
Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu Val Ile Gly Asn Lys
115 120 125
Pro Glu Ser Asp Ile Leu Val His Thr Ala Tyr Asp Glu Ser Thr Asp
130 135 140
Glu Asn Val Met Leu Leu Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp
145 150 155 160
Ala Leu Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu
165 170 175
Ser Gly Gly Ser Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu
180 185 190
<210> 5
<211> 1595
<212> PRT
<213> Artificial Sequence
<400> 5
Pro Lys Lys Lys Arg Lys Val Ser Ser Glu Thr Gly Pro Val Ala Val
1 5 10 15
Asp Pro Thr Leu Arg Arg Arg Ile Glu Pro His Glu Phe Glu Val Phe
20 25 30
Phe Asp Pro Arg Glu Leu Arg Lys Glu Thr Cys Leu Leu Tyr Glu Ile
35 40 45
Lys Trp Gly Thr Ser His Lys Ile Trp Arg His Ser Ser Lys Asn Thr
50 55 60
Thr Lys His Val Glu Val Asn Phe Ile Glu Lys Phe Thr Ser Glu Arg
65 70 75 80
His Phe Cys Pro Ser Thr Ser Cys Ser Ile Thr Trp Phe Leu Ser Trp
85 90 95
Ser Pro Cys Gly Glu Cys Ser Lys Ala Ile Thr Glu Phe Leu Ser Gln
100 105 110
His Pro Asn Val Thr Leu Val Ile Tyr Val Ala Arg Leu Tyr His His
115 120 125
Met Asp Gln Gln Asn Arg Gln Gly Leu Arg Asp Leu Val Asn Ser Gly
130 135 140
Val Thr Ile Gln Ile Met Thr Ala Pro Glu Tyr Asp Tyr Cys Trp Arg
145 150 155 160
Asn Phe Val Asn Tyr Pro Pro Gly Lys Glu Ala His Trp Pro Arg Tyr
165 170 175
Pro Pro Leu Trp Met Lys Leu Tyr Ala Leu Glu Leu His Ala Gly Ile
180 185 190
Leu Gly Leu Pro Pro Cys Leu Asn Ile Leu Arg Arg Lys Gln Pro Gln
195 200 205
Leu Thr Phe Phe Thr Ile Ala Leu Gln Ser Cys His Tyr Gln Arg Leu
210 215 220
Pro Pro His Ile Leu Trp Ala Thr Gly Leu Lys Ser Gly Gly Ser Ser
225 230 235 240
Gly Gly Ser Ser Gly Ser Glu Thr Pro Gly Thr Ser Glu Ser Ala Thr
245 250 255
Pro Glu Ser Ser Gly Gly Ser Ser Gly Gly Ser Asn Gly Lys Ile Leu
260 265 270
Gly Leu Ala Ile Gly Val Ala Ser Val Gly Val Gly Ile Leu Asp Lys
275 280 285
Lys Thr Gly Glu Ile Ile His Ala Ser Ser Arg Ile Phe Pro Ala Ala
290 295 300
Thr Ala Asp Ser Asn Val Glu Arg Arg Gly Phe Arg Gln Gly Arg Arg
305 310 315 320
Leu Gly Arg Arg Lys Lys His Arg Lys Val Arg Leu Ala Asp Leu Phe
325 330 335
Ser Asp Thr Gly Leu Ile Thr Asp Phe Ser Lys Val Ser Ile Asn Leu
340 345 350
Asn Pro Tyr Glu Leu Arg Ile Lys Gly Leu Asn Glu Lys Leu Thr Asn
355 360 365
Glu Glu Leu Phe Ile Ala Leu Lys Asn Ile Val Lys Arg Arg Gly Ile
370 375 380
Ser Tyr Leu Asp Asp Ala Asn Glu Asp Gly Glu Ser Ser Ser Ser Glu
385 390 395 400
Tyr Gly Lys Ala Val Glu Glu Asn Arg Lys Leu Leu Ala Asp Lys Thr
405 410 415
Pro Gly Gln Ile Gln Leu Glu Arg Phe Glu Lys Tyr Gly Gln Val Arg
420 425 430
Gly Asp Phe Thr Ile Glu Glu Asn Gly Glu Lys His Arg Leu Leu Asn
435 440 445
Val Phe Ser Thr Ser Ala Tyr Lys Lys Glu Ala Glu Arg Ile Leu Thr
450 455 460
Lys Gln Gln Asp Tyr Asn Gln Asp Ile Thr Asp Glu Phe Ile Gln Ala
465 470 475 480
Tyr Leu Thr Ile Leu Thr Gly Lys Arg Lys Tyr Tyr His Gly Pro Gly
485 490 495
Asn Glu Lys Ser Arg Thr Asp Tyr Gly Arg Phe Arg Thr Asp Gly Thr
500 505 510
Thr Leu Asp Asn Ile Phe Gly Ile Leu Ile Gly Lys Cys Thr Phe Tyr
515 520 525
Pro Glu Glu Tyr Arg Ala Ala Lys Ala Ser Tyr Thr Ala Gln Glu Phe
530 535 540
Asn Leu Leu Asn Asp Leu Asn Asn Leu Thr Val Pro Thr Glu Thr Lys
545 550 555 560
Lys Leu Ser Glu Glu Gln Lys Arg Gln Ile Ile Glu Tyr Ala Lys Gly
565 570 575
Ala Lys Thr Leu Gly Ala Ala Thr Leu Leu Lys Tyr Ile Ala Lys Leu
580 585 590
Val Asp Gly Ser Val Glu Asp Ile Lys Gly Tyr Arg Ile Asp Lys Ser
595 600 605
Glu Lys Pro Glu Met His Thr Phe Asp Ile Tyr Arg Lys Met Gln Thr
610 615 620
Leu Glu Thr Val Asp Val Glu Lys Leu Ser Arg Glu Val Leu Asp Glu
625 630 635 640
Leu Ala His Ile Leu Thr Leu Asn Thr Glu Arg Glu Gly Ile Glu Glu
645 650 655
Ala Ile Lys Val Ser Phe Ile Lys Arg Glu Phe Glu Gln Asp Gln Ile
660 665 670
Ala Glu Leu Val Ser Phe Arg Lys Ser Asn Ser Ser Leu Phe Gly Lys
675 680 685
Gly Trp His Asn Phe Ser Ile Lys Leu Met Thr Glu Leu Ile Pro Glu
690 695 700
Leu Tyr Glu Thr Ser Glu Glu Gln Met Thr Ile Leu Thr Arg Leu Gly
705 710 715 720
Lys Gln Lys Thr Lys Ala Arg Ser Lys Arg Thr Lys Tyr Ile Asp Glu
725 730 735
Lys Glu Leu Thr Asp Glu Ile Tyr Asn Pro Val Val Ala Lys Ser Val
740 745 750
Arg Gln Ala Ile Lys Ile Ile Asn Leu Ala Thr Lys Lys Tyr Gly Val
755 760 765
Phe Asp Asn Ile Val Ile Glu Met Ala Arg Glu Asn Asn Glu Glu Asp
770 775 780
Ala Lys Lys Asp Tyr Val Lys Arg Gln Lys Ala Asn Glu Asp Glu Lys
785 790 795 800
Asn Ala Ala Met Glu Lys Ala Ala His Gln Tyr Asn Gly Lys Lys Glu
805 810 815
Leu Pro Asp Asn Val Phe His Gly His Lys Glu Leu Ala Thr Lys Ile
820 825 830
Arg Leu Trp His Gln Gln Gly Glu Lys Cys Leu Tyr Thr Gly Lys Asn
835 840 845
Ile Pro Ile Ser Asp Leu Ile His Asn Gln Tyr Lys Tyr Glu Ile Asp
850 855 860
His Ile Leu Pro Leu Ser Leu Ser Phe Asp Asp Ser Leu Ala Asn Lys
865 870 875 880
Val Leu Val Leu Ala Thr Ala Asn Gln Glu Lys Gly Gln Arg Thr Pro
885 890 895
Phe Gln Ala Leu Asp Ser Met Asp Asp Ala Trp Ser Tyr Arg Glu Phe
900 905 910
Lys Ala Tyr Val Arg Gly Ala Arg Ala Leu Ser Asn Lys Lys Lys Asp
915 920 925
Tyr Leu Leu Asn Glu Glu Asp Ile Asn Lys Ile Glu Val Lys Gln Lys
930 935 940
Phe Ile Glu Arg Asn Leu Val Asp Thr Arg Tyr Ser Ser Arg Val Val
945 950 955 960
Leu Asn Ala Leu Gln Asp Phe Tyr Lys Leu Asn Asp Phe Asp Thr Lys
965 970 975
Ile Ser Val Val Arg Gly Gln Phe Thr Ser Gln Leu Arg Arg Lys Trp
980 985 990
Arg Ile Asp Lys Ser Arg Glu Thr Tyr His His His Ala Val Asp Ala
995 1000 1005
Leu Ile Ile Ala Ala Ser Ser Gln Leu Arg Leu Trp Lys Lys Gln
1010 1015 1020
Gly Asn Pro Leu Ile Ser Tyr Lys Glu Asn Gln Phe Val Asp Ser
1025 1030 1035
Glu Thr Gly Glu Ile Ile Ser Leu Thr Asp Asp Glu Tyr Lys Glu
1040 1045 1050
Leu Val Phe Arg Ala Pro Tyr Asp His Phe Val Asp Thr Val Ser
1055 1060 1065
Ser Lys Lys Phe Glu Asp Arg Ile Leu Phe Ser Tyr Gln Val Asp
1070 1075 1080
Ser Lys Tyr Asn Arg Lys Ile Ser Asp Ala Thr Ile Tyr Ser Thr
1085 1090 1095
Arg Lys Ala Lys Leu Gly Lys Asp Lys Ser Glu Glu Thr Tyr Val
1100 1105 1110
Leu Gly Lys Ile Lys Asp Ile Tyr Thr Gln Thr Gly Tyr Asp Ala
1115 1120 1125
Phe Ile Lys Leu Tyr Lys Lys Asp Lys Ser Lys Phe Leu Met Tyr
1130 1135 1140
His Lys Asp Pro Ile Thr Phe Glu Lys Val Ile Glu Glu Ile Leu
1145 1150 1155
Lys Thr Tyr Pro Asp Lys Glu Ile Asn Glu Lys Gly Lys Glu Val
1160 1165 1170
Ala Cys Asn Pro Phe Glu Lys Tyr Arg Gln Glu Asn Gly Pro Leu
1175 1180 1185
Arg Lys Tyr Ser Lys Lys Gly Lys Gly Pro Glu Ile Lys Ser Leu
1190 1195 1200
Lys Tyr Tyr Asp Asn Lys Leu Gly Asn His Ile Asp Ile Thr Pro
1205 1210 1215
Asp Asn Ser Glu Asn Gln Val Ile Leu Gln Ser Leu Lys Pro Trp
1220 1225 1230
Arg Thr Asp Val Tyr Phe Asn His Lys Thr Lys Ile Tyr Glu Leu
1235 1240 1245
Met Gly Leu Lys Tyr Ser Asp Leu Ser Phe Glu Lys Gly Ser Gly
1250 1255 1260
Lys Tyr Arg Ile Ser Leu Asp Lys Tyr Asn Val Ile Lys Lys Lys
1265 1270 1275
Glu Gly Val His Lys Glu Ser Glu Phe Lys Phe Thr Leu Tyr Lys
1280 1285 1290
Asn Asp Leu Ile Leu Ile Lys Asp Leu Glu Lys Ser Glu Gln Gln
1295 1300 1305
Leu Phe Arg Tyr Asn Ser Arg Asn Asp Thr Ser Lys His Tyr Val
1310 1315 1320
Glu Leu Lys Pro Tyr Asp Lys Ala Lys Phe Glu Gly Asn Gln Pro
1325 1330 1335
Leu Met Ala Leu Phe Gly Asn Val Ala Lys Gly Gly Gln Cys Leu
1340 1345 1350
Lys Gly Leu Asn Lys Ala Asn Ile Ser Ile Tyr Lys Val Gln Thr
1355 1360 1365
Asp Val Leu Gly Asn Lys Arg Phe Ile Lys Lys Glu Gly Asp Ala
1370 1375 1380
Pro Lys Leu Glu Phe Ser Gly Gly Ser Gly Gly Ser Gly Gly Ser
1385 1390 1395
Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly Lys Gln Leu
1400 1405 1410
Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu Val Glu Glu
1415 1420 1425
Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val His Thr Ala
1430 1435 1440
Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu Thr Ser Asp
1445 1450 1455
Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln Asp Ser Asn
1460 1465 1470
Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser Gly Gly Ser
1475 1480 1485
Gly Gly Ser Thr Asn Leu Ser Asp Ile Ile Glu Lys Glu Thr Gly
1490 1495 1500
Lys Gln Leu Val Ile Gln Glu Ser Ile Leu Met Leu Pro Glu Glu
1505 1510 1515
Val Glu Glu Val Ile Gly Asn Lys Pro Glu Ser Asp Ile Leu Val
1520 1525 1530
His Thr Ala Tyr Asp Glu Ser Thr Asp Glu Asn Val Met Leu Leu
1535 1540 1545
Thr Ser Asp Ala Pro Glu Tyr Lys Pro Trp Ala Leu Val Ile Gln
1550 1555 1560
Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser
1565 1570 1575
Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Pro Lys Lys Lys Arg
1580 1585 1590
Lys Val
1595
<210> 6
<211> 4785
<212> DNA
<213> Artificial Sequence
<400> 6
ccaaagaaga agcggaaagt cagcagtgaa accggaccag tggcagtgga cccaaccctg 60
aggagacgga ttgagcccca tgaatttgaa gtgttctttg acccaaggga gctgaggaag 120
gagacatgcc tgctgtacga gatcaagtgg ggcacaagcc acaagatctg gcgccacagc 180
tccaagaaca ccacaaagca cgtggaagtg aatttcatcg agaagtttac ctccgagcgg 240
cacttctgcc cctctaccag ctgttccatc acatggtttc tgtcttggag cccttgcggc 300
gagtgttcca aggccatcac cgagttcctg tctcagcacc ctaacgtgac cctggtcatc 360
tacgtggccc ggctgtatca ccacatggac cagcagaaca ggcagggcct gcgcgatctg 420
gtgaattctg gcgtgaccat ccagatcatg acagccccag agtacgacta ttgctggcgg 480
aacttcgtga attatccacc tggcaaggag gcacactggc caagataccc acccctgtgg 540
atgaagctgt atgcactgga gctgcacgca ggaatcctgg gcctgcctcc atgtctgaat 600
atcctgcgga gaaagcagcc ccagctgaca tttttcacca ttgctctgca gtcttgtcac 660
tatcagcggc tgcctcctca tattctgtgg gctacaggcc tgaagtctgg aggatctagc 720
ggaggatcct ctggcagcga gacaccagga acaagcgagt cagcaacacc agagagcagt 780
ggcggcagca gcggcggcag caacggcaag atcctgggac tggccatcgg agttgcatct 840
gttggagtgg gcatcctgga caagaagacc ggcgagatca tccacgccag cagcagaatc 900
ttccccgccg ccacagccga tagcaacgtg gaacggaggg gcttcagaca gggaagacgg 960
ctgggccgta gaaaaaaaca cagaaaggtg cggttggccg atctgttcag cgacaccggc 1020
ctgataacag acttctctaa agtgtctatc aacctgaacc cctacgagct gcggatcaag 1080
ggcctcaatg agaaactgac aaacgaggaa ctgttcatcg ccctgaagaa catcgtgaag 1140
agaagaggca tcagctacct ggatgacgcc aatgaggacg gcgagagctc ctctagcgag 1200
tacggcaagg ctgtggaaga aaaccgaaag ttgctggccg acaagactcc tggccagatc 1260
cagctggaac gcttcgaaaa gtacggacag gtccgaggag atttcaccat cgaggaaaac 1320
ggcgaaaagc atagactgct gaacgtgttc agcaccagcg cctataagaa agaagccgag 1380
cggattctga ccaagcagca agattacaac caagacatca ccgacgagtt catccaggcc 1440
tacctgacaa tcctgacggg aaagagaaag tactaccatg gccccggcaa cgagaagtct 1500
agaaccgact acggccggtt caggaccgat ggcaccaccc tggacaacat ctttggcatc 1560
ctgatcggca aatgtacatt ctacccagag gagtaccggg cggccaaggc ctcttacacc 1620
gcccaggagt ttaacctcct gaatgacctg aacaatctga cagttccaac cgagacaaag 1680
aaactgagcg aggaacagaa gcggcaaatc atcgagtacg ccaagggagc caagacactt 1740
ggagccgcca ccctgctcaa gtacatcgcc aagctggtgg acggctctgt ggaggatatc 1800
aagggctata gaattgataa aagcgagaaa cctgagatgc acacattcga tatctacaga 1860
aagatgcaga cactggaaac cgtggatgtg gaaaagctgt cacgcgaggt gctggatgag 1920
ctggcccata tcctgacact gaataccgag agagaaggta tcgaggaggc catcaaggtc 1980
agctttatca agagagagtt cgaacaggac cagatcgccg agctggtcag cttccggaag 2040
tccaactcta gcctgtttgg caagggctgg cacaacttca gtatcaaact gatgacagaa 2100
ctgatccccg agctgtatga gaccagcgaa gagcagatga ccatcctgac cagactggga 2160
aagcaaaaga caaaggctag aagcaagcgc acaaagtaca tcgacgagaa ggagctgacc 2220
gacgagatct acaaccccgt ggtggccaag agcgtgagac aggccattaa gatcatcaac 2280
ctggccacca agaagtacgg cgtgttcgac aacatcgtga tcgagatggc cagagagaac 2340
aacgaggagg atgccaagaa agattacgtg aaaagacaaa aagctaatga ggacgaaaag 2400
aacgccgcta tggaaaaggc tgcccaccag tacaacggca agaaggagct gcccgataac 2460
gtgtttcacg gccacaagga actggccaca aagatcagac tgtggcacca gcagggcgag 2520
aagtgcctgt acaccggcaa aaacatccct atctctgatc tgatccacaa ccagtataag 2580
tacgagatcg accacatcct gcctctgtca ctgagcttcg acgacagcct ggccaataag 2640
gtgctggtgc tcgctaccgc caaccaggag aagggccaaa gaacaccttt ccaggccctc 2700
gacagcatgg acgatgcgtg gtcctataga gaatttaagg cctacgtgcg gggcgccaga 2760
gccctgagca acaagaaaaa agattacctg ctgaatgaag aggacatcaa caagatcgaa 2820
gtgaagcaga aattcatcga gaggaacctt gtggacactc ggtactcctc tagagtggtc 2880
ctgaacgccc tgcaggactt ctacaagctg aatgatttcg acaccaagat cagcgtggtg 2940
agaggccagt tcaccagcca gctgagacgg aaatggagaa tcgacaagag cagagaaacc 3000
taccaccacc acgccgtgga cgctctgatc attgccgcta gctcgcagct gagactgtgg 3060
aagaagcagg gcaacccact gatcagctac aaggaaaacc agttcgtcga ctccgaaacc 3120
ggagaaatta tcagcctcac agatgatgaa tacaaggaac tggtgttccg ggctccatac 3180
gaccacttcg tggacacagt gagcagcaaa aagtttgaag acagaatcct tttctcctac 3240
caggtggatt ccaaatacaa ccggaaaatc agcgacgcca ccatttactc taccagaaag 3300
gccaagctgg gcaaagacaa gagcgaggaa acctacgtgc tgggcaagat aaaggacatc 3360
tacacccaga ccggctacga tgccttcatc aagctgtaca agaaggacaa gtccaaattt 3420
ctgatgtacc acaaggatcc tatcaccttt gagaaggtga tcgaggaaat cctgaagacc 3480
taccccgaca aggaaatcaa cgagaagggc aaggaagtgg catgcaaccc ttttgaaaaa 3540
tatagacagg agaatggacc tctgagaaag tattctaaga aaggtaaggg ccctgagatc 3600
aagagcctga agtactacga caacaaactc ggcaaccaca tcgacataac ccctgacaac 3660
agcgaaaatc aggtgatcct ccagtccctg aaaccttggc ggaccgacgt gtacttcaac 3720
cacaaaacca agatttatga gctgatgggc ctgaagtaca gcgacctgag cttcgagaag 3780
ggcagcggca agtaccggat tagcctggac aaatataacg tgatcaagaa aaaggagggc 3840
gtgcacaagg aaagcgagtt caagttcaca ctgtacaaga acgacctgat cctaatcaag 3900
gatctggaaa agagcgagca gcagctgttt agatacaaca gccggaacga tacatccaag 3960
cactacgtgg agctgaagcc ttacgacaag gccaaattcg agggaaatca acctctgatg 4020
gccctgttcg gcaatgtggc caagggaggc cagtgcctga agggcctgaa caaagccaac 4080
atcagcatct acaaggtgca gaccgacgtg ctgggcaaca agcggttcat caagaaagaa 4140
ggcgacgctc ctaagctgga atttagcggc gggagcggcg ggagcggggg gagcactaat 4200
ctgagcgaca tcattgagaa ggagactggg aaacagctgg tcattcagga gtccatcctg 4260
atgctgcctg aggaggtgga ggaagtgatc ggcaacaagc cagagtctga catcctggtg 4320
cacaccgcct acgacgagtc cacagatgag aatgtgatgc tgctgacctc tgacgccccc 4380
gagtataagc cttgggccct ggtcatccag gattctaacg gcgagaataa gatcaagatg 4440
ctgagcggag gatccggagg atctggaggc agcaccaacc tgtctgacat catcgagaag 4500
gagacaggca agcagctggt catccaggag agcatcctga tgctgcccga agaagtcgaa 4560
gaagtgatcg gaaacaagcc tgagagcgat atcctggtcc ataccgccta cgacgagagt 4620
accgacgaaa atgtgatgct gctgacatcc gacgccccag agtataagcc ctgggctctg 4680
gtcatccagg attccaacgg agagaacaaa atcaaaatgc tgtctggcgg ctcaaaaaga 4740
accgccgacg gcagcgaatt cgagcccaag aagaagagga aagtc 4785
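The listing does not itself show how the nucleotide and protein entries relate. The Python below is a minimal consistency sketch, offered as an illustration and not as part of the disclosed method. It translates two excerpts of SEQ ID NO: 6 with the standard genetic code: the first 21 nt and the last 105 nt (positions 4681 to 4785). It then compares them with the SEQ ID NO: 9 nuclear localization signal and with the last 35 residues of SEQ ID NO: 5 as printed above. All helper and constant names are hypothetical. Only the copied excerpts are checked, not the full sequences.

```python
"""Consistency sketch: do excerpts of SEQ ID NO: 6 encode SEQ ID NO: 9 and the
C-terminus of SEQ ID NO: 5? Helper names are hypothetical."""

BASES = "tcag"
# Standard genetic code; codon index = 16*first + 4*second + third over t, c, a, g.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def clean_dna(text: str) -> str:
    """Keep only a/c/g/t, dropping spaces and the position counters of <400> lines."""
    return "".join(ch for ch in text.lower() if ch in "acgt")


def translate(dna: str) -> str:
    """Translate an in-frame DNA string with the standard genetic code."""
    if len(dna) % 3:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def to_one_letter(text: str) -> str:
    """Convert space-separated three-letter residues, as printed in <400> blocks."""
    return "".join(THREE_TO_ONE[token] for token in text.split())


# Excerpts copied from the listing: SEQ ID NO: 6 positions 1-21 and 4681-4785.
SEQ6_START = clean_dna("ccaaagaaga agcggaaagt c")
SEQ6_END = clean_dna(
    "gtcatccagg attccaacgg agagaacaaa atcaaaatgc tgtctggcgg ctcaaaaaga 4740"
    " accgccgacg gcagcgaatt cgagcccaag aagaagagga aagtc 4785"
)
SEQ9 = to_one_letter("Pro Lys Lys Lys Arg Lys Val")
# Last 35 residues of SEQ ID NO: 5.
SEQ5_END = to_one_letter(
    "Val Ile Gln Asp Ser Asn Gly Glu Asn Lys Ile Lys Met Leu Ser Gly Gly Ser"
    " Lys Arg Thr Ala Asp Gly Ser Glu Phe Glu Pro Lys Lys Lys Arg Lys Val"
)

if __name__ == "__main__":
    print("5' end encodes SEQ ID NO: 9:", translate(SEQ6_START) == SEQ9)
    print("3' end encodes SEQ ID NO: 5 C-terminus:", translate(SEQ6_END) == SEQ5_END)
```

Because the excerpts are copied as printed, including the spacing and position counters, `clean_dna` strips everything except the four bases before translation.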
<210> 7
<211> 4937
<212> DNA
<213> Artificial Sequence
<400> 7
gagggcctat ttcccatgat tccttcatat ttgcatatac gatacaaggc tgttagagag 60
ataattggaa ttaatttgac tgtaaacaca aagatattag tacaaaatac gtgacgtaga 120
aagtaataat ttcttgggta gtttgcagtt ttaaaattat gttttaaaat ggactatcat 180
atgcttaccg taacttgaaa gtatttcgat ttcttggctt tatatatctt gtggaaagga 240
cgaaacaccg tgagaccgag agagggtctc agtttttgta ctctcaagaa attgcagaag 300
ctacaaagat aaggcttcat gccgaaatca acaccctgtc tcttggcggg gtgttttttt 360
ttttaaagaa ttctcgacct cgagacaaat ggcagtattc atccacaatt ttaaaagaaa 420
aggggggatt ggggggtaca gtgcagggga aagaatagta gacataatag caacagacat 480
acaaactaaa gaattacaaa aacaaattac aaaaattcaa aattttcggg tttattacag 540
ggacagcaga gatccacttt ggccgcggct cgagggggtt ggggttgcgc cttttccaag 600
gcagccctgg gtttgcgcag ggacgcggct gctctgggcg tggttccggg aaacgcagcg 660
gcgccgaccc tgggactcgc acattcttca cgtccgttcg cagcgtcacc cggatcttcg 720
ccgctaccct tgtgggcccc ccggcgacgc ttcctgctcc gcccctaagt cgggaaggtt 780
ccttgcggtt cgcggcgtgc cggacgtgac aaacggaagc cgcacgtctc actagtaccc 840
tcgcagacgg acagcgccag ggagcaatgg cagcgcgccg accgcgatgg gctgtggcca 900
atagcggctg ctcagcaggg cgcgccgaga gcagcggccg ggaaggggcg gtgcgggagg 960
cggggtgtgg ggcggtagtg tgggccctgt tcctgcccgc gcggtgttcc gcattctgca 1020
agcctccgga gcgcacgtcg gcagtcggct ccctcgttga ccgaatcacc gacctctctc 1080
cccaggggga tccatggtga gcaagggcga ggagctgttc accggggtgg tgcccatcct 1140
ggtcgagctg gacggcgacg taaacggcca caagttcagc gtgtccggcg agggcgaggg 1200
cgatgccacc tacggcaagc tgaccctgaa gttcatctgc accaccggca agctgcccgt 1260
gccctggccc accctcgtga ccaccctgac ctacggcgtg cagtgcttca gccgctaccc 1320
cgaccacatg aagcagcacg acttcttcaa gtccgccatg cccgaaggct acgtccagga 1380
gcgcaccatc ttcttcaagg acgacggcaa ctacaagacc cgcgccgagg tgaagttcga 1440
gggcgacacc ctggtgaacc gcatcgagct gaagggcatc gacttcaagg aggacggcaa 1500
catcctgggg cacaagctgg agtacaacta caacagccac aacgtctata tcatggccga 1560
caagcagaag aacggcatca aggtgaactt caagatccgc cacaacatcg aggacggcag 1620
cgtgcagctc gccgaccact accagcagaa cacccccatc ggcgacggcc ccgtgctgct 1680
gcccgacaac cactacctga gcacccagtc cgccctgagc aaagacccca acgagaagcg 1740
cgatcacatg gtcctgctgg agttcgtgac cgccgccggg atcactctcg gcatggacga 1800
gctgtacaag taaagcggcc gcgactctag atcataatca gccataccac atttgtagag 1860
gttttacttg ctttaaaaaa cctcccacac ctccccctga acctgaaaca taaaatgaat 1920
gcaattgttg ttgttaactt gtttattgca gcttataatg gttacaaata aagcaatagc 1980
atcacaaatt tcacaaataa agcatttttt tcactgcatt ctagttgtgg tttgtccaaa 2040
ctcatcaatg tatcttagtc gaccgatgcc cttgagagcc ttcaacccag tcagctcctt 2100
ccggtgggcg cggggcatga ctatcgtcgc cgcacttatg actgtcttct ttatcatgca 2160
actcgtagga caggtgccgg cagcgctctt ccgcttcctc gctcactgac tcgctgcgct 2220
cggtcgttcg gctgcggcga gcggtatcag ctcactcaaa ggcggtaata cggttatcca 2280
cagaatcagg ggataacgca ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga 2340
accgtaaaaa ggccgcgttg ctggcgtttt tccataggct ccgcccccct gacgagcatc 2400
acaaaaatcg acgctcaagt cagaggtggc gaaacccgac aggactataa agataccagg 2460
cgtttccccc tggaagctcc ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat 2520
acctgtccgc ctttctccct tcgggaagcg tggcgctttc tcatagctca cgctgtaggt 2580
atctcagttc ggtgtaggtc gttcgctcca agctgggctg tgtgcacgaa ccccccgttc 2640
agcccgaccg ctgcgcctta tccggtaact atcgtcttga gtccaacccg gtaagacacg 2700
acttatcgcc actggcagca gccactggta acaggattag cagagcgagg tatgtaggcg 2760
gtgctacaga gttcttgaag tggtggccta actacggcta cactagaaga acagtatttg 2820
gtatctgcgc tctgctgaag ccagttacct tcggaaaaag agttggtagc tcttgatccg 2880
gcaaacaaac caccgctggt agcggtggtt tttttgtttg caagcagcag attacgcgca 2940
gaaaaaaagg atctcaagaa gatcctttga tcttttctac ggggtctgac gctcagtgga 3000
acgaaaactc acgttaaggg attttggtca tgagattatc aaaaaggatc ttcacctaga 3060
tccttttaaa ttaaaaatga agttttaaat caatctaaag tatatatgag taaacttggt 3120
ctgacagtta ccaatgctta atcagtgagg cacctatctc agcgatctgt ctatttcgtt 3180
catccatagt tgcctgactc cccgtcgtgt agataactac gatacgggag ggcttaccat 3240
ctggccccag tgctgcaatg ataccgcggg acccacgctc accggctcca gatttatcag 3300
caataaacca gccagccgga agggccgagc gcagaagtgg tcctgcaact ttatccgcct 3360
ccatccagtc tattaattgt tgccgggaag ctagagtaag tagttcgcca gttaatagtt 3420
tgcgcaacgt tgttgccatt gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg 3480
cttcattcag ctccggttcc caacgatcaa ggcgagttac atgatccccc atgttgtgca 3540
aaaaagcggt tagctccttc ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt 3600
tatcactcat ggttatggca gcactgcata attctcttac tgtcatgcca tccgtaagat 3660
gcttttctgt gactggtgag tactcaacca agtcattctg agaatagtgt atgcggcgac 3720
cgagttgctc ttgcccggcg tcaatacggg ataataccgc gccacatagc agaactttaa 3780
aagtgctcat cattggaaaa cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt 3840
tgagatccag ttcgatgtaa cccactcgtg cacccaactg atcttcagca tcttttactt 3900
tcaccagcgt ttctgggtga gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa 3960
gggcgacacg gaaatgttga atactcatac tcttcctttt tcaatattat tgaagcattt 4020
atcagggtta ttgtctcatg agcggataca tatttgaatg tatttagaaa aataaacaaa 4080
taggggttcc gcgcacattt ccccgaaaag tgccacctga cgcgccctgt agcggcgcat 4140
taagcgcggc gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc agcgccctag 4200
cgcccgctcc tttcgctttc ttcccttcct ttctcgccac gttcgccggc tttccccgtc 4260
aagctctaaa tcgggggctc cctttagggt tccgatttag tgctttacgg cacctcgacc 4320
ccaaaaaact tgattagggt gatggttcac gtagtgggcc atcgccctga tagacggttt 4380
ttcgcccttt gacgttggag tccacgttct ttaatagtgg actcttgttc caaactggaa 4440
caacactcaa ccctatctcg gtctattctt ttgatttata agggattttg ccgatttcgg 4500
cctattggtt aaaaaatgag ctgatttaac aaaaatttaa cgcgaatttt aacaaaatat 4560
taacgcttac aatttgccat tcgccattca ggctgcgcaa ctgttgggaa gggcgatcgg 4620
tgcgggcctc ttcgctatta cgccagccca agctaccatg ataagtaagt aatattaagg 4680
tacgggaggt acttggagcg gccgcaataa aatatcttta ttttcattac atctgtgtgt 4740
tggttttttg tgtgaatcga tagtactaac atacgctctc catcaaaaca aaacgaaaca 4800
aaacaaacta gcaaaatagg ctgtccccag tgcaagtgca ggtgccagaa catttctcta 4860
tcgataggta ccgattagtg aacggatctc gacggtatcg atcacgagac tagcctcgag 4920
cggccgcccc cttcacc 4937
<210> 8
<211> 86
<212> DNA
<213> Artificial Sequence
<400> 8
gtttttgtac tctcaagaaa ttgcagaagc tacaaagata aggcttcatg ccgaaatcaa 60
caccctgtct cttggcgggg tgtttt 86
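The 86-nt SEQ ID NO: 8 also occurs inside the vector sequence SEQ ID NO: 7. The sketch below locates it and reports 1-based coordinates in the vector numbering. It assumes that only the excerpt of SEQ ID NO: 7 spanning positions 241 to 360, copied from the listing, is needed. `locate` and the constant names are hypothetical helpers introduced for illustration.

```python
"""Locate SEQ ID NO: 8 inside an excerpt of SEQ ID NO: 7 (helper names are hypothetical)."""


def clean_dna(text: str) -> str:
    """Keep only a/c/g/t, dropping spaces and the position counters of <400> lines."""
    return "".join(ch for ch in text.lower() if ch in "acgt")


def locate(query: str, excerpt: str, excerpt_start: int):
    """Return 1-based (start, end) of query in the parent numbering, or None."""
    index = excerpt.find(query)
    if index < 0:
        return None
    start = excerpt_start + index
    return start, start + len(query) - 1


# SEQ ID NO: 7 positions 241-360, copied from the listing.
SEQ7_EXCERPT_START = 241
SEQ7_EXCERPT = clean_dna(
    "cgaaacaccg tgagaccgag agagggtctc agtttttgta ctctcaagaa attgcagaag 300"
    " ctacaaagat aaggcttcat gccgaaatca acaccctgtc tcttggcggg gtgttttttt 360"
)
SEQ8 = clean_dna(
    "gtttttgtac tctcaagaaa ttgcagaagc tacaaagata aggcttcatg ccgaaatcaa 60"
    " caccctgtct cttggcgggg tgtttt 86"
)

if __name__ == "__main__":
    print(len(SEQ8), locate(SEQ8, SEQ7_EXCERPT, SEQ7_EXCERPT_START))
```

Run on these excerpts, the sketch places SEQ ID NO: 8 at positions 272 to 357 of SEQ ID NO: 7.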
<210> 9
<211> 7
<212> PRT
<213> Artificial Sequence
<400> 9
Pro Lys Lys Lys Arg Lys Val
1 5
<210> 10
<211> 3743
<212> DNA
<213> Artificial Sequence
<400> 10
agcggaggat cctctggcag cgagacacca ggaacaagcg agtcagcaac accagagagc 60
agtggcggca gcagcggcgg cagcaacggc aagatcctgg gactggccat cggagttgca 120
tctgttggag tgggcatcct ggacaagaag accggcgaga tcatccacgc cagcagcaga 180
atcttccccg ccgccacagc cgatagcaac gtggaacgga ggggcttcag acagggaaga 240
cggctgggcc gtagaaaaaa acacagaaag gtgcggttgg ccgatctgtt cagcgacacc 300
ggcctgataa cagacttctc taaagtgtct atcaacctga acccctacga gctgcggatc 360
aagggcctca atgagaaact gacaaacgag gaactgttca tcgccctgaa gaacatcgtg 420
aagagaagag gcatcagcta cctggatgac gccaatgagg acggcgagag ctcctctagc 480
gagtacggca aggctgtgga agaaaaccga aagttgctgg ccgacaagac tcctggccag 540
atccagctgg aacgcttcga aaagtacgga caggtccgag gagatttcac catcgaggaa 600
aacggcgaaa agcatagact gctgaacgtg ttcagcacca gcgcctataa gaaagaagcc 660
gagcggattc tgaccaagca gcaagattac aaccaagaca tcaccgacga gttcatccag 720
gcctacctga caatcctgac gggaaagaga aagtactacc atggccccgg caacgagaag 780
tctagaaccg actacggccg gttcaggacc gatggcacca ccctggacaa catctttggc 840
atcctgatcg gcaaatgtac attctaccca gaggagtacc gggcggccaa ggcctcttac 900
accgcccagg agtttaacct cctgaatgac ctgaacaatc tgacagttcc aaccgagaca 960
aagaaactga gcgaggaaca gaagcggcaa atcatcgagt acgccaaggg agccaagaca 1020
cttggagccg ccaccctgct caagtacatc gccaagctgg tggacggctc tgtggaggat 1080
atcaagggct atagaattga taaaagcgag aaacctgaga tgcacacatt cgatatctac 1140
agaaagatgc agacactgga aaccgtggat gtggaaaagc tgtcacgcga ggtgctggat 1200
gagctggccc atatcctgac actgaatacc gagagagaag gtatcgagga ggccatcaag 1260
gtcagcttta tcaagagaga gttcgaacag gaccagatcg ccgagctggt cagcttccgg 1320
aagtccaact ctagcctgtt tggcaagggc tggcacaact tcagtatcaa actgatgaca 1380
gaactgatcc ccgagctgta tgagaccagc gaagagcaga tgaccatcct gaccagactg 1440
ggaaagcaaa agacaaaggc tagaagcaag cgcacaaagt acatcgacga gaaggagctg 1500
accgacgaga tctacaaccc cgtggtggcc aagagcgtga gacaggccat taagatcatc 1560
aacctggcca ccaagaagta cggcgtgttc gacaacatcg tgatcgagat ggccagagag 1620
aacaacgagg aggatgccaa gaaagattac gtgaaaagac aaaaagctaa tgaggacgaa 1680
aagaacgccg ctatggaaaa ggctgcccac cagtacaacg gcaagaagga gctgcccgat 1740
aacgtgtttc acggccacaa ggaactggcc acaaagatca gactgtggca ccagcagggc 1800
gagaagtgcc tgtacaccgg caaaaacatc cctatctctg atctgatcca caaccagtat 1860
aagtacgaga tcgaccacat cctgcctctg tcactgagct tcgacgacag cctggccaat 1920
aaggtgctgg tgctcgctac cgccaaccag gagaagggcc aaagaacacc tttccaggcc 1980
ctcgacagca tggacgatgc gtggtcctat agagaattta aggcctacgt gcggggcgcc 2040
agagccctga gcaacaagaa aaaagattac ctgctgaatg aagaggacat caacaagatc 2100
gaagtgaagc agaaattcat cgagaggaac cttgtggaca ctcggtactc ctctagagtg 2160
gtcctgaacg ccctgcagga cttctacaag ctgaatgatt tcgacaccaa gatcagcgtg 2220
gtgagaggcc agttcaccag ccagctgaga cggaaatgga gaatcgacaa gagcagagaa 2280
acctaccacc accacgccgt ggacgctctg atcattgccg ctagctcgca gctgagactg 2340
tggaagaagc agggcaaccc actgatcagc tacaaggaaa accagttcgt cgactccgaa 2400
accggagaaa ttatcagcct cacagatgat gaatacaagg aactggtgtt ccgggctcca 2460
tacgaccact tcgtggacac agtgagcagc aaaaagtttg aagacagaat ccttttctcc 2520
taccaggtgg attccaaata caaccggaaa atcagcgacg ccaccattta ctctaccaga 2580
aaggccaagc tgggcaaaga caagagcgag gaaacctacg tgctgggcaa gataaaggac 2640
atctacaccc agaccggcta cgatgccttc atcaagctgt acaagaagga caagtccaaa 2700
tttctgatgt accacaagga tcctatcacc tttgagaagg tgatcgagga aatcctgaag 2760
acctaccccg acaaggaaat caacgagaag ggcaaggaag tggcatgcaa cccttttgaa 2820
aaatatagac aggagaatgg acctctgaga aagtattcta agaaaggtaa gggccctgag 2880
atcaagagcc tgaagtacta cgacaacaaa ctcggcaacc acatcgacat aacccctgac 2940
aacagcgaaa atcaggtgat cctccagtcc ctgaaacctt ggcggaccga cgtgtacttc 3000
aaccacaaaa ccaagattta tgagctgatg ggcctgaagt acagcgacct gagcttcgag 3060
aagggcagcg gcaagtaccg gattagcctg gacaaatata acgtgatcaa gaaaaaggag 3120
ggcgtgcaca aggaaagcga gttcaagttc acactgtaca agaacgacct gatcctaatc 3180
aaggatctgg aaaagagcga gcagcagctg tttagataca acagccggaa cgatacatcc 3240
aagcactacg tggagctgaa gccttacgac aaggccaaat tcgagggaaa tcaacctctg 3300
atggccctgt tcggcaatgt ggccaaggga ggccagtgcc tgaagggcct gaacaaagcc 3360
aacatcagca tctacaaggt gcagaccgac gtgctgggca acaagcggtt catcaagaaa 3420
gaaggcgacg ctcctaagct ggaatttagc ggcgggagcg gcgggagcgg ggggagcact 3480
aatctgagcg acatcattga gaaggagact gggaaacagc tggtcattca ggagtccatc 3540
ctgatgctgc ctgaggaggt ggaggaagtg atcggcaaca agccagagtc tgacatcctg 3600
gtgcacaccg cctacgacga gtccacagat gagaatgtga tgctgctgac ctctgacgcc 3660
cccgagtata agccttgggc cctggtcatc caggattcta acggcgagaa taagatcaag 3720
atgctgagcg gaggatccgg agg 3743
<210> 11
<211> 30
<212> DNA
<213> Artificial Sequence
<400> 11
agcggaggat cctctggcag cgagacacca 30
<210> 12
<211> 33
<212> DNA
<213> Artificial Sequence
<400> 12
cctccggatc ctccgctcag catcttgatc tta 33
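SEQ ID NO: 11 and SEQ ID NO: 12 read as a forward and a reverse primer for the 3743-nt fragment SEQ ID NO: 10. This reading is an assumption, since the listing does not label them. A minimal check needs only the terminal lines of SEQ ID NO: 10. The forward primer should equal the 5' end of the template. The reverse complement of the reverse primer should equal its 3' end. The helper names below are hypothetical.

```python
"""Check primer SEQ ID NOs: 11 and 12 against the ends of SEQ ID NO: 10."""

COMPLEMENT = str.maketrans("acgt", "tgca")


def clean_dna(text: str) -> str:
    """Keep only a/c/g/t, dropping spaces and the position counters of <400> lines."""
    return "".join(ch for ch in text.lower() if ch in "acgt")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase a/c/g/t string."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_flank(template_start: str, template_end: str, forward: str, reverse: str) -> bool:
    """True if forward matches the 5' end and reverse anneals to the 3' end."""
    return template_start.startswith(forward) and template_end.endswith(
        reverse_complement(reverse)
    )


# Terminal lines of SEQ ID NO: 10 (positions 1-60 and 3661-3743).
SEQ10_START = clean_dna(
    "agcggaggat cctctggcag cgagacacca ggaacaagcg agtcagcaac accagagagc 60"
)
SEQ10_END = clean_dna(
    "cccgagtata agccttgggc cctggtcatc caggattcta acggcgagaa taagatcaag 3720"
    " atgctgagcg gaggatccgg agg 3743"
)
SEQ11 = clean_dna("agcggaggat cctctggcag cgagacacca")
SEQ12 = clean_dna("cctccggatc ctccgctcag catcttgatc tta")

if __name__ == "__main__":
    print(primers_flank(SEQ10_START, SEQ10_END, SEQ11, SEQ12))
```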
<210> 13
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 13
accgtgggca agagtttctg ccac 24
<210> 14
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 14
aaacgtggca gaaactcttg ccca 24
<210> 15
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 15
accgctgcgt tcctagaacc acag 24
<210> 16
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 16
aaacctgtgg ttctaggaac gcag 24
<210> 17
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 17
accgaatgct ggctacagat gtcc 24
<210> 18
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 18
aaacggacat ctgtagccag catt 24
<210> 19
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 19
accgctcata tgtcacttac ctct 24
<210> 20
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 20
aaacagaggt aagtgacata tgag 24
<210> 21
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 21
accggagaca ggatctcact gtgt 24
<210> 22
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 22
aaacacacag tgagatcctg tctc 24
<210> 23
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 23
accgtgctct aggtggtgtt aatg 24
<210> 24
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 24
aaaccattaa caccacctag agca 24
<210> 25
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 25
accgcagcaa catgaacaac tgaa 24
<210> 26
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 26
aaacttcagt tgttcatgtt gctg 24
<210> 27
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 27
accgaagagc caagtcttac tgta 24
<210> 28
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 28
aaactacagt aagacttggc tctt 24
<210> 29
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 29
accgctgaca agtactagct tatg 24
<210> 30
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 30
aaaccataag ctagtacttg tcag 24
<210> 31
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 31
accgttcctc atagcaacat cact 24
<210> 32
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 32
aaacagtgat gttgctatga ggaa 24
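SEQ ID NO: 13 to 32 form ten pairs of 24-nt oligonucleotides. In each pair, the sense oligo is ACCG followed by a 20-nt spacer. Its partner is AAAC followed by the reverse complement of that spacer. The pair can therefore anneal into a duplex with 4-nt 5' overhangs. The sketch below checks this pattern for every pair. Treating the overhangs as the cloning ends for the sgRNA vector is an assumption, and the helper names are hypothetical.

```python
"""Check that SEQ ID NOs: 13-32 form annealable sense/antisense pairs."""

COMPLEMENT = str.maketrans("acgt", "tgca")
SENSE_OVERHANG = "accg"      # 5' overhang on the sense oligo
ANTISENSE_OVERHANG = "aaac"  # 5' overhang on the antisense oligo


def clean_dna(text: str) -> str:
    """Keep only a/c/g/t, dropping the spaces used in <400> lines."""
    return "".join(ch for ch in text.lower() if ch in "acgt")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a lowercase a/c/g/t string."""
    return seq.translate(COMPLEMENT)[::-1]


def spacer_if_paired(sense: str, antisense: str):
    """Return the spacer if the oligos anneal with ACCG/AAAC overhangs, else None."""
    s, a = clean_dna(sense), clean_dna(antisense)
    if not (s.startswith(SENSE_OVERHANG) and a.startswith(ANTISENSE_OVERHANG)):
        return None
    spacer = s[len(SENSE_OVERHANG):]
    if a[len(ANTISENSE_OVERHANG):] != reverse_complement(spacer):
        return None
    return spacer


# (sense SEQ ID NO, sense oligo, antisense oligo), copied from the listing.
OLIGO_PAIRS = [
    (13, "accgtgggca agagtttctg ccac", "aaacgtggca gaaactcttg ccca"),
    (15, "accgctgcgt tcctagaacc acag", "aaacctgtgg ttctaggaac gcag"),
    (17, "accgaatgct ggctacagat gtcc", "aaacggacat ctgtagccag catt"),
    (19, "accgctcata tgtcacttac ctct", "aaacagaggt aagtgacata tgag"),
    (21, "accggagaca ggatctcact gtgt", "aaacacacag tgagatcctg tctc"),
    (23, "accgtgctct aggtggtgtt aatg", "aaaccattaa caccacctag agca"),
    (25, "accgcagcaa catgaacaac tgaa", "aaacttcagt tgttcatgtt gctg"),
    (27, "accgaagagc caagtcttac tgta", "aaactacagt aagacttggc tctt"),
    (29, "accgctgaca agtactagct tatg", "aaaccataag ctagtacttg tcag"),
    (31, "accgttcctc atagcaacat cact", "aaacagtgat gttgctatga ggaa"),
]

if __name__ == "__main__":
    for seq_id, sense, antisense in OLIGO_PAIRS:
        spacer = spacer_if_paired(sense, antisense)
        print(f"SEQ ID NO: {seq_id}/{seq_id + 1}:", spacer or "not paired")
```

Each pair is tested independently, so a transcription error in any single oligo shows up as "not paired" for that pair alone.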
<210> 33
<211> 19
<212> DNA
<213> Artificial Sequence
<400> 33
ctgacctggc agataccac 19
<210> 34
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 34
ccacaggact taggaacgac 20
<210> 35
<211> 23
<212> DNA
<213> Artificial Sequence
<400> 35
cccttgaaaa gtgcagtgtg tcg 23
<210> 36
<211> 23
<212> DNA
<213> Artificial Sequence
<400> 36
ggcaattccc tttgaaagac tgc 23
<210> 37
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 37
ccgaggtact gttgctgctt c 21
<210> 38
<211> 22
<212> DNA
<213> Artificial Sequence
<400> 38
gagatggcaa gcctttgttg cg 22
<210> 39
<211> 22
<212> DNA
<213> Artificial Sequence
<400> 39
gatgctcatt ggtagctcgt gc 22
<210> 40
<211> 25
<212> DNA
<213> Artificial Sequence
<400> 40
ctatctgtcc atccatgcat ttgcc 25
<210> 41
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 41
cctactgcgg atgccttctt 20
<210> 42
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 42
ttagcttggt gtggcagcat g 21
<210> 43
<211> 25
<212> DNA
<213> Artificial Sequence
<400> 43
caagtcattg tgatgactga ggagc 25
<210> 44
<211> 19
<212> DNA
<213> Artificial Sequence
<400> 44
ggccagccta tgatgggcc 19
<210> 45
<211> 25
<212> DNA
<213> Artificial Sequence
<400> 45
ggatgctgtg atgactgaga cgtag 25
<210> 46
<211> 28
<212> DNA
<213> Artificial Sequence
<400> 46
tggacatttt gagtttgaaa aggctgtg 28
<210> 47
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 47
caggcgtgct gtaatacatg aacc 24
<210> 48
<211> 26
<212> DNA
<213> Artificial Sequence
<400> 48
gtcaccatag gataggaagt cagcag 26
<210> 49
<211> 18
<212> DNA
<213> Artificial Sequence
<400> 49
gtcccactgc accagcag 18
<210> 50
<211> 32
<212> DNA
<213> Artificial Sequence
<400> 50
cctattctat ctgagggagg acatgattga ag 32
<210> 51
<211> 26
<212> DNA
<213> Artificial Sequence
<400> 51
ctctgcctgg aagaataatg agaacc 26
<210> 52
<211> 23
<212> DNA
<213> Artificial Sequence
<400> 52
ccaggatggt gtttgtgaga tgg 23