Genome-wide identification and bioinformatics analysis of mulberry WRKY transcription factors

Abstract:

Objective: To characterize the structure and functional features of the WRKY transcription factor family in the mulberry genome, and to provide a scientific basis for further elucidating the biological functions of this family.

Method: The number, types, gene structure, phylogenetic relationships, conserved domains and codon usage bias of mulberry WRKY transcription factors were analyzed comprehensively with bioinformatics methods.

Result: A total of 55 WRKY transcription factor genes were identified from the mulberry whole-genome protein database, accounting for 0.188% of all 29261 mulberry genes. The mulberry WRKY genes fell into six types by intron number and fifteen types by intron phase; 27 genes contained two introns, and 25 genes had the 2-2 intron phase type. Phylogenetic analysis of the conserved domains divided the mulberry WRKY proteins into three groups (Ⅰ, Ⅱ and Ⅲ). Group Ⅰ was further divided into subgroups ⅠN and ⅠC, and group Ⅱ was clustered into five subgroups: Ⅱa, Ⅱb, Ⅱc, Ⅱd and Ⅱe. Conserved-motif analysis identified five highly conserved motifs: every mulberry WRKY protein contained the C-terminal Motif 1, and group Ⅰ proteins additionally contained the N-terminal Motif 3. The promoter regions of the mulberry WRKY genes were rich in PBF (C2H2 zinc finger factor) and AHL (AT-hook factor) elements. Codon usage analysis showed that the effective number of codons (ENC) of the mulberry WRKY genes ranged from 48.00 to 60.00 and the GC content at the third codon position (GC3s) ranged from 0.330 to 0.722; the grand average of hydropathicity (GRAVY) values were all negative. Twenty-nine codons had relative synonymous codon usage (RSCU) values greater than 1.000, and slightly more of these ended in A (6) or T (11) than in G (4) or C (8).

Conclusion: The mulberry WRKY transcription factor family comprises 55 members. Members of the same group that share an intron phase type probably derive from a common ancestral gene, which may be related to gene duplication and genome rearrangement. The protein sequences are highly conserved and play a role in plant resistance to environmental stress. The codon usage bias of these genes is weak and is mainly affected by selection pressure from base mutation.
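The proportion of WRKY genes in the genome follows directly from the two counts given in the Result section (55 WRKY genes out of 29261 annotated mulberry genes):

$$
\frac{55}{29261}\times 100\% \approx 0.188\%
$$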
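The intron-phase types in the Result section (for example 2-2) label each gene by the phases of its introns, read in order. By convention an intron has phase 0 if it falls between two codons, phase 1 if it falls after the first base of a codon, and phase 2 if it falls after the second base. The following is a minimal Python sketch of that classification. It assumes the coding-exon lengths of each gene are already known and that the CDS starts at the first base of the start codon. The function names and exon lengths are hypothetical and are not taken from the paper.

```python
from typing import List


def intron_phases(cds_exon_lengths: List[int]) -> List[int]:
    """Phase (0, 1 or 2) of each intron between consecutive coding exons.

    The phase equals the number of bases of the interrupted codon that lie
    upstream of the intron, i.e. the cumulative coding length before the
    intron taken modulo 3.
    """
    phases = []
    upstream = 0
    for length in cds_exon_lengths[:-1]:  # n coding exons are separated by n - 1 introns
        upstream += length
        phases.append(upstream % 3)
    return phases


def phase_pattern(cds_exon_lengths: List[int]) -> str:
    """Join intron phases into a label such as '2-2' (hypothetical helper)."""
    phases = intron_phases(cds_exon_lengths)
    return "-".join(str(p) for p in phases) if phases else "intronless"
```

For a hypothetical gene with coding exons of 245, 399 and 910 bp, both introns fall after the second base of a codon, so `phase_pattern([245, 399, 910])` returns `"2-2"`, the phase type the abstract reports for 25 mulberry WRKY genes.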
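RSCU compares how often a codon is used with the count expected if all synonymous codons for that amino acid were used equally. A value above 1.000 therefore marks a codon used more often than expected. The sketch below computes RSCU over coding sequences pooled across genes and then tallies the third base of the codons with RSCU > 1, which gives the kind of A/T versus G/C ending count quoted in the abstract. It makes three assumptions. The sequences are in-frame CDSs. The standard genetic code (NCBI table 1) applies. Stop codons, Met and Trp are skipped, which is a common convention but is not confirmed for this paper. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: stops, Met and Trp are excluded from RSCU, as many codon-usage tools do.
SKIPPED = {"*", "M", "W"}


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence (a trailing partial codon is ignored)."""
    cds = cds.upper().replace("U", "T")
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def rscu(cds_sequences) -> dict:
    """RSCU of each sense codon, pooled over all given coding sequences."""
    counts = Counter()
    for cds in cds_sequences:
        counts.update(codon_counts(cds))
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in SKIPPED:
            synonyms.setdefault(aa, []).append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        expected = total / len(codons)  # count per codon under uniform usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


def preferred_endings(values: dict, threshold: float = 1.0) -> Counter:
    """Tally the third base of codons whose RSCU exceeds the threshold."""
    return Counter(codon[2] for codon, v in values.items() if v > threshold)
```

For the toy input `rscu(["GCTGCTGCAGCC"])`, alanine is coded by GCT twice and by GCA and GCC once each. GCT therefore gets RSCU 2.0, GCA and GCC get 1.0, GCG gets 0.0, and `preferred_endings` counts a single T-ending codon.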
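ENC measures how evenly a gene uses synonymous codons. Under the standard code it ranges from 20 (one codon per amino acid) to 61 (all synonymous codons used equally), so the reported range of 48.00 to 60.00 is consistent with the weak bias stated in the Conclusion. The sketch below follows Wright's (1990) formulation. It computes a homozygosity F for each amino acid, averages F within the 2-, 3-, 4- and 6-fold degeneracy classes, and combines the class averages as ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. It also computes GC3s over synonymous codons, plus Wright's expected-ENC curve, a common reference for judging whether GC3s alone (mutation pressure) can explain a gene's ENC. The abstract does not say which software or ENC variant the authors used. Everything here, including the function names and the fill-in rule for a missing 3-fold class, is an assumption for illustration.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SKIPPED = {"*", "M", "W"}  # stops and the non-degenerate amino acids Met and Trp


def codon_counts(cds: str) -> Counter:
    cds = cds.upper().replace("U", "T")
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def gc3s(cds: str) -> float:
    """GC content at the third position of synonymous codons (non-stop, non-Met/Trp)."""
    counts = codon_counts(cds)
    synonymous = {c: n for c, n in counts.items() if CODON_TABLE.get(c, "*") not in SKIPPED}
    total = sum(synonymous.values())
    if total == 0:
        raise ValueError("no synonymous codons")
    return sum(n for c, n in synonymous.items() if c[2] in "GC") / total


def enc_wright(cds: str) -> float:
    """Effective number of codons after Wright (1990), capped at 61."""
    counts = codon_counts(cds)
    family = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in SKIPPED:
            family.setdefault(aa, []).append(counts[codon])
    # Homozygosity F per amino acid, grouped by degeneracy class (2, 3, 4 or 6 codons).
    f_by_class = {}
    for observed in family.values():
        n = sum(observed)
        if n < 2:
            continue  # F is undefined with fewer than two codons
        f = (n * sum((x / n) ** 2 for x in observed) - 1) / (n - 1)
        f_by_class.setdefault(len(observed), []).append(f)
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # assumed fill-in when Ile is absent
    weights = {2: 9, 3: 1, 4: 5, 6: 3}  # number of amino acids in each class
    missing = [k for k in weights if k not in mean_f]
    if missing:
        raise ValueError(f"sequence too short: no data for classes {missing}")
    nc = 2 + sum(w / mean_f[k] if mean_f[k] > 0 else math.inf for k, w in weights.items())
    return min(nc, 61.0)


def expected_enc(s: float) -> float:
    """Wright's expected ENC if codon usage were shaped only by GC3s = s."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)
```

A gene whose observed ENC lies close to `expected_enc(gc3s(cds))` is usually read as having codon usage shaped mainly by mutational GC pressure. Points well below the curve suggest additional selection.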
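The GRAVY value reported for each protein is the mean Kyte-Doolittle hydropathy of its residues, so a negative value indicates an overall hydrophilic protein. The following is a minimal sketch, assuming plain one-letter protein sequences. Unknown letters are ignored, which is a simplification of this sketch. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

As an example, the WRKYGQK heptapeptide that gives the family its name scores about -2.63, strongly hydrophilic, whereas a stretch such as `AIV` scores +3.5.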

     
