Codon usage bias of the banana genome
Abstract:

Objective: The codon composition and codon usage bias of the banana genome were analyzed, and the main factors shaping codon usage bias were investigated. The aim was to provide a reference for improving the expression of exogenous genes in banana and for transgenic disease-resistance breeding.

Method: The CDS sequences of 36242 high-confidence protein-coding genes in the banana genome were used as the data source. CodonW 1.4.4 was used to compute codon composition and codon usage parameters and to identify optimal codons. Correlations among the codon usage parameters were then analyzed.

Result: The 36242 high-confidence protein-coding CDS sequences obtained from the Banana Genome Database had an average length of 1035 bp. Their GC content ranged from 3.0% to 75.8%, with only 13 sequences below 20.0%, and the genome-wide GC content was 50.4%. The frequency of G or C at the third position of synonymous codons (GC3s) was 52.9%, higher than that of A or T. The effective number of codons (ENC) ranged from 20.0 to 61.0, with a mean of 50.7. A total of 17 optimal codons were identified, 15 of which end in G or C. Coding-region length was positively correlated with ENC. As coding-region length increased, the preference for codons ending in G or C gradually weakened. Genes with coding regions of 400-600 bp showed relatively high expression levels.

Conclusion: Most genes in the banana genome show weak codon usage bias. A small fraction of genes, however, show strong bias and prefer codons ending in G or C. Codon usage bias in the banana genome is shaped by several factors, including nucleotide composition, mutation, and natural selection.
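The parameters above were computed with CodonW 1.4.4. The Python sketch below is a rough illustration of what they measure and is not CodonW's own code. It computes per-gene GC content, GC content at synonymous third positions (GC3s), and ENC using Wright's (1990) homozygosity formula. It also computes the expected ENC when GC3s alone drives codon choice, which is the curve commonly drawn in ENC-plots to separate compositional effects from selection.

The sketch rests on several assumptions. It uses the standard genetic code. GC3s excludes Met, Trp, and stop codons. Amino acids seen fewer than twice are skipped. A missing Ile (3-fold) class is filled with the mean of the 2-fold and 4-fold classes. ENC is capped at 61. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code (assumption); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codon families per amino acid (stop codons "*" excluded).
_families = defaultdict(list)
for codon, aa in CODE.items():
    if aa != "*":
        _families[aa].append(codon)
SYN = dict(_families)


def codons(cds):
    """Split a CDS into in-frame codons, dropping any with non-ACGT bases."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in CODE]


def gc_content(cds):
    """Fraction of G+C among A/C/G/T bases in the whole CDS."""
    acgt = [b for b in cds.upper() if b in "ACGT"]
    return sum(b in "GC" for b in acgt) / len(acgt)


def gc3s(cds):
    """G+C at synonymous third positions (Met, Trp and stop codons excluded)."""
    thirds = [c[2] for c in codons(cds) if len(SYN.get(CODE[c], [])) > 1]
    return sum(b in "GC" for b in thirds) / len(thirds)


def enc(cds):
    """Effective number of codons after Wright (1990); None if not estimable."""
    counts = Counter(codons(cds))
    f_by_class = defaultdict(list)  # degeneracy class -> homozygosity values F
    for family in SYN.values():
        k = len(family)
        n = sum(counts[c] for c in family)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no choice; F is undefined for n < 2
        sum_p2 = sum((counts[c] / n) ** 2 for c in family)
        f_by_class[k].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Ile is the only 3-fold family
    nc = 2.0  # Met and Trp each contribute exactly one codon
    for k, n_aa in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if mean_f.get(k, 0.0) <= 0.0:
            return None  # class missing or degenerate: ENC not estimable
        nc += n_aa / mean_f[k]
    return min(nc, 61.0)


def enc_expected(s):
    """Expected ENC if GC3s (s) alone shaped codon usage (ENC-plot curve)."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)
```

The bounds behave as expected. A gene that uses a single codon for every amino acid gets ENC = 20, the lower limit reported above. Near-uniform usage is capped at 61. For a genome-wide run, one would apply `enc()` to each CDS and pair the non-`None` values with CDS lengths. `statistics.correlation` (Python 3.10+) then gives a Pearson coefficient, which is one way to check a length-ENC relationship like the one described. `enc_expected(0.529)` is about 60.3. In ENC-plot terms, a gene with GC3s near 52.9% whose ENC falls well below that value would suggest influences beyond nucleotide composition.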