Analysis of SSR and SNP sequence features and phylogeny of the transcriptome of Narenga porphyrocoma (Hance) Bor.


    Abstract: 【Objective】The study analyzed the SSR and SNP sequence features and the phylogeny of the transcriptome of the wild sugarcane species Narenga porphyrocoma (Hance) Bor. The aim was to provide a reference for in-depth research on molecular marker development, germplasm resource utilization, population genetic structure and the history of divergence in sugarcane plants.
【Method】Using the transcriptome data of N. porphyrocoma, MISA and SOAPsnp were applied to identify SSR and SNP loci in the assembled Unigenes and to analyze their sequence features. Transcriptome data of Brachypodium distachyon, Oryza sativa, Setaria italica, Sorghum bicolor, Zea mays and Arabidopsis thaliana were downloaded from the JGI database. A phylogenetic tree was then constructed with the maximum likelihood (ML) method, and species divergence times were estimated.
【Result】Transcriptome sequencing of N. porphyrocoma yielded 171000000 raw reads, of which 156800000 clean reads remained after filtering. Assembly of the clean reads produced 130393 Unigenes. Among them, 14233 Unigenes contained 16372 SSR loci, giving an occurrence frequency of 12.56% (SSR loci relative to all Unigenes). A total of 1839 Unigenes contained more than one SSR locus, and 656 SSR loci were compound. SSR repeat motifs were diverse, ranging from mononucleotide to hexanucleotide repeats, with 612 motif types in total. Trinucleotide repeats were the most abundant (49.16%), followed by dinucleotide (25.54%) and mononucleotide (18.30%) repeats. Across all repeat types, 14 types each made up less than 0.50% of the total SSR loci, and the three most frequent motifs were CCG/CGG, A/T and AG/CT. SSR lengths ranged from 12 to 191 bp. A total of 15123 SSR loci were ≤25 bp long, accounting for 96.23% of all SSR loci, and 15 bp was the most common length (32.16% of all SSR loci). Motifs were repeated 4 to 24 times, with 5, 6 and 7 repeats predominating. The N. porphyrocoma transcriptome contained 222106 SNP loci, an average of 1.70 SNP loci per Unigene. Transitions (65.92%) were markedly more frequent than transversions (34.08%). Among the six single-nucleotide variant types, A/G was the most frequent (33.07%), followed by C/T (32.84%). Phylogenetic analysis and divergence time estimation showed that N. porphyrocoma is most closely related to S. bicolor, from which it diverged an estimated 14.6 million years ago (Ma).
【Conclusion】SSR and SNP loci are abundant in the N. porphyrocoma transcriptome and show high genetic polymorphism. This indicates that transcriptome sequencing is a feasible approach for developing SSR and SNP molecular markers in sugarcane. Building phylogenetic trees from transcriptome data can also be applied to phylogenetic studies of other species that lack genomic data.

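Motif labels such as CCG/CGG, A/T and AG/CT denote classes. Each class pools a motif with its cyclic rotations and with the matching motifs on the reverse-complement strand. For example, CCG, CGC and GCC on one strand pair with CGG, GGC and GCG on the other. The sketch below names each class by the lexicographically smallest rotation on each strand. This is a common convention assumed here, not a procedure stated in the study, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (ACGT only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def min_rotation(motif: str) -> str:
    """Lexicographically smallest cyclic rotation, e.g. 'GCC' -> 'CCG'."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def motif_class(motif: str) -> str:
    """Strand- and phase-independent label such as 'CCG/CGG' (assumed convention)."""
    motif = motif.upper()
    forward = min_rotation(motif)
    reverse = min_rotation(reverse_complement(motif))
    first, second = sorted((forward, reverse))
    return f"{first}/{second}"


def tally_classes(motifs: Iterable[str]) -> Counter:
    """Count SSR motifs per class, as in a motif-frequency table."""
    return Counter(motif_class(m) for m in motifs)


if __name__ == "__main__":
    print(motif_class("GCC"), motif_class("TC"), motif_class("T"))  # CCG/CGG AG/CT A/T
```

Palindromic motifs such as AT come out as "AT/AT", because both strands give the same smallest rotation.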
     
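Once SNPs have been called (SOAPsnp in this study), classifying them is straightforward. A/G and C/T changes (purine to purine, or pyrimidine to pyrimidine) are transitions. The other four pairs (A/C, A/T, C/G, G/T) are transversions. The sketch below assumes a hypothetical input of (reference, alternative) base pairs. It does not parse SOAPsnp output, and its function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

_BASES = {"A", "C", "G", "T"}
_PURINES = frozenset("AG")
_PYRIMIDINES = frozenset("CT")


def variant_label(ref: str, alt: str) -> str:
    """One of the six unordered types: A/C, A/G, A/T, C/G, C/T, G/T."""
    a, b = sorted((ref.upper(), alt.upper()))
    if a == b or not {a, b} <= _BASES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return f"{a}/{b}"


def is_transition(ref: str, alt: str) -> bool:
    """Purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) substitution."""
    pair = frozenset(variant_label(ref, alt).split("/"))
    return pair == _PURINES or pair == _PYRIMIDINES


def summarize(snps: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Percentages of transitions, transversions and each observed variant type."""
    labels = Counter()
    ts = total = 0
    for ref, alt in snps:
        labels[variant_label(ref, alt)] += 1
        if is_transition(ref, alt):
            ts += 1
        total += 1
    if total == 0:
        raise ValueError("no SNPs given")
    result = {"transition": 100 * ts / total,
              "transversion": 100 * (total - ts) / total}
    result.update({label: 100 * n / total for label, n in labels.items()})
    return result


if __name__ == "__main__":
    calls = [("A", "G"), ("T", "C"), ("A", "T"), ("G", "A")]
    print(summarize(calls))
```

Applied to the reported percentages, the transition/transversion ratio is 65.92/34.08, or about 1.93. Likewise, 222106 SNPs over 130393 Unigenes gives about 1.70 SNPs per Unigene.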

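The abstract does not state how the 14.6 Ma divergence estimate was calibrated. Divergence dating typically uses a multi-species tree with fossil or secondary calibration points. The sketch below is a much simpler illustration of the underlying idea and is not the study's method. It converts a pairwise p-distance into a Jukes–Cantor (JC69) distance, which is the maximum likelihood estimate of distance for two sequences under that model. It then turns that distance into a time under a strict molecular clock. The substitution rate is a user-supplied, hypothetical parameter, and the function names are illustrative.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, skipping columns with gaps or ambiguity codes."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (JC69) distance in substitutions per site.

    For two sequences under JC69 this closed form is the maximum likelihood
    estimate of the distance.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); larger values are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time(distance: float, rate: float) -> float:
    """Strict molecular clock: distance = 2 * rate * T, so T = distance / (2 * rate).

    `rate` is substitutions per site per year along each lineage; any value
    passed in is an assumption, not a calibration from the study.
    """
    return distance / (2 * rate)


if __name__ == "__main__":
    d = jc69_distance(p_distance("ACGTACGTAC", "ACGTACGTTC"))
    # Hypothetical rate of 1e-9 substitutions/site/year, for illustration only.
    print(d, divergence_time(d, 1e-9))
```

The factor of 2 reflects that substitutions accumulate along both lineages after they split. A tree-based analysis replaces this single pairwise distance with branch lengths fitted jointly across all species.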