Citation: SUN Li-na, LIN Mao, HUANG Xu-guang, CHEN Er, YANG Shu-ting, WANG Hua-xin, GONG Jian-ying. 2024: SSR, SNP and InDel characteristics analysis based on transcriptome of Bougainvillea glabra ‘Elizabeth Angus’. Journal of Southern Agriculture, 55(3): 745-753. DOI: 10.3969/j.issn.2095-1191.2024.03.015

SSR, SNP and InDel characteristics analysis based on transcriptome of Bougainvillea glabra ‘Elizabeth Angus’

  • Abstract: 【Objective】To characterize the SSR, SNP and InDel loci of Bougainvillea glabra ‘Elizabeth Angus’ from transcriptome sequencing data, providing a theoretical basis for developing molecular markers, breeding thornless or less thorny cultivars, cultivar identification and analysis of genetic relationships in B. spectabilis Willd. 【Method】Branch thorns and stem segments of B. glabra ‘Elizabeth Angus’ at three developmental stages were used for transcriptome sequencing. The high-quality sequencing data were assembled with Trinity, and SSR, SNP and InDel loci were characterized with MISA and GATK3. 【Result】Across the 18 samples, transcriptome sequencing yielded an average of 45,905,982 bp of raw data and, after quality-control filtering, 45,640,193 bp of clean data. Assembly produced 312,812 transcripts and 144,512 unigenes. A total of 54,516 SSR loci were distributed over 40,820 unigenes, giving an occurrence frequency of 28.25% and an average distribution distance of 2.67 kb; 10,269 unigenes contained more than one SSR locus, accounting for 4.25% of all unigenes. Among the repeat types, mononucleotide, dinucleotide and trinucleotide repeats were dominant: mononucleotide repeats were the most numerous (39,904, 73.20%), followed by dinucleotide repeats (8,169, 14.98%) and trinucleotide repeats (5,899, 10.82%), while pentanucleotide repeats were the fewest (31, 0.06%). A total of 98 repeat motifs were detected across the mononucleotide to hexanucleotide repeat types, with occurrence frequencies of 0.01%-25.71%; the most frequent motif was A/T (37,151), accounting for 68.15% of all SSR loci. The repeat numbers of the SSR motifs were concentrated in the range of 5-23, SSR sequence lengths ranged from 10 to 60 bp, and the average length was 20.38 bp. A total of 231,248 SNP loci and 99,580 InDel loci were detected, with average distribution distances of 1.59 kb for SNP loci and 0.68 kb for InDel loci. For both marker types, unigenes containing a single locus were the most numerous, and the number of unigenes decreased gradually as the number of SNP or InDel loci per unigene increased. 【Conclusion】The transcriptome of B. glabra ‘Elizabeth Angus’ contains numerous SSR loci of diverse types with clear distribution characteristics, which can be used to develop a large number of SSR markers. SNP and InDel loci occur less frequently than in model plants and warrant further mining.
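The percentages reported in the abstract follow directly from the stated counts. As a minimal reader-side sketch (not the authors' pipeline), the snippet below recomputes the SSR occurrence frequency and the share of each repeat type from the figures quoted above. All variable and function names are hypothetical and chosen only for illustration.

```python
# Counts copied from the abstract; all names below are illustrative only.
TOTAL_UNIGENES = 144_512      # unigenes obtained after assembly
UNIGENES_WITH_SSR = 40_820    # unigenes carrying at least one SSR locus
TOTAL_SSR = 54_516            # SSR loci detected in total

SSR_COUNTS = {
    "mononucleotide": 39_904,
    "dinucleotide": 8_169,
    "trinucleotide": 5_899,
    "pentanucleotide": 31,
    "A/T motif": 37_151,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Occurrence frequency: share of unigenes that contain an SSR locus.
ssr_frequency = percent(UNIGENES_WITH_SSR, TOTAL_UNIGENES)

# Share of each repeat type (or motif) among all SSR loci.
ssr_shares = {name: percent(n, TOTAL_SSR) for name, n in SSR_COUNTS.items()}

if __name__ == "__main__":
    print(f"SSR occurrence frequency: {ssr_frequency}%")
    for name, share in ssr_shares.items():
        print(f"{name}: {share}%")
```

Here the occurrence frequency is taken as SSR-containing unigenes divided by all unigenes, and each repeat-type share as its count divided by all SSR loci, which is the reading that reproduces the abstract's 28.25%, 73.20%, 14.98%, 10.82%, 0.06% and 68.15%.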

     
