Citation: LIU Yu-ping, WANG Qi, HUANG Xin-xin, XU Kai-da, WANG Zhong-ming, GAO Tian-xiang, YANG Tian-yan. 2022: Bioinformatic analysis of microsatellite loci in the muscle transcriptome of Trichiurus lepturus based on high-throughput sequencing. Journal of Southern Agriculture, 53(3): 725-734. DOI: 10.3969/j.issn.2095-1191.2022.03.014

Bioinformatic analysis of microsatellite loci in the muscle transcriptome of Trichiurus lepturus based on high-throughput sequencing

Abstract: 【Objective】The muscle transcriptome of Trichiurus lepturus was sequenced on a high-throughput sequencing platform, and microsatellite (SSR) loci were mined from the resulting data and analyzed bioinformatically, to provide a reference for formulating germplasm conservation strategies and management measures for T. lepturus.【Method】Muscle tissue of T. lepturus collected from the coastal waters off Zhoushan, Zhejiang Province, was sequenced on the Illumina HiSeq™ 2500 platform. Read quality was assessed with FastQC and reads were filtered with Trimmomatic; the filtered reads were assembled de novo with Trinity and redundancy was removed to obtain unigenes. SSRs were identified with the Micro-Satellite (MISA) tool, and Excel 2010 was used to calculate the number, occurrence frequency, appearing frequency, distribution distance and density, repeat motif types and repeat length of the SSRs.【Result】Transcriptome sequencing of the muscle tissue yielded 40424018 raw reads. Trinity de novo assembly produced 70113 transcripts, and 50482 unigenes remained after redundancy removal, with a total length of 33886190 bp and an average length of 671.25 bp. MISA identified 18873 SSR loci located on 13082 of these unigenes, giving an occurrence frequency of 25.91% and an appearing frequency of 37.39%. By repeat unit length, the SSRs fell into six types: mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide and hexanucleotide repeats. Mononucleotide repeats were the most numerous (10763), with an appearing frequency of 21.32%. A total of 173 repeat motifs were detected; tetranucleotide repeats had the most motif types (66) and mononucleotide repeats the fewest (4). Among mononucleotide repeats, A was the most abundant (5167, 48.09%); TG dominated the dinucleotide repeats (692, 19.55%) and GAG the trinucleotide repeats (206, 10.08%). The most frequent tetranucleotide motifs were AAAC, ATGG, ATGT, CTGT, CTTT and TCCA, each at 3.64%. Pentanucleotide and hexanucleotide motifs were evenly distributed, with no obviously dominant motif. Most SSRs carried 5-6 or 10-12 repeats of their motif (9864 loci in total), accounting for 52.27% of all SSRs.【Conclusion】The SSRs obtained from the T. lepturus muscle transcriptome by high-throughput sequencing are highly usable and show high polymorphism potential. Primers can be designed on this basis to provide effective molecular markers for studies of genetic diversity evaluation, genetic structure analysis and functional gene cloning in T. lepturus, and thus provide genetic data for the conservation and utilization of its germplasm resources.
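The SSR search and the summary statistics described above can be sketched in Python. This is not MISA itself but a minimal, hypothetical re-implementation that assumes MISA-style minimum repeat numbers (1-10, 2-6, 3-5, 4-5, 5-5, 6-5), which the abstract does not report; it detects only perfect repeats and does not merge compound SSRs. The names `find_ssrs`, `is_redundant` and `summarize` are illustrative. Occurrence frequency is taken as the share of unigenes carrying at least one SSR, and appearing frequency as SSR loci per unigene: with the abstract's counts these reproduce 25.91% (13082/50482) and 37.39% (18873/50482), and the same denominator gives the 21.32% reported for mononucleotide SSRs (10763/50482). The distance and density formulas are assumed definitions, and the abstract does not report their values.

```python
from collections import Counter
from dataclasses import dataclass

# ASSUMPTION: MISA-style minimum repeat numbers per motif length.
# The thresholds actually used in the study are not stated in the abstract.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


@dataclass
class SSR:
    seq_id: str
    motif: str
    repeats: int
    start: int  # 0-based start
    end: int    # 0-based, exclusive


def is_redundant(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq_id: str, seq: str) -> list[SSR]:
    """Greedy left-to-right scan for perfect SSRs; compound SSRs are not merged."""
    seq = seq.upper()
    hits: list[SSR] = []
    i = 0
    while i < len(seq):
        found = None
        for unit in range(1, 7):
            motif = seq[i:i + unit]
            if len(motif) < unit:
                break  # too close to the end of the sequence for longer motifs
            if set(motif) - set("ACGT") or is_redundant(motif):
                continue  # skip ambiguous bases and non-minimal motifs
            n = 1
            while seq[i + n * unit:i + (n + 1) * unit] == motif:
                n += 1
            if n >= MIN_REPEATS[unit]:
                found = SSR(seq_id, motif, n, i, i + n * unit)
                break  # shortest qualifying motif wins
        if found:
            hits.append(found)
            i = found.end  # resume scanning after the repeat
        else:
            i += 1
    return hits


def summarize(n_unigenes: int, n_with_ssr: int, n_ssr: int, total_bp: int) -> dict:
    """SSR summary statistics; distance and density use assumed definitions."""
    return {
        "occurrence_frequency": n_with_ssr / n_unigenes,  # unigenes with >=1 SSR / all unigenes
        "appearing_frequency": n_ssr / n_unigenes,        # SSR loci / all unigenes
        "mean_distance_bp": total_bp / n_ssr,             # assumed: one SSR every N bp
        "density_per_mb": n_ssr / (total_bp / 1e6),       # assumed: SSRs per Mb of unigene sequence
    }


if __name__ == "__main__":
    toy = "GG" + "A" * 12 + "C" + "AT" * 7 + "G"
    ssrs = find_ssrs("unigene_1", toy)
    print([(s.motif, s.repeats, s.start) for s in ssrs])  # [('A', 12, 2), ('AT', 7, 15)]
    print(Counter(len(s.motif) for s in ssrs))             # tally of repeat types by motif length

    # Counts reported in the abstract
    stats = summarize(n_unigenes=50482, n_with_ssr=13082, n_ssr=18873, total_bp=33886190)
    print(f"{stats['occurrence_frequency']:.2%}, {stats['appearing_frequency']:.2%}")  # 25.91%, 37.39%
```

Taking the shortest qualifying motif first keeps a run of twelve A's classified as a mononucleotide repeat rather than as a repeat of `AAAAAA`; the redundancy check applies the same rule to motifs such as `ATAT`.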

     
