Abstract:
【Objective】The muscle transcriptome of Trichiurus lepturus was sequenced on a high-throughput sequencing platform. Microsatellite (SSR) loci were mined from the resulting data and characterized by bioinformatics analysis, to provide a reference for the protection and management of the germplasm resources of
T. lepturus.【Method】The Illumina HiSeq™ 2500 high-throughput sequencing platform was used to sequence the muscle transcriptome of
T. lepturus collected from the coastal waters of Zhoushan, Zhejiang Province. Read quality was assessed with FastQC and low-quality reads were filtered with Trimmomatic. Unigenes were obtained by de novo assembly and redundancy removal with Trinity. The MISA (MIcroSAtellite identification) tool was used to identify SSR loci. Excel 2010 was used to calculate the number, occurrence frequency, appearing frequency, distribution distance, density, repeat motifs and repeat length of the SSRs.【Result】A total of 40424018 raw reads were generated by muscle transcriptome sequencing. De novo assembly with Trinity yielded 70113 transcripts, and 50482 unigenes were retained after deduplication, with a total length of 33886190 bp and an average length of 671.25 bp. MISA identified 18873 SSR loci distributed among 13082 unigenes, giving an occurrence frequency of 25.91% (SSR-containing unigenes among all unigenes) and an appearing frequency of 37.39% (SSR loci relative to all unigenes). By repeat unit length, the SSRs fell into six types: mononucleotide, dinucleotide, trinucleotide, tetranucleotide, pentanucleotide and hexanucleotide repeats. Mononucleotide repeats were the most numerous (10763), with a frequency of 21.32%. A total of 173 repeat motifs were detected; tetranucleotide repeats had the most motif types (66) and mononucleotide repeats the fewest (4). Among mononucleotide repeats, A was the most abundant (5167, 48.09%); TG was the dominant dinucleotide repeat (692, 19.55%); GAG was the dominant trinucleotide repeat (206, 10.08%); and among tetranucleotide repeats, AAAC, ATGG, ATGT, CTGT, CTTT and TCCA were the most frequent, each at 3.64%. Pentanucleotide and hexanucleotide repeats were evenly distributed, with no obvious dominant motif.
SSRs with 5-6 and 10-12 repeats of the motif were the most common (9864), accounting for 52.27% of all SSRs.【Conclusion】The SSRs obtained by high-throughput sequencing show relatively high availability and polymorphism potential. On this basis, primers can be designed for specific loci to develop effective molecular markers for genetic diversity evaluation, genetic structure analysis and functional gene cloning in T. lepturus, which in turn will provide genetic data for the further protection and utilization of its germplasm resources.
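
The summary percentages reported in the Result section can be rechecked from the raw counts. The sketch below is a minimal illustration, not the authors' Excel workflow: the constant and function names are hypothetical, and the definitions used (each frequency taken as a count divided by the total number of unigenes, and the repeat-count share taken relative to all SSR loci) are assumptions inferred from the reported figures.

```python
# Counts copied from the abstract.
TOTAL_UNIGENES = 50482
TOTAL_UNIGENE_LENGTH_BP = 33886190
SSR_LOCI = 18873
UNIGENES_WITH_SSR = 13082
MONONUCLEOTIDE_SSRS = 10763
SSRS_WITH_5_6_OR_10_12_REPEATS = 9864


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


def summarize() -> dict:
    """Recompute the summary statistics under the assumed definitions."""
    return {
        # Assumed: total unigene length / number of unigenes.
        "average_unigene_length_bp": round(
            TOTAL_UNIGENE_LENGTH_BP / TOTAL_UNIGENES, 2
        ),
        # Assumed: SSR-containing unigenes / all unigenes.
        "occurrence_frequency_pct": percent(UNIGENES_WITH_SSR, TOTAL_UNIGENES),
        # Assumed: SSR loci / all unigenes.
        "appearing_frequency_pct": percent(SSR_LOCI, TOTAL_UNIGENES),
        # Assumed: mononucleotide SSRs / all unigenes.
        "mononucleotide_frequency_pct": percent(
            MONONUCLEOTIDE_SSRS, TOTAL_UNIGENES
        ),
        # Assumed: SSRs with 5-6 or 10-12 repeats / all SSR loci.
        "repeat_count_share_pct": percent(
            SSRS_WITH_5_6_OR_10_12_REPEATS, SSR_LOCI
        ),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these assumed definitions, the recomputed values agree with the figures quoted in the abstract (671.25 bp, 25.91%, 37.39%, 21.32% and 52.27%), which suggests that all frequencies use the 50482 unigenes as the denominator.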