Analysis of SSR and SNP characteristics in the calyx of Clematis florida Thunb. based on transcriptome sequencing
-
Abstract
Objective: This study aimed to identify and characterize simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) loci in transcriptome sequencing data from the calyx of Clematis florida Thunb. The goal was to develop related molecular markers and thereby provide a theoretical basis for germplasm resource conservation, marker-assisted breeding, and population genetics research in Clematis florida Thunb.
Method: Calyx samples were collected from three Clematis florida Thunb. varieties: 'Henryi', 'Polish Spirit', and 'Mme Julia Correvon'. The high-quality sequencing data were assembled using Trinity 2.8.5, and the SSR and SNP characteristics of the transcriptome were then analyzed using MISA and GATK4, respectively.
Result: A total of 351,006,616 raw reads were obtained from the calyx transcriptome of Clematis florida Thunb., of which 350,133,730 clean reads were retained after filtering. Assembly and redundancy removal generated 84,765 unigenes, in which 7,570 SSR loci were identified, an average density of one SSR per 9.55 kb. The SSR loci belonged to 6 main repeat types and were dominated by dinucleotide and trinucleotide repeats, which accounted for 48.68% and 32.91% of all SSR loci, respectively. The 6 repeat types comprised 269 distinct repeat motifs. Among dinucleotide repeats, CT/TC was the most prevalent (38.42%); among trinucleotide repeats, AGA/TCT was dominant (9.35%). Repeat counts ranged from 4 to 37, and the number of SSR loci decreased as the repeat count increased. Loci with 4 to 14 repeats were the most common, accounting for 88.48% of all SSRs. SSR length ranged from 12 to 294 bp, with an average of 25.79 bp, and most loci were 12 to 30 bp long. From the sequences of the 7,570 SSR loci, 6,037 primer pairs were designed, a success rate of 79.75%. Of 14 randomly selected primer pairs, 71.43% produced amplification products.
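The SSR search described above can be illustrated with a minimal Python sketch of perfect-repeat detection. It is not the MISA implementation, and the abstract does not report the MISA settings used: the minimum repeat counts in `MIN_REPEATS`, the function names `is_primitive` and `find_ssrs`, and the example sequence are all assumptions chosen only so that the shortest reported locus (12 bp) and the lowest reported repeat count (4) are possible.

```python
# Hypothetical minimum repeat counts per motif length (1 = mono ... 6 = hexa).
# These thresholds are assumptions; the abstract does not state the MISA settings.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Scan left to right and return (start, motif, repeat_count, length_bp)
    for each perfect SSR that meets the minimum repeat count for its motif length."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip truncated, ambiguous (non-ACGT), or non-primitive motifs.
            if len(motif) < k or not set(motif) <= set("ACGT") or not is_primitive(motif):
                continue
            # Count consecutive copies of the motif starting at position i.
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k]:
                hits.append((i, motif, count, count * k))
                i += count * k  # continue after the reported locus
                break
        else:
            i += 1  # no SSR starts here
    return hits


# Toy sequence (not from the study): a (CT)8 and an (AGA)5 repeat.
example = "GG" + "CT" * 8 + "G" + "AGA" * 5 + "C"
print(find_ssrs(example))  # [(2, 'CT', 8, 16), (19, 'AGA', 5, 15)]
```

Summary figures such as the share of each repeat type or the mean locus length would then follow from tallying the returned tuples over all unigenes. The sketch ignores compound and imperfect repeats, which a dedicated tool may handle differently.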
For SNP loci, SNP counts ranged from 784,437 to 828,387 in the transcriptome data of the 9 calyx samples of Clematis florida Thunb. Transition types (A/G, G/A, C/T, T/C) accounted for 63.33% of all SNPs, while transversion types (A/T, T/A, C/A, A/C, G/T, T/G, C/G, G/C) accounted for 36.67%.
Conclusion: The calyx transcriptome of Clematis florida Thunb. contains abundant SSR loci that are widely distributed at moderate density. SSR length and repeat count vary considerably. The dominant SSR repeat motifs (A/T, CT/TC, and AG/GA) may be associated with codon usage bias. Both SSRs and SNPs exhibit high polymorphism, making them suitable for genetic diversity studies of Clematis.
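The transition and transversion proportions reported for the SNPs can be illustrated with a short sketch that uses the standard definition: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and every other substitution is a transversion. The function names `classify_snp` and `ts_tv_summary` and the toy SNP list are hypothetical; in practice the reference and alternate alleles would be read from the variant calls, which is not shown here.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a biallelic substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid SNP: {ref}/{alt}")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_summary(snps):
    """Return the percentage of transitions and transversions among (ref, alt) pairs."""
    counts = Counter(classify_snp(ref, alt) for ref, alt in snps)
    total = sum(counts.values())
    return {
        kind: round(100 * counts[kind] / total, 2)
        for kind in ("transition", "transversion")
    }


# Toy SNP list (not data from the study).
toy_snps = [("C", "T"), ("G", "A"), ("A", "C"), ("T", "G"), ("A", "G")]
print(ts_tv_summary(toy_snps))  # {'transition': 60.0, 'transversion': 40.0}
```

Applied to all called SNPs, the same tally would yield percentages comparable to the 63.33% and 36.67% reported above.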