
Microsatellite development from genome skimming and transcriptome sequencing: Comparison of strategies and lessons from frog species


Summary

- Background: Even though microsatellite loci have frequently been isolated using recently developed next-generation sequencing (NGS) techniques, this task remains difficult because the subsequent polymorphism screening requires a substantial amount of time.
- Results: The results revealed that the number of isolated microsatellites increases with increased data quantity and read length.
- Larger k-mer sizes produced a smaller total number of microsatellite loci, but these loci had a longer repeat length, suggesting greater polymorphism.
- Indeed, hundreds of microsatellite loci were isolated without any screening steps using this high-throughput sequencing technology [7, 8].
- The popularity of using NGS varies not only by platform but also by the source of the sequences.
- A well-known relationship exists between microsatellite length and polymorphism level, because the mutation rate increases with the number of repeat units within an appropriate range [1, 26].
- Loci with more repeat units generally show higher mutation rates because replication slippage may increase in proportion to the number of repeats [1, 27].
- Obtaining loci with a high number of repeats may improve the polymorphism of microsatellites and their allelic richness, which is especially important for species with low genetic variability.
- Genome skimming is by far one of the simplest methodologies for NGS, involving random sampling of a small percentage of total genomic DNA, which can obtain sequences containing plenty of microsatellites [35].
- However, such comprehensive evaluations have been reported only in fragmentary form for a limited number of species [11, 17].
- A deeper understanding of the isolation of microsatellites from NGS is therefore needed, not only to understand evolutionary and mutational properties of microsatellites but also to appropriately use microsatellite markers in ecological and evolutionary studies.
- Additionally, we investigated how the quantity or sequencing depth of NGS data, read length, and assembly strategy affect the number and polymorphism of the isolated microsatellites.
- De novo assembly of the cleaned reads for genome sequencing for each dataset was performed using SOAPdenovo2 [44] and Trinity [45].
- For each assembly, we monitored the change in total number of contigs and N50 size over the assessed parameter range.
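The N50 size monitored for each assembly can be computed directly from a list of contig lengths. The following is a minimal Python sketch, not the authors' pipeline; the function name `n50` and the contig lengths for the two k-mer sizes are hypothetical and serve only as an illustration.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths from two assemblies of the same reads,
# keyed by the k-mer size used (values invented for illustration).
assemblies = {25: [900, 700, 400, 300, 200], 60: [1500, 600, 300]}
for k, lengths in assemblies.items():
    print(f"k={k}: contigs={len(lengths)}, N50={n50(lengths)}")
```

Reporting the contig count next to N50 mirrors the two quantities tracked across the parameter range in the text.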
- For transcriptome sequencing, Trinity and SOAPdenovo2 (8 k-mer sizes between 25 and 60, with a step size of 5) were used for the assembly of the cleaned reads.
- The identification of microsatellite loci from the assembly data was performed with QDD3 v3.1.2 [46].
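To make the idea of detecting microsatellite loci in assembled contigs concrete, the sketch below scans a sequence for perfect di-, tri- and tetranucleotide tandem repeats with a regular expression. It is an illustration only and does not reproduce the QDD or MISA algorithms; the function names and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for the example, not values taken from the article.

```python
import re

# Assumed minimum repeat counts per motif length (di-, tri-, tetranucleotide).
# These thresholds are hypothetical, not the settings used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5}


def is_repeat_of_shorter_unit(motif):
    """True for motifs such as 'AA' or 'ACAC' that are built from a shorter unit."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(sequence, min_repeats=None):
    """Return (start, motif, repeat_count) for each perfect tandem repeat found."""
    thresholds = MIN_REPEATS if min_repeats is None else min_repeats
    seq = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(thresholds.items()):
        # Group 2 captures the motif; \2{n,} requires n further copies of it.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if is_repeat_of_shorter_unit(motif):
                continue  # skip homopolymers and motifs already counted at a shorter length
            hits.append((match.start(), motif, len(match.group(1)) // motif_len))
    return hits


contig = "TTG" + "AC" * 7 + "GGT" + "AGC" * 5 + "TT" + "A" * 12
for start, motif, count in find_ssrs(contig):
    print(start, motif, count)
```

A real pipeline would additionally check that enough flanking sequence exists on both sides of each repeat for primer design, which is why primer-aware and primer-agnostic tools can report different SSR counts.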
- To test the polymorphism and stability of the detected SSRs, we chose Amolops mantzorum and Quasipaa boulengeri for PCR amplification.
- Eight samples for each species were used for preliminary testing of the primer amplification success rate.
- The total number of contigs and the average length of contigs are shown in Fig.
- The smaller k-mer sizes led to the detection of more microsatellite loci with flanking sequences that could be used for primer design (Fig.
- All the assembly strategies produced sufficient numbers of microsatellite loci for further analysis; even the lowest number, produced by Trinity, was greater than 1000.
- From the results of MISA, the number of SSRs decreased with increasing k-mer sizes, while the maximum RRL increased with increasing k-mer sizes (Additional file 4: Table S4).
- Take Amolops chunganensis for example, the largest number of SSRs is when K and the smallest number of SSRs is when K .
- Notably, the numbers of identified SSRs differed between the two programs, since MISA counts all identified SSRs, while QDD counts only the identified SSRs with primers.
- Table 1 Overview of the sequencing data from Illumina next-generation sequencing for each species of frog.
- For RNA-Seq, Trinity outperformed SOAPdenovo in average contig length and total number of microsatellite loci at any k-mer size (Additional file 5:
- The total number of contigs for RNA-Seq was much smaller than that obtained for genomic sequencing, but the average length of contigs was longer with RNA-Seq for both species.
- Both the total number of microsatellite loci and the maximum RRL in RNA-Seq were clearly lower than those obtained from genomic sequencing (p <.
- Figure panels: a Total number of contigs; c Total number of microsatellites isolated using each assembler; d Maximum repeat region length of microsatellites for each assembler.
- Table 2 Assembly and microsatellite loci detection statistics for transcriptome sequencing using Trinity.
- Total number of SSRs: 958 (Amolops mantzorum), 1554 (Quasipaa boulengeri).
- The total number of identified microsatellite loci was reduced by the decline in the quantity of data.
- As expected, the number of microsatellite loci was associated with the size of the dataset.
- For instance, a twofold increase in data quantity, such as from half to full in Amolops mantzorum, resulted in a nearly twofold increase in the number of microsatellite loci detected.
- Furthermore, the number of SSRs for each type of nucleotide repeat also followed this tendency.
- In other words, the smaller the quantity of NGS data, the fewer the microsatellite loci identified.
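A data quantity simulation of this kind can be approximated by randomly subsampling read pairs before assembly. The sketch below is one way to do this under stated assumptions: the article does not describe the tool used, the function name `subsample_pairs` is hypothetical, and reads are represented as simple in-memory tuples rather than FASTQ files.

```python
import random


def subsample_pairs(read_pairs, fraction, seed=0):
    """Keep a random fraction of read pairs, preserving mate pairing and input order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    n_keep = round(len(read_pairs) * fraction)
    keep = sorted(rng.sample(range(len(read_pairs)), n_keep))
    return [read_pairs[i] for i in keep]


# Hypothetical read pairs identified only by name, for illustration.
pairs = [(f"read{i}/1", f"read{i}/2") for i in range(1000)]
for fraction in (1.0, 0.5, 0.25):
    subset = subsample_pairs(pairs, fraction)
    print(fraction, len(subset))
```

Sampling pairs rather than individual reads keeps both mates together, which paired-end assemblers expect.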
- The relation of the k-mer sizes to the data quantity reflected that the total number of contigs and microsatellites decreased with increasing k-mer sizes (Fig.
- To test how the read length of the NGS data affects the final quality and the number of SSR loci, we selected the genome sequencing data from Amolops chunganensis for length simulation.
- The shorter the read length, the fewer microsatellite loci were identified.
- Interestingly, the number of microsatellite loci identified was nearly equal for read lengths of 125 and 150, implying that for short-read assembly, unless the read length increases dramatically, it does not directly influence the number of microsatellite loci isolated.
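A read length simulation can be sketched by truncating every read to a target length before assembly. This is an assumption about how such a simulation might be set up, not a description of the authors' procedure; the function name `trim_reads` and the toy record are hypothetical, and only the lengths 125 and 150 come from the text.

```python
def trim_reads(records, target_length):
    """Truncate each (name, sequence, quality) record to at most target_length bases."""
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    trimmed = []
    for name, seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {name}")
        # Keep the 5' end of the read; base quality usually drops toward the 3' end.
        trimmed.append((name, seq[:target_length], qual[:target_length]))
    return trimmed


# One hypothetical 150-bp read with uniform quality, for illustration.
reads = [("read1", "ACGTA" * 30, "I" * 150)]
for length in (150, 125):
    name, seq, qual = trim_reads(reads, length)[0]
    print(length, len(seq), len(qual))
```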
- With increasing k-mer sizes, the total number of contigs and microsatellites decreased (Fig.
- The number of alleles per locus ranged from 2 to 21 with an average of 9.
- The number of alleles ranged from 2 to 23, with an average of 7.23.
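The two polymorphism measures used here, the number of alleles (Na) and the observed heterozygosity (Ho), follow directly from diploid genotypes at a locus. The sketch below assumes genotypes are stored as pairs of allele sizes with `None` for failed amplifications; the function name `allele_stats` and the genotype values are invented for illustration and are not data from the study.

```python
def allele_stats(genotypes):
    """Return (Na, Ho) for one locus from diploid genotypes.

    genotypes: list of (allele1, allele2) tuples; None marks a missing genotype.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    alleles = {allele for genotype in called for allele in genotype}
    n_heterozygous = sum(1 for a, b in called if a != b)
    return len(alleles), n_heterozygous / len(called)


# Hypothetical allele sizes (bp) for nine individuals, one of which failed.
locus = [(152, 156), (152, 152), (156, 160), None, (160, 164),
         (152, 164), (156, 156), (148, 152), (160, 160)]
na, ho = allele_stats(locus)
print(f"Na={na}, Ho={ho:.3f}")
```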
- Both the number of alleles (Na) and the observed heterozygosity (Ho) were affected by the sources of omics data (Fig.
- The number of repeats positively influenced the Na for genomic data (for A.
- However, the number of repeats and Na were not significantly correlated in transcriptomic SSRs (for A.
- Fig. 2 Relation of the k-mer sizes to the data quantity (a-d) and to the read lengths (e, f) from genomic datasets. Panels a and c: total number of contigs from the data quantity simulations of Amolops mantzorum and Quasipaa boulengeri. Panels b and d: total number of identified microsatellites from the same simulations. Panels e and f: total number of contigs and of identified microsatellites from the read length simulation.
- a The shorter the read length, the fewer the microsatellite loci identified.
- b The number of dinucleotides, trinucleotides and tetranucleotides identified using different read lengths.
- Fig. 4 Comparison of the polymorphism between transcriptomic and genomic microsatellite loci within Amolops mantzorum and Quasipaa boulengeri. a The number of alleles (Na).
- Although the abundance of microsatellites differs between taxa and even closely related species, millions of microsatellite loci have been detected in human and mouse genomes [1].
- The results from our data quantity simulation revealed that the number of isolated microsatellites increases following an increase in the data quantity when the data quantity is smaller than the entire genome (Table 3, Fig.
- For other species, the coverage could be estimated at 1–2X from the C-values of closely related frog species (C-value: 2–7).
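The coverage estimate can be reproduced as total sequenced bases divided by genome size, with genome size derived from the C-value. The sketch below assumes the C-value is given in picograms and uses the common approximation that 1 pg of DNA corresponds to about 0.978 × 10^9 bp; the function name `estimated_coverage`, the 5 Gb of sequence data and the C-value of 4.0 are hypothetical inputs, not figures from the article.

```python
# Approximate conversion factor: 1 pg of DNA is about 0.978e9 base pairs.
BP_PER_PG = 0.978e9


def estimated_coverage(total_bases, c_value_pg):
    """Estimate sequencing depth from data quantity and a haploid C-value in pg."""
    if c_value_pg <= 0:
        raise ValueError("C-value must be positive")
    genome_size_bp = c_value_pg * BP_PER_PG
    return total_bases / genome_size_bp


# Hypothetical example: 5 Gb of cleaned reads, C-value of 4.0 pg
# borrowed from a closely related species.
depth = estimated_coverage(total_bases=5e9, c_value_pg=4.0)
print(f"estimated coverage: {depth:.2f}X")
```

Because the C-value comes from a related species rather than the sequenced one, the result is best read as an order-of-magnitude estimate.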
- Between species, the number of isolated microsatellites does not increase with data quantity.
- Read length is considered to be one of the most important factors for microsatellite development using NGS [12, 17].
- MiSeq has the longest read lengths (2 × 300 bp) of the Illumina platforms, but its output consists of relatively few reads.
- Fig. 5 Relation of the number of repeat units to the polymorphism for genomic and transcriptomic microsatellite loci in two frog species. a Amolops mantzorum. b Quasipaa boulengeri.
- This indicates that microsatellite development from short sequence read assembly would also be influenced by both the length of the flanking regions and the repeat length of microsatellites.
- However, the choice of an appropriate k-mer size is one of the major difficulties in using DBG assemblers [39].
- The difference in microsatellite yields can be attributed to the low coverage of genome skimming and the properties of DBG.
- Trinity's Inchworm step excludes both low-complexity and singleton k-mers as initial seeds for contig extension, which results in the low number of contigs and microsatellites [71].
- For transcriptome sequencing, Trinity is better than SOAPdenovo at any k-mer size with regard to average length of contigs and total number of microsatellite loci (Additional file 5: Table S5).
- Whatever the methods used, finding polymorphic microsatellite loci is the critical issue.
- The mutation process seems to be heterogeneous with respect to loci, repeat types and organisms, but it is widely accepted that microsatellites with a greater number of repeats are more mutable.
- Alleles with a higher number of repeats often mutate at a higher rate, and the relationship between length and rate has been reported to be exponential rather than linear [23].
- The results from genomic SSRs revealed that the number of repeats or repeat lengths positively influenced the Na (p <.
- Transcriptomic SSRs are expected to display lower levels of polymorphism than genomic SSRs, as they are associated with conserved regions of the genome [19, 75].
- Because of the functional constraint on transcriptome SSRs, it is easier to find functionally linked loci from the transcriptome than from the genome.
- Overall, these results provide a framework for a deeper understanding of the evolution and distribution of microsatellites and of how different isolation strategies affect microsatellite development using NGS.
- By using genome skimming and transcriptomic sequencing of non-model frog species, we investigated how the assembly strategy, read length, sequencing depth, and library layout affect the number and polymorphism of the isolated microsatellites.
- These results provide a framework for a deeper understanding of the evolution and isolation strategies of microsatellites: 1) the number of isolated microsatellites increases with increased data quantity and read length.
- Characteristics of the microsatellite markers for Amolops mantzorum and Quasipaa boulengeri.
- Number of identified SSRs by MISA from genomic data of Amolops chunganensis and Amolops mantzorum.
- Na: Number of alleles.
- Rise of the machines – recommendations for ecologists when using next generation sequencing for microsatellite development.
- Microsatellite marker development by multiplex ion torrent PGM sequencing: a case study of the endangered Odorrana narina complex of frogs.
- estimation of success rates of microsatellite loci development in selected newt species (Calotriton asper, Lissotriton helveticus, and Triturus cristatus) and comparison with Illumina-based approaches.
- Mutational dynamics of microsatellites.
- Efficient isolation of polymorphic microsatellites from high-throughput sequence data based on number of repeats.
- Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci.
- genepop'007: a complete re-implementation of the genepop software for Windows and Linux.
- Isolation and characterization of eleven polymorphic tetranucleotide microsatellite loci for Quasipaa boulengeri (Anura: Dicroglossidae).
- Isolation of highly polymorphic microsatellite loci for a species with a large genome size:.
- iMSAT: a novel approach to the development of microsatellite loci using barcoded Illumina libraries
