
Re-examination of two diatom reference genomes using long-read sequencing



- Results: We have used Oxford Nanopore Technologies long-read sequencing to update and validate the quality and contiguity of the T.
- The first diatom nuclear genomes to be sequenced were those of the radial centric diatom Thalassiosira pseudonana (CCMP1335) [3] and the pennate diatom Phaeodactylum tricornutum (CCMP632) [11].
- The original version of the T.
- Optical restriction site mapping further resolved 90% of the scaffolds to 24 chromosomes [3].
- Despite the relatively recent divergence of the Mediophyceae and Bacillariophyceae.
- of the P.
- Since the release of the T.
- Some of the more recently released diatom genomes (e.g., Cyclotella cryptica, Fragilariopsis cylindrus, and Seminavis robusta) were generated using long-read sequencing.
- The quality and contiguity of long-read derived genome assemblies can potentially be further improved using optical genome mapping and scaffolding, such as the Bionano Genomics platform [35] (for a detailed overview of the Bionano Genomics workflow, refer to the following resources: [36–39]).
- In this study, we used Oxford Nanopore long-read sequencing to produce updated versions of the T.
- tricornutum, 76.8% of the unfiltered nanopore reads aligned to the reference genome with an average of 73.7% identity, while 76.6% of the unfiltered T.
- tricornutum included reads) of the original long-read dataset (986,604 reads).
- pseudonana dataset included only reads) of the initial read dataset (701,596).
- De novo assemblies of the filtered datasets were produced using two dedicated long-read assemblers, Canu [49] and Flye [50, 51].
- noticeably from low percent identity to the reference genome and poor gene completeness (see below), both symptoms of the high per-base error rate of nanopore sequencing (Table 2, Table S2).
- Comparison of the final polished Canu and Flye long-read assemblies to the previously published reference genomes yielded average sequence identities of ~99% for both T.
- 79.0%) of the genes in both the Flye and reference assemblies existing as complete single copies (Table 2, Table S2).
- tricornutum Canu assembly raised the possibility that the assembly algorithm either resolved both haplotypes of the diploid genome (see below) or revealed large segmental duplications that were collapsed in the reference assembly.
- The fragmented nature of the Canu assemblies for both diatom genomes is a consequence of the different ways in which the two assemblers handle allelic diversity and repetitive genomic content.
- tricornutum genome (see below) likely confounded the Canu assembly algorithm and contributed to the fragmented nature of the final assembly.
- tricornutum genome, LTR-RT insertions often occur in just one of the haplotypes.
- Fig. 1 Genome completeness using single-copy orthologs (BUSCO eukaryota_odb9 database) was assessed for the Thalassiosira pseudonana (a) and Phaeodactylum tricornutum (b) reference genomes as well as the unpolished and polished versions of the Canu and Flye de novo assemblies for both diatom species.
- That said, the Canu assembly is likely closer to reality than the reference or Flye v2.3 assemblies because it captures more of the complexities intrinsic to the P.
- pseudonana, further exploration of the Canu assembly is needed in order to determine if its fragmented nature is the result of greater LTR-RT content than previously recognized in the reference genome [3, 11].
- Our long-read assemblies resolved some of the unanswered questions posed by the T.
- None of the contigs in our Canu long-read assembly contained telomeres at both ends, although 58 of these contigs have a telomere at one end.
- Mapping our Canu contigs to the reference genome scaffolds indicated that 34 telomeres on the reference scaffolds were also present on the ends of the homologous Canu contigs.
- Single telomeres were resolved for 25 of the remaining Flye contigs, which, when mapped to the reference scaffolds, validated the resolution of the majority of the ‘single-telomere’ reference scaffolds.
- So, while our Flye assembly did not resolve chromosome-level contigs, it did map well to the more complete scaffolds of the reference genome [3, 11].
- pseudonana Flye assembly and the original reference, with insertions and tandem expansions contributing to 58% of the total size variation (Fig.
- they thus serve as a useful test of the potential for long-read sequencing to improve genome assembly.
- To that end, the reference and polished long-read diatom genome assemblies were assessed for copies of the complete rRNA operon (18S, ITS1, 5.8S, ITS2, 28S).
- The initial version of the reference assembly for T.
- however, the assembler that was used to generate the second version of the T.
- To assess if failure to resolve tandem rRNA arrays in the original reference genomes was a symptom of the assembly process collapsing highly repetitive genomic regions, we mapped the raw sequence data produced by Bowler et al.
- All things considered, our long-read assemblies provide a more accurate picture of the ribosomal RNA operon organization in the T.
- pseudonana Flye long-read assembly to the complete set of proteins predicted for the reference resulted in the identification of 99.9% of the previously reported genes.
- Comparison of the newly assembled, RNA-seq-based T.
- Of the newly predicted genes were found to have high similarity (≥70%.
- Out of the remaining 2686 predicted genes (16.
- pseudonana genes and not artefacts of the gene finding process.
- A blastp analysis against the NCBI protein database indicated that 1189 genes out of the 1862 newly predicted genes for T.
- 67 ‘new’ genes mapped to the “gap-resolved” regions (see above) of the T.
- While three of the 33 genes were identified as being transposons, the remaining 30 genes were previously unidentified in T.
- These numbers are based on comparison of the protein coding gene datasets for T.
- Out of the T.
- Bionano optical mapping of the Phaeodactylum tricornutum genome.
- Ploidy assessment of the de novo P.
- When combined with the 155 contigs that were too small to be anchored to the physical consensus maps (28.2% of the Canu assembly), the total genome size increased to 66.8 Mbp.
- The N50 of the Bionano-Canu hybrid assembly was found to be 1.06 Mbp, representing a 4.2-fold increase when compared to the Canu assembly alone (Table 2).
- Our Bionano data resolved multiple super-scaffolds that are homologous to the same regions of the reference chromosomes, supporting the separation of the Canu contigs and scaffolds into two copies.
- Resolution of the remaining 11 reference chromosomes as distinct haplotypes was even less straightforward.
- [62], but due to the repetitive nature of the area where the Canu contigs are joined, we could not confidently join chromosomes 24 and 29.
- Assessment of the remaining eight hybrid-hybrid super-scaffolds was more complicated.
- Given that our Bionano data support the separation of the Canu contigs into two haplotypes, we attempted to confirm that the original reference genome is indeed the product of haplotype amalgamation.
- chromosomes are a mixture of the two haplotypes resolved by Canu and supported by Bionano optical mapping (Table S7).
- To assess the possibility that it is the Canu contigs that are the amalgams and that the published reference chromosomes represent only one of the two haplotypes, Illumina short reads were mapped to the Canu haplotypes and reference chromosomes.
- Additionally, the low labeling density likely contributed to the generation of the hybrid-hybrid Bionano-Canu contigs.
- All things considered, while the Bionano data validate the separation of the Canu contigs into haplotypes, neither Bionano nor nanopore sequencing (together or in isolation) was able to fully phase the P.
- A prominent feature of the P.
- These proportions represent more than a two-fold increase in repeat content relative to our reassessments of the published reference genomes [P.
- Transposable elements (TEs) comprised ~41% more of the T.
- [14], we detected a small proportion of the P.
- Whereas Ty1/copia-like LTR-RTs comprise 5.7% of the P.
- Table S8), they were classified as an even larger fraction of the Canu genome assembly at LTR-RTs, 8.18 Mbp.
- Out of the 84 LTR-RT insertions detected in our search, 73 (87%) were identified in the P.
- It is worth noting that further analysis of the Canu contigs representing alternative haplotypes indicated that most CoDi insertions were located in only a single haplotype, which is consistent with the observations of Maumus et al.
- These analyses indicated that the vast majority of the novel loci uncovered for P.
- Out of the 8.18 Mbp of LTR-RT sequences estimated by RepeatMasker for the P.
- tricornutum, LTR-RTs which contribute a significant portion of the genome [3, 11].
- Our re-sequencing of the genomes of T.
- More specifically, we were able to provide a more robust and comprehensive perspective of the number and locations of.
- Characterization of the LTR-RT loci resolved per CoDi group for the reference genome (e) and Canu assembly (f).
- These results highlight one of the major benefits of long-read sequencing, i.e., the ability to resolve repetitive content even if it comes at the expense of contiguity.
- Long-read sequencing has the potential to give rise to highly contiguous scaffolds representing all or most of an organism’s chromosomes (see e.g., the recent Nanopore sequencing of the model nematode C.
- We view our long-read assemblies as additional genomic datasets that do not replace but complement and enhance the existing Sanger-based reference genomes – in isolation, neither provides a complete picture of the P.
- Subsets of the long-read datasets were created based on read length and read quality (--mean_q_weight = 8) using the program Filtlong (v0.1.0) [70].
- tricornutum included selection of the highest quality reads ≥20 Kbp for ~100x coverage of the expected genome size of.
- pseudonana dataset included reads ≥30 Kbp for 100x coverage of the expected genome size of ~35 Mbp (--target_bases .
- to compare different assemblies of the same genomic sequence data to assess assembly accuracy.
- Gaps inserted in the original reference genomes for each diatom were assessed by manually inspecting MAFFT local alignments of the reference chromosomes and their homologous long-read derived contigs.
- Gaps were determined as resolved if a long-read contig spanned the inserted gap as well as the nucleotide sequence flanking the start and stop coordinates of the gap.
- Further assessment of the protein sequences assigned to orthologous groups by Broccoli was performed using PLAST (v .
- An in-house Perl script was used to determine if any original reads spanned the boundaries of the LTR-RTs in the Canu genome.
- Assembly statistics for various polishing iterations of the de novo long-read derived genomes for Thalassiosira pseudonana and Phaeodactylum tricornutum.
- PloidyNGS plot of the frequency of the two most abundant alleles in the Phaeodactylum tricornutum genome indicates that it is a diploid organism.
- Bionano-Canu super-scaffolds that were identified as ‘hybrid-hybrid’ scaffolds most probably owing to errors of the Bionano mapping process.
- For each example (A-F), a schematic of the super-scaffold is annotated with its respective Canu contigs shown in purple blocks.
- Blastn results are reported for each of the Canu contigs resolved to the ‘hybrid-hybrid’ super-scaffold against the appropriate reference genome chromosome.
- Multiple sequence alignment showing a region of the P.
- That pattern strongly suggests that the reference is an amalgamation of the two haplotypes resolved by the Canu assembly.
- The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism.
- Evolution of the diatoms: V.
- Evolution of the diatoms: insights from fossil, biological and molecular data.
- Expression of the retrotransposons Surcouf and Blackbeard in the marine diatom Phaeodactylum tricornutum under thermal stress.
- Sequencing of the complete genome of an araphid pennate diatom Synedra acus subsp.
- Genome and methylome of the oleaginous diatom Cyclotella cryptica reveal genetic flexibility toward a high lipid phenotype.
- Evolutionary genomics of the cold-adapted diatom Fragilariopsis cylindrus.
- Genome properties of the diatom Phaeodactylum tricornutum
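
The summary above reports assembly contiguity as an N50 value and a fold increase between the Canu and Bionano-Canu hybrid assemblies. As a minimal sketch of how such a statistic is commonly computed, the Python below uses the usual definition of N50 (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length). The function name `n50` and the contig lengths are hypothetical toy values chosen for illustration; they are not data from the study and this is not necessarily the tool the authors used.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


# Hypothetical toy assemblies (arbitrary units), NOT values from the paper.
contig_lengths = [10, 20, 30, 40]   # a fragmented assembly
scaffold_lengths = [60, 40]         # the same bases joined into two scaffolds

contig_n50 = n50(contig_lengths)
scaffold_n50 = n50(scaffold_lengths)
fold_increase = scaffold_n50 / contig_n50

print(f"contig N50 = {contig_n50}, scaffold N50 = {scaffold_n50}, "
      f"fold increase = {fold_increase:.1f}")
```

Because N50 depends only on the length distribution, scaffolding can raise it without adding any sequence, which is why the toy example keeps the total length fixed at 100 units in both lists.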
