
Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing


Summary

- MaSuRCA was less tolerant of low-quality long reads than SPAdes and Unicycler.
- The one exception was Pseudomonas aeruginosa: the reference genome and the hybrid assemblies with mediocre-quality long reads carried 241 virulence genes, whereas only 184 virulence genes were identified in the hybrid assemblies of low-quality long reads.
- The MaSuRCA assemblies of Escherichia coli O157:H7 and Salmonella Typhimurium with mediocre-quality long reads contained 126 and 118 virulence genes, respectively, while 110 and 107 virulence genes were detected in their MaSuRCA assemblies of low-quality long reads, respectively.
- The pan genomes of the hybrid assemblies of S. Typhimurium with mediocre-quality long reads were similar to that of the reference genome, while SPAdes and Unicycler were more tolerant of low-quality long reads than MaSuRCA for the pan-genome analysis.
- Oxford Nanopore sequencing can generate long reads that span repetitive regions in bacterial genomes, thus resulting in less fragmented or even complete genomes.
- Here, Oxford Nanopore long reads can scaffold contigs generated by Illumina short reads to disambiguate regions of the assembly graph that cannot be resolved by Illumina short reads alone, as implemented in assemblers such as MaSuRCA [8], SPAdes [9], and Unicycler [10].
- It supports hybrid assembly with Illumina short reads and Oxford Nanopore long reads [8], which utilizes a.
- SPAdes constructs the de Bruijn assembly graph of k-mers from Illumina short reads, maps Oxford Nanopore long reads to the graph to close gaps using the consensus of long reads, and finally resolves repeats by incorporating long-read paths into the decision rule of ExSPAnder.
- The hybrid assembly pipeline of Unicycler produces an Illumina short-read assembly graph and then uses Oxford Nanopore long reads to build bridges, which often allows it to resolve all repeats in the genome and produce a complete genome assembly.
- Meanwhile, by using a long-read simulator such as Badread [15], the quality of simulated reads can be artificially controlled to approximate Oxford Nanopore long reads of differing quality.
- Despite the advantages of simulated reads, they may sometimes be unrealistic because simulators are not able to model all relevant features of Oxford Nanopore long reads such as error profiles, read lengths, and quality scores.
- Simulated Illumina short reads and Oxford Nanopore long reads (both mediocre and low quality) of each.
- Regarding the simulated long reads that contained artificial error profiles, genome completeness and accuracy reflect the robustness of an assembler to tolerate a variety of read parameters [16].
- 0.05) to those of the reference genomes.
- All hybrid assemblies of mediocre-quality long reads had the same (P >.
- Lower averages of the numbers of single nucleotide polymorphisms (SNPs) per 1 million bp of the reference genome were identified in the MaSuRCA (0.48) and Unicycler assemblies of mediocre-quality long reads (0.69) than the SPAdes assemblies of mediocre-quality long reads (1.76) (Additional file: Table S12).
- The MaSuRCA, SPAdes, and Unicycler assemblies of mediocre-quality long reads had similar (P >.
- We also used simulated Oxford Nanopore long reads of low quality to examine if the hybrid assembly approaches of MaSuRCA, SPAdes, and Unicycler could tolerate more sequence errors (Table 2).
- As with mediocre-quality long reads, none of MaSuRCA, SPAdes, or Unicycler managed to complete the genomes using low-quality long reads.
- However, all hybrid assembly approaches produced more fragmented contigs using low-quality long reads than mediocre-quality long reads.
- coli O157:H7 Sakai and Clostridium botulinum CDC_1632 with low-quality long reads produced much more highly fragmented assemblies, with 83 and 11 contigs, respectively.
- 0.05) than those of the MaSuRCA assemblies.
- The hybrid assemblies of low-quality long reads had similar GC contents to those of mediocre-quality long reads.
- 0.05) in GC content among the reference genomes, MaSuRCA, SPAdes, and Unicycler assemblies of low-quality long reads.
- Compared to the MaSuRCA assemblies of mediocre-quality long reads, a noticeable decrease in the average of complete BUSCOs (94.5%) was observed for those of low-quality long reads (Additional file: Table S10).
- There were increases in the averages of fragmented (2.1%) and missing BUSCOs (3.4%) of the MaSuRCA assemblies of low-quality long reads compared to those of mediocre-quality long reads.
- In contrast, the BUSCO profiles in the SPAdes and Unicycler assemblies of low-quality long reads remained the same as those of mediocre-quality long reads.
- The complete BUSCOs of the MaSuRCA assemblies of low-quality long reads were significantly lower (P <.
- Interestingly, compared to mediocre-quality long reads, even lower averages of the numbers of SNPs per one million bp of the reference genome were observed in the SPAdes (1.45) and Unicycler assemblies (0.32) of low-quality long reads (Additional file: Table S13), with no significant differences (P >.
- The SPAdes and Unicycler assemblies of low-quality long reads had similar averages of OrthoANIu values to those of mediocre-quality long reads, which were 99.96 and 99.98%, respectively (Additional file: Table S16).
- However, we found that the MaSuRCA assemblies of low-quality long reads had a.
- 0.05) than those of the SPAdes and Unicycler assemblies.
- The genome completeness and accuracy of an assembly given a set of real reads indicate how reliably an assembler can achieve a complete and accurate assembly from reads that incorporate naturally occurring features of Oxford Nanopore long reads [16].
- Compared to the reference genome, the Unicycler
- Table 2. Hybrid assemblies of bacterial strains with simulated Illumina short reads and low-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler.
- Typhimurium LT2 with mediocre- and low-quality long reads did not contain IncFII (S), while IncFIB (S) and IncFII (S) were not identified in the MaSuRCA assembly of low-quality long reads.
- MaSuRCA was an outlier in assembling accurate genomes from low-quality long reads compared to SPAdes and Unicycler.
- A combination of Illumina short reads and Oxford Nanopore long reads can contribute to a better understanding of the.
- The higher error rates of Oxford Nanopore long reads could be compensated for by bioinformatic algorithms through hybrid assembly to acquire more accurate AMR profiling [21].
- [22] closed the complete genome of a multidrug-resistant Plesiomonas shigelloides strain, which was assembled with Illumina short reads and Oxford Nanopore long reads using MaSuRCA.
- [23] assembled Illumina short reads and Oxford Nanopore long reads using SPAdes.
- [20] used Unicycler to obtain the hybrid assembly of Illumina short reads and Oxford Nanopore long reads.
- The MaSuRCA, SPAdes, and Unicycler assemblies of mediocre-quality long reads provided consistent genotypes and predicted phenotypes with their corresponding reference genomes, indicating that they were all capable of acquiring hybrid assemblies that can be used for accurate predictions of AMR phenotypes.
- The MaSuRCA, SPAdes, and Unicycler assemblies of low-quality long reads also performed well, showing genotypes and predicted phenotypes congruent with those of mediocre-quality long reads.
- While it is feasible to assemble Oxford Nanopore long reads alone into complete genomes [24], doing so would compromise the genome accuracy of bacterial pathogens, which could lead to incorrect AMR profiling [21].
- Future improvements to library preparation, basecalling, and long-read-only assembly algorithms may mitigate this limitation, but until then both Illumina short reads and Oxford Nanopore long reads are needed to produce the best assemblies of bacterial pathogens, as demonstrated in our study.
- pneumoniae strain using the SPAdes assembly of Illumina short reads and Oxford Nanopore long reads.
- pneumoniae strain based on the Unicycler assembly of Illumina short reads and Oxford Nanopore long reads.
- To our knowledge, the use of the MaSuRCA assemblies of Illumina short reads and Oxford Nanopore long reads to identify virulence genes of bacterial pathogens has not been reported.
- Concerning the identification of virulence genes, all hybrid assembly approaches could tolerate a higher level of error in low-quality long reads.
- aeruginosa PAO1 with mediocre-quality long reads carried up to 241 virulence genes, which were consistent with the reference genome, whereas only 184 virulence genes were present in the hybrid assemblies of low-quality long reads.
- Typhimurium LT2 with mediocre-quality long reads.
- Therefore, the hybrid assembly approaches of MaSuRCA, SPAdes, and Unicycler enabled an accurate MLST based on Illumina short reads and Oxford Nanopore long reads, even in the case where low-quality long reads were used.
- simulated Illumina short reads and low-quality Oxford Nanopore long reads, as predicted based on their MaSuRCA, SPAdes, and Unicycler assemblies.
- The MaSuRCA, SPAdes, and Unicycler assemblies of P.
- coli O157:H7 Sakai with mediocre-quality long reads were on the same clade where those of low-quality long reads were located (Figs.
- The pan genomes of the MaSuRCA, SPAdes, and Unicycler assemblies of S. Typhimurium LT2 with mediocre-quality long reads were similar to that of the reference genome that had 8352 genes with 3783 core genes and 4569 accessory genes (Fig.
- The hybrid assembly approaches of SPAdes and Unicycler tolerated a higher level of error in Oxford Nanopore long reads since the numbers of core and accessory genes of the pan genomes of the SPAdes and Unicycler assemblies of low-quality long reads were similar to those of the reference genome (Fig.
- However, we observed a decrease in the number of core genes (3726) and an increase in the number of accessory genes (4769) in the pan genome of the MaSuRCA assembly of low-quality long reads compared to that of mediocre-quality long reads that had 3781 core genes and 4575 accessory genes.
- Fig. 1. Whole-genome phylogenetic tree of the hybrid assemblies of Pseudomonas aeruginosa PAO1 with simulated Illumina short reads and mediocre- or low-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler in addition to the reference genome (in red) compared to 30 P.
- The observed better performance of the SPAdes and Unicycler assemblies could be due to superior hybrid assembly processes where Illumina short reads can ameliorate the shortcomings of Oxford Nanopore long reads with errors that introduce truncated genes [6].
- Our pan-genome analyses thus highlight the difficulty of MaSuRCA in using highly error-prone Oxford Nanopore long reads to produce accurate hybrid assemblies, which can lead to an imperfect representation of genome annotation.
- We found that high-error Oxford Nanopore long reads can be efficiently assembled in combination with Illumina short reads to produce assemblies using the hybrid assembly pipeline of Unicycler, bringing us one step closer to the objective.
- The result of the.
- Fig. 2. Whole-genome phylogenetic tree of the hybrid assemblies of Listeria monocytogenes CFSAN008100 with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler in addition to the reference genome (in red) compared to 30 L.
- Instead, it relies on a submodule of Flye for the final assembly of corrected mega-reads produced using both longer super-reads of Illumina short reads and Oxford Nanopore long reads.
- These biases became especially pronounced when low-quality long reads with a higher level of error were used.
- In the hybrid assembly approach of SPAdes, the set of Oxford Nanopore long reads spanning the same pair of sink and source edges of the Illumina short-read assembly graph is collected, and the coverage gap is closed using the consensus sequence of all these reads.
- We found that although the SPAdes assemblies performed similarly to the Unicycler assemblies for genomic analyses, they were highly fragmented in all cases, which could be attributed to the fact that SPAdes does not assemble Oxford Nanopore long reads before gap closure.
- produces the Illumina short-read graph, Oxford Nanopore long reads are then assembled with Miniasm, followed by multiple rounds of Racon polishing, for long-read bridging [9].
- [32] carried out a review to analyze state-of-the-art bioinformatic tools for Oxford Nanopore long reads in terms of accuracy, speed, memory efficiency, and scalability.
- Fig. 3. Core-genome phylogenetic tree of the hybrid assemblies of Escherichia coli O157:H7 Sakai with simulated Illumina short reads and mediocre- or low-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler in addition to the reference genome (in red) compared to 30 Shiga toxin-producing E.
- Illumina short-read assembly graph using Oxford Nanopore long reads.
- pathogens was achieved by assembly algorithms that initiated the hybrid assembly with high-quality Illumina short reads and filled the gaps with Oxford Nanopore long reads.
- While improved contiguity was associated with the assembly of Oxford Nanopore long reads in advance of gap closure, Unicycler implemented both approaches and exhibited improved assemblies and genomic analyses, suggesting algorithmic approaches following that model may be most fruitful in the future.
- Fig. 4. Core-genome phylogenetic tree of the hybrid assemblies of Cronobacter sakazakii CFSAN068773 with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler in addition to the reference genome (in red) compared to 30 C.
- Simulated Illumina short reads and Oxford Nanopore long reads.
- Fig. 5. Pan genomes of the hybrid assemblies of Salmonella Typhimurium LT2 with simulated Illumina short reads and mediocre- or low-quality Oxford Nanopore long reads using MaSuRCA (mediocre-quality, a; low-quality, d), SPAdes (mediocre-quality, b; low-quality, e), and Unicycler (mediocre-quality, c; low-quality, f) and 20 S.
- on the Nanopore error model to generate simulated Oxford Nanopore long reads of mediocre quality, defined as a read with a mean fragment size of 15,000 bp, fragment size standard deviation of 13,000 bp, mean identity of 85, max identity of 95, identity standard deviation of 5, and coverage of 50×.
- Fig. 6. Pan genomes of the hybrid assemblies of Campylobacter jejuni CFSAN032806 with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA (a), SPAdes (b), and Unicycler (c) and 20 C.
- Real Illumina short reads and Oxford Nanopore long reads.
- Real Illumina short reads and Oxford Nanopore long reads of 12 strains of 11 species of bacterial pathogens (Table 2), together covering a wide range of genome sizes and GC contents, were obtained from the Sequence Read Archive (SRA) of the NCBI (Additional file: Table S2).
- For strains with no available PacBio assemblies, PacBio long reads were assembled using the long-read assembly pipeline (normal mode) of Unicycler followed by three rounds of polishing with Illumina short reads using Pilon 1.23 [34].
- Illumina short reads and Oxford Nanopore long reads of each strain were assembled using MaSuRCA 3.3.9, SPAdes 3.12.0, and Unicycler 0.4.8.
- Oxford Nanopore long reads were provided with the --nanopore option.
- An Illumina short-read assembly graph was first produced using SPAdes, and then Miniasm and Racon were applied to build bridges with Oxford Nanopore long reads.
- Bacterial strains with simulated Illumina short reads and mediocre- or low-quality Oxford Nanopore long reads.
- Bacterial strains with real Illumina short reads and Oxford Nanopore long reads.
- Genome completeness of the hybrid assemblies of bacterial strains with simulated Illumina short reads and mediocre-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler compared to their corresponding reference genomes.
- Genome completeness of the hybrid assemblies of bacterial strains with simulated Illumina short reads and low-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler compared to their corresponding reference genomes.
- Genome completeness of the hybrid assemblies of bacterial strains with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler compared to their corresponding reference genomes.
- Numbers of single nucleotide polymorphisms (SNPs) in the hybrid assemblies of bacterial strains with simulated Illumina short reads and mediocre-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler, as determined by aligning to their corresponding reference genomes and expressed as SNPs per 1 million bp of the reference genome.
- Numbers of single nucleotide polymorphisms (SNPs) in the hybrid assemblies of bacterial strains with simulated Illumina short reads and low-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler, as determined by aligning to their corresponding reference genomes and expressed as SNPs per 1 million bp of the reference genome.
- Numbers of single nucleotide polymorphisms (SNPs) in the hybrid assemblies of bacterial strains with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler, as determined by aligning to their corresponding reference genomes and expressed as SNPs per 1 million bp of the reference genome.
- Average Nucleotide Identity (ANI) of the hybrid assemblies of bacterial strains with simulated Illumina short reads and mediocre-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler, as determined by aligning to their corresponding reference genomes and expressed as OrthoANIu values.
- Average Nucleotide Identity (ANI) of the hybrid assemblies of bacterial strains with simulated Illumina short reads and low-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler, as determined by aligning to their corresponding reference genomes and expressed as OrthoANIu values.
- Average Nucleotide Identity (ANI) of the hybrid assemblies of bacterial strains with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler, as.
- hybridSPAdes: an algorithm for hybrid assembly of short and long reads.
- Badread: simulation of error-prone long reads
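
The read simulation and hybrid assembly steps summarized above can be sketched as command lines. The following is a minimal illustration, not the authors' actual pipeline: it only builds argument lists for Badread, SPAdes, and Unicycler using the mediocre-quality parameters quoted in the summary (mean fragment size 15,000 bp, standard deviation 13,000 bp, identity 85/95/5, 50× coverage). All file names, output directories, and helper function names are hypothetical. The Badread error-model flag is left out because the accepted model names differ between Badread versions, and MaSuRCA is omitted because it is driven by a configuration file rather than a single command.

```python
import shlex


def badread_cmd(reference, mean_len=15000, sd_len=13000,
                mean_id=85, max_id=95, sd_id=5, coverage=50):
    """Build a `badread simulate` command for mediocre-quality reads.

    Defaults mirror the values quoted in the summary; Badread writes the
    simulated reads to stdout, so the caller must redirect the output.
    """
    return [
        "badread", "simulate",
        "--reference", reference,
        "--quantity", f"{coverage}x",                 # depth relative to the reference
        "--length", f"{mean_len},{sd_len}",           # mean,stdev of fragment length
        "--identity", f"{mean_id},{max_id},{sd_id}",  # mean,max,stdev of read identity
    ]


def spades_hybrid_cmd(r1, r2, long_reads, outdir):
    """Build a hybrid SPAdes command; long reads go in via --nanopore."""
    return ["spades.py", "-1", r1, "-2", r2,
            "--nanopore", long_reads, "-o", outdir]


def unicycler_hybrid_cmd(r1, r2, long_reads, outdir):
    """Build a hybrid Unicycler command; long reads go in via -l."""
    return ["unicycler", "-1", r1, "-2", r2,
            "-l", long_reads, "-o", outdir]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the shape of each command.
    commands = [
        badread_cmd("reference.fasta"),
        spades_hybrid_cmd("R1.fastq.gz", "R2.fastq.gz",
                          "nanopore.fastq.gz", "spades_out"),
        unicycler_hybrid_cmd("R1.fastq.gz", "R2.fastq.gz",
                             "nanopore.fastq.gz", "unicycler_out"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

Building the commands as lists keeps them easy to inspect before handing them to `subprocess.run`, and makes it simple to swap in a different identity distribution when approximating lower-quality reads.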
