
Gen2Epi: An automated whole-genome sequencing pipeline for linking full genomes to antimicrobial susceptibility and molecular epidemiological data in Neisseria gonorrhoeae



- Background: Recent advances in whole-genome sequencing (WGS)-based technologies have facilitated multi-step applications for predicting antimicrobial resistance (AMR) and investigating the molecular epidemiology of Neisseria gonorrhoeae.
- Results: We present Gen2Epi, a pipeline that assembles short reads into full scaffolds and automatically assigns molecular epidemiological and AMR information to the assembled genomes.
- The median genome coverage of full-length scaffolds and “N” statistics (N50, NG50, and NGA50) were higher than, or comparable to, previously published results, and the scaffolding process improved the quality of the draft genome assemblies.
- Keywords: Bioinformatics, Whole-genome sequencing (WGS), De novo genome assembly, Scaffolding, Molecular epidemiology, Strain typing, Antimicrobial resistance, Molecular typing, Neisseria gonorrhoeae.
- 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0).
- A variety of bacterial strain typing facilities have been made available to the community.
- The web-based components of the pubMLST database [6] were implemented using MLST software [7].
- Inouye et al.
- Martin et al.
- A disadvantage of this approach is that a gene can be missed during gene identification, or the number of predicted genes can be miscounted, because a gene may be fragmented across multiple contigs.
- Despite the availability of scaffolding tools, none of the previously published WGS studies assembled contigs into full genomes, except for one study in which the authors used PacBio long reads to assemble the N. gonorrhoeae World Health Organization (WHO) reference strains [17], and a second study recently published by Harris et al.
- Another commonly used method in the Neisseria community is the Nullarbor pipeline, which performs complete analysis, from read cleaning to variant calling [29].
- Six samples were excluded from the 1054 WGS samples in the original EuroGASP study (see Additional file 1: Table S1 for more information).
- For instance, available WHO reference genomes were used in the case of WHO datasets [17], N.
- The WHO sequencing read datasets, along with reference genomes, Ng plasmid sequences, AMR gene sequences, allelic sequences with their profiles and metadata for NG-MLST and NG-STAR, and locally created databases of Ng plasmid sequences (Additional file 1: Table S2) and MLST allelic sequences, are provided in the Gen2Epi VirtualBox.
- The architecture of the pipeline has been implemented in five main modules (Fig. 1).
- Each sequential step of the pipeline is automatic and linked to the next, as described below.
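The chaining described above, where each step consumes the output of the previous one, can be sketched as follows. This is a minimal illustration, not Gen2Epi's code: the `Pipeline` class and the stage names (`data_cleaning`, `de_novo_assembly`, `scaffolding`) are hypothetical, and the toy stages only manipulate strings instead of calling the third-party tools the pipeline integrates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A stage receives all artifacts produced so far and returns new ones.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


@dataclass
class Pipeline:
    """Hypothetical runner that executes stages in order, feeding each the prior outputs."""

    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self  # allow chained .add() calls

    def run(self, inputs: Dict[str, object]) -> Dict[str, object]:
        artifacts = dict(inputs)
        completed = []
        for name, stage in self.stages:
            # An exception raised by a stage stops the run, so later
            # stages never receive incomplete inputs.
            artifacts.update(stage(artifacts))
            completed.append(name)
        artifacts["completed_stages"] = completed
        return artifacts


# Toy stages standing in for read cleaning, assembly, and scaffolding.
pipeline = (
    Pipeline()
    .add("data_cleaning", lambda a: {"clean_reads": f"trimmed({a['raw_reads']})"})
    .add("de_novo_assembly", lambda a: {"contigs": f"assembled({a['clean_reads']})"})
    .add("scaffolding", lambda a: {"scaffolds": f"scaffolded({a['contigs']})"})
)
result = pipeline.run({"raw_reads": "sample_R1.fastq"})
print(result["scaffolds"])  # scaffolded(assembled(trimmed(sample_R1.fastq)))
```

Passing one shared dictionary keeps each stage decoupled from the internals of the others; in a real run, each stage would wrap an external tool and record file paths rather than strings.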
- Step 2: De novo assembly of chromosome and plasmid. In the second stage, the output from Step 1 is used to generate contigs through separate de novo assemblies of the chromosome and plasmids.
- In addition, this step uses the “stats.sh” script from BBMap [40] to generate a statistics report for the assembled contigs.
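BBMap's `stats.sh` reports summary metrics for an assembly. The sketch below recomputes a few comparable numbers (contig count, total length, longest contig, GC fraction) from FASTA text in plain Python. It only illustrates what such a report contains; it is not a reimplementation of `stats.sh` or of its output format, and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: uppercase sequence}."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # ID is the first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def contig_stats(contigs: Dict[str, str]) -> Dict[str, float]:
    """Basic assembly summary; GC fraction is computed over A/C/G/T bases only."""
    lengths = [len(seq) for seq in contigs.values()]
    acgt = sum(sum(seq.count(b) for b in "ACGT") for seq in contigs.values())
    gc = sum(seq.count("G") + seq.count("C") for seq in contigs.values())
    return {
        "n_contigs": len(lengths),
        "total_length": sum(lengths),
        "max_length": max(lengths, default=0),
        "gc_fraction": gc / acgt if acgt else 0.0,
    }


fasta = ">contig_1 cov=35.2\nACGTACGT\n>contig_2\nGGCC\nAA\n"
stats = contig_stats(read_fasta(fasta))
print(stats)
```

Excluding ambiguous bases such as `N` from the GC denominator keeps gaps in scaffolds from artificially lowering the GC fraction.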
- The quality of the assembled scaffolds is then assessed by comparison with the respective reference genome for.
- Fig. 1 A flowchart outlining the design of the Gen2Epi pipeline with integrated third-party software.
- The completeness of the assembled genomes can also be assessed via optional manual examination of the multiple alignments of scaffolds with the respective N.
- The detailed results of the full genome assembly evaluation in terms of measures such as median misassemblies, alignment length, and different “N” statistics (N50, NA50, NG50, and NGA50 [42]) are shown in Table 1.
- To compare the completeness of the genomes generated by Gen2Epi with published results, fully assembled genomes from the previously published studies would ideally be used.
- In the case of Saskatchewan and New Zealand isolates [30, 31], the assemblies were terminated at the level of contigs.
- The “N” statistics values reported for Gen2Epi in the case of the New Zealand samples were higher than previously published values (N50: 40970, NG50: 40198, and NGA50:.
- However, the median genome coverage achieved by Gen2Epi for the New Zealand isolates was 92.3%, lower than the previously published result.
- The probable reason behind this result is that Lee et al.
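Assuming genome coverage is measured as the percentage of reference bases covered by at least one aligned block (an assumption; the excerpt does not define it), it can be computed by merging overlapping alignment intervals so that no reference base is counted twice. The function below is a hypothetical sketch of that calculation, not the code of the evaluation tool used in the study.

```python
from typing import List, Tuple


def genome_fraction(aligned: List[Tuple[int, int]], reference_length: int) -> float:
    """Percentage of reference positions covered by at least one aligned block.

    Intervals are 0-based, half-open (start, end) coordinates on the reference.
    Overlapping or adjacent intervals are merged so shared bases count once.
    """
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(aligned):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current run
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / reference_length


# Two overlapping blocks plus one separate block on a 1,000 bp toy reference.
print(genome_fraction([(0, 400), (300, 700), (800, 900)], 1000))  # 80.0
```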
- Although the full genomes generated by the pipeline were accurate and overcame the limitation of gene fragmentation across multiple contigs, some irregularities remained in the results (see Additional file 1: Table S4C, where missing AMR genes are represented with “NA”).
- During scaffolding, the pipeline filters out assembled contigs containing low-quality bases or contamination; occasionally, this removes important AMR genes (such as 23S rRNA), which are then missing from the molecular marker identification and AMR analysis step.
- This was the cause of the irregularities noted in Additional file 1: Table S4C, and it supports the notion that the accuracy and completeness of the final assembled genome depend strongly on the quality of the input dataset.
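The trade-off described above, where filtering protects assembly quality but can discard a contig carrying an AMR locus, can be made visible by checking which loci sat on discarded contigs. The sketch below is hypothetical: the thresholds (`min_length=500`, `max_ambiguous=0.05`), the use of non-ACGT characters as a stand-in for low-quality bases, and the `gene_locations` mapping are assumptions for illustration, not Gen2Epi's actual filtering criteria.

```python
from typing import Dict, FrozenSet, List, Tuple


def filter_contigs(
    contigs: Dict[str, str],
    min_length: int = 500,  # illustrative threshold
    max_ambiguous: float = 0.05,  # illustrative threshold
    contaminants: FrozenSet[str] = frozenset(),
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split contigs into (kept, dropped) by length, ambiguity, and a contamination list."""
    kept: Dict[str, str] = {}
    dropped: Dict[str, str] = {}
    for name, seq in contigs.items():
        ambiguous = sum(base not in "ACGT" for base in seq.upper())
        frac = ambiguous / len(seq) if seq else 1.0
        if name in contaminants or len(seq) < min_length or frac > max_ambiguous:
            dropped[name] = seq
        else:
            kept[name] = seq
    return kept, dropped


def lost_loci(gene_locations: Dict[str, str], dropped: Dict[str, str]) -> List[str]:
    """Loci whose host contig was discarded and will therefore be reported as missing."""
    return sorted(gene for gene, contig in gene_locations.items() if contig in dropped)


contigs = {
    "c1": "ACGT" * 200,  # 800 bp, clean
    "c2": "ACGN" * 200,  # 800 bp, 25% ambiguous bases
    "c3": "ACGT" * 50,   # 200 bp, too short
}
kept, dropped = filter_contigs(contigs)
print(sorted(kept), lost_loci({"penA": "c1", "gyrA": "c1", "23S": "c2"}, dropped))
```

Reporting lost loci alongside the filtered assembly makes it easier to tell a genuinely absent gene from one removed by quality control.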
- Furthermore, we could not compare these irregularities with previous work because, except for the Saskatchewan isolates [30, 31], none of the published studies used in this analysis performed NG-STAR typing.
- Trimming is one of the most effective methods for removing poor-quality bases at the ends of reads, because errors introduced over successive sequencing cycles on the Illumina platform often degrade base quality [46].
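A minimal sketch of 3′-end quality trimming, assuming Phred+33 quality encoding (used by current Illumina instruments) and an illustrative cutoff of Q20. It is comparable in spirit to a trailing-quality trim, but it is not the trimming tool or the settings used by Gen2Epi; `trim_3prime` is a hypothetical name.

```python
from typing import Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below min_q."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(seq)
    # ord(char) - offset converts an ASCII quality character to a Phred score.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# In Phred+33, 'I' encodes Q40, '+' encodes Q10, and '#' encodes Q2.
print(trim_3prime("ACGTACGT", "IIIIII+#"))  # ('ACGTAC', 'IIIIII')
```

Trimming only from the 3′ end matches the pattern described above, where quality degrades with later sequencing cycles; sliding-window trimmers additionally handle isolated low-quality stretches inside a read.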
- The effects of trimming were most apparent in the current analysis when contigs generated from trimmed reads were compared with those generated from raw reads, as in the case of the.
- In the case of the EuroGASP 2013 isolates, differences were observed in three samples (ERR147119, ERR1560830, and ERR1469562), whose STs differed from those previously reported by the authors (see the next section for further explanation).
- To assess the accuracy of the in silico molecular epidemiological markers and AMR determinants generated by the Gen2Epi pipeline, we first analyzed the strain typing results from published studies that used one or all of the three typing schemes (Table 2).
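In silico sequence typing of this kind reduces to two lookups: match each extracted locus sequence to a numbered allele, then match the resulting allele profile to a sequence type (ST). The sketch below uses the two NG-MAST loci (`porB` and `tbpB`) with toy allele sequences and a toy profile table; the data and function names are hypothetical, and real allele databases are much larger and curated by the scheme maintainers.

```python
from typing import Dict, Optional, Tuple

NGMAST_LOCI = ("porB", "tbpB")  # NG-MAST is based on these two loci


def call_allele(locus_seq: Optional[str], allele_db: Dict[str, int]) -> Optional[int]:
    """Return the allele number of an exact sequence match, or None if absent or novel."""
    if locus_seq is None:
        return None
    return allele_db.get(locus_seq.upper())


def assign_st(
    alleles: Dict[str, Optional[int]],
    profiles: Dict[Tuple[int, ...], int],
    loci: Tuple[str, ...],
) -> Optional[int]:
    """Look up the ST for a complete allele profile; None if any locus is uncalled."""
    key = tuple(alleles.get(locus) for locus in loci)
    if any(a is None for a in key):
        return None
    return profiles.get(key)


# Toy data: these sequences and numbers are invented for illustration.
allele_dbs = {
    "porB": {"ACGTTG": 1, "ACGTTA": 2},
    "tbpB": {"GGCATT": 4, "GGCATC": 7},
}
profiles = {(1, 4): 101, (2, 7): 202}
extracted = {"porB": "acgtta", "tbpB": "GGCATC"}

alleles = {locus: call_allele(extracted.get(locus), allele_dbs[locus]) for locus in NGMAST_LOCI}
print(alleles, assign_st(alleles, profiles, NGMAST_LOCI))
```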
- Gen2Epi not
- Table 1 Evaluation of the genome assemblies.
- Six samples were excluded from the analysis due to the lack of SRA numbers.
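- α N50 is the length of the shortest contig among the longest contigs that together cover at least 50% of the total assembly length; NG50 is defined in the same way but relative to the length of the reference genome.
- NA50 and NGA50 are similar to N50 and NG50, except that they are based on the aligned blocks obtained by aligning the contigs against a reference genome.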
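Following these definitions, all four statistics can be computed by one routine that differs only in what is measured (contig lengths or aligned-block lengths) and what the 50% threshold refers to (assembly length or reference length). A minimal sketch with toy numbers; it assumes the aligned-block lengths have already been produced by aligning contigs to the reference, which is the step an assembly evaluation tool performs.

```python
from typing import Iterable, Optional


def nx50(lengths: Iterable[int], target_total: int) -> Optional[int]:
    """Length of the shortest piece among the longest pieces that reach 50% of target_total.

    Returns None when the pieces never reach the threshold (possible for NG50/NGA50
    if the assembly is much smaller than the reference).
    """
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= target_total:
            return length
    return None


contigs = [100, 80, 50, 30, 20]     # toy contig lengths (total 280)
blocks = [80, 60, 50, 40, 30, 20]   # toy aligned blocks: the 100 bp contig split into 60 + 40 at a misassembly
assembly_length, reference_length = sum(contigs), 400

print("N50  ", nx50(contigs, assembly_length))    # 80
print("NG50 ", nx50(contigs, reference_length))   # 50
print("NA50 ", nx50(blocks, assembly_length))     # 60
print("NGA50", nx50(blocks, reference_length))    # 40
```

The example shows why NA50 and NGA50 are the stricter measures: breaking a misassembled contig into its aligned blocks lowers the statistic even though the contig lengths are unchanged.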
- In the case of the Saskatchewan isolates, the NG-STAR results for 12 isolates are 100% identical to the previously published results.
- Information for the remaining 15 isolates is not given in the original publications [30, 31].
- “multiple” ST in the previous study [18].
- In addition, we occasionally observed “NA” in the NG-STAR output (Additional file 1: Table S4C) due to the filtering of AMR genes with low-quality bases during the scaffolding process.
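The appearance of “NA” can be reproduced in a small sketch: when a locus is absent from the filtered scaffolds, its allele cannot be called, and no ST can be assigned from the incomplete profile. The seven loci listed are those of the NG-STAR scheme; the allele identifiers, ST table, the "novel" label for unmatched complete profiles, and the tab-separated layout are toy assumptions, not Gen2Epi's actual output format.

```python
from typing import Dict, Optional, Tuple

# The seven NG-STAR loci.
NGSTAR_LOCI = ("penA", "mtrR", "porB", "ponA", "gyrA", "parC", "23S")


def ngstar_row(
    sample: str,
    calls: Dict[str, Optional[str]],
    st_table: Dict[Tuple[str, ...], str],
) -> str:
    """Format one tab-separated report row, writing "NA" for uncalled loci."""
    alleles = tuple(calls.get(locus) or "NA" for locus in NGSTAR_LOCI)
    # An ST is only defined for a complete profile; "novel" is a hypothetical label.
    st = "NA" if "NA" in alleles else st_table.get(alleles, "novel")
    return "\t".join((sample, st) + alleles)


profile = ("1.001", "1", "2", "1", "10", "3", "100")  # toy allele identifiers
st_table = {profile: "90"}                              # toy ST table
complete = dict(zip(NGSTAR_LOCI, profile))
missing_23s = {**complete, "23S": None}                 # 23S lost during filtering

print(ngstar_row("S1", complete, st_table))
print(ngstar_row("S2", missing_23s, st_table))
```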
- The pipeline was validated by one of the authors (Demczuk) at a separate site (National Microbiology Laboratory, Public Health Agency, Winnipeg) by analyzing the WHO N.
- Tools such as Gen2Epi, Sanger pipeline, Nullarbor, SRST, and NGMASTER in the first group have a command-line interface.
- In the second group, tools such as Patric, pubMLST, NG-MAST, Pathogenwatch, and NG-STAR are web-based applications that are convenient to use but can become impractical for large datasets, especially when the final results must be retrieved manually.
- In the case of Saskatchewan isolates, 2 NG-MAST STs and 9 NG-MLST STs were not previously published.
- The only one of the 103 NG-MLST sequence types that Gen2Epi could not identify was ST 1925, because the corresponding sample was not included in the present study.
- In contrast, all three strain typing tools (pubMLST, NG-MAST, and NG-MLST) require AMR genes identified from the assembled contigs in FASTA format.
- Functions for variant calling, elimination of mismatched reads, prediction of novel AMR determinants, and assembly improvement using long reads, as well as scripts that use the pubMLST application programming interface (API), will be included in the next version.
- The first two steps (Data Cleaning, De novo assembly of Chromosome and Plasmid, and Plasmid-type Identification) of the current Gen2Epi version are universal for the analysis of other pathogenic bacteria such as Chlamydia trachomatis, N.
- However, functions for the complete WGS analysis (including novel scaffolding and typing scripts) of other bacteria will be implemented in the future GUI version of Gen2Epi.
- We have developed a novel WGS pipeline named Gen2Epi to assemble Illumina short reads into full genomes and to automatically assign strain typing (NG-MAST and NG-MLST) and AMR determinant information (NG-STAR) to the assembled genomes.
- The accuracy of the pipeline was validated by testing it on 1484 N.
- www.cs.usask.ca/pub/combi.
- With this virtual machine configuration on a computer with a 3.6 GHz CPU, the full analysis of the test data takes approximately 12 h of elapsed time.
- Genome alignment of the Gen2Epi-produced WHO G scaffold against the corresponding Neisseria gonorrhoeae reference genome using Mauve.
- The extent of the colored bar indicates the strong similarity between the scaffold and the reference genome.
- Ng: Neisseria gonorrhoeae.
- NG-MAST: Neisseria gonorrhoeae multi-antigen sequence typing.
- NG-STAR: Neisseria gonorrhoeae Sequence Typing for Antimicrobial Resistance.
- The funding bodies had no role in the design or execution of the study.
- WD provided the WHO reference strains and Saskatchewan isolates, and evaluated the pipeline and provided valuable feedback that has been implemented in the current version of Gen2Epi.
- Genomic sequencing of Neisseria gonorrhoeae to respond to the urgent threat of antimicrobial-resistant gonorrhea.
- Harrison OB, Schoen C, Retchless AC, Wang X, Jolley KA, Bray JE, et al.
- Demczuk W, Sidhu S, Unemo M, Whiley DM, Allen VG, Dillon JR, et al. Neisseria gonorrhoeae sequence typing for antimicrobial resistance, a novel antimicrobial resistance multilocus typing scheme for tracking global dissemination of N.
- Kwong JC, Gonçalves da Silva A, Dyet K, Williamson DA, Stinear TP, Howden BP, et al. NGMASTER: in silico multi-antigen sequence typing for Neisseria gonorrhoeae.
- Lee RS, Seemann T, Heffernan H, Kwong JC, Gonçalves da Silva A, Carter GP, et al. Genomic epidemiology and antimicrobial resistance of Neisseria gonorrhoeae in New Zealand.
- Genomic epidemiology and population structure of Neisseria gonorrhoeae from remote highly endemic Western Australian populations.
- Vidovic S, Caron C, Taheri A, Thakur SD, Read TD, Kusalik A, et al. Using crude whole-genome assemblies of Neisseria gonorrhoeae as a platform for strain analysis: clonal spread of gonorrhea infection in Saskatchewan, Canada.
- Unemo M, Golparian D, Sánchez-Busó L, Grad Y, Jacobsson S, Ohnishi M, et al. The novel 2016 WHO Neisseria gonorrhoeae reference strains for global quality assurance of laboratory investigations: phenotypic, genetic and reference genome characterization.
- Harris SR, Cole MJ, Spiteri G, Sánchez-Busó L, Golparian D, Jacobsson S, et al. Public health surveillance of multidrug-resistant clones of Neisseria gonorrhoeae in Europe: a genomic survey.
- Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al.
- Page AJ, De Silva N, Hunt M, Quail MA, Parkhill J, Harris SR, et al.
- Butler J, MacCallum I, Kleber M, Shlyakhter IA, Belmonte MK, Lander ES, et al.
- Ezewudo MN, Joseph SJ, Castillo-Ramirez S, Dean D, Del Rio C, Didelot X, et al. Population structure of Neisseria gonorrhoeae based on whole genome data and its relationship with antibiotic resistance.
- Demczuk W, Lynch T, Martin I, Van Domselaar G, Graham M, Bharat A, et al. Whole-genome phylogenomic heterogeneity of Neisseria gonorrhoeae isolates with decreased cephalosporin susceptibility collected in Canada between 1989 and 2013.
- Demczuk W, Martin I, Peterson S, Bharat A, Van Domselaar G, Graham M, et al.
- Leinonen R, Akhtar R, Birney E, Bower L, Cerdeno-Tárraga A, Cheng Y, et al.
- Agarwala R, Barrett T, Beck J, Benson DA, Bollin C, Bolton E, et al. Database resources of the National Center for Biotechnology Information.
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al.
- Fuller CW, Middendorf LR, Benner SA, Church GM, Harris T, Huang X, et al.
- Wattam AR, Abraham D, Dalay O, Disz TL, Driscoll T, Gabbard JL, et al.
- Harrison OB, Skett J, McLean J, Trees D, Sunkavalli A, Lourenço AP, et al.
- Using the Neisseria gonorrhoeae core genome to examine gonococcal populations
