
Adaptation of Oxford Nanopore technology for hepatitis C whole genome sequencing and identification of within-host viral variants


Summary

- High throughput characterization of full genome, within-host variants is still not possible despite advances in next generation sequencing.
- This limitation constrains viral genomic studies that depend on accurate identification of hemi-genome or whole genome, within-host variants, especially those occurring at low frequencies.
- ONT is particularly attractive in this regard due to the portable nature of the MinION sequencer, which makes real-time sequencing in remote and resource-limited locations possible.
- However, this technology (termed here ‘ nanopore sequencing.
- effectiveness of nanopore sequencing for HCV genomes.
- We also introduce a new bioinformatics tool (Nano-Q) to differentiate within-host variants from nanopore sequencing.
- 1800 nt) mixed in known proportions, the capacity of nanopore sequencing to reliably identify variants with an abundance as low as 0.1% was demonstrated, provided the autologous reference sequence was available to identify the matching reads.
- Successful pooling and nanopore sequencing of 52 samples from patients with HCV infection demonstrated its cost effectiveness (AUD$ 43 per sample with nanopore sequencing versus AUD$ 100 with paired-end short read technology).
- The pipeline also identified within-host viral variants and their abundance when the parameters were appropriately adjusted.
- Conclusion: Cost-effective HCV whole genome sequencing and within-host variant identification without haplotype reconstruction are potential advantages of nanopore sequencing.
- These within-host viral variants evolve over time in response to host selection pressures, generating either escape mutations against natural host immunity or drug-resistant variants in individuals treated with antiviral drugs.
- Improved understanding of the influence of viral genomics on disease phenotypes requires a detailed examination of the mutational landscape of within-host variants in RNA viruses.
- Until a decade ago, it was largely impossible to characterise within-host viral variants.
- nt), but none of the first- or second-generation sequencing platforms can generate reads of full genome length.
- It is possible with NGS to estimate the distribution of within-host viral variants bioinformatically by performing haplotype reconstruction, in which short reads that are likely to originate from the same variant are ‘stitched together’ and then extended to form an estimated viral variant [6, 7].
- These methods offer the first opportunity to sequence whole viral genomes as single reads, thereby potentially enabling detailed and reliable characterisation of within-host viral variants.
- Of the two commercial platforms, ONT has the added advantage of using a portable sequencer (MinION) that can be linked to a standard computer enabling real-time sequencing in the field or in remote locations without the need for a sophisticated.
- If optimized, this technology may solve the longest-standing problem in RNA virus genomics, that is, accurate and cost-effective sequencing of within-host viral variants.
- This paper describes an assessment of the utility of nanopore sequencing, in terms of coverage, accuracy and cost, for near full-length HCV genome sequencing using reverse transcribed cDNA amplicons as template.
- In addition, a novel bioinformatics pipeline was designed for identification of within-host viral variants using nanopore data.
- After the nanopore read coverage exceeded 300, the accuracy of the consensus did not improve further (beyond 98–99% similarity).
- Fig. 1 The minimum number of nanopore reads required to generate an accurate consensus sequence.
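The relationship between read depth and consensus accuracy can be illustrated with a small simulation. This is only a sketch under strong assumptions: errors are independent substitutions at a fixed 10% rate, reads are already aligned and of equal length, and the consensus is a simple per-position majority vote. None of this is the authors' pipeline, and the function names are hypothetical. Because real nanopore errors include indels and systematic errors, this toy model does not reproduce the 98–99% plateau reported above.

```python
import random
from collections import Counter

BASES = "ACGT"

def simulate_read(template, error_rate, rng):
    """Copy the template, substituting each base independently with probability error_rate."""
    read = []
    for base in template:
        if rng.random() < error_rate:
            # Assumption: substitutions only, chosen uniformly among the other three bases.
            read.append(rng.choice([b for b in BASES if b != base]))
        else:
            read.append(base)
    return "".join(read)

def majority_consensus(reads):
    """Return the most common base at each position across equal-length, pre-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
template = "".join(rng.choice(BASES) for _ in range(500))  # toy sequence, not HCV
reads = [simulate_read(template, 0.10, rng) for _ in range(300)]

for depth in (1, 5, 20, 300):
    consensus = majority_consensus(reads[:depth])
    print(depth, round(identity(consensus, template), 3))
```

In this idealised model the consensus converges on the template long before a depth of 300, which suggests that the residual disagreement seen with real data comes from non-random errors and uneven coverage, not from sampling noise.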
- Nanopore sequencing can identify low frequency variants.
- Two experiments were conducted to determine if low frequency variants could be detected.
- Two plasmids had inserts isolated from the same patient at different time points of the infection with a <.
- The number of pairwise mismatches between the reconstructed HCV sequence and the sequence of the original plasmid insert was on average 2.11 per 1000 nt (SD ± 2.41) across all inserts and mixes.
- The comparison of relative frequencies between the input and the nanopore output (actual versus reconstruction from nanopore sequencing) from both experiments showed that nanopore sequencing accurately reproduced the.
- Nanopore sequencing is cost effective for high throughput HCV sequencing.
- The nanopore sequencing run produced on average 5141 reads per sample (range with a total output of 1.27 million reads (6.82 Gbp total yield) during a run time of 47 h.
- Nanopore sequencing was significantly cheaper with a per sample cost of AUD$ 43 in comparison to AUD$ 100 for Illumina sequencing (estimates based on reagent costs in May 2019 in Australia).
- X-axis: input plasmid frequency calculated as a % based on concentration; Y-axis: output frequency calculated as the number of nanopore reads per HCV insert as a % of the total nanopore reads per mix.
- bioinformatic tool (Nano-Q) designed by the authors to separate within-host viral variants using nanopore sequencing data.
- When a single subtype 1a reference sequence was provided to the pipeline with all reads as the input (i.e. without subject-specific de-multiplexing), the Nano-Q tool successfully selected all of the subtype 1a reads and arranged them into accurate subject-specific clusters by comparing Hamming distances using a hierarchical clustering approach.
- Each of the Illumina-generated consensus reads clustered with the respective nanopore-generated variants, and there was no mixing of variants between clusters.
- Differentiation of within-host viral variants.
- When demultiplexed, subject-specific sequences were used as the input to the Nano-Q tool with the recommended parameters (−ht: 400, −mc: 20; see Methods for details), a total of 1–22 (median: 6, IQR: 4–9) within-host variants were identified per subject across the 48 subjects (in 4 subjects, the number of eligible reads after the cleaning step was too small for meaningful interpretation).
- A sensitivity analysis was performed by varying several parameters of the pipeline [e.g.
- Fig. 3 Accuracy of pooling multiple samples with PCR based barcoding for nanopore sequencing on the same flow cell.
- For samples with a high number of mismatches, either nanopore or Illumina sequence did not have an adequate coverage in some segments of the genome (adequate coverage was defined as >.
- Nanopore sequencing can be successfully and cost-effectively employed for full genome sequencing of HCV.
- Nano-Q was also able to identify within-host variants without an autologous reference sequence.
- Full genomes are not essential for the diagnosis of viral infections, but do offer substantial advantages for molecular epidemiological investigations, including phylogenetics, as well as studies of within-host viral epistasis.
- Even for diagnostic purposes, given the low cost and limited expertise required, nanopore sequencing may offer an affordable alternative.
- Fig. 4 Identification of within-host variants with the Nano-Q tool.
- The within-host variants identified by the Nano-Q tool are represented as brown squares, while consensus sequences generated from Illumina sequences are represented by blue dots.
- In contrast, Illumina technology offers better quality alignments and accuracy in characterization of SNPs, but the short-read length is a barrier to reliable reconstruction of within-host viral variants (haplotypes).
- However, the technical error rate in base calling in nanopore sequencing is much higher when compared to paired-end short read technology (10% vs <.
- Experiments with plasmid mixes documented the ability of nanopore sequencing to reproduce the original sequences in correct proportions down to a frequency of occurrence as low as 0.1%, when the reference sequence identified the matching reads from the total pool.
- Nanopore sequencing is cost effective compared to other alternatives currently on the market and this margin of cost-saving may improve as more samples are pooled.
- If the aim is consensus level viral sequence analysis, then nanopore sequencing has comparable accuracy to the current state-of-the-art Illumina sequencing (which also allows pooling of multiple samples with barcoding).
- Fig. 5 Relationship between the number of low frequency variants (< 5% abundance) and the number of input reads for the Nano-Q tool.
- However, if the aim is to identify the frequency of SNPs in an alignment of sequences, given the low quality score of individual base calls (on average Q7 with nanopore sequencing vs. Q30–40 with paired-end short read technology), nanopore sequencing is currently not recommended.
- The ability to sequence whole RNA viral genomes with nanopore technology provides, for the first time, the exciting prospect of characterising within-host viral variants without the need for haplotype reconstruction.
- Based on the errors observed between reads within an alignment per plasmid, the within-alignment diversity due to nanopore sequencing errors was estimated.
- This in turn was useful for calibrating the Hamming distance cut-off used to distinguish between-host from within-host variants with the Nano-Q tool.
- For flaviviruses, such as the dengue virus, the current version of the tool can be used without modifications (data not shown).
- Nanopore sequencing generates HCV consensus sequences with accuracy comparable to paired-end short read sequencing technology despite a higher error rate in base calling, provided this is compensated by adequate coverage.
- Nanopore sequencing is more cost effective for high throughput sequencing than Illumina sequencing when compared under similar circumstances.
- Nanopore sequencing can differentiate variants at frequencies as low as 0.1%, depending on the total depth of coverage per sample.
- The Nano-Q tool reported here may be a useful alternative to identify full-genome length within-host variants without haplotype reconstruction.
- Sample preparation and nanopore sequencing.
- Nanopore sequencing was carried out according to the manufacturer’s protocols at the Kinghorn Centre for Clinical Genomics (a licenced ONT service provider), Garvan Institute of Medical Research in Sydney, Australia with an ONT MinION or GridION sequencer on a FLO-MIN107 v9.5 flow cell.
- In addition, these subjects had the full genome of the virus sequenced on the Illumina MiSeq.
- Sensitivity of nanopore sequencing to recover variants in a mix of sequences.
- Estimation of the sensitivity of nanopore sequencing to recognize low frequency variants in a relatively homogenous sequence mix was accomplished by two simulation experiments in which six HCV Envelope sequence mixtures (E1E2, length: 1800 nt; as inserts of a plasmid which was subsequently cloned and extracted) of the same subtype (1a or 1b) were combined in varied proportions to generate 15 different sample mixtures.
- The proportions of each of the 6 plasmids in a mixture varied between 0.1–93% across the 15 mixes with a uniform representation across the spectrum of prevalence.
- All sample mixtures were nanopore sequenced with ligation barcoding, and the de-multiplexed read outputs were aligned against each of the six reference sequences.
- The read count for each alignment was considered as a proxy measure of prevalence of the variant in the mixture.
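Using read counts as a proxy for prevalence amounts to a simple normalisation. The sketch below assumes each read has already been assigned to exactly one of the reference inserts. The insert names and counts are made up for illustration and are not data from the study.

```python
def relative_abundance(read_counts):
    """Convert per-reference read counts into fractions of the total aligned reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no aligned reads to normalise")
    return {name: count / total for name, count in read_counts.items()}

# Hypothetical counts for one mixture (illustrative numbers only).
counts = {"insert_A": 9300, "insert_B": 500, "insert_C": 190, "insert_D": 10}

for name, fraction in relative_abundance(counts).items():
    print(f"{name}\t{100 * fraction:.1f}%")
```

With these toy numbers the rarest insert accounts for 10 of 10,000 reads, i.e. 0.1%, the lowest abundance the experiments aimed to recover. Detecting a variant at that level therefore requires thousands of reads per sample.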
- single nanopore sequencing run on a GridION platform.
- The cost of nanopore sequencing (plus library preparation and service charges) per patient sample was compared with that of Illumina sequencing (sequencing on a MiSeq platform with Nextera XT barcodes per sample).
- Designing a novel bioinformatic pipeline to differentiate within-host variants.
- and finally, a hierarchical clustering algorithm was used to identify within-host variants.
- This fully automated pipeline has several user-defined variables that allow a conservative or a liberal approach to characterizing within-host variants.
- A brief summary of the bioinformatic workflow is given below.
- Nano-Q tool.
- consensus sequence as a guide, Nano-Q identified and converted base mismatches of each nanopore read to that of the consensus sequence if the quality score was below a user defined cut-off.
- In the case of stop codons, the violating codon was replaced by that of the consensus.
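The two read-cleaning rules described here (reverting low-quality mismatches to the consensus base, and replacing stop codons with the consensus codon) might be sketched as follows. This is not Nano-Q's code. It assumes the read, its per-base Phred qualities and the consensus are gap-free, equal in length and in reading frame 0, and it treats a quality exactly equal to the cut-off as retained. The function names and the cut-off value are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def correct_read(read, qualities, consensus, min_quality):
    """Revert mismatching bases with quality below min_quality to the consensus base."""
    if not (len(read) == len(qualities) == len(consensus)):
        raise ValueError("read, qualities and consensus must have equal length")
    corrected = []
    for base, quality, ref in zip(read, qualities, consensus):
        # Low-quality mismatch: treat as a sequencing error. Otherwise keep the base.
        corrected.append(ref if base != ref and quality < min_quality else base)
    return "".join(corrected)

def replace_stop_codons(read, consensus):
    """Swap any in-frame stop codon in the read for the consensus codon at that position."""
    in_frame = len(read) - len(read) % 3
    codons = []
    for i in range(0, in_frame, 3):
        codon = read[i:i + 3]
        codons.append(consensus[i:i + 3] if codon in STOP_CODONS else codon)
    codons.append(read[in_frame:])  # trailing partial codon, if any, is left untouched
    return "".join(codons)

consensus = "ATGGCTCAAGGA"
read = "ATGGCTTAAGGT"
qualities = [30] * 6 + [5] + [30] * 4 + [25]  # low quality at index 6 only

print(correct_read(read, qualities, consensus, min_quality=10))
print(replace_stop_codons(read, consensus))
```

In the example, the low-quality `T` at index 6 is reverted to the consensus `C`, while the high-quality mismatch at the last position is kept as a candidate SNP. The same index-6 substitution also creates an in-frame `TAA`, which the second function replaces regardless of quality.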
- true within-host variants with less than 5% actual pairwise difference) a Hamming distance cut off of 80–96 allowed adequate resolution to separate these clusters.
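The role of the Hamming distance cut-off can be illustrated with a minimal agglomerative clustering sketch. The source states only that a hierarchical clustering approach was used, so the single-linkage rule, the function names, and the `cutoff` and `min_cluster_size` parameters below are assumptions, and the toy reads are far shorter than real HCV reads.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def cluster_reads(reads, cutoff, min_cluster_size=1):
    """Single-linkage agglomerative clustering, stopped once the two closest
    clusters are more than `cutoff` mismatches apart. Returns lists of read indices."""
    clusters = [[i] for i in range(len(reads))]

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(hamming(reads[i], reads[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (a, b), dist = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > cutoff:
            break
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps index a valid
        del clusters[b]
    # Discard clusters too small to be called a variant.
    return [c for c in clusters if len(c) >= min_cluster_size]

reads = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT", "GGGGGGGG", "GGGGGGGC", "CCCCTTTT"]
for members in cluster_reads(reads, cutoff=2, min_cluster_size=2):
    print(sorted(members), f"abundance={len(members) / len(reads):.2f}")
```

Raising `cutoff` merges more reads into fewer clusters, while lowering `min_cluster_size` admits smaller clusters as variants. This naive version scales poorly with read count, so a library implementation would be preferable for real data.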
- The number of reads within each.
- frequency of occurrence of the viral variant within the sample.
- The final output of the algorithm was a sequence file in FASTA format containing all consensus sequences of variants generated from clusters, and the header of each sequence indicated the relative abundance of that variant as a fraction between 0 and 1.
- An example of the command line is shown below.
- Any base mismatch below this threshold will be considered an error and corrected to that of the consensus while those with a quality score above this value will be retained as a true SNP.
- We recommend using 400–480 to differentiate within-host full-length HCV variants based on the in silico clonal experiments.
- If this number is lowered, the number of clusters will increase, and so will the estimated number of within-host variants.
- This is unlikely to have a significant impact on the abundance estimate of the major variant.
- An example of Nano-Q tool output (4 files in *.txt format): a) Reference sequence (HITP300157_Reference), b) Nano-Q tool progress report (HITP300157_Nano-Q_output), c) All variants generated (HITP300157_all_variants), d) Consensus of clusters of variants that met the user-defined cut-off for a minimum cluster size and their relative frequencies – The final output of the tool (HITP300157_final_variants).
- Within-host variants generated using Nano-Q tool for subjects in Fig.
- CR, PL and NR analysed the data and wrote the first draft of the manuscript which was revised and approved by all authors..
- None of the funders had any influence on the content reported in this paper.
- The Illumina consensus sequences of the samples described in this manuscript have been previously uploaded to GenBank (Supplementary file 7) [12].
- The within-host variant sequences generated by Nano-Q tool are provided in supplementary file 8.
- computational approaches for improving nanopore sequencing read accuracy.
- Mapping and phasing of structural variation in patient genomes using nanopore sequencing.
- Nanopore sequencing: Review of potential applications in functional genomics.
- Fast and sensitive mapping of nanopore sequencing reads with GraphMap
