
Nanopore sequencing and full genome de novo assembly of human cytomegalovirus TB40/E reveals clonal diversity and structural variations


Summary

- Here we improved the reconstruction of HCMV full genomes by means of a hybrid, de novo genome-assembly bioinformatics pipeline applied to data generated from the recently released MinION MkI B sequencer from Oxford Nanopore Technologies.
- Results: The MinION run of the HCMV (strain TB40/E) library resulted in ~47,000 reads from a single R9 flowcell and in ~100× average read depth across the virus genome.
- In the first stage of the bioinformatics algorithm, long contigs (N of lower accuracy were reconstructed.
- In the second stage, short contigs (N50 = 5686) of higher accuracy were assembled, while in the final stage the high quality contigs served as template for the correction of the longer contigs resulting in a high-accuracy, full genome assembly (N .
- The majority (98.8%) of the genomic features from the reference strain were accurately annotated on this full genome construct.
- Keywords: Human cytomegalovirus, Nanopore, MinION, de novo assembly, Recombination, Mutation, Variable number tandem repeats, Quasi-species.
- 1 Department of Zoology, University of Oxford, Oxford, United Kingdom Full list of author information is available at the end of the article.
- 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0.
- Human cytomegalovirus (HCMV) is a betaherpesvirus, with the largest known genome of all human herpesviruses.
- In 1990 the first full genome sequence of the highly passaged strain AD169 was published, based on overlapping PCR-amplified fragments, cloning and traditional Sanger sequencing [10].
- In the era of high throughput sequencing (HTS), dedicated library preparation protocols have been developed to enhance the full genome sequencing of HCMV, based on target enrichment [12], host DNA depletion and whole genome amplification [13] or on multiple amplicon deep sequencing [5].
- Whilst the longest known read is 950 Kb, the accuracy of the first version of the sequencer did not exceed 72% [15].
- MinION has been used in combination with other sequencing platforms that deliver shorter reads of higher accuracy, to improve the hybrid de novo assembly of genomic regions that are difficult to resolve [16] and of the Human Herpes Virus 1 (HHV1) genome [17].
- In the present study we have both reconstructed the full genome of HCMV strain TB40E and captured quasi-species diversity of HCMV in culture.
- 90% of the cells were showing signs of cytopathic effect (cpe), supernatants of infected cultures were harvested every two days until cells were observed to lose their adherence to the plastic flask surface.
- The alignment of the reads was performed with LAST setting the alignment mode in “local” (−T = 0) for mining reads and in.
- We selected this particular reference as it presented the highest similarity compared to our raw de novo assembled contigs.
- Importantly, the LAST output (.maf file) contains only the part of each read that aligns to the reference, so extracting reads directly from the .bam files using the samtools flags would yield partial sequences.
- We performed the de novo assembly of the HCMV genome with Smartdenovo (https://github.com/ruanjue/smartdenovo) to generate extra-long contigs of relatively lower accuracy and with Spades [19] for the generation of highly accurate contigs but of shorter length, which were further merged using CAP3.
- Manual curation of misassemblies was performed by visual inspection after remapping the raw reads to the contigs and confirming the continuity and the uniform depth of the alignment.
- We further curated the final assembly sequence using Pilon [20] in two rounds of remapping of the reads to the final contigs.
- The annotation of the full genome construct was performed with RATT [23] based on the TB40E-Lisa reference strain.
- Using snpEff (v4.3s) [25] the resulting .vcf files were annotated against the reference genome and SNPs were further filtered with snpSift (v4.3s) [26].
- To calculate the mean coverage of the reads across the main de novo assembled genome, we used “bedtools coverage”.
- The coverage plots in comparison to the GC content across the genome and the genomic synteny comparisons were visualised using Artemis [28].
- We estimated the Neighbor-Joining consensus tree after aligning 29 representative full genome sequences and the de novo assembled genome using MAFFT v7 (https://.
- Hybrid de novo assembly of HCMV genome using only MinION data.
- The lengths of the raw reads (45,965 in total) ranged up to 365,569 bp.
- The assembly was confirmed by remapping of the reads against it, resulting in an average read depth of 100.3 X.
- The visual inspection of the mapping alignment indicated that the reads were continuous and interlaced, confirming the delineated genomic synteny.
- The hybrid bioinformatics algorithm dramatically improved the assembly compared to the solo use of the Spades assembler.
- Moreover, our approach provided contigs of higher similarity to the reference (>.
- The effect of the hybrid assembly algorithm on deciphering structural variability and non-canonical contigs of HCMV genomes.
- We closely examined the de novo assembled genome but also all the 36 alternative contigs, to identify structural.
- The low-complexity variable number tandem repeats (VNTRs) in the TRL and the IRL region showed, as expected, differences in copy number both in the main construct and within the alternative contigs when compared with the published sequence.
- Fig. 1 De novo assembled genome characteristics.
- a: De novo assembled TB40/E clone Nano aligned to TB40/E clone Lisa reference (top).
- c: read depth across the de novo assembled construct (confirmatory remapping of raw reads); the red line represents the average depth (100.35).
- Table 1 Comparison of de novo genome assembly methods Hybrid Assembly Spades Assembly Assembly vs.
- Fig. 2 Genome-wide similarity comparison of the de novo assembled HCMV TB40/E genome (vertical) with nine representative HCMV strains (horizontal).
- sequence was variable in composition compared to the main construct.
- These alternative contigs were dispersed across the genome, resulting in a 3.85X duplication ratio (Table 1), but the phenomenon was more pronounced over the US–TRS region of the genome (Fig.
- Analyzing the substitution rate of the amino acids, we observed a skew towards three particular changes (P➔L, R➔Q, and G➔R) (Fig.
- The de novo assembly of HCMV and other herpesvirus genomes is challenging owing to their length and unique structure, which is characterized by extended, repeated, internal and terminal regions, but also by.
- Full genome de novo assembly of HCMV will be useful in understanding the full extent of the intra- and inter-host genomic variability but also the variability that results from selective pressure e.g..
- To date, the analysis of the HCMV genome has been only based on 2nd generation sequencing platforms that deliver short-read HTS lengths (reviewed in [32.
- As a result, the assembly of the virus has been based either on solo mapping alignments [12] or on hybrid approaches, like the construction of the consensus genomic sequences from de novo assembled contigs supplemented with parts of the reference sequence to fill in the assembly gaps [13].
- The MinION sequencer has already been used to improve the de novo assemblies of data generated by Illumina HiSeq platforms [16], while we have also shown that MinION can improve de novo assemblies of HHV-1 derived from the Roche 454 GS Junior sequencer [17].
- In this study, we developed a novel bioinformatics pipeline to explore the potential of the MinION nanopore sequencer to reconstruct the full HCMV genome de novo, without using supplementary reads from other platforms.
- Fig. 3 Full genome phylogenetic analysis of the de novo assembled clone Nano (in red) and 29 representative strains.
- Indicatively, two of the three currently available TB40/E sequences do not share the same genomic structure (Fig.
- Crucially, this was made feasible by our hybrid algorithm, which yields longer contigs of high accuracy (Table 1) and provides a model method for the optimum usage of long-read data for challenging tasks such as the de novo assembly of large and highly repetitive viral genomes.
- Fig. 4 De novo assembled clone Nano annotation.
- Apart from the assembly limitations characterizing other short-read platforms, the respective library preparation protocols involve PCR amplifications, which have been shown to introduce artificial recombination events in the highly repetitive context of the HCMV genome [14].
- recombinants and, given the extra long reads produced by the nanopore technology, provides a unique combination for the structural analysis of the HCMV genome.
- Consequently, we were able not only to accurately reconstruct the full genome of the virus, but also to capture overlapping contigs of alternative sequences and, most importantly, contigs suggesting rearrangement events.
- These rearrangements have occurred between the major segments of the genome, with the repetitive sequences serving as recombination hot-spots, and were supported by long reads running through the repetitive regions and extending into the unique regions.
- Fig. 5 Structural variations of alternative contigs compared to the full genome construct.
- evidence of the existence of isomerized quasi-species’ genomes in our cultures (Fig.
- The latter also disrupt the de novo assemblies due to inter-sample variation and generation of conflicting contigs [29].
- In contrast, our confirmatory mapping of the raw reads on the de novo assembled genome showed the opposite trend, that is, an increased read depth across the repeated sequences, due to duplicated mapping of shorter reads lacking unique sequence segments.
- The depth was more than double compared to the rest of the genome (a’-TRL: 372.7X, IRL-a-IRS: 246.4X, TRS-a’: 206.9X), which is in accordance with the aneuploidy of the respective sequences in the genome.
- At the same time, we observed multiple copy number variation of the same VNTRs in the alternative contigs of our assembly.
- Our results support the hypothesis that the gaps in the mapping alignments are not due to the GC content and are mainly driven by discordances of the sequenced sample with the reference and most probably with the VNTR copy numbers.
- Numerous studies provide evidence that the variations of the VNTRs are linked with the functionality and the pathogenicity of specific strains of viruses [39–.
- MinION can unambiguously resolve these loci, due to the increased length of its reads, and can provide information regarding the clonal diversity of the polymorphic quasi-species present in the sample.
- Our results suggest that future studies focusing on the resolution of the clinical and epidemiological aspects of virus VNTRs should make use of longer reads derived from 3rd generation sequencers like MinION.
- Comparing our sample with the TB40/E clone Lisa reference sequence, we found substitutions in the UL1, UL6, UL8, UL9, UL10, UL11 and UL147 genes, which are members of the RL11 (RL11–13, UL1 and UL4–11) and the CXCL (UL146, UL147) gene families respectively.
- These findings agree with a recent, high-resolution study of the HCMV inter-host diversity, which revealed that the virus is more divergent than.
- Reference amino-acids are shown in the vertical axis.
- The synonymous mutations are distributed in the grey diagonal.
- Although the HCMV de novo assembly is challenging, our bioinformatics pipeline in combination with the increased accuracy of the latest versions of MinION allowed the complete assembly of the HCMV genome and revealed major genomic rearrangement events.
- In the case of clinical samples however, the viral DNA typically represents only a small fraction of the total genomic material.
- Additional enrichment strategies based on biotinylated baits might have to be employed in such cases, as they efficiently increase the proportion of viral reads and improve the assembly of the virus genome [54–56].
- HCMV: Human cytomegalovirus.
- Narayan Ramamurthy (Peter Medawar Building for Pathogen Research - University of Oxford) for his help in the BSL3 laboratory.
- Manifestations of human cytomegalovirus infection: proposed mechanisms of acute and chronic disease.
- Human cytomegalovirus genome.
- Variability and recombination of clinical human cytomegalovirus strains from transplantation recipients.
- Human cytomegalovirus clinical isolates carry at least 19 genes not found in laboratory strains.
- Genetic content of wild-type human cytomegalovirus.
- Modification of human cytomegalovirus tropism through propagation in vitro is associated with changes in the viral genome.
- Cloning and sequencing of a highly productive, endotheliotropic virus strain derived from human cytomegalovirus TB40/E.
- Analysis of the protein-coding content of the sequence of human cytomegalovirus strain AD169.
- A review of genetic differences between limited and extensively passaged human cytomegalovirus strains.
- Sijmons S, Thys K, Corthout M, Van Damme E, Van Loock M, Bollen S, et al.
- A method enabling high-throughput sequencing of human cytomegalovirus complete genomes from clinical isolates.
- De novo assembly of human herpes virus type 1 (HHV-1) genome, Mining of non-Canonical Structures and Detection of novel drug-resistance mutations using short- and long-read next generation sequencing technologies.
- Genomic and functional characteristics of human cytomegalovirus revealed by next-generation sequencing.
- Rapid intrahost evolution of human cytomegalovirus is shaped by demography and positive selection.
- Structural variability of the herpes simplex virus 1 genome in vitro and in vivo.
- Attenuation of Mengo virus through genetic engineering of the 5′ noncoding poly(C) tract.
- Human cytomegalovirus (HCMV) short tandem repeats analysis in congenital infection.
- Characterization of human cytomegalovirus strains by analysis of short tandem repeat polymorphisms.
- High-throughput analysis of human cytomegalovirus genome diversity highlights the widespread occurrence of gene-disrupting mutations and pervasive recombination.
- NF-κB-mediated activation of the chemokine CCL22 by the product of the human cytomegalovirus gene UL144 escapes regulation by viral IE86.
- Polymorphisms of the cytomegalovirus (CMV)-encoded tumor necrosis factor-alpha and beta-chemokine receptors in congenital CMV disease.
- Characterization of human cytomegalovirus UL145 and UL136 genes in low-passage clinical isolates from infected Chinese infants
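
The contig statistics quoted in the summary (for example N50 = 5686 for the short, high-accuracy contigs) follow the usual definition of N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. A minimal Python sketch of that definition is given below. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the article's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Illustrative helper, not part of the published pipeline.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2             # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # this contig crosses the 50% mark
            return length


# Toy example (made-up lengths): total = 280 bp, half = 140 bp.
# 100 alone is below 140; 100 + 80 = 180 reaches it, so N50 = 80.
print(n50([100, 80, 50, 30, 20]))  # → 80
```

In practice the lengths would be read from the assembler's FASTA output, and a higher N50 alone does not imply a more accurate assembly, which is why the summary reports accuracy and similarity to the reference alongside it.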
