
DNAscent v2: Detecting replication forks in nanopore sequencing data with deep learning



- In recent years, the detection of base analogues in Oxford Nanopore Technologies (ONT) sequencing reads has become a promising new method to supersede existing single-molecule methods such as DNA fibre analysis: ONT sequencing yields long reads with high throughput, and sequenced molecules can be mapped to the genome using standard sequence alignment software.
- Results: This paper introduces DNAscent v2, software that uses a residual neural network to achieve fast, accurate detection of the thymidine analogue BrdU with single-nucleotide resolution.
- DNAscent v2 also comes equipped with an autoencoder that interprets the pattern of BrdU incorporation on each ONT-sequenced molecule into replication fork direction to call the location of replication origins and termination sites.
- DNAscent v2 surpasses previous versions of DNAscent in BrdU calling accuracy, origin calling accuracy, speed, and versatility across different experimental protocols.
- Unlike NanoMod, DNAscent v2 positively identifies BrdU without the need for sequencing unmodified DNA.
- Unlike RepNano, DNAscent v2 calls BrdU with single-nucleotide resolution and detects more origins than RepNano from the same sequencing data.
- DNAscent v2 is open-source and available at https://github.com/MBoemo/DNAscent.
- Conclusions: This paper shows that DNAscent v2 is the new state-of-the-art in the high-throughput, single-molecule detection of replication fork dynamics.
- These improvements in DNAscent v2 mark an important step towards measuring DNA replication dynamics in large genomes with single-molecule resolution.
- Looking forward, the increase in accuracy in single-nucleotide resolution BrdU calls will also allow DNAscent v2 to branch out into other areas of genome stability research, particularly the detection of DNA repair.
- The high-throughput detection of replication fork movement with single-molecule resolution is critical for understanding how a cell replicates its DNA, which is particularly important for diseases like cancer where DNA replication is a therapeutic target [2].
- Oxford Nanopore Technologies (ONT) sequencing has emerged as a cost-effective platform for the detection of DNA base modifications such as 5-methylcytosine on long single molecules [3–7].
- 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
- Sequencing with ONT and detecting the position of these bases reveals a footprint of replication fork movement on each sequenced molecule, allowing this method to answer questions that would traditionally have been addressed with DNA fibre analysis, but with higher throughput and the ability to map each sequenced read to the genome.
- This paper introduces DNAscent v2, which uses a new residual neural network architecture to assign a probability of BrdU to each thymidine.
- Overhauling the BrdU detection algorithm from a hidden Markov model to a residual neural network results in high-accuracy BrdU calls (95.7% balanced accuracy; see Section S1 and Tables S1-S2 in Additional file 1) that enable the detection of replication dynamics with up to single-nucleotide resolution.
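For readers unfamiliar with the metric, balanced accuracy is conventionally the mean of sensitivity and specificity. The sketch below uses that standard definition with hypothetical counts; it does not reproduce the calculation or the data behind the 95.7% figure, which are described in Additional file 1.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical confusion-matrix counts, for illustration only.
example = balanced_accuracy(tp=90, fn=10, tn=98, fp=2)
print(f"balanced accuracy: {example:.3f}")
```

Because the two rates are averaged, the metric is not inflated when one class (for example, unsubstituted thymidine) greatly outnumbers the other.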
- DNAscent v2 supports BrdU detection on GPUs, providing the speed increase necessary to create genome-wide maps of replication dynamics in large genomes, and includes an autoencoder that automatically detects replication forks, origins, and termination sites at any point in S-phase and across different experimental protocols.
- This work demonstrates that DNAscent v2 is the new state-of-the-art to support DNA replication and genome stability research.
- The DNAscent v2 software consists of a simple two-step analysis pipeline requiring only three easy-to-make inputs: the FAST5 files containing raw signal data (produced by ONT’s MinKNOW software during sequencing), a reference genome, and the alignment (in BAM format) of ONT reads to the genome (Fig.
- The subprogram detect in DNAscent v2 uses these inputs to call the probability of BrdU at each thymidine position for each sequenced molecule.
- The output file from DNAscent detect is the only input for a new subprogram called forkSense that interprets the pattern of BrdU incorporation on each read to determine the probabilities that a leftward- and rightward-moving fork passed through each position during the BrdU pulse.
- The subprogram detect in DNAscent v2 detects BrdU with single-nucleotide resolution using a residual neural network consisting of depthwise and pointwise convolutions (Fig.
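To make the terms concrete, the sketch below shows what a depthwise convolution followed by a pointwise convolution with a residual (skip) connection computes, in plain Python. It is an illustration of the general technique only: the function names, kernel sizes, ReLU activation, and layer ordering are assumptions, not the architecture of DNAscent detect, which is specified in Section S2 of Additional file 1.

```python
from typing import List

Signal = List[List[float]]  # shape: [channels][positions]


def depthwise_conv1d(x: Signal, kernels: List[List[float]]) -> Signal:
    """Filter each channel with its own kernel (zero padding, odd kernel length).

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is not flipped.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        half = len(kernel) // 2
        row = []
        for i in range(len(channel)):
            total = 0.0
            for k, weight in enumerate(kernel):
                j = i + k - half
                if 0 <= j < len(channel):
                    total += weight * channel[j]
            row.append(total)
        out.append(row)
    return out


def pointwise_conv1d(x: Signal, weights: List[List[float]]) -> Signal:
    """Mix channels at each position; weights has shape [out_channels][in_channels]."""
    n_pos = len(x[0])
    return [
        [sum(w * x[c][i] for c, w in enumerate(w_row)) for i in range(n_pos)]
        for w_row in weights
    ]


def residual_block(x: Signal, kernels: List[List[float]], weights: List[List[float]]) -> Signal:
    """Depthwise then pointwise convolution, ReLU, then add the input back (skip connection).

    The skip connection requires out_channels == in_channels.
    """
    y = pointwise_conv1d(depthwise_conv1d(x, kernels), weights)
    return [
        [xi + max(0.0, yi) for xi, yi in zip(x_row, y_row)]
        for x_row, y_row in zip(x, y)
    ]


# Toy input: 2 channels, 3 positions. All values are hypothetical.
x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
kernels = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]  # one kernel per channel
weights = [[1.0, 0.0], [0.0, 1.0]]            # identity channel mixing
print(residual_block(x, kernels, weights))
```

Splitting a convolution into a per-channel (depthwise) step and a per-position channel-mixing (pointwise) step uses fewer parameters than a full convolution over the same window.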
- To that end, DNAscent v2 includes a new subprogram called forkSense that was designed to work in both synchronous and asynchronous cells at any point in S-phase.
- forkSense uses an autoencoder neural network to assign the probabilities that a leftward- and rightward-moving fork passed through each position on a read during the BrdU pulse (Fig.
- forkSense matches up converging and diverging forks in order to call confidence intervals of replication origins and termination sites on each nanopore-sequenced molecule.
- Hence, DNAscent detect and forkSense together are able to identify the BrdU “footprint” of replication forks on each nanopore-sequenced molecule (Fig.
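The idea of matching diverging and converging forks can be sketched as follows. This is a simplified illustration, not the forkSense algorithm: the `Fork` type, the assumption that fork tracts on a read are already called and non-overlapping, and the use of the gap between adjacent tracts as the interval are all hypothetical choices made for the example.

```python
from typing import List, NamedTuple, Tuple


class Fork(NamedTuple):
    start: int      # leftmost coordinate of the fork tract on the read
    end: int        # rightmost coordinate of the fork tract
    direction: str  # "L" for leftward-moving, "R" for rightward-moving


Interval = Tuple[int, int]


def call_origins_and_terminations(forks: List[Fork]) -> Tuple[List[Interval], List[Interval]]:
    """Pair adjacent fork tracts along a read.

    A leftward fork followed by a rightward fork is diverging, so an origin
    lies between them. A rightward fork followed by a leftward fork is
    converging, so a termination site lies between them.
    """
    ordered = sorted(forks)  # sorts by start coordinate first
    origins, terminations = [], []
    for left, right in zip(ordered, ordered[1:]):
        gap = (left.end, right.start)
        if left.direction == "L" and right.direction == "R":
            origins.append(gap)
        elif left.direction == "R" and right.direction == "L":
            terminations.append(gap)
    return origins, terminations


# Hypothetical fork tracts on a single read.
forks = [Fork(100, 200, "L"), Fork(300, 400, "R"), Fork(600, 700, "L")]
origins, terminations = call_origins_and_terminations(forks)
print(origins, terminations)
```

Two adjacent forks moving in the same direction yield no call in this sketch, since they neither diverge nor converge.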
- In addition to improving performance and adding functionality, DNAscent v2 development placed a particular focus on ease-of-use and accessibility for laboratories that may not have access to computational scientists or bioinformaticians.
- Origin calling with RepNano has fourteen adjustable parameters and earlier versions of DNAscent have three, but forkSense in DNAscent v2 does not require any tuning.
- DNAscent v2 also comes packaged with a utility that converts the outputs of detect and forkSense into bedgraphs such that BrdU and fork probabilities can easily be viewed side-by-side for each read (as in Fig. 3a-b) in the Integrative Genomics Viewer (IGV) [12] or the UCSC Genome Browser (http://genome.ucsc.edu) [13], and origin, termination, and fork calls are likewise written to bed files.
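As an illustration of the target format, the sketch below writes per-position probabilities for one read as a bedGraph track (tab-separated chromosome, 0-based start, exclusive end, value). The function name and the input layout are assumptions made for this example; the utility shipped with DNAscent has its own interface and may lay out tracks differently.

```python
from typing import Iterable, Tuple


def to_bedgraph(chrom: str, calls: Iterable[Tuple[int, float]], track_name: str) -> str:
    """Format (0-based position, probability) pairs as a single bedGraph track."""
    lines = [f'track type=bedGraph name="{track_name}"']
    for pos, prob in sorted(calls):
        # bedGraph intervals are half-open, so one base spans [pos, pos + 1).
        lines.append(f"{chrom}\t{pos}\t{pos + 1}\t{prob:.4f}")
    return "\n".join(lines) + "\n"


# Hypothetical BrdU probabilities at three thymidine positions on one read.
print(to_bedgraph("chrI", [(1502, 0.91), (1498, 0.07), (1510, 0.88)], "read_1_BrdU"), end="")
```

Writing one track per read is what allows BrdU and fork probabilities to be stacked and compared read by read in a genome browser.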
- To support the genome-wide measurement of replication dynamics in organisms with larger genomes, DNAscent v2 can optionally run BrdU detection on a GPU and benchmarks approximately 4.5× faster than DNAscent v1 and approximately 3.5× faster than RepNano (see Section S4 and Tables S5-S7 in Additional file 1).
- To evaluate the performance of DNAscent detect, receiver operator characteristic (ROC) curves were plotted using nanopore sequenced unsubstituted DNA to measure false positives and DNA with four different BrdU-for-thymidine substitution rates (Fig.
- DNAscent v2 outperformed the previous versions of DNAscent by a wide margin in all four samples.
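A ROC curve of this kind can be sketched by sweeping a probability threshold and recording, at each threshold, the fraction of calls in unsubstituted DNA (false positives) against the fraction of calls in substituted DNA. The function name, inputs, and thresholds below are hypothetical, and the exact procedure used in the paper may differ. Note also that in a partially substituted sample only a fraction of thymidines actually carry BrdU, so the positive call rate there cannot be read as a pure true positive rate.

```python
from typing import List, Sequence, Tuple


def roc_points(
    unsubstituted: Sequence[float],
    substituted: Sequence[float],
    thresholds: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (false positive rate, positive call rate) for each threshold."""
    points = []
    for t in thresholds:
        fpr = sum(p >= t for p in unsubstituted) / len(unsubstituted)
        call_rate = sum(p >= t for p in substituted) / len(substituted)
        points.append((fpr, call_rate))
    return points


# Hypothetical BrdU probabilities at thymidine positions in each sample.
unsub = [0.1, 0.2, 0.6, 0.3]
sub = [0.9, 0.8, 0.4, 0.7]
print(roc_points(unsub, sub, thresholds=[0.25, 0.5, 0.75]))
```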
- Bedgraphs of the probability of BrdU at each thymidine position for a subset of unsubstituted reads and 49% BrdU-for-thymidine substituted reads from the ROC curve analysis are shown in.
- Fig. 1 Schematic of the DNAscent v2 workflow.
- DNAscent detect uses a residual neural network to assign the probability of BrdU at each thymidine position in each read.
- DNAscent forkSense uses an autoencoder neural network to interpret the pattern of BrdU incorporation on each read into fork direction, and replication origin, fork, and termination calls are written to bed files.
- As an optional third step, DNAscent comes equipped with a utility that can convert the output of DNAscent detect and forkSense into bedgraphs that can be visualised in a genome browser.
- For each read, DNAscent detect performs a hidden Markov signal alignment to create an input tensor for the neural network.
- The final softmax layer normalises the output of the network to the probability that BrdU is at each thymidine position in the read.
- Further details, training information, and the number of parameters used in each layer are described in Section S2 of Additional file 1.
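The softmax normalisation mentioned in the caption above turns raw network outputs into probabilities that sum to one. A minimal sketch follows, assuming a hypothetical two-class output (thymidine versus BrdU) at a single position; the actual output layout of the network is described in Section S2 of Additional file 1.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalise raw scores to probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores at one thymidine position: [thymidine, BrdU].
p_thymidine, p_brdu = softmax([0.3, 2.1])
print(f"P(BrdU) = {p_brdu:.3f}")
```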
- (c) Architecture of the autoencoder neural network used by DNAscent forkSense.
- For each read, the output of DNAscent detect (the probability of BrdU at each thymidine position along the read) forms the input tensor, and the network outputs the probability that a leftward- and rightward-moving fork passed through each thymidine position on the read during the BrdU pulse.
- Further details, training information, and the number of parameters used in each layer are described in Section S3 of Additional file 1.
- Fig. 2 Performance of the DNAscent v2 detect subprogram.
- Detecting BrdU in nascent DNA sequenced with ONT can reveal the movement of replication forks in millions of single molecules.
- Only results for DNAscent are shown, as RepNano does not call BrdU with single-nucleotide resolution.
- (c) Bedgraphs visualised in IGV [12] showing the probability of BrdU called at each thymidine position for a randomly selected subset of reads used in the 49% BrdU ROC curve analysis.
- Each track is a single read, and the y-axis of each track ranges from 0 to 1.
- S. cerevisiae rDNA consists of kb repeats, each of which has an origin of replication (top track) and a replication fork barrier (vertical lines) that blocks rightward-moving forks.
- Fig. 3 Performance of the DNAscent v2 forkSense subprogram.
- cerevisiae chromosome I are shown for S..
- Origins that are confirmed and likely from OriDB are shown in the top track.
- Eight reads are shown for each experiment where each read is represented as a group of three tracks: the probability of BrdU at each thymidine (upper track; from DNAscent detect) and the probability that a leftward-moving fork (middle track; from DNAscent forkSense) passed through each position during the BrdU pulse.
- (e) Distribution of the distance between each origin call and the nearest confirmed or likely origin from OriDB for S.
- The results of three versions of DNAscent are shown.
- Results for DNAscent v2 are shown alongside results from the RepNano transition matrix (TM) and convolutional neural network (CNN) origin calling.
- Earlier versions of DNAscent were not designed to call origins in asynchronous cells, so only the results from DNAscent v2 are shown.
- To show that DNAscent v2 distinguishes BrdU from thymidine with single-nucleotide resolution, BrdU detection was run on substrates with two BrdU bases at known positions [9] where DNAscent v2 was able to clearly identify the positions of both BrdU bases (Fig.
- This accurate single-nucleotide resolution is particularly important for genome stability applications such as identifying the precise location of replication fork stalls.
- S. cerevisiae rDNA with 2-kilobase (kb) resolution using DNAscent v0.1 [9], but DNAscent v2 can detect sites of fork pausing/stalling with single-nucleotide resolution (Fig.
- With DNAscent v2, the BrdU calls are clean enough that the single-nucleotide resolution BrdU calls can be visualised directly as bedgraphs in IGV [12] without the need for any smoothing or further processing from the software.
- S. cerevisiae cells that were synchronised in G1 and released into S-phase in the presence of BrdU with no thymidine chase [9] and asynchronous thymidine-auxotrophic S.
- A pileup of replication origins and termination sites called on S.
- Figure S5d in Additional file 1).
- DNAscent v2 is able to capitalise on its improved BrdU detection to detect several fold more origins than both previous versions of DNAscent and RepNano (Fig.
- While several tools have been developed in recent years that can detect BrdU in Oxford Nanopore reads, DNAscent v2 has a number of key advantages.
- Unlike NanoMod [7], DNAscent v2 is able to positively identify BrdU without the need for sequencing both BrdU-substituted and unsubstituted DNA that covers the same region of the genome.
- Unlike RepNano [10], DNAscent v2 can call BrdU with single-nucleotide resolution which is critical for accurately detecting sites of fork stalling and the genomic features (e.g., DNA sequence motifs or replication-transcription collisions) that may have caused aberrant fork movement.
- Importantly, DNAscent v2 far surpasses its previous major releases (v1 and earlier) [9] in accuracy of BrdU calling (Fig.
- The improvement to single-nucleotide resolution BrdU calling in detect, together with the forkSense algorithm, has allowed DNAscent v2 to make significantly more origin calls than previous versions when run on the same data set, and as shown by Fig.
- This suggests a decrease in false negative origin calls, enabling DNAscent v2 to create a more accurate picture of how replication took place on each individual molecule.
- When analysing all nanopore-sequenced molecules together, these improvements mean that less data is required to create whole-genome maps of replication origin and termination site locations, which is particularly important for studying replication in larger genomes.
- Transitioning the DNAscent detect BrdU calling algorithm from the hidden Markov forward algorithm to a new residual neural network architecture has increased the accuracy of single-nucleotide resolution BrdU calling, making this new version of DNAscent applicable to more areas of genome stability research.
- 2 indicates that DNAscent v2 should be able to detect sites of DNA repair, where accurate BrdU calls within very short regions (1-10 inserted nucleotides for base excision repair and about 30 nucleotides for nucleotide excision repair) would be critical.
- The residual neural network in DNAscent v2 also creates a more natural platform for future work on the detection of multiple base analogues and/or base modifications in the same molecule.
- DNA fibre analysis relies on sequential pulses of different base analogues to determine fork direction while DNAscent currently determines fork direction from the changing frequency of BrdU-for-thymidine substitution across a molecule.
- While DNAscent’s current single-analogue approach is advantageous in its simplicity, the detection of multiple analogues would be necessary to answer certain questions typically addressed with fibre analysis, such as the stability of stalled replication forks [15].
- This paper has introduced DNAscent v2, which utilises residual neural networks to significantly improve the single-nucleotide accuracy of BrdU calling compared with the hidden Markov approach utilised in earlier versions.
- DNAscent v2 also includes the new forkSense subprogram which uses an autoencoder to infer the movement of replication forks from patterns of BrdU incorporation.
- forkSense can call the location of replication forks, origins, and termination sites in single molecules across a range of experimental protocols with a sensitivity that exceeds both earlier versions and other competing tools.
- These new methodologies, together with improvements in speed and ease-of-use, make this technology an important new piece of the toolkit in DNA replication and genome stability research.
- Additional file 1: Supplementary information.
- The supplementary information provides technical details about how the neural networks in DNAscent v2 were designed and trained.
- Details are also provided for the runtime comparisons mentioned in the text.
- Research by MAB is supported by Royal Society grant RGS\R1\201251, Isaac Newton Trust grant 19.39b, and startup funds from the University of Cambridge Department of Pathology.
- DNAscent v2 is open-source under GPL-3.0 and is available at https://github..
- Detection of base analogs incorporated during DNA replication by nanopore sequencing.
- FORK-seq: replication landscape of the Saccharomyces cerevisiae genome by nanopore sequencing
