
CircMarker: A fast and accurate algorithm for circular RNA detection


Summary

- Non-canonical splicing joins the 3’ and 5’ ends and forms so-called circular RNA.
- It is now believed that circular RNA plays important biological roles, such as affecting susceptibility to some diseases.
- During the past several years, multiple experimental methods have been developed to enrich circular RNA while degrading linear RNA.
- Although several useful software tools for circular RNA detection have been developed as well, these tools are based on read mapping and may miss many circular RNAs.
- Method: In this paper, we present a new computational approach, named CircMarker, based on k-mers rather than read mapping for circular RNA detection.
- CircMarker takes advantage of transcriptome annotation files to create the k-mer table for circular RNA detection.
- Results: Empirical results show that CircMarker outperforms existing tools in both accuracy and efficiency of circular RNA detection on many simulated and real datasets.
- Conclusions: We develop a new circular RNA detection method called CircMarker based on k-mer analysis.
- Our results on both simulated and real data demonstrate that CircMarker runs much faster and can find more circular RNA, with higher consensus-based sensitivity and a high accuracy ratio, compared with existing tools.
- Recent studies show that circular RNA may sometimes be generated during transcription [3].
- Circular RNA (or circRNA) is a type of RNA which forms a covalently closed continuous loop.
- However, since the amount of circular RNA is often much lower than that of linear RNA, circular RNA was not thoroughly studied until recently.
- During the past several years, several papers have reported that circular RNA may be associated with diseases and traits [7].
- Based on this feature, some benchmark experimental methods have been developed to degrade the linear RNA while enriching the circular RNA.
- Computational tools for circular RNA detection have been developed.
- Currently, there are several existing tools for circular RNA detection, such as Find_circ [12], CIRCexplorer [13] and CIRI [14].
- Find_circ is one of the first tools for circular RNA detection.
- The main idea is using the concept of fusion gene to detect circular RNA.
- Then, the unmapped reads are mapped back to the reference using TopHat-Fusion [16] to detect potential circular RNA candidates from back-spliced junction reads.
- CIRI uses BWA [17] for read mapping and tries to find circular RNA by analyzing CIGAR signatures in the SAM file.
- Some of these tools, such as CIRCexplorer, depend on transcriptome annotation, while others, such as Find_circ, support de novo circular RNA detection.
- This can be useful for circular RNA detection. Prior literature has also evaluated these tools in terms of performance measures such as precision and sensitivity [19].
- Although BWA and Bowtie are widely used in sequence analysis, read mapping is still time-consuming for circular RNA detection.
- This is because read mapping tries to map every read, even when the read is not relevant to circular RNA detection.
- Moreover, these tools may miss circular RNA in some cases due to errors in read mapping.
- For example, some reads related to circular RNA may remain unmapped due to read errors.
- In this paper, we develop a new computational method, called CircMarker, for circular RNA detection.
- The objective of CircMarker is to find the presence of circular RNA (in particular, the joining of two known exons).
- CircMarker does not reconstruct the complete sequence of circular RNA.
- Instead, CircMarker analyzes short sequence segments, called k-mers, for circular RNA detection.
- Empirical results show that CircMarker is more accurate than (or as accurate as) existing methods in calling circular RNA on simulated and real datasets.
- CircMarker only considers the circular RNA which.
- Fig. 1: The procedure of circular RNA detection.
- (a) A fast check for finding circular RNA-relevant reads by sampling.
- (c) Calling circular RNA using various criteria and filters.
- Upper panel: three exons, with the red arrow identifying a potential circular RNA.
- We do not consider de novo circular RNA cases in this paper.
- To speed up detection, CircMarker first performs a fast check to find the reads that are likely to be relevant for circular RNA detection.
- Then it processes each read and compares the k-mers in the read with the stored k-mers to identify circular RNA based on its signatures.
- When two k-mers from a single read are out of order relative to the reference, CircMarker considers this evidence for the existence of circular RNA.
- Then, the best hitting case will be considered as the self-circular RNA candidate if L_e ≤ L_em.
- We consider a candidate a valid self-circular RNA only if there are two tags that are.
- going backward at the circular RNA join junction).
- Calling circular RNA.
- There are two cases for calling circular RNA: the self-circular case and the regular-circular case.
- Self-circular RNA.
- First, a self-circular RNA candidate will be discarded if the length of the current exon is shorter than the read length while N_h is smaller than L_e − K + 1.
- Otherwise, the best hitting case will be considered a valid self-circular RNA candidate if it contains circular splicing signals on both sides.
- Regular-circular RNA.
- Otherwise, we try to identify the breakpoint at the position of the first decrease and set it to be the joint junction of the circular RNA.
- The candidate will be viewed as a valid regular-circular RNA candidate only if the head exon and tail exon have the tail and head circular splicing signals, respectively.
- We set the end position of the head exon and the start position of the tail exon as the position of this called regular-circular RNA.
- Otherwise, we try to identify the breakpoint at the first increase and set it to be the joint junction of the circular RNA.
- Refining circular RNA candidates.
- We count how many reads support each circular RNA candidate.
- Since the maximum coverage of circular RNA is unknown in most cases, we set the default value to be a large number to allow all valid circular RNA candidates.
- Since the study of circular RNA is still at an early stage, there is at present no widely accepted benchmark data for evaluating circular RNA calling.
- Recently, some public circular RNA databases have collected different types of circular RNA from published papers.
- Some databases, such as CircBase [20], come with a recommended circular RNA detection tool.
- Others, such as Circ2Traits [21], focus on collecting the relationships between circular RNA and diseases or traits.
- The total numbers of simulated circular RNAs in the benchmark are 8033 and 8071 for the two cases, respectively.
- Since the coverage of circular RNA is known in simulated data, we set the “maximum support reads”.
- 3a), where CircMarker has fewer false positives and also calls more correct circular RNA than other tools.
- This is likely due to the weak performance of the “coverage filter” option, given the similar coverage of linear and circular RNA.
- In those papers, the authors usually validate only part of the computationally detected circular RNA using biological experiments.
- Data collection: We choose CircBase [20] as the standard circular RNA database for Homo sapiens.
- (a) The number of circular RNAs called by each tool in case 1 (10X and 100X, the left cluster) and case 2 (50X and 50X, the right cluster).
- incorrectly called) circular RNA.
- correctly called) circular RNA.
- Number N_hdb: the number of circular RNAs that have a matched circular RNA in the database.
- These matched circular RNAs are called reliable circular RNAs.
- (2) Intersection: the intersection of reliable circular RNAs between CircMarker and other tools.
- This measures the fraction of matched circular RNAs with regard to the total number of called ones, N.
- The best tool is expected to have a large intersection with other tools (low bias), a large number of reliable circular RNAs with a high reliability ratio, and the fastest running time.
- Recall that RNase R is an experimental technology that can break down linear RNA and enrich circular RNA.
- As a result, one popular way for validating a circular RNA detection tool is running the tool in two different types of reads: one from only rRNA eliminated sample (called.
- Circular RNA that can be found in both types of reads is considered to be reliable.
- (1) Reliable circular RNA: the reliable circular RNAs are from the intersection of called circular RNAs between the treated and untreated reads.
- Each tool reports its own reliable circular RNAs from chromosomes 1 to 3.
- (2) Consensus-based sensitivity: we say a called circular RNA is trusted if it is called by at least two tools.
- We collect these trusted circular RNAs for each chromosome.
- Then, for each tool, we calculate the intersection between its reliable circular RNAs and the benchmark for chromosomes 1 to 3.
- Ideally, a circular RNA detection tool should obtain a large number of reliable circRNAs with high consensus-based sensitivity and fast running time in each chromosome.
- (a) The number of circular RNAs called by each tool from chromosome 1 to chromosome 3.
- (b) Intersection: the number of circular RNAs in the intersection of reliable circular RNAs (N_hdb) between CircMarker and other tools.
- CircMarker finds a larger number of reliable circular RNAs than the other tools in all three chromosomes (Fig.
- One can see that CircMarker gets the largest number of reliable circular RNAs in all three chromosomes.
- Some existing circular RNA calling tools do not use annotation files.
- Moreover, circular RNA that is supported by annotated exons should be considered more reliable than de novo circular RNA.
- We find that 91.2% of the circular RNAs recorded in circBase hit exon boundaries recorded in the annotation file.
- In this paper, we develop a new circular RNA detection method called CircMarker based on k-mer analysis.
- found by at least two tools) circular RNA be contributed by the reliable circular RNA from each tool.
- Moreover, k-mers contain useful information about circular RNA detection.
- Our results on both simulated and real data demonstrate that CircMarker can find more circular RNA.
- CircRNA: Circular RNA.
- Mis-splicing yields circular RNA molecules.
- CIRI: an efficient and unbiased algorithm for de novo circular RNA identification.
- A comprehensive overview and evaluation of circular RNA detection tools.
- Circ2Traits: a comprehensive database for circular RNA potentially associated with disease and traits.
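
The k-mer idea described in the summary above (build a k-mer table from annotated exons, then treat k-mers that appear out of order within one read as evidence of a back-splice) can be illustrated with a small Python sketch. This is not CircMarker's implementation: the function names `build_kmer_table` and `find_back_splice`, the toy exon sequences, and the choice to drop k-mers that occur more than once are all assumptions made for illustration. The sketch also ignores sequencing errors, reverse-complement reads, splicing-signal checks, and read-support counting.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table layout: k-mer -> (exon index, offset inside that exon)
KmerTable = Dict[str, Tuple[int, int]]


def build_kmer_table(exons: List[str], k: int) -> KmerTable:
    """Index every k-mer of every annotated exon.

    Assumption for this sketch: k-mers seen at more than one position
    are ambiguous and are removed from the table.
    """
    table: KmerTable = {}
    ambiguous = set()
    for exon_idx, seq in enumerate(exons):
        for off in range(len(seq) - k + 1):
            kmer = seq[off:off + k]
            if kmer in table:
                ambiguous.add(kmer)
            else:
                table[kmer] = (exon_idx, off)
    for kmer in ambiguous:
        table.pop(kmer, None)
    return table


def find_back_splice(read: str, table: KmerTable, k: int) -> Optional[Tuple[int, int]]:
    """Return (donor_exon, acceptor_exon) if the read's k-mer hits go backward.

    Hits are compared as (exon index, offset) pairs, so a jump to an
    earlier exon (regular-circular) and a jump to an earlier offset in
    the same exon (self-circular) are both reported.
    """
    hits = []
    for pos in range(len(read) - k + 1):
        hit = table.get(read[pos:pos + k])
        if hit is not None:
            hits.append(hit)
    for prev, cur in zip(hits, hits[1:]):
        if cur < prev:  # out of order relative to the reference
            return (prev[0], cur[0])
    return None


# Toy data (made up): three exons in reference order, k = 4
exons = ["ACGTTGCA", "GGATCCTA", "CTTAAGGC"]
table = build_kmer_table(exons, 4)

linear_read = "TGCAGGAT"   # end of exon 0 followed by start of exon 1
regular_read = "AGGCGGAT"  # end of exon 2 followed by start of exon 1
self_read = "TGCAACGT"     # end of exon 0 followed by start of exon 0

print(find_back_splice(linear_read, table, 4))
print(find_back_splice(regular_read, table, 4))
print(find_back_splice(self_read, table, 4))
```

In this toy setting a result whose two exon indices are equal corresponds to the self-circular case, and a result with a larger donor index than acceptor index corresponds to the regular-circular case.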
