
RBPsuite: RNA-protein binding sites prediction suite based on deep learning



- Background: RNA-binding proteins (RBPs) play crucial roles in various biological processes.
- Deep learning-based methods have proven powerful for predicting RBP binding sites on RNAs.
- However, training deep learning models is time-consuming and computationally intensive.
- Results: Here we present RBPsuite, an easy-to-use deep learning-based webserver for predicting RBP binding sites on linear and circular RNAs.
- For linear RNAs, RBPsuite predicts RBP binding scores using our updated iDeepS.
- For circular RNAs (circRNAs), RBPsuite predicts RBP binding scores using our CRIP method.
- RBPsuite first breaks the input RNA sequence into segments of 101 nucleotides and scores the interaction between each segment and the RBPs.
- RBPsuite then detects verified motifs on the binding segments and gives the distribution of binding scores along the full-length sequence.
- Conclusions: RBPsuite is an easy-to-use online webserver for predicting RBP binding sites and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/RBPsuite/.
- Keywords: Deep learning, RNA-binding proteins, Linear RNAs, Circular RNAs.
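The 101-nucleotide segmentation and per-segment scoring described above can be sketched as follows. This is a minimal illustration and not RBPsuite's code. It assumes non-overlapping windows and that a trailing remainder shorter than 101 nt is dropped; how RBPsuite actually handles the tail is not stated here. It also assumes a segment is called a binding site when its score is at or above the cutoff. `score_fn` is a hypothetical stand-in for a trained iDeepS or CRIP model. With these assumptions a 1821-nt input yields 18 segments, matching the segment count reported for hsa_circ_0054654 in the case study.

```python
from typing import Callable, List, Tuple

SEGMENT_LEN = 101  # segment length stated for RBPsuite


def split_into_segments(seq: str, seg_len: int = SEGMENT_LEN) -> List[Tuple[int, str]]:
    """Split a sequence into non-overlapping windows of seg_len nt.

    Returns (0-based start, segment) pairs. ASSUMPTION: a trailing
    remainder shorter than seg_len is dropped.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as RNA
    n_full = len(seq) // seg_len
    return [(i * seg_len, seq[i * seg_len:(i + 1) * seg_len]) for i in range(n_full)]


def predict_binding_segments(
    seq: str,
    score_fn: Callable[[str], float],  # hypothetical model: segment -> score in [0, 1]
    cutoff: float = 0.5,
) -> List[Tuple[int, float, bool]]:
    """Score every segment and flag those at or above the cutoff (assumed >=)."""
    results = []
    for start, segment in split_into_segments(seq):
        score = score_fn(segment)
        results.append((start, score, score >= cutoff))
    return results


def toy_score(segment: str) -> float:
    # Hypothetical scorer for illustration only: fraction of U residues
    return segment.count("U") / len(segment)


seq = "ACGU" * 455 + "A"  # 1821 nt
hits = predict_binding_segments(seq, toy_score)
print(len(hits))  # 18
```

Passing the model in as a callable keeps the segmentation logic independent of whichever predictor (iDeepS or CRIP) is selected for the RNA type.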
- RNA-binding proteins (RBPs) are involved in many biological processes, and their binding sites on RNAs can give insights into the mechanisms behind diseases involving RBPs [1].
- Thus, identifying RBP binding sites on RNAs is crucial for follow-up analyses, such as the impact of mutations on binding sites.
- With the development of high-throughput sequencing, the number of experimentally verified RBP binding sites has grown explosively, e.g.
- However, these CLIP-seq data still cannot provide a full view of the RBP binding landscape, because CLIP-seq relies on gene expression, which can vary greatly between experiments.
- data for machine learning models to predict missing RBP binding sites that may not be detected in some experiments.
- GraphProt can detect the binding sequence and structure preferences of RBPs and then predict RBP binding sites on any input RNA.
- Recently, deep learning-based methods have achieved remarkable results in predicting RBP binding sites [5, 6].
- Inspired by DeepBind, iDeep integrates multiple sources of features to predict RBP binding sites using multimodal deep learning, consisting of a CNN and multiple deep belief networks [8].
- and structure context.
- Unlike iDeepS, pysster encodes the sequence and structure in a one-hot encoded matrix based on an extended alphabet that combines the sequence and structure alphabets [11].
- DeepCLIP applies a similar network architecture consisting of a hybrid CNN and LSTM to predict RBP binding sites on RNAs [12].
- iDeepE trains a local CNN and a global CNN to predict RBP binding sites from sequences alone [13].
- The binding mechanism of RBPs on circular RNAs (circRNAs) differs from that on linear RNAs, so models trained on RBP-binding linear RNAs do not generalize well to circRNAs. CRIP was developed specifically for predicting RBP binding sites on circRNAs, using a codon-based encoding scheme and hybrid deep models [14].
- In addition, SMARTIV cannot predict RBP binding sites for a single RNA sequence.
- The backend predictors of the above webservers are non-deep-learning methods, which have been shown to be inferior to deep learning-based methods for predicting RBP binding sites [18].
- Moreover, no online webserver is currently available for predicting RBP binding sites on circRNAs.
- However, to date, there is no online webserver available for predicting RBP binding sites on both linear and circular RNAs using deep learning.
- Most published approaches for predicting RBP binding sites, such as GraphProt and our own iDeepS and CRIP, only provide source code with differing input data formats, and their dependencies are difficult to configure because deep learning frameworks such as TensorFlow are updated frequently.
- In addition, training deep learning-based models is time-consuming and computationally intensive.
- Thus, it is imperative to develop an easy-to-use webserver that integrates state-of-the-art methods for predicting RBP binding sites on RNAs and covers as many RBPs as possible.
- RBPsuite has broad application potential: it can be used to expand our knowledge about RBP-binding RNAs, e.g.
- We implement RBPsuite, an online webserver for predicting RBP binding sites on full-length linear and circular RNAs from sequences alone.
- For linear RNAs, the server predicts RBP binding scores using our updated iDeepS, which is retrained on the binding RNA targets of 154 RBPs derived from ENCODE.
- For circRNAs, RBPsuite predicts RBP binding scores using our CRIP method.
- RBPsuite further detects verified motifs on the predicted binding segments and visualizes the score distribution along the input sequence.
- Several processing steps were performed to prepare the positive and negative RBP binding training data sets.
- 4) Negative RBP binding regions were generated with shuffleBed from bedtools; these negative sites are regions from the same gene as each peak that contain no peak.
- For circRNAs, we use the trained models of 37 RBPs on the benchmark dataset of CRIP [14].
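The negative-sampling rule in step 4) above can be approximated in plain Python as sketched below. A random region with the same length as a peak is drawn inside that peak's gene and rejected if it overlaps any peak. This is a hypothetical pure-Python stand-in, not bedtools and not the authors' pipeline. The function and parameter names are made up for illustration, and it assumes one chromosome and strand with half-open BED-style coordinates.

```python
import random
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end), same convention as BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def sample_negative(
    peak: Interval,
    gene: Interval,
    all_peaks: List[Interval],
    rng: random.Random,
    max_tries: int = 1000,
) -> Optional[Interval]:
    """Draw a region of the peak's length inside the gene that overlaps no peak.

    Hypothetical helper; assumes a single chromosome and strand.
    """
    length = peak[1] - peak[0]
    lo, hi = gene[0], gene[1] - length  # admissible start positions (inclusive)
    if hi < lo:
        return None  # gene is shorter than the peak
    for _ in range(max_tries):
        start = rng.randint(lo, hi)
        candidate = (start, start + length)
        if not any(overlaps(candidate, p) for p in all_peaks):
            return candidate
    return None  # gave up: the gene is too crowded with peaks


rng = random.Random(0)
gene = (1000, 5000)
peaks = [(1200, 1301), (3000, 3101)]
negatives = [sample_negative(p, gene, peaks, rng) for p in peaks]
```

Rejection sampling keeps the sketch short. bedtools' shuffle command performs this kind of constrained placement more efficiently over whole genomes.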
- RBPsuite integrates two deep learning-based methods:
- Updated iDeepS for predicting RBP binding sites on linear RNAs: here we modified the sequence and structure encoding scheme of the original iDeepS.
- It first encodes the sequence and structure into a one-hot encoded matrix with an extended alphabet.
- The one-hot encoded matrix is then fed into a CNN and an LSTM to extract high-level features, which are passed to two fully connected layers to predict RBP binding sites on linear RNAs.
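A minimal sketch of extended-alphabet one-hot encoding follows. Each position's nucleotide and structure label are merged into a single symbol, so 4 nucleotides and k structure labels give 4×k channels. The five structure letters used here (`P H I M E`: paired, hairpin, internal loop, multiloop, external) are a hypothetical choice; the exact structure annotation used by the updated iDeepS is not given in this summary. The CNN/LSTM stage that consumes the matrix is omitted.

```python
from typing import List

NUCLEOTIDES = "ACGU"
# ASSUMPTION: hypothetical per-position structure labels
STRUCTURE = "PHIME"  # paired, hairpin, internal loop, multiloop, external

# Extended alphabet: every (nucleotide, structure) pair is one symbol
EXTENDED = [n + s for n in NUCLEOTIDES for s in STRUCTURE]
INDEX = {sym: i for i, sym in enumerate(EXTENDED)}


def one_hot_extended(seq: str, struct: str) -> List[List[int]]:
    """Return a len(seq) x 20 one-hot matrix over the combined alphabet.

    Positions with an unknown nucleotide (e.g. N) or structure symbol
    get an all-zero row.
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have equal length")
    matrix = []
    for nt, st in zip(seq.upper().replace("T", "U"), struct.upper()):
        row = [0] * len(EXTENDED)
        idx = INDEX.get(nt + st)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    return matrix


m = one_hot_extended("ACGUN", "PPHHE")
print(len(m), len(m[0]))  # 5 20
```

Combining the two alphabets lets a single convolution filter respond to a nucleotide only when it occurs in a particular structural context.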
- CRIP for predicting RBP binding sites on circRNAs.
- Because the interaction patterns of RBP-binding circRNAs differ from those of linear RNAs, models trained on linear RNAs do not generalize well to circRNAs.
- Thus, we propose CRIP, a deep learning-based method specifically for predicting RBP binding sites on circRNAs from sequences alone [14].
- CRIP first encodes the sequence into a one-hot encoded matrix using a stacked codon-based encoding scheme; the encoded matrix is then fed into a hybrid deep learning architecture with a CNN and a biLSTM to predict RBP binding sites on circRNAs.
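A simplified codon-based one-hot encoding is sketched below. The sequence is read as overlapping 3-nt codons with stride 1, and each codon becomes a one-hot row over the 64 possible codons. This is an assumption-laden simplification: the stacked codon-based scheme in CRIP [14] may group or stack codons differently. The `stride` parameter is hypothetical, and the CNN + biLSTM stage is omitted.

```python
from itertools import product
from typing import List

NUCS = "ACGU"
CODONS = ["".join(c) for c in product(NUCS, repeat=3)]  # AAA, AAC, ..., UUU (64)
CODON_INDEX = {c: i for i, c in enumerate(CODONS)}


def codon_one_hot(seq: str, stride: int = 1) -> List[List[int]]:
    """Encode a sequence as one-hot rows over the 64 codons.

    ASSUMPTION: overlapping codons read with the given stride; codons
    containing non-ACGU characters become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")
    rows = []
    for i in range(0, len(seq) - 2, stride):
        row = [0] * len(CODONS)
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


rows = codon_one_hot("ACGUA")
print(len(rows))  # 3 codons: ACG, CGU, GUA
```

With stride 1, a 101-nt segment becomes a 99 x 64 matrix. Each row carries trinucleotide context rather than a single base.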
- To provide further supporting evidence for the predicted binding sites, we use FIMO [25] from the MEME suite [23] to scan the predicted binding segments for occurrences of verified motifs.
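FIMO scores every position of a sequence against a motif's position weight matrix using a log-likelihood ratio, then converts those scores to p-values. The sketch below shows only the log-odds scanning step, with a uniform background, a pseudocount and a fixed score threshold. FIMO's p-value and q-value calculations are omitted. The toy `UGUA` motif is hypothetical and not a CISBP-RNA entry.

```python
import math
from typing import Dict, List, Tuple

NUCS = "ACGU"


def log_odds_matrix(
    pwm: List[Dict[str, float]],
    background: float = 0.25,
    pseudocount: float = 0.01,
) -> List[Dict[str, float]]:
    """Convert per-position probabilities into log2 odds vs. a uniform background."""
    matrix = []
    for column in pwm:
        total = sum(column.get(n, 0.0) + pseudocount for n in NUCS)
        matrix.append({
            n: math.log2((column.get(n, 0.0) + pseudocount) / total / background)
            for n in NUCS
        })
    return matrix


def scan(segment: str, lo_matrix: List[Dict[str, float]], threshold: float) -> List[Tuple[int, str, float]]:
    """Return (start, site, score) for every window scoring >= threshold."""
    width = len(lo_matrix)
    segment = segment.upper().replace("T", "U")
    hits = []
    for i in range(len(segment) - width + 1):
        site = segment[i:i + width]
        if any(n not in NUCS for n in site):
            continue  # skip windows containing N or other symbols
        score = sum(col[n] for col, n in zip(lo_matrix, site))
        if score >= threshold:
            hits.append((i, site, score))
    return hits


motif = [{"U": 1.0}, {"G": 1.0}, {"U": 1.0}, {"A": 1.0}]  # hypothetical motif
lo = log_odds_matrix(motif)
print([h[0] for h in scan("AAUGUAAAUGUAC", lo, threshold=6.0)])  # [2, 8]
```

The pseudocount keeps a zero-probability base from producing a log of zero. It turns a mismatch into a large penalty rather than an error.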
- We first evaluate the updated iDeepS on the original benchmark dataset of 31 experiments [8]; it yields an average AUC of 0.85 across the 31 experiments, close to that of the original iDeepS.
- DeepCLIP with a similar network architecture on the benchmark dataset from GraphProt.
- For linear RNAs, iDeepS in RBPsuite yields an average AUC of 0.781, precision of 0.673, sensitivity of 0.802 and specificity of 0.591 across 154 RBPs on the independent test set.
- We also retrain CRIP on the circRNA benchmark set; CRIP yields an average AUC of 0.878, a precision of 0.798 and a sensitivity of 0.813 across 37 RBPs.
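The reported metrics can be computed from per-sample labels and predicted scores as sketched below. AUC uses the rank-based (Mann-Whitney) formulation, in which ties count as half. The 0.5 cutoff for turning scores into class labels is an assumption borrowed from the case study, since the cutoff behind the test-set numbers is not stated here. The labels and scores are made up.

```python
from typing import Dict, List


def auc(labels: List[int], scores: List[float]) -> float:
    """ROC AUC: probability a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels: List[int], scores: List[float], cutoff: float = 0.5) -> Dict[str, float]:
    """Precision, sensitivity (recall) and specificity at a score cutoff (assumed >=)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = s >= cutoff
        if pred and y == 1:
            tp += 1
        elif pred:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
print(round(auc(labels, scores), 3))  # 0.889
print(threshold_metrics(labels, scores))  # each metric is 2/3 here
```

An "average AUC across RBPs" would then be the mean of per-RBP `auc` values. The quadratic pairwise loop is fine for a sketch, but a sort-based implementation scales better to large test sets.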
- For linear RNAs, the binding scores of individual segments are calculated by iDeepS.
- The output page gives the binding scores for each segment and identified motifs on the segment, and also the score distribution of RBP binding sites within the input sequence.
- Fig. 2 The AUCs of the updated iDeepS for linear RNAs on 154 RBPs.
- In addition to the input sequence, users need to specify the RNA type, ‘Linear RNA’ or ‘Circular RNA’, which determines which computational method is used to predict the RBP binding sites.
- ‘Specific model’ predicts the binding scores between the input RNA and the chosen RBP using the models trained on.
- ‘General model’ predicts the binding scores between the input RNA and all RBPs with trained models; the number of RBPs is 154 for linear RNAs and 37 for circRNAs.
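How the two form options select a predictor can be summarized as a small dispatch table. The method names and RBP counts come from the description above. The dictionary layout, the `plan_job` helper and the `AGO2` example call are hypothetical and do not reflect RBPsuite's internal code.

```python
from typing import Dict, Optional, Tuple

# Method names and RBP counts are taken from the description of RBPsuite;
# the data layout and helper below are hypothetical, not RBPsuite's code.
PREDICTORS: Dict[str, Tuple[str, int]] = {
    "Linear RNA": ("iDeepS", 154),
    "Circular RNA": ("CRIP", 37),
}


def plan_job(rna_type: str, mode: str, rbp: Optional[str] = None) -> Tuple[str, int]:
    """Return (method, number of RBP models to run) for one submission."""
    if rna_type not in PREDICTORS:
        raise ValueError(f"unknown RNA type: {rna_type!r}")
    method, n_available = PREDICTORS[rna_type]
    if mode == "Specific model":
        if rbp is None:
            raise ValueError("'Specific model' needs a chosen RBP")
        return method, 1  # only the chosen RBP's model is run
    if mode == "General model":
        return method, n_available  # every trained model for this RNA type
    raise ValueError(f"unknown mode: {mode!r}")


print(plan_job("Circular RNA", "Specific model", rbp="AGO2"))  # ('CRIP', 1)
print(plan_job("Linear RNA", "General model"))  # ('iDeepS', 154)
```

The RNA type fixes the method. The model mode only changes how many per-RBP models are applied to the same segments, which is why general-model jobs cost more.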
- When the job is finished, the prediction results appear on the results page.
- If there are verified motifs for the RBP, the motifs on the segments in the result table are marked in red.
- The expected runtimes for predicting the binding sites of a specific RBP on a linear RNA and a circRNA with RBPsuite, for sequences of different lengths, are listed in Table 2.
- For the general model, RBPsuite predicts binding scores of all available RBPs for the segments of the input sequence, as shown in Fig.
- Users can click the RBP of interest to see the predicted RBP binding sites of this RBP on the input sequence (Fig.
- Table 2 The expected runtime of predicting binding sites of a specific RBP on a linear RNA and a circRNA using RBPsuite for sequences with different lengths.
- In the table, the detected motif on the predicted binding site is marked in red.
- Here we use RBPsuite to predict RBP binding sites on full-length RNAs.
- hsa_circ_0054654 is 1821 nt long and has 13 non-overlapping AGO2 binding sites derived from CLIP-seq peaks.
- circ_0054654 sequence into 18 segments, of which 14 are predicted to be AGO2 binding sites at a score cutoff of 0.5, as shown in Fig.
- Of the 14 predicted binding sites, 12 are segments containing verified binding sites, and only one segment with a verified binding site is not detected by RBPsuite (Fig. 4b), where stars mark the verified binding sites of AGO2.
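The segment-level comparison in the case study can be sketched as below. Each verified site is assigned to the 101-nt segment containing it, and segments scoring at or above 0.5 are compared with segments that contain a verified site. Representing a site by a single position (for example its peak summit) is an assumption. The scores and positions in the example are hypothetical, not the hsa_circ_0054654 data.

```python
from typing import Dict, Iterable, List, Set

SEGMENT_LEN = 101


def segments_with_sites(site_positions: Iterable[int], n_segments: int, seg_len: int = SEGMENT_LEN) -> Set[int]:
    """Indices of segments containing at least one verified site (0-based positions)."""
    hit = set()
    for pos in site_positions:
        idx = pos // seg_len
        if idx < n_segments:  # ASSUMPTION: sites in a dropped trailing remainder are ignored
            hit.add(idx)
    return hit


def compare(scores: List[float], site_positions: Iterable[int], cutoff: float = 0.5) -> Dict[str, int]:
    """Count predicted segments, those backed by a verified site, and missed verified segments."""
    predicted = {i for i, s in enumerate(scores) if s >= cutoff}
    verified = segments_with_sites(site_positions, len(scores))
    return {
        "predicted": len(predicted),
        "predicted_and_verified": len(predicted & verified),
        "missed_verified": len(verified - predicted),
    }


# Hypothetical scores for 5 segments and 4 verified site positions
print(compare([0.9, 0.2, 0.7, 0.6, 0.1], [10, 230, 420, 450]))
# {'predicted': 3, 'predicted_and_verified': 2, 'missed_verified': 1}
```

Working at the segment level mirrors the limitation that RBPsuite reports 101-nt segments rather than exact binding positions.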
- In RBPsuite, we use FIMO from the MEME suite to detect verified motifs from the CISBP-RNA database within the segments of the input RNA sequences.
- Another solution is to transfer models from RBPs with similar binding preferences to RBPs with limited verified targets, as done in beRBP [32], which can predict binding sites for any RBP.
- In addition, RBPsuite predicts a 101-nt segment containing the RBP binding site but still cannot locate the exact.
- Fig. 4 The results of RBPsuite for predicting AGO2 binding sites on hsa_circ_0054654.
- b) The score distribution of the 18 segments from hsa_circ_0054654, where stars mark the verified binding sites derived from CLIP-seq read peaks.
- In this study, we implement an online webserver RBPsuite for predicting RBP binding sites on linear and circular RNAs based on deep learning.
- RBPsuite integrates two deep learning algorithms, iDeepS and CRIP, which predict RBP binding sites on linear RNAs and circRNAs, respectively.
- RBPsuite covers the largest number of RBPs for predicting binding sites on linear RNAs, and is the first deep learning-based webserver for this task.
- In addition, RBPsuite detects verified motifs on the segments to provide further evidence supporting the predicted binding segments.
- The prediction performance on the independent test set and a case study both demonstrate the effectiveness of RBPsuite.
- RBPs: RNA-binding proteins.
- RNA-binding proteins in.
- Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP).
- GraphProt: modeling binding preferences of RNA-binding proteins.
- Identifying RNA-binding proteins using multi- label deep learning.
- Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning.
- RNA-protein binding motifs mining with a new hybrid deep learning based cross-domain knowledge integration approach.
- Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks.
- Pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks.
- DeepCLIP: predicting the effect of mutations on protein-RNA binding with deep learning.
- Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.
- CRIP: predicting circRNA-RBP-binding sites using a codon-based encoding and hybrid deep neural networks.
- SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
- A combined sequence and structure based method for discovering enriched motifs in RNA from in vivo binding data.
- Recent methodology progress of deep learning for RNA-protein interaction prediction.
- A compendium of RNA-binding motifs for decoding gene regulation.
- Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins.
- A census of human RNA-binding proteins.
- beRBP: binding estimation for human RNA-binding proteins
