- However, many existing RNA subcellular localization classifiers only solve the single-label classification problem.
- It is of great practical significance to extend RNA subcellular localization to a multi-label classification problem.
- Results: In this study, we extract multi-label classification datasets of RNA-associated subcellular localizations for various types of RNAs, and then construct subcellular localization datasets for four RNA categories.
- To study Homo sapiens, we further establish human RNA subcellular localization datasets.
- The optimal combined kernel can be fed into an integrated support vector machine model to identify multi-label RNA subcellular localizations.
- Keywords: RNA subcellular localization, Multi-label classification, Hilbert-Schmidt independence criterion, Multiple kernel learning, Web server.
- protein subcellular localization [1–6].
- [33] built a database called RNALocate, which collected more than 42,000 manually curated RNA subcellular localization entries.
- [34] constructed a database named LncATLAS to store the subcellular localization of lncRNAs.
- [40] developed lncLocator to predict the subcellular localization of long non-coding RNAs.
- [41] proposed a novel method that uses a sequence-to-sequence model to predict microRNA subcellular localization.
- [42] developed MiRGOFS, a GO-based functional similarity measurement applied to miRNA subcellular localization.
- have been used to predict subcellular localization with good results.
- However, most existing RNA subcellular localization classifiers only solve the single-label classification problem.
- Therefore, it is of great practical significance to extend RNA subcellular localization to a multi-label classification problem.
- In view of the above research, no multi-label RNA subcellular localization dataset is available for this task.
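The abstract says the optimal combined kernel comes from multiple kernel learning guided by the Hilbert-Schmidt independence criterion (HSIC). As a rough illustration only, the sketch below computes the biased empirical HSIC, tr(KHLH)/(n-1)^2, between each base kernel and an ideal label kernel L = YY^T, then weights the kernels in proportion to their HSIC scores. The label kernel and the proportional weighting are assumptions chosen for clarity; the source obtains its optimal combination through its own optimization, which is not reproduced here. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def center(k: Matrix) -> Matrix:
    """Double-center a kernel matrix: returns H K H with H = I - (1/n) * ones."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[k[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(k: Matrix, l: Matrix) -> float:
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, computed as tr((H K H) L)."""
    n = len(k)
    kc = center(k)
    return sum(kc[i][j] * l[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2


def label_kernel(y: List[List[int]]) -> Matrix:
    """Ideal target kernel L = Y Y^T built from a binary label-indicator matrix Y."""
    return [[float(sum(a * b for a, b in zip(yi, yj))) for yj in y] for yi in y]


def hsic_weights(kernels: List[Matrix], y: List[List[int]]) -> List[float]:
    """Assumed heuristic (not the paper's optimizer): weights proportional to HSIC."""
    l = label_kernel(y)
    scores = [max(hsic(k, l), 0.0) for k in kernels]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]


def combine(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of base kernels: K* = sum_m w_m K_m."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    y = [[1, 0], [1, 0], [0, 1], [0, 1]]          # four RNAs, two locations
    k_aligned = label_kernel(y)                   # kernel that mirrors the labels
    k_noise = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    w = hsic_weights([k_aligned, k_noise], y)
    print(w)                                      # aligned kernel gets the larger weight
    print(combine([k_aligned, k_noise], w)[0])
```

The resulting combined kernel is the kind of precomputed kernel matrix that could then be handed to a kernel SVM; a label-aligned kernel earns a larger weight than an uninformative identity kernel.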
- The optimal combined kernel can be fed into an integrated support vector machine model to train a multi-label RNA subcellular localization classifier.
- (3) to address a major challenge, we fuse the multivariate information through multiple kernel learning based on the Hilbert-Schmidt independence criterion, and feed the optimal combined kernel into an integrated support vector machine model to train a multi-label RNA subcellular localization classifier;
- Here, we compare single-kernel feature models on the four RNA subcellular localization datasets, as shown in Table 1.
- kmer achieves the best performance on mRNAs (AP: 0.688) and lncRNAs (AP: 0.745), NAC obtains the best performance on miRNAs (AP: 0.785), and DNC gains the best performance on snoRNAs (AP: 0.793).
- Details are shown in Additional file 1: Table S5.
- We also compare single-kernel feature models on the four human RNA subcellular localization datasets, as shown in Table 2.
- kmer achieves the best performance on mRNAs (AP: 0.750), lncRNAs (AP: 0.753), and snoRNAs (AP: 0.817), while CKSNAP obtains the best performance on miRNAs (AP: 0.784).
- Details are shown in Additional file 1.
- This phenomenon is also reflected in the four human RNA datasets, as shown in Fig.
- Table 1 Average Precision of seven different nucleotide representations on four RNA datasets.
- Table 2 Average Precision of seven different nucleotide representations on four human RNA datasets.
- Here, we compare five integrated SVM strategies on the four RNA subcellular localization datasets, as shown in Table 3.
- MKSVM-HSIC achieves the best performance on mRNAs (AP: 0.703), lncRNAs (AP: 0.757), miRNAs (AP: 0.787), and snoRNAs (AP: 0.800).
- Details are shown in Additional file 1: Table S7.
- We also compare five integrated SVM strategies on the four human RNA subcellular localization datasets, as shown in Table 4.
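The comparisons above rank nucleotide representations such as kmer, NAC, DNC and CKSNAP. The sketch below shows one common way to compute them: NAC, DNC and TNC as normalized frequencies of 1-, 2- and 3-tuples, a "kmer" vector that concatenates k = 1..3, and CKSNAP as the composition of nucleotide pairs separated by 0..max_gap positions. Mapping U to T, the choice of max_k = 3 and max_gap = 2, and all function names are assumptions for illustration; the paper's exact parameter settings are not given in this section.

```python
from itertools import product
from typing import Dict, List

ALPHABET = "ACGT"  # assumption: U is mapped to T so RNA and cDNA share one alphabet


def normalize(seq: str) -> str:
    """Upper-case, map U -> T and drop symbols outside ACGT (assumed preprocessing)."""
    return "".join(c for c in seq.upper().replace("U", "T") if c in ALPHABET)


def kmer_freq(seq: str, k: int) -> List[float]:
    """Frequency of every k-tuple in lexicographic order: k=1 NAC, k=2 DNC, k=3 TNC."""
    s = normalize(seq)
    keys = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts: Dict[str, int] = dict.fromkeys(keys, 0)
    total = len(s) - k + 1
    for i in range(max(total, 0)):
        counts[s[i:i + k]] += 1
    return [counts[key] / total if total > 0 else 0.0 for key in keys]


def kmer(seq: str, max_k: int = 3) -> List[float]:
    """Hypothetical 'kmer' descriptor: concatenated frequencies for k = 1..max_k."""
    out: List[float] = []
    for k in range(1, max_k + 1):
        out.extend(kmer_freq(seq, k))
    return out


def cksnap(seq: str, max_gap: int = 2) -> List[float]:
    """Composition of k-spaced nucleic acid pairs for gaps 0..max_gap (16 values each)."""
    s = normalize(seq)
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    out: List[float] = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = len(s) - gap - 1          # number of (s_i, s_{i+gap+1}) pairs
        for i in range(max(total, 0)):
            counts[s[i] + s[i + gap + 1]] += 1
        out.extend(counts[p] / total if total > 0 else 0.0 for p in pairs)
    return out


if __name__ == "__main__":
    rna = "ACGUACGGUUAC"
    print(kmer_freq(rna, 1))              # NAC: A, C, G, T(U) frequencies
    print(len(kmer(rna)), len(cksnap(rna)))
```

Each descriptor yields a fixed-length vector regardless of sequence length, which is what allows one base kernel per representation.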
- MKSVM-HSIC achieves the best performance on mRNAs (AP: 0.755), lncRNAs (AP: 0.754), miRNAs (AP: 0.791), and snoRNAs (AP: 0.816).
- Details are shown in Additional file 1: Table S8.
- Details are shown in Additional file 1: Table S9.
- Fig. 1 Feature importance scores of seven characteristics on four RNA datasets.
- Fig. 2 Feature importance scores of seven characteristics on four human RNA datasets.
- Here, we compare six classification methods on the four RNA subcellular localization datasets, as shown in Table 5.
- MKSVM-HSIC achieves the best performance on mRNAs (AP: 0.703), lncRNAs (AP: 0.757) and miRNAs (AP: 0.787), while XGBT obtains the best performance on snoRNAs (AP: 0.806).
- Details are shown in Additional file 1: Table S10.
- We also compare six classification methods on the four human RNA subcellular localization datasets, as shown in Table 6.
- MKSVM-HSIC achieves the best performance on mRNAs (AP: 0.755), lncRNAs (AP: 0.754), miRNAs (AP: 0.791), and snoRNAs (AP: 0.816).
- Details are shown in Additional file 1: Table S11.
- Details are shown in Additional file 1: Table S12.
- Table 3 Average Precision of five different integration strategies on four RNA datasets.
- Table 4 Average Precision of five different integration strategies on four human RNA datasets.
- It returns the probability of each label for RNA subcellular localization, and also gives the suggested labels as the final prediction result.
- In this paper, we establish multi-label benchmark datasets for various RNA subcellular localizations to verify prediction tools.
- Furthermore, we design an integrated SVM prediction model with a one-vs-rest strategy that fuses a variety of nucleic acid sequence features to identify RNA subcellular localization.
- In this study, we establish RNA subcellular localization datasets, and then propose an integrated learning model for multi-label classification.
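Every comparison above is reported as Average Precision (AP). This section does not spell out the formula, so the sketch below assumes the standard multi-label ranking definition: for each sample, average over its relevant labels y the fraction of labels ranked at or above y that are also relevant, then average over samples. Counting ties pessimistically (rank = number of labels scoring at least as high) and skipping samples with no relevant label are assumptions; the function name is hypothetical.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[Sequence[float]],
                      labels: Sequence[Sequence[int]]) -> float:
    """Multi-label ranking Average Precision over a set of samples."""
    per_sample: List[float] = []
    for f, y in zip(scores, labels):
        relevant = [j for j, v in enumerate(y) if v]
        if not relevant:
            continue  # assumption: samples without any relevant label are skipped
        total = 0.0
        for j in relevant:
            rank = sum(1 for s in f if s >= f[j])               # position of label j
            hits = sum(1 for r in relevant if f[r] >= f[j])     # relevant labels at or above j
            total += hits / rank
        per_sample.append(total / len(relevant))
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


if __name__ == "__main__":
    # Per-label scores (e.g. one-vs-rest probabilities) for three locations.
    scores = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.5]]
    labels = [[1, 0, 1], [1, 0, 0]]
    print(average_precision(scores, labels))
```

In the example the first RNA has both true locations ranked on top (AP 1.0), while the second has its only true location ranked last (AP 1/3), giving a mean of 2/3.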
- To study subcellular localization in Homo sapiens, we further establish human RNA subcellular localization datasets.
- We use the RNA subcellular localization database to integrate, analyze and identify RNA subcellular localizations, thereby speeding up structural and functional research on RNAs.
- Table 5 Average Precision of five different classifiers on four RNA datasets.
- Thus, RNALocate provides a comprehensive source of subcellular localization information and even insight into the function of hypothetical or novel RNAs.
- We extract multi-label classification datasets of RNA-associated subcellular localizations for four RNA categories (mRNAs, lncRNAs, miRNAs and snoRNAs).
- The flowchart of the mRNA subcellular localization dataset construction framework is shown in Fig.
- RNA subcellular localization datasets.
- We extract four RNA subcellular localization datasets: mRNAs, lncRNAs, miRNAs and snoRNAs.
- We delete samples with duplicate Gene IDs and remove samples without corresponding subcellular localization labels, and then construct the four RNA subcellular localization datasets.
- We count the number of samples in each subcellular localization label category, and then select some.
- The statistical distributions of these four RNA datasets are shown in Fig.
- Human RNA subcellular localization datasets.
- We also extract four Homo sapiens RNA subcellular localization datasets: H_mRNAs, H_lncRNAs, H_miRNA and H_snoRNAs.
- We select the Homo sapiens samples from the above four RNA datasets and construct four human RNA subcellular localization datasets.
- The statistical distributions of these four human RNA datasets are shown in Fig.
- An RNA sequence can be represented as $S = (s_1, s_2, \ldots, s_L)$, where $s_i$ is the $i$-th nucleotide and $L$ is the sequence length.
- Table 6 Average Precision of five different classifiers on four human RNA datasets.
- Fig. 4 The robustness of our novel method on four RNA datasets.
- Fig. 5 The robustness of our novel method on four human RNA datasets.
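The dataset construction steps above (resolve duplicate Gene IDs, drop samples without localization labels, count samples per label and keep a subset, optionally restrict to Homo sapiens) can be sketched as follows. The record fields `gene_id`, `species` and `localization`, the `min_count` threshold and the function name are hypothetical. Merging all localizations that share a Gene ID into one multi-label sample is an assumption, one plausible reading of "delete samples with duplicate Gene ID" for a multi-label dataset; the source does not state its exact rule or label-selection criterion.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_dataset(
    records: Iterable[Dict[str, str]],
    species: Optional[str] = None,
    min_count: int = 1,
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (gene_ids, label_names, binary label-indicator matrix)."""
    merged: Dict[str, Set[str]] = {}  # insertion-ordered: one entry per Gene ID
    for rec in records:
        if species is not None and rec.get("species") != species:
            continue  # e.g. species="Homo sapiens" for the human datasets
        loc = (rec.get("localization") or "").strip()
        labels = merged.setdefault(rec["gene_id"], set())
        if loc:
            labels.add(loc)
    # Remove samples that ended up with no subcellular localization label.
    samples = {g: labs for g, labs in merged.items() if labs}
    # Count samples per label and keep labels with at least min_count samples.
    counts = Counter(lab for labs in samples.values() for lab in labs)
    keep = sorted(lab for lab, c in counts.items() if c >= min_count)
    gene_ids: List[str] = []
    matrix: List[List[int]] = []
    for g, labs in samples.items():
        row = [1 if lab in labs else 0 for lab in keep]
        if any(row):  # drop samples whose labels were all filtered out
            gene_ids.append(g)
            matrix.append(row)
    return gene_ids, keep, matrix


if __name__ == "__main__":
    records = [
        {"gene_id": "G1", "species": "Homo sapiens", "localization": "Nucleus"},
        {"gene_id": "G1", "species": "Homo sapiens", "localization": "Cytoplasm"},
        {"gene_id": "G2", "species": "Mus musculus", "localization": "Nucleus"},
        {"gene_id": "G3", "species": "Homo sapiens", "localization": ""},
    ]
    print(build_dataset(records))
    print(build_dataset(records, species="Homo sapiens"))
```

The binary indicator matrix is the multi-label target format that a one-vs-rest model consumes: one column, and hence one binary classifier, per localization.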
- Fig. 6 Schematic diagram of RNA subcellular localizations in cells.
- Fig. 7 The flowchart of the mRNA subcellular localization dataset construction framework.
- Fig. 8 The statistical distributions of four RNA subcellular localization datasets.
- k = 2) descriptor can be calculated as follows.
- Fig. 9 The statistical distributions of four human RNA subcellular localization datasets.
- Nucleic acid composition.
- The frequency of each natural nucleic acid (‘A’, ‘C’, ‘G’, ‘T’ or ‘U’) can be calculated as follows.
- The frequency of each 2-tuple of natural nucleic acids can be calculated as follows.
- The frequency of each 3-tuple of natural nucleic acids can be calculated as follows.
- The optimal combinatorial kernel can be calculated as follows.
- The convex quadratic programming problem can be solved as follows.
- NAC: nucleic acid composition.
- Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization.
- Methodology development for predicting subcellular localization and other attributes of proteins.
- A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0.
- Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou’s general PseAAC.
- pLoc-mAnimal: predict subcellular localization of animal proteins with both single and multiple sites.
- pLoc_bal-mGpos: predict subcellular localization of Gram-positive bacterial proteins by quasi-balancing training dataset and PseAAC.
- RNALocate: a resource for RNA subcellular localizations.
- LncATLAS database for subcellular localization of long noncoding RNAs.
- The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier.
- https://doi.org/
- Prediction of microRNA subcellular localization by using a sequence-to-sequence model.
- MiRGOFS: a GO-based functional similarity measurement for miRNAs, with applications to the prediction of miRNA subcellular localization and miRNA–disease association.
- pLoc_Deep-mHum: predict subcellular localization of human proteins by deep learning.
- pLoc_Deep-mPlant: predict subcellular localization of plant proteins by deep learning.
- pLoc_Deep-mVirus: a CNN model for predicting subcellular localization of virus proteins by deep learning.
- Incorporating organelle correlations into semi-supervised learning for protein subcellular localization prediction.
- Human protein subcellular localization identification via fuzzy model on kernelized neighborhood.
- Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou’s general PseAAC.
- https://doi.org