
Globally learning gene regulatory networks based on hidden atomic regulators from transcriptomic big data



- Gene regulatory networks (GRNs) play fundamental and central roles in response to endogenous or exogenous stimuli for maintaining the viability and plasticity of cells [1, 2].
- The former attempts to model expression patterns of genes including TFs by parameterizing the topology of the GRNs with various methods [9, 10], such as probabilistic graphical models, ODEs and Petri Nets, while the latter treats each pair or subset of regulators and target genes locally and then assembles them into a complete network [11].
- A main disadvantage of the PTP methods, however, is the expensive computational cost incurred by the heuristic or greedy search for network parameters in an extremely large space.
- For example, Gaussian graphical models need to estimate a partial correlation matrix whose size is at least the square of the number of genes [12, 13].
- Friedman et al. [10] first introduced Bayesian networks to reconstruct S.
- Recently, Siahpirani et al.
- For example, the ARACNE method, proposed by Margolin et al.
- To overcome the over-sensitivity, Meyer et al.
- [22] introduced the maximum relevance/minimum redundancy filter for refinement, and Liu et al.
- To relax the constringency, Zhang et al.
- They mainly rely on regression models, instead of the similarity measures described above, and can favourably bypass the challenging optimal selection problem of conditional genes in conditional correlation models like CMI.
- In each of the regression problems, a tree-based ensemble model, Random Forests or Extra-Trees [31], is applied to calculate a local ranking of genes, and the resulting p local rankings are finally aggregated to reach a global ranking of all gene pairs.
- We here develop a DL-based GRN inference framework (dlGRN), which learns a sparse representation of the gene regulatory system via a modified DL algorithm and then makes a global inference of the regulators for a target gene based on the sparse representation, independent of known or observed regulators.
- We argue that this is the first approach to reverse-engineer GRNs in a truly global manner with the help of a sparse representation of the regulatory system.
- We demonstrated the effectiveness and efficiency of the proposed method on synthetic data and real-world data about two model organisms and human lung cancer.
- Shi et al.
- Given a pair of a TF tf and a target gene (TG) g, dlGRN then estimates the Pearson correlations (PCs) between tf and the resulting ARs associated with g and calculates a confidence score (cs) for the regulation of tf on g via the inverse function of the cumulative distribution of PCs, as shown in Fig.
- The confidence score is meaningful in systems biology and will be robust due to the globalization of ARs.
- Evaluation of the performance of uncovering hidden ARs. When applying sfk-svd to Simulation data I, we observed that root mean squared errors (RMSEs) gradually decreased and converged within ~200 iterations in all the data scenarios (Figs.
- S1-S5 in S1 Notes), irrespective of the values of l and 150}, suggesting the convergence of the algorithm.
- We observed that the parameter l had a substantial impact on the power: l = 50, i.e., the real number of regulators, always led to the highest RRs and PPVs, while l >.
- Figure 2d-e visualizes the changes of the average RRs and average PPVs over different sample sizes with SNR, showing a trend that the power increases as noise reduces, especially when l is large.
- Evaluation of the performance of dlGRN in predicting gene regulations.
- Results reveal that on Simulation data I, dlGRN achieved higher average AUROCs and AUPRs than four state-of-the-art methods, GENIE3 [30], CLR [35], ARACNe-AP [11] and ARACNE [21], in all the scenarios of sample sizes and noise levels, as shown in Table 1 (and Table S1 in Supplemental material SII Notes).
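To make the two evaluation metrics concrete, here is a minimal pure-Python sketch of how AUROC and AUPR could be computed for a ranked list of predicted regulations. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names are hypothetical, AUPR is approximated by average precision (one common estimator), and the input is assumed to contain at least one true and one false regulation.

```python
def auroc(scores, labels):
    """AUROC as the probability that a true edge outscores a false edge.

    Ties between a positive and a negative count as half a win
    (Mann-Whitney formulation). Assumes both classes are present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Average precision, used here as an assumed stand-in for AUPR.

    Precision is averaged over the ranks at which true edges appear.
    Tied scores are kept in input order, so ties can affect the value.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example: four candidate TF-target pairs, two of them true regulations.
cs = [0.9, 0.8, 0.3, 0.1]      # confidence scores from some method
truth = [1, 0, 1, 0]           # 1 = known regulation, 0 = no known regulation
roc = auroc(cs, truth)
ap = average_precision(cs, truth)
```

In this toy ranking one true regulation is ranked below a false one, so both metrics fall short of 1; a perfect ranking would give 1 for both.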
- We found that the optimal values of l are always around the number of real regulators [36], which is consistent with the pattern of the power of recovering hidden regulatory signals in simulation experiments (Fig.
- Fig. 2 Evaluation of the signal recovery power of dlGRN on Simulation data I.
- This should be related to the increased nonlinear complexity in Simulation data II.
- cerevisiae data set and all the three lung cancer data sets, dlGRN still achieved higher AUROCs and AUPRs than those of the four previous methods and competitive results for the E.
- For each of the three lung cancer data sets, we further sorted the predicted regulations in a decreasing order of cs and counted the numbers of true positives in the top num and 200 for each method, finding that dlGRN called most true positives on all the three data sets and most common true positives, regardless of num, as shown in Fig.
- Taken together, these results suggest the superior power of dlGRN in recovering regulations over state-of-the-art methods.
- Following the known 2677 TF-target regulations, we then selected 2677 TF-target pairs with top cs by each method and built GRNs for LUAD on each of the three lung cancer data sets (Fig.
- The ln-transformed distributions of the degree of nodes in the.
- Furthermore, Venn diagrams of the three sets of 2677 links for different methods (Fig.
- For EGR2, Kim et al.
- Li et al.
- Furthermore, Sun et al.
- Due to the transitive effect of correlations, current methods often fail to infer CRSs fully correctly.
- The background 2677-link networks of the lung cancer data contain 6678 CRSs in total, against which we investigated how dlGRN distinguishes direct and indirect regulations.
- Figure 4g-i compares the numbers of the five patterns detected by dlGRN and the four previous methods on the three data sets, showing that dlGRN completely.
- ARACNe missed most direct regulations (P5) on almost all the three data sets, which may be related to the over-trimming of links by DPI.
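The over-trimming effect can be illustrated with a small sketch of the data processing inequality (DPI) filter as it is commonly described for ARACNE: in every fully connected gene triplet, the edge with the lowest mutual information (MI) is treated as indirect and removed. This is a simplified illustration, not ARACNe's actual implementation; the function name `dpi_prune`, the `tolerance` parameter and the toy MI values are assumptions.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected gene triplet.

    mi: dict mapping frozenset({gene_a, gene_b}) -> mutual information.
    An edge is removed when its MI is strictly below the MI of both other
    edges in some triplet, scaled by (1 - tolerance). All triplets are
    checked against the original MI values, so the result does not depend
    on the order in which triplets are visited.
    """
    genes = sorted({g for pair in mi for g in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(e in mi for e in edges):
            continue  # DPI only applies to closed triangles
        weakest = min(edges, key=lambda e: mi[e])
        others = [e for e in edges if e != weakest]
        if all(mi[weakest] < mi[e] * (1.0 - tolerance) for e in others):
            to_remove.add(weakest)
    return {e: v for e, v in mi.items() if e not in to_remove}


# Toy feed-forward loop: TF A regulates B, B regulates C, and A also
# regulates C directly but with a weaker statistical signal.
toy_mi = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.3,  # genuine direct edge in this toy setup
}
kept = dpi_prune(toy_mi)
```

Here the direct A-C edge is discarded because it is the weakest side of the triangle, which is the kind of loss of direct regulations that the text attributes to DPI.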
- These results suggest that dlGRN is inherently able to distinguish direct from indirect regulations owing to its global modeling.
- We averaged the resulting cs over the three data sets for each of the 55 known TFs (Supplemental material SIII Notes),
- To experimentally verify the predictions, we searched for the transcription factor binding sites (TFBSs) of the two TFs to the promoter of EGFR using the online JASPAR tool (http://jaspar.genereg..
- Fig. 4 Topological analysis of the reconstructed 2677-link GRNs by dlGRN and the four previous methods.
- Node sizes are proportional to the connectivity.
- γ: slope of the fitted power-law curve.
- d-f Distributions of the log-transformed degrees of nodes in the GRNs on data sets, GSE32863 (d), GSE10072 (e) and GSE7670 (f).
- For the lung cancer data set, GSE32863, Selamat et al.
- [50] monitored the DNA methylation profiles of the 116 samples at the same time.
- Many of the inferred methylation regulations have been previously.
- The use of ARs guarantees the globalization of the regulation inference.
- Experiments on simulation and real data sets show that dlGRN outperforms state-of-the-art methods with higher AUROCs and higher AUPRs in GRN reconstruction.
- Previous methods such as similarity criteria often call plenty of spurious direct regulations due to the transitive effect of correlations.
- We experimentally verified a novel predicted regulation, i.e., the regulation of TF TFAP2C on a hot oncogene EGFR, in lung cancer cell A549 and.
- We also notice that TFs preferentially bind to a certain target sequence, and searching for that sequence or similar patterns in the regulatory regions of the target genes may help improve dlGRN.
- X ∈ R^{l×p} represents the sparse regulation coefficient matrix of the l ARs on target genes, of which element x_ij represents the regulation effect of the i-th AR on the j-th target gene.
- The learned AR dictionary reflects a surrogate of the regulatory mechanisms underlying Y.
- (2) where x_i is the i-th column of X and t_i is a prior positive constant, referred to as the scale-free sparsity parameter, that specifies the upper bound on the number of ARs for the i-th target gene.
- The optimization (2) guarantees the sparsity of the resulting GRNs and keeps the network structure under control.
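For orientation, the constrained problem that these definitions point to can be written in the standard k-SVD form. This is a hedged reconstruction rather than the paper's equation verbatim: Y, X, x_i, t_i and l come from the text, while the dictionary symbol D (the l ARs stored as columns), the Frobenius-norm loss, the sample count n and the index range 1, ..., p are assumptions chosen to match X ∈ R^{l×p}.

```latex
\min_{D,\,X}\ \lVert Y - D X \rVert_F^{2}
\quad \text{subject to} \quad
\lVert x_i \rVert_0 \le t_i, \qquad i = 1, \dots, p,
\qquad D \in \mathbb{R}^{n \times l},\ X \in \mathbb{R}^{l \times p}.
```

Under this reading, each target gene's expression profile (a column of Y) is approximated by a combination of at most t_i AR profiles, which is what bounds the in-degree of that gene in the inferred network.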
- Compared with the methods mentioned above, k-SVD-like dictionary learning methods are promising, because they impose hardly any statistical properties on the atomic regulators to be mined except the sparseness of the inferred network structure, which is consistent with the real-world GRN structure.
- where Γ^{-1} represents the inverse function of the cumulative distribution of |pcc| and 0 ≤ α ≤ 1 is a quantile cutoff (α = 0.9 by default).
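The confidence score described here can be sketched in a few lines of pure Python. This is one possible reading of the text, not the authors' code: it takes the absolute Pearson correlations between a TF's expression profile and the ARs associated with a target gene, and returns their empirical α-quantile as cs. The function names, the linear-interpolation quantile estimator and the toy profiles are all assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def quantile(values, alpha):
    """Empirical alpha-quantile with linear interpolation (assumed estimator)."""
    s = sorted(values)
    pos = alpha * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def confidence_score(tf_expr, ar_signals, alpha=0.9):
    """cs for a (TF, target gene) pair.

    tf_expr: expression profile of the TF across samples.
    ar_signals: profiles of the ARs associated with the target gene.
    Returns the alpha-quantile of |pcc|, i.e. the inverse empirical CDF
    evaluated at alpha.
    """
    abs_pcc = [abs(pearson(tf_expr, ar)) for ar in ar_signals]
    return quantile(abs_pcc, alpha)


# Toy data: one TF profile and three hypothetical AR profiles for a target gene.
tf_profile = [1.0, 2.0, 3.0, 4.0]
ars_for_gene = [
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with the TF
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
    [1.0, 3.0, 2.0, 4.0],   # partially correlated
]
cs = confidence_score(tf_profile, ars_for_gene, alpha=0.9)
```

Using a high quantile rather than the maximum makes the score depend on more than a single AR; with α = 1 the sketch reduces to the largest absolute correlation.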
- The pseudo code of the proposed GRN inference approach dlGRN can be listed below:
- In the proposed GGRM, the parameter l represents the number of atomic regulators (ARs) and should approximate the number of real-world regulators, including TFs, microRNAs and DNA methylation.
- Theoretically speaking, the value of l needs to be estimated based on the biological priors of the organism from which the transcriptomic data were collected.
- The parameter t_i is a small positive constant to constrain the maximum l0-norm of the i-th regulatory coefficient vector.
- Two simulation data sets.
- Simulation data II were downloaded from the DREAM5 project (http://www.the-dream-project.org.
- The data sets consist of the expression profiles of 1548 target genes and 195 TFs in 805 samples.
- Five real data sets.
- cerevisiae, which consist of the expression profiles of 4511 target genes and 334 TFs in 805 samples and the expression profiles of 5950 target genes and 333 TFs in 536 samples, respectively.
- Statistical comparison of the two groups, each with triplicates, was conducted using Student’s t-test.
- cerevisiae) are available from http://www.the-dream-project.org/.
- None of the authors have potential financial or ethical conflicts of interest with the contents of this submission.
- Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, et al.
- Architecture of the human regulatory network derived from ENCODE data.
- Yang AP, Liu LG, Chen MM, Liu F, You H, Liu L, et al.
- Duan Y, Tan Z, Yang M, Li J, Liu C, Wang C, et al.
- Marbach D, Costello JC, Kuffner R, Vega NM, Prill RJ, Camacho DM, et al.
- Belliveau NM, Barnes SL, Ireland WT, Jones DL, Sweredoski MJ, Moradian A, et al.
- Gendelman R, Xing H, Mirzoeva OK, Sarde P, Curtis C, Feiler HS, et al.
- Luo Y, Mao C, Yang Y, Wang F, Ahmad FS, Arnett D, et al.
- Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, et al.
- Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, et al.
- Gao Y, Yurkovich JT, Seo SW, Kabimoldayev I, Dräger A, Chen K, et al.
- Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, et al.
- Iorio MV, Ferracin M, Liu CG, Veronese A, Spizzo R, Sabbioni S, et al.
- Generalizations of the clustering coefficient to weighted complex networks.
- Qi L, Saberi M, Zmuda E, Wang Y, Altarejos J, Zhang X, et al.
- Kim H-J, Hong JM, Yoon K-A, Kim N, Cho D-W, Choi J-Y, et al.
- Nishimori H, Sasaki Y, Yoshida K, Irifune H, Zembutsu H, Tanaka T, et al.
- Shi Q, Zhong YS, Ren Z, Li QL, Zhou PH, Xu MD, et al.
- Analysis of the role of the BMP7-Smad4-Id2 signaling pathway in SW480 colorectal carcinoma cells.
- Li HS, Yang CY, Nallaparaju KC, Zhang H, Liu Y-J, Goldrath AW, et al.
- De Andrade JP, Park JM, Gu VW, Woodfield GW, Kulak MV, Lorenzen AW, et al.
- Selamat SA, Chung BS, Girard L, Zhang W, Zhang Y, Campan M, et al.
- Cheng N, Li M, Zhao L, Zhang B, Yang Y, Zheng CH, et al.
- Gama-Castro S, Salgado H, Peralta-Gil M, Santos-Zavaleta A, Muñiz-Rascado L, Solano-Lira H, et al.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, et al.
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu Y, et al.
