PLAIDOH: A novel method for functional prediction of long non-coding RNAs identifies cancer-specific LncRNA activities



- 2019, Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
- 1c). The novel spliced lncRNAs exhibited a larger difference in mean fold change and a higher average expression level (Fig.
- Overview of PLAIDOH pipeline and modular algorithms. An overview of the PLAIDOH pipeline and selected example output graphs are shown in Fig.
- Consistent with these and other reports, we observed an increase in the average expression level of lncRNAs located close to FANTOM enhancers, for both positively and negatively correlated LCPs (Fig.
- LncRNA Transcript Cis-regulatory Score.
- Fig. 2 An overview of the PLAIDOH pipeline and algorithm output.
- a Schematic of the single input file required by PLAIDOH to identify all possible lncRNA/Coding gene Pairs (LCPs) in the user’s dataset.
- b Overview of the datasets that are used by PLAIDOH to annotate lncRNAs and predict activity based on genomic and epigenomic context.
- c Abridged example of the primary PLAIDOH output table, showing the three scores PLAIDOH calculates for each LCP as well as 30+ additional columns of information about the lncRNA and coding gene in each LCP.
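
The pairing step behind the input schematic can be sketched as follows. This is a minimal illustration, not PLAIDOH.pl's code: it assumes each feature is a (chromosome, start, end) interval, uses the 400 kb flank mentioned elsewhere in the paper, and pairs a coding gene with a lncRNA if any part of the gene overlaps the flanked lncRNA interval. The `Feature` class and `candidate_lcps` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal record for a lncRNA transcript or coding gene."""
    name: str
    chrom: str
    start: int  # both inputs must use the same coordinate convention
    end: int


def candidate_lcps(lncrnas: List[Feature], coding_genes: List[Feature],
                   flank: int = 400_000) -> List[Tuple[str, str]]:
    """Return (lncRNA, coding gene) name pairs within `flank` bp of each other.

    Assumption: a gene is paired if any part of it overlaps the window
    [lncRNA start - flank, lncRNA end + flank] on the same chromosome.
    """
    pairs = []
    for lnc in lncrnas:
        window_start = max(0, lnc.start - flank)
        window_end = lnc.end + flank
        for gene in coding_genes:
            same_chrom = gene.chrom == lnc.chrom
            overlaps = gene.start <= window_end and gene.end >= window_start
            if same_chrom and overlaps:
                pairs.append((lnc.name, gene.name))
    return pairs


# Hypothetical coordinates for illustration only.
lnc = Feature("LNC1", "chr1", 1_000_000, 1_010_000)
genes = [
    Feature("GENE_A", "chr1", 1_300_000, 1_320_000),  # inside the right flank
    Feature("GENE_B", "chr1", 2_000_000, 2_100_000),  # too far away
    Feature("GENE_C", "chr2", 1_300_000, 1_320_000),  # different chromosome
    Feature("GENE_D", "chr1", 500_000, 650_000),      # overlaps the left flank
]
print(candidate_lcps([lnc], genes))  # [('LNC1', 'GENE_A'), ('LNC1', 'GENE_D')]
```

The nested loop keeps the sketch readable; a genome-scale run would normally sort genes by position or use an interval index instead.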
- LCPs with positive Spearman correlation coefficients (rho) are plotted in the upper half of each plot; those with negative Spearman correlation coefficients (rho) are plotted in the lower half.
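
Because the plots split LCPs by the sign of rho, the correlation itself is the key quantity. The sketch below computes Spearman's rho in plain Python as the Pearson correlation of average ranks (tied values share the mean rank) and assigns each LCP to a half of the plot. The function names and the "center" label for rho = 0 are illustrative assumptions; in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan  # constant expression: correlation undefined
    return cov / math.sqrt(var_x * var_y)


def plot_half(rho: float) -> str:
    """Positive rho goes in the upper half, negative rho in the lower half."""
    if rho > 0:
        return "upper"
    if rho < 0:
        return "lower"
    return "center"  # rho == 0 or NaN (hypothetical handling)


# Hypothetical expression values across five samples.
lnc_expr = [0.5, 1.2, 3.4, 2.0, 5.1]
gene_expr = [10, 14, 30, 22, 41]
rho = spearman_rho(lnc_expr, gene_expr)
print(rho, plot_half(rho))  # 1.0 upper
```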
- ChIA-PET interaction scores [49, 52] reflect the relative frequency of the interaction between two genomic fragments bound by an immunoprecipitated protein.
- To modulate transcription of another gene, a lncRNA must be located in the nucleus.
- Next, we sought to identify individual lncRNAs that are significantly more correlated with a single coding gene than with all other coding genes in the 400 kb flanking regions.
- We identified such “outliers” by calculating the Z-score of the Spearman correlation of each LCP for a given lncRNA and selecting the LCPs whose correlation deviates most from the rest of that lncRNA’s LCPs.
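
A minimal sketch of the outlier step, assuming the Z-score is taken over all LCPs of one lncRNA using the population standard deviation, and that an LCP counts as an outlier when |Z| reaches a cutoff. The cutoff of 1.5 and the gene names are hypothetical placeholders, not values from the paper.

```python
from statistics import mean, pstdev
from typing import Dict, List


def lcp_zscores(rho_by_gene: Dict[str, float]) -> Dict[str, float]:
    """Z-score of each LCP's Spearman rho relative to all LCPs of one lncRNA."""
    values = list(rho_by_gene.values())
    mu = mean(values)
    sd = pstdev(values)  # population SD; the paper's exact choice is not stated here
    if sd == 0:
        return {gene: 0.0 for gene in rho_by_gene}
    return {gene: (rho - mu) / sd for gene, rho in rho_by_gene.items()}


def outlier_genes(rho_by_gene: Dict[str, float], z_cutoff: float = 1.5) -> List[str]:
    """Coding genes whose correlation deviates most from the lncRNA's other LCPs."""
    z = lcp_zscores(rho_by_gene)
    return [gene for gene, score in z.items() if abs(score) >= z_cutoff]


# Hypothetical rho values for one lncRNA and the coding genes in its window.
rhos = {"GENE_A": 0.10, "GENE_B": 0.05, "GENE_C": 0.90, "GENE_D": 0.00}
print(outlier_genes(rhos))  # ['GENE_C']
```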
- involved in the export of mRNA and small RNA molecules from the nucleus [58, 59].
- a Contour plot shows the frequency of each number of significant LCPs as a function of the number of all possible coding gene pairs for each lncRNA.
- b Genomic maps of the two LCPs shown in a.
- positively correlated LCPs are plotted in the left panel and negatively correlated LCPs are in the right panel.
- d Genomic maps of the LCPs shown in c.
- The cancer-specific pattern is exemplified in the STAG1 locus, in which three of four genes within the 800 kb window are significantly correlated with AC096992.2 in AML (Fig.
- STAG1 is a ubiquitously expressed component of the cohesin complex that stabilizes topologically associating domain (TAD) boundaries with CTCF [62].
- This effect was not due to generally higher expression of all of the neighboring genes in AML compared to the other cancer types (Additional file 6: Figure S5C), and levels of nearby active and enhancer-associated histone marks H3K27ac and H3K4me1 were not consistently higher in AML compared to the other tumors (Additional file 6:.
- These results suggest that lncRNA AC096992.2 itself may play a cis-regulatory role in the transcription of multiple neighboring genes.
- The cancer-recurrent pattern for an LCP is exemplified in the EVI2/ADAP2 17q11.2 locus (Fig.
- Of the 9 coding genes and one other lncRNA in this locus, expression of AC138207.5 is significantly correlated with ADAP2 in all five cancer types, and with EVI2A/B in most of the five cancer types (Fig.
- 5h), may play a cis-regulatory role in the transcription of multiple neighboring genes in diverse cancer types.
- This score is calculated from the adjusted p-value of the Spearman LCP correlation, the level of overlapping promoter-associated H3K4me3 activity, and the fraction of lncRNA transcript localized in the nucleus (see Methods).
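
The score draws on the adjusted p-value of each Spearman LCP correlation, but this excerpt does not name the multiple-testing correction. Assuming a Benjamini-Hochberg false discovery rate adjustment (a common choice, not confirmed here), the adjustment step can be sketched as follows; the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw Spearman p-values for four LCPs.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
# approximately [0.04, 0.0533, 0.0533, 0.5]
```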
- Ranking the LCPs from least to greatest for each score revealed an inflection point in the distribution of scores.
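
The excerpt does not say how the inflection point was located. One simple heuristic, assumed here purely for illustration and similar in spirit to knee-finding methods such as Kneedle, rescales rank and score to [0, 1] and takes the ranked point farthest from the straight line joining the lowest and highest scores. The function name and example scores are hypothetical.

```python
from typing import Sequence, Tuple


def inflection_point(scores: Sequence[float]) -> Tuple[int, float]:
    """Return (rank index, score) of the knee in scores ranked lowest to highest.

    Heuristic (an assumption, not necessarily PLAIDOH's method): after rescaling
    both axes to [0, 1], the chord runs from (0, 0) to (1, 1), so the distance
    of a point from it is proportional to |x - y|.
    """
    ranked = sorted(scores)
    n = len(ranked)
    if n < 3 or ranked[0] == ranked[-1]:
        raise ValueError("need at least 3 scores that are not all equal")
    lo, hi = ranked[0], ranked[-1]
    best_index, best_gap = 0, -1.0
    for i, score in enumerate(ranked):
        x = i / (n - 1)              # rescaled rank
        y = (score - lo) / (hi - lo)  # rescaled score
        gap = abs(x - y)
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index, ranked[best_index]


# Hypothetical scores: a long flat run followed by a sharp rise.
print(inflection_point([30, 1, 4, 1.2, 15, 1.3, 9, 1.5, 2, 1.6]))  # (6, 4)
```

LCPs scoring at or above the returned value would then be the high-priority tail; whether the knee itself is included is a further choice the excerpt leaves open.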
- The role of lncRNAs in the pathogenesis of Non-Hodgkin B cell Lymphoma (NHL) has not been.
- In samples from TCGA DLBCL, which are a type of NHL, we identified several lymphoma oncogenes within LCPs with high enhancer or transcript cis-regulatory scores, including anti-apoptotic factors BCL2L2 (BCL-W) and BCL2L1 (BCL-XL), BCL6, and BCL7A, suggesting that the paired lncRNAs may play a role in the transcription of these lymphoma-associated genes (Fig.
- c Heatmap of LCP Spearman correlation p-values for expression of AC096992.2 and each of the genes within 400 kb.
- f Heatmap of LCP Spearman correlation p-values for expression of AC138207.5 and each of the genes within 400 kb flanking.
- d Plots show LCPs ranked by increasing LncRNA Transcript Cis-regulatory Scores.
- f XY plots show Enhancer versus LncRNA Transcript Cis-regulatory Scores segregating LCPs.
- Location of an LCP in quadrant two suggests that transcriptional control of the paired coding gene may be mediated through the lncRNA’s overlapping enhancer element, and that lncRNA transcripts here may function via epigenetic mechanisms, e.g.
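
A sketch of how LCPs could be assigned to quadrants of the Enhancer-versus-Transcript plot. It assumes the LncRNA Transcript Cis-regulatory Score is on the x axis and the Enhancer score on the y axis, standard quadrant numbering (counterclockwise from upper right, so quadrant two is high enhancer, low transcript score), and that a score equal to its cutoff counts as high. The function name and cutoff values are hypothetical; in practice the cutoffs would come from the score distributions.

```python
def score_quadrant(enhancer_score: float, transcript_score: float,
                   enhancer_cutoff: float, transcript_cutoff: float) -> int:
    """Quadrant (1-4) of an LCP; x = transcript score, y = enhancer score (assumed axes)."""
    high_transcript = transcript_score >= transcript_cutoff
    high_enhancer = enhancer_score >= enhancer_cutoff
    if high_transcript and high_enhancer:
        return 1  # both scores high
    if high_enhancer:
        return 2  # high enhancer, low transcript score
    if not high_transcript:
        return 3  # both scores low
    return 4      # high transcript, low enhancer score


# Hypothetical scores and cutoffs.
print(score_quadrant(enhancer_score=0.9, transcript_score=0.1,
                     enhancer_cutoff=0.5, transcript_cutoff=0.5))  # 2
```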
- Of the validated lncRNAs with sufficient data for PLAIDOH analysis, the majority had PLAIDOH Enhancer or Transcript Cis-Regulatory scores above the cutoffs from the CRISPRa screen and from the CRISPR KO screen.
- Further confirmation of PLAIDOH’s prioritization scores comes from several validated antisense (AS) lncRNAs in the CRISPRa screen.
- suggest that the lncRNAs’ effects on cell growth may be mediated through modulation of the expression of nearby coding genes, which were identified by PLAIDOH.
- Because lncRNAs are still a relatively new class of genes, many of the available tools were limited to processing and analyzing RNA-seq data and to identifying and annotating known and novel lncRNA transcripts.
- The Pan-cancer analysis of the tumors in TCGA integrated transcriptome and eCLIP data with transcription factor and lncRNA binding motifs to predict lncRNA regulatory networks and categories of lncRNA function (transcriptional, post-transcriptional, or both).
- The Pan-cancer method predicts many regulatory targets for individual lncRNAs, and these targets can be located anywhere in the genome.
- Because of these differences in the outputs of the Pan-cancer analysis and PLAIDOH, we focused our comparison on lncRNAs ranked highly by the former, and, in some cases, validated by siRNA knock-down [22].
- For corollary confirmatory biological data, PLAIDOH compares the subcellular localization of the potentially interacting lncRNA and RBP, as determined by subcellular fraction RNA-seq, Western blot, and immunofluorescence [1, 82], and outputs a coefficient of co-localization.
- red, blue, or purple indicate co-localization of the pair in the cytoplasm, the nucleus, or both fractions, respectively.
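
The co-localization coefficient itself is defined in the paper's Methods and is not reproduced in this excerpt. As a hedged illustration of only the color coding, the sketch below assumes each molecule has already been assigned to a set of compartments and colors a pair by the compartments both occupy. The function name and the choice to return `None` when nothing is shared are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional

# Color scheme described in the figure legend.
COLOR_BY_SHARED = {
    frozenset({"cytoplasm"}): "red",
    frozenset({"nucleus"}): "blue",
    frozenset({"cytoplasm", "nucleus"}): "purple",
}


def colocalization_color(lncrna_compartments: Iterable[str],
                         rbp_compartments: Iterable[str]) -> Optional[str]:
    """Color an lncRNA-RBP pair by the compartments both molecules occupy."""
    shared: FrozenSet[str] = frozenset(lncrna_compartments) & frozenset(rbp_compartments)
    return COLOR_BY_SHARED.get(shared)  # None when the pair shares no compartment


print(colocalization_color({"cytoplasm"}, {"cytoplasm", "nucleus"}))  # red
```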
- Most of the RBPs interact with multiple lncRNAs, which is not surprising given that the target proteins were selected based on known RNA-binding function.
- Many of the lncRNAs also interact with multiple RBPs, but PLAIDOH’s analysis and the interaction matrix shown in Fig.
- Figure 7b demonstrates how this approach can stratify lncRNA–RBP pairs by lncRNA and RBP expression, RBP binding site density, and co-localization in the same subcellular fraction.
- It is located upstream of the gene that codes for PLCG2, a phospholipase C family enzyme specific for B lymphocytes that is activated by B cell receptor (BCR)-associated kinases upon antigen-mediated BCR stimulation.
- These results suggest that transcriptional regulation of PLCG2 may occur through enhancer-mediated mechanisms rather than via direct activity of the RP11-960L18.1 transcript itself.
- To test the role of the lncRNA transcript itself, we used shRNA knock-down and CRISPR knockout of RP11-960L18.1 in lymphoma B cell lines that highly express RP11-960L18.1 (HBL1, U2932).
- Subcellular fraction RNA-seq data showed that a greater fraction of the RP11-960L18.1 transcript was located in the cytoplasm [1].
- For comparison, control mRNA GAPDH is predominantly localized to the cytoplasm, and the CTCF-associated noncoding RNA JPX is enriched in the nuclear and chromatin-associated fractions [75, 87].
- This approach revealed that RP11-960L18.1 interacts with a small number of RBPs, including ILF3/NF90, KHDRBS1/SAM68, and PUM2, all of which are highly expressed in the NHL samples (FPKM and also localize to the cytoplasm (Fig.
- Supporting these results, RBP motif scans showed 9 KHDRBS1/SAM68 motifs in the RP11-960L18.1 transcript sequence, all within 300 bp of the 3′ end of the molecule [88].
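
The motif scan itself was performed with the resource cited above [88]. Purely to illustrate the bookkeeping behind "all within 300 bp of the 3′ end", the sketch below finds exact, possibly overlapping matches of a short motif and keeps those starting in the last 300 nt. The motif string "UAAA" is a placeholder, not the published KHDRBS1/SAM68 motif, real scans typically use position weight matrices rather than exact strings, and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def motif_positions(sequence: str, motif: str) -> List[int]:
    """0-based start positions of (possibly overlapping) exact motif matches."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq = sequence.upper().replace("T", "U")  # compare in the RNA alphabet
    pattern = re.escape(motif.upper().replace("T", "U"))
    # A zero-width lookahead lets matches overlap.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]


def hits_near_3prime(sequence: str, motif: str,
                     window: int = 300) -> Tuple[List[int], List[int]]:
    """Return all hit starts and the subset starting within `window` nt of the 3' end."""
    starts = motif_positions(sequence, motif)
    near = [s for s in starts if s >= len(sequence) - window]
    return starts, near


# Hypothetical 604-nt transcript with one hit near each end.
seq = "G" * 10 + "UAAA" + "G" * 486 + "UAAA" + "G" * 100
print(hits_near_3prime(seq, "UAAA"))  # ([10, 500], [500])
```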
- Data from the eCLIP studies also show that RP11-960L18.1 has 11 binding sites for ILF3/NF90, which is highly expressed in the NHL samples (average 26.4 FPKM).
- This step is particularly challenging in the study of lncRNAs, since so little is known about their function and there are essentially no established ontologies.
- Here we validated PLAIDOH’s functional predictions for RP11-960L18.1, confirming that it is not a cis-regulatory lncRNA for PLCG2, but rather likely acts in the cytoplasm by interacting with one or more RBPs (NF90, KHDRBS1, PUM2) to promote the growth of lymphoma cells.
- The input file must have columns in the order shown in Table 1 and the first line must begin with a.
- The start and stop coordinates may be any region of the transcripts the user wishes to study (i.e.
- A detailed overview of the PLAIDOH output file can be found in Additional file 2: Table S1, and instructions for running PLAIDOH.pl can be found at: www.github.com/sarahpyfrom/PLAIDOH.
- Visualizing trends in lncRNA concordance and expression. To illustrate trends specific to either positively or negatively correlated pairs of lncRNAs and protein-coding genes, we devised plots split by the sign of the correlation between the lncRNA and the protein-coding gene: positive Spearman correlation coefficients (rho) are plotted above the central line, and negative values are plotted below it.
- For all lncRNAs in the dataset, the frequency of each combination of the total number of protein-coding genes and the number of significantly correlated protein-coding genes was calculated and plotted on the matrix.
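
The frequency matrix behind these plots can be sketched as a simple count over lncRNAs. The input layout, one (total coding genes, significantly correlated genes) tuple per lncRNA, and the row/column orientation below are assumptions for illustration; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pair_count_matrix(per_lncrna: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Count lncRNAs for each (total coding genes, significant genes) combination.

    Returns matrix[n_significant][n_total] = number of lncRNAs with that combination.
    """
    counts = Counter(per_lncrna)
    if not counts:
        return []
    for total, significant in counts:
        if not 0 <= significant <= total:
            raise ValueError(f"invalid combination: {(total, significant)}")
    size = max(total for total, _ in counts) + 1
    matrix = [[0] * size for _ in range(size)]
    for (total, significant), n in counts.items():
        matrix[significant][total] = n
    return matrix


# Hypothetical summaries for four lncRNAs.
m = pair_count_matrix([(3, 1), (3, 1), (5, 2), (2, 0)])
print(m[1][3], m[2][5], m[0][2])  # 2 1 1
```

The resulting matrix can be passed directly to a contour or heatmap plotting routine.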
- Example header and first two lines of the modified bedfile required from the user as an input file.
- PLAIDOH enhancer and transcript Cis-regulatory output scores.
- The LncRNA Transcript Cis-regulatory score is calculated as follows:
- All underlined components of the above calculations are described in more detail in Additional file 2:
- An RBP was considered to bind a lncRNA if both the K562 and HepG2 eCLIP assays showed binding of the RBP to the lncRNA (score of 1000) OR if both replicates of either cell line showed binding.
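
The binding rule reads naturally as a boolean expression. The sketch assumes a hypothetical dictionary of eCLIP scores keyed by (cell line, replicate) for one lncRNA-RBP pair, treats a score of at least 1000 as binding, and interprets "both the K562 and HepG2 eCLIP assays showed binding" as at least one bound replicate in each cell line; that interpretation and the function name are assumptions.

```python
from typing import Dict, Sequence, Tuple

BOUND_SCORE = 1000  # eCLIP score treated as binding, per the text


def rbp_binds_lncrna(scores: Dict[Tuple[str, int], int],
                     cell_lines: Sequence[str] = ("K562", "HepG2"),
                     replicates: Sequence[int] = (1, 2)) -> bool:
    """scores maps (cell line, replicate) -> eCLIP score for one lncRNA-RBP pair."""
    def bound(line: str, rep: int) -> bool:
        return scores.get((line, rep), 0) >= BOUND_SCORE

    # Rule 1: binding in both cell lines (any replicate within each line).
    in_every_line = all(any(bound(line, rep) for rep in replicates)
                        for line in cell_lines)
    # Rule 2: binding in both replicates of at least one cell line.
    in_all_reps_of_a_line = any(all(bound(line, rep) for rep in replicates)
                                for line in cell_lines)
    return in_every_line or in_all_reps_of_a_line


print(rbp_binds_lncrna({("K562", 1): 1000, ("HepG2", 2): 1000}))  # True
```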
- The total expression of each transcript was calculated as the sum of its FPKM (fragments per kilobase per million reads) values across the subcellular datasets, and the percentage of total FPKM was calculated for both the nuclear and cytoplasmic datasets.
- For later analysis, each transcript was classified as “Nuclear” if at least 70% of its normalized fragments were in the nuclear RNA-seq compartment, “Cytoplasmic” if at least 70% were in the cytoplasmic compartment, and “Nuclear and cytoplasmic” if between 30% and 70% were in the nucleus.
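
A sketch of the localization call, assuming FPKM values per subcellular dataset for one transcript, percentages taken over the summed FPKM, and inclusive 70% boundaries (the excerpt does not say whether the boundaries are inclusive). The "Unclassified" label for transcripts that fit no category and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple


def localization_call(fpkm_by_fraction: Dict[str, float],
                      cutoff: float = 0.70) -> Tuple[Optional[float], Optional[float], str]:
    """Return (nuclear fraction, cytoplasmic fraction, category) for one transcript."""
    total = sum(fpkm_by_fraction.values())
    if total <= 0:
        return None, None, "Unclassified"  # not detected in any fraction
    nuclear = fpkm_by_fraction.get("nuclear", 0.0) / total
    cytoplasmic = fpkm_by_fraction.get("cytoplasmic", 0.0) / total
    if nuclear >= cutoff:
        category = "Nuclear"
    elif cytoplasmic >= cutoff:
        category = "Cytoplasmic"
    elif 1 - cutoff <= nuclear < cutoff:
        category = "Nuclear and cytoplasmic"
    else:
        category = "Unclassified"  # e.g. most signal in some other fraction
    return nuclear, cytoplasmic, category


print(localization_call({"nuclear": 8.0, "cytoplasmic": 2.0}))  # (0.8, 0.2, 'Nuclear')
```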
- Subcellular fraction delta Ct values were calculated relative to the Ct values of the cytoplasmic fraction, and whole-cell post-shRNA delta Ct values were calculated relative to GAPDH levels.
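
The delta Ct arithmetic is a subtraction of threshold cycles. The sketch below shows both comparisons described above; converting delta Ct to a relative quantity as 2^-ΔCt is an added, standard assumption (it presumes roughly 100% amplification efficiency) and is not stated in the excerpt. The function names and Ct values are hypothetical.

```python
def delta_ct(target_ct: float, reference_ct: float) -> float:
    """Delta Ct = Ct(target) - Ct(reference)."""
    return target_ct - reference_ct


def relative_quantity(dct: float) -> float:
    """2^-(delta Ct); assumes near-100% amplification efficiency."""
    return 2.0 ** (-dct)


# Hypothetical Ct values.
fraction_vs_cytoplasm = delta_ct(target_ct=26.0, reference_ct=24.0)  # reference: cytoplasmic fraction
lnc_vs_gapdh = delta_ct(target_ct=27.5, reference_ct=18.5)           # reference: GAPDH, whole cell
print(fraction_vs_cytoplasm, relative_quantity(fraction_vs_cytoplasm))  # 2.0 0.25
```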
- Data names, sources and descriptions for all of the metrics utilized by PLAIDOH to annotate lncRNA and gene function.
- The fraction of total reads in the nuclear fraction was calculated for each transcript.
- For PLAIDOH analyses, all 48 samples from the DLBC data were compiled into a single input file, and 48 samples were randomly selected from each of the other four cancer sets (LUAD, BRCA, CESC and AML) to create cancer-specific input files with identical transcript annotations.
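
A sketch of how such size-matched input sets could be drawn, assuming sample IDs grouped by cancer type. The fixed random seed and the function name are hypothetical additions for reproducibility; the paper does not state how its random selection was seeded.

```python
import random
from typing import Dict, List, Sequence


def size_matched_inputs(samples_by_cancer: Dict[str, Sequence[str]],
                        n: int = 48,
                        keep_all: Sequence[str] = ("DLBC",),
                        seed: int = 0) -> Dict[str, List[str]]:
    """Keep every sample of the reference cohort; draw n samples from each other cohort."""
    rng = random.Random(seed)
    selected = {}
    for cancer, samples in samples_by_cancer.items():
        if cancer in keep_all:
            selected[cancer] = list(samples)
        elif len(samples) < n:
            raise ValueError(f"{cancer} has only {len(samples)} samples; need {n}")
        else:
            selected[cancer] = rng.sample(list(samples), n)  # without replacement
    return selected


cohorts = {  # hypothetical sample IDs
    "DLBC": [f"DLBC_{i}" for i in range(48)],
    "LUAD": [f"LUAD_{i}" for i in range(500)],
    "AML": [f"AML_{i}" for i in range(150)],
}
chosen = size_matched_inputs(cohorts)
print({cancer: len(ids) for cancer, ids in chosen.items()})
# {'DLBC': 48, 'LUAD': 48, 'AML': 48}
```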
- All default data sources are outlined in the table below:
- All default input data files were curated from publicly available resources and modified to fit specific file formats as outlined in the PLAIDOH documentation (www.github.com/sarahpyfrom/PLAIDOH).
- C) Bar plots show the cumulative percent of RNA transcripts in each category that were detected in the indicated percentage of samples.
- Contains descriptions of each column in the default “Output.
- D) Box plot shows the absolute Spearman correlation of negatively (left, black) or positively (right, blue) correlated LCPs, binned by fraction nuclear localization of the lncRNA.
- D) Bar graph shows the levels of the indicated histone marks by ChIP-seq for the region near AC096992.2.
- J) Plots show LCPs ranked by increasing LncRNA Transcript Cis-regulatory Scores.
- FANTOM: Functional ANnoTation Of the Mammalian genome.
- Funded by the National Institutes of Health and the National Cancer Institute (USA), which had no role in the design of the study, nor in the collection, analysis, or interpretation of the data, nor in the writing of the manuscript.
- JP and SP conceived of the project and designed the method.
- SP developed and implemented the package, performed most of the analyses, generated figures, and assisted in writing the manuscript.
- JP performed some of the analyses, generated figures, and wrote the manuscript.
- Informed written consent was obtained from participants after the nature and possible consequences of the studies were explained.
- Datasets used are publicly available and sources are listed in the Data Sources above and in PLAIDOH documentation (www.github.com/.
- The landscape of long noncoding RNAs in the human transcriptome.
- Gene regulation in the immune system by long noncoding RNAs.
- Function and evolution of the long noncoding RNA circuitry orchestrating X-chromosome inactivation in mammals.
- Long non-coding RNAs in the regulation of the immune response.
- lincRNAs act in the circuitry controlling pluripotency and differentiation.
- Super-Enhancers in the Control of Cell Identity and Disease.
- Transcription of the non-coding RNA upperhand controls Hand2 expression and heart development.
- The accessible chromatin landscape of the human genome.
- Genome-wide map of regulatory interactions in the human genome.
- Dose-dependent role of the cohesin complex in normal and malignant hematopoiesis.
- An integrated encyclopedia of DNA elements in the human genome.
