
ICGEC: A comparative method for measuring epigenetic conservation of genes via the integrated signal from multiple histone modifications between cell types


Summary

- Thus, a method for measuring the overall change in the epigenetic circumstance of each gene underpinned by multiple types of histone modifications between cell types is lacking.
- Furthermore, the analysis of the epigenetically dynamic and conserved genes which were defined based on the ICGEC output results demonstrated that ICGEC can deepen our understanding of the biological processes of cell differentiation to overcome the limitations of traditional expression analysis.
- ICGEC can be deemed a state-of-the-art method tailored for comparative epigenomic analysis of changes in cell dynamics.
- The availability of dozens of types of histone modification data has also spurred intensive research on the quantitative relationship between gene expression and multiple histone marks via various machine learning methods [30–34], including state-of-the-art deep learning algorithms [35].
- ChromDiff, which is one of the very few methods focusing on epigenomic comparisons, compares the combinatorial chromatin states between groups of epigenomes [41].
- Essentially, the two comparative methods utilize either the absolute levels of the raw signals of multiple histone marks or the derived chromatin states thereof to reveal the differentially regulated genes or different regulatory genomic regions between different conditions.
- For example, the fold-change detection property of an incoherent feedforward loop is a result of the specific interaction mode between genes, wherein the transcription dynamics of the output gene depend on the relative rather than the absolute change in the input signal [44].
- of the correlation values calculated between that gene and all the other genes with regard to the epigenetic circumstance: the joint signal of multiple histone modifications pertaining to a particular gene.
- These scores represent the relative changes in the epigenetic context of corresponding genes and marks, respectively, between two conditions.
- Construction of the epigenetic circumstance matrix of genes.
- were estimated from the signal values of the corresponding peaks within the 2 kb upstream to TSS plus the gene body regions (Promoter+Body) (see Methods).
- To validate the result, the signal levels of the histone marks were plotted for the human protein-coding genes with high (RPKM>.
- Together, these results supported the validity of the estimated gene-centric epigenetic levels.
- Boxplots show the distribution patterns of the epigenetic levels of 16 investigated marks for genes in MSC with low expression (RPKM ≤ 1), intermediate expression (1 <.
- Essentially, the error is caused by the method of the comparison, which focuses on the absolute level of epigenetic modification.
- To address this issue, we proposed a new comparative method that considers the joint signal of multiple histone modifications rather than individual signals, implicitly utilizes the relative signal among marks and genes and outputs two sets of scores as the estimates of the relative magnitude of the changes with regard to the corresponding epigenetic context of each gene and the gene context of each mark under the two conditions.
- Usually, one would estimate the similarities of different genes in terms of their epigenetic circumstances between two conditions with the Pearson correlation coefficients of the pairwise vectors of epigenetic modification levels.
- Because different genes and epigenetic marks may exhibit different degrees of conservation across cell lines, leading to unequal contributions to the comparison of the overall epigenetic circumstance, it seems reasonable to give relatively greater weights to those genes or marks with higher conservation.
- Second, we applied ICGEC to 10 sets of epigenetic circumstance matrices that were downsized by sampling only approximately one quarter of the genes in the full matrices.
- Consequently, ICGEC converged to the same results (Fig.
- Third, we disrupted the correspondence between the genes in the original matrices by permutation.
- The greater the difference in the gene scores produced between the use of the full data and the leave-one-mark-out data, the stronger the effect of the mark was on the overall epigenetic circumstance.
- Thus, we demonstrated the indispensable effects of four histone marks on the establishment of the epigenetic circumstance of genes.
- Then, the w_m-based weighted correlation for each gene pair is calculated to produce the context matrices of the genes G2G C GG 1 and G2G C GG 2, respectively.
- Then, the w_g-based weighted correlation for each mark pair is calculated to produce the context matrices of the marks (M2M C MM 1, M2M C MM 2) from the corresponding normalized matrices G2M C GM 1g and G2M C GM 2g, respectively.
- Finally, the program judges whether the new weights are sufficiently close to the weights in the last round.
- It should be noted that all of the produced weighted correlations of less than zero must be reset to 0 during the iterative process to meet the demands of biological significance.
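The weighted-correlation iteration described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the published ICGEC implementation: the function names, the uniform initial weights, the single alternating loop (the exact nesting of the outer and inner loops is not given in this summary), the tolerance `tol`, and the choice to clip only the scores that are reused as weights are all assumptions made for illustration. The input matrices (genes in rows, marks in columns) are assumed to be already ordered and normalized.

```python
from math import sqrt

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of x and y under non-negative weights w."""
    total = sum(w)
    if total == 0:
        return 0.0
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx <= 0 or vy <= 0:
        return 0.0  # constant vector: correlation undefined, treated as 0 here
    return cov / sqrt(vx * vy)

def context_matrix(rows, w):
    """All pairwise weighted correlations between the given rows."""
    n = len(rows)
    return [[weighted_pearson(rows[i], rows[j], w) for j in range(n)]
            for i in range(n)]

def conservation_scores(ctx_1, ctx_2, w):
    """Correlate equivalent rows of two context matrices; negatives become 0."""
    return [max(0.0, weighted_pearson(r1, r2, w))
            for r1, r2 in zip(ctx_1, ctx_2)]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def icgec_sketch(g2m_1, g2m_2, tol=1e-6, max_iter=100):
    """Return (gene scores, mark scores) for two gene-by-mark matrices."""
    w_g = [1.0] * len(g2m_1)      # assumed start: uniform gene weights
    w_m = [1.0] * len(g2m_1[0])   # assumed start: uniform mark weights
    m2g_1, m2g_2 = transpose(g2m_1), transpose(g2m_2)
    for _ in range(max_iter):
        # Gene-gene context matrices from w_m-weighted correlations.
        g2g_1 = context_matrix(g2m_1, w_m)
        g2g_2 = context_matrix(g2m_2, w_m)
        new_w_g = conservation_scores(g2g_1, g2g_2, w_g)
        # Mark-mark context matrices from w_g-weighted correlations.
        m2m_1 = context_matrix(m2g_1, new_w_g)
        m2m_2 = context_matrix(m2g_2, new_w_g)
        new_w_m = conservation_scores(m2m_1, m2m_2, w_m)
        # Stop when the weights barely move between rounds.
        delta = max(abs(a - b)
                    for a, b in zip(w_g + w_m, new_w_g + new_w_m))
        w_g, w_m = new_w_g, new_w_m
        if delta < tol:
            break
    return w_g, w_m
```

In this sketch the converged weights double as the gene and mark scores, so a gene whose correlation profile against all other genes is unchanged between the two conditions scores close to 1.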
- Furthermore, the removal of the abovementioned four marks resulted in a different degree of reduction in the number of DEGs in the bin with the lowest gene scores (Fig.
- Taken together, the results showed that ICGEC is reliable in that the ICGEC-derived scores reflect biologically significant changes in the epigenetic circumstance of.
- a Sensitivity of the ICGEC algorithm to random initial weights.
- b Sensitivity of the ICGEC algorithm to a random subset of genes.
- The correlations for gene scores (upper) and mark scores (lower) are indicated between the use of the full and partial data.
- The permuted data used in c and d were produced by breaking the row equivalence and column equivalence, respectively, of the original two ordered epigenetic circumstance matrices from H1 and MSC by shuffling.
- The same rules apply to the epigenetic status of genes.
- We divided the genes into four equal groups: the genes in the group with the lowest gene scores were defined as epigenetically dynamic genes (EDGs), whereas the genes in the group with the highest gene scores were defined as epigenetically conserved genes (ECGs).
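The grouping described above can be sketched as follows. The helper name `split_by_score`, the gene names, and the scores are hypothetical, and the handling of gene counts that are not divisible by four (earlier groups get one extra gene) is an assumption.

```python
def split_by_score(gene_scores, n_groups=4):
    """Split genes into n_groups near-equal groups, lowest scores first."""
    ranked = sorted(gene_scores, key=gene_scores.get)
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups

# Hypothetical gene scores, for illustration only.
scores = {"geneA": 0.95, "geneB": 0.10, "geneC": 0.55, "geneD": 0.80,
          "geneE": 0.30, "geneF": 0.70, "geneG": 0.20, "geneH": 0.60}
groups = split_by_score(scores)
edgs = groups[0]    # lowest scores: epigenetically dynamic genes
ecgs = groups[-1]   # highest scores: epigenetically conserved genes
```

With these example scores, `edgs` holds `geneB` and `geneG`, and `ecgs` holds `geneD` and `geneA`.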
- Using a permutation test (see Methods), we found that the DEGs in the.
- a Correlation of gene scores between the use of the full data and the leave-one-mark-out data.
- c Barplot showing the number of DEGs (blue) and essential genes (gray) included in the gene sets with different levels of epigenetic conservation.
- d Number of DEGs in the most epigenetically dynamic gene set versus the composition of the mark sets.
- The EDGs assigned to the three categories allowed us to further elucidate the biological processes associated with the alteration of the epigenetic circumstance in only one or two differentiation directions.
- In contrast, the EDGs in the “H1-to-NPC-only”.
- b Bar plot showing the proportions of DEGs in the corresponding EDGs.
- To obtain more comprehensive insight into the diversity of the differentiation programs from H1 to different derived cell lines, we screened the so-called “marks” that corresponded to the four specific differentiation directions based on the epigenetic dynamics rather than the expression patterns.
- As expected, essential genes were enriched in the DDUGs (Hypergeometric test, P-value .
- 6d, regardless of the absolute magnitude of the mark scores for different histone modifications, histone acetylation marks such as H4K8ac, H3K4ac, H3K18ac and H3K9ac displayed considerable variation across different differentiation directions, which suggested that these marks may play important but distinct roles in different developmental trajectories.
- We sought to understand why some of the EDGs were DEGs while others were not between two cell lines.
- Considering that ICGEC provides only a measure of the overall epigenetic conservation of genes and marks, we speculate that the different expression dynamics of the DEGs and the non-DEGs might be related to differences in the degree of influence of the different marks on the two sets of genes.
- Furthermore, we related the changes in gene expression to the changes in the marks between two cell lines to be compared for the DEGs and the non-DEGs separately.
- a Pearson correlations in terms of the histone modification levels between H1 and MSC for DEGs and non-DEGs.
- b Pearson correlations in terms of the expression changes and epigenetic changes for each mark for DEGs and non-DEGs.
- Here, the expression change was calculated for DEGs and non-DEGs as the difference in the expression levels of the corresponding genes divided by the sum between H1 and MSC.
- The marks in (A-B) are positioned by the difference in the correlations between DEGs and non-DEGs in ascending order.
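The expression change described in the caption above (difference divided by sum) can be written as a small Python sketch. The function name, the sign convention (second condition minus first), and the value returned when both levels are zero are assumptions, since the summary does not specify them.

```python
def relative_change(level_a, level_b):
    """Difference of two non-negative levels divided by their sum."""
    total = level_a + level_b
    if total == 0:
        return 0.0  # assumed convention when the gene is silent in both
    return (level_b - level_a) / total
```

For non-negative expression levels this quantity is bounded between -1 and 1, so genes with very different absolute expression contribute on the same scale; for example, `relative_change(1.0, 3.0)` gives 0.5.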
- Its essence is to compare the architecture of the co-expression networks.
- During implementation, the ICGEC algorithm utilizes two context matrices, G2G C GG and M2M C MM , both of which are derived from G2M C and used for estimating the similarity of the epigenetic context between corresponding genes and marks, respectively, under two conditions.
- This means that we implicitly consider the architecture of the gene regulatory network equivalently under the two conditions and admit that the raw epigenetic signal values are comparable.
- However, a serious defect of this kind of transformation is that the real difference between the original scores derived from different comparisons is masked.
- In this study, we applied ICGEC to the epigenomic data of the human embryonic stem cell line H1 and four cell lines derived from H1.
- Basically, this can be accomplished in the following way.
- Therefore, we speculate with caution that the epigenetically dynamic but non-differentially expressed genes might indicate a poised state in which these genes would be activated in response to an indicator of the next developmental stage.
- Correspondingly, the levels of both active H3K4me3 and repressive H3K27me3 were higher in the non-DEGs than in the DEGs (one-sided Wilcoxon rank-sum test, P-value and respectively), which is reminiscent of the concept of a.
- The identification of the exact function of these non-DEGs and the molecular mechanism underlying their transcriptional behavior awaits further experimental verification.
- So far, the results presented in the main text are based on the epigenetic signal from the entire gene locus (Promoter+Body).
- scheme to the largest degree (Additional file 1: Figure S10A).
- A merit of LNS is that it explicitly adjusts the variance of the distribution of within- and between-species.
- We advise users to run ICGEC with Guan’s normalization if the distributions of the correlation coefficients differ greatly between the two conditions.
- As exemplified by the analysis of the basic process of human embryonic stem cell differentiation, we demonstrated that ICGEC, whether used alone or in combination with traditional expression analysis, can provide novel biological insights.
- By reference to a recently published paper [47], the signal intensity of a mark on each gene (i.e., the level of histone modification) was calculated as the weighted sum of the peak signal values over all peaks within a specified genomic region.
- L_g denotes the length of the whole region of focus on a gene g.
- First, for two cell lines, C_1 and C_2, to be compared by ICGEC, their associated epigenetic circumstance matrices, G2M C 1 and G2M C 2, were ordered to ensure that the equivalent rows corresponded to the epigenetic circumstances of the same genes and that the equivalent columns corresponded to the epigenetic levels of all genes for the same types of marks in the two cell lines.
- First, the genes in which at least half of the histone modifications presented zero signals in both matrices were discarded.
- The normalization procedure ensures that the resultant matrices exhibit zero mean and unit variance with respect to the marks and genes in each cell line, allowing a meaningful comparison of the same genes and marks between two conditions through their associated gene profiles and mark profiles, respectively [69].
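The filtering and normalization steps above can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, a mark is counted as "zero" for a gene only when its signal is zero in both matrices (one possible reading of the filtering rule), and the population variance (dividing by n) is used for scaling.

```python
from math import sqrt

def filter_genes(g2m_1, g2m_2):
    """Indices of genes kept: fewer than half of the marks are zero in both."""
    kept = []
    for i, (row_1, row_2) in enumerate(zip(g2m_1, g2m_2)):
        zero_in_both = sum(1 for a, b in zip(row_1, row_2)
                           if a == 0 and b == 0)
        if 2 * zero_in_both < len(row_1):
            kept.append(i)
    return kept

def zscore(values):
    """Scale values to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return [0.0] * n  # constant vector: nothing to scale
    sd = sqrt(var)
    return [(v - mean) / sd for v in values]

def normalize_rows(matrix):
    """Normalize each gene (row) across its marks."""
    return [zscore(row) for row in matrix]

def normalize_columns(matrix):
    """Normalize each mark (column) across all genes."""
    cols = [zscore(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]
```

Applying `normalize_rows` and `normalize_columns` separately to each cell line's matrix yields one gene-wise and one mark-wise normalized matrix per condition, which is the form the weighted correlations operate on in this reading.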
- This step represents the entry of the outer loop.
- Second, wPCCs were calculated for equivalent rows in G2G C GG 1 and G2G C GG 2, which corresponded to the estimates of similarity between the epigenetic context of the corresponding genes under two conditions.
- This step represents the entry of the first inner loop.
- The weights of the genes used herein were obtained from the above inner loop.
- The reference gene set was all of the genes included in the matrices under comparison.
- 0.05) with respect to the “Biological Process”.
- For each TF, the hypergeometric test was used to determine whether a gene set was preferentially regulated by the TF relative to all of the genes addressed by ICGEC.
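The enrichment test above can be sketched with the standard library alone. The function and parameter names are hypothetical, and the one-sided upper-tail form (probability of observing at least the given overlap) is an assumption about how the test was applied.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_targets, n_set, n_overlap):
    """P(X >= n_overlap) for a hypergeometric draw.

    n_background: all genes addressed by the comparison
    n_targets:    background genes regulated by the TF
    n_set:        size of the gene set being tested
    n_overlap:    genes in the set that are regulated by the TF
    """
    upper = min(n_targets, n_set)
    total = comb(n_background, n_set)
    tail = sum(comb(n_targets, k) * comb(n_background - n_targets, n_set - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```

For example, with a background of 10 genes of which 4 are TF targets, a set of 3 genes containing 2 targets gives a P-value of 40/120, about 0.33. Exact integer arithmetic keeps the result stable for genome-scale counts, at the cost of speed.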
- We required that the gene score of a DDSG in the associated.
- quantile of the entire gene score.
- quantile of the entire gene score.
- Boxplots show the distribution patterns of the levels of 16 investigated marks from gene body (upper panel) or promoter (lower panel) regions for genes in MSC with low expression (RPKM ≤ 1), intermediate expression (1 <.
- (A) Correlation of gene scores between the use of the full data and the leave-one-mark-out data.
- Comparison of the gene scores between genes showing at least a two-fold change in expression level versus those without such a change.
- Correspondence between genes showing alterations in the epigenetic circumstances and genes showing dynamic expression.
- (B) Bar plot showing that the proportions of DEGs in the corresponding ECGs are significantly higher than those from using randomly permuted data for the comparison from H1 to MSC or NPC.
- Comparison of the gene scores between DEGs and non-DEGs among EDGs.
- (A-C) Pearson correlations in terms of the histone modification levels between H1 and ME, between H1 and TBL, and between H1 and NPC, respectively, for DEGs and non-DEGs.
- (D-F) Pearson correlations in terms of the expression changes and epigenetic changes for each mark between H1 and ME, between H1 and TBL, and between H1 and NPC, respectively, for DEGs and non-DEGs.
- The marks in (A-F) are positioned by the difference in the correlations between DEGs and non-DEGs in ascending order.
- (A-D) Spearman correlations in terms of the histone modification levels between H1 and MSC, between H1 and ME, between H1 and TBL, and between H1 and NPC, respectively, for DEGs and non-DEGs.
- (E-H) Spearman correlations in terms of the expression changes and epigenetic changes for each mark between H1 and MSC, between H1 and ME, between H1 and TBL, and between H1 and NPC, respectively, for DEGs and non-DEGs.
- The marks in (A-H) are positioned by the difference in the correlations between DEGs and non-DEGs in ascending order.
- Comparison of the distributions of gene-gene correlation coefficients between H1 and MSC cell lines.
- The download links to the histone modification data used in this study.
- RY, JT, ZW and YT contributed to the interpretation of the results, and revised and approved the final paper.
- The funders provided financial support for the research but had no role in the design of the study, the analysis and interpretation of data, or the writing of the manuscript.
- A map of the cis-regulatory sequences in the mouse genome.
- DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements
