
Inferring time series chromatin states for promoter-enhancer pairs based on Hi-C data


Summary (preview)

- chromatin state trajectories.
- With the advent of time series Hi-C data, it is now possible to connect promoters and enhancers and to analyze chromatin state trajectories at promoter-enhancer pairs.
- Results: We present TimelessFlex, a framework for investigating chromatin state trajectories at promoters and enhancers and at promoter-enhancer pairs based on Hi-C information.
- We utilize time series ATAC-seq data measuring open chromatin to define promoters and enhancers.
- The code of the framework is available at https://github.com/henriettemiko/TimelessFlex.
- Conclusions: TimelessFlex clusters time series histone modifications at promoter-enhancer pairs based on Hi-C, and it can identify distinct chromatin states at promoter and enhancer feature regions and their changes over time.
- Full list of author information is available at the end of the article Miko et al.
- Whether histone modifications are causal or a consequence of the activity of the genomic locus remains unclear.
- The output is a set of clusters of regions with similar chromatin state trajectories.
- We extend this approach by (1) a strategy to employ time series ATAC-seq data to improve definitions of promoters and distal regions called “enhancer candidates”.
- A set of candidate regulatory regions is first annotated from ATAC-seq data across the time series.
- As we utilize ATAC-seq and Hi-C merely to define regions and their interactions, but do not exploit the temporal or quantitative information present in ATAC-seq or Hi-C, we also use these data for corroboration.
- Chromatin state trajectories for enhancer feature regions during mouse hematopoiesis.
- 1), for the scenario that there are time series ChIP-seq and ATAC-seq data available but no accompanying Hi-C data set.
- We defined one consistent set of distal regions (“enhancer candidates”) across the time series based on ATAC-seq data (see Methods), which resulted in 48,804 enhancer feature regions.
- The corresponding ATAC-seq signal confirms that the enhancer regions are more accessible at these.
- At these time points the ATAC-seq signal shows a strong increase in accessibility.
- Chromatin state trajectories during human pancreatic differentiation.
- Chromatin state trajectories for enhancer feature regions: As in the case of hematopoiesis above, we started by annotating enhancer feature regions from ATAC-seq data.
- Paired chromatin state trajectories for promoter-enhancer pairs.
- Promoter-enhancer candidate pairs were determined based on ATAC-seq and Hi-C data (see Methods) and led to 3617 initialization feature pairs and 3406 multi feature pairs.
- For one such metric we used the quantitative ATAC-seq signal which is not used for clustering.
- More precisely, we computed the Spearman correlation coefficient between H3K27ac signal and ATAC-seq signal for each.
- The correlation of the noise cluster is 0.4 and served as an adequate baseline.
- As another measure, we computed the RNA-seq derived gene expression levels of the closest transcript TSSs as baseline, to compare them to the Hi-C supported assignments.
- Figure 9 shows much weaker gene expression for the baseline assignments than for the cluster-assigned promoters in Fig.
- Motif analysis of the enhancer candidates with HOMER found motifs from the FOX family.
- Altogether, this demonstrates that our approach can (a) identify distinct chromatin trajectories which are (b) supported by complementary genomics data, are (c) enriched in sequence motifs and functional interactions of known relevant TFs, and (d) enrich for enhancers with an impact on gene expression compared to the baseline of the closest assignment.
- 3 Example clusters of enhancer feature regions during mouse hematopoiesis.
- TimelessFlex learns chromatin state trajectories of promoter and enhancer feature regions and of promoter-enhancer feature pairs during differentiation by co-clustering multiple histone modification data sets.
- Noticeably, the trend of the histone mark signals is much stronger on the enhancer side than on the promoter side.
- 5 Example clusters of enhancer feature regions during human pancreatic differentiation.
- However, as readout of the promoters, the gene expression signal from RNA-seq correlates well with the inferred chromatin trajectories.
- Paired clustering allows for direct comparison of the accessibility signals of the promoter and the enhancer.
- This suggests that the activity of the promoter is comparatively better predicted by using histone mark signals than accessibility.
- Cluster number 10 is the minimum of the BIC in the investigated range and therefore chosen as cluster number.
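The selection rule described here (fit the model for each candidate cluster number, then keep the one with the lowest BIC) can be sketched as follows. This is a minimal illustration using the standard BIC definition; the function names and all numbers in `fits` are hypothetical and are not taken from the TimelessFlex code or results.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: -2 ln(L) + k ln(N). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_cluster_number(fits, n_obs):
    """fits maps a cluster number to (log-likelihood, number of parameters).

    Returns the cluster number with the minimum BIC and all BIC scores.
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical fits for three candidate cluster numbers.
fits = {8: (-5200.0, 80), 10: (-5000.0, 100), 12: (-4990.0, 120)}
best_k, scores = select_cluster_number(fits, n_obs=3000)
print(best_k)
```

With these made-up values the gain in likelihood from 10 to 12 clusters does not outweigh the penalty for the extra parameters, so 10 is selected.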
- Instead, we here use a data-driven approach employing ATAC-seq data for defining precise coordinates of promoter and enhancer candidate regions.
- The open chromatin regions defined from ATAC-seq data across all time points are of variable size, and we chose windows that extend the edges of the open chromatin regions by 500 bp, which leads to more pronounced histone mark signals than fixed-size windows.
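As a small sketch of the window definition, each open chromatin region can be extended by 500 bp on both edges. The helper name `extend_region`, the 0-based half-open coordinates, and the clipping to chromosome bounds are assumptions made for illustration; the source only states the 500 bp extension.

```python
def extend_region(start, end, flank=500, chrom_length=None):
    """Extend a half-open interval [start, end) by `flank` bp on each side.

    The start is clipped at 0; the end is clipped at `chrom_length` if given.
    """
    new_start = max(0, start - flank)
    new_end = end + flank
    if chrom_length is not None:
        new_end = min(new_end, chrom_length)
    return new_start, new_end

# A 700 bp open chromatin region becomes a 1700 bp feature region.
print(extend_region(1000, 1700))
```

Because the flank is added to regions of variable size, the resulting windows also differ in length, unlike fixed-size windows centered on a peak.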
- ATAC-seq is utilized to define promoters and enhancers and Hi-C data is used to assign them to Hi-C interaction pairs.
- 8 Spearman correlation of H3K27ac signal and ATAC-seq signal for enhancer clusters.
- For clusters 7, 3 and noise cluster 10 the Spearman correlation coefficient was computed between H3K27ac signal and ATAC-seq signal for each feature region.
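The per-region correlation described here can be illustrated with a dependency-free Spearman implementation: rank both signals (ties receive the mean rank) and compute the Pearson correlation of the ranks. The signal values below are hypothetical, and the authors' own implementation is not shown in the source.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical signals for one feature region across five time points.
h3k27ac = [0.2, 1.5, 3.1, 2.8, 0.9]
atac = [0.1, 1.0, 2.5, 2.9, 0.7]
rho = spearman(h3k27ac, atac)
print(round(rho, 3))
```

Note that `pearson` divides by zero if either signal is constant over time; such regions would need to be handled separately.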
- For mouse hematopoiesis, we downloaded ChIP-seq and ATAC-seq data from GEO under accession number GSE59636 [19].
- We employed ChIP-seq data for H3K4me1/2/3 and H3K27ac and ATAC-seq data at the following six time points forming a branching tree: common myeloid progenitor (CMP), megakaryocyte erythroid progenitor (MEP), erythrocyte A (EryA), granulocyte macrophage progenitor (GMP), granulocyte (Granu) and monocyte (Mono).
- ATAC-seq data were generated and deposited in GEO under accession number GSE151769.
- Table 2 gives an overview of the data samples for the different genomic data types..
- ChIP-seq data.
- ATAC-seq data.
- Library preparation: ATAC-seq for human pancreatic differentiation [35] was performed on approximately 50,000 nuclei.
- Data processing: Paired-end ATAC-seq data from pancreatic differentiation were processed similarly to [36].
- To account for the size of the transposase, read pairs were filtered to have a distance of at least 38 bp between them.
- For single-end ATAC-seq data from hematopoiesis, Nextera adapters were trimmed from reads with Trim Galore 0.6.1.
- RNA-seq data.
- [Table 2 excerpt: ATAC-seq row with values 2, 2, 2, 2; column headers not recoverable]
- TimelessFlex is a flexible framework for investigating chromatin state trajectories at feature regions around promoters and enhancers or at pairs of such feature regions.
- TimelessFlex extends Timeless [18] by integrating the additional data types ATAC-seq and Hi-C.
- An overview of the steps in TimelessFlex and the employed genomic data types is given in Fig.
- For the latter, we here use time series ATAC-seq data to define promoters and enhancer candidates, which are partially assigned to promoter-enhancer pairs based on detected Hi-C interactions.
- In this step, promoters and enhancer candidates are defined based on time series ATAC-seq data and assigned to promoter-enhancer pairs based on Hi-C interactions if available.
- Combining ATAC-seq peaks over time into one set of open chromatin regions.
- For defining promoters and enhancer candidates, we employ time series ATAC-seq data.
- Therefore, the sets of ATAC-seq peaks from each time point are combined and then merged if they overlap with a minimal length of 101 bp.
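The combine-and-merge step can be sketched as an interval union over all time points. How the 101 bp threshold enters is not fully clear from the sentence above; the sketch assumes it is a minimum overlap required before two peaks are merged, exposed as the hypothetical parameter `min_overlap`. Peaks are assumed to be `(chrom, start, end)` tuples with half-open coordinates.

```python
def merge_peaks(peak_sets, min_overlap=1):
    """Pool peaks from all time points and merge overlapping ones.

    peak_sets: one list of (chrom, start, end) tuples per time point.
    Two peaks on the same chromosome are merged when they share at
    least `min_overlap` bp (assumed reading of the threshold).
    """
    pooled = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for chrom, start, end in pooled:
        if merged:
            m_chrom, m_start, m_end = merged[-1]
            overlap = min(end, m_end) - start  # start >= m_start after sorting
            if chrom == m_chrom and overlap >= min_overlap:
                merged[-1] = (m_chrom, m_start, max(m_end, end))
                continue
        merged.append((chrom, start, end))
    return merged

# Hypothetical peaks from two time points.
t1 = [("chr1", 100, 400), ("chr1", 1000, 1300)]
t2 = [("chr1", 250, 600), ("chr1", 1250, 1500)]
regions = merge_peaks([t1, t2], min_overlap=101)
print(regions)
```

In this example the first two peaks overlap by 150 bp and are merged, while the last two overlap by only 50 bp and stay separate under the assumed threshold.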
- The resolution and coverage of Hi-C data and ATAC-seq data are very different.
- ATAC-seq has in principle single-nucleotide resolution, where it is used for TF footprinting, and the open chromatin regions as derived here have a median width of 700–1400 bp.
- The candidate assignment of promoters and enhancers to each other was based on Hi-C derived interactions from all time points combined, regardless of the specific time(s) the interaction was detected.
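A minimal sketch of this assignment: interactions from all time points are pooled, and a promoter and an enhancer candidate are paired when they overlap the two opposite anchors of the same interaction. The function names, the tuple layout, and the simple overlap criterion are assumptions for illustration; the actual rules are given in the paper's Methods.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def pair_via_hic(promoters, enhancers, interactions):
    """Pair promoters and enhancers that lie on opposite anchors.

    interactions: list of (anchor1, anchor2), each anchor a
    (chrom, start, end) tuple. Both anchor orientations are tried.
    """
    pairs = set()
    for anchor1, anchor2 in interactions:
        for left, right in ((anchor1, anchor2), (anchor2, anchor1)):
            for p in promoters:
                if not overlaps(p, left):
                    continue
                for e in enhancers:
                    if overlaps(e, right):
                        pairs.add((p, e))
    return sorted(pairs)

# Hypothetical regions and interactions detected at two time points.
promoters = [("chr1", 1000, 2000)]
enhancers = [("chr1", 50000, 51000), ("chr1", 90000, 91000)]
hic_t0 = [(("chr1", 0, 5000), ("chr1", 50000, 55000))]
hic_t1 = [(("chr1", 90000, 95000), ("chr1", 0, 5000))]

# Time point of detection is ignored: all interactions are pooled.
pairs = pair_via_hic(promoters, enhancers, hic_t0 + hic_t1)
print(pairs)
```

The nested loops are quadratic per interaction, which is fine for a sketch; an interval index would be preferable for genome-scale inputs.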
- Feature regions around initialization pairs are called initialization feature pairs.
- The directed acyclic graphs (DAGs) of the Bayesian network and the random variables for clustering feature regions from mouse hematopoiesis data and promoter-enhancer feature pairs from human pancreatic differentiation are shown in Fig.
- 13 DAGs of Bayesian network for clustering feature regions (top) and promoter-enhancer feature pairs (bottom).
- One half of the continuous nodes represents histone mark signals of the promoter side and the other half represents histone mark signals of the enhancer side.
- As the cluster assignment is unobservable, the parameters of the model cannot be computed directly.
- The cluster with the highest probability is used as the cluster assignment of the region.
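The assignment rule can be illustrated with a simplified mixture model: compute the posterior probability of each cluster for a region and take the most probable one. The sketch assumes independent Gaussian signals per feature given the cluster, which is a simplification and not the Bayesian network structure of TimelessFlex; all parameter values are hypothetical and would normally come from a fitting procedure.

```python
import math

def log_gaussian(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def posterior(x, weights, means, sds):
    """Posterior cluster probabilities for one region's signal vector x."""
    log_joint = [
        math.log(w) + sum(log_gaussian(xi, m, s) for xi, m, s in zip(x, mu, sd))
        for w, mu, sd in zip(weights, means, sds)
    ]
    top = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def assign_cluster(x, weights, means, sds):
    """Return the index of the most probable cluster and all probabilities."""
    probs = posterior(x, weights, means, sds)
    return max(range(len(probs)), key=probs.__getitem__), probs

# Two hypothetical clusters: flat signal vs. signal increasing over time.
weights = [0.5, 0.5]
means = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
sds = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
cluster, probs = assign_cluster([0.9, 2.1, 2.8], weights, means, sds)
print(cluster)
```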
- In the next step, the multi feature regions are clustered.
- where L is the likelihood of the model, N is the number of observations (data points) and k is the degrees of freedom (number of parameters).
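The equation these symbols belong to did not survive in this excerpt. For reference, the standard definition of the Bayesian information criterion in the same notation is given below; it is assumed, not confirmed by the excerpt, that the paper uses this standard form.

```latex
\mathrm{BIC} = -2 \ln L + k \ln N
```

A lower BIC indicates a better trade-off between model fit (first term) and model complexity (second term).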
- For visualization of the resulting clusters, normalized counts are used.
- seq, ATAC-seq or Hi-C data.
- Note that ATAC-seq is only used to define the coordinates of candidate regions, and Hi-C only to determine promoter-enhancer pairs – i.
- For each time point in a cluster, the logarithm of the geometric average of the expected FPKMs plus 1 was finally computed.
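One way to read this summary statistic is sketched below. It assumes the pseudocount of 1 is added to each FPKM before averaging and that the natural logarithm is used; neither detail is stated in this excerpt. Under that reading, the log of the geometric mean equals the arithmetic mean of the log-transformed values.

```python
import math

def log_geomean_fpkm(fpkms, pseudocount=1.0):
    """Log of the geometric mean of (FPKM + pseudocount).

    Computed as the mean of logs, which is mathematically identical
    and avoids overflow from multiplying many values.
    """
    logs = [math.log(v + pseudocount) for v in fpkms]
    return sum(logs) / len(logs)

# Hypothetical expected FPKMs of the genes in one cluster at one time point.
value = log_geomean_fpkm([1.0, 3.0])
print(value)
```

The pseudocount keeps genes with an FPKM of 0 from producing an undefined logarithm.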
- To see how accessibility changes over time in the clusters, the time series ATAC-seq signal representing the cut sites over the clustered feature regions is computed.
- Normalized 1 bp bedgraphs of ATAC-seq data are used, and for each time point, the length-normalized number of cut sites in each region is determined.
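A minimal sketch of the length normalization, assuming bedgraph records are `(chrom, start, end, value)` tuples with half-open coordinates and that `value` is the normalized cut-site count per base. The function name and the example records are hypothetical.

```python
def cut_site_density(bedgraph, region):
    """Sum the cut-site signal inside a region and divide by its length."""
    chrom, r_start, r_end = region
    total = 0.0
    for b_chrom, b_start, b_end, value in bedgraph:
        if b_chrom != chrom:
            continue
        # Number of bases of this record that fall inside the region.
        covered = min(b_end, r_end) - max(b_start, r_start)
        if covered > 0:
            total += value * covered
    return total / (r_end - r_start)

# Hypothetical 1 bp bedgraph records for one time point.
bedgraph = [
    ("chr1", 100, 101, 2.0),
    ("chr1", 150, 151, 3.0),
    ("chr1", 400, 401, 5.0),  # outside the region below
]
density = cut_site_density(bedgraph, ("chr1", 100, 200))
print(density)
```

Dividing by the region length makes the signal comparable between feature regions of different sizes.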
- Resulting ATAC-seq signals were normalized and divided by 2.
- As assigning promoter-enhancer pairs via Hi-C did not take the time point of the interaction into account, the clustering does not use information at which time point interactions occurred.
- We only use the subset of those genes that are in the Hi-C pairs of the clustering.
- Model selection for clustering of enhancer feature regions during mouse hematopoiesis.
- All 19 clusters of enhancer feature regions during mouse hematopoiesis.
- Chromatin state trajectories are shown for each cluster.
- Model selection for clustering of enhancer feature regions during human pancreatic differentiation.
- All 8 clusters of enhancer feature regions during human pancreatic differentiation.
- Chromatin state trajectories and gene expression signals from RNA-seq are shown for each cluster.
- Geusz for uploading the ATAC-seq data.
- performed ATAC-seq experiments, H.M.
- The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
- ATAC-seq and ChIP-seq data for mouse hematopoietic differentiation were downloaded from GEO under accession number GSE59636 [19].
- ATAC-seq data for human pancreatic differentiation have been deposited in GEO under accession number GSE151769.
- Chromatin segmentation based on a probabilistic model for read counts explains a large portion of the epigenome.
- Spatiotemporal clustering of the epigenome reveals rules of dynamic gene regulation.
- Chromatin state dynamics during blood formation.
- A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.
- The ENCODE Blacklist: Identification of Problematic Regions of the Genome
