
Clonal reconstruction from time course genomic sequencing data

- Background: Bacterial cells during many replication cycles accumulate spontaneous mutations, which result in the birth of novel clones.
- composition over time, as revealed in the long-term evolution experiments (LTEEs).
- Accurately inferring the haplotypes of novel clones, as well as the clonal frequencies and the clonal evolutionary history in a bacterial population, is useful for characterizing the evolutionary pressure on multiple correlated mutations instead of that on individual mutations.
- spontaneously, and thus the likelihood of a mutation occurring in a specific clone is proportional to the frequency of the clone in the population when the mutation occurs.
- Conclusion: We developed efficient algorithms to reconstruct the clonal evolution history from time course genomic sequencing data.
- coli strains (i.e., the founder clones) were grown in parallel, each under a daily serial passage for 30 years [3, 6, 7].
- A variety of phenotypic changes were observed in the bacterial population during the experiment, including increased fitness to specific growth conditions [8] and elevated mutation rates [9].
- 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0.
- In recent years, LTEEs were combined with metagenome sequencing (i.e., sequencing the whole genomes in the population, also referred to as Pool-Seq, or the sequencing of pooled individual genomes) to characterize the genetic variations introduced during the course of the experiment, and the allele frequencies of these variations in a population [3, 10].
- Some of these novel variations were revealed to be associated with observed phenotypic changes, e.g., the defective mutations in the DNA repair pathways causing elevated mutation rates [9], and the novel genetic traits selected for citrate use [10].
- However, due to the nature of metagenome sequencing, it is not straightforward to determine the haplotypes of the clones arising in the experiment.
- Because a novel variation, e.g., a single nucleotide variation (SNV), may be shared by multiple clones in the population (i.e., subsequent mutations may occur in a clone already containing mutations instead of the founder clone), the tests on variation may be less sensitive than the tests directly on the frequency profiles of haplotypes, and thus may miss the selection on some clones, especially when the population is dominated by a few clones containing many variations.
- To characterize the haplotypes in the populations, whole genome sequencing was carried out on randomly selected clones at the end of the experiments.
- In addition, the haplotype frequencies of the major clones were derived from the metagenome sequencing data.
- The dynamic changes of these major clones during the course of the experiment showed a clear picture of the subpopulation structure (e.g., using a Muller plot).
- Despite the demonstrated success here, clonal sequencing has two disadvantages in practice.
- First, because the sequenced clones are randomly selected, minor clones with low abundances in the population may not be characterized (while major clones are sequenced repetitively), and thus their frequency profiles during the time course will not be considered in the subsequent analyses.
- More importantly, the clones are usually chosen at the end of the experiment.
- Clones with high abundance in the middle of the experiment that become less abundant towards its end are less likely to be characterized, which will not only miss some clones under selection during the time course, but also miscalculate the allele frequencies of characterized clones in the middle of the time course.
- Therefore, unless the clonal sequencing covers a large number of clones (that may contain many duplicated clones) compared to the complexity of the population, it is desirable to develop computational methods to reconstruct the haplotypes of clones from time series metagenome sequencing data.
- Interestingly, clonal reconstruction has been extensively studied in the field of cancer genomics for tracking the evolution of cancer cells by bulk tumor genome sequencing [13, 14], in an attempt to characterize the intra-tumor heterogeneity (i.e., clonal tree and composition) and in the meantime to identify the clones carrying driver mutations that occur in the early stage of cancer and drive the cancer progression [15].
- Computationally, the clonal reconstruction (also referred to as the clonality inference) takes as input the allele frequencies of a set of genetic variants in multiple samples (e.g., dissected from the same tumor tissue), and aims to reconstruct a set of clones, each carrying a subset of the variants, and simultaneously infer the fraction of these clones in each sample [16].
- Many algorithms addressed the clonal reconstruction problem [16–21] by inferring the evolutionary history of reconstructed clones and the generation of variants (assuming that each variant is generated only once, i.e., the infinite sites assumption [22.
- It is worth noting that here, the clonal evolution was not inferred from time series sequencing data (which are difficult to obtain in cancer genomics), but from the inherent constraints among variant frequencies due to the infinite sites assumption (e.g., no clone can carry two variants unless the frequency of one variant is always greater than that of the other).
- Finally, similar to the clonal sequencing in LTEE, single cell sequencing data offers complementary information to clonal reconstruction in cancer genomics [25], and algorithms became available to infer tumor heterogeneity from low coverage single cell sequencing data [26, 27].
- Fig. 1 A schematic illustration of the clonal structure in an evolving bacterial population and the time course clonal reconstruction problem.
- (b) The clonal tree represents the evolutionary history of these clones, in which each node represents a clone including the founder clone as the root, and each edge represents the mutations that occur at specific time points.
- (e) The VAF matrix can be viewed as the product of the clonal tree (T) and the clone frequencies (C), similar to the formulation in cancer genomics [16].
- The goal of this work is to reconstruct the clonal tree (T) and the clone frequencies (C) from the observed VAF matrix.
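The product view of the VAF matrix can be illustrated with a small sketch. It assumes haploid genomes (so a variant's allele frequency equals the total frequency of the clones carrying it), one variant per clone, and a tree encoded as a hypothetical `parent` list; the function names and the toy numbers are illustrative and not taken from the paper.

```python
def ancestor_matrix(parent):
    """parent[j] is the parent clone of clone j; the founder (index 0) has parent -1.
    Returns T with T[j][k] = 1 if clone j carries the variant that created clone k."""
    n = len(parent)
    T = [[0] * n for _ in range(n)]
    for j in range(n):
        node = j
        while node != -1:          # walk from clone j up to the founder
            T[j][node] = 1
            node = parent[node]
    return T

def vaf_matrix(C, T):
    """F[i][k] = sum_j C[i][j] * T[j][k]: total frequency at time i of clones carrying variant k."""
    n = len(T)
    return [[sum(row[j] * T[j][k] for j in range(n)) for k in range(n)] for row in C]

# Toy example: founder 0, clone 1 from founder, clone 2 from clone 1, clone 3 from founder.
parent = [-1, 0, 1, 0]
C = [[1.0, 0.0, 0.0, 0.0],   # rows are time points, columns are clones
     [0.6, 0.4, 0.0, 0.0],
     [0.3, 0.4, 0.2, 0.1]]
F = vaf_matrix(C, ancestor_matrix(parent))
```

At the last time point, the variant that created clone 1 is carried by clones 1 and 2, so its frequency is 0.4 + 0.2. Column 0 stands for the founder background and is always 1.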
- We also discuss the effect of varying the number of clones in the population and the number of time points.
- Our algorithms successfully reconstruct clonal haplotypes that are not characterized by clonal sequencing, and reveal the evolutionary dynamics of the clones during the LTEE.
- We model an evolving bacterial population using the clonal theory [28, 29], similar to the one used in cancer genomics [30].
- During the course of the evolution experiment, bacterial cells accumulate novel mutations forming new clones.
- but the other types of variations (e.g., indels, structural variations and copy number variations) can be modelled in the same way.
- The ancestral relationships between the clones in the evolving population can be represented as a directed tree T, referred to as the clonal tree, in which the root represents the founder clone, every other node represents a clone introduced by one or more novel mutations, and each edge represents the direct ancestral relationships between the clones (Fig 1b).
- As a result, the haplotype of a clone (i.e., the variants contained in the clone) is represented by the path from the root to the node representing the clone.
- referred to as the clonal frequency matrix (CFM), in which $c_{i,j}$ indicates the frequency of clone $j$ at time point $i$.
- is proportional to the frequency of the clone in the population at the time.
- The clonal tree T and the CFM C together can be depicted in a Muller plot [31] (Fig 1c), which is commonly used to visualize the evolutionary dynamics in a population [32].
- where $f_{i,j}$ indicates the allele frequency of variant $j$ at time point $i$.
- Based on the clonal evolution model, we formally define the time course clonal reconstruction problem using a maximum likelihood formulation: given the input of matrix F.
- where $C_{i,j}$ represents the (unknown) frequency of clone $j$ at time point $i$, and $ch(i)$ represents the set of all children of node $i$.
- The likelihood function is computed by multiplying the likelihood of generating each clone in the clonal tree.
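The exact likelihood expression is not reproduced in this excerpt, so the following is only a simplified sketch of the idea that each clone contributes a term proportional to its parent's frequency when the mutation occurs. It assumes one variant per clone, that a clone's frequency is its variant's VAF minus the VAFs of its children, and a hypothetical `birth` list giving the time index at which each mutation is taken to occur.

```python
import math

def clone_frequencies(F, parent):
    """C[i][j] = F[i][j] - sum of F[i][k] over the children k of clone j (assumed relation)."""
    n = len(parent)
    children = [[k for k in range(n) if parent[k] == j] for j in range(n)]
    return [[row[j] - sum(row[k] for k in children[j]) for j in range(n)] for row in F]

def log_likelihood(F, parent, birth):
    """Sum over non-founder clones of log(frequency of the parent clone at the birth time)."""
    C = clone_frequencies(F, parent)
    total = 0.0
    for j in range(1, len(parent)):
        p = C[birth[j]][parent[j]]
        if p <= 0:                 # a parent that is absent cannot give rise to a new clone
            return float("-inf")
        total += math.log(p)
    return total

F = [[1.0, 0.0, 0.0],
     [1.0, 0.4, 0.0],
     [1.0, 0.6, 0.2]]
birth = [0, 0, 1]                  # hypothetical: clone 1 arises at time 0, clone 2 at time 1
ll_chain = log_likelihood(F, [-1, 0, 1], birth)   # clone 2 descends from clone 1
ll_star = log_likelihood(F, [-1, 0, 0], birth)    # clone 2 descends from the founder
```

In this toy case the founder still holds 0.6 of the population at time 1 while clone 1 holds 0.4, so the tree that attaches clone 2 to the founder scores higher.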
- We search for the optimal solution of a clonal tree T in the search space containing a total of ( N − 1.
- a brute force approach to search the ML solution in the entire tree space, referred to as the exhaustive tree search algorithm (ET.
- Once the clonal tree is constructed, the haplotype of each clone (corresponding to a node in the tree) can be derived from the path from the root to the node.
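Reading a haplotype off the tree amounts to walking from a node back to the root and collecting the variants on the way. The sketch below assumes one variant label per non-founder node and a hypothetical `parent` list encoding; the variant names are made up for illustration.

```python
def haplotype(parent, variants, node):
    """Return the variants on the path from the root to `node`, oldest first.
    parent[j] is the parent of node j (-1 for the founder);
    variants[j] labels the mutation that created node j."""
    path = []
    while parent[node] != -1:      # stop at the founder, which carries no new variant
        path.append(variants[node])
        node = parent[node]
    return path[::-1]

parent = [-1, 0, 1, 0]
variants = [None, "A", "B", "C"]   # hypothetical variant names
hap2 = haplotype(parent, variants, 2)
hap3 = haplotype(parent, variants, 3)
```

Here clone 2 carries variants A and B, clone 3 carries only C, and the founder has an empty haplotype.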
- To reduce the computational complexity of the exhaustive tree search (ET), we propose an algorithm using a greedy approach as follows (see Algorithm 2 for details).
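The steps of the greedy algorithm are not included in this excerpt. As an illustration only, one plausible greedy rule is sketched here: process variants in order of first appearance and attach each new clone to the existing clone that has the largest remaining frequency at that time, provided no clone frequency becomes negative. This rule, the function name `greedy_tree` and the tolerance parameter are assumptions, not the paper's Algorithm 2.

```python
def greedy_tree(F, tol=1e-9):
    """F[i][j]: VAF of variant j at time i; column 0 is the founder (all ones).
    Columns are assumed to be ordered by first appearance.
    Returns parent[j] for each clone, with -1 for the founder."""
    n_times, n = len(F), len(F[0])
    parent = [-1] + [None] * (n - 1)
    resid = [list(row) for row in F]   # frequency left to each clone after removing its children
    for j in range(1, n):
        t = next(i for i in range(n_times) if F[i][j] > 0)   # first time the variant is seen
        feasible = [p for p in range(j)
                    if all(resid[i][p] - F[i][j] >= -tol for i in range(n_times))]
        if not feasible:
            raise ValueError(f"no feasible parent for clone {j}")
        best = max(feasible, key=lambda p: resid[t][p])      # locally most likely parent
        parent[j] = best
        for i in range(n_times):
            resid[i][best] -= F[i][j]
    return parent

F = [[1.0, 0.0, 0.0],
     [1.0, 0.4, 0.0],
     [1.0, 0.6, 0.5]]
tree = greedy_tree(F)
```

In the toy matrix, variant 2 reaches 0.5 at the last time point while the founder has only 0.4 left after removing clone 1, so clone 2 can only descend from clone 1.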
- Addressing sparse time course sequencing data.
- In practice, because of the often scattered genomic sequencing conducted in a time course, we may observe many mutations at the same time point.
- To reduce the computational complexity of the exhaustive permutation search (EP) algorithm, we propose a heuristic algorithm (for details see Algorithm 4) using a greedy approach as follows.
- In the greedy permutation search algorithm, we only search $\sum_{i=1}^{m} n_i!$ candidate matrices in the best case, but $\prod_{i=1}^{m} n_i!$ candidates in the worst case.
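To make the size of the permutation search concrete, the sketch below counts orderings when mutations first observed at the same time point form a group whose internal order is unknown. Ordering all groups jointly gives the product of the group factorials, while ordering one group at a time gives their sum. The grouping and the names are illustrative assumptions.

```python
from itertools import permutations, product
from math import factorial, prod

def count_joint(groups):
    """Number of orderings when every group is permuted jointly with the others."""
    return prod(factorial(len(g)) for g in groups)

def count_sequential(groups):
    """Number of orderings examined when each group is resolved independently, one at a time."""
    return sum(factorial(len(g)) for g in groups)

def joint_orderings(groups):
    """Yield every full mutation order that respects the order of the time points."""
    for combo in product(*(permutations(g) for g in groups)):
        yield [v for group in combo for v in group]

# Hypothetical groups: 1, 2 and 3 new mutations at three successive time points.
groups = [["a"], ["b", "c"], ["d", "e", "f"]]
n_joint = count_joint(groups)
n_seq = count_sequential(groups)
```

The gap between the two counts grows quickly with the group sizes, which is why sparse time courses make the permutation search expensive.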
- Sometimes we have additional information from the experiment when some randomly selected clones are sequenced during or at the end of the experiment.
- The haplotypes of these clones can be used to improve the search algorithm by enforcing that the clonal tree is consistent with the sequenced clones.
- That is, when we know the haplotype of at least one clone that contains variant v, its parent can only be one of the other variants found in all the sequenced clones that contain v.
- all variants $u \in A_i \cap A_j$ must appear before v in the path from the root node to v in the clonal tree.
- This constraint will make sure that the variants in the symmetric difference of two clones branch off after the common ancestral path is formed by the variants in the intersection of the two clones.
- 2: n ← Number of columns (mutations) in F.
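A simple way to test whether a candidate tree agrees with sequenced clones is to check that each sequenced haplotype equals the variant set on some root-to-node path. This sketch assumes one variant per node (identified by the node index), that every sequenced variant is present in the tree, and a hypothetical `parent` list encoding; it is a check, not the constrained search itself.

```python
def path_variants(parent, node):
    """Set of variant-carrying nodes on the path from the root to `node` (founder excluded)."""
    path = set()
    while parent[node] != -1:
        path.add(node)
        node = parent[node]
    return frozenset(path)

def consistent(parent, sequenced):
    """True if every sequenced haplotype matches the path of some node in the tree."""
    paths = {path_variants(parent, v) for v in range(1, len(parent))}
    return all(frozenset(hap) in paths for hap in sequenced)

parent = [-1, 0, 1, 0]                       # variants 1 and 3 branch from the founder, 2 follows 1
ok = consistent(parent, [{1, 2}, {3}])       # both haplotypes are root-to-node paths
bad = consistent(parent, [{2, 3}])           # variants 2 and 3 lie on different branches
```

A search procedure could call such a check to discard candidate parents that would break a known haplotype.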
- Metagenome sequencing data from an evolving E.
- and eight clones were isolated and sequenced separately at the end of the experiment.
- Then we called variant sites where all of the following conditions were satisfied: the VAF (approximated by the ratio of the number of reads supporting the variant allele to the sum of the numbers of reads supporting the reference and variant alleles) was above 0.05, the sum of the numbers of reads supporting the variant and reference alleles was above 10, and the number of variant reads was above 6.
- The VAFs were input to our algorithms to predict the clonal tree and the clonal frequencies, which were then visualized using Muller plots created with the R library ggmuller [36].
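The three calling conditions can be written as a small filter. The thresholds come from the text; the function name, argument names and the reading of "above" as a strict inequality are assumptions.

```python
def passes_filter(ref_reads, var_reads, min_vaf=0.05, min_depth=10, min_var_reads=6):
    """Return True if a site meets all three calling conditions."""
    depth = ref_reads + var_reads
    if depth == 0:                 # no coverage: the VAF is undefined
        return False
    vaf = var_reads / depth        # VAF approximated from read counts
    return vaf > min_vaf and depth > min_depth and var_reads > min_var_reads

called = passes_filter(ref_reads=93, var_reads=7)        # VAF 0.07, depth 100, 7 variant reads
too_few = passes_filter(ref_reads=95, var_reads=5)       # only 5 variant reads
shallow = passes_filter(ref_reads=3, var_reads=7)        # depth 10 is not above 10
low_vaf = passes_filter(ref_reads=193, var_reads=7)      # VAF 0.035
```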
- PL represents the likelihood score of the predicted clonal tree, and TL stands for the likelihood score of the true tree.
- We compare the prediction performance of combinations of the two tree search algorithms (ET and GT) and the two permutation search algorithms (EP and GP), with two algorithms designed for tumor clonal reconstruction, AncesTree [16] and CITUP [21] on simulated data.
- The simulation procedure follows the clonal model described in methods.
- It starts with a founder clone and then at each new time point a new clone is introduced whose parent is chosen by random sampling from existing clones based on their frequencies in the population.
- The clonal frequencies are modified following a.
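The parent-sampling step of the simulation can be sketched as follows. How the clonal frequencies are updated is cut off in this excerpt, so the update below (the new clone takes a fixed share of its parent's frequency) is only a placeholder, and `new_share`, `seed` and the function names are hypothetical.

```python
import random

def sample_parent(freqs, rng):
    """Pick an existing clone with probability proportional to its current frequency."""
    return rng.choices(range(len(freqs)), weights=freqs, k=1)[0]

def simulate_parents(n_clones, seed=0, new_share=0.1):
    """Grow a clonal tree one clone per time point, starting from the founder."""
    rng = random.Random(seed)
    parent = [-1]                  # the founder has no parent
    freqs = [1.0]                  # the founder starts at frequency 1
    for _ in range(1, n_clones):
        p = sample_parent(freqs, rng)
        parent.append(p)
        # Placeholder update, NOT the source's rule: move a fixed share from parent to child.
        moved = freqs[p] * new_share
        freqs[p] -= moved
        freqs.append(moved)
    return parent, freqs

parent, freqs = simulate_parents(5)
```

Abundant clones are more likely to be chosen as parents, which mirrors the assumption that mutations arise in proportion to clone frequency.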
- Effect of number of clones.
- Figure 2 shows the distribution of recall (the proportion of clones correctly reconstructed by the algorithm), the running time in log scale and the log likelihood ratio of the predicted likelihood score over the true likelihood score.
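The two accuracy measures can be computed as in this sketch, which assumes that a clone counts as correctly reconstructed when its predicted variant set exactly matches a true haplotype; the function names and toy values are illustrative.

```python
import math

def recall(true_haps, pred_haps):
    """Fraction of true clonal haplotypes that appear exactly among the predictions."""
    true_set = {frozenset(h) for h in true_haps}
    pred_set = {frozenset(h) for h in pred_haps}
    return len(true_set & pred_set) / len(true_set)

def log_likelihood_ratio(pl, tl):
    """log(PL / TL): positive when the predicted tree scores higher than the true tree."""
    return math.log(pl / tl)

true_haps = [{"A"}, {"A", "B"}, {"C"}]       # hypothetical variant names
pred_haps = [{"A"}, {"A", "C"}, {"C"}]
r = recall(true_haps, pred_haps)
llr = log_likelihood_ratio(pl=0.02, tl=0.01)
```

A ratio near zero means the predicted tree is about as likely as the true one under the model, even when the recall is below 1.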
- As the number of clones increases, the likelihood scores returned by EP-GT deviate much further from the likelihood score of the true tree than those returned by GP-GT, implying that the greedy heuristic not only helps in improving the speed but also reduces error in clonal reconstruction as the number of sequenced clones increases.
- As the number of time points increases, the number of unordered mutation groups decreases, which in turn reduces the size of the search space for the permutation search.
- To test the effectiveness of constrained search given a set of sequenced clones we used the simulations generated earlier with 20 clones and compared the performance of the constrained search version of GP-GT algorithm by.
- Analysis of metagenome sequencing data from the E.
- We used the metagenome sequencing data obtained from an LTEE study of an E.
- When we did not allow any negative values in the clonal frequency matrix C, our algorithm did not return any valid solution.
- Figure 4d shows the clonal tree obtained when the known clones were not given as constraints.
- As shown in the simulation experiments, the accuracy of clonal reconstruction can be improved by including more time points, or by sequencing more clones.
- a Clonal tree obtained by constrained search with GP-GT on the metagenome sequencing data.
- The variant locus and the gene where it is located are shown in the column header.
- The colors correspond to those in the clonal tree.
- In this paper, we presented a maximum likelihood framework and a series of greedy-based heuristic algorithms to reconstruct the clonal haplotypes in a bacterial population from metagenome sequencing data obtained in a time course.
- The results of the simulation experiments showed that, although the clones reconstructed by our algorithms are not identical to the real ones used in the simulation, they are highly similar, and more importantly, the likelihood computed on the reconstructed clones is comparable with (often higher than) the likelihood of the real ones, which implies that our algorithms achieved practically plausible optimal solutions under the maximum likelihood framework.
- In particular, by sequencing more clones, not only can the haplotypes of more clones be directly derived, but these derived haplotypes can also impose additional constraints on the unknown (minor) haplotypes and thus improve the clonal reconstruction.
- The next step after the clonal reconstruction is to identify the clones under selection during the course of evolution based on their frequencies.
- coli strains, in which hundreds of mutations occurred [4], to characterize the evolutionary dynamics of the complex population.
- coli), it may have other applications such as in cancer genomics as described in the introduction.
- In addition, the metagenome sequencing approach was commonly adopted to study microbial communities containing hundreds of bacterial species, e.g., the human microbiome [37, 38] and the microbiome from natural habitats [39].
- Recently, sequencing data acquired from the same microbial community at multiple time points have become available [40, 41].
- The computational approaches presented here can also be applied to these data, which will enable haplotype reconstruction of bacterial genomes and may reveal concerted evolution among bacterial species in the community.
- Interestingly, in the applications to both cancer genomics and microbiome studies, clonal sequencing can be obtained through single cell sequencing, where our algorithm incorporating the clonal sequencing data can be directly applied.
- The full contents of the supplement are available online at https://bmcgenomics..
- Onconem: inferring tumor evolution from single-cell sequencing data.
- The clonal evolution of tumor cell populations.
- Structure, function and diversity of the healthy human microbiome.
- Dynamics of the human gut microbiome in inflammatory bowel disease.
