Impact of homologous recombination on core genome phylogenies


- Background: Core genome phylogenies are widely used to build the evolutionary history of individual prokaryote species.
- Few attempts have been made to evaluate the robustness of core genome phylogenies to recombination, but some analyses suggest that reconstructed trees are not always accurate.
- Results: In this study, we tested the robustness of core genome phylogenies to various levels of recombination rates.
- By analyzing simulated and empirical data, we observed that core genome phylogenies are relatively robust to recombination rates.
- We found that some core genome phylogenies are highly robust to recombination whereas others are strongly impacted by it, and we identified that the robustness of core genome phylogenies to recombination is highly linked to the levels of selective pressures acting on a species.
- Conclusions: Overall, these results have important implications for the application of core genome phylogenies in prokaryotes.
- Keywords: Phylogeny, Recombination, Prokaryotes, Core genome.
- The core genome (i.e., the set of genes shared by all the strains of the species) is usually used to reconstruct the tree of intraspecies relationships.
- At the other end of the spectrum, prokaryotic species exchanging DNA at a high rate are expected to yield poorly resolved trees, but the amount of recombination that a core genome can withstand while preserving true phylogenetic signal has not been investigated in depth.
- Recently, a study has shown that almost all individual sites with phylogenetic signal were in disagreement with the core genome phylogeny of Escherichia coli [35].
- These conclusions and other studies have important ramifications for the application of core genome phylogenies, which are widely applied to prokaryotes.
- However, two questions remain unanswered: i) what amount of recombination can a given core genome withstand without yielding artifactual trees? ii) what factors contribute to the robustness or sensitivity of phylogenetic trees to recombination?
- In this study, we demonstrate that even when the near totality of individual sites of the core genome are incongruent with the core genome phylogeny, the true topology can still be retrieved from the dataset.
- This surprising result can be explained by the fact that individual sites that are incongruent with the topology can still retain some of the true phylogenetic signal.
- We further use in silico simulations and empirical core genome datasets to estimate to what extent homologous recombination impacts core genome phylogenies.
- These results suggest that the combined effect of recombination and selection is affecting the reconstruction of core genome phylogenies in prokaryotes.
- True phylogenies can be inferred from incongruent sites.
- Several studies have demonstrated that gene trees are often incongruent with core genome phylogenies, opening the possibility that core genome phylogenies might correspond to artifacts.
- It has been observed that most sites in the core genome can be incongruent with the overall tree topology and this has been considered as evidence that the reconstructed trees are not representative of the true evolutionary history of the strains [35].
- We first tested whether true phylogenetic trees can be recovered when nearly all sites of the core genome are inconsistent with the overall tree topology.
- We chose the tree and the parameters of the core genome of Acinetobacter pittii to conduct the simulations because the average bootstrap supports of this tree were closest to the average bootstrap support of our set of trees (average bootstrap support of 89%, see below).
- The real tree topology and branch lengths were used to simulate the evolution of the core genome evolving clonally (i.e.
- As expected, building the phylogenetic tree on the simulated dataset results in the same tree that was used to simulate the evolution of the core genome.
- We then used the core genome of the clonal simulation to generate a new core genome alignment while introducing exactly one random recombination event at each polymorphic site with phylogenetic signal (i.e.
- A single node was incongruent in the two trees, but this node was one of the two unresolved nodes in the clonal and the true phylogeny.
- These results indicate that even when nearly all informative alleles are incongruent with the true phylogeny, it is still possible to retrieve the true phylogeny of the core genome.
- It is frequently observed that most gene trees are incongruent with the core genome phylogeny [39], but it does not necessarily imply that the true phylogeny cannot be recovered.
- As a consequence, each site still retains most of the phylogenetic signal: the strains grouped together by each site of the core genome were indeed more related to each other, except for the one strain that was re-assigned.
- Combining the signal of all sites together allowed for the retrieval of the true phylogenetic signal of the core genome.
- This result demonstrates that it is theoretically possible to recover correct phylogenetic signals even when all sites are incongruent with the true phylogeny of the core genome.
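The reasoning above can be illustrated with a toy example. The sketch below is not the simulation used in the study: it assumes a hypothetical six-strain dataset with two clades, encodes each informative site as a list of 0/1 alleles, and mimics one recombination event per site by flipping the allele of a single randomly chosen strain. Every site then contradicts the true split, yet pairs of strains from the same clade still share alleles far more often than pairs from different clades.

```python
import random
from itertools import combinations

def recombine_site(site, rng):
    """Return a copy of a biallelic site with the allele of one random strain flipped."""
    hit = rng.randrange(len(site))
    new_site = list(site)
    new_site[hit] = 1 - new_site[hit]
    return new_site

def pairwise_agreement(sites):
    """For every pair of strains, count the sites at which both carry the same allele."""
    n_strains = len(sites[0])
    counts = {pair: 0 for pair in combinations(range(n_strains), 2)}
    for site in sites:
        for i, j in counts:
            if site[i] == site[j]:
                counts[(i, j)] += 1
    return counts

rng = random.Random(42)

# Hypothetical true signal: strains 0-2 form one clade, strains 3-5 the other.
clonal_site = [0, 0, 0, 1, 1, 1]
recombined = [recombine_site(clonal_site, rng) for _ in range(300)]

# Each recombined site differs from the true split at exactly one strain.
n_incongruent = sum(site != clonal_site for site in recombined)

# Combine the signal of all sites: same-clade pairs versus different-clade pairs.
agreement = pairwise_agreement(recombined)
within = [agreement[p] for p in agreement if (p[0] < 3) == (p[1] < 3)]
between = [agreement[p] for p in agreement if (p[0] < 3) != (p[1] < 3)]
print(n_incongruent, min(within), max(between))
```

In this toy setting a simple pairwise count stands in for the tree-building step; the study itself relies on phylogenetic reconstruction, not on this shortcut.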
- a Phylogenetic tree built on the clonal simulation of the core genome of A.
- b Phylogenetic tree obtained from the simulated core genome of A.
- Each site of the clonal core genome with phylogenetic signal was exposed to exactly one recombination event.
- As a result, nearly all informative sites (96.4%) are incongruent with the true topology of the tree.
- The maximum likelihood tree and the core genome of each species were used to infer the simulation parameters:
- For each simulation, ρ was kept constant relative to the substitution rate across the branches of the tree.
- The donor and recipient genomes were randomly chosen between branches of the tree that overlapped in time.
- The size of each recombinant fragment was drawn from a geometric distribution of mean 100 bp, and all the sites of the core genome had the same probability of recombining.
- Following this procedure, we simulated 100 core genome alignments with recombination rates varying from ρ = 0 to ρ = 10 (with a step of 0.1) for each species.
- For each simulation, the phylogenetic tree was built from each simulated core genome, and the topologies of the simulated trees were compared to the true topology of the tree.
- the percent of internal nodes that are composed of the same sets of strains).
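A score of this kind, the percent of internal nodes composed of the same sets of strains, can be sketched as follows. This is an illustration under stated assumptions rather than the scoring code of the study: trees are represented as hypothetical nested tuples of strain names, they are treated as rooted, and the root node (which trivially contains every strain) is excluded from the count.

```python
def clades(tree):
    """Collect the set of strains found under every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a leaf is a strain name
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def topology_score(true_tree, test_tree):
    """Percent of non-root internal nodes of true_tree that are also present in test_tree."""
    true_clades = clades(true_tree)
    test_clades = clades(test_tree)
    root = max(true_clades, key=len)       # the root clade holds all strains
    informative = true_clades - {root}
    if not informative:
        return 100.0
    return 100.0 * len(informative & test_clades) / len(informative)

# Hypothetical five-strain trees that differ in the placement of strains B and C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
test_tree = ((("A", "C"), "B"), ("D", "E"))
print(topology_score(true_tree, test_tree))
```

Comparing bipartitions instead of rooted clades would be the natural variant for unrooted trees.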
- not completely erode the phylogenetic signal of the core genomes.
- Impact of recombination rate on bootstrap supports.
- We tested to what extent recombination rates impact the bootstrap supports of the simulated trees.
- We evolved the core genome of 100 species with various levels of recombination rates.
- we computed the bootstrap supports of the maximum likelihood trees for the simulated core genomes of A..
- Many of the simulated trees presented average bootstrap supports higher than the average bootstrap support observed in the real tree (89, Fig.
- In addition, we did not observe a significant correlation between the average bootstrap support of the simulated trees and their topology score relative to the real tree (Fig.
- These results indicate that recombination can result in incorrect trees presenting high bootstrap supports and that bootstrap supports therefore provide little information about the accuracy of the tree.
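The average bootstrap support used as a summary statistic here can be computed directly from a tree file. The sketch below assumes that support values are stored as internal node labels in a Newick string (for example `)95:0.01`); the function name and the toy tree are hypothetical and are not taken from the study's datasets.

```python
import re

def average_bootstrap(newick):
    """Average the support values written as internal node labels of a Newick string."""
    # A support value is assumed to be the number that directly follows a closing parenthesis.
    supports = [float(value) for value in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]
    if not supports:
        raise ValueError("no support values found in the Newick string")
    return sum(supports) / len(supports)

# Hypothetical tree with two supported internal nodes.
toy_tree = "((A:0.1,B:0.1)100:0.05,(C:0.1,D:0.1)80:0.05,E:0.2);"
print(average_bootstrap(toy_tree))
```

As the results above indicate, a high value of this statistic does not by itself show that the topology is correct.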
- Impact of phylogenetic methods on core genome phylogenies.
- We further tested which phylogenetic methods were best suited to reconstruct core genome phylogenies in the presence of recombination.
- We asked whether core genome phylogenies are best inferred by maximum likelihood (ML) methods or distance approaches using RAxML and BIONJ, respectively.
- We simulated core genome evolution using parameters of A.
- Each tree topology was then compared to the topology of the real tree and the tree topology score was computed as the number of shared nodes between the trees (Fig.
- We first hypothesized that these disparities were due to different levels of resolution across our dataset of real trees, since it is expected that poorly resolved trees are less likely to match the topology of the simulated trees.
- This indicates that many of the simulated trees were sensitive to recombination because the real tree was not robustly resolved.
- Many simulated core genomes yielded inaccurate trees due to the poor resolution of the original tree.
- a Average bootstrap supports of the trees inferred with the core genomes of A.
- The blue solid line represents the average bootstrap support of the real tree of A.
- b Relationship between the average bootstrap supports of the trees of A.
- Our results indicate that even for well-supported trees (average bootstrap ≥90), less than 50% of the nodes are correctly inferred for r/.
- We further investigated the potential factors affecting the topology of the trees.
- We observed that two parameters were significantly associated with the overall quality of the trees: the average branch length of the trees (Fig.
- In theory, because the r/m metric of recombination rate is expressed relative to the substitution rates along the branches of the trees, the impact of recombination on tree topology should be independent from the overall branch length of the trees.
- For this reason, we hypothesize that differential selective pressures are most likely to be responsible for the uneven robustness of core genome phylogenies to recombination.
- Fig. 4 Impact of phylogenetic method on tree topology inference from core genomes simulated with different levels of recombination.
- Fig. 5 Correlation between average bootstrap values of the real tree and the average tree topology score.
- Estimating the overall accuracy of core genome phylogenies.
- of the nodes of the reconstructed trees match the real tree topology.
- Under these approximate numbers, this would indicate that core genome phylogenies are generally correct for most prokaryote species, albeit not completely accurate.
- Overall, this study revealed that core genome phylogenies can be very robust to high levels of recombination.
- a Correlation between average branch lengths of the tree and tree topology scores.
- A summary of the core genomes used in this study is given in Table S1.
- The core genome concatenates are available in Dataset S1 and can be freely downloaded from https://www.kaggle.com/louismariebobay/core-genomes.
- For each of the 100 datasets in this study, we reconstructed the phylogenetic tree with a maximum likelihood approach using RAxML v8 [45] under a Gamma + GTR (General Time Reversible) model [46].
- For each core genome alignment, 100 true bootstrap replicates were generated using the same program and the same parameters.
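The exact command lines are not given in the text. As a hedged sketch, a standard bootstrap run with RAxML v8 under GTR + Gamma could be assembled as below; the executable name `raxmlHPC`, the random seed, and the file and run names are assumptions, and reading "true bootstrap" as the standard (non-rapid) bootstrap selected with `-b` is also an assumption.

```python
def raxml_bootstrap_command(alignment, run_name, replicates=100, seed=12345):
    """Build an argument list for a standard RAxML v8 bootstrap run (assumed settings)."""
    return [
        "raxmlHPC",            # assumed executable name; builds such as raxmlHPC-PTHREADS also exist
        "-s", alignment,       # input alignment
        "-n", run_name,        # suffix for the output files
        "-m", "GTRGAMMA",      # GTR substitution model with Gamma rate heterogeneity
        "-p", str(seed),       # seed for the parsimony starting trees
        "-b", str(seed),       # seed that switches on standard bootstrapping
        "-#", str(replicates), # number of bootstrap replicates
    ]

# Hypothetical file and run names; the list can be passed to subprocess.run on a machine with RAxML.
command = raxml_bootstrap_command("core_genome.phy", "bootstrap_run")
print(" ".join(command))
```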
- The phylogenetic trees are available in Dataset S2 in Newick format and can be freely downloaded from https://www.kaggle.com/louismariebobay/core-genome.
- We conducted independent sets of simulations for each of the 100 core genome datasets using CoreSimul [40].
- GC-content) of the real core genome for each species.
- The generated alignment was then evolved in silico with a branching process following the core genome tree of each species.
- The number of substitutions m was inferred from each branch of the real tree, and, for each branch of the simulation, the number of substitutions introduced in the simulated alignment was drawn from a Poisson distribution of mean m.
- The spectrum of substitutions was set with a transition/transversion ratio kappa estimated from the real core genome of each species.
- In addition, the different positions across codons were evolved with different relative substitution rates, while maintaining the overall substitution rate estimated from each branch length of the tree.
- For each branch of the tree, the number of recombination events is drawn from a Poisson distribution of mean ρ·m.
- Recombination tracts are placed randomly in the sequence, and the length of each recombination tract is drawn from a geometric distribution of mean δ = 100 bp.
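The sampling scheme described above (Poisson counts of mean m and ρ·m, geometric tract lengths of mean δ) can be sketched with the standard library alone. This is a minimal illustration and not the CoreSimul implementation: the function names are hypothetical, the Poisson sampler is Knuth's simple algorithm and only suits small means, and truncating tracts at the end of the alignment is an assumption.

```python
import math
import random

def poisson(mean, rng):
    """Draw from a Poisson distribution with Knuth's algorithm (small means only)."""
    if mean > 500:
        raise ValueError("this simple sampler is not suited to large means")
    threshold = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def geometric(mean, rng):
    """Draw a length >= 1 from a geometric distribution with the given mean."""
    p = 1.0 / mean
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()                 # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p)) + 1

def simulate_branch_events(m, rho, delta, alignment_length, rng):
    """Sample the events of one branch: substitution count, recombination count, tracts."""
    n_substitutions = poisson(m, rng)
    n_recombinations = poisson(rho * m, rng)
    tracts = []
    for _ in range(n_recombinations):
        start = rng.randrange(alignment_length)
        # Assumption: a tract that runs past the end of the alignment is truncated.
        end = min(start + geometric(delta, rng), alignment_length)
        tracts.append((start, end))
    return {
        "n_substitutions": n_substitutions,
        "n_recombinations": n_recombinations,
        "tracts": tracts,
    }

rng = random.Random(1)
events = simulate_branch_events(m=20, rho=0.5, delta=100, alignment_length=10000, rng=rng)
print(events["n_substitutions"], events["n_recombinations"])
```

Because the recombination count has mean ρ·m, setting ρ = 0 yields a purely clonal branch.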
- between contemporary branches of the tree).
- Mutations and recombination events are introduced in a randomized order for all branches of the tree overlapping in time.
- Concretely, all the branches of the tree are divided into segments of branches overlapping in time.
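One way to picture this division is to cut the time axis at every branch start and end, then list the branches that span each resulting slice. The sketch below assumes a hypothetical representation in which each branch is named after its child node and carries a start and end time; it is an illustration, not the procedure implemented in CoreSimul.

```python
def time_segments(branches):
    """Split branches into time slices and report which branches overlap in each slice.

    branches maps a branch name to a (start_time, end_time) pair.
    Returns a list of (slice_start, slice_end, sorted_branch_names).
    """
    breakpoints = sorted({t for span in branches.values() for t in span})
    segments = []
    for t0, t1 in zip(breakpoints, breakpoints[1:]):
        active = sorted(
            name for name, (start, end) in branches.items()
            if start <= t0 and end >= t1
        )
        if active:
            segments.append((t0, t1, active))
    return segments

# Hypothetical three-strain tree: the ancestor of A and B splits at time 1.0.
branches = {
    "anc_AB": (0.0, 1.0),
    "C": (0.0, 3.0),
    "A": (1.0, 3.0),
    "B": (1.0, 3.0),
}
for segment in time_segments(branches):
    print(segment)
```

Within each slice, donors and recipients can then be picked among the branches listed as active, which guarantees that they are contemporary.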
- The simulated genomes were then used to build phylogenetic trees (see above) and the topology of each of these trees was compared to the topology of the tree built on the actual core genome of the corresponding species.
- We evolved the core genome of 100 species to various levels of recombination rates.
- For each species, the average bootstrap value of the true phylogeny and the number of genomes (n) are indicated on top.
- A Average bootstrap supports of the trees inferred with the core genomes of A.
- Summary of the core genomes.
- DEB-1831730 and by the National Institute Of General Medical Sciences of the National Institutes of Health under Award Number R01GM132137.
- The funding body had no role in the design of the study, collection, analysis and interpretation of data and in writing of the manuscript.
- From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria.
- Molecular evolution of the Escherichia coli chromosome.
- Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites.
- BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data.
