Modification of the genome topology network and its application to the comparison of group B Streptococcus genomes


- A bootstrap test is performed to verify the credibility of the clades, allowing users to adjust the relationships of the clades accordingly.
- 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0).
- 3 Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai 201203, China. Full list of author information is available at the end of the article.
- The majority of the methods for phylogenetic analysis are based on nucleotide sequence alignment and SNP analysis.
- studied the structure of the tryptophan operon in different bacterial genomes and found that the order of homologous genes could be classified.
- We previously developed the first version of the genome topology network (GTN) [14], which is a new approach for studying closely related bacterial genomes by analysing gene order in complete genomes.
- The primary function of the first GTN version is to provide a phylogenetic tree on the basis of an evolutionary distance matrix calculated using the formula provided in Fig.
- Fig. 1 Workflow of the new GTN version.
- Then, the GTN assigns genes to different gene families and calculates the relative DD value and the evolutionary distance on the basis of the gene family assignment.
- N in f2 represents the number of genes in the gene family.
- Details of the introduction of the distance calculation in GTN.
- In formula f1, which is cited from our first version of the GTN [14], D(G1, G2) represents the distance between Genome1 and Genome2, N represents the total number of orthologues (COG families in this paper, or ‘nodes’ in the first version of the GTN), C_i represents the number of orthologues adjacent to ortho_i (gene connections in this paper, or ‘edges’ in the first version of the GTN) in both Genome1 and Genome2, and τ_i^1 and τ_i^2 represent the number of orthologues adjacent to ortho_i in Genome1 and Genome2, respectively.
- Details of the introduction of different degree values. The different degree (DD) values of a gene family provided by the GTN represent the tendency of the gene family to change its connections.
- Because of the variation in its structural genomic framework, GBS was one of the first species to be studied in the fields of pan-genomics and comparative genomics [27].
- In this study, 51 published GBS genomes from the NCBI database, including 28 complete genomes and 23 draft genomes, were collected for the modified GTN analysis and used to study phylogenetics at the gene and gene family levels as a demonstration of the new analytical approach.
- The difference between Roary and this version of the GTN is that Roary only aligns the protein sequences from the genomes to themselves, while the GTN additionally aligns the GBS proteins to the COG database (approximately 190,000 protein sequences).
- This is the main factor responsible for the lower performance of the GTN.
- The first version of the GTN only aligns protein sequences to COGs by using BLASTP to perform gene family assignment, so it requires less time to run (Tab.
- However, the resolution of the phylogenetic tree was inevitably lower than that obtained from the BLAST+MCL method.
- Fig. 2 Example of the topology networks of two genomes.
- (a) Location information for genes in the assumed genome A on the basis of the GFF file.
- (b) Location information for genes in the assumed genome B on the basis of the GFF file.
- and the relative DD values of gene families, which are used to evaluate the connection-changing tendency of the genes in gene families.
- The average length of the common synteny blocks of 51 genomes was 1,427.7 KB.
- After discarding each genome individually, the average length of the common synteny blocks of the remaining 50 genomes ranged from 1,428.2 KB to 1,449.5 KB (Additional file 2: Figure S1).
- There were sharp increases in the average length of the common synteny blocks by and 14.9 KB after removing genomes GB00411, SA20–06, ATCC_13813, and GBS10, respectively, whereas when each of the other 46 genomes was removed, the average length of the common synteny blocks only increased by 0.59–9.01 KB (Additional file 2: Figure S1).
- Therefore, we set the threshold at 1% of the common synteny block length and eliminated these four genomes.
- The average proportion of the COG-annotated genes in all genomes was 72.9%.
- Two genomes, GBS10 and MC632, showed considerably lower COG-annotated gene proportions (62.4% and 63.4%, respectively) than the others, as shown in Additional file 2: Figure S2.
- (98.7% on average) of the genes were assigned to different gene orthologues by using orthoMCL software, and 56.1– on average) of the genes were COG annotated by using the MCL algorithm in the GTN program (Additional file 2: Figure S3, Additional file 1: Table S1).
- Among the 46 complete and draft GBS genomes with proper completeness on average) of the genes were located in the common synteny blocks, and on average) of the genes that were located were COG annotated by the GTN program (Additional file 2: Figure S4, Additional file 1: Table S2).
- Phylogenetic analysis of GBS genomes on the basis of the GTN.
- Similar phylogenetic trees were obtained from the phylogenetic analysis of the complete genome group on the basis of the COG (Fig. 3) and orthoMCL assignment results (Additional file 2: Figure S5), which both suggested that most of the 27 complete genomes of GBS may belong to six groups.
- The bootstrap values of the orthoMCL-based phylogenetic tree were higher than those of the COG-based tree.
- the GTN (Additional file 2: Figure S6).
- The resulting constitutions of the tree were chaotic, indicating that phylogeny cannot be inferred from an incorrect gene order.
- The GTN phylogenetic tree of the group of 46 complete and draft genomes that was constructed with the COG gene family assignments confirmed the compatibility of the complete genomes with the draft genomes.
- Most of the constitutions of the clades in the phylogenetic tree (Additional file 2: Figure S8) obtained on the basis of SNPs by using the panX, MAFFT, and RAxML tools, were.
- 1, the theoretical basis of the GTN is that different gene orders in genomes affect the differentiation within a phylogenetic tree.
- As a demonstration of the methodology, we extracted the genes at the unique connections in six.
- Fig. 3 COG-based phylogenetic tree of the complete genome group.
- The number following the six main clades is the number of genes at unique node connections that can be found in all genomes of the clade.
- in a cross is the length (KB) of the pieces that are connected based on the common node connections in the genomes of the clade.
- At each bifurcation in the COG-based tree, the black number represents the bootstrap evaluation value of the clade, while the first red number represents the average genomic fragment length (in KB) linked by the genes involved in the gene connection relationship.
- This relationship was shared by all the genomes of the clade according to the GFF file.
- These fragments can also be considered to be the common ancestor of the genomes.
- The second red number represents the average number of these fragments in each genome of the clade.
- The total lengths of the fragments in GBS_ST-1 and SS1 were 2071.4 and 2076.4 KB, respectively.
- We marked the average of the total length in one genome and the number of fragments (i.e., 2073|8 in the phylogenetic tree in Fig.
- We calculated COG function statistics for these genes and found that, except for the “[S] function unknown” and “[R] general function prediction only” categories, the proportions of the genes were highest in the “[L] replication, recombination, and repair” and “[G] carbohydrate transport and metabolism”.
- Table 2 COG functional classification of genes at the unique node connections of the six main clades.
- The genes at the unique node connections of the six GBS clades were classified into COG functional categories (Tab.
- We additionally used our own Perl scripts to extract all gene connections in each of the six clades to compare the connections with their parallel clades in determining.
- Table 3 Pathway enrichment of the genes at the unique node connections of the six main clades.
- One of the improvements of our new GTN method in comparison with SNP analysis-based tools is that the GTN can provide detailed information on the genes at unique node connections, which are specifically responsible for the differences between genomes in the phylogenetic tree.
- different, the general structures of the three trees were similar.
- Most of the strains were at the same locations, with some exceptions.
- Another potential advantage of the GTN method is that a large number of genomes can be included in the phylogenetic analysis, while the single-copy genes used under the SNP method may not be sufficient to provide a high resolution.
- There were 222 single-copy core genes among the 27 GBS strains used in the phylogenetic analysis, while approximately 60% (average of 1156 genes for each strain) of the COG-annotated genes and more than 90% (average of 1893 genes for each strain) of the orthologous genes were used in COG-based and orthologue-based GTN trees, respectively.
- distinguished in the GTN-based trees, and we can directly determine which of these orthologous genes differ in order between the strains from the resulting calculations of the GTN.
- Some genes were more mobile in that their relative DDs were higher than those of the others.
- In this group, a total of 345 genes were assigned to 10 COG families related to mobile genetic elements (1.1% of the total COG gene number).
- It was reasonable that most of the transposase genes were not included in the common synteny blocks, which were assumed to be relatively highly conserved regions of the genome, and the relative DDs of the COG families declined as expected (Tab.
- We compared the results of the present study with those of our previous work on Mycobacterium tuberculosis [14] and found that the GTNs of the two species differed considerably.
- Other than the transposase gene families, only one COG family, the COG1309 transcriptional regulator family (including the TetR/AcrR family transcriptional regulators and the dihydroxyacetone kinase transcriptional activator), occurred in the tables of the COG families with high relative DDs in both GBS and M.
- Draft genome data can be included in the calculations of the new GTN.
- All of the data came from the same species.
- Here, if the size of the common synteny blocks was increased by >.
- In the analysis of the complete and draft genome groups, 51 genomes were filtered primarily in terms of the average common synteny block length of the other genomes after removing one genome.
- Unqualified genomes were filtered out when the sizes of the common synteny blocks of other genomes were increased by >.
- Since COGs are the basic units for gene order, the greater the number of COGs in a genome, the more accurate the calculations of the GTN will be.
- After genomic filtration, the protein sequences of the genomes were integrated with the COG protein database into two FASTA files.
- The resulting COG family was processed into clusters by using mcxdeblast from the MCL package (version 14–137, parameters: --m9 --line-mode=abc --score=r) and the MCL algorithm (version 14–137, parameters: --abc) on the basis of the self-alignment results.
- The cluster with only one COG family was selected and considered as the functional annotation of the COG family.
- OrthoMCL software (version 1.0, default parameters) was also used to obtain the orthologous families in the group of 27 complete genomes to evaluate the updated COG assignment function of the GTN tool.
- In this study, three phylogenetic trees were built on the basis of the four different assignment results, as follows:
- The function of the COG families was annotated.
- The functions of the gene families were unclear.
- 80) [34] was used to derive a consensus of the bootstrap results (nwk file) and to then draw phylogenetic trees on the basis of the nwk file.
- To compare the phylogenetic trees calculated by the GTN, we used panX (version 1.5.1, default parameters) to identify the single-copy core genes of the complete genome group.
- MEGA software was also used to derive a consensus of the bootstrap results and draw the phylogenetic tree.
- To optimize the running time of the GTN, we developed an alternative to the BLAST+MCL method for performing gene family assignment.
- For a clade with more than one genome, the node connections existing in all genomes of the clade are compared to the node connections existing in all genomes of the parallel clade.
- Since not all of the genes of GBS are recorded in the DAVID database (https://david.ncifcrf.gov/) [38].
- These functions were lacking in the GTN tool.
- COG functional classification of the genes at the unique node connections in GBS-M002 against clade F.
- The ‘average length (KB) in a genome’ represents the average length of the conserved genomic regions in each genome.
- Phylogenetic tree of the complete genome group based on the orthoMCL results.
- Phylogenetic tree of the complete and draft genome groups on the basis of the COG result.
- Phylogenetic tree of the complete genome group based on.
- ZY conceived of the study, participated in its design and coordination, and helped to draft the manuscript.
- The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- The mitochondrial genomes of the early land plants Treubia lacunosa and Anomodon rugelii: dynamic and conservative evolution.
- Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree.
- DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements
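
The genome filtration step described in the summary (removing each genome in turn and checking how much the average common synteny block length of the remaining genomes increases) can be sketched as follows. This is a minimal illustration, not the GTN implementation: the function name, the dictionary layout, and the two placeholder genomes are hypothetical, and it assumes that the 1% threshold is taken relative to the 51-genome average of 1,427.7 KB. Only the 14.9 KB increase for GBS10 and the 0.59–9.01 KB range for the retained genomes come from the text.

```python
def flag_outlier_genomes(baseline_kb, leave_one_out_kb, threshold_fraction=0.01):
    """Return genomes whose removal raises the average common synteny
    block length by more than threshold_fraction of the baseline.

    baseline_kb: average block length with all genomes included.
    leave_one_out_kb: maps a genome name to the average block length of
        the remaining genomes after that genome is removed.
    """
    cutoff_kb = baseline_kb * threshold_fraction
    flagged = {}
    for genome, avg_kb in leave_one_out_kb.items():
        increase_kb = avg_kb - baseline_kb
        if increase_kb > cutoff_kb:
            flagged[genome] = round(increase_kb, 2)
    return flagged


baseline = 1427.7  # KB, average over all 51 genomes (from the text)

leave_one_out = {
    "GBS10": baseline + 14.9,           # increase reported in the text
    "placeholder_high": baseline + 9.01,  # hypothetical genome, upper end of retained range
    "placeholder_low": baseline + 0.59,   # hypothetical genome, lower end of retained range
}

flagged = flag_outlier_genomes(baseline, leave_one_out)
print(flagged)
```

Under this reading the cutoff is about 14.3 KB, so the 14.9 KB increase is flagged while increases in the 0.59–9.01 KB range are not.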

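The comparison of node connections between parallel clades can be sketched in a similar way. The summary states that the authors used their own Perl scripts for this step; the Python below is only an illustrative reading of it. It assumes that each genome is given as an ordered list of gene family identifiers (as could be derived from a GFF file), that the chromosome is treated as circular, and that a "unique" connection is one shared by all genomes of a clade but not shared by all genomes of the parallel clade. All function names and the toy genomes are hypothetical.

```python
def node_connections(gene_order, circular=True):
    """Return the set of unordered pairs of adjacent gene families."""
    edges = set()
    n = len(gene_order)
    last = n if circular else n - 1
    for i in range(last):
        a, b = gene_order[i], gene_order[(i + 1) % n]
        if a != b:  # ignore a family adjacent to itself
            edges.add(frozenset((a, b)))
    return edges


def shared_connections(clade):
    """Connections present in every genome of a clade.

    clade: maps a genome name to its ordered list of gene families.
    """
    edge_sets = [node_connections(order) for order in clade.values()]
    return set.intersection(*edge_sets)


def unique_connections(clade, parallel_clade):
    """Connections shared within `clade` but not shared within `parallel_clade`."""
    return shared_connections(clade) - shared_connections(parallel_clade)


# Toy example with hypothetical gene families A-D.
clade_1 = {"genome_1": ["A", "B", "C", "D"], "genome_2": ["A", "B", "C", "D"]}
clade_2 = {"genome_3": ["A", "C", "B", "D"]}

unique = unique_connections(clade_1, clade_2)
print(sorted(sorted(edge) for edge in unique))
```

In the toy data the two clades differ by a swap of B and C, so the connections A–B and C–D are unique to the first clade, while B–C and D–A are common to both.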