
Revisiting avian ‘missing’ genes from de novo assembled transcripts



- It is essential to explore new reference transcripts from large-scale de novo assembled transcriptomes to recover the potential hidden genes in avian genomes.
- Results: We explored 196 high quality transcriptomic datasets from five bird species to reconstruct transcripts for the purpose of discovering potential hidden genes in the avian genomes.
- We constructed a relatively complete and high-quality bird transcript database (transcripts after quality control in five birds) from a large amount of avian transcriptomic data, and found that most of the presumed missing genes (83.2%) could be recovered in at least one bird species.
- The missing genes also have lower Ka/Ks values than average (genome-wide: Ka/Ks = 0.99.
- Among all presumed missing genes, there were 135 for which we did not find any meaningful orthologues in any of the five species studied.
- Conclusion: Insufficient reference genome quality is the major reason for wrongly inferring missing genes in birds.
- Those presumably missing genes often have a very strong tissue-specific expression pattern.
- Zhang et al. and Lovell et al. [4], using multiple genome comparisons, proposed that 640 and 274 protein-coding genes, respectively, were lost in the avian lineage.
- 1 National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
- Full list of author information is available at the end of the article.
- 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
- At the time, the newly released chicken genome, Galgal5, included around 1900 protein-coding genes not present in Galgal4, annotating some of the genes previously thought to be missing [11].
- Recent advances suggest that a considerable number of the presumed ‘missing genes’ are not really missing in the avian genome.
- To be able to directly address these conflicts, we need strong evidence to find these missing genes in multiple bird species.
- zebra finch, Taeniopygia guttata) to exhaustively search for the missing genes in birds, and to elucidate the effects of GC content, expression pattern, and assembled genome quality on gene loss studies.
- We demonstrate that de novo assembly of multiple transcriptomes from various tissues can rescue most missing genes in the absence of complete reference genomes, and that most presumed missing genes have a strong tissue-specific expression pattern.
- De novo transcriptome assembly and quality evaluation The analysis pipeline for discovering ‘missing genes’ is shown in Fig.
- To improve the accuracy of the alignment results and reduce problems caused by assembly errors, the ORF of each transcript sequence was predicted and extracted with TransDecoder (https://transdecoder.github.io) for each species.
- To ensure the accuracy of the downstream analysis, we performed a quality assessment of the assembled transcripts.
- We used the ortholog hit ratio (OHR) [15] to evaluate the integrity and richness of the transcripts.
- By comparing the constructed sequences with the known sequences in the database of the related species, we defined the OHR as the ratio of the best comparison result to the reference sequence.
- The OHR for each of the five species was calculated as the ratio of the length of the best CDS sequence to that of the known gene.
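The OHR definition above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function names and the `(gene_id, best_cds_length, reference_cds_length)` input layout are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Dict, Iterable, Tuple


def ortholog_hit_ratio(best_cds_length: int, reference_cds_length: int) -> float:
    """Length of the best assembled CDS divided by the length of the known gene's CDS."""
    if reference_cds_length <= 0:
        raise ValueError("reference CDS length must be positive")
    if best_cds_length < 0:
        raise ValueError("assembled CDS length cannot be negative")
    return best_cds_length / reference_cds_length


def best_ohr_per_gene(hits: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Keep the highest OHR seen for each reference gene.

    `hits` is a hypothetical iterable of
    (gene_id, assembled_cds_length, reference_cds_length) tuples.
    """
    best: Dict[str, float] = {}
    for gene_id, assembled_len, reference_len in hits:
        ohr = ortholog_hit_ratio(assembled_len, reference_len)
        if ohr > best.get(gene_id, -1.0):
            best[gene_id] = ohr
    return best
```

An OHR close to 1 would suggest that the assembled transcript covers nearly the whole known coding sequence; values well below 1 would point to fragmented assemblies.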
- (Additional file 1: Table S10), were used as the targets to test whether these presumably missing genes are really lost in birds.
- There are 274 missing genes in birds in the Lovell study and 640 genes in the Zhang results.
- We combined the two missing gene lists to obtain 806 candidate missing genes in birds.
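Merging the two published lists is a set union. The sketch below assumes that genes are matched by a normalised symbol, which the text does not state; the placeholder gene names are hypothetical. If the 806 candidates are a plain union of the 274-gene and 640-gene lists, the two lists would share 274 + 640 - 806 = 108 genes.

```python
from typing import Iterable, Set


def combine_gene_lists(*gene_lists: Iterable[str]) -> Set[str]:
    """Union of several gene lists, matching on upper-cased, stripped symbols (an assumption)."""
    combined: Set[str] = set()
    for genes in gene_lists:
        combined.update(g.strip().upper() for g in genes if g.strip())
    return combined


def implied_overlap(size_a: int, size_b: int, union_size: int) -> int:
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|."""
    return size_a + size_b - union_size


# Placeholder symbols only; these are not genes from the study.
candidates = combine_gene_lists(["GENE_A", "GENE_B"], ["gene_b", "GENE_C"])
shared = implied_overlap(274, 640, 806)
```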
- After obtaining the peptide sequences of these missing genes from human, we used these human genes as targets with which to search for homologous bird genes in our assembled transcripts.
- After obtaining the best sequence of each missing gene in birds, basic information such as length and GC content was calculated.
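The per-sequence GC content mentioned above can be computed as follows. This is a minimal sketch; ignoring ambiguous bases such as `N` in the denominator is an assumption, since the text does not say how they were handled.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    counted = gc + at
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / counted
```

For example, `gc_content("GGCCAATT")` gives 50.0, and a sequence's length is simply `len(sequence)`.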
- The visual map of the common linear region was made using the R package.
- In order to compare Ka/Ks values of missing genes with all annotated protein-coding genes in the chicken genome, we used chicken-human orthologues as references.
- Fig. 1 Analysis pipeline for obtaining high-quality transcriptome datasets and identifying ‘missing genes’.
- The RPKM [21] of each transcript sequence was then calculated and used downstream to calculate the tissue-specific expression index.
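RPKM (reads per kilobase of transcript per million mapped reads) follows a standard formula, sketched here. The function and parameter names are illustrative and do not come from the authors' scripts.

```python
def rpkm(read_count: float, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """RPKM = reads * 1e9 / (transcript length in bp * total mapped reads in the library)."""
    if transcript_length_bp <= 0:
        raise ValueError("transcript length must be positive")
    if total_mapped_reads <= 0:
        raise ValueError("total mapped reads must be positive")
    if read_count < 0:
        raise ValueError("read count cannot be negative")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
```

The factor of 1e9 combines the per-kilobase (1e3) and per-million-reads (1e6) normalisations.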
- We calculated the tissue-specific index of high-confidence genes in four species, excluding goose.
- We calculated the tissue-specific expression indices of genes in four species of birds (chicken, duck, pigeon, and zebra finch), as these species have data from more tissues.
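The text does not give the formula behind its tissue-specific expression index. One widely used measure is the tau index, sketched below purely as an assumption about what such an index could look like; it may differ from the authors' definition. It ranges from 0 (equal expression in all tissues) to 1 (expression in a single tissue).

```python
from typing import Sequence


def tissue_specificity_tau(expression: Sequence[float]) -> float:
    """Tau index: sum(1 - x_i / x_max) / (n - 1) over per-tissue expression values."""
    values = [float(v) for v in expression]
    if len(values) < 2:
        raise ValueError("need expression values from at least two tissues")
    if any(v < 0 for v in values):
        raise ValueError("expression values cannot be negative")
    peak = max(values)
    if peak == 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1.0 - v / peak for v in values) / (len(values) - 1)
```

Under this definition, a gene with per-tissue RPKM values `[10, 0, 0, 0]` scores 1.0, while `[5, 5, 5, 5]` scores 0.0.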
- In order to confirm the de novo assembled cDNA for some very important ‘missing genes.
- The annealing temperature and extension times varied depending on the primer Tm and the length of the frag- ment being amplified.
- Specificity of the amplification products was verified by electrophoresis on a 0.8%.
- We exhaustively searched the missing genes described by Zhang et al.
- According to the comparison results, the recovered missing genes were classified into three bins: high-confidence genes (recovered in all five species), medium-confidence genes (recovered in three to four birds), and low-confidence genes (recovered in one or two species).
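The three-bin classification above amounts to counting the species in which each gene was recovered. The sketch below follows the thresholds given in the text; the input layout (a mapping from gene to the set of species with a hit), the function names, and the extra "not recovered" label are hypothetical.

```python
from typing import Dict, Set


def confidence_bin(n_species_recovered: int, n_species_total: int = 5) -> str:
    """Map the number of species with a recovered orthologue to a confidence bin."""
    if not 0 <= n_species_recovered <= n_species_total:
        raise ValueError("species count out of range")
    if n_species_recovered == n_species_total:
        return "high"           # recovered in all five species
    if n_species_recovered >= 3:
        return "medium"         # recovered in three to four species
    if n_species_recovered >= 1:
        return "low"            # recovered in one or two species
    return "not recovered"      # label added for illustration only


def bin_genes(recovered_in: Dict[str, Set[str]]) -> Dict[str, str]:
    """Classify each gene given the set of species in which it was recovered."""
    return {gene: confidence_bin(len(species)) for gene, species in recovered_in.items()}
```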
- From the missing genes list, the numbers of genes recovered in the five birds were 589 (chicken), 583 (duck), 537 (goose), 558 (pigeon), and 543 (zebra finch) (Additional file 1: Table S3A).
- In total, most of these missing genes were found in at least one bird species (Additional file 1: Table S3A).
- The alignment quality obtained with the de novo assembled transcripts is much better than that obtained with human protein sequences (Additional file 1: Tables S4, S5).
- All these results confirm the wide existence of presumed missing genes in the five birds studied.
- The average GC content of the discovered gene set is 56.72%, which is significantly higher than that of the genome-wide chicken transcriptome (P = 2.2E-16, t-test).
- We found that the average GC content of these missing genes is higher than other annotated coding genes, although not reaching an extreme level.
- We also analyzed the GC content of high-confidence genes in different.
- Interestingly, GC content distribution of the ‘missing’ genes has a similar bimodal distribution pattern in birds (Fig.
- Further analysis revealed that GC-stretches for most of the high-confidence genes would be expected, and we did not observe long GC fragment repeats in birds (Fig.
- genes in our five studied birds, we can re-analyze the chromosomal location of these genes to investigate whether there are indeed lost syntenic blocks.
- Fig. 2 Venn diagram of recovered ‘missing’ genes in each species.
- Most of the ‘missing’ genes were recovered from all five species.
- We directly performed a co-linear analysis of the corresponding human, chicken, and lizard chromosomal segments of the four syntenic blocks (Additional file 2: Figure S3) which harbor the relatively closely-linked missing genes, and found that these regions were partially homozygous.
- Of all the reconstructed high-confidence genes had a tissue-specific expression index of more than 0.9 in chicken, which is significantly higher than the genome-wide average (average TSI genome-wide = 0.79, average TSI for missing genes = 0.89, t-test = 2.2E-16) (Fig.
- These missing genes not only have a very strong tissue-specific expression pattern in birds but are also lowly expressed in most tissues (Additional file 1: Table S8).
- There are several tissues, i.e., arcopallium, lung and gonads which are enriched for more missing genes compared with known gene models.
- There are several tissues, i.e., uterus, testis and adipose, which are enriched for highly expressed missing genes (Fig.
- The results showed that the missing genes have lower Ka/Ks values than average (genome-wide: Ka/Ks = 0.99.
- 5a), indicating that most presumed missing genes have undergone stronger purifying.
- Fig. 3 GC-content and GC-repeats in high-confidence genes.
- a GC content of high-confidence genes in five bird species (chicken, duck, pigeon, goose, zebra finch).
- This figure shows GC content distribution of missing genes in five bird species and representatives of non-avian animals (human, mouse and anole lizard).
- Missing genes are generally more conserved than the genome average, which might suggest functional importance for some of these missing genes.
- We searched the literature on the recovered missing genes in humans, and used the number of hits as one indicator of importance (Additional file 1: Table S9).
- and our results suggest the importance of some missing genes in birds.
- There have been in-depth studies on these genes in human, but there are no related studies in birds.
- Our approach can recover most of the presumably lost genes in birds that were inferred from comparisons of birds with other vertebrates.
- a Tissue-specific expression index (TSI) of high-confidence genes in chicken.
- b The percentage of genes that expressed most highly in each tissue and the percentage of expressed genes in each tissue both in high-confidence genes and annotated genes.
- In all chicken tissues, the percentage of expressed genes in high-confidence genes is significantly lower than in annotated genes.
- In this study, we found that a small portion of the missing genes have no genomic or transcript information in the current reference assemblies or the de novo assembled transcripts.
- After exhaustively searching the de novo assembled transcripts and the current reference genomes for all five birds, we could not find any orthologues for 135 genes in any assembled transcriptome, and did not find meaningful orthologues in any of the five bird reference genomes.
- All the missing genes described by Bornelov et al.
- Furthermore, precisely inferring these missing genes also depends on multiple finished bird genomes.
- Both genome assembly and annotation have a major impact on inferring missing genes.
- As the quality of the genome assemblies improves, the number of genes found in birds will increase.
- Based on current results, it was found that high GC content was only one cause of missing genes in general.
- It is observed that GC content of these missing genes is slightly increased from lizard through to human.
- The majority of the missing genes were recovered in the microchromosomes and unplaced scaffolds.
- Interestingly, our results also show that the currently missing genes are highly enriched in the tissue-specific expression group.
- Unique tissue specificity and low expression of genes are some of the reasons that hinder the construction of high quality transcripts using RNA-Seq data.
- In this study, more than 55% (high-confidence) or 88% (low-confidence) of the proposed missing genes were obtained through assembly of 196 transcriptomic data sets, indicating that multi-tissue transcriptome assembly can largely solve the missing gene problems caused by poor genome quality.
- We constructed a relatively complete and high-quality bird transcript database from a large amount of avian transcriptomic data, and recovered most of the genes previously presumed to be missing in birds.
- We conclude that most of the presumed missing genes are in fact present in the bird genomes, but not in the current reference assemblies.
- High GC-content is one reason for wrongly inferring missing genes in birds, and some of these genes (about 40%) have similar, or lower, GC-content compared with the genome background.
- List of missing genes investigated in this study.
- Missing genes recovered by analysis of five bird transcriptomes.
- lists the evidence supporting the presence of missing genes as described both in Zhang et al.
- Mapping information of recovered missing genes in the bird genomes and homologs in Swiss-Prot.
- GC-content of high-confidence genes in five birds.
- The number of related studies of 446 high confidence genes in PubMed.
- missing genes in the chicken genome (Galgal5).
- Distribution of recovered missing genes on chicken chromosomes.
- The comparison of gene expression pattern between high confidence genes and annotated genes in chicken.
- The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Most of the RNA-seq datasets were sequenced in this study and part of the data were downloaded from Sequence Read Archive.
- Jacqueline Smith is a member of the editorial board (Section/Associate Editors) of this journal.
- Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.
- Identification of the long-sought leptin in chicken and duck: expression pattern of the highly GC-rich avian leptin fits an autocrine/paracrine rather than endocrine function.
- Hidden genes in birds Response
