- Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes.
- Strong conservation across multiple species and the non-random accumulation of substitutions in splice sites indicate that non-canonical splice sites are functionally relevant.
- Non-canonical splice sites were first identified before genome sequences became available on a massive scale (reviewed in [29]).
- GC-AG and AT-AC are classified as major non-canonical splice site combinations, while all other deviations from the canonical GT-AG combination are deemed minor non-canonical splice sites.
- Dedicated split-read aligners such as STAR [31, 32] can detect non-canonical splice sites while aligning RNA-Seq reads to genomic sequences.
- Nonetheless, the combined number of currently inferred minor non-canonical splice site combinations is even higher than the number of major non-canonical AT-AC splice site combinations [30, 34].
- We incorporated RNA-Seq data to differentiate between artifacts and bona fide cases of active non-canonical splice sites.
- We then identified homologous non-canonical splice sites across species and subjected the genes containing these splice sites to phylogenetic analyses.
- Classification of annotated splice sites.
- A more detailed classification into major non-canonical splice site combinations (GC-AG, AT-AC) and all remaining minor non-canonical splice site combinations was applied.
- As a proof of concept, one previously validated gene containing a non-canonical splice site [30], At1g79350 (rna15125), was investigated in more depth.
- Validation of annotated splice sites.
- Comparison of non-canonical splice sites to overall sequence variation.
- … rates in a species were compared against the observed substitutions in minor non-canonical splice sites via a χ² test.
- Genomic properties of plants and diversity of non-canonical splice sites.
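The classification scheme above (canonical GT-AG, major non-canonical GC-AG and AT-AC, minor non-canonical for everything else) can be illustrated with a minimal Python sketch. It assumes intron coordinates are 0-based and end-exclusive on the forward strand, and that minus-strand introns are reverse-complemented before reading the donor and acceptor dinucleotides; the function names are hypothetical and not taken from the authors' pipeline. A mismatch count against GT-AG is included as one simple way to express how far a combination diverges from the canonical sequence.

```python
"""Hypothetical sketch: extract and classify splice-site dinucleotide combinations."""

CANONICAL = "GT-AG"
MAJOR_NON_CANONICAL = {"GC-AG", "AT-AC"}

# Complement table for unambiguous bases; other characters pass through unchanged
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def splice_site_combination(genome: str, start: int, end: int, strand: str) -> str:
    """Return the donor-acceptor dinucleotides of an intron, e.g. 'GT-AG'.

    Assumption: start/end are 0-based, end-exclusive forward-strand coordinates.
    """
    intron = genome[start:end].upper()
    if strand == "-":
        # Read the intron in transcript orientation
        intron = reverse_complement(intron)
    return f"{intron[:2]}-{intron[-2:]}"


def classify(combination: str) -> str:
    if combination == CANONICAL:
        return "canonical"
    if combination in MAJOR_NON_CANONICAL:
        return "major non-canonical"
    return "minor non-canonical"


def mismatches_to_canonical(combination: str) -> int:
    """Count how many of the four splice-site positions differ from GT-AG."""
    return sum(a != b for a, b in zip(combination.replace("-", ""), "GTAG"))
```

For example, `classify(splice_site_combination(genome, start, end, strand))` labels one annotated intron; tallying these labels per species yields counts of the kind summarized per genome.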
- Our investigation of these 121 plant genome sequences revealed a huge variety of different non-canonical splice site combinations (Additional files 6 and 7).
- Camelina sativa displayed the highest number of minor non-canonical splice sites …
- There is a strong correlation between the number of non-canonical splice site combinations and the total number of splice sites (Spearman correlation coefficient = 0.53, p-value …).
- Non-canonical splice sites tend to resemble canonical splice sites.
- There is a negative correlation between the frequency of non-canonical splice site combinations and their divergence from canonical sequences (r …).
- Splice sites that differ from a canonical splice site at a single position are more frequent than more divergent splice sites.
- A similar trend can be observed around the major non-canonical AT-AC splice sites (Fig. …).
- … vinifera (Additional files 10, 11 and 12), there were slightly fewer genes with non-canonical splice sites close to the centromeres.
- RNA-Seq reads supported 224 of these CA-GG splice sites.
- Non-canonical splice sites in single copy genes.
- The average percentage of genes with non-canonical splice sites among single copy BUSCO genes was 11.4%.
- … splice sites among BUSCO genes (Additional file 14).
- A couple of species displayed the inverse situation, with fewer genes containing non-canonical splice sites among the BUSCO genes than the genome-wide average.
- Length distributions of introns with canonical and non-canonical splice site combinations are similar in most regions (Fig. …).
- These distributions indicate that non-canonical splice sites are more frequent in introns that deviate from the average length.
- Stress-related genes were checked for increased intron sizes, because non-canonical splice site combinations might be associated with the stress response.
- The likelihood of having a non-canonical splice site in a gene is almost perfectly correlated with the number of introns (Additional file 15).
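The correlations reported here are rank-based (Spearman). As a rough illustration of what that coefficient measures, the sketch below computes Spearman's rho as the Pearson correlation of average ranks, so tied values share the mean of their ranks. It is a dependency-free stand-in for library routines such as `scipy.stats.spearmanr`; the function names and the idea of feeding it per-species counts are illustrative assumptions, not the authors' code.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))
```

Called as `spearman(non_canonical_counts, total_splice_site_counts)` with one entry per species (hypothetical variable names), it returns a coefficient between -1 and 1. Because only ranks enter the calculation, a few genomes with extreme counts cannot dominate the result the way they could with Pearson's r.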
- Conservation of non-canonical splice sites.
- Non-canonical splice site combinations detected in A. …
- Of 1296 non-canonical splice site combinations, 109 overlapped with listed variant positions.
- To differentiate between randomly occurring non-canonical splice sites (e.g. sequencing errors) and true biological variation, the conservation of non-canonical splice sites across multiple species can be analyzed.
- Manual inspection revealed that non-canonical splice sites were conserved at three positions in many putative homologous genes across various species (Additional file 16).
- Medicago truncatula, Oryza sativa, Populus trichocarpa, Monoraphidium neglectum, and Morus notabilis displayed substantially lower validation values for the major non-canonical splice sites.
- The same trend holds true for major non-canonical GC-AG splice sites …
- The most striking differences are (1) at the intron length peak around 200 bp, where non-canonical splice site combinations are less likely, and (2) at very long intron lengths, where introns with non-canonical splice sites are more likely.
- Major non-canonical AT-AC and minor non-canonical splice sites did not show a difference between 5′ and 3′ …
- … minor non-canonical splice site combinations to 0.82 in major non-canonical AT-AC splice site combinations.
- To provide an example of the usage of minor non-canonical splice sites under stress conditions, four single RNA-Seq data sets of B. …
- The number of RNA-Seq-supported minor non-canonical splice site combinations increased between control and stress conditions from 17 …
- Occurrences of the canonical GT-AG, the major non-canonical GC-AG and AT-AC, as well as the combined occurrences of all minor non-canonical splice sites (others), are displayed.
- Fig. 5: Usage of splice sites.
- Canonical GT-AG splice site combinations are used more often than major or minor non-canonical splice site combinations.
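A χ² test is used to compare substitution rates in minor non-canonical splice sites with the overall sequence variation of a species. The exact contingency setup is not preserved in this summary, so the following is only a sketch under an assumed framing: a two-category goodness-of-fit test (sites overlapping a variant versus not) against a hypothetical genome-wide expected rate, with one degree of freedom and no Yates correction. For 1 df, the survival function of the χ² distribution is erfc(√(x/2)), which keeps the example free of external dependencies.

```python
from math import erfc, sqrt


def chi_square_1df(hits: int, total: int, expected_rate: float):
    """Chi-squared goodness-of-fit test for hits vs. non-hits (df = 1).

    Returns (statistic, p_value). No continuity (Yates) correction is applied.
    """
    if not 0 < expected_rate < 1:
        raise ValueError("expected_rate must lie strictly between 0 and 1")
    if total <= 0 or not 0 <= hits <= total:
        raise ValueError("require total > 0 and 0 <= hits <= total")
    observed = (hits, total - hits)
    expected = (total * expected_rate, total * (1 - expected_rate))
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # P(X > statistic) for a chi-squared variable with 1 degree of freedom
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value
```

With the counts reported above, the call would be `chi_square_1df(109, 1296, genome_wide_rate)`, where `genome_wide_rate` is a hypothetical placeholder for the species' background variant rate, not a value from the study. The approximation is reasonable only when both expected counts are not too small (a common rule of thumb is at least 5).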
- Our results update and expand previous systematic analyses of non-canonical splice sites in smaller data sets.
- Our analyses supported a variety of different non-canonical splice sites, matching previous reports of bona fide non-canonical splice sites.
- The frequencies of different minor non-canonical splice site combinations are not random and vary between combinations.
- Combinations similar to the canonical combination or to the major non-canonical splice site combinations are more frequent.
- … GT-AG canonical splice sites is in agreement with recent reports for A. …
- … findings together, both major and minor non-canonical splice sites could be a more significant phenomenon of splicing in plants than in animals.
- An in-depth investigation of non-canonical splice sites in animals and fungi would be needed to validate this hypothesis.
- Species-specific differences in minor non-canonical splice site combinations.
- Nevertheless, conserved non-canonical splice site positions exist, as shown at the gene level for At1g79350.
- The group of minor non-canonical splice sites displayed the largest variation between species, and a frequent non-canonical splice site combination (CA-GG) which appeared peculiar to O. …
- … thaliana support this conjecture and suggest that some non-canonical splice sites are conserved in homologous loci at the intra-specific level.
- Putative mechanisms for processing of minor non-canonical splice sites.
- We sought possible correlations with minor non-canonical splice site combinations in order to understand the mechanisms driving their occurrence.
- Further investigation might connect neighbouring sequences to the processing of minor non-canonical splice sites.
- Usage of non-canonical splice sites.
- As indicated by several previous reports, non-canonical splice sites might be used more frequently under stress conditions.
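Comparing splice site usage between conditions, such as control versus stress, comes down to counting RNA-Seq-supported junctions per splice site class in each data set. The sketch below assumes STAR's `SJ.out.tab` junction file as input, using the column layout described in the STAR manual: column 5 holds the intron motif code (0 non-canonical, 1 GT/AG, 2 CT/AC, 3 GC/AG, 4 CT/GC, 5 AT/AC, 6 GT/AT), and column 7 holds the number of uniquely mapping reads. The minimum read threshold and the grouping of motif codes into canonical, major, and minor classes are illustrative choices, not the study's documented settings. STAR also filters junctions before writing this file (for example via its `outSJfilter*` parameters), which can lower the number of reported non-canonical junctions.

```python
import csv
from collections import Counter

# Intron motif codes (column 5 of STAR's SJ.out.tab) mapped to splice site classes
MOTIF_CLASS = {
    1: "canonical",            # GT/AG
    2: "canonical",            # CT/AC, i.e. GT/AG on the minus strand
    3: "major non-canonical",  # GC/AG
    4: "major non-canonical",  # CT/GC, i.e. GC/AG on the minus strand
    5: "major non-canonical",  # AT/AC
    6: "major non-canonical",  # GT/AT, i.e. AT/AC on the minus strand
    0: "minor non-canonical",  # any other dinucleotide combination
}


def count_supported_junctions(lines, min_unique_reads=1):
    """Count junctions per class that have >= min_unique_reads unique reads.

    min_unique_reads is a hypothetical threshold chosen for illustration.
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        motif = int(row[4])
        unique_reads = int(row[6])
        if unique_reads >= min_unique_reads:
            counts[MOTIF_CLASS[motif]] += 1
    return counts
```

A usage sketch with hypothetical paths is `with open("control/SJ.out.tab") as fh: control = count_supported_junctions(fh)`, repeated for the stress sample, after which `control["minor non-canonical"]` and the corresponding stress count can be compared directly. Counting only uniquely mapped reads is a conservative choice, because multi-mapping reads can produce spurious junctions at repetitive loci.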
- Splice sites of interest might be canonical splice site combinations in some accessions or subspecies while they are non-canonical in others.
- Therefore, we cannot exclude that certain non-canonical splice sites were missed in our splice site usage analysis because the corresponding genes were not expressed under the investigated conditions.
- Investigating homologous non-canonical splice sites poses several difficulties, as the exonic sequence is not necessarily conserved.
- However, a computationally feasible approach to investigating the phylogeny of all non-canonical splice sites would significantly enhance our knowledge, e.g. about the emergence and loss of non-canonical splice sites.
- Splice sites could be experimentally validated, e.g. …
- Non-canonical splice site combinations are present and appear to be functionally relevant in most plants, although at low abundance.
- Additional file 6: Number of splice sites per species.
- Canonical and non-canonical splice sites were counted per species as described in the Methods section.
- Additional file 8: Similarity of the non-canonical splice site pattern across plants.
- For each investigated species, the number of canonical and non-canonical splice sites is displayed.
- The Spearman correlation coefficient between splice site number and genome size is r = 0.14 for canonical splice sites and r = 0.02 for non-canonical splice sites. (JPG 250 kb)
- Additional file 10: Genome-wide distribution of non-canonical splice sites in A. …
- The distribution of genes with non-canonical splice sites (red dots) across the five chromosome sequences (black lines) of A. …
- Additional file 11: Genome-wide distribution of non-canonical splice sites in B. …
- The distribution of genes with non-canonical splice sites (red dots) across the nine chromosome sequences (black lines) of B. …
- Additional file 12: Genome-wide distribution of non-canonical splice sites in V. …
- The distribution of genes with non-canonical splice sites (red dots) across the 19 chromosome sequences (black lines) of V. …
- Additional file 13: Conserved sequences around splice sites in Oryza sativa.
- Additional file 14: Non-canonical splice sites in single copy genes.
- The occurrence of non-canonical splice sites in single copy genes (BUSCO) and in all genes was assessed per species.
- Additional file 15: Proportion of non-canonical splice sites.
- The green line indicates the average (median) proportion of genes with a non-canonical splice site combination.
- Genes with more introns are more likely to have a non-canonical splice site combination.
- Additional file 16: Conservation of non-canonical splice sites.
- Non-canonical splice sites at conserved positions in putative homologs of At1g79350 across various species.
- Additional file 17: Supported splice sites.
- The percentage of splice sites supported by RNA-Seq reads is given per species.
- Lessons from non-canonical splicing.
- A reappraisal of non-consensus mRNA splice sites.
- Consideration of non-canonical splice sites improves gene prediction on the Arabidopsis thaliana Niederzenz-1 genome sequence.
- Analysis of canonical and non-canonical splice sites in mammalian genomes.
- RNA-Seq read coverage depth of splice sites in plants.
- … serine (SR) proteins in maize are differentially spliced and utilize non-canonical splice sites.
- A comprehensive survey of non-canonical splice sites in the human transcriptome.