
Comprehensive genome-wide identification of angiosperm upstream ORFs with peptide sequences conserved in various taxonomic ranges using a novel pipeline, ESUCA


Summary

- Full list of author information is available at the end of the article.
- As expected, ESUCA analysis of each of the five angiosperm genomes identified many CPuORFs that were not identified from ESUCA analyses of the other four species.
- Conclusions: This study demonstrates that ESUCA is capable of efficiently identifying CPuORFs likely to be conserved because of the functional importance of their encoded peptides.
- Ribosome stalling on a uORF results in translational repression of the downstream main ORF (mORF) because stalled ribosomes block the access of subsequently loaded ribosomes to the mORF start codon [11].
- We examined the sequence-dependent effects of the CPuORFs identified by BAIUCAS on mORF translation using a transient expression assay, and identified six regulatory CPuORFs that repress mORF translation in an amino acid sequence-dependent manner [27, 28].
- One major problem with identifying CPuORFs is that there are cases where a uORF found in the 5′-UTR of a transcript is fused to the mORF in an isoform of the transcript, and in some of these cases, such uORF sequences are conserved because they actually encode parts of mORF-encoded protein sequences.
- Such an ORF can be extracted as a CPuORF if the amino acid sequence in the N-terminal region of the protein is evolutionarily conserved.
- ESUCA includes an algorithm to select one uORF sequence from each order for calculation of the Ka/Ks ratio of each CPuORF.
- Automatic determination of the taxonomic range of CPuORF conservation provides useful information for the selection of CPuORFs likely to encode functional peptides.
- If transcripts with a uORF-mORF fusion are found as a major form in a majority of species with their orthologs, the uORF sequence is likely to code for a part of the mORF-encoded protein.
- In this step, tBLASTn searches are performed against a transcript sequence database, using the amino acid sequences of the uORFs as queries (uORF-tBLASTn analysis).
- To confirm whether the uORF-tBLASTn hits are derived from homologs of the original uORF-containing gene, the downstream sequences of putative uORFs in the uORF-tBLASTn hits are subjected to another tBLASTn analysis, which uses the mORF amino acid sequence of the original uORF-containing transcript as a query (mORF-tBLASTn analysis) (Fig.
- If a uORF-tBLASTn hit has a partial or intact ORF that contains a sequence similar to the mORF amino acid sequence downstream of the putative uORF, it is considered to be derived from a homolog of the original uORF-containing gene.
- If uORF-tBLASTn and mORF-tBLASTn hits are found in at least two orders other than that of the original uORF, then the uORF is selected as a candidate CPuORF.
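The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the ESUCA implementation: the function name `is_candidate_cpuorf`, its parameters, and the example order names are hypothetical, and the input is assumed to list the taxonomic order of every transcript that already passed both the uORF-tBLASTn and mORF-tBLASTn checks.

```python
def is_candidate_cpuorf(hit_orders, original_order, min_other_orders=2):
    """Return True if validated hits come from enough orders other than the original.

    hit_orders: iterable of order names, one per transcript with both a
                uORF-tBLASTn and an mORF-tBLASTn hit (assumed input format).
    original_order: order of the species carrying the original uORF.
    min_other_orders: the "at least two orders" threshold stated in the text.
    """
    # Count distinct orders only, and ignore the order of the original uORF.
    other_orders = {order for order in hit_orders if order != original_order}
    return len(other_orders) >= min_other_orders


if __name__ == "__main__":
    hits = ["Brassicales", "Malpighiales", "Fabales", "Fabales"]
    print(is_candidate_cpuorf(hits, "Brassicales"))
```

Counting distinct orders rather than individual hits keeps a single well-sampled order from satisfying the criterion on its own.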
- For each candidate CPuORF, a representative uORF-tBLASTn and mORF-tBLASTn hit is selected from each order, and the putative uORF sequences in the representative uORF-tBLASTn and mORF-tBLASTn hits are used for the calculation of the Ka/Ks ratio (Fig.
- On the basis of the presence of the uORF-tBLASTn and mORF-tBLASTn hits in each taxonomic category, the taxonomic range of sequence conservation is determined for each CPuORF.
- To extract sequences of uORFs and their downstream mORFs from all splice variants, we extracted uORF and mORF sequences from each of the transcripts with different transcript IDs.
- In the third step, using the amino acid sequences of the remaining uORFs as queries, we performed uORF-tBLASTn searches.
- Fig. 1 Outline of the ESUCA pipeline.
- We selected uORFs whose remaining uORF-tBLASTn and mORF-tBLASTn hits were found in homologs from at least two orders other than that of the original uORF.
- Fig. 2 Schematic representation of the algorithm to calculate uORF-mORF fusion ratios.
- For each original uORF-containing transcript sequence, RefSeq RNAs are selected that match an original uORF sequence, irrespective of the reading frame, and the original mORF sequence in the same reading frame as the largest ORF of the RefSeq RNA, using tBLASTx.
- For each of the original uORF-containing transcripts, the uORF-mORF fusion ratio is calculated as X / (X + Y).
- (i) The downstream in-frame stop codon closest to the 5′-end of the matching region of each uORF-tBLASTn hit is selected.
- (ii) The 5′-most in-frame ATG codon located upstream of the stop codon is selected.
- (iii) For each of the uORF-tBLASTn and mORF-tBLASTn hits, the upstream in-frame stop codon closest to the 5′-end of the matching region is selected.
- (iv) The 5′-most in-frame ATG codon located downstream of the selected stop codon is identified as the initiation codon of the putative partial or intact mORF.
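Steps (i) and (ii) can be sketched as follows. This is an illustrative reading of the text, not the authors' code: `find_putative_uorf` and its arguments are hypothetical names, the sequence is assumed to be DNA-alphabet (T rather than U), and `match_start` is assumed to be the 0-based position of the first nucleotide of the tBLASTn matching region, which also fixes the reading frame. The sketch also applies the condition that no other in-frame stop codon lies between the ATG and the selected stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_putative_uorf(seq, match_start):
    """Return (start, end) of the putative uORF, or None if none can be delimited.

    start is the index of the A of the 5'-most in-frame ATG; end is the index
    just past the stop codon, so seq[start:end] is the putative uORF.
    """
    seq = seq.upper()

    # Step (i): first in-frame stop codon at or downstream of the match start.
    stop = None
    for i in range(match_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            stop = i
            break
    if stop is None:
        return None

    # Step (ii): walk upstream codon by codon, remembering the 5'-most ATG,
    # and halt at the first in-frame stop codon encountered.
    start = None
    i = stop - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        if codon == "ATG":
            start = i
        i -= 3
    if start is None:
        return None
    return start, stop + 3


if __name__ == "__main__":
    transcript = "TAGATGAAAATGGCTTGACCC"
    print(find_putative_uorf(transcript, 9))
```

Steps (iii) and (iv) mirror this logic for the mORF: the scan first goes upstream from the matching region to the nearest in-frame stop codon, then downstream to the first in-frame ATG.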
- When multiple original uORFs derived from splice variants of the same gene partially or completely shared amino acid sequences, the one with the longest conserved region was manually selected on the basis of the uORF amino acid sequence alignments.
- The amino acid sequences of the remaining candidate CPuORFs are not similar to those of the known CPuORFs.
- If the amino acid sequence of a uORF is evolutionarily conserved because of functional constraints of the uORF-encoded peptide, it is expected that the amino acid sequence in the functionally important region of the peptide is conserved among the uORF and its orthologous uORFs.
- The Ka/Ks ratios were recalculated after the manual removal of the sequences (Supplementary Table S1), and eight.
- Fig. 4 Schematic representation of the algorithms to select putative uORF sequences used for Ka/Ks analysis and to determine the taxonomic range of uORF sequence conservation.
- The putative uORF sequences in the selected transcript sequences are used for generating the multiple alignments of the uORF amino acid sequences.
- We found that the genomic position of the candidate CPuORF of the Arabidopsis ROA1 (AT1G60200) gene overlaps with that of an intron in the mORF region of a splice variant.
- Protein sequences with an N-terminal region similar to the amino acid sequence encoded by the 5′-extended region of the mORF in this splice variant are found in most orders from which the uORF-tBLASTn and mORF-tBLASTn hits of this candidate CPuORF were extracted, suggesting that the splice variant with the 5′-extended mORF is not a minor form among orthologous transcripts.
- In the second step of ESUCA, we excluded uORF sequences likely to encode parts of the mORF-encoded proteins, by removing uORFs with high uORF-mORF fusion ratios.
- In this analysis, mORF-encoded proteins with N-terminal sequences similar to the amino acid sequences encoded by the candidate CPuORFs of the rice OsUAM2 gene and its poplar ortholog, POPTR_0019s07850, were identified in many orders.
- This suggests that the sequences encoded by these candidate CPuORFs are likely to function as parts of the mORF-encoded proteins.
- Of the newly identified CPuORF genes, six were classified into the same ortholog groups as previously identified CPuORF genes, but the amino acid sequences of these six CPuORFs are dissimilar to those of the known CPuORFs.
- Determination of the taxonomic range of CPuORF sequence conservation.
- As the final step of ESUCA, we determined the taxonomic range of the sequence conservation of each CPuORF identified, including previously identified CPuORFs.
- For 19 of the novel HGs, CPuORF sequences are conserved both in eudicots and monocots or in wider taxonomic ranges.
- In contrast, for 70 of the novel HGs, CPuORF sequences are conserved only among eudicots.
- Of the selected CPuORFs, those belonging to HG46, HG55, HG57, HG66 and HG103 are conserved in diverse angiosperms or in wider taxonomic ranges.
- Fig. 5 Taxonomic range of the sequence conservation of the CPuORF families.
- The presence of a line within a cell in each taxonomic category indicates the presence of uORF-tBLASTn and mORF-tBLASTn hits for any of the CPuORFs that belong to each HG.
- Where no uORF-tBLASTn and mORF-tBLASTn hit was found in the taxonomic category that contains the species from which the original uORF was derived, the line showing the species was still drawn in the cell of that taxonomic category because the category contains the species with the original uORF.
- In the 5′-UTR of the poplar gene with the HG107 CPuORF, there is another uORF immediately upstream of the CPuORF (Supplementary Figure S2K).
- a Schematic representation of the WT (35S::UTR (WT):Fluc) and frameshift (fs) mutant (35S::UTR (fs):Fluc) reporter constructs.
- The dotted boxes represent the first five nucleotides of the mORF.
- Fluc activity was normalized to Rluc activity, and the normalized activity relative to that of the corresponding WT reporter construct is shown.
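The normalization described above amounts to a ratio of ratios. The sketch below is only an illustration of that arithmetic: the function name `relative_activity` and the numeric readings are hypothetical, and replicate handling or averaging is not covered because the text does not describe it here.

```python
def relative_activity(fluc, rluc, wt_fluc, wt_rluc):
    """Fluc activity normalized to Rluc, expressed relative to the WT construct.

    fluc, rluc: luciferase readings for the construct of interest.
    wt_fluc, wt_rluc: readings for the corresponding WT reporter construct.
    """
    normalized = fluc / rluc          # internal-control normalization
    wt_normalized = wt_fluc / wt_rluc
    return normalized / wt_normalized


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(relative_activity(fluc=200.0, rluc=50.0, wt_fluc=100.0, wt_rluc=50.0))
```

A value above 1 for a frameshift construct would indicate higher reporter expression than with the WT uORF sequence.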
- The Fluc translation efficiency relative to that of the corresponding WT reporter construct was calculated to determine the relative translation efficiency.
- These five novel sequence-dependent regulatory CPuORFs include the HG107 CPuORF, which is one of the CPuORFs conserved only among rosids.
- For example, while the CPuORF of the tomato LOC101264451 gene, which belongs to HG43.1, exerts a sequence-dependent repressive effect on mORF translation, the CPuORF of its Arabidopsis ortholog, ANAC096, lacks the C-terminal half of the amino acid sequence in the highly conserved region and does not have a sequence-dependent regulatory effect [26–28].
- One of the identified regulatory CPuORFs is conserved only among rosids (i.e.
- Likewise, all the HGs identified through ESUCA analysis of the other four plant genomes are conserved in multiple taxonomic categories.
- Of the 11 poplar CPuORFs analyzed by the transient expression assays, five are conserved beyond eudicots, and three of them exhibited sequence-dependent repressive effects (Figs.
- To distinguish between ‘spurious’ CPuORFs conserved because they code for parts of mORF-encoded proteins and ‘true’ CPuORFs conserved because of functional constraints of their encoded small peptides, we employed the criterion of the uORF-mORF fusion ratio and discarded uORFs with uORF-mORF fusion ratios equal to or greater than 0.3.
- ‘spurious’ CPuORFs that code for parts of the mORF-encoded proteins.
- The uORF-mORF fusion ratios of the candidate CPuORFs of rice OsUAM2 and its poplar ortholog were 0.28.
- ‘spurious’ CPuORFs whose amino acid sequences are likely to be evolutionarily conserved because of their function as N-terminal regions of the mORF-encoded proteins, we excluded the candidate CPuORFs of the rice OsUAM2 gene and its poplar ortholog.
- In conclusion, the criterion of the uORF-mORF fusion ratio used in this study appears appropriate because all known cis-acting sequence-dependent regulatory CPuORFs were extracted and most ‘spurious’ CPuORFs were removed.
- On the basis of the transcription start site and the translation initiation codon of each transcript in the genomic coordinate files, we extracted 5′-UTR sequences from the transcript sequence datasets.
- Calculation of the uORF-mORF fusion ratio.
- The uORF-mORF fusion ratio for each of the extracted uORFs was assessed as follows.
- Then, we examined whether the largest ORF of each of the selected RefSeq RNAs included the region that matched the original mORF in the same reading frame (Fig.
- We also examined whether the largest ORF included the region that matched the original uORF, irrespective of the reading frame.
- RefSeq RNA numbers of the former and latter types were defined as X and Y, respectively.
- We calculated a uORF-mORF fusion ratio as X / (X + Y) for each of the original uORF-containing transcript sequences.
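The ratio and the filter built on it can be sketched as below. The formula X / (X + Y) and the cutoff (uORFs with a ratio equal to or greater than 0.3 are discarded) come from the text; the function names, the `threshold` parameter, and the handling of the case X + Y = 0 are assumptions made for illustration.

```python
def fusion_ratio(x, y):
    """uORF-mORF fusion ratio X / (X + Y); None when no RefSeq RNA was counted."""
    total = x + y
    if total == 0:
        return None  # assumption: the ratio is undefined without any counted RNA
    return x / total


def passes_fusion_filter(x, y, threshold=0.3):
    """Keep a uORF only if its fusion ratio is below the threshold.

    Assumption: a uORF with an undefined ratio is kept rather than discarded.
    """
    ratio = fusion_ratio(x, y)
    return ratio is None or ratio < threshold


if __name__ == "__main__":
    print(fusion_ratio(28, 72), passes_fusion_filter(28, 72))
    print(fusion_ratio(3, 7), passes_fusion_filter(3, 7))
```

Because the comparison is strict, a uORF sitting exactly at 0.3 is discarded, whereas a ratio of 0.28, as reported for the OsUAM2 candidates, passes this automatic filter and needs separate inspection.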
- To search for uORFs with amino acid sequences conserved between homologous genes, we first performed tBLASTn searches against the assembled plant transcript sequence database, using the amino acid sequences of the uORFs as queries.
- In these uORF-tBLASTn searches, we extracted transcript sequences that matched a uORF with an E-value less than 2000 and derived from species other than that of the original uORF.
- The downstream in-frame stop codon closest to the 5′-end of the matching region of each uORF-tBLASTn hit was selected (Fig.
- Then, we looked for an in-frame ATG codon upstream of the selected stop codon, without any other in-frame stop codon between them.
- The downstream sequences of putative uORFs were subjected to another tBLASTn analysis to examine whether the transcripts were derived from homologs of the original uORF-containing gene.
- In this analysis, the amino acid sequence of the mORF associated with the original uORF was used as a query sequence, and transcript sequences matching the mORF with an E-value less than 10⁻¹ were.
- For each of the uORF-tBLASTn and mORF-tBLASTn hits, the upstream in-frame stop codon closest to the 5′-end of the region matching the original mORF was selected, and the 5′-most in-frame ATG codon located downstream of the selected stop codon was identified as the putative mORF initiation codon (Fig.
- If a uORF-tBLASTn hit EST/TSA sequence matched an EST, TSA, or RefSeq RNA sequence of a different order from the species of the uORF-tBLASTn and mORF-tBLASTn hit, with an E-value less than 10⁻¹⁰⁰ and an identity equal to or greater than 95%, it was considered a candidate contaminant sequence.
- To distinguish these possibilities, we compared the ratio of the BLASTn hit number to the total EST/TSA and RefSeq RNA sequence number between the species of each uORF-tBLASTn and mORF-tBLASTn hit and the species of its BLASTn hits.
- If the ratio of the BLASTn hit number to the total EST/TSA and RefSeq RNA sequence number of a uORF-tBLASTn and mORF-tBLASTn hit species is less than that of any other BLASTn hit species, the uORF-tBLASTn and mORF-tBLASTn hit sequence was identified as a contaminant sequence.
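One way to read the contaminant test is sketched below. It is an interpretation, not the authors' code: the function name and the dictionary inputs are hypothetical, and "less than that of any other BLASTn hit species" is assumed to mean "less than the ratio of at least one other species".

```python
def is_contaminant(hit_species, blastn_hit_counts, total_seq_counts):
    """Flag a candidate contaminant by comparing per-species hit ratios.

    hit_species: species of the uORF-tBLASTn and mORF-tBLASTn hit.
    blastn_hit_counts: dict species -> number of BLASTn hits (hypothetical input).
    total_seq_counts: dict species -> total EST/TSA and RefSeq RNA sequences.
    """
    def ratio(species):
        return blastn_hit_counts[species] / total_seq_counts[species]

    own_ratio = ratio(hit_species)
    # Assumed reading: one other species with a higher ratio is enough to flag.
    return any(
        own_ratio < ratio(species)
        for species in blastn_hit_counts
        if species != hit_species
    )


if __name__ == "__main__":
    hits = {"species_A": 1, "species_B": 50}
    totals = {"species_A": 100_000, "species_B": 100_000}
    print(is_contaminant("species_A", hits, totals))
```

Dividing by the total number of sequences per species corrects for the fact that heavily sequenced species produce more hits by chance.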
- Multiple alignments of the uORF amino acid sequences were generated by using standalone Clustal Omega (ClustalO) ver.
- On the basis of the multiple uORF amino acid sequence alignments, codon-based multiple alignments (also referred to as codon-delimited multiple alignments) [47] of the uORF nucleotide sequences were generated (Supplementary Table S3).
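A codon-based alignment is obtained by threading each nucleotide sequence onto its aligned amino acid sequence, so that every gap in the protein alignment becomes a three-nucleotide gap. The sketch below shows this for a single sequence; `codon_align` is a hypothetical helper, not part of ClustalO or any package named in the text, and a trailing stop codon or incomplete codon is assumed to be dropped.

```python
def codon_align(aligned_protein, cds):
    """Thread a coding sequence onto a gapped protein sequence.

    aligned_protein: one row of a protein alignment, gaps written as '-'.
    cds: ungapped nucleotide sequence encoding that protein, in frame.
    Returns the gapped nucleotide row (three characters per alignment column).
    """
    # Split into complete codons; any incomplete 3' remainder is ignored.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    residues = sum(1 for aa in aligned_protein if aa != "-")
    if residues > len(codons):
        raise ValueError("coding sequence is too short for the aligned protein")

    row = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == "-":
            row.append("---")            # one protein gap = one codon-sized gap
        else:
            row.append(codons[next_codon])
            next_codon += 1
    return "".join(row)


if __name__ == "__main__":
    print(codon_align("M-KA", "ATGAAAGCTTGA"))
```

Keeping codons intact in this way is what allows synonymous and nonsynonymous substitutions to be counted column by column.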
- The Ka/Ks ratio for all pairwise combinations of the original uORF and its homologous putative uORFs was calculated using the codon-based multiple alignment and the kaks function in the seqinR package (ver.
- Determination of the taxonomic range of uORF sequence conservation.
- To automatically determine the taxonomic range of the sequence conservation of each CPuORF, we first defined 13 plant taxonomic categories.
- Plasmid pNH006 harbors the cauliflower mosaic virus 35S RNA (35S) promoter, the Fluc coding sequence, and the polyadenylation signal of the A.
- Sequence analysis confirmed the integrity of the PCR-amplified regions of all constructs.
- Transient expression assays for measuring luciferase activities and the mRNA levels of the reporter genes were carried out with the following modifications.
- Taxonomic range of sequence conservation of the CPuORFs.
- Alignments of the newly identified CPuORF sequences.
- 5′-UTR nucleotide and deduced amino acid sequences of the poplar CPuORFs analyzed in the transient expression study.
- The Ensembl and Phytozome transcript IDs of the transcript sequences on which the identified CPuORF sequences were based are shown in Supplementary Table S1.
- weak ’ context of the start codon.
- Ribosome occupancy of the yeast CPA1 upstream open reading frame termination codon modulates nonsense- mediated mRNA decay.
- Polyamine-responsive ribosomal arrest at the stop codon of an upstream open reading frame of the AdoMetDC1 gene triggers nonsense-mediated mRNA decay in Arabidopsis thaliana.
- Trans-regulation of the expression of the transcription factor MtHAP2-1 by a uORF controls root nodule development.
- Identification of novel Arabidopsis thaliana upstream open reading frames that control expression of the main coding sequences in a peptide sequence-dependent manner.
- Unbiased estimation of the rates of synonymous and nonsynonymous substitution.
- The dwarf phenotype of the Arabidopsis acl5 mutant is suppressed by a mutation in an upstream ORF of a bHLH gene.
- Posttranscriptional regulation by the upstream open reading frame of the
