- BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq. - Results: A high-quality, non-redundant barley gene RTD and database (Barley Reference Transcripts – BaRTv1.0) has been generated. - BaRTv1.0 was constructed from a range of tissues, cultivars and abiotic treatments and transcripts assembled and aligned to the barley cv. - BaRTv1.0-Quantification of Alternatively Spliced Isoforms (QUASI) was also made to overcome inaccurate quantification due to variation in 5′ and 3′ UTR ends of transcripts. - BaRTv1.0-QUASI was used for accurate transcript quantification of RNA-seq data of five barley organs/tissues. - Precise transcript quantification using BaRTv1.0 allows routine analysis of gene expression and AS. - Here, we describe the development of a first barley reference transcript dataset and database (Barley Reference Transcripts – BaRTv1.0) consisting of 60,444 genes and 177,240 non-redundant transcripts. - To create BaRTv1.0, we used 11 different RNA-seq experimental datasets representing 808 samples and 19.3 billion reads that were derived from a range of tissues, cultivars and treatments. - We further compared the BaRTv1.0 transcripts to 22,651 Haruna nijo full-length (fl) cDNAs [37] to assess the completeness and representation of the reference transcript dataset. - As in Arabidopsis, we also generated a version of the RTD specifically for quantification of alternatively spliced isoforms (BaRTv1.0-QUASI) for accurate expression and AS analysis, which overcomes inaccurate quantification due to variation in the 5′ and 3′ UTR [53, 61]. - Finally, we used BaRTv1.0-QUASI to explore RNA-seq data derived from five diverse barley organs/tissues, identifying 20,972 differentially expressed genes and 2791 differentially alternatively spliced genes amongst the samples. - The raw RNA-seq data of all samples were quality controlled,
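The idea behind the QUASI version, removing variation in the 5′ and 3′ ends among transcripts of one gene, can be pictured with a small sketch. This is not the authors' code: it assumes that equalising the ends means extending each transcript's first and last exon to the outermost coordinates seen among the gene's transcripts, and the transcript names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive, exons sorted by start


def pad_transcripts(transcripts: Dict[str, List[Exon]]) -> Dict[str, List[Exon]]:
    """Give every transcript of one gene the same outer start and end.

    Assumption (not taken from the paper): padding is done by stretching
    the terminal exons; internal exon/intron structure is left untouched.
    """
    gene_start = min(exons[0][0] for exons in transcripts.values())
    gene_end = max(exons[-1][1] for exons in transcripts.values())
    padded: Dict[str, List[Exon]] = {}
    for name, exons in transcripts.items():
        new_exons = list(exons)
        # Extend the first exon to the gene start.
        new_exons[0] = (gene_start, new_exons[0][1])
        # Extend the last exon to the gene end (same exon if single-exon).
        new_exons[-1] = (new_exons[-1][0], gene_end)
        padded[name] = new_exons
    return padded


# Hypothetical gene with three isoforms that differ at their ends.
example = {
    "t1": [(100, 200), (300, 400)],
    "t2": [(150, 220), (300, 380)],
    "t3": [(120, 390)],
}
padded_example = pad_transcripts(example)
for name, exons in padded_example.items():
    print(name, exons)
```

After padding, isoforms differ only in their internal splicing pattern, so differences in read assignment between them are not driven by the length of their terminal exons.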
- At each stage the spliced proportions from HR RT-PCR were compared to the spliced proportions of the same AS event(s) derived from the Transcripts Per Million. - Fig. 1 BaRTv1.0 assembly and validation pipeline. - Steps in construction and validation of BaRTv1.0 and programs used in each step (right hand side). - (b) the number of HR RT-PCR products that match transcripts. - (c) correlation of the proportions of transcripts in 86 AS events derived from HR RT-PCR and the RNA-seq data using the different assemblies as reference for transcript quantification by Salmon. - Previous studies in Arabidopsis and human RNA-seq analysis showed that variation in the 5′ and 3′ ends of assembled transcript isoforms of the same gene affected accuracy of transcript quantification. - and 3′ ends of the longest gene transcript [61, 63]. - We similarly modified BaRTv1.0 to produce transcripts of each gene with the same 5′ and 3′ ends to generate BaRTv1.0-QUASI specifically for transcript and AS quantification. - BaRTv1.0 represents an improved barley transcript dataset. - The barley cv. - The quality control filters used in the construction of BaRTv1.0 aimed to reduce the number of transcript fragments and redundancy as these negatively impact the accuracy of transcript quantification [61]. - The BaRTv1.0 and HORVU datasets were directly compared with the numbers of complete Haruna nijo fl cDNAs and correlating the proportions of AS transcript variants measured by HR RT-PCR with those derived from the RNA-seq analysis (Additional file 1: Table S4). - The BaRTv1.0 transcript dataset identified more of the experimentally determined HR RT-PCR products (220 versus 191) and has higher Pearson and Spearman correlation coefficient (r) with quantification of the. - For the AS events detected in BaRTv1.0 and HORVU, we plotted the percentage spliced in (PSI) values (the fraction of mRNAs that represent the isoform that includes most exon sequence.
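The PSI definition above can be made concrete with a short calculation from transcript abundances. The following is a minimal sketch, not the published pipeline: it assumes PSI for an event is the summed TPM of the isoforms carrying the longer (most exon sequence) form divided by the summed TPM of all isoforms covering the event. The function name, transcript IDs and TPM values are hypothetical.

```python
import math
from typing import Dict, Iterable


def psi_from_tpm(
    tpm: Dict[str, float],
    inclusion_ids: Iterable[str],
    event_ids: Iterable[str],
) -> float:
    """Percentage spliced in, as a fraction between 0 and 1.

    inclusion_ids: transcripts containing the form with most exon sequence.
    event_ids: all transcripts that cover the AS event (both forms).
    Returns NaN when no transcript covering the event is expressed.
    """
    event_total = sum(tpm.get(t, 0.0) for t in event_ids)
    if event_total == 0:
        return math.nan
    inclusion_total = sum(tpm.get(t, 0.0) for t in inclusion_ids)
    return inclusion_total / event_total


# Hypothetical Salmon-style TPM values for three isoforms of one gene.
tpm_values = {"gene1.1": 30.0, "gene1.2": 10.0, "gene1.3": 10.0}
value = psi_from_tpm(
    tpm_values,
    inclusion_ids=["gene1.1", "gene1.3"],
    event_ids=["gene1.1", "gene1.2", "gene1.3"],
)
print(f"PSI = {value:.2f}")
```

Summing over several transcripts matters because more than one assembled transcript can carry the same form of a given event.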
- Pearson and Spearman ranked correlation (r) of the AS proportion values showed an improvement when comparing the HR RT-PCR with the three RNA-seq reference transcript datasets, HORVU (0.769 and 0.768), BaRTv1.0 (0.793 and 0.795) and BaRTv1.0-QUASI (0.828 and 0.83) (Table 1. - We conclude that BaRTv1.0 (and the derived BaRTv1.0-QUASI) RTD is a comprehensive, non-redundant dataset suitable for differential gene expression and AS analyses. - BaRTv1.0 genes and transcripts. - We next explored the characteristics of BaRTv1.0 genes and transcripts. - A total of 57% of the BaRTv1.0 genes. - Analysis of the 177,240 predicted transcripts in BaRTv1.0 showed the expected distribution of canonical splice site dinucleotides. - Frequencies of the different AS events were consistent with studies in other plant - Table 1 Transcriptome dataset comparisons with HR RT-PCR and Haruna nijo fl cDNAs. - Transcriptome Version BaRTv1.0 BaRTv1.0-QUASI HORVU. - HR RT-PCR products. - Fig. 3 Correlation of alternative splicing from HR RT-PCR and RNA-seq. - fluorescence units from HR RT-PCR and transcript abundances (TPM) from RNA-seq data quantified with Salmon using the (a) BaRTv1.0, (b) HORVU and (c) BaRTv1.0-QUASI transcript datasets as reference. - Of the alternative 3′. - We used RNA-seq data from three biological repeats of five organs/tissues of Morex to quantify transcripts with Salmon and BaRTv1.0-QUASI. - transcripts of the gene [10]. - Validation of differential AS from RNA-seq with HR RT-PCR and RNA-seq. - To validate differential AS observed for individual genes among the different organs/tissues, we compared the RNA-seq quantifications of the 86 AS genes and 220 transcripts used in HR RT-PCR. - Each of these examples shows that the pattern of AS across the tissues is essentially equivalent between HR RT-PCR and RNA-seq (Fig. - Thus, there is good agreement between the differential alternative splicing analysis from the RNA-seq data and the experimental verification with HR RT-PCR.
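The Pearson and Spearman comparison of splicing proportions reported above can be illustrated with a dependency-free sketch. This is an assumption-laden example rather than the authors' analysis code: the proportion values below are invented for illustration and are not data from the paper, and the function names are hypothetical. Spearman is computed as the Pearson correlation of average ranks.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented splicing proportions for the same AS events from two methods.
rt_pcr = [0.10, 0.40, 0.55, 0.90, 0.25]
rna_seq = [0.15, 0.35, 0.60, 0.80, 0.30]
print(f"Pearson r  = {pearson(rt_pcr, rna_seq):.3f}")
print(f"Spearman r = {spearman(rt_pcr, rna_seq):.3f}")
```

Reporting both coefficients is useful here because Pearson is sensitive to the actual proportion values, while Spearman only checks that the two methods order the events the same way.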
- These data provide strong support for the value of using BaRTv1.0 and BaRTv1.0-. - A principal aim of establishing BaRTv1.0 was to achieve higher accuracy of differential expression and AS analysis in barley RNA-seq datasets by improved transcript quantification. - 344 k) was approximately halved in BaRTv1.0 (ca. - BART1_0-u51812 contains 44 different transcript isoforms in the BaRTv1.0 dataset due to unique combinations of different AS events (Fig. - Fig. 5 Comparison of alternative splicing in different barley tissues with HR RT-PCR and RNA-seq data. - splice sites and two alternative exons from the BaRTv1.0 transcripts (Fig. - These AS events were also quantified using transcript abundances from the RNA-seq data using BaRTv1.0-QUASI and showed good agreement with the HR RT-PCR results with Pearson correlations of 0.92 for the Hv78 regions and 0.73 for the Hv79 region. - These examples support the accuracy of alternative splicing found in BaRTv1.0 and that the proportions of alternative splice sites selected in short-read RNA-seq can be determined. - Here we describe the BaRTv1.0 transcript dataset or transcriptome for barley, produced by merging and filtering transcripts assembled from extensive RNA-seq data and its utility in differential expression and differential alternative splicing. - Finally, the BaRTv1.0 transcript dataset will enable accurate gene and transcript level expression and AS analysis increasing our understanding of the full impact of AS and how transcriptional and AS regulation of expression. - However, the arrangement of BaRTv1.0 transcripts has identified mis-annotated chimeric genes in the barley reference genome, helping to improve gene resolution. - BaRTv1.0 was established using RNA-seq data containing approximately 19 billion reads from a range of different biological samples (organs, tissues, treatments and genotypes) and was assembled initially against the Morex genome.
- A key function of the BaRTv1.0 transcript dataset is improved accuracy of transcript abundance. - We also found an improvement in the quantification of transcripts and splicing proportions by applying the same approach to produce the BaRTv1.0-QUASI version, specifically for quantification of alternatively spliced isoforms (Table 1). - To demonstrate the value of the new RTD for gene expression studies and AS analysis, we used BaRTv1.0-QUASI to quantify transcripts in the five developmental organs and tissues RNA-seq datasets that we had used previously for HR RT-PCR optimisation and validation. - BART1_0-u51812 transcript models represented in the BaRTv1.0 database. - AS events involving intron 2 validated by HR RT-PCR. - AS events between exon 6 and 8 validated by HR RT-PCR. - Electropherogram output from the ABI3730 shows the HR RT-PCR products (x-axis RT-PCR products (bp). - indicates minor alternative transcripts identified in HR RT-PCR and in RNA-seq. - indicates an uncharacterised alternative transcript identified in HR RT-PCR. - BaRTv1.0 enables rapid and robust analysis of gene expression and AS in a wide range of experimental scenarios. - BaRTv1.0 is based on cv. Morex but used RNA-seq data from a wide range of cultivars and lines. - A comprehensive, non-redundant barley reference transcript dataset called BaRTv1.0 has been generated, which enables fast, precise transcript abundances. - BaRTv1.0 is part of a unique pipeline that facilitates the robust routine analysis of barley gene expression and AS. - Selected RNA-seq datasets and data processing. - to each RNA-seq transcriptome assembly generated. - High resolution RT-PCR. - Morex was used for HR RT-PCR validation [35]. - Comparing HR RT-PCR and RNA-seq alternative splicing proportions.
- To assess the accuracy of BaRTv1.0 to detect changes in AS in the RNA-seq data, we compared the splicing proportions for AS events from HR RT-PCR with those calculated from the RNA-seq data using the HORVU transcript set, BaRTv1.0 and BaRTv1.0-QUASI as transcript references. - For this reason, multiple RNA-seq transcripts may represent the same AS product that is detected by HR RT-PCR. - The proportions of the different AS products for both HR RT-PCR and RNA-seq were then subsequently calculated and correlated. - PCR and RNA-seq were identified. - Finally, based on the calculated values of RNA-seq levels of expression and the calculated values of HR RT-PCR for each RT-PCR product, the proportions of the alternative transcripts were calculated. - Generation of the BaRTv1.0 database. - Statistical analysis HR RT-PCR ANOVA. - Mean proportions of alternatively spliced products by HR RT-PCR analysis. - Correlation of HR RT-PCR data with BaRTv1.0, BaRTv1.0-QUASI and HORVU transcripts. - Pipeline describing the algorithm to compare HR RT-PCR and RNA-seq alternatively spliced transcript proportions and correlations. - HR RT-PCR: High resolution RT-PCR. - RNA-seq: RNA-sequencing. - PR-F and MB assembled the RNA-seq data. - JF, GS and CS identified the AS genes and performed the HR RT-PCR screening and analysis of the data. - PR-F, C-DM, JWSB, RZ, WG and CS performed the detailed analysis of the RNA-seq and HR RT-PCR data. - BaRTv1.0 and BaRTv1.0-QUASI are available as .fasta and. - To develop BaRTv1.0 we used publicly available sequences from the Sequence Read Archive (SRA) or European Nucleotide Archive (ENA) (accession numbers: PRJEB13621. - Near-optimal probabilistic RNA-seq quantification. - Optimizing RNA-Seq mapping with STAR. - STAR: ultrafast universal RNA-seq aligner. - Systematic evaluation of spliced alignment programs for RNA-seq data. - A physical, genetic and functional sequence assembly of the barley genome.
- Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. - A chromosome conformation capture ordered sequence of the barley genome. - Expansion of the eukaryotic proteome by alternative splicing. - Complexity of the alternative splicing landscape in plants