Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies
- Although the utility and importance of RNA-seq have grown, uncertainties regarding the proper analysis of RNA-seq data remain.
- In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives, RLE (relative log expression), TMM (trimmed mean of M-values) and UQ (upper quartile normalization), in the analysis of RNA-seq data.
- We evaluated the performance of these methods for gene-level differential expression analysis by considering factors including: 1) normalization combined with the choice of a Wald test from DESeq2 and an exact test/QL (quasi-likelihood) F-test from edgeR.
- Results: Using the MAQC RNA-seq datasets with a small number of replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis.
- Conclusion: We found the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice for controlling false positives when the sample size is small.
- Unlike cDNA microarray technology, RNA-seq has wide applications for the identification of novel genes or transcripts, mutations, gene editing and differential gene expression [1, 3–7].
- Recent clinical studies demonstrated the utility of RNA-seq in identifying complex disease signatures via transcriptome analysis [8, 9].
- Despite this utility and importance, optimal methods for analyzing RNA-seq data remain uncertain.
- For each sample in an RNA-seq experiment, millions of reads with a desired read length are mapped to a reference genome by alignment tools such as Bowtie2/.
- Thus, normalization and proper test statistics are critical steps in the analysis of RNA-seq data [15].
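As a rough illustration of what a per-sample scaling normalization computes, the sketch below derives RLE-style size factors with the median-of-ratios idea: each sample is compared with a pseudo-reference built from per-gene geometric means. It is a simplified, standard-library-only sketch and not the DESeq2 or edgeR implementation; the function name `rle_size_factors` and the toy counts are made up for illustration.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of per-sample read counts.
    Returns one scaling factor per sample.
    """
    n_samples = len(counts[0])
    # Keep genes with a positive count in every sample so the logs are finite.
    kept = [row for row in counts if all(c > 0 for c in row)]
    # Log geometric mean of each kept gene acts as a pseudo-reference sample.
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in kept]
    # Size factor of sample j = median over genes of (count / reference).
    return [
        math.exp(median(math.log(row[j]) - ref for row, ref in zip(kept, log_ref)))
        for j in range(n_samples)
    ]


# Toy data (hypothetical): sample 2 is sequenced about twice as deeply.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = rle_size_factors(toy_counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
```

In the toy matrix the second size factor is twice the first, so the scaled counts of the retained genes agree between the two samples. The gene with a zero count is excluded from the factor estimate but is still scaled.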
- Normalization of RNA-seq read counts is an essential procedure that corrects for non-biological variation of samples due to library preparation, sequencing read depth, gene length, mapping bias and other technical issues [16–20].
- One is called RNA-seq by Expectation-Maximization using a directed graph model (RSEM) [30].
- abundance estimation using k-mers to index and count RNA-seq reads [31].
- A recent study using RNA-seq time course data found that DESeq2 and edgeR with a pairwise comparison outperformed TC tools for short time course (<.
- In this study, the effects of the Wald test/DESeq2, exact test/QL F-test from.
- The numbers of true positive (TP) and false positive (FP) genes were calculated from the DEGs identified in the MAQC RNA-seq data at a nominal FDR cutoff of 0.05, and the total numbers of TPs and true negatives (TNs) were based on qRT-PCR data.
- Although a t-test is not commonly used for DEG analysis in RNA-seq studies because the read counts follow a negative binomial distribution [26, 49], the voom-limma package has recently been proposed [29] and was reported to have good control of FDR but low power for small sample sizes [36, 37].
- Overall, for this comparison of the four test statistics (the exact test, QL F-test, Wald test and t-test), the results from MAQC2 data demonstrated that UQ-pgQ2 and TMM combined with an exact test/Wald test performed much better than a QL F-test or t-test in terms of sensitivity/.
- Briefly, UQ-pgQ2 with an exact test was the best choice and achieved the highest specificity among the four normalization methods for all four test statistics.
- First, using an exact test/QL F-test, we found that the FPR in Fig.
- Second, we found that the exact test at a sample size of five can achieve smaller FPRs than a Wald test for all the methods (RLE in pink, TMM in green, UQ in blue and UQ-pgQ2 in purple).
- This suggests that when the sample size is small, an exact test is more conservative than a Wald test.
- 15), a Wald test for RLE, TMM and UQ is more conservative than choosing the exact test or QL F-test.
- Next, we examined whether the read depth in an RNA-seq study affects the number of FPs for the normalization and test statistical methods given a desired sample size.
- UQ-pgQ2 (purple) performed the best with the lowest FPR while TMM (green) performed the worst with the largest FPR using an exact test.
- Illustrated are the fractions of FPs estimated from the RLE, TMM, UQ and UQ-pgQ2 normalization with the exact test, QL F-test or Wald test for sample sizes of and 40.
- while the exact test has more power than others.
- However, given a sample size of five, the Wald test can identify more DEGs than the exact test and QL F-test.
- However, given a sample size of five, the number of DEGs detected from UQ-pgQ2 combined with the exact.
- Illustrated are the fractions of FPs estimated from the exact test/QL F-test and Wald test combined with the RLE, TMM and UQ-pgQ2 normalization methods, based on the read depths from 19 to 157 million (a-c), from 12.8 to 104 million (d-f) and from 9.6 to 78.6 million (g-i).
- Illustrated are the fractions of FPs estimated from the RLE, TMM, UQ and UQ-pgQ2 normalization methods using the exact test, QL F-test or Wald test for sample sizes of and 40.
- Currently, common sample sizes in RNA-seq studies can range from a minimum of 3 up to several hundred biological replicates.
- Recently, edgeR provided a QL F-test which was recommended for studies with a small number of replicates in RNA-seq data.
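The FPR, sensitivity and specificity figures compared throughout can be reproduced from a set of called genes and a truth set (qRT-PCR for the MAQC data, the known design for simulations). The sketch below assumes the usual confusion-matrix definitions and assumes that the nominal FDR cutoff of 0.05 is applied to Benjamini-Hochberg adjusted p-values; the helper names and toy values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classification_rates(called, truth):
    """called: set of gene IDs called DE; truth: dict gene ID -> truly DE?"""
    tp = sum(1 for g, de in truth.items() if de and g in called)
    fn = sum(1 for g, de in truth.items() if de and g not in called)
    fp = sum(1 for g, de in truth.items() if not de and g in called)
    tn = sum(1 for g, de in truth.items() if not de and g not in called)

    def ratio(num, den):
        # Undefined rates (empty class) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "fpr": ratio(fp, fp + tn),          # FP / (FP + TN) = 1 - specificity
    }


# Toy example (hypothetical p-values and truth labels).
pvals = {"g1": 0.01, "g2": 0.04, "g3": 0.03, "g4": 0.5}
truth = {"g1": True, "g2": True, "g3": False, "g4": False}
genes = list(pvals)
adjusted = bh_adjust([pvals[g] for g in genes])
called = {g for g, q in zip(genes, adjusted) if q < 0.05}
rates = classification_rates(called, truth)
```

With these toy values only `g1` survives the adjusted cutoff, which illustrates how a conservative procedure trades sensitivity for a lower FPR.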
- To address these issues, we focused on four normalization methods and three test statistics using both real RNA-seq datasets and simulated data given sample sizes at and 40.
- Analysis was accomplished using the exact test (a, b), QL F-test (c, d) and Wald test (e, f) listed.
- show that UQ-pgQ2 combined with an exact test can achieve the highest specificity with the sensitivity higher than 90%.
- However, when the sample size is large, UQ-pgQ2 combined with the QL F-test performs the best and the Wald test performs much better than an exact test and QL F-test with FPR below 0.01.
- Furthermore, comparing DEG analysis of BC and AdLC suggests that RLE, TMM and UQ combined with an exact test or a Wald test have higher sensitivity or power than the UQ-pgQ2 method.
- Furthermore, it is important to note that the evaluated methods may not be applicable to all types of RNA-seq data.
- A few studies have compared DEG analysis tools for scRNA-seq data and found that existing methods for analysis of bulk RNA-seq data perform as well as, or not worse than, those specifically developed for scRNA-seq data in terms of power and FDR [52, 61].
- However, this limitation can be offset by the two benchmark MAQC RNA-seq datasets.
- Finally, since voom-limma with a t-test is used for DEG analysis in bulk RNA-seq data, we note that per-gene normalization in UQ-pgQ2 would not alter the DEG results, because the t-test is invariant to linear transformations of gene counts across samples.
- Taken together, we found the UQ-pgQ2 method with an exact test is the best choice for DEG analysis in terms of controlling false positives when using the benchmark MAQC datasets.
- We observed that the RLE, TMM and UQ normalization methods combined with the Wald or exact test/QL F-test performed similarly and that read depth has minimal impact on detection of DEGs in the analysis of simulated data.
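To make the exact test behind these conclusions more concrete, here is a minimal sketch of a negative binomial exact test for one gene in a balanced two-group design. It conditions on the observed total count and sums the probabilities of all splits that are no more likely than the observed one. It assumes equal library sizes, a known dispersion and a plug-in null mean, all of which are simplifications; it is not the edgeR or DESeq code, and `nb_exact_test` and its parameters are illustrative names.

```python
import math


def nb_log_pmf(k, mean, size):
    """Log pmf of a negative binomial with the given mean and size (1/dispersion)."""
    return (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mean))
        + k * math.log(mean / (size + mean))
    )


def nb_exact_test(y_a, y_b, n_per_group, dispersion):
    """Two-sided exact p-value for group totals y_a and y_b.

    Assumptions of this sketch: equal library sizes, n_per_group replicates
    in each group, and a known common dispersion. Under the null, each group
    total is NB with mean (y_a + y_b) / 2 and size n_per_group / dispersion.
    """
    total = y_a + y_b
    if total == 0:
        return 1.0
    mean = total / 2.0
    size = n_per_group / dispersion
    # Log probability of every split (a, total - a) of the observed total.
    log_p = [
        nb_log_pmf(a, mean, size) + nb_log_pmf(total - a, mean, size)
        for a in range(total + 1)
    ]
    peak = max(log_p)
    probs = [math.exp(lp - peak) for lp in log_p]  # rescaled for stability
    # Small relative tolerance so ties with the observed split are included.
    cutoff = probs[y_a] * (1.0 + 1e-9)
    return sum(p for p in probs if p <= cutoff) / sum(probs)


# Toy values (hypothetical): five replicates per group, dispersion 0.1.
p_balanced = nb_exact_test(50, 50, 5, 0.1)
p_moderate = nb_exact_test(40, 60, 5, 0.1)
p_extreme = nb_exact_test(5, 95, 5, 0.1)
```

An evenly split total gives a p-value of 1, and the p-value shrinks as the split between the two groups becomes more lopsided.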
- MAQC2 RNA-seq data contains two replicates in each condition (hbr1, hbr2, uhr1 and uhr2).
- MAQC3 RNA-seq data contains five replicates in each condition.
- Human cancer RNA-seq datasets from TCGA.
- Normalization methods.
- Four methods (RLE, TMM, UQ and UQ-pgQ2) were used to normalize RNA-seq data.
- In this study, we used the exact test and QL F-test implemented in edgeR (v .
- reported to improve the sensitivity compared with an exact test implemented in DESeq [25].
- Exact test in a NB distribution.
- Since RNA-seq data are read counts, an exact test has been implemented similarly in DESeq and edgeR [26, 68].
- Thus, the p-value from an exact test [26] is calculated by summation of the probability of a pair of P(a, b) that is less than or equal to the observed P(y_A, y_B) given that the overall summation of P(a, b).
- Description of normalization Distribution Exact test Wald test.
- RNA-seq: High-throughput RNA sequencing.
- RSEM: RNA-seq by expectation-maximization.
- RNA-Seq: a revolutionary tool for transcriptomics.
- RNA-seq: from technology to biology.
- De novo assembly and analysis of RNA-seq data.
- New gene models and alternative splicing in the maize pathogen Colletotrichum graminicola revealed by RNA-Seq analysis.
- SNP discovery in the bovine milk transcriptome using RNA-Seq technology.
- Reliable identification of genomic variants from RNA-seq data.
- DEGseq: an R package for identifying differentially expressed genes from RNA-seq data.
- ultrafast universal RNA-seq aligner.
- TopHat: discovering splice junctions with RNA-Seq.
- A scaling normalization method for differential expression analysis of RNA-seq data.
- A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data.
- Mapping and quantifying mammalian transcriptomes by RNA-Seq.
- RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.
- Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.
- Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.
- Normalization of RNA-seq data using factor analysis of control genes or samples.
- GC-content normalization for RNA-Seq data.
- voom: precision weights unlock linear model analysis tools for RNA-seq read counts.
- RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.
- Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.
- A combined approach with gene-wise normalization improves the analysis of RNA-seq data in human breast cancer subtypes.
- A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data.
- Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.
- Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.
- A comparison of methods for differential expression analysis of RNA-seq data.
- Comparison of software packages for detecting differential expression in RNA-seq studies.
- A comparative study of techniques for differential expression analysis on RNA-Seq data.
- Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster.
- Evaluation of methods for differential expression analysis on multi-group RNA-seq count data.
- In Papyro comparison of TMM (edgeR), RLE (DESeq2), and MRN normalization methods for a simple two-conditions-without-replicates RNA-Seq experimental design.
- RNA-Seq differential expression analysis: an extended review and a software tool.
- It's DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR.
- CEDER: accurate detection of differentially expressed genes by combining significance of exons using RNA-Seq.
- DEsingle for detecting three types of differential expression in single-cell RNA-seq data.
- Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model.
- Shrinkage estimation of dispersion in negative binomial models for RNA-seq experiments with small sample size.
- How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA