
Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies

- Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain.
- In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives including RLE (relative log estimate), TMM (Trimmed-mean M values) and UQ (upper quartile normalization) in the analysis of RNA-seq data.
- We evaluated the performance of these methods for gene-level differential expression analysis by considering the factors, including: 1) normalization combined with the choice of a Wald test from DESeq2 and an exact test/QL (Quasi-likelihood) F-Test from edgeR.
- Results: Using the MAQC RNA-seq datasets with small sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis.
- Conclusion: We found the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice in order to control false positives when the sample size is small.
- Unlike cDNA microarray technology, RNA-seq has wide applications for the identification of novel genes or transcripts, mutations, gene editing and differential gene expression [1, 3–7].
- Recent clinical studies demonstrated the utility of RNA-seq in identifying complex disease signatures via transcriptome analysis [8, 9].
- Despite this utility and importance, optimal methods for analyzing RNA-seq data remain uncertain.
- For each sample in an RNA-seq experiment, millions of reads with a desired read length are mapped to a reference genome by alignment tools such as Bowtie2/.
- Thus, normalization and proper test statistics are critical steps in the analysis of RNA-seq data [15].
- Normalization of RNA-seq read counts is an essential procedure that corrects for non-biological variation of samples due to library preparation, sequencing read depth, gene length, mapping bias and other technical issues [16–20].
- One is called RNA-seq by Expectation-Maximization using a directed graph model (RSEM) [30].
- abundance estimation using k-mers to index and count RNA-seq reads [31].
- A recent study using RNA-seq time course data found DESeq2 and edgeR with a pairwise comparison outperformed TC tools for short time course (<.
- In this study, the effects of the Wald test/DESeq2, exact test/QL F-test from.
- The numbers of true positive (TP) and false positive (FP) genes were calculated from the DEGs identified in the MAQC RNA-seq data at a nominal FDR cutoff of 0.05, and the total numbers of TPs and true negatives (TNs) were based on qRT-PCR data.
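
The TP/FP bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `confusion_metrics`, the gene identifiers and the toy truth set are all hypothetical, and the reference set simply stands in for the qRT-PCR calls.

```python
def confusion_metrics(called_de, truth_de, all_genes):
    """Compare DEG calls with a reference truth set (e.g. qRT-PCR calls).

    called_de : genes declared DE at the chosen FDR cutoff (hypothetical input)
    truth_de  : genes that are DE according to the reference assay
    all_genes : every gene that was assayed by both platforms
    """
    universe = set(all_genes)
    called = set(called_de) & universe
    truth = set(truth_de) & universe

    tp = len(called & truth)             # called DE and truly DE
    fp = len(called - truth)             # called DE but not truly DE
    fn = len(truth - called)             # truly DE but missed
    tn = len(universe - called - truth)  # correctly left uncalled

    nan = float("nan")
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "FPR": fp / (fp + tn) if fp + tn else nan,
    }


# Toy example with ten made-up gene IDs.
genes = [f"g{i}" for i in range(1, 11)]
metrics = confusion_metrics(
    called_de=["g1", "g2", "g3", "g5"],
    truth_de=["g1", "g2", "g3", "g4"],
    all_genes=genes,
)
```

Specificity and FPR sum to one by construction, so a method with the highest specificity is also the one with the lowest FPR.
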
- Although a t-test for DEGs analysis in RNA-seq studies is not commonly used due to the distribution of the read counts in RNA-seq data following a negative binomial [26, 49], the voom-limma package has been recently proposed [29] and was reported to have good control of FDR, but low power for small sample size [36, 37].
- Overall, for this comparison study of the four test statistics (the exact test/QL F-test, Wald test and t-test), the results from MAQC2 data demonstrated that UQ-pgQ2 and TMM combined with an exact test/Wald test performed much better than using a QL F-test and t-test in terms of sensitivity/.
- Briefly, UQ-pgQ2 with an exact test was the best choice and achieved the highest specificity among the four normalization methods for all four test statistics.
- First, using an exact test/QL F-test, we found that the FPR in Fig.
- Second, we found that the exact test at a sample size of five can achieve a smaller value of FPRs than a Wald test for all the methods (RLE in pink, TMM in green, UQ in blue and UQ-pgQ2 in purple).
- This suggests that when a sample size is small, an exact test is more conservative than a Wald test.
- 15), a Wald test for RLE, TMM and UQ is more conservative than choosing the exact test or QL F-test.
- Next, we examined whether the read depth in an RNA-seq study affects the number of FPs for the normalization and test statistical methods given a desired sample size.
- UQ-pgQ2 (purple) performed the best with the lowest FPR while TMM (green) performed the worst with the largest FPR using an exact test.
- Illustrated are the fractions of FPs estimated from the RLE, TMM, UQ and UQ-pgQ2 normalization with the exact test, QL F-test or Wald test for sample sizes of and 40.
- while the exact test has more power than others.
- However, given a sample size of five, the Wald test can identify more DEGs than the exact test and QL F-test.
- However, given a sample size of five, the number of DEGs detected from UQ-pgQ2 combined with the exact.
- Illustrated are the fractions of FPs estimated from the exact test/QL F-test and Wald test combined with the RLE, TMM and UQ-pgQ2 normalization methods, based on the read depths from 19 to 157 million (a-c), from 12.8 to 104 million (d-f) and from 9.6 to 78.6 million (g-i).
- Illustrated are the fractions of FPs estimated from the RLE, TMM, UQ and UQ-pgQ2 normalization methods using the exact test, QL F-test or Wald test for sample sizes of and 40.
- Currently, common sample sizes in RNA-seq studies can range from a minimum of 3 up to several hundreds of biological replicates.
- Recently, edgeR provided a QL F-test which was recommended for studies with a small number of replicates in RNA-seq data.
- To address these issues, we focused on four normalization methods and three test statistics using both real RNA-seq datasets and simulated data given sample sizes at and 40.
- Analysis was accomplished using the exact test (a, b), QL F-test (c, d) and Wald test (e, f) listed.
- show that UQ-pgQ2 combined with an exact test can achieve the highest specificity with the sensitivity higher than 90%.
- However, when the sample size is large, UQ-pgQ2 combined with the QL F-test performs the best and the Wald test performs much better than an exact test and QL F-test with FPR below 0.01.
- Furthermore, comparing DEG analysis of BC and AdLC suggests that the RLE, TMM and UQ combined with an exact test or a Wald test have higher sensitivity or power than the UQ-pgQ2 method.
- Furthermore, it is important to note that the evaluated methods may not be applicable to all types of RNA-seq data.
- A few studies have compared DEG analysis tools for scRNA-seq data and found that existing methods for analysis of bulk RNA-seq data perform as well as, or not worse than, those specifically developed for scRNA-seq data in terms of the power and FDR [52, 61].
- However, this limitation can be offset by the two benchmark MAQC RNA-seq data.
- Finally, because voom-limma applies a t-test to DEG analysis of bulk RNA-seq data, we note that per-gene normalization in UQ-pgQ2 would not alter the DEG results, since the t-test is invariant to linear transformations of gene counts across samples.
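
The invariance argument above can be checked numerically. The sketch below uses an ordinary pooled-variance two-sample t statistic as a stand-in (voom-limma itself uses a moderated t-test on log-scale values, which is not reproduced here); the function name `pooled_t` and the expression values are made up for illustration.

```python
import math
import statistics


def pooled_t(x, y):
    """Classical two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled_var * (1 / nx + 1 / ny)
    )


# Hypothetical expression values for one gene in two groups of three samples.
group_a = [5.1, 4.8, 5.5]
group_b = [6.9, 7.2, 7.4]

# A per-gene rescaling multiplies every sample of this gene by the same
# positive constant; on a log scale the same operation is a constant shift.
scale, shift = 100 / 6.0, 3.0
t_raw = pooled_t(group_a, group_b)
t_scaled = pooled_t([scale * v for v in group_a], [scale * v for v in group_b])
t_shifted = pooled_t([v + shift for v in group_a], [v + shift for v in group_b])
```

The three statistics agree up to floating-point error because the constant cancels between the difference of means and the standard error. A negative multiplier would flip the sign of the statistic, so the check is restricted to positive scaling.
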
- Taken together, we found the UQ-pgQ2 method with an exact test is the best choice for DEG analysis in terms of controlling false positives when using the benchmark MAQC datasets.
- We observed that the RLE, TMM and UQ normalization methods combined with the Wald or exact test/QL F-test performed similarly and read depths have minimal impact on detection of DEGs from the analysis of simulated data.
- MAQC2 RNA-seq data contains two replicates in each condition (hbr1, hbr2, uhr1 and uhr2).
- MAQC3 RNA-seq data contains five replicates in each condition.
- Human cancer RNA-seq datasets from TCGA.
- Normalization methods.
- Four methods (RLE, TMM, UQ and UQ-pgQ2) were used to normalize RNA-seq data.
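
As a rough illustration of what two of these global scaling methods compute, the sketch below derives per-sample factors for RLE (median of ratios to a per-gene geometric mean) and UQ (upper quartile of counts). It is a simplified stand-in, not the DESeq2 or edgeR implementation: the function names and the toy count matrix are hypothetical, and the TMM trimming and the per-gene step of UQ-pgQ2 are not reproduced.

```python
import math
import statistics


def rle_size_factors(counts):
    """Median-of-ratios factors; `counts` is a list of gene rows (genes x samples).

    Only genes with a positive count in every sample are used. At least one
    such gene is assumed to exist.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_ratios = [[] for _ in range(n_samples)]
    for row in usable:
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]


def uq_scale_factors(counts):
    """Upper-quartile factors, rescaled so that their geometric mean is 1.

    Genes with zero counts in every sample are dropped first. Each sample is
    assumed to have a positive upper quartile.
    """
    n_samples = len(counts[0])
    expressed = [row for row in counts if any(c > 0 for c in row)]
    upper_q = [
        statistics.quantiles([row[j] for row in expressed], n=4, method="inclusive")[2]
        for j in range(n_samples)
    ]
    log_mean = sum(math.log(q) for q in upper_q) / n_samples
    return [q / math.exp(log_mean) for q in upper_q]


# Toy matrix: sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 0],
    [30, 60],
]
rle = rle_size_factors(toy_counts)
uq = uq_scale_factors(toy_counts)
normalized = [[c / f for c, f in zip(row, rle)] for row in toy_counts]
```

On this toy matrix both methods recover the twofold depth difference, so dividing each column by its factor puts the two samples on the same scale.
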
- In this study, we used the exact test and QL F-test implemented in edgeR (v .
- reported to improve the sensitivity compared with an exact test implemented in DESeq [25].
- Exact test in a NB distribution.
- Since RNA-seq data are read counts, an exact test has been implemented similarly in DESeq and edgeR [26, 68].
- Thus, the p-value from an exact test [26] is calculated by summation of the probability of a pair of P(a, b) that is less than or equal to the observed P(y_A, y_B) given that the overall summation of P(a, b).
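
The summation described above can be sketched for a single gene as follows. This is a simplified illustration under stated assumptions, not the DESeq or edgeR implementation: both conditions are assumed to share one null mean `mu` and one dispersion `phi` (both supplied by the caller, with `mu > 0` and `phi > 0`), library sizes are taken as equal, and the function names are hypothetical.

```python
import math


def nb_logpmf(k, mu, phi):
    """Log-probability of count k under a negative binomial distribution
    with mean mu and dispersion phi (variance = mu + phi * mu**2)."""
    r = 1.0 / phi
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def nb_exact_test(y_a, y_b, mu, phi):
    """Two-sided exact p-value conditioned on the total y_a + y_b.

    Sums P(a, b) over all pairs with a + b equal to the observed total whose
    probability does not exceed that of the observed pair, then divides by
    the sum over all such pairs.
    """
    total_count = y_a + y_b
    log_p = [
        nb_logpmf(a, mu, phi) + nb_logpmf(total_count - a, mu, phi)
        for a in range(total_count + 1)
    ]
    peak = max(log_p)
    probs = [math.exp(lp - peak) for lp in log_p]  # rescaled for stability
    p_obs = probs[y_a]
    tail = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return tail / sum(probs)


# Hypothetical counts for one gene, with the pooled mean used as the null mean.
p_balanced = nb_exact_test(16, 16, mu=16.0, phi=0.1)
p_moderate = nb_exact_test(12, 20, mu=16.0, phi=0.1)
p_extreme = nb_exact_test(2, 30, mu=16.0, phi=0.1)
```

Working on the log scale and subtracting the largest term avoids underflow for large counts; the common rescaling cancels in the final ratio.
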
- Description of normalization Distribution Exact test Wald test.
- RNA-seq: High-throughput RNA sequencing.
- RSEM: RNA-seq by expectation-maximization.
- RNA-Seq: a revolutionary tool for transcriptomics.
- RNA-seq: from technology to biology.
- De novo assembly and analysis of RNA-seq data.
- New gene models and alternative splicing in the maize pathogen Colletotrichum graminicola revealed by RNA-Seq analysis.
- SNP discovery in the bovine milk transcriptome using RNA-Seq technology.
- Reliable identification of genomic variants from RNA-seq data.
- DEGseq: an R package for identifying differentially expressed genes from RNA-seq data.
- ultrafast universal RNA-seq aligner.
- TopHat: discovering splice junctions with RNA-Seq.
- A scaling normalization method for differential expression analysis of RNA-seq data.
- A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data.
- Mapping and quantifying mammalian transcriptomes by RNA-Seq.
- RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.
- Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.
- Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and cufflinks.
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.
- Normalization of RNA-seq data using factor analysis of control genes or samples.
- GC-content normalization for RNA-Seq data.
- voom: precision weights unlock linear model analysis tools for RNA-seq read counts.
- RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.
- Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.
- A combined approach with gene-wise normalization improves the analysis of RNA-seq data in human breast cancer subtypes.
- A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data.
- Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.
- Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.
- A comparison of methods for differential expression analysis of RNA-seq data.
- Comparison of software packages for detecting differential expression in RNA-seq studies.
- A comparative study of techniques for differential expression analysis on RNA-Seq data.
- Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster.
- Evaluation of methods for differential expression analysis on multi-group RNA-seq count data.
- In Papyro comparison of TMM (edgeR), RLE (DESeq2), and MRN normalization methods for a simple two-conditions-without-replicates RNA-Seq experimental design.
- RNA-Seq differential expression analysis: an extended review and a software tool.
- It's DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR.
- CEDER: accurate detection of differentially expressed genes by combining significance of exons using RNA-Seq.
- DEsingle for detecting three types of differential expression in single-cell RNA-seq data.
- Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model.
- Shrinkage estimation of dispersion in negative binomial models for RNA-seq experiments with small sample size.
- How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA
