
Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies


Summary

- Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain.
- In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives including RLE (relative log estimate), TMM (Trimmed-mean M values) and UQ (upper quartile normalization) in the analysis of RNA-seq data.
- We evaluated the performance of these methods for gene-level differential expression analysis by considering the factors, including: 1) normalization combined with the choice of a Wald test from DESeq2 and an exact test/QL (Quasi-likelihood) F-Test from edgeR.
- Results: Using the MAQC RNA-seq datasets with small sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis.
- Conclusion: We found the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice in order to control false positives when the sample size is small.
- Unlike cDNA microarray technology, RNA-seq has wide applications for the identification of novel genes or transcripts, mutations, gene editing and differential gene expression [1, 3–7].
- Recent clinical studies demonstrated the utility of RNA-seq in identifying complex disease signatures via transcriptome analysis [8, 9].
- Despite this utility and importance, optimal methods for analyzing RNA-seq data remain uncertain.
- For each sample in an RNA-seq experiment, millions of reads with a desired read length are mapped to a reference genome by alignment tools such as Bowtie2/.
- Thus, normalization and proper test statistics are critical steps in the analysis of RNA-seq data [15].
- Normalization of RNA-seq read counts is an essential procedure that corrects for non-biological variation of samples due to library preparation, sequencing read depth, gene length, mapping bias and other technical issues [16–20].
- One is called RNA-seq by Expectation-Maximization using a directed graph model (RSEM) [30].
- abundance estimation using k-mers to index and count RNA-seq reads [31].
- A recent study using RNA-seq time course data found DESeq2 and edgeR with a pairwise comparison outperformed TC tools for short time course (<.
- In this study, the effects of the Wald test/DESeq2, exact test/QL F-test from.
- The numbers of true positive (TP) and false positive (FP) genes were calculated from the DEGs identified in the MAQC RNA-seq data at a nominal FDR cutoff of 0.05, and the total numbers of TPs and true negatives (TNs) were based on qRT-PCR data.
- Although a t-test is not commonly used for DEG analysis in RNA-seq studies because the read counts follow a negative binomial distribution [26, 49], the voom-limma package has recently been proposed [29] and was reported to have good control of FDR but low power for small sample sizes [36, 37].
- Overall, for this comparison study of the four test statistics (the exact test/QL F-test, Wald test and t-test), the results from MAQC2 data demonstrated that UQ-pgQ2 and TMM combined with an exact test/Wald test performed much better than using a QL F-test and t-test in terms of sensitivity/.
- Briefly, UQ-pgQ2 with an exact test was the best choice and achieved the highest specificity among the four normalization methods for all four test statistics.
- First, using an exact test/QL F-test, we found that the FPR in Fig.
- Second, we found that the exact test at a sample size of five can achieve a smaller value of FPRs than a Wald test for all the methods (RLE in pink, TMM in green, UQ in blue and UQ-pgQ2 in purple).
- This suggests that when a sample size is small, an exact test is more conservative than a Wald test.
- 15), a Wald test for RLE, TMM and UQ is more conservative than choosing the exact test or QL F-test.
- Next, we examined whether the read depth in an RNA-seq study affects the number of FPs for the normalization methods and test statistics given a desired sample size.
- UQ-pgQ2 (purple) performed the best with the lowest FPR while TMM (green) performed the worst with the largest FPR using an exact test.
- Illustrated are the fractions of FPs estimated from the RLE, TMM, UQ and UQ-pgQ2 normalization with the exact test, QL F-test or Wald test for sample sizes of and 40.
- while the exact test has more power than others.
- However, given a sample size of five, the Wald test can identify more DEGs than the exact test and QL F-test.
- However, given a sample size of five, the number of DEGs detected from UQ-pgQ2 combined with the exact.
- Illustrated are the fractions of FPs estimated from the exact test/QL F-test and Wald test combined with the RLE, TMM and UQ-pgQ2 normalization methods, based on the read depths from 19 to 157 million (a-c), from 12.8 to 104 million (d-f) and from 9.6 to 78.6 million (g-i).
- Illustrated are the fractions of FPs estimated from the RLE, TMM, UQ and UQ-pgQ2 normalization methods using the exact test, QL F-test or Wald test for sample sizes of and 40.
- Currently, common sample sizes in RNA-seq studies can range from a minimum of 3 up to several hundreds of biological replicates.
- Recently, edgeR provided a QL F-test which was recommended for studies with a small number of replicates in RNA-seq data.
- To address these issues, we focused on four normalization methods and three test statistics using both real RNA-seq datasets and simulated data given sample sizes at and 40.
- Analysis was accomplished using the exact test (a, b), QL F-test (c, d) and Wald test (e, f) listed.
- show that UQ-pgQ2 combined with an exact test can achieve the highest specificity with the sensitivity higher than 90%.
- However, when the sample size is large, UQ-pgQ2 combined with the QL F-test performs the best and the Wald test performs much better than an exact test and QL F-test with FPR below 0.01.
- Furthermore, comparing DEG analysis of BC and AdLC suggests that the RLE, TMM and UQ combined with an exact test or a Wald test have higher sensitivity or power than the UQ-pgQ2 method.
- Furthermore, it is important to note that the evaluated methods may not be applicable to all types of RNA-seq data.
- A few studies have compared DEG analysis tools for scRNA-seq data and found that existing methods for analysis of bulk RNA-seq data perform as well as, or not worse than, those specifically developed for scRNA-seq data in terms of the power and FDR [52, 61].
- However, this limitation can be offset by the two benchmark MAQC RNA-seq data.
- Finally, since voom-limma with a t-test is used for DEG analysis in bulk RNA-seq data, we note that the per-gene normalization in UQ-pgQ2 would not alter the DEG results, because the t-test is invariant to linear transformations of gene counts across samples.
- Taken together, we found the UQ-pgQ2 method with an exact test is the best choice for DEG analysis in terms of controlling false positives when using the benchmark MAQC datasets.
- We observed that the RLE, TMM and UQ normalization methods combined with the Wald or exact test/QL F-test performed similarly and read depths have minimal impact on detection of DEGs from the analysis of simulated data.
- MAQC2 RNA-seq data contains two replicates in each condition (hbr1, hbr2, uhr1 and uhr2).
- MAQC3 RNA-seq data contains five replicates in each condition.
- Human cancer RNA-seq datasets from TCGA.
- Normalization methods.
- Four methods (RLE, TMM, UQ and UQ-pgQ2) were used to normalize RNA-seq data.
- In this study, we used the exact test and QL F-test implemented in edgeR (v .
- reported to improve the sensitivity compared with an exact test implemented in DESeq [25].
- Exact test in a NB distribution.
- Since RNA-seq data are read counts, an exact test has been implemented similarly in DESeq and edgeR [26, 68].
- Thus, the p-value from an exact test [26] is calculated by summation of the probability of a pair of P(a, b) that is less than or equal to the observed P(y_A, y_B) given that the overall summation of P(a, b).
- RNA-seq: High-throughput RNA sequencing.
- RSEM: RNA-seq by expectation-maximization.
- RNA-Seq: a revolutionary tool for transcriptomics.
- RNA-seq: from technology to biology.
- De novo assembly and analysis of RNA-seq data.
- New gene models and alternative splicing in the maize pathogen Colletotrichum graminicola revealed by RNA-Seq analysis.
- SNP discovery in the bovine milk transcriptome using RNA-Seq technology.
- Reliable identification of genomic variants from RNA-seq data.
- DEGseq: an R package for identifying differentially expressed genes from RNA-seq data.
- ultrafast universal RNA-seq aligner.
- TopHat: discovering splice junctions with RNA-Seq.
- A scaling normalization method for differential expression analysis of RNA-seq data.
- A comparison of per sample global scaling and per gene normalization methods for differential expression analysis of RNA-seq data.
- Mapping and quantifying mammalian transcriptomes by RNA-Seq.
- RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.
- Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.
- Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and cufflinks.
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.
- Normalization of RNA-seq data using factor analysis of control genes or samples.
- GC-content normalization for RNA-Seq data.
- voom: precision weights unlock linear model analysis tools for RNA-seq read counts.
- RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome.
- Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.
- A combined approach with gene-wise normalization improves the analysis of RNA-seq data in human breast cancer subtypes.
- A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data.
- Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.
- Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.
- A comparison of methods for differential expression analysis of RNA-seq data.
- Comparison of software packages for detecting differential expression in RNA-seq studies.
- A comparative study of techniques for differential expression analysis on RNA-Seq data.
- Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster.
- Evaluation of methods for differential expression analysis on multi-group RNA-seq count data.
- In Papyro comparison of TMM (edgeR), RLE (DESeq2), and MRN normalization methods for a simple two-conditions-without-replicates RNA-Seq experimental design.
- RNA-Seq differential expression analysis: an extended review and a software tool.
- It's DE-licious: a recipe for differential expression analyses of RNA-seq experiments using quasi-likelihood methods in edgeR.
- CEDER: accurate detection of differentially expressed genes by combining significance of exons using RNA-Seq.
- DEsingle for detecting three types of differential expression in single-cell RNA-seq data.
- Sample size calculations for the differential expression analysis of RNA-seq data using a negative binomial regression model.
- Shrinkage estimation of dispersion in negative binomial models for RNA-seq experiments with small sample size.
- How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA
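
The summary evaluates each method by counting true positives (TPs) and false positives (FPs) among the called DEGs against a qRT-PCR reference, and reports sensitivity, specificity and FPR. A minimal sketch of that bookkeeping follows. The function name `classification_rates`, the gene IDs and the toy truth table are hypothetical and are not taken from the study; only the standard definitions of the rates are assumed.

```python
def classification_rates(called_de, truly_de):
    """Compare DEG calls with a reference truth table.

    called_de: set of gene IDs declared differentially expressed
               (for example, genes passing an FDR cutoff of 0.05).
    truly_de:  dict mapping gene ID -> True if the reference (such as
               qRT-PCR) says the gene is DE, False otherwise.
    """
    tp = fp = fn = tn = 0
    for gene, is_de in truly_de.items():
        called = gene in called_de
        if is_de and called:
            tp += 1
        elif is_de and not called:
            fn += 1
        elif not is_de and called:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        # sensitivity (power): fraction of truly DE genes that were called
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # specificity: fraction of non-DE genes that were left uncalled
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # false positive rate = 1 - specificity
        "FPR": fp / (fp + tn) if fp + tn else nan,
    }


# Toy data (hypothetical): three truly DE genes and two non-DE genes.
truth = {"g1": True, "g2": True, "g3": True, "g4": False, "g5": False}
calls = {"g1", "g2", "g4"}
rates = classification_rates(calls, truth)
```

Only genes present in the reference table are scored, which mirrors the restriction to genes with qRT-PCR measurements.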

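The exact test in a negative binomial (NB) distribution is described above as a sum over the pairs P(a, b) that are no more probable than the observed P(y_A, y_B). The sketch below illustrates that idea under strong simplifying assumptions that are not stated in the summary: one count (or group total) per condition, equal library sizes, a known dispersion, a common null mean estimated as half the total, and normalization by the sum of P(a, b) over all pairs with a + b equal to the observed total. The function names are hypothetical, and this is not the edgeR or DESeq implementation.

```python
from math import exp, lgamma, log


def nb_logpmf(k, mu, size):
    """Log-probability of count k under an NB with mean mu and size = 1/dispersion."""
    return (
        lgamma(k + size) - lgamma(size) - lgamma(k + 1)
        + size * log(size / (size + mu))
        + k * log(mu / (size + mu))
    )


def nb_exact_pvalue(y_a, y_b, dispersion):
    """Two-sided exact p-value, conditioning on the total count y_a + y_b."""
    total = y_a + y_b
    if total == 0:
        return 1.0  # no reads in either group, so no evidence of a difference
    size = 1.0 / dispersion
    mu = total / 2.0  # assumed common mean under the null (equal library sizes)
    # Log joint probability of every split (a, total - a) of the observed total.
    log_p = [
        nb_logpmf(a, mu, size) + nb_logpmf(total - a, mu, size)
        for a in range(total + 1)
    ]
    top = max(log_p)
    weights = [exp(lp - top) for lp in log_p]  # rescaled to avoid underflow
    observed = weights[y_a]
    # Sum the splits that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    as_extreme = sum(w for w in weights if w <= observed * (1.0 + 1e-9))
    return as_extreme / sum(weights)


p_balanced = nb_exact_pvalue(20, 20, dispersion=0.1)
p_skewed = nb_exact_pvalue(0, 40, dispersion=0.1)
```

A balanced split gives a p-value of essentially 1, while a split that puts every read in one group gives a very small p-value.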
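
The summary notes that a t-test is invariant to linear transformations of a gene's values across samples, which is why per-gene scaling would not change t-test results. The sketch below checks this property numerically with a plain Welch t statistic. It illustrates the invariance only and does not reproduce the moderated t-test in voom-limma; the values, scale and shift are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic for two lists of values."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


# Hypothetical expression values for one gene in two groups of five samples.
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [20.0, 22.0, 19.0, 24.0, 21.0]

# Apply the same positive scale and shift to every sample of this gene.
scale, shift = 3.5, 2.0
scaled_a = [scale * v + shift for v in group_a]
scaled_b = [scale * v + shift for v in group_b]

t_raw = welch_t(group_a, group_b)
t_scaled = welch_t(scaled_a, scaled_b)
```

The scale multiplies both the mean difference and the standard error, and the shift cancels in both, so the statistic is unchanged up to floating-point error.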