
Custom selected reference genes outperform pre-defined reference genes in transcriptomic analysis



- The use of internal control genes or spike-ins is advocated in the literature for scaling read counts, but the methods for choosing reference genes are mostly targeted at RT-qPCR studies and require a set of pre-selected candidate controls or pre-selected target genes.
- We used this method to pick custom reference genes for the differential expression analysis of three transcriptome sets from transgenic Arabidopsis plants expressing heterologous fungal effector proteins tagged with GFP (using GFP alone as the control).
- The custom reference genes showed lower covariance and fold change as well as a broader range of expression levels than commonly used reference genes.
- When analyzed with NormFinder, both typical and custom reference genes were considered suitable internal controls, but the custom selected genes were more stably expressed.
- geNorm produced a similar result in which most custom selected genes ranked higher (i.e. were more stably expressed) than commonly used reference genes.
- 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0).
- The comparison of different software tools for RNAseq analysis is a recurrent subject in the literature [12–14], and many authors argue over the benefits of using housekeeping genes or spike-in controls to scale the count data, yet the evaluation of the reference genes used for RNAseq data analysis is not as common.
- The most frequent approach is to take previously identified stably expressed genes, as done by B Zhuo, S Emerson, JH Chang and Y Di [11]; this, however, does not ensure that the selected genes will show stable expression in the studied organism and conditions.
- Here we propose a simple and fast method to identify the most stably expressed genes for each experimental condition.
- Our method is aimed at differential expression studies and represents a simple way to select custom reference genes for any species or any type of experiment, so that they can be used in the normalization step of differential expression analysis algorithms; it does not require spike-ins.
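To make the normalization step concrete, here is a minimal sketch of median-of-ratios size factors computed from reference genes only. It assumes raw counts are held in a plain dictionary; the function name `size_factors` and the gene IDs are hypothetical, and this is not the authors' code. DESeq2, which the article uses for differential expression, exposes a comparable option through the `controlGenes` argument of `estimateSizeFactors`, but this summary does not state how the authors invoked it.

```python
import math
from statistics import median


def size_factors(counts, reference_genes):
    """Median-of-ratios size factors computed on reference genes only.

    counts: dict mapping gene ID -> list of raw read counts (one per sample).
    reference_genes: iterable of gene IDs to use as internal controls.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene in reference_genes:
        row = counts[gene]
        # A gene with a zero count has no finite geometric mean; skip it.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no usable reference genes")
    # One factor per sample: the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (hypothetical): two reference genes and one gene of interest.
counts = {
    "refA": [100, 200, 400],
    "refB": [10, 20, 40],
    "target": [50, 300, 200],
}
factors = size_factors(counts, ["refA", "refB"])
normalized_target = [c / f for c, f in zip(counts["target"], factors)]
```

In this toy example both reference genes double from one sample to the next, so the factors scale the samples back onto a common depth before the target gene is compared.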
- We tested the normalization of our RNAseq data using two sets of reference genes: commonly used reference genes (Table 1) and the 104 stably expressed Arabidopsis genes proposed by B Zhuo, S Emerson, JH Chang and Y Di [11].
- The first set of reference genes was assessed for stability in three different permutations of the transcriptome sets as shown in Fig.
- For the three permutations of the transcriptome sets, important fluctuations in the covariance were observed, ranging from 2.9 to 49% (Fig.
- These results demonstrate that neither the commonly used reference genes nor the 104 reference genes proposed by B Zhuo, S Emerson, JH Chang and Y Di [11] were stably expressed in our conditions.
- In order to search for more stably expressed genes, we developed a custom method to select reference genes using only one’s own RNAseq data.
- Table 1: Common reference genes used in this study for comparison against custom selected reference genes.
- We then used the [...] remaining genes with lowest covariance were selected as reference genes (R-package “CustomSelection”).
- [...] Fig. 2 the average expression in log2 TPM and covariance of the common reference genes (Common), the set of 30 genes from T Czechowski, M Stitt, T Altmann, MK Udvardi and W-R Scheible [26] (Czechowski et al. 2005), the set of 104 genes from B Zhuo, S Emerson, JH Chang and Y Di [11].
- In all pairings the custom selected reference genes show.
- We can see that the set of genes selected with the custom script shows lower fold change in all cases.
- As shown in Table 2, in all the permutations the analysis without reference genes gives a higher number of up-regulated genes and a lower number of down-regulated genes than the analyses that use any of the reference sets, possibly indicating a shift to downregulation that is not detected without reference genes.
- To further test the stability of the custom reference genes in our experiment, we used NormFinder [24] and geNorm [23] to compare the four sets of reference genes using log2-transformed TPM values.
- Fig. 1: Evaluation of covariance distribution in the three transcriptome data sets (a) among a set of 14 commonly used reference genes and (b) a set of 104 reference genes proposed by B Zhuo, S Emerson, JH Chang and Y Di [11].
- The complete result is presented in Tables S3-S5 of Additional file 2.
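For readers unfamiliar with the geNorm ranking, the following is a rough sketch of its stability measure M: for each gene, the average standard deviation of its log2 expression ratio against every other candidate gene across samples (lower M means more stable). It is an illustration of the published idea only, not the geNorm software itself; it omits geNorm's stepwise exclusion of the least stable gene, and the function name `genorm_m` and the toy values are hypothetical.

```python
from statistics import stdev


def genorm_m(log2_expr):
    """Return a geNorm-style stability measure M for each gene.

    log2_expr: dict mapping gene ID -> list of log2 expression values
    (for example log2 TPM), one per sample. Lower M means more stable.
    """
    genes = list(log2_expr)
    if len(genes) < 2:
        raise ValueError("need at least two genes")
    m_values = {}
    for g in genes:
        pairwise_sd = []
        for h in genes:
            if h == g:
                continue
            # Difference of log2 values = log2 ratio of the two genes per sample.
            diffs = [a - b for a, b in zip(log2_expr[g], log2_expr[h])]
            pairwise_sd.append(stdev(diffs))
        m_values[g] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Toy log2 TPM values (hypothetical): geneC fluctuates far more than the others.
log2_tpm = {
    "geneA": [5.0, 5.1, 4.9, 5.0],
    "geneB": [7.0, 7.1, 6.9, 7.0],
    "geneC": [3.0, 4.5, 2.0, 5.0],
}
m = genorm_m(log2_tpm)
ranking = sorted(m, key=m.get)  # most stable first
```

Because M is built from ratios between genes, two genes that rise and fall together score as stable relative to each other even when their absolute levels differ, which is why geneA and geneB rank ahead of geneC here.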
- [...] Fig. 4 the comparison of the set of common reference genes against the custom selected reference genes.
- The gene AT5G18800 (NDUFA8), which is in the set of common references, was selected by the custom script in all three permutations and is shown with a purple border.
- Both sets of genes (custom and common references) were under the stability threshold of NormFinder (0.5), meaning that the software considers them suitable reference genes; however, the custom selected genes (shown with a blue border) were more stable than the commonly used genes (shown in red, Fig.
- The use of reference genes in RNAseq studies is suggested in the literature [15–17], yet the methods for the selection of these genes are designed for qPCR data and require a set of pre-selected reference or target genes or the selection of conditions similar to that of one’s own experiment [22–25], which are not always available.
- For these reasons, we propose a new R-package which enables the selection of custom reference genes regardless of the organisms used or of the experimental conditions.
- Fig. 2: Comparison of the four sets of reference genes in relation to covariance level and log2 TPM for (a) Mlp37347 vs Control, (b) Mlp124499 vs Control and (c) Mlp124499 vs Mlp37347.
- We first assessed whether the most commonly used reference genes (Table 1) or two sets of published reference genes for Arabidopsis [11, 26] were indeed stably expressed in our experimental conditions.
- [...] Fig. 1 and Additional file 1, three sets of reference genes show a high level of covariance in our experimental conditions, indicating that they were not suitable reference genes for our differential expression analysis.
- Having a high level of variability in the expression of the reference genes results in skewed quantitative analysis and may cause the loss of some differentially.
- However, there is extensive overlap in the deregulated genes (up- and down-regulated as shown in Additional file 2:.
- This fact demonstrates that all three sets perform well in detecting deregulated genes; however, a reference gene set with lower covariance finds more deregulated genes (Additional file 2: Table S2 downregulated), since more subtle deregulation can be detected.
- Fig. 3: Comparison of the four sets of reference genes in relation to the distribution of log2 fold change by -log10 adjusted p-value for (a) Mlp37347 vs Control, (b) Mlp124499 vs Control and (c) Mlp124499 vs Mlp37347.
- Fig. 4: Comparison of custom selected reference genes (blue border) and commonly used reference genes (red border) with geNorm ranking, NormFinder stability index and covariance for (a) Mlp37347 vs Control, (b) Mlp124499 vs Control and (c) Mlp124499 vs Mlp37347.
- The bar with purple border indicates the gene (NDUFA8) selected with the custom script that is also present in the common references.
- Thus, to alleviate the bias inherent to the use of inappropriate reference genes, we devised an R-based pipeline to select custom reference genes for one’s own experimental data.
- [...] Figs. 2 and 3, in all the pairings of the data used, the custom selected reference genes outperformed the other sets of reference genes in their expression stability, presenting lower fold changes and lower covariances.
- Our method allows the selection of more stably expressed genes and of more genes as references (the final number is user defined, with the default setting being 0.5% of the expressed genes), giving more reference points, and hence more robustness, to the normalization of genes expressed at different levels.
- The advantage of having a user-defined threshold is that when there is extensive variation in the data, a stringent threshold may result in the selection of few or no genes as references.
- Our results show the need for a new R-based pipeline for the selection of custom reference genes in transcriptomic studies.
- This tool provides an alternative to spike-in controls and represents an improvement over pre-defined reference genes which may not be stably expressed in one’s own experimental conditions.
- (LEADING:4 TRAILING:4 SLIDINGWINDOW:4:20 MINLEN:20) and then the surviving paired reads were aligned to the TAIR10 assembly of the genome of A.
- The general information of the sequencing results and mapping data is presented in Additional file 2: Table S6; the dataset was deposited in NCBI under BioProject PRJNA528094.
- We considered as reference the 0.5% of the active genes with the lowest covariance (R package “CustomSelection” [29]).
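The selection rule described above can be sketched in a few lines. This is an illustrative approximation, not the CustomSelection package: it assumes that the "covariance" reported in percent is the coefficient of variation (standard deviation divided by mean), it replaces the DAFS-derived expression threshold with a fixed, hypothetical `min_tpm` cutoff, and it rounds the number of selected genes up to at least one. The function name and toy data are likewise hypothetical.

```python
import math
from statistics import mean, stdev


def select_reference_genes(tpm, min_tpm=1.0, fraction=0.005):
    """Pick the most stably expressed genes from a TPM table.

    tpm: dict mapping gene ID -> list of TPM values (one per sample).
    min_tpm: hypothetical fixed expression cutoff (stand-in for DAFS).
    fraction: share of expressed genes to keep (0.005 = 0.5%).
    Returns a list of (gene, mean TPM, CV in percent), most stable first.
    """
    scored = []
    for gene, values in tpm.items():
        # Keep only genes above the cutoff in every sample.
        if min(values) < min_tpm:
            continue
        avg = mean(values)
        cv = 100.0 * stdev(values) / avg  # assumed meaning of "covariance"
        scored.append((gene, avg, cv))
    scored.sort(key=lambda item: item[2])
    # Assumption: round up and always keep at least one gene.
    n_keep = max(1, math.ceil(fraction * len(scored))) if scored else 0
    return scored[:n_keep]


# Toy data (hypothetical); a large fraction is used so the tiny table yields genes.
tpm = {
    "g1": [10, 10, 10, 11],
    "g2": [50, 20, 80, 35],
    "g3": [0.2, 0.1, 0.0, 0.3],  # below the cutoff, treated as not expressed
    "g4": [100, 102, 98, 100],
}
refs = select_reference_genes(tpm, min_tpm=1.0, fraction=0.5)
ref_ids = [gene for gene, _, _ in refs]
```

With a real transcriptome of tens of thousands of expressed genes, the default fraction of 0.5% would yield on the order of a hundred reference genes spread over different expression levels.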
- to compare the custom selected reference genes against three sets of genes: a list of 14 commonly used housekeeping reference genes (Table 1), the reference genes selected by T Czechowski, M Stitt, T Altmann, MK Udvardi and W-R Scheible [26] and the 104 reference genes selected by B Zhuo, S Emerson, JH Chang and Y Di [11].
- Description of the R-package.
- This package has 4 functions: “Counts_to_tpm” (converts read counts into TPM values using a named vector with gene lengths and the read count data frame, with the samples as the column names and the genes as row names), “DAFS” (uses the data frame of TPM values, the first object of the result from “Counts_to_tpm”, to get the threshold for expressed genes), “gene_selection” ([...] “DAFS”; outputs a data frame with the selected reference genes, their average TPM and the covariance of the TPM values) and “customReferences” (internally runs “Counts_to_tpm”, “DAFS” and “gene_selection”, and outputs the result from “gene_selection”).
- The package also includes two datasets for testing: a data frame of counts created with the data used in this article and a named vector with the lengths of genes from Arabidopsis.
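As a point of reference for the first step, the standard counts-to-TPM conversion looks roughly like this. The sketch follows the usual TPM definition (length-normalize the counts, then scale each sample to one million); it is not the package's “Counts_to_tpm” function, and the name `tpm_from_counts`, the dictionary layout and the toy numbers are assumptions made for illustration.

```python
def tpm_from_counts(counts, gene_lengths):
    """Convert raw read counts to transcripts per million (TPM).

    counts: dict mapping gene ID -> list of read counts (one per sample).
    gene_lengths: dict mapping gene ID -> gene length in base pairs.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Reads per kilobase: counts normalized by gene length.
    rpk = {g: [c / (gene_lengths[g] / 1000.0) for c in counts[g]] for g in genes}
    tpm = {g: [] for g in genes}
    for j in range(n_samples):
        total = sum(rpk[g][j] for g in genes)
        for g in genes:
            # Scale so that each sample sums to one million.
            tpm[g].append(rpk[g][j] / total * 1e6 if total > 0 else 0.0)
    return tpm


# Toy data (hypothetical): three genes, two samples.
counts = {"geneA": [100, 0], "geneB": [300, 50], "geneC": [600, 50]}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
tpm = tpm_from_counts(counts, lengths)
```

Because every sample sums to the same total, TPM values are comparable across samples for a given gene, which is what the stability measures in the article rely on.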
- Covariance level for each of the 30 genes selected by T Czechowski, M Stitt, T Altmann, MK Udvardi and W-R Scheible [26] for each permutation (A: Mlp37347 vs Control.
- TAIR IDs of custom selected references for each transcriptome permutation.
- DESeq2 results summary of analysis without reference genes or with different reference sets (Custom selected, from T Czechowski, M Stitt, T Altmann, MK Udvardi and W-R Scheible [26], from B Zhuo, S Emerson, JH Chang and Y Di [11] or Commonly used references).
- Summary of the results of several analyses for all the genes evaluated in this article: Column A: TAIR ID.
- Column B: ranking calculated with geNorm with the function.
- Column D: covariance of the TPM values.
- Column F: the common standard deviation of the expression of a gene between two samples calculated with NormFinder.
- Column H: log2-transformed fold change of each gene calculated with DESeq2 without using reference genes.
- Column I: adjusted p value of the gene deregulation calculated with DESeq2 without using reference genes.
- replicate identification, number of sequenced reads, average length of the separation between two paired reads, number of reads after trimming and filtering and number of aligned reads for each of the 4 replicates of the three samples used in this study.
- Highly integrated single-base resolution maps of the epigenome in Arabidopsis.
- The transcriptional landscape of the yeast genome defined by RNA sequencing.
- Mapping and quantifying mammalian transcriptomes by RNA-Seq.
- RNA-Seq: a revolutionary tool for transcriptomics.
- Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells.
- Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples.
- Models for transcript quantification from RNA-seq.
- A scaling normalization method for differential expression analysis of RNA-seq data.
- Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial.
- Identifying stably expressed genes from multiple RNA-Seq data sets.
- Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions.
- Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data.
- A comparison of methods for differential expression analysis of RNA-seq data.
- oligonucleotides enable absolute normalization of small RNA-Seq data.
- External calibration with Drosophila whole-cell spike-ins delivers absolute mRNA fold changes from human RNA-Seq and qPCR data.
- Normalization of RNA-seq data using factor analysis of control genes or samples.
- Comparison of methods for differential gene expression using RNA-seq data.
- mRNA enrichment protocols determine the quantification characteristics of external RNA spike-in controls in RNA-Seq studies.
- of reference genes: a serious pitfall undervalued in reverse transcription-polymerase chain reaction (RT-PCR) analysis in plants.
- Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis.
- CustomSelection: Custom selected reference genes outperform pre-defined reference genes in transcriptomic analysis.
- In: This package calculates the Transcripts Per Million data frame from the counts matrix, calculates the minimum expression level for a gene to be considered expressed in each sample and selects as reference genes those with lowest covariance.
- Differential analysis of count data – the DESeq2 package.
- Finding the active genes in deep RNA-seq gene expression studies.
- accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions.
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
