
3’Pool-seq: An optimized cost-efficient and scalable method of whole-transcriptome gene expression profiling


Summary

- unprecedented accuracy, but the high costs of full-length mRNA sequencing have posed a limit on the accessibility and scalability of the technology.
- To address this, we developed 3’Pool-seq: a simple, cost-effective, and scalable RNA-seq method that focuses sequencing on the 3′-end of mRNA.
- Results: Thorough optimization resulted in a protocol that takes less than 12 h to perform, does not require custom sequencing primers or instrumentation, and cuts over 90% of the costs associated with TruSeq, while still achieving accurate gene expression quantification (Pearson’s correlation coefficient with ERCC theoretical concentration r = 0.96) and differential gene detection (ROC analysis of 3’Pool-seq compared to TruSeq AUC = 0.921).
- The 3’Pool-seq dual indexing scheme was further adapted for a 96-well plate format, and ERCC spike-ins were used to correct for potential row or column pooling effects.
- Transcriptional profiling of troglitazone and pioglitazone treatments at multiple doses and time points in HepG2 cells was then used to show how 3’Pool-seq could distinguish the two molecules based on their molecular signatures.
- Conclusions: 3’Pool-seq can accurately detect gene expression at a level that is on par with TruSeq, at one tenth of the total cost.
- Keywords: Next generation sequencing, RNA-seq, Transcriptomics, 3′-RNA sequencing, 3’Pool-seq, Differential gene expression.
- One of the most widely used kits for sequencing mRNA is TruSeq [6–8], which uses salt-catalyzed hydrolysis, random priming, and end repair/ligation to create sequence-ready libraries from bulk RNA [9].
- 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0.
- Herein, benchmark RNA from wild-type and GFAP-IL6 mice along with ERCC RNA standards were utilized to design and optimize a process called 3’Pool-seq.
- 3’Pool-seq allows the user to create and sequence 3′-mRNA libraries in under a day for less than $15 per sample ($3 library preparation and $12 sequencing cost per sample), while still maintaining a standard of quality with regard to data generation and gene expression quantification that is on par with TruSeq.
- The robustness of 3’Pool-seq was further demonstrated with as little as 10 ng input RNA.
- Design of 3’Pool-seq.
- A schematic representation of the 3’Pool-seq method for gene expression quantification is depicted in Fig.
- The same Template Switching Oligo that is used in SMART-seq is added to the reaction to provide a handle at the 3′-end of the cDNA to allow full-length cDNA amplification.
- Furthermore, since the 3’Pool-seq protocol uses oligo-dT primers linked to standard indexed TruSeq i7 adaptors (unlike the custom adapter primer sequences used in Drop-seq), the resulting 3′-end cDNA fragments can be easily PCR-amplified using standard TruSeq i7 and Nextera i5 primer reagents.
- Furthermore, since 3’Pool-seq uses the 3′-end fragments to quantify transcript abundance, fewer sequencing reads are needed per sample, further reducing the sequencing cost.
- Gene expression quantification using 3’Pool-seq.
- The performance of 3’Pool-seq was first assessed in terms of its accuracy, sensitivity, and reproducibility in quantifying gene expression.
- Sequencing libraries were generated using 3’Pool-seq and TruSeq from total RNAs purified from brain cortical samples of three wild-type (WT) C57BL/6 mice and three GFAP-IL6 mice [20].
- For 3’Pool-seq, on average, 6.4 million 75 base-pair single-end sequencing reads were generated for each sample.
- A side-by-side comparison of the alignment and gene feature mapping metrics between 3’Pool-seq and TruSeq samples are.
- The majority of the 3’Pool-seq reads (87% of total reads) can be mapped to the reference genome, comparable to mapping rates for TruSeq samples (94.
- The percentage of uniquely mapped reads for 3’Pool-seq (72%) is slightly lower than that for TruSeq (87.
- As expected, a higher percentage of the 3’Pool-seq reads were mapped to 3′ Untranslated Regions (UTR).
- 3’Pool-seq gave a single peak at the last exon of the Apoe gene covering the 3’UTR and the 3′-end of the protein coding region, while TruSeq reads were mapped throughout the gene body.
- The distribution of reads for the top 1000 most abundant genes is also highly biased towards the 3′-end of the gene body, as expected for 3’Pool-seq (Fig.
- Fig. 1: A schematic representation of the 3’Pool-seq protocol.
- 3’Pool-seq derived expression values were then compared to theoretical ERCC spike-in concentrations.
- An average Pearson correlation coefficient r of 0.968 was observed, indicating gene expression quantification from 3’Pool-seq is highly accurate (Table 1).
- It is worth noting that for both ERCC metrics, 3’Pool-seq outperformed TruSeq slightly (Table 1).
- To assess the sensitivity of 3’Pool-seq at different sequencing depths, we down-sampled reads gradually from 10 million uniquely mapped reads to half a million uniquely mapped reads and assessed how many genes can be detected at different abundance thresholds (Fig.
- This suggests that a minimum of ~2 million uniquely mapped reads is recommended for 3’Pool-seq.
- These performance metrics, taken together, indicate that 3’Pool-seq is highly accurate, reproducible, and sensitive in gene expression quantification.
- Performance of 3’Pool-seq in detecting differential gene expression.
- To assess the ability of 3’Pool-seq to detect differentially expressed genes (DEGs), it was benchmarked against the TruSeq protocol.
- With these DEGs identified from TruSeq, we constructed a Receiver Operating Characteristics (ROC) analysis to assess the recall rate of TruSeq DEGs by 3’Pool-seq, where genes were ranked by their differential expression p-value.
- We also conducted two separate 3’Pool-seq library preparations on the same set of samples to assess the technical reproducibility of 3’Pool-seq.
- In addition, the effect sizes of the DEGs (i.e. expression fold changes between GFAP-IL6 and wild-type animals) quantified by 3’Pool-seq and TruSeq are correlated with a Pearson’s correlation coefficient r = 0.654 (Fig.
- Robustness of 3’Pool-seq in low-input samples.
- Here, the performance of 3’Pool-seq was tested with different input amounts of total RNA, ranging from 0.5 ng to 50 ng.
- Similarly, stronger gene expression correlations were observed among replicates when higher amounts of RNA
- Table 1: Sequencing and mapping quality metrics comparison between 3’Pool-seq and TruSeq.
- Shown in the table are the mean and standard deviation of the different quality metrics.
- Quality Metrics 3’Pool-seq mRNA TruSeq.
- Plate-based 3’Pool-seq.
- The 3’Pool-seq library preparation protocol was further adapted to a 96-well plate format to enable high-throughput RNA-seq profiling experiments.
- The 96-well format is ideally suited for the 3’Pool-seq dual indexing.
- Fig. 2: 3’Pool-seq provides robust and reproducible gene expression quantification.
- (a) Read distribution from full-length mRNA-seq (TruSeq) and 3’Pool-seq in the ApoE gene region.
- Reads generated using 3’Pool-seq are mapped preferentially towards the 3′-end of the gene.
- (b) Correlation of the abundance levels of ERCC spike-ins between 3’Pool-seq quantifications and actual pre-mixed concentrations.
- (c) Correlation of the abundance levels of ERCC spike-ins between 3’Pool-seq replicates.
- (d) Correlation of gene expression values (log2 TPM) between 3’Pool-seq replicates.
- (f) Distribution of 3’Pool-seq reads is skewed towards the 3′-end of the gene body as expected.
- Fig. 3: Performance of 3’Pool-seq in detecting differentially expressed genes.
- (b) Correlation of the log2(Fold-Change) quantified by 3’Pool-seq and TruSeq for DE genes identified by the TruSeq protocol.
- Fig. 4: Performance of 3’Pool-seq with low RNA input samples.
- 1 in the 3’Pool-seq run with 50 ng RNA input) between 10 ng input RNA 3’Pool-seq run and 50 ng input RNA 3’Pool-seq run.
- PCA analysis of the ERCC spike-ins quantified in our PPARγ test case.
- While whole transcriptome profiling is a powerful technique that enables genome-wide interrogation of gene expression, current practices are often limited to taking snapshots of the transcriptome at a single condition due to the cost and time required for traditional RNA-seq experiments.
- Thus, the 3’Pool-seq method presented here provides a cost- and time-effective solution for large-scale RNA-seq studies, enabling thorough interrogation of transcriptome changes at multiple time points and conditions.
- The 3’Pool-seq method integrates several technology advancements, leveraging the 3′-barcoding and early pooling strategies commonly used in single-cell RNA-seq studies and template switching and tagmentation techniques for efficient cDNA amplification and fragmentation.
- Fig. 5: Plate-based format of 3’Pool-seq applied to differentiate gene expression responses between troglitazone and pioglitazone treatments.
- (a) Layout of plate-based 3’Pool-seq using row pooling scheme.
- By using standard TruSeq i7 and Nextera i5 indexed primers, the final 3’Pool-seq libraries are fully compatible with standard Illumina sequencing protocols without the need for any custom sequencing reagents.
- Overall, the 3’Pool-seq library preparation method costs ~$3 per sample and requires only 2–3 h hands-on time (Table 2), significantly reducing the cost and time for library preparation.
- Furthermore, it was demonstrated that 3’Pool-seq generated high quality libraries with >.
- 66 ± 1.5%) of the uniquely mapped reads located in usable gene feature regions, as well as a very low percentage of reads (<.
- By using ERCC spike-in standards, it was shown that the 3’Pool-seq method was able to accurately and reproducibly quantify gene expression levels.
- More importantly, 3’Pool-seq was able to reproduce the differentially expressed genes from the standard TruSeq protocol at a small fraction (5%) of the library preparation cost and one third of the hands-on time.
- 4 million raw reads) would capture the majority of the expressed genes, allowing efficient multiplexing of a large number of samples in a single sequencing run.
- In accordance with research that compared commercial 3′-end sequencing with full-length RNA-seq [23], we found that full-length sequencing with TruSeq did in fact detect more differentially expressed genes than 3’Pool-seq (Additional file 1: Figure S1.A).
- Not surprisingly, the lengths of the differentially expressed genes detected by 3’Pool-seq do not show a size bias (Additional file 1: Figure S1.B).
- Further studies would therefore be required to determine how 3’Pool-seq and/or TruSeq compare to these methods.
- Regardless, a gene ontology (GO) analysis of the DEGs uncovered by 3’Pool-seq and TruSeq reveals almost identical pathways (Additional file 3:.
- Another innovation of the 3’Pool-seq method is the support for 96-well plate format for library preparation through row or column-based pooling, and the use of ERCC spike-ins and computational procedures to assess and correct for pooling confounding effects.
- As shown in the PPARγ test experiment, proper design of the pooling strategy and the correction of row or column-based pooling confounds are critical for differential gene expression analysis.
- Furthermore, the 96-well plate-based 3’Pool-seq library preparation format can easily be adapted for automation.
- Table 2: Cost, Time, and Qualitative Metrics comparison of 3’Pool-seq and TruSeq, as well as two additional 3′-end sequencing techniques: Plate-Seq and DRUG-seq.
- 3’Pool-seq TruSeq Plate-Seq DRUG-seq.
- An ROC analysis comparing DRUG-seq to TruSeq was performed and it gave an average AUC of as compared to the 0.921 value generated in the 3’Pool-seq experiments.
- In this paper we have described the strengths of 3’Pool-seq with regard to accurate ERCC measurements, quality metrics such as mapping rate, and DEG detection that is on par with TruSeq.
- With much reduced cost, streamlined experimental procedures, high data quality for gene expression quantification and differential analysis, robust performance with low RNA inputs, and flexible support for plate-based library format, 3’Pool-seq not only provides significant cost and time saving for existing RNA-seq applications but also opens up new opportunities for future large-scale transcriptomics studies.
- All procedures were performed in compliance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals under the approval of the Pfizer Cambridge Institutional Animal Care and Use Committee.
- 3’Pool-seq data can be processed with standard RNA-seq pipelines with simple modifications.
- After standard sample de-multiplexing (bcl2fastq), an extra step was added to trim off polyA sequences (minimum length of 12 nucleotides) located towards the 3′-end of the reads (after the 25th position), as sequencing reads from shorter fragments could extend into the polyA tails of mRNA transcripts.
- Since 3’Pool-seq sequences only the 3′-end of mRNA transcripts, no gene length normalization was applied to read counts when calculating Transcripts Per Million (TPM) values.
- For the plate-based 3’Pool-seq study of troglitazone and pioglitazone treated samples, row number (i.e.
- A Comparison of DEGs detected by TruSeq and 3’Pool-seq.
- A) Venn Diagram depicting the DEGs that are detected by TruSeq and/or 3’Pool-seq at the indicated cutoffs.
- B) A histogram showing Mean TPM, transcript length, and absolute log2(Fold-Change) distributions of DEGs detected by TruSeq and/or 3’Pool-seq.
- A per-sample overview of sequencing metric details that were used to construct Table 1 of the main manuscript.
- A Gene Ontology analysis of the pathways ranked by p-value represented by the DEGs detected by 3’Pool-seq in the Wild-Type vs.
- A Gene Ontology analysis of the pathways ranked by p-value represented by the DEGs detected by TruSeq in the Wild-Type vs.
- GS conceptualized 3’Pool-seq and carried out 3′-end sequencing.
- The entire study, including the design of the study and collection, analysis, and interpretation of data was funded by Pfizer, Inc.
- The funding bodies played no role in the design of the study and collection, analysis, and interpretation of the data and in writing the manuscript.
- All animal procedures were performed in compliance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals under the approval of the Pfizer Cambridge Institutional Animal Care and Use Committee.
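
The read-processing steps described in the summary (polyA trimming after de-multiplexing, and TPM calculation without gene length normalization) can be illustrated with a minimal Python sketch. The function names, the regular-expression approach, and the example gene counts are illustrative assumptions and not the authors' pipeline; only the two thresholds (a polyA run of at least 12 nucleotides, searched after the 25th read position) come from the text.

```python
import re
from typing import Dict

# Thresholds stated in the summary; everything else here is an assumption.
MIN_POLYA_LENGTH = 12   # minimum number of consecutive A's treated as a polyA tail
SEARCH_FROM = 25        # 0-based index 25, i.e. only look after the 25th base

_POLYA = re.compile("A{%d,}" % MIN_POLYA_LENGTH)


def trim_polya(read: str) -> str:
    """Cut the read at the first qualifying polyA run found after SEARCH_FROM.

    Reads without such a run are returned unchanged.
    """
    match = _POLYA.search(read, SEARCH_FROM)
    return read[:match.start()] if match else read


def tpm_without_length_norm(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw per-gene read counts to sum to one million.

    No division by gene length is performed, because 3'-end reads
    are not expected to scale with transcript length.
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: count * 1e6 / total for gene, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical read: 30 bases of sequence followed by a 15-nt polyA tail.
    example_read = "C" * 30 + "A" * 15 + "G" * 5
    print(trim_polya(example_read))

    # Hypothetical counts for two genes.
    print(tpm_without_length_norm({"Apoe": 30, "Gfap": 10}))
```

With these assumptions, a polyA run that lies entirely within the first 25 bases is left alone, which matches the stated intent of trimming only tails reached by reads from short fragments.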
