Massively parallel gene expression variation measurement of a synonymous codon library


Summary

- Background: Cell-to-cell variation in gene expression strongly affects population behavior and is key to multiple biological processes.
- While codon usage is known to affect ensemble gene expression, how codon usage influences variation in gene expression between single cells is not well understood.
- Results: Here, we used a Sort-seq-based massively parallel strategy to quantify gene expression variation from a green fluorescent protein (GFP) library containing synonymous codons in Escherichia coli.
- This trend is not observed for codons with high Normalized Translation Efficiency Index (nTE) scores, nor for the free energy of folding of the mRNA secondary structure.
- Additionally, the drastic change in mean protein abundance, accompanied by only small changes in protein noise, seen in our library implies that codon optimization can be performed without concern for gene expression noise in biotechnology applications.
- Keywords: Sort-seq, Protein abundance, Codon usage, Single-cell, Gene expression variation.
- The underlying causes of gene expression variation are of particular importance to the fundamental understanding of cellular processes, which may enable the development.
- Cell-to-cell variation in protein abundance can arise from transcriptional, translational, and other processes that govern gene expression.
- How transcriptional processes affect the variability of gene expression between single cells has been extensively studied [11–13].
- Schmitz and Zhang, BMC Genomics, https://doi.org/10.1186/s z.
- Codon usage and bias also affect translational dynamics: low-abundance tRNA isoacceptors pause ribosomes [29] and control ribosomal traffic [30], particularly at the start of a gene sequence [31].
- Despite significant knowledge of the effects of codon usage on mean gene expression, how and to what extent codon usage affects cell-to-cell variability in protein abundance is poorly understood.
- Multiple methods were employed to validate Sort-seq for high-throughput variability measurement.
- We found that codon usage has a large influence on the mean and variance of GFP abundance.
- These results illuminate the influence of codon usage on variation in protein abundance and could potentially be extended to study protein variation under other growth conditions and in other microorganisms.
- To systematically study the influence of codon usage in cell-to-cell protein variability, a GFP library was.
- All GFP coding sequences were placed 3′ of a red fluorescent protein (RFP) gene with fixed codon usage, in a polycistronic structure under the control of the same promoter (Fig.
- The synonymous codons were placed in the N-terminal region of the GFP coding sequence because mean protein abundance is more sensitive to codon usage in this region, owing to its potential to influence translation initiation; this allowed us to analyze protein variability across a wide range of protein abundance [22].
- Sort-seq for high-throughput protein variability analysis: Protein variability was previously measured by quantifying single-cell fluorescence of a fluorescent protein using either microscopy or flow cytometry.
- To solve this problem, we aimed to use Sort-seq [34] to quantify variation across the GFP library in a massively parallel fashion (Fig.
- An increasing number of virtual bins was applied to each sample, based on single-cell fluorescence intensity on either a linear or an exponential fluorescence scale, to simulate the bins used in Sort-seq.
- Compared with previous Sort-seq work measuring mean protein abundance, many more bins are used here, reflecting the challenge of accurately quantifying gene expression variation [34].
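The virtual-binning step can be sketched as follows: build bin edges on a linear or exponential (log-spaced) scale, then count single-cell fluorescence values per bin. The function names, the edge ranges, and the choice to clamp out-of-range cells into the outer bins are illustrative assumptions, not details taken from the study.

```python
import bisect


def bin_edges(lo, hi, n_bins, scale="linear"):
    """Return n_bins + 1 bin edges spanning [lo, hi] on a linear or exponential scale."""
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    if scale == "linear":
        step = (hi - lo) / n_bins
        return [lo + i * step for i in range(n_bins + 1)]
    if scale == "exponential":
        if lo <= 0:
            raise ValueError("exponential bins need a positive lower bound")
        ratio = (hi / lo) ** (1.0 / n_bins)  # constant fold-change between edges
        return [lo * ratio ** i for i in range(n_bins + 1)]
    raise ValueError(f"unknown scale: {scale!r}")


def virtual_sort(values, edges):
    """Count cells per bin; values outside [edges[0], edges[-1]] are clamped into the outer bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), len(counts) - 1)
        counts[idx] += 1
    return counts
```

Re-running `virtual_sort` on the same flow-cytometry values with a growing `n_bins` mimics the increasing-bin-number comparison described in the text.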
- The Sort-seq experiment was performed three times to examine consistencies between experiments.
- members in the designed library) were sequenced, representing 83% coverage of the library.
- Here we calculated variability from a fitted Gamma distribution, rather than directly from the binned distribution, to reduce the error caused by treating fluorescence as a discrete value in each bin.
- Fig. 1 Sort-seq for massively parallel measurement of protein variability.
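A minimal sketch of fitting a Gamma distribution to one sequence's binned counts, assuming a multinomial likelihood over bins (truncated to the sorted fluorescence range) and a parameterization by mean and CV². For a Gamma with shape $k$ and scale $\theta$, the mean is $k\theta$ and CV² is $1/k$. The excerpt does not say which fitting procedure the authors used; the likelihood, the coarse-to-fine grid search, and all function names here are illustrative assumptions. The incomplete-gamma routine uses the classic series / continued-fraction method so the example needs only the standard library.

```python
import math


def gamma_p(a, x):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_pref = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Power series; converges quickly when x < a + 1.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-14:
                break
        return total * math.exp(log_pref)
    # Modified Lentz continued fraction for Q(a, x) = 1 - P(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_pref) * h


def neg_log_lik(edges, counts, mean, cv2):
    """Binned NLL of Gamma(shape=1/cv2, scale=mean*cv2), conditioned on [edges[0], edges[-1]]."""
    shape, scale = 1.0 / cv2, mean * cv2
    cdf = [gamma_p(shape, e / scale) for e in edges]
    total = cdf[-1] - cdf[0]
    if total <= 0.0:
        return math.inf
    nll = 0.0
    for lo, hi, n in zip(cdf, cdf[1:], counts):
        p = (hi - lo) / total
        if n > 0:
            if p <= 0.0:
                return math.inf
            nll -= n * math.log(p)
    return nll


def fit_gamma_binned(edges, counts, rounds=8, steps=15):
    """Coarse-to-fine grid search over (log mean, log CV^2); returns (mean, cv2)."""
    mids = [0.5 * (lo + hi) for lo, hi in zip(edges, edges[1:])]
    guess = sum(m * n for m, n in zip(mids, counts)) / sum(counts)  # binned-mean start
    best = (math.inf, math.log(guess), math.log(0.5))
    span_m, span_c = 2.0, 3.0  # half-widths of the search window in natural-log units
    for _ in range(rounds):
        _, cm, cc = best
        for i in range(steps):
            lm = cm - span_m + 2.0 * span_m * i / (steps - 1)
            for j in range(steps):
                lc = cc - span_c + 2.0 * span_c * j / (steps - 1)
                nll = neg_log_lik(edges, counts, math.exp(lm), math.exp(lc))
                if nll < best[0]:
                    best = (nll, lm, lc)
        span_m /= 4.0  # shrink the window around the current best point
        span_c /= 4.0
    return math.exp(best[1]), math.exp(best[2])
```

The grid search only keeps the sketch dependency-free. With SciPy available, `scipy.stats.gamma.cdf` and `scipy.optimize.minimize` applied to the same negative log-likelihood would be the more usual choice.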
- b Experimental procedure for massively parallel measurement of gene expression variation using Sort-seq.
- fluorescence from independent Sort-seq measurements (Supplementary Figure S5).
- With a minimum CPS cutoff of 20, we obtain good correlation between two separate Sort-seq measurements for both mean GFP fluorescence (R² >.
- Additionally, for sequences with CPS greater than 20, we examined GFP mean and CV² values measured from three independent Sort-seq experiments.
- The reconstructed Gamma distributions of the remaining sequences overlap closely with the Sort-seq-measured fluorescence distributions across replicates (Fig.
- Additionally, we compared the mean GFP fluorescence measured by Sort-seq with that measured by flow cytometry for 16 randomly selected individual GFP sequences, which showed strong correlation (R² = 0.94) for mean GFP fluorescence, further validating our method (Fig.
- Codon usage correlates with mean and variance but not CV².
- To understand how codon usage affects protein variability, GFP sequences were analyzed using several commonly used quantitative metrics of the 8 variable codons, including the tRNA Adaptation Index (TAI), the Codon Adaptation Index (CAI), the Normalized Translation Efficiency Index (nTE) score, and the folding free energy of the mRNA secondary structure (Fig.
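A sketch of the replicate check: keep only sequences above the CPS cutoff and compute R² between two replicate measurements. The excerpt does not define CPS, so it is treated here as an opaque per-sequence count supplied by the caller. R² is assumed to be the squared Pearson correlation, which is a guess about how the reported values were computed. The function names are hypothetical.

```python
def squared_pearson(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three paired values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def replicate_r2(rep_a, rep_b, cps, min_cps=20):
    """R^2 between two replicates (dicts: sequence id -> value) for ids with CPS above min_cps."""
    # "Greater than" follows the wording of the text; the cutoff value is a parameter.
    ids = sorted(s for s in rep_a.keys() & rep_b.keys() if cps.get(s, 0) > min_cps)
    return squared_pearson([rep_a[s] for s in ids], [rep_b[s] for s in ids])
```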
- Thus, it is likely that the folding energy of GFP mRNA is affected by ribosome translation of the 5′ RFP sequence.
- Altering codon usage has a significant effect on the mean expression level, which in turn affects variance and CV².
- To isolate the influence of codon usage via mean expression level, GFP variance and CV² are plotted against mean GFP level.
- E. coli native gene expression [17].
- At high protein abundance, codon usage has little effect on protein CV².
- Thus, codon usage affects CV² mostly via its effect on mean GFP level.
- Meanwhile, codon usage affects protein variance at all gene expression levels.
- Codon usage bias in the E.
- 0.05) between protein CV² and any of the codon metrics used was observed (Fig.
- E. coli native genes are in agreement with results from Sort-seq analysis of our GFP library.
- Therefore, we conclude that codon usage influences protein noise only by affecting its mean.
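Plotting against the mean matters because CV² is itself a function of the mean. A short worked relation, using standard definitions rather than anything stated in this excerpt: if the Fano factor $F$ (variance over mean) stays fixed, CV² falls as $1/\mu$, so a codon change that mainly shifts $\mu$ moves CV² even when the variance-to-mean ratio is unchanged. For a Gamma fit, the shape parameter alone sets CV².

$$
\mathrm{CV}^2 = \frac{\sigma^2}{\mu^2}, \qquad
F = \frac{\sigma^2}{\mu} \;\Rightarrow\; \mathrm{CV}^2 = \frac{F}{\mu}
$$

$$
X \sim \mathrm{Gamma}(k, \theta):\quad \mu = k\theta,\quad \sigma^2 = k\theta^2,\quad \mathrm{CV}^2 = \frac{1}{k},\quad F = \theta
$$

For example, $\mu = 1000$ with $F = 200$ gives $\mathrm{CV}^2 = 0.2$. Doubling $\mu$ with $F$ unchanged halves CV² to $0.1$.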
- The analyses performed in this study show that codon usage has a strong influence on the mean protein abundance and variance, with little influence on cell-.
- Fig. 2 Validation of protein distributions reconstructed from Sort-seq.
- b Sort-seq-reconstructed single-cell fluorescence (pink columns) and the fitted curves (black) to a Gamma distribution from three independent Sort-seq experiments (from top to bottom) for the same six library isolates as shown in (a).
- c Correlation of mean fluorescence measured by Sort-seq and by flow cytometry for another sixteen randomly isolated library members.
- The drastic change in protein abundance, accompanied by only small changes in variation, indicates that for biotechnology applications, codon optimization can be performed to control gene expression levels without concern for gene expression noise [46].
- Our Sort-seq-based method represents a high-throughput strategy for measuring gene expression variability.
- Mean GFP abundance, variance, and CV² were calculated from data measured in the Sort-seq experiments.
- The number of bins used for the Sort-seq protocol was determined using flow cytometry data from the ten individual library members.
- From the Sort-seq experiment, the variance (a) and CV² (b) are compared with mean GFP protein abundance and with different codon metrics, including the TAI score, CAI score, nTE score, and ΔΔG of the transcript, for N = 219 library members.
- Only cells with RFP fluorescence above the maximum RFP fluorescence of the wild-type E.
- The cells were sorted for a total of eight hours during the second Sort-seq experiment, until a total of 2.16 million cells had been sorted across all 20 bins (Supplementary Figure S3).
- Fewer cells were sorted during the first and third Sort-seq experiments due to sorting time constraints.
- Fig. 5 Genome analysis of codon usage for 735 genes.
- A total of 3.9 million reads were generated in the second Sort-seq experiment.
- To examine individual library members, 1 μL of the library aliquot was plated onto an LB agar plate containing 30 mg/mL of chloramphenicol.
- The quality of the library was confirmed by high-throughput sequencing prior to sorting to ensure proper library construction and transformation.
- In detail, an aliquot of the library culture was grown overnight in 5 mL LB medium.
- High-throughput sequencing produced 2 million reads, 85% of which were correct members of the library.
- Using all three Sort-seq experiments, error bars are calculated for each library member for both the mean and the CV² of GFP fluorescence.
- Percent-error cutoffs were set at natural breaks in the distribution of percent error (Supplementary Figure S6).
- To calculate protein variability, each of the 20 bins was assigned a relative protein abundance value based on the fluorescence of each bin.
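Given those per-bin abundance values, a variant's single-cell distribution can be approximated from its reads in each bin. The sketch below assumes a common Sort-seq normalization that is not spelled out in this excerpt: a variant's reads in bin $b$ are divided by the total reads in that bin and multiplied by the number of cells sorted into it. It returns raw binned moments. Per the text, the authors instead fit a Gamma distribution to reduce discretization error, so these moments are best read as a naive baseline or a starting guess for such a fit. The function names are hypothetical.

```python
def estimate_cells_per_bin(variant_reads, bin_read_totals, bin_cell_totals):
    """Scale a variant's reads in each bin to an estimated cell count (assumed normalization)."""
    return [
        reads * cells / total_reads if total_reads > 0 else 0.0
        for reads, total_reads, cells in zip(variant_reads, bin_read_totals, bin_cell_totals)
    ]


def binned_moments(cells_per_bin, bin_values):
    """Mean, variance, and CV^2 of a histogram whose bins carry fixed abundance values."""
    n = sum(cells_per_bin)
    if n <= 0:
        raise ValueError("variant has no estimated cells")
    mean = sum(c * v for c, v in zip(cells_per_bin, bin_values)) / n
    var = sum(c * (v - mean) ** 2 for c, v in zip(cells_per_bin, bin_values)) / n
    return mean, var, var / mean ** 2
```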
- where $L$ is the length of the sequence in codons and $w_k$ is the weight of the $k$th codon in the gene sequence.
- $\mathrm{tGCN}_{ij}$ is the gene copy number of the $j$th tRNA that recognizes the $i$th codon, and $s_{ij}$ is a selective constraint on the efficiency of codon-anticodon coupling, as reported previously [37].
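The TAI equations themselves did not survive extraction. Assuming the standard tRNA adaptation index formulation of dos Reis and colleagues (an assumption, since only the symbol definitions are preserved here), they read as follows. Here $n_i$ is the number of tRNA isoacceptors recognizing codon $i$, $W_{\max}$ is the largest $W_i$, and $w_k$ is the $w_i$ of the codon at position $k$:

$$
W_i = \sum_{j=1}^{n_i} \left(1 - s_{ij}\right)\mathrm{tGCN}_{ij}, \qquad
w_i = \frac{W_i}{W_{\max}}
$$

$$
\mathrm{TAI} = \left(\prod_{k=1}^{L} w_k\right)^{1/L}
$$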
- where $L$ is the length of the sequence in codons and $w_{ik}$ is the weight of the $k$th codon.
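The CAI has a geometric-mean form, $\mathrm{CAI} = (\prod_{k=1}^{L} w_{ik})^{1/L}$, where each weight is conventionally the codon's relative adaptiveness: its count in a reference gene set divided by the count of the most-used synonymous codon (Sharp and Li). The Python sketch below assumes that convention. The two-amino-acid codon table and the reference counts are hypothetical, and `geometric_mean_index` would equally score TAI or nTE if given those weights instead.

```python
import math

# Hypothetical two-amino-acid codon table, for illustration only.
SYNONYMS = {"K": ["AAA", "AAG"], "E": ["GAA", "GAG"]}


def relative_adaptiveness(ref_counts, synonyms):
    """CAI-style weights: codon count divided by the count of its most-used synonym."""
    weights = {}
    for codons in synonyms.values():
        top = max(ref_counts.get(c, 0) for c in codons)
        for c in codons:
            weights[c] = ref_counts.get(c, 0) / top if top > 0 else 0.0
    return weights


def geometric_mean_index(seq, weights):
    """(prod of w_k)^(1/L) over codons that have a weight, computed in log space."""
    # Codons absent from `weights` (e.g. ones the table does not cover) are skipped,
    # so L here counts only the scored codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    if min(scored) <= 0.0:
        raise ValueError("a zero weight makes the geometric mean zero; replace zeros first")
    return math.exp(sum(math.log(w) for w in scored) / len(scored))
```

For example, with reference counts AAA = 30, AAG = 10, GAA = 20, GAG = 20, the sequence `AAGGAA` scores $\sqrt{1/3} \approx 0.577$.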
- where $c_{ij}$ is the sum of the counts of codon $i$ in gene $j$ and $a_j$ is the transcript abundance of gene $j$, considering all genes in genome $g$.
- $L$ is the length of the sequence in codons and $\mathrm{nTE}_{ik}$ is the weight of the $k$th codon.
- The free energy of folding for the secondary structure of the transcript was calculated using NUPACK [40] for a region spanning 42 base pairs upstream and downstream of the codon library.
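The nTE equations are also missing. The following formulation is consistent with the surviving definitions, assuming (as in Pechmann and Frydman's nTE) that tRNA supply $W_i$ is the tAI-style absolute adaptiveness of codon $i$ built from $\mathrm{tGCN}_{ij}$ and $s_{ij}$, that tRNA demand $D_i$ is codon usage weighted by transcript abundance, and that normalization is by the maximum over codons:

$$
D_i = \sum_{j \in g} c_{ij}\,a_j, \qquad
\mathrm{TE}_i = \frac{W_i}{D_i}, \qquad
\mathrm{nTE}_i = \frac{\mathrm{TE}_i}{\max_{i'} \mathrm{TE}_{i'}}
$$

$$
\mathrm{nTE} = \left(\prod_{k=1}^{L} \mathrm{nTE}_{ik}\right)^{1/L}
$$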
- Results for each of the bins from one of the Sort-seq experiments.
- S4: Sort-seq RFP fluorescence.
- S6: Percent error between three Sort-seq experiments.
- Using all three Sort-seq experiments, percent error is calculated for the measurement of both mean GFP fluorescence and CV².
- S7: Sort-seq-reconstructed single-cell fluorescence and the fitted curves to a Gamma distribution for six library isolates.
- S8: The GC content (%) of each synonymously mutated sequence compared with the mean and CV² of its GFP fluorescence.
- Barak Cohen for helpful discussions and advice on Sort-seq.
- This work is supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number R35GM133797.
- The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
- The funding bodies played no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.
- Stochastic gene expression in a single cell.
- Noise in gene expression: origins, consequences, and control.
- Heterogeneity coordinates bacterial multi-gene expression in single cells.
- Nature, nurture, or chance: stochastic gene expression and its consequences.
- Cell-to-cell variability in the propensity to transcribe explains correlated fluctuations in gene expression.
- Promoter architecture dictates cell-to-cell variability in gene expression.
- Coding-sequence determinants of gene expression in Escherichia coli.
- Codon usage is an important determinant of gene expression levels largely through its effects on transcription.
- Codon Bias as a means to fine-tune gene expression.
- Codon usage of highly expressed genes affects proteome-wide translation efficiency.
- Predicting gene expression level from relative codon usage bias: An application to Escherichia coli genome.
- Codon usage influences the local rate of translation elongation to regulate co-translational protein folding.
- Efficient translation initiation dictates codon usage at gene start.
- Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations.
- The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications.
- Solving the riddle of codon usage preferences: a test for translational selection.
- BglBrick vectors and datasheets: a synthetic biology platform for gene expression.
