
Genome re-sequencing and reannotation of the Escherichia coli ER2566 strain and transcriptome sequencing under overexpression conditions


Summary

- annotations, the annotation of the ER2566 strain was incomplete, with missing gene names and miscellaneous RNAs, as well as uncorrected annotations of some pseudogenes.
- Here, we performed a systematic reannotation of the ER2566 genome by combining multiple annotation tools with manual revision to provide a comprehensive understanding of the E. coli ER2566 strain.
- Results: The reannotation included noteworthy corrections to all protein-coding genes, led to the exclusion of 190 hypothetical genes or pseudogenes, and resulted in the addition of 237 coding sequences and 230 miscellaneous noncoding RNAs and 2 tRNAs.
- In addition, we manually examined all 194 pseudogenes in the RefSeq annotation and directly identified 123 (63%) as coding genes.
- Whereas no mutations were detected in response to consecutive subculture, overexpression of the human papillomavirus type 16 capsid led to the identification of a mutation (position 1,094,824 within the 3′ non-coding region) positioned 19 bp away from the lacI gene in the transcribed RNA, which was not detected at the genomic level by Sanger sequencing.
- 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
- The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
- If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
- The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
- Conclusion: The ER2566 strain was used by both the general scientific community and the biotechnology industry.
- Reannotation of the E. coli ER2566 strain not only improved the RefSeq data but also uncovered a key site that might be involved in the transcription and translation of genes encoding the lactose operon repressor.
- This study might facilitate a better understanding of gene function for the ER2566 strain under external burden and provide more clues for engineering bacteria for biotechnological applications.
- The Escherichia coli expression system is one of the most well-characterized classical expression systems for recombinant protein expression in biological science.
- The E. coli ER2566 strain is a common laboratory tool that takes advantage of the expression and growth properties of the B line strain [5].
- rather, sequencing advancements have led to the deposition of an increased number of “draft” bacterial genomes into public databases [8], which tend to be incomplete and fragmented.
- High-quality annotations of bacterial genomes are critical to understanding biological processes, and enhancing these genomes has become a major task in the post-genomic era.
- Indeed, NGS technology was used by Luhachack and colleagues to successfully identify the function of the transcription factor YcjW as a regulator of the complex interaction between carbohydrate metabolism and H2S production in bacteria [12], and whole-genome re-sequencing in E. coli led to the identification of mutations that conveyed a selective growth advantage during adaptation to a glycerol-based growth medium [13].
- In this study, we combined a series of automated annotations with manual inspection and high-throughput analyses to reannotate the ER2566 genome.
- We detected a mutation at the RNA level located within the 3′ non-coding region, 19 bp away from the lacI gene, which may be involved in the transcription and translation of genes encoding the lactose operon repressor.
- Our reannotation and sequencing results will provide a better understanding of some of the biological processes of the ER2566 strain, and may offer insight into future biotechnological applications in bacterial engineering.
- Unlike the human genome, which is about 1.3% protein coding, 90% of the bacterial genome codes for proteins, with only short intergenic stretches [18].
- Precise genomic annotation is thus fundamental to the further interpretation of the biochemical and physiological characteristics of organisms, to provide detailed information on protein coding sequences, pseudogenes, non-coding RNAs, repeat sequences and various other genomic data [19].
- In this study, we reannotated the genome of the E. coli ER2566 strain.
- We employed a series of automated annotation tools combined with manual inspection to reannotate the ER2566 genome (Fig.
- However, due to the limitations of its algorithm, RATT cannot effectively identify pseudogenes, indels, etc.
- For the systematic reannotation of the CDS, the prediction and identification of coding genes occurred in two stages (Fig.
- In the first stage, Prodigal software was used to predict a total of 4180 CDSs on the complete ER2566 genome deposited in GenBank (accession number NC_CP014268.2).
- Using sequence alignment to the Swiss-Prot database [26] by Blastp [27], with a threshold e-value of < 10⁻⁶, all CDSs were annotated to provide accurate information regarding the sequences and functions of the corresponding proteins.
- Most of the 4180 CDSs were annotated as protein-coding genes; of the remaining CDSs, 136 (3.3%) were marked as hypothetical genes, with no registration in the Swiss-Prot database, and 21 (0.5%) were marked as pseudogenes by manual inspection.
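As a minimal sketch of the e-value filtering step described above, the following Python parses BLAST tabular output and labels each predicted CDS as annotated or hypothetical. It assumes the 12-column tabular format (`-outfmt 6`), in which the e-value is the 11th column. The function names and example identifiers are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, Iterable, List, Tuple

# Threshold stated in the text: hits must have an e-value below 1e-6.
EVALUE_THRESHOLD = 1e-6


def best_hits(blast_lines: Iterable[str],
              max_evalue: float = EVALUE_THRESHOLD) -> Dict[str, str]:
    """Return, per query CDS, the subject with the lowest e-value below the threshold.

    Assumes BLAST tabular output (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= max_evalue:
            continue  # hit is not significant under the stated threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def classify(cds_ids: List[str], hits: Dict[str, str]) -> Dict[str, str]:
    """Label a CDS 'annotated' if it has a significant hit, else 'hypothetical'."""
    return {cds: ("annotated" if cds in hits else "hypothetical") for cds in cds_ids}
```

In this sketch a CDS with no hit below the threshold falls into the "hypothetical" group; the pseudogene calls in the text came from manual inspection and are not modelled here.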
- Fig. 1 Flowchart depicting the pipeline and methods used for bacterial genome reannotation of the E.
- This led to a total of 4066 protein-coding CDSs included in the reannotation of the ER2566 genome, along with 136 CDSs for hypothetical genes.
- In total, 123 of the 194 pseudogenes were directly identified as coding genes and are now found in the reannotated list.
- and insB, which are homologues of the insertion element protein, IS1, and related to DNA binding and transposase activity (Fig.
- These three new genes are flanked by the lactose permease gene lacY upstream and the lactose operon repressor gene lacI downstream, both of which are essential to the lac operon system.
- This reannotation uncovered genes related to the transcription and translation of genes encoding the lactose operon repressor in the ER2566 strain.
- Fig. 2 Examples of the differences between the original RefSeq annotation and our reannotation.
- a) In the reannotation, one pseudogene (RS16270) was identified as two genes, insA and insB, which show strong homology to the insertion element protein, IS1.
- b) In the reannotation, two pseudogenes were re-identified as two genes (lacZ1 and lacZ2), whereas the hypothetical protein was reannotated and shown to be highly homologous with the DNA-directed RNA polymerase gene ECBD_2906 from E.
- the next ring shows the positions of BLAST hits between the BL21(DE3) genome and the ER2566 genome detected by Blastn.
- The height of each line in the third ring showing BLAST results is proportional to the percent identity of the hit, and overlapping hits render as darker lines.
- Table 1 Overview of the differences between the original annotation, the reannotation and BL21(DE3) annotation.
- The 7.7% of CDSs annotated in the BL21 genome that are not identical to those in ER2566 comprise ~7% sequence corresponding to the hybrid part in ER2566 and other CDSs that do not have an official gene name (Additional file 4).
- This reannotation effectively eliminated the possibility of false interpretations introduced by the original annotation and provides a more integral view of the regulatory networks in the ER2566 strain (Table 1).
- To date, about 119 RNA molecules in the E.
- In addition, most of the ncRNA functions have been verified experimentally.
- For instance, about 94 of the nucleoid-associated ncRNA molecules play key functions in DNA-RNA interactions [31].
- Thus, we next performed variant calling to identify any nucleotide-level differences (i.e., single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and/or structural variations) in the ER2566 strain.
- Here, we used both methods to interrogate the ER2566 genome (Fig.
- First, we re-sequenced the whole genome of the ER2566 strain grown in our laboratory under consecutive subculturing.
- The raw reads were mapped to the C2566 reference genome (NC_CP014268.2) with a good coverage depth (>.
- Various de novo assembly software packages were used to construct a confident, long scaffold, and assembly results for each step were assessed by alignment of final sequences back to the reference genome (Fig.
- The technical difference between short-read sequencing and single-molecule long-read sequencing may result in the generation of an inverted region.
- coli ER2566 strain in our lab did not cause mutation in the genomic sequence.
- Thus, to identify any variations in the ER2566 genome due to overexpression pressure, we used RNA-seq to analyze the transcriptomes of the ER2566 strain growing at 37 °C without plasmids (B37, three replicates) or overexpressing human papillomavirus type 16 capsid protein L1 via plasmid-based inducible expression (Y37, three replicates) (Fig.
- From a total of 75.4 million 125-bp paired-end reads, 73.9 million reads (98%) were mapped to the reference genome (NC_CP014268.2) in B37 (control) samples.
- 0 million reads, only 29.6 million (39%) reads mapped to the reference genome, which was significantly lower than that for the B37 samples.
- The lower mapping rates in Y37 samples were due to the large number of mRNAs transcribed from the engineered plasmids, which are unrelated to the genome sequence (Table 2).
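The mapping rates quoted above are simple ratios of mapped to total reads. The sketch below reproduces the B37 figure from the read counts given in the text. The function names are hypothetical, and the total read count for Y37 is not used because it is truncated in this summary; only the stated 39% is carried through.

```python
def mapping_rate(mapped_reads: float, total_reads: float) -> float:
    """Fraction of sequenced reads that aligned to the reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def unmapped_fraction(rate: float) -> float:
    """Fraction of reads that did not align, e.g. plasmid-derived transcripts."""
    return 1.0 - rate


# B37 (control): 73.9 million of 75.4 million reads mapped, as stated in the text.
b37_rate = mapping_rate(73.9e6, 75.4e6)
print(f"B37 mapping rate: {b37_rate:.0%}")

# Y37: the text reports a 39% mapping rate, so roughly 61% of reads did not map.
y37_unmapped = unmapped_fraction(0.39)
print(f"Y37 unmapped fraction: {y37_unmapped:.0%}")
```

The large unmapped share in Y37 is consistent with the explanation in the text that most reads derive from plasmid-encoded transcripts rather than from the chromosome.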
- The variant detection analysis using BactSNP revealed one mutation in the Y37 samples (Fig. 5b), located in the non-coding region 19 bp downstream from the lacI gene.
- Interestingly, lacI is the most highly transcribed gene in the Y37 samples by comparing.
- The analysis of transcriptome data indicated a C-to-T substitution at position 1,094,824 in the three replicates of Y37 samples, as confirmed by a nearly 100% mutation rate in all observed reads (910,914 out of 911,345 reads).
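The "nearly 100% mutation rate" can be checked directly from the read counts given above. The following sketch computes the fraction of reads carrying the substitution. The function names and the 0.95 cut-off are illustrative assumptions and are not parameters reported in the study or used by BactSNP.

```python
def alt_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a site that carry the alternative base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def is_near_fixed(alt_reads: int, total_reads: int, min_fraction: float = 0.95) -> bool:
    """True if the alternative base dominates the reads.

    min_fraction is a hypothetical cut-off chosen for illustration only.
    """
    return alt_allele_fraction(alt_reads, total_reads) >= min_fraction


# Read counts reported in the text for the Y37 samples.
fraction = alt_allele_fraction(910_914, 911_345)
print(f"Reads carrying the substitution: {fraction:.4%}")
print("Near-fixed at the RNA level:", is_near_fixed(910_914, 911_345))
```

A fraction this close to 1 at the RNA level, with no corresponding change in the Sanger-sequenced genomic DNA, is what motivates the RNA-modification hypothesis discussed in the text.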
- Surprisingly, this mutation could not be detected in the bacterial genome by Sanger sequencing (Fig.
- It would be interesting to determine whether the overexpression of lacI could lead to an increase in the RNA methylation rate.
- Here, we employed a series of automated annotation tools along with manual inspection to reannotate the ER2566 genome.
- Moreover, there is an increase in the number of miscellaneous RNAs from 15 to 245.
- These new additions will help to provide a more informative profile of the ER2566 genome and provide a better base for exploring the molecular mechanisms of stress in response to changes in the bacterial cellular milieu.
- Nevertheless, this reannotation still has room for improvement with the continuing advancement of algorithms and the accumulation of next-generation sequence data and proteomics data.
- We also carried out whole-genome sequencing and RNA-seq to detect sequence variants under different conditions of external pressure, and detected one mutation within the non-coding region of the lacI gene.
- b) Visualization of BAM files of the B37 (left panel) and Y37 (right panel) in the Integrative Genomics Viewer.
- T, located within the 3′ non-coding region of the transcription factor gene lacI.
- c) Mutation detected by Sanger sequencing of the B37 and Y37 genomic samples.
- sequencing, which may indicate that this is an RNA modification related to the biological strain of overexpression pressure in ER2566.
- The ER2566 strain is used widely within the scientific community, and our reannotation not only improved the characterization of the strain but uncovered a key site that might be involved in the transcription and translation of genes encoding the lactose operon repressor.
- DNA libraries for Illumina sequencing were constructed according to the manufacturer’s specifications (Thermo Fisher Scientific).
- Cells were harvested by centrifugation at 7000 rpm for 10 min at room temperature and total RNA was extracted using the MasterPure RNA Purification Kit, according to the manufacturer’s protocol (Lucigen).
- Genome reannotation of the ER2566 strain.
- The quality of the raw reads was determined using FastQC [42], and the reads were appropriately truncated and filtered using fastp [43] with default parameters to remove low-quality bases and Illumina adapter contamination.
- The list of the ruled-out genes in the reannotation.
- The list of newly added protein-coding genes in the reannotation.
- The list of complete CDSs in the reannotation.
- Additional file 5.
- The list of newly added miscellaneous ncRNAs in the reannotation.
- The transcription level of the newly added ncRNAs.
- Additional file 9.
- The reannotation of ER2566 genome.
- Rebecca Jackson for polishing and revising of the final manuscript.
- The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- The genomic sequence of the E.
- Whole-genome re-sequencing and RNA-seq data in this study were respectively deposited in the NIH Sequence Read Archive (www.ncbi.nlm.nih..
- The assembly and reannotation of the ER2566 genome are provided as supplementary files (Additional files 9 and 10).
- Complete Genome Sequence of the Engineered Escherichia coli SHuffle Strains and Their Wild-Type Parents.
- Missing genes in the annotation of prokaryotic genomes.
- Deep genome annotation of the opportunistic human pathogen Streptococcus pneumoniae D39.
- A rare 920-kilobase chromosomal inversion mediated by IS1 transposition causes constitutive expression of the yiaK-S operon for carbohydrate utilization in Escherichia coli
