
CrustyBase: An interactive online database for crustacean transcriptomes


Summary

- The primary focus of conventional genomics databases is the storage, navigation and interpretation of sequence data, which is typically classified down to the level of a species or individual.
- The addition of expression data adds a new dimension to this paradigm – the sampling context.
- While the latter species provide us with an increasing depth of knowledge in the fields of genetics and genomics, NGS technologies.
- Total RNA sequencing, commonly known as RNA-seq or transcriptome sequencing, has been an effective tool for curating and characterising genes across an expanding range of species in the past five years.
- 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
- The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
- If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
- The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
- 1 Genecology Research Centre, University of the Sunshine Coast, Sippy Downs, Queensland 4556, Australia.
- Full list of author information is available at the end of the article.
- RNA-seq analysis results in two fundamental data types corresponding to each mRNA transcript in the sample.
- A de novo transcriptome provides many of the insights associated with conventional gene sequencing, such as single-nucleotide polymorphism (SNP) detection, protein structural analysis and comparative evolutionary analysis.
- Here, the researcher is concerned not only with the identity of the subject, but also the conditions from which the data were derived.
- Pragmatically speaking, access to online data is typically limited by the searchability of the data and the format in which it is then presented.
- Such limits on accessibility impose an obvious barrier to the dissemination of information and are of utmost importance if public data accessibility is to be taken seriously.
- While much of the world’s RNA-seq data is publicly available (as required by many funding institutions and journals), we believe that accessibility to this data is far from optimal.
- Current platforms are well-equipped for sharing sequence data, and indeed many transcriptome sequence archives held by the NCBI can be queried directly with their freely-available Basic Local Alignment Search Tool (BLAST), a resource that has become entirely ubiquitous in the bioinformatics sphere.
- In the majority of cases these files are not available, and one might need to resort to full assembly and read-mapping from raw sequencing reads which could take several weeks.
- This forms a significant barrier to the dissemination of public data in two ways.
- Despite the widespread public availability of these datasets (71,818 NCBI BioProject records at the time of writing), the accessibility of these data remains limited.
- Outside of the NCBI platform, it has become conventional for research groups to package the NCBI’s BLAST toolkit for the purpose of sharing genomics data in a dedicated online environment.
- Recently we have seen the release of the Crustacean Annotated Transcriptome (CAT) database, a platform with similar structure to the aforementioned databases and populated with transcriptomes of seven crustacean species [2, 12].
- … liberal investment, the Allen Brain Map [2, 14] provides a three-dimensional interface for viewing gene expression in the human brain.
- Despite the abundance of these online databases, we have yet to see a platform which might permit access to the breadth of transcriptomic data available to the genomics community.
- Indeed, one of the most important attributes of a database is that it should seek to bring data together into one easily-accessible location, since it is far easier to search a single location than to navigate many different sources [15].
- We leverage the ubiquity of the BLAST tool as a means for searching and accessing transcript sequences, whose corresponding expression data are instantly rendered in an interactive graphical output.
- We also provide an interface for navigating the datasets themselves, allowing the user to search not only the species, but also the biological context of RNA-seq experiments.
- These three core features result in a platform which can grow organically, while providing researchers with streamlined access to the variety of taxonomic and molecular insights that they collectively produce.
- Therefore, in acknowledgement of the apparent naming convention of genomics databases, this platform has been released under the name “CrustyBase”, and is accessible at https://crustybase.org.
- HTTP responses are dynamically rendered from HTML templates, with CSS and JavaScript for interface styling and logic in the frontend (i.e. performed within the user’s web browser).
- Django stores these models as tables in the PostgreSQL database.
- The Meta model serves as a root for each dataset and stores various metadata relating to the experiment such as the organism name, taxonomic information, institution of origin and experiment description.
- In order to allow uploading and importing of new RNA-seq data by CB users, we have implemented a web interface which gathers the required files and information from the user.
- After processing, the completed data are returned to the web server by SFTP (in the case of FASTA files and BLAST databases) and remote import to the SQL database (in the case of expression and domain data), at which point they become available to users of the website.
- The import pipeline implemented on the data server is written in Python 3.6.5 and utilizes the TransDecoder program [23] for proteome prediction and a local build of the NCBI’s CD-Search tool [24], which uses RPS-BLAST to match protein sequences against the CDD database of conserved protein domains (obtained from ftp://ftp.ncbi..
- from the late phyllosoma, through the puerulus, to the juvenile lobster.
- We consider this a pilot dataset for CB and it will remain in the database with full public access (access levels are described in the Utility section).
- The BLAST tool allows users to search an RNA-seq dataset in the conventional manner, with expression and predicted domain data being instantly accessible in the form of graphs and figures.
- The Meta, Expression and Domain models define the core of the database schema, with each Meta entry serving as a master record for each dataset.
- “Features” describe the experimental variables used in the study, such as tissue type, treatment or phenotype.
- When the BLAST search is complete the user is presented with a “stack” of result panes, each pertaining to one of the selected datasets.
- Each pane shows a summary table of transcripts ordered by match score (this will be familiar to many BLAST users) accompanied by a generic image of the subject species and a brief description of the experiment details.
- The user can scroll down this page to get a brief overview of the BLAST hits across the selected datasets.
- When a user is interested in a particular dataset’s results, they can choose to “expand” the result view, thereby zooming in and filling the screen with the selected dataset.
- This provides users with immediate insights into the bioactivity and structure of the selected transcripts, with several consequences.
- Secondly, discrepancies in gene activity are immediately brought to the user’s attention, making it possible to browse and compare the expression of genes between available datasets.
- The first view presents the user with a list of all available datasets.
- Each dataset is represented by an image of the animal, species name, number of replicates and a brief description of the RNA-seq experiment.
- At the top of the page is a single text input field which can be used to filter the datasets shown in real time by entering keywords relevant to the user.
- The user can then scroll down the page and select a dataset of interest, bringing them to the second interface of the data browser.
- This page provides a detailed view of the selected dataset, including the dataset owner, assembly statistics, institution, reference and descriptions of the species, experiment and assembly procedure.
- From either of these pages the user can jump directly to the BLAST search tool with the database selected.
- Groups are designed to manage ownership of datasets in a manner that reflects data creation and ownership in the real world, and help researchers share access to datasets with colleagues and collaborators.
- These worker threads then execute request handlers defined in the CrustyBase codebase.
- New datasets are remotely imported to the PostgreSQL database when processing is complete.
- When a user uploads a dataset, ownership is delegated to one of the user’s groups.
- This has several important implications: (1) every user in the group has full access to the data; (2) if a user account is deleted, all data uploaded by that user remain in the group.
- However, we are aware that groups become redundant for datasets which are already in the public domain.
- In this case, the user can choose to omit group delegation and simply import the dataset into the public domain.
- After delegating a group for data ownership, the user fills out a form which describes all metadata relevant to the dataset, such as species name and experimental conditions.
- The user can then choose whether the dataset will have full or partial public accessibility.
- Full accessibility allows any CB user to download raw sequence and expression data, while a dataset with partial accessibility only provides public CB users with a graphical view of the data.
- These files will be parsed and tested for integrity, then returned to the user if any errors are encountered.
- The user then has the option to make revisions to the import before final submission.
- There are several further features that we hope to incorporate into CB in the future in order to enhance the utility of this resource for the research community.
- These additions aim to improve user access to the database, introduce new data types to add value to datasets, and streamline the ingestion of new datasets into the database.
- Implement transcript annotation in the data import pipeline.
- Transcriptome assembly and quantitation in the data import pipeline.
- This would enable CB to ingest a large quantity of publicly available data held in the SRA, which held 5629 sequencing runs from crustacean RNA-seq projects at the time of writing.
- A quick search in the NCBI protein database with the keywords “Sonic SHH” yields a single protein sequence of 126 AA belonging to the barnacle Amphibalanus amphitrite (accession KAF0307803.1).
- We type “larva” into the keyword filter to find three related datasets (two spiny lobsters and one salmon louse), which we add to the “selected” pane.
- In the salmon louse, we see three-fold upregulation in the egg.
- In the spiny lobster we see two-fold higher expression before the phyllosoma metamorphosis, extending well into the puerulus phase.
- To ensure that we can remember the origin of this file in the future, we enter the file prefix “caligus_sonic” before downloading.
- With these data for future reference, we could begin a phylogenetic study by curating sonic transcripts from other datasets in CB, or jump back to NCBI to search for novel transcripts in the TSA archive.
- Of course, this investigation could alternatively have been carried out with an NCBI BLAST search of the TSA archives, linking to related BioProject and GEO datasets, downloading the expression data as a spreadsheet and plotting it manually.
- But with the interface provided by CB, this entire process can be resolved within two web pages and around 10 min of the user’s time.
- Database selection in the BLAST interface allows for trivial searching of datasets by two text-based input filters (a and b).
- Input A allows for filtering based on the available taxonomy in the CB database on the basis of class, order, genus and species.
- Target datasets appear as a stack of “panes” showing a generic image of the animal with a brief description of the experiment and a hit table of BLAST summary statistics.
- Here, the user can scroll down the page to get an overview of all datasets with transcripts matching their query sequence.
- The user can then click the “expand” button (top-right) for a detailed view of that dataset, or select transcripts for data download (top-right).
- After expanding a dataset, the user receives a full-page, detailed view of that dataset’s BLAST hits including the BLAST alignment and statistics (bottom-left), an interactive expression graph (bottom-middle), and a predicted protein plot (bottom-right).
- In the genomics era, sharing and accessibility of biological data are of utmost importance.
- Much of the progress in this field can be attributed to model organisms such as Drosophila and Mus musculus, which have each attracted the shared attention of a large, well-funded research community.
- While these datasets may be valuable to the individual researcher who created them, they are an even greater asset when the community can unite their efforts to form a shared pool of information.
- We would also like to thank and acknowledge the Australian Research Data Commons (ARDC) for provision of the CB web server through the Nectar project.
- We would like to acknowledge funding from a USCRS scholarship, funded in part by the University of the Sunshine Coast.
- The views expressed herein are those of the authors and are not necessarily those of the Australian Government or Australian Research Council.
- This window appears when the user has selected one or more transcripts in the hit table.
- The user can select the data formats most appropriate to them and enter a prefix for the pending file before downloading.
- Access to raw sequencing data can be found through the corresponding NCBI BioProject where the creator of the dataset has provided this information.
- We strongly encourage future contributors to CB to make their data available in this way, as this greatly enhances the credibility and utility of the data.
- Transcriptomic analysis of differentially expressed genes in the molting gland (Y-organ) of the blackback land crab, Gecarcinus lateralis, during molt-cycle stage transitions.
- The transcriptome of the marine calanoid copepod Temora longicornis under heat stress and recovery.
- Transcriptional profiling of spiny lobster metamorphosis reveals three new additions to the nuclear receptor superfamily
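
The summary above describes a schema built around Meta, Expression and Domain records, a keyword filter over datasets, and full versus partial public accessibility. The following is a minimal sketch of how those pieces might fit together, written with plain Python dataclasses rather than the Django models the platform actually uses. All field names, the `filter_datasets` and `can_download` helpers, the substring matching rule and the example records are assumptions made for illustration; none of them are taken from the CrustyBase codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Expression:
    """Expression values for one transcript, keyed by feature (hypothetical layout)."""
    transcript_id: str
    values: Dict[str, float]


@dataclass
class Domain:
    """One predicted conserved domain on a transcript's protein (hypothetical layout)."""
    transcript_id: str
    name: str
    start: int
    end: int


@dataclass
class Meta:
    """Root record for one dataset; field names are illustrative assumptions."""
    organism: str
    description: str
    full_access: bool = True  # False stands for partial access: graphical view only
    expression: List[Expression] = field(default_factory=list)
    domains: List[Domain] = field(default_factory=list)


def filter_datasets(datasets: List[Meta], keywords: str) -> List[Meta]:
    """Keep datasets whose organism or description contains every keyword.

    Case-insensitive substring matching is an assumption, not the site's rule.
    """
    terms = keywords.lower().split()
    return [
        d for d in datasets
        if all(t in f"{d.organism} {d.description}".lower() for t in terms)
    ]


def can_download(dataset: Meta) -> bool:
    """Raw sequence and expression downloads require full accessibility."""
    return dataset.full_access


# Made-up example records, loosely echoing the organisms named in the text.
datasets = [
    Meta("spiny lobster", "larval metamorphosis from phyllosoma to juvenile"),
    Meta("salmon louse", "egg and larval stages", full_access=False),
    Meta("Gecarcinus lateralis", "molting gland across molt-cycle stages"),
]

selected = filter_datasets(datasets, "larva")
print([d.organism for d in selected])
print([can_download(d) for d in selected])
```

With these made-up records, filtering on "larva" keeps the spiny lobster and salmon louse entries, and only the first of them permits downloads. Keeping Expression and Domain rows attached to a single Meta root mirrors the "master record" role the text gives to each Meta entry.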
