
S·nr: A visual analytics framework for contextual analyses of private and public RNA-seq data


Summary

- It allows researchers to explore their own data in the context of experimental data deposited in public repositories, as well as to extract specific data sets with similar gene expression signatures.
- Next-Generation Sequencing (NGS) has been established as a state-of-the-art tool in molecular biology.
- For interpreting the analysis output, which often comes in the form of spreadsheets containing a large number of differentially expressed genes, researchers often fall back to basic office spreadsheet applications, such as Microsoft Excel or Apple Numbers.
- This leads to inconsistent analysis procedures, unobserved patterns in the data (e.g.
- a Visual Analytics tool empowering molecular biologists to explore their RNA-Seq experiments and shed light on patterns in the data.
- 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0.
- Providing means of identifying related NGS datasets in public repositories, despite being discordant in the interrogated model organism, tissue type or disease context.
- In this section we describe how we went from a thorough requirement analysis that defines the key functionality of s·nr to the selection of web-based technologies and containerized solutions that allow for easy deployment and scalability of the tool.
- As a result of the interviews, we decided to focus on the tasks that were almost uniformly mentioned: performing GO analyses, filtering the genes and comparing datasets.
- Context: We concluded that many shortcomings in the analysis workflow arise from the context of the analysis (i.e.
- Due to the size of the imported data, the software tends to be slow and unresponsive.
- Multiple datasets of the size of standard RNA-Seq results concatenated to large summary tables render such applications unresponsive.
- Functional similarity of the domain expert's data with public data cannot be computed with the tools available.
- Detailed results of the questionnaire as well as commonly used tools and plots can be found in Additional file 2.
- This workflow proceeds in iterative analysis loops, allowing observations that trigger new hypotheses, which in turn require more abstract views to focus on other sections of the data.
- On top of the VA mantra, we established the following design principles for s·nr: (1) Since the researchers have to interpret the vast data presented to them, we have to treat cognitive workload as a resource.
- The power of the tool arises from (3) a high interactivity between few visual representations.
- Facebook’s React [9] library for JavaScript is the backbone of the user interface and provides efficient means of building interactive components by distributing data changes throughout all user-interface components.
- Another example is the data table, which behaves exactly like a classic scrollable HTML5 table but actually renders only the list elements in the current view pane, with spacers above and below that are adjusted dynamically to the scroll position. This reduces the number of rendered elements in the list from ∼50,000 to ∼20, depending on the size of the browser view frame.
- This, however, increases the footprint of the snR package in system memory by several gigabytes.
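The windowing idea behind such a table can be sketched as follows. This is a minimal illustration, not the s·nr implementation: the function name, the fixed row height, and the extra buffer row are assumptions.

```typescript
// Hypothetical helper: names and the fixed row height are assumptions.
interface VirtualWindow {
  firstRow: number;    // index of the first rendered row
  lastRow: number;     // index one past the last rendered row
  spacerAbove: number; // pixel height of the spacer above the rendered rows
  spacerBelow: number; // pixel height of the spacer below the rendered rows
}

function computeVirtualWindow(
  totalRows: number,
  rowHeight: number,
  scrollTop: number,
  viewportHeight: number,
): VirtualWindow {
  // First row that intersects the viewport, clamped to the valid range.
  const firstRow = Math.min(
    totalRows,
    Math.max(0, Math.floor(scrollTop / rowHeight)),
  );
  // One extra row covers a partially visible row at the bottom edge.
  const visibleCount = Math.ceil(viewportHeight / rowHeight) + 1;
  const lastRow = Math.min(totalRows, firstRow + visibleCount);
  return {
    firstRow,
    lastRow,
    // The spacers stand in for all rows that are not rendered, so the
    // scrollbar behaves as if the full table were present.
    spacerAbove: firstRow * rowHeight,
    spacerBelow: (totalRows - lastRow) * rowHeight,
  };
}
```

With 50,000 rows of 30 px in a 600 px viewport, only 21 rows are rendered at any scroll position, while the two spacers keep the total scroll height unchanged.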
- Heavy computational tasks are performed on the OpenCPU server.
- To solve this problem, we employ a second server described in the following paragraph.
- The Node.js server stores user names, the files the user has access to as well as the bcrypt hashes [15] of the user’s password.
- A typical request looks as follows: the client sends a request for calculating a PCA to the Node.js server through a POST together with the user’s token. The token is then verified and, if it is valid, the server calls the R command on the OpenCPU server.
- On completion of the calculation, the Node.js server retrieves the result as a JSON file and passes it back to the client, triggering the rendering of the PCA scatter plot.
- Using the Docker engine, users can convert the image into a container that is an exact replica of the exported system.
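The verify-then-forward step of this request flow can be sketched as a pure function that either rejects the request or describes the call to be sent to OpenCPU. The R function name `computePca`, the argument shape, and the token check are hypothetical; the package name `snR` is taken from the text, and the URL follows OpenCPU's general pattern for calling a function of an installed package and receiving JSON.

```typescript
// Hypothetical names throughout: "computePca", the argument shape and the
// token check are assumptions made for illustration only.
interface PcaRequest {
  token: string;
  genes: string[];
}

interface OpenCpuCall {
  method: "POST";
  url: string;
  headers: { "Content-Type": string };
  body: string;
}

type PlannedCall =
  | { ok: true; call: OpenCpuCall }
  | { ok: false; status: number; reason: string };

function planPcaCall(
  request: PcaRequest,
  isValidToken: (token: string) => boolean,
  openCpuBaseUrl: string,
): PlannedCall {
  // Step 1: reject the request before any R code is involved.
  if (!isValidToken(request.token)) {
    return { ok: false, status: 401, reason: "invalid token" };
  }
  // Step 2: describe the call to the R function. The trailing "/json"
  // asks OpenCPU to return the function result directly as JSON.
  return {
    ok: true,
    call: {
      method: "POST",
      url: `${openCpuBaseUrl}/ocpu/library/snR/R/computePca/json`,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ genes: request.genes }),
    },
  };
}
```

Keeping the token check in front of the OpenCPU call means the R server never has to know about users; it only sees requests the Node.js server has already accepted.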
- The configuration is limited to the Docker image creation and customizing the Node.js server settings.
- Due to the demanding memory requirements of the snR package mentioned above, this machine requires at least 15 GB of RAM.
- The user interface is divided into two major components.
- (2) The selected experiments can be analyzed in the details view that provides simple yet efficient means for displaying and querying the data as well as extracting GO terms.
- The s·nr workflow consists of the overview visualization view that is used to select datasets based on similarity.
- The user can further investigate the selected data sets using the details view.
- Since this is meant as an iterative analysis loop, the user can always go back to the overview visualization to adjust the selection of potentially interesting data.
- We derived the data depicted in the overview visualization from ArrayExpress, processed it using QuickNGS, and provide the result with this paper.
- The analysis starts with the overview plot showing the first two principal components of a PCA on the p-values of all genes of public and private data sets.
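To make the overview projection concrete, the sketch below computes two-dimensional PCA scores for a matrix whose rows are data sets and whose columns are per-gene values. It is not the authors' method: according to the text the PCA runs in R on the OpenCPU server, whereas this illustration swaps in power iteration with deflation on the Gram matrix, and all function names are hypothetical.

```typescript
// Illustrative only: power iteration on the Gram matrix, not the R code
// used by the tool. Rows are data sets, columns are genes.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function dominantEigenpair(
  matrix: number[][],
  iterations = 200,
): { value: number; vector: number[] } {
  const n = matrix.length;
  // Deterministic, non-uniform start vector.
  let vector = matrix.map((_, i) => 1 + i / n);
  for (let it = 0; it < iterations; it++) {
    const next = matrix.map((row) => dot(row, vector));
    const norm = Math.sqrt(dot(next, next));
    // A vanishing product means no variance is left to explain.
    if (norm < 1e-12) return { value: 0, vector: vector.map(() => 0) };
    vector = next.map((v) => v / norm);
  }
  // Rayleigh quotient of the converged unit vector.
  const value = dot(vector, matrix.map((row) => dot(row, vector)));
  return { value, vector };
}

function pcaScores2D(data: number[][]): number[][] {
  const n = data.length;
  const m = data[0].length;
  // Center every column (gene) on its mean across data sets.
  const means: number[] = [];
  for (let j = 0; j < m; j++) {
    let sum = 0;
    for (let i = 0; i < n; i++) sum += data[i][j];
    means.push(sum / n);
  }
  const centered = data.map((row) => row.map((v, j) => v - means[j]));
  // Gram matrix G = X X^T; its eigenvectors, scaled by the square root of
  // the eigenvalues, are the principal component scores of the rows.
  let gram = centered.map((a) => centered.map((b) => dot(a, b)));
  const scores = data.map(() => [0, 0]);
  for (let k = 0; k < 2; k++) {
    const { value, vector } = dominantEigenpair(gram);
    const scale = Math.sqrt(Math.max(value, 0));
    for (let i = 0; i < n; i++) scores[i][k] = vector[i] * scale;
    // Deflate so the next pass finds the following component.
    gram = gram.map((row, i) =>
      row.map((g, j) => g - value * vector[i] * vector[j]),
    );
  }
  return scores;
}
```

Working on the Gram matrix keeps the eigenproblem at the size of the number of data sets rather than the number of genes, which is the smaller dimension in this setting. The sign of each component is arbitrary, as in any PCA.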
- Public data sets are uniformly assigned the box icon and a higher transparency to facilitate identification of the user’s data.
- a The dot of the s·nr logo emits a fading circle when data is fetched from the server.
- On brushing data sets in the details view, the user can narrow down the genes of interest and trigger a new PCA calculation based on the selected group of genes.
- c Mouse-over shows meta data of the data set.
- Clicking on a data set icon fetches its data and passes it to the details view.
- represents the experiments in the individual views and manifests a clear yet simple visual link.
- The user can also customize the public experiment icons to create a cognitive link.
- Visual feedback on communication with the server is given by an emitted circle in the s·nr logo (Fig.
- The experiments can then be investigated further in the details view.
- (1) Main scatter plot of the focus experiment.
- The focus experiment is displayed as a large scatter plot in the center of the user interface (Fig.
- a The focus experiment is depicted in the large scatter/hex plot.
- b Further data sets are visualized in the small multiples of the large scatter plot.
- Opening a GO term displays additional information about the term as well as the expression of it in the context experiments.
- Each GO term is represented using a GO plot, which can be customized in the pane's options.
- We depict a typical interaction example at the bottom, where brushing (selecting) genes in the main scatter plot leads to highlighting the corresponding genes in the context experiments and also automatically triggers a GO-term analysis of the selection.
- The fill color of each hexagon in the hex maps represents the number of genes it contains.
- The user has the option to always render genes as dots using the scatter plot option menu.
- By spanning rectangles on click, the user can brush genes in the scatter plot.
- Through various options, the user can change the transformation of the axis scales (linear, -linear, log2, -log2, log10, -log10, sqrt, -sqrt), the zooming behavior (always show the whole range or only the selected range), and rendering-related options such as always drawing genes as dots.
- We incorporate this idea for the context experiments to display them according to the focus experiment representation.
- The small multiples share the visual features of the main scatter plot, rendering them a hybrid visualization of hex and scatter plots.
- The icon in the heading of each small multiple allows the user to match it with the experiment it represents.
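The listed scale options can be read as four base functions, each with a negated variant. The sketch below encodes that reading; interpreting a leading minus as negation of the transformed value (as in the familiar -log10 of a p-value) is an assumption, as are the type and function names.

```typescript
// Assumption: a leading "-" negates the transformed value, e.g. "-log10"
// maps a p-value of 0.001 to 3. Names are hypothetical.
type AxisTransform =
  | "linear" | "-linear"
  | "log2" | "-log2"
  | "log10" | "-log10"
  | "sqrt" | "-sqrt";

function applyTransform(name: AxisTransform, value: number): number {
  const negate = name.charAt(0) === "-";
  const base = negate ? name.slice(1) : name;
  let out: number;
  switch (base) {
    case "log2":
      out = Math.log2(value);
      break;
    case "log10":
      out = Math.log10(value);
      break;
    case "sqrt":
      out = Math.sqrt(value);
      break;
    default:
      // "linear" leaves the value unchanged.
      out = value;
  }
  return negate ? -out : out;
}
```

Under this reading, a volcano plot corresponds to a linear or log2 scale for fold change on one axis and `-log10` of the p-value on the other.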
- The strength of the small multiples approach becomes evident when a subset of genes is selected, e.g. by brushing them in the main scatter plot.
- If, for example, only the up-regulated genes below a specific p-value in the default volcano plot are selected, the small multiples show the regulation status in the context experiments (Fig. 3a, b).
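The selection described here, and how it carries over to a context experiment, can be sketched as follows. The record fields and function names are hypothetical and only illustrate the logic of selecting in one experiment and reading off the regulation status in another.

```typescript
// Hypothetical record shape; field names are assumptions.
interface GeneResult {
  gene: string;
  log2FoldChange: number;
  pValue: number;
}

// Select up-regulated genes below a p-value threshold in the focus experiment.
function selectUpRegulated(
  results: GeneResult[],
  maxPValue: number,
): Set<string> {
  const selected = new Set<string>();
  for (const r of results) {
    if (r.log2FoldChange > 0 && r.pValue < maxPValue) selected.add(r.gene);
  }
  return selected;
}

// Summarize how the selected genes behave in one context experiment.
function regulationInContext(
  selection: Set<string>,
  context: GeneResult[],
): { up: number; down: number; unchanged: number } {
  const summary = { up: 0, down: 0, unchanged: 0 };
  for (const r of context) {
    if (!selection.has(r.gene)) continue; // not part of the brushed subset
    if (r.log2FoldChange > 0) summary.up++;
    else if (r.log2FoldChange < 0) summary.down++;
    else summary.unchanged++;
  }
  return summary;
}
```

A context experiment in which most selected genes land in `down` would show the kind of inverse pattern that the small multiples make visible at a glance.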
- The data table view shows all available information for each entry of the focus experiment.
- Furthermore, filters can be applied through the input fields in the table header.
- which are applied on the input field.
- Analogously to filtering in the main scatter plot, the filter is applied to all visible representations, limiting the list to the filtered entries and highlighting the genes in the scatter plots.
- Next to providing a tabular data representation and filtering on multiple dimensions at once, the data table view serves as input for the dimensions mapped to the x- and y-axis of the scatter plots.
- Clicking on a dimension name in the table puts it on the x-axis of the scatter plots and moves the current x-axis to the y-axis.
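The axis reassignment on click is a small state transition, sketched below. The names are hypothetical, and treating a click on the dimension that is already on the x-axis as a no-op is an assumption.

```typescript
// Hypothetical state shape for the dimensions mapped to the two axes.
interface AxisMapping {
  x: string;
  y: string;
}

function selectDimension(current: AxisMapping, clicked: string): AxisMapping {
  // Assumption: clicking the dimension already on the x-axis changes nothing.
  if (clicked === current.x) return current;
  // The clicked dimension takes the x-axis; the previous x moves to y.
  return { x: clicked, y: current.x };
}
```

Two clicks are therefore enough to reach any pair of dimensions: the first click chooses the future y-axis, the second the x-axis.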
- The user can set the focus experiment and customize the experiment icons.
- The interface in the experiment selection pane allows the user to customize the information attached to the data set which can then be used to assess and filter the data further.
- GO term analysis was the most frequent request in the user study.
- 3d) includes the GO extraction in the analysis workflow of s·nr.
- Upon gene subgroup definitions by filtering in any of the s·nr views, the GO term pane will automatically retrieve the terms containing the selected genes.
- The resulting list of GO terms is sorted by percentage of selected genes in the term.
- The width of the plot is the sum of the width of the gene representation rectangles and is relative to the largest GO term fitting the gene selection query.
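The ranking and the relative plot width can be sketched together. This is one reading of the description, assuming one equally wide rectangle per selected gene so that a plot's width is its hit count divided by the largest hit count among the matching terms; the names and data shapes are hypothetical.

```typescript
// Hypothetical shapes; one equally wide rectangle per selected gene is assumed.
interface GoTerm {
  id: string;
  genes: string[];
}

interface RankedTerm {
  id: string;
  hits: number;          // selected genes contained in the term
  fraction: number;      // hits divided by the size of the term
  relativeWidth: number; // hits divided by the largest hit count
}

function rankGoTerms(terms: GoTerm[], selected: Set<string>): RankedTerm[] {
  // Keep only terms that contain at least one selected gene.
  const matched = terms
    .map((term) => ({
      term,
      hits: term.genes.filter((g) => selected.has(g)).length,
    }))
    .filter((m) => m.hits > 0);
  const largest = matched.reduce((max, m) => Math.max(max, m.hits), 0);
  return matched
    .map((m) => ({
      id: m.term.id,
      hits: m.hits,
      fraction: m.hits / m.term.genes.length,
      relativeWidth: m.hits / largest,
    }))
    // Highest percentage of selected genes first.
    .sort((a, b) => b.fraction - a.fraction);
}
```

Sorting by fraction favors small, fully covered terms, while the relative width still shows at a glance which term contributes the most genes.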
- When mapping fold change to the GO plot, the user gets an at-a-glance view of how many genes in the term are down- or up-regulated.
- Using the options pane, the user can adjust the dimension mapped to the GO plot as well as its transfer function.
- Additionally, it allows for displaying all genes in the GO terms, not restricting it to the selected ones.
- The GO plots are rendered in the overview of the pane for the focus experiments.
- Hovering the mouse over the GO plot highlights the gene under the cursor, showing its gene name as well as highlighting it in all other plots, and also triggers the additional information context menu in the scatter plots.
- We applied the Visual Data Analysis and Reasoning (VDAR) technique [23] to characterize the system's ability to generate and follow up on hypotheses in the data.
- We carry out VDAR as a case study using the thinking-aloud technique to comprehend the reasoning and thought process of the user.
- Those data sets were then analyzed further in the small multiples view.
- After setting filters for fold change and p-value, no pattern could be observed in the small multiples of the public context experiments (genes did not follow the same trend for up- or down-regulation).
- By switching back to the overview visualization and re-triggering the PCA to be calculated only on the differentially expressed genes.
- 1 E-ERAD-209 (a): Ad libitum versus dietary restriction, which is the opposite condition of the focus experiment.
- These data were analyzed further in the details view, switching the standard volcano plot presentation to a visualization showing log10(baseMean) against foldChange by clicking the corresponding headers in the table (Fig.
- The user selected up-regulated highly abundant genes in the main scatter plot.
- Using the GO term pane, the user assessed terms of the selection, yielding terms such as cholesterol metabolic process, collagen trimer, and steroid metabolic process.
- In the ad libitum compared to dietary restriction condition of the E-ERAD-209 (b) dataset, the user observed an inverse relationship in these terms using the GO term plot (Fig.
- While the experimental setup of E-ERAD-209 (b) is similar to the focus experiment, it is compared in the opposite direction, which explains the inverse relationship and shows that the observation makes biological sense and that the similarity plot yields meaningful results.
- Analogously, the GO terms and differentially regulated genes of E-ERAD-209 (a) are regulated in the same direction as in the focus experiment.
- Compared to the other tools, s·nr has a significantly higher demand on the server’s RAM.
- This is because of the large size of data sets to be processed in a time-efficient way to facilitate real-time responses.
- aim at providing a web interface for the whole RNA-Seq pipeline including the alignment and differential analysis steps, but usually lack interactivity in the data exploration aspect.
- We focused solely on the exploration aspect to keep the tool lightweight and to avoid the higher cognitive workload that a complex user interface would place on the user.
- JWK was supported by the Emmy-Noether Program of the Dt.
- KO4728/1.1), the University of Southern Denmark (SDU) and Danish Diabetes Academy (DDA), which is funded by the Novo Nordisk Fonden (NNF) and a European Research Council (ERC) Starting Grant No.
- The source code for s·nr is available in the repository located at https://github..
- Proceedings of the Annual Conference on USENIX Annual Technical Conference
