Visualisation Support for Biological Bayesian Network Inference

Research Output

Extracting valuable information from the visualisation of biological data and turning it into a network model is the main challenge addressed in this thesis. Biological networks are mathematical models that describe biological entities as nodes and their relationships as edges. Because they describe patterns of relationships, networks can show how a biological system works as a whole. However, network inference is a challenging optimisation problem for which no polynomial-time algorithm is known. Computational biologists (i.e. modellers) therefore combine clustering and heuristic search algorithms with their tacit knowledge to infer networks. Visualisation can play an important role in supporting them in this workflow. The main research question is: “How can visualisation support modellers in their workflow to infer networks from biological data?” Answering this question required collaborating with computational biologists to understand the challenges in their workflow and to form research questions. Following the nested model methodology helped to characterise the domain problem, abstract data and tasks, design effective visualisation components and implement efficient algorithms; these steps correspond to the four levels of the nested model for collaborating with domain experts to design effective visualisations. We found that visualisation can support modellers in three steps of their workflow: (a) selecting variables, (b) inferring a consensus network and (c) incorporating information about its dynamics.

To select variables (a), modellers first apply a hierarchical clustering algorithm, which produces a dendrogram (i.e. a tree structure). They then select a similarity threshold (height) at which to cut the tree, so that the resulting branches correspond to clusters. However, applying a single-height similarity threshold is not effective for clustering heterogeneous multidimensional data, because clusters may exist at different heights. The research question is: Q1 “How to provide visual support for the effective hierarchical clustering of many multidimensional variables?” To answer this question, MLCut, a novel visualisation tool, was developed to enable the application of multiple similarity thresholds. Users can interact with a representation of the dendrogram, which is coordinated with a view of the original multidimensional data, select branches of the tree at different heights and explore different clustering scenarios. Two case studies using MLCut showed that this method provides transparency in the clustering process and enables the effective allocation of variables into clusters.
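The difference between a single-height cut and a multi-level cut can be illustrated with a minimal sketch. It is not MLCut's implementation: the `Node` class, the function names and the toy dendrogram are all assumptions made for illustration. In the toy tree, one meaningful cluster merges at height 0.8, while two others merge at 0.2 and 0.3 and join each other at 0.5, so no single threshold recovers all three.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A dendrogram node (hypothetical structure, for illustration only)."""
    height: float                  # merge height; 0.0 for a leaf
    label: Optional[str] = None    # set only for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaves(node: Node) -> List[str]:
    """Collect the leaf labels under a node, left to right."""
    if node.label is not None:
        return [node.label]
    return leaves(node.left) + leaves(node.right)


def cut_single(node: Node, threshold: float) -> List[List[str]]:
    """Classic cut: one height threshold applied to the whole tree."""
    if node.label is not None or node.height <= threshold:
        return [leaves(node)]
    return cut_single(node.left, threshold) + cut_single(node.right, threshold)


def cut_multi(node: Node, picked: List[Node]) -> List[List[str]]:
    """Multi-level cut: every picked branch becomes one cluster,
    whatever its height; unpicked leaves stay as singletons."""
    if any(node is p for p in picked):
        return [leaves(node)]
    if node.label is not None:
        return [[node.label]]
    return cut_multi(node.left, picked) + cut_multi(node.right, picked)


# Toy dendrogram (assumed data): x is a loose cluster, y1 and y2 are tight ones.
a, b, c, d, e, f, g = (Node(0.0, name) for name in "abcdefg")
ab = Node(0.7, left=a, right=b)
x = Node(0.8, left=ab, right=c)
y1 = Node(0.2, left=d, right=e)
y2 = Node(0.3, left=f, right=g)
y = Node(0.5, left=y1, right=y2)
root = Node(1.0, left=x, right=y)

print(cut_single(root, 0.8))          # y1 and y2 are merged into one cluster
print(cut_single(root, 0.4))          # x falls apart into singletons
print(cut_multi(root, [x, y1, y2]))   # branches picked at different heights
```

Picking branches individually, as a user would by clicking on the dendrogram, yields the three intended clusters, whereas each single threshold either merges the tight clusters or splits the loose one.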

Selected variables and clusters constitute nodes in the inferred network. In the second step (b), modellers apply heuristic search algorithms which sample a solution space consisting of all possible networks. The result of each execution of the algorithm is a collection of high-scoring Bayesian networks. The task is to guide the heuristic search and help construct a consensus network. However, this is challenging because different executions of the algorithm produce many networks with different scores. The research question is: Q2 “How to support the visual analysis of heuristic search results, to infer representative models for biological systems?” BayesPiles, a novel interactive visual analytics tool, was developed and evaluated in three case studies to help modellers explore, combine and compare results, understand the structure of the solution space and construct a consensus network.
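One simple way to combine a collection of networks into a consensus network is to keep the edges that occur in at least a given fraction of them. The sketch below shows that idea only; it is an assumption for illustration, not the method used by BayesPiles or by the modellers, and the function name, the `min_fraction` parameter and the toy networks are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # directed edge (parent, child)


def consensus_edges(networks: Iterable[Set[Edge]],
                    min_fraction: float = 0.5) -> Set[Edge]:
    """Keep every edge present in at least `min_fraction` of the networks."""
    networks = list(networks)
    if not networks:
        return set()
    # Count in how many networks each directed edge appears.
    counts = Counter(edge for net in networks for edge in set(net))
    total = len(networks)
    return {edge for edge, n in counts.items() if n / total >= min_fraction}


# Toy search results (assumed data): three high-scoring networks.
results = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("B", "C"), ("A", "C")},
    {("A", "B")},
]

majority = consensus_edges(results, min_fraction=0.5)
unanimous = consensus_edges(results, min_fraction=1.0)
print(sorted(majority))
print(sorted(unanimous))
```

An edge-frequency vote ignores the network scores; a score-weighted variant would be a natural refinement, and the choice of threshold is exactly the kind of decision that benefits from visual inspection of the results.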

As part of the third step (c), when the biological data contain measurements over time, heuristics can also infer information about the dynamics of the interactions, encoded as different types of edges in the inferred networks. However, representing such multivariate networks is a challenging visualisation problem. The research question is: Q3 “How to effectively represent information related to the dynamics of biological systems, encoded in the edges of inferred networks?” To help modellers explore their results and to answer Q3, a human-centred crowdsourcing experiment was conducted to evaluate the effectiveness of four visual encodings for multiple edge types in matrices. The design of the tested encodings combines three visual variables: position, orientation, and colour. The study showed that orientation outperforms position and that colour is helpful in most tasks. The results informed an extension to the design of BayesPiles, which modellers evaluated by exploring dynamic Bayesian networks. The feedback of most participants confirmed the results of the crowdsourcing experiment.
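To make the idea of multiple edge types in a matrix concrete, the following sketch stores a set of edge types in each cell of an adjacency matrix and renders each cell as a small text glyph in which every edge type has a fixed slot, a crude stand-in for a position-based encoding. Everything here is an assumption for illustration: the edge type names `lag1` and `lag2`, the function names and the toy data are hypothetical, and the sketch does not reproduce the encodings tested in the study.

```python
from typing import FrozenSet, Iterable, List, Sequence, Tuple

TypedEdge = Tuple[str, str, str]  # (source, target, edge type)


def edge_type_matrix(nodes: Sequence[str],
                     typed_edges: Iterable[TypedEdge],
                     edge_types: Sequence[str]) -> List[List[FrozenSet[str]]]:
    """Build an adjacency matrix whose cells hold the set of edge types
    present between a source (row) and a target (column)."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[frozenset() for _ in nodes] for _ in nodes]
    for source, target, kind in typed_edges:
        if kind not in edge_types:
            raise ValueError(f"unknown edge type: {kind}")
        i, j = index[source], index[target]
        matrix[i][j] = matrix[i][j] | {kind}
    return matrix


def cell_glyph(cell: FrozenSet[str], edge_types: Sequence[str]) -> str:
    """One character slot per edge type: '#' if present, '.' if absent."""
    return "".join("#" if kind in cell else "." for kind in edge_types)


# Toy data (assumed): two hypothetical edge types between three variables.
nodes = ["A", "B", "C"]
edge_types = ["lag1", "lag2"]
typed_edges = [("A", "B", "lag1"), ("A", "B", "lag2"), ("B", "C", "lag2")]

matrix = edge_type_matrix(nodes, typed_edges, edge_types)
for name, row in zip(nodes, matrix):
    print(name, " ".join(cell_glyph(cell, edge_types) for cell in row))
```

Keeping the set of types per cell separate from the glyph function means the same matrix could be rendered with a different visual variable, such as orientation or colour, without changing the data structure.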

This thesis focuses on the investigation, design, and application of visualisation approaches for gaining insights from biological data to infer network models. It shows how visualisation can help modellers in their workflow to select variables, to construct representative network models and to explore their different types of interactions, contributing to a better understanding of how biological processes within living organisms work.

Type: Thesis

Date: 30 October 2019

Publication Status: Unpublished

Library of Congress: QA75 Electronic computers. Computer science

Dewey Decimal Classification: 004 Data processing & computer science

Funders: Edinburgh Napier Funded

Citation

Vogogias, T. Visualisation Support for Biological Bayesian Network Inference. (Thesis). Edinburgh Napier University. Retrieved from http://researchrepository.napier.ac.uk/Output/2158180