Data-driven systems biology to study metabolism

Dnr:

SNIC 2017/1-344

Type:

SNAC Medium

Principal Investigator:

Aleksej Zelezniak

Affiliation:

Chalmers tekniska högskola

Start Date:

2017-08-30

End Date:

2018-02-01

Primary Classification:

10610: Bioinformatics and Systems Biology (methods development under 10203)

Secondary Classification:

10203: Bioinformatics (Computational Biology) (applications under 10610)

Webpage:

Abstract

The main focus of our lab is to advance the understanding of metabolic systems by applying mathematical modelling and data-driven artificial intelligence (AI) approaches. Manipulating metabolism and other cellular systems offers an opportunity to target cancer and other diseases, and it is also important for biotechnological applications. To manipulate metabolism effectively, it is crucial to identify the best set of targets, such as proteins, genes, DNA regions or other molecular players. Gaining such information, however, requires characterizing the system in sufficient detail, which in turn requires multiple molecular readouts from thousands of biological samples. We currently rely heavily on molecular readouts from next-generation DNA/RNA sequencing, proteomics and metabolomics. The major limitation is the complex nature of the data these technologies generate and the lack of computational approaches capable of turning them into informative, interpretable and actionable information. To address this biological complexity, we are developing self-taught machine learning approaches that relate molecular data to complex biological phenotypes, such as cellular metabolism. One of the lab's projects is to develop technology for interpreting complex mass spectrometry spectra generated by simultaneously colliding (fragmenting) thousands of small molecules, such as metabolites and peptides, which produces heavily convoluted data. No current computational method can use these data effectively; for instance, present state-of-the-art approaches discard over 90% of the biologically informative data because they use only the easily accessible spectra. We propose to tackle this complexity with a self-taught, unsupervised AI approach based on deep convolutional networks (deep learning) that extracts informative features from mass spectrometry data.
Our lab is not limited to technology development projects; we also work on the ecology of microbial communities derived from environmental samples, including the human gut. Microbial communities (MCs) have great potential for treating metabolic diseases and for combating antibiotic resistance, two enormous global problems that cause more deaths than wars and all cancers combined. MCs also have multiple industrial and biotechnological applications, such as wastewater and soil treatment, and can be used for the cost-effective production of economically valuable chemicals. One of the biggest problems in the field is understanding why some bacterial communities remain more stable than others over long periods of time. To answer this question, and to determine what role metabolic interactions play in it, we are modelling large-scale microbial communities by integrating multiple environmental data sources, such as growth dynamics, material diffusion and DNA sequence data, to understand the interactions between hundreds of microbial species in space and time. On a daily basis, we handle terabytes of DNA sequencing and proteomics data and rely heavily on computing. I recently started the group here at Chalmers, and our work will be about 90% computational. For now we are a small team (3 people), so we initially intend to apply for MEDIUM-size resources, but we aim to expand in the near future.