Computing the largest gene interaction networks for plant species key to the Swedish Forestry Industry

SNIC 2018/3-61


SNAC Medium

Principal Investigator:

Nicolas Delhomme


Sveriges lantbruksuniversitet

Start Date:


End Date:


Primary Classification:

40402: Genetics and Breeding in Agricultural Sciences

Secondary Classification:

10610: Bioinformatics and Systems Biology (methods development to be 10203)



Being granted access to the resources would allow us to: 1) demonstrate the validity of our software by computing and validating the gene inference networks of three model organisms (Drosophila melanogaster, Homo sapiens and Saccharomyces cerevisiae), and 2) compute the largest gene inference networks for three key plant species: Arabidopsis thaliana, Populus tremula and Picea abies. The first is the most comprehensively studied plant model organism, whereas the other two are key species for the Swedish forestry industry.

Sweden is a world leader in forest tree genomics: the Umeå Plant Science Centre (UPSC) conducts world-leading research on the genomics of spruce and poplar species and has generated datasets comprising thousands of samples, an expanding and diverse data resource. Target traits for the improvement of forestry species (e.g. wood density, fibre length) are complex, being determined by the action and interaction of numerous genes that are each under complex regulatory control. Systems biology studies have highlighted that perturbing one gene potentially affects thousands of others, of which only a few will share common functional roles. As a consequence, phenotypes can seldom be traced back to individual genes, but rather emerge from the interactions of networks of genes.

Unfortunately, the available gene network inference tools have limited ability to determine and explore these relationships comprehensively. In the network inference field, dozens of diverse methods attempt to infer putative interactions between all pairs of genes. There is a wide range of regulatory mechanisms (direct, cascades, feedback loops, fan-in/out, positive/negative, etc.), and each inference algorithm is biased towards a particular type or subgroup. These complementary strengths become a benefit when a consensus network is calculated from the interactions inferred by a range of these methods.
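
To make the consensus idea concrete, the following minimal sketch combines the edge scores of several inference methods by averaging each edge's rank across methods. It is an illustration only and is not taken from seidr: the function name `consensus_network`, the method labels and all scores are hypothetical toy values, and average-rank aggregation is simply one straightforward way to combine such rankings.

```python
from collections import defaultdict


def consensus_network(score_tables):
    """Rank candidate edges by their average rank across inference methods.

    score_tables: list of dicts mapping an edge (gene_a, gene_b) to a score,
    where a higher score means a more confident interaction (an assumption
    of this sketch). Returns a list of (edge, average_rank), best edge first.
    """
    # Every edge reported by at least one method is a candidate.
    edges = set().union(*(table.keys() for table in score_tables))
    rank_sums = defaultdict(float)
    for table in score_tables:
        # Rank 1 is the most confident edge for this method; ties keep
        # the dict insertion order because sorted() is stable.
        ordered = sorted(table, key=table.get, reverse=True)
        ranks = {edge: position + 1 for position, edge in enumerate(ordered)}
        # An edge a method did not report gets the rank after its last edge.
        worst_rank = len(ordered) + 1
        for edge in edges:
            rank_sums[edge] += ranks.get(edge, worst_rank)
    n_methods = len(score_tables)
    averaged = [(edge, rank_sums[edge] / n_methods) for edge in edges]
    # Lowest average rank first; the edge name breaks ties deterministically.
    return sorted(averaged, key=lambda item: (item[1], item[0]))


# Hypothetical toy scores from three methods for genes A, B and C.
correlation = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.5}
mutual_info = {("A", "B"): 0.7, ("A", "C"): 0.6, ("B", "C"): 0.1}
regression = {("A", "B"): 0.8, ("B", "C"): 0.4}  # edge A-C not reported

for edge, avg_rank in consensus_network([correlation, mutual_info, regression]):
    print(edge, round(avg_rank, 2))
```

In this toy example the edge A-B is ranked first by all three methods and therefore tops the consensus, while the two edges on which the methods disagree end up with intermediate average ranks.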
In their highly influential paper, Marbach et al. (2012) demonstrated that more robust gene network inferences can be generated using the ‘wisdom of crowds’ approach, whereby many network inference methods are applied and their results subsequently combined. At the UPSC bioinformatics facility (UPSCb), we have developed a set of computational methods (packaged under the name seidr) to take advantage of the mass of data available, generating aggregated gene interaction networks that allow researchers to mine gene expression data in a more informed way. We will use seidr to generate the largest, most comprehensive gene expression networks for model plant species (arabidopsis, poplar and spruce), in addition to more widely used model organisms (fruit fly, human and yeast). Generating these consensus networks is essential to demonstrate the applicability of seidr, will provide the plant community with a powerful resource that we will integrate within our web resource, and will lead to what we believe will be a landmark publication in the field.

This proposal was previously submitted to the LARGE Allocations Fall 2017 and received the following answer: "Although your project is of good scientific quality,... we have viewed it as your demands are better suited for a small or medium allocation."