Computing the largest gene interaction networks for plant species key to the Swedish Forestry Industry - continuation

SNIC 2019/3-207


SNIC Medium Compute

Principal Investigator:

Nicolas Delhomme


Sveriges lantbruksuniversitet

Start Date:


End Date:


Primary Classification:

40402: Genetics and Breeding in Agricultural Sciences

Secondary Classification:

10610: Bioinformatics and Systems Biology (methods development to be 10203)



The current allocation, SNIC 2018/3-61, has been extremely useful for developing our tool and testing it at a realistic scale. We have not used the resources as effectively as planned (about 33% of the allocation) because of development issues and delays in obtaining the data, but these are now resolved and the data are at hand. As the usage over the last month shows, we are now ready to use such an allocation to its full extent. Being granted access to the resources would allow us to: (1) further demonstrate the validity of our software by computing gene inference networks for the model organisms fruit fly, human and yeast; and (2) compute the largest gene inference networks for three key plant species: Arabidopsis thaliana, Populus tremula and Picea abies.

Sweden is a world leader in forest tree genomics. The Umeå Plant Science Centre (UPSC) conducts world-leading research on the genomics of spruce and poplar species and has generated datasets comprising thousands of samples, representing an expanding and diverse data resource. Target traits for the improvement of forestry species are complex, being determined by the action and interaction of numerous genes that are each under complex regulatory control. Systems biology studies have highlighted that perturbing one gene potentially affects thousands of others, of which only a few will share common functional roles. As a consequence, phenotypes can seldom be traced back to individual genes, but rather emerge from the interactions of networks of genes.

Unfortunately, the available gene network inference tools have limited ability to determine and explore these relationships comprehensively. In the network inference field, tens of diverse methods attempt to infer putative interactions between all pairs of genes. There is a wide range of regulatory mechanisms (direct, feedback loops, fan-in/out, etc.) and each inference algorithm has a bias for a particular type or subgroup.
The relative strengths of these algorithms become a benefit when a consensus network is calculated from the interactions inferred by a range of these methods. In their influential paper, Marbach et al. (2012) demonstrated that more robust gene network inferences can be generated using the ‘wisdom of crowds’ approach, whereby many network inference methods are run and their results subsequently combined. At the UPSC bioinformatics facility (UPSCb), we developed a set of computational methods (seidr) to take advantage of the mass of data available, generating aggregated gene interaction networks that allow researchers to mine gene expression data in a more informed way. We will use seidr to generate the largest, most comprehensive gene expression networks for model plant species (arabidopsis, poplar and spruce), in addition to more widely used model organisms (fruit fly, human and yeast). Generating these consensus networks is essential to demonstrate the applicability of seidr. They will also provide the plant community with a powerful resource that we will integrate within our web resource, and will lead to what we believe will be a landmark publication in the field.
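
The consensus idea described above can be illustrated with a minimal sketch. It assumes a simple average-rank (Borda-style) aggregation of per-method edge scores; it is not seidr's implementation, and the method names, gene names, scores and function names below are hypothetical and chosen purely for illustration.

```python
def rank_edges(scores):
    """Rank one method's edges: 1 = highest score. Ties break on edge name."""
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: position + 1 for position, edge in enumerate(ordered)}


def consensus_network(method_scores):
    """Combine several methods by averaging the rank each gives to every edge.

    method_scores maps a method name to {(gene_a, gene_b): score}.
    An edge that a method did not report is assigned the worst possible
    rank (an assumption of this sketch). Returns (edge, mean_rank) pairs,
    best-supported edge first.
    """
    all_edges = set()
    for scores in method_scores.values():
        all_edges.update(scores)

    worst_rank = len(all_edges)
    rank_totals = {edge: 0.0 for edge in all_edges}
    for scores in method_scores.values():
        ranks = rank_edges(scores)
        for edge in all_edges:
            rank_totals[edge] += ranks.get(edge, worst_rank)

    n_methods = len(method_scores)
    mean_ranks = {edge: total / n_methods for edge, total in rank_totals.items()}
    return sorted(mean_ranks.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores from three inference methods over three genes.
example_scores = {
    "correlation": {("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g2", "g3"): 0.5},
    "mutual_information": {("g1", "g2"): 0.7, ("g1", "g3"): 0.6, ("g2", "g3"): 0.1},
    "regression": {("g1", "g2"): 0.3, ("g2", "g3"): 0.8},
}

for edge, mean_rank in consensus_network(example_scores):
    print(edge, round(mean_rank, 3))
```

Averaging ranks rather than raw scores sidesteps the fact that different methods report scores on incomparable scales. An edge that several methods rank highly rises to the top even if no single method scores it best.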