Information-theoretic clustering of large-scale networks


SNIC 2016/1-412


SNAC Medium

Principal Investigator:

Martin Rosvall


Umeå universitet

Start Date:


End Date:


Primary Classification:

10105: Beräkningsmatematik

Secondary Classification:

10299: Annan data- och informationsvetenskap

Tertiary Classification:

10399: Annan fysik




To comprehend the hierarchical organization of large integrated systems, we have developed an information-theoretic approach called the map equation. The map equation exploits the duality between compression and pattern detection; by compressing a description of a random walker as a proxy for real flow on a network, we find regularities in the network that induce this system-wide flow. Finding the shortest multilevel description of the random walker therefore gives us the best hierarchical clustering of the network — the optimal number of levels and modular partition at each level — with respect to the dynamics on the network. With a novel search algorithm, we extract and illustrate the rich multilevel organization of several large social and biological networks. Over the last year, we have used our access to Abisko in five different projects, including Memory in network flows and its effects on spreading dynamics and community detection published in Nature communications, Portfolio analysis of the Swedish stock market published in PLoS ONE, and Dynamics of interacting information waves published in PRE. The project on networks with memory uses multi-step real pathways of dynamics in social and biological networks and we use novel algorithms to comprehend the structure of such higher-order dynamics. The aim of the project for the next year is to further analyze more real systems (air traffic, Wikipedia, Web of Science, the arXiv, etc) with respect to memory effects and the consequences for structural organization and impacts on, for example, information and disease spreading.