Human Evolution storage

SNIC 2021/2-17


SNIC Large Storage

Principal Investigator:

Mattias Jakobsson


Uppsala universitet

Start Date:


End Date:


Primary Classification:

10615: Evolutionary Biology

Secondary Classification:

10609: Genetics (medical to be 30107 and agricultural to be 40402)

Tertiary Classification:

10203: Bioinformatics (Computational Biology) (applications to be 10610)



Here we propose a joint storage project for all research performed within the Human Evolution program at the Department of Organismal Biology, Uppsala University. The program consists of five research groups that all aim to increase knowledge of human evolution by analyzing whole-genome sequences of modern and ancient humans, as well as ancient domesticated plants and animals. A single program-wide storage project lets us use SNIC resources more efficiently than several smaller projects at the PI or research-project level, and gives a more transparent structure for how data is handled and analyzed. For downstream analyses of these program-wide genomic resources, we have several ongoing small compute projects; the largest is p2018003 (Mattias Jakobsson's nodes on Rackham).

We have a strong need to keep all data available on a fast storage system directly connected to the compute resources. Population genetic analyses require data from several hundred samples, and different samples are used to answer different questions. If each research project within the program had to move data from offload storage (such as Lutra) to active storage before performing an analysis, we would risk losing days or weeks to transfer time alone, risk having multiple copies of the same data spread across the project, and increase the risk of losing or overwriting important data that was time-consuming and expensive to obtain.