SNIC 2018/3-217


SNAC Medium

Principal Investigator:

Sandra Baldauf


Uppsala universitet

Start Date:


End Date:


Primary Classification:

10612: Biological Systematics

Secondary Classification:

10615: Evolutionary Biology

Tertiary Classification:

10203: Bioinformatics (Computational Biology) (applications to be 10610)



Advances in phylogenetic methods for processing molecular data have revolutionized our knowledge of the eukaryotic tree of life. Nevertheless, resolving the deep nodes of this tree, primarily the evolutionary relationships among the major groups of living organisms, is still an ongoing task. This means filling in some of the missing branches, identifying genes suitable for deep phylogeny, and better understanding the artefacts that disrupt accurate phylogenetic reconstruction. Our research projects involve the development and analysis of large molecular phylogenetic data sets, transcriptomics and genomics of orphan taxa, and the origin and evolution of mitochondrial proteomes. Accurately recovering these relationships requires broad and careful sampling of the diversity of living organisms. Molecular methods such as whole-transcriptome sequencing (RNAseq) of orphan taxa make it possible to begin filling in the vast unpopulated regions of the tree of life. We are focusing on two critical taxa that have traditionally been under-sampled, the Amoebozoa and Excavata. This includes genomics and transcriptomics of the only multicellular excavates, the acrasid slime moulds. Two data sets will be analysed for deep eukaryote phylogeny: eukaryotic genes of bacterial ancestry (euBacs) and eukaryotic genes of archaeal ancestry (euArcs). Both data sets should give similar results, so they act as controls for each other. A significant part of the analyses involves identifying potential artefacts, especially horizontal gene transfer (HGT), deep paralogy, long branch attraction (LBA), and rogue taxa. These are particularly challenging for “deep phylogeny” and require multiple rounds of progressive analysis using different combinations of genes and taxa. Final analyses will involve very large data sets and multiple rounds of sub-sampling to confirm and expand the final results.