Gut and Oral Microbiome Modelling and Mobile Genetic Elements

SNIC 2019/3-226


SNIC Medium Compute

Principal Investigator:

Saeed Shoaie


Kungliga Tekniska högskolan

Start Date:


End Date:


Primary Classification:

10610: Bioinformatics and Systems Biology (methods development to be 10203)

Secondary Classification:

30109: Microbiology in the medical area




The oral cavity and gut contain reservoirs of antimicrobial resistance genes (ARGs) that reside on mobile genetic elements (MGEs). This project aims to characterise and compare the MGE profiles, i.e. the "mobilome", of 1581 paired shotgun metagenomes from the oral cavity and the gut. To undertake this project, we need computationally intensive tools to assemble and annotate all 1581 samples. We will profile three common types of mobile genetic element: plasmids, phages and transposable elements.

Plasmids: To profile plasmids, we will take a de novo approach, assembling reads into candidate circular contigs with plasmidSPAdes (Antipov et al., 2016), a tool that has been shown to be the most accurate of the plasmid assembly tools. These candidates will be searched with BLAST against, and annotated with, the PlasmidFinder reference database (Carattoli et al., 2014) to identify putative plasmids and to set a benchmark for de novo plasmid discovery.

Phages: Similarly, a de novo approach, conducted by collaborators at University College Cork, will be used to create an oral/gut phage catalogue. Before this pipeline is run, metagenomes will be assembled into contigs with SPAdes (Bankevich et al., 2012) on UPPMAX. Once the catalogue has been created, metagenomic reads will be mapped against it on UPPMAX to quantify phage abundance and diversity in each sample.

Transposable elements: Insertion sequences will be identified from metagenomic reads using the ISMapper tool (Hawkey et al., 2015).

The modelling goal is to generate 2700 genome-scale metabolic models (GEMs) from gut and oral reference GEMs that comprise 10,000 reactions, 8,000 metabolites and 990,000 genes. To generate the specific models from the reference GEMs, we need to create a reaction profile that is scored using the genomic information from the metagenomes (around 10,000,000 genes).
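A minimal sketch of how such a reaction profile might be scored is given here for illustration only. It assumes that a reaction's score is the summed abundance of the genes linked to it, and that scores are computed separately for each taxon at a chosen rank. The abstract does not specify the scoring rule, so the function names, the data layout and the example values are all hypothetical.

```python
from collections import defaultdict


def score_reactions(gpr, gene_abundance):
    """Sum the abundance of every gene linked to each reaction.

    gpr: dict mapping gene ID -> list of reaction IDs (assumed layout).
    gene_abundance: dict mapping gene ID -> abundance in the metagenome.
    Reactions whose genes are all undetected receive a score of 0.0.
    """
    scores = defaultdict(float)
    for gene, reactions in gpr.items():
        abundance = gene_abundance.get(gene, 0.0)
        for reaction in reactions:
            scores[reaction] += abundance
    return dict(scores)


def score_by_rank(gpr, gene_abundance, gene_taxon, rank):
    """Score reactions separately for each taxon at one rank, e.g. 'genus'.

    gene_taxon: dict mapping gene ID -> {rank name: taxon name} (assumed).
    Genes without an assignment at the requested rank are skipped.
    """
    grouped = defaultdict(dict)
    for gene, abundance in gene_abundance.items():
        taxon = gene_taxon.get(gene, {}).get(rank)
        if taxon is not None:
            grouped[taxon][gene] = abundance
    return {taxon: score_reactions(gpr, genes) for taxon, genes in grouped.items()}


# Hypothetical toy inputs, not project data.
gpr = {"geneA": ["R1"], "geneB": ["R1", "R2"], "geneC": ["R3"]}
gene_abundance = {"geneA": 2.0, "geneB": 1.5}
gene_taxon = {
    "geneA": {"genus": "Streptococcus", "family": "Streptococcaceae"},
    "geneB": {"genus": "Bacteroides", "family": "Bacteroidaceae"},
}

overall = score_reactions(gpr, gene_abundance)
per_genus = score_by_rank(gpr, gene_abundance, gene_taxon, "genus")
```

Calling `score_by_rank` once per rank (genus, family, and so on) would give one reaction profile per taxonomic level. At the scale described (hundreds of thousands of genes), a sparse matrix representation would likely be preferable to plain dictionaries.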
The score of the reaction profile is calculated from the gene-protein-reaction file (990,000 genes, 10,000 reactions) and from the abundance of each reaction at the different taxonomic levels (i.e. genus, family, order, class, phylum), which are calculated separately.

References

Antipov, D., Hartwick, N., Shen, M., Raiko, M., Lapidus, A., and Pevzner, P.A. (2016). plasmidSPAdes: assembling plasmids from whole genome sequencing data. Bioinformatics 32, 3380–3387.

Arredondo-Alonso, S., Willems, R.J., van Schaik, W., and Schürch, A.C. (2017). On the (im)possibility of reconstructing plasmids from whole-genome short-read sequencing data. Microb Genom 3.

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D., et al. (2012). SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J Comput Biol 19, 455–477.

Carattoli, A., Zankari, E., García-Fernández, A., Voldby Larsen, M., Lund, O., Villa, L., Møller Aarestrup, F., and Hasman, H. (2014). In Silico Detection and Typing of Plasmids using PlasmidFinder and Plasmid Multilocus Sequence Typing. Antimicrob Agents Chemother 58, 3895–3903.