Mitochondria are organelles indispensable for cellular energetics, and alterations in mitochondrial DNA (mtDNA) can lead to several clinical pathologies. Although the extent to which mtDNA damage contributes to tumorigenesis is debated, recent studies indicate that selective pressures act on mtDNA mutations in cancer to maintain the function of critical genes in the oxidative phosphorylation (OXPHOS) pathway. Large (100-10,000 bp) deletions or insertions in mtDNA can cause major neuromuscular disorders and have also been observed in cancers, but their role in tumorigenesis remains unclear. High-throughput sequencing data from tumors are now abundant in the public domain and could enable much more detailed and comprehensive studies of large mtDNA deletions in cancer, giving insights into both their formation and their role in the disease. Until now, however, the computational methodology has been lacking.
We have established a computational pipeline optimized to quickly identify mitochondrial deletions and duplications in high-throughput DNA and RNA sequencing data. We have carefully evaluated this pipeline using simulated data as well as tumor and matched normal samples sequenced by The Cancer Genome Atlas (TCGA). At present, more than 2,500 whole cancer genomes with matched normals and 8,000 transcriptome libraries are available in TCGA. Our methodology therefore gives us a unique opportunity to study large mtDNA deletions in a sizable cohort and to relate them to important clinical parameters such as age, treatment, cancer type and nuclear mutations, as well as to other types of molecular changes in tumors.
We have made extensive use of the computational resources provided by the SNIC large call, spring 2017 (SNIC 2017/11-3), to analyze ~6,000 transcriptome datasets spanning 10 cancer types, which involved downloading, parsing and analyzing ~50 TB of compressed sequencing data. The principal finding from our RNA-seq analysis is that a subset of patients shows a propensity towards a high rate of mitochondrial structural alterations. We now want to validate these findings using gold-standard, high-coverage whole genome sequencing (WGS) data from TCGA. We aim to apply the pipeline to ~5,000 WGS samples across 32 cancer types, for which over 1,500 TB of data will be downloaded and analyzed. We believe this is the first comprehensive pan-cancer study of large structural alterations in the mitochondrial genome using high-throughput sequencing datasets.
Access to SNIC resources has given tremendous impetus to the project, enabling rapid download and analysis of a large number of samples within a limited time frame. Our pipeline has also been used to demonstrate the mechanism underlying mitochondrial deletions in patients with neuromuscular disorders in two studies (Nicholls et al., 2018, Molecular Cell; Persson et al., 2018, manuscript in review, Nature Communications), the latter of which made extensive use of UPPMAX resources. We therefore believe that our pipeline is well positioned to support many years of exciting research, and that access to UPPMAX computing resources will be crucial to realizing the potential of our methodology.