Expression bioinformatics - evaluation, development, and application of new tools

SNIC 2017/7-303


SNIC Small Compute

Principal Investigator:

Olof Emanuelsson


Kungliga Tekniska högskolan

Start Date:


End Date:


Primary Classification:

10203: Bioinformatics (Computational Biology) (applications to be 10610)




Recent technological advances have made transcriptome sequencing data ("RNA-seq") cheap, accessible, and reliable, enabling in-depth study of gene expression and its regulation. This proposal concerns the evaluation, development, and application of tools for studying certain aspects of RNA-seq data. One aspect currently garnering interest is allele-specific expression (ASE), where the two alleles at a locus are expressed at different levels. ASE has been demonstrated in many tissues and organisms and appears to be prominent in, e.g., cancer tissues and plant hybrids. In this application we propose to develop, evaluate, and apply bioinformatics methods for detecting ASE in RNA-seq data, specifically addressing condition-dependent ASE, ASE in polyploid organisms, and how phase information can be used. The proposed work is largely based on GeneiASE, our recently published tool for ASE detection. Furthermore, we propose to investigate ASE regulation, both in terms of regulatory SNPs and the chromosomal organization of ASE genes, including promoter region analysis.

Whenever a reference genome is unavailable, as for many non-model organisms, or unsuitable, e.g. due to extensive genomic rearrangements in cancer cells, the transcripts must be reconstructed directly from the RNA-seq data. Existing reconstruction methods have been shown to output many truncated transcripts, as well as an excessive number of short or unrealistic transcripts. We propose to address these issues by developing improved transcript reconstruction methods and a novel transcript classification tool.

Currently, we have Illumina data from both animals and plants, and we plan to complement these short-read data with long-read data from either PacBio or Nanopore. This project is intended to cover our future computational needs for the projects b2010035, b2011075, and b2011098. (We have already been granted a Crex 2 storage project corresponding to b2011098, and we are currently estimating the storage needs for projects b2010035 and b2011075.)
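ASE is defined above as the two alleles at a locus being expressed at different levels. As a minimal illustration of how such imbalance can be flagged from RNA-seq, the sketch below applies a generic exact binomial test to the reference and alternative read counts at a single heterozygous SNP, with a 50:50 expectation under balanced expression. This is a textbook test chosen for illustration, not the GeneiASE method, and the function names `binom_two_sided_p` and `allelic_imbalance` are hypothetical. Real analyses must also handle read-mapping bias toward the reference allele, overdispersion (often modelled with a beta-binomial), and multiple testing across SNPs and genes.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test (illustrative sketch).

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    if n == 0:
        return 1.0  # no reads: no evidence either way
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so that floating-point ties are counted.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def allelic_imbalance(ref_reads: int, alt_reads: int) -> tuple[float, float]:
    """Hypothetical per-SNP check: returns (reference allele fraction, p-value).

    Assumes the SNP is heterozygous and that, without ASE, reads are
    expected to come from the two alleles in a 50:50 ratio.
    """
    n = ref_reads + alt_reads
    ratio = ref_reads / n if n else float("nan")
    return ratio, binom_two_sided_p(ref_reads, n)


if __name__ == "__main__":
    # Balanced SNP: reference fraction 0.5, p-value close to 1.
    print(allelic_imbalance(50, 50))
    # Strongly skewed SNP: reference fraction 0.9, very small p-value.
    print(allelic_imbalance(90, 10))
```

In practice, per-SNP tests like this are typically aggregated to the gene level and corrected for multiple testing before ASE genes are called.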