Enzyme clustering and retrieval pipeline

Registration number (Dnr):

SNIC 2017/2-5

Type:

SNAC Small

Principal Investigator:

Martin Engqvist

Affiliation:

Chalmers tekniska högskola

Start Date:

2017-03-14

End Date:

2018-03-01

Primary Classification:

10602: Biochemistry and Molecular Biology

Webpage:

Allocation

Abstract

I have recently started my own research group in high-throughput enzyme characterization. The research plan relies heavily on the ability to cluster and compare all genes from a large number of organisms. I require server access to build a clustering pipeline whose results can be easily extracted and fed into my wet-lab pipeline. Pairwise sequence similarities will be computed with BLAST, and the resulting output will be clustered with the MCL algorithm. The results will be stored in an SQL database for easy retrieval. Python 3 will tie the different components together, R will be used to visualize the data, and Julia will be used to develop high-performance algorithms where needed.
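
As a rough illustration of how Python could tie the steps together, the sketch below filters BLAST tabular output (the standard 12-column `-outfmt 6` layout) into weighted edges of the kind MCL accepts in its label (`--abc`) input mode, then loads MCL's cluster output (one tab-separated cluster per line) into an SQL table for retrieval. Everything specific here is an assumption rather than part of the project description: SQLite stands in for the unspecified SQL database, and the function names, the `cluster_member` table, and the e-value cutoff are hypothetical.

```python
import sqlite3


def blast_to_abc(blast_lines, max_evalue=1e-5):
    """Turn BLAST tabular (-outfmt 6) lines into (query, subject, bitscore) edges.

    The e-value cutoff is an illustrative assumption, not a project setting.
    """
    edges = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Drop self-hits and weak hits before handing the graph to MCL.
        if query != subject and evalue <= max_evalue:
            edges.append((query, subject, bitscore))
    return edges


def store_clusters(conn, mcl_lines):
    """Load MCL label output (one tab-separated cluster per line) into SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cluster_member ("
        "cluster_id INTEGER NOT NULL, gene_id TEXT PRIMARY KEY)"
    )
    rows = []
    for cluster_id, line in enumerate(mcl_lines):
        for gene_id in line.strip().split("\t"):
            if gene_id:
                rows.append((cluster_id, gene_id))
    conn.executemany("INSERT OR REPLACE INTO cluster_member VALUES (?, ?)", rows)
    conn.commit()


def genes_in_same_cluster(conn, gene_id):
    """Return all genes sharing a cluster with gene_id, sorted by name."""
    cursor = conn.execute(
        "SELECT gene_id FROM cluster_member WHERE cluster_id = ("
        "SELECT cluster_id FROM cluster_member WHERE gene_id = ?) "
        "ORDER BY gene_id",
        (gene_id,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    hits = ["geneA\tgeneB\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550"]
    for query, subject, score in blast_to_abc(hits):
        print(f"{query}\t{subject}\t{score}")  # one edge per line for MCL

    conn = sqlite3.connect(":memory:")
    store_clusters(conn, ["geneA\tgeneB", "geneC"])
    print(genes_in_same_cluster(conn, "geneA"))
```

Keeping the gene identifier as the primary key makes a lookup by gene cheap and reflects that MCL assigns each node to a single cluster by default; a server-based SQL database could replace SQLite with only the connection line changing.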