This is a collaborative project within SERC. The following is a detailed technical description. Protein domains link evolutionarily related proteins and highlight their shared functionality. We aim to better understand this association by identifying the evolutionary pathways of specific domains. More specifically, we will model the evolution of protein domains inside the gene and species trees in order to observe evolutionary events at the sub-gene level. This is a novel methodology.
For this purpose, we are implementing an algorithm that models the evolution of protein domains. It uses the Grouped Independence Metropolis-Hastings (GIMH) technique, a variant of Markov chain Monte Carlo (MCMC), to estimate the posterior distribution over domain trees together with the other evolutionary parameters, namely the gene and species trees, birth-death rates, edge rates, and branch lengths. Our proposed probabilistic model of domain evolution consists of several sub-models: a duplication-loss model at both the gene level and the domain level, a rate model, and a sequence evolution model. We have recently completed the implementation and are now proceeding to the analysis of real and synthetic data sets. This will require substantial computational tests and investigations.
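To illustrate the general idea behind GIMH, the following is a minimal sketch on a toy problem, not the project's implementation. GIMH replaces a marginal likelihood that cannot be computed exactly with an unbiased Monte Carlo estimate, and it keeps the stored estimate for the current state instead of recomputing it at every iteration. The toy latent-variable model (a single scalar parameter theta with one Gaussian latent variable per observation) and all function names are assumptions made for illustration only; they stand in for the far richer model over domain, gene, and species trees described above.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def estimate_log_likelihood(theta, data, n_latent, rng):
    """Log of an unbiased Monte Carlo estimate of p(data | theta).

    Toy model (an assumption): z ~ N(theta, 1) and y | z ~ N(z, 1),
    with an independent latent z for every observation y.
    """
    total = 0.0
    for y in data:
        acc = 0.0
        for _ in range(n_latent):
            z = rng.gauss(theta, 1.0)          # draw a latent variable
            acc += math.exp(normal_logpdf(y, z, 1.0))
        if acc == 0.0:                         # guard against underflow
            return float("-inf")
        total += math.log(acc / n_latent)
    return total


def gimh(data, n_iter=2000, n_latent=20, step=0.5, seed=0):
    """Run a GIMH chain over theta; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    theta = 0.0

    def log_prior(t):
        return normal_logpdf(t, 0.0, 10.0)     # weak prior, also an assumption

    # The estimate for the current state is stored and reused until a move
    # is accepted; this reuse is what distinguishes GIMH.
    log_lik = estimate_log_likelihood(theta, data, n_latent, rng)
    samples = []
    accepted = 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)    # symmetric random walk
        prop_log_lik = estimate_log_likelihood(proposal, data, n_latent, rng)
        log_ratio = (prop_log_lik + log_prior(proposal)) - (log_lik + log_prior(theta))
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            theta, log_lik = proposal, prop_log_lik
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_iter
```

Calling gimh(data) on a list of real-valued observations returns the sampled values of theta and the fraction of accepted proposals. In this toy setting, increasing n_latent lowers the variance of the likelihood estimate and usually improves mixing, at a proportionally higher cost per iteration.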