Representation Learning with Autoencoders on Gene Expression

Dnr:

SNIC 2017/3-61

Type:

SNAC Small

Principal Investigator:

Lukas Käll

Affiliation:

Kungliga Tekniska högskolan

Start Date:

2017-06-30

End Date:

2018-04-01

Primary Classification:

10203: Bioinformatics (Computational Biology) (applications under 10610)

Webpage:

Abstract

Neural networks have been applied successfully as a machine learning technique in many different domains. One such domain is representation learning, where features are extracted from high-dimensional data with the help of autoencoders. Today's vast amounts of biological data often need manual feature engineering to extract variables that can be used in supervised learning. A representation learning technique for, e.g., gene expression data could therefore be very useful as a preprocessing stage. The question that will be examined is: "Are autoencoders a good representation learning technique for gene expression data?" The project entails testing different autoencoder architectures and learning methods and evaluating their performance. There are many challenges in the design of the network. Some of the questions that have to be answered are: Does a stacked autoencoder work better than a network that is trained all at once? Which regularization techniques should be used? Into how many nodes should the data be compressed? The project is scientifically relevant because it evaluates the capabilities of unsupervised learning with autoencoders, which matters not only for gene expression but for any problem where dimensionality reduction of the data is important. For gene expression, feature learning is particularly important because reducing the dimension of the data is often a necessary step before using it for inference about, e.g., diseases. The hypothesis can be tested by evaluating whether the learned features can be used for supervised learning tasks.
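
To make the setup concrete, the following is a minimal sketch of a single-hidden-layer autoencoder written in NumPy. It is not the project's implementation: the function names (`make_synthetic_expression`, `train_autoencoder`), the synthetic stand-in for an expression matrix, the tanh encoder with a linear decoder, and all hyperparameters are assumptions chosen for illustration. The parameters `n_hidden` and `weight_decay` correspond to two of the design questions above, namely the size of the compressed layer and the choice of regularization.

```python
import numpy as np


def make_synthetic_expression(n_samples=200, n_genes=30, n_factors=3, seed=0):
    """Hypothetical stand-in for an expression matrix (samples x genes).

    A few latent factors drive all genes, plus noise, so the data has
    low-dimensional structure for an autoencoder to recover.
    """
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n_samples, n_factors))
    loadings = rng.normal(size=(n_factors, n_genes))
    X = factors @ loadings + 0.1 * rng.normal(size=(n_samples, n_genes))
    # Standardize each gene to zero mean and unit variance.
    return (X - X.mean(axis=0)) / X.std(axis=0)


def train_autoencoder(X, n_hidden, epochs=500, lr=0.02, weight_decay=0.0, seed=0):
    """Train a tanh-encoder / linear-decoder autoencoder by gradient descent.

    Returns an `encode` function and the reconstruction loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []

    for _ in range(epochs):
        # Forward pass: compress to n_hidden units, then reconstruct.
        H = np.tanh(X @ W1 + b1)
        err = (H @ W2 + b2) - X
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for the mean squared reconstruction error,
        # with optional L2 weight decay as a simple regularizer.
        g_out = 2.0 * err / err.size
        g_hidden = (g_out @ W2.T) * (1.0 - H ** 2)
        grad_W2 = H.T @ g_out + weight_decay * W2
        grad_b2 = g_out.sum(axis=0)
        grad_W1 = X.T @ g_hidden + weight_decay * W1
        grad_b1 = g_hidden.sum(axis=0)

        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1

    def encode(Z):
        """Map samples to the learned low-dimensional representation."""
        return np.tanh(Z @ W1 + b1)

    return encode, losses


X = make_synthetic_expression()
encode, losses = train_autoencoder(X, n_hidden=5)
codes = encode(X)
print("compressed shape:", codes.shape)
print("reconstruction loss, first and last epoch:", losses[0], losses[-1])
```

The `codes` array is what a downstream supervised model would receive in place of the raw expression values, which is how the hypothesis above could be tested. A stacked variant would, under the same assumptions, call `train_autoencoder` again on `codes` to learn a second, smaller layer.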