A Comparative Study of Black-box Optimization Algorithms for Hyper-parameter Optimization in Deep Neural Networks

SNIC 2018/4-23


SNAC Small

Principal Investigator:

Mats Jirstrand


Fraunhofer-Chalmers Centre

Start Date:


End Date:


Primary Classification:

10105: Computational Mathematics




Several selected optimization algorithms are analyzed with respect to their ability to perform automatic hyperparameter optimization of deep neural networks (DNNs). This is done by treating the DNN as an expensive-to-evaluate black-box function. Deep learning models, including DNNs, have recently seen a surge of practical use in data-intensive applications ranging from computer vision and language modelling to bioinformatics and search engines. As the performance of a DNN typically depends strongly on a good, problem-specific choice of hyperparameters, the design phase of constructing a DNN model becomes critical, especially for very large models. A commonly employed naive technique for finding suitable hyperparameters is manual search, which relies heavily on the user's expertise and understanding of the problem. Grid search and random search are also common, but quickly become infeasible for high-dimensional inputs and expensive model evaluations. Instead, by treating the DNN as an expensive-to-evaluate black-box function that maps a set of hyperparameters to some quality metric, techniques from the field of optimization may be employed.
In this work we compare four different optimization algorithms side by side on the basis of convergence speed, trial-to-trial variability, quality of the best solution found, and ability to generalize across different problem settings. One experiment consists of running approximately 200 function evaluations, i.e. constructing and training a DNN with a specific hyperparameter configuration, and is repeated several times per algorithm in order to estimate performance variability. TensorFlow r1.4 with NVIDIA GPU support is used for creating and training the neural network model, providing a significant speed-up compared to CPU computation.
A single function evaluation typically consumes about 5-10 minutes of GPU time, so one experiment of approximately 200 evaluations takes roughly 17 to 33 GPU-hours. Access to external GPU computing resources would therefore allow us to run more repetitions of each experiment, improving the quality of the estimated distribution of algorithmic performance, and to extend our study to include new problem settings.
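
The experimental protocol described above (a black-box objective, a fixed budget of about 200 evaluations, and several repetitions per algorithm) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `validation_error` is a hypothetical, cheap stand-in for constructing and training a DNN, the two hyperparameters and their ranges are invented, and random search is used only as a simple baseline, not necessarily as one of the four algorithms compared in the study.

```python
import random
import statistics


def validation_error(learning_rate_exp, dropout):
    """Hypothetical stand-in for training a DNN and returning its validation error.

    In the real setting this call would take minutes of GPU time; here it is a
    smooth synthetic function with its minimum (0.05) at (-3.0, 0.3).
    """
    return (learning_rate_exp + 3.0) ** 2 / 10.0 + (dropout - 0.3) ** 2 + 0.05


def random_search(objective, n_evals, rng):
    """Baseline optimizer: sample configurations uniformly at random.

    Returns the best-so-far curve, i.e. the lowest objective value seen
    after each function evaluation.
    """
    best = float("inf")
    curve = []
    for _ in range(n_evals):
        learning_rate_exp = rng.uniform(-6.0, 0.0)  # assumed search range
        dropout = rng.uniform(0.0, 0.8)             # assumed search range
        best = min(best, objective(learning_rate_exp, dropout))
        curve.append(best)
    return curve


def run_experiment(optimizer, objective, n_evals=200, n_repeats=5, seed=0):
    """Repeat one optimizer several times and summarize trial-to-trial variability."""
    curves = [
        optimizer(objective, n_evals, random.Random(seed + repeat))
        for repeat in range(n_repeats)
    ]
    final_values = [curve[-1] for curve in curves]
    return {
        "curves": curves,
        "mean_best": statistics.mean(final_values),
        "std_best": statistics.stdev(final_values),
    }


if __name__ == "__main__":
    summary = run_experiment(random_search, validation_error)
    print("mean of best values:", summary["mean_best"])
    print("std of best values: ", summary["std_best"])
```

The best-so-far curves give a view of convergence speed, while the mean and standard deviation of the final values across repetitions give a rough estimate of solution quality and trial-to-trial variability. Other optimizers could be compared by passing a function with the same `(objective, n_evals, rng)` signature to `run_experiment`.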