Recently, image representations based on convolutional neural networks (CNNs) have demonstrated significant improvements over the state-of-the-art in many computer vision applications including image classification, object detection, scene recognition, action recognition, and visual tracking. CNNs consist of a series of convolution and pooling operations followed by one or more fully connected (FC) layers. Deep networks are trained using raw image pixels with a fixed input size. These networks require large amounts of labeled training data. The introduction of large datasets (e.g. ImageNet, 14 million images, http://image-net.org/challenges/LSVRC/2014/) and the parallelism enabled by modern GPUs have facilitated the rapid deployment of deep networks for visual recognition. This development has led to what many peers call the deep learning revolution in computer vision (https://www.technologyreview.com/s/513696/deep-learning/).