High-level parallel programming frameworks for heterogeneous clusters
In previous work (including the EU FP7 projects PEPPHER and EXCESS and the SeRC project OpCoReS) we have developed several frameworks for portable, high-level parallel programming of heterogeneous parallel systems based on multi-core CPUs and GPUs. These include the PEPPHER composition framework [Dastgeer et al. 2014] for multi-variant parallel software components and the SkePU skeleton programming library for GPU-based systems [Enmyren and Kessler 2010, Dastgeer 2014], with back-end support mainly for OpenMP, OpenCL and CUDA. Optimization techniques in these frameworks include automatically tuned, adaptive (context-dependent) selection of implementation variants of computations [Dastgeer et al. 2011, 2013], hybrid parallel execution that uses different types of cores and accelerators together [Dastgeer et al. 2012], and data abstractions and automated memory management techniques for aggregate data structures, such as "smart containers" that minimize PCIe bus communication between main memory and accelerator device memory at run time [Dastgeer and Kessler 2015]. In principle, SkePU skeleton programs can even run in parallel across multiple nodes of an MPI-based cluster without any modification of the program source code [Majeed et al. 2012, 2013]. However, little work has been done so far on memory and communication optimizations for executing skeleton programs across multiple, possibly heterogeneous, nodes of an HPC cluster. Likewise, hybrid multi-node execution with automatic load balancing is a challenge on heterogeneous clusters whose compute nodes differ in kind and capability, e.g., nodes with GPUs, with Xeon Phi coprocessors, or without accelerators. In this project, we will study extended auto-tuned back-end selection, automated load balancing for hybrid computing, and generalizations of the smart-container concept to the cluster level for SkePU skeleton programs.
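To make the load-balancing problem concrete, the following is a minimal sketch of one simple static strategy: dividing the elements of a data-parallel computation among nodes in proportion to each node's measured throughput. It is an illustration under stated assumptions, not the method used in SkePU or planned in this project. The function name partition_by_throughput and the idea of one throughput number per node are hypothetical, and the code uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of SkePU): split `total` work items among
// nodes in proportion to each node's relative throughput, e.g. elements
// processed per second in a calibration run. Returns one item count per node;
// the counts always sum to `total`.
std::vector<std::size_t> partition_by_throughput(
    std::size_t total, const std::vector<double>& throughput) {
    if (throughput.empty()) {
        throw std::invalid_argument("at least one node is required");
    }
    double sum = 0.0;
    for (double t : throughput) {
        if (!std::isfinite(t) || t < 0.0) {
            throw std::invalid_argument("throughput must be finite and non-negative");
        }
        sum += t;
    }
    if (!(sum > 0.0)) {
        throw std::invalid_argument("total throughput must be positive");
    }

    // Place chunk boundaries at the scaled prefix sums of the throughputs.
    // Taking differences of boundaries (instead of rounding each share
    // separately) guarantees that no item is lost or assigned twice.
    std::vector<std::size_t> share(throughput.size());
    double prefix = 0.0;
    std::size_t previous_boundary = 0;
    for (std::size_t i = 0; i < throughput.size(); ++i) {
        prefix += throughput[i];
        std::size_t boundary = total;  // the last node takes whatever remains
        if (i + 1 < throughput.size()) {
            boundary = static_cast<std::size_t>(
                static_cast<double>(total) * (prefix / sum));
            boundary = std::min(boundary, total);
            boundary = std::max(boundary, previous_boundary);
        }
        share[i] = boundary - previous_boundary;
        previous_boundary = boundary;
    }
    return share;
}
```

For example, with 100 elements and two nodes whose throughputs are 1.0 and 3.0, the sketch assigns 25 and 75 elements. A static split like this ignores communication cost and run-time variation, which is part of why automated load balancing on heterogeneous clusters remains a research question.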
For experiments we will use a small heterogeneous subcluster of Triolith that includes both CPU-only nodes and nodes equipped with GPUs and Xeon Phi coprocessors.
Acknowledgment: This work is part of our ongoing research activities in the Swedish e-Science Research Centre (SeRC).
References: See the publication list on the SkePU web page (link given below).