Development of the Chunks and Tasks model and runtime library implementations with applications in electronic structure calculations

Dnr:

SNIC 2017/1-4

Type:

SNAC Medium

Principal Investigator:

Elias Rudberg

Affiliation:

Uppsala universitet

Start Date:

2017-02-01

End Date:

2018-02-01

Primary Classification:

10205: Software Engineering

Secondary Classification:

10407: Theoretical Chemistry

Tertiary Classification:

10105: Computational Mathematics

Webpage:

http://chunks-and-tasks.org/


Abstract

The goal of this project is to extend our work on the Chunks and Tasks parallel programming model and apply it to massively parallel calculations. See our recent articles http://dx.doi.org/10.1016/j.parco.2013.09.006 and http://dx.doi.org/10.1016/j.parco.2016.06.005, published in Parallel Computing.

The project concerns the development of the Chunks and Tasks programming model for parallel implementation of methods that require dynamic distribution of both work and data. Such methods are difficult to implement using standard languages or libraries such as MPI, which leave it to the user to specify the distribution of both work and data. In an application program that uses the Chunks and Tasks programming model, the user defines the algorithm in terms of chunks and tasks without specifying where the work should be performed or how the data should be distributed. Our pilot C++ runtime library implementation of Chunks and Tasks uses MPI and pthreads to distribute the work and data of Chunks and Tasks application programs on clusters of multicore machines.

This project will be used for development and evaluation of the Chunks and Tasks model and runtime library implementations, as well as for our ongoing work on distributed-memory parallelization of the Ergo quantum chemistry code (http://ergoscf.org) using Chunks and Tasks. Since the Chunks and Tasks model is intended to work well for massively parallel calculations, and our main target application (linear scaling electronic structure calculations with the Ergo program) motivates very large calculations, we need to carry out real calculations on as many nodes as possible.

Our recent SNAC Large proposal "SNIC 2016/34-39" was denied, with the following comment: "Although your project is of good scientific quality, in the fierce competition of the limited amount of HPC resources we have viewed it as your demands are better suited for a medium allocation. We welcome you for medium sized applications, [...]" We hope that this assessment of our project as being of good scientific quality will help us get this medium application approved. Regarding previous resource usage: although we did not use all allocated hours during previous months, our project is now at a stage where we need large resources; we have already made significant efforts to make our code run efficiently on Triolith, and the results are very promising. Please also consider that a development project such as ours may have less homogeneous resource usage than other projects, as discussed in [SNIC support #125492], but may still need large resources.

In addition to this proposal for time on Triolith, we have also submitted a separate proposal applying for time on Beskow. However, it is also important for us to be able to run on Triolith, since we want to make sure that our parallelization approach works well on different kinds of clusters. For our research, it is very valuable to run similar calculations on both Beskow and Triolith in order to compare and analyze the results.