Development of quantum many-body theory codes

Dnr:

SNIC 2017/3-27

Type:

SNAC Small

Principal Investigator:

Håkan Johansson

Affiliation:

Chalmers tekniska högskola

Start Date:

2017-03-30

End Date:

2018-04-01

Primary Classification:

10301: Subatomic physics

Webpage:

http://fy.chalmers.se/subatom/tsp/index.php?page=abinitio

Abstract

We develop codes that perform state-of-the-art large-scale diagonalisation for solving quantum many-body problems using the no-core shell model (NCSM), a method to simulate strongly interacting many-body systems. The main application is the study of atomic nuclei, but the method has also been used to model trapped ultracold atoms.

The Chalmers group has recently parallelised the NCSM code ANTOINE. Through efficient use of storage, all the way from CPU registers, cache, and RAM to disk, ANTOINE can solve problems of sizes where competing codes would use very large amounts of memory. The main computational task is a large-dimension matrix diagonalisation, which the Lanczos algorithm reduces to matrix-vector (matvec) and vector-vector (vecvec) operations. As a specific example, for a problem with matrix dimension 8.5*10^9 and ~6*10^13 non-zero elements, ANTOINE uses ~100 GB of RAM (per node) and ~2 TB of scratch storage to store the matrix implicitly, while competing codes would use ~0.5 PB (of total RAM) to store the matrix explicitly. In both cases, the matrix is handled sparsely.

The goal of this project is to adapt our codes to the new Xeon Phi / Knights Landing (KNL) platform and to investigate their suitability for it. We believe that KNL, with more accessible memory than the first Xeon Phi / Knights Corner (KNC) platform, may be very suitable for our large-scale diagonalisation code. So far, NVIDIA-style GPUs (or KNC) have not been suitable, due to the need for large in-core memory areas. With the developments we are currently pursuing, this may however change. We therefore also request access to GPU nodes.

While keeping ANTOINE's overall approach, the development underway is a complete rewrite of the code, enabling us to address current limitations:

- By being able to cut the matrix and vector into smaller pieces, we can reduce the size of in-core memory needed to process each sub-block.
- Block-Lanczos methods can be applied when spectra of eigenvalues are sought.
- By being in better control of task scheduling, we will be in a position to use distributed local node scratch, instead of having to rely on distributed high-performance file systems.
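
To illustrate how the Lanczos algorithm reduces a sparse diagonalisation to matvec and vecvec operations, here is a minimal Python sketch. It is not ANTOINE's implementation: the function names (`matvec`, `dot`, `lanczos`, `count_below`, `lowest_eigenvalue`, `laplacian_rows`), the row-list matrix format, the small test matrix, and the use of full reorthogonalisation are all assumptions made for illustration. Unlike ANTOINE, the sketch stores the matrix explicitly and keeps every Lanczos vector in memory.

```python
import math
import random


def matvec(rows, x):
    """Sparse matrix-vector product; rows[i] is a list of (column, value) pairs."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def dot(x, y):
    """Vector-vector (vecvec) inner product."""
    return sum(a * b for a, b in zip(x, y))


def lanczos(rows, n, steps, seed=0):
    """Run at most `steps` Lanczos iterations on a symmetric sparse matrix.

    Returns the diagonal (alphas) and off-diagonal (betas) of the
    tridiagonal matrix; len(betas) == len(alphas) - 1.
    """
    rng = random.Random(seed)
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(q, q))
    basis = [[a / norm for a in q]]
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(rows, basis[-1])          # one matvec per iteration
        alphas.append(dot(w, basis[-1]))     # vecvec
        # Assumption: full reorthogonalisation against all earlier vectors.
        # Simple and robust for a demo, but costly in memory for large problems.
        for v in basis:
            c = dot(w, v)
            w = [a - c * b for a, b in zip(w, v)]
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12 or len(alphas) == steps:
            break
        betas.append(beta)
        basis.append([a / beta for a in w])
    return alphas, betas


def count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of the tridiagonal matrix below x."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        b2 = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - b2 / d
        if d == 0.0:
            d = 1e-30  # nudge away from an exact zero pivot
        if d < 0.0:
            count += 1
    return count


def lowest_eigenvalue(alphas, betas, tol=1e-12):
    """Bisection for the lowest eigenvalue of the tridiagonal matrix."""
    pad = [0.0] + [abs(b) for b in betas] + [0.0]
    # Gershgorin bounds enclose all eigenvalues.
    lo = min(a - pad[i] - pad[i + 1] for i, a in enumerate(alphas))
    hi = max(a + pad[i] + pad[i + 1] for i, a in enumerate(alphas))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def laplacian_rows(n):
    """Hypothetical test matrix: 1D Laplacian (2 on the diagonal, -1 next to it)."""
    rows = []
    for i in range(n):
        row = [(i, 2.0)]
        if i > 0:
            row.append((i - 1, -1.0))
        if i < n - 1:
            row.append((i + 1, -1.0))
        rows.append(row)
    return rows


if __name__ == "__main__":
    n = 20
    alphas, betas = lanczos(laplacian_rows(n), n, steps=n)
    print("lowest Ritz value:", lowest_eigenvalue(alphas, betas))
    print("analytic value:   ", 2.0 - 2.0 * math.cos(math.pi / (n + 1)))
```

The only operations that touch the full problem dimension are `matvec` and `dot`, which is why storage layout and the scheduling of these two kernels dominate the cost at the scales quoted above. A production code would typically avoid keeping all Lanczos vectors in core, which this sketch does for simplicity.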