We develop codes that perform state-of-the-art large-scale diagonalization for solving quantum many-body problems using the no-core shell model.
The no-core shell model (NCSM) is a method to simulate strongly interacting many-body systems. Its main application is the study of atomic nuclei, but the method has also been used to model trapped ultracold atoms. The Chalmers group has recently parallelized the NCSM code ANTOINE.
Through efficient use of storage, all the way from CPU registers, cache, and RAM down to disk, the ANTOINE code can solve problems of sizes where competing codes would require very large amounts of memory. The main computational task is a large-dimension matrix diagonalization, which the Lanczos algorithm translates into matvec and vecvec operations. As a specific example: for a problem with matrix dimension 8.5*10^9 and ~6*10^13 non-zero elements, ANTOINE uses ~100 GB of RAM (per node) and ~2 TB of scratch storage to store the matrix implicitly, while competing codes would use ~0.5 PB (of total RAM) to store the matrix explicitly. In both cases, the matrix is handled sparsely.
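Because the Hamiltonian matrix is never stored explicitly, the Lanczos iteration only needs the operator's action on a vector. The following minimal, matrix-free sketch in Python (NumPy) illustrates that idea; the `matvec` callback and the toy diagonal operator are illustrative stand-ins for ANTOINE's on-the-fly matrix application, not its actual kernels, and reorthogonalization is omitted for brevity:

```python
import numpy as np

def lanczos(matvec, n, k, rng=np.random.default_rng(0)):
    """Plain Lanczos: build a k-dimensional tridiagonal approximation of a
    symmetric operator given only its action v -> A @ v (matrix-free).
    No reorthogonalization, so converged eigenvalues may appear as ghosts."""
    alphas, betas = [], []
    v_prev = np.zeros(n)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    beta = 0.0
    for _ in range(k):
        w = matvec(v)               # the only place the operator is needed
        alpha = v @ w
        w -= alpha * v + beta * v_prev
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        v_prev, v = v, w / beta
    # Ritz values: eigenvalues of the k x k tridiagonal matrix
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return np.linalg.eigvalsh(T)

# Toy example: a diagonal "Hamiltonian" with one well-separated ground state,
# applied on the fly so the full matrix never exists in memory.
n = 2000
diag = np.concatenate(([-10.0], np.linspace(0.0, 5.0, n - 1)))
ritz = lanczos(lambda v: diag * v, n, 60)
```

For the real code, `matvec` encapsulates the implicit, sparse application of the many-body Hamiltonian; only a few Lanczos vectors need to be kept in memory at once.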
A paper describing some of the developments of the existing code has recently been accepted for publication in PRC: C. Forssén et al., "Large-scale exact diagonalizations reveal low-momentum scales of nuclei", arXiv:1712.09951 [nucl-th] (2017).
The goal of this project is to adapt our codes to the new Xeon Phi / Knights Landing (KNL) platform and to investigate their suitability there. We believe that KNL, with its larger accessible memory compared to the first Xeon Phi / Knights Corner (KNC) platform, may be very well suited for our large-scale diagonalization code.
So far, NVIDIA-style GPUs (and KNC) have not been suitable, due to the need for large in-core memory areas. With the developments currently underway, this may change, however. We therefore also request access to GPU nodes.
While keeping ANTOINE's overall approach, the development underway is a complete rewrite of the code, enabling us to address current limitations:
- By being able to cut the matrix and vector into smaller pieces, we can reduce the size of in-core memory needed to process each sub-block.
- By applying block-Lanczos methods, we can efficiently resolve whole spectra of eigenvalues rather than single extremal states.
- By being in better control of task scheduling, we can use distributed local node scratch storage instead of having to rely on distributed high-performance file systems.
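To illustrate the block-Lanczos point above, here is a sketch in the same Python (NumPy) style: the block recurrence replaces matvec by matmat operations (better data reuse) and can resolve degenerate or clustered eigenvalues that single-vector Lanczos struggles with. The function name, block size, and toy operator are assumptions for illustration only, not the planned implementation:

```python
import numpy as np

def block_lanczos(matvec, n, b, steps, rng=np.random.default_rng(1)):
    """Block Lanczos with block size b: each step applies the operator to
    b vectors at once and builds a block-tridiagonal projection T.
    No reorthogonalization is performed in this sketch."""
    V, _ = np.linalg.qr(rng.standard_normal((n, b)))  # orthonormal start block
    V_prev = np.zeros((n, b))
    B = np.zeros((b, b))
    T = np.zeros((steps * b, steps * b))
    for j in range(steps):
        W = matvec(V)                  # n x b: operator applied to the block
        A = V.T @ W                    # b x b diagonal block of T
        W = W - V @ A - V_prev @ B.T   # three-term block recurrence
        V_prev = V
        V, B = np.linalg.qr(W)         # B: upper-triangular off-diagonal block
        s = slice(j * b, (j + 1) * b)
        T[s, s] = A
        if j + 1 < steps:
            s2 = slice((j + 1) * b, (j + 2) * b)
            T[s2, s] = B
            T[s, s2] = B.T
    return np.linalg.eigvalsh(T)       # Ritz values, ascending

# Toy operator with a doubly degenerate ground state at -10: a block size
# of 2 resolves both copies, which single-vector Lanczos cannot separate.
n = 1500
diag = np.concatenate(([-10.0, -10.0], np.linspace(0.0, 5.0, n - 2)))
evals = block_lanczos(lambda V: diag[:, None] * V, n, b=2, steps=20)
```

In the rewrite, the same structure also helps the memory point above: each sub-block of vectors can be processed independently, reducing the in-core working set.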