NLAFET: Parallel Numerical Linear Algebra for Future Extreme Scale Systems

Dnr:

SNIC 2016/1-536

Type:

SNAC Medium

Principal Investigator:

Bo Kågström

Affiliation:

Umeå universitet

Start Date:

2016-12-12

End Date:

2018-01-01

Primary Classification:

10105: Computational Mathematics

Secondary Classification:

10205: Software Engineering

Tertiary Classification:

10206: Computer Engineering

Webpage:

http://www.nlafet.eu

Allocation

Abstract

NLAFET, an EU Horizon 2020-funded project, is a direct response to the demand for new mathematical and algorithmic approaches for applications on extreme-scale systems, as identified in the H2020-FETHPC work programme. The aim is to enable a radical improvement in the performance and scalability of a wide range of real-world applications that rely on linear algebra software. To this end, the project develops novel architecture-aware algorithms and software libraries, together with the supporting runtime capabilities needed to achieve scalable performance and resilience on heterogeneous architectures. The focus is on a critical set of fundamental linear algebra operations, including direct and iterative solvers for dense and sparse linear systems of equations and for eigenvalue problems. The main research objectives are: (i) development of novel algorithms that expose as much parallelism as possible, exploit heterogeneity, avoid communication bottlenecks, respond to escalating fault rates, and help meet emerging power constraints; (ii) exploration of advanced scheduling strategies and runtime systems focusing on the extreme scale and strong scalability in multi/many-core and hybrid environments; (iii) design and evaluation of novel strategies and software support for both offline and online auto-tuning. Results will be validated and disseminated by integrating the new software solutions into challenging scientific applications in materials science, power systems, the study of energy solutions, and data analysis in astrophysics. The deliverables also include a sustainable set of methods and tools for cross-cutting issues such as scheduling, auto-tuning, and algorithm-based fault tolerance, packaged into open-source library modules.