Public Abstract

DE-SC0022275: Bluestone: Program Translation and Synthesis for Extremely Heterogeneous Architectures

Award Status: Inactive

Institution: Carnegie Mellon University, Pittsburgh, PA
UEI: U3NKNFLNQ613
DUNS: 052184116

Most Recent Award Date: 08/10/2023
Number of Support Periods: 3
PM: Finkel, Hal

Current Budget Period: 09/01/2023 - 08/31/2024
Current Project Period: 09/01/2021 - 08/31/2024
PI: Franchetti, Franz

Supplement Budget Period: N/A

Public Abstract

Project Summary
BLUESTONE: PROGRAM TRANSLATION AND SYNTHESIS FOR EXTREMELY
HETEROGENEOUS ARCHITECTURES
Jeffrey S. Vetter (Principal Investigator), Oak Ridge National Laboratory
Franz Franchetti (Co-Investigator), Carnegie Mellon University
Michael D. Franusich (Co-Investigator), SpiralGen Inc.

Goal. The overarching goal of this Bluestone project is to enable new levels of performance
portability of high-performance computing (HPC) and artificial intelligence (AI) applications on
extremely heterogeneous (EH) architectures. Given the current trend toward architectural
diversification, software developers who strive for performance portability of their software
will need improved techniques, including program synthesis, that hide the increasing complexity of
emerging EH architectures. There are two key challenges for performance portability: (1) improved
code generation and synthesis for EH devices on a single node and across multiple EH architectures,
and (2) effective run time scheduling of computation while accounting for device heterogeneity,
locality, and data orchestration.
Approach. Our project, named Bluestone, will improve both the code generation and run time
systems by employing program synthesis techniques, including AI, constraint solving, SMT/SAT,
and symbolic methods. To achieve these goals, our Bluestone project
will integrate and build on several established components: (a) the LLVM/MLIR compiler ecosystem;
(b) the SnowWhite high-level reasoning engine; and (c) the IRIS heterogeneous run time system. We
will initially target HPC and AI applications on three relevant platforms of interest to DOE:
Summit (IBM and NVIDIA), Frontier (AMD), and Aurora (Intel). Concurrently, we will focus on
emerging architectures including FPGAs, systems on a chip (SoCs) (e.g., Qualcomm Snapdragon), and
quantum devices via QASM. We will initially target popular languages and programming models in
HPC, such as C, C++, SYCL, OpenCL, and Python (NumPy/SciPy), then move to support other languages
and frameworks (such as Flang/Fortran, PyTorch, TensorFlow, and Julia) as we gain experience with
them. We will initially target proxy applications, then move on to the important DOE
applications they represent later in the project.
Research objectives. We will pursue these goals through the following tasks: (a)
integrate the LLVM/MLIR compiler with SnowWhite to facilitate advanced optimization of compute
kernels offloaded to heterogeneous devices like GPUs; (b) integrate SnowWhite with the IRIS run
time system to allow advanced dynamic scheduling and efficient data movement; (c) develop support
for creating analytical performance models that can, in turn, be ingested by SnowWhite to speed
up its optimization process; (d) investigate approaches to machine learning (ML)-enabled
transcoders for heterogeneous programming languages like OpenCL; (e) add support for quantum
computing via QASM across Bluestone components to allow transparent access to QASM-supported
devices; and (f) actively evaluate our ideas on proxy applications, math kernels, and, later, DOE
applications.
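As a rough illustration of how an analytical performance model could inform device selection, the sketch below uses a simple roofline-style estimate: kernel time is bounded by the slower of compute and memory traffic, plus the cost of copying data to the device. This is not the SnowWhite or IRIS interface; the class names, function names, and hardware figures are hypothetical and chosen only to make the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description (all figures are illustrative)."""
    name: str
    peak_flops: float      # floating-point operations per second
    mem_bandwidth: float   # bytes per second to device memory
    link_bandwidth: float  # bytes per second from host to device


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel description."""
    name: str
    flops: float            # total floating-point operations
    bytes_moved: float      # bytes read or written in device memory
    bytes_offloaded: float  # bytes copied to the device before launch


def predicted_time(kernel: Kernel, device: Device) -> float:
    """Roofline-style estimate: max(compute, memory) plus transfer time."""
    compute = kernel.flops / device.peak_flops
    memory = kernel.bytes_moved / device.mem_bandwidth
    transfer = kernel.bytes_offloaded / device.link_bandwidth
    return max(compute, memory) + transfer


def pick_device(kernel: Kernel, devices: list[Device]) -> Device:
    """Choose the device with the lowest predicted time."""
    return min(devices, key=lambda d: predicted_time(kernel, d))


# Made-up devices: the host needs no transfer, so its link is "infinite".
cpu = Device("cpu", peak_flops=1e12, mem_bandwidth=1e11,
             link_bandwidth=float("inf"))
gpu = Device("gpu", peak_flops=1e13, mem_bandwidth=1e12,
             link_bandwidth=1e10)

# A compute-heavy kernel and a small, transfer-dominated kernel.
dense = Kernel("dense", flops=1e12, bytes_moved=1e9, bytes_offloaded=1e8)
small = Kernel("small", flops=1e8, bytes_moved=1e9, bytes_offloaded=1e9)

for k in (dense, small):
    print(k.name, "->", pick_device(k, [cpu, gpu]).name)
```

With these made-up numbers, the compute-heavy kernel is assigned to the GPU, while the small kernel stays on the CPU because the copy cost outweighs the faster device. A real model would also need to account for locality and data already resident on a device.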
Team. Our team comprises experts in the areas of programming systems and program synthesis from
Oak Ridge National Laboratory (ORNL), Carnegie Mellon University, and SpiralGen Inc. We actively
develop and have substantial experience with relevant software stacks, emerging test beds, HPC
architectures, and a variety of HPC and AI applications. We work closely with US Department of
Energy applications teams, among others, to use their applications as drivers and optimize their
applications for these new architectures. We believe that these close collaborations will be
necessary in the early stages of the emergence of these new architectures. For the emerging
architectures, we have access to ORNL’s Experimental Computing Laboratory (https://excl.ornl.gov),
which has several field-programmable gate arrays, systems on a chip, and other heterogeneous
compute engines.