Public Abstract

DE-SC0024502: Cryo-Phoenix: Cryogenic and Photonic Zetta-Scale Supercomputing System Modeling

Award Status: Active

Institution: Regents of the University of California, Davis, Davis, CA
UEI: TX2DAGQPENZ5
DUNS: 047120084

Most Recent Award Date: 05/28/2025
Number of Support Periods: 2
PM: Finkel, Hal

Current Budget Period: 07/01/2024 - 03/31/2026
Current Project Period: 07/01/2023 - 03/31/2026
PI: Lowe-Power, Jason

Supplement Budget Period: N/A


Objectives: We will develop a modeling and evaluation framework for a supercomputing system based on superconducting electronics (SCE), to enable exploration of such systems for the zetta-scale era. We begin by modeling a first Cryo-Phoenix system architecture with a photonic interconnect to external HBM/DDR5 DRAM, built on four levers: 1) specialized accelerator design, 2) wafer stacking, 3) device density improvement, and 4) ultra-high-bandwidth interconnect.

1) Specialized architecture design (the authors have already demonstrated several such designs [1],[2]) reduces application compute latency by 40x and external I/O by 2x-5x, depending on the datapath width.

2) Superconducting die or wafer stacking, with a density increase from 4M to 7M Josephson junctions per square centimeter (JJ/cm², a rough analog of CMOS transistor density), could achieve approximately 60 exaflops (40x the Frontier system's 1.5 exaflops) within the same 40 MW power envelope, using 75 fridges (roughly half as many as Frontier's cabinets).

3) Increasing density from 7M JJ/cm² today to 70M JJ/cm² within 5 years, through lithographic and device improvements combined with stacking, will enable 120 exaflops while reducing cooling to approximately 30 fridges, at 10% of the cooling cost of the Frontier system. The project will also use specialization to move closer to zettaflop performance.

4) Ultra-high-bandwidth interconnect: the cryo-to-room-temperature photonic interface designed by Stojanovic's team is a novel coherent ultrafast reflective link (CURL), designed specifically to exploit the large asymmetry in energy cost between the 4 K and room-temperature (RT) environments. We project that CURLs will achieve 1 pJ/bit (RT wall-plug) with a bit error rate (BER) of 10⁻¹⁵ at 32 Gbps per wavelength. CURLs are replicated across wavelengths and fibers, providing 25 Tbps per SC node at 1 Tbps per fiber and an aggregate 32 Pb/s from the fridge-cabinet super-node.
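
The interconnect figures above can be cross-checked with simple arithmetic. The sketch below assumes decimal SI prefixes (1 Tbps = 1000 Gbps) and treats the 1 pJ/bit figure as RT wall-plug energy, so link power is energy per bit times bit rate. The last function gives the Carnot lower bound on the work needed at room temperature to remove one watt of heat at 4 K, which is one way to quantify the 4 K/RT energy asymmetry that CURL is designed to exploit; practical cryocoolers need considerably more than this bound. All function and constant names are illustrative, not part of any project code.

```python
import math

# Figures quoted in the Objectives (decimal SI prefixes assumed).
GBPS_PER_WAVELENGTH = 32
TBPS_PER_FIBER = 1
TBPS_PER_NODE = 25
PBPS_AGGREGATE = 32
PJ_PER_BIT = 1.0  # projected RT wall-plug energy per bit


def wavelengths_per_fiber():
    """Wavelength channels needed to carry TBPS_PER_FIBER at GBPS_PER_WAVELENGTH."""
    return math.ceil(TBPS_PER_FIBER * 1000 / GBPS_PER_WAVELENGTH)


def fibers_per_node():
    """Fibers needed to carry TBPS_PER_NODE at TBPS_PER_FIBER."""
    return math.ceil(TBPS_PER_NODE / TBPS_PER_FIBER)


def implied_nodes():
    """Number of nodes whose per-node bandwidth sums to the quoted aggregate."""
    return PBPS_AGGREGATE * 1000 / TBPS_PER_NODE


def link_power_watts(bits_per_second, pj_per_bit=PJ_PER_BIT):
    """Power = energy per bit x bit rate (1 pJ = 1e-12 J)."""
    return bits_per_second * pj_per_bit / 1e12


def carnot_watts_per_watt(t_cold_k=4.0, t_hot_k=300.0):
    """Minimum work (W) at t_hot_k to remove 1 W of heat at t_cold_k (ideal Carnot)."""
    return (t_hot_k - t_cold_k) / t_cold_k


if __name__ == "__main__":
    print("wavelengths per fiber:", wavelengths_per_fiber())  # 32
    print("fibers per node:", fibers_per_node())  # 25
    print("nodes implied by aggregate:", implied_nodes())  # 1280.0
    print("link power per node (W):", link_power_watts(TBPS_PER_NODE * 1e12))  # 25.0
    print("aggregate link power (kW):", link_power_watts(PBPS_AGGREGATE * 1e15) / 1e3)  # 32.0
    print("Carnot minimum at 4 K (W/W):", carnot_watts_per_watt())  # 74.0
```

Under these assumptions each fiber carries 32 wavelengths (32 x 32 Gbps = 1.024 Tbps), each node uses 25 fibers, and the quoted 32 Pb/s aggregate corresponds to about 1,280 nodes at 25 Tbps each.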

Approach: The project is organized into three thrusts.

A. Superconducting Circuit Design (RSFQ, AQFP): Vasudevan will design custom superconducting accelerators for compute and memory workloads with RSFQ and AQFP devices, using EDA tools for superconducting logic. Logic models such as multi-valued logic and temporal logic design [1],[2] will be used to implement compact, high-bandwidth, low-latency accelerators for a selected set of kernels from the NAS Parallel Benchmarks and the NERSC mini-apps. A surrogate-model-based superconducting circuit simulation framework will be used to analyze electromagnetic coupling (to address shielding) and to calculate circuit margins.

B. Superconducting Heterogeneous Chiplet Modeling (Super-gem5): Lowe-Power and Vasudevan will develop the Super-gem5 framework to simulate superconducting heterogeneous chiplet models on a set of graph benchmarks (especially the LIGAR benchmarks). Accelerator, CPU, and memory metrics obtained from the circuit design thrust will be used to configure gem5 for Super-gem5 chiplet model evaluation.

C. Cryo-Phoenix System-Level Modeling (Super-SST): Stojanovic and Vasudevan will model a 1K-node Cryo-Phoenix system, integrating the CURL photonic interconnect models into the SST simulation framework. The integrated gem5-SST interface will be used to pass chiplet-level models from gem5 to SST. Once the first system has been modeled, we will model a Tesla Dojo-style disaggregated memory architecture to evaluate the feasibility and complexity of such a system.
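
Because the Super-gem5 chiplet models are parameterized with circuit-level metrics, one detail worth checking is how superconducting clock rates (the Impacts below cite 20-48 GHz blocks) map onto a cycle-level simulator's discrete time base. The sketch assumes a global resolution of 10^12 ticks per second (1 ps per tick), which is gem5's default; at that resolution a 48 GHz clock period of about 20.8 ps must be rounded to a whole number of ticks. The helper functions are hypothetical illustrations, not part of gem5 or Super-gem5.

```python
# Assumed time base: gem5's default global frequency of 1e12 ticks/s (1 tick = 1 ps).
TICKS_PER_SECOND = 10**12


def period_in_ticks(freq_hz, ticks_per_second=TICKS_PER_SECOND):
    """Clock period rounded to the nearest whole tick (at least 1 tick)."""
    return max(1, round(ticks_per_second / freq_hz))


def effective_frequency(freq_hz, ticks_per_second=TICKS_PER_SECOND):
    """Frequency actually simulated once the period is rounded to whole ticks."""
    return ticks_per_second / period_in_ticks(freq_hz, ticks_per_second)


def relative_error(freq_hz, ticks_per_second=TICKS_PER_SECOND):
    """Fractional deviation of the simulated frequency from the intended one."""
    return effective_frequency(freq_hz, ticks_per_second) / freq_hz - 1.0


if __name__ == "__main__":
    for ghz in (20, 32, 48):
        f = ghz * 1e9
        print(f"{ghz} GHz: {period_in_ticks(f)} ticks, "
              f"simulated {effective_frequency(f) / 1e9:.2f} GHz, "
              f"error {relative_error(f):+.2%}")
```

Under these assumptions, 20 GHz maps exactly to 50 ticks, while 32 GHz and 48 GHz are each off by roughly 0.8%; a finer tick resolution (the `ticks_per_second` parameter) shrinks the error at the cost of larger tick counts.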

Impacts: This project will address the main goals of this DOE FOA: 1) enabling a serial computational rate above 20 GHz clock speed (our designs model 20-48 GHz blocks, reaching up to 120 exaflops), 2) addressing the < 20 MW energy-consumption metric for future supercomputing systems, 3) exploring several scientific workloads (NAS, LIGAR benchmarks), and 4) accounting for data movement between the nodes and super-nodes of the proposed Cryo-Phoenix system.
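
To make the energy targets concrete, the sketch below converts the performance and power operating points quoted in this abstract into energy efficiency (GFLOPS/W). It uses only figures stated above, except the last row, which assumes for illustration that the 120-exaflop upper end and the < 20 MW metric are met at the same time; the abstract does not state that pairing explicitly. The function and table names are hypothetical.

```python
# Operating points quoted in this abstract: (label, exaflops, megawatts).
# The last row is an illustrative assumption, not a stated design point.
OPERATING_POINTS = [
    ("Frontier, as quoted", 1.5, 40.0),
    ("Stacked SCE system, as quoted", 60.0, 40.0),
    ("120 EF within 20 MW (assumed pairing)", 120.0, 20.0),
]


def gflops_per_watt(exaflops, megawatts):
    """Energy efficiency: (exaflops * 1e18 FLOP/s) / (megawatts * 1e6 W), in GFLOPS/W."""
    return (exaflops * 1e18) / (megawatts * 1e6) / 1e9


if __name__ == "__main__":
    baseline = gflops_per_watt(1.5, 40.0)
    for label, ef, mw in OPERATING_POINTS:
        eff = gflops_per_watt(ef, mw)
        print(f"{label}: {eff:.1f} GFLOPS/W ({eff / baseline:.0f}x the quoted Frontier point)")
```

Because the first two rows share the same 40 MW envelope, their 40x efficiency ratio reproduces the Objectives' comparison with Frontier; the assumed third row would require a further 4x improvement.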