The goal of this project is to develop new mathematical and algorithmic techniques capable of handling, in real time, the growing amounts of data generated by modern fusion research. Existing numerical linear algebra (NLA) methods provide the backbone of classical data analysis and reduction algorithms and of modern machine learning algorithms. However, these methods fundamentally do not port to distributed architectures. To take fuller advantage of the data acquisition capabilities of modern fusion reactors, this project will combine new numerical methods with distributed computing architectures to characterize plasma dynamics, respond in real time to discharge evolution, and process massive-scale data accurately and rapidly.
This project links expertise in multiple-sensor diagnostics of tokamak plasma dynamics from Columbia University’s Plasma Physics Laboratory with expertise in massive-scale data reduction and extreme data control algorithms at Columbia University’s Data Science Institute. This interdisciplinary project will adapt and develop randomized methods, specifically randomized-NLA (rNLA) routines, for data analysis, reduction, and real-time control of a tokamak fusion reactor. Randomized methods trade a small, controllable loss of accuracy for speed and are easily adaptable to distributed computing architectures and streaming data scenarios. All algorithms developed will incorporate rigorous statistical guarantees that leverage high-dimensional statistics, and will provide measures of sub-optimality quantified by tunable parameters. The Columbia University High Beta Tokamak-Extended Pulse (HBT-EP) facility will provide the data and will be used to run experiments. We will also seek to establish similar activities with the national tokamak facilities (DIII-D and NSTX-U) in order to evaluate the broader use of our algorithms and to maximize impact on the plasma science community.
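As an illustration of the accuracy-for-speed trade-off described above, the following is a minimal sketch of one well-known rNLA routine, a randomized low-rank SVD built on a Gaussian range finder. It is not necessarily the algorithm this project will develop. The function name `randomized_svd`, its parameters (`rank`, `oversample`, `n_power_iter`), and the synthetic test matrix are assumptions made for illustration only; no HBT-EP data is used.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate rank-`rank` SVD of A via a random Gaussian sketch.

    `oversample` and `n_power_iter` are the tunable parameters: larger
    values cost more arithmetic but tighten the approximation.
    (Hypothetical helper, written for illustration.)
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)

    # Sketch the column space of A with a random test matrix.
    Omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ Omega)

    # Optional power iterations sharpen the captured subspace;
    # re-orthonormalizing at each step keeps the computation stable.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Project A onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


# Synthetic stand-in for a (time samples x sensor channels) data matrix:
# an exactly rank-8 signal plus a small amount of noise.
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((2000, 8)) @ data_rng.standard_normal((8, 300))
A += 1e-3 * data_rng.standard_normal(A.shape)

U, s, Vt = randomized_svd(A, rank=8)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The only operations that touch the full matrix are the products `A @ Omega`, `A.T @ Q`, `A @ Z`, and `Q.T @ A`, which can be computed block by block over rows of `A`. That is what makes sketches of this kind natural candidates for distributed settings, and, with the power iterations omitted, for streaming ones.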
The primary research tasks of this interdisciplinary project are to: i) demonstrate high-throughput numerical methods with statistical guarantees that can characterize multiple data streams from different diagnostics observing the dynamic evolution of tokamak discharges; ii) demonstrate how optimized massive-scale data reduction methods can be applied in real time for tokamak discharge control; and iii) examine how such algorithms can be used to advance our understanding of the fundamental behaviors of magnetically confined plasma.