Tensor-based Streaming Algorithms for Scientific Data Compression
Misha E. Kilmer, Tufts University (Principal Investigator)
Srinivas Eswar, Argonne National Laboratory (Co-Investigator)
Vishwas Rao, Argonne National Laboratory (Co-Investigator)
Pinaki Pal, Argonne National Laboratory (Co-Investigator)
Arvind K. Saibaba, North Carolina State University (Co-Investigator)
Grey Ballard, Wake Forest University (Co-Investigator)
Aditya Devarakonda, Wake Forest University (Co-Investigator)
The purpose of this proposal is to develop a suite of randomized and tensor-based techniques for compressing data and models in scientific applications. The objective is to achieve greater computational efficiency, and better reduced-order model accuracy with less training data, than traditional methods provide. The datasets we target come from observations, experiments, and simulations generated at DOE user facilities (e.g., the Advanced Photon Source). Both the volume and the velocity of (streaming) data acquisition make it challenging to store and process these datasets for scientific discovery. Prevalent approaches typically disregard, or do not fully exploit, the inherent multidimensional structure of the data and models; a fundamentally new approach is therefore needed. To this end, we propose to unlock the multidimensional structure by computing a low-rank tensor decomposition, where the notion of “rank” depends on the specific tensor decomposition. We will consider two tensor decompositions: the *M framework (developed by PI Kilmer and coauthors) and the Tucker format. The choice of tensor format will be dictated by the application under consideration and by the properties of the data we seek to exploit or preserve.
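To make the idea of a low-rank Tucker representation concrete, the following is a minimal sketch, not the algorithms proposed here. It assumes a dense tensor that fits in memory and a plain Gaussian sketching matrix, and it builds each factor matrix from a randomized range finder applied to the corresponding mode unfolding (a randomized variant of the higher-order SVD). The function names, the `oversample` parameter, and the test sizes are illustrative assumptions.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_product(X, U, mode):
    """Multiply tensor X by matrix U along `mode`; U has shape (new_dim, X.shape[mode])."""
    Y = np.tensordot(U, X, axes=(1, mode))  # contracted mode is replaced by axis 0
    return np.moveaxis(Y, 0, mode)

def randomized_tucker(X, ranks, oversample=5, seed=0):
    """Illustrative randomized HOSVD: returns (core, factors) with X ~ core x_n factors[n]."""
    rng = np.random.default_rng(seed)
    factors = []
    for mode, r in enumerate(ranks):
        Xn = unfold(X, mode)
        k = min(r + oversample, Xn.shape[0])
        Omega = rng.standard_normal((Xn.shape[1], k))   # Gaussian test matrix (assumption)
        Q, _ = np.linalg.qr(Xn @ Omega)                 # orthonormal basis for the sketched range
        Uh, _, _ = np.linalg.svd(Q.T @ Xn, full_matrices=False)
        factors.append(Q @ Uh[:, :r])                   # truncate the basis to the target rank r
    core = X
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)            # project onto each factor basis
    return core, factors

def reconstruct(core, factors):
    """Expand a Tucker representation back to a full tensor."""
    X = core
    for mode, U in enumerate(factors):
        X = mode_product(X, U, mode)
    return X
```

For a tensor of size 10 x 12 x 8 with multilinear rank (2, 3, 2), this representation stores a 2 x 3 x 2 core and three thin factor matrices (84 numbers in total) instead of 960 entries.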
Randomized algorithms have become crucial for tensor decompositions because of their low computational footprint, excellent numerical stability, and suitability for high-performance computing. Building on our successes in this area, we propose to conduct research in three main directions: (1) tensor-based dynamic mode decomposition (DMD), in which we will develop a DMD formulation built on the *M framework, use randomized algorithms to accelerate the computations, and place special emphasis on the streaming setting; (2) tensor-based model reduction, in which we will learn projection-based reduced-order models that rely in novel ways on randomized tensor decompositions; and (3) randomized algorithms for data compression in the streaming setting, which will focus on structured random matrices and will be applicable to both the Tucker and the *M families. In addition, we will undertake the cross-cutting task of implementing the algorithms in high-performance computing settings, leveraging current and future systems such as Polaris and Aurora at the Argonne Leadership Computing Facility.
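As an illustration of what compression in the streaming setting can look like in the simplest (matrix) case, the sketch below maintains two small random sketches of a data matrix whose columns arrive in blocks and recovers a low-rank approximation after a single pass. It is a generic one-pass scheme written under stated assumptions, not the proposed method: it uses dense Gaussian test matrices rather than the structured random matrices mentioned above, it assumes the total number of columns is known in advance, and the class name `StreamingSketch` and its parameters `k` and `l` are hypothetical.

```python
import numpy as np

class StreamingSketch:
    """One-pass low-rank sketch of an (n_rows x n_cols) matrix streamed by column blocks."""

    def __init__(self, n_rows, n_cols, k, l, seed=0):
        rng = np.random.default_rng(seed)
        self.Omega = rng.standard_normal((n_cols, k))  # range test matrix (Gaussian, assumption)
        self.Psi = rng.standard_normal((l, n_rows))    # co-range test matrix, with l >= k
        self.Y = np.zeros((n_rows, k))                 # accumulates A @ Omega
        self.W = np.zeros((l, n_cols))                 # stores Psi @ A

    def update(self, block, start):
        """Absorb columns start .. start + block.shape[1] - 1; the block can then be discarded."""
        stop = start + block.shape[1]
        self.Y += block @ self.Omega[start:stop]
        self.W[:, start:stop] = self.Psi @ block

    def factors(self):
        """Return (Q, X) such that A is approximated by Q @ X."""
        Q, _ = np.linalg.qr(self.Y)                                  # basis for the sketched range
        X, *_ = np.linalg.lstsq(self.Psi @ Q, self.W, rcond=None)   # least-squares fit of Q.T @ A
        return Q, X
```

Only `Y`, `W`, and the two test matrices are kept in memory, so storage scales with `k` and `l` rather than with the full data size; for data that is exactly low rank with rank at most `k`, the product `Q @ X` reproduces the matrix up to rounding error.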
We will target data compression in three scientific applications of interest to the DOE mission: climate models (e.g., Energy Exascale Earth System Model) with regional refinement, turbulent combustion simulations in gas turbines, and x-ray ptychography.
The proposed research will deliver methods that are efficient in both memory and computation, with quantifiable accuracy and rigorous error analysis, and with properties provably superior to those of traditional matrix-based approaches. These new methods will advance the state of the art in randomized tensor decomposition and data compression. Additionally, the outcomes of the proposed research will greatly reduce the burden of downstream processing tasks, thereby accelerating scientific discovery.