Project Summary
FAIR Framework for Physics-Inspired AI in High Energy Physics
Eliu Huerta, Lead PI, University of Illinois at Urbana-Champaign (UIUC)
Daniel S. Katz, Volodymyr Kindratenko, Mark Neubauer, Zhizhen Zhao, UIUC
Priscilla Cushman, Andrew Furmanski, Vuk Mandic, Roger Rusack, Ju Sun, University of Minnesota
Javier Duarte, University of California San Diego
Philip Harris, Massachusetts Institute of Technology
Vision Showcase the application of Findable, Accessible, Interoperable, and Reusable (FAIR) principles
to benchmark datasets that are timely and relevant for the High Energy Physics (HEP) community, and
demonstrate how to apply them to train FAIR AI models. Use FAIR data and AI models, combined with
accelerated computing and scientific visualizations, to explore the interplay between data and models, and
the robustness of AI predictions to perturbations in AI architectures and test datasets.
Objectives Select data sources produced by HEP experiments and AI models that will serve as exemplars
to: (i) develop and share benchmark datasets in a manner that adheres to FAIR principles. These FAIR
datasets will include metadata, provenance, and annotations to facilitate their use and define their scope
of applicability; (ii) demonstrate how to use FAIR benchmark datasets to train FAIR AI models for HEP,
and develop and FAIRly share these AI models; (iii) lead community activities to define FAIR for AI models
and best practices for FAIRly sharing data and AI models, particularly within the HEP community; and
(iv) use accelerated computing, scientific visualization, and domain-inspired training methodologies to
gain new insights into the interplay between data and models, including model robustness to changes in
architecture, hyperparameter tuning, and test datasets.
Project Description We will produce and FAIRly share HEP benchmark datasets using platforms such
as CERN's Open Data Portal and Zenodo. We will work with the Research Data Alliance and the
Research Software Alliance to define FAIR principles for AI models. We will combine our FAIR
benchmark datasets and well-known AI models in HEP as drivers for this work. Finally, we will release
distributed training algorithms and visualization tools through Argonne's DLHub to empower
researchers to gain new insights into how data are processed during training and inference by AI
models. These tools will reduce time-to-insight and will enable the exploration of domain-inspired
optimization schemes to guide AI towards an intuitive understanding of the physical world.
Potential Impacts Computational and data grand challenges in HEP provide exciting new avenues to set
an example for the production of FAIR benchmark datasets in science, which will unlock the potential
of FAIR AI for data-driven discovery as HEP enters the High-Luminosity Large Hadron Collider (HL-LHC)
era. FAIR data and AI models will significantly reduce duplication of effort, maximizing the use of DOE
computational resources and facilities for innovative AI research. These activities will further DOE's
objective of constructing a theoretical framework that makes the best use of AI in science
and engineering.