Project Summary
FAIR Framework for Physics-Inspired AI in High Energy Physics
Eliu Huerta, Lead PI, University of Illinois at Urbana-Champaign (UIUC)
Daniel S. Katz, Volodymyr Kindratenko, Mark Neubauer, Zhizhen Zhao, UIUC
Priscilla Cushman, Andrew Furmanski, Vuk Mandic, Roger Rusack, Ju Sun, University of Minnesota
Javier Duarte, University of California San Diego
Philip Harris, Massachusetts Institute of Technology
Vision Showcase the application of Findable, Accessible, Interoperable, and Reusable (FAIR) principles
to benchmark datasets that are timely and relevant for the High Energy Physics (HEP) community, and
demonstrate how to apply them to train FAIR AI models. Use FAIR data and AI models, combined with
accelerated computing and scientific visualizations, to explore the interplay between data and models, and
the robustness of AI predictions to perturbations in AI architectures and test datasets.
Objectives Select data sources produced by HEP experiments and AI models that will serve as exemplars
to: (i) develop and share benchmark datasets in a manner that adheres to FAIR principles. These FAIR
datasets will include metadata, provenance, and annotations to facilitate their use and define their scope
of applicability; (ii) demonstrate how to use FAIR benchmark datasets to train FAIR AI models for HEP,
and develop and FAIRly share these AI models; (iii) lead community activities to define FAIR for AI models
and best practices for FAIRly sharing data and AI models, particularly within the HEP community; and
(iv) use accelerated computing, scientific visualization, and domain-inspired training methodologies to
gain new insights into the interplay between data and models, including model robustness to changes in
architecture, hyperparameter tuning, and test datasets.
Project Description We will produce and FAIRly share HEP benchmark datasets using platforms such
as CERN's Open Data Portal and Zenodo. We will work with the Research Data Alliance and the
Research Software Alliance to define FAIR principles for AI models. We will combine our FAIR
benchmark datasets and well-known AI models in HEP as drivers for this work. Finally, we will release
distributed training algorithms and visualization tools through Argonne's DLHub to empower
researchers to gain new insights into how data are processed during training and inference by AI
models. These tools will reduce time-to-insight and will enable the exploration of domain-inspired
optimization schemes to guide AI towards an intuitive understanding of the physical world.
Potential Impacts Computational and data grand challenges in HEP provide exciting new avenues to set
an example for the production of FAIR benchmark datasets in science, which will unlock the potential
of FAIR AI for data-driven discovery as HEP enters the High-Luminosity Large Hadron Collider (HL-LHC)
era. FAIR data and AI models will significantly reduce duplication of effort, maximizing the use of DOE
computational resources and facilities for innovative AI research. These activities will further DOE's
objective of constructing a theoretical framework that makes the best use of AI in science
and engineering.