Scalable training on scalable infrastructures for programmable hardware (Remote presentation)

22 Mar 2023, 14:00 (20 min)
Auditorium (BHSS, Academia Sinica)

Oral Presentation | Track 8: Infrastructure Clouds and Virtualizations | Converging Infrastructure Clouds, Virtualisation & HPC

Speaker

Marco Lorusso (Alma Mater Studiorum - University of Bologna)

Description

The increasingly pervasive and dominant role of machine learning (ML) and deep learning (DL) techniques in High Energy Physics places challenging requirements on the computing infrastructures that execute AI workflows. It also creates strong demand for training and upskilling new users and future developers of these technologies.

In particular, there is growing demand for training in programmable hardware that delivers low latency and low energy consumption, such as FPGAs. Training on generic ML/DL concepts is plentiful and covers a broad range of sub-topics. However, there is a gap in hands-on ML/DL-on-FPGA tutorials that can scale to a relatively large number of attendees and give them access to a diverse set of dedicated hardware with different specifications.

A pilot course on ML/DL on FPGAs, born from a collaboration between INFN-Bologna, the University of Bologna and INFN-CNAF, successfully paved the way for a dedicated line of work to maintain and expand a scalable toolkit for similar future courses. The practical sessions rely on virtual machines for code development (without FPGAs), in-house cloud platforms (the INFN-cloud infrastructure equipped with AMD/Xilinx Alveo FPGAs), and Amazon AWS instances for deploying projects on FPGAs. These are complemented by Docker containers that provide the full environments for the DL frameworks used, and by Jupyter Notebooks for interactive exercises. The current results and the work plan for consolidating this toolkit will be presented and discussed.

Finally, Bond Machine, a software ecosystem that dynamically generates computer architectures that can be synthesised on FPGAs, is being considered as an alternative way to teach FPGA programming without going into low-level details. The hardware abstraction it offers can simplify interaction with FPGAs.

Primary authors

Alessandro Costantini (INFN-CNAF)
Prof. Daniele Bonacorsi (University of Bologna)
Daniele Spiga (INFN-PG)
Davide Salomoni (INFN)
Diego Michelotto (INFN-CNAF)
Doina Cristina Duma (INFN-CNAF)
Marco Lorusso (Alma Mater Studiorum - University of Bologna)
Mirko Mariotti (Department of Physics and Geology, University of Perugia)
Paolo Veronesi (INFN)
Dr Riccardo Travaglini (INFN)