15-20 March 2026
BHSS, Academia Sinica
Asia/Taipei timezone

Offloading CMS data analysis on a distributed high-throughput platform with RDataFrame

19 Mar 2026, 16:00
30m
Auditorium (3F, BHSS)

Oral Presentation
Track 8: Infrastructure Clouds and Virtualizations
Infrastructure Clouds and Virtualisations - III

Speaker

Tommaso Diotalevi (INFN and University of Bologna)

Description

The ability to ingest, process, and analyze large datasets within minimal timeframes is a cornerstone of modern big data applications. In High Energy Physics (HEP), this need becomes increasingly critical as the upcoming High-Luminosity phase of the LHC at CERN is expected to produce data volumes approaching 100 PB per year. Recent advancements in resource management and in open-source computing frameworks, such as Jupyter, Dask, and HTCondor, are driving a shift from traditional batch-oriented workflows toward interactive, high-throughput analysis environments.
Within this context, a scalable analysis platform has been developed that leverages the computing resources of the Italian “National Center for High-Performance Computing, Big Data, and Quantum Computing (ICSC)”. The system allows users to dynamically distribute workloads across local Kubernetes resources or offload them to remote infrastructures through interLink, a technology that extends the Virtual Kubelet concept to federate heterogeneous resources, such as High-Throughput Computing (HTC), High-Performance Computing (HPC), and Cloud systems, under a unified orchestration layer.
The platform’s performance was then evaluated using a representative use case: the study of the CMS Drift Tubes (DT) muon detector performance in phase-space regions driven by analysis needs. By exploiting the declarative model of ROOT RDataFrame (RDF) and its distributed execution via Dask, the study demonstrates significant improvements in scalability and speed-up compared to traditional serial workflows. These results confirm the effectiveness of the proposed distributed analysis approach in addressing the computational challenges posed by the High-Luminosity LHC era.
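
As an illustration only, the split, process, and merge pattern that underlies distributed histogram filling can be sketched in plain Python. This is not the platform's code and it does not use ROOT, RDataFrame, or Dask: the event values are synthetic, the names (`fill_histogram`, `merge`, `distributed_histogram`) and the cut value are hypothetical, and a local thread pool stands in for remote workers. It shows that the merged partial results match the serial result; it does not demonstrate any speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import random

# Hypothetical histogram layout: 10 equal-width bins over [0, 100].
N_BINS = 10
LO, HI = 0.0, 100.0


def make_events(n, seed=42):
    """Generate synthetic per-event values (a stand-in for real detector data)."""
    rng = random.Random(seed)
    return [rng.uniform(LO, HI) for _ in range(n)]


def fill_histogram(values, cut=20.0):
    """Apply a selection (value >= cut) and fill a histogram for one partition."""
    counts = [0] * N_BINS
    width = (HI - LO) / N_BINS
    for v in values:
        if v < cut:
            continue  # event rejected by the selection
        idx = min(int((v - LO) / width), N_BINS - 1)  # clamp the upper edge
        counts[idx] += 1
    return counts


def merge(a, b):
    """Combine two partial histograms bin by bin."""
    return [x + y for x, y in zip(a, b)]


def split(values, n_parts):
    """Split the dataset into at most n_parts contiguous partitions."""
    size = max(1, (len(values) + n_parts - 1) // n_parts)
    return [values[i:i + size] for i in range(0, len(values), size)]


def distributed_histogram(values, n_parts=4):
    """Process each partition independently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(fill_histogram, split(values, n_parts)))
    return reduce(merge, partials, [0] * N_BINS)


events = make_events(10_000)
serial = fill_histogram(events)
parallel = distributed_histogram(events)
print(serial == parallel)
```

Because each partition is processed independently and histograms merge by simple addition, the same analysis description can in principle be run serially or spread over many workers without changing the result.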

Primary authors

Tommaso Diotalevi (INFN and University of Bologna)
Carlo Battilana (University of Bologna and INFN)
Alessandra Fanfani (University of Bologna and INFN)
Elvira Rossi (University of Naples and INFN)
Daniele Spiga (INFN-PG)
Tommaso Tedeschi (University and INFN, Perugia (Italy))
Diego Ciangottini (INFN Perugia)
Alessandra Doria (INFN)
Silvio Pardi (INFN-Napoli)
Bernardino Spisso (INFN)
Simona Maria Stellacci (INFN)
