The ability to ingest, process, and analyze large datasets within minimal timeframes is a hallmark of big data applications. In the realm of High Energy Physics (HEP) at CERN, this capability is especially critical, as the upcoming high-luminosity phase of the LHC will generate vast amounts of data, reaching scales of approximately 100 PB/year. Recent advancements in resource management and software development have enabled more flexible and dynamic data access, alongside integration with open-source tools such as Jupyter, Dask, and HTCondor. These advancements facilitate a shift from traditional “batch-like” processing to an interactive, high-throughput platform built on a distributed, parallel back-end architecture. This approach is further supported by the DataLake model developed within the Italian National Center for High-Performance Computing, Big Data, and Quantum Computing (ICSC).
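As an illustration of this interactive setup, the sketch below shows how a Dask cluster might be provisioned on HTCondor resources directly from a Jupyter session using dask_jobqueue; the resource sizes, number of worker jobs, and overall configuration are assumptions for the example rather than the actual ICSC deployment.

```python
# Minimal sketch: attach a Jupyter session to a Dask cluster whose workers
# are submitted as HTCondor jobs. All resource values below are illustrative
# assumptions, not the actual ICSC configuration.
from dask_jobqueue import HTCondorCluster
from dask.distributed import Client

cluster = HTCondorCluster(
    cores=4,        # cores per HTCondor worker job (assumed)
    memory="8 GB",  # memory per worker job (assumed)
    disk="4 GB",    # scratch disk per worker job (assumed)
)
cluster.scale(jobs=10)     # ask the batch system for ten worker jobs

client = Client(cluster)   # connect the notebook to the distributed back-end
print(client.dashboard_link)  # live view of tasks running on the workers
```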
This contribution highlights the transition of several data analysis applications from legacy batch processing to a more interactive, declarative paradigm based on tools such as ROOT RDataFrame. These applications are executed on the aforementioned cloud-based infrastructure, with workflows distributed across multiple worker nodes and results consolidated into a unified interface. Additionally, the performance of this approach is evaluated through speed-up benchmarks and scalability tests on distributed resources. The analysis aims to identify potential bottlenecks or limitations of the high-throughput interactive model, providing insights that will guide its further development and implementation within the Italian National Center.
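The sketch below illustrates the kind of declarative, distributed workflow meant here: the analysis is written once as a chain of RDataFrame operations, and the event loop is offloaded to Dask workers; the tree name, input files, column names, and partitioning are hypothetical placeholders rather than details of the actual applications.

```python
# Minimal sketch of a declarative RDataFrame analysis offloaded to Dask workers
# (ROOT >= 6.26). Tree name, input files, and columns are placeholders.
import ROOT
from dask.distributed import Client

# Distributed flavour of RDataFrame backed by a Dask client
RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame

# Address of an already-running Dask scheduler (assumed)
client = Client("tcp://scheduler.example:8786")

df = RDataFrame(
    "Events",                                       # tree name (placeholder)
    ["data/sample1.root", "data/sample2.root"],     # input files (placeholder)
    daskclient=client,
    npartitions=32,                                 # distributed tasks (assumed)
)

# Declarative chain: selections and derived columns are only scheduled here...
h = (df.Filter("nMuon == 2", "Exactly two muons")
       .Define("dimuon_mass",
               "InvariantMass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")
       .Histo1D(("dimuon_mass", "Dimuon mass;m_{#mu#mu} [GeV];Events",
                 100, 60.0, 120.0), "dimuon_mass"))

# ...and executed across the workers only when the result is first requested,
# with partial histograms merged back into a single object.
h.GetValue()
```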