5-10 March 2017
BHSS, Academia Sinica
Asia/Taipei timezone

Machine Learning analysis of CMS data transfers

7 Mar 2017, 15:00
20m
Media Conf. Room (BHSS, Academia Sinica)

No. 128, Sec. 2, Academia Rd., Taipei, Taiwan
Data Management & Big Data

Speaker

Prof. Daniele Bonacorsi (University of Bologna)

Description

Tens of petabytes of collision and simulated data have been collected and distributed across WLCG sites during Run-1 and Run-2 at the LHC. Low latency in transfers among dozens of computing centres is crucial for efficient use of the computing resources. Although the desired level of throughput has, on average, been achieved to serve the LHC physics programs, it is not uncommon to observe transfer latencies arising from a wide variety of causes, from file corruption to site issues, most of which require operator intervention. To improve on this front, the CMS experiment equipped the PhEDEx dataset replication system with a system to collect latency data and a mechanism to categorise and analyse them promptly, so that they can be matched to quick and focussed operator intervention. The transfer latency data have also been the target of Machine Learning techniques, already used in CMS to study and predict dataset popularity, and preliminary results on the predictive potential of this approach will be presented and discussed.

Primary authors

Prof. Daniele Bonacorsi (University of Bologna)
Dr Nicolò Magini (Fermilab (US))
Dr Valentin Kuznetsov (Cornell University)

Co-authors

Diotalevi Tommaso (University of Bologna)
Kančys Kipras (Vilnius University)
Matonis Zygimantas (University of Vilnius)
Repečka Aurimas (Vilnius University)
