Conveners
Data Management & Big Data
- Patrick Fuhrmann (DESY/dCache.org)
- Kento Aida (National Institute of Informatics)
- David Groep (Nikhef)
As the High-Luminosity LHC (HL-LHC) era steadily approaches, the current status and expected behaviour of the WLCG infrastructure attract growing attention. In particular, efficient networks and tape storage are expected to be pillars of success in the upcoming LHC phase. Considering the current computing models and data volumes of the LHC experiments, the requirements are mainly driven by custodial storage...
The Belle II experiment, an asymmetric-energy electron-positron collider experiment, has a targeted integrated luminosity of 50 ab$^{-1}$. Data taking has already started, with more than 250 fb$^{-1}$ recorded thus far. Due to the very high data volume and computing requirements, a distributed "Grid" computing model has been adopted. Belle II recently integrated Rucio, a distributed data...
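For readers unfamiliar with Rucio, the following is a minimal sketch of what interacting with its Python client looks like. It is not Belle II's actual integration; the scope, dataset name, and RSE expression are placeholders, and a configured Rucio client environment is assumed.

    # Minimal sketch, assuming a configured Rucio client environment.
    # Scope, dataset name and RSE expression below are placeholders.
    from rucio.client import Client

    client = Client()

    # List where the replicas of a dataset's files currently live.
    for replica in client.list_replicas([{'scope': 'user.jdoe', 'name': 'example.dataset'}]):
        print(replica['name'], list(replica['rses'].keys()))

    # Ask Rucio to maintain one replica of the dataset on a tape endpoint.
    client.add_replication_rule(
        dids=[{'scope': 'user.jdoe', 'name': 'example.dataset'}],
        copies=1,
        rse_expression='rse_type=TAPE',
    )

Declarative replication rules of this kind, rather than per-file transfer commands, are the main abstraction a distributed data management system such as Rucio offers to an experiment.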
Computing operations at the Large Hadron Collider (LHC) at CERN rely on the Worldwide LHC Computing Grid (WLCG) infrastructure, designed to allow efficient storage, access, and processing of data at the pre-exascale level.
A close and detailed study of the computing systems used for the LHC physics mission is an increasingly crucial aspect of the High Energy Physics roadmap...
Experiments and scientists, whether designing and building a new data management system or already managing multi-petabyte datasets, come together in the European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures (ESCAPE) project to address computing challenges by developing common solutions in the context of the EOSC. A modular ecosystem of services and...
Substantial data volume growth is expected with the start of the HL-LHC era. Even taking hardware evolution into account, it will require substantial changes to the way data is managed and processed. The WLCG DOMA project was established to address the relevant research and, along with the national Data Lake R&D projects, it studied possible technology solutions for the organization of...
CASTOR, the primary tape storage system of CERN, has been used at IHEP for over fifteen years. By 2021, the data volume from experiments had reached 12 PB. Since two replicas are kept on tape for most raw data, the capacity used in CASTOR has exceeded 20 PB. However, numerous factors hinder the performance of CASTOR. For example, new experiments such as JUNO and HEPS require long-term...
Since it started operations in 2009, the LHC at CERN, the world's largest particle collider, has allowed its experiments to produce a volume of data unprecedented in the history of modern science. To date, more than 1 exabyte of simulated and real data has been produced, stored on disk and magnetic media, and processed in a worldwide distributed computing infrastructure comprising 170 centers in...
The need for effective handling and management of heterogeneous and possibly confidential data is continuously increasing across multiple scientific domains.
PLANET (Pollution Lake ANalysis for Effective Therapy) is an INFN-funded research initiative aiming to implement an observational study assessing a possible statistical association between environmental pollution and Covid-19...
“One platform, multi Centers” is a distributed computing platform in China operated by the staff of the IHEP Computing Center. It consists of 10 distributed computing centers belonging to HEP-related institutes and departments. The IHEP computing center in Beijing and the big data center of the IHEP CSNS branch in Guangdong Province contribute 90% of its computing and storage resources, while...
The ever-increasing amount of data produced by modern scientific facilities such as EuXFEL or the LHC puts high pressure on the data management infrastructure at the laboratories. This includes archival storage resources that are difficult to share, typically tape libraries. To achieve maximal efficiency of the available tape resources, a deep integration between hardware and software components...
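One way to make such hardware/software integration concrete is tape-aware recall scheduling: batching restore requests per cartridge so that each tape is mounted only once and read in on-tape order. The following is a hypothetical sketch; the RecallRequest fields and the catalogue lookup are illustrative placeholders and do not correspond to any particular product's interface.

    # Hypothetical illustration: group pending file recalls by tape cartridge
    # so that each tape is mounted once and files are read in on-tape order.
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class RecallRequest:
        path: str        # logical file name requested by a user or workflow
        tape_id: str     # cartridge holding the file (from a tape catalogue)
        position: int    # position of the file on that cartridge

    def schedule_recalls(requests):
        """Return batches of requests, one batch per tape, sorted by position."""
        by_tape = defaultdict(list)
        for req in requests:
            by_tape[req.tape_id].append(req)
        # Within each cartridge, reading in on-tape order avoids repeated seeks.
        return [sorted(batch, key=lambda r: r.position) for batch in by_tape.values()]

    # Example: three requests touching two cartridges yield two mount batches.
    pending = [
        RecallRequest('/exp/run1/file_a', 'VOL001', 42),
        RecallRequest('/exp/run2/file_b', 'VOL002', 7),
        RecallRequest('/exp/run1/file_c', 'VOL001', 5),
    ]
    for batch in schedule_recalls(pending):
        print([r.path for r in batch])

The efficiency gain comes from minimizing mounts and seeks, which is only possible when the software layer can see the tape-level placement information normally hidden inside the hardware and library stack.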
The Science Data and Computing System is designed for the High Energy Photon Source (HEPS) and is responsible for automating the organization, processing, persistent storage, analysis, and distribution of the data produced by experiments. It consists of several sub-systems, including the storage system, network, computing system, data analysis integrated software system (Daisy), and data management...