ABSTRACT:
The 'next generation of the Internet' will have to serve machines much like the current Internet serves people. Preferably we should avoid some of the apparent downsides of the current situation, where a few major players dominate the Internet and the Web. This is why we started the Leiden Initiative for FAIR and Equitable Science, LIFES. In this not-for-profit, globally oriented...
German large-scale research facilities like DESY, with its PETRA III synchrotron beamlines, are strong partners for universities conducting cutting-edge experiments, which generate vast amounts of data that often exceed the universities’ storage and compute capacities for analysis. Transferring data over wide area networks is becoming ever more impractical and expensive with the growing amount...
The proliferation of distributed space systems, encompassing large-scale satellite constellations and federated ground segments, is generating unparalleled opportunities for global open science, particularly in the domains of earth observation, intelligence, security, and defense. However, this federated model poses significant challenges to operational integrity and data trustworthiness. The...
The European Open Science Cloud (EOSC) aims to create a federated, interoperable ecosystem for research data and services, enabling seamless access and reuse across disciplines. The components of this infrastructure are the EOSC nodes: distributed, compliant, and interoperable service providers from research infrastructures that make up EOSC as a whole. This work presents the design,...
Experimental data generated by scientific facilities such as synchrotron radiation and neutron sources provide a fundamental resource for scientific research. With the deepening integration of massive scientific data and artificial intelligence (AI) technologies, the paradigm of scientific inquiry is undergoing a transformative shift. This evolution places higher demands on the integrated...
Taiwan indigenous peoples (TIPs) are a branch of Austronesians or Polynesians. A persistent lack of data on TIPs has left them isolated, marginalized, underdeveloped, and thus an “invisible”, hard-to-reach population in the real world. I will address how scientific computing, data science, and open science are applied to build TIPD (https://osf.io/e4rvz) big data based on Taiwan’s household...
As supercomputing facilities increasingly run data analytics and artificial intelligence (AI) workloads, efficient handling of external data sources, storage, and data flows becomes paramount. However, this is often addressed only through optimization of parallel file systems, network connectivity and I/O libraries. In industry, it is common practice to support data-driven applications...
IFARM is a computing farm operated by GSDC/KISTI that supports computational experiments for several Korean scientific communities. IFARM has been in service since 2019, integrating the previously separate computing-farm services that had been provisioned for each scientific community. As of the end of 2025, the service targets are the CMS Tier-3 and ALICE Tier-3 communities. By mid-2025,...
The Subset Sum Problem (SSP) asks whether there exists a subset of a given set $S$ of integers such that the sum of the elements in the subset equals a predefined target $t$. The SSP is one of Karp's NP-complete problems and has several real-life applications. In previous works, we have shown that the SSP can be easily mapped onto a Quadratic Unconstrained Binary Optimization (aka QUBO)...
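For orientation, one standard way to cast the SSP as a QUBO (a sketch in the abstract's notation, not necessarily the authors' exact penalty formulation) introduces binary variables $x_i \in \{0,1\}$, with $x_i = 1$ meaning that element $s_i \in S$ is selected, and minimizes the squared deviation from the target: $E(x) = \big(\sum_{i=1}^{n} s_i x_i - t\big)^2 = \sum_{i=1}^{n} s_i (s_i - 2t)\, x_i + 2 \sum_{i<j} s_i s_j\, x_i x_j + t^2$ (using $x_i^2 = x_i$); the instance is a yes-instance exactly when the minimum of $E(x)$ is zero.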
Machine learning and physics have long been deeply intertwined, and there have been eras when their relationship came to the forefront. Even in today’s revolutionary AI development, physics has played a significant role—for example, in diffusion models. From a physics standpoint as well, an integrative perspective across various specialized domains is provided by innovative new...
Artificial intelligence is transforming the practice of science. From protein structure prediction to climate modelling and high-energy physics, AI techniques are accelerating discovery, enabling researchers to extract insights from datasets of unprecedented scale and complexity. At the same time, the infrastructure demand of AI-driven research (large-scale GPU computing, access to...
Commercial software development today happens almost exclusively in agile teams, using either the SAFe or Scrum framework. AI agents can communicate via the Model Context Protocol (MCP) and take on different roles, including developer, Scrum Master, Product Owner and QA roles. The presentation discusses how to set up agile teams of agentic AIs, giving human developers the opportunity to let a group of AIs develop a...
Models of physical systems simulated on HPC clusters often produce large amounts of valuable data that need to be safely managed, both during a research project's ongoing activities and afterwards. To help derive the most benefit for scientific advancement, we apply results of the Horizon Europe project EXA4MIND, using its tools to manage Particle-In-Cell simulation data from research...
Reliable uncertainty quantification is essential in scientific applications, where predictive results must be supported by a transparent assessment of confidence. Among the many approaches proposed for this purpose, Conformal Prediction (CP) is especially compelling because it offers finite-sample, distribution-free coverage guarantees and can calibrate uncertainty on top of any trained model...
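To make the split-conformal recipe concrete, the following minimal sketch (placeholder names, any pre-trained regressor with a predict method; not the contribution's actual pipeline) calibrates absolute-residual scores on held-out data and returns intervals with finite-sample coverage of at least 1 - alpha:

```python
import numpy as np

def split_conformal_interval(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction for regression (illustrative sketch).

    model         : any already-trained regressor exposing .predict()
    (X_cal, y_cal): a held-out calibration set never used for training
    Returns per-point lower/upper bounds with coverage >= 1 - alpha.
    """
    # Nonconformity scores on the calibration set: absolute residuals.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample corrected quantile level.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, q_level, method="higher")
    preds = model.predict(X_test)
    return preds - q, preds + q
```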
Future collider experiments will require fast, scalable, and highly accurate calorimeter simulation to cope with unprecedented event rates and detector granularity. While machine-learning-based simulation has become a central strategy, the next step may come from quantum-native generative models capable of learning expressive, bijective mappings between physics parameters and detector...
The increasing demands on simulation statistics for HL-LHC analyses challenge the scalability of traditional calorimeter simulation within all LHC collaborations. Fast simulation techniques based on machine learning have proven effective, yet further improvements may arise from quantum-inspired models.
In this study we investigate the feasibility of integrating Quantum Neural Network (QNN)...
Spatio-temporal data mining is effective for extracting useful information from the occurrence frequencies and patterns of real-world physical phenomena. The author has previously proposed a spatio-temporal and categorical data mining method that not only extracts occurrence frequencies and patterns from spatio-temporal features, but also performs semantic interpretation of relationships...
Scientific Computing and Data Facilities (SCDF) at Brookhaven Lab began in
1997 when the Relativistic Heavy Ion Collider (RHIC) and ATLAS Computing Facility was established. The full-service scientific computing facility has since supported some of the most notable physics experiments, including Broad RAnge Hadron Magnetic Spectrometers (BRAHMS), Pioneering High Energy Nuclear Interaction...
Forty months after its inception under the Italian National Recovery and Resilience Plan (NRRP), the National Centre for High-Performance Computing, Big Data, and Quantum Computing - established and managed by the ICSC Foundation - has reached a mature and productive phase, consolidating its role as a strategic infrastructure for research, innovation, and industrial competitiveness. The...
Modern Earth Observation (EO) platforms integrate diverse distributed components and scientific workflows across heterogeneous cloud environments. Ensuring software security, maintainability and rapid delivery within such complex systems represents a major operational challenge. To address this, we developed an AI-assisted DevSecOps framework that augments continuous integration and deployment...
The KM3NeT Collaboration is building two Cherenkov neutrino detectors in the depths of the Mediterranean Sea to study both the intrinsic properties of neutrinos and cosmic high-energy neutrino sources. Neutrinos are elementary particles with no electric charge and almost no mass, interacting only through the weak force. These characteristics allow the neutrino particles to travel straight...
Modern scientific workflows increasingly rely on Machine Learning (ML) models whose development, deployment, and validation must meet high standards of reliability, transparency, and reproducibility. Yet many scientific ML pipelines still lack robust engineering practices, making experiments difficult to track, compare, and replicate. In this contribution, we present a structured MLOps...
The rapid progress in quantum technologies and the advent of Noisy Intermediate-Scale Quantum (NISQ) devices have inaugurated a new computational paradigm. Concurrently, Artificial Intelligence continues to drive transformative advances across research and industry. Recent studies have demonstrated that the integration of these two paradigms can yield mutual benefits, fostering the exploration...
2025 is widely recognized as the Year of the AI Agent. Large language models have moved beyond conversational interfaces to become callable tools that boost productivity—evident in the rapid adoption of systems like Manus, Claude Code, and Cursor. AI Agent technologies are also increasingly being applied in scientific research to assist in data analysis and literature exploration, as...
The massive data throughput at the High Energy Photon Source (HEPS) exposes critical I/O bottlenecks inherent in the conventional "write-then-read" data handling paradigm, severely limiting analytical throughput. To address this challenge, we have designed and developed LightVortex, an end-to-end real-time data feeding platform for HEPS. LightVortex implements a coherent processing pipeline...
Distributed Acoustic Sensing (DAS) is revolutionizing the Earth and environmental sciences by repurposing fiber-optic infrastructure into ultra-dense seismic networks that provide unprecedented spatiotemporal resolution of the Earth's subsurface and environment. This technology is driving transformative scientific breakthroughs, ranging from precision earthquake localization...
As part of the strategic refactoring and modernization of the INFN Cloud orchestration system, the Federation Manager has been developed to enhance the flexibility, scalability, and interoperability of the distributed DataCloud infrastructure. This initiative represents a key step in the long-term evolution of INFN Cloud toward a more modular, service-oriented architecture capable of...
NACO is a comprehensive Python-based validation tool designed to ensure
compliance with attribute requirements, such as NFDI or AARC-G056. The
system provides dual interfaces - a command-line tool (naco) and a web
service (naco-web) - for validating OIDC tokens and SAML assertions
against configurable attribute specifications.
The tool supports flexible attribute checking through...
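As a purely illustrative sketch of attribute checking of the kind described above (the attribute names, the specification format, and the function below are hypothetical and do not reflect NACO's actual interfaces), decoded token or assertion claims can be compared against a configurable specification like this:

```python
# Hypothetical attribute specification; the real spec format may differ.
REQUIRED_ATTRIBUTES = {
    "eduperson_entitlement": {"must_contain": "urn:mace:example.org:group:researchers"},
    "eduperson_assurance": {"one_of": ["https://refeds.org/assurance/IAP/medium",
                                       "https://refeds.org/assurance/IAP/high"]},
}

def validate_claims(claims: dict, spec: dict = REQUIRED_ATTRIBUTES) -> list:
    """Return a list of human-readable violations (empty list == compliant)."""
    problems = []
    for attr, rule in spec.items():
        values = claims.get(attr)
        if values is None:
            problems.append(f"missing attribute: {attr}")
            continue
        if isinstance(values, str):
            values = [values]
        if "must_contain" in rule and rule["must_contain"] not in values:
            problems.append(f"{attr}: required value not present")
        if "one_of" in rule and not set(values) & set(rule["one_of"]):
            problems.append(f"{attr}: none of the accepted values present")
    return problems
```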
Domain shift occurs when the distributions of features, underlying behaviours or operational conditions differ between source (training) and target (test) domains, causing models to struggle when applied to data from a different context than the one used for training. To mitigate this, several transfer learning approaches have been proposed to reuse and adapt knowledge acquired in the source...
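One common transfer-learning recipe of this kind, shown here only as a generic sketch (PyTorch, placeholder names; the approaches discussed in this abstract may differ), freezes the representation learned on the source domain and retrains a lightweight head on labelled target data:

```python
import torch
import torch.nn as nn

def adapt_to_target(pretrained: nn.Module, feat_dim: int, n_target_classes: int):
    """Freeze a source-domain feature extractor and fine-tune only a new head.

    Assumes `pretrained` maps an input batch to (batch, feat_dim) features;
    names are placeholders for illustration only.
    """
    for p in pretrained.parameters():          # keep source-domain knowledge fixed
        p.requires_grad = False
    model = nn.Sequential(pretrained, nn.Linear(feat_dim, n_target_classes))
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3)
    return model, optimizer
```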
The rapid growth of data-intensive life science research demands infrastructures and services that guarantee security, compliance, and interoperability across federated environments. EPIC Cloud (Enhanced PrIvacy and Compliance Cloud) is the highly secure region of INFN DataCloud and forms the backbone of the Italian EOSC national node. Designed to meet stringent privacy and data...
The increasing complexity and scale of modern data centers generate operational environments where the ability to detect anomalies, anticipate failures, and optimize resource usage is becoming critically important. Recent advances in machine learning and artificial intelligence offer powerful techniques for extracting actionable insights from heterogeneous monitoring data, ranging from logs...
The integration of artificial intelligence (AI) into biomedical research is transforming the analysis of complex datasets such as high-resolution images of tumor tissues. As part of a collaboration between the Italian EOSC and BBMRI-ERIC nodes, INFN and BBMRI-ERIC have launched a joint initiative to define and deploy a secure and scalable infrastructure capable of supporting AI-driven...
The Worldwide Large Hadron Collider Computing Grid (WLCG) community’s deployment of IPv6 on its worldwide storage infrastructure is very successful and has been presented by us at earlier ISGC conferences. The campaign to deploy IPv6 on CPU services and all worker nodes is progressing well. Dual-stack IPv6/IPv4 is not, however, a viable long-term solution; the HEPiX IPv6 Working Group has...
An astronomical observatory requires not only state-of-the-art telescopes but also robust computing infrastructure to archive and analyze the vast amounts of astronomical observation data. Consequently, optimizing the operation of these computing systems is a crucial issue. Adopting public cloud services is expected to reduce the Total Cost of Ownership (TCO) and allow the use of cutting-edge...
Supported by the work of the Security Operations Centre Working Group, reported at previous ISGC conferences, we have built experience across the Research & Education community in deploying security operations centre capabilities. This work has used the reference design of the working group, developed to provide guidance on the elements required to deploy a Security Operations Centre, in...
The FAIR principles provide a foundational framework for ensuring that scientific data is accessible and reusable, and their implementation is a central objective of the European Open Science Cloud (EOSC). However, enabling access to sensitive or confidential data while simultaneously preserving privacy, confidentiality, and usability for researchers remains an open challenge. Existing...
Large scientific facilities, such as synchrotron radiation facilities (e.g., BSRF, HEPS) and spallation neutron sources (e.g., CSNS), are generating massive, complex, and heterogeneous datasets continuously during routine operations and scientific experiments. Managing and utilizing the diverse experimental data, along with simulation results and literature-derived information, is presenting a...
The JUNO Experiment officially started data acquisition at the end of 2024, producing approximately 8 TB of raw data daily, all of which must be transmitted to the offline computing platform in Beijing in real time. In production, however, transmission performance can be unstable, and existing monitoring methods cannot quickly identify the root causes....
The O-CEI project is a forward-thinking initiative aimed at addressing key challenges in the European supply chain by creating an open and cross-sector Cloud-Edge-IoT (CEI) continuum platform. O-CEI uses and upgrades technological innovations from previous successful projects (i.e. the existing cloud technology spectrum, including meta-operating systems, cognitive cloud technologies, and...
The Jiangmen Underground Neutrino Observatory (JUNO) is a 20 kton liquid scintillator detector located in South China. Its construction was completed in November 2024. JUNO aims at an unprecedented 3% energy resolution at 1 MeV to determine the neutrino mass ordering and study oscillation phenomena with high precision.
To support this ambitious physics program, JUNO relies on a Distributed...
The European Centre for Medium-Range Weather Forecasts (ECMWF) operates as both a world-leading provider of global numerical weather predictions and a major research organization advancing Earth system science.
Within its mission to deliver reliable data, forecasts, and insights for the benefit of society, ECMWF is actively involved in several European initiatives such as...
The ability to ingest, process, and analyze large datasets within minimal timeframes is a cornerstone of modern big data applications. In High Energy Physics (HEP), this need becomes increasingly critical as the upcoming High-Luminosity phase of the LHC at CERN is expected to produce data volumes approaching 100 PB per year. Recent advancements in resource management and open-source computing...
The National Institute for Nuclear Physics (INFN) manages the INFN Cloud, a federated cloud platform offering a customizable portfolio of IaaS, PaaS, and SaaS services tailored to the needs of the scientific communities it supports. The PaaS services are defined through an Infrastructure as Code approach, combining TOSCA templates to model application stacks, Ansible roles for automated...
A heterogeneous data management and analytics platform has been deployed over the years at INFN-CNAF in Bologna, with the goal of supporting both system administrators and user-support teams in monitoring, diagnostics, and operational decision-making across the data center, which serves multiple particle and astroparticle experiments (and beyond) in Italy and internationally. The platform is...
Ensuring the reproducibility of physics results is a critical challenge in high-energy physics (HEP). In this study, we aim to develop a system that automatically extracts analysis procedures from HEP publications and generates executable analysis code capable of reproducing published results, leveraging recent advances in large language models (LLMs).
Our approach employs open-source LLMs...
Modern instruments generate enormous datasets, but scientific insight depends on how quickly and collaboratively those data can be processed. DECTRIS CLOUD is shaping a future where researchers no longer move data, but move science into the cloud. The platform provides ready-to-use environments where experimental data, processing pipelines, and computational resources are instantly...
The Helmholtz Model Zoo (HMZ) is a cloud-based platform that provides remote access to deep learning models within the Helmholtz Association. It enables seamless inference execution via both a web interface and a REST API, lowering the barrier for scientists to integrate state-of-the-art AI models into their research.
Scientists from all 18 Helmholtz centers can contribute their models to...
This study investigates how LLMs can act as generative reasoning engines to interpret and restructure complex archival document collections. Building upon earlier work with the President’s Personal File (PPF 9: Gifts) from the Franklin D. Roosevelt Presidential Library, the research explores how an LLM can infer relationships, sequences, and contextual features from textual and descriptive...
The rapid spread of large language models (LLMs) in higher education has intensified discussions about their promise as instructional support tools and their risks as enablers of academic misconduct. Depending on how they are used, LLMs can assist instructors in developing more efficient learning and evaluation materials and help students prepare for a test, or they can undermine...
Sustained investment in research software requires moving beyond downloads and citations to demonstrate impact. This presentation uses the WeNMR platform — a VRE for structural biology with over 70,000 users — as a case study for the "Research-Software-as-a-Service" (RSaaS) model. We will detail how the RSaaS model provides visible and quantifiable impact. By moving software into a centralized, managed...
Multivariate time series forecasting often suffers from noise interference, inconsistent dynamics across variables, and limited capacity to capture both short-term fluctuations and long-term trends. This paper proposes a novel framework that addresses these challenges through three coordinated modules. First, a channel-wise modulation mechanism selectively filters anomalous patterns by...
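For illustration only, a channel-wise modulation step can be as simple as a learned per-variable gate that down-weights noisy channels (a generic sketch, not the proposed module):

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Learn one gate per variable and scale its contribution accordingly."""
    def __init__(self, n_channels: int):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(n_channels))  # one logit per channel

    def forward(self, x):                     # x: (batch, time, channels)
        return x * torch.sigmoid(self.gate)   # gate broadcast over batch and time
```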
Advanced computing infrastructures offer unparalleled computing capabilities and effectively support a multitude of computing requirements across diverse fields such as scientific research, big data analysis, artificial intelligence training and inference, and many more. Secure Shell (SSH) is a widely used method for accessing remote computing resources. Aiming for efficient and secure access...
Universities and research institutes have been early adopters of IPv4, which
has served scientific research infrastructure well in the past. But now the
time has come to let go of this legacy protocol, with its awkward limits, and
phase it out in favour of IPv6.
The Worldwide LHC Computing Grid (WLCG) is halfway through the transition
from IPv4 to IPv6, with almost all services now being...
Large-scale scientific experiments increasingly depend on trustworthy data infrastructures and AI-driven workflows to extract reliable insights from complex sensor systems. In the ATLAS experiment at CERN’s LHC, each collision generates heterogeneous information across multiple subdetectors, making the accurate integration of these data streams essential for ensuring reconstruction quality and...
The European Open Science Cloud (EOSC) creates a federated environment where researchers across Europe can publish, find, and reuse data, tools, and services. Implemented as a 'system of systems', EOSC connects data repositories, research infrastructures, and e-infrastructures through a network of national and thematic nodes. The Dutch EOSC Node, operated by SURF, is one of the initial...
The Authentication and Authorisation for Research and Collaboration (AARC) community, funded by the AARC TREE project, is releasing the AARC Compendium, a comprehensive introductory guide to implementing federated identity management for research infrastructures and their communities. Building on the AARC Blueprint Architecture (AARC BPA), the Compendium bridges the gap between technology,...
In this contribution, we present the development of a Virtual Research Environment (VRE) for the Einstein Telescope (ET) project, specifically in the Bologna research unit, designed to support collaborative, high-performance, and reproducible research within the ET community. The Einstein Telescope is a next-generation underground gravitational-wave observatory that will explore the Universe...
Modern data centres are cyber-physical infrastructures whose reliable operation depends on the continuous interaction of electrical and mechanical subsystems. Detecting anomalous behaviour in these environments is essential for ensuring operational continuity, improving energy efficiency, and enabling early fault prediction.
We present an anomaly-detection case study using Spatio-Temporal...
In this presentation we will share our experience in providing training for
security personnel who provide operational security for different types of large
distributed infrastructures.
Depending on the target audience and the topics to be addressed, these trainings
were developed in two categories: technical hands-on training and tabletop
exercises.
In the first category either a technical...
The IHEP computing platform is facing new demands in data analysis, including restricted access to login nodes, increasing needs for code debugging tools, and more efficient data access for collaborative workflows. To address these challenges, we have developed INK, a web-based "Interactive aNalysis worKbench" that enables users to access IHEP login nodes, cluster computing resources, and data...
Network security operations at the Institute of High Energy Physics (IHEP) face severe challenges, including massive data volumes, high alarm complexity, and low manual processing efficiency. While the current Security Operations Center (SOC) system at IHEP has improved cybersecurity operational efficiency to a certain extent through big data platforms and automated workflows, its intelligence...
This paper presents a multilayer architectural model designed to support secure, AI-driven processing of confidential legal documents. The platform integrates strong authentication and authorization mechanisms, ensuring controlled access to sensitive information in accordance with privacy and security requirements. A key component is an automated contract-generation module based on predefined...
With the deepening application of large artificial intelligence (AI) models in high energy physics (HEP) data analysis, both data access patterns and storage architectures are undergoing profound transformation. Traditional file systems face performance and scalability bottlenecks when dealing with unstructured data, high-concurrency access, and cross-site data sharing. Object storage, with...
With the continuous expansion of large scientific facilities and critical research infrastructures, the security operations of their network systems face growing challenges such as massive data volumes, high alarm complexity, and low efficiency of manual processing. Traditional security operations methods rely heavily on human analysis and experience, which are inadequate...
The Institute of High Energy Physics (IHEP) currently has over 2,000 network devices. Traditional network management methods are time-consuming, labor-intensive, and unable to quickly locate problems in network operations and maintenance. This report introduces an intelligent and visualized network management platform for IHEP based on the open-source network management tool Netdisco. Netdisco...
Virtual Research Environments (VREs) constitute an essential aspect in attaining globally impactful research. With rapid environmental changes worldwide, including global warming, significant challenges that require collaboration between the environmental sub-domains need to be addressed. While workflows from individual Research Infrastructures (RIs) exist, environmental change is a global...
Devising a multi-workflow scheduling algorithm is paramount to exploiting the performance of High Performance Computing environments. In this article, we propose a new list scheduling algorithm that takes fairness into consideration when assigning multiple workflows to heterogeneous processors. The proposed algorithm, Fairness-aware Improved HEFT (Fair-IHEFT), is devised to schedule multiple...
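For context, list schedulers in the HEFT family order tasks by an upward-rank priority before mapping them to processors; the sketch below shows that classic priority computation only (Fair-IHEFT's fairness-aware extensions are not reproduced here):

```python
from functools import lru_cache

def upward_ranks(succ, avg_cost, avg_comm):
    """Classic HEFT upward rank: rank(t) = w(t) + max_s (c(t, s) + rank(s)).

    succ[t]          : successors of task t in the workflow DAG
    avg_cost[t]      : average computation cost of t over all processors
    avg_comm[(t, s)] : average communication cost of edge t -> s
    """
    @lru_cache(maxsize=None)
    def rank(t):
        followers = tuple(succ.get(t, ()))
        if not followers:                      # exit tasks: rank equals own cost
            return avg_cost[t]
        return avg_cost[t] + max(avg_comm[(t, s)] + rank(s) for s in followers)

    return {t: rank(t) for t in avg_cost}
```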
In network intrusion detection systems, alert logs generated by intrusion detection devices contain a large number of false positive alert logs, which seriously impair the accuracy of security incident analysis. Thus, filtering false positive alert logs is of great significance. The essence of false positive alert filtering is a classification task: each alert log is labeled to indicate...
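Framed that way, even a simple supervised baseline fits the task; the sketch below (scikit-learn, illustrative feature and model choices, not the method proposed in this work) trains a classifier that flags alerts predicted to be false positives:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_alert_filter(alert_texts, labels):
    """Binary classifier over raw alert-log text (label 1 = false positive)."""
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000, class_weight="balanced"))
    return clf.fit(alert_texts, labels)

# Hypothetical usage: keep only alerts predicted to be true positives.
# clf = build_alert_filter(train_texts, train_labels)
# kept = [a for a in new_alerts if clf.predict([a])[0] == 0]
```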
The proposed topics of FAIR mainly focus on technology issues, such as making data FAIR in a machine-actionable way or generating metadata automatically, not only by humans but also by algorithms.
Nevertheless, how to impose FAIR principles on researchers and research institutions so that their data become easily findable and accessible is indeed a policy issue, and a governance framework or a guideline is...
The experimental data generated by the High Energy Photon Source (HEPS) is characterized by massive scale and high diversity, which imposes severe challenges on the real-time efficiency and long-term storage of data processing pipelines. Efficient data serialization and compression are critical to data transmission and storage, yet traditional fixed strategies fail to adapt to the diverse...
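One simple form such an adaptive strategy can take, sketched here with standard-library codecs as placeholders (not the mechanism developed for HEPS), is to probe a small sample of each dataset and select a codec per dataset rather than applying one fixed setting:

```python
import time
import zlib
import lzma

# Candidate codecs are placeholders; a real deployment would probe whatever
# codecs are available in the storage pipeline.
CANDIDATES = {
    "zlib-1": lambda buf: zlib.compress(buf, 1),
    "zlib-6": lambda buf: zlib.compress(buf, 6),
    "lzma":   lambda buf: lzma.compress(buf),
}

def choose_codec(sample: bytes, time_budget_s: float = 0.5) -> str:
    """Pick the codec with the best ratio on a sample, within a time budget."""
    best_name, best_ratio = "zlib-1", 0.0
    for name, compress in CANDIDATES.items():
        start = time.perf_counter()
        ratio = len(sample) / max(1, len(compress(sample)))
        if time.perf_counter() - start <= time_budget_s and ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name
```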
Proxmox VE (PVE) is a lightweight, decentralized virtualization platform based on KVM and LXC, suitable for small and medium-sized data centers. We deployed a small PVE cluster combined with an NVMe-based Ceph cluster to evaluate PVE's reliability and robustness, and it has been operating stably for over a year since its launch. We also developed a simple management system to...
On 3 April 2024, a moment magnitude (Mw) 7.2 earthquake struck Hualien, Taiwan, with a focal depth of 22.5 km. The event generated a tsunami that reached 89 cm at the Wushi tide gauge and produced a run-up height of approximately 2.5 m along the coastal area of Yanchao Village in Shoufeng Township. Fortunately, the tsunami occurred during low tide, resulting in no casualties. Nevertheless,...