AI and Hybrid Quantum - Shaping Tomorrow’s Scientific Breakthroughs
The rapid evolution of AI is reshaping scientific discovery across a wide range of research fields. At ISGC 2025, the focus will be on the practical deployment of AI technologies, particularly how they have been successfully integrated to accelerate research and address complex challenges. From machine learning models to large language models (LLMs), AI's transformative impact is undeniable, and the conference will showcase real-world use cases and share valuable lessons learned in deploying these technologies. By highlighting these practical applications, ISGC 2025 aims to provide participants with actionable insights into how AI can be applied to enhance their own research and drive innovation in their fields.
In parallel, hybrid quantum computing is emerging as a promising game changer, similar to the rise of AI in recent years. As quantum computing continues to complement traditional supercomputers, it holds the potential to solve problems that were previously thought intractable. ISGC 2025 will explore the current state of hybrid quantum systems, discussing their deployment and the challenges in integrating these cutting-edge technologies. Together, AI and quantum computing represent the future of computational science, offering unprecedented opportunities for innovation.
ISGC 2025 will foster an environment for sharing ideas and solutions, helping participants refine their approaches and collaborate to overcome the challenges of deploying these transformative technologies.
Organizers:
Tobias Dussa, Sven Gabriel, David Groep, Daniel Kouril, Maarten Kremers, Sascha Kriebitzsch, Davide Vaghetti, Martin Waleczek, Marcus Hardt
Format: Workshop, presentations and tabletop exercise
A key element of international research projects using distributed (compute) infrastructures is an Authentication and Authorization Infrastructure. A typical problem here is that Service Providers (SPs) need to take an authorization decision based on the identity information provided by the user, who in the general case is neither personally known to the service provider nor necessarily in the same country as the Service Provider. eduGAIN addresses this by providing an interfederation service that connects identity federations around the globe, allowing users to use their home organisation's authentication credentials, managed by the organisation's Identity Provider (IdP), to access services provided by another institution.
A typical setup in research communities is the use of an IdP-SP proxy service. This service often makes use of token technologies, which add another dimension of challenges for IT security incident response in the field of Federated Identity Management. During the workshop we will give an introduction to token technologies and show how to extract and make use of the relevant information from IdP and SP log files.
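To give a flavour of this hands-on part, the sketch below scans a log file for a reported identifier. It is a minimal illustration only: the log line format, field names, and invocation are hypothetical, since every IdP/SP product writes its own layout.

import re
import sys

# Hypothetical log line format; real IdP/SP software (Shibboleth,
# SimpleSAMLphp, ...) each write their own layout, so adjust the pattern.
LINE = re.compile(r"(?P<ts>\S+ \S+) .*subject=(?P<subject>\S+) .*entityID=(?P<entity>\S+)")

def find_reported_id(logfile, reported_id):
    """Yield (timestamp, entityID) for every line matching the reported subject."""
    with open(logfile) as fh:
        for line in fh:
            m = LINE.search(line)
            if m and m.group("subject") == reported_id:
                yield m.group("ts"), m.group("entity")

for ts, entity in find_reported_id(sys.argv[1], sys.argv[2]):
    print(f"{ts}  seen at {entity}")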
Federated Identity Management is subject to a variety of threats which need to be addressed by the eduGAIN Computer Security Incident Response Team (CSIRT).
In this workshop we will give an introduction to the eduGAIN service: how it is organised, the IT security responsibilities of the major roles supporting the service, and the frameworks enabling the coordination of incident response across the federated organisations (SIRTFI).
The learning objectives (what the participants should learn) include:
* How eduGAIN is organised, the role of the Federations, and the eduGAIN CSIRT.
* SIRTFI v2 and how to apply it.
* Tokens, the technologies used, and what information is available in the log files.
* IdP/SP log file analysis (checking for/finding a reported ID).
* The risks of Federated Identity Management.
After that, the participants will take on the described roles and apply the IT security incident response concepts presented before in a Tabletop Exercise (TTX) set-up. Although it is a made-up scenario, it is composed of real-world incidents the authors had to deal with. Since the goal here is to find possible issues in the eduGAIN Incident Response Procedure, we invite the participants to help us find possible dead ends on the way to IT security incident resolution.
In the second half of the workshop we will look at the broad topic of "Risk Management" in eduGAIN and collaboratively navigate the terminology used there, aiming at an outcome that provides a better view of the risks associated with the use of federated identity management, along with possible means (security measures) to increase the resilience of the relevant services.
After an introduction to the whole seven-step process of the IT Security Risk Management Methodology (ITSRM), which is based on the ISO 27k standards, we will focus on the process steps where IdP, SP, and IdP-SP proxy managers can provide input to a risk study. These are, in particular, Risk Identification and Risk Analysis/Evaluation, which together fall under "Risk Assessment" in ISO 27k.
The learning objectives (what the participants should learn) include:
* General concepts and terminology of Risk Management according to ISO/IEC 27001, 27005 and 31010.
* The process steps of ITSRM and how the Risk Assessment fits into the overall process.
* How to arrive at a basic view of the risk landscape resulting from the Risk Assessment.
For Online Participants (Zoom):
https://ds-musashino-u.zoom.us/j/94552838552?pwd=wBxKegE9k8ZXXFw09BKTLt3SOMbwmb.1
ID: 945 5283 8552
PASSCODE: 1MBQia
Organizers: Tosh Yamamoto (Kansai University of International Studies), Yasuhiro Hayashi (Musashino University), Zhi Zhang (Kansai University of International Studies), Thapanee Thammetar (Silpakorn University, Thailand), Juling Shih (Network Learning Technology, National Central University, Taiwan), Jintavee Khlaisang (Chulalongkorn University)
Abstract
This workshop explores the ascension of authentic learning in post-pandemic education, enhanced by illuminative AI. It
addresses essential factors and fundamental components of
authentic learning from educators with diverse global teaching experiences. Today's university students, having experienced lockdowns during crucial developmental years, often lack fully
developed social and communication skills due to limited in-person interactions. Traditional educational models focusing solely on explicit knowledge through rote memorization are now obsolete. The workshop aims to redesign AI-illuminated liberal education, incorporating active and authentic learning within global and constructive paradigms. It features innovative
showcases from experts in Collaborative Online International Learning (COIL), English-mediated instruction (EMI), STEAM education, and game-based global issue discussion activities. Renowned educators from various educational tiers share AI experiences of teaching and learning to help attendees reshape their mindsets for the ascension of authentic learning in the post-pandemic era. By addressing these critical issues, the workshop seeks to equip educators with strategies to nurture essential skills in students and prepare them for the
challenges of a rapidly evolving, post-pandemic world. Furthermore, as a concrete approach to promoting our research and educational collaboration, we will discuss international curriculum design and credit transfer using online university education and research frameworks.
Participants:
Dr. Tosh Yamamoto, Kansai University, Japan
Dr. Tashi, Royal University of Bhutan, Bhutan
Dr. Ru-Shan Chen, Chihlee University of Technology, Taiwan
Dr. Juling Shih, National Central University, Taiwan
Dr. Tasi Chen Hung (Jerry), National Central University, Taiwan
Dr. Thapanee Thammetar, Silpakorn University, Thailand
Dr. Jintavee Khlaisang, Chulalongkorn University, Thailand
Dr. Anuchai Theeraroungchaisri, Chulalongkorn University, Thailand
Dr. Anirut Satiman, Silpakorn University, Thailand
Dr. Rusada Natthaphatwirata, Prince of Songkla University, Thailand
Dr. Prakob Koraneekij, Chulalongkorn University, Thailand
Mr. Koji Yamada, Japan International Cooperation Agency
Dr. Yasuhiro Hayashi, Musashino University, Japan
Dr. Virach Sornlertlamvanich, Musashino University, Japan
Mr. Takashi Kumagai, Musashino University, Japan
Tosh explores the need for a paradigm shift in education to address the evolving demands of the post-pandemic era. By critically examining current challenges and issues in education, he introduces innovative curriculum designs aimed at fostering future-ready learners. Key topics discussed include:
(i) What skills are needed for success in the future?
(ii) What to learn? The importance of tacit knowledge: Future skills go beyond acquiring
explicit knowledge; tacit knowledge plays a crucial role in human resource development.
(iii) Creating a learning ambiance for active and authentic learning: How can we implement these skills effectively in real-world, meaningful learning environments?
These insights provide a foundation for reimagining education to better prepare learners for the complexities of the future.
This workshop aims to identify the essential issues for authentic education, especially in the realm of authentic learning in the Post-Pandemic era, and then showcase innovative educational practices and their progress. We intend to share some successful educational experiences with participants so that they will be inspired to get involved in conducting authentic learning for the future generation.
Renowned and experienced educators will share their experiences and report progress in their educational plans.
2.1. Showcase Presentations and Discussions
(i) Future Skills
Session 1: Authentic learning enhanced with AI and Meta-Learning Learning Analytics
Dr. Tosh Yamamoto, Kansai University
Dr. Yasuhiro Hayashi, Musashino University
A Proposal for Implementing Authentic Assessment Enhanced with Academic Integrity in New Education Normal
From the perspective of qualitative evaluation methods in learning analytics, using machine learning text mining techniques, we propose that it may be possible to realize a new authentic evaluation approach for classroom active learning. Learning evaluation in the active learning paradigm cannot rely heavily on traditional summative, quantitative methods; instead, formative and qualitative evaluations that reflect the learning process, as evidenced in learning outcomes or footprints, become the main part of assessment. Therefore, we propose an approach that analyzes the written reflections of the students themselves at the end of the course, developing a learning evaluation method that involves the main educational stakeholders, i.e., the students, as the main players in the evaluation process. This paper deals with an evaluation strategy for authentic assessment that incorporates academic integrity, with reference to a strategy piloted in the 2021 academic year.
Keywords:
Data Scientific Approach to Assessment for Learning
Making Students Ready for the VUCA-full Future
Balancing Theory and Practice in Modern Education
Building Global Standards for Academic Collaboration
Welcome message from the Chair of the Program Committee, Prof. Ludek Matyska, and the Chair of the Organizing Committee, Prof. Song-Ming Wang
After years of anticipation, quantum computing is finally here, as evidenced by many ongoing projects around the world. In June 2024, the Leibniz Supercomputing Centre (LRZ) publicly demonstrated how a job run on a supercomputer is assisted by a superconducting quantum accelerator. Although this was only the first public demonstration, it gives a clear indication of the potential of hybrid quantum computing. Supercomputers, which are the most powerful machines on the planet, will use integrated quantum computers whenever quantum computation is advantageous. This talk will provide an update on quantum activities at the LRZ, where three quantum computers are already working in conjunction with HPC systems and two more are on the way. Obviously, the current first-version prototypes need to be transformed into production systems, while quantum computing itself is on its way to maturity. This will require not only improved quantum hardware, but also advances in quantum software and quantum-HPC integration.
Dieter Kranzlmueller is a full professor of computer science at the Ludwig-Maximilians-Universitaet Muenchen (LMU), chairman of the board of the Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, member of the board of the German national Gauss Centre for Supercomputing (GCS), and member of the board of directors of the Center for Digital Technology & Management (CDTM). He serves as a founding member of the IT:U Linz, board member of the Heidelberg Institute for Theoretical Studies (HITS), member of the Senate of the national research data infrastructure (NFDI), and member of the strategic advisory board of DFN, the German Research and Education Network. He chairs the MNM-Team (Munich Network Management Team), which is engaged in networks and distributed systems in general, and networks, grids, clouds and HPC in particular.
Abstract:
As artificial intelligence (AI) technology rapidly develops alongside high-performance computing, new approaches to scientific research are enabled. This talk will highlight NVIDIA's solutions for agentic AI, physical AI, and biomolecular AI. Come discover how these tools are being applied in real-world scenarios to drive forward biomedical research. Learn from practical examples that showcase the transformative power of AI and high-performance computing, and uncover how these advancements are shaping the future of research across disciplines.
Bio:
Ying-Ja Chen, Ph.D. 陳映嘉
Solutions Architect, NVIDIA
Ying-Ja Chen holds a B.Sc. in electrical engineering from National Taiwan University and a Ph.D. in bioengineering from UC San Diego, and completed postdoctoral training at MIT. She has more than ten years of R&D experience in the biotech industry, including roles as Associate Director of the Bioinformatics and AI Division at ACT Genomics, R&D Manager at Acer, and Associate Director of Technology Partnerships at Insilico Medicine. She is currently a Solutions Architect at NVIDIA. Her research spans from sequencing technology development and synthetic biology to bioinformatics and AI applications in medical diagnostics and drug discovery. She has published in journals such as Lab-on-a-Chip and Nature Methods, and is the inventor of several patents.
Dr. Yeh will introduce the recently announced Willow superconducting quantum processor developed at Google Quantum AI and its performance on error correction.
Machine learning, particularly deep neural networks, has been widely used in high-energy physics, demonstrating remarkable results in various applications. Furthermore, the extension of machine learning to quantum computers has given rise to the emerging field of quantum machine learning. In this paper, we propose the Quantum Complete Graph Neural Network (QCGNN), a model based on variational quantum algorithms and designed for learning on complete graphs. QCGNN with deep parametrized operators offers a polynomial speedup over its classical and quantum counterparts, leveraging the property of quantum parallelism. We investigate the application of QCGNN to the challenging task of jet discrimination, where jets are represented as complete graphs. Additionally, we conduct a comparative analysis with classical models to establish a performance benchmark.
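As a minimal illustration of the input representation only (not of the quantum circuit itself), the sketch below builds a complete graph for a toy jet with networkx; the choice of per-particle features (pt, eta, phi) is an assumption for illustration.

import networkx as nx
import numpy as np

def jet_to_complete_graph(particles):
    """Represent a jet as a complete graph whose nodes carry particle features."""
    g = nx.complete_graph(len(particles))   # every pair of constituents is connected
    for i, feats in enumerate(particles):
        g.nodes[i]["features"] = np.asarray(feats, dtype=float)
    return g

# Toy jet with three constituents, each described by (pt, eta, phi).
jet = jet_to_complete_graph([(120.0, 0.1, 0.5), (45.0, -0.2, 0.7), (30.0, 0.0, 0.4)])
print(jet.number_of_edges())   # 3 edges = n(n-1)/2 for n = 3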
Quantum Machine Learning (QML) faces significant challenges, particularly in encoding classical data and the reliance on quantum hardware for inference, limiting its practical applications. Meanwhile, classical large language models (LLMs) demand immense computational resources and exhibit low training efficiency, leading to substantial cost and scalability concerns. This talk will introduce Quantum Parameter Adaptation (QPA), a research work recently accepted at the top-tier AI conference ICLR 2025. QPA leverages quantum neural networks (QNNs) to generate parameters during training, while inference remains entirely classical. Applied to LLM fine-tuning, QPA significantly reduces the number of trainable parameters while maintaining or even improving performance, making fine-tuning large-scale models more efficient. By bridging quantum computing with large language models, this approach highlights how quantum technology can enhance modern AI, positioning it as a key enabler for the future of intelligent computing.
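QPA's parameter generator is a quantum neural network; as a purely classical stand-in to convey the idea of "a small trained generator emits the adapter parameters of a frozen layer", one might sketch it as follows. All sizes and names are hypothetical, and this is not the authors' implementation.

import torch
import torch.nn as nn

class GeneratedLowRankAdapter(nn.Module):
    """A tiny generator network emits low-rank adapter weights for a frozen layer,
    so only the generator (classical here; a QNN in QPA) is trained directly."""

    def __init__(self, d_in, d_out, rank=4, seed_dim=8):
        super().__init__()
        self.seed = nn.Parameter(torch.randn(seed_dim))        # trainable input
        self.gen = nn.Linear(seed_dim, rank * (d_in + d_out))  # parameter generator
        self.rank, self.d_in, self.d_out = rank, d_in, d_out

    def forward(self, x, frozen_weight):
        params = self.gen(self.seed)
        a = params[: self.rank * self.d_in].view(self.rank, self.d_in)
        b = params[self.rank * self.d_in:].view(self.d_out, self.rank)
        delta = b @ a                        # generated low-rank weight update
        return x @ (frozen_weight + delta).T

# Usage with a frozen 16->16 layer and a batch of 2 inputs.
adapter = GeneratedLowRankAdapter(16, 16)
w_frozen = torch.randn(16, 16, requires_grad=False)
print(adapter(torch.randn(2, 16), w_frozen).shape)   # torch.Size([2, 16])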
This track will focus on the development of cloud infrastructures and on the use of cloud computing and virtualization technologies in large-scale (distributed) computing environments in science and technology. We solicit papers describing underlying virtualization and "cloud" technology, including the integration of accelerators and support for the specific needs of AI/ML and DNNs, scientific applications and case studies related to using such technology in large-scale infrastructure, as well as solutions overcoming challenges and leveraging opportunities in this setting. Of particular interest are results exploring the usability of virtualization and infrastructure clouds from the perspective of machine learning and other scientific applications, the performance, reliability and fault-tolerance of the solutions used, and data management issues. Papers dealing with cost, price, and cloud markets, with security and privacy, as well as with portability and standards, are also most welcome.
The National Institute for Nuclear Physics (INFN) has been managing and supporting Italy’s largest distributed research and academic infrastructure for decades. In March 2021, INFN introduced "INFN Cloud," a federated cloud infrastructure offering a customizable service portfolio designed to meet the needs of the scientific communities it serves. This portfolio includes standard IaaS solutions as well as more advanced PaaS and SaaS offerings, all tailored to the specific requirements of individual communities. The PaaS services are defined using an Infrastructure as Code approach, combining TOSCA templates to model application stacks, Ansible roles for automated configuration of virtual environments, and Docker containers to package high-level application software and runtimes. The INFN Cloud platform’s federation middleware is based on the INDIGO PaaS Orchestration system, which integrates multiple open-source microservices. Among these, the INDIGO PaaS Orchestrator handles high-level deployment requests from users and orchestrates the deployment process across various IaaS platforms.
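As a hedged sketch of how a client might drive such a PaaS layer, the snippet below submits a TOSCA template to an orchestrator REST endpoint. The URL, endpoint path, payload fields, token, and template file name are illustrative assumptions, not the documented INDIGO PaaS Orchestrator interface.

import requests

ORCHESTRATOR = "https://paas.example.org/orchestrator"  # hypothetical base URL
TOKEN = "<OIDC-access-token>"                           # placeholder

# A TOSCA template modelling the application stack to deploy (file name assumed).
tosca_template = open("jupyter_cluster.yaml").read()

resp = requests.post(
    f"{ORCHESTRATOR}/deployments",
    json={"template": tosca_template, "parameters": {"num_cpus": 4}},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())   # deployment record to poll for status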
In this contribution, we will present the recently introduced functionalities and newly developed microservices in the INFN Cloud platform. Due to the obsolescence of certain PaaS components, the development and integration of new microservices became necessary, leveraging modern technologies to replace outdated solutions. For example, the method for collecting information about the resources made available by the federated cloud providers has been significantly refactored by adopting a Neo4j graph database. This enables efficient horizontal scaling to handle high-throughput and large datasets, while offering a REST API interface secured by OpenID Connect/OAuth2 for authentication and authorization. Regarding the PaaS Orchestrator dashboard, an updated version has been released, featuring an improved graphical interface and enhanced functionalities. In particular, the interaction with deployments has been refined, improving the user experience and extending the offered capabilities.
Additionally, new PaaS services have been designed, implemented, and made available to end users, such as the Kubernetes Cluster one that enables the transparent offloading of Kubernetes workloads to remote computation systems. As for SaaS services in the portfolio, we offer an object storage solution based on the Ceph Rados Gateway backend, complemented by a custom web Graphical User Interface developed in-house.
The evolution of the INDIGO PaaS Orchestration system also includes the adoption of modern DevOps practices, like the introduction of automated deployment pipelines and streamlined development workflows to ensure the rapid delivery of new features and improvements.
In 2021, the National Institute for Nuclear Physics (INFN) launched the INFN Cloud orchestrator system to support Italy’s largest research and academic distributed infrastructure. The INFN Cloud orchestration system is an open-source middleware designed to seamlessly federate heterogeneous computing environments, including public and private resource providers, container platforms, and more. It provides a customizable service portfolio, crafted to suit the distinct needs of specific communities. It supports standard Infrastructure as a Service (IaaS) options, advanced Platform as a Service (PaaS) configurations and useful Software as a Service (SaaS) solutions, such as Jupyter Hub, Kubernetes, Spark, and HTCondor clusters. Its primary function resides in orchestrating the deployment of virtual infrastructures, ranging from simple to intricate setup, providing users with convenient access and operational control.
At the heart of the federation middleware of the INFN Cloud platform lies the INDIGO PaaS Orchestration system. This orchestration suite consists of a set of interconnected open-source micro-services. Among them is the orchestrator component, a Java REST API that manages high-level deployment requests to federated cloud providers. The other micro-services play a crucial role in assisting the orchestrator, facilitating the selection of the optimal provider among those available in the federated environment and managing communication within it.
The most recent software upgrades can be seen as the first steps toward the definition of a new architecture based on message exchange between micro-services and exploiting Machine Learning for optimal resource provider selection. In this context, a plan to replace the existing micro-services with newer, modern technologies is being defined and will be adopted in the next period. In particular, the AI-ranker, devoted to the smart choice of the best provider, and the Federation-Registry, devoted to collecting information from the federated providers, will replace and evolve the currently used services, whose features have proven undersized.
Adopting a similar approach, new components will be implemented to introduce advanced features, such as open-source infrastructure-as-code tools aimed at extending the interaction to containerized platforms like Kubernetes. As an added value, the renovation plans include the adoption of a Kafka queue mechanism to manage the PaaS deployments and deliver the deployment details to the INDIGO PaaS Orchestrator micro-services.
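A minimal sketch of such a queue-based flow, assuming the kafka-python client; the broker address, topic name, and message schema are hypothetical:

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka.example.org:9092",             # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish a deployment event for the downstream orchestrator micro-services.
producer.send("paas-deployments", {"deployment_uuid": "123e4567", "action": "create"})
producer.flush()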
Finally, with the continuous growth in the number and quality of micro-services, a reliable and automated process aimed at securing and simplifying the deployment of the core services is being defined.
The ability to ingest, process, and analyze large datasets within minimal timeframes is a hallmark of big data applications. In the realm of High Energy Physics (HEP) at CERN, this capability is especially critical, as the upcoming high-luminosity phase of the LHC will generate vast amounts of data, reaching scales of approximately 100 PB/year. Recent advancements in resource management and software development have enabled more flexible and dynamic data access, alongside integration with open-source tools like Jupyter, Dask, and HTCondor. These advancements facilitate a shift from traditional "batch-like" processing to an interactive, high-throughput platform that utilizes a distributed, parallel back-end architecture. This approach is further supported by the DataLake model developed by the Italian National Center for "High-Performance Computing, Big Data, and Quantum Computing Research Centre" (ICSC).
This contribution highlights the transition of various data analysis applications, from legacy batch processing to a more interactive, declarative paradigm using tools like ROOT RDataFrame. These applications are executed on the aforementioned cloud-based infrastructure, with workflows distributed across multiple worker nodes and results consolidated into a unified interface. Additionally, the performance of this approach will be evaluated through speed-up benchmarks and scalability tests using distributed resources. The analysis aims to identify potential bottlenecks or limitations of the high-throughput interactive model, providing insights that will guide its further development and implementation within the Italian National Center.
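The map-and-merge pattern behind this model can be conveyed with plain Dask, here standing in for an RDataFrame computation; the scheduler address, file list, and per-file analysis are placeholders:

import numpy as np
from dask.distributed import Client

client = Client("tcp://dask-scheduler.example.org:8786")   # placeholder scheduler

files = [f"events_{i}.npy" for i in range(100)]            # placeholder inputs
bins = np.linspace(0.0, 200.0, 51)

def analyze(path):
    """Per-file task: load events and histogram one observable."""
    events = np.load(path)
    counts, _ = np.histogram(events, bins=bins)
    return counts

futures = client.map(analyze, files)   # fan out one task per file
total = sum(client.gather(futures))    # merge the partial histograms
print(total)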
High-Energy Physics (HEP) experiments involve a unique detector signature - in terms of detector efficiency, geometric acceptance, and software reconstruction - that distorts the original observable distributions with smearing and biasing stochastic terms. Unfolding is a statistical technique used to reconstruct these original distributions, bridging the gap between experimental data and theoretical predictions. The emerging technology of Quantum Computing offers potential improvements for unfolding by addressing its computational complexity. To accomplish this task, a simple Python module named QUnfold has been developed, addressing the challenge by means of the quantum annealing optimization process. In this context, the regularized log-likelihood minimization formulation required by the unfolding problem is translated into a Quadratic Unconstrained Binary Optimization (QUBO) model, which can be solved via quantum annealing systems.
Despite being a promising approach to the unfolding problem, simulated annealing poses hard scalability challenges, especially with the increasing data volume expected during the high-luminosity phase of the LHC. To address this, the QUnfold library is being adapted to a distributed, high-throughput platform using tools like Jupyter, Dask, and HTCondor, offering users more flexible and dynamic data access as well as speeding up the overall execution time by distributing the workload. The approach is validated on Monte Carlo samples from the CMS Collaboration, simulated at generator level (thus containing the parton-level observables) and reconstructed with the full pipeline used in data-taking conditions. A comparison between the current implementation of QUnfold - running serially on a local machine - and the distributed implementation using Dask is provided, highlighting the speedup as a function of the number of worker nodes used for the computation.
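A toy version of the QUBO construction makes the encoding concrete: each truth-bin count is written in binary, and ||Rx - d||^2 is expanded into quadratic (off-diagonal) and linear (diagonal) QUBO coefficients. The 2-bin response matrix and the brute-force solver below are illustrative stand-ins for the real problem and the annealer; regularization is omitted for brevity.

import itertools
import numpy as np

R = np.array([[0.8, 0.2],          # toy detector response (smearing) matrix
              [0.2, 0.8]])
d = np.array([5.0, 3.0])           # measured (smeared) counts

n_bits = 3                         # each truth bin encoded with 3 bits (0..7)
powers = 2.0 ** np.arange(n_bits)
B = np.kron(np.eye(R.shape[1]), powers)   # binary vector q -> truth counts x = B q

A = R @ B                          # objective: ||A q - d||^2, up to a constant
Q = A.T @ A                        # quadratic QUBO coefficients
np.fill_diagonal(Q, np.diag(Q) - 2.0 * (A.T @ d))   # linear terms (q_i^2 = q_i)

# Brute force over 2^6 assignments; an annealer would sample this instead.
best_q, best_e = None, np.inf
for bits in itertools.product([0, 1], repeat=Q.shape[0]):
    q = np.array(bits)
    e = q @ Q @ q
    if e < best_e:
        best_q, best_e = q, e

print("unfolded truth counts:", B @ best_q)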
In computer science, monitoring and accounting involve tracking and managing the usage of system resources in IT environments by users, applications, or processes. These activities typically encompass monitoring CPU usage, memory allocation, disk space, network bandwidth, and other critical resources. The insights obtained through activity tracking and analysis serve several purposes. Resource allocation enables administrators to distribute resources effectively, ensuring fair usage and preventing monopolization. Cost management is particularly critical in cloud computing and shared systems, where users or organizations are billed based on resource consumption. Performance optimization identifies resource bottlenecks or inefficient processes, enhancing overall system performance. Security focuses on detecting unauthorized or anomalous activities to prevent misuse or cyberattacks. Finally, auditing and compliance ensure the maintenance of detailed logs to meet regulatory or organizational requirements. In summary, monitoring and accounting are pivotal for managing and optimizing system performance, cost-efficiency, and security in IT environments.
In this context, monitoring and accounting mechanisms have been designed and implemented within the INFN Cloud infrastructure, a private cloud offering INFN users a comprehensive and integrated set of cloud services, and within DARE, a European project aimed at managing sensitive data and developing solutions for population surveillance, prevention, health promotion, and security. Due to its distributed nature, INFN Cloud leverages a network of geographically dispersed data centers across Italy, introducing additional challenges in monitoring and accounting. When a user from a specific community requests a cloud service, computational resources are allocated to the data center best suited to meet the user's requirements, based on predefined criteria such as resource availability. In the context of DARE, and in projects involving sensitive data more generally, these methods are both beneficial and essential for ensuring comprehensive control over infrastructure activities.
This presentation provides an overview of the monitoring and accounting architecture developed and implemented within the INFN Cloud and DARE infrastructure, with a focus on the methods employed for generating, collecting, and analyzing relevant data. The proposed approach not only enhances operational efficiency but also offers a scalable model suitable for distributed cloud infrastructures.
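As a small illustration of the per-node collection step, the sketch below samples typical resource metrics with psutil and emits them as JSON lines; in the actual architecture such records would be shipped to a central collector rather than printed.

import json
import time

import psutil

def sample_metrics():
    """One snapshot of the resource indicators typically tracked per node."""
    return {
        "ts": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
        "net_bytes_sent": psutil.net_io_counters().bytes_sent,
    }

for _ in range(3):                     # sample a few times for illustration
    print(json.dumps(sample_metrics()))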
"Quantum computers promise transformative advancements in computation, yet their performance remains critically hindered by environmental noise. Qubits, the fundamental units of quantum information, are inherently fragile and highly sensitive to even minimal disturbances from their surroundings. Factors such as electromagnetic interference: We introduce the Telemetry Project, an initiative designed to measure and analyze environmental factors that disturb quantum systems. By integrating Internet of Things (IoT) technologies with quantum computing and leveraging high-performance computing techniques for concurrent data analysis from a centralized database, our approach provides deeper insights into the interplay between environmental noise and quantum system performance. This integration not only advances our understanding of quantum behavior in real-world conditions but also paves the way for developing more resilient and reliable quantum computations in an HPC environment."
As 2025 marks the International Year of Quantum Science and Technology, Taiwan's quantum computing education resources are expanding rapidly. EntangleTech, as a leading organization dedicated to fostering quantum education among high school and university students, has been actively developing a structured learning ecosystem. Our mission is to consolidate Taiwan’s diverse quantum computing education resources and establish a systematic framework that enhances students’ accessibility and comprehension of quantum technologies.
In this talk, we will review the progress and achievements of the Quantum Computing Student Conference (Qracon) over the years and highlight EntangleTech’s contributions to quantum education in Taiwan. Furthermore, we will outline our future initiatives, including efforts to integrate educational resources, strengthen industry-academia collaboration, and expand international engagement. These strategies aim to position Taiwan as a key player in the global landscape of quantum science education and talent cultivation.
Abstract: Amid rising geopolitical tensions, how can we ensure that the advancement of quantum technologies is inclusive, secure and beneficial to humanity? Based on the GESDA approach, the Open Quantum Institute (OQI), a multilateral governance initiative hosted at CERN, born at GESDA and supported by UBS, promotes global and inclusive access to quantum computing and the development of applications for the benefit of humanity. OQI brings together a global community of academic, industry, diplomacy and education leaders. One of its four key objectives is to advance capacity building by engaging diplomatic, academic and industry-led communities in quantum diplomacy dialogues, and by providing a platform to collectively identify, co-shape and implement best practices. OQI and its partners support quantum ecosystems in becoming more responsive to the unique needs of diverse communities. As such, the OQI is anticipatory science diplomacy in practice.
Virtual Research Environments (VREs) provide intuitive, easy-to-use and secure access to (federated) computing resources for solving scientific problems, aiming to hide the complexity of the underlying infrastructure, the heterogeneity of the resources, and the interconnecting middleware. Behind the scenes, VREs comprise tools, middleware and portal technologies, workflow automation, as well as security solutions for layered and multifaceted applications. Topics of interest include but are not limited to: (1) Real-world experiences building and/or using VREs to gain new scientific knowledge; (2) Middleware technologies, tools and services beyond the state of the art for VREs; (3) Science gateways as specific VRE environments; (4) Innovative technologies to enable VREs on arbitrary devices, including the Internet of Things; and (5) One-step-ahead workflow integration and automation in VREs.
DESY, a leading European synchrotron facility, has taken a significant step towards making research data publicly available by establishing a metadata catalogue and data analysis portal. This development is in line with the Open and FAIR data principles, which aim to make data easily discoverable, accessible, and reusable for the wider scientific community.
The metadata catalogue, Scicat, provides a comprehensive overview of public research data, making it easier for scientists to find and access relevant data sets. The catalogue is accessible through federated user accounts, allowing community members to log in using their institutional accounts via eduGAIN, HelmholtzID, NFDI, and soon EOSC-AAI.
Furthermore, the data analysis portal, VISA, enables researchers to explore and analyze the Open Data sets, which are provided in commonly accepted data formats such as HDF5, NeXuS, openPMD, and ORSO. The provision of technical and scientific metadata ensures that the data sets are reusable for further analysis and research.
By establishing this infrastructure, DESY is contributing to the growing movement towards Open Science, as requested by funding agencies and scientific journals. The blueprint for DESY's Open Data solution will be shared with the wider community through HIFIS, enabling other research institutions to benefit from this development.
The talk will give a short overview of the established services and their architecture. Demonstrating the workflow of using a previously minted DOI from the Open dataset to find the dataset description in the metadata catalogue and subsequently the corresponding dataset itself in the analysis portal will be the focus of the talk. An idea of how to leverage this portal package by federating its capabilities in the future will conclude the talk.
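The first step of that workflow, going from a minted DOI to its metadata record, can be sketched with standard DOI content negotiation; the DOI below is a placeholder, and the returned URL would point at the catalogue entry.

import requests

doi = "10.xxxx/example-dataset"    # placeholder DOI

# DOI content negotiation returns citation metadata as JSON.
resp = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
resp.raise_for_status()
record = resp.json()
print(record.get("title"), record.get("URL"))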
The Torch computing platform aims to provide a one-stop scientific data analysis platform for light-source users, covering multiple computing service modes, supporting multiple access methods, integrating multiple data analysis methods, and addressing multiple application scenarios.
The platform covers multiple computing service modes: desktop analysis services on both virtual-machine and physical-machine remote desktops, web-based interactive analysis built on JupyterLab, command-line analysis over SSH, remote interactive analysis via the VS Code ssh-remote mode, self-organizing cluster analysis based on the Ray/Spark frameworks, and the HTC/HPC batch-job services commonly used in high-energy physics. It supports multiple access methods such as Windows remote desktop, the browser, and command-line terminals. The Torch platform also continues to expand and integrate data analysis applications such as AI modeling, data reconstruction, and image segmentation, combined with a token-based user identity management module to keep users' experimental data access rights consistent across application scenarios.
The Torch platform also relies on a big-data operations and maintenance analysis platform to aggregate resource and application-service monitoring indicators, and provides rich indicator dashboards based on correlation analysis, so that users can understand their current service status and historical resource usage.
The Torch platform has been applied to HEPS, a large scientific facility, to provide computing power support for its scientific experiments.
The EU project OSCARS (Open Science Clusters’ Action for Research and Society) brings your research data to new audiences and targets new use-cases in a broad range of scientific clusters including Photon and Neutron Sciences (PaN). As recommended by a new White Paper (submitted to IUCrJ) from the user organisations, ESUO and ENSA, adherence to the FAIR principles (Findable, Accessible, Interoperable, Reusable) facilitates the use of research data in novel ways, with increased citations acknowledging original researchers and facilities that provided that data. Further, increased (meta)data and software findability and accessibility promotes a better use of resources by reducing the duplication of experiments.
We are currently engaged in the Consolidation task by cataloguing existing services and data sources, aiming to highlight common approaches between the clusters and to identify “composable” services. For the PaN Open Science Cluster (PaNOSC) this will create such a portfolio from scratch starting with link collections from the most relevant Research Infrastructures (RIs) of PaNOSC, e.g. LEAPS, LENS, and European Research Infrastructure Consortia (ERICs). The representatives of the different RIs within the PaNOSC Competence Center (also established within OSCARS) contributed significantly by adding new resources and also by completing information on already listed resources (e.g. TRL, licences). Currently, the portfolio contains more than 500 resources.
The portfolio provides the basis to identify services required for a specific task within a specific research scenario. We are currently collecting PaNOSC-typical scenarios that can be simplified by composing and slightly adapting the involved services. One or two scenarios will be realised as demonstrators within the project. The services and data sources could be onboarded to the thematic PaN EOSC node, which is being proposed as candidate node of the EOSC Federation.
Welcome Reception
18:20, 18 March 2025
GRAND HILAI TAIPEI (Platinum Grand Ballroom, 3F)
No. 168, Jingmao 1st Road, Nangang Dist., Taipei
+886 2 2788-6866
Since 2023, Japan has significantly intensified its research and development activities concerning foundation models, particularly large language models (LLMs). These efforts have encompassed not only independent initiatives within the private sector but also proactive governmental support aimed at fostering collaboration among industry and academia. LLM-jp is an academia-focused research consortium led primarily by the National Institute of Informatics (NII), undertaking comprehensive research and development on large language models and multimodal models with government backing. Notably, the consortium facilitates projects that are challenging for individual laboratories to pursue independently, including the training of models at the 100-billion-parameter scale, which require extensive computational resources. It also serves as a forum for disseminating the outcomes of these initiatives. In this keynote session, we will present an overview of these recent advancements.
https://bonacor.web.cern.ch/ISGC25-AImasterclass/
This masterclass is intended as an opportunity to get an overview of "applied machine/deep learning" techniques, targeted at an audience with a beginner/intermediate level of expertise and plenty of curiosity! You will increase your knowledge of learning models, from basic concepts to neural networks and the most advanced techniques (including large language models), and will put your understanding into practice with a few applications. In addition, you will be exposed to examples and discussions that will increase your awareness of the major opportunities and risks connected to AI.
The masterclass will proceed through an intuition-based understanding of the theoretical foundations of machine learning, moving quickly to explained, hands-on activities on selected datasets using a set of tools that every data scientist should have in her/his "virtual backpack".
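For a sense of the level of the hands-on sessions, a first exercise might look like the snippet below (an illustrative example, not the actual masterclass material): train a small neural network on a toy dataset and measure its accuracy.

from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy dataset: 8x8 images of handwritten digits.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))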
During the last decade, research in Biomedicine and Life Sciences has dramatically changed thanks to continuous developments in High Performance Computing and highly distributed computing, and to the rise of AI, which was rewarded this year with Nobel Prizes in both chemistry and physics.
This track aims at discussing problems, solutions and application examples in the fields of health and life sciences, with a particular focus on non-technical end users.
We invite submissions in the areas of e.g. Drug discovery, Vaccine design, Structural biology, Bioinformatics, Medical imaging, Epidemiological studies and other Public health applications.
Mass spectrometry (MS) is a compound identification technique used frequently in the life and environmental sciences. The specific setup of electron ionization mass spectrometry coupled with gas chromatography (GC-EI-MS) is appealing due to the relative simplicity and stability of the setup, while it is computationally challenging -- the acquired data are strictly "flat", with none of the hierarchical structure reflecting the chemical structure of the analyzed compounds that other experimental techniques (MS$^n$) provide.
The community recognizes the importance of computational prediction of mass spectra of given compounds, as well as elucidation of molecular formulae from measured spectra. Traditional library-search approaches are limited by a quantitative gap. The number of "small" organic molecules is estimated at up to $10^{60}$, of which approx. 1 billion are confirmed to exist (ZINC database). On the other hand, the state-of-the-art spectral libraries (NIST, Wiley) contain only about 500,000 records.
We introduce SpecTUS, our 354M-parameter transformer-based ML model, which takes a mass spectrum as input and produces a molecular structural formula (SMILES string) as output. SpecTUS was pre-trained on $2\times 4.7$ million synthetic mass spectra (generated by the state-of-the-art models NEIMS and RASSP) and fine-tuned on 232,000 spectra from our training subset of the NIST spectral database. We evaluate the model on a held-out testing subset of NIST (28,000 spectra), as well as on spectra from other independent databases (1,640 from SWGDRUG, 5,015 from MONA). When a single candidate is retrieved, the exact solution is returned in 40% of cases, and the average precision (Morgan-Tanimoto similarity) is 0.66. When 10 candidates are retrieved, the exact solution is among them in 62% of cases, and the precision increases to 0.79. We also carried out similarity comparisons with legacy spectral database search methods to demonstrate that the model is able to generalize (not merely memorize the training set). There is no competing solution for the GC-EI-MS setup, and our results surpass even the state-of-the-art MS$^n$ approaches, which work with more structured information on input.
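As a concrete note on the evaluation metric: the Morgan-Tanimoto similarity quoted above can be computed with RDKit. A minimal sketch follows; the SMILES strings are illustrative placeholders rather than project data, and the fingerprint radius and bit length are conventional defaults, not necessarily those used by SpecTUS.

```python
# Scoring a predicted SMILES against a reference structure with Morgan
# fingerprints and Tanimoto similarity. Illustrative sketch only.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_tanimoto(smiles_pred: str, smiles_ref: str) -> float:
    """Tanimoto similarity of Morgan fingerprints; 0.0 if a SMILES is invalid."""
    mol_pred = Chem.MolFromSmiles(smiles_pred)
    mol_ref = Chem.MolFromSmiles(smiles_ref)
    if mol_pred is None or mol_ref is None:
        return 0.0
    fp_pred = AllChem.GetMorganFingerprintAsBitVect(mol_pred, 2, nBits=2048)
    fp_ref = AllChem.GetMorganFingerprintAsBitVect(mol_ref, 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp_pred, fp_ref)

# With 10 retrieved candidates, score each one and keep the best match.
candidates = ["CCO", "CC=O", "CCN"]   # e.g. beam-search outputs (placeholders)
reference = "CCO"                     # ground-truth structure (placeholder)
print(max(morgan_tanimoto(c, reference) for c in candidates))
```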
The other extreme on the scale of MS computations is the accurate prediction of mass spectra. These methods simulate the actual process of molecule fragmentation upon electron impact, using ab-initio energies calculated by solving the time-independent Schrödinger equation. Semi-empirical quantum-chemical methods are not sufficiently accurate in this case, and DFT-like methods easily become computationally infeasible.
On the other hand, solving the Schrödinger equation is one of the expected killer applications of quantum computers. We demonstrate the integration of the QCxMS software package (an ab-initio spectra simulator) with a Qiskit-based implementation of the energy calculation by the Variational Quantum Eigensolver, a method for approximating the ground-state energy that is suitable for current noisy quantum hardware. So far we have carried out validation experiments in simulation only, but we are heading towards the use of actual quantum hardware.
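For readers less familiar with VQE, the sketch below shows the core of such an energy calculation in Qiskit. The two-qubit Hamiltonian is a toy placeholder for the molecular operators the QCxMS integration would supply, and the module layout (qiskit-algorithms with reference primitives) is an assumption that varies between Qiskit releases.

```python
# Minimal VQE sketch: approximate the ground-state energy of a toy
# two-qubit Hamiltonian with a hardware-efficient ansatz. The operator and
# all settings are illustrative, not the QCxMS-derived molecular problem.
from qiskit.circuit.library import TwoLocal
from qiskit.quantum_info import SparsePauliOp
from qiskit.primitives import Estimator
from qiskit_algorithms import VQE
from qiskit_algorithms.optimizers import COBYLA

# Toy Hamiltonian written as a sum of Pauli terms.
hamiltonian = SparsePauliOp.from_list([("ZZ", -1.0), ("XI", 0.5), ("IX", 0.5)])

# Shallow, hardware-efficient ansatz suited to noisy devices.
ansatz = TwoLocal(2, rotation_blocks="ry", entanglement_blocks="cz", reps=2)

vqe = VQE(estimator=Estimator(), ansatz=ansatz, optimizer=COBYLA(maxiter=200))
result = vqe.compute_minimum_eigenvalue(operator=hamiltonian)
print(f"estimated ground-state energy: {result.eigenvalue.real:.4f}")
```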
Disciplines across the Social Sciences, Arts and Humanities (SSAH) have critically engaged with technological innovations such as grid and cloud computing and, most recently, various data analytic technologies. The increasing availability of data, ranging from social media text data to consumer big data, has led to growing interest in analysis methods such as natural language processing, multilingualism and (semi-)automatic AI-powered translation, social network analysis, usage data analysis, machine learning and text mining, and data sharing. These developments pose challenges as well as open up a world of opportunities. Members of the SSAH community have been at the forefront of discussions about the impact that novel forms of data, novel computational infrastructures and novel analytical methods have on the pursuit of science and our understanding of what science is and can be.
The ISGC 2025 SSAH track invites papers and presentations covering applications that demonstrate the opportunities of new technologies or critically engage with their methodological implications in the Social Sciences, Arts and Humanities. Innovative applications of analytical tools or international data spaces for survey and usage data, social media data, and government (open) data are welcome. We also invite contributions that critically reflect on the following subjects: (1) the impact that ubiquitous and mobile access to information and communication technologies has on society more generally, especially around topics such as smart cities, civic engagement, and digital journalism; (2) philosophical and methodological reflections on the development of the techniques and approaches data scientists use to pursue knowledge.
This talk will discuss the technical challenge of using virtual unwrapping as a technique to restore damaged film negatives from more than a century ago. The technique is applied to film going all the way back to the early photographic explorations of Étienne-Jules Marey and Eadweard Muybridge, whose photographic work established that a galloping horse has all of its hooves off the ground at a certain point in its stride.
In the field of Social Sciences, Arts and Humanities (SSAH), researchers have started to explore the possibilities of machine-learning techniques in several directions. With the current and imminent generations of open-source Large Language Models (LLMs), it already seems attainable for individual researchers to speed up onerous but necessary tasks on personal computers while keeping control of their datasets at all times; does that mean that every SSAH researcher will have a range of useful AI-driven aids on their desktop in the near future? Automation of certain parts of data collection and data processing would certainly enable researchers to skip some of the more painstaking tasks, such as qualitative data coding, leaving more time for the actual analysis and opening up the possibility of working with larger data sets. However, the application of LLMs comes with challenges of its own, especially when working with data - such as historical datasets - that diverge from the data the model was trained on.
Applying the Phi-3-mini model, this paper explores the ability of open-source, low-threshold LLMs to perform specific tasks such as qualitative data coding. It takes the data of the Common Rules Project of the research group Social Enterprises & Institutions for Collective Action (SEICA, Erasmus University Rotterdam) as a case study. The interdisciplinary research group studies all kinds of bottom-up organisations in which the participants manage their resources collectively: institutions for collective action (ICAs). Since 2007, members of SEICA have manually coded rule sets of historical common lands across Europe and consolidated these in a database. Presently, there are plans to extend the database with rule sets from other ICAs, e.g. early modern fishery cooperatives, 19th-century consumer cooperatives, and modern-day citizen collectives. The extended database can be used for several ongoing research programmes within the SEICA research group. One is a fairly new programme that assesses the environmental literacy embedded in historical regulations and evaluates their effectiveness in translating that literacy into actionable governance. Another programme that may benefit from the application of LLMs encompasses the comparative analysis of ICAs' regulations across centuries, providing present-day citizen collectives with evidence-based knowledge on more and less successful institutional setups through the knowledge exchange platform CollectieveKracht ("CollectivePower").
By trying to replicate the qualitative coding efforts done manually for the Common Rules Project in the past decades, and applying the resulting modus operandi to a newly acquired historical rule set, this paper will assess the opportunities, pitfalls, and limitations to be reckoned with when applying LLMs to historical data, discussing whether an easily available desktop application based on open-source LLMs is already within grasp of SSAH researchers.
This study examines the challenges of maintaining emotional support for families in the changing social structure anticipated in 2040. New challenges are posed to traditional family roles in providing emotional connections. Changes in family structure, including an increase in single-person households and diverse family types, may increase the distance between members and complicate the maintenance of family bonds. Research has shown that a lack of family emotional support can harm an individual's emotional regulation, social relationships, and overall mental health, especially for children, adolescents, and older adults.
In this regard, this study addresses the question: "How can the emotional support function of the family be maintained amid future changes in social structure?" Through literature analysis, questionnaires, and Kano modeling, this study identifies pain points and proposes solutions.
Utilizing a speculative design, this study proposes a “Future Family Memory Management System” that integrates artificial intelligence technologies to address the challenges of emotional connection and memory sharing. The system focuses on memory storage and family image cohesion, emotional accompaniment and interactive response, memory transmission and ritual visualization, digital roles, and privacy protection. The literature and questionnaires also show the relationship between strengthening family cohesion and family emotional support functions.
This study presents a new perspective on the application of artificial intelligence in the social domain, and it has been demonstrated that some of the functions of this solution can be accepted by the target group and provide meaningful emotional support for families under the changing social structure.
Networking and the connected e-Infrastructures are becoming ubiquitous. Ensuring the smooth operation and integrity of the services for research communities in a rapidly changing environment are key challenges. This track focuses on the current state of the art and recent advances in these areas: networking, infrastructure, operations, security and identity management. The scope of this track includes advances in high-performance networking (software defined networks, community private networks, the IPv4 to IPv6 transition, cross-domain provisioning), the connected data and compute infrastructures (storage and compute systems architectures, improving service and site reliability, interoperability between infrastructures, data centre models), monitoring tools and metrics, service management (ITIL and SLAs), and infrastructure/systems operations and management. Also included here are issues related to the integrity, reliability, and security of services and data: developments in security middleware, operational security, security policy, federated identity management, and community management. Submissions related to the general theme of the conference are particularly welcome.
Implementing a Risk Management Process for a distributed infrastructure can be a tedious task. Usually, one needs to agree on a certain risk management methodology, get a clear picture of the scope and the governance, and from that assign the relevant roles and responsibilities. Clearly, this is only possible with sufficient support from the governing body.
But even if the above-mentioned parameters are defined, a meaningful risk study of a distributed infrastructure can run into various issues.
In this presentation we take a look at the European Commission's (EC) IT Security Risk Management Methodology (ITSRM$^2$) applied to a fictitious distributed infrastructure. From real-world experience, we examine possible pitfalls and derive a strategy for useful Risk Management that leverages the inherent enforcement capabilities of the methodology.
The WLCG Security Operations Centre Working Group has been working on establishing a common methodology for the use of threat intelligence within the academic research community. A central threat intelligence platform allows working group members to easily exchange indicators of compromise (IoCs) from ongoing security incidents and to use this information to secure their own infrastructures. This capability also enhances the ability of participating organisations to respond effectively to multi-site security incidents across the community.
We discuss the current extent of this trust group, including examples of sites that have deployed MISP instances themselves as well as those that are using the central instance directly. We also consider the type of events that are being shared and methods used to help sites gain confidence in sharing information of their own.
A recent focus of the Working Group has been on people, processes, tools and data needed for operational security, trying to answer some important questions such as how to engage people, how to communicate in a clear manner, how to deploy tools and technologies in tandem for effective defences, as well as how to have processes in place to ensure a consistent approach to handling and preventing security incidents.
Another area of work is the use of pDNSSOC, a lightweight "80% SOC" solution focused on correlating DNS logs with threat intelligence. Specifically designed to have low impact on deploying sites, pDNSSOC requires, at minimum, a sensor installed in the site's DNS infrastructure, with an external centre performing the correlation and alerting activities.
In addition, we report on the outcomes of a recent WLCG Security Operations Centres Working Group workshop and hackathon which took place in December 2024. We will also present recent developments in the community, including both updates on deployment of security tools as well as progress in the sharing of threat intelligence in different contexts.
Network security operations depend on many kinds of network security tools for monitoring, detecting, and responding to security incidents, threats, and vulnerabilities across an organization's infrastructure. However, despite the evolving power of these tools, they are relatively cumbersome to use and often require interaction through specific interfaces, which raises the difficulty and the professional requirements for security operations personnel to understand and combine their inputs and outputs. The integration of a complex set of network security tools to enhance interoperability is therefore a critical concern for network security operations. Recent advancements in large language models (LLMs) have showcased their exceptional capabilities in natural language processing and comprehension, offering a novel approach to interfacing with network security tools. This paper introduces ChatSOC, an autonomous agent for network security operations empowered by a large language model, capable of managing five types of operations: identify, policy, protection, detection, and response. ChatSOC streamlines these operations through effective task planning and task execution when instructed by security operations personnel. Our work is an innovative approach to making network security tools easier to use and understand. Through comprehensive experimental evaluations, ChatSOC has demonstrated high accuracy in task planning and execution across five types of operational scenarios.
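As a purely schematic illustration (the abstract does not disclose ChatSOC's internals), the task-planning step of such an agent can be prototyped against any OpenAI-compatible chat API. The tool registry, prompt, and model name below are hypothetical.

```python
# Hypothetical sketch of LLM task planning for security operations: the model
# decomposes an operator instruction into calls to a registry of tools.
import json
from openai import OpenAI

TOOLS = {
    "query_ids": "Search IDS alerts for a given indicator of compromise.",
    "scan_host": "Run a vulnerability scan against a given host.",
    "block_ip":  "Add a firewall rule blocking a given source IP.",
}

client = OpenAI()  # reads OPENAI_API_KEY; base_url may point to a local server

def plan(instruction: str) -> list:
    """Ask the model for a JSON plan of tool invocations."""
    system = (
        "You are a security-operations planner. Available tools:\n"
        + "\n".join(f"- {name}: {desc}" for name, desc in TOOLS.items())
        + '\nRespond with a JSON object {"steps": [{"tool": ..., "args": ...}]}.'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": instruction}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)["steps"]

# Each step would then be validated and executed by the agent's own loop.
steps = plan("Host 10.0.0.7 is beaconing to a known C2 domain: investigate and contain.")
```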
As software systems become increasingly complex, defects in source code pose significant security risks, such as user data leakage and malicious intrusions, making their detection crucial. Current approaches based on Graph Neural Networks (GNNs) can partially reveal defect information; however, they suffer from heavy graph construction costs and underutilization of heterogeneous edge information. In contrast, sequence model-based methods primarily capture token-level representations, failing to effectively learn statement-level features and their interrelations, which results in poor detection performance. Moreover, most existing methods only support coarse-grained vulnerability detection, lacking the capability for precise fine-grained analysis. To address these issues, this paper develops a novel sequence model that simultaneously learns both token and statement representations, thereby enhancing the detection of vulnerabilities at the statement level. The proposed approach is expected to achieve significant improvements in both accuracy and F1 score, offering a more refined and efficient solution for source code defect detection.
Keywords—Code Vulnerabilities, Graph Neural Networks, Hierarchical Attention Mechanism
The Worldwide Large Hadron Collider Computing Grid (WLCG) community’s deployment of dual-stack IPv4/IPv6 on its worldwide storage infrastructure is very successful and has been presented by us at earlier ISGC conferences. Dual-stack is not, however, a viable long-term solution; the HEPiX IPv6 Working Group has focused on studying where and why IPv4 is still being used, and how to change such traffic to IPv6. The agreed end goal is to turn IPv4 off and run IPv6-only over the wide area network, to simplify both operations and security management.
This paper will report on our work since the ISGC2024 conference. Firstly, we will report on our campaign to encourage the deployment of IPv6 on CPU services and Worker Nodes. Then, we will present the ongoing work to further identify and correct the use of IPv4 between two dual-stack endpoints. The monitoring and tracking of all data transfers is essential, together with the ability to understand the relative use of IPv6 and IPv4. This paper presents the status of monitoring IPv6 data flows within WLCG. Furthermore, the Research Networking Technical Working Group has identified marking the IPv6 packet header as one approach for understanding complex large data flows. This provides another driver for full transition to the use of IPv6 in WLCG data transfers.
The paper concludes with the working group's proposed plans and timescale for moving WLCG to "IPv6-only", while also defining what we mean by that term. One component of this plan could be the use of IPv6-only clients configured with a CLAT (customer-side translator), together with a deployment of 464XLAT (IETF RFC 6877), using what is often known as "IPv6-mostly" (IETF RFC 8925) to connect to remote IPv4-only services.
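To make the dual-stack ambiguity concrete: which address family a connection actually uses is decided by resolver ordering (RFC 6724) and the client's connect logic, which a small probe can expose. The hostname below is illustrative; this is a diagnostic sketch, not the working group's monitoring tooling.

```python
# Report whether a dual-stack client would reach a host over IPv6 or IPv4:
# getaddrinfo returns candidates in the OS's preference order, and we take
# the first family that actually connects.
import socket

def preferred_family(host: str, port: int = 443) -> str:
    for family, type_, proto, _, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            with socket.socket(family, type_, proto) as s:
                s.settimeout(3)
                s.connect(sockaddr)
                return "IPv6" if family == socket.AF_INET6 else "IPv4"
        except OSError:
            continue  # fall back to the next candidate address
    return "unreachable"

print(preferred_family("www.example.org"))
```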
Anomaly detection is a critical component of predictive maintenance in data centers, where early identification of abnormal patterns in system behavior can prevent failures and reduce operational costs. This work explores the application of Variational Autoencoders (VAEs) for unsupervised anomaly detection in data collected from data center infrastructure. VAEs are probabilistic generative models that learn latent representations of data, enabling the identification of deviations from normal operational patterns.
Our approach leverages information from data centers to train a VAE on normal operating conditions. Once trained, the VAE identifies anomalies as inputs with high reconstruction errors or low likelihood in the learned latent space. The results suggest that VAE-based anomaly detection has potential to provide a robust solution for monitoring complex data center environments, with advantages over traditional threshold-based and supervised methods. Additionally, the integration of this technique with predictive maintenance workflows is discussed, highlighting its potential to enhance failure prediction accuracy and operational reliability. This study underscores the value of VAEs in advancing the automation and intelligence of predictive maintenance systems.
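A condensed sketch of the reconstruction-error criterion described above is given below; the feature dimension, network sizes, and the eventual alert threshold are illustrative assumptions rather than the study's actual configuration.

```python
# VAE for telemetry anomaly scoring: train on normal data, then flag inputs
# whose reconstruction error is high. PyTorch; sizes are illustrative.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_features: int = 32, latent: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent)
        self.logvar = nn.Linear(64, latent)
        self.decoder = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar

def elbo_loss(x, recon, mu, logvar):
    # Reconstruction error plus KL divergence to the unit-Gaussian prior.
    recon_err = ((x - recon) ** 2).sum(dim=1).mean()
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)).mean()
    return recon_err + kl

def anomaly_scores(model: VAE, x: torch.Tensor) -> torch.Tensor:
    # Inputs scoring above a threshold chosen on normal data are flagged.
    with torch.no_grad():
        recon, _, _ = model(x)
    return ((x - recon) ** 2).sum(dim=1)
```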
Abstract: The Worldwide LHC Computing Grid (WLCG) is a global collaboration that provides the computing infrastructure essential for the CERN Large Hadron Collider experiments. Spanning over 40 countries, it delivers approximately 3 exabytes of storage and 1.3 million CPU cores to support scientific research. Recently, WLCG launched a multi-year strategy to prepare for the next phase of the LHC's scientific program, the HL-LHC, at the end of the decade. The HL-LHC will bring unprecedented challenges in terms of data volume and complexity. This contribution will present how WLCG intends to prepare for that challenge and how it intends to ensure the long-term sustainability of its infrastructure and services.
Simone Campana is a senior staff member at CERN and currently the head of the Worldwide LHC Computing Grid (WLCG). He is a member of the CERN Information Technology (IT) department head office and is currently in charge of the engagement process between the department and the user communities. Simone obtained his PhD in Particle Physics at the University of California in 2003 and has worked on software and distributed computing projects since then. He was project leader of the Data Management system of the ATLAS experiment at CERN, the person responsible for ATLAS distributed computing, the WLCG Service and Operations coordinator, and the ATLAS Software and Computing Coordinator.
Abstract: (update forthcoming)
Bio:
Dr. İlkay Altıntaş, a research scientist at the University of California San Diego, is the Chief Data Science Officer of the San Diego Supercomputer Center as well as a Founding Faculty Fellow of the Halıcıoğlu Data Science Institute at the School of Computing, Information and Data Science. With a specialty in scientific workflows and systems, she leads collaborative teams to deliver impactful results and sustainable solutions through making computational data science and AI work more reusable, programmable, scalable, equitable, and reproducible. She is the Founding Director of the Workflows for Data Science Center for the development of methods and workflows for computational data science, and the WIFIRE Lab on AI methods for an all-hazards knowledge cyberinfrastructure. She is the PI of the NSF National Data Platform and other diverse NSF grants to develop scalable computing, AI and data systems at the digital continuum from edge to HPC. Among the awards she has received are the 2015 IEEE TCSC Award for Excellence in Scalable Computing for Early Career Researchers and the 2017 ACM SIGHPC Emerging Woman Leader in Technical Computing Award. Ilkay serves on the elected Board of Governors for the IEEE Computer Society, and was appointed by California Governor Newsom to the Wildfire Technology Research and Development Review Advisory Board. Ilkay received a Ph.D. degree from the University of Amsterdam.
Over a quarter century of distributed computing research and development has brought together a strong community that is willing to trust each other. Scaling well beyond the 'human circle of trust' of a few hundred people, we have built computing infrastructures for scientific research across continents and disciplines, spanning hundreds of thousands of people – leveraging federated identity from your home organisation, policy bridges spanning domains and regions, and a unique multi-lateral approach to trust. This has enabled global research as diverse as high-energy physics in the LHC Computing Grid and structural biology, and has given the social sciences a 'trusted research environment' for sensitive data.
Today, the trust and identity system witnesses a time of unprecedented change, with new technologies and protocols. But recent authentication protocols are often designed around a single-source world view, matching the small set of large private providers that dominate the public user identity space today, their branded 'login' buttons ubiquitous on the web. Meanwhile, our multi-lateral federated trust is as relevant and vibrant as ever: while technology may change, the foundations of collaboration and the structure of our research communities remain.
So how will we shape the next decade of research infrastructure collaboration?
As we evolve our blueprint for federated access to services and the guidelines that make it work together, we now start to connect millions rather than the first hundred thousand users. Today, we see the scale-out from research computing collaboration to enabling access to diverse services and data sets, and we start applying our research federation models to education. Are we up to new architectures, global policy harmonisation, and sharing of good practice and operational expertise, making our very ICT itself become the research instrument we need?
Bio:
David Groep is programme leader of the Physics Data Processing (PDP) programme on advanced computing at Nikhef and extraordinary professor of e-Infrastructure at Maastricht University. His research interests include trust, identity, and security for multi-domain ICT infrastructures; authentication and authorization collaboration models; and the scaling behaviour of resource-balanced operational e-Infrastructures, including networks, storage, and computing systems - and how data-intensive research can exploit them effectively. David Groep is Chair of the EUGridPMA and was the founding chair of the Interoperable Global Trust Federation (IGTF) in October 2005. He is a member of the Dutch National Permanent Committee on Large-Scale Research Infrastructures (PC-GWI), the Dutch Research Council's Committee on Digitalisation of Research (CDI), and the Dutch National e-Infrastructure executive. He leads the policy and best practice harmonization activities in the AARC Community and AEGIS, is a member of the steering committee of the WISE Information Security for E-infrastructures collaboration, and is a member of the European Open Science Cloud AAI working group.
Europe hosts a vibrant cluster of Environmental Research Infrastructures (ENVRIs), serving a diverse community committed to tackling today's most pressing environmental challenges. These infrastructures play a crucial role in advancing Europe's strategic actions to achieve climate neutrality. The ENVRI-Hub NEXT project builds on years of collaboration to provide interoperable datasets, services, a knowledge base supported by an AI-driven search engine, and training across key Earth system domains—atmosphere, marine, ecosystems, and solid Earth—while leveraging advanced distributed e-infrastructures such as EGI.
This keynote will explore the technical foundations of the ENVRI-Hub gateway and its role in enabling environmental research infrastructures (RIs) to jointly investigate the full chain—from essential climate variables to global climate indicators and Earth system feedback mechanisms. The ENVRI-Hub is evolving towards establishing an Environmental Node, aligning with Europe’s vision of building the European Open Science Cloud (EOSC) Federation as a network of Nodes to support cross-disciplinary scientific research and innovation.
Co-Authors:
Ulrich Bundke 1, Angeliki Adamaki 8, Daniele Bailo 2, Magdalena Brus 3, Claudio Dema 4, Dario De Nart 5, Frederico Drago 3, *Marta Gutierrez 3, Anca Hienola 6, Andreas Petzold 1, Alex Vermeulen 9 and Zhiming Zhao 7
Bio:
Marta Gutierrez works for the EGI Foundation as a Community Support Specialist. She collaborates with scientific communities, facilitating their integration into the EGI Federation, and ensuring access to compute, cloud, and data analytic services offered by providers of the federation. Marta is actively engaged in EU-funded projects focused on promoting FAIR data principles and advancing analytics via cloud and HPC infrastructures. With prior experience at international organizations like ECMWF and research facilities such as STFC, Marta has a strong background in working with climate and meteorological communities. Her expertise lies in building federated data infrastructures and providing data and services to a global audience. Marta is currently focused on supporting environmental Research Infrastructures and pilot activities for the European Open Science Cloud Federation. She holds a master’s degree in Theoretical Physics from Universidad Autónoma de Madrid.
During the last decade, Artificial Intelligence (AI) and statistical learning techniques have started to become pervasive in scientific applications, exploring the adoption of novel algorithms, modifying the design principles of application workflows, and impacting the way in which grid and cloud computing services are used by a diverse set of scientific communities. This track aims at discussing problems, solutions and application examples related to this area of research, ranging from R&D activities to production-ready solutions. Topics of interests in this track include: AI-enabled scientific workflows; novel approaches in scientific applications adopting machine learning (ML) and deep learning (DL) techniques; cloud-integrated statistical learning as-a-service solutions; anomaly detection techniques; predictive and prescriptive maintenance; experience with MLOps practices; AI-enabled adaptive simulations; experience on ML/DL models training and inference on different hardware resources for scientific applications.
Qiskit-symb is a Python package designed to enable the symbolic evaluation of quantum states and quantum operators in Parameterized Quantum Circuits (PQCs) defined using Qiskit. This open-source project has been integrated into the official Qiskit Ecosystem platform, making it more accessible to the rapidly growing community of Qiskit users.
Given a PQC with free parameters, qiskit-symb can generate the symbolic representation of the corresponding state vector or unitary operator directly from the Qiskit circuit instance. It leverages the SymPy library for efficient symbolic computations in Python, enabling seamless manipulation of complex mathematical expressions.
Additionally, qiskit-symb provides tools to assist in the development and debugging of Quantum Computing algorithms and Quantum Machine Learning (QML) models. A notable feature is the conversion of PQC objects into bare Python functions, where function arguments correspond to the unbound parameters of the original circuit. This functionality is particularly beneficial for QML applications, allowing users to intuitively simulate quantum circuits with various parameter sets, enhancing the efficiency in tasks requiring multiple executions of the simulation.
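To illustrate what such symbolic evaluation amounts to, the single-qubit RY case can be written out by hand in SymPy; qiskit-symb automates this construction for arbitrary Qiskit PQCs, and the lambdify step below mirrors the package's conversion of circuits into bare Python functions.

```python
# Hand-rolled symbolic statevector of RY(theta)|0>, plus conversion into a
# fast numeric callable. Illustrative of the idea, not the qiskit-symb API.
import sympy as sp

theta = sp.Symbol("theta", real=True)

ry = sp.Matrix([[sp.cos(theta / 2), -sp.sin(theta / 2)],
                [sp.sin(theta / 2),  sp.cos(theta / 2)]])
state = sp.simplify(ry * sp.Matrix([1, 0]))
print(state)  # Matrix([[cos(theta/2)], [sin(theta/2)]])

# "Bare Python function" view: evaluate the symbolic state numerically for
# many parameter sets without re-simulating the circuit each time.
f = sp.lambdify(theta, state, "numpy")
print(f(0.7))
```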
This study aims to improve the performance of event classification in collider physics by introducing a foundation model in deep learning. Event classification is a typical problem in collider physics, where the goal is to distinguish signal events from background events as well as possible in order to search for new phenomena in nature. Although deep learning can provide significant discrimination power in this event classification by exploiting its large parameter space, a large amount of data is necessary to maximize its performance. Because there are many data analyses that target various signal events, such as Higgs boson measurements and new particle searches, generating a large amount of training data using Monte Carlo simulations is computationally expensive. To address this problem, this presentation proposes a foundation model that can efficiently train the target event classification using a small amount of training data.
A foundation model is a pre-trained model, actively discussed in other fields such as natural language processing. A foundation model is usually trained on a large amount of unlabelled data, and then transferred and fine-tuned to downstream tasks that involve common features, so that the model can be trained efficiently even with a small amount of data. In applying this foundation model concept to our field, the following novelties are introduced in this study. First, real particle collision data collected by the CMS experiment are used to train the foundation model. Second, data augmentation techniques based on physics knowledge are applied. The advantage of using real data is that there is no need to generate a large amount of training data using simulations, which saves computing resources. However, due to the limited availability of true information in the real data, we have developed self-supervised learning techniques. Data augmentation also plays an important role because the amount of data needs to be increased as much as possible to build a robust foundation model. Based on our physics knowledge, Lorentz transformations are applied to the events to increase the data patterns, as sketched below. Details of the self-supervised learning methods and data augmentations will be discussed in the presentation. The performance improvements in event classification from introducing the foundation model will also be shown.
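A minimal sketch of this kind of augmentation: boosting each event's particle four-momenta along the beam (z) axis yields new training patterns while leaving the underlying physics invariant. The (E, px, py, pz) array layout and the boost range are assumptions for illustration.

```python
# Physics-informed data augmentation: Lorentz boosts of four-momenta
# along the beam axis, applied event by event.
import numpy as np

def boost_z(p4: np.ndarray, beta: float) -> np.ndarray:
    """Boost an (N, 4) array of four-momenta (E, px, py, pz) by velocity beta."""
    gamma = 1.0 / np.sqrt(1.0 - beta ** 2)
    out = p4.copy()
    out[:, 0] = gamma * (p4[:, 0] - beta * p4[:, 3])  # boosted energy
    out[:, 3] = gamma * (p4[:, 3] - beta * p4[:, 0])  # boosted pz
    return out

rng = np.random.default_rng(0)
event = rng.normal(size=(5, 4))
event[:, 0] = np.abs(event[:, 0]) + 5.0               # keep energies positive
augmented = [boost_z(event, b) for b in rng.uniform(-0.3, 0.3, size=8)]
```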
The AI4EOSC (Artificial Intelligence for the European Open Science Cloud) project aims at contributing to the landscape of Artificial Intelligence (AI) research with a comprehensive and user-friendly suite of tools and services within the framework of the European Open Science Cloud (EOSC). This innovative platform is specifically designed to empower researchers by enabling the development, deployment, and management of advanced AI solutions. Key features of the platform include support for federated learning, which facilitates collaborative model training across distributed datasets while ensuring data privacy; zero-touch model deployment, which streamlines the transition from development to production environments; MLOps tools, which optimize the lifecycle management of AI models; model serving on serverless computing platforms, and the visual design of composite AI pipelines, which integrate multiple AI techniques for enhanced analytical capabilities.
Our presentation will provide a comprehensive exploration of the AI4EOSC platform's high-level architecture, highlighting its capacity to address the diverse and evolving needs of researchers across a wide range of scientific disciplines. By offering a robust and flexible infrastructure, the platform not only supports domain-specific customization but also fosters interdisciplinary collaboration, reflecting the ethos of the European Open Science Cloud.
We will also discuss the foundational frameworks and technologies that constitute the backbone of the platform, emphasizing their scalability, interoperability, and adherence to open science principles. These technologies enable seamless integration with existing research workflows and ensure that the platform remains accessible and sustainable for the scientific community.
To illustrate the practical utility and transformative potential of the AI4EOSC platform, we will present real-world case studies from ongoing projects, including the noteworthy contributions of the iMagine and AI4Life projects. These practical use cases exemplify the platform's effectiveness in addressing complex challenges, such as enabling cross-domain data analysis, fostering reproducible research, and accelerating the pace of scientific discovery.
By showcasing these examples, we aim to highlight the capabilities and practical utility of the AI4EOSC project, as well as its significant impact on the scientific community.
The educational needs of the future classroom centre on a combination of student engagement in learning, inquiry-based approaches, curiosity, imagination, and design thinking. Smart classrooms leverage advancements in the Internet of Things to create intelligent, interconnected learning environments that enhance students' quality of life and educational outcomes. With advancements in image recognition, face detection and recognition have become pivotal in security and tracking, ensuring swift and effective identification and management of personal data. Currently, researchers are collecting data on student attention retention in learning environments to help facilitate a holistic learning environment for teachers and students. Continued advancements in image recognition and deep learning frameworks make it possible to harness accurate visual tracking data to monitor multiple students in real time. A challenge remains in choosing the most effective face-detection algorithms for specific use cases. We propose a system that first detects students in a classroom using object detection of a person with a custom-trained model. Once a student is identified, the video feed is cropped to the bounding box of the detected individual, and face detection is applied to this region; gaze estimation is then performed on the detected face to infer attention retention during lectures, as sketched below.
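The sketch below outlines this detect-then-crop cascade with the ultralytics API. The weights files (in particular the face-detection checkpoint) and the COCO person class index are assumptions standing in for the custom-trained models described above.

```python
# Detect persons, crop each bounding box, then run face detection on the
# crop; gaze estimation would follow on each detected face.
import cv2
from ultralytics import YOLO

person_model = YOLO("yolov8n.pt")      # generic COCO detector, class 0 = person
face_model = YOLO("yolov8n-face.pt")   # assumed face-detection weights

frame = cv2.imread("classroom.jpg")
for box in person_model(frame)[0].boxes:
    if int(box.cls) != 0:              # keep only "person" detections
        continue
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    crop = frame[y1:y2, x1:x2]
    faces = face_model(crop)[0].boxes  # face detection on the cropped region
    # ...gaze estimation on each detected face would be applied here
```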
Face detection is a critical technology for understanding and measuring student engagement through facial recognition and gaze analysis. This study evaluates the performance of three versions of the YOLO (You Only Look Once) face detection algorithm—YOLOv5, YOLOv8, and YOLOv11—in a real-world classroom setting. The research investigates their efficacy in face recognition among Japanese middle school students during lectures. The setup involved two cameras: one with 3840 × 2160-pixel resolution at 30 frames per second positioned (from the students' point of view) at the front-top-left of the classroom, and another mounted at the top-right-rear, facing the lecturer. These cameras provided complementary perspectives but introduced challenges, particularly in detecting and analyzing the faces of the farthest students due to reduced image sharpness and resolution constraints.
Given the constraints of the classroom environment, including limitations on adding additional cameras to avoid distractions for both students and the lecturer, the study emphasizes the need for robust face detection algorithms capable of delivering reliable results under suboptimal imaging conditions. The algorithms were evaluated across several dimensions: detection accuracy, speed, robustness to environmental variations (e.g., lighting and occlusion), and their ability to maintain performance on distant and partially visible faces.
The student face recognition model is trained on a 64-bit Windows operating system with 32 GB RAM and an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz, with an Nvidia GeForce RTX 3080 10 GB graphics processing unit (GPU). All YOLO variants use the same dataset and keypoints in training.
The implications of this study extend to the broader application of face detection in educational settings and to analyzing gaze and student attention to aid facilitators and teachers in adjusting content delivery methods in real time. By identifying the strengths and limitations of these algorithms, educators and scientists can make informed decisions about implementing AI tools to monitor student engagement. Future work will add body position detection to provide a more comprehensive understanding of classroom attention dynamics, as well as refine the system for improved performance under similar constraints.
Environmental Computing Workshop -
Making the most of today’s systems, targeting tomorrow’s AI and hybrid quantum systems
Our traditional " Environmental Computing Workshop" this year encourages speakers to imagine future directions in IT usage, in particular considering AI and Hybrid Quantum Computing as focus topics of ISGC 2025.
We welcome presenters to show their current developments in Environmental Computing (including Green Computing) and Disaster Mitigation. Aspects of constant and increasing importance, with clear societal motivation, are support for efficient disaster management and for the Sustainable Development Goals. On the technical side, ML-based surrogate models promise to save considerable computing time and enable digital-twin-type applications, while porting models to GPU- or quantum-accelerated architectures can increase the efficiency of simulations. All of this goes hand in hand with energy-efficiency efforts.
The workshop aims at a lively exchange of knowledge on Disaster Management and Environmental Computing, giving participants a common understanding of the state of the art. The talk sessions may be complemented by a discussion, continuing the line of brainstorming sessions at the last ISGC, with the aim of sparking project collaborations.
Talks should be about 20-40min in length (depending on the speaker's preference and slot availability). All attendees giving a talk can submit a full paper to the Proceedings of the International Symposium on Grids & Clouds 2025 (ISGC 2025). The ISGC 2025 proceedings will be published in the open-access Proceedings of Science (PoS) by SISSA, the International School for Advanced Studies of Trieste.
Coordinators:
Mr. Eric Yen, Academia Sinica, TW
Dr. Stephan Hachinger, LRZ, DE
Mrs. Viktoria Pauw, LRZ, DE
Maximilian Höb, LRZ & LMU, DE
Prof. Dr. Dieter Kranzlmüller, LRZ & LMU, DE
To enhance the discovery potential of the Large Hadron Collider (LHC) at CERN in Geneva and improve the precision of Standard Model measurements, the High Luminosity LHC (HL-LHC) Project was initiated in 2010 to extend its operation by another decade and increase its luminosity by approximately tenfold beyond the design value.
In this context, the scope of applications for Machine Learning, particularly Artificial Neural Network algorithms, has expanded rapidly owing to their considerable potential for improving the efficiency and efficacy of data processing, especially for innovative trigger-level event selection in Beyond Standard Model (BSM) research. This study explores Autoencoders (AEs), unbiased algorithms that select events based on abnormality, without theoretical assumptions. However, the stringent latency and energy constraints of an HEP trigger system require tailored software development and deployment strategies. These strategies aim to optimize the utilization of on-site hardware, with a specific focus on Field-Programmable Gate Arrays (FPGAs).
This is why a technique called Knowledge Distillation (KD) is studied in this work. It consists of using a large, well-trained "teacher", like the aforementioned AE, to train a much smaller "student" model that can easily be implemented on an FPGA. The optimization of this distillation process involves exploring different aspects, such as the architecture of the student and the quantization of weights and biases, with a strategic approach that includes hyperparameter searches to find the best compromise between accuracy, latency, and hardware footprint.
The strategy followed to distill the teacher model will be presented, together with considerations on the difference in performance when applying quantization before or after the best student model has been found.
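A schematic of the distillation step under discussion: a frozen teacher autoencoder provides soft targets for a much smaller student. Dimensions, architectures, and the checkpoint name are illustrative; quantization for the FPGA would follow separately.

```python
# Knowledge distillation: train a compact student to reproduce the frozen
# teacher autoencoder's outputs. PyTorch; all sizes are illustrative.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(57, 128), nn.ReLU(), nn.Linear(128, 57))
teacher.load_state_dict(torch.load("teacher_ae.pt"))  # assumed pre-trained file
teacher.eval()

# Much smaller student: the candidate for FPGA deployment.
student = nn.Sequential(nn.Linear(57, 16), nn.ReLU(), nn.Linear(16, 57))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(batch: torch.Tensor) -> float:
    with torch.no_grad():
        target = teacher(batch)        # teacher output as the soft target
    loss = nn.functional.mse_loss(student(batch), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```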
Models of physical systems simulated on HPC clusters often produce large amounts of valuable data that need to be managed efficiently, both within the scope of research projects and continuously afterwards, to derive the most benefit for scientific advancement. Database management systems (DBMS) and metadata management can facilitate transparency and reproducibility, but are rarely used in scientific supercomputing. Reasons include organizational overhead and low performance when migrating data into a DBMS that was originally written as files on the parallel file system attached to cluster nodes.
Using first results of the Horizon Europe project EXA4MIND, the work presented here explores different approaches and system set-ups for managing plasma physics data, considering the interoperation of HPC systems/filesystems, databases and object stores. We evaluate post-processing workflows for physics simulations run on supercomputing systems at LRZ (Garching b.M./DE) in collaboration with LMU Munich's Chair of Plasma and Computational Physics. The use cases we focus upon in this contribution are simulated many-body systems of elementary particles in plasma physics, produced as outputs of the Plasma Simulation Code (Ruhl et al., Ludwig Maximilian University of Munich/DE). When conducting parameter studies with HPC applications, much work goes into post-processing, visualizing and discussing the simulated data, often several times in an iterative process. TBs of resulting data then have to be processed (e.g. aggregation of domain patches, extraction of statistical information) and evaluated on various levels – from ensembles of simulations down to single trajectories or timesteps.
Our focus includes testing the performance of typical data queries and processing steps with different execution methods. We strive to facilitate faster and more flexible access to both the raw and processed data by exploring the properties of different storage and database systems. These range from data access schemes facilitated by common Python environments, through row-based DBMS such as PostgreSQL, to column-store DBMS like MariaDB ColumnStore, where live queries on large array-based datasets can be executed in memory, or functionalities like Data Vaults can provide access to external repositories. We conclude that modern data storage concepts involving DBMS are also an optimal basis for data sharing and systematic metadata management. Thus, we aim to facilitate research data management according to the FAIR (findable, accessible, interoperable, reusable) principles from the start.
This research received the support of the EXA4MIND project, funded by the European Union's Horizon Europe Research and Innovation Programme under Grant Agreement No. 101092944. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them.
Quantum computing has gained significant attention in recent years, with numerous algorithms and applications under active development. Limited by current quantum technology, quantum noise and readout error have become critical issues. Various methods have been proposed to address readout error through error-mitigation techniques, typically involving post-processing of measurement data. However, most of these methods increase the quantum hardware overhead, leading to higher computational costs. In this work, we present a machine-learning-based approach that minimizes hardware overhead while improving computational efficiency. We employed a convolutional neural network (CNN) autoencoder, commonly used for image denoising, as our baseline model. The datasets were derived from 4-qubit random circuits with depths ranging from 1 to 18, generated using the Qiskit Aer simulator and the fake Lima backend for target and noisy measurement data, respectively. The model was trained using mean squared error (MSE) as the loss function and the Adam optimizer over 500 epochs, achieving an average noise reduction of 95% across the test set, with no signs of overfitting. To validate the model's effectiveness across diverse quantum states, we conducted extensive tests on both typical quantum circuits and algorithms, including Grover's search algorithm, the Quantum Fourier Transform, Haar-random circuits and symmetry-protected topological states. The results demonstrated consistent and robust denoising of noisy measurement data, indicating that the autoencoder model is well suited for efficient quantum error mitigation on current noisy quantum computers. For future studies, attention mechanisms offer a promising direction by accommodating variable-length data and enhancing spatial feature extraction. This work contributes to the advancement of quantum error mitigation techniques using machine learning.
Quantum computing is emerging as a groundbreaking approach for solving complex optimization problems, offering new opportunities in fields requiring both computational efficiency and innovative solution discovery. Quantum annealing, a specialized quantum computing paradigm, leverages quantum adiabatic theorem to efficiently find the global minimum of a problem's cost function, making it a promising tool for tackling combinatorial optimization tasks. In this presentation, we demonstrate the application of the D-Wave quantum annealer to a biomechanical optimization problem: estimating muscular activation patterns.
The problem involves a biomechanical system with one degree of freedom (DOF) and three muscle actuators (i.e. a simplified elbow joint). Because of muscle redundancy (more muscles than DOFs), there are virtually infinite sets of activations that enable any given motion. To select the optimal activation set, a cost function—minimizing the sum of squared activations—is typically employed, reflecting the body's natural tendency to reduce metabolic cost. During the annealing process, the system occasionally settles into energy levels different from the ground state, representing suboptimal solutions. These alternative solutions are particularly valuable, as they often correspond to muscular activation patterns observed in individuals with musculoskeletal disorders or pathologies. Such patterns are challenging to compute using classical methods but are naturally revealed through quantum annealing, providing critical insights into pathological conditions.
Using quantum annealing, we leverage its capacity to explore vast solution spaces simultaneously, enabling the identification of both optimal and suboptimal solutions in a single computation. Preliminary calculations performed on a quantum annealing simulator show promising results, demonstrating the feasibility of this approach and its potential for uncovering insights that are difficult to achieve with classical techniques. Moving forward, we plan to perform the annealing computations directly on the D-Wave quantum computer. This work highlights the advantages of quantum computing in biomechanical research, paving the way for more advanced applications in understanding muscular behaviour and related pathologies.
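As a toy illustration of such a formulation (not the project's actual model), the sum-of-squared-activations objective with a torque constraint can be expressed as a QUBO via dimod's symbolic interface and, at this size, solved exactly; the moment arms, target torque, penalty weight, and two-bit discretization are invented for the example.

```python
# QUBO sketch: three muscle activations, each discretized with two binary
# variables, minimizing the sum of squared activations with the torque
# requirement added as a quadratic penalty. dimod's ExactSolver stands in
# for a D-Wave sampler at this toy size.
import dimod

moment_arms = [1.0, 0.8, 0.6]   # illustrative
tau = 1.2                        # required joint torque (illustrative)
penalty = 10.0

# Two-bit encoding per muscle: a = (b0 + 2*b1)/3 in {0, 1/3, 2/3, 1}.
activations = []
for i in range(3):
    b0, b1 = dimod.Binary(f"m{i}_b0"), dimod.Binary(f"m{i}_b1")
    activations.append((b0 + 2 * b1) * (1 / 3))

cost = sum(a * a for a in activations)                        # sum of squares
torque_err = sum(r * a for r, a in zip(moment_arms, activations)) - tau
bqm = cost + penalty * torque_err * torque_err                # penalty term

sampleset = dimod.ExactSolver().sample(bqm)
print(sampleset.first.sample, sampleset.first.energy)
# Higher-energy entries in the sampleset correspond to the suboptimal
# activation patterns of interest discussed above.
```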
This work is conducted as part of the ICSC project, in collaboration with the University of Bologna, which provided the necessary biomechanical data.
Tracking imaging systems have progressed from manual examination to utilizing contemporary photodetectors, like SiPM arrays and CMOS cameras, to convert scintillation light into digital data and obtain physical information. This study presents RIPTIDE, a novel recoil-proton track imaging system designed for fast neutron detection, with an emphasis on the use of deep-learning methods. RIPTIDE utilizes neutron-proton elastic scattering within a plastic scintillator to produce scintillation light, creating images that document scattering occurrences. A deep neural network is employed to rectify optical distortions in proton track images, enhancing their form and alignment. This adjustment improves the precision of track length measurements, which directly affects proton energy estimation and neutron kinematics reconstruction.
Users may have difficulty finding answers in product documentation when many pages of documentation are spread across multiple web pages or email forums. We have developed and tested an AI-based tool that can help users find answers to their questions. Our product, called Docu-bot, uses a Retrieval-Augmented Generation (RAG) approach to generate answers to various questions. It uses GitHub or open GitLab repositories with documentation as a source of information. Zip files with documentation in plain-text or Markdown format can also be used as input, and the updated version of Docu-bot can also process PDF files.
An embedder model and a Large Language Model generate the answers; any models conforming to the OpenAI Python API can be used. For this reason, we have developed and deployed a custom LLM and embedder server, with the Llama-3.2-3B and all-mpnet-base-v1 models, on a single Nvidia T4 GPU. This lets users actively choose between our models and OpenAI models, and allows us to support multiple Docu-bot instances without the need for more compute power. We have experimented with and deployed setups with document reranking and a chat interface to gauge the correctness and relevancy of responses given the retrieved context and chat history.
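A condensed sketch of this retrieval-augmented loop against an OpenAI-compatible endpoint follows; the base URL, model names, and document chunks are illustrative stand-ins, not Docu-bot's actual configuration.

```python
# Minimal RAG loop: embed documentation chunks, retrieve the most similar
# ones for a question, and let the LLM answer from that context only.
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

chunks = ["...documentation passages from the repository..."]  # placeholders

def embed(texts):
    resp = client.embeddings.create(model="all-mpnet-base-v1", input=texts)
    return np.array([d.embedding for d in resp.data])

chunk_vecs = embed(chunks)

def answer(question: str, k: int = 4) -> str:
    q = embed([question])[0]
    # Cosine-similarity retrieval of the k most relevant passages.
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(chunks[i] for i in np.argsort(sims)[-k:])
    resp = client.chat.completions.create(
        model="Llama-3.2-3B",
        messages=[{"role": "system",
                   "content": f"Answer using only this documentation:\n{context}"},
                  {"role": "user", "content": question}])
    return resp.choices[0].message.content
```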
The convergence of Natural Language Processing (NLP) and cheminformatics represents a groundbreaking approach to drug development, particularly in the critical domain of toxicity prediction. Early identification of toxic compounds is paramount in pharmaceutical research, as late-stage toxicity discoveries lead to substantial financial setbacks and delayed market approval. While traditional methods often require extensive laboratory testing, the representation of chemical compounds as SMILES (Simplified Molecular Input Line Entry System) strings offers an innovative pathway for applying NLP techniques to chemical structure analysis, potentially revolutionizing the efficiency and accuracy of toxicity assessments.
Leveraging the computational resources provided by Academia Sinica Grid Computing (ASGC), this study comprehensively evaluates the efficacy of pre-trained language models for molecular toxicity prediction. We assess both transformer-based models (RoBERTa and ChemBERTa) and large language models (GPT-3.5 Turbo, GPT-4 Turbo, and GPT-4o) across three established benchmark datasets: ClinTox, Tox21, and ToxCast. Through systematic optimization, we demonstrate that while RoBERTa achieved exceptional performance on ClinTox (0.9898 F1-score), it failed to generalize effectively to other datasets. Conversely, ChemBERTa exhibited robust cross-dataset performance, maintaining strong F1-scores across all three benchmarks (ClinTox: 0.9034, Tox21: 0.8232, ToxCast: 0.8162). Notably, LLMs demonstrated remarkable adaptability in few-shot learning scenarios, with GPT-3.5 Turbo achieving near-perfect performance on ToxCast (0.9923 F1-score).
Our findings reveal the transformative potential of integrating NLP techniques into toxicity prediction workflows. The superior performance of LLMs, particularly in scenarios with limited training data, suggests a paradigm shift in computational toxicology. This research establishes a foundation for more efficient and accurate early-stage drug discovery processes, potentially accelerating the development of safer therapeutic compounds while reducing development costs.
A living systematic review is an approach that provides up-to-date evidence for a given research topic. It is extensively used in health domains due to its potential to enhance the efficiency of conventional systematic reviews. Furthermore, this approach is particularly suitable when the literature requires frequent updates, and the research needs continuous monitoring. Artificial Intelligence technologies have been developed to graphically represent literature knowledge to gain a more comprehensive understanding of the major research trends.
In this study, we propose different graphical representations of literature knowledge for a living systematic review, which we have implemented in the context of a case study on ionization cross-sections by electrons. Starting from a well-known body of literature that has been regularly updated over time, we have applied methods such as knowledge graphs to represent the accumulated knowledge and detect structural and temporal relations between entities. The findings have been assessed by experts in the field to identify the most suitable solution for our research objectives.
The rapid growth of the data available to scientists and scholars – in terms of Velocity and Variety as well as sheer Volume – is transforming research across disciplines. Increasingly these data sets are generated not just through experiments, but as a byproduct of our day-to-day digital lives. This track explores the consequences of this growth, and encourages submissions relating to two aspects in particular - firstly, the conceptual models and analytical techniques required to process data at scale; secondly, approaches and tools for managing and creating these digital assets throughout their lifecycle.
A further significant dimension is the automated generation and provisioning of metadata, whether from simulated data such as Digital Twins or from experiments that produce vast amounts of data beyond manual annotation capacity. The automation of metadata creation, and the availability of such metadata in searchable catalogues, is crucial for aligning with the FAIR Data Principles, ensuring data is findable and reusable. This process is also pivotal in making data usable for machine-driven applications, notably in AI training scenarios.
Data Management Planning within the EOSC CZ - Czech National Data Infrastructure for Research Data
Author: Jiří Marek, Open Science manager at Masaryk University, Head of EOSC CZ Secretariat, Czech Republic
The rapid expansion of data availability is reshaping research methodologies across various disciplines. This surge, characterized by its Velocity, Variety, and Volume, is driven not only by experimental data but also by the digital footprints of our daily activities. This presentation will focus on the implications of this data explosion, with a particular emphasis on the European Open Science Cloud (EOSC) and its Czech National implementation (EOSC CZ) via the Czech National Data Infrastructure.
EOSC CZ is a pivotal initiative aimed at integrating Czech research data management practices with the broader European Open Science Cloud framework. This initiative supports the principles of Open Science by providing a robust infrastructure for managing research data, ensuring that data is accessible, interoperable, and reusable. The Czech National Data Infrastructure complements this by offering a comprehensive platform for data storage, management, and sharing, tailored to meet the needs of the Czech research community[1].
Effective data management strategies are crucial for ensuring that data remains accessible, usable, and secure throughout its lifecycle. The Czech National Data Infrastructure plays a critical role in this context by providing a unified platform for data management that aligns with the FAIR Data Principles—ensuring that data is Findable, Accessible, Interoperable, and Reusable[2].
Discussions will cover best practices using the machine-actionable Data Management Plan tool Data Stewardship Wizard. This presentation will focus on the automated generation and provisioning of metadata for Data Management Planning and for the repositories of the National Repository Platform, part of the Czech National Data Infrastructure. Automation in metadata creation is not only a solution to this challenge but also a necessity for adhering to the FAIR Data Principles. We will explore automated metadata generation techniques, the integration of metadata into searchable catalogues and the Data Management Plan tool, and the role of metadata in enhancing data discoverability and usability by improving the semantic interoperability of the whole infrastructure. These advancements are particularly vital for machine-driven applications, such as AI training, where the quality and accessibility of data directly impact the outcomes[3].
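To make "machine-actionable" concrete, here is a minimal sketch of what a maDMP fragment could look like, loosely following the RDA DMP Common Standard; the field set is simplified and the values are invented placeholders, not output from Data Stewardship Wizard.

```python
# Sketch: emit a machine-actionable DMP (maDMP) fragment as JSON, loosely
# following the RDA DMP Common Standard (fields simplified; values invented).
import json
from datetime import date

madmp = {
    "dmp": {
        "title": "Example project DMP",          # placeholder title
        "created": date.today().isoformat(),
        "dataset": [
            {
                "title": "Instrument raw data",
                "personal_data": "no",
                "distribution": [
                    {
                        "title": "Open copy in national repository",
                        "access_url": "https://repository.example.cz/ds/123",  # hypothetical
                        "license": [{"license_ref": "https://creativecommons.org/licenses/by/4.0/"}],
                    }
                ],
            }
        ],
    }
}

# Serialized maDMPs like this can be exchanged between planning tools and
# repository catalogues without manual re-entry of metadata.
print(json.dumps(madmp, indent=2))
```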
In summary, this presentation at ISGC 2025 aims to address the multifaceted challenges posed by the rapid growth of data. By fostering discussions on advanced data processing models, effective data management practices, and automated metadata generation, we seek to equip researchers with the tools and knowledge necessary to navigate the complexities of big research data. Participants will gain insights into cutting-edge techniques and strategies that can transform data into valuable assets, driving innovation and discovery in the digital age. The integration of EOSC CZ and the Czech National Data Infrastructure into this framework underscores the importance of collaborative efforts in advancing open science and data stewardship in Europe[4].
References
[1] www.eosc.cz/en
[2] www.eosc.eu
[3] www.eosc.cz/en
[4] msmt.gov.cz/vyzkum-a-vyvoj-2/eosc-obecne
With Run 2025 for sPHENIX come higher data throughput and data volume requirements. The sustained data throughput required for sPHENIX 2025 is 20 GB/s. Once started in mid-April, this sustained data stream will be steadily constant, with no breaks, through December. The projected data volume is 200 PB.
In order to meet these data throughput and volume requirements, we must rebuild our data storage archive systems:
1. Data movers.
a. Replace data movers with new PCIe-4 architecture and FC HBAs. Increase the number of data movers from four to nine servers.
2. Ethernet connections.
a. Replace NIC adapters with dual 100GbE PCIe-4.
3. Disk arrays.
a. With the NAND options too expensive, we decided to stay with HDD spindles.
b. Increase disk arrays from three to nine units.
c. Increase each disk array's multipath connections from two to four channels.
4. OS I/O-related parameter tunings.
5. Upgrade HPSS software to resolve occasional hanging processes on movers.
6. Purchase two additional 9-frame IBM tape libraries.
a. To lower costs, we decided on LTO-9 technology instead of enterprise tape technologies.
b. Two 9-frame tape libraries are needed to meet the data volume requirements.
c. To make the most use of all tape drives, we decided to stripe injected data evenly across four tape libraries with 100 LTO-9 drives (25 drives in each library).
7. Benchmark testing (see the scaling sketch after this list).
a. We ran benchmark testing on the existing 3 disk arrays with 4 movers and 56 tape drives.
b. With an ongoing injection of 8.5 GB/s, the concurrent migration to tape is 11 GB/s.
c. With an ongoing injection of 12 GB/s, the concurrent migration to tape is 9.9 GB/s.
d. The CPU usage on each mover is at 80%.
e. These benchmark numbers give us confidence that our new configuration will comfortably meet the 20 GB/s sustained throughput.
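As a back-of-the-envelope check, the following uses only the numbers quoted above plus a naive assumption that throughput scales linearly with mover count; actual HPSS scaling is more complex, so this is illustrative, not a performance model.

```python
# Back-of-the-envelope check of the mover upgrade, using only the benchmark
# numbers above and a naive linear-scaling assumption.
old_movers, new_movers = 4, 9
target_gbps = 20.0                      # required sustained injection, GB/s

# Measured on the old 4-mover setup:
injection_measured = 12.0               # GB/s ongoing injection
migration_measured = 9.9                # GB/s concurrent migration to tape

scale = new_movers / old_movers         # 2.25x more movers
print(f"Naive injection capacity: {injection_measured * scale:.1f} GB/s")  # ~27.0
print(f"Naive migration capacity: {migration_measured * scale:.1f} GB/s")  # ~22.3
print(f"Meets 20 GB/s target: {injection_measured * scale >= target_gbps}")
```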
Conclusion:
With the new designs and fine-tunings, we have identified a solution to the sPHENIX 2025 data requirements.
The 14 beamlines of phase I of the High Energy Photon Source (HEPS) will produce more than 300 PB/year of raw data. Efficiently storing, analyzing, and sharing this huge amount of data presents a significant challenge for HEPS.
The HEPS Computing and Communication System (HEPSCC), also called the HEPS Computing Center, is the work group responsible for IT R&D and services for the facility, including IT infrastructure, networking, computing, analysis software, data preservation and management, public services, etc. To address the significant challenge of this large data volume, HEPSCC has designed and established a network and computing system, making great progress over the past two years.
For the IT infrastructure, a dedicated, high-standard machine room, with about 900 ㎡ of floor space for more than 120 high-density racks in total, has been ready for production since this August. The network design uses RoCE technology and a spine-leaf architecture. The data center network's bandwidth supports speeds of up to 100 Gb/s, fully meeting the demands of high-speed data exchange. To meet the requirements of HEPS data analysis scenarios, the computing architecture is designed and deployed in three layers: OpenStack, Kubernetes, and Slurm. OpenStack integrates a virtual cloud desktop protocol to provide users with remote desktop access, allowing them to use a browser to reach Windows/Linux desktops running commercial visualization and data analysis software. Kubernetes manages container clusters and starts multiple methodological container images according to user analysis requirements. Slurm supports HPC computing services and meets users' offline data analysis needs.
Additionally, HEPSCC designed and developed two software systems for data management and analysis, DOMAS and Daisy. DOMAS (Data Organization, Management and Accessing Software stack), which aims to automate the organization, transfer, storage, distribution, and sharing of scientific data for HEPS experiments, provides features and functions for a metadata catalogue, metadata ingestor, data transfer, and data web portal. Daisy (Data Analysis Integrated Software System) is a data analysis software framework with a highly modular C++/Python architecture. Several online data analysis algorithms developed by HEPS beamlines have been successfully integrated into Daisy, most of which were validated at the beamlines of BSRF (Beijing Synchrotron Radiation Facility) for real-time data processing. Other data analysis algorithms and software will be continuously integrated into the framework in the future.
This year, the data and computing system has been deployed at HEPS Campus (Huairou District, Beijing). The integration and the verification of the whole system at HEPS were finished and achieved great success.
The High Energy Photon Source (HEPS) is a new fourth-generation high-energy synchrotron radiation facility, scheduled to become fully operational by the end of 2025. Compared to previous generations, it features significant advancements in brightness and detector performance. In its phase I, HEPS plans to construct 14 beamlines, with an estimated annual experimental data volume exceeding 300 PB. The total data scale is expected to surpass the EB level in a short period. HEPS supports a wide range of experimental techniques, including imaging, diffraction, scattering, and spectroscopy, each with significant differences in data throughput and scale. Meanwhile, the emergence of increasingly complex experimental methods poses unprecedented challenges for data processing.
To address the future EB-scale experimental data processing demands of HEPS, we have developed DAISY (Data Analysis Integrated Software System), a general scientific data processing software framework. DAISY is designed to advance integration, standardization, and high-performance in HEPS experimental data processing. It provides key capabilities, including high-throughput data I/O, multimodal data parsing, and multi-source data access. It supports elastic and distributed heterogeneous computing to accommodate different scales, throughput levels, and low-latency data processing requirements. It also offers a general workflow orchestration system to flexibly adapt to various experimental data processing modes. Additionally, it provides user software integration interfaces and a development environment to facilitate the standardization and integration of methodological algorithms and software across multiple disciplines.
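To give a flavour of the kind of user software integration interface such a framework provides, here is a hypothetical sketch of registering a beamline algorithm with a modular workflow system. All class and function names below are invented for illustration; they do not reflect DAISY's real API.

```python
# Hypothetical sketch of algorithm registration in a modular analysis
# framework like DAISY (all names invented; not DAISY's real API).
from typing import Callable, Dict

import numpy as np

class WorkflowRegistry:
    """Toy registry mapping algorithm names to callables."""
    def __init__(self):
        self._algorithms: Dict[str, Callable] = {}

    def register(self, name: str):
        def wrap(func: Callable) -> Callable:
            self._algorithms[name] = func
            return func
        return wrap

    def run(self, name: str, data: np.ndarray) -> np.ndarray:
        return self._algorithms[name](data)

registry = WorkflowRegistry()

@registry.register("flat_field_correction")
def flat_field(frame: np.ndarray) -> np.ndarray:
    # Toy stand-in for an imaging-beamline preprocessing step.
    return frame / frame.mean()

corrected = registry.run("flat_field_correction", np.random.rand(64, 64))
```

A registry of this shape lets domain algorithms be integrated and orchestrated uniformly, which is the standardization goal the abstract describes.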
Based on the DAISY framework, we have developed multiple domain-specific scientific applications, covering imaging, diffraction, scattering and spectroscopy, while continuously expanding to more scientific domains. Furthermore, we have optimized key software components and algorithms to significantly improve data processing efficiency. At present, several DAISY-based scientific applications have been successfully deployed on HEPS beamlines, supporting online data processing for users. The remaining applications are scheduled for full deployment within the year, further strengthening HEPS's data analysis capabilities.
CLUEstering is a versatile clustering library based on CLUE, a density-based, weighted clustering algorithm optimized for high-performance computing. The library offers a user-friendly Python interface and a C++ backend to maximize performance. CLUE’s parallel design is tailored to exploit modern hardware accelerators, enabling it to process large-scale datasets with exceptional scalability and speed.
To ensure performance portability across diverse architectures, the backend is implemented using the Alpaka library, a C++ performance portability library that enables near-native performance on a wide range of accelerators with minimal code duplication.
CLUEstering's unique combination of density-based and weighted clustering makes it a standout among popular clustering algorithms, many of which lack built-in support for such a combination. This hybrid approach unlocks new possibilities for applications in fields such as high-energy physics, image processing, and complex system analysis.
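To illustrate the density-based, weighted clustering idea CLUE builds on (weighted local density, then seed selection by distance to the nearest higher-density point), here is a didactic pure-NumPy sketch. It is not CLUEstering's API, and the library's parallel Alpaka backend works quite differently; parameter names and the simplified outlier rule are assumptions.

```python
# Didactic sketch of CLUE-style weighted, density-based clustering in NumPy.
import numpy as np

def clue_like(points, weights, dc=0.5, rhoc=1.0, dm=1.0):
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    rho = (weights[None, :] * (d <= dc)).sum(axis=1)      # weighted local density
    # Distance to (and index of) the nearest point with strictly higher density.
    delta = np.full(n, np.inf)
    nearest_higher = np.full(n, -1)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            j = higher[np.argmin(d[i, higher])]
            delta[i], nearest_higher[i] = d[i, j], j
    labels = np.full(n, -1)
    seeds = np.where((rho >= rhoc) & (delta > dm))[0]     # cluster seeds
    labels[seeds] = np.arange(seeds.size)
    # Followers inherit the label of their nearest higher-density neighbour.
    for i in np.argsort(-rho):
        if labels[i] == -1 and nearest_higher[i] != -1 and d[i, nearest_higher[i]] <= dm:
            labels[i] = labels[nearest_higher[i]]
    return labels  # -1 marks outliers

pts = np.concatenate([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
print(clue_like(pts, weights=np.ones(100)))
```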
Environmental Computing Workshop -
Making the most of today’s systems, targeting tomorrow’s AI and hybrid quantum systems
Our traditional "Environmental Computing Workshop" this year encourages speakers to imagine future directions in IT usage, in particular considering AI and Hybrid Quantum Computing, the focus topics of ISGC 2025.
We welcome presenters showing their current developments in Environmental Computing (including Green Computing) and Disaster Mitigation. Of constant and increasing importance, with societal motivation, is support for efficient disaster management and for the Sustainable Development Goals. On the technical side, ML-based surrogate models promise to save considerable computing time and to enable digital-twin-type applications, while porting models to GPU- or quantum-accelerated architectures should increase the efficiency of simulations. All of this goes along with energy-efficiency efforts.
The workshop aims at a vivid exchange of knowledge on Disaster Management and Environmental Computing, giving participants a common understanding of the state of the art. The talk sessions may be complemented by a discussion, continuing the line of the brainstorming sessions at the last ISGC with the aim of sparking project collaborations.
Talks should be about 20-40min in length (depending on the speaker's preference and slot availability). All attendees giving a talk can submit a full paper to the Proceedings of the International Symposium on Grids & Clouds 2025 (ISGC 2025). The ISGC 2025 proceedings will be published in the open-access Proceedings of Science (PoS) by SISSA, the International School for Advanced Studies of Trieste.
Coordinators:
Mr. Eric Yen, Academia Sinica, TW
Dr. Stephan Hachinger, LRZ, DE
Mrs. Viktoria Pauw, LRZ, DE
Maximilian Höb, LRZ & LMU, DE
Prof. Dr. Dieter Kranzlmüller, LRZ & LMU, DE
Banquet 大會晚宴
18:30, 20 March 2025
HUMBLE HOUSE TAIPEI, Curio Collection by Hilton (The Orchid Room, 5F)
No.18, Songgao Rd., Xinyi Dist., Taipei
T: +886-2-6631-8000
台北艾麗 希爾頓格芮精選酒店 (5樓 蘭廳)
台北市信義區松高路18號
Recently, deepfake pornography in South Korea gained attention after unconfirmed lists of schools with victims spread online in August this year. Many girls and women hastily removed photos and videos from their Instagram, Facebook and other social media accounts. Thousands of young women have staged protests demanding stronger action against deepfake pornography, and politicians, academics and activists have held forums.
Hence, South Korean lawmakers recently passed a bill that criminalizes possessing or watching sexually explicit deepfake images and videos, with penalties set to include prison terms and fines.
However, photoshopping and deepfake techniques are freely available in cyberspace, and many politicians have already been the subject of deepfakes. The techniques themselves therefore cannot be the target of regulation; rather, as IT practitioners argue, it is their misuse and malicious use that should be strictly criminalized.
In these discussions between policy makers and IT people, AI literacy matters. A good knowledge of AI is a double-edged sword: it can be used in cybercrimes such as deepfakes and, at the same time, it can help to detect and prevent them.
In this talk, the current status of these developments in South Korea will be reported.
Integrating Artificial Intelligence in digital humanities has created unprecedented opportunities for analyzing historical archives. Building upon established work with Learning-as-a-Service solutions for the Maryland State Archives' Legacy of Slavery collections, specifically the Domestic Traffic Advertisements dataset, this research proposes an innovative approach combining Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) with Large Language Models to analyze three strategically chosen collections: Certificates of Freedom, Domestic Traffic Advertisements, and Manumissions. These collections were selected for their intricate historical relationships: Certificates of Freedom provide legal proof of an enslaved individual's free status, often referencing prior Manumission records, while Domestic Traffic Advertisements offer crucial contextual information about slave trading patterns that frequently preceded manumission or freedom certification.
The project introduces a novel architecture that enhances traditional RAG systems by incorporating knowledge graphs to capture complex relationships and temporal-spatial connections between these historically linked collections. Unlike traditional RAG systems, this knowledge-graph-enhanced approach enables natural language interactions for archive patrons and researchers, allowing them to explore complex historical narratives through intuitive conversations rather than traditional database queries. The system employs a three-layer approach: a knowledge graph layer mapping relationships between entities across collections using Neo4j, a RAG layer augmented with knowledge graph embeddings for contextual retrieval, and an LLM layer for natural language interaction and insight generation. This approach transforms how users discover connections across collections, trace individual histories, and uncover previously hidden relationships in the archives; the natural language interface improves accessibility by eliminating the need for specialized database knowledge or familiarity with archival organization systems.
The research advances AI-enabled scientific workflows through specialized prompt engineering patterns for cross-collection analysis and custom embedding techniques for historical document representation. This approach improves the trustworthiness of AI responses by grounding them in verified historical relationships, while enhancing the accuracy of cross-collection insights by leveraging the knowledge graph's ability to capture complex historical narratives and relationships. The system demonstrates how AI can democratize access to complex historical archives while maintaining the integrity and context of sensitive cultural materials. This research contributes to ISGC 2025 Track 10's focus on AI-enabled scientific workflows and novel approaches in scientific applications adopting machine learning techniques, while advancing the state of the art in knowledge-graph-enhanced RAG systems for digital humanities research.
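As a concrete illustration of the knowledge graph layer feeding the RAG layer, here is a minimal sketch of a graph-retrieval step. The Cypher schema (Person, MENTIONED_IN, Document) and the connection details are invented for illustration; they are not the project's actual data model.

```python
# Sketch of the knowledge-graph layer in a KG-RAG pipeline: fetch entities
# related to a person across collections, then hand them to the RAG layer as
# context (graph schema and credentials are invented for illustration).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

CYPHER = """
MATCH (p:Person {name: $name})-[r:MENTIONED_IN]->(d:Document)
RETURN d.collection AS collection, d.text AS text, r.year AS year
ORDER BY r.year
"""

def graph_context(name: str) -> str:
    """Collect cross-collection mentions of a person as LLM context."""
    with driver.session() as session:
        records = session.run(CYPHER, name=name)
        return "\n".join(
            f"[{rec['collection']}, {rec['year']}] {rec['text']}" for rec in records
        )

# The returned string is prepended to the user's question before the LLM call,
# grounding the answer in verified graph relationships.
print(graph_context("John Smith"))
```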
The emergence and integration of AI tools like ChatGPT into educational contexts has ignited heated debates, especially concerning their dual role as both a powerful teaching assistant and a potential tool for dishonesty. On one side, they may hold potential as a pedagogical aid for instructors and as a source of efficiency for students (one might positively nickname this "TeachGPT"). On the other side, they may encourage lazy adoption by teachers preparing thoughtless assessments, and unfair use by students to complete assignments or circumvent academic integrity in examinations (nickname: "CheatGPT"). This study seeks to navigate this dichotomy by examining the impact of ChatGPT on assessment design and student outcomes in three university courses (Software and computing for subnuclear physics, Applied machine learning, Quantum machine learning) in which the author has served as instructor for many years, collecting a vast dataset of questions and scores.
Based on a relatively large database of instructor-created questions, from which multiple-choice tests can be generated and administered to students, the assessment experience collected over the years with real students has recently been complemented with fake exam results generated by collecting ChatGPT answers to the very same questions, thus simulating dozens of "fake" academic years and related assessments. The performance comparison between real students and ChatGPT-based fake students reveals intriguing quantitative patterns, shedding light on 1) how AI matches or diverges from human responses across various scenarios; 2) the implications for future exam design, mainly focusing on strategies to mitigate AI misuse while leveraging its strengths for improved learning outcomes; and 3) the feasibility of designing "ChatGPT-proof" tests, where the AI itself contributes to the creation of assessments resistant to its own capabilities. The study highlights the transformative potential of generative AI in reshaping educational practices and redefining the boundaries of teaching and evaluation.
The rapid advancement of Large Language Models (LLMs) has opened new avenues for accelerating data processing and analysis in high-energy physics (HEP). In this presentation, we introduce the Dr.Sai AI Agents System, a cutting-edge intelligent agent designed specifically for the BESIII experiment. This system leverages the power of LLMs to streamline complex tasks in particle physics analysis, from data interpretation to hypothesis generation, significantly reducing the time required to achieve meaningful physics results.
We will discuss the architecture of the AI agent, including its "brain" for logical reasoning, "sensors" for data intake, "memory" for knowledge storage, "actuators" for task execution, and "learning systems" for continuous improvement. The system's application in hadron spectroscopy studies will be highlighted as a prime example of its potential to boost experimental efficiency and drive new discoveries.
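For intuition, the following is a highly simplified sketch of the perceive-reason-act loop such an architecture implies. The component names mirror the abstract's metaphors, but the classes and the stand-in LLM call are invented placeholders rather than Dr.Sai's actual design.

```python
# Toy perceive-reason-act loop mirroring the "sensors / brain / memory /
# actuators" decomposition (invented placeholders, not the Dr.Sai implementation).
class Memory:
    def __init__(self):
        self.events = []
    def store(self, item):
        self.events.append(item)
    def recall(self, k=5):
        return self.events[-k:]

def brain(observation, context):
    """Stand-in for an LLM reasoning call returning the next analysis step."""
    return f"plan: fit invariant-mass spectrum for {observation!r} using {len(context)} past steps"

def sensors():
    """Stand-in for data intake, e.g. reading a reconstructed event sample."""
    return "new BESIII candidate sample"

def actuators(plan):
    """Stand-in for task execution, e.g. launching an analysis job."""
    print("executing:", plan)

memory = Memory()
for _ in range(3):                      # continuous-improvement loop
    obs = sensors()
    plan = brain(obs, memory.recall())
    actuators(plan)
    memory.store((obs, plan))           # learning system: accumulate experience
```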
Additionally, we will explore the broader vision of developing a "scientific large model" capable of directly processing particle physics data and the roadmap toward creating an AI scientist. This talk will provide an overview of recent progress, technical considerations, and future directions in harnessing AI to redefine particle physics analysis.
Networking and the connected e-Infrastructures are becoming ubiquitous. Ensuring the smooth operation and integrity of services for research communities in a rapidly changing environment is a key challenge. This track focuses on the current state of the art and recent advances in these areas: networking, infrastructure, operations, security and identity management. The scope of this track includes advances in high-performance networking (software defined networks, community private networks, the IPv4 to IPv6 transition, cross-domain provisioning), the connected data and compute infrastructures (storage and compute systems architectures, improving service and site reliability, interoperability between infrastructures, data centre models), monitoring tools and metrics, service management (ITIL and SLAs), and infrastructure/systems operations and management. Also included here are issues related to the integrity, reliability, and security of services and data: developments in security middleware, operational security, security policy, federated identity management, and community management. Submissions related to the general theme of the conference are particularly welcome.
The Authentication and Authorisation for Research Collaboration (AARC) Blueprint Architecture has been a foundational framework for authentication and authorisation infrastructures (AAIs) in global research. It supports the European Open Science Cloud (EOSC), national research AAIs, and cross-regional e-infrastructures, offering a unified approach to federated identity management. As the scope and complexity of federated identity models grow, the AARC framework is undergoing critical updates through the AARC-TREE (Technical Revision to Enhance Effectiveness) project, with continuous support from the GÉANT projects (currently GN5-2) through the Trust & Identity Enabling Communities task.
This presentation focuses on the current outcomes, particularly the policy advancements that address the evolving needs of research communities and infrastructures in federated environments with respect to alignment and policy harmonisation. The policy work group has worked on:
Composite-Proxy Scenarios: Development of distinct guidelines, policies, good practices, and procedures to enhance trust, security, and operational interaction in scenarios involving multiple proxies, advancing beyond the traditional AARC Blueprint Architecture.
Enhanced User Experience: Introducing reference models for acceptable use policies and privacy notice collection to improve cross-infrastructure user interaction, minimizing redundant actions (e.g., requiring users to "click once").
Updating the AARC Policy Development Kit (PDK) to take into account the experiences of e-infrastructures using the original version, to simplify it, and to meet the requirements of a broader range of e-infrastructures.
Target Audience:
This presentation is directed at research communities and infrastructure providers offering services in federated environments.
Aims of the presentation:
CC-IN2P3, the French Tier-1 for WLCG, has recently equipped itself with a dedicated in-house documentation tool, DIPLO, based on an open-source web-based solution. The centralization of information and a single point of entry have been key in this cross-organizational approach. Starting from the initial situation report, we present the vision and the specifications that led to the implementation of the internal documentation project, a.k.a. DIPLO, based on the BookStack open-source platform.
All pre-existing documentation was centralized there by March 2024. Guidelines and best practices had been issued well before DIPLO reached production status. At the same time, the ecosystem of tools previously used for documentation purposes has been simplified, which in turn has enabled some of these tools to naturally re-focus on their core business.
The current phase deals, on the one hand, with consolidating the content accumulated since March 2024, to ensure the sustainability of the technical documentation within the new ecosystem, and, on the other hand, with the ongoing adoption of DIPLO for operational information sharing throughout the organization, i.e. IT and non-IT departments such as technical & logistics facilities and administration.
Side effects of this project set-up on the organization, challenges and lessons learned up to now will be addressed.
In IoT (Internet of Things) systems consisting of IoT devices, edges, and cloud servers, various sensor data obtained from IoT devices are expected to be collected, accumulated, and utilized to solve social issues using Artificial Intelligence. However, due to the increasing sophistication and intensity of cyber attacks, security measures for IoT systems consisting of large numbers of remotely deployed IoT devices have become an issue. The idea of "zero trust" has attracted attention as a security approach, and we are considering applying it to IoT systems. Zero trust requires identifying the resources that need to be protected, such as computers and data, monitoring and analyzing access to them, and taking measures to prevent the spread of damage when a security breach is discovered. In applying zero trust to IoT systems, we monitor the software execution status and communication status of many IoT devices, and aim to prevent the spread of damage to other IoT devices and servers by revoking the host certificate of an IoT device where a problem has occurred. Issuing host certificates to IoT devices needs to be done without human intervention, so automatic issuance and automatic revocation of certificates in cooperation with a certificate authority are essential.
In this paper, we consider the application of the ACME (Automatic Certificate Management Environment) protocol to certificate lifecycle management in IoT systems. The standard model, in which the certificate subscriber itself requests revocation, cannot be applied directly to IoT scenarios. It is therefore necessary to identify and authenticate entities that have the authority to revoke the host certificate of an IoT device and to revoke it on behalf of the device. We consider several designs that meet this requirement, particularly using the ACME protocol, and discuss how the proposed system solves these issues and its feasibility in a zero trust IoT system.
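The following is a schematic sketch of third-party revocation in the spirit of RFC 8555, where a monitoring service holding the ACME account key revokes a compromised device's certificate on its behalf. The JWS helper and the CA URL are placeholders; a real client must also handle nonces, directory discovery, and error responses.

```python
# Schematic third-party revocation in the spirit of RFC 8555 (sign_jws and
# the URLs are placeholders; this is not a complete ACME client).
import base64
import requests
from cryptography import x509
from cryptography.hazmat.primitives.serialization import Encoding

ACME_REVOKE_URL = "https://ca.example.org/acme/revoke-cert"  # hypothetical CA

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jws(payload: dict, url: str, nonce: str) -> dict:
    """Placeholder: sign payload with the monitoring service's ACME account key."""
    raise NotImplementedError("use a real JOSE/ACME library here")

def revoke_device_cert(pem_path: str, nonce: str):
    """Revoke an IoT device's host certificate on its behalf."""
    cert = x509.load_pem_x509_certificate(open(pem_path, "rb").read())
    payload = {
        "certificate": b64url(cert.public_bytes(Encoding.DER)),
        "reason": 1,  # RFC 5280 reasonCode 1 = keyCompromise
    }
    jws = sign_jws(payload, ACME_REVOKE_URL, nonce)
    return requests.post(ACME_REVOKE_URL, json=jws,
                         headers={"Content-Type": "application/jose+json"})
```

The key design point is that the revocation request is signed with the account key of an authorized monitoring entity, not the device's own key, matching the delegated-revocation requirement discussed above.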
The Account LInking SErvice (ALISE) implements the concept of site-local account linking. A user logs in with one local account and with any number of supported external accounts (e.g. Helmholtz-ID and Google). The local account is the one at an HPC centre and includes the Unix user name.
Federated services can use this information whenever they need to map a federated identity to a local Unix account at a computer centre. One example is HTTP/WebDAV file access: WebDAV supports Basic Authentication, inside which an OIDC access token can be transported to convey the federated user's identity. The server needs to store uploaded data under a specific account name, so that the same user can later access the uploaded data from, e.g., computing jobs on that same server. ALISE may be used to ask users to link their federated identity to a local one, so that the WebDAV server can find the user's corresponding local Unix ID.
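As a hypothetical sketch, this is the kind of lookup a WebDAV service might perform against an ALISE-like mapping service; the endpoint and response format are invented for illustration, not ALISE's documented REST API.

```python
# Hypothetical lookup against an ALISE-like account-linking service to map a
# federated identity to a local Unix account (endpoint and response shape are
# invented, not ALISE's documented API).
import requests

ALISE_URL = "https://alise.example.org/api/mapping"   # hypothetical endpoint

def local_username(issuer: str, subject: str) -> str:
    """Map an OIDC (issuer, subject) pair to the linked local Unix user."""
    resp = requests.get(ALISE_URL, params={"iss": issuer, "sub": subject},
                        timeout=10)
    resp.raise_for_status()
    return resp.json()["username"]        # e.g. "di12abc" on the HPC system

# A WebDAV server would call this after validating the OIDC access token
# received via Basic Authentication, then store uploads under that account.
user = local_username("https://login.helmholtz.de/oauth2", "1234-abcd")
```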
Quantum technologies represent a transformative leap in science but also in business. In recent years, quantum computing has evolved from a niche research area to a highly competitive technological frontier, with exponential growth described by Dowling's and Neven's Laws (the "Moore's Law" of quantum computing) illustrating rapid advancements in processing power.
Significant global investments and efforts are driving this evolution. Major companies like IBM, which runs a vast quantum network with over 410,000 users and numerous industrial collaborations, and start-ups are leading the development of scalable quantum solutions. Countries such as China and the USA are vying for leadership, with China building robust quantum infrastructure and making strategic advancements in an attempt to dominate the field; European Centers of Excellence are also developing.
The quantum market is now seen as a strategic sector, with billions of dollars in global investments. This burgeoning ecosystem is supported by established corporations, innovative start-ups, and academic partnerships. The applications of quantum technologies are vast, spanning areas like quantum-enhanced machine learning, cybersecurity, optimization, communication and sensing. However, it also introduces risks, particularly to current encryption systems, including blockchain security, necessitating a shift toward quantum-safe encryption methods.
The rise of quantum technologies highlights the urgent need for a specialized workforce. Interdisciplinary education and training programs are essential to prepare talent capable of combining expertise in physics, computer science, and industry-specific applications. Academia and industry must collaborate to design new training programs and curricula to support this rapidly expanding field.
Efforts to standardize quantum technologies are also crucial for market development. The intellectual property landscape in quantum technologies is expanding, with growing interest in patenting aimed at commercialization. This presentation underscores the transformative potential of quantum technologies across industries while stressing the need for global collaboration to address their opportunities and challenges effectively.
Biography:
Prof. Dr. Michael Dowling was named to the Professorship for Innovation and Technology Management at the University of Regensburg effective July 1, 1996 and retired from this position in April 2024. Previously he had been an Assistant Professor and Associate Professor with tenure at the University of Georgia, USA.
Prof. Dowling was born in 1958 in New York, USA. He studied at the University of Texas at Austin (Bachelor of Arts in Chemistry with High Honors), Harvard University (Master of Science in Management and Public Policy) and the University of Texas at Austin (Doctor of Philosophy in Business Administration). He has worked at the International Institute for Applied Systems Analysis in Laxenburg, Austria and with McKinsey & Company in Düsseldorf, Germany.
Since 2014 he has been the Chairman of the Board of the MÜNCHNER KREIS, a non-profit supra-national association dedicated to the impact of digitalization on business and society. http://www.muenchner-kreis.de/
Prof. Dowling was elected a member of acatech – the German National Academy of Science and Engineering in 2015.
His research interests include the strategic management of technology, high technology entrepreneurship, and the relationships between technology, public policy and economic development.
Abstract:
Launched in 2020, the SYNAPSE consortium unites synchrotron facilities across Asia-Pacific, Europe, and the Middle East to tackle one of the most daunting challenges in modern science: mapping neural connectomes at sub-micrometer resolution. This borderless collaboration leverages calibrated synchrotron microtomography beamlines to generate 3D brain datasets, each exceeding 1 exabyte at (0.3 μm)³ resolution. Scaling this effort to encompass hundreds of datasets—and eventually whole-body neural circuits—demands innovative solutions at the intersection of big data, distributed computing, and AI.
Central to our technical roadmap is the integration of grid and cloud infrastructures to manage petabyte-scale data streams, enabling real-time processing and global resource sharing. Advanced machine learning pipelines automate synapse detection and network tracing, reducing computational bottlenecks inherent to exabyte-level analyses. Meanwhile, federated storage architectures ensure secure, efficient data access across 10 or more synchrotron facilities, fostering collaborative breakthroughs without borders.
This talk will highlight milestones achieved by the SYNAPSE consortium since its inauguration in merging cutting-edge imaging with scalable computation, including AI-driven annotation tools and adaptive workflows optimized for high-performance computing environments. As we push toward the century-long goal of completing the human connectome, the consortium exemplifies how global scientific infrastructure—paired with advancements in grids, clouds, and AI—can transform impossibly vast datasets into actionable insights. For the ISGC community, SYNAPSE offers a compelling blueprint for solving grand challenges through distributed innovation.
Professional Biography:
Yeukuang Hwu is the CEO of the SYNAPSE consortium (www.synapse-axon.org), leading a global initiative to map large animal and human brain connectomes using cutting-edge X-ray imaging technologies. He is also a Distinguished Research Fellow at Academia Sinica and holds adjunct professorships at several prestigious universities and research institutions. With over three decades of expertise in synchrotron-based research, Professor Hwu has made pioneering contributions to phase-contrast microradiology and nanotomography, significantly advancing X-ray imaging applications in the life sciences, materials science, and many other areas.
Specifically, using the zone-plate X-ray optics developed by his laboratory, his team went on to set a world record in 2007, achieving 15 nm resolution with multi-keV X-rays. This positioned him at the forefront of X-ray microscopy, opening new frontiers in nanoscience and subcellular imaging.
Professor Hwu is an elected Fellow of the Chinese Physics Society and the recipient of numerous prestigious awards, including the National Research Council Distinguished Research Award and the Taiwan-France Scientific Award. He has delivered more than 100 invited talks at international conferences, holds 12 international patents, and has published over 315 refereed journal articles, garnering more than 10,000 citations.