International Symposium on Grids & Clouds (ISGC) 2024

Asia/Taipei
BHSS, Academia Sinica

Eric YEN (ASGC) , Ludek Matyska (CESNET) , Yuan-Hann Chang (Institute of Physics, Academia Sinica)
Contact
    • Security Workshop Room 2

      BHSS, Academia Sinica

    • 10:30
      Coffee Break
    • Security Workshop Room 2

      BHSS, Academia Sinica

      • 5
        eduGAIN Tabletop Exercise
    • 12:30
      Lunch 4F Recreation Hall

      BHSS, Academia Sinica

    • Education Informatics Workshop Room 1

      BHSS, Academia Sinica

      • 6
        General overview of the education paradigm
        Speakers: Tosh Yamamoto (Kansai University) , Yasuhiro HAYASHI (Musashino University)
      • 7
        Innovative approach of K-12 STEAM cases
        Speaker: Jerry Tsai (National Central University)
      • 8
        AI-enhanced Writing for a Freshman English Course at SCU
        Speaker: Peggy Tsai (Soochow University)
    • Security Workshop Room 2

      BHSS, Academia Sinica

    • 15:30
      Coffee Break
    • Education Informatics Workshop Room 1

      BHSS, Academia Sinica

      • 11
        Project-Based Learning: Authentic Teaching & Learning Applied to a Tourism Course
        Speaker: RuShan Chen (Chihlee University of Technology)
      • 12
        Educational use of AI in the future of education
        Speaker: Chris Peng
      • 13
        A Context-Based Learning Environment Using Cyber-Physical System For Contribution Degree Calculation
        Speaker: Yasuhiro HAYASHI (Musashino University)
    • Security Workshop Room 2

      BHSS, Academia Sinica

      • 14
        CTF (Continued)
    • Opening Ceremony & Keynote Speech I Auditorium

      BHSS, Academia Sinica

      Convener: Ludek Matyska (CESNET)
      • 15
        Opening Remarks
        Speaker: Ludek Matyska (CESNET)
      • 16
        Building a superconducting quantum computer

        Quantum computers offer the potential to address complex problems that exceed the capabilities of current high-performance computers. Although fault-tolerant quantum computers have yet to be realized, numerous countries are actively engaged in their development. Among the various platforms under consideration, superconducting quantum computers utilizing Josephson junction qubits emerge as particularly promising candidates. In this presentation, I will delve into the fundamental principles of quantum gates and elucidate the construction process of a 5-qubit superconducting quantum computer. Additionally, I would like to discuss an open topic regarding the future integration of QPU with diverse data processing technologies, involving CPU, GPU and TPU, to expand the landscape of our current problem-solving capabilities.

        Speaker: ChiiDong CHEN (Academia Sinica)
      • 17
        Alps: cloud-native HPC at the Swiss National Supercomputing Centre

        The Swiss National Supercomputing Centre (CSCS) is commissioning the future multi-region flagship infrastructure codenamed Alps, an HPE Cray EX system based on the NVIDIA GH200 Grace Hopper superchip. The Centre has been heavily investing in the concept of Infrastructure as Code, and it is embracing the multi-tenancy paradigm for its infrastructure. Alps will serve multiple partners, characterised by different development and operational requirements, through the use of versatile software-defined clusters (vClusters). Exemplified by the collaborative partnership with the Paul Scherrer Institut, CSCS is also able to deliver an Infrastructure-as-a-Service solution on Alps, enabling organisations to construct and manage vClusters tailored to their specific demands. Furthermore, in close collaboration with the Swiss Institute for Particle Physics, CSCS has met the computing requirements of the WLCG project with cutting-edge resources via a Tier-2 Grid site vCluster. This baseline is also exploited by other projects such as the Cherenkov Telescope Array and the Square Kilometre Array.
        Leveraging modern approaches and technologies borrowed from the cloud, CSCS is now empowered with enhanced flexibility to deliver first-class High Performance Computing to a variety of users and partners.

        Speaker: Riccardo Di Maria (Swiss National Supercomputing Centre (CSCS))
    • 10:30
      Coffee Break
    • Converging High Performance infrastructures: Supercomputers, clouds, accelerators Conf. Room 2

      BHSS, Academia Sinica

      Convener: Tomoaki Nakamura (KEK)
      • 18
        UniNuvola: the computing portal of the Perugia University

        An innovative, distributed, and elastic computing ecosystem, called UniNuvola, is being deployed at the University of Perugia, involving the Department of Chemistry, Biology and Biotechnologies, the Department of Physics and Geology, the Department of Mathematics and Informatics, and the Department of Engineering. The aim of the project is the creation of a federated and scalable computing infrastructure, providing scientific services to end users belonging to both academic structures and, in perspective, SMEs. It represents a proof of concept of a distributed computing infrastructure empowered with software-defined networking capabilities, with every node transparently laying its virtual resources, services, and microservices on a virtual backbone. Our main objective is to design a virtual distributed networking infrastructure to federate an innovative architecture, with dynamic resource allocation, intelligent management of large volumes of data, and compliance with current European federated computation paradigms and data protection policies. The federation also manages a heterogeneous collection of virtual application environments by organizing them into well-tested and self-consistent packages, ready to be used by organizations while guaranteeing, at the same time, the requested performance and result-accuracy features. To this end, a first prototype of four Dell PowerEdge R940 NVMe servers, each equipped with two Intel Xeon Gold CPUs, 512 GB of RAM, and 16 TB of disk space, has been configured with the most recent software solutions for pursuing the above-mentioned objectives. More in detail, a Kubernetes cluster has been installed, using the Rook operator to deploy Ceph scalable distributed storage. We have also investigated the adoption of both MetalLB and OVN load balancers. User authentication has been managed with a Vault server interfaced to the University LDAP, while JupyterHub has been containerized into Kubernetes to serve notebooks for multiple users. On the other hand, for those workloads already running in virtual environments that are difficult to containerize into Kubernetes pods, KubeVirt technology provides us with the possibility of enabling KVM-based virtual machine workloads to be managed as pods.
        Upon this (virtual) infrastructure, a ready-to-use collection of scientific packages, for both research and education purposes, is being developed. To this end, the capabilities of UniNuvola will first be benchmarked with various use cases built upon computational chemistry and machine learning applications. Computational chemistry applications are widely recognized for their high demands on CPUs and storage, making them ideal candidates for testing the scalability of the architecture. The significance of machine learning lies not only in its wide range of applications but also in its high memory requirements. Applications ranging from image recognition to intrusion detection algorithms, tested on well-known datasets or on live interaction with external sources, will allow benchmarking the capability of the platform against commercial, widespread alternatives. In addition, both cases can be utilized to test future improvements to the infrastructure, such as the inclusion of GPUs and quantum computing. In fact, during the second phase of the project, both high-end nodes with GPUs (NVIDIA A100) and a solid-state SpinQ Triangulum NMR quantum computer will be integrated into the UniNuvola cluster to realize an academic across-the-board data center able to serve a variety of instances.

        Speaker: Dr Marco Pezzella (Department of Physics and Geology, University of Perugia)
      • 19
        Characteristic Analysis and Running Time Prediction of Slurm Jobs on CSNS Scientific Computing Platform (Remote Presentation)

        China Spallation Neutron Source (CSNS), the fourth pulsed spallation neutron source in the world, built during China's 11th Five-Year Plan period (2006-2010), began operation after passing the national acceptance test in August 2018. By November 2023, 11 spectrometers had been built and put into operation, generating 3 TB of raw data every day. With the accelerator power upgraded from 100 kW to 500 kW and 20 spectrometers eventually built in the second phase of CSNS (CSNS-II) before 2027, 12 TB of raw data will be generated every day.
        In order to meet the urgent needs of computing, simulation, and data processing during the construction project and experimental operation of CSNS, a high-performance computing cluster was planned and built in stages in 2018 and 2021, and provides powerful computing capability with a total of 40,000 CPU cores, 80 NVIDIA V100 GPU cards, and a sufficient storage capacity of 4 PB. The CSNS partition of the cluster, running since 2018, mainly serves the computation and simulation needs of the CSNS construction project, including radiation shielding, accelerator design, beam analysis, and spectrometer design. The Dongguan Big Science partition, running since 2021, mainly provides in-house and external scientists with services for theoretical computing, experimental analysis, and machine learning.
        A web-based platform integrating HPC and AI services, namely the CSNS Scientific Computing Platform of the Institute of High Energy Physics of CAS and the GBA Sub-center of the National HEP Science Data Center (hereinafter referred to as the CSNS SC Platform), was designed and developed in 2022 and officially launched in January 2023 in order to provide one-stop services for computing, simulation, data analysis, and AI training. The CSNS SC Platform integrates and unifies the hardware resources, software deployment, user management, job scheduling, and API interfaces, providing a unified web-based development environment, job submission, data management, visual environment, AI development frameworks, dataset tools, and AI development processes.
        With the development of computing technologies and the expansion of computing cluster scale, the components and systems of HPC clusters have become increasingly complex, posing a great challenge to cluster management and the allocation of computing resources. Related research has shown that the distribution of computation, communication, and I/O operations on HPC clusters is uneven among the nodes, and the actual application performance of HPC clusters is often less than 10% of the peak performance, which results in a huge waste of computing resources. Optimized scheduling strategies are expected to increase the resource utilization of HPC clusters and improve the efficiency of scientific computation and data analysis. This paper will present our recent work on the collection and organization of Slurm job data on the CSNS SC Platform, analysis of the job characteristics, and optimization of job run-time prediction with machine learning.

        Speaker: Jianshu Hong (Institute of High Energy Physics, Chinese Academy of Sciences)
    • Network, Security, Infrastructure & Operations Conf. Room 1

      BHSS, Academia Sinica

      Convener: David Groep (Nikhef and Maastricht University)
      • 20
        New Security Trust and Policies - for WLCG and other Research Infrastructures

        Many years ago, the Joint WLCG/OSG/EGEE security policy group successfully developed a suite of Security Policies for use by WLCG, EGEE, EGI and others. These in turn formed the basis of the AARC Policy Development Kit, published in 2019. Many infrastructures have since used the template policies in the AARC PDK but found they had to modify them to meet their needs. The Policy Templates are gradually being modified, taking feedback from others into account. The work on new template versions in the WISE Community Security for Collaborating Infrastructures working group was presented at ISGC2023. In future, work on this will continue in WISE but also be influenced by AARC TREE, the new EU Horizon Europe funded project presented in another abstract to this conference.

        Standard best practice on the development and maintenance of a Cybersecurity Program includes the management of risks, the mitigations for which include the use of appropriate security controls. The use of Security Policies is one essential component of such controls. This is well described in the Trusted CI Framework and guidance published at https://www.trustedci.org/framework.

        In WLCG, the IT infrastructure for the CERN Large Hadron Collider experiments, many of the security policies are now in need of update and revision. The work to use existing policy templates, and modify where necessary, for the update of WLCG security policies will be presented in this talk. This is essential for building trust within WLCG and also externally with other Infrastructures. All of this work will be fed back into discussions within WISE and AARC TREE to help produce new AARC policy templates.

        Speaker: David Kelsey (STFC-RAL)
      • 21
        A comprehensive initiative to enhance the security posture of open-source software (Remote Presentation)

        Protecting information assets has become a top priority for organizations in the ever- changing landscape of digital security.
        INFN is deeply committed to security, being a major player in the research world with distributed computing infrastructures across the entire territory and being involved in numerous research projects that deal with health and sensitive data.
        The Datacloud project aims to develop a portfolio of IaaS and PaaS cloud services, that will allow research partners to deploy innovative services by easily and transparently accessing geographically distributed computing and storage resources.
        The middleware developed by Datacloud reflects the uniqueness of our technology compared to common public clouds in the market: our core strength lies in the middleware and cloud-native applications tailored to the needs of research communities and our ability to co-design solutions with our research partners.
        Considering this, we are currently undertaking a comprehensive and strategic initiative to assess and enhance the security posture of our open-source components within the Datacloud production middleware. This middleware encompasses Platform-as-a-Service (PaaS) Orchestration system, Identity and Access Management (IAM), and TOSCA-based services.
        Over the years, the accelerating pace of software development has, at times, led to security aspects being overshadowed in favour of expeditious feature releases.
        Recognizing this oversight from the past, we intend to correct the situation, reinforcing security as a fundamental pillar in the development lifecycle.
        The initiative is motivated by a compelling need to align with best practices and industry standards, namely the OWASP SAMM (Software Assurance Maturity Model) and ISO/IEC 27002 frameworks. The combination of these frameworks serves as the foundation for a robust and harmonized security posture. OWASP SAMM provides a maturity model specifically tailored for software security assurance programs, offering guidance on creating and evolving an organization-wide software security initiative. ISO/IEC 27002, on the other hand, focuses on information security management, providing a comprehensive set of controls and guidelines.
        We acknowledge the need for a collaborative effort within our organization, which will actively involve development leads, “security champions”, and other key stakeholders.
        We have identified a set of tasks to be implemented to establish virtuous processes aimed at enhancing our security posture.
        In the initial phase of the plan, we will focus on the definition of security standards and policies, roles and responsibilities, and security training and awareness. The overarching goal is to create a robust governance framework that permeates the entire software development lifecycle.
        Following that, the plan delves into the security self-assessment, which involves creating a comprehensive project inventory and initiating an evaluation and improvement plan. This stage integrates code reviews and initial implementation strategies, acknowledging the importance of automated security checks, continuous assessment, and code quality reviews.
        A subsequent focus will be on setting up continuous monitoring processes to establish and maintain frameworks and processes for ongoing control and enhancement of software quality. This includes proactive measures for timely updates of dependencies and swift responses to emerging vulnerabilities.
        This contribution will provide an overview of this pragmatic roadmap for realizing a more secure and resilient software ecosystem, ultimately leading to the implementation of inherently secure end user services tailored to scientific data analysis.

        Speaker: Marica Antonacci (INFN)
      • 22
        Security exercises, which questions to ask and what to do with the answers

        Security exercises can be seen as an experiment: one wants to investigate how well, for example, the expected computer security incident response activities of an organisation, as described in its procedures and policies, match the real (measured) activities in a created security incident situation that is as realistic as possible, but contained.

        The complexity of the created security situation depends on what is to be investigated. It ranges from measuring various aspects of incident response, like the security communication infrastructure used by the involved security teams, to "shaking the whole tree" situations where the borders of the primarily addressed infrastructure, and the interfaces to the security teams of dependent or supporting infrastructures, are also challenged. An example would be Identity Providers that manage identities which can be used at compute services (like EGI FedCloud).

        Since security exercises are costly and, in addition, even bear the risk of being harmful to collaborations in the area of operational security, we will focus in this talk on our experiences gained through organizing security exercises, or being part of campaigns: what went wrong, what can be improved, and what should be avoided in future runs.

        For that purpose, and taking into account the similarity to scientific experiments, we will also talk about how to design a security exercise, i.e. what are the questions you want to answer, how to identify and measure the relevant parameters, and finally what to do with the results.

        Speakers: David Crooks (UKRI STFC) , Sven Gabriel (Nikhef/EGI)
      • 23
        Status update on the deployment of threat intelligence and operational security monitoring capabilities (Remote Presentation)

        We have presented previously on the strategic direction of the Security Operations Centre working group, focused on building reference designs for sites to deploy the capability to actively use threat intelligence with fine-grained network monitoring and other tools. This work continues in an environment where the cybersecurity risk faced by research and education, notably from ransomware attacks, continues to be very high.

        In this report we discuss recent developments in the community, including both updates on the deployment of security tools and progress in the sharing of threat intelligence in different contexts.

        Speakers: David Crooks (UKRI STFC) , Liviu Valsan (CERN)
    • eScience Activity in Asia Pacific Auditorium

      BHSS, Academia Sinica

      Convener: Alberto Masoni (INFN National Institute of Nuclear Physics)
    • 12:30
      Lunch 4F Recreation Hall

      BHSS, Academia Sinica

    • APGridPMA Meeting Media Conf. Room

      BHSS, Academia Sinica

    • Artificial Intelligence (AI) Auditorium (BHSS Academia Sinica)

      BHSS, Academia Sinica

      Convener: Ludek Matyska (CESNET)
      • 29
        Leveraging Cloud-based OpenAI's LLMs to Create Learning-as-a-Service (LaaS) Solutions for Culturally Rich Conversational AI: A Study Using the Legacy of Slavery Dataset (Remote Presentation)

        Abstract:
        In scientific applications, integrating artificial intelligence (AI) and machine learning (ML) has revolutionized research methodologies and workflows. This study delves into an innovative application of cloud-based OpenAI's Large Language Models (LLMs) in developing a conversational AI chatbot, drawing exclusively from the culturally significant Legacy of Slavery (LoS) datasets maintained by the Maryland State Archives. This initiative deviates from conventional chatbots that rely on a vast, generalized corpus for training. Instead, it focuses on harnessing the LoS datasets as the sole source for responses, thereby ensuring the authenticity and contextual relevance of the historical content. At the heart of this research are cloud-hosted digital notebooks designed as Learning-as-a-Service (LaaS) solutions. These notebooks are designed to elucidate the methodology behind employing OpenAI's LLMs to engineer a chatbot that not only engages in meaningful dialogues but is also constrained to using verified data from the LoS collection. The intention is to create a chatbot that supports educational and research-focused interactions, offering users insights rooted directly in the archival material. Additionally, the project integrates LangChain agents, such as CSV agents, to empower the chatbot with capabilities for data aggregation and analytical tasks, thereby extending its functionality beyond standard conversational interfaces. A pivotal aspect of this study is the comparative analysis between the outcomes produced by the LLM-based chatbot and those obtained using traditional data analysis and visualization tools like Tableau. This comparative study is essential to assess the effectiveness and accuracy of AI-driven analysis compared to conventional data analysis methods. It aims to illuminate the potential benefits and drawbacks of employing LLMs in scientific and research settings, particularly in the context of historical and cultural data analysis. 
        The convergence of cloud computing and AI in this project exemplifies an innovative approach to digital humanities and archival research. It stands as an exemplar of the possibilities of using AI in the curation, exploration, and dissemination of cultural and historical data. The cloud-based digital notebooks serve as a model for LaaS solutions, showcasing how AI can transform the access, analysis, and dissemination of cultural and historical data. This research contributes significantly to the ongoing discourse on AI-enabled scientific workflows, offering new perspectives on applying ML and Deep Learning techniques in data-rich domains of humanities research. This project, through its unique use of AI, opens up new pathways for interacting with, analyzing, and learning from historical datasets. It demonstrates the transformative potential of AI in reshaping educational and scholarly approaches to digital humanities. The insights gleaned from this study are poised to influence a range of disciplines, promoting a deeper understanding of how AI can be tailored to respect and amplify the nuances of cultural and historical datasets in the digital era.

        Speaker: Rajesh Kumar Gnanasekaran (the University of Maryland)
      • 30
        An investigation about pretrainings for the multi-modal sensor data

        This paper investigates the effect of pretraining and fine-tuning on a multi-modal dataset. The dataset used in this study was accumulated in a garbage disposal facility for facility control and consists of 25,000 sequential images and corresponding sensor values. The main task for this dataset is to classify the state of garbage incineration from an input image for combustion state control. In this kind of task, pretraining with an unsupervised dataset and fine-tuning with a small supervised dataset is a typical and effective approach to reduce the cost of producing supervised data. We investigated and compared many pretraining strategies using sensors and autoencoders to find effective pretraining. Moreover, we compared several sensor selection methods for pretraining with sensors. The results show the performance of fine-tuned models with frozen and unfrozen pretrained parameters and of the sensor selection, together with a discussion.

        Speaker: Dr Kenshiro Tamata (Osaka University)
      • 31
        Nonlinear Fusion of Multiple Efficient Manifold Rankings in Content-Based Medical Image Retrieval

        Abstract:
        The efficient manifold ranking (EMR) algorithm has been widely applied in content-based image retrieval (CBIR). For this algorithm, each image is represented by low-level features describing color, texture, and shape. The characteristics of low-level features include the ability to quickly detect differences in color, texture, and shape, and invariance to rotations and translations without the need for learning. However, low-level features are limited in describing the meaning of the image. To enhance the performance of EMR in content-based medical image retrieval (CBMIR), in this research we propose fusion methods that combine multiple rankings on low-level features with embedded vectors from a Deep Metric Learning (DML) model, enhancing the discriminative power of a query image compared to images in the dataset. Experiments were conducted to demonstrate the effectiveness of the proposed methods in improving the quality of EMR.
        Keywords: Content-based medical image retrieval, efficient manifold ranking, deep metric learning, contrastive loss, triplet loss.

        Speaker: Dr VAN TUYET DAO (Saigon International University)
    • Infrastructure Clouds & Virtualisation Conf. Room 2

      BHSS, Academia Sinica

      Convener: Josep Flix (PIC / CIEMAT)
      • 32
        Design and implementation of HEPS scientific computing system for various interactive data analysis scenarios (Remote Presentation)

        China’s High Energy Photon Source (HEPS), the first national high-energy synchrotron radiation light source, is under design and construction. The HEPS computing center is the principal provider of high-performance computing and data resources and services for HEPS science experiments. The mission of the HEPS scientific computing platform is to accelerate scientific discovery, given the characteristics of light source experiments, through high-performance computing and data analysis. In order to meet the diverse needs of data analysis in light source disciplines, we have built a scientific computing platform that can provide desktop analysis, interactive analysis, batch analysis, and other types of computing services, and that supports scientists in accessing the computing environment through the web anytime and anywhere to quickly analyze experimental data. In this article, a scientific computing platform for HEPS's diverse analysis requirements is designed. First, the diverse analysis requirements of HEPS are introduced. Second, the challenges faced by the HEPS scientific computing system are discussed. Third, the architecture and service process of the scientific computing platform are described from the user's perspective, and some key technical implementations are introduced in detail. Finally, the application effect of the computing platform is demonstrated.

        Speaker: Qingbao Hu (IHEP)
      • 33
        Design and Implementation of a Container-based Public Service Cloud Platform for HEPS (Remote Presentation)

        High Energy Photon Source (HEPS) is a crucial scientific research facility that necessitates efficient, reliable, and secure services to support a wide range of experiments and applications. However, traditional physical server-based deployment methods suffer from issues such as low resource utilization, limited scalability, and high maintenance costs. Therefore, the objective of this study is to design and develop a container-based public service cloud platform that caters to the experimental and application needs of synchrotron radiation sources. By leveraging Kubernetes as the container orchestration technology, the platform achieves elastic scalability, multi-tenancy support, and dynamic resource allocation, thereby enhancing resource utilization and system scalability. Furthermore, incorporating robust security measures such as access control, authentication, and data encryption ensures the safety and integrity of users' applications and data. This research also focuses on the design, application, and deployment of Continuous Integration and Continuous Delivery (CI/CD). By implementing CI/CD workflows, the platform automates the build, testing, and deployment processes of applications, resulting in improved efficiency and quality throughout the development and deployment lifecycle. The HEPS Container Public Service Cloud offers a comprehensive range of services including Ganglia and Nagios monitoring, Puppet, cluster login nodes, an Nginx proxy, a user service system, LDAP and AD domain authentication nodes, KRB5 slave nodes, and more. The research findings demonstrate that the container-based public service cloud design and application deliver high-performance, stable, and secure services, effectively meeting the demands of synchrotron radiation source experiments and applications. Additionally, the utilization of CI/CD further enhances the efficiency and quality of development and deployment processes.
        Future work should focus on optimizing and expanding the capabilities of the container-based public service cloud to accommodate diverse user requirements and scenarios.

        Speaker: Jiping Xu (Institute of High Energy Physics, Chinese Academy of Sciences)
      • 34
        High availability Kubernetes cluster using Octavia Ingress Controller (Remote Presentation)

        With the widespread adoption of containers by various organizations and companies, Kubernetes (K8s), an open-source software dedicated to container management, has in recent years become the de facto standard for the deployment and operation of applications based on this technological solution. K8s offers several advantages: workload balancing, dynamic resource allocation, automated rollout and rollback, storage orchestration, management of sensitive information, self-healing, etc. Such a resilient infrastructure allows us, for example, to deploy authentication systems used in the WLCG world. Furthermore, the same infrastructure will be used in the future to provide services to the developer group. Obviously, K8s has some limitations, but these can be overcome through easy integration with many external software components.

        Thanks to its flexibility and scalability, K8s can be integrated with cloud-native solutions such as OpenStack, a modular cloud operating system capable of offering computing and storage management services according to the Infrastructure as a Service (IaaS) paradigm, deployed at INFN CNAF.

        The complementary relationship between K8s and OpenStack has pushed us to use this solution widely in our cloud infrastructure. An interesting aspect of the integration between the two systems, which we are investigating, is the possibility of exposing K8s services externally via a Load Balancer (LB) by making use of the Octavia service. Octavia is an open-source, operator-scale load-balancing solution designed to work with OpenStack. Octavia provides load-balancing services by overseeing a fleet of virtual machines, containers, or bare-metal servers, referred to collectively as amphorae. These amphorae are deployed dynamically as needed, distinguishing Octavia from alternative load-balancing solutions and making it particularly well suited to cloud environments.

        Using Octavia to publish a service externally provides a degree of service resilience through the adoption of a master-backup HAProxy pair. In the event of master downtime, the backup assumes the role of master, keeping external communication alive. A further advantage of the Octavia service is that it limits the resources (e.g. floating IPs) used. Moreover, delegating the creation of the LB, and therefore of the VMs, to Octavia may speed up the process and avoid the human errors that can result from manual installation.
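        The master-backup failover behaviour described above can be sketched in a few lines (illustrative only, not Octavia code; the function and its inputs are simplifications of real health monitoring):

```python
# Illustrative sketch (not Octavia code): master/backup failover logic.
# Given the health of a pair of load-balancer instances ("amphorae"),
# decide which one should serve traffic: the master while it is healthy,
# otherwise the backup, which assumes the master role.

def active_instance(master_healthy, backup_healthy):
    if master_healthy:
        return "master"
    if backup_healthy:
        return "backup"   # backup takes over, keeping communication alive
    return None           # no healthy instance: service outage

print(active_instance(False, True))  # prints "backup"
```

In the real deployment this decision is made continuously by keepalived-style health checks inside the amphorae rather than by a single function call.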

        In the present work, we show the deployment of this functionality and provide performance measurements of the whole system to demonstrate the solidity of the adopted technological solutions.

        Speaker: Francesco Sinisi (INFN-CNAF)
    • VRE Conf. Room 1

      Conf. Room 1

      BHSS, Academia Sinica

      Convener: Alberto Masoni (INFN National Institute of Nuclear Physics)
      • 35
        Open data at DESY

        The DESY Research Infrastructure has historically supported a large variety of sciences, such as high-energy and astroparticle physics, dark-matter research, physics with photons, and structural biology. Most of these domains generate large amounts of precious data, handled according to domain-specific policies and taking into account embargo periods and license restrictions. However, a significant portion of this data is expected to become “Open Data”, often mandated by funding agencies. To support its scientific communities in producing and using open data, DESY-IT is developing and installing central services that make open data sets easily findable, browsable, and viewable. In addition, mechanisms will be provided to analyse data for the long tail of science not covered by large e-Infrastructures.

        Following the principles of Open and FAIR data, we will provide a metadata catalogue to make the data findable. Accessibility is addressed by using federated user accounts via eduGAIN, enabling community members to use their institutional accounts for data access. Interoperability of the data sets is ensured by using community-approved data formats such as HDF5, specifically NeXus and openPMD, wherever possible. Providing the technical and scientific metadata will finally make the open data sets reusable for subsequent analyses and research.

        Our proposed setup will initially consist of three components: the metadata catalogue SciCat, the storage system dCache, and the VISA (Virtual Infrastructure for Scientific Analysis) portal. Scientific data can then be placed in a specific directory on dCache together with its metadata and will be ingested into SciCat to become available for access and download. Simultaneously, a subset of the technical and scientific metadata will be integrated into the VISA portal so that scientists can access the dataset within it. VISA has been developed for creating virtual machines that come with analysis tools pre-installed and the selected data already mounted, accessible from a web browser so that anyone can reliably access and explore data without having to install anything themselves.
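        The ingestion step described above — pairing data on dCache with its metadata for the catalogue — can be sketched as follows (illustrative only; the field names loosely follow SciCat conventions but are assumptions here, as are the example path and values):

```python
# Illustrative sketch (not the actual SciCat schema): a minimal dataset
# record pairing the data location on dCache with its data format and
# scientific metadata, serialized as JSON for catalogue ingestion.
import json

def make_dataset_record(path, data_format, metadata):
    return {
        "sourceFolder": path,          # directory on dCache holding the data
        "dataFormat": data_format,     # community-approved format, e.g. NeXus
        "scientificMetadata": metadata,
        "isPublished": True,           # marks the dataset as open data
    }

record = make_dataset_record(
    "/pnfs/open/example-run",          # hypothetical dCache path
    "NeXus",
    {"beam_energy_keV": 12.0},         # hypothetical scientific metadata
)
print(json.dumps(record, indent=2))
```

A real ingestion pipeline would validate such a record against the catalogue schema before registering it.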

        During the talk at ISGC, we will present the architecture of the system, its individual components, and their interplay. A public entry point will also be provided so that the audience can try accessing the data and the analysis tools themselves.

        Speaker: Patrick Fuhrmann (DESY/dCache.org)
      • 36
        Redesign and Modernization of the INFN Cloud Dashboard: improving efficiency, user-experience, and security

        For over two years the Italian National Institute for Nuclear Physics (INFN) has been actively operating a national cloud platform designed to streamline access to geographically distributed computing and storage resources. This initiative includes a catalog of high-level services that can be instantiated on demand through a web portal. Service descriptions are specified using TOSCA templates, which are processed by an orchestration system that implements a lightweight federation of cloud sites and automatic scheduling features to select the optimal provider for service deployment.

        By masking all implementation and infrastructure specifics, the INFN Cloud dashboard acts as a user-friendly gateway, seeking to simplify user interaction with the orchestration and service deployment platform.

        In the modern digital landscape, a website reflects an organization’s commitment to excellence and user satisfaction. With that in mind, we set out to transform our existing portal, the INFN Cloud Dashboard, into something new, by making it more secure, efficient, and user-friendly while providing a visually appealing interface.

        To enhance security, we conducted a comprehensive audit to detect vulnerabilities and integrated state-of-the-art security features. Through various tests, we have improved the platform security to prevent any type of data leak and to maximize the final user privacy.

        Another crucial factor is the effectiveness of the website in terms of performance and user retention. To enhance efficiency, we optimized the website’s backend infrastructure for speed and scalability. This involved refining the codebase, minimizing the number of requests, and employing techniques to expedite content delivery.

        To improve the user experience, we used an optimal UX strategy that focuses on understanding and addressing the needs and preferences of the target audience. This includes simplifying navigation, streamlining content layout, and enhancing interactivity to make the website more intuitive and enjoyable to use. We achieved that by making the website experience customizable by the end user and through the adoption of a responsive design that ensures seamless access across various devices, providing swift and reliable access to the website’s content.

        We have also upgraded the design of the website to align it with the INFN visual identity adopting contemporary aesthetics that resonate with users while maintaining brand consistency. This includes the thoughtful use of color schemes, typography, and effects that align with the latest design trends.

        Upgrading a website is a continuous process that requires diligence and foresight. By implementing comprehensive security measures, enhancing efficiency, optimizing UX, and refining the design, we aimed to transform the INFN Cloud Dashboard into an immersive and secure digital experience.

        Speakers: Ettore Serra (INFN) , Marica Antonacci (INFN)
      • 37
        Towards the Future: DiracX - A Modern Incarnation of the DIRAC Framework

        DIRAC has been a cornerstone in providing comprehensive solutions for user communities requiring access to distributed resources. Originally initiated by the LHCb collaboration at CERN in 2000, DIRAC underwent significant changes over the years. In 2008, a major refactoring resulted in the creation of the experiment-agnostic "core" DIRAC, allowing custom extensions such as LHCbDIRAC.

        Despite its success in meeting experiment requirements, DIRAC has accumulated technical debt over 15 years. Managing installations is complex, with a high entry barrier and reliance on custom machinery. The software development lacks adherence to modern standards, hindering onboarding for new developers. Key components like the network protocol and authentication are custom and challenging to integrate with other applications.

        To address these challenges, the DIRAC consortium has initiated the development of DiracX. Building on two decades of experience and battle-tested technological choices, DiracX represents a new era. While still in its early stages, the roadmap and timelines are well-established.

        This paper outlines the architecture of DiracX and discusses the technological decisions made. Considering the critical importance of a continuously running DIRAC system for many communities, we delve into the migration procedure from DIRAC to DiracX.

        Speaker: Dr Christophe HAEN (CERN)
      • 38
        Recent developments in the data analysis integrated software system of HEPS (Remote Presentation)

        Recent advances in X-ray beamline technologies, including the advent of very high brilliance beamlines at next-generation synchrotron sources and advanced detector instrumentation, have led to an exponential increase in the speed of data collection. As a result, there is a growing need for a data analysis platform that can refine and optimise data collection strategies online and effectively analyse large volumes of data after collection.
        The Data Analysis Integrated Software System (Daisy) has been designed to meet the requirements of the next generation of advanced synchrotron radiation sources, such as the High Energy Photon Source (HEPS). Daisy aims to support on-site data analysis services with rapid feedback and interaction, and offline analysis of large data sets.
        In this talk, we will present the latest developments in the Daisy framework, as well as the custom applications for specific scientific domains that have been developed based on Daisy. Future developments will also be discussed.

        Speaker: Yu Hu (IHEP)
    • 15:30
      Coffee Break
    • APGridPMA Meeting Media Conf. Room

      Media Conf. Room

      BHSS, Academia Sinica

    • Artificial Intelligence (AI) Auditorium

      Auditorium

      BHSS, Academia Sinica

      Convener: Simon C. Lin (ASGC)
      • 39
        Deep learning approaches for prevention of Japanese local monkey trespassing in a sweet potato field

        As human-wildlife conflicts escalate in our area and around Japan, safeguarding crops and farmers from animal intrusions becomes paramount. This research introduces a deep learning approach to prototype a prevention system against monkey trespassing in sweet potato fields. The system was motivated by the idea of developing wildlife identification tools to assist local farmers in protecting their sweet potato fields in the autumn of 2022 and 2023.
        This research project explores the complexities of utilizing deep learning algorithms to monitor monkey activity in real time within agricultural settings. We evaluated deep-learning models based on version 4 of the you-only-look-once (YOLO) algorithm, which can simultaneously classify and localize objects in images to identify local monkeys. Our prototype was designed to detect live monkeys in images from a Real-Time Streaming Protocol (RTSP) camera and automatically notify farmer group members via the LINE application. We successfully installed and tested the system at a sweet potato field in Hakusan, Ishikawa, using four cameras and a solar power system to ensure 24-hour operation from September through November of 2022 and 2023. We collected data from trail cameras installed around the sweet potato fields to train the model and used the deep learning software CiRA CORE (https://www.cira-ai.com/en).
        Furthermore, this research emphasizes the significance of performance analysis and system tuning in optimizing the efficacy of the proposed deep-learning-based prevention system. A comprehensive evaluation of the system's performance metrics, including detection accuracy, response time, and false-positive rate, was conducted. Using the k-fold cross-validation method, we assessed how well the model trained on the local monkey image dataset generalizes to new pictures and videos. The findings informed an iterative system-tuning process in which adjustments were made to improve overall performance and minimize false alarms.
        In conclusion, the prototype was successfully trained and operated on a Windows PC with 32 GB of RAM, a 64-bit operating system, and an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz with an Nvidia GeForce RTX 3080 10 GB graphics processing unit (GPU). The model demonstrated high classification performance and accuracy with a real-time streaming camera. The system can provide advance notification to farmers to prevent damage caused by monkeys.
        Moreover, this research provides a comprehensive exploration of the application of deep learning to preventing monkey trespassing in sweet potato fields. By examining the importance of performance analysis and system tuning, the study offers a holistic perspective on developing and optimizing an effective and practical solution for mitigating human-wildlife conflicts in the local agricultural environment.
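        The k-fold cross-validation used for the evaluation can be sketched in pure Python (illustrative only; the index-splitting shown here is the standard technique, not the authors' code):

```python
# Illustrative sketch: k-fold cross-validation index splitting. Each of
# the k rounds holds one fold out for validation and trains on the rest,
# so every sample is validated exactly once.

def k_fold_splits(n_samples, k):
    """Return a list of (train_indices, val_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    splits = []
    for i in range(k):
        start = i * fold_size
        # the last fold absorbs any remainder when n_samples % k != 0
        stop = (i + 1) * fold_size if i < k - 1 else n_samples
        val = indices[start:stop]
        train = indices[:start] + indices[stop:]
        splits.append((train, val))
    return splits

for train, val in k_fold_splits(10, 5):
    print(len(train), len(val))   # prints "8 2" for each of the 5 folds
```

In practice the indices would select images from the trail-camera dataset, and the detection metrics would be averaged over the k rounds.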

        Speaker: Mr Apirak Sang-ngenchai (Kanazawa Institute of Technology / International College of Technology, Kanazawa)
      • 40
        An application-agnostic AI platform to accelerate Machine Learning adoption for basic to hard ML/DL scientific use cases. (Remote Presentation)

        Researchers at INFN (National Institute for Nuclear Physics) face challenges from basic to hard science use cases (e.g., big-data latest generation experiments) in many areas: HEP (High Energy Physics), Astrophysics, Quantum Computing, Genomics, etc.

        Machine Learning (ML) adoption is ubiquitous in these areas, requiring researchers to solve problems related to the specificity of applications (e.g., tailored models and intricate domain knowledge), but also requiring solving general infrastructure-level and ML-workflow related problems.

        As the demand for ML solutions continues to rise across the diverse research domains, there exists a critical need for an innovative approach to accelerate ML adoption.

        In this regard we propose an AI platform designed as an application-agnostic MLaaS (Machine Learning as a Service) solution, which provides a paradigm shift by offering a flexible and generalized infrastructure that decouples the ML development process from specific use cases.

        The AI platform is implemented as a software layer on top of our cloud service platform, the INFN Cloud, which offers composable, scalable, and open-source solutions on a dedicated, geographically distributed infrastructure. The INFN Cloud core mission is to facilitate resource sharing and enhance accessibility for INFN users, encompassing a wide range of resources, including GPUs and storage.

        The AI platform leverages INFN Cloud resources and principles, gathering and orchestrating technologies to support end-to-end scalable ML solutions: Kubernetes, Kubeflow, KServe, KNative, Kueue, Horovod, etc., ensuring support for many ML frameworks: TensorFlow, PyTorch, Apache MXNet, XGBoost, etc.

        Together with the platform’s design and principles, we present selected use cases from the NLP and HEP domains that benefited from the “aaS” approach.

        The platform's agnostic nature extends beyond model compatibility to address the practical challenges associated with deploying ML solutions in real hard science scenarios: streaming services, exabyte-scale storage solutions, high-bandwidth networking, support for native HEP data (e.g., CERN ROOT data format), etc.

        Furthermore, the AI platform promotes transfer learning and model reuse, to accelerate the ML development lifecycle. Developers can leverage pre-trained models and share knowledge across different applications, reducing the time and resources required for training new models from scratch. This collaborative aspect not only enhances efficiency but also promotes a collective learning environment within the research community.

        In conclusion, the application-agnostic AI platform serves as a unified ecosystem where developers, data scientists, and domain experts can collaborate seamlessly. By providing a standardized framework for ML model development, training, and deployment, the platform eliminates the need for extensive domain expertise in every application area. This democratization of ML empowers a broader audience to leverage the benefits of machine learning, breaking down barriers and fostering innovation across diverse research domains.

        Speakers: Mauro Gattari (INFN (National Institute for Nuclear Physics)) , Luca Giommi (INFN and University of Bologna) , Marica Antonacci (INFN) , Mr Gioacchino Vino (INFN)
      • 41
        Wireless Broadcasting for Efficiency and Accuracy in Federated Learning

        Machine learning is today’s fastest-developing and most powerful computer technology, finding applications in
        nearly every domain of science and industry, such as natural language processing, visual object detection, autonomous driving, stock market prediction, medical applications, and many more. For machine learning to be effective, a large amount of high-quality training data is essential. At the same time, the number of Internet of Things (IoT) devices such as smart watches, voice control systems, air quality monitors, surveillance cameras, and much more is increasing rapidly. These devices, as well as smartphones, are equipped with high-quality sensors generating an enormous amount of real-world data which has the potential to feed machine learning models. Due to the high privacy sensitivity and velocity of the data, classical offline training in data centers is not possible. To tackle these challenges, Google proposed a new approach to distributed machine learning called federated learning in 2016. In federated learning, clients train a global model on local data and upload only their model parameter deltas to a server where they get merged into a global model while keeping their local data private. While this approach holds great opportunity for privacy-preserving AI solutions e.g. for medical applications, several challenges limiting its efficiency and effectiveness remain, namely communication cost, the dependency on a trusted central server, and the parameter divergence in the presence of non-IID training data.

        In this paper, we first analyze the state of the art with respect to the aforementioned challenges.
        Considering the characteristics of modern wireless networks, we notice that there is a huge opportunity for leveraging wireless broadcasting in a hybrid system architecture comprising peer-to-peer subgroups and hierarchical servers. We design a new protocol for federated learning where clients asynchronously share gradient updates with other nearby clients via wireless broadcasting and globally exchange gradient updates via a hierarchical server network. This way, we benefit from the efficiency of wireless broadcasting to increase communication efficiency and decrease server involvement. Furthermore, the frequent exchange of gradient updates between clients allows us to better cope with non-IID training data and allows for higher privacy guarantees.
        Our protocol serves as a framework that can easily be enhanced using many recently published contributions in federated learning. The impact of our work is expected to become even stronger in future 5G+ networks because it benefits from high device density and mobility.
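        The core merging step of federated learning that the protocol builds on can be sketched as follows (illustrative only, not the paper's protocol; models are represented as plain parameter lists):

```python
# Illustrative sketch (not the paper's protocol): the averaging step at
# the heart of federated learning. Clients share only parameter deltas,
# and a merging node folds their average into the shared model without
# ever seeing the clients' raw data.

def merge_updates(global_model, client_deltas):
    """Apply the element-wise average of the clients' deltas to the model."""
    n = len(client_deltas)
    return [
        w + sum(delta[i] for delta in client_deltas) / n
        for i, w in enumerate(global_model)
    ]

model = [0.0, 0.0]
deltas = [[1.0, 2.0], [3.0, 4.0]]     # updates from two clients
print(merge_updates(model, deltas))   # prints [2.0, 3.0]
```

In the proposed hybrid architecture, the same merge happens both among nearby peers (via wireless broadcast) and in the hierarchical server network.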

        Speaker: Mr Jonas Wessner
    • Infrastructure Clouds & Virtualisation Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: Josep Flix (PIC / CIEMAT)
      • 42
        Efficient management of INDIGO-IAM clients and S3 buckets via INDIGO PaaS Orchestrator in INFN Cloud (Remote Presentation)

        The National Institute for Nuclear Physics (INFN) has been operating and supporting Italy’s largest research and academic distributed infrastructure for several decades. In March 2021, INFN launched “INFN Cloud”, which provides a federated cloud infrastructure and a customizable service portfolio tailored to the scientific communities supported by the institute. The portfolio comprises standard IaaS options and more advanced PaaS and SaaS solutions, all crafted to suit the distinct needs of specific communities. All PaaS services in the portfolio are described through an Infrastructure as Code paradigm based on a declarative approach, via a combination of TOSCA templates (to model an application stack), Ansible roles (to manage the automated configuration of virtual environments), and Docker containers (to encapsulate high-level application software and runtime).
        The federation middleware of the INFN Cloud platform is built upon the INDIGO PaaS Orchestration system, which consists of interconnected open-source microservices. Among them, there is the INDIGO PaaS Orchestrator which receives high-level deployment requests from users and coordinates the deployment process over IaaS platforms.

        In this work, we address an issue within INFN Cloud concerning the proliferation of INDIGO Identity and Access Management (INDIGO-IAM) clients and S3 buckets. Specifically, these resources are created during the on-demand deployment of high-level services, such as PaaS and SaaS offerings like JupyterHub or File Sync&Share solutions like ownCloud or Nextcloud. These services require the creation of an INDIGO-IAM client to authenticate users and authorize them to access the service. Currently, this process is performed through Ansible recipes, with no transmission of client information to the INDIGO PaaS Orchestrator. As a result, when a deployment is deleted or fails, the related INDIGO-IAM client is not removed. This leads to an increasing number of unused clients and, consequently, to a decrease in the performance of the INDIGO-IAM service (an issue that has already been observed).
        Our proposed resolution involves delegating to the INDIGO PaaS Orchestrator the creation (and the deletion) of any INDIGO-IAM clients. The implemented feature offers users enhanced flexibility, enabling them to create multiple clients, select the identity provider, define scopes, and assign the client owner so that its configuration can be managed even later.

        A similar problem was identified concerning the proliferation of S3 buckets. Some services that integrate S3 storage as a backend (such as Sync&Share as a Service), usually write a lot of data into these buckets. Also in this case, when the deletion of the deployment is triggered, the associated buckets (and their contents) persist, resulting in a continuous increase in disk space consumption. To address this, we found it convenient to delegate to the INDIGO PaaS Orchestrator both the creation and the deletion of the buckets.
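        The proposed resolution — tying IAM clients and S3 buckets to the deployment that created them — can be sketched as follows (illustrative only, not the Orchestrator's API; class and resource names are hypothetical):

```python
# Illustrative sketch (not the INDIGO PaaS Orchestrator API): recording
# every auxiliary resource (IAM client, S3 bucket) under the deployment
# that created it, so deleting the deployment also releases its resources
# and nothing proliferates after teardown.

class Orchestrator:
    def __init__(self):
        self.resources = {}   # deployment id -> list of owned resources

    def deploy(self, dep_id, needed):
        # register every resource created on behalf of this deployment
        self.resources[dep_id] = list(needed)
        return dep_id

    def delete(self, dep_id):
        # deleting a deployment hands back everything it owns for cleanup
        return self.resources.pop(dep_id, [])

orch = Orchestrator()
orch.deploy("jupyterhub-1", ["iam-client-A", "s3-bucket-A"])  # hypothetical names
print(orch.delete("jupyterhub-1"))  # prints ['iam-client-A', 's3-bucket-A']
```

The key design choice is that ownership is tracked centrally at creation time, rather than reconstructed from Ansible recipes at deletion time.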

        This work presents the solution we developed for both challenges, along with a comprehensive overview of the new functionalities introduced in the code.

        Speaker: Dr Luca Giommi (INFN CNAF)
      • 43
        INFN Cloud User Support: the link between research and IT

        With a rich heritage in cutting-edge distributed IT technologies, ranging from initial small clusters to Grid and cloud-based computing, INFN introduced “INFN Cloud” about three years ago. This user-friendly, distributed cloud infrastructure and services portfolio is designed for scientific communities, providing easy access and utilization.

        Given the decentralized nature of the infrastructure and the extensive array of services and technical solutions in its service catalog, it is crucial to establish a dependable support system, backed by both staff and services, that handles and monitors interactions between users and INFN Cloud administrators. To address this need, INFN Cloud employs a multi-level structure. The first level (L1) is responsible for managing user registration requests, enrolling new use cases, and guiding users in utilizing the services available in the INFN Cloud portfolio. The second level (L2) addresses issues requiring higher privileges than those granted to L1, involving experienced and proficient technicians.

        Additionally, comprehensive training initiatives have been established and offered to various user categories, aiming to address intricate scientific and technological challenges. This includes training for cloud site administrators who wish to federate their resources with INFN Cloud. This is precisely what has been happening in recent months: INFN Cloud was initially hosted by a couple of data centers (CNAF and ReCaS), but over time new sites, always belonging to INFN, were added to the initial federation, thus increasing the total computational resources available. Moreover, training programs are accompanied by a robust collection of user guides and technical documentation, designed to facilitate the utilization and integration of the services offered through the INFN Cloud PaaS.

        This presentation will offer an overview of INFN Cloud and its evolution, DATACLOUD, along with insights into support and training initiatives. Specifically, the discussion will focus on activities dedicated to aiding users in the selection of the most fitting cloud services for their requirements. Additionally, attention will be given to enhancing the INFN Cloud portfolio by gathering new requirements, which can be transformed into dependable solutions for community use. Anticipated modifications in the support and training processes, aligning with the progression of INFN Cloud activities within the newly established INFN DATACLOUD working group, will also be outlined.

        Speaker: Dr Alessandro Pascolini (INFN CNAF)
      • 44
        Federation-registry: the renovated Configuration Management Database for dynamic cloud federation (Remote Presentation)

        The INDIGO PaaS orchestration system is an open-source middleware designed to seamlessly federate heterogeneous computing environments, including public and private clouds, container platforms, and more. Its primary function lies in orchestrating the deployment of virtual infrastructures, ranging from simple to intricate setups. These virtual infrastructures can implement high-level services, such as Jupyter Hub, Kubernetes, Spark, and HTCondor clusters, providing users with convenient access and operational control.
        At the heart of the orchestration system lies its core component, the Orchestrator, supported by a suite of micro-services. These micro-services play a crucial role in assisting the Orchestrator by facilitating the selection of the optimal provider from the federated environments, based on the specific deployment request.
        Within this architecture, a pivotal micro-service is dedicated to implementing the information system of the federation. This crucial component records comprehensive details about all the providers, encompassing their characteristics and capabilities. The information stored plays a central role in the matchmaking process between user deployment requests and available providers.
        For instance, if a deployment request specifies the allocation of one or more GPUs, the Orchestrator relies on the information system to identify which providers within the federation, for which the user is entitled to allocate resources, offer GPU capabilities.
        Currently, this functionality is implemented by the Configuration Management Database (CMDB) service, which stores and organizes information about resource providers, and the Service Level Agreement Manager (SLAM) which retains SLAs signed by users and resource provider administrators.
        We have opted to replace the existing services due to the discontinuation of maintenance for the CMDB developed during the INDIGO-DataCloud project, which relies on outdated components. The forthcoming solution, the Federation-Registry, is a state-of-the-art web application built on the FastAPI framework. It features a REST API secured by OpenID-Connect/OAuth2 authentication and authorization technologies and policies. This upgrade ensures a more robust and secure foundation for managing federation-related information.
        The Federation-Registry leverages Neo4j, a highly flexible graph database, in place of the legacy CouchDB (a non-relational database) for storing and organizing data related to resource providers. Additionally, it adopts S3 object storage to securely store the signed SLA agreements. A new population script is needed to retrieve information from the target resource providers and feed the database with the relevant data.
        This upgrade promises several advantages, including improved data organization, independence from outdated and unmaintained software, adherence to test-driven code practices, enhanced flexibility for accommodating various types of providers, and simplified database structure updates for the incorporation of new provider types. This contribution will outline the architectural decisions and delve into the specifics of the implementation.
        The newly implemented Federation-registry service will be integrated into the INFN Cloud platform, which is already exploiting the INDIGO PaaS middleware to provide INFN scientific communities with a portfolio of high-level services supplied on-demand across geographically distributed cloud sites.

        Speaker: Giovanni Savarese (INFN)
      • 45
        Supporting Native Cloud Tools in the EGI Federated Cloud via the FedCloud Client

        The EGI Federated Cloud (FedCloud) is a multinational cloud system that seamlessly integrates community, private, and public clouds into a scalable computing platform dedicated to research. Each cloud within this federated infrastructure configures its Cloud Management Framework (CMF) based on its preferences and constraints. The inherent heterogeneity among cloud sites can pose challenges when attempting to use native cloud tools such as Terraform at the federation level.

        The FedCloud client, the official client of the EGI Federated Cloud, plays a crucial role in simplifying the use of these tools. It offers the following capabilities:

        • Creating a working environment for the tools:
          - generating site-specific configuration files (clouds.yaml);
          - setting up essential environment variables to ensure seamless tool integration.
        • Making the tools federation-aware:
          - utilizing site IDs and Virtual Organization (VO) names for streamlined access to individual sites within the federation;
          - facilitating global, all-site operations to enhance tool functionality.
        • Selecting suitable resources and configurations:
          - efficiently searching for local cloud images;
          - helping select the most appropriate cloud flavors to optimize resource utilization.

        In essence, the FedCloud client serves as a valuable bridge, simplifying the use of native cloud tools within the EGI Federated Cloud environment. Its features contribute to a more user-friendly and efficient cloud computing experience, particularly when dealing with the diverse cloud infrastructure found within the federation.
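        The first capability listed above, generating a site-specific clouds.yaml, can be illustrated with a short sketch. The snippet below is a hypothetical Python stand-in for what the FedCloud client automates, not its actual code: the site name, Keystone endpoint, VO, and token are invented placeholders, whereas the real client derives these values from the federation configuration and the user's access token.

```python
# Hypothetical sketch of clouds.yaml generation for one federated OpenStack
# site. All concrete values (site, auth_url, vo, token) are placeholders.

def make_clouds_yaml(site: str, auth_url: str, vo: str, token: str) -> str:
    """Render a minimal clouds.yaml snippet for a single site."""
    return (
        "clouds:\n"
        f"  {site}:\n"
        "    auth_type: v3oidcaccesstoken\n"
        "    auth:\n"
        f"      auth_url: {auth_url}\n"
        f"      access_token: {token}\n"
        "    identity_provider: egi.eu\n"
        "    protocol: openid\n"
        f"    # VO used for project discovery: {vo}\n"
    )

snippet = make_clouds_yaml(
    site="EXAMPLE-SITE",
    auth_url="https://keystone.example.org:5000/v3",
    vo="vo.example.eu",
    token="<oidc-access-token>",
)
print(snippet)
```

        With such a file in place, native tools like Terraform or the OpenStack CLI can address the site by name, which is exactly the bridge role the abstract describes.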

        Speaker: Viet Tran (Institute of Informatics, Slovak Academy of Sciences)
    • 18:30
      Welcome Reception Evergreen Laurel Hotel (Taipei)

      Evergreen Laurel Hotel (Taipei)

    • Keynote Speech Auditorium

      Auditorium

      BHSS, Academia Sinica

      Convener: Simon C. Lin (ASGC)
      • 46
        Accelerating Science and Learning, Part 2: The Breakthrough and the Next Stages in the Digital Restoration of Damaged Historical Material

        This talk continues the story of virtual unwrapping with a victorious return to the stage at ISGC 2024. Last year we predicted an unprecedented breakthrough with the help of an international competition called the Vesuvius Challenge. This talk will fulfil that promise and describe how we have captured the imagination of a diverse, global audience through the virtual unwrapping of one of the iconic scrolls from Herculaneum. Aligned with the theme of ISGC, I will describe the crucial role that cloud computing has played in applying computationally intensive AI techniques to this problem at scale within the competitive scheme of the Vesuvius Challenge. I will conclude by revealing the upcoming activities we are planning in order to deliver a corpus of material from the ancient world that stands to be the largest revelation of classical material since the Italian scholar and Renaissance humanist Poggio Bracciolini rescued manuscripts in 1416.

        Speaker: Brent Seales (University of Kentucky)
      • 47
        Forging physics-corrected deep-learning approaches and AI robotics for high performance drug discovery
        Speaker: Jung-Hsin LIN (Research Center for Applied Science, Academia Sinica)
    • 10:30
      Coffee Break
    • Artificial Intelligence (AI) Conf. Room 1

      Conf. Room 1

      BHSS, Academia Sinica

      Convener: Ludek Matyska (CESNET)
      • 48
        Data Center IT Anomaly Prediction and Classification: an INFN CNAF experience (Remote Presentation)

        The INFN CNAF data center provides a huge amount of heterogeneous data through dedicated monitoring systems. Since it must guarantee 24/7 availability, it has started to assess artificial intelligence solutions that detect anomalies in order to predict possible failures.

        In this study, the main goal is to define an artificial intelligence framework able to classify and predict anomalies in time series data obtained from different sensors and systems within the data center (i.e. electrical plant, cooling system, UPS system, and others). The framework takes into consideration the following data characteristics: most of the collected data cover a time window from January 6, 2022 to July 7, 2023; the number of entries per file varies from 5000 up to 50000; most sensor values are sampled every 15 minutes, although some sensors (like the UPS system) are sampled every 10 minutes; when merging sensor data using the timestamp as key, a tolerance of 15 minutes is applied so that every timestamp has a value for each sensor.

        Since the data are unlabeled, the proposed framework first performs a regression task to learn the behavior of the sensors: given the previous 5 timestamps, it predicts the sensor values at the next timestamp. It then performs a classification task: by comparing the predicted and actual behavior of the sensors, it evaluates the status of the system and detects possible anomalies.

        The regression task detects the relationships among sensors using GATv2, long short-term memory (LSTM), and linear layers, and captures the trend of each sensor with the LSTM layers. To make the training phase faster and less sensitive to the random initialization of the parameters, batch normalization is performed after each GATv2 and LSTM layer. Once the regression network has provided the expected behavior of the sensors, the outcome is compared with the observed one using two linear layers: the first is followed by batch normalization and a ReLU, and the output is two numbers between 0 and 1 representing the probabilities that the timestamp is and is not anomalous. The two layers have been trained with the mean squared error and the cross-entropy loss functions, respectively. The network can properly learn from this unbalanced dataset.

        To train the regression model only the non-anomalous timestamps have been used, whereas to train the classifier both types of entries have been considered. To avoid missing data between timestamps, the samples for training, validation, and testing have not been created randomly; furthermore, the ratio between non-anomalous and anomalous timestamps has been preserved in all the sets. To satisfy these constraints, the dataset has been divided into six parts of equal length, and the training, validation, and test sets of each part have been created separately.
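        The two-step idea, predict from the previous 5 timestamps and then compare prediction with observation, can be sketched in miniature. The snippet below is a deliberately naive stand-in, not the authors' GATv2/LSTM network: it uses the mean of the last 5 readings as the "prediction" and a fixed threshold in place of the trained classifier; the readings and the threshold value are invented.

```python
# Toy two-step anomaly detector: regress, then flag large residuals.

def predict_next(window):
    """Naive regressor: mean of the previous 5 values (stand-in for the LSTM)."""
    return sum(window) / len(window)

def flag_anomalies(series, threshold=2.0):
    """Return the indices where |observed - predicted| exceeds the threshold."""
    anomalies = []
    for t in range(5, len(series)):
        pred = predict_next(series[t - 5:t])
        if abs(series[t] - pred) > threshold:
            anomalies.append(t)
    return anomalies

# A flat signal with one spike at index 8:
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 10.0, 9.9, 10.1, 15.0, 10.0]
print(flag_anomalies(readings))  # → [8]
```

        The real framework replaces both ingredients (the mean predictor and the fixed threshold) with learned models, but the data flow is the same.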

        Speaker: Elisabetta Ronchieri (INFN CNAF)
      • 49
        Evaluation on Momentum Contrastive Learning with 3D Local Parts

        Self-supervised learning speeds up representation learning in many computer vision tasks and saves the time and labor of labelling datasets. Momentum Contrast (MoCo) is one of the most efficient contrastive learning methods and has achieved positive results on various downstream vision tasks. However, its performance at extracting representations of 3D local parts remains unknown. In our study, we modify the MoCo model to learn local features of ShapeNet, and design data augmentation and local clustering methods to randomly generate local clusters. To evaluate the proposed method, we run experiments with different scales of local clusters and different data augmentation methods, and then perform a 3D object classification downstream task on the local parts with the pretrained model. The results show that the modified MoCo model performs well at extracting local representations and that the pretrained model speeds up the classification downstream task.
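        For readers unfamiliar with MoCo, the contrastive objective it optimizes can be written down compactly. The snippet below computes the InfoNCE loss in plain Python for invented 2-D embeddings; the temperature 0.07 is the value commonly used with MoCo, and nothing here reproduces the modified model evaluated in this work.

```python
import math

# InfoNCE: -log( exp(q.k+ / tau) / (exp(q.k+ / tau) + sum exp(q.k- / tau)) )
def info_nce(query, positive, negatives, tau=0.07):
    """Contrastive loss between a query, its positive key and negative keys."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    pos = math.exp(dot(query, positive) / tau)
    neg = sum(math.exp(dot(query, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

q  = [0.6, 0.8]                   # query embedding (unit norm, invented)
kp = [0.6, 0.8]                   # positive key: another view of the same part
kn = [[-0.8, 0.6], [0.0, -1.0]]   # negative keys, e.g. from the MoCo queue
loss = info_nce(q, kp, kn)
print(loss)  # small: the positive pair is well aligned
```

        Minimizing this loss pulls the two views of the same local part together while pushing the queued negatives away, which is the mechanism the study evaluates on 3D local clusters.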

        Speaker: XUANMENG SHA (Osaka University)
      • 50
        An Artificial Intelligence-based service to automate the INFN CNAF User Support (Remote Presentation)

        The INFN CNAF User Support unit plays the role of the first interface to the user of the data center, which provides computing resources to over 60 scientific communities in the fields of Particle, Nuclear and Astro-particle physics, Cosmology and Medicine.

        While its duties span from repetitive tasks to supporting complex scientific-computing workflows, many of them can be automated or simplified and made more efficient through Artificial Intelligence (AI) techniques, which have shown promising accuracy on the multi-label text classification problem. Indeed, part of the users’ requests cannot be addressed without the intervention of one of the other INFN-CNAF units, which act as a second level of support. In these cases, automatic labeling is exploited to route each request to the relevant units.

        Over the many years of activity of the User Support group, several thousand bilingual user e-mail messages, in both Italian and English, have been collected; they have been used to provide training samples for Machine Learning algorithms and to validate them against newly arriving user requests. These messages can be organized in threads comprising user requests and the corresponding solutions, as well as the messages to and from the involved second-level support unit, which are implicitly labelled by the recipient list of the e-mail.

        In this study, we have applied a set of Machine Learning classification models, such as Support Vector Machine, Naive Bayes, and Convolutional Neural Network, to features extracted from the pre-processed text messages with Natural Language Processing techniques. The models have been compared across various feature extraction techniques, such as Bag of Words, Term Frequency-Inverse Document Frequency, Bag of n-Grams, and Word Embedding, to improve their performance. The best models are exploited by our AI-based service within the Virtual User Support Assistant, which receives text in a chat-like user interface and provides a reply based on the training knowledge base.

        The prototype has been implemented in Python using several AI libraries, among them nltk and scikit-learn. A set of INFN-CNAF users has been selected to test the prototype and give their valuable feedback to the User Support unit.
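        The routing idea described above can be sketched with a toy classifier. The snippet below is not the INFN-CNAF production pipeline: it replaces the trained models with a bag-of-words nearest-centroid rule, and the support units and training messages are invented for illustration.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words representation of a message."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented second-level units and labelled training tickets:
training = {
    "storage": ["quota exceeded on disk", "cannot write file to storage area"],
    "batch":   ["job stuck in queue", "batch job failed on worker node"],
}
centroids = {unit: sum((bow(m) for m in msgs), Counter())
             for unit, msgs in training.items()}

def route(ticket):
    """Route a ticket to the unit whose centroid is most similar to it."""
    t = bow(ticket)
    return max(centroids, key=lambda u: cosine(t, centroids[u]))

print(route("my job is stuck in the batch queue"))  # → batch
```

        The production service swaps the bag-of-words centroids for the TF-IDF and embedding features and the SVM/Naive Bayes/CNN models compared in the study, but the input-to-label flow is the same.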

        In conclusion, our study not only showcases the technical prowess of AI in enhancing the INFN-CNAF user support activities, but also emphasizes the broader considerations of user satisfaction, scalability, and future readiness.

        Future developments also foresee the study of Large Language Models such as GPT-3.5 and LLAMA 2 for providing a more natural user experience.

        Speakers: Dr Elisabetta Ronchieri (INFN CNAF) , Dr Carmelo Pellegrino (INFN-CNAF)
    • Joint DMCC, UMD & Environmental Computing Workshop Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: Eric YEN (ASGC)
      • 51
        An Approach to Realistic Parameterization (REP) Model of Typhoon Pressure and Wind Fields Around Taiwan

        Typhoon-induced storm surge modeling involves forcings of 10-m wind and sea level pressure fields, typically determined either by an adequate parametric typhoon model based on typhoon tracks, sizes, and intensities or by a fully dynamical numerical weather prediction (NWP) simulation. Parametric models have been widely developed to simulate tropical cyclones. In conventional Holland-type parametric models, a typhoon or hurricane is modeled as a symmetric or quasi-symmetric vortex. These symmetric assumptions fit observations well in deep-ocean areas or areas with flat topography. A tall Central Mountain Range (CMR), however, causes these models to produce significant errors. The presence of the CMR can affect tropical cyclones (TCs) moving over or passing by it, modulating storm surges in Taiwan's vicinity. The diversity of the environment surrounding Taiwan generates various physical issues that complicate storm surges. In this paper, we aim to create a new weather model for storm surge calculation when only the track and intensity of a tropical cyclone are known, and this model shall capture the terrain-blockage effect. Because the wind velocity is much faster than the moving speed of the typhoon, the flow field quickly transitions into a quasi-steady state. As a result, the weather field is primarily controlled by storm intensity and topography and depends less on the storm trajectory. This paper presents a new statistical method for generating a realistic weather field based on the location and intensity of the typhoon. Instead of using conventional parametric models to represent the wind and pressure fields of a typhoon, we develop a Realistic Parameterization (REP) Model that employs the 10-m (above sea level) wind field and sea-level pressure from historical typhoons to provide a more realistic typhoon model and to generate better storm surges when the influence of topography is non-negligible.
        We adopted ERA-5 reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF), with a total of 3200 data records from 1981 to 2021. The ERA-5 reanalysis data were validated against ground observations from the Central Weather Bureau (CWB) in Taiwan, including pressure and wind-speed gauge data. The storm surge was simulated using the COMCOT-SS model, and the results were compared with CWB tidal gauge data as well, showing excellent agreement. After the validation, the ERA-5 data were used as the database to generate the weather field given the location and intensity of the typhoon.

        Speaker: Tso-Ren Wu
      • 52
        Numerical Simulations of Landslide-Induced Tsunami Event of Guishan Island

        In the past, estimating tsunami characteristics induced by volcanic-related landslides often relied on approximations of total tsunami volume or using empirical formulas to estimate initial wave heights. In this study, we take the example of Guishan Island off the eastern coast of Taiwan and employ a numerical model, specifically the Discontinuous Bi-viscous Model (DBM), combined with a three-dimensional incompressible flow model (Splash3D). This approach aims to comprehensively depict the dynamic processes of rockslide-induced landslides and the ensuing behavior of landslide-induced tsunamis upon entering the sea.

        Huang (2018) utilized multi-beam bathymetric data, sub-bottom profiler, side-scan sonar, sparker seismic reflection data, and remotely operated vehicle (ROV) dive data to investigate landslide features in the northern maritime region of Guishan Island. The study proposed that the landslide deposits can be divided into three Mass Transport Deposit (MTD) units (MTD1, MTD2, and MTD3). The main volcanic debris avalanche deposits are identified as MTD3, and a model was proposed to explain the lateral collapse and subsequent submarine landslide events.

        By leveraging the measurements of MTD3 from Huang (2018), this study was able to calibrate the parameters used in the DBM to reconstruct ancient tsunami events around Guishan Island. This not only facilitates a more in-depth understanding of the dynamic processes during landslide events, but also provides a reference for disaster prevention and for formulating strategies against potential tsunami hazards in the region in the future.

        Speakers: Mr Yi-Xuan Huang (National Central University) , Tso-Ren Wu
      • 53
        Emergent Value-Added Product Processing by Gaussian Statistical Approach for Sentinel-2 Data
        Speakers: Jung Chien Hung (Taiwan Space Agency) , Yu Chang Li (Taiwan Space Agency)
    • Network, Security, Infrastructure & Operations Media Conf. Room

      Media Conf. Room

      BHSS, Academia Sinica

      Convener: David Kelsey (STFC-RAL)
      • 54
        Closer to IPv6-only on WLCG

        It has now been over 12 years since the HEPiX-IPv6 working group began investigating the migration of WLCG towards IPv6 in 2011.
        The effort of the working group can be split into three phases. In the first phase, LHC software was categorized as IPv6-ready, ready with caveats, or not ready at all. The aim of the second phase, enabling IPv6 access to all storage, was essentially reached at the end of 2023, with dual-stack deployment at all Tier-1 centers and dual-stack storage at 95% of the LHC Tier-2 centers, as presented at ISGC 2023. As the next step towards IPv6, the WLCG management board agreed on dual-stack worker node and service deployment at all LHC Tier-1 and Tier-2 centers by the end of June 2024, and another GGUS ticket campaign has therefore been started. This presentation shows the challenges discovered as well as the status of the campaign, and how far this step has moved WLCG towards an IPv6-only infrastructure.
        In the last two weeks of February 2024, the next WLCG Data Challenge will run. The aim is to transfer 25% of the data throughput expected for the high-luminosity LHC run starting in 2029 between the Tier-0 (CERN) and all connected sites, at a data rate that is extremely high by today's standards. One of the IPv6 initiatives is to identify specific links and study their IPv4 and IPv6 traffic. Initially, at least the LHCOPN link between CERN and DE-KIT will be inspected, as well as further LHCOPN links to other Tier-1s that join this initiative. The first task is to evaluate the remaining use of IPv4 on the inspected links. For example, the preference for IPv6, which should be the norm with IPv6 Address Selection (RFC 6724), is sometimes not honored due to “hidden” settings in applications and programming environments, or accidental misconfiguration: the Java default preference for IPv4 caused most dCache traffic to be IPv4 until corrected. The next step will be to determine how to remove the remaining IPv4 traffic, in order to establish IPv6-only communication over the LHCOPN links between these sites in the future.
        This will prepare for the ultimate goal of removing IPv4 from the WLCG infrastructure, to simplify operations, streamline security management, and remove NAT inefficiencies. All this is required to make sure that no critical traffic is left using IPv4, so that the third and final phase of the working group, moving WLCG to IPv6-only, can be brought forward.
        The migration of the German Tier-1 towards IPv6 has also progressed further. The additional worker-node communication migrated to IPv6 will be shown, together with an overview of the required configuration steps.
        WLCG is moving towards IPv6-only and has reached the point where a large majority of the traffic is already IPv6. The step to IPv6-only has yet to come, and the duration of this process cannot be determined. Other research communities can learn much from the WLCG migration towards IPv6, but the clear recommendation for every new programme is to start with IPv6-only from the beginning.
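        The address-selection issue mentioned above can be illustrated with a minimal sketch. The snippet below orders hard-coded example address records (documentation prefixes, not a live DNS lookup) so that IPv6 comes first, mirroring the intent of RFC 6724 destination-address selection; any application layer that reorders or filters these records can silently keep traffic on IPv4.

```python
import socket

# Example address records for one dual-stack host (documentation prefixes):
records = [
    (socket.AF_INET, "192.0.2.10"),      # IPv4
    (socket.AF_INET6, "2001:db8::10"),   # IPv6
]

def prefer_ipv6(addrs):
    """Order address records so IPv6 is tried first (RFC 6724 intent)."""
    return sorted(addrs, key=lambda rec: rec[0] != socket.AF_INET6)

family, address = prefer_ipv6(records)[0]
print(address)  # → 2001:db8::10
```

        In Java, the analogous behaviour is governed by runtime properties such as java.net.preferIPv4Stack and java.net.preferIPv6Addresses, which is how a "hidden" setting like the one that affected dCache can keep most traffic on IPv4.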

        Speaker: Bruno Hoeft (Karlsruhe Institute of Technology)
      • 55
        A TeRABIT network for the Einstein Telescope in Italy

        TeRABIT (Terabit Network for Research and Academic Big Data in ITaly) is a project funded within the initiative for realization of an integrated system of research and innovation infrastructures of the Italian National Recovery Plan. TeRABIT aims at creating a distributed, hyper-networked, hybrid Cloud-HPC computing environment offering tailored services to address the diverse requirements of research communities. This will be done by networking, integrating and upgrading three leading digital Research Infrastructures inserted in the National Plan for Strategic Research Infrastructures: GARR-T (the national network infrastructure dedicated to Research and Education), PRACE-Italy (the Italian node of PRACE, the Partnership for Advanced Computing in Europe), HPC-BD-AI (the National HUB for HPC for Big Data and Artificial Intelligence).
        The project will seamlessly integrate state-of-the-art High-Performance and High-Throughput computing elements into an innovative distributed platform, leveraging heterogeneous hardware and offering a portfolio of computing solutions for data-intensive research and industry applications, from edge computing to connectivity and workflows to central HPC Exascale systems. The extensive portfolio and a computing power in the order of Petaflops will enable TeRABIT to handle parallel requests from many scientific domains and industrial applications, where and when needed and act as a fast lane to prototype innovative research.
        The project, funded with a budget of 41 M€ over three years, is coordinated by INFN (the National Institute of Nuclear Physics) in partnership with OGS (the National Institute of Oceanography and Experimental Geophysics), the GARR Consortium (the Italian Education and Research Network), and CINECA (a consortium dedicated to advanced computing that includes 117 institutions: universities, the Ministry of Research, the Ministry of Education, and several research institutions and public bodies).

        TeRABIT will provide high-performance computing and distributed cloud services within the framework of the edge-computing paradigm, developing mid-size HPC service instances (HPC Bubbles) deployed closer to the users and data sources. The project is developed in strict synergy with the supercomputing services of ICSC, the National Research Centre for High Performance Computing, Big Data and Quantum Computing, as well as with the national Tier-1s.

        The TeRABIT network-related activity, which represents 40% of the total budget, is focused on providing the necessary technology and capacity upgrades and on the geographical extension of the ultra-broadband network layer, in order to bring Terabit-grade connectivity and composable services to universities and research centers in the Sardinia Region and to upgrade the fiber network in Sicily, where the KM3NeT project is located.
        This substantial connectivity upgrade will become an essential asset for the Italian proposal to host the Einstein Telescope, the future European laboratory for gravitational waves, in Sardinia. Italy already hosts the VIRGO detector and brings consolidated scientific expertise developed within the LIGO-VIRGO collaboration (now the LIGO-VIRGO-KAGRA Collaboration), which achieved the first direct detection of a gravitational-wave signal (leading to the Nobel Prize in 2017) and opened the era of multi-messenger astronomy. Presently LIGO and VIRGO are still the only detectors worldwide capable of observing gravitational waves.
        Sardinia has unique characteristics in terms of extremely low seismic and anthropic noise, making the region an ideal site for the future European observatory.
        The Einstein Telescope is part of the programme for future large research infrastructures within ESFRI, the European Strategy Forum on Research Infrastructures. Presently two sites are under study: one in Sardinia, supported by the Italian Government, and one in the Euregio Meuse-Rhine (the border area of the Netherlands, Belgium, and Germany), supported by the Dutch Government. The project started with the idea of a single site, but recent scientific results show that the most promising solution may be a configuration based on a twin system located at a large distance, like the present LIGO detectors.

        Speaker: Alberto Masoni (INFN National Institute of Nuclear Physics)
      • 56
        Using Network Architecture Virtualization Concepts to Build Tunnel-based BGP Emulation Testbed: A Study Case

        Since the global network continues to grow at a fast pace, its interconnections become more and more complicated to support reliable transmission. Meanwhile, network application services keep expanding as well. This raises both concerns about and interest in using software-defined concepts to optimize and secure wide-area networks. However, collecting and manipulating BGP routes from and to the global network for experiments is a challenge: researchers may have to spend a lot of time and money on testbed construction, and they also have to establish peering connections to other networks. Hence, this research aims to develop a tunnel-based BGP testbed to support Software-Defined WAN experiments, trying to satisfy the research and education purposes of global network exploration.

        Speakers: Hsiang-Ming Hung (NCKU) , Pang-Wei Tsai (NCKU)
    • eScience Activity in Asia Pacific Auditorium

      Auditorium

      BHSS, Academia Sinica

      Convener: Kento Aida (National Institute of Informatics)
      • 57
        eScience Activities in Korea (Remote Presentation)
        Speaker: Sang Un Ahn (Korea Institute of Science and Technology Information)
      • 58
        eScience Activities in China (Remote Presentation)
        Speaker: Gang Chen (Institute Of High Energy Physics)
      • 59
        eScience Activities in the Philippines (Remote Presentation)
        Speaker: Franz A. De Leon (ASTI)
      • 60
        eScience Activities in Indonesia (Remote Presentation)
        Speaker: Basuki Suhardiman (ITB)
      • 61
        e-Science Activities in Vietnam
        Speaker: DAO VAN TUYET (Vietnam National Space Center/ Vietnam Academy of Science and Technology)
    • 12:30
      Lunch 4F Recreation Hall

      4F Recreation Hall

      BHSS, Academia Sinica

    • GDB Meeting Media Conf. Room

      Media Conf. Room

      BHSS, Academia Sinica

      • 62
        Introduction
        Speaker: Mattias Wadenstein (NeIC)
      • 63
        Belle-II computing infrastructure update
        Speaker: Dr Ikuo Ueda (KEK IPNS)
      • 64
        Juno update
        Speaker: Giuseppe Andronico (INFN Sez. CT)
    • Health & Life Science Applications Auditorium

      Auditorium

      BHSS, Academia Sinica

      Convener: Jung-Hsin Lin (Academia Sinica)
      • 65
        Protein-protein recognition mechanism through interfacial hydrogen-bonded water chains

        Protein-protein recognition through hydrogen-bonded chains of interfacial water is a less explored mechanism due to the technical challenges involved in the analyses, and the role of waters in forming a stable protein-protein complex is often elusive. It is still unclear whether and how hydrogen-bonded interfacial-water chains contribute to triggering or participating in the recognition process, especially when the interacting proteins are about to encounter each other to form a complex but are still distinctly separated. In this work, we used trajectories generated by the curvilinear-path umbrella sampling approach to extract conformations from the physical paths of unbinding. We analyzed the water interface formed between the interacting proteins and systematically characterized the hydrogen-bonded water molecules through a newly developed procedure. The revealed hydrogen-bonded water molecules are then used for shortest-path network analysis. The presence of significant clusters in the middle of the interface and the overall robustness of the connectivity between the interacting proteins through hydrogen-bonded water chains suggest that these waters play a crucial role in bridging the interacting proteins well before they encounter each other. This work proposes a highly generalized approach to characterizing interfacial waters in protein-protein recognition.
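        The shortest-path network analysis mentioned above can be sketched on a toy graph. The snippet below invents a small hydrogen-bond network (two protein surfaces plus five waters) and finds the shortest bridging water chain by breadth-first search; it illustrates the graph idea only, not the procedure developed in this work.

```python
from collections import deque

# Invented hydrogen-bond network: nodes are the two protein surfaces and
# interfacial waters; edges are hydrogen bonds.
graph = {
    "proteinA": ["w1", "w2"],
    "w1": ["proteinA", "w3"],
    "w2": ["proteinA", "w4"],
    "w3": ["w1", "proteinB"],
    "w4": ["w2", "w5"],
    "w5": ["w4", "proteinB"],
    "proteinB": ["w3", "w5"],
}

def shortest_chain(graph, start, goal):
    """Breadth-first search for the shortest hydrogen-bonded path."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nbr in graph[path[-1]]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return None

print(shortest_chain(graph, "proteinA", "proteinB"))
```

        On real trajectories each frame yields such a graph from geometric hydrogen-bond criteria, and the statistics of the bridging paths across frames are what reveal the persistent water chains.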

        Speaker: Dhananjay Joshi (Research Center for Applied Science)
      • 66
        The Coherent Multirepresentation Problem in Protein Structure Determination

        The Coherent Multirepresentation Problem (CMP) was recently proposed as a generalization of the well-known and widely studied Distance Geometry Problem (DGP). Given a simple directed graph G, the idea is to associate various types of representations to each arc of the graph, where the source vertices play the role of reference for the corresponding destination vertices. Each representation is in charge of transforming a given set of internal variables (spherical coordinates, angles, etc.) into Cartesian coordinates. In applications, these internal variables are not precisely known: generally only an approximate value is given, and sometimes bounds on the feasible values are available. Since these representations are associated with arcs (and not directly with the vertices) in the CMP, multiple Cartesian coordinates can be computed for every vertex, one for each arc in which the vertex appears as a destination. Naturally, the Cartesian coordinates obtained for the same vertex may not be compatible with one another. The CMP therefore asks whether a subset of feasible values exists for the internal variables used in the various representations that ensures all the Cartesian coordinates of each vertex are identical (or at least close enough to each other). In this situation, we say that the multi-representation associated with the graph G is coherent. In this work, we focus our attention on CMPs arising in the context of protein structure determination, and we explore the possibility of introducing novel representations involving several vertices at the same time, for example for modeling protein secondary structures.

        Speaker: Antonio Mucherino (IRISA, University of Rennes)
      • 67
        Exploring the conformational space of proteins by enumeration in the frame of the Distance Geometry Problem

        The continuous development of methods for protein structure prediction has taken advantage of the precious experimental information obtained by structural biology as well as by the sequencing of multiple organisms. Indeed, the general pipeline is based on determining conformations of protein fragments, and then using multiple sequence alignments to obtain the long-range distances that allow predicted protein conformations to be built. The introduction of deep learning techniques has recently permitted an important jump in the results obtained in this framework, as illustrated by the success of AlphaFold2, RoseTTAFold and ESMFold. Nevertheless, in some cases long-range restraints cannot be obtained, as for intrinsically disordered proteins or regions (IDP/IDR), or for orphan proteins for which not enough statistical signal can be obtained by sequence alignment. Here, we propose to investigate another point of view, in which mainly local information, extracted directly from the primary sequence, would be used to determine the protein fold. Two major obstacles arise: (i) the variability in protein stereochemistry, inducing a drift of the protein backbone as residues are successively added; (ii) the size of the protein conformational space described by local conformations, as it cannot be reduced by long-range proximity information. In the framework of the Distance Geometry Problem (DGP), the Branch-and-Prune (BP) approach, based on a graph description of proteins, answers the problem of the size of the conformational space by performing a systematic enumeration of all protein conformations satisfying a given set of geometric constraints. Applications of the BP approach will thus be presented for the reconstruction of folded protein structures, with propositions for dealing with the variability of stereochemistry. The efficiency of the BP approach will also be demonstrated on disordered or flexible proteins, in particular on the Small EDRK-rich factor 1.
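        The Branch-and-Prune idea (branch over the discrete candidate positions of each new vertex, prune candidates that violate a distance constraint) can be shown on a one-dimensional toy. The real BP works on 3D protein graphs with two or more candidate positions per residue; the consecutive distances and the extra constraint below are invented for illustration.

```python
# Toy 1D Branch-and-Prune: each new vertex sits at distance d left or right of
# the previous one (branching); candidates violating an extra distance
# constraint are discarded (pruning).

def branch_and_prune(dists, constraints, tol=1e-6):
    """Enumerate all 1D embeddings with x0 = 0 satisfying the constraints.

    dists[i] is the exact distance between vertices i and i+1;
    constraints maps pairs (i, j) to a required |x_i - x_j|.
    """
    solutions = []

    def extend(xs):
        i = len(xs)
        if i == len(dists) + 1:
            solutions.append(xs)
            return
        for x in (xs[-1] - dists[i - 1], xs[-1] + dists[i - 1]):   # branch
            ok = all(abs(abs(x - xs[j]) - d) < tol
                     for (j, k), d in constraints.items() if k == i)
            if ok:                                                 # else prune
                extend(xs + [x])

    extend([0.0])
    return solutions

# Chain of 4 vertices, unit consecutive distances, plus |x0 - x3| = 1:
print(branch_and_prune([1.0, 1.0, 1.0], {(0, 3): 1.0}))
```

        Without the extra constraint the toy tree has 2^3 = 8 leaves; the constraint prunes two of them, and in 3D protein graphs the same pruning is what makes the systematic enumeration tractable.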

        Speaker: Dr Therese Malliavin (CNRS)
    • Humanities, Arts & Social Sciences Application Conf. Room 1

      Conf. Room 1

      BHSS, Academia Sinica

      Convener: Prof. Eva Hladka (Masaryk University)
      • 68
        Social responsibility for AI in cyberspace

        The so-called trolley problem asks AI for an ethical solution. However, such a problem is not unique to AI: a similar ethical question has existed in China for many centuries: “Whom do you save first when both your mother and your wife fall into the water?”
        Indeed, the ethics associated with AI are discussed at various levels. UNESCO and OECD are the leading international organizations that recognize the importance, implications, and possible challenges of AI for human lives. The landscape of AI is very wide and diverse:
        - From language translation to personalized content creation
        - Scientific measurements and observations
        - Audio and video generation, and music composition or painting
        - Financial and legal assistance
        - Healthcare and sophisticated medical detectors for diseases, cancer cells
        - Chatbot, etc.

        However, AI is not human. Thus, AI need not respect any ethics aimed at humans. But the designers of AI can make AI obey human ethics. Thus, designers should be armed with “AI ethics” in order to make AI ethical, i.e., to equip AI with the “ethics of AI.” These two concepts look identical, but they have noticeably different aspects.

        We all know that AI should be designed to assure the users of AI that:
        - The whole life cycle of AI is transparent and explainable
        - They [the designers] are responsible and accountable for AI
        - They have full awareness and literacy
        - AI is designed, developed, and set into operation with multi-stakeholder collaboration and adaptive governance
        These guidelines may be called AI ethics.

        On the other hand, there is also the ethics of AI, which is to make AI ethical. AI should not help, suggest, decide or lead humans to immoral, unethical conclusions. It is definitely required when AI deals with subjects involving ethical values. Hence, ethics of AI should be installed or equipped beforehand in AI.

        In this talk, I will discuss social responsibility in cyberspace, encompassing these two concepts.

        Speaker: Sun Kun OH (Suranaree University of Technology, Thailand)
      • 69
        Exploration on Using AI-Generated Scenario-Based Videos on English Learning: A Focus on ESG Initiatives

        This research investigates the impact of AI-generated scenario-based videos on the English-speaking learning experience and motivation, particularly within the context of Environmental, Social, and Governance (ESG) initiatives. Recognizing the global significance of ESG initiatives for long-term corporate value and sustainability, the study emphasizes the need to integrate theoretical frameworks, practical insights, and diverse knowledge systems. In the Taiwanese context, English education has traditionally been influenced by cultural norms and exam-centric approaches, emphasizing written expression and reading skills over oral communication. However, effective English communication is crucial for advancing environmental protection initiatives, necessitating a shift in educational focus. Current research indicates a significant gap between engineering tasks and the predominant need for writing and communication in professional settings. Scenario-based learning is proposed to bridge this gap, fostering 21st-century skills such as problem-solving, communication, critical thinking, and creativity.

        The research aims to assess whether AI-generated scenario-based videos, enriched with animation and realistic static images, can enhance the learning experience in English-speaking scenarios. Utilizing the AI video generation tools Steve.ai and Flex Clip, the study employs ChatGPT to create contextually rich English-speaking learning content. The content is structured using a prompt template from ChatGPT and adapted from MovieFactory, a robust system for generating high-resolution cinematic images and multi-modal films based on natural language input. The study employs a within-subject experimental design to evaluate the effectiveness of AI-generated videos, comparing the outcomes of the same content regarding picture quality, audio, voices, and animations. The evaluation of the learning experience is based on individual survey questionnaires gauging learning engagement and motivation. Individual differences in prior knowledge significantly influence the preference for static images or animations within specific domains of knowledge or skills. The hypothesis is that individuals with proficient English skills would better comprehend AI-generated animation content. The participants in this pilot study are thirteen graduate students with experience working and speaking English with foreigners, comprising six men and seven women aged 22 to 27. The experimental procedure contains three main parts: a first interview about the difficulty of speaking English; a second part in which videos are used to learn English speaking; and a third part in which participants watch two types of visual representation in AI-generated videos for learning English speaking in the ESG scenario. The findings from the descriptive analysis and interview transcriptions indicate that individuals with advanced English proficiency (C1 and C2 levels) exhibit better comprehension of the animation learning material. This finding is attributed to the clarity, coherence, and interactivity facilitated by the characters' actions and eye contact in the AI-generated video. In contrast, participants possessing intermediate proficiency (B2 level) indicate that the primary issue with videos featuring static AI-generated images lies in the inconsistency between the content and the static images. On the other hand, those with lower proficiency (A2 and B1 levels) prefer realistic images, emphasizing the reduced cognitive load and enhanced focus on subtitles. Additionally, they note that static real images serve as visual aids, aiding content comprehension. However, the AI-generated images may need to align better with the learning material.

        Future work will explore the impact of AI-generated videos on learning motivation using the ARCS model, specifically exploring the relationship between input and output prompts and providing insights for future scenario-based learning video generation. Overall, the study contributes valuable findings on how prior knowledge can lead to different learning experiences when using AI-generated scenario-based videos for English-speaking learning.

        Speaker: Yu-Xuan Dai (National Taipei University of Technology)
      • 70
        Participatory Artificial Intelligence Generated Music for Pressure Healing

        Abstract
        This study investigates the therapeutic potential of using Artificial Intelligence Generated Music to address the impacts of societal pressures on mental well-being. After the COVID-19 epidemic of 2020, industrial and economic uncertainty increased dramatically, and individuals' emotional and psychological states came under strain. The WHO recognizes that 70% to 90% of psychosomatic disorders are primarily due to emotional and psychological stress. Music can uplift the heart, beat depression, relieve pain, and promote physical and mental health; music is the “emotional” medicine of the mind. Thus, this study focuses on 1) addressing the psychological aspects of preventive care and reducing stress, anxiety, and potential physical risks; 2) using music as an emotionally connected tool to integrate psychological preventive care and mental health impacts; and 3) enhancing overall health prevention and promoting psychological well-being. This study employs a participatory framework: individuals actively engage with AI algorithms to co-create personalized content tailored for pressure healing. The study utilizes a mixed-methods approach, combining quantitative assessments and text analysis techniques to comprehensively evaluate the impact of participatory AI interventions. Participants collaboratively contribute to the customization process, providing input and preferences that guide the generation of AI-generated content. The study also employs a structured process to ensure meaningful participant engagement, leveraging AI algorithms to analyze user feedback and preferences and facilitating a personalized content creation experience. The quantitative assessments involve text analysis techniques applied to user-generated content, utilizing natural language processing to extract sentiments, themes, and contextual insights related to pressure, well-being, and the effectiveness of pressure healing. This comprehensive analysis aims to capture quantitative metrics as well as the nuanced qualitative aspects embedded within participant narratives. Preliminary findings from the pilot study indicate a positive impact of participatory AI-generated music on pressure healing. Quantitative data reveal trends suggesting reductions in self-reported pressure levels, while qualitative insights provide a nuanced understanding of participants' subjective experiences and perceptions. These initial results underscore the potential efficacy of the intervention. Future research should refine text analysis methodologies to enhance sentiment extraction accuracy and thematic identification. Expanding the study across diverse populations and cultural contexts will contribute to the generalizability of the intervention. Longitudinal studies are recommended to explore the sustained effects of participatory AI interventions over time. Incorporating real-time physiological feedback mechanisms and exploring ethical considerations associated with AI in mental health interventions constitute critical areas for future exploration. Additionally, integrating participatory AI within broader mental health frameworks and treatment plans offers promising avenues for advancing the field.

        Keywords: Artificial Intelligence Generated Music, Pressure Healing, Participatory Framework, Text Analysis, Quantitative and Qualitative insights

        Speaker: Ms Chen Tzu-Hsiu (Doctoral Program in Design, College of Design, National Taipei University of Technology, Taipei, Taiwan)
      • 71
        Fusion of Participatory Design and Digital Learning with Artificial Intelligence-Generated Content for Costume Art and Craft Education

        The intersection of participatory design and digital learning with artificial intelligence (AI) presents a transformative opportunity for costume art and craft education. This research explores the efficacy of combining AI-generated content (AIGC) with participatory design and digital learning platforms to enhance the educational experience in costume arts and crafts disciplines. Digital learning combined with AI is rapidly advancing the current educational landscape and is widely applied in various design and technology fields. However, traditional arts and crafts design education has received far less attention. The study aims to develop a pedagogical framework in which learners actively engage with AI tools to create and understand AIGC pertinent to costume design. This study also aims to revolutionize traditional craft learning by integrating participatory design methods and incorporating CLO3D digital learning and AI-driven design results. It further explores the potential of incorporating AI-generated content into digital learning to enhance the learning experience and nurture creativity among costume arts and crafts students. The study started with embroidery craft exercises. It then introduced the CLO3D software in digital learning, integrating AI technology into design content creation activities within the educational environment.

        Using participatory elements, the study shapes collaborative learning experiences and evaluates how the CLO3D digital software presents the texture of embroidery and its virtual representation in fashion design. Additionally, it considers the quality, creativity, and educational effectiveness of AI-generated visual stories. Through quantitative and qualitative analysis, the research evaluates the impact of AI-generated content on student engagement, understanding, and creative performance. The findings offer valuable information on digital learning and AI in craft and design education. Analysis of participatory digital learning and AI-generated visual storytelling reveals the potential benefits of improving creativity and learning outcomes. It addresses the challenges in integrating craft design creation, digital learning, and AIGC applications in the educational environment.

        In conclusion, this study investigates the impact of this novel approach on students' learning outcomes, engagement, and creativity, shedding light on the potential of AI in fostering co-creation in creative disciplines. The results of the pilot study open avenues for future research, highlighting the possibilities of improving digital learning with AIGC while exploring participatory design methods in costume and craft education. The impact extends to broader intersections of technology and education, emphasizing the ongoing need for research and development to fully harness the potential of combining participatory design, digital learning, and AIGC in the context of craft design education. Future studies will build upon the initial findings and expand the knowledge base in this innovative field, including 1) conducting longitudinal studies to assess the long-term effects, 2) comparing the effectiveness of different AI models and platforms, and 3) investigating how the co-creation design process can be further optimized to be more student-centred, ensuring that students have agency and a sense of ownership in designing their learning experiences.

        Keywords: Participatory Design, Digital Learning, Artificial Intelligence Generated Content, Costume and Craft Education

        Speaker: Yen-I Wu (Doctoral Program in Design, College of Design, National Taipei University of Technology, Taipei, Taiwan / Assistant Professor, Department of Textiles and Clothing, Fu-Jen Catholic University)
    • Joint DMCC, UMD & Environmental Computing Workshop Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: Stephan Hachinger (Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities)
      • 72
        Environmental Computing at LRZ: Updates and Developments
        Speaker: Viktoria Pauw (Leibniz Rechenzentrum)
      • 73
        Automatic pollen monitoring in Europe: the next steps (Remote Presentation)
        Speaker: Jeroen Buters (ZAUM - TUM/HMGU)
      • 74
        Challenges of Distributed Preprocessing, Computation, and Postprocessing in Ice Sheet Simulations (Remote Presentation)
        Speaker: Timm Schultz (AWI)
    • 15:30
      Coffee Break
    • GDB Meeting Media Conf. Room

      Media Conf. Room

      BHSS, Academia Sinica

      • 75
        DC24 first impressions
        Speakers: Christoph Wissing (Deutsches Elektronen-Synchrotron (DE)) , Mario Lassnig (CERN)
      • 76
        Token Transition Update
        Speaker: Maarten Litmaath (CERN)
      • 77
        IHEP site update
        Speaker: Jingyan Shi (Chinese Academy of Sciences (CN))
    • Health & Life Science Applications Auditorium

      Auditorium

      BHSS, Academia Sinica

      Convener: Jung-Hsin Lin (Academia Sinica)
      • 78
        Porting the IRCCS Sant’Orsola Computational Genomic platform on INFN Cloud: a first proof of concept (Remote Presentation)

        Modern technologies for DNA and RNA sequencing allow for fast, parallel reading of multiple DNA lines. While sequencing the first genome took 32 years, today with Next Generation Sequencing technologies we are able to sequence 40 genomes in about 2 days, producing 4 TB of text data (a file of about 100 GB per genome). This ability poses a challenge to computing infrastructures, which need to be able to ingest this amount of data and to process it through efficient genomic pipelines, exploiting heterogeneous resources such as CPUs, GPUs, HPC clusters and storage exposing different Qualities of Service (QoS) to perform the analysis with the optimal cost-performance balance. At the same time, the computing platform needs to have user-friendly interfaces so that it can be exploited by a plethora of different scientists, such as biologists, physicists, engineers, medical doctors and others. Finally, the APIs and data formats need to meet international de facto standards and be interoperable, to maximize their portability and make them runnable on cloud federations.
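        The data volumes quoted above are mutually consistent; a quick arithmetic check (figures taken directly from the abstract):

```python
# Sanity check of the sequencing throughput quoted above:
# ~100 GB of text data per genome, 40 genomes per ~2-day run.
gb_per_genome = 100
genomes_per_run = 40
run_tb = gb_per_genome * genomes_per_run / 1000  # total TB per run
```

        At roughly 4 TB every two days, sustained ingest alone is on the order of 20 MB/s, before any pipeline processing, which is why the storage QoS tiers mentioned above matter.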

        In this talk we describe the status of the Computational Genomic platform under development in the context of the collaboration between INFN and IRCCS AOU Sant’Orsola (the main research hospital in Bologna, Italy). The platform is deployed as a series of OpenStack projects on EPIC (Enhanced PrIvacy and Compliance) Cloud, the high-security partition of INFN Cloud certified ISO 27001, 27017 and 27018. Presently it consists of about 1000 CPU cores with 5.8 TB of RAM, 2 NVIDIA A100 GPUs and 320 TB of storage (HDD, SSD, tape). We’ll provide information about the performance reached on some sample genomic pipelines and the security measures adopted to guarantee GDPR compliance. Finally, we’ll discuss possible synergies and interactions with other similar and broader initiatives at both the national and international level.

        Speaker: Jacopo Gasparetto (INFN CNAF)
      • 79
        3D Ultrasound Computer Tomography - a data & computing intensive approach at multimodal ultrasound imaging

        Background & Motivation

        3D Ultrasound Computer Tomography (3D USCT) is developed at the
        Karlsruhe Institute of Technology for early breast cancer detection.
        Unlike conventional ultrasound sonography with manually guided
        ultrasound (US) probes, the patient is placed on a patient bed in a
        stable and reproducible measurement configuration. This configuration
        is achieved by surrounding the patient's breast with many spherically
        placed ultrasound transducers -- 2304 individual US transducer
        elements. This allows screening and diagnosis in a reproducible way:
        long-term, longitudinal tracking of a patient's breast health with a
        non-ionizing, 3D and easy-to-deliver imaging modality becomes
        conceivable.

        Method

        Despite the promising design and vision of the project, early
        pre-studies in hospitals (Hospital Jena, Hospital Mannheim) indicated
        that there are still some technical challenges to be tackled to
        realise the full potential of the method: the enormous computational
        burden of the various imaging approaches is hindering scientific
        progress.

        More specifically, one challenge is the large amount of data (40 to
        80 GB of pressure-over-time signals, so-called a-scans, per
        measurement) combined with the large three-dimensional imaging
        domain, the region of interest, of 20x20x20 cm^3. With the desired
        resolution of ~0.2 mm this leads to an image volume of 1 GVoxel that
        needs to be computed. In the case of reflectivity image
        reconstruction with SAFT (Synthetic Aperture Focussing Technique),
        both combined lead to 0.112 terabytes (double-precision
        floating-point data type) of write and read accesses in the
        computation of the image.
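        A back-of-the-envelope check of the image-volume figure quoted above (the 0.112 TB value counts repeated read/write accesses during SAFT reconstruction; a single pass over the volume in doubles is smaller):

```python
# Voxel count for a 20 x 20 x 20 cm region of interest at ~0.2 mm
# resolution, and the memory footprint of one pass in double precision.
side_mm = 200.0
resolution_mm = 0.2
voxels_per_side = round(side_mm / resolution_mm)  # 1000 per axis
n_voxels = voxels_per_side ** 3                   # 1e9 -> "1 GVoxel"
single_pass_gb = n_voxels * 8 / 1e9               # 8 bytes per double
```

        So one read or write sweep of the volume moves 8 GB; the 0.112 TB of traffic quoted above corresponds to the many such sweeps performed during reconstruction.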

        While the SAFT algorithm is parallelised on GPUs, the imaging methods
        of the other modalities have their challenges in dimensionality and
        non-trivial partitioning schemes: the transmission-based tomography
        is formulated as an optimization problem, which is challenging due to
        the inherent three-dimensional nature of the USCT's aperture.

        Even more demanding are promising full-wave-inversion schemes,
        inspired by prior work in the field of geoscience, which struggle
        with the high-frequency nature of the data provided by the USCT
        device and method.

        Next Steps

        HPC and grid computing can offer the infrastructure and interfaces to
        tackle the multifold computational partitioning and big-data
        challenges of USCT. The USCT project now aims to enable fellow
        scientists and associated communities and to kick off collaborations.
        We are committed to open science, open access and open data: example
        data sets and access code are made available under liberal licenses
        on GitHub and a webserver.

        A Matlab script with some reference imaging and visualization code:

        "3D-USCT-III-access-script" KIT-3DUSCT/3D-USCT-III-access-script
        (github.com)

        The following datasets are provided:

        1: Gelatine phantom with four inclusions made from PVC (spheres of
        different sizes, 8 mm to 22 mm)

        2: Empty measurement with the same acquisition parameters as the
        gelatin phantom.

        3D KIT USCT -- USCT data exchange and
        collaboration

        Keywords: HPC, Bioinformatics, Medical imaging, open science, open data

        Speaker: Michael Zapf
      • 80
        Text Classification on COVID-19: a Transformer-based Approach

        During the COVID-19 pandemic, there has been a rapid growth of literature and a need to quickly access useful information to understand the disease mechanisms, define prevention techniques, and propose treatments; text classification has therefore become an essential and challenging activity. LitCovid represents an example of the extent to which a COVID-19 literature database can grow: it has accumulated over 390,000 articles with millions of accesses since 2020. Approximately 10,000 new articles have been added to LitCovid every month since May 2020.

        Text classification is the process of assigning predefined labels to text according to its content. Label selection is a demanding task due to various factors. Firstly, deep knowledge of the topic domain is compulsory. Secondly, a label structure has to be established, since a label may be connected to or imply other labels, e.g., diagnosis implies tomography, medical imaging, and radiation. Finally, the depth of this label structure has to be defined and be consistent for every topic, e.g., treatment and prevention must be described using the same level of detail.
        For text classification, we have considered transformer models that have achieved unprecedented breakthroughs in the field of Natural Language Processing. The core function that drives the success is the attention mechanism, which provides the ability to dynamically focus on different parts of the input sequence when producing the predictions and capturing the relationships between words in the sentence.
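        The attention mechanism described above can be sketched in a few lines: each query is scored against every key via a scaled dot product, the scores are normalised with a softmax, and the output is the resulting weighted mix of the values (a minimal illustrative sketch on toy 2-D vectors, not any of the cited models):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: for each query, return a mix of the
    values weighted by query-key similarity."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)  # how much each position is attended to
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# A query aligned with the first key attends mostly to the first value.
out = attention(queries=[[1.0, 0.0]],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
```

        The softmax makes the weights sum to one, so the output stays a convex combination of the values; this dynamic re-weighting is what lets the model focus on different parts of the input sequence.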

        In this study, we have proposed labels that can satisfy the need to assist title and abstract screening for supporting COVID-19 research on detection, diagnosis, treatment, and prediction. Our labels extend what the LitCovid and CORONA Central corpora provide for COVID-19 literature. Furthermore, we have classified literature by using different pre-trained transformer models, mainly based on BERT and ELECTRA models, such as BioBERT, PubMedELECTRA, and BioFormer. The selected papers have been identified through the use of PRISMA, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses, an incident approach originally developed in the chemical process industry and afterward well established in the medical field.

        All the models have been compared by considering micro average, weighted average, and sample average methods for performance metrics. During this study, we have tackled the problem of the computational requirements, e.g., 10 hours per 5 epochs (2 hours per epoch) with a P100 GPU for PubMedBERT-large, and of the domain specificity of the model performances.
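        For readers unfamiliar with these averaging schemes, a hedged toy example (invented two-class labels, not the authors' data) contrasting micro-averaged F1 with support-weighted F1:

```python
from collections import Counter

def f1_per_class(y_true, y_pred, label):
    # Precision/recall/F1 for one class in a single-label setting.
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label != t for t, p in zip(y_true, y_pred))
    fn = sum(t == label != p for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def micro_f1(y_true, y_pred):
    # Micro average pools all decisions; for single-label data it
    # reduces to plain accuracy.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred, labels):
    # Weighted average: per-class F1 weighted by class support.
    support = Counter(y_true)
    return sum(support[l] / len(y_true) * f1_per_class(y_true, y_pred, l)
               for l in labels)

y_true = ["covid", "covid", "covid", "other"]
y_pred = ["covid", "covid", "other", "other"]
```

        On imbalanced corpora like LitCovid the two can diverge noticeably, which is why reporting several averages, as the authors do, is good practice.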

        Speakers: Dr Marco Canaparo (INFN CNAF) , Prof. Elisabetta Ronchieri (INFN CNAF)
    • Humanities, Arts & Social Sciences Application Conf. Room 1

      Conf. Room 1

      BHSS, Academia Sinica

      Convener: Prof. Eva Hladka (Masaryk University)
      • 81
        The Vesuvius Challenge: Competitive Science as a Vehicle for Scientific Breakthroughs

        The Vesuvius Challenge (scrollprize.org) is a current machine learning and data science competition built around the goal of making visible the evidence of ink captured in micro computed tomography scans of damaged Herculaneum scrolls. The prize pool for the competition is more than $1M. The website for the challenge states as its objective that it intends “...to make history by reading an unopened Herculaneum scroll for the very first time. We believe that an open competition will accelerate progress and enable us to achieve this goal in 2023.” It is very unusual to see researchers from academia using open competitions to further their research and to accelerate progress. While there are other barriers, funding a prize pool large enough to attract many competitors is one major hurdle. This paper explores the concept of competitive science through the lens of the Vesuvius Challenge as a case study.

        The promise of a large community of talented scientists and engineers working long hours without pay on a significant research project seems quite compelling. Imagine a group of several thousand people all working on the same goal, sharing code and ideas and progress, for the purpose of meeting an overall challenge. There have been competitions in the past that have produced amazing results, perhaps the most famous of which is the “Longitude Prize of 1714”. The problem of navigating at sea was turned over to a government-sponsored prize called “The Longitude Act,” which offered a £20,000 prize (approx £1.5m today) for a solution. Other more recent prizes (Millennium Prize; X Prizes; XTX Markets Math Olympiad Prize) hope to exploit the idea of realizing breakthrough, accelerated results through the inducements of large monetary prizes.

        But there are barriers to making such offerings successful. Competitions are, well, competitive. There is no guarantee that winners can succeed, or that competitors will share innovations. It is also risky to assume large numbers of people will be able or willing to spend time working on a project that may in fact produce no return on the investment. How can talented and energetic people be convinced to spend time when the chances of winning might be very low? Of course any competition needs to offer prizes. It can be difficult or even impossible to secure a prize pool that is attractive enough to engage a substantial number of competitors. There is also the question of opening up research for the purpose of supporting a competition when research itself is competitive. What is the incentive for a research team to deliver to the open community everything that might be its own competitive advantage in the research arena? What if there is intellectual property at stake, or competing research teams that are unwilling to share and/or offer credit?

        In this paper we detail the landscape and the crucial elements that make a competition a viable approach. We show how many of the hurdles and barriers can be overcome. We also discuss how the digital era facilitates the concept of competitive science like never before through the use of cloud-based computing systems, social media sharing platforms, global finance systems that make electronic transfers of prize money simple, and tutorial schemes that lower the barrier to entry for contestants who show interest.

        Speaker: Brent Seales (University of Kentucky)
      • 82
        A lexicon for social media-based cultural heritage information in crisis situations: a proposal (Remote Presentation)

        Social media can play a crucial role in disseminating information about cultural heritage if a proper lexicon is available and able to identify valuable data for the management of crises caused by either natural or human-induced disasters. A literature review has been conducted, encompassing existing attempts to define terminology within the cultural heritage domain. A Review of the Role of Social Media in Cultural Heritage Sustainability reveals that there is ongoing interest in investigating the role and impact of social media platforms on cultural heritage sustainability and culture preservation. However, the lack of published studies concerning terminological resources for cultural heritage (either generally or in the context of social media discussion) and the absence of a lexicon dedicated to detecting cultural heritage-related tweets on social media during crisis events pushed us to investigate this area of research. For this reason, we have undertaken the task of creating our own lexicon, which provides essential information, comprehends the domain, and facilitates further research in the field. In particular, the lexicon has been defined according to keywords that are commonly used on social media for a specific discussion and are represented as a list of unigram and bigram terms from natural language processing solutions: e.g., culture or ancient site are keywords for cultural heritage discussion, while vandal or property damage are keywords for vandalism discussion. Furthermore, the defined lexicon is not only representative of the domain but also accurately reflects the specific vocabulary commonly utilized within social media platforms, such as Twitter.
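        A minimal sketch of how such a unigram/bigram lexicon might be used to filter tweets (the keywords below are illustrative placeholders, not the project's actual lexicon):

```python
# Hypothetical topic lexicon: each topic maps to unigram and bigram
# keywords, following the examples given in the abstract.
LEXICON = {
    "cultural_heritage": {"culture", "ancient site", "monument"},
    "vandalism": {"vandal", "property damage"},
}

def match_topics(text):
    """Return the lexicon topics whose unigram or bigram keywords occur
    in the lower-cased text."""
    words = text.lower().split()
    # All unigrams plus all adjacent-word bigrams of the text.
    grams = set(words) | {" ".join(p) for p in zip(words, words[1:])}
    return {topic for topic, keys in LEXICON.items() if keys & grams}

topics = match_topics("Vandal attack reported near the ancient site")
```

        A real pipeline would also normalise punctuation and hashtags before matching; here the matched topics can then seed the manual labeling and semi-supervised classification described below.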

        Developing a representative lexicon is an essential preliminary step in this study because we have to devise a method for identifying Twitter messages that are related to the field of cultural heritage management in crises. The raw datasets have been collected from January 1 to April 27, 2023, with the Twitter API, in the context of the 4CH project (European Competence Centre for the Conservation of Cultural Heritage) that aims at setting up the methodological, procedural, and organizational framework of a Competence Centre able to seamlessly work with a network of national, regional, and local Cultural Institutions. The collected data, despite being downloaded based on keywords, contain numerous irrelevant tweets and are not suitable for investigation within the context of cultural heritage management in crises. Additionally, the lexicon can enhance the utility of machine learning classification algorithms by serving as a reference point for manual labeling and semi-supervised classification techniques. Consequently, they can be applied to other similar datasets of tweets.

        Our dataset is extensive and originates from diverse periods, events, and geographical locations. These distinct locations encompass various nations and institutions, each with its own interpretations and definitions of culture and its elements. Questions regarding the nature of culture and what constitutes heritage lack clear general answers on an international scope. In addition, we take into account that the texts collected are in English. This implies that users either come from English-speaking countries or, if they come from other regions, communicate in English due to their connection with an international community or a desire to address global issues using an international language. Given this complexity, we have chosen to create a lexicon that provides as general a framework as possible, relying on the documents of the United Nations Educational, Scientific and Cultural Organization (UNESCO), whose vocabulary is assumed to be close to the one we intend to create for cultural heritage.

        Speaker: Elisabetta Ronchieri (INFN CNAF)
      • 83
        Speculative Design for Sustainable Urban Mobility: E-Bike Futures and Data-Driven Innovation

        This study aims to investigate the potential of future public bicycle services in Taipei to achieve net-zero emission strategies and the Sustainable Development Goals (SDGs) using speculative design. It proposes a sustainable energy model for a future public bicycle service. Given the challenges of global environmental change, issues such as sustainable development and net-zero emission strategies have attracted much attention in recent years. More than 130 countries worldwide have put forward "2050 net-zero emission" declarations and actions. Taiwan is no exception, and its target strategies mention the electrification of vehicles and power storage, emphasizing the importance of sustainable transportation. The study aims to conceptualize the future of e-bikes as a pivotal element in achieving sustainable development in urban settings.

        This study uses the User Experience Questionnaire (UEQ) and the System Usability Scale (SUS) to explore the public's perceptions of the YouBike 2.0 service and its apps in Taipei City. Users perceived YouBike to be supportive and environmentally friendly; overall acceptance was positive, although users prioritized fun over utility and rated the current app poorly. These results confirm the critical role of YouBike in encouraging people to gradually switch from cars to bicycles and thereby reduce carbon emissions, while also revealing the potential for promoting more environmentally friendly e-bikes in the future and highlighting the importance of optimizing the app to achieve a better user experience.

        This study then implemented a pilot survey on bicycle pedal power through data-integrated micro-electricity calculations, applying micro-electricity generation technology to bicycle charging and to urban micro-electricity installations such as traffic signals and streetlights. In addition, advanced technologies such as artificial intelligence, geo-location, and real-time data are integrated into applications to promote sustainable urban mobility, with particular emphasis on reconciling thick data and big data to provide a more comprehensive and in-depth understanding of urban mobility needs, thereby optimizing service strategies and enhancing the user experience.

        Finally, this study ventures into speculative design to address the pressing need for sustainable urban mobility solutions, focusing on integrating e-bikes within the framework of net-zero emission goals, public bike systems, and smart city transportation. At its heart is the exploration of speculative design scenarios that incorporate future e-bike design and data-driven analysis into a comprehensive public bike system aligned with net-zero emission targets. This involves reimagining a future in which e-bikes are seamlessly integrated into the fabric of smart city transportation networks, offering an eco-friendly, efficient, and accessible mode of transport. By projecting into speculative future scenarios, the study provides valuable insights for policymakers, urban planners, and technologists, guiding them toward strategic decisions that balance technological innovation with the broader objectives of sustainable urban development and environmental stewardship.

        This study contributes a visionary perspective on the role of e-bikes in transforming urban mobility, presenting a roadmap for cities to navigate towards a sustainable, efficient, and net-zero emission future through speculative design and data-driven innovation.

        Speakers: Ms Yi-Ci LIAO (Department of Interaction Design, National Taipei University of Technology) , Ms Tzu-Hsuan FENG (Department of Interaction Design, National Taipei University of Technology) , Ms Nawarut Srisung (Department of Interaction Design, National Taipei University of Technology) , Mr Yan-Lun DAI (Department of Interaction Design, National Taipei University of Technology) , Ms Chih-Yu CHANG (Department of Interaction Design, National Taipei University of Technology)
    • Joint DMCC, UMD & Environmental Computing Workshop Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: Stephan Hachinger (Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities)
      • 84
        Tools and Frameworks in Environmental Computing
      • 85
        Funding opportunities Asia - Europe
    • 18:30
      PC Dinner
    • CryoEM Workshop (in Chinese) Conf. Room 1

      Conf. Room 1

      BHSS, Academia Sinica

    • Keynote Speech Auditorium

      Auditorium

      BHSS, Academia Sinica

      Convener: Junichi Tanaka (University of Tokyo)
      • 86
        Exploring Computational Science Towards the Era of Quantum Computing
        Speaker: Koji TERASHI (University of Tokyo)
      • 87
        Development of HPC in Chinese Academy of Sciences (Remote Presentation)
        Speaker: Xuebin Chi (Computer Network Information Center, Chinese Academy of Sciences)
    • 10:30
      Coffee Break (CryoEM Workshop)
    • 10:30
      Coffee Break (ISGC 2024)
    • CryoEM Workshop (in Chinese) Conf. Room 1

      Conf. Room 1

      BHSS, Academia Sinica

    • AI workshop/tutorial Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 88
        Introduction to AI/ML/DL
        Speaker: Daniele Bonacorsi (University of Bologna)
      • 89
        Lab on Regression
        Speaker: Daniele Bonacorsi (University of Bologna)
    • Physics & Engineering Application Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: Junichi Tanaka (University of Tokyo)
      • 90
        Design and Implementation of an IO Framework for HEPS (Remote Presentation)

        HEPS (High Energy Photon Source) is expected to generate a massive and diverse amount of data, and the data IO bottleneck severely affects computational efficiency. To address these issues, we have designed and implemented an IO framework specifically for HEPS, serving as the data IO module of Daisy (Data Analysis Integrated Software System), a software framework developed for HEPS. Firstly, to handle the diversity of data formats, and based on an analysis of HEPS scientific tasks, we provide a unified IO interface to the computation layer, effectively shielding the underlying format differences. Secondly, to improve batch processing speed, we parallelize IO read and write operations according to the characteristics of each data format. We also design a prefetching strategy that asynchronously reads the data required for subsequent computations into memory, further reducing IO time in the computation process through pipelining. Lastly, we introduce streaming data IO to avoid the bottleneck caused by writing data to disk and then reading it back. Moreover, we design an online data repository based on distributed memory, providing two forms of support for real-time online processing: data-triggered computation, where data is processed as it arrives using processing methods compatible with those registered in Daisy; and remote reading of streaming data for computation, retrieving it from the online data repository.
        Overall, our proposed IO framework addresses the challenges posed by the massive and diverse data generated by HEPS, significantly improving the computational efficiency and providing support for real-time online processing.
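        The prefetch-and-pipeline idea described above can be sketched in a few lines of Python. The `load` and `process` callables below are placeholders for the framework's actual format readers and computations; the names are illustrative and not taken from Daisy:

```python
import threading
import queue

def run_pipeline(frames, load, process, depth=4):
    """Overlap IO and computation: a reader thread prefetches frames
    into a bounded queue while the main thread processes them."""
    buf = queue.Queue(maxsize=depth)  # bounded queue limits memory use
    SENTINEL = object()

    def reader():
        for f in frames:
            buf.put(load(f))          # blocking IO happens here
        buf.put(SENTINEL)             # signal end of stream

    threading.Thread(target=reader, daemon=True).start()
    results = []
    while (item := buf.get()) is not SENTINEL:
        results.append(process(item))  # computation overlaps the next read
    return results
```

The bounded queue depth plays the role of the prefetch window: while one frame is being processed, up to `depth` further frames can already be in flight.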

        Speaker: Shiyuan Fu (IHEP)
      • 91
        Exploring Database-aided Workflows on Cloud and High-Performance Computing for Physics Simulation Data

        While database management systems (DBMS) are one of the most important IT concepts, scientific supercomputing makes little use of them. Reasons for this situation range from a preference towards direct file I/O without overheads to feasibility problems in reaching DBMS from High-Performance Computing (HPC) cluster nodes. However, trends such as the increasing re-usage of scientific output data or the collaboration of researchers from science, SMEs and industry give a strong motivation to revisit the topic. The Horizon Europe “Extreme Data” project EXA4MIND, for example, aims at bridging the ecosystems of DBMS, supercomputing, and European Data Spaces.

        In the context of this project, the work presented here explores different approaches and systems for managing data and thus optimising data-driven workflows across Cloud-Computing (IaaS) and HPC systems. We evaluate typical workflows for physics simulations on supercomputing systems at LRZ (Garching b.M./DE) and IT4Innovations (Ostrava/CZ). The use cases we focus upon in this contribution are simulated many-body systems of molecules or elementary particles on different energy scales, either using molecular dynamics (MD, low energy) or plasma physics (high energy). As often in computational science, much work goes into postprocessing, visualising and discussing the simulated data, often several times in an iterative process. Our test datasets are, on the one hand, produced by MD simulations (Modelling for Nanotechnologies Lab, IT4Innovations, Ostrava/CZ), where the interaction of molecules is calculated via empirical force field models. On the other hand, we have outputs from the Plasma Simulation Code (Ruhl et al., Ludwig Maximilian University of Munich/DE), simulating the fields and trajectories of charged particles in a plasma. These simulations follow up to billions of particles, writing out their properties and location trajectories in each time step.

        The production of such simulation outputs can take up to hundreds of thousands of CPU hours, and the particle and field data can occupy gigabytes or terabytes. These data then have to be postprocessed (e.g. aggregation of domain patches, extraction of statistical information) and evaluated on various levels, from ensembles of simulations down to single particle trajectories or timesteps. Even within one study, the datasets are typically revisited several times, for example for weight recalculation, visualization, and evaluation of the validity of force-field assumptions. These workflows typically involve a lot of manual labour and attention from the researcher. We benchmark suitable data backends (including DBMS) and use cross-system orchestration tools, in particular the LEXIS platform (lexis-project.eu), to make this more efficient.

        Our focus includes testing the performance of typical data queries and iterative postprocessing steps with different execution methods. We strive to facilitate faster and more flexible access to the raw data by exploring the properties of different storage and database systems. These range from data access schemes provided by common Python environments, through row-based DBMS such as PostgreSQL, to column-store DBMS like MonetDB, where techniques like SciQL can run live queries on large array-based datasets in memory and functionalities like Data Vaults can provide access to external repositories. We conclude that modern data storage concepts involving DBMS are also an excellent basis for data sharing and systematic metadata management. Thus, we aim to facilitate research data management according to the FAIR (findable, accessible, interoperable, reusable) principles from the start.
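        A minimal sketch of this kind of query benchmarking, using Python's built-in sqlite3 purely as a stand-in for PostgreSQL or MonetDB (the table name and schema are invented for illustration, not taken from the project):

```python
import sqlite3
import time

def time_query(conn, sql, repeats=3):
    """Return the best-of-N wall-clock time for a query — a simple
    proxy for comparing storage backends on the same workload."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        conn.execute(sql).fetchall()   # materialize the full result
        best = min(best, time.perf_counter() - t0)
    return best

# Toy particle-trajectory table: 100 particles x 100 timesteps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traj (pid INTEGER, step INTEGER, x REAL)")
conn.executemany("INSERT INTO traj VALUES (?, ?, ?)",
                 [(p, s, p + 0.1 * s) for p in range(100) for s in range(100)])

# A typical aggregation-style postprocessing query.
t = time_query(conn, "SELECT pid, AVG(x) FROM traj GROUP BY pid")
```

In a real comparison the same query would be run against each backend (file-based access, PostgreSQL, MonetDB) on identical data, and the timings contrasted.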

        This research received the support of the EXA4MIND project, funded by the European Union's Horizon Europe Research and Innovation Programme under Grant Agreement N° 101092944. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them.

        Speaker: Viktoria Pauw (Leibniz Rechenzentrum)
      • 92
        Sustainable Computing in High-Energy Physics: HEPScore and the HEP Benchmark Suite

        In the contemporary era, where scientific progress meets the imperative of responsible resource utilization, the need for innovative tools is paramount. This paper explores the pivotal role of a new benchmark for High-Energy Physics (HEP), HEPScore, and the HEP Benchmark Suite in steering the HEP community toward sustainable computing practices. As exemplified by projects like the Large Hadron Collider (LHC), HEP demands massive computational resources, making it imperative to strike a balance between scientific excellence and environmental responsibility.

        The HEP Benchmark Suite, a toolkit for benchmark orchestration, shares the center stage in this paradigm shift alongside HEPscore. Developed to manage benchmarks from a single application, the Suite characterizes the performance of individual and clustered heterogeneous hardware. Its modular design enables users to configure and run a diverse range of benchmarks, including HEP SPEC06, SPEC 2017, HEPscore23 (HS23), and DB12. Distributed under the GNU General Public License v3, the Suite is open-source, inviting the community to scrutinize and enhance its code.

        Modularity extends to the Suite's add-ons, enhancing configurability. Existing plugins can retrieve hardware and software metadata, offering deeper insight into the conditions under which benchmarks run. A prime example is the actively developed Energy Plugin, which reports the energy consumed during a benchmark run. This information is critical for hardware procurement, as it makes it possible to select hardware that delivers maximal performance at the lowest energy consumption, reducing energy costs at the same time.

        While plugins may be configured with any compatible benchmark, in the field of High-Energy Physics, HEPscore 23 appears to provide an accurate performance score, taking into account the latest technological advances. The benchmark is composed of seven underlying workloads, containerized applications based on the current applications being used at CERN and the World-Wide LHC Computing Grid. A thorough study led to this final configuration as the best compromise between runtime and accuracy when representing the present applications of the HEP community.
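        As a rough illustration of how per-workload results might be combined into a single figure and related to power draw, the sketch below assumes an unweighted geometric mean; the actual HEPScore23 aggregation and the Energy Plugin's reporting format are defined by the HEP Benchmark Suite itself:

```python
import math

def aggregate_score(workload_scores):
    """Combine per-workload scores into one benchmark figure.
    An unweighted geometric mean is assumed here for illustration."""
    logs = [math.log(s) for s in workload_scores]
    return math.exp(sum(logs) / len(logs))

def score_per_watt(score, avg_power_watts):
    """The kind of metric the Energy Plugin enables: performance
    per unit of power, useful in hardware procurement."""
    return score / avg_power_watts
```

A geometric mean keeps any single workload from dominating the aggregate, which is why benchmark suites commonly prefer it over an arithmetic mean.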

        A central database, following the containerized nature of benchmarks, further augments the Suite's capabilities. Reports generated by the Suite can be published to a message broker, facilitating storage in CERN's central benchmarking OpenSearch database. With over 90,000 entries for reference, users can easily query and compare results. The graphical user interface atop the database enables users to create visualizations and dashboards as well.

        This paper delves into the architectural design and capabilities of HEPScore and the HEP Benchmark Suite, showcasing their potential to revolutionize benchmarking practices within the HEP community, in particular in terms of energy consumption. Drawing on experiences from worldwide deployments across the Worldwide LHC Computing Grid, we present evidence of the tools' effectiveness in gathering and analyzing measurements globally.

        As the HEP community embraces these tools, we anticipate a transformative shift towards a future where scientific progress harmonizes with environmental responsibility, setting a precedent for sustainable computing in the domain of big science.

        Speaker: Gonzalo Menéndez Borge (CERN)
      • 93
        JUNO Distributed Computing Infrastructure status

        The Jiangmen Underground Neutrino Observatory (JUNO) is an underground 20 kton liquid scintillator detector being built in the south of China and expected to start data taking in late 2024. The JUNO physics program focuses on exploring neutrino properties by means of electron anti-neutrinos emitted from two nuclear power complexes at a baseline of about 53 km. Targeting an unprecedented relative energy resolution of 3% at 1 MeV, JUNO will be able to study neutrino oscillation phenomena and determine the neutrino mass ordering with a statistical significance of about 3 sigma within six years.

        These physics challenges are addressed by a large Collaboration spread across three continents. In this context, key to the success of JUNO will be the realization of a distributed computing infrastructure (DCI) that satisfies its predicted computing needs.

        The development of the computing infrastructure is performed jointly by the Institute of High Energy Physics (IHEP), and a number of Italian, French and Russian data centers, already part of WLCG.

        Upon its start, JUNO is expected to deliver 2 PB of data per year, to be stored in the above-mentioned data centers in China and Europe. Data analysis activities will also be carried out cooperatively, as a coordinated joint effort.

        This contribution reports on the design and deployment of the JUNO DCI. It will describe its main characteristics and requirements.

        Speaker: Giuseppe Andronico (INFN Sez. CT)
    • 12:00
      Lunch (CryoEm Workshop)
    • 12:30
      Lunch (ISGC 2024) 4F Recreation Hall

      4F Recreation Hall

      BHSS, Academia Sinica

    • 12:30
      PC Face-to-face Meeting
    • CryoEM Workshop (in Chinese) Academia Sinica CryoEM Facility, B2, Interdisciplinary Sci. & Tech. Building

      Academia Sinica CryoEM Facility, B2, Interdisciplinary Sci. & Tech. Building

    • AI workshop/tutorial Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 94
        Lab on Classification
        Speaker: Daniele Bonacorsi (University of Bologna)
      • 95
        Introduction to Neural Networks
        Speaker: Daniele Bonacorsi (University of Bologna)
      • 96
        Lab on Convolutional NNs
        Speaker: Daniele Bonacorsi (University of Bologna)
    • Data Management & Big Data Media Conf. Room

      Media Conf. Room

      BHSS, Academia Sinica

      Convener: Patrick Fuhrmann (DESY/dCache.org)
      • 97
        High Energy Physics Scientific Data Transfer System (Remote Presentation)

        The Institute of High Energy Physics has constructed multiple large-scale scientific facilities, including BSRF, HEPS, LHAASO, JUNO, and AliCPT, which generate large amounts of data requiring high-performance data transfer services. Under the Computing Center's "One Platform, Multiple Centers" strategic plan, data needs to be moved between multiple data centers, again requiring high-performance transfers. The National High Energy Physics Scientific Data Center, which receives data from various research projects for submission and long-term preservation, has the same requirement. To meet the transfer needs of these different experiments, a High Energy Physics Scientific Data Transfer System has been designed. The system adopts a cluster-based design and management model, with the transfer cluster consisting of a control master node and transfer sub-nodes. The control master node implements functions such as transfer task discovery, message queues, and web service support; the transfer sub-nodes provide the scientific data transfer services and metadata interactions. The system has been deployed in multiple experiments and has achieved stable operation and good performance. This report describes the functional modules of the transfer system in detail, as well as its deployment and application scenarios in different experiments.
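        The master/sub-node division of labour described above can be sketched with a shared task queue; this is a generic illustration of the pattern, not the system's actual implementation, and `transfer` stands in for the real data-movement call:

```python
import queue
import threading

def dispatch_transfers(tasks, transfer, workers=4):
    """Master/sub-node pattern sketch: the control node feeds transfer
    tasks into a shared queue; worker 'sub-nodes' pull and execute them."""
    q = queue.Queue()
    done, lock = [], threading.Lock()

    def sub_node():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return                      # queue drained: worker exits
            result = transfer(task)         # actual data movement goes here
            with lock:
                done.append(result)

    for t in tasks:                         # master discovers and enqueues tasks
        q.put(t)
    threads = [threading.Thread(target=sub_node) for _ in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return done
```

In the real system the queue would be a message broker and each sub-node a separate host, but the control/worker split is the same.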

        Speaker: Zhuang Bo (IHEP)
      • 98
        Blockchain-Enabled Secure Management of Scientific Data in Permissioned Networks: a Metadata-Centric Approach (Remote Presentation)

        In recent years, blockchain has emerged as a promising new technology to manage trusted information, making it easier for companies to access and use critical data while maintaining the security of this information.
        Permissioned blockchains, unlike permissionless ones, restrict access to a select group of certified entities. They ensure a controlled and secure environment where only authorized participants can join the network and perform operations, a peculiar aspect in sectors where data sensitivity, confidentiality, and limited access are crucial.
        In this work, we present the implementation of a permissioned blockchain system aimed at ensuring data immutability, operations traceability, and the ability to reproduce workflows.
        Tracking the operations performed on data and guaranteeing the reproducibility of research through workflow reconstruction are very important in sectors ranging from scientific communities to private companies and healthcare.
        In such regards, we work with Hyperledger Fabric, an enterprise-grade permissioned distributed ledger platform that offers modularity and versatility for a broad set of industry use cases.
        Among its features, Fabric supports the generation of digital certificates and the import of externally provided X.509 certificates to ensure participant identity; it allows smart contract development in different programming languages (such as Go, JavaScript, or Java); it provides a REST API for communication with off-chain services; and its modular architecture allows components and features to be replaced or extended based on the specific needs of a blockchain network.
        In our implementation of the private and permissioned blockchain network, we decided to operate on the hash of the data and the related metadata rather than on the whole data. In this way, storage space usage can be optimised and data deletion remains possible: data on a blockchain cannot be removed. By storing only the metadata, we keep the essential information on the blockchain while retaining the ability to manage and delete specific data entries stored off-chain.
        Moreover, using a data lake to store the data makes it possible to trigger events when the data is modified.
        Furthermore, we also consider aspects such as data tampering and how blockchain can help capture and record modifications, revealing the owner and timestamp of each operation. As already mentioned, including hashes of the data allows a faithful reconstruction and reproduction of the entire workflow, making data processing management trustworthy.
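        The hash-plus-metadata approach can be sketched as follows; the field names are illustrative and not the schema of the actual Fabric chaincode:

```python
import hashlib
import time

def make_ledger_entry(data_bytes, metadata, owner):
    """Sketch: only the hash and metadata go on-chain; the data itself
    stays off-chain and can later be deleted without touching the ledger."""
    return {
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "metadata": metadata,          # e.g. provenance, workflow step
        "owner": owner,                # identity from the X.509 certificate
        "timestamp": time.time(),
    }

def verify(data_bytes, entry):
    """Tamper check: recompute the hash of the off-chain data and
    compare it with the immutable on-chain record."""
    return hashlib.sha256(data_bytes).hexdigest() == entry["sha256"]
```

Because the ledger entry is immutable, any later modification of the off-chain data is revealed by a failed `verify`, together with the recorded owner and timestamp of the original operation.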
        In the present work, the blockchain implementation is explained, with some examples, and possible improvements are presented, including the design of a cloud-enabled blockchain as a service on the INFN Cloud infrastructure.

        Speaker: Domingo Ranieri
      • 99
        Enhanced StoRM WebDAV data transfer performance with a new deployment architecture behind NGINX reverse proxy

        StoRM WebDAV is a component of StoRM (Storage Resource Manager) which is designed to provide a scalable and efficient solution for managing data storage and access in Grid computing environments. StoRM WebDAV specifically focuses on enabling access to stored data through the WebDAV (Web Distributed Authoring and Versioning) protocol. WebDAV is an extension of the HTTP protocol that allows users to create, change and move resources on a web server.

        StoRM WebDAV is designed to follow the requirements set forth by the WLCG (Worldwide LHC Computing Grid) community; in particular it supports Third Party Copy (TPC), authorization based on JWT tokens or X.509 certificates (often VOMS proxies), and fine-grained access policies. Third-party copy has been one of the main GridFTP features used by LHC experiment data management frameworks to implement scalable data transfer and management. In 2017 the Globus Alliance announced that the open-source Globus Toolkit would no longer be supported. This seriously impacted the WLCG community because of the central role of the Globus Security Infrastructure and GridFTP in data transfer frameworks. As a natural consequence, WLCG is moving towards HTTP-based data transfers. In this context, the WLCG community has defined an extension of the WebDAV COPY verb that supports bulk transfer requests between two remote storage endpoints.
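        A WLCG-style third-party-copy "pull" request can be sketched as below: the COPY verb is sent to the destination endpoint, which then pulls the file from the remote source. The header names follow the publicly documented WLCG TPC convention, but the paths and token here are invented for illustration:

```python
def build_tpc_pull_request(dest_path, source_url, token):
    """Construct the request line and headers of a TPC 'pull' COPY.
    In pull mode the destination server performs the actual transfer."""
    request_line = f"COPY {dest_path} HTTP/1.1"
    headers = {
        # Remote endpoint the destination will fetch the data from.
        "Source": source_url,
        # Credential forwarded for the secondary (pull) connection.
        "TransferHeaderAuthorization": f"Bearer {token}",
    }
    return request_line, headers
```

The symmetric "push" mode sends COPY to the source with a `Destination` header instead; in either mode only control traffic flows through the client, which is what makes TPC scale.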

        This contribution highlights how data transfer (TPC or direct upload/download) performance can be enhanced by delegating them to NGINX. This strategic decision is driven by the proven reliability, scalability, and performance capabilities of NGINX in handling such critical functions.

        Speakers: Francesco Giacomini (INFN) , Enrico Vianello (INFN)
      • 100
        Archive storage in SDCC of BNL

        The Scientific Data and Computing Center (SDCC) at Brookhaven National Laboratory provides data storage, transfer, and computing resources to scientific experiments and communities at Brookhaven National Lab and worldwide.
        A significant amount of scientific data is stored and retrieved daily from multiple tiered storage systems. This presentation covers the infrastructure behind the data archiving operations on disk and tape.

        Speaker: Tim Chou (Brookhaven National Laboratory)
    • Network, Security, Infrastructure & Operations Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: David Kelsey (STFC-RAL)
      • 101
        LHCOPN and LHCONE prepare for HL-LHC

        The LHCOPN network, which links CERN to all the WLCG Tier 1s, and the LHCONE network, which connects WLCG Tier 1s and Tier 2s, have successfully provided the bandwidth necessary to distribute the data generated by the LHC experiments during the first two runs of the LHC accelerator. This talk gives an overview of the most remarkable achievements and the current state of the two networks. It also explains how the two networks are evolving to support Run 3 and how they are preparing to meet the high demands foreseen for Run 4, notably by adopting new transmission technologies to increase the available bandwidth, introducing new software tools to improve the efficient utilization of all the links, and adding new monitoring capabilities to better understand the network traffic.

        Speaker: Edoardo Martelli (CERN)
      • 102
        NOTED: a congestion driven network controller

        NOTED is an intelligent network controller that aims to improve the throughput of large data transfers in FTS (File Transfer Service), the service used to exchange data between WLCG sites, so as to better exploit the available network resources. For a defined set of source and destination endpoints, NOTED retrieves data from FTS to monitor on-going traffic and uses the CRIC (Computing Resource Information Catalog) database to build a comprehensive understanding of the network topology. This approach showed successful results during the SC22 and SC23 conferences, where NOTED executed actions when it detected congestion on a given link, dynamically reconfiguring the network topology using an SDN (Software-Defined Networking) service. Recently, NOTED has been integrated with the CERN NMS (Network Monitoring System) to further increase its capabilities and be driven by congestion. In this way, NOTED can identify which WLCG sites are congesting the network, in both the LHCOPN (Tier 0 to Tier 1 links) and LHCONE (Tier 1 to Tier 2 links) networks, and execute an action to reconfigure the network by adding capacity. This new version of NOTED has been tested during SC23 and will be used at scale during DC24 (WLCG Data Challenge), in which the NRENs and WLCG sites perform testing at 25% of the rate that will be used by the HL-LHC (High Luminosity LHC), to meet the requirements by 2029.

        Speaker: Ms Carmen Misa Moreira (CERN)
      • 103
        Networking at the WLCG: R&D activities and DC24 testing

        In recent years, several R&D activities have been developed at CERN within the WLCG (Worldwide LHC Computing Grid) to better exploit the network and provide new capabilities for future applications. One example is NOTED (Network Optimised Transfer of Experimental Data), which dynamically reconfigures network links to increase the effective bandwidth available for FTS-driven transfers, using dynamic circuit provisioning systems such as SENSE and AutoGOLE. Another is the packet marking initiative of the RNTWG (Research Network Technical Working Group), which increases network visibility by distinguishing network flows by LHC experiment and application through the flow label field of IPv6 packets. Additionally, the RNTWG is working on packet pacing with BBR (Bottleneck Bandwidth and Round-trip propagation time) and on jumbo frames. During DC24 these R&D activities have been tested at scale, and their feasibility for future applications will be analysed based on the results.
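        Packing identifiers into the 20-bit IPv6 flow label might look like the sketch below. The bit layout chosen here (6 experiment bits, 6 activity bits, 8 entropy bits) is hypothetical and purely for illustration; the actual format is specified by the RNTWG's packet-marking (scitags) work:

```python
def mark_flow_label(experiment_id, activity_id, entropy):
    """Pack experiment/activity identifiers plus per-flow entropy into
    a 20-bit value for the IPv6 flow label field (hypothetical layout)."""
    assert experiment_id < 64 and activity_id < 64 and entropy < 256
    return (experiment_id << 14) | (activity_id << 8) | entropy

def unmark_flow_label(label):
    """Recover (experiment_id, activity_id, entropy) from a marked label,
    as a network monitor classifying flows would."""
    return (label >> 14) & 0x3F, (label >> 8) & 0x3F, label & 0xFF
```

Because the flow label travels with every packet, routers and monitoring taps can attribute traffic to an experiment and activity without deep packet inspection.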

        Speaker: Ms Carmen Misa Moreira (CERN)
    • 15:30
      Coffee Break (ISGC 2024)
    • AI workshop/tutorial Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 104
        Lab on AutoEncoders
        Speaker: Daniele Bonacorsi (University of Bologna)
      • 105
        Lab on Generative Adversarial Networks
        Speaker: Daniele Bonacorsi (University of Bologna)
      • 106
        AI challenges and ethical implications
        Speaker: Daniele Bonacorsi (University of Bologna)
    • Data Management & Big Data Media Conf. Room

      Media Conf. Room

      BHSS, Academia Sinica

      Convener: Patrick Fuhrmann (DESY/dCache.org)
      • 107
        Open Data and Open Science as Effective Ways to Revive Hard-to-reach Population: A Decade Research Applies to Taiwan Indigenous Peoples

        There is a rich body of textual and numerical archives about Taiwan indigenous peoples (TIPs) before 1940. Because detailed statistical archives and numerical data on TIPs were not available between 1940 and 2010, we have very limited knowledge about the developmental trajectory of TIPs. This situation has gradually made TIPs a so-called hard-to-reach population (HRP), invisible in the real world. To overcome this issue, it has become urgent to collect and build contemporary TIPs data. Supported by a decade-long research program from the Council of Indigenous Peoples (https://www.cip.gov.tw/) since 2013, the author has been fully devoted to building a number of big open datasets on TIPs using scientific computing methods and techniques, based on the principles of open science and data science. The open datasets are built by integrating hacking skills, advanced mathematical/statistical methods, and domain knowledge from various disciplines. Their repositories are hosted on OSF (Open Science Framework, https://osf.io/) and are termed TIPD (Taiwan Indigenous Peoples Open Research Data; for details, see https://osf.io/e4rvz/). TIPD complies with the FAIR data principles. It consists of the following categories of open data from 2013 to 2022: (1) categorical data, (2) multi-dimensional data, (3) population dynamics, (4) temporal geocoding data, (5) household structure data, (6) traditional TIPs community data (TICD), (7) a generalized TICD query system, and (8) genealogical data (not open to the public).
        The main contributions are as follows: (1) to enable TIPs, who have been "invisible" to the world for seven decades, to become "close" to and "open" to the real world; (2) to empower TIPs research from being "elite" to "ordinary" by using open data to reduce technical barriers for researchers; (3) to extend TIPs studies from the "local" to the "global" arena by building a bilingual open data repository; (4) to shift the methods and techniques of TIPs research from the "macro" to the "individual" level, enabling TIPs to revive from a hard-to-reach to an easy-to-reach population.
        Keywords: Big Data, Data Science, FAIR, Open Data, Open Science, TIPD
        Reference
        1. Utsurikawa, N., Miyamoto, N., Mabuchi, T. 1935. The Formosan Native Tribes: A Genealogical and Classificatory System. Institute of Ethnology, Taipei Imperial University (移川子之蔵、馬淵東一、宮本延人. 1935.《臺灣高砂族系統所屬の硏究》. 臺北帝國大學).
        2. Lin, Ji-Ping. 2017a. “Data Science as a Foundation towards Open Data and Open Science: The Case of Taiwan Indigenous Peoples Open Research Data (TIPD),” in Proceedings of 2017 International Symposium on Grids & Clouds, PoS (Proceedings of Science).
        3. Lin, Ji-Ping, 2017b, “An Infrastructure and Application of Computational Archival Science to Enrich and Integrate Big Digital Archival Data: Using Taiwan Indigenous Peoples Open Research Data (TIPD) as Example,” in Proceedings of 2017 IEEE Big Data Conference, the IEEE Computer Society Press.
        4. Lin, Ji-Ping. 2018. “Human Relationship and Kinship Analytics from Big Data Based on Data Science: A Research on Ethnic Marriage and Identity Using Taiwan Indigenous Peoples as Example,” pp.268-302, in Stuetzer et al. (ed) Computational Social Science in the Age of Big Data. Concepts, Methodologies, Tools, and Applications. Herbert von Halem Verlag (Germany), Neue Schriften zur Online-Forschung of the German Society for Online Research.
        5. Lin, Ji-Ping. 2021. “Computational Archives of Population Dynamics and Migration Networks as a Gateway to Get Deep Insights into Hard-to-Reach Populations: Research on Taiwan Indigenous Peoples,” Proceedings of 2021 IEEE International Conference on Big Data, IEEE Computer Society Press.

        References of open data repositories:
        6. TIPD: https://www.rchss.sinica.edu.tw/capas/posts/11206
        7. TPDD: https://www.rchss.sinica.edu.tw/capas/posts/11621
        8. TICD: https://www.rchss.sinica.edu.tw/capas/posts/11205
        9. Integrated Query System of TICD: https://TICDonGoogle.RCHSS.sinica.edu.tw
        10. TIPs Migration Dynamics: https://www.rchss.sinica.edu.tw/capas/posts/11329
        11. High-resolution visualizations of population distribution, migration dynamics, traditional communities: https://www.rchss.sinica.edu.tw/capas/posts/11393
        12. Interactive migration visualizations: https://www1.rchss.sinica.edu.tw/jplin/TIPD_Migration/

        Speaker: Ji-Ping Lin (Academia Sinica)
      • 108
        Facilitating the distribution of software by using CernVM File System and S3 bucket (Remote Presentation)

        The adoption of user-friendly solutions for sharing data, software and related configuration files among heterogeneous and distributed resources has become a necessity for the scientific community. By adopting software products dedicated to this purpose, it is possible to facilitate the distribution of software, configurations and files. To this end, the CernVM File System has been adopted and integrated with other technologies such as S3 object storage, the Vault identity-based secrets and encryption management system, and the RabbitMQ open-source message broker.

        The CernVM File System (CVMFS) provides a scalable, reliable and low-maintenance software distribution service. It was developed to assist High Energy Physics (HEP) collaborations in deploying software on the worldwide-distributed computing infrastructure used for running data processing applications. It is a network file system implemented as a POSIX read-only file system in user space (a FUSE module) and it uses standard HTTP transport, thereby avoiding most firewall issues. Files and directories available via CernVM-FS are hosted on standard web servers and are always mounted in the universal namespace /cvmfs.

        The integration with Vault provides encryption services that are gated by authentication and authorization methods to ensure secure, auditable and restricted access to secrets and to store the CVMFS keys. RabbitMQ, on the other hand, collects the events used to process the creation and/or update of repositories.

        The objective of the present work is to design and develop a cloud-oriented service enabling the final user to request a personal or group CVMFS repository. The user can then upload data to the dedicated object storage space and access it transparently in their personal CVMFS repository. The whole system will provide an abstraction layer enabling the final user to distribute data, software, libraries and related dependencies from an S3 object storage to different resources, having them installed under a proper path of the file system with POSIX access. Hiding the complexity of the above-mentioned system will flatten the learning curve, improving the user experience in the adoption of the service itself.

        In the present work, the deployment of the different services on the INFN Cloud distributed infrastructure is presented, together with the integration process. Also, some practical examples are presented to demonstrate both the high level of reproducibility and the usability of the deployed solution suitable to be adopted by other communities.

        Speaker: Giada Malatesta (INFN-CNAF)
      • 109
        Remote S3 storage access using CEPH Rados Gateway (Remote Presentation)

        INFN-CNAF is one of the Worldwide LHC Computing Grid (WLCG) Tier-1 data centers, providing support in terms of computing, networking, storage resources and services also to a wide variety of scientific collaborations, ranging from physics to bioinformatics and industrial engineering.
        Recently, several collaborations working with our data center have developed computing and data management workflows that require access to object storage services and the integration with POSIX capabilities.
        To accomplish this requirement in distributed environments, where computing and storage resources are located at geographically distant physical sites, the possibility to locally mount a file system from a remote site to directly perform operations on files and directories becomes crucial.
        Nevertheless, accessing data must be regulated by standard, federated authentication and authorization mechanisms, such as OpenID Connect (OIDC), which is already adopted within WLCG and the European Open Science Cloud (EOSC).
        Starting from such principles, we have implemented a solution that provides fine-grained data access by integrating JSON Web Token (JWT) authentication provided by INDIGO-IAM as Identity Provider (IdP), Open Policy Agent (OPA), the CEPH Rados Gateway supporting the S3-compatible API and the Security Token Service (STS) for cross-account operations, and sts-wire, an Rclone wrapper to mount cloud storage as a disk.
        The CEPH RADOS Gateway allows access via an OIDC Identity Provider and supports cross-account operations by offering the Security Token Service (STS). In addition, the integration with OPA reinforces authorization policies: while the CEPH administrator creates a role to define both the location and type of storage resources available to the identity provider's users, the fine-grained policies are handled by OPA to manage buckets and objects and to perform S3 operations. This design has allowed us to decouple authorization enforcement from the storage service. Hence, a user's access can be modified independently, ensuring the availability of the service. Moreover, the policy engine offers scalability that can accommodate increasing resource demands on geographically distributed infrastructures.

        The S3 storage made available with the present solution is also accessible via a wide range of client applications, such as an SDK (Boto3), CLIs (s3cmd and sts-wire) and an in-house Web Application, developed with ReactJS and able to exploit the official AWS SDK for JavaScript. The WebApp allows easy access to RADOS Gateway resources, giving the user the ability to list, upload and download file objects, and provides authentication with both plain credentials and OpenID Connect. In this work, the design and integration process of the above-mentioned solution is presented, together with some examples and related advancements.
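        To make the client side concrete, the sketch below uses Boto3 (mentioned above) to exchange an OIDC JWT for temporary S3 credentials via STS AssumeRoleWithWebIdentity and then list buckets. The endpoint, role ARN and token come from environment variables and are illustrative assumptions, not the actual CNAF configuration; the network part only runs when an endpoint is configured.

```python
import os

def assume_role_request(role_arn: str, token: str,
                        session: str = "demo-session") -> dict:
    """Build the parameter dict for STS AssumeRoleWithWebIdentity."""
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session,
        "WebIdentityToken": token,   # the OIDC JWT issued by the IdP
        "DurationSeconds": 3600,     # temporary credentials, 1 hour
    }

# Only talk to a real gateway when one is configured (hypothetical env vars).
if os.environ.get("S3_ENDPOINT"):
    import boto3
    sts = boto3.client("sts", endpoint_url=os.environ["S3_ENDPOINT"])
    creds = sts.assume_role_with_web_identity(**assume_role_request(
        os.environ["ROLE_ARN"], os.environ["OIDC_TOKEN"]))["Credentials"]
    s3 = boto3.client("s3", endpoint_url=os.environ["S3_ENDPOINT"],
                      aws_access_key_id=creds["AccessKeyId"],
                      aws_secret_access_key=creds["SecretAccessKey"],
                      aws_session_token=creds["SessionToken"])
    print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```

The same temporary credentials can also be fed to s3cmd or sts-wire; only the token exchange step differs between clients.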

        Speakers: Ahmad Alkansa (INFN CNAF) , Alessandro Costantini (INFN-CNAF)
    • Network, Security, Infrastructure & Operations Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: David Groep (Nikhef and Maastricht University)
      • 110
        CERN Quantum Communications activities

        This talk will give an overview of the second phase of the CERN Quantum Technology Initiative (QTI2), focusing on the Quantum Communications work package.
        On Quantum Communications, CERN will focus on two main activities: 1) Quantum Key Distribution using White Rabbit for time synchronization and 2) very precise time and frequency distribution.

        Speaker: Edoardo Martelli (CERN)
      • 111
        Lockers: An Innovative and Secure Solution for Managing Secrets in the EGI Cloud Infrastructure

        Secret management stands as an important security service within the EGI Cloud federation. This service encompasses the management of various types of secrets, including tokens and certificates, and their secure delivery to the target cloud environment. Historically, accessing secrets from virtual machines (VMs) has relied on OIDC access tokens, a method that harbors potential security vulnerabilities. In the event of VM compromise, these access tokens can be pilfered, enabling attackers to gain access to all user secrets.

        The Locker mechanism introduces an innovative and robust approach to securely deliver secrets to VMs. Users can effortlessly create a locker, deposit their secrets within it, and then furnish the locker's token to their VMs. Key security attributes of the locker system include:

        • Temporary and Autoclean: Lockers have a limited lifespan and quantity. Upon expiration, lockers are automatically purged, along with all the secrets contained within them.
        • Isolation: Access to the secrets within a locker is exclusively through its associated token, which can solely be used for accessing the locker's secrets—nothing more. This isolation allows users to store tokens in Continuous Integration/Continuous Deployment (CI/CD) pipelines and similar tools, mitigating the risk of exposing personal secrets.
        • Malfeasance Detection: The locker mechanism possesses the capability to detect if a token has been compromised and is being misused.
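        The properties listed above can be modelled in a few lines. The class below is a toy in-memory sketch of the locker semantics (temporary, autoclean, token-isolated), written to illustrate the idea; it is not the EGI implementation or its API.

```python
import secrets
import time

class Locker:
    """Toy model of a locker: a short-lived container of secrets that is
    reachable only through its own dedicated token."""

    def __init__(self, ttl_seconds: float):
        self.token = secrets.token_urlsafe(32)       # grants access to this locker only
        self.expires = time.monotonic() + ttl_seconds
        self._secrets: dict[str, str] = {}

    def deposit(self, name: str, value: str) -> None:
        self._secrets[name] = value

    def fetch(self, token: str, name: str) -> str:
        if time.monotonic() >= self.expires:
            self._secrets.clear()                    # autoclean: purge on expiry
            raise PermissionError("locker expired and purged")
        if token != self.token:
            raise PermissionError("token grants no access to this locker")
        return self._secrets[name]

locker = Locker(ttl_seconds=60)
locker.deposit("db_password", "s3cr3t")
assert locker.fetch(locker.token, "db_password") == "s3cr3t"
```

Because the token opens nothing beyond its own locker, leaking it from a CI/CD pipeline exposes only the few secrets deliberately placed there, mirroring the isolation property described above.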

        By adopting the locker approach, users can securely deliver secrets to VMs within the EGI Cloud federation, all while safeguarding their personal credentials from exposure. This innovative solution enhances the overall security posture of the cloud infrastructure, providing a robust foundation for secret management.

        Speaker: Viet Tran (Institute of Informatics, Slovak Academy of Sciences)
      • 112
        Carbon life cycle modelling of scientific computing

        Summary: We propose a model to estimate and minimise full life cycle emissions of scientific computing centres based on server embodied carbon, PUE, projected next-generation performance-per-Watt improvements and actual/projected carbon intensity of the location.

        In this paper we present a model for the assessment of the replacement cycle of a compute cluster as a function of the carbon intensity of the region where it is deployed and the carbon embodied in the manufacturing of the server. The model takes into account facility PUE or cooling emissions, server load, and the energy efficiency gained by replacement with the latest models.

        The carbon embodied in the manufacturing of the server is estimated from public historical data reported by manufacturers for some models, together with average transport emissions. The carbon emissions from acquiring new hardware are weighed against the ongoing emissions of running older hardware, and we provide a model to optimize replacement cycles for a minimal carbon footprint, given that newer equipment will have greater energy efficiency for equivalent scientific compute work.
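        A minimal version of this trade-off can be written down directly. The sketch below amortises embodied carbon over the replacement cycle against operational emissions that shrink with each hardware generation; every numeric default is an illustrative assumption, not a figure from the paper.

```python
def total_emissions_kg(cycle_years: int, horizon_years: int = 40,
                       embodied_kg: float = 1500.0,
                       energy0_kwh_per_year: float = 4000.0,
                       pue: float = 1.3, grid_g_per_kwh: float = 200.0,
                       eff_gain: float = 0.10) -> float:
    """Total kg CO2e to deliver one server's worth of science for
    `horizon_years`, replacing hardware every `cycle_years` years.
    Each new generation needs (1 - eff_gain)**years_elapsed of the
    year-0 energy for the same work. All defaults are illustrative."""
    total, energy = 0.0, energy0_kwh_per_year
    for year in range(horizon_years):
        if year % cycle_years == 0:
            total += embodied_kg              # pay embodied carbon at replacement
            energy = energy0_kwh_per_year * (1 - eff_gain) ** year
        total += energy * pue * grid_g_per_kwh / 1000.0  # operations, kg CO2e
    return total

# Scan candidate cycles for the lowest-footprint replacement period.
best_cycle = min(range(1, 41), key=total_emissions_kg)
```

Where the optimum lands depends entirely on the parameters: a lower grid carbon intensity shrinks the operational term and pushes the optimal cycle longer, in line with the qualitative conclusion reported below.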

        We show the results of this model for several real-world sites providing equivalent scientific computing capacity, using the actual conditions of their power and cooling emissions, which yields a tailored per-site strategy for the hardware replacement cycle that minimizes carbon footprint.

        For instance, the model shows that with current estimates for embodied carbon, to minimise the full life cycle emissions, clusters in countries with moderate electricity carbon intensity (<200 gCO2e/kWh) should ideally be kept in operation for several decades.

        Since running computers for several decades presents practical and financial challenges, we also discuss limits to the applicability of the model outcomes to practical site operations and procurements. In addition, we offer some community and industry recommendations that our work indicates would help lower the total carbon emissions of scientific computing.

        Speaker: Mattias Wadenstein (NeIC)
    • 18:00
      Gala Dinner 2F, Fuji Grand Hotel

      2F, Fuji Grand Hotel

    • ASGC User Training Auditorium

      Auditorium

      BHSS, Academia Sinica

    • Data Management & Big Data Conf. Room 1

      Conf. Room 1

      BHSS, Academia Sinica

      Convener: Kento Aida (National Institute of Informatics)
      • 113
        Status of IHEP Grid Data Management System (Remote Presentation)

        With the increase in the number of large-scale international collaborative experiments supported by the Institute of High Energy Physics (IHEP) of the Chinese Academy of Sciences, IHEP and its collaborators face challenges in various application scenarios, such as different data volumes, multiple data management needs, and various data authentication requirements. At the same time, new technologies and applications for international grid computing are developing rapidly. Grid services such as Rucio and IAM are gradually gaining wide use due to their high flexibility and reliability, while distributed computing systems like DIRAC are expanding their support for new grid components. To meet the needs of experiments such as JUNO, HERD and CEPC, the IHEP Computing Center has established a grid data distribution and management system based on Rucio and DIRAC. This report will introduce recent research and progress made by IHEP in grid data management services, including the research on and application of new systems and components related to experiment software development, grid data management interfaces, and non-grid site data transmission. It will also present the planning and construction status of site-level operation statistics and monitoring systems at experiment centers.

        Speaker: Xuantong Zhang (IHEP, CAS)
      • 114
        The Data Management and Data Service for HEPS (Remote Presentation)

        The 14 beamlines of phase I of the High Energy Photon Source (HEPS) will produce more than 300 PB of raw data per year. Efficiently storing, analyzing, and sharing this huge amount of data presents a significant challenge for HEPS.

        The HEPS Computing and Communication System (HEPSCC), also called the HEPS Computing Center, is the work group responsible for IT R&D and services for the facility, including IT infrastructure, network, computing, analysis software, data preservation and management, public services, etc. To address the significant challenge of the large data volume, HEPSCC has designed and established a network and computing system, making great progress over the past two years.

        For the IT infrastructure, a dedicated, high-standard machine room with about 900 ㎡ of floor space for more than 120 high-density racks in total has been ready for production since August. The network design utilizes RoCE technology and a spine-leaf architecture. The data center network's bandwidth can support speeds of up to 100 Gb/s, fully meeting the demands of high-speed data exchange. To meet the requirements of HEPS data analysis scenarios, a computing architecture of three types has been designed and deployed: OpenStack, Kubernetes, and Slurm. OpenStack integrates a virtual cloud desktop protocol to provide users with remote desktop access, supporting browser access to Windows/Linux desktops running commercial visualization and data analysis software. Kubernetes manages container clusters and starts multiple methodological container images according to user analysis requirements. Slurm supports HPC computing services and meets users' offline data analysis needs.

        Additionally, HEPSCC has designed and developed two software systems for data management and analysis, DOMAS and Daisy. DOMAS (Data Organization, Management and Accessing Software stack), which aims at automating the organization, transfer, storage, distribution and sharing of the scientific data of HEPS experiments, provides a metadata catalogue, a metadata ingestor, data transfer, and a data web portal. Daisy (Data Analysis Integrated Software System) is a data analysis software framework with a highly modular C++/Python architecture. Several online data analysis algorithms developed by HEPS beamlines have been successfully integrated into Daisy, most of which were validated at the beamlines of BSRF (Beijing Synchrotron Radiation Facility) for real-time data processing. Other data analysis algorithms and software will be continuously integrated into the framework in the future.

        In 2021, a testbed was set up at beamline 3W1 of BSRF, a running synchrotron radiation facility that provides technology R&D and test platforms for HEPS. The 3W1 beamline, dedicated to testing high-throughput instruments for HEPS, is an ideal candidate for deploying the system and verifying the functions and the whole process of data acquisition, processing, transfer, storage and access.
        The integration and verification of the whole system at the 3W1 beamline were finished with great success, strongly proving the soundness of the design scheme and the feasibility of the technologies. After optimization and functional upgrades, in July 2022 all the sub-systems of HEPSCC were deployed at 4W1B, a running beamline at BSRF, where they provide full-process services for beamline users.

        Speaker: Hao Hu (Institute of High Energy Physics)
      • 115
        The interTwin project Digital Twin Engine

        The Horizon Europe interTwin project is developing a highly generic Digital Twin Engine (DTE) to support interdisciplinary Digital Twins (DTs). Comprising 31 high-profile scientific partner institutions, the project brings together infrastructure providers, technology providers and DT use cases from High Energy and Astroparticle Physics, Radio Astronomy, Climate Research and Environmental Monitoring. This group of experts enables the co-design of both the DTE Blueprint Architecture and the prototype platform, benefiting not only end users like scientists and policymakers but also DT developers. It achieves this by significantly simplifying the process of creating and managing complex Digital Twin workflows.
        In our presentation, we'll share the latest updates on our project, including a detailed look at the DTE Blueprint Architecture released in June 2023. We're also excited to discuss the second version, due for release in January 2024. The presentation will cover the diverse DT use cases we're supporting and give a preview of our first software release planned for February 2024. This talk is an excellent opportunity for anyone keen to learn about the latest developments in digital twin engines and their applications in science.

        Speaker: Patrick Fuhrmann (DESY/dCache.org)
    • Network, Security, Infrastructure & Operations Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: David Groep (Nikhef and Maastricht University)
      • 116
        Support to experiments in the transition from X.509 authN/Z to SciTokens

        X.509 certificates and VOMS proxies are widely used by the scientific community for authentication and authorization (authN/Z) in GRID Storage and Computing Elements. Although this has contributed to improving worldwide scientific collaboration, X.509 authN/Z comes with some downsides: mainly security issues and the extensive customization needed to integrate it with other services.

        The GRID computing communities have decided to migrate to token-based authentication, a web technology that has proved to be flexible and secure.
        SciTokens, the token model adopted by the GRID communities, are based on JSON Web Tokens (JWTs): a compact way to securely transmit information as JSON objects.
        JWTs are usually short-lived and provide fine-grained authorization, based on "scopes", to perform specific actions.
        These scopes are embedded in the token and specified during the request procedure, so they last only until the token's expiration time. Scopes can be requested based on user groups and permissions, thus making it possible to restrict a group to performing only a subset of actions.
        These characteristics make for a more secure alternative to X.509 proxies.
        Being widely used in industry, JWTs are also easily integrated into services not specifically developed for the scientific community, such as calendars, Sync and Share services, collaborative software development platforms, and more.
        As such, SciTokens suit the many heterogeneous demands of GRID communities, and some of them already started the transition in 2022.
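        As a concrete illustration of what such a token carries, the sketch below builds and decodes a mock JWT payload segment in pure Python; the claim values and scope strings are illustrative, not taken from a real SciToken, and a real JWT would additionally carry a signed header and signature.

```python
import base64
import json
import time

def decode_segment(seg: str) -> dict:
    """Decode one base64url-encoded JWT segment (header or payload)."""
    seg += "=" * (-len(seg) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(seg))

# Mock payload with the kinds of claims discussed above: a short lifetime
# ("exp") and fine-grained capability scopes. All values are made up.
payload = {
    "sub": "alice",
    "scope": "storage.read:/ storage.create:/home/alice",
    "exp": int(time.time()) + 1200,   # roughly a 20-minute lifetime
}
encoded = base64.urlsafe_b64encode(
    json.dumps(payload).encode()).decode().rstrip("=")

claims = decode_segment(encoded)
print(claims["scope"].split())        # the actions this token authorises
print(claims["exp"] > time.time())    # scopes last only until expiry
```

A relying service performs exactly this decoding (after verifying the signature) and grants only the actions listed in `scope`, which is what makes the tokens fine-grained.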

        In the Italian WLCG Tier-1, located in Bologna and managed by INFN - CNAF, several computing resources are hosted and made available to scientific collaborations in the fields of High-Energy Physics, Astroparticle Physics, Gravitational Waves, Nuclear Physics and many others.
        Although LHC experiments at CERN are the main users of CNAF resources, many other communities and experiments are being supported in their computing activities.

        While the main LHC experiments have already planned their own transition from X.509 to token-based authN/Z, many medium/small-sized collaborations struggle to put effort into it.

        The Tier-1 User Support unit has the duty of guiding users towards efficient and modern computing techniques and workflows involving data and computing resources access.

        As such, the User Support group is playing a central role in preparing documentation, tools and services to ease the transition from X.509 to SciTokens.
        The foreseen support strategy and the related tools will be presented, together with future workflow plans in view of the complete transition.

        Speaker: Mr Alessandro Pascolini (INFN - CNAF)
      • 117
        INDIGO IAM migration to Spring Authorization Server framework with a new customizable React user dashboard

        INDIGO Identity and Access Management (IAM) is a comprehensive solution that enables organizations to manage and control access to their resources and systems effectively. It is a Spring Boot application, based on OAuth/OpenID Connect technologies and the MITREid Connect library. INDIGO IAM has been chosen as the AAI solution by the WLCG community and has been used for years by the INFN DataCloud services, as well as by several other projects and experiments. The constant evolution of identity and access management systems like INDIGO IAM is imperative in the rapidly advancing landscape of cybersecurity and software development. This abstract encapsulates the transformative journey of the INDIGO IAM software, transitioning from its existing frameworks, MITREid Connect and the AngularJS web user interface, to the robust and contemporary Spring Authorization Server with a new React-based dashboard.

        The MITREid framework has served as a reliable foundation for INDIGO IAM, providing essential identity management capabilities. However, recognizing the need for enhanced security, scalability, and modern features, the decision was made to migrate to the Spring Authorization Server, a powerful and flexible IAM framework built on the widely adopted Spring Security. This migration represents a strategic move towards aligning INDIGO IAM with cutting-edge industry standards and best practices. The Spring Authorization Server offers a comprehensive set of features, including OAuth 2.1 draft and OpenID Connect support, adaptive authentication, and a modular architecture that facilitates seamless integration with other Spring ecosystem components. This abstract highlights the migration benefits, such as improved security protocols, simplified development workflows, and enhanced scalability.

        The decision to migrate to React was driven by a desire to harness the latest advancements in User Interface (UI) development, improve performance, and elevate the overall user experience. React, known for its declarative and component-based architecture, offers a modern and efficient approach for building interactive and responsive user interfaces. The abstract addresses the impact of the UI transformation on end-users, highlighting the commitment to preserving a familiar and intuitive experience while introducing new features and improvements. The benefits of React, including virtual DOM efficiency and state management, are underscored as pivotal elements contributing to the enhanced responsiveness and interactivity of the INDIGO IAM Dashboard.

        Finally, this abstract elucidates the technical considerations and architectural decisions made during the development process to facilitate multi-backend support. The commitment to a standardized and modular design is highlighted as a key enabler for developers to seamlessly integrate and configure the UI with different backend implementations.

        Speakers: Jacopo Gasparetto (INFN) , Roberta Miccoli (INFN) , Enrico Vianello (INFN)
      • 118
        A Study of Credential Policy and Credential Practice Statement for an Authentication Proxy Service

        Authentication proxy services are becoming increasingly important in linking existing ID infrastructures. It is necessary to clarify how such a service identifies and authenticates end entities, and to operate the service strictly. In this paper, we discuss the credential policy and credential practice statement of Orthros, an authentication proxy service that has begun trial operation.
        Identification and authentication in authentication proxy services have a complex structure, because such services both implement their own methods and delegate to multiple ID infrastructures. We summarize these characteristics and introduce the credential policy and credential practice statement of the authentication proxy service, Orthros.

        Speaker: Dr Eisaku Sakane (National Institute of Informatics)
    • 10:30
      Coffee Break
    • ASGC User Training Auditorium

      Auditorium

      BHSS, Academia Sinica

    • Network, Security, Infrastructure & Operations Conf. Room 2

      Conf. Room 2

      BHSS, Academia Sinica

      Convener: David Kelsey (STFC-RAL)
      • 119
        Evolution of SSH with OpenID Connect

        The Secure Shell Protocol (SSH) is the de-facto standard for accessing
        remote servers on the command line. Use cases include
        - remote system administration for Unix administrators
        - git via ssh for developers
        - rsync via ssh for system backups
        - HPC access for scientists.

        Unfortunately, there is no globally accepted pattern for federated
        usage yet.

        The large variety of users with different backgrounds and usage profiles
        motivated us to develop a set of different tools for facilitating the
        integration with federated user identities, which are being presented in
        this contribution. The main novelty is the integration of an ssh
        Certificate Authority (CA) into the existing motley-cue + oidc-agent
        mechanism. Oinit simplifies the usage of ssh-certificates by leveraging
        authorisation information via established federation mechanisms. The
        benefit is that - after an initial setup step - ssh may be used securely
        without interrupting existing flows. This allows for example the use of
        rsync.

        To enable this, oinit consists of a collection of programs to enable
        OpenSSH login for federated identities based on certificates:

        • The oinit-ca provides a REST interface to an ssh-ca at which  
          authorised users obtain an ssh certificate for a specified host or host
          group. Authorisation decisions are made by motley-cue, the component  
          that enables federated use of ssh on the ssh-server side. User
          provisioning may also be triggered at this point, via motley-cue &
          feudal.

        • Users employ the oinit tool to add hosts to the oinit mechanism. Once
          established, ssh-certificates are automatically retrieved whenever
          necessary and stored in the ssh-agent.

        • Serverside tools and configuration for enabling ssh without knowledge of
            local usernames, which is particularly useful in federated scenarios.

        We present the architecture, an initial security assessment, as well as a
        live demo of ssh with OpenID Connect, with oinit and selected components.

        Speaker: Marcus Hardt (KIT)
      • 120
        A Self-service Authentication and Access System for Computing Cluster (Remote Presentation)

        Keywords: cluster computing, account passport, secure shell (SSH), lightweight certificate, remote access, SSH tunnel

        Advanced computing infrastructure such as high-performance clusters, supercomputers, and cloud computing platforms offers unparalleled computing capabilities and effectively supports a multitude of computing requirements across diverse fields, such as scientific research, big data analysis, and artificial intelligence training and inference. Secure Shell (SSH) is a widely used method for accessing remote computing resources. It not only provides command-line tools, but also offers rich textual and graphical interfaces based on tunneling and port forwarding, allowing users to access components and services located on remote servers. Nowadays, there are numerous methods for remotely accessing computing resources: a simple password; a password enhanced with a VPN; a static password plus a dynamic hardware token; a public key in software or hardware; or a token provided by a mobile application. However, these methods have several drawbacks, including network inefficiency resulting from VPNs, high costs associated with dedicated hardware, and security concerns such as brute-force attacks. These issues have not only caused a detrimental experience for users but have also burdened administrators with unwarranted maintenance tasks and complexities.
        Addressing the aforementioned issues and aiming for efficient and secure access to computing resources, this paper proposes an authentication chain consisting of a CSTNET Passport and an SSH lightweight certificate. The CSTNET Passport is primarily used in web-based services and similar contexts, supporting single sign-on and multi-factor authentication, and thereby exhibiting robust security and user-friendliness. Nevertheless, its adaptability to non-web-based scenarios, particularly SSH command-line utilities, remains limited. An SSH lightweight certificate system eliminates the need for users to remember intricate passwords or change them frequently. Additionally, it circumvents the drawbacks associated with the decentralized deployment and long-term validity of the public key system. This paper introduces an authentication and access model that effectively combines an initial login using the CSTNET Passport with subsequent logins using the SSH lightweight certificate, either directly or through multiple jumps. The model outlines procedures for users to issue lightweight certificates and configure SSH clients, as well as guidelines for administrators to configure SSH daemons, map passport and local account pairs, and perform other related tasks. It facilitates on-demand authentication and access flows. To enhance security, the model restricts the time window, narrowing it from an unrestricted period to a specific time range designated by the user. This paper also introduces a dynamic firewall model that transparently acquires the network address of the browser client and the validity period of the certificate. It then forms IP address and time range metadata and dynamically adds firewall policies. To enhance security, it narrows the scope of allowed IP addresses from any IP to a specific IP address or list, based on whether the user possesses a public IP address or a NAT address. These two models substantially diminish the time and space windows available for network attacks. Moreover, they accomplish this enhancement without incurring extra expenses for additional software, such as VPNs, or hardware, such as token cards.
        Leveraging these models, this paper has developed a self-service authentication and access system comprising two integral parts: a web application subsystem and a toolset subsystem. The web application subsystem provides passport login, account mapping, certificate issuance, and status inquiry for both users and administrators. The toolset subsystem equips users with tools for key generation, public-key uploading, certificate downloading, and dedicated configuration files for an SSH client. It also offers CA certificate and principal configuration tools for SSH login servers, along with optional SSH connection status profiling for administrators.
        Built on free, open-source software such as Nginx, OpenSSH, and MobaXterm, the system makes comprehensive use of HTTPS/TLS, port multiplexing, and NJS scripts. It fully supports SSH command-line operations, tunneling, and port forwarding in both direct and multi-hop modes.
        Since 2023, the system has been deployed on a cluster consisting of one login node and 20 computing nodes, supporting self-service authentication and secure access for hundreds of users. The system changes access from anytime-and-anywhere to on-demand access under restrictions specified by users, and it transfers control from administrators to users: users decide when and from where to access the server, with specified permissions. The system is easy to deploy, occupies few resources, introduces no extra hardware costs, and effectively increases the usability and security of computing resources. Future work will focus on optimizing and strengthening the functionality pertaining to firewall policies and SSH connections.

        Speaker: Rongqiang Cao (Computer Network Information Center, Chinese Academy of Sciences)
      • 121
        Sitting under a broad-leaved AARC-TREE – making authentication and authorization for research collaboration even better

        The authentication and authorisation infrastructures (AAIs) for research worldwide have for years now based their architectures on the “AARC Blueprint Architecture” and the suite of accompanying guidelines. Developed by the “Authentication and Authorisation for Research Collaboration” (AARC) community, and fostered by the accompanying “engagement group for infrastructures” (AEGIS), the model has been a key ingredient of the European Open Science Cloud, many national research AAIs, and research and e-infrastructures in Europe, the Americas, and the Asia-Pacific region. However, with the increased scope and complexity of novel federated identity models come new challenges. The single ‘AAI proxy’ model of the initial AARC blueprint – which combined identity sources, community collaboration management, authorisation controls, and service provider connections – has already evolved to include both ‘infrastructure proxies’ that provide coherency on the service provider side and ‘community proxies’ focussing on membership management.

        Yet the challenges keep coming at an ever-increasing pace, both in terms of complexity as well as in the range of communities that can benefit from this coherent ‘AARC approach’ to federated access management. The new AARC-TREE project, the short label name for “AARC Technical Revision to Enhance Effectiveness”, will define common strategies for the development and deployment of AAIs in the large-scale Research Infrastructures where single proxies do not suffice.

        For example, there are multiple technical federation models (such as OpenID Connect Federation besides SAML) and a multitude of identity sources (academic, governmental, and self-sovereign) that need to co-exist and be linked together. And global interoperability between infrastructures must be strengthened to avoid fragmentation and unnecessary duplication.

        At the same time, we see that collaborations and thematic research domains struggle to keep up when just provided with guidelines and architecture, and this gets more urgent as collaborative AAIs extend beyond research to education, high-performance computing, and mid-sized communities.

        In this contribution, we will reflect on the AARC Blueprint Architecture for AAI and identify the critical areas that need improvement, look forward to addressing further AAI interoperability requirements and service gaps for more research infrastructures, and to enhancements of the AARC BPA to support research infrastructures more effectively by further expanding authorisation aspects and enabling new use-cases. The upcoming project combines technical and architectural measures with trust and identity policies to define and validate new technical and policy guidelines for the AARC BPA. We will describe the project objectives, the outline of the upcoming architectural and trust policy guidelines, and – most importantly – how one can contribute to this open and deliberately globally inclusive process. By employing existing structures (such as REFEDS, FIM4R, IGTF, and WISE), via a compendium process, and through broad global representation in AEGIS as a direction-setting body, the project strives to expand the number of research communities that can implement the AARC BPA and the AARC guidelines.

        Speaker: David Groep (Nikhef and Maastricht University)
      • 122
        2024 and Beyond: FIM4R's Contributions to the AAI landscape for Research

        The Federated Identity Management for Research (FIM4R) community is a forum where research communities convene to establish common requirements, combining their voices to convey a strong message to Federated Identity Management (FIM) stakeholders. FIM4R produced two whitepapers on the combined Authentication and Authorization Infrastructure (AAI) requirements of research communities, in 2012 and 2018, which contributed to the two Authentication and Authorization for Research Collaboration (AARC) projects, whose outcomes include the AARC Blueprint Architecture (BPA) and accompanying guidelines. In 2020, FIM4R led the effort to produce a position paper on the EOSC identity management strategy from the perspective of research communities.

        Post-pandemic, FIM4R organised two workshops, at the end of 2022 (US) and in early 2023 (Europe), and will organise a new workshop in early 2024 in Copenhagen. The AAI topics include work on requirements regarding challenges for small and midsize communities, policies on multi-proxy setups, and OpenID federation. Additionally, a third iteration of the AARC project (AARC Technical Revision to Enhance Effectiveness, or AARC-TREE) will address the AAI challenges that have meanwhile been raised, among others, by FIM4R.

        In the Copenhagen meeting and during the duration of the AARC TREE project, FIM4R will be the sounding board for requirements and outcomes of the AARC work.

        In our presentation, we will focus on outcomes of the meeting in January and previous post-pandemic workshops, as well as the upcoming collaboration with AARC TREE.

        Our target audience is the research communities and the infrastructures providing their services.

        Aims of the presentation:

        • The audience will learn about the work done in FIM4R
        • Share the outcomes from the Copenhagen meeting (Jan 2024)
        • Share the ongoing work with AARC TREE
        • Promote participation in FIM4R from the Asia-Pacific region
        Speaker: Mr Maarten Kremers (SURF)
    • Physics & Engineering Application Conf. Room 1

      Conf. Room 1

      BHSS, Academia Sinica

      Convener: Josep Flix (PIC / CIEMAT)
      • 123
        Signal model parameter scan using Normalizing Flow

        The discovery of physics Beyond the Standard Model (BSM) is a major goal of many experiments, such as the ATLAS and CMS experiments at the Large Hadron Collider, which has the world's highest centre-of-mass energy. Many types of BSM models have been proposed to solve the issues of the Standard Model, and many of them have several or even many model parameters; e.g. the Minimal Supersymmetric Standard Model, one of the most famous BSM models, has more than 100. These model parameters are free parameters: they cannot be predicted from theory and need to be tested experimentally.

        Data analysis of BSM model searches involves comparing observed experimental data with a particular BSM model. When the BSM model parameters are multidimensional, performing an analysis covering the whole phase space is difficult. Instead, it is often performed by fixing some of the model parameters to focus on one or two parameters or by defining and using some typical benchmark points, resulting in phase space holes that are not covered by the search.

        This talk presents a parameter scan technique for BSM signal models based on Normalizing Flow. Normalizing Flow is a type of deep learning model that transforms a simple probability distribution into a complex probability distribution through an invertible function. By learning an invertible transformation between a complex multidimensional distribution, such as experimental data observed in collider experiments, and a multidimensional normal distribution, the Normalizing Flow model gains the ability to sample (or generate) pseudo experimental data from random numbers and to evaluate a log-likelihood value for multidimensional observed events. The Normalizing Flow model can also be extended to take multidimensional conditional variables as arguments. That is, the model becomes a generator and evaluator of pseudo experimental data conditioned on the BSM model parameters. The log-likelihood value, the output of the Normalizing Flow model, is then a function of the conditional variables, so the model can quickly calculate gradients of the log-likelihood with respect to them. Exploiting this property, the most likely set of conditional variables that reproduces the experimental data, i.e. the optimal set of parameters for the BSM model, can be searched for efficiently. This talk will demonstrate this on a simple dataset and discuss its limitations and future extensions.
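        The gradient-based scan can be illustrated with a deliberately tiny stand-in for a conditional Normalizing Flow: a 1D affine transform x = c + z with z ~ N(0, 1), whose conditional log-likelihood and its gradient with respect to the condition c are available in closed form. This is a toy sketch of the idea, not the model used in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observed" events, standing in for experimental data.
theta_true = 2.5
data = rng.normal(theta_true, 1.0, size=5000)

def log_likelihood(x, c):
    # Conditional log-density of the toy flow x = c + z, z ~ N(0, 1):
    # log p(x|c) = log N(z = x - c); the Jacobian term log|dz/dx| is 0 here.
    z = x - c
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

def grad_c(x, c):
    # Analytic gradient of the mean log-likelihood with respect to c.
    return float(np.mean(x - c))

# Gradient ascent on the conditional variable: the model-parameter "scan".
c = 0.0
for _ in range(200):
    c += 0.1 * grad_c(data, c)

print(round(c, 3))  # converges towards the sample mean of the observed data
```

        In a real analysis the closed-form density is replaced by a trained flow, and the gradients come from automatic differentiation rather than an analytic formula.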

        Speaker: Masahiko Saito (ICEPP, The University of Tokyo)
      • 124
        The Application of Cluster-Based Distributed Computing in High Energy Physics Experiments (Remote Presentation)

        The computing cluster of the Institute of High Energy Physics has long provided computational services for high energy physics experiments, with a large number of experimental users. With the continuous expansion of the scale of experiments and the increase in the number of users, the queuing situation of the existing cluster is becoming increasingly severe.

        To alleviate the shortage of local cluster resources and the long job queuing time, the Dongguan Big Science Data Center has provided 18,000 CPU cores to expand the scale of the Institute of High Energy Physics cluster. Considering that the experimental users of the Institute of High Energy Physics have long maintained the habit of submitting computational jobs using the cluster, it would be difficult to promote the use of these remote resources in various experiments through grid computing. Therefore, this paper designs and implements cluster-based distributed computing.

        Cluster-based distributed computing monitors the cluster queue's demand for distributed resources through a Glidein Factory implementation, which performs the dynamic expansion of distributed resources and distributed job scheduling. In addition, the XRootD proxy and the CVMFS file system enable data access between user jobs at distributed sites and the local cluster.

        We have implemented cross-platform identity authentication based on Kerberos tokens to ensure that user jobs at distributed sites have access rights to local cluster services at the Institute of High Energy Physics. A detailed token update and maintenance mechanism has been designed to ensure the token remains valid while user jobs are queued and running.
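        The token maintenance mechanism can be sketched as a renewal loop that re-fetches the credential a safety margin before expiry; the names, margin, and loop shape below are illustrative assumptions, not the authors' implementation:

```python
import time

def next_renewal(issued_at, lifetime_s, margin=0.2):
    # Renew a fixed fraction of the lifetime before expiry, so a queued or
    # running job never holds an expired token.
    return issued_at + lifetime_s * (1.0 - margin)

def renew_loop(get_token, lifetime_s, run_for_s, clock=time.monotonic):
    # Maintenance loop: re-fetch the token whenever the renewal deadline
    # passes while the job is still queued or running.
    token, issued = get_token(), clock()
    deadline = clock() + run_for_s
    renewals = 0
    while clock() < deadline:
        if clock() >= next_renewal(issued, lifetime_s):
            token, issued = get_token(), clock()
            renewals += 1
        time.sleep(0.01)
    return renewals
```

        With a day-long token lifetime, for example, `renew_loop(fetch_token, 86400, job_walltime)` would refresh the credential roughly every 19 hours for the duration of the job.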

        Finally, considering users' long-standing habit of submitting jobs using the cluster, in order not to add extra learning costs to users, we have developed relevant job submission tools to achieve transparent distributed scheduling for users.

        Currently, cluster-based distributed computing has been preliminarily promoted and used in BES, LHAASO, and HERD experiments, contributing over 30,000,000 CPU core hours in total.

        Speaker: Mr Chaoqi Guo (Institute of High Energy Physics, Chinese Academy of Sciences)
      • 125
        Content Delivery Network solutions for the CMS experiment: the evolution towards HL-LHC

        The Large Hadron Collider at CERN in Geneva is poised for a transformative upgrade, preparing to enhance both its accelerator and particle detectors. This strategic initiative is driven by the tenfold increase in proton-proton collisions anticipated for the forthcoming high-luminosity phase scheduled to start by 2029. The vital role played by the underlying computational infrastructure, the Worldwide LHC Computing Grid, in processing the data generated during these collisions underlines the need for its expansion and adaptation to meet the demands of the new accelerator phase. The provision of these computational resources by the worldwide community remains essential, all within a constant budgetary framework. While technological advancements offer some relief for the expected increase, numerous research and development projects are underway. Their aim is to bring future resources to manageable levels and provide cost-effective solutions to effectively handle the expanding volume of generated data. In the quest for optimised data access and resource utilisation, the LHC community is actively investigating Content Delivery Network (CDN) techniques. These techniques serve as a mechanism for the cost-effective deployment of lightweight storage systems that support both traditional and opportunistic compute resources. Furthermore, they aim to enhance the performance of executing tasks by facilitating the efficient reading of input data via caching content near the end user. A comprehensive study is presented to assess the benefits of implementing data cache solutions for the Compact Muon Solenoid (CMS) experiment. This in-depth examination serves as a use-case study specifically conducted for the Spanish compute facilities, playing a crucial role in supporting CMS activities. Data access patterns and popularity studies suggest that user analysis tasks benefit the most from CDN techniques.
Consequently, a data cache has been introduced in the region to gain a deeper understanding of these effects. In this contribution, the details of the implementation of a data cache system at the PIC Tier-1 compute facility are presented, including insights into the monitoring tools developed and the positive impact on CPU usage for analysis tasks executed in the region. The study is augmented by simulations of data caches, with the objective of discerning the optimal size and network connectivity requirements for a data cache serving the Spanish region. Additionally, the study delves into the cost benefits of deploying such a solution in a production environment and investigates the potential impact of incorporating it into other regions of the CMS computing infrastructure.
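        The kind of cache-sizing simulation mentioned above can be illustrated by replaying an access trace through a toy LRU cache and measuring the hit ratio; the trace, file sizes, and function name here are invented for illustration:

```python
from collections import OrderedDict

def simulate_lru(trace, cache_size_gb, file_size_gb=1.0):
    # Replay an access trace (file names) through an LRU cache of the given
    # size and report the hit ratio -- the quantity a sizing study sweeps
    # over different cache sizes.
    cache = OrderedDict()              # filename -> size, in LRU order
    hits = total = 0.0
    for f in trace:
        total += file_size_gb
        if f in cache:
            hits += file_size_gb
            cache.move_to_end(f)       # refresh recency on a cache hit
        else:
            cache[f] = file_size_gb
            while sum(cache.values()) > cache_size_gb:
                cache.popitem(last=False)  # evict the least recently used file
    return hits / total

# Popular analysis datasets are re-read often, so even a modest cache helps.
trace = ["a", "b", "a", "c", "a", "b", "d", "a", "b"]
print(simulate_lru(trace, cache_size_gb=2.0))
```

        Sweeping `cache_size_gb` over a real access log yields the hit-ratio curve from which an optimal regional cache size can be read off.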

        Speaker: Josep Flix (PIC / CIEMAT)
      • 126
        Fundamental Research and Space Economy: the Italian strategy in the new High-Performance Computing, Big Data and Quantum Computing Research Centre

        In the context of the Italian National Recovery and Resilience Plan (NRRP), the High-Performance Computing, Big Data and Quantum Computing Research Centre, created and managed by the ICSC Foundation, has recently been established as one of the five Italian “National Centres” to address designated strategic sectors for the development of the country, including simulations, computing, and high-performance data analysis; agritech; the development of gene therapies and drugs via RNA technology; sustainable mobility; and biodiversity. The focus of this National Supercomputing Centre is on the maintenance and upgrade of the Italian HPC and Big Data infrastructure, as well as on the development of advanced methods, numerical applications, and software tools to integrate the computing, simulation, collection, and analysis of data of interest for research, manufacturing, and society, also through cloud and distributed approaches. In particular, in a hub-and-spoke set-up, the so-called “Spoke 2” is devoted to research at the frontiers of theoretical and experimental physics, mainly experimental particle physics conducted with or without accelerating machines, as well as detectors studying gravitational waves, and more. The talk will present the organisation and activity status of this spoke and elaborate on its scientific and technological contributions to the overall innovation ecosystem.

        Speakers: Daniele Bonacorsi (University of Bologna) , Sandra Malvezzi (Università di Milano-Bicocca) , Tommaso Boccali (INFN)
    • Closing Ceremony Room 2

      Room 2

      BHSS, Academia Sinica

    • ASGC User Training Auditorium

      Auditorium

      BHSS, Academia Sinica

    • 15:30
      Coffee Break
    • ASGC User Training Auditorium (BHSS, Academia Sinica)

      Auditorium

      BHSS, Academia Sinica