International Symposium on Grids & Clouds 2020 (ISGC 2020)

Asia/Taipei
BHSS, Academia Sinica

Ludek Matyska (CESNET), Simon C. Lin (ASGC)
Description

Theme: Challenges in High Performance Data Analytics: Combining Approaches in HPC, HTC, Big Data and AI

While research data are becoming a real asset, it is the information and knowledge gained through thorough analysis that make them so valuable. To process the vast amounts of data collected, novel high performance data analytics methods and tools are needed, combining classical simulation-oriented approaches, big data processing and advanced AI methods. Such a combination is not straightforward and requires novel insights at all levels of the computing environment, from the network and hardware fabrics through the operating systems and middleware to the platforms and software, not forgetting security, to support data-oriented research. Challenging use cases that tackle difficult scientific problems are necessary to drive the evolution of such high performance data analytics environments and to validate them.

The goal of ISGC 2020 is to create a face-to-face venue where individual communities and national representatives can present and share their contributions to the global puzzle, and thus contribute to the solution of global challenges. We cordially invite and welcome your participation!

    • APGridPMA Meeting Media Conference Room

      Convener: Eric YEN (ASGC)
    • LHCOPN & LHCONE Workshop Conference Room 1

      Convener: Mr Edoardo Martelli (CERN)
    • Research Data Management Training Workshop Conference Room 2

    • 10:30
      Coffee Break
    • APGridPMA Meeting Media Conference Room

      Convener: Eric YEN (ASGC)
    • LHCOPN & LHCONE Workshop Conference Room 1

      Convener: Mr Edoardo Martelli (CERN)
    • Research Data Management Training Workshop Conference Room 2

    • 12:30
      Lunch 4F Recreation Hall

    • Education Informatics Workshop: Curriculum Development for STEAM Education Media Conference Room

      Convener: Tosh Yamamoto (Kansai University)
    • LHCOPN & LHCONE Workshop Conference Room 1

      Convener: Mr Edoardo Martelli (CERN)
    • Research Data Management Training Workshop Conference Room 2

    • 15:30
      Coffee Break
    • Education Informatics Workshop Media Conference Room

      Convener: Tosh Yamamoto (Kansai University)
    • LHCOPN & LHCONE Workshop Conference Room 1

      Convener: Mr Edoardo Martelli (CERN)
    • Research Data Management Training Workshop Conference Room 2

    • FitSM Foundation Training and Certification Media Conference Room

      Convener: Mr Sy Holsinger (EGI Foundation)
    • LHCOPN & LHCONE Workshop Auditorium

      Convener: Mr Edoardo Martelli (CERN)
    • Security Workshop Room 802

    • Workshop on Machine Learning & AI Conference Room 2

      Convener: Prof. Daniele Bonacorsi (University of Bologna)
    • 10:30
      Coffee Break
    • FitSM Foundation Training and Certification Media Conference Room

    • LHCOPN & LHCONE Workshop Auditorium

      Convener: Mr Edoardo Martelli (CERN)
    • Security Workshop
    • Workshop on Machine Learning & AI Conference Room 2

      Convener: Prof. Daniele Bonacorsi (University of Bologna)
    • 12:30
      Lunch 4F Recreation Hall

    • FitSM Foundation Training and Certification Media Conference Room

      Convener: Mr Sy Holsinger (EGI Foundation)
    • LHCOPN & LHCONE Workshop Auditorium

      Convener: Mr Edoardo Martelli (CERN)
    • Security Workshop Room 802

    • Workshop on Machine Learning & AI Conference Room 2

      Convener: Prof. Daniele Bonacorsi (University of Bologna)
    • 15:30
      Coffee Break
    • FitSM Foundation Training and Certification Media Conference Room

      Convener: Mr Sy Holsinger (EGI Foundation)
    • LHCOPN & LHCONE Workshop Auditorium

      Convener: Mr Edoardo Martelli (CERN)
    • Security Workshop Room 802

    • Workshop on Machine Learning & AI Conference Room 2

      Convener: Prof. Daniele Bonacorsi (University of Bologna)
    • Opening Ceremony & Keynote Session: I Auditorium

      • 1
        Opening Remarks
        Speakers: Dr Ludek Matyska (CESNET), Simon C. Lin (ASGC)
      • 2
        The European Open Science Cloud on a global stage
        Over the past years, numerous policy makers from around the world have articulated a clear and consistent vision of global Open Science as a driver for enabling a new paradigm of transparent, data-driven science as well as accelerating innovation. In Europe, this vision is being realised through an ambitious programme under the heading of the European Open Science Cloud (EOSC). The EOSC will offer 1.7 million European researchers and 70 million professionals in science, technology, the humanities and social sciences a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines by federating existing scientific data infrastructures, currently dispersed across disciplines and the EU Member States. The research communities that EOSC aims to serve span the globe and so EOSC cannot operate in isolation but must find ways to be part of the broader, international open science movement. This presentation will give an update on the status of EOSC and the approaches being considered for how EOSC can engage at a global level.
        Speaker: Dr Bob Jones (CERN)
    • 10:30
      Group Photo & Coffee Break
    • Biomedicine & Life Science Applications Session Conference Room 2

      • 4
        Towards Speech reconstruction from MEG signal when audio stimuli
        In recent years, research and development has been conducted on the Brain-Machine Interface (BMI), a technology that directly connects the brain and a machine using information obtained by measuring brain signals or by stimulating the brain. BMI is expected to contribute not only to the medical field but also to various other fields. BMI technology can be divided into two types, input-type and output-type. Output-type BMI is a technology for sending signals from within the brain to the outside, and is being researched and developed to help patients with intractable diseases who are unable to move their bodies as they wish. Communication with such patients is an essential issue in nursing care, and improving communication can improve a patient's quality of life. Conventionally, output-type BMI systems have selected words one by one through image stimulation or by controlling a cursor from brain activity. However, these require the user to pay attention to a screen, and the input speed is very slow. Because only simple word information is reconstructed, complex information such as emotion and intonation cannot be conveyed. As an innovative way to solve these problems, attention has focused on methods that convert brain neural activity directly to speech. These methods can be divided into two kinds: reconstructing external audio stimuli and reconstructing the subject's own speech. For both, prior work has shown that high-quality speech can be reconstructed; however, that research relies on invasive electrocorticography (ECoG), with electrodes surgically implanted in the brain. In this research, we propose a method to reconstruct external audio stimuli using magnetoencephalography (MEG), a non-invasive measurement technique.
In this method, the target parameters for reconstruction are the vocoder parameters (spectral envelope, band aperiodicity, fundamental frequency, voiced-unvoiced (VUV) flag) used in a speech synthesis system. However, since brain activity data acquired by MEG are subject to limitations such as the cost of long-term measurement and physical constraints, large-scale data collection for deep learning cannot be performed. To learn efficiently even from small data, the target parameters are first learned with an auto-encoder, and are then converted to abstract features taken from the output of the trained auto-encoder's middle layer. This research introduces how to learn well from limited brain activity data and reconstruct external audio stimuli.
        Speaker: Mr Masato Yamashita (Kanazawa Institute of Technology)
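The auto-encoder step described above can be sketched as follows. This is an illustrative sketch only, not the authors' code: a small linear auto-encoder (NumPy, with hypothetical dimensions) is trained on stand-in vocoder-parameter frames, and its bottleneck activations then serve as the abstract regression targets for the MEG decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, bottleneck=8, lr=0.1, epochs=800):
    """Train encoder (d -> k) and decoder (k -> d) weights by gradient descent."""
    n, d = X.shape
    W_enc = rng.normal(0, 0.1, (d, bottleneck))
    W_dec = rng.normal(0, 0.1, (bottleneck, d))
    for _ in range(epochs):
        Z = X @ W_enc                    # bottleneck codes
        X_hat = Z @ W_dec                # reconstruction
        err = X_hat - X                  # n x d residual
        # gradients of the mean squared reconstruction error
        g_dec = Z.T @ err / n
        g_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc, W_dec

# toy stand-in for (spectral envelope, aperiodicity, F0, VUV) frames
X = rng.normal(size=(256, 32))
W_enc, W_dec = train_autoencoder(X)
codes = X @ W_enc                        # abstract features for the MEG decoder
mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

The real system would use a nonlinear network and actual vocoder frames; the point here is only that the bottleneck compresses the targets so a decoder can be trained on little MEG data.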
      • 5
        Convolutional neural networks for DESY photon science
        We are exploring possible applications of artificial intelligence at the German electron synchrotron (DESY) in Hamburg, in particular in the field of photon science. Our current focus is on convolutional neural networks applied to 2D and 3D image analysis for life science. We will present successfully applied semantic segmentation of volumetric 3D synchrotron radiation microcomputed tomography (SRμCT) data with a U-Net. We have trained a convolutional neural network to segment biodegradable bone implants (screws) and degradation products from bone and background. The results significantly outperform the previously used semi-automatic segmentation procedure in terms of accuracy and have successfully been applied to more than 100 rather heterogeneous datasets. Remarkably, the performance of the U-Net segmentation is considerably better than the expert segmentation that was used for training. In addition to our ongoing work on instance segmentation (SRμCT) in the context of material science, object detection and classification for cryo-electron tomography will be introduced. With a combination of a U-Net and a simple convolutional network for object classification, membrane protein complexes are identified in CryoEM tomograms for subsequent subtomogram averaging. The machine learning efforts at DESY-IT also include the development of a classification/filter method for XFEL SFX diffraction data.
        Speaker: Dr Christian Voss (DESY Hamburg)
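Segmentation quality of the kind reported above is commonly quantified with a per-class Sorensen-Dice overlap. A minimal, self-contained sketch (not the DESY code; the class labels are hypothetical):

```python
import numpy as np

def dice_per_class(pred, ref, labels):
    """Sorensen-Dice coefficient for each integer label in two label maps."""
    scores = {}
    for c in labels:
        p, r = (pred == c), (ref == c)
        denom = p.sum() + r.sum()
        # define Dice = 1.0 when the class is absent from both volumes
        scores[c] = 2.0 * np.logical_and(p, r).sum() / denom if denom else 1.0
    return scores

# toy 2D example with hypothetical labels: 0 background, 1 bone, 2 screw
pred = np.array([[1, 1], [0, 2]])
ref  = np.array([[1, 0], [0, 2]])
scores = dice_per_class(pred, ref, labels=(0, 1, 2))
```

The same function applies unchanged to 3D SRμCT volumes, since NumPy boolean reductions ignore dimensionality.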
    • e-Science Activities in Asia Pacific Session: I Auditorium

      • 6
        eScience Activities in Taiwan
        Speaker: Eric YEN (ASGC)
      • 7
        eScience Activities in China
        Speaker: Dr Gang Chen (Institute Of High Energy Physics)
      • 8
        eScience Activities in Mongolia
        Speaker: Mrs Otgonsuvd Badrakh (Researcher at Institute of Physics and Technology, Mongolian Academy of Sciences)
      • 9
        eScience Activities in Malaysia
        Speaker: Dr Suhaimi Napis (Universiti Putra Malaysia)
      • 10
        eScience Activities in Indonesia
        Speaker: Basuki Suhardiman
      • 11
        eScience Activities in Thailand
        Speaker: Dr Chalee Vorakulpipat (NECTEC)
      • 12
        eScience Activities in Philippines
    • 13:00
      Lunch
    • Network, Security, Infrastructure & Operations Session: I Conference Room 2

      Convener: Dr David Groep (Nikhef)
      • 13
        WISE development of new baseline policies for collaborating research infrastructures
        As most are fully aware, cybersecurity attacks are an ever-growing problem as larger parts of our lives take place online. Distributed digital infrastructures are no exception, and action must be taken both to reduce the security risk and to handle security incidents when they inevitably happen. These activities are carried out by the various e-Infrastructures, and in recent years a successful collaboration between infrastructures has helped improve security and allowed all to work more efficiently. For more than 4 years, “the WISE community” has enhanced best practice in information security for IT infrastructures for research. WISE fosters a collaborative community of security experts and builds trust between IT infrastructures, i.e. all the various types of distributed computing, data and network infrastructures in use today for the benefit of research, including cyberinfrastructures, e-infrastructures and research infrastructures. Through membership of working groups and attendance at workshops, these experts participate in the joint development of policy frameworks, guidelines and templates. With participants from e-Infrastructures such as EGI, EUDAT, GEANT, EOSC-hub, PRACE, XSEDE, HBP, OSG, NRENs and more, the actual work of WISE is performed in focussed working groups, each tackling different aspects of collaborative security and trust. Since the presentation on WISE at ISGC2019, several working groups have been active. Two face-to-face community meetings were held during 2019, the first in Kaunas, Lithuania, jointly with the GEANT SIG-ISM group, and the second in San Diego, USA as part of the NSF Cybersecurity Summit for Large Facilities. The WISE Security for Collaborating Infrastructures (SCI) working group, in collaboration with trust and security activities in EOSC-hub and the GEANT GN4-3 project, is working on new baseline/template security policies.
This builds on the Policy Development Kit, an output of the EU Horizon 2020 projects Authentication and Authorisation for Research Collaborations (AARC/AARC2), and aims to produce new WISE policy templates for service operations security, acceptable authentication assurance, and research community management operations. This talk will present a report on the activities of WISE since ISGC2019, together with details of the work to produce new security baseline policy templates, guidelines and recommendations.
        Speaker: Dr David Kelsey (STFC-RAL)
      • 14
        Trust Groups and the use of Threat Intelligence in the Research Community
        The information security threats currently faced by the research community are not only sophisticated, but in many instances highly profitable for the actors involved. Evidence suggests that targeted organisations take on average more than six months to detect a cyber attack; the more sophisticated the attack, the more likely it is to pass undetected for longer. In this context, the WLCG Security Operations Centre Working Group has been working to establish a threat intelligence sharing trust group in the academic research community. The purpose of this group is to enable members to easily exchange Indicators of Compromise (IoCs) from ongoing security incidents and to use this information to secure their own infrastructures. In addition, this capability enhances the ability of participating organisations to respond to security incidents spanning multiple sites within the community. The mandate of the working group includes the exploration of both the technical and social aspects of forming such a trust group. The technological means of sharing intelligence is provided by the Malware Information Sharing Platform (MISP), which allows considerable flexibility in the design of an information sharing network. The topology adopted by the working group focuses in the first instance on a purpose-deployed central MISP instance hosted at CERN, which leverages existing trust partnerships. MISP allows a number of methods for accessing intelligence data, including synchronising events to peer instances as well as direct access via a REST API. The type of intelligence being shared in this trust group is often most applicable at the campus or institution level; an important part of establishing the group is investigating ways of sharing the relevant intelligence with the hosting institutions.
We discuss the current extent of this trust group, including examples of sites that have deployed MISP instances themselves as well as those that use the central instance directly. We also consider the types of events being shared and the methods used to help sites gain confidence in sharing information of their own. In addition, we report on the outcomes of a recent workshop focussing on threat intelligence, held at Nikhef in October 2019, which addressed many of these issues and included a validation of a reference workflow incorporating threat intelligence. This workflow includes a technology stack previously reported on at ISGC 2018. Finally, we report on the status of ongoing work to establish the necessary rules of engagement for sites taking part in this trust group.
        Speakers: Dr David Crooks (UKRI STFC), Mr Liviu Valsan (CERN)
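Sharing an IoC through MISP's REST API amounts to POSTing a JSON event. The sketch below is an assumption-laden illustration, not working-group code: the URL and API key are hypothetical, and the payload follows MISP's Event/Attribute schema as commonly documented. Only the payload builder is exercised here; the upload function is shown but not called.

```python
import json
import urllib.request

def build_misp_event(info, iocs):
    """Build a MISP event body carrying destination-IP IoCs as attributes."""
    return {
        "Event": {
            "info": info,                # human-readable summary
            "distribution": "0",         # 0 = share with own organisation only
            "threat_level_id": "2",      # 2 = medium
            "analysis": "1",             # 1 = ongoing
            "Attribute": [
                {"type": "ip-dst", "category": "Network activity", "value": ip}
                for ip in iocs
            ],
        }
    }

def push_event(base_url, api_key, event):
    """POST the event to a MISP instance (illustrative; not called here)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/events/add",
        data=json.dumps(event).encode(),
        headers={
            "Authorization": api_key,
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )
    return urllib.request.urlopen(req)

event = build_misp_event("Suspicious SSH scanning", ["203.0.113.7", "203.0.113.9"])
```

A peer site consuming the central instance would instead pull events over the same API, or synchronise them to its own MISP instance.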
      • 15
        Global operational security in collaborating distributed infrastructures
        Since the Stakkato incident (https://en.wikipedia.org/wiki/Stakkato) in 2004, it has been well understood that incidents can spread quickly over infrastructures with a shared user community. With the advent of federated identity management, the likelihood of incidents "hopping" from one infrastructure to another has increased. In 2004, the attacker took advantage of the fact that the security teams at the affected infrastructures had only limited opportunities for a coordinated response, and that the infrastructures were rather isolated with regard to policies and procedures. As a result, the incident response happened primarily at the local level. During the setup of large federated computing infrastructures like EGI and OSG, security was an important component, and the readiness of the infrastructures to respond to a multi-site incident was assessed in various scenarios. An increased understanding of the implications of a compromised account in a federated identity management system, which gives a miscreant access to multiple distributed IT infrastructures, along with communities and frameworks like WISE and SIRTFI aimed at harmonizing security policies across infrastructures, has helped to build an environment where coordinated incident management is possible. EGI and OSG already provide operational security to infrastructures that share large user communities. Even so, we may face a situation similar to that of 2004, since the anticipated methods of collaboration in an incident affecting both EGI and OSG are not yet fully tested. In this presentation we will describe a possible way towards a Security Service Challenge spanning the infrastructures coordinated by EGI and OSG, with the goal of assessing our readiness to manage a global incident affecting both.
One way to tackle this is to break the main exercise down into challenges addressing the components of a globally coordinated incident response, and to run these first before putting everything together to challenge the overall security situation. To accomplish this, we will develop a tabletop roleplay for the infrastructure CSIRTs, in which the CSIRTs handle a fictitious incident spanning the infrastructures. In their response the CSIRTs should follow their existing policies and procedures, with an important outcome being insight into where potential conflicts in these may exist. In addition, it is hoped that this exercise will show whether the information necessary to handle the incident is available to the infrastructure CSIRTs. The availability of this information, used in an abstract way in the tabletop exercise, will be tested in separate challenges with the involved entities. One example is the information available at the Virtual Organisations, or resource centers, in particular in the logs of the various services used to access the resources offered by the infrastructure. Once the parts of the overall incident response are known to work, the SSC would then address the connections between the incident response components and should give a meaningful assessment of our capability to respond to an incident spanning multiple infrastructures. Finally, SSC-19.03 demonstrated the importance of evaluating the different actions taken as soon as possible, ideally within a matter of days after the SSC is over; in the presentation we will show how such a near-time evaluation can be achieved.
        Speaker: Dr Sven Gabriel (Nikhef/EGI)
    • Physics & Engineering: I Conference Room 1

      Convener: Prof. Junichi Tanaka (University of Tokyo)
      • 17
        The use of HPC at the Tokyo regional analysis center
        The ATLAS experiment is one of the experiments at the Large Hadron Collider (LHC). It discovered the Higgs boson in 2012 and continues to investigate unsolved physics problems. The current total data volume of ATLAS is more than 200 PB. To manage and process the data, the LHC experiments use the Worldwide LHC Computing Grid (WLCG), a collaboration of computing centers in 42 countries. Our computing center, the Tokyo regional analysis center at the International Center for Elementary Particle Physics (ICEPP), the University of Tokyo, is one of the WLCG centers and supports the ATLAS virtual organization. The system consists of 10,000 CPU cores and 16 PB of disk storage. We provide 7,700 CPU cores and 10 PB of disk storage to WLCG; the remaining resources are dedicated to local usage by ATLAS Japan members. All hardware is supplied under a three-year rental contract. The current system, the 5th, started in January 2020. The LHC will be upgraded in 2026 to increase the luminosity by a factor of five. The data volume will then increase more than tenfold under the current analysis model, and both computing power and storage will be in short supply. To address this, many software improvements have been made and new analysis models have been developed. However, there is still a gap between the required resources and what we can obtain, so the WLCG system itself needs to be upgraded. High-Performance Computing (HPC) resources are one way to increase the computing power of WLCG. In recent years, many countries have been developing new HPC systems, and the University of Tokyo operates several. We used one of them, called Reedbush, and developed the interface between WLCG and Reedbush at the Tokyo regional analysis center. The Reedbush system consists of Intel Xeon E5-2695v4 CPUs and NVIDIA Tesla P100 GPUs. Each node has 2 CPUs (36 cores) and 256 GB of memory; some nodes are provided without GPUs.
The nodes have no direct external network access, which requires a special setup to manage the input and output data of WLCG jobs. We will describe how we established the interface to the HPC system in the ATLAS workflow and our experiences during the development.
        Speaker: Dr Michiru Kaneda (ICEPP, the University of Tokyo)
      • 18
        Using HEP experiment workflows for the benchmarking of WLCG computing resources
        The HEP-SPEC06 (HS06) benchmarking suite has been used for over a decade in the accounting and procurement of WLCG resources. HS06 is stable, accurate and reproducible, but it is an old benchmark and it is becoming clear that its performance and that of typical HEP applications have started to diverge. After evaluating several alternatives for the replacement of HS06, the HEPIX benchmarking WG has chosen to focus on the development of a HEP-specific benchmarking suite based on actual software workloads of the LHC experiments. This approach, based on container technologies, is designed to provide by construction a better correlation between the new benchmark and the throughputs of the experiment production workloads. It also offers the possibility to separately explore and describe the independent architectural features of different computing resource types. This is very important in view of the growing heterogeneity of the HEP computing landscape, where the role of non-traditional computing resources such as HPCs and GPUs is expected to increase significantly. This presentation will review the status and outlook of development of the new benchmarking suite, and in particular of the efforts to value HPC resources, at the time of ISGC2020.
        Speaker: Dr Andrea Valassi (CERN)
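Benchmark suites of this kind typically condense per-workload throughput scores into a single figure with a geometric mean, so that no single workload dominates the result. A minimal sketch of that aggregation step (the workload names are hypothetical, not the suite's actual workload list):

```python
import math

def combined_score(workload_scores):
    """Geometric mean of per-workload throughput scores (all values > 0)."""
    vals = list(workload_scores.values())
    return math.exp(sum(math.log(v) for v in vals) / len(vals))

# toy example: per-container throughputs normalised to a reference machine
score = combined_score({"gen-sim": 2.0, "digi-reco": 8.0})
```

The geometric mean is also what makes the score composable: scaling every workload by a common factor scales the combined score by exactly that factor.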
    • VRE Media Conference Room

      Convener: Dr Patrick Fuhrmann (DESY/dCache.org)
      • 19
        EGI Workload Manager Service
        The EGI Workload Manager Service (WMS) is based on the DIRAC Interware and is part of the EOSC-hub project service catalog. The service provides access to various computing resources of the EGI infrastructure for scientific communities in Europe and around the world. Different kinds of computing resources can be connected to the Manager: HTC/grid resources, cloud resources, and standalone computing clusters including HPC centers. The DIRAC WMS provides tools for submitting jobs with a detailed description of their requirements, executing jobs on matching resources, and monitoring and accounting of the consumed computing power. It assists users in the construction and execution of complex workflows consisting of very large numbers of jobs, submitted automatically as soon as all the necessary prerequisites are available. The EGI WMS also provides user support, helping communities adapt their applications for efficient use of the currently available resources. This has required multiple developments to meet users' needs in accessing new computing technologies, e.g. GPUs, containers and cloud clusters. New developments were also needed for managing user communities with Authentication/Authorization systems based on OAuth2/OIDC technologies and SSO federated identity provider solutions. In this contribution we will present the experience with setting up and running the EGI Workload Manager Service and describe the new developments carried out to fulfill the requirements of EOSC-hub users.
        Speaker: Dr Andrei Tsaregorodtsev (CPPM-IN2P3-CNRS)
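The core of any such workload manager is matching a job's stated requirements against the capabilities each resource advertises. The toy matcher below is a conceptual illustration only, not the DIRAC implementation; the tag names and resource descriptions are hypothetical.

```python
def matching_resources(job, resources):
    """Return the names of resources able to run the job.

    A resource matches when it offers every tag the job requires
    (e.g. "GPU", "Singularity") and enough CPU time.
    """
    matches = []
    for name, caps in resources.items():
        if set(job.get("tags", [])) <= set(caps.get("tags", [])) \
                and caps.get("max_cpu_seconds", 0) >= job.get("cpu_seconds", 0):
            matches.append(name)
    return matches

# hypothetical resource descriptions for an HTC site and a cloud with GPUs
resources = {
    "htc-cluster": {"tags": ["Singularity"], "max_cpu_seconds": 86400},
    "gpu-cloud":   {"tags": ["GPU", "Singularity"], "max_cpu_seconds": 43200},
}
job = {"tags": ["GPU"], "cpu_seconds": 3600}
sites = matching_resources(job, resources)
```

In a real pilot-based system the match runs the other way as well: pilots on the resources pull jobs whose requirements they can satisfy.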
      • 20
        Anonymous Jobs enabling Scaling and Integration of external Resources
        A drawback of authentication-based access control to storage or computing resources is the need for a consistent identity namespace over all such resources; e.g., a program runs under a user ID and can read/write files belonging to that user or group. For a distributed instance of storage and computing this means that the correct identities have to be mapped and authenticated at all components, where errors might pose significant security risks. By moving to authorization-based access control and confining authentication to a few central components, one can overcome the constraints of site-wide identity handling and also allow for easier scaling out to external resources. For our local workflow chains we propose the concept of anonymous jobs, where such an anonymous job is a self-sufficient description of the job's file input and output as well as the processing function or application, combined with the necessary identity-free access tokens for both storage and compute resources. In automated workflow chains, an event initiates a processing chain in which access tokens in the form of Macaroons are requested from the dCache storage system. As the access tokens are tailored to only the necessary paths for input and output, and are limited in time as well as network range, the risk of file losses can be significantly reduced compared to exposing the full file namespace available to a user. Similarly, compute resources on the HTCondor batch system could be abstracted as tokens, so that one can combine the access tokens into a self-sufficient job that can be processed decoupled from the initial user.
        Speaker: Mr Michael Schuh (DESY)
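dCache issues genuine Macaroons for this purpose; the stdlib sketch below only illustrates the underlying idea of chained-HMAC tokens whose caveats (here a path prefix and an expiry, both hypothetical) restrict what the bearer may do without carrying any identity.

```python
import hashlib
import hmac
import time

def _chain(sig, caveat):
    return hmac.new(sig, caveat.encode(), hashlib.sha256).digest()

def mint(root_key, identifier, caveats):
    """Mint a token; each caveat narrows it and is folded into the signature."""
    sig = hmac.new(root_key, identifier.encode(), hashlib.sha256).digest()
    for c in caveats:
        sig = _chain(sig, c)
    return {"id": identifier, "caveats": list(caveats), "sig": sig.hex()}

def verify(root_key, token, path, now):
    """Recompute the HMAC chain, then enforce every caveat."""
    sig = hmac.new(root_key, token["id"].encode(), hashlib.sha256).digest()
    for c in token["caveats"]:
        sig = _chain(sig, c)
    if not hmac.compare_digest(sig.hex(), token["sig"]):
        return False                  # token or caveat list was tampered with
    for c in token["caveats"]:
        if c.startswith("path:") and not path.startswith(c[len("path:"):]):
            return False              # outside the granted namespace
        if c.startswith("before:") and now >= float(c[len("before:"):]):
            return False              # token expired
    return True

key = b"storage-root-key"             # hypothetical storage-side secret
t = mint(key, "job-42-output",
         ["path:/data/run42/", "before:%f" % (time.time() + 3600)])
```

Because only the minting service knows the root key, a job can carry such a token anywhere, including to external resources, and the storage system can still verify it offline.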
      • 21
        Command Line Web Portal for multiple physical and virtual clusters on Scientific computing Platform
        In recent years, many supercomputers and clusters have been deployed and provide massive High Performance Computing (HPC) resources for users. As a virtual cluster can provide similar efficiency of physical computing resource, many users choose virtual clusters to execute HPC jobs for satisfying different kinds of requirements. Due to the advantages of low resource occupation, fast execution speed and general-purpose, Command Line (CL) is still an important interactive interface for accessing physical or virtual resources in scientific computing. Different from desktop and Web graphical interfaces, it is difficult and annoying for users to learn, remember and use lot of commands properly, especially in a scientific computing platform composed of multiple management domains that have multiple supercomputers and clusters. There are 3 important and annoying issues with using CL in scientific computing. In network security, most HPC resources must be accessed through dedicated VPN networks, or that a SSH client must be running on the specified server whose IP is permissioned by all network firewalls that locate on the route path between the client and the targeted login server. In credential security, a user must install a client in different operating systems and platforms, configure a certificate in the target client, and input complex password information and the other authentication information from a smart card or other hardware devices. In configuration management, a user needs to repeat complicated and cumbersome works as done in network and credential security when the user wants to access a new cluster by CL. The frequent creation and destruction of virtual clusters makes it even more difficult. Focusing on solving above issues, Command Line WEB Portal (CLWP) was proposed in this paper which provides easy-to-use access to multiple physical and virtual computing resources. 
The entire process of command execution consists of 3 interconnected message loops, a Web Socket (WS) circle, an Event Bus (EB) circle and a Secure Shell (SSH) circle. The WS-circle consists of xterm.js located on the browser, and a pair of WS client and server which are located separately on the front-end browser and on the back-end server. The WS-circle receives a command each time from a user and displays output of each command. The SSH-circle consists of a pair of SSH client and server which are located separately on the back-end server and a login server of physical or virtual clusters. The SSH-circle connects the specified login server by VPN network or satisfying certain firewall rules, then receives a command from the EB-circle and returns output of the command line by line. The EB-circle consists of an input chain and an output chain, which are core components in the entire process of command execution. The input chain is responsible for receiving commands letter by letter, modifying and checking commands on syntax, format and security attributes by different filters in the chain. The output chain is responsible for help users understanding output of commands from job life-cycle, cluster status and other aspects. Each command inputted from browser will go through the entire process consisted of 3 circles described above to provide easy-to-use command line service for users. In addition, CLWP provides account registration service for users to add information about new HPC configuration and new account credential into persistence storage in an intuitive way. CLWP can help users authenticate into a login server by account/password and account/certificate credentials. CLWP also provides SSH login service for virtual clusters resided in HPC resources which are prohibited to access internet for security. 
During login, the SSH loop first logs in through a fixed port to a login server in front of the HPC resources that host the virtual clusters, and then logs in to the target virtual cluster through the specified port. The anonymous account is each user's unique identity in CLWP; it records several groups of information, such as user, organization, research projects and computing requirements. The anonymous account is mapped upward to a third-party Identity Provider (IdP), such as the China Science and Technology Network passport (CST-passport), WeChat and similar public IdPs. It is also mapped downward to one or more accounts belonging to the user on different HPC resources or virtual clusters. A user logs in to CLWP with an upward-mapped account such as a CST-passport rather than the anonymous account, then selects a physical or virtual HPC resource and a local account from the downward-mapped accounts on the target resource, and accesses that resource through the web command-line service provided by CLWP.
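As a minimal sketch of the input chain idea (the filters below are hypothetical illustrations, not CLWP's actual code), each filter in the chain may normalize, validate or reject a command before it is forwarded to the SSH side:

```python
# Sketch of a command-filter chain: each filter either returns a (possibly
# rewritten) command or raises FilterError to reject it.

class FilterError(Exception):
    """Raised when a filter rejects a command."""

def strip_filter(cmd: str) -> str:
    # Normalize surrounding whitespace.
    return cmd.strip()

def syntax_filter(cmd: str) -> str:
    # Minimal format check: an empty command is invalid.
    if not cmd:
        raise FilterError("empty command")
    return cmd

def security_filter(cmd: str) -> str:
    # Block commands that could damage the login node (illustrative list).
    forbidden = ("rm -rf /", "shutdown", "reboot")
    if any(cmd.startswith(bad) for bad in forbidden):
        raise FilterError(f"command rejected by security policy: {cmd!r}")
    return cmd

INPUT_CHAIN = [strip_filter, syntax_filter, security_filter]

def run_input_chain(cmd: str) -> str:
    """Pass a command through every filter; any filter may raise FilterError."""
    for f in INPUT_CHAIN:
        cmd = f(cmd)
    return cmd
```

Only commands that survive every filter would reach the SSH loop; the output chain could be structured the same way in reverse.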
        Speaker: Dr Rongqiang Cao (Computer Network Information Center, Chinese Academy of Sciences)
      • 22
        Platform Services in the National High-Performance Computing Environment of China
        Over the past 20 years, CNGrid, the national high-performance computing environment of China, has developed remarkably by every measure: computing and storage capacity, number of software packages and user accounts, supported projects and research papers, and more. As HPC systems are widely used to accelerate scientific experiments and simulations, the main goal of CNGrid is to provide computing resources and services to researchers across disciplines. HPC environments have several advantages over single supercomputers: a single supercomputer offers a limited range of software and sometimes must be shut down for maintenance, whereas an HPC environment composed of multiple systems can provide far more software covering different subjects and can continue to provide services even when one of its systems is offline. During the operation and maintenance of CNGrid, we found that only a small fraction of users can skilfully operate supercomputer systems, while the vast majority are very familiar with their research fields but know almost nothing about using HPC to solve their problems. To help these novice users, CNGrid has built a platform service that aggregates HPC resources from its infrastructure and application pools and provides easy access through both command-line consoles and web portals. Moreover, by providing APIs to developers, CNGrid can support communities and platforms for specific research subjects and gather additional resources to run simulation tasks for large scientific facilities. CNGrid now stands at the gate of the exascale era and faces new challenges. Two categories of questions must be answered by HPC environments, concerning the construction of the environment and the user services: - How to support HPC resource and data aggregation at exascale? 
- How to design the task scheduler to improve HPC resource usage and efficiency? - How to overcome the bottleneck of network transmission speed? - How to keep the environment working stably and securely? - How to give users easier access to HPC resources? - How to support communities and platforms across subjects in a unified way? - How to evaluate the resource level and the service level of the environment? We will present the considerations and efforts, past and ongoing, directed at these questions during the continued construction of CNGrid, and introduce some planned development directions for the next stage of HPC environment construction in China.
        Speaker: Prof. Xuebin Chi (Computer Network Information Center, Chinese Academy of Sciences)
    • 15:30
      Coffee Break
    • Network, Security, Infrastructure & Operations Session: II Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

      Convener: Dr David Kelsey (STFC-RAL)
      • 23
        Working towards functional OIDCfed
        The way we in Research and Education do identity federation, especially in collaborations, differs from the `standard' use case that companies have. After all, instead of one identity provider serving many services, we want many sources of identity linked to many services. For many years, SAML has successfully been coerced into adapting to this model, but times are changing, and OpenID Connect (OIDC) and OAuth are starting to enjoy more support from IT companies than SAML does. With the years of experience SAML has brought us, we as a community can implement federations in a way that should prevent issues we have encountered before. This is done in the OIDCfed specification, led by Roland Hedberg. To contribute to this process, during the past two years we have black-box tested the OIDCfed reference implementation written in Python, and leveraged that experience to create a secondary (HSM-supporting) full implementation of a `Trust Anchor'/Metadata Signing Service (MDSS) based solely on the specification, focusing especially on ease of use and maintainability. At the time of writing this abstract, we are working on a library that supports identity sources and services in their use of the new specification. This library, too, will be a fully compliant implementation of the relevant parts of the specification, based on sustainable technology. In this talk, we would like to share our experiences working on the OIDCfed specification and its implementations, and explore the community's needs with regard to this new standard.
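The trust-chain resolution at the heart of OIDCfed can be illustrated with a toy sketch. It substitutes HMAC signatures for the JOSE/JWT signatures the specification actually uses, and all entities, statements and keys below are invented:

```python
import hashlib
import hmac
import json

# Toy model of walking a trust chain from a leaf entity up to a trust anchor:
# each statement is signed by the entity above it in the chain.

def sign(statement: dict, key: bytes) -> str:
    payload = json.dumps(statement, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(statement: dict, sig: str, key: bytes) -> bool:
    return hmac.compare_digest(sign(statement, key), sig)

# In a real federation these would be asymmetric key pairs published in
# entity metadata; here they are shared secrets for illustration only.
anchor_key = b"trust-anchor-secret"
intermediate_key = b"intermediate-secret"

# The anchor vouches for the intermediate; the intermediate vouches for the leaf.
stmt_about_intermediate = {"sub": "https://intermediate.example", "key": "intermediate"}
stmt_about_leaf = {"sub": "https://op.example", "metadata": {"issuer": "https://op.example"}}

chain = [
    (stmt_about_leaf, sign(stmt_about_leaf, intermediate_key), intermediate_key),
    (stmt_about_intermediate, sign(stmt_about_intermediate, anchor_key), anchor_key),
]

def chain_is_valid(chain) -> bool:
    # Every link must verify with the key vouched for by its superior; the
    # final link must verify against the locally configured trust anchor key.
    return all(verify(stmt, sig, key) for stmt, sig, key in chain)
```

A relying party trusts the leaf's metadata only if the whole chain verifies back to an anchor it has configured out of band.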
        Speaker: Jouke Roorda (Nikhef)
      • 24
        Making Identity Assurance and Authentication Strength Work for Federated Infrastructures
        In both higher Research and Education (R&E) and research/e-infrastructures (in short: infrastructures), federated access and single sign-on through national federations (operated in most cases by NRENs) give users access to a variety of services. Whereas in national federations institutional accounts (e.g. provided by a university) are typically used to access services, many infrastructures also accept other sources of identity: `community identity providers', social identity providers, or governmental IDs. Hence the quality of a user identity, for example with regard to identity proofing, enrollment and authentication, may differ, which affects the service provider's risk perception and thus its authorization decision. To communicate qualitative information about identity vetting and the strength of the authentication tokens used between identity providers and service providers, assurance information is exchanged, with strength expressed as different Levels of Assurance (LoA) or `assurance profiles' that combine the various elements in community-specific ways. While assurance frameworks such as NIST 800-63-3 or the Kantara IAF are established in the commercial sector, they are often considered too heavy, with strict requirements not appropriate for the risks encountered in the R&E community; a more lightweight solution is therefore needed in the R&E space. The REFEDS Assurance Suite comprises orthogonal components on identity assurance (the REFEDS Assurance Framework, RAF) and authentication assurance (the Single Factor Authentication Profile and the Multi Factor Authentication Profile) and provides profiles for low- and high-risk use cases. 
The Suite is applicable in many scenarios, such as identity interfederations (cross-national collaborations) or the exchange of assurance information between identity providers and Infrastructure Proxies (per the AARC Blueprint Architecture). This presentation serves as guidance on how assurance values can be assessed and implemented with standard products (e.g. Shibboleth IdP, SimpleSAMLphp, SaToSa), and how that enables new use cases in research infrastructures. The talk starts with a short overview of existing assurance frameworks, showing the relationships and dependencies between commercial frameworks such as NIST 800-63 and Kantara and the standards introduced in the R&E sector. Use cases of the REFEDS Assurance Suite will then show how the REFEDS specifications can be used to exchange identity and authentication assurance in cross-collaborative scenarios. The focus of the talk is guidance for operators to facilitate the adoption of assurance exchange, including proxy scenarios where assurance elements may arise from different sources.
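A minimal sketch of how a service provider might consume the Suite's values follows. The authorization policy is an invented example; the URIs are the published REFEDS identifiers:

```python
# Sketch of an SP-side authorization check combining REFEDS RAF identity
# assurance values (eduPersonAssurance) with the REFEDS MFA authentication
# context. The "high-risk requires Cappuccino + MFA" policy is illustrative.

RAF = "https://refeds.org/assurance"
REFEDS_MFA = "https://refeds.org/profile/mfa"

def allow_high_risk(assurance_values, authn_context) -> bool:
    """High-risk service: require the Cappuccino profile and MFA."""
    return (f"{RAF}/profile/cappuccino" in assurance_values
            and authn_context == REFEDS_MFA)

# Attributes as they might arrive in a SAML assertion or OIDC claims.
user = {
    "eduPersonAssurance": [
        f"{RAF}/ID/unique",
        f"{RAF}/IAP/medium",
        f"{RAF}/profile/cappuccino",
    ],
    "authnContextClassRef": REFEDS_MFA,
}
```

Because the identity and authentication components are orthogonal, an SP can tighten one without touching the other, e.g. accept single-factor logins for low-risk views of the same service.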
        Speaker: Mrs Jule A. Ziegler (Leibniz Supercomputing Centre)
      • 25
        Federated Identity Management Plans at the SDCC
        The evolving mission of the Scientific Data & Computing Center (SDCC) at BNL has grown to include support for nuclear physics (RHIC), high-energy physics (ATLAS, Belle II and DUNE) and photon sciences (NSLS-II and CFN) among our major stakeholders. The diverse needs of the SDCC user community have guided our efforts to integrate federated identity services within our facility fabric and enable access to local and remote resources. In this presentation, we describe the status of our activities, the challenges faced in enabling a federated identity management system, and the status of our newly federated tools and services.
        Speaker: Dr Jason Smith (Brookhaven National Laboratory)
      • 26
        UK Experiences with token based AAI
        Over the last two years the UK's IRIS collaboration and IRIS 4x4 project have worked to deploy hardware and federating tools across the range of physics supported by STFC. Providing a coherent framework for accessing HTC, HPC and OpenStack cloud resources, the IRIS IAM service offers federated access to resources based on the AARC Blueprint Architecture, removing friction for scientific communities and promising to facilitate a new generation of workflows across diverse resources.
        Speaker: Mr Ian Collier (STFC-RAL)
    • Physics & Engineering: II Conference Room 1

      Conference Room 1

      BHSS, Academia Sinica

      Convener: Dr Andrea Valassi (CERN)
      • 27
        Event classification using Graph Network
        Since the Higgs boson discovery, the main interest in elementary particle physics has been the discovery of physics beyond the Standard Model. The LHC, the most energetic collider in the world, continues to be the leading experiment at the energy frontier. Although the LHC will not raise its centre-of-mass energy beyond 14 TeV for a few decades, the amount of collision data will increase significantly with the upgrade to the High-Luminosity LHC, so we need to exploit the observed data to the fullest extent. One of the most attractive ways to use the observed collision data is deep learning. Deep learning can represent complex correlations between input variables and is known to achieve better sensitivity than traditional analysis methods in some real examples. Although deep learning can represent very complicated functions, this representational power can lead to overfitting of the training data. To use deep learning more effectively, we need to embed our domain knowledge in the model as an inductive bias, and a Graph Network can meet this requirement. A Graph Network operates on a graph of nodes and edges; with a graph structure we can assign a grouped element to a node and a known relation between elements to an edge. Additionally, a graph structure can produce more interpretable results than the typical flat, one-row representation. We apply a Graph Network to an event classification problem to embed domain knowledge as an inductive bias in the model. This inductive bias is expected to prevent unphysical combinations of the input variables inside the neural network. We expect to avoid overfitting and achieve better classification performance than a traditional multilayer perceptron, particularly when the amount of training data is insufficient. 
We construct a Graph Network model for event classification on a typical physics process and compare its performance with traditional analysis methods. It is also important to respect known constraints in the experimental data: experimental data usually exhibit symmetries, e.g. invariance under spatial rotation. A generic deep learning model does not account for such rules, which results in overfitting; by encoding such known rules we can improve the performance of deep learning. We will report the latest results and the issues encountered in implementing this high-level domain knowledge.
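One round of message passing over such a graph can be sketched as follows. This is a minimal plain-Python illustration of the mechanism, not the authors' model:

```python
# One message-passing step of a graph network: each node aggregates its
# neighbours' features along directed edges and updates its own features.

def message_passing_step(node_feats, edges):
    """node_feats: {node: [float, ...]}; edges: list of (src, dst) pairs."""
    # Collect messages flowing along each edge.
    incoming = {n: [] for n in node_feats}
    for src, dst in edges:
        incoming[dst].append(node_feats[src])
    # Update rule: add the mean of incoming messages to the node's features.
    updated = {}
    for n, feats in node_feats.items():
        dim = len(feats)
        if incoming[n]:
            mean = [sum(m[i] for m in incoming[n]) / len(incoming[n])
                    for i in range(dim)]
        else:
            mean = [0.0] * dim
        updated[n] = [f + m for f, m in zip(feats, mean)]
    return updated
```

In the physics setting, nodes might be reconstructed objects such as jets or leptons and edges the known relations between them; stacking several such steps (with learned functions in place of the fixed mean-and-add rule) gives the trainable Graph Network.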
        Speaker: Dr Masahiko Saito (University of Tokyo)
      • 28
        Managing Heterogeneous Compute Resources through Harvester and iDDS for ATLAS
        The ATLAS experiment has been making significant efforts to integrate new resources, such as HPC and preemptible cloud resources, alongside traditional grid resources to meet continuously increasing data-processing needs. It is difficult to exploit all resources optimally, since their intrinsic nature and requirements differ considerably. Harvester has been developed around a modular structure that separates core functions from resource-specific plugins, simplifying operation with heterogeneous resources and providing a uniform monitoring view; this enables more intelligent workload management and dynamic resource provisioning based on detailed knowledge of resource capabilities and their real-time state. Harvester has been in production with various resources since early 2018, one of the crucial milestones for LHC Run 3. The Intelligent Data Delivery Service (iDDS) is an experiment-agnostic service that orchestrates the workload management and data management systems to transform and deliver data so that clients consume it just in time. iDDS is being actively developed by IRIS-HEP and ATLAS; one of its main goals is to address performance issues and suboptimal resource usage in ATLAS workflows. This talk will give an architecture overview of Harvester and iDDS and report on the migration of the entire ATLAS grid to Harvester, a demonstration of scalable resource management with Kubernetes plugins, the seamless integration of US and European HPCs, and plans for the future.
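The core/plugin split described above can be illustrated with a sketch like the following; the class and method names are hypothetical, not Harvester's real API:

```python
# Resource-agnostic core delegating job submission to per-resource plugins,
# so adding a new resource type means adding a plugin, not changing the core.

class SubmitterPlugin:
    def submit(self, job: dict) -> str:
        raise NotImplementedError

class GridSubmitter(SubmitterPlugin):
    def submit(self, job):
        # A real plugin would talk to a grid CE; here we return a fake handle.
        return f"grid:{job['name']}"

class KubernetesSubmitter(SubmitterPlugin):
    def submit(self, job):
        # A real plugin would create a pod/job object via the k8s API.
        return f"k8s:{job['name']}"

PLUGINS = {"grid": GridSubmitter(), "kubernetes": KubernetesSubmitter()}

def harvest(job: dict, resource_type: str) -> str:
    """Core logic: pick the plugin for the resource and submit uniformly."""
    return PLUGINS[resource_type].submit(job)
```

The uniform return value is what lets a single monitoring view cover grids, clouds and HPCs alike.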
        Speaker: Mr Fa Hui Lin (UTA)
      • 29
        A comparative analysis of machine learning techniques for prediction of software change: a HEP software case study (Remote Presentation)
        The objective of this work is to assess various machine learning techniques for predicting the change proneness of software modules [1, 2] used in the High Energy Physics (HEP) domain. This type of prediction identifies the modules with a higher probability of modification across software versions; together with other software quality assessment tools, it can help developers throughout the software life cycle by directing their attention to the modules that need it most. The present study measures a set of software metrics for HEP software, notably Geant4 [3] and ROOT [4], using appropriate software metrics tools, both open source and proprietary. Based on the existing documentation, changes made to the code during the development and maintenance phases (such as improvements, bug fixes and warnings) have been traced and codified, and the resulting datasets have been used to identify software metrics suitable for predicting the change proneness of software modules. In our previous studies [4, 5], work was performed on unlabelled software metrics datasets whose values belong to (HEP and non-HEP) software modules. These modules lacked certain information (such as their defectiveness) that can help developers assess their quality, mainly because the metrics were measured on ongoing software projects with only partial knowledge of software changes. Our previous work therefore focused on predicting module defectiveness with unsupervised machine learning techniques. In the present work, we investigate code changes by applying both supervised and semi-supervised machine learning techniques. 
Applying machine learning techniques to flag problems in HEP software is an open area of research: software metrics datasets are either incomplete or absent. To tackle this problem, a new dictionary of software changes has been defined by leveraging the collected information on software changes. The dictionary is essential in the classification phase of the various software modules: terms such as warning, fixed bug, minor fix and optimization have been suitably codified and used to label each module according to the corresponding types of changes (e.g., one module may include minor-fix and optimization changes related to inappropriate development, another warning and performance changes). The software changes dictionary is derived from the available documentation of the HEP software considered. This work will later be extended to generalize our findings to non-HEP software. Finally, the relationship between software metrics and code changes has been investigated by applying existing machine learning techniques, compared in terms of performance indicators such as the F-measure and Kappa statistics. Our labelling approach can easily be automated and used by software scientists to monitor their software over time and identify the modules that require particular attention. [1] F. Khomh, M. Di Penta, Y.-G. Guéhéneuc, and G. Antoniol, 2012. “An exploratory study of the impact of antipatterns on class change- and fault-proneness”, Empirical Softw. Engg. 17, 3, 243–275, https://doi.org/10.1007/s10664-011-9171-y [2] Q. D. Soetens, S. Demeyer, A. Zaidman, and J. Perez, 2016. “Change-based test selection: An empirical evaluation”, Empirical Softw. Engg. 21, 5, 1990–2032, https://doi.org/10.1007/s10664-015-9405-5 [3] E. Ronchieri, M. G. Pia, T. Basaglia, and M. Canaparo, 2016. “Geant4 Maintainability Assessed with Respect to Software Engineering References”, In Proc. 
of IEEE NSS/MIC/RTSD 2016, https://doi.org/10.1109/NSSMIC.2016.8069636 [4] E. Ronchieri, M. Canaparo, D. C. Duma, and A. Costantini, 2019. “Data mining techniques for software quality prediction: a comparative study”, In Proc. of IEEE NSS MIC 2018, https://doi.org/10.1109/NSSMIC.2018.8824313 [5] M. Canaparo, and E. Ronchieri, 2019. “Data mining techniques for software quality prediction in open source software: an initial assessment”, In Proc. of CHEP 2018, EPJ Web of Conferences 214, 05007, https://doi.org/10.1051/epjconf/201921405007 [6] E. Ronchieri, M. Canaparo, and D. Salomoni, Machine Learning Techniques for Software Analysis of Unlabelled Program Modules, under review for Proc. of ISGC 2019
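The dictionary-driven labelling step might look like the following sketch. The terms follow the examples in the abstract; the helper function and the log entries are invented:

```python
# Map a module's recorded change entries to labels via a change dictionary.
# Labelled modules can then feed supervised or semi-supervised learners.

CHANGE_DICTIONARY = {
    "warning": "warning",
    "fixed bug": "bug_fix",
    "minor fix": "minor_fix",
    "optimization": "optimization",
}

def label_module(change_log):
    """Return the set of change labels found in a module's change entries."""
    labels = set()
    for entry in change_log:
        text = entry.lower()
        for term, label in CHANGE_DICTIONARY.items():
            if term in text:
                labels.add(label)
    return labels
```

Because the matching is purely textual, the step is trivially automated and can be rerun as new changes accumulate across software versions.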
        Speaker: Dr Elisabetta Ronchieri (INFN CNAF)
    • Soundscape Session Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • 19:00
      Welcome Reception
    • Keynote Session: II Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 30
        TBA
        Speaker: Dr Ian Foster (ANL)
      • 31
        Recovery Monitoring of Tsunami Damaged Area using Remote Sensing
        On March 11, 2011, the Great East Japan Earthquake struck the Tohoku Region of Japan. A huge area along the northeast coast was seriously damaged by the magnitude 9.0 earthquake and the subsequent tsunami. Late that year, the author set up a project to monitor the recovery of the tsunami-damaged area of Miyagi Prefecture through ground surveys and satellite image analysis; environmental education is another important aspect of the project. The project is funded by the Japan Society for the Promotion of Science (JSPS). The first term ran from 2012 to 2016, and we are now in the second term, which will last until 2021. In this project, we have been monitoring the recovery of various damaged areas of Miyagi Prefecture by visiting them twice a year and comparing multi-temporal satellite images, and we involve students for environmental education. In my talk, onsite photos and satellite images that reflect the dramatic recovery of the areas will be presented. Time series of satellite images show how the seriously damaged paddy fields near the mouth of the Kitakami River recovered rapidly through landfilling, and multi-temporal analysis of MODIS NDVI proved very useful for evaluating the recovery of paddy fields. We cannot stop disasters; they come suddenly. However, with proper preparation, we may minimize the damage they cause. Several ways to use remote sensing technology for rapid disaster monitoring will also be proposed.
        Speaker: Prof. Kohei Cho (Tokai University)
    • 10:30
      Coffee Break
    • GDB Meeting Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • Humanities, Arts & Social Sciences Session Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

      • 32
        Federated Data, Cluster Mapping, and Social Network Analysis: Indiana Data Partnership and the Opioid Crisis
        In 2018, the Polis Center at Indiana University-Purdue University Indianapolis joined two other Indiana University centers in a partnership with the State of Indiana’s Management Performance Hub to link a wide array of governmental and non-profit data and address problems confronting the state. The Indiana Data Partnership (IDP) aims to establish an improved technology platform for multi-party external data sharing and matching, using cluster network analysis, social network analysis and other advanced analytic methods to tackle these key social challenges. The intent of network analysis is to understand a community by mapping the relationships that connect its members as a network, and then to pinpoint key individuals or groups within the network and the associations among them. The IDP is piloting work to show how network analysis of existing data held by the partners can help policy makers and the implementers of policy understand the ties that bind organizations. Two distinct issues for Indiana are being tackled by the IDP: the opioid crisis and the alignment of workforce and education. A primary goal is to improve collaboration among service providers and to identify the key network nodes that can enhance the efficacy of proposed actions. The presentation will outline the development of the partnership, the governance structure, the requirements for the integrated data platform and the resulting system schema. It will use the opioid test case to illustrate the work being done by IDP and discuss how the partnership will expand to include other data providers and analysts statewide, bringing added value to Indiana’s efforts to address its social problems more effectively.
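The kind of centrality computation used to pinpoint key nodes can be sketched as follows; the organizations and ties are invented for illustration:

```python
from collections import defaultdict

# Degree centrality over an undirected network of service providers:
# the fraction of the other nodes each node is directly tied to.

edges = [
    ("clinic_a", "county_health"), ("clinic_b", "county_health"),
    ("shelter", "county_health"), ("clinic_a", "shelter"),
]

def degree_centrality(edges):
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    # Normalize by the maximum possible degree (n - 1).
    return {node: d / (n - 1) for node, d in degree.items()}

central = degree_centrality(edges)
key_node = max(central, key=central.get)   # the best-connected organization
```

On real referral or data-sharing ties, a node like the county health department scoring highest would mark it as a leverage point for coordinating opioid-crisis interventions.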
        Speaker: Dr David J. Bodenhamer (Indiana University-Purdue University)
      • 33
        Service Design for Developing the User-Centered Smart City by Applying Real-World Spatial Interaction Data to Generative Adversarial Network
        This paper presents the results of applying service design to develop a user-centered smart city through the integration of spatial interaction data and a generative adversarial network. Research shows that "smart cities are built on technology, focused on the outcome": the foundation of a smart city depends on the technology used to build it, but its main priority is to provide value to the general public. Technological advances tend to have a domino effect on the development of a smart city, so it is essential to explore what the rise of smart cities means for people, the most important users of the city. Hence, this research introduces a user-centered approach to understand what a smart city means in a world where technology has been lifted out of our monitors and into the physical world that users experience day to day. The implementation of this research is divided into three phases. It begins with a service design process and analysis to outline a plan for developing a user-centered smart city. The second phase implements several field observations and surveys in the case-study area to collect the necessary information about the space and the spatial interaction data of its users as the primary dataset for user-centered smart city development. Notably, we record the commercial activities of the space and the tracks of different types of users, based on the persona analysis proposed in the service design phase. Two main issues arise in this phase: first, the relatively high cost of collecting real user and field survey data; second, the simulation analysis is limited by the spatial conditions and characteristics of the research area. 
Thus, this research proposes an interdisciplinary structure to extract diverse features of city space, to collect and survey the spatial interaction data of different user groups, and then to use a generative adversarial network (GAN), a machine learning algorithm, to run integrated simulation and analysis of users' spatial interaction with the space. The final phase uses data-driven storytelling to explore and explain users' behavior in the space, so that a user-centered smart city can be developed by knowing, simulating and, in the future, predicting user behavior patterns. The results show that after applying 45 sets of real-world spatial interaction data to the GAN, the best simulation of users' spatial interaction with the case-study space was obtained after 5,000 training iterations. By comparing and overlaying the simulated results with the real-world data collected in the field survey, 450 sets of simulated data proved feasible for future use in developing the user-centered smart city. This research contributes to the integration of real-world field survey data with a GAN, a deep learning approach, to generate additional simulated spatial interaction data for explaining and exploring users' behavior in space. Future studies will apply the proposed structure to other case-study areas to evaluate the desirability, feasibility and viability of applying artificial intelligence, machine learning, data science and data visualization to the development of the human-centered smart city.
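A highly simplified sketch of the two GAN components described above: a generator proposes synthetic spatial-interaction records and a discriminator scores how realistic they are. Both are toy linear models, and the unit-square "plaza", parameters and data below are all invented for illustration:

```python
import math
import random

random.seed(0)

def generator(z, w):
    # Map 1-D noise to an (x, y) visit position, clamped to the unit square.
    return (min(max(w[0] * z + w[1], 0.0), 1.0),
            min(max(w[2] * z + w[3], 0.0), 1.0))

def discriminator(point, v):
    # Logistic score: estimated probability that the point is a real record.
    s = v[0] * point[0] + v[1] * point[1] + v[2]
    return 1.0 / (1.0 + math.exp(-s))

w = [0.5, 0.25, -0.5, 0.75]   # generator parameters (arbitrary start)
v = [1.0, -1.0, 0.0]          # discriminator parameters (arbitrary start)

# Mirror the paper's 45 sets of interaction data with 45 synthetic samples.
fake = [generator(random.uniform(-1.0, 1.0), w) for _ in range(45)]
scores = [discriminator(p, v) for p in fake]
```

In actual training the two parameter sets are updated adversarially: the generator to raise the discriminator's scores on its samples, the discriminator to lower them on fakes and raise them on real field-survey records.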
        Speakers: Mr Kevin Kai-Wen Cheng (Department of Interaction Design, National Taipei University of Technology) , Dr Ryan Sheng-Ming Wang (Department of Interaction Design, National Taipei University of Technology)
    • Joint DMCC & UND Workshop: Deep Understanding Natural Disasters Conference Room 1

      Conference Room 1

      BHSS, Academia Sinica

      • 34
        EISCAT_3D: Extreme computing and network in the Arctic
        EISCAT, originally the European Incoherent Scatter Scientific Association, was established in 1975 as a collaboration between Norway, Sweden, Finland, the UK, Germany and France to develop an incoherent scatter radar for the northern auroral zone. EISCAT has been operational since 1981 and has grown into a globally used research infrastructure; the present members are Norway, Sweden, Finland, the UK, China and Japan. The existing EISCAT radars are single-beam systems with parabolic dish antennas. EISCAT has now started to construct the next-generation imaging incoherent scatter radar, EISCAT_3D: a system of distributed antenna arrays with fully digital signal processing that will enable comprehensive three-dimensional vector observations of the atmosphere and ionosphere above northern Fenno-Scandinavia. Using new technology based on the latest advances in digital signal processing, it aims to achieve ten times higher temporal and spatial resolution than present radars. The Nordic e-Infrastructure Collaboration (NeIC) facilitates the development and operation of high-quality e-infrastructure solutions in areas of joint Nordic interest; it is a distributed organization of technical experts from academic high-performance computing centres across the Nordic countries. The NeIC EISCAT_3D Data Solutions (E3DDS) project supports the EISCAT_3D project, cooperating with national e-infrastructure providers in the participating countries to simulate the data flows at the radar receive sites and from the sites to the central data storage and computing. Each of the three EISCAT_3D radar receiver sites will host 109 sub-arrays of 91 antennas, each sub-array with a first-stage beamformer based on FPGA technology that forms 10 wide-angle beams at two polarizations. 
Subsequently, the wide-angle beam data packets from each sub-array will be stored in a fast RAM buffer and combined into 100 narrow-angle beams by a second-stage beamformer at the site level. Data from each site will be sent to a storage buffer at a central site, where computing capacity connected to the buffer will combine the site data into spatially resolved data products. The operational modes of the EISCAT_3D radars may be affected or controlled by results produced at the central site. Quality-checked final data products will be sent to long-term storage at data centres and served to EISCAT_3D users for further analysis. The E3DDS project has drafted a proposal for a novel, extremely high-capacity wide-area network deployment to optimize the placement of online and offline computing facilities: concentrating the high-value computing resources will make it possible to solve the online and analysis problems, and rearranging the computing resources, as the improved WAN permits, will give greater capacity, improve usability and reduce operational costs. The E3DDS project is cooperating in the acquisition, installation and operation of a first- and second-stage beamformer test cluster consisting of an EISCAT_3D first-stage receive unit and a state-of-the-art test server. This test programme is expected to expand to a test cluster hosted by a national partner and connected to the Nordic WAN.
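The core operation the FPGA beamformers perform can be sketched as narrowband delay-and-sum beamforming. The 8-element linear array with half-wavelength spacing below is illustrative, not EISCAT_3D's actual 91-antenna sub-array geometry; 233 MHz is the EISCAT_3D operating band:

```python
import cmath
import math

C = 3.0e8            # speed of light, m/s
FREQ = 233.0e6       # carrier frequency, Hz
WAVELEN = C / FREQ   # wavelength, m

def steer(signals, positions, angle_rad):
    """Phase-align and sum complex element samples for a given look angle."""
    out = 0.0 + 0.0j
    for s, x in zip(signals, positions):
        phase = 2.0 * math.pi * x * math.sin(angle_rad) / WAVELEN
        out += s * cmath.exp(-1j * phase)
    return out / len(signals)

# Element positions: 8 antennas spaced half a wavelength apart.
positions = [i * WAVELEN / 2.0 for i in range(8)]

# Simulate a plane wave arriving from 10 degrees off boresight.
incoming = math.radians(10.0)
signals = [cmath.exp(1j * 2.0 * math.pi * x * math.sin(incoming) / WAVELEN)
           for x in positions]
```

Steering at the true arrival angle sums the elements coherently (normalized gain 1), while other look angles largely cancel; forming 10 wide-angle or 100 narrow-angle beams amounts to repeating this sum with 10 or 100 different steering angles on the same samples.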
        Speaker: Dr John White (NeIC)
      • 35
        Performance and Scalability Analysis of WRF on Multicore High Performance Computing Systems
        Numerical Weather Prediction (NWP) models are important tools for both research and operational meteorological forecasting, with applications in aviation, agriculture, air pollution modeling and more. The field faces the fundamental challenge of understanding how increasingly available computational power can improve the accuracy of climate-process modeling and the reliability of the NWP output itself. The Weather Research and Forecasting model (WRF) is one of the most commonly used NWP systems, specifically designed to run on a large variety of platforms, either serially or in parallel. WRF is widely adopted for weather prediction and atmospheric science for at least two main reasons: 1) its superior computational scalability and efficiency, and 2) its place in the latest generation of NWP systems, equipped with current developments in physics, numerical models and data assimilation. Performance benchmarking and analysis of WRF has been done under different environments to demonstrate the scalability of the code and its potential for higher-productivity forecasting. In this work, we analysed WRF performance and scalability on recent multicore High Performance Computing (HPC) systems using our own benchmarking configuration. In particular, we tested WRF on a tropical domain in Southeast Asia (SEA), dominated mainly by convective meteorological conditions. We first tested performance and scalability with a single-domain WRF configuration at different grid sizes (experiment E1), followed by a two-way nesting configuration (experiment E2). In this study we ran the code with both the Message Passing Interface (MPI), to exploit parallelism across node processors, and OpenMP, to exploit parallelism within each node processor, under different combinations of nodes, MPI tasks and threads. 
E1 results showed that the simulation speed decreased as the number of grid points in the domain increased, as expected, while increasing the number of nodes increased the simulation speed. In E1, using a total of 64 cores gave the best performance, with the highest speed on the 80x80 and 128x128 domains. E2 results showed optimum performance using 4 nodes, 8 MPI tasks per node and 2 threads per MPI task, which was slightly better than using only 2 nodes. Overall, computation contributed the most to the run time (89-99%) in both experiments, compared to input processing and output writing. The simulation speed of the nesting-configuration experiment (E2) was 100 times slower than the one-way, single-domain simulation (E1) when two-way nesting was applied for simultaneous simulation on 3 domains. This shows that the WRF domain configuration (one-way or two-way nesting) is an important factor for simulation speed, in addition to the computing-core configuration. In the future we plan to extend this work in several directions, performing extensive performance and scalability analysis for idealized WRF cases such as hurricane/typhoon simulation and meteorological forecasting, and running on different computing configurations, including GP-GPUs able to significantly accelerate the most compute-intensive kernel routines. This research has been done under the EU Erasmus+ Capacity Building TORUS project, which aims to increase computing capabilities and synergies in terms of hardware, software and methodologies between Europe and Southeast Asian countries.
        Speakers: Prof. Luca Tomassetti (University of Ferrara and INFN) , Prof. Sebastiano Fabio Schifano (University of Ferrara and INFN)
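The strong-scaling metrics discussed in E1 follow directly from measured wall-clock times. The sketch below uses hypothetical timings, not the WRF timings measured in this work, to show the standard speedup and parallel-efficiency calculation:

```python
# Strong-scaling bookkeeping of the kind used in experiment E1. The
# wall-clock times below are hypothetical placeholders, not the WRF
# timings measured in this work.

def scaling_table(times_by_cores):
    """Speedup and parallel efficiency relative to the smallest core count."""
    base_cores = min(times_by_cores)
    t_base = times_by_cores[base_cores]
    rows = []
    for cores in sorted(times_by_cores):
        speedup = t_base / times_by_cores[cores]
        efficiency = speedup * base_cores / cores
        rows.append((cores, times_by_cores[cores], speedup, efficiency))
    return rows

timings = {8: 1000.0, 16: 520.0, 32: 280.0, 64: 160.0}  # hypothetical seconds
for cores, t, s, e in scaling_table(timings):
    print(f"{cores:3d} cores: {t:7.1f} s  speedup {s:5.2f}  efficiency {e:.2f}")
```

Efficiency below 1.0 at high core counts is the usual signature of communication overhead dominating, which is what motivates comparing node/MPI-task/thread combinations.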
      • 36
        Air quality prediction of Ulaanbaatar using machine learning approach
        The capital city of Mongolia, Ulaanbaatar, is one of the most polluted cities in the world. The government is looking for a suitable solution to decrease air pollution, but so far none has worked out efficiently. Air pollution comprises various components: particulate matter (PM), gaseous pollutants such as ozone, nitrogen dioxide (NO2), carbon monoxide (CO) and sulfur dioxide (SO2), and other organic compounds. Ulaanbaatar has 15 air quality monitoring stations. According to monitoring data for Ulaanbaatar, PM10 in January 2019 was 5% higher than in 2018 and 2.5 times higher than the WHO air quality standard; the average amount of PM2.5 in January was 12% higher than in the same period of the previous year and 3.9 times higher than the standard; the average amount of nitrogen dioxide in January was 10% higher than in 2018 and 1.1 times higher than the standard; and the amount of sulfur dioxide in January 2019 was 1.2 times higher than the standard compared to the same period in 2018. Air quality is affected by the Ger district, vehicles, furnaces, industry, thermal power stations and other sources. Predicting and forecasting air quality is one of the most essential activities in a Smart City. Recently, many studies have used machine learning approaches to evaluate and predict air quality from big data. The aim of this study is to obtain a machine learning model for air quality forecasting using previous air quality station data and weather data. Air quality depends on multi-dimensional factors including location, time and weather parameters such as temperature, humidity, wind direction and force, air pressure, etc. Among the many machine learning approaches, the artificial neural network model, which simulates the structures and networks of the human brain, is convenient for finding relations among multiple parameters.
If the neural network can determine the relation of air quality to the weather and air quality data of previous years, it becomes possible to approximately predict the air quality of Ulaanbaatar. To build the neural network model we used input parameters including temperature, humidity, wind direction, air pressure, PM2.5, PM10, NO2, CO, SO2 and measurement time. We are currently working on the neural network model, and here we introduce our machine learning approach to predicting the air quality of Ulaanbaatar. In further work we will experiment with the neural network algorithm for air quality prediction and discuss the machine learning results.
        Speaker: Mrs Otgonsuvd Badrakh (Institute Mathematics and Digital Technology, MAS)
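As a rough illustration of the approach described, the sketch below trains a one-hidden-layer regression network with plain gradient descent. The data are synthetic stand-ins, not the Ulaanbaatar station measurements, and the layer sizes are arbitrary assumptions:

```python
import numpy as np

# A minimal one-hidden-layer regression network of the kind described in
# the abstract: multi-parameter inputs (weather and pollutant readings),
# one predicted value out. The data below are synthetic stand-ins, NOT
# the Ulaanbaatar station measurements.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # 10 features per sample
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=200)    # synthetic target, e.g. PM2.5

# small random initial weights
W1 = rng.normal(scale=0.1, size=(10, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=16);       b2 = 0.0

losses = []
lr = 0.05
for _ in range(300):
    h = np.tanh(X @ W1 + b1)                   # hidden layer
    pred = h @ W2 + b2                         # linear output
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # gradients of the mean-squared error (backpropagation)
    gW2 = h.T @ err / len(X); gb2 = err.mean()
    gh = np.outer(err, W2) * (1 - h ** 2)
    gW1 = X.T @ gh / len(X);  gb1 = gh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(f"MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A production model would of course use a framework rather than hand-written backpropagation; the point here is only the structure: multi-dimensional inputs, a nonlinear hidden layer, and a single forecast output.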
    • e-Science Activities in Asia Pacific Session: II Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 37
        eScience Activities in Japan
        Speaker: Prof. Kento Aida (National Institute of Informatics)
      • 38
        eScience Activities in Viet Nam
      • 39
        eScience Activities in Myanmar
      • 40
        eScience Activities in Bangladesh
      • 41
        Discussion
    • 12:30
      Lunch 4F Recreation Hall

      4F Recreation Hall

      BHSS, Academia Sinica

    • GDB Meeting Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • Joint DMCC & UND Workshop: Deep Understanding Natural Disasters Conference Room 1

      Conference Room 1

      BHSS, Academia Sinica

    • Network, Security, Infrastructure & Operations Session: III Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

      Convener: Prof. Kento Aida (National Institute of Informatics)
      • 42
        CaaS: Challenge as a Service - Modernizing the SSC framework
        For years, EGI has run Security Service Challenges (SSCs) to assess and maintain the readiness of incident response teams within its community. Nikhef, as part of this community, has been tasked with the role of Red Team, developing anything and everything needed to systematically monitor properties related to the challenge. This calls for a flexible software stack, enabling various payloads and forms of delivery. After all, the grid infrastructure is quite heterogeneous, while at the same time relatively inflexible; as a result, projects have introduced internal standardization, adding an additional axis to the available interfaces. With digital threats (cryptocoin mining, network attacks) becoming more and more prevalent, SSCs have become more popular, and we have noticed an increased demand for training by simulation, while at the same time noticing an increase in non-technical communication challenges. In order to increase our flexibility, and thus be able to help with more of these challenges, we decided to design a new modular SSC framework, integrating more than a decade of SSC experience with the newer technologies that are available. This development effort has been coupled with research into the deployment of mini-grids for exercises. The individual exercises we have run lost time due to the irregularity of the environments we encountered, and/or lost real-world value as they normally lacked the batch systems that are a vital part of the normal challenges. With a mini-grid under our control, we can debug submission issues more independently, integrate new deployment schemes more easily, and act as an infrastructure operations team during exercises.
        Speaker: Jouke Roorda (Nikhef)
      • 43
        Research on Lossless Data Center Network in IHEP
        The key to advancing cloud infrastructure to the next level is eliminating network loss, including packet loss, throughput loss and latency loss. The main cause of such loss is congestion. We will present recent study results on eliminating congestion in the data center network, and also discuss thoughts on deploying a lossless network for the future HEPS experiment.
        Speaker: SHAN ZENG (IHEP)
      • 44
        Integration of HPC resources for HTC workloads
        The Scientific Data & Computing Center (SDCC) has a growing portfolio of HPC resources in support of BNL research activities. These resources include CPU-only and GPU-equipped clusters inter-connected with low-latency network infrastructure. This presentation describes the provisioning of HPC resources to HTC workloads at the SDCC, through workspaces managed by JupyterHub, the integration of HTCondor and Slurm workload management systems and the addition of the Arc-CE Grid front-end resource management system.
        Speaker: Mr William Strecker-Kellogg (Brookhaven National Laboratory)
      • 45
        Application of OMAT in HTCONDOR resource management
        At the IHEP Computing Center there are thousands of nodes, with about 12,000 cores, managed by the HTCondor scheduler; these nodes provide computing services for multiple experimental groups. During job scheduling, some worker nodes cause jobs to fail due to service exceptions. Under the traditional scheduling method, these abnormal nodes continue to devour jobs, like "black holes", resulting in a large number of job errors and degrading the service quality of the computing cluster. In addition, in order to quickly locate unknown abnormal information in jobs or the operating environment, we often isolate some worker nodes for a specific experimental group; how to quickly isolate nodes and record the change history is also an urgent problem to be solved. OMAT, short for Open Operation Analysis Toolkits, has been applied to operational monitoring of the computing cluster since 2017, providing rapid acquisition of abnormal data, correlation analysis, alarm strategies, visualization, etc. In this report we will introduce how OMAT monitors the IHEP computing cluster and feeds node service status back to the HTCondor scheduler for rapid and convenient management of computing resources, thereby minimizing the effect of abnormal services on user jobs and improving the service quality of the computing cluster.
        Speaker: Mr Qingbao Hu (IHEP)
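The "black hole" behaviour described above can be caught with a simple failure-rate heuristic over recent job records. This is an illustrative sketch; the node names and thresholds are made up and do not reflect OMAT's actual rules:

```python
# A sketch of the "black hole" heuristic: a worker node that eats jobs
# and fails them quickly shows an abnormally high failure rate. The
# thresholds and node names below are illustrative assumptions, not
# OMAT's actual detection rules.

def find_black_holes(job_records, min_jobs=20, max_failure_rate=0.8):
    """job_records: iterable of (node, succeeded: bool) tuples."""
    stats = {}
    for node, ok in job_records:
        total, failed = stats.get(node, (0, 0))
        stats[node] = (total + 1, failed + (0 if ok else 1))
    # flag nodes with enough recent jobs and a failure rate over threshold
    return sorted(
        node for node, (total, failed) in stats.items()
        if total >= min_jobs and failed / total >= max_failure_rate
    )

records = [("wn01", False)] * 30 + [("wn02", True)] * 25 + [("wn02", False)] * 2
print(find_black_holes(records))  # -> ['wn01']
```

The minimum-job-count guard matters: a node with two jobs, both failed, is indistinguishable from bad luck, while thirty consecutive failures is the black-hole pattern worth isolating.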
    • 15:30
      Coffee Break
    • GDB Meeting Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • Joint DMCC & UND Workshop: Deep Understanding Natural Disasters Conference Room 1

      Conference Room 1

      BHSS, Academia Sinica

    • Network, Security, Infrastructure & Operations Session: IV Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

      • 46
        Abnormal Log Flow Detecting System in CNGrid
        Distributed systems have grown larger and larger since the concept appeared, and have evolved into environments containing heterogeneous components playing different roles, e.g. data centers and computing units. CNGrid is a good example of a large distributed environment: it is composed of 19 HPC clusters contributed by many research institutes and universities throughout China. The Computer Network Information Center of the Chinese Academy of Sciences is the operation and management center of CNGrid, responsible for keeping the environment running smoothly and efficiently. During maintenance work we found that logs generated by devices in the environment play a very important role in locating anomalies and system failures, and can help us trace the root causes of problems. In a previous report we proposed a general framework for log-based monitoring and diagnosis in CNGrid: different types of logs are gathered from hosts in the environment and analyzed in various ways, and diagnostic reports are produced, reflecting the running status of the whole environment. Among these analyses, the abnormal log flow analysis has been implemented as a complete anomaly detection system. The system assumes that, while the environment is running normally and stably, the number of logs of each classified log type per unit time stays at a nearly stable level. If, in a unit time, the logs of a certain type show a dramatic increase or decrease, it is highly possible that something has happened on the corresponding host, i.e. an anomaly, and the system maintainers should be notified to respond rapidly to any potential threat. The advantage of log flow analysis is that, whereas most log analyses are based on the content of logs, some anomalies cannot be captured by words alone.
For example, a single log reporting a failure to build a connection may be caused by a very short network delay, but a large number of such logs occurring continuously over a short time may be caused by a misconfiguration of the destination host or a power-off, which is a much more serious problem. The abnormal log flow detecting system can find anomalies like this well. In this report we will introduce the workings of the abnormal log flow detecting system in detail, including how log flow models are generated to classify log flow anomalies, and how the different modules are combined to build the system as a whole. We will also demonstrate the system running in practice, including the visualization work that makes the detection results much clearer for humans to understand. We believe that the workflow of this system can be adapted to many log analysis methods and systems.
        Speaker: Dr Yining Zhao (Computer Network Information Center, Chinese Academy of Sciences)
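The per-type flow check described above amounts to comparing the current window's count for each log type against its historical per-window counts. The sketch below uses a z-score threshold as one plausible "dramatic increase or decrease" criterion; the threshold and data are illustrative assumptions, not the CNGrid implementation:

```python
import statistics

# Sketch of a per-type log-flow check: compare the count of each log type
# in the current time window against its historical per-window counts and
# flag dramatic increases or decreases. The z-score threshold is an
# illustrative assumption, not the system's actual criterion.

def abnormal_types(history, current, z_threshold=3.0):
    """history: {log_type: [counts in past windows]}, current: {log_type: count}."""
    alerts = []
    for log_type, counts in history.items():
        mean = statistics.mean(counts)
        stdev = statistics.pstdev(counts) or 1.0   # avoid division by zero
        z = (current.get(log_type, 0) - mean) / stdev
        if abs(z) >= z_threshold:
            alerts.append((log_type, z))
    return alerts

history = {"conn_refused": [2, 3, 2, 4, 3], "login_ok": [50, 48, 52, 49, 51]}
current = {"conn_refused": 40, "login_ok": 50}
print(abnormal_types(history, current))  # only conn_refused is flagged
```

A burst of 40 "connection refused" lines against a baseline of 2-4 per window is exactly the content-blind signal the abstract argues for: each individual message is harmless, but the flow is not.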
      • 47
        Extracting and Structuring Real Log Files for Efficient Monitoring and Automated Computing Operations at a WLCG Tier
        The efforts to ensure smooth operations of WLCG computing centers towards the HL-LHC challenges are crucial to support all the research communities in the next decades. In this context, adding automation and intelligence to computing operations is a key factor in achieving real-time response to problems and minimizing the person-power needed in daily operations. In this paper, we focus on the case of the Italian Tier-1, a generic data center that yearly collects a multi-terabyte amount of log data from more than 1000 machines and services. We present a strategy to ingest unstructured log data from various sources and rearrange it in a structured and classified manner, enabling the automated extraction of useful information. This work is based on the premises that: 1) there is no prior knowledge about the log file structure or content; 2) each log file is a set of log messages of several types; and 3) each log message type is generated from a template. The template represents the fixed part of the log message and indicates the type of event that occurred, while the parameters constitute its variable part, giving details of a specific event; both provide useful information about the system status. To extract templates and parameters from log messages, an approach based on Decker Clusterization is proposed. This strategy relies on a dictionary of word frequencies as the centroid of each generated cluster, and a new log message is assigned to the cluster with the highest calculated similarity. For the similarity, the word frequency is used as a weight in a normalized weighted sum over the log message words present in the cluster's dictionary. The dictionary stores a list of words, their corresponding identifiers, and their frequencies in the relative cluster.
Once the clusterization is done, the log messages of each cluster are mapped to a numerical version, where each word is replaced by the corresponding identifier given by the dictionary. The parameters' extraction is achieved through a long-equal-sequence search strategy. The developed work furnishes a fully and directly applicable approach capable of extracting both the templates and the parameters of the log messages in a generic log file.
        Speaker: Leticia Decker de Sousa (Data Science and Computation, Università di Bologna (UNIBO))
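A minimal sketch of the frequency-dictionary clustering idea described above, with whitespace tokenization and an illustrative similarity threshold. This is a simplified illustration, not the actual Decker Clusterization implementation:

```python
# Sketch of frequency-dictionary clustering: each cluster keeps word counts
# as its centroid, and a message's similarity to a cluster is a normalized
# weighted sum over the message words found in the cluster's dictionary.
# Tokenization and the threshold are illustrative simplifications.

def similarity(words, vocab):
    if not words:
        return 0.0
    total = sum(vocab.values()) or 1
    return sum(vocab.get(w, 0) / total for w in words) / len(words)

def cluster_logs(messages, threshold=0.15):
    clusters = []  # each cluster: {"vocab": {word: count}, "messages": [...]}
    for msg in messages:
        words = msg.split()
        best, best_sim = None, 0.0
        for c in clusters:
            s = similarity(words, c["vocab"])
            if s > best_sim:
                best, best_sim = c, s
        if best is None or best_sim < threshold:
            best = {"vocab": {}, "messages": []}   # open a new cluster
            clusters.append(best)
        for w in words:
            best["vocab"][w] = best["vocab"].get(w, 0) + 1
        best["messages"].append(msg)
    return clusters

logs = [
    "connection to host10 failed",
    "connection to host22 failed",
    "user alice logged in",
    "user bob logged in",
]
clusters = cluster_logs(logs)
print(len(clusters))  # -> 2
```

The variable tokens (`host10`/`host22`, `alice`/`bob`) end up as low-frequency words within a cluster, which is what makes the subsequent template/parameter separation possible.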
      • 48
        A distributed network intrusion detection system for large science facilities in IHEP
        The Institute of High Energy Physics is operating and launching many large science facilities in China, such as BEPCII in Beijing, CSNS in Guangdong and JUNO in Shenzhen. These large science facilities face many network security threats, and how to detect and prevent these threats is becoming important. Considering that the network traffic of large science facilities has distinctive characteristics, and given the timeliness of different network attacks, a system with centralized traffic storage and distributed intrusion detection is proposed. The main task of the distributed part is to obtain data and detect in real time the threats that can immediately affect the network. This part uses distributed probes, which collect and lightly process data and transmit them to the central platform; the probes are also responsible for anomaly detection on these data, screening out abnormal parts such as DoS attacks, then marking them and transmitting them to the central platform for comprehensive analysis. The central part is the central storage and analysis platform, which is used to receive, store and mine the data. The design of the system framework has been finished, and machine learning techniques are being considered for the data analysis in the future.
        Speaker: Zhongtian Liang (IHEP)
    • 18:30
      PC Dinner
    • Keynote Session: III Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 49
        TBA
        Speaker: Prof. Daniele Bonacorsi (University of Bologna)
      • 50
        TBA
        Speaker: Dr Takashi Sasaki (KEK)
    • Special Talk Auditorium

      Auditorium

      BHSS, Academia Sinica

    • 11:00
      Coffee Break
    • Data Management Session: I Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 51
        The OSG Data Federation
        The Open Science Grid operates a global Data Federation for all of Open Science. As of Fall 2019, this includes 6 data origins for a variety of scientific domains and communities, and 12 caches. The caches are deployed inside the network backbone, and at or near various endpoints, i.e. compute clusters. A combination of technologies, including CVMFS for file metadata and authentication, and XRootd for data access of large data volumes, provide scientific communities like the International Gravitational-Wave Observatory Network, Dune, DES, Minerva, as well as individual bioinformatics researchers a platform akin to a content delivery network that looks like a read-only global file system to the end user scientist. In this talk, we describe this global infrastructure, how it is used, and how we use service containers and K8S as container orchestration system to support this global platform.
        Speaker: Prof. Frank Wuerthwein (UCSD/SDSC)
      • 52
        Russian National Data Lake Prototype
        The evolution of computing facilities and the way storage will be organized and consolidated will play a key role in how the possible shortage of resources will be addressed by the LHC experiments. The need for effective distributed data storage has been identified as fundamental from the beginning of the LHC, and this topic has become particularly vital in light of the preparation for the HL-LHC run. WLCG has started R&D within the DOMA project, and in this contribution we report recent results on the configuration and testing of Russian federated data storage systems. We describe different system configurations and various approaches to testing data storage federation. We consider the EOS and dCache storage systems as backbone software for data federation, and xCache for data caching. We also report on synthetic tests and on experiment-specific tests developed by ATLAS and ALICE for the federated storage prototype in Russia. The Data Lake project was launched in the Russian Federation in 2019 to set up a National Data Lake prototype for HENP and to consolidate geographically distributed data storage systems connected by a fast network with low latency; we report the project's objectives and status.
        Speaker: Mr Andrey Kiryanov (NRC "Kurchatov Institute")
      • 53
        The European Open Science Cloud (EOSC) Photon and Neutron Data Service ExPaNDS
        The ambition of the European HORIZON2020 project ExPaNDS (EOSC Photon and Neutron Data Services) is to enrich the European Open Science Cloud (EOSC) with data management services and to coordinate activities to enable national Photon and Neutron (PaN) Research Infrastructure (RIs) to make the majority of their data ‘open’ following FAIR principles (Findable, Accessible, Interoperable, Reusable) and to harmonise their efforts to make their data catalogues and data analysis services accessible through the EOSC, thereby enabling them to be shared in a uniform way. EOSC currently provides a range of services that needs to be adapted to the ever-increasing requirements of scientific experiments held at various PaN RIs. It is essential that these services become standardised, interoperable and integrated to fully exploit the scientific opportunities at PaN RIs. ExPaNDS therefore seeks to: Enable EOSC services and to provide coherent FAIR data services to the scientific users of PaN RIs; connect PaN RIs through a platform of data catalogues and analysis services through the EOSC for users from RIs, universities, industry etc.; gather feedback and cooperate with the EOSC governance bodies to improve the EOSC and develop standard relationships and interconnections between scientific publications, PaN scientific datasets, experimental reports, instruments and authors (via ORCID). Concretely ExPaNDS proposes to standardise and link all the relevant PaN RI catalogues to ensure that the user community has access to both the raw data they collect, which is linked to their research session at the various national RIs, and relevant peer review articles produced as a direct result of their usage. It is paramount that ExPaNDS develops a common ontology to fully integrate all the elements of the catalogues as well as a roadmap for the back-end architecture and functionalities. 
ExPaNDS also proposes to develop a powerful taxonomy strategy in line with the requirement of the EOSC user community. The proposed activity will feed into the OpenAIRE infrastructure integrating and linking entities from a wide range of scholarly resources.
        Speaker: Mr Patrick Fuhrmann (DESY)
      • 54
        Scientific data management at HEPS
        The High Energy Photon Source (HEPS) in Beijing is the first national light source of high-energy synchrotron radiation in China, and will be one of the world's brightest fourth-generation synchrotron radiation facilities. Data are undoubtedly of crucial importance for the scientific discoveries made in the experiments at HEPS. According to the estimated data rates, we predict that 30 PB of raw experimental data will be produced per month from 14 beamlines at the first stage of HEPS, and the data volume will be even greater once over 90 beamlines become available at the second stage in the near future. Therefore, successful data management is critically important for the present and future scientific productivity of HEPS. The data management system is responsible for automating the organization, transfer, storage and distribution of the data collected at HEPS. First, this paper explains the main features of the scientific data acquired from all the beamlines and the possible problems in data management and data sharing. Second, the architecture and data flow of the HEPS data management system are described from the perspective of facility users and IT. Furthermore, the key techniques implemented in the system are introduced. Finally, the progress and results of the data management system deployed as a pilot test at BSRF are given.
        Speaker: Ms Hao Hu (Institute of High Energy Physics)
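Taking the quoted first-stage figures at face value, the implied sustained data rate can be checked with simple arithmetic:

```python
# Back-of-the-envelope check of the first-stage figure quoted above:
# 30 PB of raw data per month from 14 beamlines, taken at face value
# and averaged over continuous running.

PB = 10 ** 15
month_seconds = 30 * 24 * 3600            # ~one month of continuous running

total_rate = 30 * PB / month_seconds      # aggregate bytes per second
per_beamline = total_rate / 14

print(f"aggregate: {total_rate / 1e9:.1f} GB/s")
print(f"per beamline: {per_beamline / 1e9:.2f} GB/s")
```

Roughly 11.6 GB/s sustained in aggregate, or about 0.83 GB/s per beamline on average; peak rates during acquisition would be higher, which is why automated transfer and storage management is central to the system.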
    • ECAI Workshop: New Tools for Humanity Data Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • Joint DMCC & UND Workshop: Deep Understanding Natural Disasters Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

    • 13:00
      Lunch
    • Converging High Performance infrastructures: Supercomputers, clouds, accelerators Session: I Conference Room 1

      Conference Room 1

      BHSS, Academia Sinica

      Convener: Prof. Kento Aida (National Institute of Informatics)
      • 55
        HPC-Cloud-Big Data Convergent Architectures and Research Data Management: The LEXIS Approach
        The LEXIS project (Large-scale EXecution for Industry & Society, H2020 GA 825532) provides a platform for optimized execution of Cloud-HPC workflows, reducing computation time and increasing energy efficiency. The system will rely on advanced, distributed orchestration solutions (Bull Ystia Orchestrator, based on TOSCA and Alien4Cloud technologies), the High-End Application Execution Middleware HEAppE, and new hardware capabilities for maximizing efficiency in data processing, analysis and transfer (e.g. Burst Buffers with GPU- and FPGA-based data reprocessing). LEXIS handles computation tasks and data from three Pilots, based on representative and demanding HPC/Cloud-Computing use cases in Industry (SMEs) and Science: i) Simulations of complex turbomachinery and gearbox systems in Aeronautics, ii) Tsunami simulations and earthquake loss assessments which are time-constrained to enable immediate warnings and to support well-informed decisions, and iii) Weather and Climate simulations where massive amounts of in-situ data are assimilated to improve forecasts. A user-friendly LEXIS web portal, as a unique entry point, will provide access to data as well as workflow-handling and remote visualization functionality. As part of its back-end, LEXIS builds an elaborate system for the handling of input, intermediate and result data. At its core, a Distributed Data Infrastructure (DDI) ensures the availability of LEXIS data at all participating HPC sites, which will be federated with a common LEXIS AAI (with unified security model, user database and authorization policy). The DDI leverages best-of-breed data-management solutions from EUDAT, such as B2SAFE (based on iRODS) and B2HANDLE. REST APIs on top of it will ensure a smooth interaction with LEXIS workflows and the orchestration layer. Last, but not least, the DDI will provide functionalities for Research Data Management following the FAIR principles ("Findable, Accessible, Interoperable, Reusable"), e.g.
DOI acquisition, which helps to publish and disseminate open data products.
        Speaker: Dr Stephan Hachinger (Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities)
      • 56
        Experience running Engineering Applications at CERN on HPC clusters in HTCondor and SLURM
        The CERN IT department has been running two Linux-based computing infrastructures, HTCondor and SLURM, for many years. HTCondor resources are used for general-purpose, parallel but single-node jobs, providing computing power to the CERN experiments and departments for tasks such as physics event reconstruction, data analysis and simulation. For HPC workloads that require multi-node parallel environments for MPI programs, there is a dedicated HPC service with MPI clusters running under the SLURM batch system and dedicated hardware with fast interconnects. Engineering users at CERN need to run critical simulations in very different domains, using applications like CST, COMSOL and Ansys. These simulations are very demanding in terms of computing power and storage. In the past, a dedicated Windows-based HPC cluster ran for five years; however, this Windows HPC infrastructure was decommissioned in 2019 to consolidate all computing resources under Linux. Since mid-2018, engineering users at CERN have been migrated to run their simulations on the HTCondor and SLURM clusters. The change of infrastructure implied technical and human challenges, such as the lack of Linux expertise among engineering teams, the lack of application-specific knowledge on the IT side, and the fact that HTCondor and SLURM are not supported by CST, COMSOL or Ansys. After a successful migration of CST, COMSOL and Ansys to Linux, the challenge has shifted to running the simulations in the most optimized way, to make the most of the available computing resources. Some of the tasks where the IT team has worked in close collaboration with the engineers are: fine-tuning the applications to reduce I/O access, understanding how to calculate the maximum number of cores that still gains processing time, integrating the Windows GUI interface to submit jobs to Linux, and learning how to debug problems.
In this contribution we describe how we have dealt with all these challenges to offer a production computing infrastructure that meets the engineering users' needs.
        Speaker: Dr Pablo Llopis Sanmillan (CERN)
      • 57
        GuangMu Cloud: An Example of Data-centric and Converging infrastructure
        CASEarth is a strategic priority research program of the Chinese Academy of Sciences (CAS), designed to achieve substantial breakthroughs for Big Earth Data, an important subset of big data dealing with the sciences of the atmosphere, land and oceans. CASEarth is becoming a new frontier contributing to the advancement of Earth science and to significant scientific discoveries. GuangMu Cloud is designed to support CASEarth, covering all requirements from data collecting to data sharing, from fast data processing to large-scale parallel computing, and from traditional big data tools to the latest AI frameworks. GuangMu Cloud is thus by design an HPC, big data and AI converging infrastructure. This presentation will include some details, such as difficulties and trade-offs, of how GuangMu Cloud was designed to support 10K-core MPI jobs and 10K VMs, with a capability of 2 PFlops and 50 PB. [1] GUO Huadong (2019) CASEarth: A Big Earth Data Engineering Program, Keynotes CODATA 2019, Beijing. [2] Xuebin CHI (2019) Building Infrastructure for Big Earth Data and Cloud Services, International Symposium on Grids & Clouds 2019 (ISGC 2019), Taipei. [3] Huadong Guo (2017) Big Earth data: A new frontier in Earth and information sciences, Big Earth Data, 1:1-2, 4-20, DOI: 10.1080/20964471.2017.1403062.
        Speaker: Mr Haili XIAO (Supercomputing Center, Chinese Academy of Sciences)
    • ECAI Workshop: New Tools for Humanity Data Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • Joint DMCC & UND Workshop: Deep Understanding Natural Disasters Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

    • Network, Security, Infrastructure & Operations Session: V Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 58
        IPv6-only networking for High Energy Physics
        The use of IPv6 on the general internet continues to grow. Several Broadband/Mobile-phone companies, such as T-Mobile in the USA and BT/EE in the UK, now use IPv6-only networking with connectivity to the IPv4 legacy world enabled by the use of NAT64/DNS64/464XLAT. Large companies, such as Facebook, use IPv6-only networking within their internal networks, there being good management and performance reasons for this. The transition of WLCG central and storage services to dual-stack IPv4/IPv6 is progressing well, thus enabling the use of IPv6-only CPU resources as agreed by the WLCG Management Board. The use of dual-stack services is, however, a complex environment not only to configure and manage but also for trouble-shooting during observation of network performance and operational problems. It is time for WLCG to consider when and where it can move to the much simpler environment of IPv6-only networking. The HEPiX IPv6 working group has been encouraging and supporting the WLCG transition to IPv6 over many years. We last reported on our work to an ISGC conference in 2015. During 2019, the HEPiX IPv6 working group has not only been chasing and supporting the transition to dual-stack storage services, but has also been encouraging network monitoring providers to allow for filtering of plots by the IP protocol used. We have investigated and fixed the reasons for the use of IPv4 between two dual-stack endpoints when IPv6 should be preferred. We present this work and the tests that have been made of IPv6-only CPU showing the successful use of IPv6 protocols in accessing WLCG services. The dual-stack deployment, as mentioned above, does however result in a networking environment which is much more complex than when using just IPv4 or just IPv6. Some services, e.g. the EOS storage system at CERN, are using IPv6-only for internal communication, where possible. The IPv6 working group has been investigating the removal of the IPv4 protocol in more places. 
We will present the areas where this could be useful and possible and be even so bold as to suggest a potential timetable for the end of support for IPv4 within WLCG. There are many lessons we learned along the way, which should be of interest to other research communities who have not yet started their transition to IPv6. Even more importantly for new research communities just starting to plan their distributed IT Infrastructure, there is a clear message to consider the use of IPv6-only right from the start.
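As a minimal illustration of the protocol-preference issue mentioned above, the sketch below (helper names are ours, in Python) reorders getaddrinfo()-style results so that IPv6 endpoints are tried first; real resolvers apply the full RFC 6724 policy table, which this simplification ignores:

```python
import socket

def prefer_ipv6(addrinfos):
    """Order getaddrinfo()-style results so IPv6 endpoints are tried first.

    A simplified stand-in for RFC 6724 destination-address selection:
    only the address family is considered, and sorted() keeps the
    resolver's original order within each family.
    """
    return sorted(addrinfos, key=lambda ai: 0 if ai[0] == socket.AF_INET6 else 1)

def dual_stack_candidates(host, port):
    """Resolve a host and return connection candidates, IPv6 preferred."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return prefer_ipv6(infos)

if __name__ == "__main__":
    # Inspect the order in which a client would attempt connections.
    for family, _, _, _, sockaddr in dual_stack_candidates("localhost", 443):
        print(family, sockaddr)
```

A client that walks this list in order will fall back to an IPv4 legacy address only when no IPv6 endpoint is offered.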
        Speaker: Dr David Kelsey (STFC-RAL)
      • 59
        Feasibility study on MPTCP ACK via alternative path in real network environment
        Improving network performance is key for grid computing systems consisting of computational resources and data sources. There have been studies on the network performance between them, such as file transfer schedulers and network protocols. MPTCP is a network protocol with the potential to improve network performance. It treats each TCP flow as a subflow and handles two or more TCP flows as one MPTCP flow, so if there are multiple network paths between machines, MPTCP can be expected to deliver high network performance. These days, cloud computing services such as Azure, AWS and GCP are popular for deploying systems instead of adopting on-premises systems. These providers operate all over the world and offer computational resources and data storage in various regions. It can therefore also be expected that a high-performance computer system can be composed by clustering them over multiple network paths across countries. Morikoshi et al. proposed MPTCP with HayACK for the purpose of improving data transfer throughput when multiple network paths with quite different RTTs are available. In original MPTCP, the sender transmits a data packet via a path and the receiver transmits the ACK packet via the same path. In MPTCP with HayACK, the endpoint returning the ACK can select an appropriate path, regardless of which path the data packets came through. In that study, HayACK was evaluated in simulation against original MPTCP, and it improved MPTCP throughput when the RTTs of the paths were quite different. However, there are concerns about using HayACK in real network environments. A study by Honda et al. in 2011 confirmed that 33% of packets with inconsistent sequence and ACK numbers were discarded by middleboxes between a source and a destination. Middleboxes include various devices such as NAT (Network Address Translation) routers and firewalls (FW).
Middleboxes not only relay packets, but also change or discard packet contents for purposes such as improving performance, changing destinations, or hardening security. When MPTCP with HayACK uses an alternative subflow, the receiver returns the MPTCP ACK without modification. Therefore, HayACK packets may be discarded by middleboxes in real network environments. The purpose of this study is to support the usefulness of HayACK for high-performance networks with multiple paths. In particular, we focus on how middleboxes affect HayACK packets. Our investigation was conducted comprehensively over various network paths by preparing clients with many IP addresses and one server. The influence of a middlebox is determined by checking whether probe packets pass through, are changed, or are discarded. Assuming HayACK responses, we crafted packets such as TCP SYN packets and TCP ACK packets.
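A probe of the kind described above can be sketched as follows. This is an illustrative reconstruction (function names are ours): it only shows how a bare TCP header with a valid checksum might be crafted, while the raw-socket transmission and the observation of middlebox behaviour are omitted:

```python
import socket
import struct

def inet_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data)//2}H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_probe(src_ip, dst_ip, sport, dport, seq, ack, flags):
    """Build a bare 20-byte TCP header (no options, no payload).

    flags: 0x02 for SYN, 0x10 for ACK. Sending the probe would require a
    raw socket and elevated privileges, which are not shown here.
    """
    offset_flags = (5 << 12) | flags         # data offset = 5 32-bit words
    hdr = struct.pack("!HHIIHHHH", sport, dport, seq, ack,
                      offset_flags, 65535, 0, 0)
    # The TCP checksum also covers a pseudo-header of the source and
    # destination IPs, the protocol number (6) and the segment length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(hdr)))
    csum = inet_checksum(pseudo + hdr)
    return hdr[:16] + struct.pack("!H", csum) + hdr[18:]
```

By crafting SYN and ACK probes with deliberately chosen sequence and ACK numbers, one can compare what the server receives against what was sent and classify each path's middleboxes as pass-through, rewriting, or dropping.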
        Speaker: Mr Hiroyuki Koibuchi (Graduate School of System and Information Engineering University of Tsukuba)
      • 60
        Unroutable LHCONE traffic
        This talk explores the methods and results confirming the baseline assumption that LHCONE traffic is science traffic. The LHCONE (LHC Open Network Environment) is a network conceived to support globally distributed collaborative science. It connects thousands of researchers to LHC data sets at hundreds of universities and labs performing analysis within the global collaboration. It is “Open” to all levels of the LHC as well as a short list of approved non-LHC science collaborations, and is distinct from the smaller, tightly integrated and private LHCOPN (Optical Private Network), which is strictly for “Tier 1” compute centers and used in support of the engineered workflow for LHC data processing, distribution and storage of the baseline datasets. LHCONE satisfies the need for a high-performance global data transfer network of networks supporting scientific analysis at universities and science labs. The hard part is separating science traffic: the separation of science flows from non-science flows is an essential first step in traffic engineering high-performance science networks, since before resources or preference can be applied to move science data more effectively, the science traffic must be identified. This talk explores the methods and results of detecting traffic in the LHCONE network that does not comply with the Appropriate Use Policy established by the global LHC collaboration. LHCONE hosts are high performance: through integration of the Science DMZ network model and collaborative software platforms, the data transfer nodes connected to LHCONE are high-performing data movers placed on the network edge/Science DMZ and secured precisely according to the applications they support and the purpose they serve. LHCONE is, however, at risk of unauthorized use, which places both the network and the sites using it at risk.
The risk takes two forms: science flows mixing with non-science flows, and unauthorized traffic being dropped inside LHCONE. To identify unauthorized traffic, an eduGAIN-authenticated portal displaying unauthorized usage will be demonstrated. Since LHCONE is growing and changing quite frequently, the underlying database will be collaboratively maintained and administered.
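The compliance check at the heart of such detection reduces to a prefix-membership test on observed flows. The sketch below uses hypothetical example prefixes, not the real LHCONE announcement list:

```python
import ipaddress

# Hypothetical excerpt of approved LHCONE prefixes; the real list is
# collaboratively maintained by the participating sites.
LHCONE_PREFIXES = [ipaddress.ip_network(p) for p in
                   ("192.0.2.0/24", "2001:db8:1::/48")]

def is_authorized(src: str) -> bool:
    """Flag a flow's source address as inside or outside the approved list."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in LHCONE_PREFIXES)

def unroutable(flow_sources):
    """Return the sources that do not comply with the prefix list."""
    return [s for s in flow_sources if not is_authorized(s)]
```

A portal backed by such a check can then display, per site, which observed flows fall outside the Appropriate Use Policy.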
        Speaker: Mr Bruno Hoeft (Karlsruhe Institute of Technology)
      • 61
        Mitigation of inter-vehicle communication congestion by using visible light communication as an alternative path
        Currently, inter-vehicle communication technology that allows cars to communicate with each other directly is being developed. Once this technology is established, automobiles will become a major part of society's infrastructure, acting as communication terminals with several sensing abilities and as communication entities that constitute their own network. In other words, a car is no longer just a tool for transportation; it acts as a totally new sensor-incorporated communication node that can be scattered and moved voluntarily in various places. Cars can establish radio communication links with each other to form an automobile network, called a VANET. Processing data acquired by automobiles in the cloud via a VANET will greatly benefit society: for example, processing image data measured by a car to quickly check the state of each region at the time of a disaster, or processing measured traffic data to get a more detailed understanding of domestic traffic conditions. To provide such a new mechanism using vehicles and the cloud, it is very important to maintain the quality of the communication links between vehicles, and between the vehicles and the cloud. However, as the number of vehicles equipped with inter-vehicle communication increases, the frequency band provided for inter-vehicle communication is expected to be exhausted. To solve this, we considered using not only wireless radio communication but also a visible light path via car headlights. Visible light communication does not cause radio wave interference and has high scalability, so in that respect it is the better transmission medium for this problem. However, its communication quality is unavoidably affected by external noise such as sunlight, heavy fog, and rain. Various technologies have been studied to achieve high performance under such noise, and visible light communication is improving day by day, but it will take some time to realize.
In light of this situation, we propose a new method for improving the communication link of VANET that can be fully effective with the current visible light technology. This uses both a visible light path and a wireless path, usually communicates via the visible light path, and automatically switches the traffic flow to the wireless path when it detects a decrease in the communication quality of the visible light path. In addition, better traffic allocation is performed in consideration of the traffic volume required by in-vehicle applications and the current communication quality. In other words, by actively using the visible light path as much as possible, it is possible to avoid wasted bandwidth while maintaining throughput. By doing this, it prevents radio interference to other vehicles and improves the quality of the entire automobile network formed by vehicles in the area. We implemented the proposed method and conducted simulations to evaluate its effectiveness. Specifically, we confirmed whether throughput can be maintained even when vehicle density is high and wireless communication quality is degraded by radio wave interference. In addition, we confirmed whether it is possible to prevent waste of frequency resources by effectively using the visible light path.
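The switching policy described above can be sketched as a small allocation function; the quality threshold and capacities here are illustrative placeholders, not values from the study:

```python
def allocate_traffic(demand_mbps, vlc_quality, vlc_capacity_mbps,
                     rf_capacity_mbps, quality_threshold=0.5):
    """Split one flow's demand between a visible-light (VLC) path and an RF path.

    Hypothetical policy mirroring the abstract: prefer the visible-light
    path while its measured quality (0..1) is acceptable, and spill any
    remaining demand to the radio path. Returns (vlc_mbps, rf_mbps).
    """
    if vlc_quality < quality_threshold:
        # Quality drop detected: switch the whole flow to the RF path.
        return 0.0, min(demand_mbps, rf_capacity_mbps)
    vlc = min(demand_mbps, vlc_capacity_mbps)
    rf = min(demand_mbps - vlc, rf_capacity_mbps)
    return vlc, rf
```

Keeping the VLC share as high as quality permits is what frees radio spectrum for neighbouring vehicles in the dense scenarios evaluated in the simulations.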
        Speaker: Mr Kenta Yamasawa (Tsukuba University in Japan)
    • 15:30
      Coffee Break
    • Converging High Performance infrastructures: Supercomputers, clouds, accelerators Session: II Conference Room 1

      Conference Room 1

      BHSS, Academia Sinica

      Convener: Dr Patrick Fuhrmann (DESY/dCache.org)
      • 62
        Automatised Event-Workflows with Offloading to Batch Systems
        DESY provides significant storage and computing resources, with more than 30 PB of data and about 50,000 cores available to its users. It is one of the largest sites in the High-Energy Physics computing grid and provides the computing and storage infrastructure for the European XFEL as well as for local experiments. With such a large user base, DESY's goal is to provide easy and efficient access to the resources to enable user workflows. Established workflow management systems are mostly polling-based: they regularly sample the states of the connected systems and initiate further steps based on the current sample. We present our push-based approach to enable scalable workflow chains based on events and functions. Here, events flow on an Apache Kafka/Confluent message backbone and can trigger predefined functions in an OpenWhisk framework. With dCache storage events as the prime example, this allows automatic processing chains, where a file updated on or newly written to the storage system initiates its own processing by a predefined function. However, as lambdas in Function-as-a-Service (FaaS) frameworks are intended to be low-latency, fast-returning operations, computationally or I/O intensive processing jobs need to run on better suited systems. Thus, we demonstrate how to further offload such computationally heavy workloads to our HTCondor batch system via function chains. This enables us to interlink our storage and computing resources even more closely. Due to the generic approach, such automatised workflow chains are not limited to storage events but can be extended to other event types as well.
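The event-to-function chain can be sketched with in-memory stand-ins for Kafka, OpenWhisk and HTCondor; all class and function names below are ours, not DESY's:

```python
from collections import defaultdict

class EventBus:
    """Toy topic-based publish-subscribe backbone (stand-in for Kafka)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, fn):
        self._subscribers[topic].append(fn)

    def publish(self, topic, event):
        for fn in self._subscribers[topic]:
            fn(event)

def submit_to_batch(job):
    """Stand-in for an HTCondor submission; the real offloading goes here."""
    return f"queued:{job['file']}"

results = []

def on_file_written(event):
    # FaaS-style function: decide quickly, offload heavy work to batch.
    if event.get("size_mb", 0) > 100:
        results.append(submit_to_batch({"file": event["name"]}))
    else:
        results.append(f"inline:{event['name']}")

bus = EventBus()
bus.subscribe("storage.file.created", on_file_written)
bus.publish("storage.file.created", {"name": "run01.h5", "size_mb": 500})
bus.publish("storage.file.created", {"name": "meta.json", "size_mb": 2})
```

The key design point mirrored here is that the triggered function only routes: small work is done inline, while heavy payloads are turned into batch jobs on a system suited to them.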
        Speaker: Mr Michael Schuh (DESY)
      • 63
        The Pacific Research Platform and Chase-CI: A distributed platform for interactive and batch use, from Machine Learning with GPUs and FPGAs to Multi-Messenger Astrophysics
        The NSF-funded Pacific Research Platform project operates a distributed compute environment including GPUs and FPGAs as accelerators that seamlessly extends into Google Cloud for the use of TPUs as accelerators. It offers interactive access via Jupyter notebooks, and High Throughput Computing access via the Open Science Grid. The former tends to be the preferred platform for machine learning and AI applications, while the latter is primarily used by IceCube and LIGO for neutrinos and gravitational waves as windows to the universe. In this talk, we describe how we make these seemingly opposite, hard-to-reconcile modalities of computation coexist within a single Kubernetes cluster.
        Speaker: Prof. Frank Wuerthwein (UCSD/SDSC)
      • 64
        Exploring Deep Learning fast inference on an ATCA processor board with Xilinx Virtex-7 FPGAs
        Machine and Deep Learning techniques have seen an explosion in adoption across a variety of HEP applications, ranging from event selection to end-user physics data analysis, as well as computing-metadata-based optimizations. The range of applicability of such techniques in the High Energy Physics (HEP) context – with a particular accent on experiments at the Large Hadron Collider (LHC) at CERN in Geneva – will extend to an even larger variety of applications if low-latency hardware solutions are added to the usually exploited CPUs/GPUs, or even Google TPUs. One example area of application in particle physics is the domain of FPGA-based Trigger/DAQ, characterized by sub-microsecond latency requirements: stringent requirements for this solution come from the upcoming Run-3 needs and even more from the evolution towards the operational conditions foreseen for the High-Luminosity LHC (HL-LHC) phase. Crucial ingredients in preparing for this future are the availability of adequate hardware resources and expertise, as well as the capability to streamline the process of building and testing ML/DL models in FPGA firmware. This talk will present the work done and planned ahead in the University of Bologna and INFN-Bologna cross-experiment working groups. The team is working on a customized ATC136 board hosting a Xilinx Virtex-7 FPGA with full-mesh backplane fabric connectivity, mounted in an ATCA crate installed in the INFN-CNAF Tier-1 data center. The High-Level Synthesis (HLS) toolkit hls4ml, developed close to HEP needs, was used. The hardware and software set-up, and performance on various baseline models used as benchmarks, will be presented. Real-life case studies for specific deep neural networks developed in the context of future evolutions of LHC Trigger systems will also be presented, and the possible advantages of performing neural network inference with FPGAs for this class of problems will be discussed.
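One ingredient of FPGA inference that toolkits like hls4ml automate is the choice of fixed-point precision for weights and activations. The toy helpers below are our own (hls4ml itself generates HLS C++ with arbitrary-precision types); they only illustrate the precision trade-off that drives FPGA resource and latency choices:

```python
def to_fixed(x, total_bits=16, frac_bits=10):
    """Quantize a float to a signed fixed-point grid, ap_fixed<16,6>-style.

    Values are rounded to multiples of 2**-frac_bits and saturated to the
    representable range, mimicking what happens in FPGA firmware.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

def fixed_dot(weights, inputs, **kw):
    """Dot product with every operand and the running accumulator quantized."""
    acc = 0.0
    for w, v in zip(weights, inputs):
        acc = to_fixed(acc + to_fixed(w, **kw) * to_fixed(v, **kw), **kw)
    return acc
```

Narrower widths cut DSP and LUT usage (and hence latency) at the cost of rounding and saturation error, which is why precision scans are part of benchmarking such models.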
        Speaker: Mr Tommaso Diotalevi (INFN and University of Bologna)
    • ECAI Workshop: New Tools for Humanity Data Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • Infrastructure Clouds and Virtualisation Session: I Auditorium

      Auditorium

      BHSS, Academia Sinica

      • 65
        ELIXIR Cloud & AAI: FAIR Data & Analytics Services for Human Data Communities
        Currently, patient data is geographically dispersed, difficult to access, and often stored in siloed project-specific databases, preventing large-scale data aggregation, standardisation, integration/harmonisation and advanced disease modelling. The ELIXIR Cloud and Authentication & Authorisation Infrastructure (AAI) for Human Data Communities project aims to leverage a coordinated network of ELIXIR Nodes to deliver a FAIR federated environment compliant with the standards of the Global Alliance for Genomics and Health (GA4GH), enabling population-scale genomic and phenotypic data analysis across international boundaries and a potential infrastructure for 1M Genome analysis. GA4GH is a policy-framing and technical standards-setting organization seeking to enable responsible genomic data sharing within a human rights framework. The ELIXIR Cloud & AAI project will lay the groundwork for delivering the foundational FAIR capability of “federation” of identities, sensitive data access, trusted hybrid HPC/Cloud providers and sensitive data analysis services across ELIXIR Nodes by underpinning the bi-directional conversation between partners with the GA4GH standards and specifications and ELIXIR trans-national expertise. The project is also developing a framework for secure access and analysis of sensitive human data based on national federations and standardised discovery protocols. The secure authentication and authorisation process, alongside guidelines and compliance processes, is essential to enable the community to use this data without compromising privacy and informed consent. Running complex data analysis workflows from the command line often requires IT and programming experience that makes such workflows inaccessible to many scientists. Constant changes from new software versions, different operating systems and HPC/Cloud installations add to the complexity.
To address this, it is vital to work not only with tool developers but also with users and infrastructure providers, to enable users to access workflows composed of different bioinformatics software containers that can be easily reused and deployed in both academic HPC and commercial clouds. Well documented, containerised workflows are also inherently reproducible, thus addressing one of the key challenges in computational life science. The project therefore provides a globally available curated FAIR repository to store bioinformatics software containers and workflows (BioContainers - GA4GH TRS), a FAIR service to discover and resolve the locations of datasets (RDSDS - GA4GH DRS) and a distributed workflow and task execution service (WES-ELIXIR/TESK - GA4GH WES/TES) to leverage the federated life-science infrastructure of ELIXIR. The ambition of the project is to provide a global ecosystem of joint sensitive data access and analysis services where federated resources for life science data are used by national and international projects across all life science disciplines, with widespread support for standard components securing their long-term sustainability. Connecting distributed datasets via common standards will allow researchers unprecedented opportunities to detect rare signals in complex datasets and lay the ground for the widespread application of advanced data analysis methods in the life sciences.
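A task submitted to a GA4GH TES endpoint such as TESK is a small JSON document naming a container image and a command. The sketch below assembles a minimal one (the container image and task name are illustrative, and the authenticated POST to the service's /v1/tasks endpoint is omitted):

```python
import json

def make_tes_task(image, command, name="demo-task"):
    """Assemble a minimal GA4GH TES task document.

    Only the core fields are shown (name, executors[].image,
    executors[].command); real tasks usually also declare inputs,
    outputs and resource requests per the TES schema.
    """
    return {
        "name": name,
        "executors": [
            {"image": image, "command": list(command)},
        ],
    }

# Example: run a containerised tool non-interactively.
task = make_tes_task("quay.io/biocontainers/samtools", ["samtools", "--version"])
payload = json.dumps(task)
```

Because the task is plain JSON against a standard schema, the same document can be dispatched to any TES-compliant backend in the federation, which is precisely the interoperability the project relies on.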
        Speakers: Dr Steven Newhouse (EMBL-EBI) , Dr Susheel Varma (EMBL-EBI)
      • 66
        Clouds and FAIR data in Smart Cities applications with the Cagliari 2020 project
        CAGLIARI 2020 is a 25 million euro project funded within the framework of the National Operational Program for Research and Competitiveness of the Italian Ministry of Education, University and Research. The project exploits the FAIR concept of Findable, Accessible, Interoperable and Re-usable data in the context of Smart Cities applications within a cloud computing approach. A relevant part of the project is also the technology transfer of tools and know-how developed in the context of the LHC experiments. The main goal of CAGLIARI 2020 is the development of innovative and environmentally friendly solutions for urban mobility, boosting energy and environmental performance. The project aims at answering the ever-increasing need for innovative tools and technological solutions for the optimization of urban mobility. The approach is based on collecting traffic flow data as well as environmental parameters, merging data from different sources and combining them in order to obtain lower travel times and improved air quality. These data are made available to operators and managers as well as to citizens; the management of critical events is also included. Integration of data from different sources and availability to multiple users are key points in the project. CAGLIARI 2020 is based on the study and testing of a sensor network comprising: 1. fixed sensors for the tracking of vehicles entering/exiting the urban area, allowing real-time and/or historical analysis, especially helpful in gathering the information required to manage traffic light systems and to send routing optimization information to interested users; 2. fixed and mobile sensors for the collection of environmental data, which will be used to feed decision-making models for the reduction of carbon emissions and the consequent improvement of air quality in the urban area; 3. mobile devices for the acquisition of the motion habits of people.
The integration of environmental models and smart systems for the management of urban mobility allows optimizing public and private traffic flows as well as reducing carbon emissions. The CAGLIARI 2020 concept applies the “netcentric” paradigm by means of a dynamic and pervasive net (the urban information grid) whose nodes can be both fixed and mobile. This feature allows the sensorial integration of the devices distributed in the urban area and turns public transport buses into “mobile platforms” for urban road system monitoring, thanks to the continuous gathering of traffic, carbon emission and noise pollution data. It is therefore possible to develop models for the analysis of environmental parameters and to provide support tools for policies aimed at curbing traffic flows, energy consumption, and carbon emissions within urban areas. Merging data from multiple sources, processing them, and making them interoperable and usable by multiple clients is a core element of the project. The integration of the aforementioned information with people's traveling habits (obtained by means of the anonymous tracking of their mobile phones) allows for the creation of mobility maps. Cloud services play a key role within the project in supporting the applications dedicated to traffic data monitoring and analysis. A mixed cloud approach has been adopted, with data acquisition services and the mediation layer on a private cloud, and analysis and data fusion on a commercial cloud. A microservices approach has been adopted and is currently operational; the system is scalable and fully interoperable. The project started in January 2017 with a duration of four years. The partnership includes public and private organizations of South Sardinia for the development of ICT technologies aimed at optimizing the usage of the “city system” and improving the quality of life for people working and living in the city.
        Speaker: Dr Alberto Masoni (INFN National Institute of Nuclear Physics)
      • 67
        GOAT -- Flexible Accounting Framework for Hybrid Clouds
        Large-scale HTC infrastructures, such as the grid and cloud infrastructure operated by the European Grid Initiative (EGI), require a comprehensive accounting tool to keep track of resource use at different cooperating resource sites. Traditionally the accounting process has been strictly job-centric. Indeed, only running grid jobs were consuming resources. However, the emerging cloud computing paradigm made it possible for users to consume resources in a more flexible manner, e.g., consuming cloud storage to gather data without actually consuming any compute resources. Furthermore, clouds made it possible to reserve resources without actual consumption, which led resource providers to recognize new types of separately consumable resources -- often scarce ones, such as public IPv4 addresses. Infrastructure operators obviously feel the need to extend their accounting coverage to follow the consumption of such resources, and standardization bodies respond by developing new types of accounting record specifications. Accounting tool developers then try to react by supporting these new types with new accounting record generators on the side of the client, and extended functionality in record aggregators on the side of the server. This contribution introduces the GOAT (GO Accounting Tool) -- a suite of client-side and server-side components to collect accounting information from hybrid cloud sites and aggregate them efficiently. It promotes flexibility in terms of accounting record types supported. To that end, server-side aggregator developers have experimented with both the traditional SQL-based approach and with the currently popular NoSQL approach. The results of their comparison are included herein.
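The server-side roll-up step described above can be sketched as a small aggregation over heterogeneous records; the record layout is hypothetical and only illustrates accounting for both compute and non-compute resource types, such as public IPv4 addresses:

```python
from collections import defaultdict

def aggregate(records):
    """Aggregate heterogeneous accounting records per (site, resource type).

    The flat dict layout below is illustrative; real accounting records
    follow the relevant usage-record specifications, but the roll-up
    step on the server side looks much the same.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["site"], rec["resource"])] += rec["amount"]
    return dict(totals)

usage = aggregate([
    {"site": "site-A", "resource": "cpu_hours", "amount": 12.0},
    {"site": "site-A", "resource": "public_ipv4", "amount": 3.0},
    {"site": "site-A", "resource": "cpu_hours", "amount": 8.0},
])
```

Whether this aggregation is expressed as a SQL GROUP BY or as a NoSQL map-reduce is exactly the implementation choice the two GOAT server-side prototypes compare.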
        Speaker: Ms Lenka Svetlovská (CESNET)
    • Joint DMCC & UND Workshop: Deep Understanding Natural Disasters Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

    • 18:30
      Gala Dinner
    • Data Management Session: II Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

      • 68
        ESCAPE, next generation management of exabytes of cross discipline scientific data.
        The European-funded ESCAPE project will prototype a shared solution to computing challenges in the context of the European Open Science Cloud. It targets Astronomy and Particle Physics facilities and research infrastructures and focuses on developing solutions for handling exabyte-scale datasets. The DIOS work package aims at delivering a Data Infrastructure for Open Science. Such an infrastructure would be a non-HEP-specific implementation of the data lake concept elaborated in the HSF Community White Paper and endorsed in the WLCG Strategy Document for HL-LHC. The science projects in ESCAPE are in different phases of evolution: while HL-LHC can leverage 15 years of experience of distributed computing in WLCG, other sciences are now building their computing models. This contribution describes the architecture of a shared ecosystem of services fulfilling the needs of the ESCAPE community in terms of data organisation, management and access. The backbone of such a data lake will consist of several storage services operated by the partner institutes and connected through reliable networks. Data management and organisation will be orchestrated through Rucio. A layer of caching and latency-hiding services, supporting various access protocols, will serve the data to heterogeneous facilities, from conventional Grid sites to HPC centres and Cloud providers. The authentication and authorisation system will be based on tokens. For the success of the project, DIOS will integrate open source solutions which have demonstrated reliability and scalability at the multi-petabyte scale. Such services will be configured, deployed and complemented to cover the use cases of the ESCAPE sciences, which will be further developed during the project.
        Speaker: Dr Riccardo Di Maria (CERN)
      • 69
        Achievements of the eXtreme-DataCloud project
        The eXtreme-DataCloud project (XDC) is a software development initiative, funded by the European Commission under the H2020 framework programme, aimed at developing and implementing scalable data management services. The project addresses high-level topics such as: policy-driven data management based on storage Quality-of-Service, data life-cycle management, the creation of storage federations, smart placement of data with caching mechanisms, handling of metadata with no predefined schema, execution of pre-processing applications during ingestion, and the management and protection of sensitive data in distributed e-infrastructures. The project service catalogue is based on a toolbox made of already existing, production-quality components and services that the project enriched with new features and functionalities as requested by the user communities represented in the Consortium, which belong to a wide range of scientific domains: High Energy Physics (WLCG), Astronomy (CTA and LSST), Photon and Life Science (XFEL and LifeWatch), and Medical research (ECRIN). The list of services includes dCache, Onedata, EOS, FTS, the INDIGO Orchestrator, the INDIGO CDMI server, Rucio and Dynafed. All of them have been organized in a coherent architecture that can be easily plugged into the current distributed e-infrastructures, thanks to the adoption of established standards for protocols and authentication methods. XDC started in November 2017 and is now close to its end. Two releases have been produced: XDC-1, codenamed Pulsar, at the beginning of 2019, and XDC-2, codenamed Quasar, released at the end of 2019. This work presents the new features introduced with these two releases, how they impacted the tools composing the XDC toolbox and, in general, the main achievements of the project.
Some of the developments that will be presented are: OpenID Connect support for all the tools; data caching methods based on the xrootd and http protocols, to support the geographic deployment of distributed caches and the inclusion of diskless sites in the e-infrastructures; support for bulk storage QoS transitions; new Onedata features for metadata management, sensitive data handling and QoS changes; and storage event notifications to support complex workflows.
        Speaker: Daniele Cesini (INFN-CNAF)
    • Infrastructure Clouds and Virtualisation Session: II Auditorium

      Auditorium

      BHSS, Academia Sinica

      Convener: Dr Tomoaki Nakamura (KEK)
      • 70
        A big data infrastructure for predictive maintenance of large-scale data centers
        Predictive maintenance is emerging as a new trend in research, due to its advantages compared to the alternative methodologies of corrective and preventive maintenance. The ability to predict faults and intervene before they occur allows saving money in a wide set of application domains, among which is the management of data centers; savings are usually directly proportional to the size of the involved entities. Due to the novelty of this approach, and to the variety and heterogeneity of the involved data sources, identifying, extracting and processing valuable information in an efficient and effective way is a challenging task. In such a scenario, data sources may include log files produced by each computing node, as well as infrastructure monitoring data (e.g. cabinet and rack sensors) and environmental data produced by sensors (e.g. temperature/humidity, fire, flooding) installed in the data center. We hereby present a layered, scalable big data infrastructure aimed at predictive maintenance for large data centers. Despite leveraging open source Apache technologies, the proposed infrastructure (supporting both batch and real-time analysis) follows a general approach ensuring its portability to different frameworks. On the bottom level, Apache Flume performs data ingestion, dealing with different data sources (e.g. syslog, time series databases). Timely distribution to processing nodes is carried out at a higher level by the topic-based publish-subscribe engine Apache Kafka. Processing is performed through Apache Spark and Spark Streaming instances. Data persistence is achieved through Apache's distributed filesystem HDFS, which allows saving both the original log files and the results of the analysis. The topmost layer is the presentation layer, including the visualization tools through which the final users (i.e., sysadmins) may monitor the status of the system and be notified about foreseen faults.
This work is framed in the context of the INFN Tier-1 data center, involving data from approximately 1200 nodes. DODAS (Dynamic On Demand Analysis Service) has been adopted to deploy, and easily replicate the deployment of, the analysis cluster. DODAS is a Platform-as-a-Service (PaaS) tool allowing local and remote deployment with minimal effort, based on the specifications included in TOSCA templates. It supports any cloud provider, only requiring the access credentials. Within DODAS, the Infrastructure Manager (IM) provides an abstraction over the underlying architecture.
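A toy stand-in for the streaming analysis stage can be sketched as a rolling-mean threshold detector over a sensor metric; the window size and threshold factor are illustrative, not values used at the Tier-1:

```python
from collections import deque

class DriftDetector:
    """Raise an alarm when a metric jumps well above its recent rolling mean.

    A deliberately simple model of the real-time path (Spark Streaming in
    the described infrastructure): per-node temperature, load or log-rate
    samples stream through observe(), and sudden excursions are flagged
    before a hard fault materialises.
    """

    def __init__(self, window=5, factor=2.0):
        self.history = deque(maxlen=window)
        self.factor = factor

    def observe(self, value):
        """Return True when value exceeds factor x the rolling mean."""
        alarm = bool(self.history) and value > self.factor * (
            sum(self.history) / len(self.history))
        self.history.append(value)
        return alarm
```

In the layered architecture, the same samples would also land in HDFS for batch analysis, so detectors like this one can be tuned offline against full histories.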
        Speaker: Dr Fabio Viola (INFN-CNAF)
      • 71
        Performance and Cost Evaluation of Public Cloud Cold Storage Services for Astronomy Data Archive and Analysis
        Currently, major cloud providers offer cold storage services as part of their public IaaS offerings, targeting users who need to store data with relatively low access frequency for long periods. The adoption of cold storage services should be considered as a way to reduce the total cost of ownership and the storage-management labor of maintaining large amounts of scientific research data over a long period of time. However, the performance and cost of public cold storage services in scientific applications have not been well studied, and the following issues arise:
        1. It is difficult to determine whether cold storage services meet the performance requirements of research applications.
        2. It is also difficult to assess the feasibility of storing and accessing research data in cold storage services in terms of cost.
        In order to address the issues mentioned above and to validate the feasibility of adopting cold storage services in the astronomical research area, we present an evaluation of cloud cold storage services using astronomical research data and applications. We stored the observation and analysis data of the ALMA radio telescope project[1] in S3 Infrequent Access and Glacier provided by Amazon Web Services (AWS), and we ported the data archive software used in the ALMA project, the Next Generation Archive System (NGAS), to AWS. To address the first issue, we measured the performance of data retrieval operations of NGAS on AWS. In addition, we conducted performance benchmark tests such as uploading up to 60 TB of ALMA data. We also ran the same benchmark tests on other commercially available cold storage services, such as those from Google, Azure, and Oracle, to validate that the performance requirements can be generally fulfilled. To address the second issue, we propose a cost estimation model of NGAS based on the AWS charges for storing and retrieving data, and we estimated the yearly expense of NGAS on AWS using actual data volumes and access frequency statistics. Our estimation shows that retrieving data from a cold storage service and analyzing the data outside of the cloud (e.g. on an on-premise system) increases the cost, because the cost of transferring data out of the cloud is significantly high. We therefore designed an architecture that analyzes the retrieved data inside the cloud and estimated the cost of running a common analysis application, the Common Astronomy Software Applications package (CASA)[2], with NGAS on a variety of AWS instance types. From these experiments, the following findings were obtained:
        1. Practically acceptable data access performance can be obtained from cold storage services by configuring the archive system with appropriately sized instances and tuning the system. Although some cold storage services require hours before data access can start, this disadvantage can be mitigated by adopting an appropriate tiered storage architecture.
        2. The proposed cost estimation model makes it possible to estimate the total cost of data archiving in cloud cold storage services and of data analysis on cloud services. The model can also estimate the cost of a hybrid system combining clouds and on-premise systems. Additionally, we acquired practical information, such as AWS instance sizing, that can be used to determine the optimal configuration of the analysis system.
        [1] https://www.nao.ac.jp/en/research/project/alma.html [2] https://casa.nrao.edu/
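A cost model of this kind can be sketched as a simple function summing storage rent, retrieval fees, and outbound transfer. The prices below are illustrative placeholders, not actual AWS list prices, and the function is a simplification of the model described in the abstract:

```python
def yearly_cost_usd(stored_tb, retrieved_tb_per_year, egress_tb_per_year,
                    storage_usd_per_gb_month=0.004,
                    retrieval_usd_per_gb=0.01,
                    egress_usd_per_gb=0.09):
    """Yearly cost = storage rent + retrieval fees + outbound transfer.
    All unit prices are hypothetical defaults for illustration."""
    gb = 1024  # TB -> GB
    storage = stored_tb * gb * storage_usd_per_gb_month * 12
    retrieval = retrieved_tb_per_year * gb * retrieval_usd_per_gb
    egress = egress_tb_per_year * gb * egress_usd_per_gb
    return storage + retrieval + egress

# Analyzing retrieved data inside the cloud drops the egress term,
# which is why the in-cloud architecture comes out cheaper:
outside_cloud = yearly_cost_usd(60, 10, 10)  # analyze on-premise
inside_cloud = yearly_cost_usd(60, 10, 0)    # analyze on cloud instances
print(outside_cloud > inside_cloud)
```

With these placeholder prices the egress term alone accounts for the entire difference between the two configurations, mirroring the finding that outbound transfer dominates the cost of analyzing outside the cloud.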
        Speaker: Mr Hiroshi Yoshida (National Institute of Informatics)
      • 72
        Cloud Bursting GPU workflow in support of Multi-Messenger Astrophysics with IceCube
        In Fall 2019, we performed the largest possible GPU burst across multiple commercial cloud providers, and all of their relevant regions. The goal of our NSF-funded EAGER award (NSF OAC 1941481) was to achieve a scale of 80,000 V100-equivalent GPUs to process photon propagation simulations for the IceCube Neutrino Observatory for one hour, thus achieving fp32 Exaflop scale. Since then, we have learned that it is rather unlikely that even the combination of the big three commercial cloud providers has that kind of on-demand capacity. In this talk, we will first report how we convinced ourselves that we had the capability to run the IceCube workflow at this scale across the cloud providers, including handling the necessary hundreds of terabytes of IO across the various relevant regions globally. After that, we present the actual scale achieved, and conclude with lessons learned from this exercise.
        Speaker: Prof. Frank Wuerthwein (UCSD/SDSC)
    • Joint DMCC & UND Workshop: Deep Understanding Natural Disasters Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • 10:30
      Coffee Break
    • Data Management Session: III Conference Room 2

      Conference Room 2

      BHSS, Academia Sinica

      Convener: Dr David Groep (Nikhef)
      • 73
        Preparing for the HL-LHC computing challenge: the WLCG DOMA project
        The HSF Community White Paper indicated Data Organization, Management, and Access as one of the key areas to explore to address the HL-LHC computing challenge. The WLCG collaboration initiated the DOMA R&D project in early 2018 to expose existing initiatives in this area, foster collaboration between active parties and organize the evolution in a coherent way. The ultimate goal of WLCG DOMA is to commission an ecosystem of tools and services to build a cloud-like distributed storage infrastructure, also known as a Data Lake, to optimize the cost of scientific computing. The R&D initiatives span many areas: commissioning asynchronous transfer protocol alternatives to gridFTP, prototyping token-based authentication and authorization, evaluating technologies and methodologies for caching and latency hiding, and defining and implementing storage Quality-of-Service. Finally, network R&D activities aim at optimizing traffic in a Data Lake environment at several levels. In this contribution we will present the current achievements of the DOMA project and outline the future direction.
        Speaker: Dr Simone Campana (CERN)
      • 74
        A distributed R&D storage platform implementing quality of service
        Optimization of computing resources, in particular storage, the costliest one, is a tremendous challenge for the High Luminosity LHC (HL-LHC) program. Several avenues are being investigated to address the storage issues foreseen for HL-LHC. Our expectation is that savings can be achieved in two primary areas: optimization of the use of various storage types and reduction of the manpower required to operate the storage. We will describe our work, done in the context of the WLCG DOMA project, to prototype, deploy and operate an at-scale research storage platform to better understand the opportunities and challenges of the HL-LHC era. Our multi-VO platform includes several storage technologies, from highly performant SSDs to low-end disk storage and tape archives, all coordinated by the use of dCache. It is distributed over several major sites in the US (AGLT2, BNL, FNAL & MWT2), which are several tens of msec RTT apart, with one extreme leg across the Atlantic at DESY to test extreme latencies. As a common definition of the QoS attributes characterizing storage systems in HEP has not yet been agreed upon, we are using this research platform to experiment with several of them, e.g. number of copies, availability, reliability, throughput, IOPS and latency. The platform provides a unique tool to explore the technical boundaries of the 'data lake' concept and its potential savings in storage and operations costs. We will conclude with a summary of our lessons learned and where we intend to go with our next steps.
        Speaker: Dr Patrick Fuhrmann (DESY/dCache.org)
      • 75
        Processing of storage events as a service for federated science clouds
        Extending large scale research infrastructures for the Photon and Neutron (PaN) science community is done in preparation for the hundreds of petabytes of data that will be delivered over the next years by ground-breaking molecular imaging technologies. These data volumes and rates, as well as a growing number of data sources, result in an increasing demand for innovative, flexible storage and compute services for low-latency, high-throughput data processing in distributed environments. Leveraging cloud computing and containerization, DESY has presented pilot systems for the Photon and Neutron Open Science Cloud (PaNOSC) and provides collaborative platforms like GitLab and JupyterHub as services to the European Open Science Cloud (EOSC). Building on cloud and container orchestration templates, elastic analysis services can be dynamically provisioned, freeing users from the burden of infrastructure management. This allows focusing on enhancements for efficient resource provisioning and on auto-scaling units ranging from very granular functions to long-running complex HTC jobs, e.g. automating data reduction services on classic batch systems. Scaling out both modern Function-as-a-Service systems like Apache OpenWhisk and well-known dynamic batch systems like HTCondor to transient resources introduces a shift from local to federated resource management, and from user-based brokering to matching the current demand by a service. With respect to the FAIR principles, we encourage the reuse of data and metadata as well as of codes and configurations by providing computational microservices, which typically run with user-supplied software stacks that integrate analysis frameworks and share large portions of their codebase. Function-as-a-Service systems allow running user-provided containers as cloud functions, which we continuously integrate in shared container registries (GitLab) and deploy to OpenWhisk running on a linked Kubernetes cluster.
Adopting storage events, presented by the dCache project at ISGC 2019, as triggers for automated scientific data processing has led to the implementation of event brokers (Apache Kafka) and stream processing modules. This infrastructure is used for routing events generated by the backend storage, as well as by various other sources, to streams which consumer services can subscribe to in order to invoke cloud functions. At the center of our discussion is the need for a comprehensive security model, where authenticated clients are authorised to filter dedicated subsets of events, which may contain tokenized access delegation for reading data and storing output, as well as access tokens for compute services to run data processing pipelines. In this context we show how the function-as-a-service compute model itself provides the means to implement stream processing for storage events and to provide secure delivery to external systems. We compare this to a more centralized design with a multi-tenant event streaming platform. Finally, we discuss use cases spanning from data taking and online monitoring to offline batch processing and multi-cloud bursting.
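The core of such event-triggered processing can be sketched in a few lines: subscribers register a filter over a subset of events, and matching events invoke their cloud function. The event fields, filter shape and paths below are hypothetical, not the actual dCache/Kafka event schema:

```python
import fnmatch
import json

def route_events(events, subscriptions):
    """Route JSON-encoded storage events to subscribed cloud functions.
    Each subscription filters by event type and a path glob, standing in
    for the authorised per-tenant event filtering described above."""
    for raw in events:
        event = json.loads(raw)
        for sub in subscriptions:
            if (event["type"] == sub["type"]
                    and fnmatch.fnmatch(event["path"], sub["path_glob"])):
                sub["function"](event)  # invoke the cloud function

processed = []
subscriptions = [{
    "type": "file.closed",                # a new data file finished writing
    "path_glob": "/detector/run*/raw/*",  # tenant-scoped filter (hypothetical)
    "function": lambda ev: processed.append(ev["path"]),
}]

route_events([
    json.dumps({"type": "file.closed",
                "path": "/detector/run7/raw/frame-001.h5"}),
    json.dumps({"type": "file.removed",
                "path": "/detector/run7/raw/frame-000.h5"}),
], subscriptions)

print(processed)  # only the matching file-closed event triggers processing
```

In the deployed system the event stream arrives via Kafka topics and the function runs in OpenWhisk, but the subscribe-filter-invoke contract is the same.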
        Speaker: Mr Michael Schuh (DESY)
    • Infrastructure Clouds and Virtualisation Session: III Auditorium

      Auditorium

      BHSS, Academia Sinica

      Convener: Dr Tomoaki Nakamura (KEK)
      • 76
        Join the Dots: from the Edge to the Core with Functions, IoT and a Touch of AI.
        In the past year, INFN has been actively working on several projects that involve close collaborations between research and industry. In these collaborations, the industrial partners typically bring in complex use cases that involve data collection from heterogeneous sources, often originated by many distributed sensors, while INFN works together with them to define proper architectures to handle these use cases. These normally follow a three-tier approach, with IoT, Edge and Cloud levels. In this talk, we briefly discuss the most challenging use cases, then report on some state-of-the-art solutions we have been developing, involving real-time or quasi-real-time ingestion of streamed data, processing, and analytics. In more detail, a key point of our architecture is typically the integration of reusable, open components at multiple layers, providing vendor neutrality in terms of infrastructures, with rapid prototyping and deployment. At the Edge level, we show how data processing and an AI-based inference engine can be implemented through a reusable set of pre-trained neural networks and the use of Function-as-a-Service solutions. At the Cloud level, we show how our "Infrastructure as Code" approach led us to automate data transfer, replicating data sets and respecting privacy where necessary, to perform continuous training on changing data sets, and to set up repositories for complete infrastructural solutions (e.g. dynamically provisioned Spark clusters or monitoring engines), for AI models, and down to executable functions. These components can all be published, used and reused in either testing or production mode, to verify changes before they go live, eventually shortening time to market.
We conclude the talk by showing how these solutions are not only important for industry, but are also of immediate interest and use for several scientific applications, and how they have been integrated into the INFN nation-wide distributed infrastructure.
        Speakers: Daniele Cesini (INFN-CNAF) , Prof. Davide Salomoni (INFN)
      • 77
        Exploring Artificial Intelligence as a Service (AI-aaS) for an AI-Assisted Assessment Online Infrastructure
        This study explores Artificial Intelligence as a service (AI-aaS) from two perspectives. The first is to develop a system that can provide educational researchers and practitioners alike ready access to AI algorithms and services across an entire school. The second is to leverage, and then evaluate, the use of AI-aaS to support AI-driven assessment, paying particular attention to its infrastructure requirements and pedagogical demands. AI-aaS has several successful use cases and can be found in smart applications across numerous sectors like transportation, manufacturing and energy. This study proposes a new application for education. It explores third-party AI-aaS as an outsourced service and evaluates its effectiveness for handling educational routines, complex assessment policies, and the large data volumes generated from formal assessment of learners. AI-aaS offers the opportunity to try algorithms and services to determine feasibility and suitability before committing to scale. In the first case exploration, the study examines the feasibility of using AI-aaS applied in a monitoring, analysis, plan, execution plus knowledge (MAPE-K) loop to manage the internal operation of a system, as well as its interactions with other systems, in an autonomous manner. AI-aaS applications, even within education, have to account for quality and performance requirements that need to be translated into an institution's quality-of-service standards. The study explores the use of a management layer and a training layer to achieve this. In the second case exploration, which is linked to the first, the study explores building an adaptive AI-assisted competency-based assessment engine for higher education and workplace training. The goal is to develop an AI engine, delivered as an AI-aaS, for evaluating, mapping and tracking the attainment of skills and competency proficiency.
Besides time and labour-saving benefits, the AI skills engine can better support the determination of skills proficiency levels by personalising assessment to learners and workers using machine learning and deep learning methods. Currently, determining and tracking a learner's or worker's proficiency levels by a battery of "stop and test" quizzes, exams and questionnaires remains a time-consuming affair for human assessors. AI can help unburden labour- and time-intensive evaluation, especially for determining gaps in the skills and competencies required for academic and career progression. It is also better at compiling a training attainment record that can be retrieved and updated almost instantaneously. The goal of smart, responsive and personalised automated evaluation has tremendous potential in higher education and workplace training. This is even more so for evaluating adult skills and competencies, as this is a current gap. The study investigates a unified view of AI-aaS for research, educational and business opportunities in future infrastructure and virtualisation, vis-a-vis an on-premise system. It includes discussing the challenges and advantages of developing a nominal, lean network architecture for AI-aaS, by defining new protocols in the network management layer to enhance MAPE-K loops, and by capturing efficiencies in the training layer. It also evaluates the effectiveness of AI-aaS in supporting implementation at the application level. AI-aaS is underdeveloped for research and educational purposes, and there is promise to advance research and understanding in this emerging area. Keywords: Artificial Intelligence, AI as a Service, AI-aaS, AI Assessment
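One pass of the MAPE-K loop mentioned above can be sketched as follows; the monitored metric, the threshold and the adaptation actions are hypothetical illustrations, not the authors' design:

```python
def mape_k_step(observation, knowledge):
    """One iteration of a MAPE-K (Monitor, Analyze, Plan, Execute over
    shared Knowledge) loop for a hypothetical assessment service."""
    # Monitor: collect a metric from the running service
    latency = observation["grading_latency_ms"]
    # Analyze: compare it against the institution's quality-of-service target
    violated = latency > knowledge["max_latency_ms"]
    # Plan: choose an adaptation when the target is violated
    plan = "scale_out" if violated else "steady"
    # Execute: apply the plan (recorded in the knowledge base here,
    # rather than acted on)
    knowledge["history"].append(plan)
    return plan

knowledge = {"max_latency_ms": 500, "history": []}
mape_k_step({"grading_latency_ms": 820}, knowledge)  # target violated
mape_k_step({"grading_latency_ms": 240}, knowledge)  # within target
print(knowledge["history"])
```

The shared knowledge base is what distinguishes MAPE-K from a plain control loop: each phase reads and updates it, so past adaptation decisions can inform future analysis.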
        Speakers: Prof. Chris PANG (Nanyang Polytechnic, Singapore) , Prof. Kok Hong CHAN (Nanyang Polytechnic)
      • 78
        Large-scale HPC deployment of Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN)
        High Performance Computing (HPC) facilities provide vast computational power and storage, but generally offer fixed environments designed to address the most common local software needs, making it challenging for users to bring their own software. To overcome this issue, most HPC facilities have added support for HPC-friendly container technologies such as Shifter, Singularity, or CharlieCloud. These container technologies are all compatible with the more popular Docker containers; however, the implementation and use of containers differs for each of them. These differences can make it difficult for an end user to submit to and utilize different HPC sites without adjusting their workflows and software. The issue is exacerbated when attempting to use workflow management software across sites with differing container technologies. The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) that spans multiple sites. Specifically, the project has extended the CERN-based REANA framework, a platform designed to enable analysis reusability and reproducibility while supporting different workflow engine languages, in order to support submission to different High Performance Computing facilities. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we implemented REANA to work with a number of differing workload managers, including both high performance and high throughput ones, while simultaneously removing REANA's dependence on Kubernetes support at the worker level.
This work describes the challenges and development efforts involved in extending REANA, the components developed to enable large-scale deployment on HPC resources, and the implementation of an abstraction layer that supports different container technologies, as well as different protocols for transferring files and directories between the HPC facility and the REANA cluster edge service, from the user's workflow application.
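An abstraction layer over HPC-friendly container runtimes can be sketched as a mapping from one user-facing request to each runtime's command line. The flags below follow the commonly documented forms of each tool, but a real deployment would have to match each site's installation; the image names are illustrative:

```python
def container_run_command(technology, image, command):
    """Map a single container request onto an HPC-friendly runtime.
    A sketch of the kind of abstraction layer described above, not
    the SCAILFIN implementation."""
    runners = {
        # Singularity can pull and run Docker images via docker:// URIs
        "singularity": ["singularity", "exec", f"docker://{image}"] + command,
        # Shifter references pre-pulled Docker images by name
        "shifter": ["shifter", f"--image=docker:{image}"] + command,
        # CharlieCloud runs from an unpacked image directory
        "charliecloud": ["ch-run", image, "--"] + command,
    }
    try:
        return runners[technology]
    except KeyError:
        raise ValueError(f"unsupported container technology: {technology}")

print(container_run_command("singularity",
                            "reanahub/example-env:latest",  # hypothetical image
                            ["python", "analysis.py"]))
```

The workflow layer then only needs to know which technology a site supports; the user's image and command stay the same across sites.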
        Speaker: Dr Kenyi Hurtado Anampa (University of Notre Dame)
    • Joint DMCC & UND Workshop: Deep Understanding Natural Disasters Media Conference Room

      Media Conference Room

      BHSS, Academia Sinica

    • Closing Ceremony Auditorium

      Auditorium

      BHSS, Academia Sinica

    • 12:50
      Lunch 4F Recreation Hall

      4F Recreation Hall

      BHSS, Academia Sinica