15-20 March 2026
BHSS, Academia Sinica
Asia/Taipei timezone

Evolution of IFARM: A Multi-Schedd HTCondor Cluster for Korean Scientific Communities at GSDC

17 Mar 2026, 16:25
25m
Auditorium (3F, BHSS, Academia Sinica)

Oral Presentation Track 9: Converging High Performance Computing Infrastructures: Supercomputers, clouds, accelerators

Speaker

Heeseok Jeong (Korea Institute of Science and Technology Information)

Description

IFARM is a computing farm operated by GSDC/KISTI that supports computational experiments for several Korean scientific communities. It entered service in 2019, consolidating the previously separate computing farms that had been provisioned for each scientific community. As of the end of 2025, the service targets are the CMS Tier-3 and ALICE Tier-3 communities. Until mid-2025, the BIO community (a Korean bioinformatics group) was also hosted on this farm, but it was spun off into a separate service in July 2025.
IFARM is a multi-schedd HTCondor cluster in which each community has one HTCondor user access point node. The following hosts are shared by all communities: one HTCondor central manager node, one XRootD frontend node, five XRootD backend nodes, one CVMFS cache node, and 76 worker nodes with around 5000 logical cores in total. IFARM is currently used for interactive analysis, local analysis, and code development and testing by users of the CMS Tier-2 and ALICE Tier-1 at GSDC.
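As a rough illustration of the layout described above, the following Python sketch models the node inventory as plain data. The `NodeGroup` class, the role names, and the assumption of exactly one access point each for CMS Tier-3 and ALICE Tier-3 are illustrative choices, not GSDC's actual configuration; only the node counts and the approximate core total come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """A group of hosts sharing one role in the cluster (hypothetical model)."""
    role: str
    count: int
    shared: bool  # True if the hosts serve every community


# Node counts are taken from the description; role names are illustrative.
SHARED_GROUPS = [
    NodeGroup("htcondor-central-manager", 1, True),
    NodeGroup("xrootd-frontend", 1, True),
    NodeGroup("xrootd-backend", 5, True),
    NodeGroup("cvmfs-cache", 1, True),
    NodeGroup("worker", 76, True),
]

# Assumption: one access point (schedd) per currently served community.
COMMUNITIES = ["CMS Tier-3", "ALICE Tier-3"]
ACCESS_POINTS = [NodeGroup(f"access-point:{c}", 1, False) for c in COMMUNITIES]

APPROX_LOGICAL_CORES = 5000  # "around 5000" in the description


def total_hosts(groups):
    """Sum the host counts over all node groups."""
    return sum(g.count for g in groups)


def mean_cores_per_worker(groups, total_cores):
    """Average logical cores per worker node, assuming uniform workers."""
    workers = sum(g.count for g in groups if g.role == "worker")
    return total_cores / workers


if __name__ == "__main__":
    all_groups = SHARED_GROUPS + ACCESS_POINTS
    print("hosts:", total_hosts(all_groups))
    print("approx. cores per worker:",
          round(mean_cores_per_worker(SHARED_GROUPS, APPROX_LOGICAL_CORES)))
```

Under these assumptions the sketch counts 86 hosts and an average of roughly 66 logical cores per worker node. The real worker nodes are not necessarily uniform, so the average is only indicative.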
By the end of 2025, IFARM plans to replace its old servers with new ones and to enlarge the XRootD storage capacity of CMS Tier-3. We want to share our current challenges in providing HEP computing services and discuss how IFARM's system architecture can be improved as part of this upgrade. Taking the behaviour patterns of current users into account, we are actively pursuing system virtualization for all computing services, including CMS Tier-3 and ALICE Tier-3 as well as FCC (a new computing service launching by the end of 2025). We also plan to build a more reliable system by duplicating key service nodes, including UI nodes, to ensure high availability of the scientific computing service. Additionally, we would like to discuss methodologies that improve current system operation and monitoring in order to mitigate failures proactively and streamline incident response.

Primary author

Heeseok Jeong (Korea Institute of Science and Technology Information)

Co-authors

Dr Heejun Yoon (Korea Institute of Science and Technology Information)
Dr Geonmo Ryu (Korea Institute of Science and Technology Information)
