International Symposium on Grids & Clouds 2017 (ISGC 2017)

Name: International Symposium on Grids & Clouds 2017 (ISGC 2017)
Start: 2017-03-05T08:00:00+08:00
End: 2017-03-10T18:00:00+08:00
Location: BHSS, Academia Sinica

Asia/Taipei timezone

Support

stella.shen@twgrid.org

Examination of dynamic partitioning for multi-core jobs in the Tokyo Tier-2 center

10 Mar 2017, 09:00

30m

Conf. Room 2 (BHSS, Academia Sinica)

No. 128, Sec. 2, Academia Rd., Taipei, Taiwan

Track: Physics (including HEP) and Engineering Applications
Session: Physics & Engineering I

Speaker: Dr Tomoe Kishimoto (The University of Tokyo)

The Tokyo Tier-2 site, located at the International Center for Elementary Particle Physics (ICEPP) at the University of Tokyo, provides computing resources for the ATLAS experiment in the Worldwide LHC Computing Grid (WLCG). Official site operation in the WLCG started in 2007, after several years of development beginning in 2002, and the site has operated stably since then. The current system, upgraded in 2016, deploys 6144 CPU cores as worker nodes for the WLCG, with 24 CPU cores per worker node. The ATLAS experiment developed a multi-core implementation of its software framework for reconstruction and simulation jobs, which provides efficient memory sharing. In 2014, the experiment started to submit eight-core jobs using this framework to the Grid.

The Tokyo Tier-2 site has been processing these multi-core jobs and ordinary single-core jobs (e.g. user analysis jobs) separately, using dedicated worker nodes and computing elements (static partitioning). However, with static partitioning we have often observed idle CPUs on the worker nodes whenever either multi-core or single-core jobs are not assigned to the site. We therefore started to evaluate dynamic partitioning of the worker nodes using the HTCondor batch scheduler to reduce this idle CPU time, and have deployed a small cluster (1536 CPU cores) in production.

With dynamic partitioning, when a worker node is filled entirely with single-core jobs, those jobs must be drained before a new multi-core job can be dispatched to that node. Draining should continue until the number of running multi-core jobs reaches a target share. Efficient draining requires tuning several parameters, such as the number of machines draining at the same time, based on the properties of the jobs.

This presentation reports the improvement in CPU efficiency achieved by introducing dynamic partitioning, and the optimization of the drain parameters, at the Tokyo Tier-2 center.
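The draining logic described in the abstract can be illustrated with a small toy model. The sketch below is not the site's actual implementation and does not use any HTCondor API; the `Node` class, the `pick_nodes_to_drain` function, and the "fewest cores still to free" ordering are all hypothetical choices made for illustration. Only the 24 cores per worker node and the eight-core job width come from the abstract (so at most 24 / 8 = 3 multi-core jobs fit on one node).

```python
from dataclasses import dataclass

CORES_PER_NODE = 24   # from the abstract
MULTICORE_WIDTH = 8   # eight-core ATLAS jobs, from the abstract


@dataclass
class Node:
    """Hypothetical view of one worker node (illustration only)."""
    name: str
    single_core_jobs: int = 0   # running 1-core jobs
    multi_core_jobs: int = 0    # running 8-core jobs
    draining: bool = False      # True: node accepts no new single-core jobs

    @property
    def free_cores(self) -> int:
        used = self.single_core_jobs + MULTICORE_WIDTH * self.multi_core_jobs
        return CORES_PER_NODE - used


def pick_nodes_to_drain(nodes, target_multicore_jobs, max_draining):
    """Choose nodes on which to start draining (assumed policy).

    Draining stops once the number of running multi-core jobs reaches
    the target share, and never exceeds max_draining nodes at a time.
    """
    running = sum(n.multi_core_jobs for n in nodes)
    if running >= target_multicore_jobs:
        return []  # target share reached, no further draining

    budget = max_draining - sum(1 for n in nodes if n.draining)
    if budget <= 0:
        return []  # concurrent-draining limit already reached

    # Candidates: nodes that cannot fit an eight-core job right now
    # but would be able to if some single-core jobs finished.
    candidates = [
        n for n in nodes
        if not n.draining
        and n.single_core_jobs > 0
        and n.free_cores < MULTICORE_WIDTH
    ]
    # Assumed heuristic: prefer nodes with the fewest cores still to free.
    candidates.sort(key=lambda n: MULTICORE_WIDTH - n.free_cores)
    return candidates[:budget]


nodes = [
    Node("wn01", single_core_jobs=24),
    Node("wn02", single_core_jobs=20),
    Node("wn03", multi_core_jobs=3),
    Node("wn04", single_core_jobs=24, draining=True),
]
to_drain = pick_nodes_to_drain(nodes, target_multicore_jobs=6, max_draining=2)
print([n.name for n in to_drain])
```

In this toy setup one node is already draining, so a limit of two concurrent draining machines leaves room for one more. Raising `max_draining` frees cores for multi-core jobs faster but leaves more cores idle while single-core jobs finish, which is the trade-off the drain parameters have to balance.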

Primary author: Dr Tomoe Kishimoto (The University of Tokyo)

Co-authors: Prof. Hiroshi Sakamoto (The University of Tokyo), Mr Nagataka Matsui (The University of Tokyo), Prof. Tetsuro Mashimo (The University of Tokyo), Prof. Tomoaki Nakamura (KEK)

Slides

2017_03_10.pdf
