International Symposium on Grids & Clouds 2017 (ISGC 2017)

Name: International Symposium on Grids & Clouds 2017 (ISGC 2017)
Start: 2017-03-05T08:00:00+08:00
End: 2017-03-10T18:00:00+08:00
Location: BHSS, Academia Sinica

5-10 March 2017

Asia/Taipei timezone

Support

stella.shen@twgrid.org

VCondor - an implementation of dynamic virtual computing cluster

10 Mar 2017, 10:50

20m

Conf. Room 1 (BHSS, Academia Sinica)

No. 128, Sec. 2, Academia Rd., Taipei, Taiwan

Infrastructure Clouds and Virtualisation

Infrastructure Clouds and Virtualisation II

Mr Yaodong CHENG (IHEP, CAS)

As a new approach to resource management, virtualization technology is increasingly widely applied in high energy physics. We have built a virtual computing cluster at IHEP based on OpenStack, with HTCondor as the job management system. In a traditional computing cluster, a fixed number of slots is pre-allocated to the job queue of each experiment. However, this static policy increasingly fails to meet the peak requirements of individual experiments, and it also leads to low CPU utilization. To solve the problem, we designed and implemented VCondor, a dynamic virtual computing cluster system based on HTCondor and OpenStack. The system manages virtual machines (VMs) in a unified way according to the queue status in HTCondor. One or more VMs are created automatically when jobs are waiting to run. A VM is destroyed when its job has finished and no more jobs remain in the HTCondor queue. The job queue status is checked periodically, for example every 10 minutes, so a VM continues to run if new jobs arrive within that period. VCondor also supports resource provisioning and reservation for different experiments. Before it creates VMs, VCondor must request the number of available VMs from a VM resource scheduling system called VMQuota. VMQuota tells VCondor how many VMs it can create and how long these VMs will be reserved before they are created. This talk will present several use cases from the LHAASO and JUNO experiments. The results show that the virtual computing cluster can be dynamically expanded or shrunk as computing requirements change. Additionally, CPU utilization of the overall computing resources is significantly improved compared with the traditional resource management system. The system also performs well when there are multiple HTCondor schedulers and multiple job queues. It is stable and easy to maintain as well.
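
The scaling policy described in the abstract can be illustrated with a minimal sketch. This is not VCondor's actual code: the names QueueStatus, plan_scaling, jobs_per_vm and POLL_INTERVAL_SECONDS are hypothetical, the quota argument merely stands in for the number of VMs that VMQuota would grant, and the rule that surplus VMs are released only when no jobs are waiting is one possible reading of the abstract. The sketch makes no calls to HTCondor or OpenStack; it only computes, for one polling cycle, how many VMs to create or destroy.

```python
from dataclasses import dataclass

# Assumed polling period; the abstract gives 10 minutes as an example.
POLL_INTERVAL_SECONDS = 600


@dataclass
class QueueStatus:
    """Hypothetical snapshot of one HTCondor job queue."""
    idle_jobs: int     # jobs waiting to run
    running_jobs: int  # jobs currently running


def plan_scaling(status: QueueStatus, active_vms: int, quota: int,
                 jobs_per_vm: int = 1) -> int:
    """Return the change in VM count for one polling cycle.

    A positive result means "create this many VMs", a negative result
    means "destroy this many VMs", and zero means "do nothing".
    `quota` is the number of new VMs the quota service allows
    (an assumed stand-in for the answer returned by VMQuota).
    """
    if status.idle_jobs > 0:
        # Jobs are waiting: grow the cluster, but never beyond the quota.
        wanted = -(-status.idle_jobs // jobs_per_vm)  # ceiling division
        return min(wanted, max(quota, 0))

    # No jobs are waiting: release VMs that running jobs do not need.
    needed = -(-status.running_jobs // jobs_per_vm)
    surplus = max(active_vms - needed, 0)
    return -surplus


if __name__ == "__main__":
    # Five waiting jobs, no VMs yet, quota of three new VMs.
    print(plan_scaling(QueueStatus(idle_jobs=5, running_jobs=0),
                       active_vms=0, quota=3))
```

In a real deployment a function like this would be evaluated once per polling interval for each job queue, and the result would be passed to whatever component actually creates or deletes the VMs.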

Slides

VCondor_ISGC2017.pdf
