Workload management for heterogeneous multi-community grid infrastructures

20 Mar 2018, 12:00
30m
Media Conference Room, BHSS (Academia Sinica)

Oral Presentation Virtual Research Environment (including Middleware, tools, services, workflow, ... etc.) VRE

Speaker

Dr Andrei Tsaregorodtsev (CPPM-IN2P3-CNRS)

Description

Grid infrastructures provide access to computing resources through transparent, uniform tools for multiple user communities of different sizes, running applications with different properties and requirements. Traditional grid computing resources, the so-called High Throughput Computing (HTC) clusters, can be complemented with virtualized cloud and High Performance Computing (HPC) resources. User workflows may require access to any one of these resource types or may use all of them. This can be achieved with job scheduling systems that submit user payloads transparently to the various types of computing resources. The Workload Management System (WMS) of the DIRAC project provides a job scheduler that spans heterogeneous computing resources and gives users the means to dynamically select the resources that match their application requirements. Various policies can be applied to the activities of different users and groups to allow fair sharing of the communal resources. In particular, HTC and cloud resources can be used together for the same user payloads. Special attention is paid to the use of cloud resources while respecting the quotas of different user groups, which allows those resources to be shared efficiently. In this contribution we describe the use of the DIRAC WMS, taking the European Grid Infrastructure (EGI) as an example. We outline the general architecture of the system and give details of its practical operation. Experience with replacing the traditional grid middleware WMS will be presented, along with the developments needed to meet the requirements of different user communities and applications.
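As a rough illustration of the idea described above (payloads that declare which resource types they accept, placed on heterogeneous resources subject to per-group quotas), the following minimal Python sketch may help. It is not DIRAC's matching logic or API: the `Resource`, `Job`, `can_run` and `schedule` names, the `group_quota` field, and the first-fit strategy are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A computing resource (hypothetical model, not a DIRAC class)."""
    name: str
    kind: str          # "HTC", "cloud" or "HPC"
    free_slots: int
    # Assumed per-group cap on running payloads; a missing group means no cap.
    group_quota: dict = field(default_factory=dict)
    running: dict = field(default_factory=dict)


@dataclass
class Job:
    """A user payload with the resource types it is willing to run on."""
    job_id: int
    group: str
    accepted_kinds: set


def can_run(job, res):
    """True if the resource type matches, a slot is free, and the group quota allows it."""
    if res.kind not in job.accepted_kinds or res.free_slots <= 0:
        return False
    quota = res.group_quota.get(job.group)
    return quota is None or res.running.get(job.group, 0) < quota


def schedule(jobs, resources):
    """First-fit placement: map each job id to the first eligible resource name."""
    placement = {}
    for job in jobs:
        for res in resources:
            if can_run(job, res):
                res.free_slots -= 1
                res.running[job.group] = res.running.get(job.group, 0) + 1
                placement[job.job_id] = res.name
                break
    return placement


if __name__ == "__main__":
    resources = [
        Resource("htc-cluster", "HTC", free_slots=1),
        Resource("cloud-site", "cloud", free_slots=5, group_quota={"biomed": 1}),
    ]
    jobs = [
        Job(1, "biomed", {"HTC", "cloud"}),
        Job(2, "biomed", {"HTC", "cloud"}),
        Job(3, "biomed", {"cloud"}),
        Job(4, "astro", {"cloud"}),
    ]
    print(schedule(jobs, resources))
```

In this toy run, the third job stays unplaced because its group has already reached its assumed cloud quota, while a job from another group can still use the same cloud site. A production scheduler would additionally handle priorities, queuing, and retries, none of which are modelled here.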

Primary author

Dr Andrei Tsaregorodtsev (CPPM-IN2P3-CNRS)
