13-18 March 2016
Academia Sinica
Asia/Taipei timezone

The Cluster Monitoring System of IHEP

18 Mar 2016, 10:00
20 min
BHSS, Conf. Room 2 (Academia Sinica)

Oral Presentation
Physics (including HEP) and Engineering Applications
Physics & Engineering Session

Speaker

Mr Qingbao Hu (IHEP)

Description

As the requirements of high-energy physics experiments grow rapidly, the IHEP cluster is expanding quickly. More services are running on different devices, and more software and hardware states need to be monitored in real time. A fine-grained monitoring system helps keep each device running well and helps resolve errors when they occur, and so ensures the stability of the whole platform. In the IHEP monitoring system, Ganglia records the status of the cluster machines, such as CPU load averages and network utilization, while Nagios checks service status and actively sends alarms through the NRPE remote plugin. A real-time log analyzer (logger-analyze), a monitoring tool we developed, collects service logs and machine logs and provides an overview of the health status of the whole cluster. The aim of this tool is to give a real-time summary of cluster stability.
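The abstract does not describe the individual Nagios checks used at IHEP. As a general illustration of how an NRPE-driven check works: NRPE runs a plugin on the monitored host, and Nagios interprets the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) together with its first line of output. The minimal Python sketch below checks the 1-minute load average. The thresholds, the names `classify` and `main`, and the output text are hypothetical and are not taken from IHEP's configuration.

```python
"""Minimal Nagios-style check of the 1-minute load average (hypothetical example)."""
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical per-core thresholds; a real site would tune these.
WARN_PER_CORE = 1.0
CRIT_PER_CORE = 2.0


def classify(load1: float, cores: int) -> int:
    """Map a 1-minute load average to a Nagios status code, normalized per core."""
    per_core = load1 / max(cores, 1)
    if per_core >= CRIT_PER_CORE:
        return CRITICAL
    if per_core >= WARN_PER_CORE:
        return WARNING
    return OK


def main() -> int:
    """Run the check, print the status line, and return the Nagios exit code."""
    try:
        load1, _, _ = os.getloadavg()
    except (OSError, AttributeError):
        # getloadavg is unavailable or failed on this platform.
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    cores = os.cpu_count() or 1
    status = classify(load1, cores)
    # NRPE relays this first output line back to the Nagios server.
    print(f"LOAD {STATUS_NAMES[status]} - load1={load1:.2f} on {cores} cores")
    return status
```

To deploy it, the script would end with `raise SystemExit(main())` so that the process exit code carries the status. It would then be registered in the monitored host's NRPE configuration with a line such as `command[check_load_py]=/usr/bin/python3 /path/to/check_load.py`, where the command name and path are placeholders. Normalizing by core count lets one threshold serve machines of different sizes in a growing cluster.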

Primary author

Mr Qingbao Hu (IHEP)

Co-authors

Mr Xiaowei Jiang (IHEP)
Mrs Jingyan Shi (IHEP)