13-18 March 2016
Academia Sinica
Asia/Taipei timezone

The Cluster Monitoring System of IHEP

Mar 18, 2016, 10:00 AM
Mr Qingbao Hu (IHEP)


As the requirements of high-energy physics experiments increase rapidly, the IHEP cluster is growing quickly in scale. More services run on more devices, and more software and hardware status must be monitored in real time. A fine-grained monitoring system helps keep each device running well and helps resolve the errors that occur on it, and so supports the stability of the whole platform. In the IHEP monitoring system, Ganglia records the status of the cluster machines, such as CPU load averages and network utilization; Nagios checks service status and actively sends alarms, using the NRPE remote plugin. A real-time log analysis tool that we developed collects service logs and machine logs and provides an overview of the health of the whole cluster. The aim of this tool is to give a summary of cluster stability in real time.
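
The idea of turning collected service and machine logs into a cluster health overview can be illustrated with a minimal sketch. This is not the tool described in the abstract: the log line format, the function name `summarize`, and the three health states are all assumptions made only for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log line format (an assumption, not the IHEP format):
#   "<host> <service> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<host>\S+)\s+(?P<service>\S+)\s+(?P<level>INFO|WARN|ERROR)\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count log levels per (host, service) pair and derive a coarse health state."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not fit the assumed format
        counts[(m["host"], m["service"])][m["level"]] += 1

    summary = {}
    for key, levels in counts.items():
        # Any ERROR marks the pair critical; otherwise any WARN marks it warning.
        if levels["ERROR"] > 0:
            summary[key] = "critical"
        elif levels["WARN"] > 0:
            summary[key] = "warning"
        else:
            summary[key] = "ok"
    return summary
```

Calling `summarize` on an iterable of log lines returns a dictionary keyed by `(host, service)`, which could then feed a dashboard or an alerting rule. A real deployment would read from a stream rather than a list and would need a parser matched to the actual log formats.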

Mr Qingbao Hu (IHEP)


Mr Xiaowei Jiang (IHEP), Mrs Jingyan Shi (IHEP)

