Speaker
    Mr Qingbao Hu (IHEP)
    Description
With the rapid growth of high-energy physics experimental requirements, the IHEP cluster is expanding quickly. More services run on more devices, and more software and hardware status must be monitored in real time. A fine-grained monitoring system helps keep each device running well and helps resolve the errors that occur on it, which in turn helps ensure the stability of the whole platform.
    In the IHEP monitoring system, Ganglia records the status of the cluster machines, such as CPU load averages and network utilization. Nagios checks service status and actively sends alarms, based on the NRPE remote plugin. A real-time log analyzer, logger-analyze, which is the monitoring tool we developed, collects service logs and machine logs and provides an overview of the health of the whole cluster. The aim of this tool is to summarize cluster stability in real time.
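    The log-summary idea described above can be illustrated with a minimal Python sketch. This is not the logger-analyze implementation, which the abstract does not describe. The log line format, the host and service names, and the helper names `summarize` and `unhealthy_hosts` are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed (hypothetical) line format:
#   "<timestamp> <host> <service> <LEVEL> <message>"
# Real service and machine logs will differ; adapt the pattern accordingly.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<service>\S+)\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count log lines per host and severity level, skipping unparsable lines."""
    per_host = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore lines that do not follow the assumed format
        per_host[match.group("host")][match.group("level")] += 1
    return per_host


def unhealthy_hosts(summary, max_errors=0):
    """Return the sorted hosts whose ERROR count exceeds max_errors."""
    return sorted(
        host for host, counts in summary.items() if counts["ERROR"] > max_errors
    )


# Made-up sample data, for illustration only.
SAMPLE = [
    "2015-01-01T00:00:00 node01 svc_a INFO session opened",
    "2015-01-01T00:00:01 node02 svc_b ERROR request failed",
    "2015-01-01T00:00:02 node02 svc_b ERROR request failed",
    "garbage line",
]

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    for host in sorted(summary):
        print(host, dict(summary[host]))
    print("hosts needing attention:", unhealthy_hosts(summary))
```

    Counting by host and severity keeps the summary small enough to refresh continuously. A real deployment would presumably read from a log stream rather than an in-memory list.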
            Primary author
            Mr Qingbao Hu (IHEP)
        Co-authors
        Mr Xiaowei Jiang (IHEP)
        Mrs Jingyan Shi (IHEP)