Speaker
Mr
Tomoe Kishimoto
(University of Tokyo)
Description
The Tokyo Tier-2 center, located in the International Center for Elementary Particle Physics (ICEPP) at the University of Tokyo, provides computing resources for the ATLAS experiment in the Worldwide LHC Computing Grid (WLCG). Official site operation in the WLCG began in 2007 after several years of development, and the site has operated stably and reliably since then.
We replace almost all hardware every three years in order to meet the requirements of the ATLAS experiment. The latest hardware upgrade was completed in January 2016, and the new system (the so-called 4th system) is running stably. In the 4th system, 6144 CPU cores (256 worker nodes) and 7392 TB of disk storage are reserved for the ATLAS experiment. For the Grid middleware, ARC-CE is deployed as the computing element in front of the HTCondor batch job scheduler. The disk storage consists of 48 sets, each comprising a disk array and a file server, and is managed by the Disk Pool Manager (DPM).
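As a quick sanity check on the figures above, and assuming the reserved resources are divided evenly across worker nodes and storage sets (the abstract does not state this), the per-unit capacities work out as:

```latex
\frac{6144\ \text{cores}}{256\ \text{worker nodes}} = 24\ \text{cores per node},
\qquad
\frac{7392\ \text{TB}}{48\ \text{sets}} = 154\ \text{TB per disk-array/file-server set}
```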
Logs produced by these Grid services provide useful information for determining whether the services are working correctly. For example, the job slot occupancy, the job success rate and the job duration can be measured by parsing the log files of the ARC-CE + HTCondor system. We are therefore constructing a new real-time monitoring system based on log analysis with the ELK stack (Elasticsearch, Logstash, Kibana) in order to detect problems with the Grid services. The ELK stack provides an efficient means of log processing, storage, querying and visualization. This poster describes the construction status of the new real-time monitoring system at the Tokyo Tier-2 center, and discusses the improvements in flexibility and reliability of site operation brought by introducing it.
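A minimal Python sketch of how job-level metrics could be derived from HTCondor job events, before any ELK ingestion. It assumes a hypothetical, simplified one-line event format loosely modelled on the HTCondor job event log (code 001 = execute, 005 = terminated, 009 = aborted); real logs spread each event over several lines and vary between versions. The names `JobRecord`, `parse_events` and `summarize` are hypothetical and not part of the Tokyo system. The running-job count is only a proxy for slot occupancy, since a multicore job occupies several cores.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional

# ASSUMPTION: hypothetical simplified event line, loosely modelled on HTCondor:
#   "<code> (<cluster>.<proc>.<subproc>) <YYYY-MM-DD HH:MM:SS> <free text>"
EVENT_RE = re.compile(
    r"^(?P<code>\d{3}) \((?P<job>\d+\.\d+)\.\d+\) "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<text>.*)$"
)
RETVAL_RE = re.compile(r"return value (\d+)")

EXECUTE, TERMINATED, ABORTED = "001", "005", "009"


@dataclass
class JobRecord:
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    exit_code: Optional[int] = None
    aborted: bool = False


def parse_events(lines: Iterable[str]) -> Dict[str, JobRecord]:
    """Fold event lines into one record per job id (cluster.proc)."""
    jobs: Dict[str, JobRecord] = {}
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # skip continuation lines and unrecognised text
        # Any matched event (including 000 submit) creates the record.
        rec = jobs.setdefault(m["job"], JobRecord())
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        if m["code"] == EXECUTE:
            rec.start = ts
        elif m["code"] == TERMINATED:
            rec.end = ts
            rv = RETVAL_RE.search(m["text"])
            rec.exit_code = int(rv.group(1)) if rv else None
        elif m["code"] == ABORTED:
            rec.end = ts
            rec.aborted = True
    return jobs


def summarize(jobs: Dict[str, JobRecord]) -> dict:
    """Running jobs, success rate of finished jobs, mean wall-clock duration."""
    running = sum(1 for r in jobs.values() if r.start and r.end is None)
    finished = [r for r in jobs.values() if r.end is not None]
    ok = [r for r in finished if not r.aborted and r.exit_code == 0]
    # Jobs aborted before execution have no start time and are excluded here.
    durations = [(r.end - r.start).total_seconds() for r in finished if r.start]
    return {
        "running": running,
        "finished": len(finished),
        "success_rate": len(ok) / len(finished) if finished else None,
        "mean_duration_s": sum(durations) / len(durations) if durations else None,
    }


if __name__ == "__main__":
    sample = [
        "000 (100.000.000) 2016-01-15 10:00:00 Job submitted from host: <hypothetical>",
        "001 (100.000.000) 2016-01-15 10:05:00 Job executing on host: <hypothetical>",
        "005 (100.000.000) 2016-01-15 11:05:00 Job terminated. (1) Normal termination (return value 0)",
        "001 (101.000.000) 2016-01-15 10:10:00 Job executing on host: <hypothetical>",
        "005 (101.000.000) 2016-01-15 10:40:00 Job terminated. (1) Normal termination (return value 1)",
        "001 (102.000.000) 2016-01-15 10:20:00 Job executing on host: <hypothetical>",
    ]
    print(summarize(parse_events(sample)))
```

In a deployment along the lines described above, this kind of parsing would more likely live in the Logstash pipeline, with the aggregation done by Elasticsearch queries and shown in Kibana; the standalone script only illustrates which fields the metrics depend on.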
Primary author
Mr
Tomoe Kishimoto
(University of Tokyo)
Co-authors
Prof.
Hiroshi Sakamoto
(The University of Tokyo)
Mr
Nagataka Matsui
(The University of Tokyo)
Dr
Tetsuro Mashimo
(The University of Tokyo)
Dr
Tomoaki Nakamura
(KEK)