13-18 March 2016
Academia Sinica
Asia/Taipei timezone

4th system upgrade of Tokyo Tier2 center

18 Mar 2016, 09:20
20m
BHSS, Conf. Room 2 (Academia Sinica)

Oral Presentation | Physics (including HEP) and Engineering Applications | Physics & Engineering Session

Speaker

Dr Tomoaki Nakamura (KEK)

Description

The Tokyo Tier2 center, located at the International Center for Elementary Particle Physics (ICEPP) of the University of Tokyo in Japan, was established as a regional analysis center for the ATLAS experiment. Official operation within the Worldwide LHC Computing Grid (WLCG) started in 2007, after several years of development beginning in 2002. In December 2015, we replaced most of the hardware as the fourth system upgrade, to meet the requirements of the ATLAS experiment in LHC Run 2. The total number of CPU cores, including the CPUs for service instances, is unchanged from the previous system (9984 cores), but the performance of an individual CPU core (Intel Xeon E5-2680 v3, 2.50 GHz) is improved by 5% according to the HEPSPEC06 benchmark. Since every worker node is configured with 24 physical CPU cores, we deploy 416 blade servers in total. They are connected to 10.56 PB of disk storage through a 10 Gbps internal network backbone built on two central network switches (NetIron MLXe-32, Brocade Communications Systems, Inc.). The disk storage system consists of 80 RAID6 disk arrays (Infortrend DS 3024G000F8C16D00), served by an equal number of 1U file servers (Dell PowerEdge R630) with 8G-FC connections. Of the total computing resources in the fourth system, 3840 CPU cores and 7.392 PB of storage capacity are reserved for the WLCG worker nodes and the ATLAS disk storage area, respectively, for the upcoming three years. The remaining resources are dedicated to the Japanese collaborators. Since most data analysis jobs are I/O bound, we assign 10 Gbps of internal network bandwidth per two worker nodes to make effective use of this number of CPU cores. GPFS has been introduced for the non-grid resources, while the Disk Pool Manager (DPM) continues to be used for WLCG, as in the third system. In the third system we already had 3.168 PB of ATLAS data in the DPM storage. All of those data were first migrated to temporary storage so that Grid jobs could keep using the data stored at the Tokyo Tier2 with a reduced number of worker nodes during the migration period. In this talk, we introduce the procedure of the full-scale system upgrade, the resulting performance improvements, and future perspectives based on the experience at the Tokyo Tier2 center.
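
As a back-of-the-envelope sketch (not part of the original abstract), the quoted figures can be combined as follows; the per-array capacity and per-core bandwidth below are simple averages derived from the numbers above, not values stated by the authors:

    # Rough consistency check of the resource figures quoted in the description.
    cores_per_node = 24          # physical CPU cores per blade server
    num_nodes = 416              # blade servers deployed in the fourth system
    total_cores = cores_per_node * num_nodes
    print(f"Total CPU cores: {total_cores}")  # 9984, matching the previous system

    total_disk_pb = 10.56        # total disk capacity (PB)
    num_arrays = 80              # RAID6 disk arrays
    print(f"Average capacity per array: {total_disk_pb / num_arrays * 1000:.0f} TB")  # ~132 TB

    # Internal network: 10 Gbps shared by every pair of worker nodes.
    gbps_per_pair = 10
    print(f"Average bandwidth per core: {gbps_per_pair / (2 * cores_per_node):.2f} Gbps")  # ~0.21 Gbps

    # Share of resources reserved for WLCG over the upcoming three years.
    wlcg_cores, wlcg_disk_pb = 3840, 7.392
    print(f"WLCG share of cores: {wlcg_cores / total_cores:.0%}")      # ~38%
    print(f"WLCG share of disk:  {wlcg_disk_pb / total_disk_pb:.0%}")  # 70%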

Primary author

Co-authors

Prof. Hiroshi Sakamoto (The University of Tokyo), Prof. Nagataka Matsui (The University of Tokyo), Prof. Tetsuro Mashimo (The University of Tokyo)

Presentation materials