International Symposium on Grids & Clouds 2018 (ISGC 2018) in conjunction with Frontiers in Computational Drug Discovery (FCDD)

Name: International Symposium on Grids & Clouds 2018 (ISGC 2018) in conjunction with Frontiers in Computational Drug Discovery (FCDD)
Start: 2018-03-16T08:00:00+08:00
End: 2018-03-23T18:00:00+08:00
Location: Academia Sinica

Asia/Taipei timezone

Support: stella.shen@twgrid.org

Explore New Computing Environment for LHAASO offline data analysis

23 Mar 2018, 10:00

30m

Auditorium, BHSS (Academia Sinica)

Type: Oral Presentation
Track: Networking, Security, Infrastructure & Operations
Session: Networking, Security, Infrastructure & Operation Session

Speaker: Dr Qiulan Huang (Institute of High Energy Physics, Chinese Academy of Sciences)

Exploring new computing environments has become urgent in order to meet the challenges posed by the new generation of High Energy Physics (HEP) experiments. LHAASO (Large High Altitude Air Shower Observatory) is expected to be the most sensitive project for studying problems in Galactic cosmic ray physics, and it requires massive storage and computing power. Efficient parallel algorithms and frameworks and high I/O throughput are key to meeting the scalability and performance requirements of LHAASO offline data analysis. Although Hadoop has gained a lot of attention from the scientific community for its scalability and its parallel computing framework for large data sets, it is still difficult to run LHAASO data processing tasks directly on Hadoop.

In this paper we explore ways to build a new computing environment on Hadoop in which LHAASO jobs run transparently. In particular, we discuss a new mechanism that lets LHAASO software access data in HDFS randomly. HDFS is designed for streaming data access and supports only sequential write and append, so it cannot serve LHAASO jobs that need random access. The new mechanism allows Map/Reduce tasks to perform random reads and writes on the local file system of the data nodes instead of going through the Hadoop data streaming interface, which makes it possible to run HEP jobs on Hadoop. We have also developed several MapReduce models for LHAASO jobs such as CORSIKA simulation, ARGO detector simulation (Geant4) and MK2A data processing, and we wrap these models so that they are transparent to users. In addition, we provide real-time cluster monitoring covering cluster health and the numbers of running, finished and killed jobs. An accounting system is also included.

This work has been in production for LHAASO offline data analysis since September 2016, delivering about 40,000 CPU hours per month. The results show that the efficiency of I/O-intensive jobs can be improved by about 46%. Finally, we describe our ongoing work on data migration tools that move data between HDFS and other storage systems or tape.
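The difference between sequential-only access and the random access that LHAASO jobs need can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical fixed-size event record and uses an ordinary local file as a stand-in for data that a task reaches through the data node's local file system. All function names here are illustrative only.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: (event id: int32, energy: float64), little-endian.
RECORD = struct.Struct("<id")


def write_events(path, events):
    """Write records one after another, the pattern an append-only store supports."""
    with open(path, "wb") as f:
        for event_id, energy in events:
            f.write(RECORD.pack(event_id, energy))


def read_event(path, index):
    """Random read: seek straight to record `index` without scanning earlier records."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


def update_event(path, index, event_id, energy):
    """Random write: overwrite one record in place; needs a seekable, writable file."""
    with open(path, "r+b") as f:
        f.seek(index * RECORD.size)
        f.write(RECORD.pack(event_id, energy))


def demo():
    """Write five records, then read and rewrite record 3 in place."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "events.dat")
        write_events(path, [(i, i * 1.5) for i in range(5)])
        before = read_event(path, 3)
        update_event(path, 3, 3, 99.0)
        after = read_event(path, 3)
        neighbour = read_event(path, 4)  # untouched by the in-place update
        return before, after, neighbour
```

The in-place update in `update_event` is the operation that an interface limited to sequential write and append cannot express, which is why the abstract routes such access through the local file system.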

Primary author: Dr Qiulan Huang (Institute of High Energy Physics, Chinese Academy of Sciences)

Co-author: Prof. Gongxing Sun (Institute of High Energy Physics, Chinese Academy of Sciences)

Slides

The_New_Computing_Environment_for_LHAASO.pdf
