International Symposium on Grids & Clouds 2018 (ISGC 2018) in conjunction with Frontiers in Computational Drug Discovery (FCDD)

Name: International Symposium on Grids & Clouds 2018 (ISGC 2018) in conjunction with Frontiers in Computational Drug Discovery (FCDD)
Start: 2018-03-16T08:00:00+08:00
End: 2018-03-23T18:00:00+08:00
Location: Academia Sinica

Asia/Taipei timezone

Support

stella.shen@twgrid.org

What goes up must go down: A case study from RAL on the process of shrinking an existing storage service

20 Mar 2018, 12:00

30m

Conference Room 1, BHSS (Academia Sinica)

Oral Presentation

Big Data & Data Management

Data Management & Big Data Session

Speaker: Mr Rob Appleyard (STFC)

Much attention is paid to how new storage services are deployed into production and the challenges therein. Far less is paid to what happens when a storage service is approaching the end of its useful life. The challenges in rationalising and de-scoping a service that, while relatively old, is still critical to production work for both the UK WLCG Tier 1 and local facilities are not to be underestimated.

RAL has been running a disk and tape storage service based on CASTOR (CERN Advanced STORage) for over 10 years. CASTOR must cope with both the throughput requirements of supplying data to a large batch farm and the data integrity requirements of a long-term tape archive. A new storage service, called ‘Echo’, is now being deployed to replace the disk-only element of CASTOR, but we intend to continue supporting the CASTOR system for tape into the medium term. This, in turn, implies a downsizing and redesign of the CASTOR service to improve manageability and cost effectiveness.

We will give an outline of both Echo and CASTOR as background. This paper will discuss the project to downsize CASTOR and improve its manageability when running at a considerably smaller scale (we intend to go from around 140 storage nodes to around 20) and with considerably less available staff effort. This transformation must be achieved while running the service in 24/7 production and supporting the transition to the newer storage element. To achieve this goal, we intend to move the remaining management nodes onto a virtualised infrastructure and to improve resilience by allowing management functions to be performed by many different nodes concurrently (‘cattle’ as opposed to ‘pets’). We also intend to streamline the system by condensing the existing four CASTOR ‘stagers’ (databases that record the state of the disk pools) into a single one that supports all users.

Primary author: Mr Rob Appleyard (STFC)

Co-author: Dr George Patargias (STFC)

Slides

CASTORTalk_ISGC2018.pptx
