Description
As large artificial intelligence (AI) models are applied ever more widely in high energy physics (HEP) data analysis, both data access patterns and storage architectures are undergoing a major transformation. Traditional file systems face performance and scalability bottlenecks when handling unstructured data, high-concurrency access, and cross-site data sharing. Object storage, with its high scalability, cost efficiency, strong consistency, and native API access, is emerging as the new data infrastructure for AI computing platforms.
This report analyzes typical application scenarios of object storage in AI model training, inference, and data management within HEP experiments, and introduces JWanFS, a newly developed distributed object storage system that spans wide-area networks. Its applications in training dataset management and cross-center collaboration are examined in detail.
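For illustration, the sketch below shows how a training pipeline might stream a dataset directly from such a store through its native API instead of staging files into a POSIX mount. It assumes an S3-compatible endpoint, which this abstract does not confirm for JWanFS; the endpoint, bucket, credentials, and key prefix are all hypothetical placeholders.

```python
import boto3

# Hypothetical endpoint and bucket names, for illustration only.
JWANFS_ENDPOINT = "https://jwanfs.example.org"
BUCKET = "hep-training-data"

# Object storage is reached through its native API rather than a POSIX
# mount; boto3 works against any S3-compatible endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url=JWANFS_ENDPOINT,
    aws_access_key_id="...",        # credentials issued by the site
    aws_secret_access_key="...",
)

def iter_dataset(prefix: str):
    """Stream every object under a dataset prefix, e.g. one training epoch."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
            yield obj["Key"], body.read()

for key, payload in iter_dataset("datasets/sim-2024/"):
    print(key, len(payload))
```

Reading objects on demand like this reflects the access-pattern shift the report describes: training jobs pull samples concurrently over the API rather than depending on a shared file-system namespace.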
By comparing JWanFS with mainstream object storage systems, the report proposes key optimization strategies for deploying object storage in HEP data centers, including multi-level caching, wide-area data synchronization, and metadata acceleration mechanisms. The study demonstrates that AI- and science-oriented object storage systems like JWanFS can significantly improve data access efficiency and operational flexibility in AI computing platforms, providing a solid foundation for future large-scale scientific computing.
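As a rough illustration of the multi-level caching strategy, the following sketch layers a small in-memory LRU cache over a local disk tier in front of a remote object fetch. The class name, tier sizes, and cache directory are invented for this example; the actual caching design in JWanFS is not described in this abstract.

```python
import hashlib
import os
from collections import OrderedDict

class TieredCache:
    """Two-level read cache: an in-memory LRU backed by a local disk
    tier, placed in front of a slow remote read (e.g. a WAN object GET)."""

    def __init__(self, fetch, mem_items=256, disk_dir="/tmp/objcache"):
        self.fetch = fetch            # callable: key -> bytes (remote read)
        self.mem = OrderedDict()      # level 1: RAM, LRU-evicted
        self.mem_items = mem_items
        self.disk_dir = disk_dir      # level 2: local SSD or disk
        os.makedirs(disk_dir, exist_ok=True)

    def _disk_path(self, key):
        # Hash the key so arbitrary object names map to safe file names.
        return os.path.join(self.disk_dir,
                            hashlib.sha256(key.encode()).hexdigest())

    def get(self, key):
        # Level 1: in-memory LRU hit.
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        # Level 2: local disk hit.
        path = self._disk_path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
        else:
            # Miss on both levels: fetch remotely, then populate the disk tier.
            data = self.fetch(key)
            with open(path, "wb") as f:
                f.write(data)
        # Promote to RAM, evicting the least recently used entries.
        self.mem[key] = data
        while len(self.mem) > self.mem_items:
            self.mem.popitem(last=False)
        return data
```

A production cache would additionally need disk-tier eviction, concurrency control, and consistency checks against the remote store, all of which this sketch omits for brevity.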