Description
The rapid growth of the data available to scientists and scholars – in terms of Velocity and Variety as well as sheer Volume – is transforming research across disciplines. Increasingly, these data sets are generated not just through experiments, but as a byproduct of our day-to-day digital lives. This track explores the consequences of this growth and encourages submissions relating to two aspects in particular: first, the conceptual models and analytical techniques required to process data at scale; second, approaches and tools for creating and managing these digital assets throughout their lifecycle.
A further significant dimension is the automated generation and provisioning of metadata, whether from simulated data such as Digital Twins or from experiments that produce volumes of data far beyond manual annotation capacity. Automating metadata creation and making the resulting records available in searchable catalogues is crucial for aligning with the FAIR Data Principles, ensuring that data is findable and reusable. It is also pivotal in making data usable for machine-driven applications, notably in AI training scenarios.
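Purely as an illustration of the kind of automated metadata harvesting this implies (the file name, group layout, and attribute keys below are invented for the example and do not refer to any particular facility or catalogue), a minimal Python sketch might look as follows, producing a JSON record of the kind a searchable catalogue could ingest:

import json
import h5py
import numpy as np

# Create a small example file standing in for an instrument or simulation
# output. Every name and attribute key here is purely illustrative.
with h5py.File("run_0001.h5", "w") as f:
    dset = f.create_dataset("detector/image",
                            data=np.zeros((256, 256), dtype="float32"))
    dset.attrs["sample_id"] = "illustrative-sample-42"
    f.attrs["instrument"] = "example-beamline"
    f.attrs["start_time"] = "2025-04-15T08:00:00Z"

def harvest_metadata(path):
    # Walk the file and collect descriptive metadata automatically, so the
    # resulting record can be pushed to a catalogue without manual annotation.
    record = {"file": path, "datasets": []}
    with h5py.File(path, "r") as f:
        record.update({k: str(v) for k, v in f.attrs.items()})
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset):
                record["datasets"].append({
                    "name": name,
                    "shape": list(obj.shape),
                    "dtype": str(obj.dtype),
                    "attributes": {k: str(v) for k, v in obj.attrs.items()},
                })
        f.visititems(visit)
    return record

print(json.dumps(harvest_metadata("run_0001.h5"), indent=2))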
Data Management Planning within the EOSC CZ - Czech National Data Infrastructure for Research Data
Author: Jiří Marek, Open Science Manager at Masaryk University, Head of EOSC CZ Secretariat, Czech Republic
The rapid expansion of data availability is reshaping research methodologies across various disciplines. This surge, characterized by its Velocity, Variety, and Volume, is driven not...
Run2025 for sPHENIX brings higher data throughput and data volume requirements.
The sustained data throughput required for sPHENIX in 2025 is 20 GB/s. Once data taking starts in mid-April, this sustained data stream will remain constant, with no breaks, through December. The projected data volume is 200 PB.
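As a rough back-of-envelope illustration of what these figures imply (only the 20 GB/s rate and the 200 PB volume are taken from the requirements above; the decimal unit conversion and the full-duty-cycle assumption are simplifications for the example):

# Back-of-envelope check of the quoted Run2025 figures. Only the 20 GB/s
# sustained rate and the 200 PB projected volume come from the requirements;
# the decimal unit conversion (1 PB = 10^6 GB) and the assumption of running
# at the full sustained rate are illustrative simplifications.
sustained_rate_gb_per_s = 20
projected_volume_pb = 200

seconds_per_day = 86_400
pb_per_day = sustained_rate_gb_per_s * seconds_per_day / 1_000_000
days_at_full_rate = projected_volume_pb / pb_per_day

print(f"At {sustained_rate_gb_per_s} GB/s the experiment writes about "
      f"{pb_per_day:.2f} PB per day.")
print(f"The projected {projected_volume_pb} PB corresponds to roughly "
      f"{days_at_full_rate:.0f} days of data taking at the full sustained rate.")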
To meet these data throughput and volume requirements, we must rebuild...
CLUEstering is a versatile clustering library based on CLUE, a density-based, weighted clustering algorithm optimized for high-performance computing. The library offers a user-friendly Python interface and a C++ backend to maximize performance. CLUE’s parallel design is tailored to exploit modern hardware accelerators, enabling it to process large-scale datasets with exceptional scalability...
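By way of illustration only, and not as a reflection of CLUEstering's actual API, backend, or accelerated implementation, the self-contained sketch below shows the core ideas behind a CLUE-style density-based weighted clustering: weighted local densities, nearest-higher-density assignment, and seed/follower/outlier classification. All function and parameter names (clue_like_clustering, dc, rhoc, outlier_dc) and the threshold values are invented for the example.

import numpy as np

def clue_like_clustering(points, weights, dc=1.0, rhoc=2.0, outlier_dc=2.0):
    """Toy O(n^2) sketch of CLUE-style density-based weighted clustering.

    Parameter names and thresholds are illustrative only; this is not
    CLUEstering's interface and has none of its optimizations.
    """
    n = len(points)
    idx = np.arange(n)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # 1. Local density: weighted count of neighbours within dc (self included).
    rho = (weights[None, :] * (dist <= dc)).sum(axis=1)

    # 2. Distance to, and index of, the nearest point with higher density
    #    (ties broken by index so every non-maximum has a parent).
    delta = np.full(n, np.inf)
    nearest_higher = np.full(n, -1)
    for i in range(n):
        higher = np.where((rho > rho[i]) | ((rho == rho[i]) & (idx < i)))[0]
        if higher.size:
            j = higher[np.argmin(dist[i, higher])]
            delta[i], nearest_higher[i] = dist[i, j], j

    # 3. Seeds are dense points far from any denser point; outliers are
    #    sparse points far from any denser point.
    labels = np.full(n, -1)
    seeds = np.where((rho >= rhoc) & (delta > outlier_dc))[0]
    outliers = set(np.where((rho < rhoc) & (delta > outlier_dc))[0])
    for cluster_id, s in enumerate(seeds):
        labels[s] = cluster_id

    # 4. Followers inherit the label of their nearest higher-density
    #    neighbour, propagated from the densest points downwards.
    for i in np.argsort(-rho, kind="stable"):
        if labels[i] == -1 and i not in outliers and nearest_higher[i] != -1:
            labels[i] = labels[nearest_higher[i]]
    return labels

# Tiny usage example: two blobs of unit-weight points plus one stray point.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                 rng.normal(5.0, 0.3, (20, 2)),
                 [[10.0, 10.0]]])
print(clue_like_clustering(pts, np.ones(len(pts))))

The quadratic all-pairs distance computation above is kept only for readability; it is precisely the kind of step an optimized, accelerator-oriented implementation would organize differently.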
The 14 beamlines of phase I of the High Energy Photon Source (HEPS) will produce more than 300 PB of raw data per year. Efficiently storing, analyzing, and sharing this huge amount of data presents a significant challenge for HEPS.
The HEPS Computing and Communication System (HEPSCC), also called the HEPS Computing Center, is an essential work group responsible for IT R&D and services for the facility,...