Background
High-performance data compression can alleviate capacity bottlenecks and even improve performance in GPU compute applications, by reducing memory bandwidth requirements. However, integrating compression throughout the entire data processing pipeline is a non-trivial tasks, especially in multi-GPU environments, and doing so manually might encode assumption that make it difficult to test other compression methods.
UMUGUC is a research project that aims to provide a simple-declarative API which allows a runtime system to transparently map accesses to compressed data representations, without requiring any compression-specific code in user programs.
Make it easy to work with end-to-end compressed (dense) data in multi-GPU programs.
Overview
UMUGUC will offer a simple declarative extension on top of existing buffer abstractions (used in GPU programming models such as SYCL) to allow users to specify that a buffer should be compressed, and to specify either the constraints on compression, or the compression method directly. The former allows the runtime to select the best compression method to achieve the desired quality on the current hardware, while the latter enable users to experiment with different compression methods.
The API is currently not finalized, but will look somewhat like this:
namespace c = celerity::compression;
auto range = celerity::range<2>(height, width);
using input_comp = celerity::compressed<c::lossy<12>>;
celerity::buffer image_input_buf(image_input.data(), range);
using tmp_comp = celerity::compressed<c::ndzip<2,16>>;
celerity::buffer image_tmp_buf(range);
The runtime system will then implement the necessary data movement and compression/decompression operations to ensure that the user program can work with the compressed data as if it were uncompressed and of the base type. Depending on the constraints imposed by factors such as the available local and global memory, and the access granularity afforded by the compression method, the runtime system will choose different implementation strategies, as outlined in the figure below for the decompression case:
Technology
UMUGUC is built on top of the Celerity runtime system, which provides a high-level programming model for distributed-memory clusters of GPUs. Celerity is built on top of the Khronos SYCL standard, and provides a data-parallel multi-GPU programming model with a single-source C++ interface. The UMUGUC project extends Celerity with the necessary abstractions to support end-to-end data compression.
Publications & Talks
- Talk:
A High-level API for End-to-end Data Compression in Multi-GPU Cluster Applications
Gabriel Mitterrutzner, Peter Thoman & Philipp Gschwandtner
HLPP'25: 18th International Symposium on High-level Parallel Programming and Applications (2025) - Conference Paper:
Achieving High-throughput Strided Data Movement Across GPUs
Peter Thoman & Philipp Gschwandtner
IWOCL '25: Proceedings of the 13th International Workshop on OpenCL and SYCL (2025) - Talk:
Celerity - SYCL-Based High-Productivity Development at HPC Scale
Peter Thoman
PASC '25: Platform for Advanced Scientific Computing (2025) - Journal Paper:
Celerity-RSim: Porting Light Propagation Simulation to Accelerator Clusters Using a High-Level API
Peter Thoman, Philipp Gschwandtner, Facundo Molina Heredina & Thomas Fahringer
International Journal of Parallel Programming (2025) - Conference Paper:
SimSYCL: A SYCL Implementation Targeting Development, Debugging, Simulation and Conformance
Peter Thoman, Fabian Knorr & Luigi Crisci
IWOCL '24: Proceedings of the 12th International Workshop on OpenCL and SYCL (2024)
Partners & Funding
UMGUC is funded by the Austrian Research Promotion Agency (FFG) under the BRIDGE programme.
The project partners are the University of Innsbruck, and AirborneHydroMapping GmbH.
UMUGUC is coordinated by Peter Thoman.