matrixorigin/sirius

Sirius is a GPU-native SQL engine. It plugs into existing databases such as DuckDB through the standard Substrait query format, requiring no query rewrites or major system changes. Sirius currently supports DuckDB, with Doris support coming soon; other systems marked with * are on our roadmap. Built on NVIDIA CUDA-X libraries, including cuDF and the RAPIDS Memory Manager (RMM), Sirius delivers high-performance GPU-accelerated analytics.

Diagram

Performance

Running TPC-H on 1 TB of data, Sirius accelerates DuckDB by 5x on a DGX Station (GB300).

Performance

Supported OS/GPU/CUDA

  • Ubuntu >= 22.04
  • NVIDIA Volta™ or higher with compute capability 7.0+
  • CUDA >= 13.0 (requires NVIDIA driver >= 570)
  • We recommend building Sirius with at least 16 vCPUs for faster compilation.
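The minimums above can be checked before starting a long build. The following is a minimal preflight sketch, not a script shipped with Sirius: the function names (version_ge, req_warn, check_sirius_host) are hypothetical, and it assumes GNU coreutils for sort -V, plus nvidia-smi and nvcc on PATH for the GPU and CUDA checks.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of Sirius). It compares this host
# against the minimums listed under "Supported OS/GPU/CUDA".
# Assumes GNU coreutils (sort -V); the GPU/CUDA checks need nvidia-smi and nvcc.

MIN_UBUNTU=22.04 MIN_COMPUTE_CAP=7.0 MIN_CUDA=13.0 MIN_DRIVER=570 MIN_VCPUS=16

# version_ge A B: exit status 0 if dotted version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# req_warn WHAT FOUND WANTED: report one unmet requirement on stderr.
req_warn() { echo "WARN: $1 is '$2'; README asks for $3" >&2; }

check_sirius_host() {
  local status=0 os_id os_ver gpu cap drv cuda cpus
  # Ubuntu release, read in a subshell so os-release variables do not leak.
  os_id=$([ -r /etc/os-release ] && . /etc/os-release && echo "$ID")
  os_ver=$([ -r /etc/os-release ] && . /etc/os-release && echo "$VERSION_ID")
  if [ "$os_id" != ubuntu ] || ! version_ge "$os_ver" "$MIN_UBUNTU"; then
    req_warn "OS" "$os_id $os_ver" "Ubuntu >= $MIN_UBUNTU"; status=1
  fi
  # First GPU only; nvidia-smi prints a line such as "8.9, 570.86.15".
  if command -v nvidia-smi >/dev/null 2>&1; then
    gpu=$(nvidia-smi --query-gpu=compute_cap,driver_version --format=csv,noheader | head -n 1)
    cap=${gpu%%,*} drv=${gpu##*, }
    version_ge "$cap" "$MIN_COMPUTE_CAP" || { req_warn "compute capability" "$cap" ">= $MIN_COMPUTE_CAP"; status=1; }
    version_ge "$drv" "$MIN_DRIVER" || { req_warn "driver" "$drv" ">= $MIN_DRIVER"; status=1; }
  else
    req_warn "nvidia-smi" "missing" "an NVIDIA driver"; status=1
  fi
  # CUDA toolkit release, parsed from "... release 13.0, V13.0.xx".
  cuda=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
  version_ge "${cuda:-0}" "$MIN_CUDA" || { req_warn "CUDA" "${cuda:-missing}" ">= $MIN_CUDA"; status=1; }
  # vCPU count only affects build time, so it is advisory and never fails.
  cpus=$(nproc)
  [ "$cpus" -ge "$MIN_VCPUS" ] || req_warn "vCPU count" "$cpus" ">= $MIN_VCPUS (recommended)"
  return "$status"
}
```

Usage: source the file and run check_sirius_host; it prints one warning per unmet requirement and returns non-zero if a hard requirement is missed. Versions are compared with sort -V rather than as plain strings so that, for example, 10.0 ranks above 7.0.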

Installing Dependencies

  • Git (to clone the repo)
  • Pixi (see the official Pixi installation instructions)

Building and Running Sirius

Sirius provides two execution paths. Each has its own page describing how to build, run, and test it:

  • gpu_execution (recommended): out-of-core execution with tiered memory management (GPU/host/disk), automatic data partitioning, and spilling. Works with data in Parquet format.
  • gpu_processing: in-memory execution; the dataset must fit in GPU memory. Works with DuckDB's native storage format.
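The choice between the two paths comes down to the data format and whether the dataset fits in GPU memory. A minimal sketch of that decision in shell; the function name and the 80% headroom default are assumptions for illustration, not part of Sirius.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sirius) that encodes the two constraints
# stated above: gpu_execution reads Parquet and runs out-of-core, while
# gpu_processing reads DuckDB's native format and needs the data in GPU memory.
# HEADROOM_PCT (default 80) is an assumed safety margin for intermediate
# results, not a threshold documented by Sirius.

# suggest_execution_path FORMAT DATA_BYTES GPU_MEMORY_BYTES [HEADROOM_PCT]
#   FORMAT is "parquet" or "duckdb".
suggest_execution_path() {
  local format="$1" data="$2" gpu="$3" headroom="${4:-80}"
  case "$format" in
    parquet)
      # Out-of-core path: partitions and spills, so size is not a gate here.
      echo gpu_execution ;;
    duckdb)
      # In-memory path: compare the data against a fraction of device memory
      # using integer arithmetic (data * 100 <= gpu * headroom).
      if [ $((data * 100)) -le $((gpu * headroom)) ]; then
        echo gpu_processing
      else
        # Neither path applies as described; one option is exporting the
        # tables to Parquet (e.g. DuckDB's COPY ... TO ... (FORMAT parquet)).
        echo none
      fi ;;
    *)
      echo "unsupported format: $format" >&2
      return 2 ;;
  esac
}
```

For example, suggest_execution_path duckdb 10000000000 80000000000 prints gpu_processing (10 GB of data against 80 GB of GPU memory). Since gpu_execution is the recommended path, keeping a Parquet copy of the data is the simpler default.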

Logging

Sirius uses spdlog to log messages during query execution. The default log directory is log (relative to the current working directory), and the default log level is info.

The log directory and level can be set through environment variables before the extension is loaded:

export SIRIUS_LOG_DIR=/path/to/logs
export SIRIUS_LOG_LEVEL=trace
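How the two variables combine with the defaults above can be summarized in a small shell function. This is a sketch of the documented behavior, not code from Sirius; the function name is hypothetical, and the accepted level names are spdlog's standard ones (including its warn and err aliases), which Sirius is assumed, not confirmed, to accept.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sirius): prints the log directory and level
# that apply given SIRIUS_LOG_DIR / SIRIUS_LOG_LEVEL, falling back to the
# README defaults ("log" and "info") when a variable is unset or empty.

effective_sirius_log_config() {
  local dir="${SIRIUS_LOG_DIR:-log}" level="${SIRIUS_LOG_LEVEL:-info}"
  # spdlog level names; assumed, not confirmed, to match what Sirius accepts.
  case "$level" in
    trace|debug|info|warn|warning|err|error|critical|off) ;;
    *) echo "unknown log level: $level" >&2; return 1 ;;
  esac
  # A relative directory is taken relative to the current working directory.
  case "$dir" in
    /*) ;;
    *) dir="$PWD/$dir" ;;
  esac
  printf '%s %s\n' "$dir" "$level"
}
```

For example, with neither variable set and the shell in /tmp, it prints /tmp/log info.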

Both can also be configured at runtime with DuckDB's SET statement, which additionally exposes sirius_log_flush_seconds for the log flush interval in seconds:

SET sirius_log_dir = '/path/to/logs';
SET sirius_log_level = 'trace';
SET sirius_log_flush_seconds = 1;
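When the same logging settings are needed across several sessions, it can help to generate the SET statements from shell variables. A minimal sketch; the function name is hypothetical, and it only renders SQL text, so the statements still have to run in a DuckDB session where the Sirius extension is loaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sirius): renders the three runtime SET
# statements shown above. Single quotes in string values are doubled, which
# is how a quote is escaped inside a SQL string literal.

# sirius_log_settings_sql DIR LEVEL FLUSH_SECONDS
sirius_log_settings_sql() {
  local dir="$1" level="$2" flush="$3"
  # Reject a non-integer flush interval instead of emitting invalid SQL.
  case "$flush" in
    ''|*[!0-9]*) echo "flush seconds must be a non-negative integer" >&2; return 1 ;;
  esac
  dir=$(printf '%s' "$dir" | sed "s/'/''/g")
  level=$(printf '%s' "$level" | sed "s/'/''/g")
  printf "SET sirius_log_dir = '%s';\n" "$dir"
  printf "SET sirius_log_level = '%s';\n" "$level"
  printf "SET sirius_log_flush_seconds = %s;\n" "$flush"
}
```

For example, sirius_log_settings_sql /path/to/logs trace 1 prints exactly the three statements above, which could be saved to a file and run after the extension is loaded.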

Limitations

Sirius is under active development. Notable current limitations include:

  • Data Type Coverage: Sirius currently supports commonly used data types, including INTEGER, BIGINT, FLOAT, DOUBLE, VARCHAR, DATE, TIMESTAMP, and DECIMAL. We are actively working on support for additional data types, such as nested types.
  • Operator Coverage: Sirius currently supports FILTER, PROJECTION, JOIN (Hash/Nested Loop/Delim), GROUP-BY, ORDER-BY, AGGREGATION, TOP-N, LIMIT, and CTE. We are working on more advanced operators, such as WINDOW functions and ASOF JOIN.

For a full list of current limitations and ongoing work, please refer to our GitHub issues page. When a query hits one of these limitations, Sirius gracefully falls back to DuckDB's CPU execution.

Contributors and Partners

  • NVIDIA
  • UW-Madison
  • DuckDB Labs
  • VAST Data

Future Roadmap

Sirius is still under heavy development, and we are adding more features, such as disk spilling, multi-GPU and multi-node execution, more operators and data types, acceleration of more engines, and much more.

Sirius always welcomes new contributors! If you are interested, visit our website, email us, or join our Slack channel.

Let's kickstart the GPU era for data analytics!
