qs2: a framework for efficient serialization
qs2 is the successor to the qs package and introduces two new
formats: qs2 and qdata. The goal is reliable, fast
performance when saving and loading objects in R.
The qs2 format directly uses R serialization (via the
R_Serialize/R_Unserialize C API) while improving underlying
compression and disk IO patterns. If you are familiar with the qs
package, the benefits and usage are the same.
qs_save(data, "myfile.qs2")
data <- qs_read("myfile.qs2")

Use the file extension qs2 to distinguish these files from those written by the
original qs package. The qs2 format is not compatible with the original qs format.
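As a minimal sketch of the save and read workflow described above, the following round trip writes a data frame to a temporary qs2 file and checks that it reads back unchanged. The data frame `df` and the choice of `compress_level = 5` are illustrative assumptions, not recommendations from the package.

```r
library(qs2)

# Hypothetical example data; any R object could be used here.
df <- data.frame(id = 1:1000, value = rnorm(1000))

# Use the qs2 extension, as recommended above.
path <- tempfile(fileext = ".qs2")

# compress_level is optional; 5 is an arbitrary value chosen for illustration.
qs_save(df, path, compress_level = 5)

# Read the object back and confirm the round trip preserved it.
df2 <- qs_read(path)
stopifnot(identical(df, df2))
```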
install.packages("qs2")

On x86-64 Mac or Linux, you can gain a little more performance by installing from source with the following configure flag:
remotes::install_cran("qs2", type = "source", configure.args = "--with-simd=AVX2")

Multi-threading in qs2 uses the Intel Threading Building Blocks (TBB)
framework via the RcppParallel package.
Because the qs2 format directly uses R serialization, you can convert
it to RDS and vice versa.
file_qs2 <- tempfile(fileext = ".qs2")
file_rds <- tempfile(fileext = ".RDS")
x <- runif(1e6)
# save `x` with qs_save
qs_save(x, file_qs2)
# convert the file to RDS
qs_to_rds(input_file = file_qs2, output_file = file_rds)
# read `x` back in with `readRDS`
xrds <- readRDS(file_rds)
stopifnot(identical(x, xrds))

The qs2 format stores an internal checksum. It can be used to test
for file corruption before deserialization via the validate_checksum
parameter, at a minor performance cost.
qs_save(data, "myfile.qs2")
data <- qs_read("myfile.qs2", validate_checksum = TRUE)

The package exposes the ZSTD compression library for both in-memory data and file workflows.

Use these functions when you already have raw vectors in memory and want direct control of compression.
x <- serialize(mtcars, connection = NULL)
xz <- zstd_compress_raw(x, compress_level = 3)
x2 <- zstd_decompress_raw(xz)
stopifnot(identical(x, x2))

These functions mirror typical file compression tools and keep the workflow simple when you want explicit input and output files.
infile <- tempfile()
writeBin(as.raw(1:5), infile)
zfile <- tempfile(fileext = ".zst")
zstd_compress_file(infile, zfile, compress_level = 1)
outfile <- tempfile()
zstd_decompress_file(zfile, outfile)
stopifnot(identical(readBin(infile, "raw", 5), readBin(outfile, "raw", 5)))

These generic wrappers substitute a zstd-compressed file for a normal file path, so you can add zstd compression support to existing functions for reading and writing data.
# library(data.table)
save_file <- tempfile(fileext = ".csv.zst")
# write out zstd compressed table
zstd_out(data.table::fwrite, mtcars, file = save_file)
# read in zstd compressed table
dt <- zstd_in(data.table::fread, file = save_file)

The package also introduces the qdata format, which has its own
serialization layout and works only with data types (vectors, lists,
data frames, matrices).
It will replace internal types (functions, promises, external pointers,
environments, objects) with NULL. The qdata format differs from the
qs2 format in that it is not general, but is more performant.
Please use qdata or qd as the file extension.
qd_save(data, "myfile.qdata")
data <- qd_read("myfile.qdata")

There is a use_alt_rep parameter that is intended to improve
performance by reading strings as ALTREP vectors.
In the upcoming CRAN release, qdata does not use ALTREP; support should be restored in the following release.
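To illustrate how qdata treats unsupported types, here is a small sketch that saves a list containing a function and reads it back. The object `obj` and its element names are hypothetical. `warn_unsupported_types = FALSE` is passed only to silence the warning that would otherwise be expected for the function element.

```r
library(qs2)

# Hypothetical list mixing supported data types with an unsupported one (a function).
obj <- list(values = c(1.5, 2.5), label = "a", fn = function(x) x + 1)

path <- tempfile(fileext = ".qdata")

# Suppress the unsupported-type warning for this illustration.
qd_save(obj, path, warn_unsupported_types = FALSE)
res <- qd_read(path)

# Data elements survive the round trip; the function does not.
stopifnot(identical(res$values, obj$values))
stopifnot(identical(res$label, obj$label))
stopifnot(is.null(res$fn))
```

If an object must keep functions or environments, the qs2 format is the appropriate choice, since it relies on R serialization.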
Serialization functions can be accessed in compiled code. Below is an example using Rcpp.
// [[Rcpp::depends(qs2)]]
#include <Rcpp.h>
#include "qs2_external.h"
using namespace Rcpp;
// [[Rcpp::export]]
SEXP test_qs_serialize(SEXP x) {
  // Round trip through an in-memory buffer using the qs2 format
  SEXP buffer = qs_serialize(x, 10, true, 4);
  return qs_deserialize(buffer, false, 4);
}
// [[Rcpp::export]]
SEXP test_qd_serialize(SEXP x) {
  // Round trip through an in-memory buffer using the qdata format
  SEXP buffer = qd_serialize(x, 10, true, true, 4);
  return qd_deserialize(buffer, false, false, 4);
}
// [[Rcpp::export]]
SEXP test_qs_save(SEXP x, const std::string& path) {
  // Round trip through a file using the qs2 format
  qs_save(x, path, 10, true, 4);
  return qs_read(path, false, 4);
}
// [[Rcpp::export]]
SEXP test_qd_save(SEXP x, const std::string& path) {
  // Round trip through a file using the qdata format
  qd_save(x, path, 10, true, true, 4);
  return qd_read(path, false, false, 4);
}
/*** R
x <- runif(1e7)
stopifnot(identical(test_qs_serialize(x), x))
stopifnot(identical(test_qd_serialize(x), x))
stopifnot(identical(test_qs_save(x, tempfile(fileext = ".qs2")), x))
stopifnot(identical(test_qd_save(x, tempfile(fileext = ".qd")), x))
*/

You can serialize and deserialize the qdata format outside the R API.
Functions for doing so are exported in qdata_cpp_external.h.
You can also compile the sources in inst/include/qdata-cpp independently and
include them in a standalone C++ project.
// [[Rcpp::depends(qs2)]]
#include <Rcpp.h>
#include "qdata_cpp_external.h"
// [[Rcpp::export]]
Rcpp::IntegerVector qdata_ext_roundtrip() {
  std::vector<std::int32_t> x{1, 2, 3, 4};
  auto bytes = qdata_ext::serialize(x);
  qdata_ext::object out = qdata_ext::deserialize(bytes);
  const auto& ints = qdata_ext::get<qdata_ext::integer_vector>(out).values;
  return Rcpp::IntegerVector(ints.begin(), ints.end());
}
// [[Rcpp::export]]
Rcpp::IntegerVector qdata_ext_file_roundtrip(const std::string& path) {
  std::vector<std::int32_t> x{1, 2, 3, 4};
  qdata_ext::save(path, x);
  qdata_ext::object out = qdata_ext::read(path);
  const auto& ints = qdata_ext::get<qdata_ext::integer_vector>(out).values;
  return Rcpp::IntegerVector(ints.begin(), ints.end());
}
/*** R
stopifnot(identical(qdata_ext_roundtrip(), 1:4))
stopifnot(identical(qdata_ext_file_roundtrip(tempfile(fileext = ".qdata")), 1:4))
*/

The following global options control the behavior of the qs2
functions. They can be queried or modified using the qopt
function.
- compress_level
  The default compression level used when compressing data.
  Default: 3L
- shuffle
  A logical flag indicating whether to allow byte shuffling during compression.
  Default: TRUE
- nthreads
  The number of threads used for compression and decompression.
  Default: 1L
- validate_checksum
  A logical flag indicating whether to validate the stored checksum when reading data.
  Default: FALSE
- warn_unsupported_types
  For qd_save, a logical flag indicating whether to warn when saving an object with unsupported types.
  Default: TRUE
- use_alt_rep
  For qd_read and qd_deserialize, a logical flag requesting ALTREP string reads. This option is temporarily disabled; if TRUE, qs2 warns and falls back to ordinary character vectors.
  Default: FALSE
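
The options listed above might be queried and changed as in the sketch below. It assumes that qopt takes the option name as its first argument and the new setting through a `value` argument, and that calling it with only the name returns the current setting; check the package documentation for the exact signature. The level 5L is an arbitrary illustrative value.

```r
library(qs2)

# Query the current default (assumes qopt(name) returns the current setting).
old_level <- qopt("compress_level")

# Change the default (assumes the new setting is passed via `value`).
qopt("compress_level", value = 5L)
stopifnot(qopt("compress_level") == 5L)

# Calls that do not pass compress_level explicitly now use the new default.
path <- tempfile(fileext = ".qs2")
qs_save(mtcars, path)
stopifnot(identical(qs_read(path), mtcars))

# Restore the previous setting so later code is unaffected.
qopt("compress_level", value = old_level)
```

Restoring the previous value at the end keeps the change local to the example, which matters because these options are global to the session.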