Guiding principles
- performant. We focus on speed and memory efficiency. As a result, Rarr is among the most performant Zarr implementations. We track this with an extensive continuous benchmarking suite, and performance-critical steps are written in C.
- maintainable and extensible. Additional codecs can easily be supported because the codec code is fully decoupled from the rest of the codebase.
Scope
We aim for full support of the Zarr specification.
There is currently no clear decision process regarding support for Zarr extensions. Please reach out if you have a specific use case that relies on a Zarr extension.
Zarr version
Rarr is a “Zarr version 3 first” implementation.
Both version 2 and version 3 Zarr arrays are fully supported, but:
- the package API is modelled on the version 3 specification
- we backport some backward-compatible features from version 3 to version 2. For example, the dimension_names field is supported for both version 2 and version 3 arrays. The version 2 specification neither defines nor forbids this field, and we have chosen to support it in both versions for consistency.
- if we ever have to make a tradeoff (e.g., on performance) between version 2 and version 3, we will prioritise version 3.
Functional programming and API design
- Reading and writing Zarr arrays should be as easy as reading and writing .csv files. In other words, writing read_zarr_array("my_array.zarr") should be enough to read a Zarr array, and writing write_zarr_array(my_array, "my_array.zarr") should be enough to write a Zarr array. This has several consequences:
  - it is not necessary to explicitly manipulate custom objects for Zarr stores or groups. Passing a file path as a string should work out of the box.
  - when writing, we provide sensible defaults for:
    - data type (derived from storage.mode of the input array)
    - chunk size (TBD)
    - compression (Zstd with default compression level, for a good balance between speed and compression ratio)
    - dimension names (taken from the dimension names of the input array, if they exist)
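
Two of the defaults listed above, the data type and the dimension names, can be read directly off a base R array. The following is a minimal sketch in base R, not Rarr's implementation: the helper name derive_write_defaults and the structure it returns are hypothetical, and the mapping from an R storage mode to a Zarr data type is deliberately left out.

```r
# Hypothetical helper (an assumption for illustration, not part of Rarr's API):
# collects the information that can be read directly off an R array
# before writing it.
derive_write_defaults <- function(x) {
  stopifnot(is.array(x))
  list(
    # R storage mode, e.g. "integer", "double", "logical" or "character"
    storage_mode = storage.mode(x),
    # extent of each dimension
    shape = dim(x),
    # names(dimnames(x)) is NULL when the dimensions are not named
    dimension_names = names(dimnames(x))
  )
}

# An integer array with named dimensions
my_array <- array(
  1:24,
  dim = c(2, 3, 4),
  dimnames = list(
    row = c("r1", "r2"),
    col = c("c1", "c2", "c3"),
    slice = paste0("s", 1:4)
  )
)

defaults <- derive_write_defaults(my_array)
str(defaults)
```

For an array without named dimensions, dimension_names is NULL in this sketch, which matches the "if they exist" condition above: there is then nothing to carry over to the Zarr metadata.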