chore(docs): document that elements in ArrayBytes must be in C-contiguous order
LDeakin committed Jan 16, 2025
1 parent da00798 commit 2f5a665
Showing 2 changed files with 22 additions and 4 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- Document that elements in `ArrayBytes` must be in C-contiguous order

### Changed
- Use new language/library features added between Rust 1.78-1.82 (internal)
- Cleanup root docs and README removing ZEPs table and ecosystem table
23 changes: 19 additions & 4 deletions zarrs/src/array/array_bytes.rs
@@ -13,19 +13,30 @@ use crate::{
use super::{codec::CodecError, ravel_indices, ArraySize, DataType, FillValue};

/// Array element bytes.
///
/// These can represent:
/// - [`ArrayBytes::Fixed`]: fixed length elements of an array in C-contiguous order,
/// - [`ArrayBytes::Variable`]: variable length elements of an array in C-contiguous order with padding permitted,
/// - Encoded array bytes after an array to bytes or bytes to bytes codecs.
pub type RawBytes<'a> = Cow<'a, [u8]>;

/// Array element byte offsets.
pub type RawBytesOffsets<'a> = Cow<'a, [usize]>;
///
/// These must be monotonically increasing. See [`ArrayBytes::Variable`].
pub type RawBytesOffsets<'a> = Cow<'a, [usize]>; // FIXME: Switch to a validated newtype in zarrs 0.20

/// Fixed or variable length array bytes.
///
/// Offsets are [`None`] if bytes are composed of fixed size data types.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ArrayBytes<'a> {
/// Bytes for a fixed length array.
///
/// These represent elements in C-contiguous order (i.e. row-major order) where the last dimension varies the fastest.
Fixed(RawBytes<'a>),
/// Bytes and element byte offsets for a variable length array.
///
/// The bytes and offsets are modeled on the [Apache Arrow Variable-size Binary Layout](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout).
/// - The offsets buffer contains length + 1 ~~signed integers (either 32-bit or 64-bit, depending on the data type)~~ usize integers.
/// - Offsets must be monotonically increasing, that is `offsets[j+1] >= offsets[j]` for `0 <= j < length`, even for null slots. Thus, the bytes represent C-contiguous elements with padding permitted.
Variable(RawBytes<'a>, RawBytesOffsets<'a>),
}
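The C-contiguous requirement documented on `ArrayBytes::Fixed`, together with the Arrow-style offsets on `ArrayBytes::Variable`, determines where each element's bytes live. The following is a minimal sketch under stated assumptions: `ravel_c_order`, `fixed_element`, and `variable_element` are hypothetical helpers written for illustration (they are not zarrs APIs, and `ravel_c_order` is not the `ravel_indices` imported above), and they operate on plain slices rather than on `ArrayBytes` itself.

```rust
/// Hypothetical helper: linear index of `indices` within an array of `shape`
/// in C-contiguous (row-major) order, where the last dimension varies fastest.
/// Panics if the dimensionality differs or an index is out of bounds.
fn ravel_c_order(indices: &[u64], shape: &[u64]) -> u64 {
    assert_eq!(indices.len(), shape.len(), "dimensionality mismatch");
    let mut linear = 0u64;
    for (&i, &s) in indices.iter().zip(shape) {
        assert!(i < s, "index out of bounds");
        linear = linear * s + i;
    }
    linear
}

/// Hypothetical helper: bytes of one element in a fixed-length, C-contiguous buffer.
fn fixed_element<'b>(bytes: &'b [u8], element_size: usize, indices: &[u64], shape: &[u64]) -> &'b [u8] {
    let start = ravel_c_order(indices, shape) as usize * element_size;
    &bytes[start..start + element_size]
}

/// Hypothetical helper: bytes of element `i` in an Arrow-style variable-length buffer.
/// `offsets` holds length + 1 entries; element `i` spans `offsets[i]..offsets[i + 1]`.
fn variable_element<'b>(bytes: &'b [u8], offsets: &[usize], i: usize) -> &'b [u8] {
    &bytes[offsets[i]..offsets[i + 1]]
}

fn main() {
    // A 2x3 array of little-endian u16 in C-contiguous order: element [r, c] = 10 * r + c.
    let shape = [2u64, 3];
    let mut bytes = Vec::new();
    for r in 0..2u16 {
        for c in 0..3u16 {
            bytes.extend_from_slice(&(10 * r + c).to_le_bytes());
        }
    }
    let elem = fixed_element(&bytes, 2, &[1, 2], &shape);
    assert_eq!(u16::from_le_bytes([elem[0], elem[1]]), 12);

    // Three variable-length elements "ab", "", "cde": 4 offsets for 3 elements.
    let vbytes: &[u8] = b"abcde";
    let offsets = [0usize, 2, 2, 5];
    assert_eq!(variable_element(vbytes, &offsets, 0), b"ab");
    assert_eq!(variable_element(vbytes, &offsets, 1), b"");
    assert_eq!(variable_element(vbytes, &offsets, 2), b"cde");
}
```

In C-contiguous order, element `[1, 2]` of a 2x3 array sits at linear index `1 * 3 + 2 = 5`, so its bytes start at `5 * element_size`.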

@@ -39,18 +50,22 @@ pub enum ArrayBytesError {

impl<'a> ArrayBytes<'a> {
/// Create a new fixed length array bytes from `bytes`.
///
/// `bytes` must be C-contiguous.
pub fn new_flen(bytes: impl Into<RawBytes<'a>>) -> Self {
Self::Fixed(bytes.into())
}

/// Create a new variable length array bytes from `bytes` and `offsets`.
pub fn new_vlen(
bytes: impl Into<RawBytes<'a>>,
offsets: impl Into<RawBytesOffsets<'a>>,
offsets: impl Into<RawBytesOffsets<'a>>, // FIXME: TryInto
) -> Self {
Self::Variable(bytes.into(), offsets.into())
}

// TODO: new_vlen_unchecked

/// Create a new [`ArrayBytes`] with `num_elements` composed entirely of the `fill_value`.
///
/// # Panics
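The `// FIXME: TryInto` on `new_vlen` and the planned validated newtype for `RawBytesOffsets` point toward rejecting bad offsets at construction time. A minimal sketch of that idea, assuming hypothetical `ValidatedOffsets`, `OffsetsError`, and `new_vlen_checked` items (none of these exist in zarrs at this commit): it enforces the documented invariants (length + 1 entries, hence at least one; `offsets[j+1] >= offsets[j]`) and, as an extra assumption, rejects a final offset that points past the end of the bytes.

```rust
use std::borrow::Cow;
use std::convert::TryFrom;

/// Hypothetical error type for invalid variable-length offsets.
#[derive(Debug, PartialEq, Eq)]
enum OffsetsError {
    /// The offsets buffer holds length + 1 entries, so it can never be empty.
    Empty,
    /// `offsets[index + 1] < offsets[index]`, violating monotonicity.
    NotMonotonic { index: usize },
    /// Assumed extra check: the final offset points past the end of the bytes.
    OutOfBounds { last: usize, len: usize },
}

/// Hypothetical validated newtype: offsets known to be monotonically increasing.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ValidatedOffsets<'a>(Cow<'a, [usize]>);

impl<'a> TryFrom<Cow<'a, [usize]>> for ValidatedOffsets<'a> {
    type Error = OffsetsError;

    fn try_from(offsets: Cow<'a, [usize]>) -> Result<Self, Self::Error> {
        if offsets.is_empty() {
            return Err(OffsetsError::Empty);
        }
        // Find the first adjacent pair where offsets[j + 1] < offsets[j].
        if let Some(index) = offsets.windows(2).position(|w| w[1] < w[0]) {
            return Err(OffsetsError::NotMonotonic { index });
        }
        Ok(Self(offsets))
    }
}

impl ValidatedOffsets<'_> {
    /// Number of elements described: one fewer than the number of offsets.
    fn num_elements(&self) -> usize {
        self.0.len() - 1
    }
}

/// Hypothetical checked counterpart to `new_vlen`, returning the validated parts.
fn new_vlen_checked<'a>(
    bytes: Cow<'a, [u8]>,
    offsets: Cow<'a, [usize]>,
) -> Result<(Cow<'a, [u8]>, ValidatedOffsets<'a>), OffsetsError> {
    let offsets = ValidatedOffsets::try_from(offsets)?;
    // Unwrap cannot fail: validation guarantees at least one offset.
    let last = *offsets.0.last().unwrap();
    if last > bytes.len() {
        return Err(OffsetsError::OutOfBounds { last, len: bytes.len() });
    }
    Ok((bytes, offsets))
}

fn main() {
    let bytes: &[u8] = b"abcde";
    let good: &[usize] = &[0, 2, 2, 5];
    let (_, offsets) = new_vlen_checked(Cow::Borrowed(bytes), Cow::Borrowed(good)).unwrap();
    assert_eq!(offsets.num_elements(), 3);

    let not_monotonic: &[usize] = &[0, 3, 1];
    assert_eq!(
        ValidatedOffsets::try_from(Cow::Borrowed(not_monotonic)),
        Err(OffsetsError::NotMonotonic { index: 1 })
    );

    let too_long: &[usize] = &[0, 2, 6];
    assert_eq!(
        new_vlen_checked(Cow::Borrowed(bytes), Cow::Borrowed(too_long)).unwrap_err(),
        OffsetsError::OutOfBounds { last: 6, len: 5 }
    );
}
```

Validating once in `TryFrom` lets downstream code rely on the invariant without re-checking it; an unchecked constructor, as the `TODO: new_vlen_unchecked` hints, would presumably skip these checks for callers that already guarantee them.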
