Fixes and verifies issue 42 (#44)
* Fixes and verifies issue 42

* Improving documentation.
lemire authored Aug 6, 2022
1 parent 18142e5 commit 17cb90c
Showing 5 changed files with 2,502 additions and 17 deletions.
2 changes: 1 addition & 1 deletion AUTHORS
@@ -3,4 +3,4 @@ Daniel Lemire
Kendall Willets
Alexander Gallego
@aqrit
Vladimir Kazanov
Vladimir Kazanov
8 changes: 5 additions & 3 deletions README.md
@@ -20,9 +20,9 @@ expects a Linux-like system. We have a CMake build.

# Users

This library is used by
This library is used by

* [UpscaleDB](https://github.com/cruppstahl/upscaledb),
* [UpscaleDB](https://github.com/cruppstahl/upscaledb),
* Redis' [RediSearch](https://github.com/RedisLabsModules/RediSearch),
* [Facebook Thrift](https://github.com/facebook/fbthrift),
* [Trinity Information Retrieval framework](https://github.com/phaistos-networks/Trinity).
@@ -73,8 +73,10 @@ size_t compsize = streamvbyte_delta_encode(datain, N, compressedbuffer,0); // en
streamvbyte_delta_decode(compressedbuffer, recovdata, N,0); // decoding (fast)
```
You have to know how many integers were coded when you decompress. You can store this
information along with the compressed stream.
information along with the compressed stream.

During decoding, the library may read up to `STREAMVBYTE_PADDING` extra bytes
from the input buffer (these bytes are read but never used).
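
One way to satisfy this requirement when a stream was stored at its exact compressed size is to copy it into a slightly larger buffer before decoding. The sketch below is illustrative only: `sketch_padded_copy` and `SKETCH_PADDING` are hypothetical names that are not part of the library, and the constant simply mirrors the value 16 that `include/streamvbyte.h` assigns to `STREAMVBYTE_PADDING`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors the value of STREAMVBYTE_PADDING in include/streamvbyte.h. */
#define SKETCH_PADDING 16

/*
 * Hypothetical helper (not part of the library): copy a compressed stream of
 * exactly nbytes into a fresh buffer followed by SKETCH_PADDING zeroed bytes,
 * so that a decoder reading slightly past the data stays inside the allocation.
 * Returns NULL on overflow or allocation failure; the caller frees the result.
 */
static uint8_t *sketch_padded_copy(const uint8_t *stream, size_t nbytes) {
    assert(stream != NULL || nbytes == 0);
    if (nbytes > SIZE_MAX - SKETCH_PADDING) {
        return NULL; /* nbytes + padding would overflow size_t */
    }
    uint8_t *buf = (uint8_t *)malloc(nbytes + SKETCH_PADDING);
    if (buf == NULL) {
        return NULL;
    }
    if (nbytes > 0) {
        memcpy(buf, stream, nbytes);
    }
    /* The padding bytes are read but never used, so any value works; zero them. */
    memset(buf + nbytes, 0, SKETCH_PADDING);
    return buf;
}
```

The returned buffer could then be passed to `streamvbyte_delta_decode` in place of the original stream. Buffers sized with `streamvbyte_max_compressedbytes` already include the padding and would not need this copy.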

Signed integers
-----------------
14 changes: 12 additions & 2 deletions include/streamvbyte.h
@@ -10,6 +10,8 @@
extern "C" {
#endif

#define STREAMVBYTE_PADDING 16

// Encode an array of a given length read from in to bout in varint format.
// Returns the number of bytes written.
// The number of values being stored (length) is not encoded in the compressed stream,
@@ -28,17 +30,21 @@ size_t streamvbyte_encode_0124(const uint32_t *in, uint32_t length, uint8_t *out
// return the maximum number of compressed bytes given length input integers
// in the worst case we overestimate data bytes required by four, see below
// for a function you can run upfront over your data to compute allocations
// It includes the STREAMVBYTE_PADDING bytes.
static inline size_t streamvbyte_max_compressedbytes(const uint32_t length) {
// number of control bytes:
size_t cb = (length + 3) / 4;
// maximum number of data bytes:
size_t db = (size_t) length * sizeof(uint32_t);
return cb + db;
return cb + db + STREAMVBYTE_PADDING;
}

// return the exact number of compressed bytes given length input integers
// runtime in O(n) wrt. in; use streamvbyte_max_compressedbytes if you
// care about speed more than potentially over-allocating memory
// Our decoding functions may read (but not use) STREAMVBYTE_PADDING extra bytes beyond
// the compressed data: the user needs to ensure that this region is allocated, and it
// is not included by streamvbyte_compressedbytes.
static inline size_t streamvbyte_compressedbytes(const uint32_t *in, uint32_t length) {
// number of control bytes:
size_t cb = (length + 3) / 4;
@@ -58,6 +64,9 @@ static inline size_t streamvbyte_compressedbytes(const uint32_t *in, uint32_t le
// return the exact number of compressed bytes given length input integers
// runtime in O(n) wrt. in; use streamvbyte_max_compressedbytes if you
// care about speed more than potentially over-allocating memory
// Our decoding functions may read (but not use) STREAMVBYTE_PADDING extra bytes beyond
// the compressed data: the user needs to ensure that this region is allocated, and it
// is not included by streamvbyte_compressedbytes_0124.
static inline size_t streamvbyte_compressedbytes_0124(const uint32_t *in, uint32_t length) {
// number of control bytes:
size_t cb = (length + 3) / 4;
@@ -76,7 +85,8 @@ static inline size_t streamvbyte_compressedbytes_0124(const uint32_t *in, uint32


// Read "length" 32-bit integers in varint format from in, storing the result in out.
// Returns the number of bytes read.
// Returns the number of bytes read. We may read up to STREAMVBYTE_PADDING extra bytes
// from the input buffer (these bytes are read but never used).
// The caller is responsible for knowing how many integers ("length") are to be read:
// this information ought to be stored somehow.
// There is no alignment requirement on the "in" pointer.
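
As a worked illustration of the bound that `streamvbyte_max_compressedbytes` returns after this change, the sketch below repeats the same arithmetic in a stand-alone function. `sketch_max_compressedbytes` and `SKETCH_PADDING` are hypothetical names introduced here; the constant is assumed to equal the value 16 given to `STREAMVBYTE_PADDING` in this header. Unlike the header, the sketch widens `length` to `size_t` before adding 3, which is a choice made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: same value as STREAMVBYTE_PADDING in include/streamvbyte.h. */
#define SKETCH_PADDING 16

/* Hypothetical stand-alone copy of the worst-case bound, for illustration only. */
static size_t sketch_max_compressedbytes(uint32_t length) {
    /* One control byte covers four integers; length is widened before adding 3. */
    size_t cb = ((size_t)length + 3) / 4;
    /* Each integer needs at most sizeof(uint32_t) data bytes. */
    size_t db = (size_t)length * sizeof(uint32_t);
    /* The padding covers the bytes a decoder may read past the data. */
    return cb + db + SKETCH_PADDING;
}
```

For 10 integers this gives 3 control bytes, 40 data bytes, and 16 padding bytes, 59 bytes in total. An empty input still reserves the 16 padding bytes.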
3 changes: 2 additions & 1 deletion include/streamvbytedelta.h
@@ -22,7 +22,8 @@ extern "C" {
size_t streamvbyte_delta_encode(const uint32_t *in, uint32_t length, uint8_t *out, uint32_t prev);

// Read "length" 32-bit integers in StreamVByte format from in, storing the result in out.
// Returns the number of bytes read.
// Returns the number of bytes read. We may read up to STREAMVBYTE_PADDING extra bytes
// from the input buffer (these bytes are read but never used).
// The caller is responsible for knowing how many integers ("length") are to be read:
// this information ought to be stored somehow.
// There is no alignment requirement on the "in" pointer.