Efficient binary SST file format #11

justinethier · 2022-02-04T21:10:39Z

At some point we need a more efficient binary SST file format.

justinethier · 2022-03-04T04:07:00Z

Design considerations:

Create multiple files for each SST
- Manifest contains header information, and possibly a flag indicating whether a file is scheduled for deletion
- Index file contains sparse set of keys and their location within the file
- Actual SST file contains entries
Header contains a sequence number and possibly a CRC. Anything else?
Entry contains key length, key contents, data length, data contents, deleted flag
- data can be set to 0 length as an optimization when it is deleted
All length and sequence number fields are defined as 64-bit integers. Or could the length fields be smaller?
Can use single byte for deleted flag
Instead of caching a whole SST file, we can cache just the range of the file that an index entry points to
- We need to load this anyway when searching for a key
- Allows us to cache a much smaller region of the file, may scale better
- Still need to time-out the cache. May want to have configurable caching behavior (criteria to cache, TTL, etc)
Want command-line tools for dealing with binary data.
- At a minimum, want a tool to convert from binary to a text format (JSON?) to inspect data
- If we are going to do that, it would be handy to have a converter from that text format back to binary, so changes can be made in a straightforward way
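
A minimal sketch of the entry layout listed above (key length, key contents, data length, data contents, deleted flag) together with the binary-to-JSON inspection step, written in Python for illustration. The field order, little-endian byte order, 64-bit lengths, UTF-8 keys and data, and the names `encode_entry`, `decode_entries`, and `dump_json` are all assumptions, not a settled format.

```python
import json
import struct

# Assumed layout per entry (little-endian):
#   key length (u64) | key bytes | data length (u64) | data bytes | deleted flag (u8)
_LEN = struct.Struct("<Q")
_FLAG = struct.Struct("<B")


def encode_entry(key: bytes, data: bytes, deleted: bool = False) -> bytes:
    """Serialize one entry; deleted entries are written with zero-length data."""
    if deleted:
        data = b""
    return b"".join(
        (_LEN.pack(len(key)), key, _LEN.pack(len(data)), data, _FLAG.pack(int(deleted)))
    )


def decode_entries(buf: bytes):
    """Yield (key, data, deleted) tuples from a buffer of concatenated entries."""
    pos = 0
    while pos < len(buf):
        (klen,) = _LEN.unpack_from(buf, pos)
        pos += _LEN.size
        key = buf[pos:pos + klen]
        pos += klen
        (dlen,) = _LEN.unpack_from(buf, pos)
        pos += _LEN.size
        data = buf[pos:pos + dlen]
        pos += dlen
        (flag,) = _FLAG.unpack_from(buf, pos)
        pos += _FLAG.size
        yield key, data, bool(flag)


def dump_json(buf: bytes) -> str:
    """Render entries as JSON text for inspection (assumes UTF-8 keys and data)."""
    rows = [
        {"key": k.decode("utf-8"), "data": d.decode("utf-8"), "deleted": deleted}
        for k, d, deleted in decode_entries(buf)
    ]
    return json.dumps(rows, indent=2)
```

With this layout an entry costs 17 bytes of overhead on top of the key and data, which is one reason to consider smaller length fields.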

justinethier · 2022-03-13T14:43:40Z

An additional optimization is to gzip every segment of the SST file, that is, the entries between one sparse index entry and the next. We can then read the whole block into memory, decompress it, and cache it. This saves I/O and space on disk.
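
A rough sketch of the block-compression idea, in Python for illustration. Each block is gzipped on its own, and a sparse index records the first key, offset, and compressed length of every block, so a lookup decompresses and caches only one block. The tab-separated text encoding inside a block, the block size, and the names `write_blocks` and `lookup` are placeholder assumptions; a real implementation would use the binary entry format and add cache expiry.

```python
import bisect
import gzip


def write_blocks(entries, block_size=2):
    """Group sorted (key, value) pairs into gzip-compressed blocks.

    Returns (file_bytes, index), where index is a list of
    (first_key, offset, length) tuples, one per block.
    """
    out = bytearray()
    index = []
    for i in range(0, len(entries), block_size):
        chunk = entries[i:i + block_size]
        # Placeholder encoding: one "key<TAB>value" line per entry.
        raw = "".join(f"{k}\t{v}\n" for k, v in chunk).encode("utf-8")
        block = gzip.compress(raw)
        index.append((chunk[0][0], len(out), len(block)))
        out += block
    return bytes(out), index


def lookup(file_bytes, index, key, cache):
    """Find key by decompressing only the block that could contain it."""
    first_keys = [first for first, _, _ in index]
    # Last block whose first key is <= the key being searched for.
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    _, offset, length = index[i]
    if offset not in cache:
        raw = gzip.decompress(file_bytes[offset:offset + length])
        lines = [line for line in raw.decode("utf-8").split("\n") if line]
        cache[offset] = dict(line.split("\t", 1) for line in lines)
    return cache[offset].get(key)
```

The cache here is a plain dictionary keyed by block offset; swapping it for a structure with a TTL or size limit would give the configurable caching behavior mentioned earlier.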

justinethier changed the title ~~Efficient SST file format~~ Efficient binary SST file format Mar 4, 2022
