Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streams - keep track of first offset and timestamp per segment #817

Merged
merged 14 commits into from
Nov 13, 2024

Conversation

viktorerlingsson
Copy link
Member

@viktorerlingsson viktorerlingsson commented Oct 25, 2024

WHAT is this pull request doing?

Saves a hash with the first offset for each segment, greatly reducing the time (and memory used) to find a message by offset.
Does the same for timestamps.

Tests done by getting a message by a random offset 10 times in a stream with 10M messages.

main:

Getting 10 messages by random offsets...
Average time: 660.8 milliseconds
Maximum resident set size (kbytes): 824424

this branch:

Getting 10 messages by random offsets...
Average time: 52.9 milliseconds
Maximum resident set size (kbytes): 23692

HOW can this pull request be tested?

Existing specs in stream_queue_spec.cr should cover this pretty well.

@spuun
Copy link
Member

spuun commented Oct 25, 2024

Cool!
Maybe a sorted array could perform even better since that would enable binary search. Store tuples of {UInt32, Int64}.
Which makes me wonder... Will the keys in the Hash always be sorted? I.e. are they always inserted sorted?

@viktorerlingsson
Copy link
Member Author

Maybe a sorted array could perform even better since that would enable binary search. Store tuples of {UInt32, Int64}. Which makes me wonder... Will the keys in the Hash always be sorted? I.e. are they always inserted sorted?

Yeah, probably! I don't think it will make a huge difference, since the number of segments won't be super big. But can give it a try 👍

Yes, it should always be sorted. We only add larger offsets and delete the smallest.

Copy link
Member

@carlhoerberg carlhoerberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good feature!

Could theoretically build an index over timestamps too, right?

src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
src/lavinmq/queue/stream_queue_message_store.cr Outdated Show resolved Hide resolved
@viktorerlingsson
Copy link
Member Author

Good feature!

Could theoretically build an index over timestamps too, right?

Yes, should be possible to do pretty much the same way for timestamps I think. I'll give it a try!

@carlhoerberg carlhoerberg added this to the v2.1 milestone Nov 1, 2024
@viktorerlingsson viktorerlingsson force-pushed the stream-index-offset-per-segment branch from be026db to eb8bcb4 Compare November 4, 2024 15:38
@viktorerlingsson viktorerlingsson changed the title Streams - keep track of first offset per segment Streams - keep track of first offset and timestamp per segment Nov 4, 2024
@viktorerlingsson viktorerlingsson marked this pull request as ready for review November 12, 2024 15:35
@viktorerlingsson viktorerlingsson requested a review from a team as a code owner November 12, 2024 15:35
@viktorerlingsson viktorerlingsson force-pushed the stream-index-offset-per-segment branch from c9cf24a to a8883b1 Compare November 13, 2024 08:21
@viktorerlingsson viktorerlingsson merged commit 1b9678e into main Nov 13, 2024
22 of 25 checks passed
@viktorerlingsson viktorerlingsson deleted the stream-index-offset-per-segment branch November 13, 2024 08:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants