-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more BTreeMap like apis to LiteMap #6068
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've been missing these! A couple small comments
I'm all in favor of I've pushed back for a long time on a generic
We have |
I agree around |
If someone wants to use If we did add |
I agree that extend is tricky, but I was hoping that we could add a sort + dedup-based version to curb the worst case while allowing good drop-in experience.
Let's add that, sounds good.
I'm afraid I could be missing something obvious, but I don't think there is a sort step right now and it's N² worst case. |
I see, I thought FromIterator was icu4x/utils/litemap/src/store/vec_impl.rs Line 110 in eaaf49f
|
I fixed the quadratic behavior and implemented
Edit-2: I figured out. With some scratch space we can avoid all quadratic behavior. I think this is a good direction again. Before
After
|
Once this PR is sorted, I'll open another PR to improve the serde deserialization and avoid quadratic worst case whenever possible. As it stands it's a potential attack vector. |
} else { | ||
(size_hint_lower + 1) / 2 | ||
}; | ||
self.reserve_exact(reserve); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
This function is a bit hard to follow, and I'm not convinced it is always correct, especially the things involving sorted_len
.
We handle some special cases (such as the append-items already being in order) but not others (such as when they are all less than the target).
How about you just append everything to the end and have a single call to self.sort_by
, and assume that self.sort_by
will have a fast path if the things are in order. Easy to review, and the worst case is still N log N
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about you just append everything to the end and have a single call to self.sort_by, .... Easy to review, and the worst case is still N log N.
That was my thinking in 5bab562 too, but I realized that the required deduplication step (after sort) had quadratic behavior. That became clear when I added bench_extend_rand_dups
.
This function is a bit hard to follow, and I'm not convinced it is always correct, especially the things involving sorted_len.
I can try to improve clarity and maybe an ASCII illustration as well.
Add BTreeMap-like APIs to LiteMap to get closer to the std lib BTreeMap, which has familiarity benefits and allows easier integration of LiteMap into existing projects. This is a follow up of #5894
lm_extend
to the traits. The vec implementation attempts have O(N) performance on the best case and an asymptotic worst case of O(NLog(N)), effectively avoiding quadratic worst cases. The end goal is to use this in the deserialization code path, which will also come in a follow-up PR.Extend
forLiteMap
. This is based onlm_extend
.lm_extend
try_get_or_insert
exists, the Entry API is a more familiar API with good performance, as it may avoid a potentially wasted second binary search.