Add more BTreeMap like apis to LiteMap #6068

arthurprs · 2025-02-05T00:21:29Z

Add BTreeMap-like APIs to LiteMap to get closer to the std lib BTreeMap, which has familiarity benefits and allows easier integration of LiteMap into existing projects. This is a follow-up to #5894.

Add lm_extend to the traits. The Vec implementation aims for O(N) performance in the best case and an asymptotic worst case of O(N log N), effectively avoiding quadratic worst cases. The end goal is to use this in the deserialization code path, which will come in a follow-up PR.
Implement Extend for LiteMap, based on lm_extend.
Prevent the quadratic worst case in the existing FromIterator implementation by using the newly added lm_extend.
Implement the Entry API. Even though try_get_or_insert exists, the Entry API is more familiar and performs well, as it may avoid a potentially wasted second binary search.
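
For readers less familiar with the pattern, here is a minimal sketch of what an Entry-style API buys, using the standard library's `BTreeMap` rather than LiteMap itself (the LiteMap signatures added by this PR are not reproduced here). The helper name `count_words` is hypothetical.

```rust
use std::collections::BTreeMap;

/// Counts word occurrences. Each word triggers a single map lookup:
/// `entry` locates the slot once and `or_insert` fills it only if vacant.
fn count_words<'a>(words: &[&'a str]) -> BTreeMap<&'a str, u32> {
    let mut counts = BTreeMap::new();
    for &word in words {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```

Without an entry handle, the same loop would typically do a `get` followed by an `insert`, searching for the key twice on a miss.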

robertbastian

Thanks, I've been missing these! A couple small comments

utils/litemap/src/map.rs

sffc · 2025-02-05T14:48:12Z

I'm all in favor of Entry APIs

I've pushed back for a long time on a generic extend function because it is a performance footgun; I think it is even worse than O(N^2) in the worst case; could be O(N^2 log N) (because there are N insertions, and each insertion is N log N, with log N for the lookup and N for the array insert).

FromIterator is O(N log N) which is the optimal complexity: it collects things into a Vec and then sorts the Vec, and it uses the same Vec for its eventual storage.

We have extend_from_litemap which does the more efficient O(N) thing since it knows the input is sorted, and I would support adding extend_from_btreemap since we can make a similar assumption there. But I lean toward not having a O(N^2 log N) generic extend function.
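
To make the concern concrete, here is a small sketch of one-at-a-time insertion into a sorted `Vec`, assuming plain `u32` keys. It only counts element moves, which is the term that grows quadratically; the helper names `insert_sorted` and `total_shifts` are hypothetical and this is not LiteMap's code.

```rust
/// Inserts `key` into a sorted, duplicate-free `Vec` and returns how many
/// existing elements had to shift right to make room.
fn insert_sorted(sorted: &mut Vec<u32>, key: u32) -> usize {
    match sorted.binary_search(&key) {
        Ok(_) => 0, // key already present: nothing moves
        Err(index) => {
            let shifted = sorted.len() - index;
            sorted.insert(index, key); // moves `shifted` elements
            shifted
        }
    }
}

/// Total element moves when inserting `keys` one by one into an empty `Vec`.
fn total_shifts(keys: impl IntoIterator<Item = u32>) -> usize {
    let mut sorted = Vec::new();
    keys.into_iter().map(|k| insert_sorted(&mut sorted, k)).sum()
}
```

Ascending input appends at the end and moves nothing, while descending input always inserts at index 0 and moves every existing element, for N(N-1)/2 moves in total.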

Manishearth · 2025-02-05T14:51:18Z

I agree around extend, it could be extend_sorted() which takes an ExactSizeIterator, perhaps, but that's about it.

sffc · 2025-02-05T15:11:32Z

If someone wants to use .extend(collection), they are better off doing .extend_from_litemap(collection.into_iter().collect()). There is an extra allocation, but the asymptotic performance is much faster.

If we did add .extend, I would want it to be implemented like that, I think.
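
The reason extending from an already-sorted collection can be linear is that two sorted sequences can be merged in one pass. A rough sketch on plain vectors of pairs follows; `merge_sorted` is a hypothetical helper, and letting the incoming value win on equal keys is an assumption of this sketch rather than a statement about `extend_from_litemap`.

```rust
use std::cmp::Ordering;

/// Merges two key-sorted, duplicate-free vectors in O(N + M).
/// On equal keys the value from `incoming` replaces the one from `base`.
fn merge_sorted<K: Ord, V>(base: Vec<(K, V)>, incoming: Vec<(K, V)>) -> Vec<(K, V)> {
    let mut merged = Vec::with_capacity(base.len() + incoming.len());
    let mut left = base.into_iter().peekable();
    let mut right = incoming.into_iter().peekable();
    loop {
        // Decide which side supplies the next element.
        let order = match (left.peek(), right.peek()) {
            (Some(l), Some(r)) => l.0.cmp(&r.0),
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => break,
        };
        match order {
            Ordering::Less => merged.extend(left.next()),
            Ordering::Greater => merged.extend(right.next()),
            Ordering::Equal => {
                left.next(); // drop the older value
                merged.extend(right.next());
            }
        }
    }
    merged
}
```

The extra allocation mentioned above corresponds to collecting and sorting the incoming items first so that this merge applies.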

arthurprs · 2025-02-05T21:19:14Z

I agree that extend is tricky, but I was hoping that we could add a sort + dedup-based version to curb the worst case while allowing good drop-in experience.

> I agree around extend, it could be extend_sorted() which takes an ExactSizeIterator, perhaps, but that's about it.

Let's add that, sounds good.

> FromIterator is O(N log N) which is the optimal complexity: it collects things into a Vec and then sorts the Vec, and it uses the same Vec for its eventual storage.

I'm afraid I could be missing something obvious, but I don't think there is a sort step right now and it's N² worst case.

sffc · 2025-02-05T23:28:39Z

I see, I thought FromIterator was N log N but it isn't currently implemented that way. Maybe it should be.

icu4x/utils/litemap/src/store/vec_impl.rs

Line 110 in eaaf49f

fn lm_sort_from_iter<I: IntoIterator<Item = (K, V)>>(iter: I) -> Self {

arthurprs · 2025-02-05T23:29:35Z

I fixed the quadratic behavior and implemented from_iter and extend as described in the first comment. A sorted fast-path with fallback to sort+dedup.

~~I benchmarked and I think it's a good direction. Let me know if you agree with this extend version otherwise I can take it out.~~

~~Edit: duplications can still lead to quadratic behavior, so I'm going to remove extend.~~

Edit-2: I figured it out. With some scratch space we can avoid all quadratic behavior. I think this is a good direction again.

Before

litemap/from_iter_rand/small
                        time:   [916.60 ns 941.77 ns 965.82 ns]
litemap/from_iter_rand/large
                        time:   [2.3779 s 2.3863 s 2.3946 s]
litemap/from_iter_sorted/small
                        time:   [84.010 ns 84.084 ns 84.155 ns]
litemap/from_iter_sorted/large
                        time:   [725.31 µs 739.88 µs 755.20 µs]

After

litemap/from_iter_rand/small
                        time:   [814.95 ns 830.50 ns 845.94 ns]
litemap/from_iter_rand/large
                        time:   [42.048 ms 43.138 ms 44.337 ms]
litemap/from_iter_sorted/small
                        time:   [76.707 ns 76.759 ns 76.808 ns]
litemap/from_iter_sorted/large
                        time:   [640.42 µs 652.98 µs 668.38 µs]
litemap/extend_rand/large
                        time:   [108.48 ms 110.88 ms 113.52 ms]
litemap/extend_rand_dups/large
                        time:   [173.80 ms 175.58 ms 177.54 ms]
litemap/extend_from_litemap_rand/large
                        time:   [2.0401 s 2.0488 s 2.0575 s]
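
As a rough illustration of the "sorted fast path with fallback to sort+dedup" idea, here is a simplified sketch on a plain `Vec` of pairs. It is not the code in this PR (which uses scratch space and tracks a sorted prefix); `extend_sorted_vec` is a hypothetical name, and "later values win" is an assumption. The deduplication relies on `Vec::dedup_by` being a single linear pass.

```rust
use std::mem;

/// Appends `items` to a key-sorted, duplicate-free `Vec` and restores both
/// invariants afterwards. On duplicate keys the most recent value is kept.
fn extend_sorted_vec<K: Ord, V>(vec: &mut Vec<(K, V)>, items: impl IntoIterator<Item = (K, V)>) {
    let mut in_order = true;
    for item in items {
        // Fast-path check: each new key must be strictly greater than the last.
        if let Some(last) = vec.last() {
            in_order &= last.0 < item.0;
        }
        vec.push(item);
    }
    if in_order {
        return; // still sorted and unique: O(N) overall
    }
    // Stable sort keeps equal keys in insertion order: O(N log N).
    vec.sort_by(|a, b| a.0.cmp(&b.0));
    // One linear pass. `newer` is removed when the closure returns true,
    // so swap first to retain the most recently inserted value.
    vec.dedup_by(|newer, kept| {
        let same_key = newer.0 == kept.0;
        if same_key {
            mem::swap(newer, kept);
        }
        same_key
    });
}
```

Removing duplicates one by one with `Vec::remove` would reintroduce quadratic behavior when many keys repeat, which is why the sketch deduplicates in a single pass.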

arthurprs · 2025-02-06T09:15:58Z

Once this PR is sorted, I'll open another PR to improve the serde deserialization and avoid the quadratic worst case whenever possible. As it stands, it's a potential attack vector.

sffc · 2025-02-12T13:58:51Z

utils/litemap/src/store/vec_impl.rs

+        } else {
+            (size_hint_lower + 1) / 2
+        };
+        self.reserve_exact(reserve);


Suggestion:

This function is a bit hard to follow, and I'm not convinced it is always correct, especially the things involving sorted_len.

We handle some special cases (such as the append-items already being in order) but not others (such as when they are all less than the target).

How about you just append everything to the end and have a single call to self.sort_by, and assume that self.sort_by will have a fast path if the things are in order? Easy to review, and the worst case is still N log N.

arthurprs · 2025-02-12

> How about you just append everything to the end and have a single call to self.sort_by, .... Easy to review, and the worst case is still N log N.

That was my thinking in 5bab562 too, but I realized that the required deduplication step (after sort) had quadratic behavior. That became clear when I added bench_extend_rand_dups.

> This function is a bit hard to follow, and I'm not convinced it is always correct, especially the things involving sorted_len.

I can try to improve clarity and maybe add an ASCII illustration as well.

Implement Extend and Entry apis for LiteMap

185ac79

arthurprs requested review from Manishearth and sffc as code owners February 5, 2025 00:21

robertbastian reviewed Feb 5, 2025


arthurprs changed the title ~~Add/change BTreeMap like apis to LiteMap~~ Add more BTreeMap like apis to LiteMap Feb 5, 2025

Tidy Entry implementation

db43549

Optimize from_iter and extend

5bab562

arthurprs requested a review from a team as a code owner February 5, 2025 23:28

arthurprs mentioned this pull request Feb 7, 2025

Avoid accidental quadratic behavior in LiteMap deserialization #6083

Draft

arthurprs added 2 commits February 8, 2025 19:20

Prevent quadratic cases due to duplications

a5c032a

fix clippy

c952202

sffc reviewed Feb 12, 2025


Update comments

2816e0d
