
feat: Use FxHasher in places where we don't need DDoS resistance #2342

Draft · wants to merge 7 commits into main
Conversation

larseggert
Collaborator

@larseggert larseggert commented Jan 10, 2025

I think this may be worthwhile. The cargo benches don't consistently show a benefit, but the loopback transfers on the bencher machine are faster. Without this PR:

Client Server CC    Pacing MTU  Mean ± σ     Min   Max
neqo   neqo   cubic on     1504 495.9 ± 96.2 426.6 712.7

but with it:

neqo   neqo   cubic on     1504 429.1 ± 9.6  415.4 442.6

(I'll see if I can improve CI so that we also see the differences to main for the table results.)
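For readers following along: the change replaces std's keyed SipHash default with rustc-hash's FxHasher where hash flooding isn't a concern. A minimal pure-std sketch of the pattern follows; the hasher below is a toy FNV-1a variant for illustration, not the actual FxHasher, and the names are invented.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Toy FNV-1a-style hasher: fast because it is unkeyed; that is also why it
// offers no DoS resistance (the same trade-off FxHasher makes).
#[derive(Default)]
struct FastHasher(u64);

impl Hasher for FastHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // FNV-1a: XOR the byte in, then multiply by the 64-bit FNV prime.
            self.0 = (self.0 ^ u64::from(b)).wrapping_mul(0x100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// Drop-in alias, analogous to rustc_hash::FxHashMap.
type FastHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FastHasher>>;

fn main() {
    let mut streams: FastHashMap<u64, &str> = FastHashMap::default();
    streams.insert(0, "control");
    streams.insert(4, "request");
    assert_eq!(streams.get(&4), Some(&"request"));
}
```

Because the hasher is selected via a type parameter, switching back to the DoS-resistant default is a one-line change at the alias.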

@larseggert larseggert marked this pull request as ready for review January 10, 2025 14:12

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.38%. Comparing base (0e41954) to head (bf98a60).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2342      +/-   ##
==========================================
- Coverage   95.39%   95.38%   -0.02%     
==========================================
  Files         115      115              
  Lines       36982    36982              
  Branches    36982    36982              
==========================================
- Hits        35280    35275       -5     
- Misses       1696     1701       +5     
  Partials        6        6              
Components Coverage Δ
neqo-common 97.17% <ø> (ø)
neqo-crypto 90.07% <ø> (ø)
neqo-http3 94.50% <100.00%> (ø)
neqo-qpack 96.28% <100.00%> (ø)
neqo-transport 96.22% <ø> (-0.03%) ⬇️
neqo-udp 94.70% <ø> (-0.59%) ⬇️


github-actions bot commented Jan 10, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to e660b0b.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server


github-actions bot commented Jan 10, 2025

Benchmark results

Performance differences relative to e660b0b.

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.756 µs 11.793 µs 11.836 µs]
       change: [-0.3060% +0.1678% +0.7606%] (p = 0.52 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) low severe
1 (1.00%) high mild
9 (9.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [2.8911 ms 2.9004 ms 2.9113 ms]
       change: [-0.6442% -0.1316% +0.3790%] (p = 0.62 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.622 µs 19.668 µs 19.722 µs]
       change: [-0.4100% +0.0652% +0.5843%] (p = 0.80 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low severe
14 (14.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [4.7011 ms 4.7125 ms 4.7253 ms]
       change: [-0.4212% -0.0443% +0.3600%] (p = 0.82 > 0.05)

Found 20 outliers among 100 measurements (20.00%)
5 (5.00%) high mild
15 (15.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [6.2018 µs 6.2278 µs 6.2610 µs]
       change: [-0.4002% +0.2808% +0.9056%] (p = 0.44 > 0.05)

Found 19 outliers among 100 measurements (19.00%)
3 (3.00%) high mild
16 (16.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [2.1052 ms 2.1121 ms 2.1191 ms]
       change: [-0.5187% -0.0585% +0.4027%] (p = 0.87 > 0.05)

Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high severe

1 streams of 1 bytes/multistream: No change in performance detected.
       time:   [67.527 µs 68.998 µs 70.915 µs]
       change: [-1.2105% +0.9712% +3.7338%] (p = 0.51 > 0.05)

Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe

1000 streams of 1 bytes/multistream: 💚 Performance has improved.
       time:   [23.737 ms 23.770 ms 23.803 ms]
       change: [-2.8996% -2.7076% -2.5183%] (p = 0.00 < 0.05)
10000 streams of 1 bytes/multistream: Change within noise threshold.
       time:   [1.6324 s 1.6339 s 1.6353 s]
       change: [-0.8622% -0.7205% -0.5844%] (p = 0.00 < 0.05)

Found 31 outliers among 100 measurements (31.00%)
11 (11.00%) low severe
9 (9.00%) low mild
7 (7.00%) high mild
4 (4.00%) high severe

1 streams of 1000 bytes/multistream: Change within noise threshold.
       time:   [68.102 µs 68.666 µs 69.690 µs]
       change: [-2.6944% -1.8302% -0.2751%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

100 streams of 1000 bytes/multistream: 💚 Performance has improved.
       time:   [3.1618 ms 3.1696 ms 3.1775 ms]
       change: [-2.8253% -2.5176% -2.1903%] (p = 0.00 < 0.05)
1000 streams of 1000 bytes/multistream: Change within noise threshold.
       time:   [137.38 ms 137.46 ms 137.53 ms]
       change: [-0.4528% -0.3793% -0.3079%] (p = 0.00 < 0.05)
coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [92.063 ns 92.367 ns 92.677 ns]
       change: [-0.4999% -0.1185% +0.2573%] (p = 0.54 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
10 (10.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [109.52 ns 109.81 ns 110.13 ns]
       change: [-0.7449% -0.3318% +0.0674%] (p = 0.11 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [109.66 ns 110.17 ns 110.77 ns]
       change: [-0.1270% +0.4081% +0.9619%] (p = 0.14 > 0.05)

Found 16 outliers among 100 measurements (16.00%)
7 (7.00%) low severe
1 (1.00%) high mild
8 (8.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [91.215 ns 96.824 ns 109.27 ns]
       change: [-1.0794% +1.8558% +6.6420%] (p = 0.53 > 0.05)

Found 16 outliers among 100 measurements (16.00%)
4 (4.00%) high mild
12 (12.00%) high severe

RxStreamOrderer::inbound_frame(): No change in performance detected.
       time:   [115.36 ms 115.41 ms 115.46 ms]
       change: [-0.0557% -0.0009% +0.0566%] (p = 0.97 > 0.05)

Found 21 outliers among 100 measurements (21.00%)
1 (1.00%) low severe
7 (7.00%) low mild
11 (11.00%) high mild
2 (2.00%) high severe

SentPackets::take_ranges: No change in performance detected.
       time:   [5.3297 µs 5.5149 µs 5.7137 µs]
       change: [-3.1167% +2.0582% +8.2112%] (p = 0.53 > 0.05)

Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [34.105 ms 34.169 ms 34.233 ms]
       change: [-3.3513% -3.1020% -2.8465%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

transfer/pacing-true/varying-seeds: Change within noise threshold.
       time:   [34.691 ms 34.739 ms 34.787 ms]
       change: [-1.4620% -1.2383% -1.0103%] (p = 0.00 < 0.05)
transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [34.716 ms 34.778 ms 34.840 ms]
       change: [-2.1681% -1.9396% -1.7114%] (p = 0.00 < 0.05)
transfer/pacing-true/same-seed: Change within noise threshold.
       time:   [35.005 ms 35.056 ms 35.108 ms]
       change: [-1.7862% -1.5911% -1.3892%] (p = 0.00 < 0.05)
1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold.
       time:   [2.1806 s 2.1880 s 2.1954 s]
       thrpt:  [45.551 MiB/s 45.704 MiB/s 45.858 MiB/s]
change:
       time:   [-1.4875% -1.0400% -0.6100%] (p = 0.00 < 0.05)
       thrpt:  [+0.6138% +1.0509% +1.5100%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: 💚 Performance has improved.
       time:   [379.27 ms 381.31 ms 383.37 ms]
       thrpt:  [26.084 Kelem/s 26.226 Kelem/s 26.366 Kelem/s]
change:
       time:   [-3.2163% -2.5284% -1.8269%] (p = 0.00 < 0.05)
       thrpt:  [+1.8609% +2.5940% +3.3231%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💔 Performance has regressed.
       time:   [28.336 ms 29.084 ms 29.837 ms]
       thrpt:  [33.515  elem/s 34.383  elem/s 35.291  elem/s]
change:
       time:   [+1.8457% +5.5653% +8.9860%] (p = 0.00 < 0.05)
       thrpt:  [-8.2451% -5.2719% -1.8122%]
1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: Change within noise threshold.
       time:   [3.1405 s 3.1635 s 3.1880 s]
       thrpt:  [31.368 MiB/s 31.611 MiB/s 31.842 MiB/s]
change:
       time:   [-2.0950% -1.1486% -0.1795%] (p = 0.02 < 0.05)
       thrpt:  [+0.1798% +1.1620% +2.1399%]

Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe

Client/server transfer results

Performance differences relative to e660b0b.

Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.

Client Server CC Pacing Mean ± σ Min Max Δ main Δ main
neqo neqo reno on 497.2 ± 65.9 451.1 728.7 💚 -63.1 -3.0%
neqo neqo reno 562.0 ± 180.7 445.8 1182.1 -19.2 -0.8%
neqo neqo cubic on 508.8 ± 66.8 450.6 751.3 💚 -39.9 -1.9%
neqo neqo cubic 503.4 ± 35.9 454.9 584.5 -15.1 -0.7%
google neqo reno on 902.4 ± 98.8 648.0 994.9 -11.3 -0.3%
google neqo reno 899.1 ± 96.9 655.8 1007.7 -6.9 -0.2%
google neqo cubic on 910.8 ± 96.9 668.7 1070.1 12.6 0.3%
google neqo cubic 899.5 ± 102.7 657.3 1095.5 4.6 0.1%
google google 576.7 ± 78.2 531.8 885.4 26.9 1.2%
neqo msquic reno on 231.9 ± 33.6 202.1 326.9 -15.8 -1.6%
neqo msquic reno 238.9 ± 55.6 197.3 436.3 1.2 0.1%
neqo msquic cubic on 244.2 ± 81.7 202.5 627.4 16.4 1.7%
neqo msquic cubic 225.2 ± 36.6 201.6 407.4 -1.9 -0.2%
msquic msquic 121.6 ± 31.1 98.1 262.7 -11.3 -2.2%


@larseggert larseggert marked this pull request as draft February 4, 2025 15:10
@larseggert larseggert marked this pull request as ready for review February 4, 2025 16:08
Collaborator

@mxinden mxinden left a comment


👍 in general.

That said, I would prefer only replacing std::collections::Hash* wherever it proves beneficial, e.g., not in unit tests.

Member

@martinthomson martinthomson left a comment


I've spotted two places where EnumMap could give us bigger wins.

I think that your performance gains largely derive from the changes to the client and server code. There, the security risk is limited (we're not using this server in real deployments).

Still, you should review the changes for security risk. This hasher could expose us to DoS if the hashed values are controlled by an adversary. I've checked the usage in our server code, which is fine because attackers don't get to control memory allocations (we use pointer values for the hash). Still, that makes me wonder whether we should be using Pin.
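For context on the EnumMap suggestion: with a small, closed key set, a plain array indexed by the enum discriminant replaces hashing entirely, which is essentially what the enum_map crate's EnumMap provides. A pure-std sketch of the idea; the Epoch type and its variants here are illustrative, not neqo's actual types.

```rust
// A fixed-size array indexed by enum discriminant: no hashing, no heap
// allocation, O(1) lookup by construction.
#[derive(Clone, Copy)]
enum Epoch {
    Initial = 0,
    Handshake = 1,
    Application = 2,
}

const NUM_EPOCHS: usize = 3;

struct EpochMap<V>([V; NUM_EPOCHS]);

impl<V> EpochMap<V> {
    fn get(&self, e: Epoch) -> &V {
        &self.0[e as usize]
    }

    fn get_mut(&mut self, e: Epoch) -> &mut V {
        &mut self.0[e as usize]
    }
}

fn main() {
    let mut acked = EpochMap([0u32; NUM_EPOCHS]);
    *acked.get_mut(Epoch::Handshake) += 1;
    assert_eq!(*acked.get(Epoch::Handshake), 1);
}
```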

@larseggert
Collaborator Author

larseggert commented Feb 6, 2025

@martinthomson thanks for the analysis. My plan is to add some benches first in another PR. I'll add some for the instances where you suggest looking into EnumMap as well.

Even if some of the macro benefits come from speeding up the demo client and server code, it's IMO still worth doing, since eliminating those overheads makes it easier to spot other bottlenecks.

About security, I didn't do much of an analysis, but I think the main use of this insecure hasher would be when looking up items (streams, unacked chunks) that, while under the control of an attacker, are also quite limited in what valid values are possible without immediately causing a connection closure.

@martinthomson
Member

I definitely agree with the point about removing the overheads from our toy code as much as possible. This seems like a pretty substantial win there, so it's worth doing. I doubt that my EnumMap suggestions will have a major impact, but the change did highlight the possibility (and it's not that much typing to switch over).

@larseggert
Collaborator Author

I think the EnumMap work should be factored out into another PR; it will cause a bunch of changes throughout.

@larseggert larseggert changed the title feat: Try FxHasher to see if it makes a difference feat: Use FxHasher in places where we don't need DDoS resistance Feb 10, 2025
@mxinden
Collaborator

mxinden commented Feb 12, 2025

I've checked the usage in our server code, which is fine because attackers don't get to control memory allocations (we use pointer values for the hash). Still, that makes me wonder whether we should be using Pin.

Good point. Though before we introduce the complexity of Pin, we might find a simple way around hashing the pointer values in the first place.

@martinthomson
Member

Though before we introduce the complexity of Pin, we might find a simple way around hashing the pointer values in the first place.

Definitely the right question to be asking. I think that it might be possible to use the first connection ID as a key for this sort of thing, but we don't tend to keep that around today, once we stop using it. Everything else -- as far as I know -- is ephemeral and therefore not suitable.
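The first-connection-ID idea could look roughly like the following. This is a hedged sketch under the assumption that the original destination connection ID is kept for the connection's lifetime; all names are invented for illustration and are not neqo's actual server structure.

```rust
use std::collections::HashMap;

// Key server-side connection state by the first (original destination)
// connection ID, which is stable for the connection's life, instead of the
// connection object's pointer value.
type Cid = Vec<u8>;

struct Connection {
    bytes_sent: u64,
}

fn main() {
    let mut connections: HashMap<Cid, Connection> = HashMap::new();
    let original_dcid: Cid = vec![0x1f, 0x8b, 0x2c, 0x4d];
    connections.insert(original_dcid.clone(), Connection { bytes_sent: 0 });

    // Later packets find the connection via the same stable key; no pointer
    // hashing, hence no need to Pin the connection in memory.
    if let Some(c) = connections.get_mut(&original_dcid) {
        c.bytes_sent += 1200;
    }
    assert_eq!(connections[&original_dcid].bytes_sent, 1200);
}
```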

@larseggert larseggert marked this pull request as draft February 15, 2025 15:34
@larseggert
Collaborator Author

larseggert commented Feb 15, 2025

I'm doing a benchmark in #2444 to quantify the benefits first. (It's not going well; there's a lot of variation run to run for some reason.)

@mxinden
Collaborator

mxinden commented Feb 17, 2025

a lot of variation run-to-run for some reason

That is counterintuitive to me, given that it uses test-fixtures and thus does no I/O via the OS. Let me know if you want me to look into it.

@larseggert
Collaborator Author

larseggert commented Feb 17, 2025

I wonder if it's the CPU scheduler and frequency control on my Mac. Bencher seems much more stable.

@mxinden
Collaborator

mxinden commented Feb 17, 2025

For what it is worth, here is #2444 on my machine:

➜  neqo-http3 git:(test-streams-bench) ✗ cargo bench --features bench

Benchmarking 1 streams of 1 bytes/multistream: Collecting 100 samples in estimated 5.1966 s (2400 
1 streams of 1 bytes/multistream
                        time:   [31.399 µs 31.555 µs 31.731 µs]
                        change: [-12.172% -10.263% -8.3468%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

Benchmarking 1000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 6.2030 s (40
1000 streams of 1 bytes/multistream
                        time:   [13.088 ms 13.117 ms 13.151 ms]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

Benchmarking 10000 streams of 1 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 87.1s, or reduce sample count to 10.
Benchmarking 10000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 87.111 s (1
10000 streams of 1 bytes/multistream
                        time:   [876.43 ms 882.16 ms 888.29 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Benchmarking 1 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.0982 s (22
1 streams of 1000 bytes/multistream
                        time:   [33.435 µs 33.884 µs 34.409 µs]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

Benchmarking 100 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.1397 s (
100 streams of 1000 bytes/multistream
                        time:   [1.5683 ms 1.5823 ms 1.5968 ms]

Benchmarking 1000 streams of 1000 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, or reduce sample count to 60.
Benchmarking 1000 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 7.2433 s 
1000 streams of 1000 bytes/multistream
                        time:   [66.837 ms 67.100 ms 67.391 ms]
Found 20 outliers among 100 measurements (20.00%)
  5 (5.00%) high mild
  15 (15.00%) high severe

➜  neqo-http3 git:(test-streams-bench) ✗ cat /proc/cpuinfo     
                               
model name      : AMD Ryzen 7 7840U w/ Radeon  780M Graphics

I don't see much deviation. Am I running the wrong version @larseggert?

@larseggert
Collaborator Author

Can you run it again and see if there are changes run to run? That is where I see random improvements or regressions.

@mxinden
Collaborator

mxinden commented Feb 18, 2025

Here are two more runs with vanilla #2444. No significant deviations. Note that I am not running your optimizations in this pull request.

➜  neqo-http3 git:(test-streams-bench) ✗ cargo bench --features bench
Benchmarking 1 streams of 1 bytes/multistream: Collecting 100 samples in estimated 5.0811 s 
1 streams of 1 bytes/multistream
                        time:   [31.514 µs 31.727 µs 32.013 µs]
                        change: [-0.3795% +0.5461% +1.6585%] (p = 0.28 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking 1000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 6.2106
1000 streams of 1 bytes/multistream
                        time:   [13.032 ms 13.066 ms 13.104 ms]
                        change: [-0.7614% -0.3884% -0.0397%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  9 (9.00%) high severe

Benchmarking 10000 streams of 1 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 87.3s, or reduce sample count to 10.
Benchmarking 10000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 87.26
10000 streams of 1 bytes/multistream
                        time:   [850.20 ms 852.13 ms 853.94 ms]
                        change: [-4.1112% -3.4050% -2.7470%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) low severe
  5 (5.00%) low mild
  3 (3.00%) high mild

Benchmarking 1 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.1258
1 streams of 1000 bytes/multistream
                        time:   [32.380 µs 32.615 µs 32.914 µs]
                        change: [-5.3650% -3.7472% -2.1786%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe

Benchmarking 100 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.16
100 streams of 1000 bytes/multistream
                        time:   [1.4970 ms 1.5041 ms 1.5121 ms]
                        change: [-5.9242% -4.9438% -3.9764%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking 1000 streams of 1000 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, or reduce sample count to 70.
Benchmarking 1000 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 6.9
1000 streams of 1000 bytes/multistream
                        time:   [66.039 ms 66.255 ms 66.489 ms]
                        change: [-1.7872% -1.2586% -0.7446%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 21 outliers among 100 measurements (21.00%)
  18 (18.00%) high mild
  3 (3.00%) high severe
➜  neqo-http3 git:(test-streams-bench) ✗ cargo bench --features bench
Benchmarking 1 streams of 1 bytes/multistream: Collecting 100 samples in estimated 5.0222 s 
1 streams of 1 bytes/multistream
                        time:   [31.196 µs 31.566 µs 32.008 µs]
                        change: [-1.9923% -0.5099% +1.0965%] (p = 0.52 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

Benchmarking 1000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 6.2479
1000 streams of 1 bytes/multistream
                        time:   [12.863 ms 12.919 ms 12.980 ms]
                        change: [-1.6309% -1.1270% -0.5695%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
  10 (10.00%) high mild
  9 (9.00%) high severe

Benchmarking 10000 streams of 1 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 87.5s, or reduce sample count to 10.
Benchmarking 10000 streams of 1 bytes/multistream: Collecting 100 samples in estimated 87.46
10000 streams of 1 bytes/multistream
                        time:   [862.91 ms 864.67 ms 866.53 ms]
                        change: [+1.1571% +1.4717% +1.7784%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking 1 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.1478
1 streams of 1000 bytes/multistream
                        time:   [32.551 µs 32.892 µs 33.283 µs]
                        change: [-0.5530% +0.8511% +2.2636%] (p = 0.24 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

Benchmarking 100 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 5.22
100 streams of 1000 bytes/multistream
                        time:   [1.5114 ms 1.5174 ms 1.5245 ms]
                        change: [+0.2271% +0.8880% +1.5818%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

Benchmarking 1000 streams of 1000 bytes/multistream: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.0s, or reduce sample count to 70.
Benchmarking 1000 streams of 1000 bytes/multistream: Collecting 100 samples in estimated 7.0
1000 streams of 1000 bytes/multistream
                        time:   [66.765 ms 66.997 ms 67.247 ms]
                        change: [+0.6277% +1.1200% +1.6225%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

@larseggert
Collaborator Author

larseggert commented Feb 19, 2025

Oh good. I think it is core pinning being awkward on macOS then.

BTW, I came across https://manuel.bernhardt.io/posts/2023-11-16-core-pinning/ today, and we should change the bencher accordingly.

@larseggert
Collaborator Author

I'm redoing this PR in stages, to check if the new bench actually shows any improvements. The first push changes only the existing (non-test) uses of HashMap and HashSet to FxHasher.

@larseggert
Collaborator Author

Hm. The benches all show small improvements, while the client/server tests all show small regressions...
