feat: Use FxHasher in places where we don't need DDoS resistance (#2342)
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2342      +/-   ##
==========================================
- Coverage   95.39%   95.38%   -0.02%
==========================================
  Files         115      115
  Lines       36982    36982
  Branches    36982    36982
==========================================
- Hits        35280    35275       -5
- Misses       1696     1701       +5
  Partials        6        6
Failed Interop Tests: QUIC Interop Runner, client vs. server, differences relative to e660b0b.
- neqo-latest as client
- neqo-latest as server

Succeeded Interop Tests: QUIC Interop Runner, client vs. server.
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests: QUIC Interop Runner, client vs. server.
- neqo-latest as client
- neqo-latest as server
Benchmark results: Performance differences relative to e660b0b.

- decode 4096 bytes, mask ff: No change in performance detected. time: [11.756 µs 11.793 µs 11.836 µs]; change: [-0.3060% +0.1678% +0.7606%] (p = 0.52 > 0.05)
- decode 1048576 bytes, mask ff: No change in performance detected. time: [2.8911 ms 2.9004 ms 2.9113 ms]; change: [-0.6442% -0.1316% +0.3790%] (p = 0.62 > 0.05)
- decode 4096 bytes, mask 7f: No change in performance detected. time: [19.622 µs 19.668 µs 19.722 µs]; change: [-0.4100% +0.0652% +0.5843%] (p = 0.80 > 0.05)
- decode 1048576 bytes, mask 7f: No change in performance detected. time: [4.7011 ms 4.7125 ms 4.7253 ms]; change: [-0.4212% -0.0443% +0.3600%] (p = 0.82 > 0.05)
- decode 4096 bytes, mask 3f: No change in performance detected. time: [6.2018 µs 6.2278 µs 6.2610 µs]; change: [-0.4002% +0.2808% +0.9056%] (p = 0.44 > 0.05)
- decode 1048576 bytes, mask 3f: No change in performance detected. time: [2.1052 ms 2.1121 ms 2.1191 ms]; change: [-0.5187% -0.0585% +0.4027%] (p = 0.87 > 0.05)
- 1 streams of 1 bytes/multistream: No change in performance detected. time: [67.527 µs 68.998 µs 70.915 µs]; change: [-1.2105% +0.9712% +3.7338%] (p = 0.51 > 0.05)
- 1000 streams of 1 bytes/multistream: 💚 Performance has improved. time: [23.737 ms 23.770 ms 23.803 ms]; change: [-2.8996% -2.7076% -2.5183%] (p = 0.00 < 0.05)
- 10000 streams of 1 bytes/multistream: Change within noise threshold. time: [1.6324 s 1.6339 s 1.6353 s]; change: [-0.8622% -0.7205% -0.5844%] (p = 0.00 < 0.05)
- 1 streams of 1000 bytes/multistream: Change within noise threshold. time: [68.102 µs 68.666 µs 69.690 µs]; change: [-2.6944% -1.8302% -0.2751%] (p = 0.00 < 0.05)
- 100 streams of 1000 bytes/multistream: 💚 Performance has improved. time: [3.1618 ms 3.1696 ms 3.1775 ms]; change: [-2.8253% -2.5176% -2.1903%] (p = 0.00 < 0.05)
- 1000 streams of 1000 bytes/multistream: Change within noise threshold. time: [137.38 ms 137.46 ms 137.53 ms]; change: [-0.4528% -0.3793% -0.3079%] (p = 0.00 < 0.05)
- coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [92.063 ns 92.367 ns 92.677 ns]; change: [-0.4999% -0.1185% +0.2573%] (p = 0.54 > 0.05)
- coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [109.52 ns 109.81 ns 110.13 ns]; change: [-0.7449% -0.3318% +0.0674%] (p = 0.11 > 0.05)
- coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [109.66 ns 110.17 ns 110.77 ns]; change: [-0.1270% +0.4081% +0.9619%] (p = 0.14 > 0.05)
- coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [91.215 ns 96.824 ns 109.27 ns]; change: [-1.0794% +1.8558% +6.6420%] (p = 0.53 > 0.05)
- RxStreamOrderer::inbound_frame(): No change in performance detected. time: [115.36 ms 115.41 ms 115.46 ms]; change: [-0.0557% -0.0009% +0.0566%] (p = 0.97 > 0.05)
- SentPackets::take_ranges: No change in performance detected. time: [5.3297 µs 5.5149 µs 5.7137 µs]; change: [-3.1167% +2.0582% +8.2112%] (p = 0.53 > 0.05)
- transfer/pacing-false/varying-seeds: Change within noise threshold. time: [34.105 ms 34.169 ms 34.233 ms]; change: [-3.3513% -3.1020% -2.8465%] (p = 0.00 < 0.05)
- transfer/pacing-true/varying-seeds: Change within noise threshold. time: [34.691 ms 34.739 ms 34.787 ms]; change: [-1.4620% -1.2383% -1.0103%] (p = 0.00 < 0.05)
- transfer/pacing-false/same-seed: Change within noise threshold. time: [34.716 ms 34.778 ms 34.840 ms]; change: [-2.1681% -1.9396% -1.7114%] (p = 0.00 < 0.05)
- transfer/pacing-true/same-seed: Change within noise threshold. time: [35.005 ms 35.056 ms 35.108 ms]; change: [-1.7862% -1.5911% -1.3892%] (p = 0.00 < 0.05)
- 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: Change within noise threshold. time: [2.1806 s 2.1880 s 2.1954 s]; thrpt: [45.551 MiB/s 45.704 MiB/s 45.858 MiB/s]; change: time: [-1.4875% -1.0400% -0.6100%] (p = 0.00 < 0.05); thrpt: [+0.6138% +1.0509% +1.5100%]
- 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: 💚 Performance has improved. time: [379.27 ms 381.31 ms 383.37 ms]; thrpt: [26.084 Kelem/s 26.226 Kelem/s 26.366 Kelem/s]; change: time: [-3.2163% -2.5284% -1.8269%] (p = 0.00 < 0.05); thrpt: [+1.8609% +2.5940% +3.3231%]
- 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💔 Performance has regressed. time: [28.336 ms 29.084 ms 29.837 ms]; thrpt: [33.515 elem/s 34.383 elem/s 35.291 elem/s]; change: time: [+1.8457% +5.5653% +8.9860%] (p = 0.00 < 0.05); thrpt: [-8.2451% -5.2719% -1.8122%]
- 1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: Change within noise threshold. time: [3.1405 s 3.1635 s 3.1880 s]; thrpt: [31.368 MiB/s 31.611 MiB/s 31.842 MiB/s]; change: time: [-2.0950% -1.1486% -0.1795%] (p = 0.02 < 0.05); thrpt: [+0.1798% +1.1620% +2.1399%]

Client/server transfer results: Performance differences relative to e660b0b. Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.
👍 in general. That said, I would prefer only replacing `std::collections::Hash*` wherever it proves beneficial, e.g., not in unit tests.
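For illustration, here is a minimal sketch of the kind of swap under discussion, using the `rustc-hash` crate; the crate choice and all names are assumptions, not taken from this PR's diff:

```rust
// Hypothetical sketch of swapping std's SipHash-based maps for Fx-based ones.
use rustc_hash::{FxHashMap, FxHashSet};

fn main() {
    // FxHashMap<K, V> is an alias for HashMap<K, V, FxBuildHasher>, so it is
    // built via `default()`; `HashMap::new()` only exists for RandomState.
    let mut streams: FxHashMap<u64, &str> = FxHashMap::default();
    streams.insert(0, "control stream");

    let mut acked: FxHashSet<u64> = FxHashSet::default();
    acked.insert(42);

    assert!(acked.contains(&42));
    assert_eq!(streams.get(&0), Some(&"control stream"));
}
```

FxHasher trades SipHash's keyed, flood-resistant design for raw speed, which is why the trade-off only makes sense where keys are not attacker-controlled.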
I've spotted two places where `EnumMap` could give us bigger wins.
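For context, a rough sketch of what an `EnumMap`-based lookup looks like with the `enum_map` crate; the enum and counters here are made up for illustration:

```rust
// Illustrative only: EnumMap keys must form a small closed enum, and the map
// is a plain array indexed by discriminant, so lookups involve no hashing.
use enum_map::{enum_map, Enum, EnumMap};

#[derive(Enum, Clone, Copy)]
enum PacketSpace {
    Initial,
    Handshake,
    ApplicationData,
}

fn main() {
    // `enum_map!` initializes every slot, one per variant.
    let mut sent: EnumMap<PacketSpace, usize> = enum_map! { _ => 0 };
    sent[PacketSpace::Handshake] += 1;

    assert_eq!(sent[PacketSpace::Handshake], 1);
    assert_eq!(sent[PacketSpace::Initial], 0);
}
```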
I think that your performance gains largely derive from the changes to the client and server code. There, the security risk is limited (we're not using this server in real deployments).
Still, you should review the changes for security risk. This hasher could expose us to DoS if the hashed values are controlled by an adversary. I've checked the usage in our server code, which is fine because attackers don't get to control memory allocations (we use pointer values for the hash). Still, that makes me wonder whether we should be using `Pin`.
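To make the pointer-keyed pattern concrete, a small sketch with invented types: hashing by address is only sound while the allocation cannot move, which is where `Pin` becomes relevant.

```rust
// Sketch, not the server's actual code: keying a map by the address of a
// heap allocation. An Rc's pointee never moves, so its address is stable for
// the Rc's lifetime -- and attackers cannot choose these key values.
use std::collections::HashMap;
use std::rc::Rc;

struct Connection {
    id: u64,
}

fn main() {
    let conn = Rc::new(Connection { id: 7 });
    let key = Rc::as_ptr(&conn) as usize;

    let mut by_addr: HashMap<usize, Rc<Connection>> = HashMap::new();
    by_addr.insert(key, Rc::clone(&conn));
    assert_eq!(by_addr[&key].id, 7);
}
```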
@martinthomson thanks for the analysis. My plan is to add some benches first in another PR; I'll add some for those instances where you suggest looking into `EnumMap`. Even if some of the macro benefits come from speeding up the demo client and server code, it's IMO still worth doing, since eliminating those overheads makes it easier to spot other bottlenecks. About security, I didn't do much of an analysis, but I think the main use of this insecure hasher would be when looking up items (streams, unacked chunks) that, while under the control of an attacker, are quite limited in which values are valid without immediately causing a connection close.
I definitely agree with the point about removing the overheads from our toy code as much as possible. This seems like a pretty substantial win there, so it's worth doing. I doubt that my …
I think the …
Good point. Though before we introduce the complexity of …
Definitely the right question to be asking. I think that it might be possible to use the first connection ID as a key for this sort of thing, but we don't tend to keep that around today, once we stop using it. Everything else -- as far as I know -- is ephemeral and therefore not suitable.
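A rough sketch of that idea, with an invented `ConnectionId` type: the first connection ID is chosen by the peer, so a table keyed on it is exactly the case that still wants a keyed, DoS-resistant hasher.

```rust
// Illustrative types only; neqo's real ConnectionId and connection table
// look different. The point: an initial CID is attacker-chosen, so a map
// keyed on it is the classic hash-flooding target.
use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct ConnectionId(Vec<u8>);

struct Connection;

fn main() {
    // std's default RandomState (SipHash) keeps this lookup flood-resistant.
    let mut connections: HashMap<ConnectionId, Connection> = HashMap::new();
    let initial_cid = ConnectionId(vec![0x1f, 0x2e, 0x3d, 0x4c]);
    connections.insert(initial_cid.clone(), Connection);
    assert!(connections.contains_key(&initial_cid));
}
```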
I'm doing a benchmark in #2444 to quantify the benefits first. (It's not going well; there's a lot of variation run-to-run for some reason.)
That is counterintuitive for me, given that it uses …
I wonder if it's the CPU scheduler and frequency control on my Mac. Bencher seems much more stable.
For what it is worth, here is #2444 on my machine:
I don't see much deviation. Am I running the wrong version @larseggert?
Can you run it again and see if there are changes run-to-run? That is where I see random improvements or regressions.
Here are two more runs with vanilla #2444. No significant deviations. Note that I am not running your optimizations in this pull request.
Oh, good. I think it is core pinning being awkward on macOS, then. BTW, I came across https://manuel.bernhardt.io/posts/2023-11-16-core-pinning/ today; we should change the bencher accordingly.
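Along the lines of that post, a pinning sketch using the `core_affinity` crate; the crate choice is my assumption, and the actual bencher change may differ:

```rust
// Pin the current thread to one core before benchmarking, so the scheduler
// cannot migrate the process between cores mid-measurement.
fn main() {
    if let Some(cores) = core_affinity::get_core_ids() {
        if let Some(&core) = cores.first() {
            let pinned = core_affinity::set_for_current(core);
            assert!(pinned, "failed to pin benchmark thread to a core");
        }
    }
    // ... run the actual benchmark workload here ...
}
```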
Force-pushed 2b00ef1 to 60cb6f2.
I'm redoing this PR in stages, to check if the new bench actually shows any improvements. The first push changes only the existing (non-test) uses of …
Hm. The benches all show small improvements, while the client/server tests all show small regressions...
Signed-off-by: Lars Eggert <lars@eggert.org>
I think this may be worthwhile. The `cargo bench`es don't consistently show a benefit, but the loopback transfers on the bencher machine are faster, e.g., … without this PR but … with it. (I'll see if I can improve CI so that we also see the differences to `main` for the table results.)