fix: Use 1MB socket TX and RX buffers #2470
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
##             main    #2470      +/-   ##
==========================================
- Coverage   95.42%   95.41%   -0.02%
==========================================
  Files         115      115
  Lines       36996    36996
  Branches    36996    36996
==========================================
- Hits        35305    35301       -4
- Misses       1687     1689       +2
- Partials        4        6       +2
Failed Interop Tests: QUIC Interop Runner, client vs. server, differences relative to 79068bc.
- neqo-latest as client
- neqo-latest as server

Succeeded Interop Tests: QUIC Interop Runner, client vs. server.
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests: QUIC Interop Runner, client vs. server.
- neqo-latest as client
- neqo-latest as server
Benchmark results: Performance differences relative to 79068bc.

- decode 4096 bytes, mask ff: No change in performance detected. time: [11.834 µs 11.866 µs 11.905 µs]; change: [-0.2592% +0.2090% +0.6971%] (p = 0.43 > 0.05)
- decode 1048576 bytes, mask ff: No change in performance detected. time: [3.0696 ms 3.0792 ms 3.0904 ms]; change: [-0.3548% +0.1200% +0.5542%] (p = 0.63 > 0.05)
- decode 4096 bytes, mask 7f: No change in performance detected. time: [19.765 µs 19.813 µs 19.864 µs]; change: [-0.7559% -0.2047% +0.3059%] (p = 0.47 > 0.05)
- decode 1048576 bytes, mask 7f: No change in performance detected. time: [5.1493 ms 5.1606 ms 5.1733 ms]; change: [-0.2918% +0.0325% +0.3619%] (p = 0.85 > 0.05)
- decode 4096 bytes, mask 3f: No change in performance detected. time: [6.8741 µs 6.9005 µs 6.9336 µs]; change: [-0.3551% +0.2139% +0.9690%] (p = 0.56 > 0.05)
- decode 1048576 bytes, mask 3f: No change in performance detected. time: [1.7539 ms 1.7581 ms 1.7637 ms]; change: [-0.6174% -0.1459% +0.3291%] (p = 0.56 > 0.05)
- 1 streams of 1 bytes/multistream: No change in performance detected. time: [69.025 µs 70.038 µs 71.512 µs]; change: [-2.6302% +0.0527% +2.7653%] (p = 0.95 > 0.05)
- 1000 streams of 1 bytes/multistream: No change in performance detected. time: [24.510 ms 24.544 ms 24.579 ms]; change: [-0.1906% +0.0147% +0.2159%] (p = 0.89 > 0.05)
- 10000 streams of 1 bytes/multistream: Change within noise threshold. time: [1.6453 s 1.6468 s 1.6484 s]; change: [-0.3740% -0.2276% -0.0886%] (p = 0.00 < 0.05)
- 1 streams of 1000 bytes/multistream: No change in performance detected. time: [70.727 µs 71.715 µs 73.151 µs]; change: [-1.5245% +0.5445% +2.6527%] (p = 0.70 > 0.05)
- 100 streams of 1000 bytes/multistream: No change in performance detected. time: [3.2461 ms 3.2522 ms 3.2588 ms]; change: [-0.4924% -0.1933% +0.0739%] (p = 0.19 > 0.05)
- 1000 streams of 1000 bytes/multistream: No change in performance detected. time: [138.13 ms 138.19 ms 138.26 ms]; change: [-0.1430% -0.0699% +0.0004%] (p = 0.06 > 0.05)
- coalesce_acked_from_zero 1+1 entries: No change in performance detected. time: [92.533 ns 92.831 ns 93.128 ns]; change: [-1.3809% -0.0963% +0.9150%] (p = 0.88 > 0.05)
- coalesce_acked_from_zero 3+1 entries: No change in performance detected. time: [110.33 ns 110.65 ns 110.99 ns]; change: [-0.7340% -0.2089% +0.2866%] (p = 0.43 > 0.05)
- coalesce_acked_from_zero 10+1 entries: No change in performance detected. time: [109.90 ns 110.23 ns 110.66 ns]; change: [-1.9362% -0.6801% +0.3640%] (p = 0.28 > 0.05)
- coalesce_acked_from_zero 1000+1 entries: No change in performance detected. time: [91.275 ns 91.310 ns 91.351 ns]; change: [-1.0702% -0.2103% +0.6373%] (p = 0.65 > 0.05)
- RxStreamOrderer::inbound_frame(): Change within noise threshold. time: [115.65 ms 115.70 ms 115.75 ms]; change: [-0.1856% -0.1214% -0.0627%] (p = 0.00 < 0.05)
- SentPackets::take_ranges: No change in performance detected. time: [5.3703 µs 5.5566 µs 5.7561 µs]; change: [-0.2782% +2.5626% +5.4953%] (p = 0.09 > 0.05)
- transfer/pacing-false/varying-seeds: Change within noise threshold. time: [34.257 ms 34.325 ms 34.394 ms]; change: [+0.4256% +0.6817% +0.9488%] (p = 0.00 < 0.05)
- transfer/pacing-true/varying-seeds: Change within noise threshold. time: [34.105 ms 34.162 ms 34.219 ms]; change: [-0.7713% -0.5383% -0.2868%] (p = 0.00 < 0.05)
- transfer/pacing-false/same-seed: Change within noise threshold. time: [33.887 ms 33.932 ms 33.978 ms]; change: [+0.0531% +0.2456% +0.4449%] (p = 0.02 < 0.05)
- transfer/pacing-true/same-seed: Change within noise threshold. time: [34.599 ms 34.657 ms 34.716 ms]; change: [-0.4572% -0.2370% -0.0183%] (p = 0.04 < 0.05)
- 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved. time: [2.0295 s 2.0369 s 2.0443 s]; thrpt: [48.917 MiB/s 49.094 MiB/s 49.272 MiB/s]; change: time [-9.1521% -8.6803% -8.2298%] (p = 0.00 < 0.05), thrpt [+8.9679% +9.5054% +10.074%]
- 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold. time: [385.95 ms 387.77 ms 389.63 ms]; thrpt: [25.665 Kelem/s 25.789 Kelem/s 25.910 Kelem/s]; change: time [-1.8457% -1.1416% -0.4341%] (p = 0.00 < 0.05), thrpt [+0.4360% +1.1548% +1.8804%]
- 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: 💚 Performance has improved. time: [27.379 ms 28.055 ms 28.744 ms]; thrpt: [34.790 elem/s 35.645 elem/s 36.524 elem/s]; change: time [-11.525% -8.4360% -5.1664%] (p = 0.00 < 0.05), thrpt [+5.4479% +9.2132% +13.026%]
- 1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: 💚 Performance has improved. time: [2.8967 s 2.9240 s 2.9535 s]; thrpt: [33.858 MiB/s 34.200 MiB/s 34.522 MiB/s]; change: time [-17.495% -16.475% -15.389%] (p = 0.00 < 0.05), thrpt [+18.188% +19.724% +21.205%]

Client/server transfer results: Performance differences relative to 79068bc. Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.
Instead of implementing this in neqo-bin, how about proposing it upstream in quinn-udp? I assume close to all QUIC implementations will want a large send and receive buffer.
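For context, the buffers under discussion are the plain SO_SNDBUF/SO_RCVBUF socket options. A minimal, language-neutral illustration (Python stdlib here purely for brevity; this is not neqo or quinn-udp code) of requesting 1 MB buffers and reading back what the kernel actually granted:

```python
import socket

ONE_MB = 1 << 20  # the 1MB TX/RX size this PR proposes

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Request 1 MB send and receive buffers. The kernel is free to adjust
# the request (Linux clamps it to net.core.wmem_max/rmem_max and
# stores double the value for bookkeeping), so read back the
# effective size rather than trusting the request.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, ONE_MB)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, ONE_MB)

effective_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
effective_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(effective_snd, effective_rcv)
s.close()
```

The read-back step is why any cross-platform API for this needs care: the effective size can differ substantially from the request depending on OS limits.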
Sounds good. Or maybe I don't understand why the benches show a performance improvement, but the runs over loopback show a regression.

@mxinden where would you suggest to add this in …

@mxinden alternatively, I'd suggest merging this to neqo now and switching over to any …
@larseggert how about something along the lines of:

diff --git a/quinn-udp/src/unix.rs b/quinn-udp/src/unix.rs
index c39941d5..0b5bbd21 100644
--- a/quinn-udp/src/unix.rs
+++ b/quinn-udp/src/unix.rs
@@ -257,6 +257,14 @@ impl UdpSocketState {
         self.may_fragment
     }

+    pub fn set_send_buffer_size(&self, bytes: usize) -> io::Result<()> {
+        todo!();
+    }
+
+    pub fn set_recv_buffer_size(&self, bytes: usize) -> io::Result<()> {
+        todo!();
+    }
+
     /// Returns true if we previously got an EINVAL error from `sendmsg` syscall.
     fn sendmsg_einval(&self) -> bool {
         self.sendmsg_einval.load(Ordering::Relaxed)
@@ -543,7 +551,7 @@ fn recv(io: SockRef<'_>, bufs: &mut [IoSliceMut<'_>], meta: &mut [RecvMeta]) ->
     Ok(1)
 }
Do we also need the equivalent for Windows?

Yes. I would say the more platforms, the better.
@mxinden let me know if quinn-rs/quinn#2179 is what you had in mind. |
We need to do the same for `neqo-glue`. Fixes mozilla#1962
On my Mac, I now see: (output not captured in this excerpt)

Note that the send buffer size is apparently based on the …

Linux (from a QNS run): (output not captured)

Windows (from CI): (output not captured)
https://lists.freebsd.org/pipermail/freebsd-questions/2005-February/075624.html says: (quoted text not captured)

This seems to indicate that we should not do this increase on macOS and BSDs?
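If the concern is that BSD-style kernels reject oversized requests outright (setsockopt can fail when the request exceeds kern.ipc.maxsockbuf, rather than silently clamping as Linux does), one option is to back the request off until the kernel accepts it. A hypothetical sketch, not taken from this PR:

```python
import platform
import socket

def request_buffer(sock: socket.socket, opt: int, target: int) -> int:
    """Try to set a socket buffer size, halving the request on failure.

    On Linux, oversized requests are silently clamped, so the first
    attempt succeeds. On BSD/macOS, an over-limit request can raise
    OSError, so we walk the request down instead of failing hard.
    """
    size = target
    while size > 4096:
        try:
            sock.setsockopt(socket.SOL_SOCKET, opt, size)
            return sock.getsockopt(socket.SOL_SOCKET, opt)
        except OSError:
            size //= 2
    return sock.getsockopt(socket.SOL_SOCKET, opt)

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
effective = request_buffer(s, socket.SO_SNDBUF, 1 << 20)
print(platform.system(), effective)
s.close()
```

This sidesteps the "don't do this on macOS/BSD" question by never leaving the socket in an error state, though whether a large buffer is *desirable* there is still the open question above.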
I also don't understand why loopback benchmark performance is going down.
I assume more buffer space is not always better. This reminds me of the Source Buffer Management draft. https://stuartcheshire.github.io/draft-cheshire-sbm/draft-cheshire-sbm.html |
We currently don't. Something along the following lines should work. Want to try running a benchmark on your local Mac to see whether either of them triggers?

diff --git a/Cargo.lock b/Cargo.lock
index 64bfcc1f..d6b8b6a7 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -1028,6 +1028,7 @@ name = "neqo-udp"
version = "0.12.2"
dependencies = [
"cfg_aliases",
+ "libc",
"log",
"neqo-common",
"quinn-udp",
diff --git a/neqo-udp/Cargo.toml b/neqo-udp/Cargo.toml
index 4abd402a..8ebc2e83 100644
--- a/neqo-udp/Cargo.toml
+++ b/neqo-udp/Cargo.toml
@@ -19,6 +19,7 @@ workspace = true
log = { workspace = true }
neqo-common = { path = "./../neqo-common" }
quinn-udp = { workspace = true }
+libc = "0.2"
[build-dependencies]
cfg_aliases = "0.2"
diff --git a/neqo-udp/src/lib.rs b/neqo-udp/src/lib.rs
index d498f5aa..0cfe0e74 100644
--- a/neqo-udp/src/lib.rs
+++ b/neqo-udp/src/lib.rs
@@ -74,7 +74,19 @@ pub fn send_inner(
src_ip: None,
};
- state.try_send(socket, &transmit)?;
+ if let Err(e) = state.try_send(socket, &transmit) {
+ match e.raw_os_error() {
+ Some(libc::ENOBUFS) => {
+ todo!();
+ }
+ Some(libc::EAGAIN) => {
+ todo!();
+ }
+ Some(_) | None => {}
+ }
+
+ return Err(e);
+ }
     qtrace!(
         "sent {} bytes from {} to {}",

Related: Neither of them is currently exposed through …
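The error-classification idea in the diff above (special-casing ENOBUFS/EAGAIN instead of surfacing every send error) can be sketched outside Rust as well. This hypothetical Python helper shows the shape of a retry-with-backoff policy, not the actual neqo-udp behavior:

```python
import errno
import socket
import time

def send_with_retry(sock, data, addr, attempts=5):
    """Send a datagram, retrying briefly if the send buffer is full.

    ENOBUFS (common on macOS/BSD) and EAGAIN/EWOULDBLOCK (nonblocking
    sockets) both mean "the kernel can't take more right now", so we
    back off and retry; any other error is surfaced to the caller.
    """
    for _ in range(attempts):
        try:
            return sock.sendto(data, addr)
        except OSError as e:
            if e.errno in (errno.ENOBUFS, errno.EAGAIN, errno.EWOULDBLOCK):
                time.sleep(0.001)  # give the kernel time to drain the queue
                continue
            raise
    raise OSError(errno.ENOBUFS, "send buffer persistently full")

# Usage over loopback:
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = send_with_retry(tx, b"ping", rx.getsockname())
tx.close()
rx.close()
```

Whether to sleep, drop, or pace on a full buffer is a policy choice; the point is only that these two errno values deserve different handling from genuine failures.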
It's different, in that I doubt that we actually fill that larger buffer, and they do. (If we filled it, we should IMO also see a throughput increase.)