Concurrent requests slightly larger than 4096 bytes are slow #2062
Comments
Did you run that as debug or release? |
I have to admit these numbers were run in debug, but the original issue surfaced in production. The 40ms spike at 4097 bytes remains; if anything it's even more pronounced, since the larger requests did speed up.

for b in 4096 4097 8429 8430 10000 40960 409600 4096000; do
  for n in 1 10 100; do
    echo -n "b=$b n=$n "
    httperf --server localhost --port 8000 \
      --uri /$b --num-calls $n | grep "ms/req"
  done
done
|
Ok, well, same for optimized releases. May I ask, have you run any perf benchmarks? Just to see if it has something to do with memory pressure/CPU overhead |
Running with 1 worker is the same. Always allocating the same amount still gives the same result:

#[get("/<size>")]
fn size(size: usize) -> String {
    let mut s = String::with_capacity(4_000_000);
    if size == 4096 {
        s += " ";
    }
    for _ in 0..size {
        s += " ";
    }
    if size == 4096 {
        s.truncate(size);
    }
    s
}

as has returning a |
@ssendev Either I don't get the point, or this code is for sure slower with 4096 than any other size... |
I may have identified the problem. Since Sergio is kinda off until mid this year for personal reasons, I have created a fork to work on. Well, at least using the httperf script you provided, I couldn't notice delays anymore.

[dependencies]
rocket = { git = "https://github.com/somehowchris/Rocket", tag = "0.5.0-patch.1" }
|
Yes, there is a sharp cliff above 4096.

time httperf --server localhost --port 8000 --uri /4096 --num-calls 100000
real 4,958
user 2,976
sys 1,541
maxmem 8 MB
faults 0

time httperf --server localhost --port 8000 --uri /4097 --num-calls 1000
real 40,996
user 33,519
sys 7,286
maxmem 8 MB
faults 0

and requests stay slow until a size of 8429, where there is another cliff and response times return from ~40ms to 1.7ms.

time httperf --server localhost --port 8000 --uri /8429 --num-calls 1000
real 41,005
user 33,381
sys 7,468
maxmem 8 MB
faults 0

time httperf --server localhost --port 8000 --uri /8430 --num-calls 10000
real 16,524
user 13,117
sys 3,224
maxmem 8 MB
faults 0

beware the changing @somehowchris |
I've put it up here https://github.com/kolbma/rocket-issue-2062 with modification to
But I see some more slow requests there... Have a look at the n=2... Also, the slowness is not constant in relation to content size... Have you tried a different benchmark tool?
Another...
|
That the slowdowns aren't linearly correlated to content size is what makes this strange. The sharp edge at 4KiB makes me think it's some buffer that causes scheduling issues. I just tried Apache bench |
I've tried wrk... the 1st value is the average... you also see the breakdown with 4097 and the rise with 8430...
|
4096 is the default chunk size in Rocket. But the window up to 8430 is strange. |
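To make that concrete: a body written in fixed 4096-byte chunks turns a 4097-byte response into two writes (4096 + 1), which matches the second TCP segment observed later in the thread. Below is a minimal sketch of such chunked writing over tokio; it is illustrative only, not Rocket's actual code, and the helper name and constant are made up.

use tokio::io::AsyncWriteExt;

// Illustrative only: write `body` in fixed-size chunks, the way a 4096-byte
// default chunk size would split a 4097-byte body into a 4096-byte write
// followed by a 1-byte write.
async fn write_chunked<W: AsyncWriteExt + Unpin>(w: &mut W, body: &[u8]) -> std::io::Result<()> {
    const CHUNK: usize = 4096;
    for piece in body.chunks(CHUNK) {
        w.write_all(piece).await?;
    }
    w.flush().await
}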
If you change |
It doesn't work correctly above a byte count of 8430 either! Look at the Net I/O output of httperf. This is much too slow.
With 3x max_chunk_size == 12288 bytes it is as fast as with 4096 bytes, and because of the bigger "file" it is more effective in Net I/O.
Is there anything wrong with this? Does anyone have an idea why it starts to become faster again after the additional 4334 bytes (4096 + 4334 = 8430)? I see some more differences in speed there that I can't really explain. It looks like there is some race condition and blocking. |
https://github.com/tokio-rs/console advertises itself as capable of that. The fact that the delay is 40ms also made me think of this: https://vorner.github.io/2020/11/06/40-ms-bug.html
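In case it helps, hooking up tokio-console is mostly a one-line subscriber init; a rough sketch, assuming the console-subscriber crate as a dependency and a build with RUSTFLAGS="--cfg tokio_unstable":

// Rough sketch: register the tokio-console instrumentation layer before
// starting the async workload you want to inspect.
#[tokio::main]
async fn main() {
    console_subscriber::init(); // starts the console subscriber
    // ... start the server / workload under investigation here ...
}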
Disabling http1_writev doesn't make a difference. In the meantime I've set up some code for bare hyper which has the same functionality to respond to
I'm thinking about how keep_alive fits into the situation with different byte counts. 4096 ok...
4097 slow....
Creating 500 connections with 1 GET request each is ok...
But 250 connections, each reused for a 2nd concurrent GET, is a bottleneck...
There should be more req/s than with 500 conns at 1 call each, because there is less TCP connection overhead. |
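For reference, a bare-hyper handler with the same /<size> behaviour could look roughly like this; a sketch assuming hyper 0.14 (with the "full" feature) and tokio, not the actual comparison code mentioned above.

use std::convert::Infallible;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};

// Respond to GET /<size> with a body of <size> spaces, mirroring the Rocket
// handler earlier in the thread.
async fn size(req: Request<Body>) -> Result<Response<Body>, Infallible> {
    let n: usize = req
        .uri()
        .path()
        .trim_start_matches('/')
        .parse()
        .unwrap_or(4096);
    Ok(Response::new(Body::from(" ".repeat(n))))
}

#[tokio::main]
async fn main() {
    let addr = ([127, 0, 0, 1], 8000).into();
    let make_svc = make_service_fn(|_conn| async { Ok::<_, Infallible>(service_fn(size)) });
    if let Err(e) = Server::bind(&addr).serve(make_svc).await {
        eprintln!("server error: {e}");
    }
}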
So the time is lost in the concurrent call of
In my debug build, with a connection carrying 2 requests, one request waits about 290 milliseconds. The other request takes about 2 or 3 milliseconds. The
to temporarily remove the unwinding stuff. In the blackbox part called by |
After all the code inspection and code timings... I no longer believe there is an issue in Rocket... If you check the network traffic and its timing, you'll see, for 2 requests in one keep-alive connection with content length 4096, and for 2 requests in one keep-alive connection with content length 4097, that the data including protocol overhead (4399 bytes) is always sent by the Rocket server quickly after the GET request. For content length 4096 there is a single TCP segment, while for 4097 there is a 2nd. Back to what's going on...
With bigger byte counts there is still a problem with the tool...
With only 3 calls it is always fast. With 5 calls it is always slow. The problem is the same as above... if it is slow, there is an over-44ms wait for the ACK of the received data.
All calls were against a debug build with timing output, so much slower than a release build of Rocket. |
@kolbma that's the 40ms bug that vorner wrote up
The issue here is the combination of small content and multiple packets. The writev setting having no effect may just mean that something similar is happening at the writev level. I suggest an strace: if the syscalls being made by Rocket apps split headers and body for small content, this problem will occur. See https://en.wikipedia.org/wiki/Nagle%27s_algorithm and https://datatracker.ietf.org/doc/html/draft-minshall-nagle The answer is to ensure that userspace - Rocket - always writes either a complete response, or at least a packet's worth of data. |
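To illustrate that last point, here is a toy sketch with plain std::net and made-up helper names (not Rocket or Hyper code): a small response split across two writes can leave the trailing small segment queued behind Nagle's algorithm until the client's delayed ACK arrives, while a single coalesced write keeps a small response in one segment.

use std::io::Write;
use std::net::TcpStream;

// Hypothetical helpers for illustration only. With Nagle's algorithm enabled,
// `respond_split` can produce two small TCP segments for a small response; the
// second one may then sit in the kernel until the peer's delayed ACK arrives
// (tens of ms). `respond_coalesced` emits the same bytes as one write, so a
// small response fits in a single segment.
fn respond_split(s: &mut TcpStream, headers: &[u8], body: &[u8]) -> std::io::Result<()> {
    s.write_all(headers)?; // first small segment
    s.write_all(body) // second small segment -> can stall behind the delayed ACK
}

fn respond_coalesced(s: &mut TcpStream, headers: &[u8], body: &[u8]) -> std::io::Result<()> {
    let mut buf = Vec::with_capacity(headers.len() + body.len());
    buf.extend_from_slice(headers);
    buf.extend_from_slice(body);
    s.write_all(&buf) // one write, one segment for small responses
}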
@rbtcollins PR welcome! |
To summarize, this is due to the interaction between two TCP optimizations: Nagle's algorithm and delayed ACKs. From Wikipedia:
What are you suggesting, precisely? Rocket does not control the data size: this is entirely up to the application author. As far as I can tell, there are really only two solutions, assuming you don't control the client:
Though it seems ideal, I cannot see any mechanism to do 2) with Hyper's |
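As an aside, the Nagle side of this can be switched off per connection with TCP_NODELAY; a sketch with plain std::net follows (whether and where such a flag could be threaded through Hyper's connection setup is exactly the open question here).

use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8000")?;
    for stream in listener.incoming() {
        let stream = stream?;
        // TCP_NODELAY: send small segments immediately instead of holding
        // them back until earlier data has been ACKed.
        stream.set_nodelay(true)?;
        // ... hand the connection to the HTTP stack ...
    }
    Ok(())
}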
Description
A response slightly larger than 4096 bytes is significantly slower than other responses if more than one request is made concurrently.
To Reproduce
httperf --server localhost --port 8000 --uri /$bytes --num-calls $n
Expected Behavior
No large jumps in response time.
Environment: