Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Regression: tls.createSecurePair() Not Buffering Connections in v10 #20263

Closed
alexfernandez opened this issue Apr 24, 2018 · 21 comments
Assignees
Labels
performance Issues and PRs related to the performance of Node.js. regression Issues related to regressions. tls Issues and PRs related to the tls subsystem.

Comments

@alexfernandez
Copy link

  • Version: v10.0.0-pre
  • Platform: Linux newtons7 4.9.0-6-rt-amd64 deps: update openssl to 1.0.1j #1 SMP PREEMPT RT Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux
  • Subsystem: TLS

With the upcoming Node.js v10.0.0, performance of sockets created with createSecurePair() is degraded, with several effects as unresponsive event loop, crashing connections and general sluggishness.

Steps To Reproduce

Create a proxy that receives encrypted TLS connections and resends to a plain socket server. It should be a regular socket server that uses tls.createSecurePair() to decrypt data, and send it unencrypted to the socket server. Bombard it with a lot of connections and relatively small packets. Soon enough the proxy will hog down and become unresponsive.

client -> proxy -> server

It is best to create multiple workers to act as clients, since otherwise the main event loop will be too busy sending data.

Sample Script

Run the script combined-tls.js:

git clone https://github.com/logtrust/tls-server-demo
cd tls-server-demo
npm install
# raise the number of available file descriptors to ~10k
ulimit -n 10240
node lib/combined-tls.js

Running the last line using Node.js v8 or v9 will produce a responsive server, that prints a trace every five seconds; using Node.js v10.0.0 pre will yield an unresponsive server that prints traces erratically, generates strange errors and takes ages to shut down after ^C'ing it.

See the README.md for runtime options such as modifying the number of incoming connections. The script uses the stdio module only for parsing options, and strictly Node.js core modules after that.

Possible Explanation

After the great work by @addaleax in #17882, the unholy mess that was lib/_tls_legacy.js was greatly simplified to use new tls.TLSSocket() underneath. A crucial bit was not ported, though: the buffering capabilities of tls.SecurePair().

As a consequence, the new code in lib/internal/streams/duplexpair.js sends events (either data or readable depending on the stream mode) as they are received and decrypted. When many small messages are sent to the TLSSocket it will hog down. In contrast, the old code behaved like TCP: when heavily loaded it started buffering data and sending bigger and bigger packets to the cleartext stream.

The sample code shows an average of 32 bytes per event with TLSSocket (v10), the same exact value sent by the client per packet; while the original createSecurePair() (v8 and v9) shows a bigger rate that depends on system load: I have seen anything from 60 to 300 bytes/event. The number of events in TLSSocket is so high with v10 that attempting to resend everything from the proxy makes the event loop unresponsive.

A possible workaround is to buffer packets in the proxy server, but this is suboptimal as it means double-guessing what the stream is doing. Also, buffering in the proxy would minimize packet resend time, but we would still be processing a lot of unnecessary events.

Importance

This issue is critical for us since it affects our ability to ingest data using both secure and insecure connections to the same port.

@Trott Trott added crypto Issues and PRs related to the crypto subsystem. tls Issues and PRs related to the tls subsystem. and removed crypto Issues and PRs related to the crypto subsystem. labels Apr 24, 2018
@Trott
Copy link
Member

Trott commented Apr 24, 2018

@nodejs/crypto

@addaleax addaleax added the performance Issues and PRs related to the performance of Node.js. label Apr 24, 2018
@addaleax addaleax self-assigned this Apr 24, 2018
@addaleax
Copy link
Member

Thanks for the extensive work prior to reporting! I’ll definitely take a look as soon as I have the time.

tls.createSecurePair() has been deprecated for a while now, though. Have you looked in some way into migrating to TLSSocket directly? I get that that’s a major, non-trivial API shift, but it would be nice to have the comparison if that’s possible.

I’m assigning this to myself because I agree that the patch identified by @alexfernandez is by far the most likely cause here, but if somebody else wants to investigate I don’t want to stand in the way. :)

@alexfernandez
Copy link
Author

alexfernandez commented Apr 24, 2018

Thanks for looking into it, @addaleax! Sure, we have considered migrating to new tls.TLSSocket() as recommended by the docs, that is where our investigation begun. It behaves very similarly to tls.createSecurePair() in v10, and the underlying cause is probably the same: lack of buffering. Then we tried v10 to see how the new code behaved, and to our dismay not even the old tls.createSecurePair() gives good performance.

If you run the sample script under Node.js v8 or v9 with --new or -n it will attempt to use new TLSSocket(), and the results will be very similar to v10 with tls.createSecurePair().

Note: the class Pipe in the script is just a way of easily switching between readable and data events using the --readable option, which has no noticeable effect on performance. A simple first.pipe(second) yields the exact same results.

@benjamingr benjamingr added the regression Issues related to regressions. label Apr 25, 2018
@benjamingr
Copy link
Member

Excellent bug report @alexfernandez - just saying it out loud in case it's not obvious.

If you'd like to work on the PR and get guidance/help from @addaleax if/when you're having issues with it that's also something we're very much in favor of. The report itself shows an understanding of the code and indicates a capability to do so.

Of course, this is not a request but an offer - you are in under no obligation or pressure to work on a PR. I'm just saying clearly that it's welcome :)

@apapirovski
Copy link
Member

apapirovski commented Apr 25, 2018

Hmm... I'm also seeing a massive regression somewhere along the way in v9.x in the lib/simple-tls-new.js. From 2.4mb/s to 1.4mb/s. That's troubling... I'm going to try bisecting when I have a moment, or at least find which exact version saw the regression.

Edit: The regression landed in 9.7.0. Will report when I have more info.

@apapirovski
Copy link
Member

apapirovski commented Apr 25, 2018

@addaleax Reverting 82c43ae fixes the regression I experienced above (unrelated to the other one) and actually yields a performance improvement to 3.2mb/s. Any thoughts?

Edit: After testing some more, I think we need a C++ nextTick... 😆 When using env()->isolate()->EnqueueMicrotask instead of SetImmediate the performance recovers. Of course, I do not think we should interleave with promises like that. Or we could make it possible for those calls to be sync.

@alexfernandez
Copy link
Author

Thanks, @benjamingr! I will try to understand the code at a low level and maybe propose something.

@benjamingr
Copy link
Member

Of course, I do not think we should interleave with promises like that.

Can you elaborate on that a little? Promises don't actually make any guarantee about when they run except that they do when only platform code does (which the C++ qualifies as).

We make no official guarantees on the current nextTick (and new ticks) -> promise queue (and chained promises) -> other stuff order and I don't think that we should (as long as we never run promise thens in a way that violates the spec).

@apapirovski
Copy link
Member

Can you elaborate on that a little? Promises don't actually make any guarantee about when they run except that they do when only platform code does (which the C++ qualifies as).

Yeah, it's probably not a big deal to interleave. Also the way EnqueueMicrotask is implemented is pretty efficient so I don't really want to re-implement it on our end. We can't get anywhere near the same perf with std::vector, at least based on a quick test.

I opened a PR in #20287

@addaleax
Copy link
Member

Fwiw, the reproduction in https://github.com/logtrust/tls-server-demo does not seem to work very reliably on Linux… :/

@alexfernandez
Copy link
Author

alexfernandez commented Apr 26, 2018

@addaleax Yes, it is not simple to reproduce and my script is not very polished. I think I have reduced it to a benchmark similar to benchmark/tls/throughput.js, I will send a PR presently to include it in Node.js.

@benjamingr
Copy link
Member

@alexfernandez can you also try running the benchmark with #20287 applied?

(curl -L https://github.com/nodejs/node/pull/20287.patch | git am in your node folder)

@apapirovski
Copy link
Member

apapirovski commented Apr 26, 2018

@benjamingr I don't recommend it. It won't affect the SecurePair piece at all. (Hence only Refs not Fixes.)

More importantly, I'm re-working that PR to make sync writes to TLSWrap possible instead of just shifting around the type of async scheduler. After further exploration, I think async writes are actually good and we need to potentially disable DoTryWrite for TLS. I'm seeing very sizeable gains from that type of a change.

@alexfernandez
Copy link
Author

alexfernandez commented Apr 26, 2018

@benjamingr Sure enough! Thanks for the curl line, it makes it much simpler 👍 .

apapirovski pushed a commit that referenced this issue May 16, 2018
PR-URL: #20344
Refs: #20263
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
@alexfernandez
Copy link
Author

@addaleax Now that c346cb6 has landed, we finally have a reliable way to reproduce this isssue! 😄 The procedure is explained in the PR #20344, basically just run node benchmark/tls/secure-pair.js which will show the result in megabits (in 5 seconds). On my laptop it goes from ~16 to ~3 with 2-byte packets, and from ~2k to ~1k with 1 KB packets. Performance for larger packets improves slightly, from 3.2M to 3.7M.

MylesBorins pushed a commit that referenced this issue May 22, 2018
PR-URL: #20344
Refs: #20263
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Benjamin Gruenbaum <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
@alexfernandez
Copy link
Author

alexfernandez commented Nov 2, 2018

Quick update now that Node.js 10.x is becoming LTS next week: 10.13.0 is still painfully slow when using tls.createSecurePair(). So is 11.1.0, while 8.12.0 and 9.11.2 are still fast. From with the repo:

node benchmark/tls/secure-pair.js

The results for using 2, 1024 and 104856 bytes-long packets using tls.createSecurePair() are show below. Results for new TLSSocket() are quite similar across versions. All durations are 5 seconds.

Node.js version packet size MB transferred
8.12.0 2 15.3
9.11.2 2 15.5
10.13.0 2 3.5
11.1.0 2 4.4
8.12.0 1024 2050
9.11.2 1024 1796
10.13.0 1024 1069
11.1.0 1024 1180
8.12.0 1048576 2985
9.11.2 1048576 3109
10.13.0 1048576 3582
11.1.0 1048576 3561

In summary, anyone using tls.createSecurePair() is heavily penalized unless they are transferring upwards of 1 MB in 5 seconds.

@alexfernandez
Copy link
Author

cc @mcollina

@mcollina
Copy link
Member

Sorry for replying this after a long time. I did not forget about this, just the last month has been very hectic.

@addaleax maybe you can take another shot at this? or at least, can you take a look and point us into the right direction for a fix?

@tufosa
Copy link

tufosa commented Mar 18, 2019

Hi there!! I am taking over @alexfernandez's work on node and TLS and I was wondering if there's been any progress on this issue, @addaleax. This regression in performance is blocking us from upgrading to node 10 in production, and we would love to move up and enjoy the rest of improvements made in this version.

Thanks in advance for your excellent work!!

@addaleax
Copy link
Member

@alexfernandez @mcollina @tufosa I think #27861 has addressed this issue – feel free to double-check, though.

@tufosa
Copy link

tufosa commented May 30, 2019

@addaleax @mcollina @alexfernandez I've just run the benchmark in the repo and some others of my own and the results are spectacular: performs 4 times better than the old _tls_legacy with chunks of 2 bytes, 2.5 times better with chunks of 100 bytes and 1.1 times better with chunks of 1024 bytes. Great, great great job!!! While doing the experiments I've noticed that clear sockets perform way worse than TLS sockets with small chunks of data (2 and 100 bytes). But it happened the same in node 8, so there is no regression. I'll open a different issue for this. Thanks again for the excellent work!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues and PRs related to the performance of Node.js. regression Issues related to regressions. tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

No branches or pull requests

7 participants