process.report.getReport(); is potentially very slow #46060
Comments
From what I can tell so far with my limited understanding of Node internals, the report is delayed by the generation of one of its sections. To generate this section, all open handles are inspected and information about them is attached to the report. The handles that cause the majority of the delay are the ones relating to sockets. There seem to be at least 20 open sockets at the time the report is generated.
I must assume the DNS queries take especially long in my environment for this to be this problematic, but I also feel like it might not be ideal to run DNS lookups in a fully synchronous code path. I got the impression that generating the report should be fairly quick. So any chance to introduce minute-long delays should likely be avoided. |
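To see how many handles of each kind a report has to inspect, something like the following could be used. This is only a sketch: countHandlesByType is a hypothetical helper name, and it assumes the report object exposes its handle list as a `libuv` array whose entries carry a `type` field.

```javascript
// Sketch: summarize the handles recorded in a diagnostic report.
// Assumption: `report.libuv` is an array with one entry per handle,
// and each entry has a string `type` (for example "tcp" or "timer").
function countHandlesByType(report = process.report.getReport()) {
  const counts = {};
  for (const handle of report.libuv ?? []) {
    counts[handle.type] = (counts[handle.type] ?? 0) + 1;
  }
  return counts;
}

// Print the handle summary for the current process.
console.log(countHandlesByType());
```

A large number of socket-type entries would be consistent with the delay described above.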
This code reproduces the issue reliably on one of my systems:
const https = require("node:https");
const SOCKET_COUNT = 10;
const sockets = [];
let allResolved = false;
const keepAliveAgent = new https.Agent({ keepAlive: true });
const performRequest = () => {
  const socketPromise = new Promise((resolve) => {
    https
      .get(
        {
          agent: keepAliveAgent,
          hostname: "registry.npmjs.org",
        },
        () => {
          process.stdout.write(".");
          resolve();
        }
      )
      .on("error", (error) => {
        console.error(error);
        resolve();
      });
  });
  sockets.push(socketPromise);
};
process.stdout.write(`${new Date().toISOString()} opening ${SOCKET_COUNT} sockets`);
for (let loops = 0; loops < SOCKET_COUNT; ++loops) {
  performRequest();
}
Promise.all(sockets).then(() => {
  allResolved = true;
  console.log(
    `\n${new Date().toISOString()} all sockets created. requesting report...`
  );
  process.report.getReport();
  console.log(`${new Date().toISOString()} destroying client`);
  keepAliveAgent.destroy();
  console.log(`${new Date().toISOString()} end`);
});
const check = () => {
  if (!allResolved) {
    setTimeout(check, 200);
    return;
  }
};
check();
On this system, it takes about 20s to generate the report. |
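To put a number on the delay instead of comparing timestamps by eye, the call can be wrapped in a small timer. This is a minimal sketch; timeGetReport is a hypothetical helper name and not part of the reproduction.

```javascript
const { performance } = require("node:perf_hooks");

// Sketch: measure how long a single synchronous getReport() call takes.
function timeGetReport() {
  const start = performance.now();
  const report = process.report.getReport();
  const elapsedMs = performance.now() - start;
  return { elapsedMs, report };
}

const { elapsedMs } = timeGetReport();
console.log(`getReport() took ${elapsedMs.toFixed(1)} ms`);
```

Calling this where the script requests the report would show the time spent inside the call itself, separate from the time spent opening sockets.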
FWIW, on my machine (M1 MBP), the report is generated in much less than one second. |
Interesting. It was suggested in the yarn source that this is somehow WSL-specific. So far, I'm having a hard time understanding what aspect of WSL would introduce this, but I also haven't compared with non-WSL Linuxes yet. |
After playing around a bit at https://github.com/oliversalzburg/report-repro/, I can see that RDNS is being performed on the addresses of all open sockets. With a large enough number of sockets, I could also push the report generation over the multi-second mark in GitHub Actions on ubuntu-latest.
I still feel like this is problematic, but the behavior is certainly amplified in WSL, whose default behavior is to register an internal DNS resolver. When I inspect the answers from this resolver, the problem also becomes clearer:
$ dig -x 1.1.1.1 @172.22.176.1 +noall +answer
1.1.1.1.in-addr.arpa. 0 IN PTR one.one.one.one.
$ dig -x 1.1.1.1 @8.8.8.8 +noall +answer
1.1.1.1.in-addr.arpa. 1006 IN PTR one.one.one.one.
The WSL resolver sets the TTL of all answers to 0, as the first answer above shows. A workaround is to configure a different DNS server for the WSL environment. https://learn.microsoft.com/en-us/windows/wsl/wsl-config#network-settings |
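The cost of one reverse lookup can also be measured from Node itself. The sketch below uses dns.promises.lookupService, which resolves an address and port through the operating system's getnameinfo. It only approximates what the report does; it is not the report's own code path. timeReverseLookup and its injectable lookup parameter are hypothetical names for illustration.

```javascript
const dns = require("node:dns");

// Sketch: time a single reverse lookup of `address`.
// `lookup` is injectable so the helper can be exercised without network access.
async function timeReverseLookup(address, port = 443, lookup = dns.promises.lookupService) {
  const start = process.hrtime.bigint();
  let hostname = null;
  try {
    ({ hostname } = await lookup(address, port));
  } catch {
    // The lookup failed or was refused; keep hostname as null and still report the time.
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { address, hostname, elapsedMs };
}
```

For example, timeReverseLookup("1.1.1.1").then(console.log) prints the resolved name and the elapsed time in milliseconds. Multiplying that by the number of open sockets gives a rough idea of the expected delay if each lookup is answered uncached.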
I also reported this for WSL: microsoft/WSL#9423 |
@cjihrig I can reproduce (on Linux) when I blackhole outgoing DNS traffic. It's the |
Perhaps we can add an option to |
I was facing a similar issue when using yarn v2 on an Ubuntu VM with Hyper-V instead of WSL. Thanks to the code by @oliversalzburg to reproduce the issue, I found that systemd-resolved was the culprit, and this thread helped me find a workaround: systemd/systemd#28166 Maybe this can help someone with a similar setup to mine:
|
Hey folks, I'm going to try fixing this! I'm going to implement @joyeecheung 's idea: #46060 (comment) |
By the way I don't know what exactly yarn is using |
It looks like yarn is using it to determine the libc variant: https://github.com/yarnpkg/berry/blob/6db7b2195bea567a275f3b95221c0517bd925dea/packages/yarnpkg-core/sources/nodeUtils.ts#L29 IMO that's using a sledgehammer to crack a nut; we should probably just provide an API specifically for listing the loaded shared objects. |
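For illustration, libc detection from the report's shared object list could look roughly like this. It is a hypothetical sketch, not yarn's actual implementation: detectLibcVariant and the path patterns are assumptions, and it relies on the report exposing loaded libraries as a `sharedObjects` array of paths.

```javascript
// Hypothetical helper, not taken from yarn: guess the libc variant from a
// list of loaded shared object paths.
function detectLibcVariant(sharedObjects) {
  // Assumed pattern: musl ships its loader and libc as ld-musl-*.so.1 or libc.musl-*.so.1.
  if (sharedObjects.some((path) => /\/(?:ld-musl-|libc\.musl-)/.test(path))) {
    return "musl";
  }
  // Assumed pattern: glibc is loaded as .../libc.so.6.
  if (sharedObjects.some((path) => /\/libc\.so\.6$/.test(path))) {
    return "glibc";
  }
  return null; // not Linux, or no recognizable libc in the list
}

const { sharedObjects = [] } = process.report.getReport();
console.log(detectLibcVariant(sharedObjects));
```

Only the shared object list is needed here, which is why generating the whole report, network handles included, is more work than this use case requires.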
@joyeecheung Please note this comment: yarnpkg/berry#5167 (comment), specifically the reference to #41338. |
New option `--report-disable-network`, also available as `report.disableNetwork`, enables the user to disable networking interfaces in their diagnostic report. On some systems, this can cause the report to take minutes to generate so this option can be used to optimize that. fixes: nodejs#46060
Adds a new option `process.report.networkDisabled` and CLI option `--report-network-disabled`, which disable any networking operations during `report` generation. Fixes: nodejs#46060 PR-URL: nodejs#51645
New option `--report-exclude-network`, also available as `report.excludeNetwork`, enables the user to exclude networking interfaces in their diagnostic report. On some systems, this can cause the report to take minutes to generate so this option can be used to optimize that. Fixes: nodejs#46060 PR-URL: nodejs#51645
New option `--report-exclude-network`, also available as `report.excludeNetwork`, enables the user to exclude networking interfaces in their diagnostic report. On some systems, this can cause the report to take minutes to generate so this option can be used to optimize that. Fixes: #46060 PR-URL: #51645 Co-authored-by: Joyee Cheung Reviewed-By: Yagiz Nizipli Reviewed-By: Joyee Cheung
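The option described in these commits can be switched on from script code roughly as follows, as an alternative to passing `--report-exclude-network` on the command line. This is a minimal sketch: getReportWithoutNetwork is a hypothetical helper name, and the feature check assumes that `excludeNetwork` only appears as a property of process.report on Node versions that include this change.

```javascript
// Sketch: request a report with network information excluded where supported.
// Assumption: `excludeNetwork` is a property of process.report only on Node
// versions that ship the option; older versions run the unmodified report.
function getReportWithoutNetwork() {
  const supported = "excludeNetwork" in process.report;
  const previous = process.report.excludeNetwork;
  if (supported) process.report.excludeNetwork = true;
  try {
    return { supported, report: process.report.getReport() };
  } finally {
    // Restore the caller's setting so the helper has no lasting side effect.
    if (supported) process.report.excludeNetwork = previous;
  }
}

const { supported } = getReportWithoutNetwork();
console.log(`excludeNetwork supported: ${supported}`);
```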
npm has also been using it for this exact purpose since 10.4.0, and this issue still seems to happen even with |
I just tried the test case in #46060 (comment), and I can confirm this issue is not solved: the excludeNetwork option did not target the root of the issue, which was libuv doing reverse DNS queries. It happens with even one socket open, and the script also never exits because of https.get. There is the same problem with fetch (minus the program not exiting).
If I close the socket manually, then everything works as expected.
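Closing the sockets first could be sketched like this, following the keepAliveAgent.destroy() call from the reproduction earlier in the thread. getReportAfterClosing is a hypothetical helper name, and waiting one event loop turn for the sockets to finish closing is an assumption, not something confirmed in this thread.

```javascript
const https = require("node:https");

// Sketch: destroy the agent's sockets, then generate the report.
async function getReportAfterClosing(agent) {
  agent.destroy(); // destroys any sockets currently held by the agent
  // Assumption: one event loop turn is enough for pending close callbacks to run.
  await new Promise((resolve) => setImmediate(resolve));
  return process.report.getReport();
}

getReportAfterClosing(new https.Agent({ keepAlive: true })).then((report) => {
  console.log(report.header.nodejsVersion);
});
```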
|
Version
v18.12.1
Platform
Linux DE-ADN-2GPTTQ3 5.15.79.1-microsoft-standard-WSL2 #1 SMP Wed Nov 23 01:01:46 UTC 2022 x86_64 GNU/Linux
Subsystem
report
What steps will reproduce the bug?
Calling process.report.getReport(); is all it takes. The call will not return for longer than 1 minute, if the failure conditions are met.
How often does it reproduce? Is there a required condition?
The issue is 100% reproducible within scenarios I can only construct with our private code bases so far.
I am seeing this as part of the package installation process of yarn, which calls process.report.getReport(); internally. This usually takes less than a second, but in certain scenarios, it takes more than a minute.
What is the expected behavior?
The call should return reliably and in a timely fashion, taking no more than 2-3 seconds.
What do you see instead?
Node hangs for more than a minute and is completely unresponsive.
Additional information
While looking into this issue, I found this comment in the source code of yarn:
These comments are above the code path that leads to the report being generated. This suggests that the behavior is a WSL-specific issue. Assuming that's the case, I'd still like to get better insights into the root cause.