Kernel 6.6.57+ io_uring stall ("yarn install takes indefinitely") #353709
Comments
I've had this issue too. It doesn't just hang, it goes into disk sleep, meaning you can't kill it, not even by shutting down the system. |
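For anyone who wants to confirm the "disk sleep" symptom described above, here is a minimal sketch, assuming a Linux system with procfs mounted at /proc. It lists processes whose state field is D (uninterruptible sleep), which is the state in which signals, including SIGKILL, are not delivered. The function names are hypothetical and not part of any tool mentioned in this thread.

```python
import os


def parse_stat_state(stat_line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left].strip())
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process currently in state D."""
    if not os.path.isdir(proc_root):
        return []  # no procfs here (e.g. not Linux)
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A process that shows up here only briefly is normal I/O wait. One that stays listed across repeated runs matches the unkillable hang reported in this issue.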
Yes! Not even |
I confirmed my suspicion.
diff --git a/src/libstore/unix/build/local-derivation-goal.cc b/src/libstore/unix/build/local-derivation-goal.cc
index 2a09e3dd4..baeae54f8 100644
--- a/src/libstore/unix/build/local-derivation-goal.cc
+++ b/src/libstore/unix/build/local-derivation-goal.cc
@@ -509,11 +509,11 @@ void LocalDerivationGoal::startBuilder()
/* Create a temporary directory where the build will take
place. */
topTmpDir = createTempDir(settings.buildDir.get().value_or(""), "nix-build-" + std::string(drvPath.name()), false, false, 0700);
-#if __APPLE__
+//#if __APPLE__
if (false) {
-#else
- if (useChroot) {
-#endif
+//#else
+// if (useChroot) {
+//#endif
/* If sandboxing is enabled, put the actual TMPDIR underneath
an inaccessible root-owned directory, to prevent outside
access.
which basically reverts NixOS/nix@0e4baff
Doing this on any newer nix version without the above diff fails. So this is exactly the reason. Not sure how to tackle this problem, though. It is unlikely that
@thufschmitt any idea here?
Also, in light of ZHF #352882, this is a bit of a pressing problem. |
I haven't seen this before.
The directory names got longer, and unix sockets have a very restricted length on darwin. Some software does not expect a long(er) TMPDIR and may not handle that correctly, leading to undefined/strange behavior. Although
Is each node in this chain of directories that makes up TMPDIR readable (+rx) by the sandboxed build process? If not, would it be ok to make it readable only by the build user? This is slightly less secure, but might be ok.
This could probably be fixed on either side, Nix or yarn. Could you open an issue on the https://github.com/NixOS/nix repo for the regression? It'd help to get more eyes on this. (I'd move the issue if it was clearly one or the other, fwiw)
Another practical note: @thufschmitt has changed jobs and isn't contributing actively to the Nix/NixOS ecosystem anymore. |
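The socket path length concern raised above can be checked with a rough sketch like the following. It assumes the usual size of the sun_path field in sockaddr_un, 104 bytes on darwin and 108 bytes on Linux, both including the terminating NUL byte. The function name and the example directory are hypothetical. This only illustrates the hypothesis from that comment; the thread later identifies a kernel regression as the actual cause.

```python
import os
import posixpath

# Size of sockaddr_un.sun_path in bytes, including the trailing NUL.
SUN_PATH_MAX = {"darwin": 104, "linux": 108}


def socket_path_fits(tmpdir: str, name: str, platform: str) -> bool:
    """Check whether a socket called `name` under `tmpdir` fits in sun_path."""
    path = posixpath.join(tmpdir, name)
    # Compare encoded byte length plus one byte for the NUL terminator.
    return len(os.fsencode(path)) + 1 <= SUN_PATH_MAX[platform]


if __name__ == "__main__":
    # Hypothetical build directory, for illustration only.
    example = "/private/tmp/nix-build-example.drv-0/build"
    print(socket_path_fits(example, "agent.sock", "darwin"))
```

Running it against the real TMPDIR of a failing build would show whether a longer build directory pushes any socket path past the limit.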
I have the same issue on Linux. There is nothing really suspicious in
|
As I don't see it explicitly named: I think it is definitely not yarn-only (pnpm is shown in the previous comment). I observed a similar issue when building stalwart-mail.webadmin while trying to reproduce a recent build failure. I was running a nixos-unstable that was maybe 1-2 weeks old. |
@roberth thanks for chiming in. This is a non darwin issue. As it is only present when the code is executed on an non Also, even worse, when trying to build |
AFAIS, yes.
done |
Also seeing this on a x86-64 linux machine running hydra, the command |
Same here on my x86-64 linux development VM. I did a |
@datafoo when was your last known good commit? |
I investigated further and I narrowed it down to something between these commits: broken 4c2fcb0 2024-10-18 1809433 Tested as the input for a NixOS VM with a fixed I haven't found an easy culprit with |
On my system, manually (as in: typing it into my terminal) running My system is running on nixpkgs commit 807e915. Edit: Steps to reproduce (at least on my machine):
I kept running
|
Possibly related: Does downgrading to npm 10.3.0 work for you? |
Bun and Deno seem to not be affected. |
@Garmelon I think this is an unrelated bug. What I described here is a bug in a build process from nix, which always uses the same |
This is a kernel bug, specifically with io_uring. Edit: source: https://lore.kernel.org/io-uring/2024110620-stretch-custodian-0e7d@gregkh/T/#u |
The last generation that works for me is 310:
$ nixos-version
24.05.5562.1bfbbbe5bbf8 (Uakari)
$ nix --version
nix (Nix) 2.18.8
$ which nix
/run/current-system/sw/bin/nix
$ l /run/current-system/sw/bin/nix
lrwxrwxrwx 1 root root 62 1970-01-01 01:00 /run/current-system/sw/bin/nix -> /nix/store/x6b4rr799djkf8a2abwf59fadcbyasc1-nix-2.18.8/bin/nix
$ uname --all
Linux redacted 6.6.54 #1-NixOS SMP PREEMPT_DYNAMIC Fri Oct 4 14:30:05 UTC 2024 x86_64 GNU/Linux
The first generation that fails for me is 311:
$ nixos-version
24.05.6122.080166c15633 (Uakari)
$ nix --version
nix (Nix) 2.18.8
$ which nix
/run/current-system/sw/bin/nix
$ l /run/current-system/sw/bin/nix
lrwxrwxrwx 1 root root 62 1970-01-01 01:00 /run/current-system/sw/bin/nix -> /nix/store/ikj1h47p1msvkg7nbyqxabk14n75pfwj-nix-2.18.8/bin/nix
$ uname --all
Linux redacted 6.6.58 #1-NixOS SMP PREEMPT_DYNAMIC Tue Oct 22 13:46:36 UTC 2024 x86_64 GNU/Linux
Observations:
|
I'm using this as a workaround for now: boot.kernelPackages = pkgs.linuxPackages_latest; |
Thanks! I'm bisecting right now (which takes a long time), but since I'm rebuilding the kernel right now, this sounds about right! |
Bisect is done:
So, yes, this is a kernel bug |
@K900 shall we revert this specific commit? |
No, a proper fix should be in the next batch of stable kernels. |
Ok. I'll keep the issue open until the next batch has landed. For everyone looking for a quick fix, #353709 (comment) seems like a viable option (or reverting, or not updating). |
Do you have a |
We don't have |
It's hardcoded in yarn berry: https://github.com/yarnpkg/berry/blob/f59bbf9f3828865c14b06a3e5cc3ae284a0db78d/packages/yarnpkg-core/sources/nodeUtils.ts#L26 Btw if you are not using yarn berry (which is 95% of the users of yarn) then getReport is not used at all either |
What irks me is that this issue mixes up apparent darwin sandboxing issues and linux kernel regressions into one, because they both apparently block the build in a similar fashion. |
It's not a darwin sandbox issue. It occurs on linux, and the linux kernel regression is the root cause. I closed the |
For completeness, there's also nodejs/node#55587 |
This occurs in a pnpm build in modrinth-app. |
+1 occurs in |
No reason to report back further until you run at least Kernel 6.6.60 or 6.12.5 |
As far as I understand, the bug is not present in non-lts kernels, i.e. 6.11 or later. It should hopefully be fixed in lts kernels 6.1.116 and 6.6.60. |
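To check whether a machine falls into the affected 6.6 range, a small sketch along these lines may help. It assumes the range implied by this thread: first bad release 6.6.57 (from the issue title) and first fixed release 6.6.60. The 6.1 series is deliberately not modeled, because the thread names only its fixed release (6.1.116), not its first bad one. The function names are hypothetical.

```python
import platform
import re

# Assumed from this thread: issue title says "6.6.57+", fix expected in 6.6.60.
FIRST_BAD_6_6 = (6, 6, 57)
FIRST_FIXED_6_6 = (6, 6, 60)


def parse_kernel_release(release: str) -> tuple[int, int, int]:
    """Parse the leading 'major.minor[.patch]' of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch component (e.g. "6.11") is treated as patch level 0.
    return int(match[1]), int(match[2]), int(match[3] or 0)


def is_affected_6_6(release: str) -> bool:
    """True if the release lies in the assumed bad range of the 6.6 series."""
    return FIRST_BAD_6_6 <= parse_kernel_release(release) < FIRST_FIXED_6_6


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    print(running, "affected (6.6 range):", is_affected_6_6(running))
```

This agrees with the generations reported earlier in the thread: 6.6.54 works and 6.6.58 fails.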
Thanks, I added the latest kernel to my config and I have no issues! |
A comment on this workaround: it only works for non-ZFS configs. The ZFS configs (at least the stable ones) rely on a kernel <= 6.11, so the only workaround here is a revert/rollback until 6.6.60 is released. |
Wow thanks for this, I was just fighting with my config trying to make this work. I see zfs_unstable supports 6.11 now, but when I tried to enable that on a test machine:
I'm not sure why it's pulling in 2.2.6 when zfs_unstable is 2.3, but anyway, that is probably off-topic here. And not all systems can use zfs unstable.
Rolling back to LTS 6.1 isn't an option, unfortunately. Do I understand correctly that kernel 6.6.56 is the last known good? What would be the proper way to revert/rollback to 6.6.56? |
The way I did it, was to selectively revert the update commits on my own
And of course you can revert your NixOS config to an older generation which worked and wait for the kernel update |
In my case (laptop, 6.6.59) shutdown is fine but delayed, (deep) sleep does not appear to resume. Thanks, everyone. |
As said before this is happening with pnpm not just yarn. |
Kernel 6.6.60 is merged now with a fix. |
critical security fix for Sharkey; but first kernel must be upgraded because of NixOS/nixpkgs#353709 which caused me pain for WEEKS. i had pnpm running for dozens of days! at one point, four instances at once! awful pain. thankfully fixed now i could just upgrade linux lol
Update:
The root cause is due to a linux kernel regression on the build system. The affected kernels are:
and the first one landed in nix in 0e4c64f
This kernel regression causes npm and yarn to hang and be unkillable.
It has nothing to do with nix sandboxing, as I first suspected.
Describe the bug
Currently yarn install hangs at the step linking dependencies...
Steps To Reproduce
Steps to reproduce the behavior:
1. Try to build pgadmin4 on master
2. Wait for linking dependencies...
3. ...
or just run
nix build github:nixos/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2#pgadmin --rebuild
Expected behavior
yarn install should continue with the install process
Additional context
I've noticed this issue on an unrelated small bugfix in pgadmin4 which caused a rebuild, which did not work (#353092). Ofborg worked just fine, which is why I merged this small fix, but the package never did build on my system. Neither does it currently on hydra (see e.g. https://hydra.nixos.org/build/277185860/nixlog/1).
I'm not sure what changed, since nothing substantial changed in the package. I've also tried to re-run the update script, which resulted in exactly the same yarn.lock.
Running strace or lsof did not result in any trace of the issue.
Also, interestingly, running --check on an older nixos-unstable pgadmin4 derivation fails to build at the same step.
Is there anything in the nix builder which changed sandbox or build behavior and stalled yarn?
I've looked at NixOS/nix#10312, which changed stuff related to the sandbox, and found an old unpatched nix version in 24.05 (which is running nix version 2.18.2, which according to GHSA-q82p-44mg-mgh5 hasn't been fixed yet), and it does compile the current pgadmin4 just fine!
This does not work with a patched nix version (doesn't matter whether it's 2.18.4 or newer).
So the patch to fix the build-dir seems to have broken at least pgadmin.
Notify maintainers
@roberth