Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/dhcpcd: fix race between namespace setup and resolvconf #348305

Merged
merged 1 commit into from
Oct 14, 2024

Conversation

rnhmjoj
Copy link
Contributor

@rnhmjoj rnhmjoj commented Oct 13, 2024

systemd requires paths in ReadWritePaths= to exist before setting up the service sandbox and there is apparently no way to control this with the usual After=, Wants= etc.
Instead, we have to mark all the paths as optional, even if they're not, then manually check if they exist, fail if they don't and wait for the service to be restarted.

so dhcpcd should be ordered after resolvconf. Making resolvconf a oneshot service ensure After=resolvconf.service works correctly.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested via dhcpcd.tests and --rebuild a few times
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Oct 13, 2024
@mweinelt mweinelt added the 1.severity: channel blocker Blocks a channel label Oct 13, 2024
@arianvp
Copy link
Member

arianvp commented Oct 13, 2024

and there is apparently no way to control this with the usual After=, Wants= etc.

I think the bug here is that resolvconf.service incorrectly uses Type=simple.

https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/config/resolvconf.nix#L141

Type=simple are considered "started up" after fork(). At that point the file doesn't exist yet. Ordering on Type=simple and Type=exec units is mostly non-sensical

Ordering only works on Type=notify, Type=forking and Type=oneshot. It looks to me like resolvconf.service could be a Type=oneshot? Then you're usre that the file exists at the moment it is considered "started up"

@uninsane
Copy link
Contributor

systemd requires paths in ReadWritePaths= to exist before setting up the service sandbox

that's not just a systemd thing, that's a linux namespace thing. i believe that even with this fix, if another service were to edit /etc/resolv.conf by, say, ln -sf /etc/NetworkManager/resolv.conf /etc/resolv.conf, that wouldn't propagate into your namespace since it's an operation on the /etc directory -- not the /etc/resolv.conf directory entry.

the /etc directory is secured by user permissions, so it should be safe to just bind ReadWritePaths=/etc?

@rnhmjoj
Copy link
Contributor Author

rnhmjoj commented Oct 13, 2024

Ordering only works on Type=notify, Type=forking and Type=oneshot. It looks to me like resolvconf.service could be a Type=oneshot? Then you're usre that the file exists at the moment it is considered "started up"

Yes, it seems the correct solution. Thank you.

the /etc directory is secured by user permissions, so it should be safe to just bind ReadWritePaths=/etc?

Yes, but probably then /run/resolvconf would be the next to trigger the error, and I don't think giving access to all of /run is ideal.

@@ -210,7 +210,8 @@ in
{ description = "DHCP Client";

wantedBy = [ "multi-user.target" ] ++ lib.optional (!hasDefaultGatewaySet) "network-online.target";
wants = [ "network.target" ];
wants = [ "network.target" ] ++ lib.optional useResolvConf "resolvconf.service";
after = lib.optional useResolvConf "resolvconf.service";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These shouldn't need to be optional since resolvconf.service just won't exist if disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok, done.

systemd requires paths in `ReadWritePaths=` to exist before setting up
the service sandbox, so dhcpcd should be ordered after resolvconf.
Making resolvconf a oneshot service ensure `After=resolvconf.service`
works correctly.
@vcunat
Copy link
Member

vcunat commented Oct 14, 2024

For me the test is still hanging, I'm afraid. I tried a couple times on two different x86_64 machines.
@ofborg test networking.networkd.dhcpSimple

@vcunat
Copy link
Member

vcunat commented Oct 14, 2024

@ofborg test networking.networkd.dhcpSimple

@rnhmjoj
Copy link
Contributor Author

rnhmjoj commented Oct 14, 2024

It should be networking.scripted.dhcpSimple, not networking.networkd.dhcpSimple.

@vcunat
Copy link
Member

vcunat commented Oct 14, 2024

Well, the scripted ones do pass on Hydra. It's the networkd ones that block the channel. See e.g.
https://hydra.nixos.org/build/274771827#tabs-constituents

@rnhmjoj
Copy link
Contributor Author

rnhmjoj commented Oct 14, 2024

Ah, so it has nothing to do with dhcpcd! It's the test grep -q 2001:db8::1 /etc/resolv.conf I added that fails for networkd. It seems networkd is ignoring the RDNSS from the router advertisement, it should support that, in theory:

       UseDNS=
           When true (the default), the DNS servers received in the Router Advertisement will be used.

           This corresponds to the nameserver option in resolv.conf(5).

           Added in version 231.

@rnhmjoj
Copy link
Contributor Author

rnhmjoj commented Oct 14, 2024

@vcunat: you can revert bad5251 to quickly unblock the channel.
This should be investigated, though.

vcunat added a commit that referenced this pull request Oct 14, 2024
This reverts commit bad5251.

#348305 (comment)
Should've known that commit starting with `bad` will be no good.
Fixes nixosTests.networking.networkd.dhcpSimple
https://hydra.nixos.org/build/274843085/nixlog/8/tail
@vcunat vcunat removed the 1.severity: channel blocker Blocks a channel label Oct 14, 2024
@vcunat
Copy link
Member

vcunat commented Oct 14, 2024

bad commit!

All other channel-critical tests succeeded on the previous eval, so I hope we'll good now 🤞🏽

@vcunat vcunat changed the title dhcpcd: fix race between namespace setup and resolvconf nixos/dhcpcd: fix race between namespace setup and resolvconf Oct 14, 2024
@rnhmjoj
Copy link
Contributor Author

rnhmjoj commented Oct 14, 2024

(Note: the race condition is still a thing, but it's not blocking the channel)

ners pushed a commit to ners/nixpkgs that referenced this pull request Oct 14, 2024
This reverts commit bad5251.

NixOS#348305 (comment)
Should've known that commit starting with `bad` will be no good.
Fixes nixosTests.networking.networkd.dhcpSimple
https://hydra.nixos.org/build/274843085/nixlog/8/tail
@rnhmjoj
Copy link
Contributor Author

rnhmjoj commented Oct 14, 2024

Ok, it seems networking.useNetworkd = true sneakily enables systemd-resolved, which hijacks /etc/resolv.conf with the stub resolver. For this reason the upstream nameservers are never written to this file.

@rnhmjoj rnhmjoj merged commit 35618d0 into NixOS:master Oct 14, 2024
36 of 38 checks passed
yuanwang-wf pushed a commit to yuanwang-wf/nixpkgs that referenced this pull request Oct 17, 2024
This reverts commit bad5251.

NixOS#348305 (comment)
Should've known that commit starting with `bad` will be no good.
Fixes nixosTests.networking.networkd.dhcpSimple
https://hydra.nixos.org/build/274843085/nixlog/8/tail
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 1-10 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants