kubelet failed to create "/system" cgroup #35214

Closed
yujuhong opened this issue Oct 20, 2016 · 8 comments

@yujuhong
Contributor

yujuhong commented Oct 20, 2016

On a debian-based containervm node, kubelet no longer creates the "/system" cgroup or moves processes into it.

From kubelet.log:

W1020 16:16:51.129436    3552 container_manager_linux.go:491] [ContainerManager] Failed to ensure state of "/system": [failed to list PIDs for root: open cgroup.procs: no such file or directory, ran out of attempts to create system containers "/system"]

/cc @kubernetes/sig-node

@yujuhong yujuhong added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/bug Categorizes issue or PR as related to a bug. labels Oct 20, 2016
@yujuhong
Contributor Author

By the way, we have poor to no test coverage on features like this.

@derekwaynecarr
Member

Isn't that running systemd? You shouldn't do any of this anymore on systemd nodes.

@yujuhong
Contributor Author

@derekwaynecarr there are two bugs (I didn't include enough information in the original issue):

  1. From what I observed, kubelet still tries to create this on the new GCI image (with non-init systemd), and fails repeatedly. Kubelet should not try to create this at all.
  2. On the old debian-based container vm image (i.e., no systemd) kubelet should create this cgroup, but it fails to do so.

@yujuhong
Contributor Author

@dchen1107 should we fix this for 1.5?

@vishh
Contributor

vishh commented Oct 27, 2016

FYI: #35319 should fix the GCI aspects of this issue.


@yujuhong yujuhong added this to the v1.5 milestone Nov 1, 2016
@yujuhong
Contributor Author

yujuhong commented Nov 9, 2016

This regression is caused by a change in libcontainer: opencontainers/runc#1013
The old code calls getCgroupPath, but the new code relies on Manager.Paths being populated beforehand.

I looked around a little bit and we have a CgroupManager in kubelet that wraps around the library, but the Pids() method it provides seems to be recursive (which is probably not what we need). I am not familiar with the code and not sure whether we should reuse any of the existing code. @derekwaynecarr and @vishh will have better suggestions.

@vishh
Contributor

vishh commented Nov 9, 2016

I can take a look at this today.

On Wed, Nov 9, 2016 at 12:25 PM, Yu-Ju Hong wrote:

This regression is caused by a change in libcontainer: opencontainers/runc#1013
The old code calls getCgroupPath, but the new code relies on Manager.Paths being populated beforehand.

I looked around a little bit and we have a CgroupManager (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cgroup_manager_linux.go) in kubelet that wraps around the library, but the Pids() method it provides,

func (m *cgroupManagerImpl) Pids(name CgroupName) []int {

seems to be recursive (which is probably not what we need). I am not familiar with the code and not sure whether we should reuse any of the existing code. @derekwaynecarr and @vishh will have better suggestions.

@timstclair

Spoke with Vish, I'll take this. I'm going to try and get a patch by tomorrow that we can cherry-pick into 1.4.6 (cc @jessfraz )

@timstclair timstclair assigned timstclair and unassigned dchen1107 Nov 9, 2016
k8s-github-robot pushed a commit that referenced this issue Nov 10, 2016
Automatic merge from submit-queue

Fix getting cgroup pids

Fixes #35214, #33232

Verified manually, but I didn't have time to run all the e2e's yet (will check it in the morning).

This should be cherry-picked into 1.4, and merged into 1.5 (/cc @saad-ali )

```release-note
Fix fetching pids running in a cgroup, which caused problems with OOM score adjustments & setting the /system cgroup ("misc" in the summary API).
```

/cc @kubernetes/sig-node
k8s-github-robot pushed a commit that referenced this issue Dec 3, 2016
Automatic merge from submit-queue (batch tested with PRs 37692, 37785, 37647, 37941, 37856)

Verify misc container in summary test

Should detect issue from #35214, #37453

/cc @piosz @dchen1107