-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong memory ceiling in cgroup v2 environments #47259
Comments
This is a known issue (as you mention) and blocked on the next libuv release, tracked in libuv/libuv#3887. The release will likely happen in the next few weeks but keep in mind it takes a while to make its way into node's LTS lines. I'm going to go ahead and close this but I can convert it to a discussion if you have follow-up questions. |
That's great, thank you. We can keep it like this, that way we have somewhere to point people to that are encountering it as well. |
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 PR-URL: #48078 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
This wasn't failing on arm boxes, increase the `runInNewContext()` timeout a bit to make sure the code it's allowed to fail. PR-URL: #48078 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
A libuv `LICENSE-extra` file was added and a couple of files were removed (stdint-msvc2008.h and pthread-fixes.c). PR-URL: #48078 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 PR-URL: #48078 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
This wasn't failing on arm boxes, increase the `runInNewContext()` timeout a bit to make sure the code it's allowed to fail. PR-URL: #48078 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
A libuv `LICENSE-extra` file was added and a couple of files were removed (stdint-msvc2008.h and pthread-fixes.c). PR-URL: #48078 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
@targos Will this be released for Node 18 LTS as well? We are seeing same behaviour on Node 18 LTS too |
Do you think it can cause the Node also down in kubernetes cluster as no resources left on node to run system processes and then kubelet goes down which ultimately removes the node also from worker nodes group in cluster? |
I can't tell you for sure what happens in your specific k8s setup, but it's certainly possible that your pods using more memory has additional adverse effects that could impact your node and node pool. |
This wasn't failing on arm boxes, increase the `runInNewContext()` timeout a bit to make sure the code it's allowed to fail. PR-URL: #48078 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
A libuv `LICENSE-extra` file was added and a couple of files were removed (stdint-msvc2008.h and pthread-fixes.c). PR-URL: #48078 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
This wasn't failing on arm boxes, increase the `runInNewContext()` timeout a bit to make sure the code it's allowed to fail. PR-URL: nodejs#48078 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
A libuv `LICENSE-extra` file was added and a couple of files were removed (stdint-msvc2008.h and pthread-fixes.c). PR-URL: nodejs#48078 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 PR-URL: nodejs#48078 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
This wasn't failing on arm boxes, increase the `runInNewContext()` timeout a bit to make sure the code it's allowed to fail. PR-URL: nodejs#48078 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
A libuv `LICENSE-extra` file was added and a couple of files were removed (stdint-msvc2008.h and pthread-fixes.c). PR-URL: nodejs#48078 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 PR-URL: nodejs#48078 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
This wasn't failing on arm boxes, increase the `runInNewContext()` timeout a bit to make sure the code it's allowed to fail. PR-URL: nodejs#48078 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
A libuv `LICENSE-extra` file was added and a couple of files were removed (stdint-msvc2008.h and pthread-fixes.c). PR-URL: nodejs#48078 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: nodejs#43931 Fixes: nodejs#42496 Fixes: nodejs#47715 Fixes: nodejs#47259 Fixes: nodejs#47241 PR-URL: nodejs#48078 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 PR-URL: #48078 Backport-PR-URL: #49591 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 PR-URL: #48078 Backport-PR-URL: #49591 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
- linux: introduce io_uring support libuv/libuv#3952 - src: add new metrics APIs libuv/libuv#3749 - unix,win: give thread pool threads an 8 MB stack libuv/libuv#3787 - win,unix: change execution order of timers libuv/libuv#3927 Fixes: #43931 Fixes: #42496 Fixes: #47715 Fixes: #47259 Fixes: #47241 PR-URL: #48078 Backport-PR-URL: #49591 Reviewed-By: Ben Noordhuis <[email protected]> Reviewed-By: Mohammed Keyvanzadeh <[email protected]> Reviewed-By: Debadree Chatterjee <[email protected]> Reviewed-By: Luigi Pinca <[email protected]> Reviewed-By: Colin Ihrig <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Michaël Zasso <[email protected]> Reviewed-By: Rafael Gonzaga <[email protected]> Reviewed-By: Richard Lau <[email protected]> Reviewed-By: Tobias Nießen <[email protected]> Reviewed-By: Jiawen Geng <[email protected]>
I am seeing CPU throttling in my case. I already set |
I don't see how it could be related. This issue is only aboute Node's ability to set its heap size based on the environment. By the way, I'm happy to report that the fix has made it into recent versions of Node.js and we no longer need to set |
FYI, the v18.18.1 release (#50066) rolls back libuv to a version without the cgroupsv2 fixes. |
Well it was fun while it lasted :) Thanks for the heads-up, would've walked right into that one otherwise. |
Since #27508 Node.js is able to automatically limit its heap size based on cgroup limits instead of the host's physical memory. This is very important for containerized workloads, where we have many pods with smaller memory limits running on a node with a large amount of memory. Node.js has to consider the cgroup limit and not the amount of physical memory, in order to avoid allocating more than allowed and getting OOM killed.
This seems to have broken with cgroup v2, because
libuv
'suv_get_constrained_memory
does not support cgroup v2, at least in the currently released version 1.44.2. It has already been implemented in libuv/libuv#3744 in September 2022, but there hasn't been a release since.For us the issue manifested itself in many Node.js pods getting OOM killed after a Kubernetes cluster upgrade to 1.25 where cgroup v2 support is introduced. Because the solution depends on the next
libuv
release I don't expect a fix here, this report is intended more as a tracking issue for people affected by it.At the moment I can see several options:
--max-old-space-size
libuv
libuv
dependency)Version
v13.0.0 and upward
Platform
5.15.0-1034-azure
Subsystem
src/api/environment.cc
What steps will reproduce the bug?
I reproduced this by comparing the behavior on two AKS clusters, one on 1.24.9 (cgroup v1) and the other on 1.25.5 (cgroup v2). Both had a single node with 4 GiB memory. But you should be able to observe the behavior in any environment using cgroup v2, you can verify that is the case by running
stat -fc %T /sys/fs/cgroup/
which should outputcgroup2fs
.My repro:
600Mi
node -e "console.log(v8.getHeapStatistics())"
and check the outputHow often does it reproduce? Is there a required condition?
Always when using cgroup v2
What is the expected behavior? Why is that the expected behavior?
heap_size_limit
should be less than the cgroup maximum reported bycat /sys/fs/cgroup/memory.max
(v2) orcat /sys/fs/cgroup/memory/memory.limit_in_bytes
(v1). In my specific repro with the 600 MiB pod limit,heap_size_limit
was 312 MiB in the cgroup v1 environment.What do you see instead?
heap_size_limit
is more than the cgroup maximum reported bycat /sys/fs/cgroup/memory.max
, it seems influenced by the host's physical memory instead. In my specific repro with the 600 MiB pod limit,heap_size_limit
was 1.96 GiB in the cgroup v2 environment, which will eventually lead to the pod being OOM killed.The text was updated successfully, but these errors were encountered: