Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: properly configure default heap limits #25576

Closed
wants to merge 1 commit into from

Conversation

ofrobots
Copy link
Contributor

@ofrobots ofrobots commented Jan 18, 2019

Unless configured, V8 defaults to limiting the max heaps size to 700 MB
or 1400MB on 32 and 64-bit platforms respectively. This default is
based on the browser use-cases and doesn't make a lot of sense
generally. This change properly configures the heap size based on
actual available memory.

This should reduce the number of instances where we run out of memory processing larger data-sets. It is still possible to pass --max-old-space-size to use a different limit.

CI: https://ci.nodejs.org/job/node-test-pull-request/20203/
CI: https://ci.nodejs.org/job/node-test-pull-request/20310/

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Jan 18, 2019
@ofrobots ofrobots requested a review from hashseed January 18, 2019 22:40
@mhdawson
Copy link
Member

What will this do if the total memory is small like something in a 128M cloud container?

I think I've seen Node.js be able to have heaps which are bigger than the available memory (which with a default of 1.4G could often be the case for containers) and things still run ok because some parts are paged out.

I'm wondering if this might cause some of those deployments to start failing?

@mhdawson
Copy link
Member

Also do you have a quick overview of how the size of the heap is configured based on the total memory. For example if I have 32GB of memory is it one for one so that its x% of 32G. If so what is x%?

@mhdawson
Copy link
Member

We probalby need to add at least a doc update somewhere that explains the behaviour change that people will see as part of this PR.

Copy link
Member

@joyeecheung joyeecheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewIsolate is exposed to embedders, do they have any issues with the new default?

@ofrobots
Copy link
Contributor Author

@mhdawson

What will this do if the total memory is small like something in a 128M cloud container?
Also do you have a quick overview of how the size of the heap is configured based on the total memory. For example if I have 32GB of memory is it one for one so that its x% of 32G. If so what is x%?

The details of how heap sizes this is computed presently by following the logic in ResourceConstraints::ConfigureDefaults. Right now, V8 still seems to have a hard-coded ceiling of 2GB on the old generation size.

size_t Heap::ComputeMaxOldGenerationSize(uint64_t physical_memory) {
  const size_t old_space_physical_memory_factor = 4;
  size_t computed_size = static_cast<size_t>(physical_memory / i::MB /
                                             old_space_physical_memory_factor *
                                             kPointerMultiplier);
  return Max(Min(computed_size, HeapController::kMaxSize),
             HeapController::kMinSize);
}

There is a floor of 256MB and a ceiling of 2048MB.

To summarize the behavior change, given current implementation of V8 (restricting discussion to 64-bit builds):

  • Machines with less than <512MB RAM will configure old space to be 256MB instead of 1400MB. This is a behavior change but this is arguably better than using an old space 10x larger than total physical memory on a 128MB machine.
  • Machines with between 512MB - 2.8GB RAM will see a reduction of max old space size. This is a behavior change that we should probably think more about.
  • Machines with more than 2.8GB RAM will see an increase of the old space size up to a ceiling of 2048MB. This is fine, although the hard-coded ceiling of 2GB is too low.

With this spelled out, it is worth pausing and thinking about this more deeply. I do think we need change the default on large machines. I routinely see workloads where the job fails with 'out of memory' when there clearly is enough memory available.

Note that this touches on the implementation details of the Heap in V8 which will likely change as V8 changes. My aim here was to provide V8 more information to configure the heap sizes more intelligently. Node.js unilaterally deciding the memory limits is likely not a good solution. Nor do we expect that default value hard-coded in V8 would match our use-cases.

I'll reflect upon this more, and if people have opinions, it would be great to hear them.

/cc @nodejs/v8.

@ofrobots ofrobots added the wip Issues and PRs that are still a work in progress. label Jan 19, 2019
src/node.cc Outdated
@@ -1813,6 +1813,14 @@ bool AllowWasmCodeGenerationCallback(
Isolate* NewIsolate(ArrayBufferAllocator* allocator, uv_loop_t* event_loop) {
Isolate::CreateParams params;
params.array_buffer_allocator = allocator;

double totalMemory = uv_get_total_memory();
if (totalMemory > 0) {
Copy link
Contributor

@refack refack Jan 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. is it safe to continue if totalMemory < 0 i.e. uv_get_total_memory returned an error?
  2. semver-ity (in continuation of discussion thread). Maybe this should be major and so we can think less about nuanced implications?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. does this have priority over CLI parameters (e.g. --max-old-space-size)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. does this have priority over CLI parameters (e.g. --max-old-space-size)?

No, the CLI flags take precedence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it safe to continue if totalMemory < 0 i.e. uv_get_total_memory returned an error?

I believe so. For whatever reason (security policy?) we don't know how much memory is available, but that doesn't imply memory allocation is going to fail.

@refack
Copy link
Contributor

refack commented Jan 19, 2019

  • Machines with less than <512MB RAM will configure old space to be 256MB instead of 1400MB. This is a behavior change but this is arguably better than using an old space 10x larger than total physical memory on a 128MB machine.
  • Machines with between 512MB - 2.8GB RAM will see a reduction of max old space size. This is a behavior change that we should probably think more about.
  • Machines with more than 2.8GB RAM will see an increase of the old space size up to a ceiling of 2048MB. This is fine, although the hard-coded ceiling of 2GB is too low.

Maybe we can code these limitation here (and keep this < semver-minor)? Or alternatively mark this as semver-major?

@refack refack added the v8 engine Issues and PRs related to the V8 dependency. label Jan 19, 2019
Copy link
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a counterargument but for your consideration:

One reason we never changed the defaults so far (it's come up a few times) is that node is often used in a multi-process setup. With that kind of workload you don't want to use physical memory as an input signal because you'll overcommit.

(It speaks in this PR's favor that it still has an upper limit in place so it's unlikely it'll go completely runaway.)

src/node.cc Outdated
@@ -1813,6 +1813,14 @@ bool AllowWasmCodeGenerationCallback(
Isolate* NewIsolate(ArrayBufferAllocator* allocator, uv_loop_t* event_loop) {
Isolate::CreateParams params;
params.array_buffer_allocator = allocator;

double totalMemory = uv_get_total_memory();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: total_memory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/node.cc Outdated
@@ -1813,6 +1813,14 @@ bool AllowWasmCodeGenerationCallback(
Isolate* NewIsolate(ArrayBufferAllocator* allocator, uv_loop_t* event_loop) {
Isolate::CreateParams params;
params.array_buffer_allocator = allocator;

double totalMemory = uv_get_total_memory();
if (totalMemory > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. does this have priority over CLI parameters (e.g. --max-old-space-size)?

No, the CLI flags take precedence.

@ofrobots
Copy link
Contributor Author

With that kind of workload you don't want to use physical memory as an input signal because you'll overcommit

Setting the max old space limit is not necessarily the same as over-committing. It just means that we now allow old space can grow to that value if the processes really needs to consume that much memory. The cases where I expect the behavior changes is when we run memory hungry processes on an under-provisioned machine. In such a scenario, today, things would run out of memory and crash. We would change the behavior so that the programs run for a bit longer (on swap) before crashing. IMO, we are changing an error outcome to a slightly different error outcome. And, as you observe also, the ceiling of 2GB limits the runaway case for leaky/buggy programs.

@ofrobots ofrobots force-pushed the default-max-heap-size branch from 93966b2 to 0cb38e2 Compare January 23, 2019 02:04
@ofrobots
Copy link
Contributor Author

ofrobots commented Jan 23, 2019

Maybe we can code these limitation here (and keep this < semver-minor)? Or alternatively mark this as semver-major?

I think the new behavior is more sensible, specially on low-memory machines. I'm fine with calling this a semver-major.

@ofrobots ofrobots added semver-major PRs that contain breaking changes and should be released in the next major version. and removed wip Issues and PRs that are still a work in progress. labels Jan 23, 2019
@ofrobots
Copy link
Contributor Author

I am going to land this tomorrow unless objections show up. /cc @mhdawson

@ofrobots
Copy link
Contributor Author

ofrobots commented Jan 30, 2019

@Trott
Copy link
Member

Trott commented Jan 31, 2019

@ofrobots
Copy link
Contributor Author

AIX flaked. Resume build: https://ci.nodejs.org/job/node-test-pull-request/20482/

@ofrobots ofrobots added notable-change PRs with changes that should be highlighted in changelogs. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Jan 31, 2019
@ofrobots
Copy link
Contributor Author

AIX seems to be having infrastructure issues - Java errors. Another resume build: https://ci.nodejs.org/job/node-test-pull-request/20490/

@gruckion
Copy link

  • Machines with between 512MB - 2.8GB RAM will see a reduction of max old space size. This is a behavior change that we should probably think more about.

We are not encountering an issue, Docker by default uses 2048 MB of memory. So our local builds no longer work. We now have to manually set max-old-space-size or increase Docker's resources.

@filipesilva
Copy link

@ofrobots regarding your comment:

I'll reflect upon this more, and if people have opinions, it would be great to hear them.

In Angular CLI one of the most common problems we have is out of memory errors. Undoubtedly we could do better on our side to use less memory, but it's always surprising for our users to hit a hardcoded memory limit when their development and CI machines have a lot more available memory than that.

We were delighted to hear that Node 12 would configure the limit based on available memory (as described in the blog post), so we told our users they should update to Node 12. But we weren't aware this still meant there was a new static limit. And although the new limit is still around 50% more than before, it looks like Node 12 uses ~30% more memory than Node 10. So users with bigger projects started having memory problem just from moving to Node 12.

@ofrobots
Copy link
Contributor Author

@filipesilva You're right. I've been working with the V8 team to get this fixed more comprehensively. Here's the V8 bug: https://bugs.chromium.org/p/v8/issues/detail?id=9306. There are now more ergonomic APIs for Node.js to use to configure the memory usage, and I expect we can start using these once these trickle into Node.

It would be great to have some example projects / repros that consume large heaps that we can experiment with. I'll look into the links you provided.

@filipesilva
Copy link

angular/angular-cli#13734 (comment) includes analysis of the repositories in https://github.com/filipesilva/angular-cli-perf-benchmark. You can find more detailed numbers there.

The scripts in that repository benchmark a number of projects using CircleCI. You can also run these benchmarks locally with e.g. ./benchmark-project.sh https://github.com/filipesilva/awesome-angular-workshop 9076a3d npm "ng build 5-ngrx-end --prod". You'll need to do npm install -g @angular/cli and npm install -g ./angular-devkit-benchmark-0.800.0-beta.18.tgz first though (in CI the global-setup.sh does that).

We've also been trying hard to reduce our memory usage on bigger projects but it's hard to do so, as the Chrome devtools for Node crashes when gathering CPU profiles and Heap timelines. This makes it harder to figure out where memory/cpu time is being spent.

@ofrobots
Copy link
Contributor Author

ofrobots commented Jun 25, 2019

@filipesilva

as the Chrome devtools for Node crashes when gathering CPU profiles and Heap timelines.

You may want to use a system with fewer moving parts to get the profiels, for example: npm.im/pprof is a node module that can gather (sampling) heap profiles and CPU profiles for you programmatically without involving DevTools or the Chrome DevTools Protocol.

@filipesilva
Copy link

@ofrobots I did not know of pprof, trying it out right now and I like the data I'm getting!

Capturing the CPU profile using the node pprof was a breeze. The browser visualization for the CPU profile seems to fail to render for view->top with a Uncaught TypeError: Failed to construct 'URL': Invalid URL console message, and view->flamegraph fails with the same plus Uncaught RangeError: Maximum call stack size exceeded. view->graph/peek/source seem to work. I need to spend some time looking at the information in here. Haven't yet tried the heap profile mode.

I had been trying the direct node inspector API for cpu profiles and heap snapshots, and then also tried modifying the API usage to take heap profiles in a similar way to your https://github.com/v8/sampling-heap-profiler. But chrome devtools crashes on heap snapshots larger that 2gigs so that doesn't help. Maybe pprof will handle these better.

Anyways thanks a bunch for the tip!

@kalyanac
Copy link

@ofrobots Thank you for opening google/pprof#475
We will debug the pprof issue there

@uyu423
Copy link

uyu423 commented Apr 6, 2020

@ofrobots I have some question.
According to the thread, it seems that up to 2G memory is allocated to one node.js process.
We have a huge Node.js service. And this service is being used by allocating 7G using max_old_space_size option. Is this option unnecessary now?
If the feature refers to available system memory, why isn't heap memory allocated more than 2G?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. notable-change PRs with changes that should be highlighted in changelogs. semver-major PRs that contain breaking changes and should be released in the next major version. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

Successfully merging this pull request may close these issues.