Datastore: (using NDB) Linear memory usage consumption when using fetch_page #752
This sounds alarming.
Hi, I'll try to address some of your questions to the best of my knowledge!
We actually experienced this bug on our production servers; we use "B8" instances, which hold 2048 MB of memory. We initially thought our code was simply not optimized enough, but with the snippet above we do reproduce the "soft memory exceeded" error. Our application usually runs such long loops as Cloud Tasks. In that configuration, we managed to get around this bug by using ndb cursors: the function used in Cloud Tasks only processes a few loops at a time, then triggers another execution of itself with an updated cursor, letting the instance clean up its memory and start again fresh.

I couldn't confirm how Python 2 / legacy ndb behaves; we're currently using Python 3.10 and google-cloud-ndb v1.11.1, which isn't legacy ndb.

As I mentioned, this issue occurred on real B8 servers, so we did experience soft memory being exceeded in that sort of case.
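The cursor-chaining workaround described above can be sketched roughly like this. This is a pure-Python simulation: `fetch_page`, `process_batch`, the in-memory `DATA` list, and the page sizes are all illustrative stand-ins, not the actual ndb or Cloud Tasks code.

```python
from typing import List, Optional, Tuple

DATA = list(range(1000))  # stand-in for datastore entities

def fetch_page(page_size: int, start_cursor: int = 0) -> Tuple[List[int], int, bool]:
    """Stand-in for ndb's Query.fetch_page(): returns (results, cursor, more)."""
    end = start_cursor + page_size
    return DATA[start_cursor:end], end, end < len(DATA)

def process_batch(cursor: int, pages_per_task: int = 3, page_size: int = 100) -> Optional[int]:
    """Process a few pages, then return the cursor for the next task.

    Returns None when nothing is left, so no follow-up task is enqueued.
    """
    for _ in range(pages_per_task):
        results, cursor, more = fetch_page(page_size, cursor)
        # ... process `results` here ...
        if not more:
            return None
    return cursor

# Each Cloud Task invocation would call process_batch and, if it gets a
# cursor back, enqueue a new task with that cursor (enqueueing not shown).
cursor: Optional[int] = 0
tasks = 0
while cursor is not None:
    cursor = process_batch(cursor)
    tasks += 1
print(tasks)  # 4 task invocations cover all 1000 entities
```

Because each task is a fresh invocation, any memory the runtime failed to release is reclaimed between tasks, which is why this sidesteps the leak.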
I have tried to follow your suggestion of investigating the GC stats. I must admit I have very little experience with GC, let alone GC stats.

```python
import gc
import os

import psutil  # https://pypi.org/project/psutil/
from google.cloud import ndb

from app.models import Enterprise

print(ndb.__version__)

def memory_usage():
    pid = os.getpid()
    process = psutil.Process(pid)
    return process.memory_info().rss / 10 ** 6

with ndb.Client().context():
    more = True
    cursor = None
    standard_usage = memory_usage()
    while more:
        # 'Enterprise' is rather a big entity in our datastore, which makes
        # the memory consumption increase more visible
        result, cursor, more = Enterprise.query().fetch_page(
            200, start_cursor=cursor, use_cache=False, use_global_cache=False
        )
        print("memory: ", memory_usage() - standard_usage)
        print(f"{gc.get_stats()}")
        gc.collect()
```

The output of the program is attached. I'll let you investigate the GC stats and hopefully draw some conclusions; without prior knowledge, nothing stands out to me. Hope this clarifies some of your questions and makes it easier for maintainers to investigate this issue.
Thanks for responding! We do things like this all the time in tasks running on f1-micro instances and it works fine; memory gets freed up along the way because nothing is stored in the instance cache. So if you're having memory issues fetching pages with 2 GB of memory available, it seems like a pretty major regression.

Re: Python 2 ndb, if I'm not mistaken this library was made mainly to coax developers off the old App Engine standard runtime. At some point, messages started appearing in issues about how development was stopping and there would be no further support, Google started recommending against using it for new projects, and finally seemed to give in and let developers continue using (for now, anyway) most of the legacy App Engine services, including the ndb library, on Python 3 via this: https://github.com/GoogleCloudPlatform/appengine-python-standard

I'd be curious to know whether ndb via the legacy bundled services SDK exhibits this issue for you. As it is, and without any comment from the team, it's kind of a red flag for us, since we know our instances would run out of memory right away if this bug exists. If legacy bundled services support goes away at some point, using this library instead wouldn't really be an option.
In certain circumstances, we were not respecting use_cache for queries, unlike legacy NDB, which is quite emphatic about supporting it. (See https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/59cb209ed95480025d26531fc91397575438d2fe/ndb/query.py#L186-L187)

In googleapis#613 we tried to match legacy NDB behavior by updating the cache using the results of queries. We still do that, but now we respect use_cache, which was a valid keyword argument for Query.fetch() and friends but was not passed down to the context cache when needed.

As a result, the cache could mysteriously accumulate lots of memory and perhaps even cause you to hit memory limits, even though it was seemingly disabled and it didn't look like any objects were holding references to your query results. This is a problem for batch-style workloads where you know you're only interested in processing a given entity once.

Fixes googleapis#752
Hi, I'm working on a PR to fix this issue. Thanks for bringing it up. In the meantime, you can try the following workaround of explicitly setting a never-cache policy. Instead of:

```python
with ndb.Client().context():
```

you can do:

```python
with ndb.Client().context(cache_policy=lambda l: False):
```
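For anyone wondering what that workaround does, here is a minimal simulation of the cache-policy idea, assuming the semantics the fix relies on: the context consults the policy per key and skips caching when it returns False. `Context` and `store_result` here are illustrative names for this sketch, not ndb internals.

```python
class Context:
    """Toy model of a context-local cache guarded by a cache policy."""

    def __init__(self, cache_policy=None):
        self.cache = {}
        # The default policy caches everything, mimicking the leaky behavior.
        self.cache_policy = cache_policy or (lambda key: True)

    def store_result(self, key, entity):
        # Only keep the entity if the policy allows caching this key.
        if self.cache_policy(key):
            self.cache[key] = entity
        return entity

ctx_default = Context()
ctx_nocache = Context(cache_policy=lambda key: False)
for i in range(1000):
    ctx_default.store_result(i, object())
    ctx_nocache.store_result(i, object())
print(len(ctx_default.cache), len(ctx_nocache.cache))  # 1000 0
```

With a never-cache policy, the cache stays empty no matter how many pages are fetched, which is why memory plateaus instead of growing linearly.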
Giving some more thought to this cache management issue, it seems to me that taking the … The local cache as it is today is just a dict object, to which entities are added on … If it is a good idea, I could give it a try.
Hi @rwhogg, thank you for your response and your work on this issue. Just to let you know that your suggested fix:

```python
with ndb.Client().context(cache_policy=lambda l: False):
```

does the trick for me! The maximum memory usage reached is 72 MB, which is expected in our case.
We were recently working on large-scale migration batches (on ~550,000 entities) using NDB on App Engine. As we are limited to 2048 MB of RAM, memory consumption is a big subject when doing this kind of thing.
When trying to lower it, I noticed an unexpected (in my opinion) behaviour of fetch_page. I would have expected the memory consumption to approach a plateau, as the new entities weigh approximately the same as the previous ones. Instead, I found that it grows linearly until the loop ends, even without processing the results at all.
Environment details
Code example
Output
I truncated the output, as it loops ~400 times. Even using the best App Engine servers (B8), we would not be able to finish that script.