Astroid is leaking memory #1780
I did some investigation - I profiled memory using muppy:

from pylint.lint import Run
from pylint.lint import pylinter
from pympler import muppy
from pympler import summary

Run(["pylint/checkers/variables.py"], exit=False)
pylinter.MANAGER.clear_cache()
all_objects = muppy.get_objects()
sum1 = summary.summarize(all_objects)
summary.print_(sum1)

for _ in range(15):
    Run(["pylint/checkers/variables.py"], exit=False)
    pylinter.MANAGER.clear_cache()
    all_objects = muppy.get_objects()
    sum1 = summary.summarize(all_objects)
    summary.print_(sum1)

I got the following result:
Clearly, the majority of the allocated data is dictionaries and lists... |
It's hard to know what is inside the dict/list without being able to explore the data. But it's probably parts of the AST that are not cleaned up properly, right? |
I am able to get to that information. If you run the following script:

from pylint.lint import Run
from pylint.lint import pylinter
from pympler import muppy
from pympler import summary
from pympler import asizeof

Run(["pylint/checkers/variables.py"], exit=False)
pylinter.MANAGER.clear_cache()
all_objects = muppy.get_objects()
sum1 = summary.summarize(all_objects)
summary.print_(sum1)

obj = []
for o in all_objects:
    if not isinstance(o, dict):
        continue
    try:
        # some dictionary items raise an exception when accessed
        _ = str(o)
    except Exception:
        continue
    obj.append(o)

Please note that the cache was cleared, so in theory the memory should not contain any extra astroid information about the AST. You get a lot of dictionaries like the following:

>>> pprint(obj[-5:])
[{'col_offset': 29,
'end_col_offset': 33,
'end_lineno': 1674,
'fromlineno': 1674,
'lineno': 1674,
'name': 'node',
'parent': <Attribute.name l.1674 at 0x1125b3490>,
'position': None},
{'col_offset': 37,
'end_col_offset': 42,
'end_lineno': 1436,
'fromlineno': 1436,
'lineno': 1436,
'name': 'nodes',
'parent': <Attribute.FunctionDef l.1436 at 0x112589fd0>,
'position': None},
{'col_offset': 46,
'end_col_offset': 59,
'end_lineno': 1876,
'fromlineno': 1876,
'lineno': 1876,
'name': 'default_names',
'parent': <Call l.1876 at 0x111e92fd0>,
'position': None},
{'col_offset': 64,
'end_col_offset': 68,
'end_lineno': 2105,
'fromlineno': 2105,
'lineno': 2105,
'name': 'node',
'parent': <Attribute.node_ancestors l.2105 at 0x11260cfa0>,
'position': None},
{'col_offset': 31,
'end_col_offset': 39,
'end_lineno': 2105,
'fromlineno': 2105,
'lineno': 2105,
'name': 'ref_node',
'parent': <Attribute.parent l.2105 at 0x11260ce80>,
'position': None}]

>>> len(obj)
216564
>>> len([i for i in obj if 'position' in i])
160789
>>> len([i for i in obj if 'parent' in i])
161191

Could you help me understand where in the code base these dictionaries can be created? |
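For what it's worth, one way to narrow this down without digging through astroid internals is to ask the garbage collector what holds those dictionaries. This is a hedged sketch using only the standard library, built on the obj list from the script above; my guess that these are the __dict__ of node instances is just that, a guess:

import gc
from collections import Counter

# Count the types of whatever refers to a sample of the suspicious dicts.
# If they are instance __dict__ objects of astroid nodes, the owning node
# classes should show up prominently among the referrers.
referrer_types = Counter()
for d in obj[-50:]:
    for ref in gc.get_referrers(d):
        referrer_types[type(ref).__name__] += 1
print(referrer_types.most_common(10))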
It does look like the AST, so probably |
Those dictionaries are the arguments that get passed to

I tried to do some digging, but I really have no experience here. The only thing I could come up with is that something is sustaining a reference to a number of

import gc
from io import StringIO
from pylint.lint import Run, pylinter
from pylint.reporters.text import TextReporter
from pympler import muppy, summary
from astroid import nodes

prev_nodes = 0
prev_dicts = 0
for _ in range(5):
    count_nodes = 0
    count_dicts = 0
    Run(
        ["astroid/nodes/node_classes.py"], exit=False, reporter=TextReporter(StringIO())
    )
    pylinter.MANAGER.clear_cache()
    gc.collect()
    all_objects = muppy.get_objects()
    for o in muppy.get_objects():
        if isinstance(o, nodes.NodeNG):
            count_nodes += 1
        elif isinstance(o, dict):
            count_dicts += 1
    print()
    print("Nodes INC:", count_nodes - prev_nodes)
    prev_nodes = count_nodes
    print("Nodes NEW:", prev_nodes)
    print("Dicts INC:", count_dicts - prev_dicts)
    prev_dicts = count_dicts
    print("Dicts NEW:", prev_dicts)

This shows that the increase in both node and dict objects is constant after initial startup.

❯ python test2.py
Nodes INC: 137841
Nodes NEW: 137841
Dicts INC: 68924
Dicts NEW: 68924
Nodes INC: 112234
Nodes NEW: 250075
Dicts INC: 51192
Dicts NEW: 120116
Nodes INC: 112234
Nodes NEW: 362309
Dicts INC: 51191
Dicts NEW: 171307
Nodes INC: 112234
Nodes NEW: 474543
Dicts INC: 51191
Dicts NEW: 222498
Nodes INC: 112234
Nodes NEW: 586777
Dicts INC: 51191
Dicts NEW: 273689

As far as my understanding of garbage collection goes, this shows that the persistence of these objects does not come from garbage-collection randomness but from actual references to these objects where they probably shouldn't be referenced. My first thought would be to make |
I think that's exactly what's going on. The child nodes are still holding references to the module in |
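To make the reference-chain point concrete with toy classes (deliberately minimal stand-ins, not astroid's actual node types): as long as anything keeps a single node alive, bidirectional parent/child links make the whole tree reachable, so none of it can be collected.

import gc
import weakref

class Node:
    def __init__(self, parent=None):
        self.parent = parent            # strong reference up the tree
        self.children = []              # strong references down the tree
        if parent is not None:
            parent.children.append(self)

module = Node()
leaf = Node(Node(module))               # module -> intermediate -> leaf

module_alive = weakref.ref(module)
del module
gc.collect()
print(module_alive() is not None)       # True: `leaf` still reaches the module via .parent
del leaf
gc.collect()
print(module_alive() is None)           # True: the cycle is collected only once nothing external refers to it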
That would be a massive breaking change though... so I'm not sure it is feasible. I would definitely want to help explore it, though. |
Yep, 3.0 or nothing! ;) |
I think I ran into some issues with the |
One track to explore: we could make |
Sounds like it is worth exploring. Is that something you would want to build? |
Yep, I'm already seeing some promising results. Looking to do a sequence of small PRs. |
Here's a version of Daniel's script from #1780 (comment) that takes care to delete the references created in the measurements:

import gc
from pylint.lint import Run
from pympler import muppy
from astroid import nodes

prev_nodes = 0
prev_dicts = 0
for _ in range(5):
    count_nodes = 0
    count_dicts = 0
    Run(
        ["astroid/nodes/node_classes.py", '--clear-cache-post-run=y'], exit=False
    )
    gc.collect()
    all_objects = muppy.get_objects()
    for o in muppy.get_objects():
        if isinstance(o, nodes.NodeNG):
            count_nodes += 1
        elif isinstance(o, dict):
            count_dicts += 1
    print()
    print("Nodes INC:", count_nodes - prev_nodes)
    prev_nodes = count_nodes
    print("Nodes NEW:", prev_nodes)
    print("Dicts INC:", count_dicts - prev_dicts)
    prev_dicts = count_dicts
    print("Dicts NEW:", prev_dicts)
    del all_objects
    del o

|
Looks like we've solved the node leak; just dicts are left:

$ python3 memory.py
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Nodes INC: 102907
Nodes NEW: 102907
Dicts INC: 65091
Dicts NEW: 65091
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Nodes INC: 0
Nodes NEW: 102907
Dicts INC: 11472
Dicts NEW: 76563
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Nodes INC: 0
Nodes NEW: 102907
Dicts INC: 11460
Dicts NEW: 88023
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Nodes INC: 0
Nodes NEW: 102907
Dicts INC: 11460
Dicts NEW: 99483
--------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Nodes INC: 0
Nodes NEW: 102907
Dicts INC: 11460
Dicts NEW: 110943 |
The remaining uses seem to derive from the custom caching decorator in astroid. Looking into converting those uses to just |
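For context on why such a decorator can retain memory: any module-level cache keeps whatever it returns referenced for the lifetime of the process unless something clears it. An illustrative sketch, not astroid's actual decorator; functools.lru_cache appears here only as the standard-library example of a cache that can be cleared:

import functools

class Node:
    """Stand-in for an AST node."""

# A hand-rolled cache like this keeps every cached object alive forever
# unless some code remembers to empty _cache.
_cache = {}
def cached_build(name):
    if name not in _cache:
        _cache[name] = Node()
    return _cache[name]

# functools.lru_cache retains references in the same way, but it exposes
# cache_clear() (and an optional maxsize bound) so callers can drop them.
@functools.lru_cache(maxsize=None)
def cached_build_lru(name):
    return Node()

cached_build_lru("module_a")
cached_build_lru.cache_clear()   # releases the strong references held by the cache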
If we want to close this out, here are some next steps for 3.0. The two remaining offenders are Unfortunately, there's some hacking around So, I suggest:
|
We don't need to deprecate |
Oh, right, and |
Astroid is leaking memory when run multiple times over the same file using the API
Steps to reproduce
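A minimal loop along the lines of the profiling scripts above reproduces the growth (the target file is only an example):

from pylint.lint import Run
from pylint.lint import pylinter

# Lint the same file repeatedly through the API; memory grows on every
# iteration even though the astroid cache is cleared between runs.
for _ in range(20):
    Run(["pylint/checkers/variables.py"], exit=False)
    pylinter.MANAGER.clear_cache()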
Current behavior
After ~12 iterations, memory usage goes over 1 GB and keeps growing
Expected behavior
Allocated memory should not rise indefinitely
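One way to check that expectation mechanically (a sketch using the standard library's tracemalloc; the target file and iteration count are arbitrary):

import tracemalloc
from pylint.lint import Run
from pylint.lint import pylinter

tracemalloc.start()
Run(["pylint/checkers/variables.py"], exit=False)     # warm-up run
pylinter.MANAGER.clear_cache()
baseline, _ = tracemalloc.get_traced_memory()

for _ in range(5):
    Run(["pylint/checkers/variables.py"], exit=False)
    pylinter.MANAGER.clear_cache()

current, _ = tracemalloc.get_traced_memory()
# Without a leak, traced memory after the extra runs should stay close to
# the baseline instead of growing by a roughly constant amount per run.
print(f"baseline={baseline / 1e6:.1f} MB, after 5 more runs={current / 1e6:.1f} MB")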
python -c "from astroid import __pkginfo__; print(__pkginfo__.version)" output

2.13.0-dev0