Garbage collection issue with GRNBoost2 #15

Open · cflerin opened this issue Jun 24, 2019 · 2 comments

cflerin (Contributor) commented Jun 24, 2019

I'm running Arboreto's implementation of GRNBoost2 via the pySCENIC command line, but I figured this issue probably belongs here. I get the following warning, which repeats a number of times throughout the run. GRNBoost2 does seem to complete successfully most of the time, but it would be nice to avoid the performance hit this warning seems to imply. Any ideas on how to solve this? I've noticed that it may occur more often with larger expression matrices (10,000 cells, 20,000 genes). I'm using dask v1.0.0, if that helps.

distributed.utils_perf - WARNING - full garbage collections took 10% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 11% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 12% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 12% CPU time recently (threshold: 10%)
distributed.utils_perf - WARNING - full garbage collections took 12% CPU time recently (threshold: 10%)
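For reference, these messages come from the distributed.utils_perf logger, so when GRNBoost2 is driven from Python rather than the command line, the warning itself can be muted by raising that logger's level (a minimal sketch; this only hides the message and does not reduce the garbage-collection overhead itself):

import logging

# Mute the GC-timing warnings emitted by dask.distributed's performance
# diagnostics; the underlying GC pressure is unaffected.
logging.getLogger("distributed.utils_perf").setLevel(logging.ERROR)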

Thanks for any help you can provide.
Chris

rjb67 commented Jul 1, 2019

I have encountered this issue as well.

cflerin (Contributor, Author) commented Jul 10, 2019

Just to follow up on this, I've tried experimenting with the repartition_multiplier in the create_graph function. It appears that the dask graph is currently partitioned into one partition per client core:
https://github.com/tmoerman/arboreto/blob/3ff7b6f987b32e5774771751dea646fa6feaaa52/arboreto/core.py#L442-L444

This is my dask graph after repartitioning:

>>> graph
Dask DataFrame Structure:
                    TF  target importance
npartitions=20
                object  object    float64
                   ...     ...        ...
...                ...     ...        ...
                   ...     ...        ...
                   ...     ...        ...
Dask Name: repartition, 80874 tasks

I then tried a bunch of partition settings from no repartitioning at all up to a few thousand, but no matter the setting I would always get these garbage collection warnings. Perhaps it has something to do with the number of tasks instead?
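For anyone who wants to poke at this outside of Arboreto, the repartitioning step itself is plain dask; the sketch below uses toy data and arbitrary partition counts (pdf, graph, and the counts are made up for illustration) to show how to repartition a links-style dataframe and inspect the resulting partition and task counts:

import pandas as pd
import dask.dataframe as dd

# Toy stand-in for the TF / target / importance links frame.
pdf = pd.DataFrame({"TF": ["g%d" % (i % 50) for i in range(1000)],
                    "target": ["t%d" % i for i in range(1000)],
                    "importance": [float(i) for i in range(1000)]})

graph = dd.from_pandas(pdf, npartitions=100)

# Coarsen to e.g. 20 partitions (matching npartitions=20 above); any other
# value can be tried the same way.
graph = graph.repartition(npartitions=20)

print(graph.npartitions)   # partitions after repartitioning
print(len(graph.dask))     # number of tasks in the underlying graph

If the warnings persist regardless of npartitions, comparing len(graph.dask) across settings would at least show whether the task count is actually changing.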
