Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix #1924] Use a cache to speed things up #2153

Merged
merged 1 commit into from
Sep 1, 2015

Conversation

jonas054
Copy link
Collaborator

I've been working on this for a while, and feel ready to make a pull request. The performance of RuboCop has been discussed lately, so I guess this would be welcomed by some users.

Cache invalidation is generally a hard problem, but I think this solution should work (knock on wood). It uses the source code of the rubocop executable, a selection of the passed options, the inspected file, and its configuration as basis.

Add --cache option and AlwaysRunWithCache configuration parameter (false by default). Store a cache of inspection results at /tmp/rubocop_cache/ (or whatever the temporary directory is). Clean up when it gets too big (governed by the configuration parameter MaxFilesInCache).

Performace

There's a slight performance penalty for storing cached results while inspecting files, around 2%, which is why I chose to only store in the cache if the option is given. The execution when there are cached results to reuse is typically between 10 and 30 times faster.

@bbatsov
Copy link
Collaborator

bbatsov commented Aug 18, 2015

Great feature! I had totally forgot about this discussion. A see a few small issues right now:

  1. If it's not enabled by default - few people will likely use it
  2. It's still won't help people running RuboCop on CIs (as often they use one-off VMs)
  3. This should be extensively documented in the README

@@ -57,6 +57,9 @@ def validate_compatibility
(@options[:only] & %w(Lint/UnneededDisable UnneededDisable)).any?
fail ArgumentError, 'Lint/UnneededDisable can not be used with --only.'
end
if @options.key?(:cache) && @options.key?(:auto_correct)
fail ArgumentError, '--cache can not be used with --auto-correct.'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so? Can't we just invalidate the things we auto-correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, probably. Either way, it makes no sense to have this check as caching could also be turned on in configuration.

@jonas054
Copy link
Collaborator Author

Good feedback. I'll make it on by default, but then I also need to add a --no-cache option. And update README. It's unfortunate that CI runs typically won't benefit. That means we also need come up with some good old-fashioned optimizations. But that's another topic.

@jonas054
Copy link
Collaborator Author

Working on a solution that addresses the issues raised here. I have also identified a couple of more problems.

  • Running without -D, saving to the cache, and then with -D reading from the cache, you get no cop names in the output. I'm sure the problem is more general, i.e., options are not handled correctly in cache invalidation.
  • The configuration parameter AllCops: MaxFilesInCache is global, so if one inspected project sets it really low, it causes cached results to be erased for other projects. One solution could the to store the cache under the current directory, in .rubocop_cache/ instead of under /tmp/rubocop_cache/.

@jonas054
Copy link
Collaborator Author

Pushed some fixes.


Large projects containing hundreds or even thousands of files can take
a really long time to inspect, but RuboCop has functionality to
mitigate this problem. A result caching mechanism is activated if the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the term "result" is kind of confusing/unclear. I'd suggest elaborating on this.

@jonas054
Copy link
Collaborator Author

Pushed some new updates. Please check again.

@bbatsov
Copy link
Collaborator

bbatsov commented Aug 31, 2015

Looks fine to me.

@bbatsov
Copy link
Collaborator

bbatsov commented Aug 31, 2015

Btw, one idea for optimization for dealing with config changes or CLI option changes would be to also cache the ASTs, so we can reuse them directly in such scenarios.

Add `--cache` option and `UseCache` configuration
parameter. Store a cache of inspection results at
`/tmp/rubocop_cache/`. Clean up when it gets too
big (governed by the configuration parameter
`MaxFilesInCache`).
@jonas054
Copy link
Collaborator Author

jonas054 commented Sep 1, 2015

I haven't tested caching the ASTs, so I can't say for sure, but I suspect that it won't do much for performance.

So if we push that aside, maybe we're ready to merge?

bbatsov added a commit that referenced this pull request Sep 1, 2015
[Fix #1924] Use a cache to speed things up
@bbatsov bbatsov merged commit 62d92fd into rubocop:master Sep 1, 2015
@bbatsov
Copy link
Collaborator

bbatsov commented Sep 1, 2015

Fair enough. Let's merge this!

@wli
Copy link
Contributor

wli commented Sep 1, 2015

👍 Can't wait to use this!

@jonas054 jonas054 deleted the cache_option branch September 1, 2015 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants