-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
techStackGeneric: should be faster #249
Comments
This was referenced Mar 20, 2024
ee7
added a commit
that referenced
this issue
Mar 21, 2024
The `getProcNames` procedure was doing some strange things: - It was trying to read files at `/proc/foo/status`, where `foo` is the name of a file (not directory) in `/proc`. - It scanned each file at `/proc/[pid]/status` once per digit of its pid. This wasn't producing any incorrect results because: - It caught and ignored the exceptions from opening nonexistent files. - The names are added to a `HashSet[string]`, which deduplicates them. It was also doing some further unnecessary work: it kept scanning lines of `/proc/[pid]/status` even after it found the `Name` value (which is specified to be on the first line). This commit resolves those issues. Refs: #249
ee7
added a commit
that referenced
this issue
Mar 28, 2024
A significant fraction of the tech stack detection execution time was often from scanning files in any `.git` directory. Prevent that. Clearly, the speedup here depends on the relative size of the `.git` directory versus the rest of the data. But at least on one machine, this commit is a 2x speedup for running in the chalk repo: chalk insert --use-tech-stack-detection ./foo We should eventually improve the directory walking here, and consider ignoring other directories and files, but let's just do this for now. Refs: #249
Closing in favor of #328. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Description
The techStackGeneric plugin is currently a bit too slow. It'd be good to speed it up. For example:
getProcNames
(refactor(techStackGeneric): improve getProcNames #250).git
(refactor(techStackGeneric): ignore .git directories #253)/dev/random
(38b75eb)scanDirectory
reading each file multiple timesstd/re
to e.g.nitely/nim-regex
. That's a pure Nim regex library which can be faster than PCRE for some types of regular expressions. It claims:For now, I don't think we'd need to parallelize it.
For a release build of chalk, profiling
chalk insert --use-tech-stack-detection foo
in a tiny repo with callgrind:techStackGeneric
scanFile
std/streams
readLine
fd_cache
acquireFileStream
std/re
find
Dependencies
None.
Subtickets
None.
The text was updated successfully, but these errors were encountered: