Compiled regex exceeds size limit of 10485760 bytes. #362
Comments
You can't, as of now, without re-compiling and hard-coding a new limit. I would be interested to hear how well 8,000 entries performs. As of now, it is fed into a single regex as a single alternation. If they are all plain text strings and not patterns, then it might be better to use Aho-Corasick instead (which ripgrep should do, it just doesn't yet).
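For the all-literal case, here is a minimal sketch of that Aho-Corasick approach using the `aho-corasick` crate (1.x API); the word list and haystack are invented for illustration:

```rust
use aho_corasick::AhoCorasick;

fn main() {
    // Stand-ins for the kind of word list that would be fed to `rg -f`.
    let patterns = ["apple", "maple", "snapple"];
    let haystack = "Nobody likes maple in their apple flavored Snapple.";

    // One automaton over all patterns: build time grows with the total
    // pattern size, and no regex alternation (or its size limit) is involved.
    let ac = AhoCorasick::new(patterns).expect("automaton construction failed");

    // Report every occurrence of every pattern in a single pass.
    for m in ac.find_iter(haystack) {
        println!(
            "pattern {:?} at bytes {}..{}",
            patterns[m.pattern().as_usize()],
            m.start(),
            m.end()
        );
    }
}
```

Because construction and search both scale with total input size rather than with the compiled regex program, this approach has no equivalent of the 10485760-byte ceiling to hit. (Later ripgrep releases did gain this optimization for all-literal pattern files.)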
If you were so inclined, the limit is here: https://github.com/BurntSushi/ripgrep/blob/master/grep/src/search.rs#L67
FYI, you can't "remove" the limit (but you can set it arbitrarily high).
Thanks, I have run it with a 640-line file (11K) without problems, and it is very fast. The lines of the file look like these:
I am fairly new to coding, so excuse me, but I need a "for dummies" how-to for setting the limit very high. Basically, what should I do, and where?
The limit probably should be exposed as a flag so that it's a knob you can turn easily. That feature is not currently available, so the only way to do it is to change ripgrep's source code and recompile it. Briefly:
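Assuming the repository layout of the time (the file linked above; paths may have shifted since), the outline is:

1. Clone the source: `git clone https://github.com/BurntSushi/ripgrep && cd ripgrep`
2. Raise the `size_limit` value in `grep/src/search.rs` (the line linked above).
3. Rebuild with `cargo build --release`; the new binary is written to `./target/release/rg`.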
In order for the above to work, you will need to install Rust. See: https://rustup.rs/
OK, solved, thanks. I changed the "size_limit" from 10 to 1000 (a sketch of the change is below):
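The struct and field layout here are assumptions for illustration, not the actual contents of search.rs at the time; only the 10 MiB default (10485760 bytes) comes from the thread:

```rust
// Hypothetical reconstruction of the defaults around grep/src/search.rs#L67.
const MB: usize = 1 << 20;

struct Options {
    size_limit: usize,     // cap on the compiled regex (the 10485760-byte error)
    dfa_size_limit: usize, // cap on the lazy DFA's cache (see the next comment)
}

impl Default for Options {
    fn default() -> Options {
        Options {
            size_limit: 1000 * MB, // was: 10 * MB, i.e. 10485760 bytes
            dfa_size_limit: 10 * MB,
        }
    }
}

fn main() {
    let opts = Options::default();
    println!("regex limit: {} bytes", opts.size_limit);
    println!("dfa limit:   {} bytes", opts.dfa_size_limit);
}
```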
Cheers
Can you try increasing the DFA limit as well? It might speed things up too.
I'm going to re-open this because I think this limit should be configurable without re-compiling ripgrep.
Nice! Down to seconds now!
Wow. My guess is that you were previously exhausting the cache space of the DFA, so it probably did a lot of thrashing or dropped down to one of the (much) slower NFA engines.
Just a comment (I have not tried resetting the limit, as detailed above): I ran into this exact issue today, comparing a ~15K-line list (individual words on separate lines) to another file (~272K lines; ditto). Large, I know, but grep trounced that task, whereas ripgrep failed:
Arch Linux x86_64; 32MB RAM + swap + tmpfs; ripgrep v0.7.1; grep (GNU grep) v3.1
@victoriastuart Increase the size limit using --regex-size-limit.
Hi Andrew; thank you for the project/code and the comment -- appreciated. Love ripgrep (very fast, generally)! :-D Some observations (just an FYI; I'm happy with using grep here):
You need to increase --dfa-size-limit too.
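For reference, on ripgrep versions that have these flags, both caps can be raised in one go with something like `rg --regex-size-limit 1G --dfa-size-limit 1G -w -f query_file target_file`; the flags accept K, M, and G size suffixes.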
Noting #497:
@victoriastuart thanks! If possible, can you share the data you are searching, and your regex queries as well? Or tell me how to get it? Is it reproducible with publicly available data?
Hi ... it's my own data (private), but simply lists of words; e.g.:
formatted for processing (~15K lines, in this particular output):
When trying to do what is described in:

It throws:

When trying to increase the limits as per #362 (comment), using both options, where

Then finally, when I used:

I have 16GB RAM; it just crashed the same way as ug, eating almost all of the RAM. Concluding: all the greps (grep, rg, ug) eat an incredibly insane amount of RAM to gain speed. It would be good if the RAM-hungry algorithms of rg and ug were fixed. I don't mean sacrificing speed or removing the current algorithm; the point is simply to have some alternative option that can finish the task without insane RAM usage, even if it takes more time: something like "slower, but less RAM-hungry, and able to finish the task", or maybe automatically splitting files into smaller chunks, whatever you invent to fix it.
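A low-tech approximation of that chunking idea, for what it's worth (the command is illustrative and the file names are made up): split the pattern file and run rg once per piece, e.g. `split -l 1000 patterns.txt chunk_ && for f in chunk_*; do rg -w -f "$f" target_file; done`. Peak memory then scales with the chunk size rather than the whole pattern list, at the cost of re-reading the target file for every chunk (and a line can be reported more than once if it matches patterns from different chunks).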
@garry-ut99 Please open a new issue and provide an MRE. I can't help you if you can't provide a way for me to reproduce the behavior you're seeing. You don't even share the actual
Strange request, as you didn't say the same to the previous person who posted a similar issue in this thread a year later: #362 (comment). Hence I assumed it was OK with you to post similar issues in the same thread, but it seems you suddenly changed your mind. I prefer to continue in the current thread until you explain why you want me to create a separate thread, given that you didn't ask the previous person with a similar issue to create one.

Another strange statement, as I provided much useful information, including a description of the issue, the amount of RAM I have, example content of both files, their sizes, and the exact command line used to compare the two files. I expected you to at least try to reproduce the issue on your side, using files of similar size and content, with any binary, and then ask me for an "MRE" if you still couldn't reproduce it; instead you jumped straight to requiring an MRE.

Strange, as when no version is provided, the assumption is the latest, which means

Many strange statements from your side.
I'll repeat one last time: please open a new issue with a minimal reproducible example. If you want my help, that's what I require.
Hi, I am trying to use a file with more than 8,000 entries (10-20 letter words, one per line; 132K) and get their corresponding lines in a big file (645,151 lines, 76M). I use:
rg -w -f query_file target_file
I get the error:
Compiled regex exceeds size limit of 10485760 bytes.
How can I configure it to allow rg to run without the limit?