support searching across multiple lines #176
Comments
Correct. Not even the
The former only works because search is line oriented. A multiline regex can technically match, say, 2GB of data, which is completely incompatible with searching small chunks at a time. The latter could be made to work with multiline search, but memory maps can't be used to search stdin, for example. So a multiline search on stdin would have to block and read all of stdin into memory before searching. (There exists a way around even this, but it requires changing the regex engine to be capable of incremental search, which is an even bigger change, but theoretically possible.) Multiline searching therefore comes with significant implementation complexity, and IMO is a pretty niche use case. I can also imagine it having a pretty big impact on the printing code. This fact alone is a good reason why it may never be in This is a good example of a feature that The Silver Searcher has that |
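To make the line-oriented point concrete, here is a minimal sketch in Python rather than ripgrep's own code. It assumes Python's `re` module behaves like the Rust regex engine for this pattern, and the sample text is hypothetical. A line-oriented search behaves as if each line were searched on its own, so a match that needs to cross a newline is never seen, while the same regex applied to the complete text finds it.

```python
import re

# Hypothetical sample input; the pattern is the one from the original question.
TEXT = "listeners: {\n  click: handler\n}\n"
PATTERN = re.compile(r"(?s)listeners.+click")


def search_line_by_line(text):
    """Apply the regex to each line separately, as a line-oriented search would."""
    hits = []
    for line in text.splitlines(keepends=True):
        match = PATTERN.search(line)
        if match:
            hits.append(match.group(0))
    return hits


def search_whole_buffer(text):
    """Apply the regex to the complete text, which is what multiline search needs."""
    return [match.group(0) for match in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(search_line_by_line(TEXT))
    print(search_whole_buffer(TEXT))
```

The first call returns an empty list because no single line contains both words; the second returns one match that spans the newline.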
Gotcha, thanks for the explanation. I really like ripgrep as a tool, just was hoping to use it for this case too 😉 . |
@joshglendenning Yeah, I admit, it would be nice, and if it were easy, I'd have no problems with it. While I do consider it niche, I have no doubts that it would be quite useful! Once I split out most of the pieces of |
This is a really cool tool, but I might suggest including this as a caveat in the README, alongside the comparisons to |
@maxbrunsfeld I've been meaning to add an "anti pitch" section to the README like the one in my blog post. That's now done. Thanks for the reminder! |
I'm going to re-open this, because it's one of the most highly requested features. Nothing has changed about the problems I outlined above. However, multiline search needn't be the default. If we provide it as a flag, then we can do what we need to do to support multiline search only when that flag is provided. The critical thing that multiline search needs is a complete sequence of bytes in memory to search. Memory maps can provide this, but failing that, we would need to read the entire file into memory before starting a search. Other than using heap space proportional to the file being searched, the fundamental issue with this flag is when it's used in conjunction with searching stdin. Namely, ripgrep will need to block until EOF is read on stdin before a search can even start. Alternatively, multiline search simply wouldn't be allowed on stdin. The silver searcher will in fact do this silently when searching stdin:
I don't like the "silent" idea, but stopping ripgrep with an error is certainly something I'd be open to. Neither seems like a good choice to me, but I don't think it should block this feature altogether. N.B. This is a significant feature and it would have to be part of the libripgrep effort. |
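The requirement described above (a complete sequence of bytes in memory, from a memory map where possible and a full read otherwise) can be sketched as follows. This is an illustration in Python, not how ripgrep implements it; `load_haystack`, `multiline_spans`, and the use of `-` to mean stdin are hypothetical names and conventions chosen for the example.

```python
import mmap
import re
import sys


def load_haystack(path):
    """Return the complete contents of `path` as one bytes-like object.

    Hypothetical helper: "-" is treated as stdin for this sketch only.
    """
    if path == "-":
        # Blocks until EOF: nothing can be searched before all input has arrived.
        return sys.stdin.buffer.read()
    with open(path, "rb") as f:
        try:
            # Let the OS page the file in on demand instead of copying it to the heap.
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except (ValueError, OSError):
            # Empty files and some virtual files cannot be mapped; fall back to a full read.
            return f.read()


def multiline_spans(path, pattern):
    """Return (start, end) byte offsets of every match of a bytes pattern in `path`."""
    regex = re.compile(pattern, re.DOTALL)
    haystack = load_haystack(path)
    return [match.span() for match in regex.finditer(haystack)]
```

For example, `multiline_spans("app.js", rb"listeners.+click")` would search a hypothetical `app.js` as one buffer. The fallback branch is where heap use becomes proportional to the file size.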
The other thing I forgot to mention is that multiline search will negate inner literal optimizations. Normal prefix and, in special cases, suffix, literal optimizations will still be performed as part of the regex engine. (I've long thought about making inner literal optimizations work on arbitrary strings, but it's hard.) |
A naive question/suggestion. Assuming single lines are being loaded for search now, can that be changed to n lines, with n set to 10 or 20 or something like that? As one line comes in, another drops out, in FIFO fashion? This will not be technically correct for all cases, but may be enough for most cases. |
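The suggestion above can be sketched as a sliding window. This is only an illustration of the idea as stated in the question, written in Python with a hypothetical `window_search` helper; it is not something ripgrep does.

```python
import re
from collections import deque


def window_search(lines, pattern, n):
    """Search a FIFO window of the last `n` lines after each new line arrives.

    Returns the 1-based number of the newest line in every window that matched.
    """
    regex = re.compile(pattern, re.DOTALL)
    window = deque(maxlen=n)  # the oldest line drops out automatically
    hits = []
    for lineno, line in enumerate(lines, start=1):
        window.append(line)
        if regex.search("".join(window)):
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    sample = ["a\n", "listeners: {\n", "  x\n", "  click\n", "}\n"]
    print(window_search(sample, r"listeners.+click", 3))
    print(window_search(sample, r"listeners.+click", 2))
```

With a window of 3 the match is found when line 4 arrives; with a window of 2 it is missed, which is the "not technically correct for all cases" trade-off. A larger window can also report the same match more than once, so real output code would need deduplication.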
How significant are the trade-offs to the user experience? Would you actually need a special flag for ripgrep, or can you reliably determine it from the expression itself? |
Great questions! Keep'em coming.
They are not. If they were, ripgrep would be very slow. The reasons for this are a bit subtle, but basically, "it's faster to search a huge chunk than it is to break it into little pieces and then search each piece." "Huge chunk" in this case might be the size of some internal buffer, perhaps, 8KB. If you're curious about how a fast grep tool works in more detail, check out this section in my blog post on ripgrep: http://blog.burntsushi.net/ripgrep/#anatomy-of-a-grep
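One way to read the "search a huge chunk" claim is sketched below in Python: run the regex over the whole buffer and locate line boundaries only around actual matches, instead of splitting the buffer into lines first. This is a simplified illustration, not ripgrep's implementation; `matching_lines` is a hypothetical name, and the sketch assumes the pattern cannot match a newline.

```python
import re


def matching_lines(buf, pattern):
    """Return each line of `buf` (bytes) that contains a match of `pattern` (bytes).

    Assumes `pattern` never matches b"\n", so every match lies inside one line.
    """
    regex = re.compile(pattern)
    lines = []
    pos = 0
    while True:
        match = regex.search(buf, pos)
        if match is None:
            break
        # Expand the match outward to the enclosing line.
        start = buf.rfind(b"\n", 0, match.start()) + 1
        end = buf.find(b"\n", match.end())
        if end == -1:
            end = len(buf)
        lines.append(buf[start:end])
        # Resume after this line so a line is reported at most once.
        pos = end + 1
    return lines


if __name__ == "__main__":
    print(matching_lines(b"foo\nbar baz\nqux bar\n", rb"bar"))
```

Lines without a match are never split out or inspected individually, which is where the saving comes from.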
If you have a regex like
I still actually strongly believe that multiline search is a very niche feature, but it is one that can be quite useful when the situation calls for it. (A text editor is perhaps one such situation, but ripgrep is first and foremost a command line tool where multiline search feels a lot less common.) Therefore, taking approach (3) doesn't seem worth it. In the common case, memory maps will work just fine and your OS will manage the memory for you. It's only the corner cases that are sub-optimal: when memory maps can't be used (e.g., on virtual files or stdin).
If
A flag is 100% necessary. A regex like |
I would agree use is actually niche, but desire to use is not.
I would say if you are searching for 2 terms and completeness is important, then using multiline would often be your default. However, writing an expression to find termA followed by termB within 5 or fewer lines is likely not something that rolls off the fingertips of someone who only occasionally uses regular expressions, although I think many would find it useful, and would use such expressions if they were more intuitive to write. |
@dakaraphi Good points. I'd like to use your comment to constrain this feature, namely, that multiline search is the ability to apply a regex whose matches may span an arbitrary number of lines. With that said:
I think (2) is something that's enabled by multiline search, although, today, you can do something similar with contexts: With all that said, we must be careful not to get too far away from what ripgrep is supposed to be good at doing: searching lines. :-) I say this because there has to be a point at which "write code for your specialized search" becomes a valid thing to say. The key is figuring out where that point is. |
Just to make sure I understand the intention, could you restate that in terms of what ripgrep would not do that other regex engines possibly do when searching across multiple lines? |
@dakaraphi Sorry, the intention of me saying that was to push UX concerns like "how do I find co-occurring terms, A and B, within a fixed number of lines" out of multiline support. i.e., I don't think that particular UX should be addressed as part of standard multiline support, but should instead be considered as a separate feature (that may or may not happen). :-) I don't think there's anything ripgrep would do differently in terms of UX with respect to the silver searcher, other than 1) not doing it by default and 2) probably not doing silent things. |
Are there other tools that support multiline search other than the silver searcher? |
I'm not sure about command line tools. Prior to using VS Code I was using Brackets which supported multiline file search. I believe other editors like Sublime, Notepad++ etc also support multiline. |
OK, right. Yes, I'm not sure if that really should be part of something like ripgrep or not. For example, I've been thinking about maybe writing an extension for VS Code, like a regex helper, that would take common patterns or templates, let you plug in the values for such use cases, and generate the regex. |
@dakaraphi Great! I think we're on the same page now. :-) Thanks for poking! |
After [this comment](BurntSushi#176 (comment)) it seems like the statement about never supporting multiline search should be removed.
@dakaraphi directed me here from Microsoft/vscode #13155 It looks like one of the most common requests for searching across multiple lines is related to text editors. At the moment, my needs are very simple. If I can get a match across multiple files in a project for a multiline selection -- even if it's fully literal -- I could work with it. For most text editors, the menu option to search across multiple lines is separate from a simple search, and so a ripgrep flag, as @BurntSushi suggested, would naturally fit this use case. I'm still making it through @BurntSushi's anatomy of a grep link, but it appears to me that a multiline search for text editors mostly requires a literal search with some multiple literals (white space, line endings) and therefore the search won't even make it to the regex engine for these cases. Isn't the multiple line selection just a contiguous sequence of bytes (in the fully literal case) to be matched in a buffer? Or am I missing something related to optimisation here? I'm sure people will come up with cases where a regex in a multiline search/replace would be mighty handy, but I think support for the simpler multiple literal multiline case would be a good start to give some text editors (such as vscode and atom) missing functionality. btw, a most excellent ripgrep article @BurntSushi! |
Multi-line searching would be a boon to many. See for example this use case. |
Some nits:
Should be
appears, but perhaps this section could be worded differently.
|
@BurntSushi I'm glad you agree with the suggestions! The reworded sentence is indeed much clearer, after fixing the typo pointed out by @mateon1. Here's the diff of that sentence, for future reference/convenience:
-That is, even if you use the --multiline flag but your regex cannot
-match over multiple lines, then ripgrep won't consume unnecessary resources.
+Specifically, if the --multiline flag is provided but the regex
+cannot match over multiple lines, then ripgrep won't read each file into memory
+before searching it.
Now that I re-read that, I'm not sure "cannot match" is the best choice of words, since it can imply either a neutral statement or an imperative enforcement. (Not sure I'm being clear myself; let me know if I should rephrase!) I suppose you're referring to the case where the regex does not contain any patterns that would match newlines, or it contains |
@mateon1 Thanks! I took your advice, and chose "laid out."
Yes. Whether dotall is enabled or not is mostly orthogonal; what matters is whether a It is possible I should just remove this part of the docs. I'm not sure. I put it there as a way of saying that even if you enable multiline mode but don't make use of it, you generally won't pay (much) for it. But maybe that's not that important. |
I think it wouldn't be a problem if it were removed, but it is useful information so I'd have a slight preference to keep it. IMO changing that sentence to something like this:
...would make it sufficiently unambiguous. |
@waldyrious I like it. Much better. Thanks! :) |
This commit updates the CHANGELOG to reflect all the work done to make libripgrep a reality.
* Closes #162 (libripgrep)
* Closes #176 (multiline search)
* Closes #188 (opt-in PCRE2 support)
* Closes #244 (JSON output)
* Closes #416 (Windows CRLF support)
* Closes #917 (trim prefix whitespace)
* Closes #993 (add --null-data flag)
* Closes #997 (--passthru works with --replace)
* Fixes #2 (memory maps and context handling work)
* Fixes #200 (ripgrep stops when pipe is closed)
* Fixes #389 (more intuitive `-w/--word-regexp`)
* Fixes #643 (detection of stdin on Windows is better)
* Fixes #441, Fixes #690, Fixes #980 (empty matching lines are weird)
* Fixes #764 (coalesce color escapes)
* Fixes #922 (memory maps failing is no big deal)
* Fixes #937 (color escapes no longer used for empty matches)
* Fixes #940 (--passthru does not impact exit status)
* Fixes #1013 (show runtime CPU features in --version output)
Will (I do know that both styles exist among regex engines but couldn't tell which is which) Sorry if there is an answer to that somewhere. |
Current master has a I'm not aware of any regex engines that permit a literal |
VS Code matches `\r\n` on `\n` when ctrl+f searching in a single file. It's useful in an editor, but I wouldn't use that as inspiration. |
Oh interesting. Is that something VS Code layers on, or is it part of JS regexes? |
No, it's just something vscode does. |
@BurntSushi Most probably I only encountered it in text editors like VSCode.
@roblourens Would you mind elaborating? |
Personally I don't prefer "magic" like that, but yeah I'll have to see whether we can rewrite |
@roblourens @BurntSushi |
Currently, if you try to make a multiline search without the |
|
Please file a new issue. And please don't use phrases like "does not work" without actually showing what you mean by it. Please fill out the complete bug template. |
You're right, sorry for trying to resurrect and derail an old issue. I did more testing on this and I think I neglected to consider something with my test. It all seems to work as expected. |
Do we have an option to limit the max number of lines each match can have? |
No. And I don't see any obvious way to implement that either. You can usually build such limits into your regex instead. |
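As a rough illustration of building such a limit into the regex itself, the sketch below constructs a pattern that allows at most a fixed number of newlines between two literal terms. It uses Python's `re` module; `within_lines` is a hypothetical helper, and whether the same pattern behaves identically in another regex engine is an assumption to verify.

```python
import re


def within_lines(first, second, max_newlines):
    """Compile a regex matching `first`, then `second`, at most `max_newlines` line breaks later."""
    # Each repetition consumes the rest of one line plus its newline;
    # the trailing [^\n]* covers the text before `second` on its own line.
    gap = r"(?:[^\n]*\n){0,%d}[^\n]*" % max_newlines
    return re.compile(re.escape(first) + gap + re.escape(second))


if __name__ == "__main__":
    regex = within_lines("termA", "termB", 2)
    print(bool(regex.search("termA\nx\ntermB")))
    print(bool(regex.search("termA\n1\n2\n3\ntermB")))
```

The first text has two line breaks between the terms and matches; the second has four and does not. Because the gap never uses `.`, the limit holds whether or not dotall is enabled.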
Can we build a regex to match the following? |
Sure. Will continue the discussion in the Q&A forum. |
Say for example I'm trying to find instances of `click` that reside in a `listeners` block, like so:
According to the Rust regex docs, I should be able to do: `rg '(?s)listeners.+click'`, but this doesn't seem to work. Does ripgrep not support multiline regex?