ripgrep: case insensitive search is faster than case sensitive search #2444
-
Hi, I was playing around with ripgrep and figured out that the case insensitive search is faster than the case sensitive search on a single file with a single regex pattern:
(The file As you can see, the search with the Can someone explain why? Thanks in advance! |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 2 replies
-
Could you share a haystack where this occurs? I can try to make one, but an analysis might really depend on the exact haystack. 3GB is too big to share, but perhaps you can share a script to generate one from |
Beta Was this translation helpful? Give feedback.
-
OK, so this is a somewhat intriguing example. The short answer for why you're seeing this behavior is because ripgrep does pretty sophisticated black magic called "literal optimizations." The reason why it's black magic is because they are basically a big bag of heuristics that usually work really well. In this case, both are pretty fast, but it is indeed somewhat unusual to see a case insensitive search so much faster than the case sensitive version. I'll also note that GNU grep demonstrates a similar phenomenon here:
Although for GNU grep, it is doubly weird because it only happens in the So basically, what it comes down to here is that the case sensitive version of But the case insensitive version of Generally speaking, Teddy is much much slower than But... the problem is that
That means that while
That's only a couple times bigger than the total count for GNU grep is vulnerable to similar problems, because for a single substring search, it always feeds the last byte to
Where as ripgrep is just as fast in both cases:
Because ripgrep uses different algorithms and does try to be robust with these sorts of things. But sometimes it guesses wrong, which leads to cases like this. In fact, you don't even need the
Most regex engines don't have an algorithm like Teddy to deal with multiple substring searches like this, so this isn't really a pit that they can fall into. |
Beta Was this translation helpful? Give feedback.
OK, so this is a somewhat intriguing example. The short answer for why you're seeing this behavior is because ripgrep does pretty sophisticated black magic called "literal optimizations." The reason why it's black magic is because they are basically a big bag of heuristics that usually work really well. In this case, both are pretty fast, but it is indeed somewhat unusual to see a case insensitive search so much faster than the case sensitive version.
I'll also note that GNU grep demonstrates a similar phenomenon here: