Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency in whether trailing \r\n is included in the results json #1095

Closed
roblourens opened this issue Oct 28, 2018 · 3 comments
Closed
Labels
bug A bug.

Comments

@roblourens
Copy link
Contributor

0.10.0, Mac and Win

The --crlf flag has an unexpected side effect, that it changes whether the trailing \r\n is included in the json output. Very minor issue. I'm not sure which way is right.

Searching a \r\n document. When --crlf is not given, the trailing \r\n in "text" is included:

echo 'test\u0d\u0a' | rg --json test

[...]
{"type":"match","data":{"path":{"text":"<stdin>"},"lines":{"text":"test\r\n"},"line_number":1,"absolute_offset":0,"submatches":[{"match":{"text":"test"},"start":0,"end":4}]}}

When --crlf is included, the trailing \r\n in "text" is not included:

echo 'test\u0d\u0a' | rg --json test --crlf

[...]
{"type":"match","data":{"path":{"text":"<stdin>"},"lines":{"text":"test"},"line_number":1,"absolute_offset":0,"submatches":[{"match":{"text":"test"},"start":0,"end":4}]}}

But when searching a \n document, the trailing \n is always included:

echo 'test\u0a' | rg --json test   

[...]
{"type":"match","data":{"path":{"text":"<stdin>"},"lines":{"text":"test\n"},"line_number":1,"absolute_offset":0,"submatches":[{"match":{"text":"test"},"start":0,"end":4}]}}
@BurntSushi
Copy link
Owner

Yeah, I suspect this is a bug. I can definitely reproduce it. Interestingly, if you don't use --json, then you do get the CRLF line ending. This is probably due to the hack imposed by the --crlf flag. I'm not quite sure yet what the best way to fix it is.

@BurntSushi BurntSushi added the bug A bug. label Oct 29, 2018
@roblourens
Copy link
Contributor Author

We actually found a worse instance of this bug. I assume it's the same issue, I can open a new one if you prefer.

Search for \n in a CRLF document

echo 'a\u0d\u0a' | rg --json '\n' --multiline --crlf  

We get two match messages, one with an empty "submatches" array which should be invalid:

{"type":"begin","data":{"path":{"text":"<stdin>"}}}
{"type":"match","data":{"path":{"text":"<stdin>"},"lines":{"text":"a"},"line_number":1,"absolute_offset":0,"submatches":[]}}
{"type":"match","data":{"path":{"text":"<stdin>"},"lines":{"text":"\n"},"line_number":2,"absolute_offset":3,"submatches":[{"match":{"text":"\n"},"start":0,"end":1}]}}
{"type":"end","data":{"path":{"text":"<stdin>"},"binary_offset":null,"stats":{"elapsed":{"secs":0,"nanos":89223,"human":"0.000089s"},"searches":1,"searches_with_match":1,"bytes_searched":4,"bytes_printed":344,"matched_lines":2,"matches":1}}}
{"data":{"elapsed_total":{"human":"0.002074s","nanos":2074308,"secs":0},"stats":{"bytes_printed":344,"bytes_searched":4,"elapsed":{"human":"0.000089s","nanos":89223,"secs":0},"matched_lines":2,"matches":1,"searches":1,"searches_with_match":1}},"type":"summary"}

roblourens added a commit to microsoft/vscode that referenced this issue Oct 31, 2018
@BurntSushi
Copy link
Owner

Thanks for this report. This was a pretty gnarly bug, and it turned out that I tried to hack around it pretty poorly. I think the fix for this should make ripgrep's CRLF hack a bit more robust in general though, so I hope things work better now!

BurntSushi added a commit that referenced this issue Jan 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug.
Projects
None yet
Development

No branches or pull requests

2 participants