Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat files containing invalid byte sequences as non-matches #4310

Merged
merged 1 commit into from
Apr 25, 2017

Conversation

mclark
Copy link
Contributor

@mclark mclark commented Apr 24, 2017

If a file path contains a non-UTF8 character and we are matching
against a regular expression, an ArgumentError is thrown,
failing the entire operation. Since it is not even a valid byte
sequence for the encoding, we can assume it is not a match and
return false instead.

With this change, if there are paths on the file system that are not
valid UTF-8 we will just treat them as not matching the
regular expression and continue processing instead of failing
with an exception.


Before submitting the PR make sure the following are checked:

  • Wrote [good commit messages][1].
  • Used the same coding conventions as the rest of the project.
  • Feature branch is up-to-date with master (if not - rebase it).
  • All tests are passing.
  • The new code doesn't generate RuboCop offenses.
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences.

@mclark mclark force-pushed the fix-regexp-path-matching branch 2 times, most recently from 102eaef to 8cd52a6 Compare April 25, 2017 00:18
@@ -78,7 +78,7 @@

it 'matches regexps' do
expect(described_class.match_path?(/^d.*e$/, 'dir/file')).to be_truthy
expect(described_class.match_path?(/^d.*e$/, 'dir/filez')).to be_falsey
expect(described_class.match_path?(/^d.*e$/, "dir/file\xBF")).to be_falsey
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably have this as a separate example for clarity's sake.

@bbatsov
Copy link
Collaborator

bbatsov commented Apr 25, 2017

Your change should be accompanied by a changelog entry.

@mclark mclark force-pushed the fix-regexp-path-matching branch 2 times, most recently from 9cb1248 to d6b2638 Compare April 25, 2017 13:12
If a file path contains a non-UTF8 character and we are matching
against a regular expression, an ArgumentError is thrown,
failing the entire operation. Since it is not a valid byte
sequence for the encoding, we can assume it is not a match and
return false instead.
@mclark mclark force-pushed the fix-regexp-path-matching branch from d6b2638 to 6469ea0 Compare April 25, 2017 13:13
@mclark
Copy link
Contributor Author

mclark commented Apr 25, 2017

@bbatsov thanks for the 👀 ! I've addressed your feedback, feel free to take another look when you get a chance.

@bbatsov bbatsov merged commit 2b369fd into rubocop:master Apr 25, 2017
@bbatsov
Copy link
Collaborator

bbatsov commented Apr 25, 2017

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants