WordArray appears to skip arrays with special characters #1147

Closed
createdbypete opened this issue Jun 12, 2014 · 7 comments

@createdbypete

I could be misinterpreting the rules, but take this example from a Jekyll test I've been working on:

# Cyrillic
assert_equal ["ВУЗ", "Вуз", "вуз"], @filter.sort(["Вуз", "вуз", "ВУЗ"])
assert_equal ["_вуз", "вуз", "вуз_"], @filter.sort(["вуз_", "_вуз", "вуз"])
# Hebrew
assert_equal ["אלף", "בית"], @filter.sort(["בית", "אלף"])

# Another example using latin characters mixed with words
["fooê", "foö", "héllo"]

These lines do not get flagged as requiring %w() treatment.

$ rubocop -V
0.23.0 (using Parser 2.1.9, running on ruby 2.1.2 x86_64-darwin13.0)
@bbatsov
Collaborator

bbatsov commented Jun 13, 2014

It's amusing that I even know what ВУЗ means. :-)

I guess \w doesn't match non-ASCII letter characters. We'll have this fixed.
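
To illustrate the point about \w: in Ruby, \w is ASCII-only by default (equivalent to [a-zA-Z0-9_]), so a check built on it rejects the strings from the report. The helper name `ascii_word?` below is hypothetical and is not taken from RuboCop's source; it is only a minimal sketch of the behaviour.

```ruby
# Hypothetical helper (not RuboCop code): true when the whole string
# consists only of characters matched by \w.
# In Ruby, \w is ASCII-only by default, i.e. [a-zA-Z0-9_].
def ascii_word?(string)
  !(string =~ /\A\w+\z/).nil?
end

puts ascii_word?("foo")    # plain ASCII letters are accepted
puts ascii_word?("вуз")    # Cyrillic letters are rejected
puts ascii_word?("אלף")    # Hebrew letters are rejected
puts ascii_word?("héllo")  # a single accented letter is enough to reject
```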

@createdbypete
Author

This probably needs the input of someone with strong regex skills, but perhaps it would be easier to exclude the characters we don't want? It depends on what you consider a 'word' in this context.

Is it any characters that would work when wrapped in %w()? Including some symbols such as in this example: ['config.yml', 'dir/config.rb'] would be suitable for %w() in my opinion, but you might not agree that they are 'words'.
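
For reference, a quick sketch showing that %w() itself accepts these strings: it splits on whitespace only, so dots and slashes pass through unchanged. The method name `percent_w_paths` is made up for this example; whether the cop should treat such strings as 'words' is a separate policy question.

```ruby
# Hypothetical example method: returns the %w() form of the two paths.
# %w() splits on whitespace, so '.' and '/' are kept as part of each element.
def percent_w_paths
  %w(config.yml dir/config.rb)
end

# Compare against the equivalent bracketed array literal.
puts percent_w_paths == ['config.yml', 'dir/config.rb']
```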

@bbatsov
Collaborator

bbatsov commented Jun 13, 2014

Something like /\p{Word}+/ should do the trick as far as the regexp goes.
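
A minimal sketch of how that pattern behaves, assuming it is anchored to the whole string (the anchors and the helper name `unicode_word?` are additions for illustration, not part of the suggestion above). In Ruby, \p{Word} covers Unicode letters, marks, numbers and connector punctuation such as the underscore.

```ruby
# Hypothetical helper (not RuboCop code): true when the whole string
# consists only of Unicode word characters.
def unicode_word?(string)
  !(string =~ /\A\p{Word}+\z/).nil?
end

puts unicode_word?("вуз")         # Cyrillic letters are accepted
puts unicode_word?("_вуз")        # the underscore counts as a word character
puts unicode_word?("אלף")         # Hebrew letters are accepted
puts unicode_word?("héllo")       # accented Latin letters are accepted
puts unicode_word?("config.yml")  # the dot is not a word character
```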

> Is it any characters that would work when wrapped in %w()? Including some symbols such as in this example: ['config.yml', 'dir/config.rb'] would be suitable for %w() in my opinion, but you might not agree that they are 'words'.

Perhaps. I think we're currently checking for strings that are made of only letters, numbers, - and _.
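
As a rough sketch of that rule extended to Unicode: the check below accepts strings made only of word characters and hyphens, and treats an array as a %w() candidate when every element passes. Both method names are hypothetical, and this is an assumption about the shape of the check rather than the cop's actual implementation.

```ruby
# Hypothetical helper: letters, numbers and '_' in any script, plus '-'.
def simple_word?(string)
  !(string =~ /\A[\p{Word}\-]+\z/).nil?
end

# Hypothetical check: a non-empty array whose elements are all simple words
# could be written with %w().
def word_array_candidate?(strings)
  !strings.empty? && strings.all? { |s| simple_word?(s) }
end

puts word_array_candidate?(["Вуз", "вуз", "ВУЗ"])              # Cyrillic words
puts word_array_candidate?(["fooê", "foö", "héllo"])            # accented Latin words
puts word_array_candidate?(["config.yml", "dir/config.rb"])     # contains '.' and '/'
```

Under this rule the two arrays from the original report would be flagged, while the file-path example would not, since '.' and '/' fall outside the accepted set.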

@createdbypete
Author

Just looked up \p{Word} and wow looks perfect, learnt something new thanks! Do you want me to put together a PR for this?

@bbatsov
Collaborator

bbatsov commented Jun 13, 2014

Sure.

@camillebaldock
Contributor

hi @createdbypete, @bbatsov pushed a quick PR for this here: #1155

@createdbypete
Author

Nice one @camilleldn, the past week has been busy and I didn't get the chance.
