
[BUG] UTF-8 YAML files with accents in version 1.9.0 raise incompatible character encodings: UTF-8 and ASCII-8BIT #606

Open
kivanio opened this issue Jan 27, 2022 · 21 comments · Fixed by msgpack/msgpack-ruby#246


@kivanio

kivanio commented Jan 27, 2022

For a long time I have had code like this:

STATES = I18n.t('states').with_indifferent_access.freeze

The YAML file is UTF-8 encoded, and some of its words have accents:

pt-BR:
  states:
    Acre: AC
    Amapá: AP
    Ceará: CE
    Piauí: PI
    Paraná: PR

With the new 1.9.0 release, it started failing in our CI in a lot of places:

ActionView::Template::Error incompatible character encodings: UTF-8 and ASCII-8BIT
Failure/Error: = f.select(:state, City::STATES,

ActionView::Template::Error:
  incompatible character encodings: UTF-8 and ASCII-8BIT
./app/views/customers/_form.html.slim:86:in `block in 

Downgrading to 1.8.11 makes everything work again.
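A minimal plain-Ruby sketch (no Rails or i18n involved) of why only the accented entries blow up: Ruby refuses to concatenate a UTF-8 string with an ASCII-8BIT (binary) string when both contain non-ASCII bytes, but allows it when one side is ASCII-only. The helper `concat_labels` and the sample strings are hypothetical; `Encoding::CompatibilityError` is the exception Ruby actually raises.

```ruby
# Hypothetical helper: concatenate two strings and return either the result
# or the encoding error Ruby raises.
def concat_labels(left, right)
  left + right
rescue Encoding::CompatibilityError => e
  e
end

# "Acre" is ASCII-only, so a binary-tagged copy still combines with UTF-8 text.
ok = concat_labels("Estado: ", "Acre".b)

# "Amapá" carries the bytes \xC3\xA1; tagged ASCII-8BIT it cannot be mixed
# with a UTF-8 string that also holds non-ASCII bytes.
err = concat_labels("Estado: Amapá / ", "Amapá".b)

puts ok          # Estado: Acre
puts err.class   # Encoding::CompatibilityError
```

This matches the symptom above: "Acre" renders fine in the select, while "Amapá", "Ceará", etc. trigger the template error.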

With 1.9.0, it loads like this in the Rails console:

rails c
Running via Spring preloader in process 4964
Loading development environment (Rails 6.1.4.4)
[1] pry(main)> I18n.t('states').with_indifferent_access.freeze
=> {"Acre"=>"AC",
 "Alagoas"=>"AL",
 "Amazonas"=>"AM",
 "Amap\xC3\xA1"=>"AP",
 "Bahia"=>"BA",
 "Cear\xC3\xA1"=>"CE",
 "Distrito Federal"=>"DF",
 "Esp\xC3\xADrito Santo"=>"ES",
 "Goi\xC3\xA1s"=>"GO",
 "Maranh\xC3\xA3o"=>"MA",
 "Minas Gerais"=>"MG",
 "Mato Grosso do Sul"=>"MS",
 "Mato Grosso"=>"MT",
 "Par\xC3\xA1"=>"PA",
 "Para\xC3\xADba"=>"PB",
 "Pernambuco"=>"PE",
 "Piau\xC3\xAD"=>"PI",
 "Paran\xC3\xA1"=>"PR",
 "Rio de Janeiro"=>"RJ",
 "Rio Grande do Norte"=>"RN",
 "Rond\xC3\xB4nia"=>"RO",
 "Roraima"=>"RR",
 "Rio Grande do Sul"=>"RS",
 "Santa Catarina"=>"SC",
 "Sergipe"=>"SE",
 "S\xC3\xA3o Paulo"=>"SP",
 "Tocantins"=>"TO"}

With 1.8.11, it loads like this in the Rails console:

rails c
Running via Spring preloader in process 3411
Loading development environment (Rails 6.1.4.4)
[1] pry(main)> I18n.t('states').with_indifferent_access.freeze
=> {"Acre"=>"AC",
 "Alagoas"=>"AL",
 "Amazonas"=>"AM",
 "Amap\xC3\xA1"=>"AP",
 "Bahia"=>"BA",
 "Cear\xC3\xA1"=>"CE",
 "Distrito Federal"=>"DF",
 "Esp\xC3\xADrito Santo"=>"ES",
 "Goi\xC3\xA1s"=>"GO",
 "Maranh\xC3\xA3o"=>"MA",
 "Minas Gerais"=>"MG",
 "Mato Grosso do Sul"=>"MS",
 "Mato Grosso"=>"MT",
 "Par\xC3\xA1"=>"PA",
 "Para\xC3\xADba"=>"PB",
 "Pernambuco"=>"PE",
 "Piau\xC3\xAD"=>"PI",
 "Paran\xC3\xA1"=>"PR",
 "Rio de Janeiro"=>"RJ",
 "Rio Grande do Norte"=>"RN",
 "Rond\xC3\xB4nia"=>"RO",
 "Roraima"=>"RR",
 "Rio Grande do Sul"=>"RS",
 "Santa Catarina"=>"SC",
 "Sergipe"=>"SE",
 "S\xC3\xA3o Paulo"=>"SP",
 "Tocantins"=>"TO"}

The output looks exactly the same.
Is this a bug in the new version?

It is probably something in the interaction between the new version and how Rails loads the files.
This is just a sample; I have other files with accents, and all of them raise the same exception.
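The two console dumps can look identical because `inspect` shows the bytes but not the encoding tag. A minimal sketch of a hypothetical helper (not part of i18n or ActiveSupport) that lists keys whose text is non-ASCII but tagged ASCII-8BIT; the sample hash is made up for illustration.

```ruby
# Hypothetical helper: return the keys (Symbols or Strings) whose bytes are
# non-ASCII but whose encoding tag is ASCII-8BIT (binary).
def binary_tagged_keys(hash)
  hash.keys.select do |key|
    text = key.to_s
    text.encoding == Encoding::ASCII_8BIT && !text.ascii_only?
  end
end

# Illustrative data: two healthy keys and one key rebuilt from binary bytes.
states = { Acre: "AC", "Amapá".b.to_sym => "AP", "Ceará": "CE" }

p binary_tagged_keys(states)
```

ASCII-only keys such as `:Acre` are never flagged, since their encoding does not matter when mixing strings.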

Versions of i18n, rails, and anything else you think is necessary

ruby: 3.0.3
i18n: 1.9.0
rails: 6.1.4.3
rspec-rails: 5.1.0
rspec: 3.10.0

We also use # frozen_string_literal: true in our Ruby files.

@radar
Collaborator

radar commented Jan 27, 2022

Hello, thank you for the detailed reproduction steps.

I was unable to reproduce this issue using:

  • Ruby 3.0.0
  • i18n 1.9.1
  • Rails 6.1.4.3

(The RSpec versions are highly likely to be irrelevant to this issue)

I will try with Ruby 3.0.3 now.

@radar
Collaborator

radar commented Jan 27, 2022

I am unable to reproduce this issue with 3.0.3. Could you please put a Rails app that does reproduce this issue on GitHub so that I can clone it down and investigate?

@radar radar added the no-repro label Jan 27, 2022
@joergschiller

joergschiller commented Jan 27, 2022

We have a similar issue. I've created this repo to demonstrate the difference.

Using i18n 1.8.11, the keys are symbolized with UTF-8 encoding. With 1.9.1 they are ASCII-8BIT.

gem 'i18n', "< 1.9.0"
I18n.t("foobar").keys                  # => [:ö]
I18n.t("foobar").keys.first.encoding   # => #<Encoding:UTF-8>
I18n.t("foobar").values.first.encoding # => #<Encoding:UTF-8>

gem 'i18n', "> 1.9.0"
I18n.t("foobar").keys                  # => [:"\xC3\xB6"]
I18n.t("foobar").keys.first.encoding   # => #<Encoding:ASCII-8BIT>
I18n.t("foobar").values.first.encoding # => #<Encoding:UTF-8>

https://github.com/joergschiller/i18n_issue_606/blob/293513b938310239e06bf4062be975e0ec772fc0/Gemfile#L6-L12

(With Ruby 3.0.3)
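Why the binary keys matter: a Symbol interned from the same bytes but a different encoding is a different Symbol, so a lookup with the UTF-8 `:ö` misses the binary key. A small sketch in plain Ruby; the `translations` hash is illustrative, shaped like the 1.9.x result above.

```ruby
# :ö written in a UTF-8 source file is a UTF-8 Symbol.
utf8_key = :ö
# The same two bytes (C3 B6) tagged ASCII-8BIT intern to a different Symbol.
binary_key = "ö".b.to_sym

# Illustrative hash shaped like the buggy 1.9.x result.
translations = { binary_key => "ü" }

p utf8_key == binary_key   # false
p translations[utf8_key]   # nil
p binary_key.encoding      # the binary (ASCII-8BIT) encoding
```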

@kivanio
Author

kivanio commented Jan 28, 2022

Thank you @joergschiller!
I am not alone 🙏🏻
I was about to make a repo and you saved me the trouble.
I had assumed this was intended behaviour in the new version, but now I think we have a bug.

@radar
Collaborator

radar commented Jan 28, 2022

Thank you @joergschiller. I can reproduce this issue now with your repository. I'll find the commit that broke it.

@radar
Collaborator

radar commented Jan 28, 2022

Commit that breaks this behaviour is 0fda789, as discovered through a git bisect:

 ~/code/gems/i18n   v1.9.0~7^2 (bisect)  bad
0fda789ea745cd462658a8948ee085201aba5c6f is the first bad commit
commit 0fda789ea745cd462658a8948ee085201aba5c6f
Author: Paarth Madan <[email protected]>
Date:   Wed Nov 3 12:33:12 2021 -0400

    Symbolize and freeze keys when loading from YAML

 lib/i18n/backend/base.rb    | 2 +-
 test/backend/simple_test.rb | 7 ++++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

@radar
Collaborator

radar commented Jan 28, 2022

@paarthmadan Do you have any time today to investigate this one?

@radar
Collaborator

radar commented Jan 28, 2022

I looked into where the unsafe_load_file method was coming from: it is Bootsnap's. cc @casperisfine.

/Users/ryan.bigg/.asdf/installs/ruby/3.0.3/lib/ruby/gems/3.0.0/gems/bootsnap-1.10.2/lib/bootsnap/compile_cache/yaml.rb:203:in `unsafe_load_file': wrong number of arguments (given 0, expected 1+) (ArgumentError)

If I remove bootsnap from this test application, the issue goes away:

irb(main):001:0> I18n.t(:foobar).keys[0]
=> :ö

@casperisfine: Would you like me to create an issue on the Bootsnap repo page for this one?

@casperisfine

👀

@casperisfine

Ah damn it, I know what the problem is. It's because Bootsnap uses msgpack to accelerate YAML parsing, and MessagePack uses an API that doesn't preserve symbol encodings properly. See an issue I opened a while ago: msgpack/msgpack-ruby#211
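A hypothetical sketch of the failure mode described here, NOT msgpack-ruby's actual code: a decoder receives a map key as raw bytes (tagged ASCII-8BIT) and, when asked to symbolize keys, interns them directly. Tagging the bytes as UTF-8 before interning gives back the expected Symbol. The method name `intern_key` and its `tag_utf8:` flag are assumptions for illustration.

```ruby
# Hypothetical decoder step: a map key arrives as raw bytes and is interned
# as a Symbol when keys are symbolized.
def intern_key(raw_bytes, tag_utf8:)
  text = raw_bytes.dup
  text.force_encoding(Encoding::UTF_8) if tag_utf8
  text.to_sym
end

raw = [0xC3, 0xB6].pack("C*") # the UTF-8 bytes of "ö", tagged ASCII-8BIT

buggy = intern_key(raw, tag_utf8: false)
fixed = intern_key(raw, tag_utf8: true)

p buggy.encoding # binary, like the keys reported with i18n 1.9.x
p fixed.encoding # UTF-8
p fixed == :ö    # true
```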

Let me go over my old research to see how we could sidestep this in Bootsnap. I'll update here ASAP.

@casperisfine

Ok, so the bug is actually in msgpack, I opened a PR here: msgpack/msgpack-ruby#246

You can apply the patch with:

gem 'msgpack', github: 'Shopify/msgpack-ruby', branch: 'symbolize-keys-fix-encoding'

>> I18n.t(:foobar)
=> {:ö=>"ü"}

Alternatively, if you'd rather not run a gem branch, you can disable Bootsnap YAML caching. Sorry for the bug :/
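For the second option, a sketch of a `config/boot.rb` that calls `Bootsnap.setup` directly with YAML compile caching turned off, instead of the usual `require "bootsnap/setup"`. It assumes a Bootsnap 1.x release that accepts the `compile_cache_yaml:` option; the cache path and the other values are common defaults, not settings taken from this thread.

```ruby
# config/boot.rb (sketch; assumes Bootsnap 1.x option names)
ENV["BUNDLE_GEMFILE"] ||= File.expand_path("../Gemfile", __dir__)

require "bundler/setup"
require "bootsnap"

Bootsnap.setup(
  cache_dir: File.expand_path("../tmp/cache", __dir__),
  development_mode: ENV.fetch("RAILS_ENV", "development") == "development",
  load_path_cache: true,
  compile_cache_iseq: true,
  compile_cache_yaml: false # skip the msgpack-backed YAML cache that loses key encodings
)
```

This keeps the load path and ISeq caching while YAML files go through the regular Psych loader.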

@radar
Collaborator

radar commented Jan 28, 2022

Thank you very much for the deep investigation here :)

@radar
Collaborator

radar commented Feb 3, 2022

@casperisfine Is there something we could do to get that msgpack/msgpack-ruby PR merged + released? Is there a bribe of cookies that needs to be made here?

@casperisfine

Not that I know of. He did acknowledge seeing some of my other PRs; he's probably just busy.

I don't like to nag maintainers, but I see the same bug was reported again yesterday, so I'll ping him on that PR just this once.

@casperisfine

He said early next week, hopefully.

@casperisfine

If you are interested, I could add some feature checking in I18n to sidestep the optimization when it's bogus.

@radar
Collaborator

radar commented Feb 4, 2022

@casperisfine That could be a good workaround in the meantime, I think. Could you please find out what that would take?

@radar radar added this to the 1.10 milestone Feb 8, 2022
@casperisfine

Uh, I just came back to this issue, and I was certain I had already answered :/

So first, the fix was merged upstream today, but I'm not sure when there will be a release.

For the workaround, we could simply test whether the bug is present with something like:

: ~

and then, from Ruby:

require "yaml"

if YAML.unsafe_load_file("test.yml", symbolize_names: true).keys.first.encoding == Encoding::UTF_8
  # it works, we can use the optimization
  ...
end
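A self-contained version of that probe, as a sketch: it writes a throwaway YAML file whose only key is non-ASCII (the `ö` key and the method name are assumed examples), loads it with `symbolize_names: true`, and reports whether the resulting Symbol is still tagged UTF-8. With a Bootsnap/msgpack combination that has the bug, it would be expected to return `false`.

```ruby
require "yaml"
require "tempfile"

# Hypothetical probe: does the active YAML loader (possibly patched by a cache
# such as Bootsnap) return symbolized non-ASCII keys tagged as UTF-8?
def symbolized_yaml_keys_keep_utf8?
  Tempfile.create(["encoding_probe", ".yml"]) do |file|
    file.write("ö: ~\n")
    file.flush
    # Older Psych releases lack unsafe_load_file; fall back to load_file there.
    loader = YAML.respond_to?(:unsafe_load_file) ? :unsafe_load_file : :load_file
    key = YAML.public_send(loader, file.path, symbolize_names: true).keys.first
    key.encoding == Encoding::UTF_8
  end
end

if symbolized_yaml_keys_keep_utf8?
  puts "symbol encodings preserved; the optimized load path is safe"
else
  puts "symbol encodings lost; fall back to a plain YAML load"
end
```

Running the probe once at boot keeps the cost to a single tiny file parse.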

@radar
Collaborator

radar commented Feb 14, 2022

Reviewing this again, I think we'll just wait for a new msgpack release to happen, and then advise people who encounter this issue to upgrade to that new version.

I'll be leaving this issue open until that new version is out.

@casperisfine

msgpack 1.4.5 was released a few hours ago and should solve this issue: https://rubygems.org/gems/msgpack/versions/1.4.5
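A sketch of a runtime guard for apps that want to warn about the affected combination. It assumes, as stated above, that the fix shipped in msgpack 1.4.5; the constant and method names are hypothetical. In a Bundler app, pinning `gem "msgpack", ">= 1.4.5"` in the Gemfile achieves the same at install time.

```ruby
# Hypothetical guard: msgpack releases older than 1.4.5 intern symbolized
# keys without their UTF-8 encoding (as reported in this issue).
FIXED_MSGPACK_VERSION = Gem::Version.new("1.4.5")

def msgpack_symbol_encoding_fixed?(version)
  Gem::Version.new(version.to_s) >= FIXED_MSGPACK_VERSION
end

# In an app, the loaded msgpack version is available via RubyGems/Bundler.
spec = Gem.loaded_specs["msgpack"]
if spec && !msgpack_symbol_encoding_fixed?(spec.version)
  warn "msgpack #{spec.version} may return ASCII-8BIT symbol keys; upgrade to >= 1.4.5"
end
```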

@kivanio
Author

kivanio commented Feb 15, 2022

Thank you all for the hard work! 👏🏻
