Invalid UTF-8 byte at index 1: 0x65 #1831

Closed
hayalbaz opened this issue Nov 8, 2019 · 9 comments

Comments

hayalbaz commented Nov 8, 2019

  • Describe what you want to achieve.
    I want to serialize an object that contains the string "Çevrimiçi". When I call j.dump(), I get an exception (json.exception.type_error.316).
  • Describe what you tried.
    I tried setting the compiler option /utf-8 and setting source/execution_charset to utf_8.
    I also saved all my files with UTF-8 encoding.
  • Describe which system (OS, compiler) you are using.
    Windows 10
    MSVC 19.20.27508
  • Describe which version of the library you are using (release version, develop branch).
    latest release version

nlohmann commented Nov 8, 2019

Can you provide example code? Is the string "Çevrimiçi" inside your code or do you read it from a file?

nlohmann added the labels "platform: visual studio" and "state: needs more info" on Nov 8, 2019

hayalbaz commented Nov 8, 2019

Here is the function that throws the exception:

std::string AliveLog::getJSON()
{
    json j = json{
        {"createdAt", createdAt},
        {"id", id},
        {"message", message},
        {"detail", detail},
        {"deviceId", deviceId},
        {"type", type.ordinal},
    };
    return j.dump();
}

Here message is "Çevrimiçi". It is stored as a std::string and is set by a setter method, to which I pass a struct that has a std::string field.

nlohmann removed the "state: needs more info" label on Nov 8, 2019

nlohmann commented Nov 8, 2019

It seems that the literal "Çevrimiçi" is not encoded in UTF-8. Can you try this:

for (int c : message)
{
    std::cout << "character: " << std::hex << c << std::endl;
}

to show the full encoded values in the string `message`?

If it's UTF-8, it should be something like

\xC3\x87\x65\x76\x72\x69\x6D\x69\xC3\xA7\x69


hayalbaz commented Nov 8, 2019

character: ffffffc7
character: 65
character: 76
character: 72
character: 69
character: 6d
character: 69
character: ffffffe7
character: 69
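
A note on reading this output: the `ffffff` prefix is sign extension. `char` is signed on MSVC, so a byte such as 0xC7 becomes a negative `int` before it is printed. The bytes that matter are 0xC7 and 0xE7, which are the single-byte codes for Ç and ç in legacy code pages such as Windows-1254, not UTF-8. A minimal sketch of a dump that avoids the sign extension follows; `hex_bytes` is a hypothetical helper name and is not part of the library.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of nlohmann/json): format each byte of `s`
// as two lowercase hex digits, separated by single spaces.
std::string hex_bytes(const std::string& s)
{
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        // Go through unsigned char first so that bytes >= 0x80 are not
        // sign-extended when widened to an integer.
        const unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (i != 0)
        {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Calling `hex_bytes(message)` on the string above would give `c7 65 76 72 69 6d 69 e7 69`, which makes the two non-UTF-8 bytes easy to spot.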


hayalbaz commented Nov 8, 2019

When I define it as u8"Çevrimiçi", I get:
character: ffffffc3
character: ffffff87
character: 65
character: 76
character: 72
character: 69
character: 6d
character: 69
character: ffffffc3
character: ffffffa7
character: 69


hayalbaz commented Nov 8, 2019

However, when I use j.dump() now, the message shows different characters (an A with a ~ on top and something that looks like a cross instead of Ç). I assume this is due to Visual Studio?


nlohmann commented Nov 8, 2019

The u8 version looks right.
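
To illustrate why the first byte sequence is rejected and the u8 one is not: in UTF-8, 0xC7 is a lead byte that announces one continuation byte of the form 10xxxxxx, but the next byte is 0x65, a plain ASCII `e`. That is consistent with the message in the issue title, "Invalid UTF-8 byte at index 1: 0x65". The sketch below is a simplified structural check written for this explanation; it is not the library's implementation, and `first_invalid_utf8_index` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Simplified UTF-8 structure check (an illustration, not the library's code).
// Returns the index of the first byte that breaks the lead/continuation
// pattern, s.size() if the string ends in the middle of a sequence, or -1 if
// no problem is found. Overlong forms, surrogates and code points above
// U+10FFFF are deliberately not checked.
long first_invalid_utf8_index(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t continuation_bytes = 0;
        if (lead < 0x80)                continuation_bytes = 0; // ASCII
        else if ((lead & 0xE0) == 0xC0) continuation_bytes = 1; // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) continuation_bytes = 2; // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) continuation_bytes = 3; // 11110xxx
        else return static_cast<long>(i); // stray continuation or bad lead

        for (std::size_t k = 1; k <= continuation_bytes; ++k)
        {
            if (i + k >= s.size())
            {
                return static_cast<long>(s.size()); // truncated sequence
            }
            // Each continuation byte must look like 10xxxxxx.
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
            {
                return static_cast<long>(i + k);
            }
        }
        i += continuation_bytes + 1;
    }
    return -1;
}
```

With the bytes from the first dump (c7 65 ...) this returns 1, and with the bytes from the u8 literal (c3 87 65 ...) it returns -1.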

nlohmann commented

Do you need further assistance with this issue?

hayalbaz commented

No, thanks for asking. The weird characters on the debug screen were a Visual Studio problem. It works fine now.
