XmlUtil.escape wrongly assumes that every Java char corresponds to one Unicode symbol, so it does not escape characters outside the Basic Multilingual Plane correctly.
We came across the symbol '😉' (U+1F609) today in an Onix file, where it was correctly escaped as a numeric character reference and was correctly parsed as the winking face emoticon. The resulting Java String contains two chars, not one, even though it holds only a single symbol. XmlUtil.escape turns this into the invalid output ��, escaping each of the two chars separately.
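The two-chars-for-one-symbol effect can be reproduced with standard `String` and `Character` methods alone. A minimal sketch (the class name `SurrogateDemo` is purely illustrative):

```java
// Sketch: a symbol outside the Basic Multilingual Plane occupies
// two Java chars, a UTF-16 surrogate pair.
class SurrogateDemo {
    // U+1F609 WINKING FACE, written as an escape so the source stays ASCII
    static final String WINK = "\uD83D\uDE09";

    public static void main(String[] args) {
        System.out.println(WINK.length());                              // 2 chars
        System.out.println(WINK.codePointCount(0, WINK.length()));      // 1 code point
        System.out.println(Character.isHighSurrogate(WINK.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(WINK.charAt(1)));   // true
        // codePointAt reads both chars and returns the combined code point
        System.out.printf("U+%X%n", WINK.codePointAt(0));               // U+1F609
    }
}
```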
It should have recognized the first char as the first half of a two-char surrogate pair, reconstructed the code point and output a single character reference for U+1F609.
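The reconstruction described here can be sketched with `Character.isSurrogatePair` and `Character.toCodePoint`. The class and helper names are hypothetical, and the hexadecimal reference form is an assumption; the decimal form is equally valid XML. The per-char output is printed for contrast: XML 1.0 does not allow character references into the surrogate range U+D800 to U+DFFF.

```java
// Hypothetical helpers: rebuild a code point from a surrogate pair
// and format it as an XML numeric character reference.
class ReconstructDemo {
    static int codePointOf(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        // Combines the two UTF-16 code units into one supplementary code point
        return Character.toCodePoint(high, low);
    }

    // Hexadecimal form chosen for illustration only
    static String numericRef(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    public static void main(String[] args) {
        String wink = "\uD83D\uDE09"; // U+1F609 WINKING FACE
        // Per-char escaping: two references to lone surrogates (invalid XML)
        System.out.println(numericRef(wink.charAt(0)) + numericRef(wink.charAt(1))); // &#xD83D;&#xDE09;
        // Escaping the reconstructed code point: one valid reference
        System.out.println(numericRef(codePointOf(wink.charAt(0), wink.charAt(1)))); // &#x1F609;
    }
}
```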
Thanks for reporting this. It seems that surrogate pairs for supplementary characters are not encoded correctly. I reckon this can be fixed by using String#codePoints() instead of String#chars() in XmlUtil.java line 99.
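A minimal sketch of that change follows. The contents of XmlUtil.java are not shown in this thread, so `XmlEscapeSketch`, its method names and its escaping rules (the five predefined XML entities, plus a decimal character reference for everything above U+007E) are assumptions; only the switch from `chars()` to `codePoints()` reflects the suggestion above.

```java
// Hypothetical escaper; the real XmlUtil.escape may differ in which
// characters it escapes and in the reference format it emits.
class XmlEscapeSketch {
    // Fixed version: codePoints() yields U+1F609 as a single int value
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> appendEscaped(out, cp));
        return out.toString();
    }

    // Buggy version for comparison: chars() yields each surrogate separately
    static String escapeByChar(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.chars().forEach(c -> appendEscaped(out, c));
        return out.toString();
    }

    private static void appendEscaped(StringBuilder out, int cp) {
        switch (cp) {
            case '<':  out.append("&lt;");   break;
            case '>':  out.append("&gt;");   break;
            case '&':  out.append("&amp;");  break;
            case '"':  out.append("&quot;"); break;
            case '\'': out.append("&apos;"); break;
            default:
                if (cp > 0x7E) {
                    // Decimal numeric character reference for non-ASCII input
                    out.append("&#").append(cp).append(';');
                } else {
                    // ASCII passes through unchanged; this sketch does not
                    // reject control characters that XML 1.0 forbids
                    out.appendCodePoint(cp);
                }
        }
    }

    public static void main(String[] args) {
        String wink = "\uD83D\uDE09"; // U+1F609 WINKING FACE
        System.out.println(escapeByChar(wink)); // &#55357;&#56841;
        System.out.println(escape(wink));       // &#128521;
    }
}
```

Because `appendEscaped` takes an `int`, the same helper serves both versions; the only difference is whether the input is iterated per UTF-16 char or per code point.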