Deprecate std::ctype, std::ctype_byname, std::isupper(), and std::toupper() #2
Comments
To be fair, |
General_Category is the wrong property, I think. Maybe it's ok for Cc (if what you want to test really is C0&C1 control characters), but definitely wrong for isupper. isupper should check Uppercase, which doesn't match gc. Always doubt yourself when you think what you need is General_Category. |
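To make the distinction concrete, here is a minimal sketch contrasting a General_Category test with the Uppercase property. The type and function names (`SampleProperties`, `is_upper_by_gc`, `is_upper_by_property`, `find_sample`) are hypothetical, and the table is a hand-picked sample of four code points rather than real Unicode Character Database tooling.

```cpp
#include <cstddef>

// Hypothetical illustration: a tiny hand-written sample, not the full UCD.
enum class Gc { Lu, Lt, Nl, So };

struct SampleProperties {
    char32_t code_point;
    Gc general_category;  // General_Category value
    bool uppercase;       // derived Uppercase property (Lu + Other_Uppercase)
};

static const SampleProperties samples[] = {
    {0x0041, Gc::Lu, true},   // LATIN CAPITAL LETTER A
    {0x01C5, Gc::Lt, false},  // LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
    {0x2160, Gc::Nl, true},   // ROMAN NUMERAL ONE
    {0x24B6, Gc::So, true},   // CIRCLED LATIN CAPITAL LETTER A
};

// Returns the sample entry for cp, or nullptr if cp is not in the sample table.
inline const SampleProperties* find_sample(char32_t cp) {
    for (std::size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); ++i) {
        if (samples[i].code_point == cp) return &samples[i];
    }
    return nullptr;
}

// "Uppercase" approximated by General_Category == Lu.
inline bool is_upper_by_gc(const SampleProperties& p) {
    return p.general_category == Gc::Lu;
}

// "Uppercase" taken from the Uppercase property itself.
inline bool is_upper_by_property(const SampleProperties& p) {
    return p.uppercase;
}
```

For U+2160 and U+24B6 the two tests disagree: neither is in category Lu, yet both carry the Uppercase property.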
Fair point, @rmartinho : TR 30112's definition of
|
Can't deprecate this, it's used by iostreams. |
I'm not sure what you are referring to by "this", but deprecation is not removal. We can deprecate features that are still in use. |
std::ctype, std::ctype_byname, std::isupper(), and std::toupper()
Changed title to limit scope. Focus on issues currently identified and described in this issue. |
Here is a potential replacement: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1628r0.pdf Note that this is low level (which doesn't mean it shouldn't be provided, as it is useful/necessary for lexers among other things), but in general Unicode recommends that this kind of thing be done on strings rather than code points, in both locale-independent and tailored fashion. Exhaustive list of functions to deprecate
|
Just the variants that take a |
Good question |
I think we should focus more on what an appropriate C replacement would look like first. |
Would C be interested in supporting unicode character properties?
…On Sat, 3 Aug 2019 at 02:30, Tom Honermann ***@***.***> wrote:
An alternative might be to add deprecated (or deleted, might be a hard
sell?) overloads for char8_t, char16_t, char32_t
I think we should focus more on what an appropriate C replacement would
look like first.
|
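No idea. My guess is that they would require any replacements to work with (wide) execution encoding and thus existing (non-Unicode) encodings. I see the motivation for replacement being:
- improved error handling; no EOF value handling.
- no UB on values not representable in unsigned char.
- not code unit value based so that variable length encodings can be supported.
The point is more that we can’t deprecate these (in C) without replacements (in C). |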
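To illustrate the undefined-behavior and code-unit concerns with the existing `std::isupper(int)`, here is a minimal sketch. The helper names `is_upper_code_unit` and `count_upper_code_units` are hypothetical, and the behavior described assumes the default "C" locale.

```cpp
#include <cctype>

// Hypothetical helper: std::isupper(int) has undefined behavior for values
// that are neither EOF nor representable as unsigned char, so a plain char
// (which may be negative) is converted to unsigned char first.
inline bool is_upper_code_unit(char c) {
    return std::isupper(static_cast<unsigned char>(c)) != 0;
}

// Counts the code units of a NUL-terminated string that the current C locale
// classifies as uppercase. This works one code unit at a time, so it cannot
// see a character that is encoded as several code units.
inline int count_upper_code_units(const char* s) {
    int count = 0;
    for (; *s != '\0'; ++s) {
        if (is_upper_code_unit(*s)) ++count;
    }
    return count;
}
```

In the "C" locale, the two UTF-8 code units of U+0393 GREEK CAPITAL LETTER GAMMA (`"\xCE\x93"`) are each reported as not uppercase, which shows why a code-unit-based interface cannot support variable-length encodings.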
The thing is: I'm pretty sure Unicode character properties are NOT a
replacement.
Unicode character properties should NOT be locale dependent in any way;
cp_isupper(U'Γ') should always be true, regardless of the execution
encoding, platform, etc.
Ignoring the fact that isupper(foo) (for example) does not support anything
but the first 255 values of a given character set, a negative answer means
either
- foo is not an uppercase letter
- foo is not part of this non-Unicode character set
Is that useful information? Is a replacement useful? If it is, we still
need two APIs, and maybe we can fix the existing one by fixing your second
and third bullet points.
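As a rough sketch of what a locale-independent code point query could look like: `cp_isupper` is the name used in the comment above, but the body below is a toy assumption covering only Basic Latin and the basic Greek capitals, not a real Unicode property lookup.

```cpp
// Toy sketch, NOT a complete Uppercase property table: it only knows about
// A..Z and the basic Greek capital letters. The answer depends solely on the
// code point value, never on the locale or the execution encoding.
constexpr bool cp_isupper(char32_t cp) {
    return (cp >= 0x0041 && cp <= 0x005A)                      // A..Z
        || (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2);     // Α..Ω; U+03A2 is unassigned
}

// U+0393 GREEK CAPITAL LETTER GAMMA is uppercase at compile time, before any
// locale could even be selected.
static_assert(cp_isupper(0x0393), "Gamma is uppercase");
static_assert(!cp_isupper(0x03B3), "small gamma is not uppercase");
```

Because the function takes a `char32_t` code point rather than a locale-encoded code unit, a negative answer here can only mean "not uppercase (as far as this table knows)", not "not part of the character set".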
…On Sat, 3 Aug 2019 at 16:56, Tom Honermann ***@***.***> wrote:
Would C be interested in supporting unicode character properties?
No idea. My guess is that they would require any replacements to work with
(wide) execution encoding and thus existing (non-Unicode) encodings. I see
the motivation for replacement being:
- improved error handling; no EOF value handling.
- no UB on values not representable in unsigned char.
- not code unit value based so that variable length encodings can be
supported.
The point is more that we can’t deprecate these (in C) without
replacements (in C).
|
I agree. They might be used in the implementation of a replacement though.
I agree, but the provided example is specifically passing a Unicode code point, so I don't think anyone would expect a locale dependency (this is not true for case mapping algorithms in general, but is for Unicode code point properties).
Technically, it supports all values that fit in a value of
Or foo isn't a code point at all (e.g., a trailing code unit value). A code point based interface would solve all three of the bullet points I listed. (I would be fine with passing an invalid code point, errm, scalar value being a precondition violation; long live Contracts 2.0!) |
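The precondition mentioned here can be sketched as a pair of predicates. The names `is_code_point` and `is_scalar_value` are hypothetical; the ranges are the Unicode code space (up to U+10FFFF) and the surrogate block (U+D800 to U+DFFF).

```cpp
// Hypothetical predicates for validating input to a code point based interface.

// True if v lies within the Unicode code space.
constexpr bool is_code_point(char32_t v) {
    return v <= 0x10FFFF;
}

// True if v is a Unicode scalar value: a code point that is not a surrogate.
constexpr bool is_scalar_value(char32_t v) {
    return is_code_point(v) && !(v >= 0xD800 && v <= 0xDFFF);
}
```

A code point based classification function could state `is_scalar_value(cp)` as its precondition, which separates "not a valid input" from "not uppercase" instead of folding both into a single negative answer.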
The standard library specifies a number of interfaces that cannot be made to work reasonably well for Unicode. For example, from <locale>:
- std::ctype, std::ctype_byname
- std::isupper()
- std::toupper()
Such interfaces are candidates for deprecation, replacement, and eventual removal.