diff --git a/docs/download/76.md b/docs/download/76.md
index 9d9467159b35..35650aae455d 100644
--- a/docs/download/76.md
+++ b/docs/download/76.md
@@ -14,15 +14,29 @@ License & terms of use: http://www.unicode.org/copyright.html
 
 # ICU 76
 
-ICU is the [premier library for software internationalization](https://icu.unicode.org/#h.i33fakvpjb7o), used by a [wide array of companies and organizations](https://icu.unicode.org/#h.f9qwubthqabj).
+ICU is the [premier library for software internationalization](https://icu.unicode.org/#h.i33fakvpjb7o),
+used by a [wide array of companies and organizations](https://icu.unicode.org/#h.f9qwubthqabj).
 
 ## Release Overview
 
-ICU 76 updates to [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/) (TODO: link to blog),
+ICU 76 updates to
+[Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/)
+([blog](https://blog.unicode.org/2024/09/announcing-unicode-standard-version-160.html)),
 including new characters and scripts, emoji, collation & IDNA changes, and corresponding APIs and implementations.
-It also updates to [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md) (TODO: link to blog) locale data with new locales and various additions and corrections.
+
+It also updates to
+[CLDR 46](https://cldr.unicode.org/downloads/cldr-46)
+([beta blog](https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html))
+locale data with new locales, significant updates to existing locales,
+and various additions and corrections.
+For example, the CLDR and Unicode default sort orders are now very nearly the same.
+
+Most of the java.time (Temporal) types can now be formatted directly
+using the existing ICU4J date/time formatting classes.
 
 There are some new APIs to make ICU easier to use with modern C++ and Java patterns.
+Most of the C/C++ APIs added for this purpose are implemented as C++ header-only APIs
+and are usable on top of binary-stable C APIs, which is a first for ICU.
 
 The Java and C++ technology preview implementations of the (also in [tech preview](https://github.com/unicode-org/message-format-wg?tab=readme-ov-file#messageformat-2-technical-preview)) CLDR MessageFormat 2.0 specification have been updated to match recent changes.
 
@@ -34,7 +48,7 @@ Please use the [icu-support mailing list](https://icu.unicode.org/contacts) and/
 
 The initial release has library version number 76.1.
 
-* Release date: 2024-10-TODO
+* Release date: _planned for_ 2024-10-24
 * [List of tickets fixed in ICU 76](https://unicode-org.atlassian.net/issues/?jql=project%20%3D%20ICU%20AND%20status%20%3D%20Done%20AND%20resolution%20in%20%28Fixed%2C%20%22Fixed%20by%20Other%20Ticket%22%29%20AND%20fixVersion%20%3D%2076.1%20ORDER%20BY%20component%20ASC%2C%20created%20DESC)
 
 If there are maintenance releases, they will be 76.2, 76.3, etc. (During ICU 76 development, the library version number was 76.0.x.)
@@ -43,51 +57,168 @@ Note: There may be additional commits on the [maint/maint-76](https://github.com
 
 ## Common Changes
 
-* [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/) (TODO: link to blog):
-    * TODO
-* [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md) (TODO: link to blog):
-    * TODO: new stuff
-    * TODO: below is from 45
-    * MessageFormat 2.0 tech preview being included into LDML.
-    * Structural “under the hood” work and limited data bug fixes, but no new data collection.
-    * Some time zones deprecated following IANA TZ database changes.
-* TODO: new stuff
-* TODO: below is from 75
-* New Unicode properties APIs for Identifier_Status and Identifier_Type, defined by UTS \#39 Unicode Security Mechanisms, [General Security Profile for Identifiers](https://www.unicode.org/reports/tr39/#General_Security_Profile). ([ICU-11396](https://unicode-org.atlassian.net/browse/ICU-11396))
-* Time zone data (tzdata) version 2024a (2024-jan). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream [tzdata](https://www.iana.org/time-zones) release since 2021b.
+* [Unicode 16](https://www.unicode.org/versions/Unicode16.0.0/)
+  ([blog](https://blog.unicode.org/2024/09/announcing-unicode-standard-version-160.html)):
+    * Adds five modern-use scripts: Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar
+    * Adds two historic scripts & almost 4000 additional Egyptian Hieroglyphs
+    * Seven new emoji characters
+    * Over 700 symbols from legacy computing environments
+    * ICU line breaking improvements have been upstreamed into
+      [UAX #14](https://www.unicode.org/reports/tr14/tr14-53.html#Modifications)
+    * ICU 76 adds support for the new UCD property Modifier_Combining_Mark for
+      [UAX #53](https://www.unicode.org/reports/tr53/) Arabic Mark Rendering
+    * ICU 76 also adds support for the UCD property Indic_Conjunct_Break,
+      which was new in Unicode 15.1. ([ICU-22503](https://unicode-org.atlassian.net/browse/ICU-22503))
+    * [IDNA](https://www.unicode.org/reports/tr46/tr46-33.html#Modifications):
+      The handling of UseSTD3ASCIIRules was simplified.
+      Some existing characters changed from disallowed (when that was only for compatibility with
+      long-obsolete IDNA2003) to valid.
+* [CLDR 46](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md)
+  ([beta blog](https://blog.unicode.org/2024/09/unicode-cldr-46-beta-available-for.html)):
+    * Significant data updates across all locales
+    * Locales which are now at modern coverage level: Nigerian Pidgin, Tigrinya
+    * Locales which are now at moderate coverage level:
+      Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
+    * New measurement units "night" and "light-speed"
+        * Note: ICU 76 does not yet support `portion-per-1e9` (aka per-billion). (See [ICU-22781](https://unicode-org.atlassian.net/browse/ICU-22781))
+    * [MessageFormat 2.0 tech preview updates](https://cldr.unicode.org/downloads/cldr-46#message-format-specification)
+    * Language matching: Dropped the fallback mapping
+      desired="uk" → supported="ru"
+      (so that Ukrainian (uk) doesn’t fall back to Russian (ru))
+    * [Collation](https://cldr.unicode.org/downloads/cldr-46#collation-data-changes):
+      Significant changes to the CLDR root collation (CLDR default sort order)
+        * Realigned with DUCET:
+          The order of groups of characters which sort below letters is now the same.
+          In both sort orders, non-decimal-digit numeric characters now sort after decimal digits,
+          and the CLDR root collation no longer tailors any currency symbols
+          (making some of them sort like letter sequences, as in the DUCET).
+          _These changes eliminate sort order differences among almost all
+          regular characters between the CLDR root collation and the DUCET._
+        * Improved Han Radical-Stroke Order:
+          The CLDR radical-stroke order now matches that of the Unicode Radical-Stroke Index;
+          traditional vs. simplified forms of radicals are now distinguished on a lower level than the number of residual strokes.
+          In alphabetic indexes for radical-stroke sort orders,
+          only the traditional forms of radicals are now available as index characters.
+* Time zone data (tzdata) version 2024b (2024-sep). Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream [tzdata](https://www.iana.org/time-zones) release since 2021b.
+    * The Asia/Almaty time zone has become an alias following IANA TZ database changes.
+    * CLDR added support for deprecated timezone codes by remapping:
+      CST6CDT → America/Chicago, EST → America/Panama, EST5EDT → America/New_York,
+      MST7MDT → America/Denver, PST8PDT → America/Los_Angeles
+      (These IANA TZ changes were motivated by CLDR, see
+      [CLDR-17111](https://unicode-org.atlassian.net/browse/CLDR-17111))
 
 ## ICU4C Specific Changes
 
-* [API changes since ICU4C 75 (Markdown)](https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.md) / [(HTML)](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.html)
-    * TODO: new stuff
-    * TODO: below is from 75
-    * MessageFormat 2.0 tech preview new API ([ICU-22261](https://unicode-org.atlassian.net/browse/ICU-22261))
-    * C: Require C11 (up from C99)
-    * C++: Require C++17 (up from C++11)
-    * Many changes for more robust string and buffer handling.
+* [API changes since ICU4C 75 (Markdown)](https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.md) / [(HTML)](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4c/APIChangeReport.html)
+    * A UnicodeString can now be converted to & from UTF-16 standard string_view types
+      (std::u16string_view, and on Windows to/from std::wstring_view)
+      and other UTF-16 types (string literals, standard string classes).
+      Several other member functions have been widened to accept standard UTF-16 types as well.
+      ([ICU-22843](https://unicode-org.atlassian.net/browse/ICU-22843))
+    * New APIs for colloquial iteration over the elements of a C++ UnicodeSet or a C USet. ([ICU-22876](https://unicode-org.atlassian.net/browse/ICU-22876))
+        * For details and an example, see the “C++ Header-Only APIs” section of the [Migration Issues](#migration-issues) below.
+    * New APIs for colloquial use of C++ Collator / C UCollator with
+      standard C++ algorithms (e.g., sort) & data structures (e.g., map).
+      ([ICU-22879](https://unicode-org.atlassian.net/browse/ICU-22879))
+      (The UCollator wrappers are also C++ header-only APIs.)
+    * Note: Some APIs were changed to accept a wider range of input types than before,
+      but in the API change report it looks as though the old, stable signatures were removed
+      and the wider signatures were added as “born stable”.
+      For example, several UnicodeString constructors that take a raw pointer
+      have been replaced with a signature that accepts such raw pointers as well as additional input types.
+    * Note: Similarly, the API change report appears to show removal+addition of
+      certain UnicodeString::remove() and UnicodeString::removeBetween() overloads,
+      but only the _expression_ of one of their default parameter values has changed.
+    * Many changes for more robust string and memory handling.
 
 ## ICU4J Specific Changes
 
-* [API Changes since ICU4J 75](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4j/APIChangeReport.html)
-    * TODO: new stuff
-    * TODO: below is from 75
-    * MessageFormat 2.0 tech preview update ([ICU-22690](https://unicode-org.atlassian.net/browse/ICU-22690))
-    * Performance (multi-threading / lock contention) improvement for BreakIterator.clone() and ULocale.getDefault(). ([ICU-22582](https://unicode-org.atlassian.net/browse/ICU-22582))
+* [API Changes since ICU4J 75](https://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-76/icu4j/APIChangeReport.html)
+    * Most of the java.time (Temporal) types can now be formatted directly
+      using the existing ICU4J date/time formatting classes. ([ICU-22853](https://unicode-org.atlassian.net/browse/ICU-22853))
+    * New APIs for colloquial iteration over the elements of a UnicodeSet.
+      In addition to the existing ranges(), strings(), and UnicodeSet-is-an-Iterable,
+      there is a new codePoints() (returns an Iterable),
+      and new methods that return Streams (e.g., codePointStream() & rangeStream()).
+      ([ICU-22845](https://unicode-org.atlassian.net/browse/ICU-22845))
 
 ## Known Issues
 
-* TODO: new stuff
-* TODO: below is from 75
-* [ICU-22729](https://unicode-org.atlassian.net/browse/ICU-22729) udatpg_getBestPattern requires exact skeleton match in ICU 76
-    * Due to a combination of an ICU bug fix and issues with CLDR availableFormats data, some skeletons in some languages yield inconsistent data/time formatting patterns.
+* None yet
 
 ## Migration Issues
 
-* See [CLDR 46 migration issues](https://github.com/unicode-org/cldr/blob/main/docs/site/downloads/cldr-46.md#migration)
-    * TODO: new stuff
-    * TODO: below is from 75
-    * ICU4C behavior for ill-formed locale IDs/language tags: uloc_getName(), uloc_getLanguage() and similar functions (and functions that rely on them) may fail with a U_ILLEGAL_ARGUMENT_ERROR when they used to fail only with a U_BUFFER_OVERFLOW_ERROR. (due to changes for [ICU-22520](https://unicode-org.atlassian.net/browse/ICU-22520))
-    * On Linux, the configure script now defaults to "cc" rather than preferring "clang". If you want to choose clang, then configure for "Linux/clang". ([ICU-22556](https://unicode-org.atlassian.net/browse/ICU-22556))
+### IDNA Default Option Changed to Nontransitional Processing
+After all major browsers switched to nontransitional processing,
+Unicode 15.1 (a year ago) changed the [UTS #46 spec](https://www.unicode.org/reports/tr46/#Processing)
+to declare transitional processing deprecated.
+
+ICU 76 changes the "DEFAULT" API constants from 0 to UIDNA_NONTRANSITIONAL_TO_ASCII | UIDNA_NONTRANSITIONAL_TO_UNICODE.
+
+ICU 76 does not change the behavior of using options value 0.
+(That would change the behavior of existing binaries linking with new ICU libraries.)
+However, when code that uses the DEFAULT constant is recompiled against a new version of ICU,
+it will pass these option flags into the factory method.
+
+* In C/C++: unicode/uidna.h [UIDNA_DEFAULT](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uidna_8h.html#a726ca809ffd3d67ab4b8476646f26635aa1eb63014cdaf41c7ea6cf3abecf1169)
+* In Java: IDNA.java [DEFAULT](https://unicode-org.github.io/icu-docs/apidoc/dev/icu4j/com/ibm/icu/text/IDNA.html#DEFAULT)
+
+See [ICU-22294](https://unicode-org.atlassian.net/browse/ICU-22294)
+
+### SimpleNumber::truncateStart() Removed
+ICU 75 renamed the still-draft SimpleNumber::truncateStart() to setMaximumIntegerDigits().
+ICU 76 removes the never-stable, original function.
+Same for the C API usnum_truncateStart().
+([ICU-22900](https://unicode-org.atlassian.net/browse/ICU-22900))
+
+### C++ Header-Only APIs
+ICU 76 is the first version to add what we call C++ header-only APIs.
+These are especially intended for users who rely only on binary-stable DLL/library exports of C APIs
+(C++ APIs cannot be binary stable).
+
+_Please test these new APIs and let us know if you find problems —
+especially if you find a platform/compiler/options combination
+where the call site does end up calling into ICU DLL/library exports._
+
+Remember that regular C++ APIs can be hidden by callers defining `U_SHOW_CPLUSPLUS_API=0`.
+The new header-only APIs can be separately enabled via `U_SHOW_CPLUSPLUS_HEADER_API=1`.
+
+([GitHub query for `U_SHOW_CPLUSPLUS_HEADER_API` in public header files](https://github.com/search?q=repo%3Aunicode-org%2Ficu+U_SHOW_CPLUSPLUS_HEADER_API+path%3Aunicode%2F*.h&type=code))
+
+These are C++ definitions that are not exported by the ICU DLLs/libraries,
+are thus inlined into the calling code,
+and may call ICU C APIs but not ICU non-header-only C++ APIs.
+
+The header-only APIs are defined in a nested `header` namespace.
+If entry point renaming is turned off (the main namespace is `icu` rather than `icu_76` etc.),
+then the new `U_HEADER_ONLY_NAMESPACE` is `icu::header`.
+
+([Link to the API proposal which introduced this concept](https://docs.google.com/document/d/1xERVccTYsptzjfbjcj6HDtoKVF_mEKmslPsOiQzzaFg/view#heading=h.cf4bmhjgozry))
+
+For example, for iterating over the code point ranges in a `USet` (excluding the strings):
+
+```c++
+U_NAMESPACE_USE
+using U_HEADER_NESTED_NAMESPACE::USetRanges;
+LocalUSetPointer uset(uset_openPattern(u"[abcçカ🚴]", -1, &errorCode));
+for (auto [start, end] : USetRanges(uset.getAlias())) {
+    printf("uset.range U+%04lx..U+%04lx\n", (long)start, (long)end);
+}
+for (auto range : USetRanges(uset.getAlias())) {
+    for (UChar32 c : range) {
+        printf("uset.range.c U+%04lx\n", (long)c);
+    }
+}
+```
+
+(Implementation note: On most platforms, when compiling ICU itself,
+the `U_HEADER_ONLY_NAMESPACE` is `icu::internal`,
+so that any such symbols that get exported differ from the ones that calling code sees.
+On Windows, where DLL exports are explicit,
+the namespace is always the same, but these header-only APIs are not marked for export.)
+
+### Migration Issues Related to CLDR
+* See [CLDR 46 migration issues](https://cldr.unicode.org/downloads/cldr-46#migration)
 
 ## ICU4C Platform Support
 
@@ -97,27 +228,30 @@ We routinely test on recent versions of Linux, macOS, and Windows.
 
 We accept patches for other platforms.
 
+For ICU 76, we have received a contribution to make ICU4C work again on z/OS,
+using a newer (clang-based) compiler. ([ICU-22714](https://unicode-org.atlassian.net/browse/ICU-22714) [icu/pull/3008](https://github.com/unicode-org/icu/pull/3008) + [ICU-22916](https://unicode-org.atlassian.net/browse/ICU-22916) [icu/pull/3208](https://github.com/unicode-org/icu/pull/3208))
+
 Windows: The minimum supported version is Windows 7. (See [How To Build And Install On Windows](../userguide/icu4c/build.html#how-to-build-and-install-on-windows) for more details.)
 
 ## ICU4J Platform Support
 
-ICU4J works on Java 8..17 (at least).
+ICU4J works on Java 8..21 (at least).
 ICU4J should work on Android API level 21 and later but may require “[library desugaring](https://developer.android.com/studio/write/java8-support#library-desugaring)”.
 
 ## Download
 
-Source and binary downloads are available on the git/GitHub tag page: TODO: https://github.com/unicode-org/icu/releases/tag/release-76-1
+Source and binary downloads are available on the git/GitHub tag page: https://github.com/unicode-org/icu/releases/tag/release-76-rc
 
 See the [Source Code Setup](../devsetup/source/) page for how to download the ICU file tree directly from GitHub.
 
 ICU locale data was generated from CLDR data equivalent to:
-* TODO: fix/update
-* https://github.com/unicode-org/cldr/releases/tag/release-46-beta4
-* https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta4
+* https://github.com/unicode-org/cldr/releases/tag/release-46-beta3
+* https://github.com/unicode-org/cldr-staging/releases/tag/release-46-beta3
 
-TODO: Maven dependency:
+[Maven dependency](https://central.sonatype.com/artifact/com.ibm.icu/icu4j):
+TODO
 ```
 <dependency>
   <groupId>com.ibm.icu</groupId>