CLDR-16953 add kaa_Latn and kaa_Cyrl locales #3657

emily-roth · 2024-04-26T16:36:17Z

This PR completes the ticket.

ALLOW_MANY_COMMITS=true

srl295 · 2024-04-26T18:44:26Z

@emily-roth the 'merge commit' has the wrong commit messages. Click on https://us-central1-dev-infra-273822.cloudfunctions.net/unicode-github-bot/info/unicode-org/cldr/3657 (it's the 'jira ticket' check that fails) and squash it into one commit with a corrected message

jira-pull-request-webhook · 2024-04-26T19:07:27Z

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

emily-roth · 2024-04-26T20:40:02Z

@srl295 do you know why this is failing? I'm stumped.

DavidLRowe · 2024-04-29T19:20:36Z

In the country_language_population.tsv file, in the Afghanistan entry, a tab got changed to a space.
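A slip like this can be caught mechanically by comparing each row's field count against the header's. The following is only a minimal sketch, not part of the CLDR tooling: the function name `find_ragged_rows`, the assumption that the first non-comment line is a header, and the assumption that comment lines start with `#` are all illustrative, and the sample rows are made up.

```python
def find_ragged_rows(lines, delimiter="\t"):
    """Return (line_number, field_count) for rows whose field count
    differs from that of the first non-blank, non-comment line."""
    expected = None
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Assumption: blank lines and lines starting with '#' carry no data.
        if not stripped or stripped.startswith("#"):
            continue
        count = len(stripped.split(delimiter))
        if expected is None:
            # Assumption: the first data-bearing line is the header.
            expected = count
            continue
        if count != expected:
            problems.append((number, count))
    return problems


# Made-up rows; the third one has a space where a tab should be.
sample = [
    "name\tcode\tlanguage\n",
    "Exampleland\tXX\tfoo\n",
    "Exampleland XX\tbar\n",
]
problems = find_ragged_rows(sample)
```

Here `problems` holds one entry for line 3, which splits into two fields instead of three. To check a real file, pass an open file handle as `lines`.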

emily-roth · 2024-04-29T19:21:52Z

oops, thanks

jira-pull-request-webhook · 2024-04-30T15:32:47Z

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

srl295 · 2024-05-06T17:16:22Z

@emily-roth see CLDR-17608 - added some test fixes and fixed merge conflict

srl295 · 2024-05-06T18:02:36Z

tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestExampleGenerator.java

-                "[qw-yàâ-èê-ìîïñòôøùûÿāăēĕīĭōŏœūŭ]",
-                "〖‎🗝️ · ắ ằ ẵ ẳ ấ ầ ẫ ẩ ǎ a̧ ą ą́ a᷆ a᷇ ả ạ ặ ậ a̱ aː áː àː ɓ ć ĉ č ċ ď ḑ đ ḍ ḓ ð ɖ ɗ ế ề ễ ể ě ẽ ė ę ę́ e᷆ e᷇ ẻ ẹ ẹ́ ẹ̀ ệ e̱ eː éː èː ǝ ǝ́ ǝ̀ ǝ̂ ǝ̌ ǝ̄ ə ə́ ə̀ ə̂ ə̌ ə̄ ɛ ɛ́ ɛ̀ ɛ̂ ɛ̌ ɛ̈ ɛ̃ ɛ̧ ɛ̄ ɛ᷆ ɛ᷇ ɛ̱ ɛ̱̈ ƒ ğ ĝ ǧ g̃ ġ ģ g̱ gʷ ǥ ɣ ĥ ȟ ħ ḥ ʻ ǐ ĩ İ i̧ į į́ i᷆ i᷇ ỉ ị i̱ iː íː ìː íj́ ı ɨ ɨ́ ɨ̀ ɨ̂ ɨ̌ ɨ̄ ɩ ɩ́ ɩ̀ ɩ̂ ĵ ǩ ķ ḵ kʷ ƙ ĺ ľ ļ ł ḷ ḽ ḻ ḿ m̀ m̄ ń ǹ ň ṅ ņ n̄ ṇ ṋ ṉ ɲ ŋ ŋ́ ŋ̀ ŋ̄ ố ồ ỗ ổ ǒ õ ǫ ǫ́ o᷆ o᷇ ỏ ơ ớ ờ ỡ ở ợ ọ ọ́ ọ̀ ộ o̱ oː óː òː ɔ ɔ́ ɔ̀ ɔ̂ ɔ̌ ɔ̈ ɔ̃ ɔ̧ ɔ̄ ɔ᷆ ɔ᷇ ɔ̱ ŕ ř ŗ ṛ ś ŝ š ş ṣ ș ß ť ṭ ț ṱ ṯ ŧ ǔ ů ũ u̧ ų u᷆ u᷇ ủ ư ứ ừ ữ ử ự ụ uː úː ùː ʉ ʉ́ ʉ̀ ʉ̂ ʉ̌ ʉ̈ ʉ̄ ʊ ʊ́ ʊ̀ ʊ̂ ṽ ʋ ẃ ẁ ŵ ẅ ý ỳ ŷ ỹ ỷ ỵ y̱ ƴ ź ž ż ẓ ʒ ǯ þ ʔ ˀ ʼ ꞌ ǀ ǁ ǂ ǃ〗〖❬internal: ❭[qw-yàâ-èê-ìîïñòôøùûÿāăēĕīĭōŏœūŭ]〗"
-            },
+            // TODO: This test is too fragile. Commented out for discussion in CLDR-17608


@macchiati Can you review my change here? This test would break every time ScriptExemplars changes...

…standard out noise (#3965)

I started this ticket because I was seeing a lot of noisy warnings and errors in the regular tests -- I ended up in a rabbit hole with the generated population data. This change updates the data inputs and fixes errors in the scripts so we can regenerate population data in a stable way now.

### Scripts ran:

* mvn package -DskipTests=true
* Re-ran these scripts, they need to be run more regularly, some changes happen
  * java -jar tools/cldr-code/target/cldr-code.jar AddPopulationData # Runs successfully now, some changes happen
  * java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData # Runs successfully now, some changes happen
* These scripts do not run
  * java -jar tools/cldr-code/target/cldr-code.jar WikipediaOfficialLanguages
  * java -jar tools/cldr-code/target/cldr-code.jar GenerateMaximalLocales
* Running tests on github (I still cannot locally run all of the tests)

#### Script output changes

A lot of the script Standard Out messages mentioned in the original ticket are now fixed and will not appear -- mostly from fixing input data sources and a few processing scripts. If there are legitimate errors in the future the warnings and errors will appropriately come back.

* Suriname had 2 un-distinguished sources of literacy data, this will now take the max value of the two
  * one was the overall number
  * the other had filtered institutional data
* Since the aggregate regions from `world_bank_data.csv` are now gone, there are no more warnings about aggregates without country codes, eg. "Sub-Saharan Africa (all income levels)"

### Data changed:

* `country_language_population.tsv`
  * Fixed some areas where spaces were used that should be tabs -- this affected how scripts parsed Kara-Kalpak, bug introduced in #3657
  * Added `Cantonese (Traditional) yue` row otherwise `yue` would disappear in the re-generated `supplementalData.xml` -- introduced in #3945
* `factbook_gdp_ppp.csv` & `factbook_gdp_ppp.csv`: CIA Factbook data updated and imported using the csv that's exported by the [CIA's website](https://www.cia.gov/the-world-factbook/references/guide-to-country-comparisons/) -- see also [the old CLDR update documentation](https://cldr.unicode.org/development/updating-codes/updating-population-gdp-literacy).
  * This will update all population counts in `supplementalData.xml`
  * Some stale data was removed from the Factbook, so I added missing entries to `other_country_data.txt`
* `other_country_data.txt`: Added information that used to be in earlier versions of the CIA Factbook
* `world_bank_data.csv`: Re-generated from [the World Bank Website](https://databank.worldbank.org/reports.aspx?source=world-development-indicators#). See also [the old CLDR update documentation](https://cldr.unicode.org/development/updating-codes/updating-population-gdp-literacy).
  * A big difference is that I correctly read the instructions and did not import the country aggregates, eg. "Sub-Saharan Africa (all income levels)"
* `alternate_country_names.txt`: Removed no longer needed skipped names since we no longer import CIA Factbook aggregates

### Consequences for `supplementalData.xml`

* **Official Languages**: `<language>` territories tag should be the territories where the language is **official** -- so some entries updated. For instance Mocheno was incorrectly considered an official language of Italy in #3665
* **Population counts** are incremented, so some **language population percentages** may increase or decrease if the input data is absolute value (since the denominator changed)
* **GDPs** also changed
* **Literacy Rates** some have changes
  * Note there was a wonderful bug where the UN literacy data was mis-parsed, so "96%" would be mis-read as "0.96%" -- I fixed that
* **References**: The two Kara-Kalpak references are now grouped correctly, Chinese reference has been given more context too
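The two literacy fixes mentioned above (reading "96%" as 96 percent rather than 0.96 percent, and taking the max when a country has two sources) can be illustrated with a small sketch. This is not the CLDR code, which is Java; the function names `parse_percent` and `merge_literacy`, the input shape, and the sample values are all assumptions made for illustration.

```python
def parse_percent(text):
    """Parse a string such as "96%" or "96" into the percentage 96.0.

    The value is kept on a 0-100 scale, so it is never divided by 100
    a second time further down the line.
    """
    cleaned = text.strip()
    if cleaned.endswith("%"):
        cleaned = cleaned[:-1].strip()
    value = float(cleaned)
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"percentage out of range: {text!r}")
    return value


def merge_literacy(rows):
    """Collapse (country_code, percent_text) pairs to one value per country.

    Assumption: when a country appears more than once, the larger value wins.
    """
    merged = {}
    for code, text in rows:
        value = parse_percent(text)
        merged[code] = max(value, merged.get(code, value))
    return merged


# Hypothetical values, not real literacy figures.
rates = merge_literacy([("XX", "90%"), ("XX", "95%"), ("YY", "80")])
```

With these made-up inputs, `rates` maps `XX` to 95.0 and `YY` to 80.0. Raising on out-of-range values is a design choice for the sketch, so that a scale mix-up fails loudly instead of producing plausible-looking data.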

emily-roth added a commit to emily-roth/cldr that referenced this pull request Apr 26, 2024

CLDR-16953 add kaa_Latn and kaa_Cyrl locales

ec461c0

See unicode-org#3657

emily-roth force-pushed the kaa branch from 44c2849 to ec461c0 Compare April 26, 2024 19:07

CLDR-16953 add kaa_Latn and kaa_Cyrl locales

820ebbf

See unicode-org#3657

emily-roth force-pushed the kaa branch from 10192a7 to 820ebbf Compare April 30, 2024 15:32

srl295 added 2 commits May 6, 2024 12:10

CLDR-16953 comment out a fragile exemplar test

5d4de79

See CLDR-17608

CLDR-16953 merge from main

c23b8c9

srl295 requested review from srl295, macchiati and DavidLRowe May 6, 2024 18:01

srl295 approved these changes May 6, 2024

srl295 reviewed May 6, 2024

emily-roth merged commit 5881198 into unicode-org:main May 6, 2024
10 checks passed

srl295 assigned emily-roth May 6, 2024

conradarcturus mentioned this pull request Aug 15, 2024

CLDR-17884 Regenerate AddPopulationData, ConvertLanguageData, reduce standard out noise #3965

Merged

5 tasks

Redwoodtj mentioned this pull request Oct 13, 2024

[Snyk] Security upgrade swagger-client from 3.18.5 to 3.29.4 Redwoodtj/cldr#26

Open
