CLDR-17430 add mhn Mocheno locale #3665

DavidLRowe · 2024-05-01T01:43:30Z

This PR completes the ticket.

ALLOW_MANY_COMMITS=true

jira-pull-request-webhook · 2024-05-01T22:21:18Z

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

…standard out noise (#3965)

I started this ticket because I was seeing a lot of noisy warnings and errors in the regular tests -- I ended up in a rabbit hole with the generated population data. This change updates the data inputs and fixes errors in the scripts so we can regenerate population data in a stable way now.

### Scripts ran:

* mvn package -DskipTests=true
* Re-ran these scripts, they need to be run more regularly, some changes happen
  * java -jar tools/cldr-code/target/cldr-code.jar AddPopulationData # Runs successfully now, some changes happen
  * java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData # Runs successfully now, some changes happen
* These scripts do not run
  * java -jar tools/cldr-code/target/cldr-code.jar WikipediaOfficialLanguages
  * java -jar tools/cldr-code/target/cldr-code.jar GenerateMaximalLocales
* Running tests on github (I still cannot locally run all of the tests)

#### Script output changes

A lot of the script Standard Out messages mentioned in the original ticket are now fixed and will not appear -- mostly from fixing input data sources and a few processing scripts. If there are legitimate errors in the future, the warnings and errors will appropriately come back.

* Suriname had 2 un-distinguished sources of literacy data, this will now take the max value of the two
  * one was the overall number
  * the other had filtered institutional data
* Since the aggregate regions from `world_bank_data.csv` are now gone, there are no more warnings about aggregates without country codes, e.g. "Sub-Saharan Africa (all income levels)"

### Data changed:

* `country_language_population.tsv`
  * Fixed some areas where spaces were used that should be tabs -- this affected how scripts parsed Kara-Kalpak, bug introduced in #3657
  * Added `Cantonese (Traditional) yue` row, otherwise `yue` would disappear in the re-generated `supplementalData.xml` -- introduced in #3945
* `factbook_gdp_ppp.csv` & `factbook_gdp_ppp.csv`: CIA Factbook data updated and imported using the csv that's exported by the [CIA's website](https://www.cia.gov/the-world-factbook/references/guide-to-country-comparisons/) -- see also [the old CLDR update documentation](https://cldr.unicode.org/development/updating-codes/updating-population-gdp-literacy).
  * This will update all population counts in `supplementalData.xml`
  * Some stale data was removed from the Factbook, so I added the missing entries to `other_country_data.txt`
* `other_country_data.txt`: Added information that used to be in earlier versions of the CIA Factbook
* `world_bank_data.csv`: Re-generated from [the World Bank Website](https://databank.worldbank.org/reports.aspx?source=world-development-indicators#). See also [the old CLDR update documentation](https://cldr.unicode.org/development/updating-codes/updating-population-gdp-literacy).
  * A big difference is that I correctly read the instructions and did not import the country aggregates, e.g. "Sub-Saharan Africa (all income levels)"
* `alternate_country_names.txt`: Removed skipped names that are no longer needed, since we no longer import CIA Factbook aggregates

### Consequences for `supplementalData.xml`

* **Official Languages**: the `<language>` territories tag should list the territories where the language is **official** -- so some entries were updated. For instance, Mocheno was incorrectly considered an official language of Italy in #3665
* **Population counts** are incremented, so some **language population percentages** may increase or decrease if the input data is an absolute value (since the denominator changed)
* **GDPs** also changed
* **Literacy Rates**: some have changed
  * Note there was a wonderful bug where the UN literacy data was mis-parsed, so "96%" would be mis-read as "0.96%" -- I fixed that
* **References**: The two Kara-Kalpak references are now grouped correctly, and the Chinese reference has been given more context too
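
The two literacy fixes mentioned above (reading "96%" as 96 percent rather than 0.96 percent, and taking the max when a territory has two literacy sources) could look roughly like the sketch below. This is only an illustration under assumptions: the class `LiteracySketch`, the methods `parsePercent` and `mergeByMax`, the territory codes `XX` and `YY`, and all sample values are hypothetical and are not taken from the CLDR tools or their data files. The actual cause of the mis-parse in the CLDR scripts is not described here, so this should not be read as the real fix.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the CLDR implementation.
public class LiteracySketch {

    /** Parses a value such as "96%" or "96" into a percentage between 0 and 100. */
    static double parsePercent(String raw) {
        String s = raw.trim();
        if (s.endsWith("%")) {
            // Drop the percent sign only; the number is already a percentage,
            // so it must not be divided by 100 again.
            s = s.substring(0, s.length() - 1).trim();
        }
        double value = Double.parseDouble(s);
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("Percentage out of range: " + raw);
        }
        return value;
    }

    /**
     * Merges rows of {territoryCode, literacyValue}. When a territory appears
     * more than once, the larger value is kept.
     */
    static Map<String, Double> mergeByMax(List<String[]> rows) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String[] row : rows) {
            result.merge(row[0], parsePercent(row[1]), Math::max);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows: "XX" has two undistinguished sources.
        List<String[]> rows = List.of(
            new String[] {"XX", "90%"},
            new String[] {"XX", "96%"},
            new String[] {"YY", "75.5"});
        System.out.println(mergeByMax(rows)); // prints {XX=96.0, YY=75.5}
    }
}
```

Taking the maximum is the policy the description states for the duplicated Suriname rows. Whether a max, an average, or an explicit source preference is the better rule in general is a data-quality decision that this sketch does not settle.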

DavidLRowe requested a review from srl295 May 1, 2024 04:09

srl295 approved these changes May 1, 2024

CLDR-17430 add mhn Mocheno locale

52ae499

See unicode-org#3665

DavidLRowe force-pushed the CLDR-17430 branch from 54e9e77 to 52ae499 Compare May 1, 2024 22:21

DavidLRowe merged commit b246d3c into unicode-org:main May 1, 2024
10 checks passed

DavidLRowe deleted the CLDR-17430 branch May 1, 2024 22:54

macchiati mentioned this pull request May 2, 2024

CLDR-17600 BRS 7 Update language names #3666

Closed

1 task

conradarcturus mentioned this pull request Aug 15, 2024

CLDR-17884 Regenerate AddPopulationData, ConvertLanguageData, reduce standard out noise #3965

Merged

5 tasks

Redwoodtj mentioned this pull request Oct 13, 2024

[Snyk] Security upgrade swagger-client from 3.18.5 to 3.29.4 Redwoodtj/cldr#26

Open
