CLDR-16953 add kaa_Latn and kaa_Cyrl locales #3657

Merged: 3 commits, May 6, 2024

Conversation

emily-roth
Contributor

CLDR-16953

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

@srl295
Member

srl295 commented Apr 26, 2024

@emily-roth the 'merge commit' has the wrong commit message. See https://us-central1-dev-infra-273822.cloudfunctions.net/unicode-github-bot/info/unicode-org/cldr/3657 (it's the 'jira ticket' check that fails), then squash everything into one commit with a corrected message.

emily-roth added a commit to emily-roth/cldr that referenced this pull request Apr 26, 2024
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@emily-roth
Contributor Author

@srl295 do you know why this is failing? I'm stumped.

@DavidLRowe
Contributor

In the `country_language_population.tsv` file, in the Afghanistan entry, a tab was changed to a space.

@emily-roth
Contributor Author

emily-roth commented Apr 29, 2024 via email

@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@srl295
Member

srl295 commented May 6, 2024

@emily-roth see CLDR-17608 - added some test fixes and fixed merge conflict

@srl295 srl295 requested review from srl295, macchiati and DavidLRowe May 6, 2024 18:01
"[qw-yàâ-èê-ìîïñòôøùûÿāăēĕīĭōŏœūŭ]",
"〖‎🗝️ · ắ ằ ẵ ẳ ấ ầ ẫ ẩ ǎ a̧ ą ą́ a᷆ a᷇ ả ạ ặ ậ a̱ aː áː àː ɓ ć ĉ č ċ ď ḑ đ ḍ ḓ ð ɖ ɗ ế ề ễ ể ě ẽ ė ę ę́ e᷆ e᷇ ẻ ẹ ẹ́ ẹ̀ ệ e̱ eː éː èː ǝ ǝ́ ǝ̀ ǝ̂ ǝ̌ ǝ̄ ə ə́ ə̀ ə̂ ə̌ ə̄ ɛ ɛ́ ɛ̀ ɛ̂ ɛ̌ ɛ̈ ɛ̃ ɛ̧ ɛ̄ ɛ᷆ ɛ᷇ ɛ̱ ɛ̱̈ ƒ ğ ĝ ǧ g̃ ġ ģ g̱ gʷ ǥ ɣ ĥ ȟ ħ ḥ ʻ ǐ ĩ İ i̧ į į́ i᷆ i᷇ ỉ ị i̱ iː íː ìː íj́ ı ɨ ɨ́ ɨ̀ ɨ̂ ɨ̌ ɨ̄ ɩ ɩ́ ɩ̀ ɩ̂ ĵ ǩ ķ ḵ kʷ ƙ ĺ ľ ļ ł ḷ ḽ ḻ ḿ m̀ m̄ ń ǹ ň ṅ ņ n̄ ṇ ṋ ṉ ɲ ŋ ŋ́ ŋ̀ ŋ̄ ố ồ ỗ ổ ǒ õ ǫ ǫ́ o᷆ o᷇ ỏ ơ ớ ờ ỡ ở ợ ọ ọ́ ọ̀ ộ o̱ oː óː òː ɔ ɔ́ ɔ̀ ɔ̂ ɔ̌ ɔ̈ ɔ̃ ɔ̧ ɔ̄ ɔ᷆ ɔ᷇ ɔ̱ ŕ ř ŗ ṛ ś ŝ š ş ṣ ș ß ť ṭ ț ṱ ṯ ŧ ǔ ů ũ u̧ ų u᷆ u᷇ ủ ư ứ ừ ữ ử ự ụ uː úː ùː ʉ ʉ́ ʉ̀ ʉ̂ ʉ̌ ʉ̈ ʉ̄ ʊ ʊ́ ʊ̀ ʊ̂ ṽ ʋ ẃ ẁ ŵ ẅ ý ỳ ŷ ỹ ỷ ỵ y̱ ƴ ź ž ż ẓ ʒ ǯ þ ʔ ˀ ʼ ꞌ ǀ ǁ ǂ ǃ〗〖❬internal: ❭[qw-yàâ-èê-ìîïñòôøùûÿāăēĕīĭōŏœūŭ]〗"
},
// TODO: This test is too fragile. Commented out for discussion in CLDR-17608
Member

@macchiati Can you review my change here? This test would break every time ScriptExemplars changes...

@emily-roth emily-roth merged commit 5881198 into unicode-org:main May 6, 2024
10 checks passed
conradarcturus added a commit that referenced this pull request Aug 16, 2024
…standard out noise (#3965)

I started this ticket because I was seeing a lot of noisy warnings and errors in the regular tests -- I ended up in a rabbit hole with the generated population data. This change updates the data inputs and fixes errors in the scripts so we can regenerate population data in a stable way now.

### Scripts run:
* mvn package -DskipTests=true
* Re-ran these scripts, which need to be run more regularly; both produce some changes
  * java -jar tools/cldr-code/target/cldr-code.jar AddPopulationData # Runs successfully now, some changes happen
  * java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData # Runs successfully now, some changes happen
* These scripts do not run
  * java -jar tools/cldr-code/target/cldr-code.jar WikipediaOfficialLanguages
  * java -jar tools/cldr-code/target/cldr-code.jar GenerateMaximalLocales
* Ran the tests on GitHub (I still cannot run all of the tests locally)

#### Script output changes
Many of the scripts' standard-output messages mentioned in the original ticket no longer appear, mostly because the input data sources and a few processing scripts were fixed. If there are legitimate problems in the future, the warnings and errors will come back as appropriate.

* Suriname had two undistinguished sources of literacy data; the scripts now take the maximum of the two values
  * one was the overall number
  * the other had filtered institutional data
* Since the aggregate regions from `world_bank_data.csv` are now gone, there are no more warnings about aggregates without country codes, eg. "Sub-Saharan Africa (all income levels)"
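
The "take the max value" rule for duplicate literacy rows can be sketched as follows. This is only an illustration under stated assumptions: the class `LiteracyMergeSketch`, the `Row` type, and the method `mergeByMax` are hypothetical names, not part of the CLDR tools, and any numbers passed to it in an example would be made up rather than real literacy figures.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not CLDR code: collapse duplicate literacy rows per country.
final class LiteracyMergeSketch {

    /** One literacy row: a country code and a literacy percentage (assumed shape). */
    static final class Row {
        final String countryCode;
        final double literacyPercent;

        Row(String countryCode, double literacyPercent) {
            this.countryCode = countryCode;
            this.literacyPercent = literacyPercent;
        }
    }

    /** Keeps one value per country code; when a code repeats, the larger value wins. */
    static Map<String, Double> mergeByMax(List<Row> rows) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Row row : rows) {
            // Map.merge stores the value if the key is new, else applies Double::max.
            result.merge(row.countryCode, row.literacyPercent, Double::max);
        }
        return result;
    }
}
```

Taking the maximum is a simple tie-break when the two sources cannot be told apart; it does not decide which source is the more appropriate one.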

### Data changed:
* `country_language_population.tsv`
  * Fixed some places where spaces were used instead of tabs -- this affected how scripts parsed Kara-Kalpak, a bug introduced in #3657
  * Added `Cantonese (Traditional)	yue` row otherwise `yue` would disappear in the re-generated `supplementalData.xml` -- introduced in #3945 
* `factbook_gdp_ppp.csv` & `factbook_gdp_ppp.csv`: CIA Factbook data updated and imported using the csv that's exported by the [CIA's website](https://www.cia.gov/the-world-factbook/references/guide-to-country-comparisons/) -- see also [the old CLDR update documentation](https://cldr.unicode.org/development/updating-codes/updating-population-gdp-literacy).
   * This will update all population counts in `supplementalData.xml`
   * Some stale data was removed from the Factbook, so I added the missing entries to `other_country_data.txt`
* `other_country_data.txt`: Added information that used to be in earlier versions of the CIA Factbook
* `world_bank_data.csv`: Re-generated from [the World Bank Website](https://databank.worldbank.org/reports.aspx?source=world-development-indicators#). See also [the old CLDR update documentation](https://cldr.unicode.org/development/updating-codes/updating-population-gdp-literacy).
  * A big difference is that this time I read the instructions correctly and did not import the country aggregates, eg. "Sub-Saharan Africa (all income levels)"
* `alternate_country_names.txt`: Removed no longer needed skipped names since we no longer import CIA Factbook aggregates
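
A space that replaces a tab in a TSV file shows up as a row with the wrong number of columns, so a small pre-check can catch it before the scripts run. The following is a minimal sketch, assuming every non-blank line should have the same number of tab-separated fields; `TsvColumnCheckSketch` and `rowsWithUnexpectedColumnCount` are hypothetical names, and the actual CLDR parsers may validate rows differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not CLDR code: flag TSV rows whose column count is off.
final class TsvColumnCheckSketch {

    /** Returns the 1-based numbers of non-blank lines whose tab-separated field count differs. */
    static List<Integer> rowsWithUnexpectedColumnCount(List<String> lines, int expectedColumns) {
        List<Integer> badLines = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // blank lines are ignored in this sketch
            }
            // A limit of -1 keeps trailing empty fields, so "a\tb\t" counts as three columns.
            int columns = line.split("\t", -1).length;
            if (columns != expectedColumns) {
                badLines.add(i + 1);
            }
        }
        return badLines;
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines`, with `expectedColumns` taken from the header row.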

### Consequences for `supplementalData.xml`
* **Official Languages**: the `<language>` territories tag should list the territories where the language is **official** -- so some entries were updated. For instance, Mocheno was incorrectly considered an official language of Italy in #3665
* **Population counts** are incremented, so some **language population percentages** may increase or decrease if the input data is an absolute value (since the denominator changed)
* **GDPs** also changed
* **Literacy Rates**: some have changed
  * Note there was a wonderful bug where the UN literacy data was mis-parsed, so "96%" would be mis-read as "0.96%" -- I fixed that
* **References**: The two Kara-Kalpak references are now grouped correctly, and the Chinese reference has been given more context too
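
The literacy mis-parse mentioned above ("96%" read as "0.96%") is the kind of error that appears when a value already expressed as a percentage is scaled a second time. As a hedged illustration only, since the actual fix in the CLDR scripts is not shown here, a parser can strip the percent sign and keep the number as a percentage. `PercentParseSketch` and `parsePercent` are hypothetical names.

```java
// Hypothetical sketch, not CLDR code: read "96%" as 96.0 percent, with no extra scaling.
final class PercentParseSketch {

    /** Parses text such as "96%" or "96" and returns the value in percent. */
    static double parsePercent(String text) {
        String trimmed = text.trim();
        if (trimmed.endsWith("%")) {
            // Drop the sign only; the number is already a percentage, so do not divide by 100.
            trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
        }
        return Double.parseDouble(trimmed);
    }
}
```

Keeping a single unit (percent) from parsing through to output avoids the double conversion; any division by 100 would then happen in exactly one place, if at all.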
3 participants