Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22917 Generating / updating the units data is a very clunky and manual process #3262

Merged
merged 2 commits into from
Nov 8, 2024

Conversation

mihnita
Copy link
Contributor

@mihnita mihnita commented Nov 1, 2024

Before these changes the measurement unit code updates were generated from a unit test
to standard output, and had to be enabled / disabled one by one.

Although not 100% automated, after this change the tests are enabled by a flag,
and the result go to some temporary files that can be opened in editor, diff-ed, etc.

Something like an 80% improvement with a 20% effort.

Doing more becomes tricky.
In some cases the old code should be changed, or not.
In others new methods get added to files, if they don't already exist.
Or replace the old method if different.
And that would require parsing Java and C++ files to determine what is there, and how much to delete.

Checklist

  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22917
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

ALLOW_MANY_COMMITS=true

@mihnita
Copy link
Contributor Author

mihnita commented Nov 1, 2024

TBD / Asking for opinions:

  • There are two versions that need to be modified in MeasureUnitGeneratorTest
    It is unclear to me what they represent. Would need help from someone familiar with the code.
    Can I take them automatically from VersionInfo.ICU_VERSION.getMajor ?
    Or from command line?
  • The generated files go in main/common_tests/ and have names stating with genunits_*
    They are easy to discover, as git status shows.
    But there is a risk that they get in the way, or get submitted.
    I can put them in the target folder, where they get cleaned automatically, and also become "invisible" to git. With pros and cons.
  • Is it worth the trouble to try and modify the target files directly?
    Would require parsing the existing files to find what to replace.

@mihnita
Copy link
Contributor Author

mihnita commented Nov 1, 2024

Hi Dragan,

I would especially appreciate your input, as you had to do this several times recently.
So you know the tricks, and also felt the pain.

Even better if you can spend 10 minutes or so to try it.

Separately, see my questions in the previous comment.
Do you know what is the meaning of the versions passed to the generators?
I find it suspicious that generateBackwardCompatibilityTest and generateCXXBackwardCompatibilityTest, which do the same thing for Java / C++, get different version numbers right now.

Thank you,
Mihai

@DraganBesevic
Copy link
Contributor

DraganBesevic commented Nov 2, 2024

Hi Dragan,

I would especially appreciate your input, as you had to do this several times recently. So you know the tricks, and also felt the pain.

Even better if you can spend 10 minutes or so to try it.

Separately, see my questions in the previous comment. Do you know what is the meaning of the versions passed to the generators? I find it suspicious that generateBackwardCompatibilityTest and generateCXXBackwardCompatibilityTest, which do the same thing for Java / C++, get different version numbers right now.

Thank you, Mihai

First, thank you very much for doing this. Any improvement in this process of updating units is very welcomed! I wish I could have better answers, but my experience with this code is mostly based on the latest digging and trying to figure out what it actually does and how it works.

Having said that, I am pretty sure that those two versions should be the same. In the previous workflow, there was a lot of manual updating of those versions values before running the unit test to get the proper output, so it is very possible that I didn't update it with 76 because I had that last minute battle to get the latest units in.

How it worked out was, one had to manually uncomment the line targeted for generating a particular code, update the version, then comment it back. Rinse and repeat for several lines.

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR, when it's all done, please do not squash it down to one commit. IMO the first commit should be separate. Other commits should probably be squashed though.

@mihnita
Copy link
Contributor Author

mihnita commented Nov 4, 2024

For this PR, when it's all done, please do not squash it down to one commit. IMO the first commit should be separate. Other commits should probably be squashed though.

ACK. I thought so too.

@mihnita
Copy link
Contributor Author

mihnita commented Nov 4, 2024

Implemented (almost all) review feedback.
Explained why not when I didn't.

@mihnita mihnita requested a review from markusicu November 4, 2024 21:28
@mihnita
Copy link
Contributor Author

mihnita commented Nov 4, 2024

About version: I took git-clone(d) the repo, then did a git checkout on each release-* tag starting with release-60, all the way to release-76-1. Including RC and betas and what not.
Did a grep for all the calls on the generators (the call, inside testZZZ)

And in most of them the version in the Java file matches the ICU version (in the git tag).
In some the version in the code is one step back. I guess that the file was not commited.

In all cases (60-76) all generators were called with the same version, except ICU 76.
So it was probably a case of "not submitted".

So I changed the code to get the ICU version automatically.
No need to update it by hand anymore.

@mihnita
Copy link
Contributor Author

mihnita commented Nov 4, 2024

Any opinions about where to put the generated files?

Right now they go in main/common_tests/

So if you do a git status you will see:

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	main/common_tests/genunits_MeasureUnit.java
	main/common_tests/genunits_MeasureUnitCompatibilityTest.java
	main/common_tests/genunits_MeasureUnitGeneratorTest.java
	main/common_tests/genunits_measfmttest.cpp
	main/common_tests/genunits_measunit.cpp
	main/common_tests/genunits_measunit.h

They are more discoverable, but there is a risk that they are added to the repo if nobody notices them in review.

I can put them in main/common_tests/target.
They become "invisible" to git stat, but also no risk to submit them by mistake.

@markusicu
Copy link
Member

I can put them in main/common_tests/target.
They become "invisible" to git stat, but also no risk to submit them by mistake.

ok

@mihnita
Copy link
Contributor Author

mihnita commented Nov 5, 2024

I can put them in main/common_tests/target.
They become "invisible" to git stat, but also no risk to submit them by mistake.

ok

DONE

@mihnita
Copy link
Contributor Author

mihnita commented Nov 5, 2024

Everything from Markus is implemented.
Thank you very much.

@mihnita
Copy link
Contributor Author

mihnita commented Nov 6, 2024

@markusicu: OK if I squash this in 2 commits (renames / split files, and all the rest)

Thanks,
M.

@markusicu
Copy link
Member

OK if I squash this in 2 commits (renames / split files, and all the rest)

sgtm

@mihnita mihnita force-pushed the mihai_gen_units_data_icu_22917 branch from 1773170 to e967399 Compare November 7, 2024 22:02
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@mihnita
Copy link
Contributor Author

mihnita commented Nov 7, 2024

OK if I squash this in 2 commits (renames / split files, and all the rest)

sgtm

Squashed to 2 commits.

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(partial rs)lgtm

@mihnita mihnita merged commit fbfbe6c into unicode-org:main Nov 8, 2024
16 checks passed
@mihnita mihnita deleted the mihai_gen_units_data_icu_22917 branch November 12, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants