Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-23000 Replace CharString for LocaleBased #3321

Merged
merged 1 commit into from
Jan 6, 2025

Conversation

FrankYFTang
Copy link
Contributor

@FrankYFTang FrankYFTang commented Dec 24, 2024

These changes improve the overall peak memory usage and slightly improve
performance

	main avg	main stdev	branch avg	branch stdev	diff avg	diff avg %
User time (seconds):	81.19	0.37	81.36	0.09	0.17	0.21%
System time (seconds):	5.03	0.08	4.86	0.05	-0.17	-3.38%
Percent of CPU this job got:	237.33%	0.01	235.33%	0.01	-2.00%	-0.84%
Maximum resident set size (kbytes):	227,141	99.14	226,667	366.48	-475	-0.21%
Minor (reclaiming a frame) page faults:	46,009	510.78	43,775	294.47	-2,234	-4.86%
Voluntary context switches:	333,442	4513.76	316,506	8850.48	-16,936	-5.08%
Involuntary context switches:	467	30.60	515	22.65	48	10.36%

Raw data in
https://docs.google.com/spreadsheets/d/10LQ3kO83EwvsFfoiySTzsTbIzfUV00Te4jQnX0b_FM4/edit?usp=sharing

Checklist

  • Required: Issue filed: ICU-23000
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@FrankYFTang FrankYFTang added the incomplete Needs work; do not approve/merge as is. label Dec 24, 2024
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang removed the incomplete Needs work; do not approve/merge as is. label Dec 26, 2024
@FrankYFTang FrankYFTang changed the title Replace CharString for LocaleBased ICU-23000 Replace CharString for LocaleBased Dec 26, 2024
FrankYFTang added a commit to FrankYFTang/icu that referenced this pull request Dec 26, 2024
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

FrankYFTang added a commit to FrankYFTang/icu that referenced this pull request Dec 26, 2024
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@markusicu markusicu self-assigned this Dec 26, 2024
@markusicu markusicu requested a review from richgillam December 26, 2024 23:54
icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/common/locbased.h Outdated Show resolved Hide resolved
icu4c/source/common/locbased.h Outdated Show resolved Hide resolved
icu4c/source/common/locbased.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/calendar.cpp Show resolved Hide resolved
icu4c/source/i18n/calendar.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/dcfmtsym.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/dtfmtsym.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/format.cpp Show resolved Hide resolved
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/common/brkiter.cpp is different
  • icu4c/source/common/charstr.cpp is now changed in the branch
  • icu4c/source/common/charstr.h is now changed in the branch
  • icu4c/source/common/locbased.cpp is different
  • icu4c/source/common/locbased.h is different
  • icu4c/source/i18n/calendar.cpp is different
  • icu4c/source/i18n/dcfmtsym.cpp is different
  • icu4c/source/i18n/dtfmtsym.cpp is different
  • icu4c/source/i18n/format.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/calendar.cpp is different
  • icu4c/source/i18n/dtfmtsym.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang
Copy link
Contributor Author

PTAL

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/common/brkiter.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

FrankYFTang added a commit to FrankYFTang/icu that referenced this pull request Jan 3, 2025
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

FrankYFTang added a commit to FrankYFTang/icu that referenced this pull request Jan 3, 2025
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang changed the title ICU-23000 Replace CharString for LocaleBased ICU-23000 Replace CharString for LocaleBased Jan 3, 2025
@FrankYFTang FrankYFTang requested a review from markusicu January 3, 2025 18:54
icu4c/source/common/locbased.cpp Show resolved Hide resolved
icu4c/source/common/locbased.cpp Show resolved Hide resolved
icu4c/source/common/locbased.cpp Outdated Show resolved Hide resolved
icu4c/source/common/locbased.cpp Outdated Show resolved Hide resolved
icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/calendar.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/dcfmtsym.cpp Outdated Show resolved Hide resolved
@FrankYFTang FrankYFTang requested a review from markusicu January 3, 2025 23:47
@FrankYFTang
Copy link
Contributor Author

PTAL

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I like it now :-)
  • I realized that the pending changes return nullptr in non-error cases; we should go back to returning an empty string as before -- hopefully I caught the relevant places in the code
  • a few stylistic changes

icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/common/locbased.cpp Outdated Show resolved Hide resolved
icu4c/source/i18n/dcfmtsym.cpp Outdated Show resolved Hide resolved
icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/common/brkiter.cpp Outdated Show resolved Hide resolved
icu4c/source/common/locbased.cpp Outdated Show resolved Hide resolved
icu4c/source/common/locbased.cpp Outdated Show resolved Hide resolved
icu4c/source/common/locbased.cpp Outdated Show resolved Hide resolved
icu4c/source/common/locbased.cpp Outdated Show resolved Hide resolved
@FrankYFTang FrankYFTang requested a review from markusicu January 4, 2025 05:49
FrankYFTang added a commit to FrankYFTang/icu that referenced this pull request Jan 6, 2025
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang merged commit 4c3622f into unicode-org:main Jan 6, 2025
94 checks passed
@FrankYFTang FrankYFTang deleted the charStringLocale branch January 6, 2025 22:09
@richgillam
Copy link
Contributor

richgillam commented Jan 7, 2025

I'm sorry I'm getting to this late. Markus has already worked this PR over pretty thoroughly, so I may not be bringing anything useful to the party (or at least anything useful enough to justify reopening the PR or doing a follow-on PR). But I have a couple thoughts:

  1. There's a spot in the documentation for LocaleBased that says it's likely to be removed in ICU 3.0. That obviously didn't happen, so you might want to update that comment.

  2. Even with all the code in this PR that uses it, I couldn't get my mind around LocaleBased or its intended usage model. I kinda get it, and I think it's probably fine (and it's clearly been there forever), but it feels weird. Some of that might be the name, which seems strange and non-reflective of a utility class that lives for a short period of time on the stack.

  3. I'm also uncomfortable with this usage of CharString. As I understand it, CharString is generally intended to be used on the stack or as an inline member of a class; having a bunch objects that contain pointers to CharString feels uncomfortable and perhaps likely to lead to errors. Furthermore, the whole idea of CharString is to allow the user to use an inline char buffer for small strings and seamlessly move to a heap-allocated buffer for larger strings. If everything is a pointer to CharString, you're always using a heap-allocated buffer, and when the string is long, you're using two heap-allocated buffers. But it does give you the advantage of the empty string being just a null pointer. (Unless, of course, all these pointers are pointers to inline or stack-allocated CharStrings that are owned by something else, but that isn't what it looked like.)

I can't deny the memory-usage advantages you're getting here-- this is clearly a big win over what we had before-- but I can't help wondering if CharString, at least as it's currently implemented, is really the best tool for the job, or whether you want something new that's more tailored to the job you're actually doing here.

At the very least, it might be nice to encapsulate the pointer-ness in a new CharStringOrNull class that does the empty-string conversion automatically and could be swapped out later for something less heap-dependent.

But I'm fine if you don't do any of this, and I'm definitely fine if you just file a ticket to do it later. What you have is a clear improvement over what was there before.

@markusicu
Copy link
Member

  1. There's a spot in the documentation for LocaleBased that says it's likely to be removed in ICU 3.0. That obviously didn't happen, so you might want to update that comment.

good point

  1. Even with all the code in this PR that uses it, I couldn't get my mind around LocaleBased or its intended usage model. I kinda get it, and I think it's probably fine (and it's clearly been there forever), but it feels weird. Some of that might be the name, which seems strange and non-reflective of a utility class that lives for a short period of time on the stack.

Yes, it's very weird. It's really just a temporary helper for setting and fetching two common locale IDs.

It would be more natural to have an additional base class for Format and BreakIterator etc., but I suspect that this hack was introduced because additional base classes are messy in C++ and add a vtable pointer.

  1. I'm also uncomfortable with this usage of CharString. As I understand it, CharString is generally intended to be used on the stack or as an inline member of a class; having a bunch objects that contain pointers to CharString feels uncomfortable and perhaps likely to lead to errors. Furthermore, ...

The reason for using CharString here is mainly to stop having a fixed-size buffer, and to avoid adding manual, error-prone buffer-and-pointer-and-length management.

It would be much better to have a string-by-value field, but CharString is internal, and used in public APIs, so the best we can do is forward-declare it in the public headers and have pointer fields.

I can't deny the memory-usage advantages you're getting here

The nullptr-for-empty-ID part is not really for saving memory but for keeping constructors manageable.

At the very least, it might be nice to encapsulate the pointer-ness in a new CharStringOrNull class that does the empty-string conversion automatically and could be swapped out later for something less heap-dependent.

I think that that would suffer from the same internal vs. public API problem.

It seems like a cleaner way around that would be to create a public-but-@internal clone of CharString, maybe simplified a bit. But that would be more work. Or make CharString itself public-but-@internal?? Or public???

@richgillam
Copy link
Contributor

It would be more natural to have an additional base class for Format and BreakIterator etc., but I suspect that this hack was introduced because additional base classes are messy in C++ and add a vtable pointer.

Would it need to be a base class, or could it just be a separate class that the client classes include as an inline member? Or are their uses too different for that?

It would be much better to have a string-by-value field, but CharString is internal, and used in public APIs, so the best we can do is forward-declare it in the public headers and have pointer fields.

Ick. I forgot about that. Okay, you've convinced me. It's probably not worth messing with further, at least right now. :-)

@markusicu
Copy link
Member

It would be more natural to have an additional base class for Format and BreakIterator etc., but I suspect that this hack was introduced because additional base classes are messy in C++ and add a vtable pointer.

Would it need to be a base class, or could it just be a separate class that the client classes include as an inline member? Or are their uses too different for that?

A class-as-value-field should work. The class would have to be public-but-@internal.

@FrankYFTang
Copy link
Contributor Author

A class-as-value-field should work. The class would have to be public-but-@internal.

ok, I file a new bug to track that in https://unicode-org.atlassian.net/browse/ICU-23005 I will work on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants