ICU-22940 MF2 ICU4C: Error checking improvements in parser #3306

Merged
merged 1 commit into from
Jan 10, 2025

catamorphism
Contributor

@catamorphism catamorphism commented Dec 13, 2024

Improve checking for OOM errors when allocating UnicodeSets, per post-merge comments on #3236
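For context, here is a minimal sketch of the kind of OOM check this PR is about. It does not use the real ICU headers: `UErrorCode`, `U_FAILURE`, and `MockUnicodeSet` below are simplified mock definitions, and `gSimulateOOM` is a hypothetical test hook. ICU code conventionally checks each `new` against `nullptr` and reports `U_MEMORY_ALLOCATION_ERROR` through the `UErrorCode` out-parameter. The sketch uses `new (std::nothrow)` to get the same null-on-failure behavior in plain C++.

```cpp
#include <cassert>
#include <new>
#include <set>

// Simplified stand-ins for ICU's UErrorCode and U_FAILURE (not the real ICU headers).
enum UErrorCode { U_ZERO_ERROR = 0, U_MEMORY_ALLOCATION_ERROR = 7 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }

// Hypothetical stand-in for icu::UnicodeSet: a set of code points.
using MockUnicodeSet = std::set<char32_t>;

// Hypothetical test hook: when true, the allocation below behaves as if memory ran out.
static bool gSimulateOOM = false;

// Allocates the set of ASCII digits and reports OOM through status
// instead of handing back an unchecked pointer.
MockUnicodeSet* initDigits(UErrorCode& status) {
    if (U_FAILURE(status)) {
        return nullptr;  // an earlier error is already recorded; do nothing
    }
    MockUnicodeSet* result = gSimulateOOM ? nullptr : new (std::nothrow) MockUnicodeSet();
    if (result == nullptr) {
        status = U_MEMORY_ALLOCATION_ERROR;
        return nullptr;
    }
    for (char32_t c = U'0'; c <= U'9'; ++c) {
        result->insert(c);
    }
    return result;
}
```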

Checklist

  • Required: Issue filed: ICU-22940
  • Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

Contributor

@FrankYFTang FrankYFTang left a comment

Thanks for working on this.

}

UnicodeSet* result = new UnicodeSet(*unisets::getImpl(unisets::ALPHA));
unisets::gUnicodeSets[unisets::ALPHA] = initAlpha(status);
Contributor

Why not just:

UnicodeSet* isAlpha = unisets::gUnicodeSets[unisets::ALPHA] = initAlpha(status);
if (U_FAILURE(status)) {
    return nullptr;
}
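The suggested one-liner works because C++ assignment is right-associative and an assignment expression evaluates to the object it just assigned, so the same pointer lands in both the global table slot and the local variable. A sketch with mock types (the `SetKey` enum, the `gUnicodeSets` array, `initAlpha`, and `initAlphaTable` here are hypothetical stand-ins, not ICU's actual `unisets` internals):

```cpp
#include <cassert>
#include <new>
#include <set>

// Simplified stand-ins for ICU types (not the real ICU headers).
enum UErrorCode { U_ZERO_ERROR = 0, U_MEMORY_ALLOCATION_ERROR = 7 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }
using MockUnicodeSet = std::set<char32_t>;

// Hypothetical global table, loosely mirroring unisets::gUnicodeSets.
enum SetKey { ALPHA = 0, SET_COUNT };
static MockUnicodeSet* gUnicodeSets[SET_COUNT] = {};

MockUnicodeSet* initAlpha(UErrorCode& status) {
    if (U_FAILURE(status)) {
        return nullptr;
    }
    MockUnicodeSet* result = new (std::nothrow) MockUnicodeSet();
    if (result == nullptr) {
        status = U_MEMORY_ALLOCATION_ERROR;
        return nullptr;
    }
    for (char32_t c = U'A'; c <= U'Z'; ++c) {
        result->insert(c);  // uppercase ASCII letters
    }
    for (char32_t c = U'a'; c <= U'z'; ++c) {
        result->insert(c);  // lowercase ASCII letters
    }
    return result;
}

// One statement stores the pointer in the table and keeps a local alias;
// a single status check then covers the allocation.
MockUnicodeSet* initAlphaTable(UErrorCode& status) {
    MockUnicodeSet* isAlpha = gUnicodeSets[ALPHA] = initAlpha(status);
    if (U_FAILURE(status)) {
        return nullptr;
    }
    return isAlpha;
}
```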

Contributor Author

Fixed in e2990d2

}

unisets::gUnicodeSets[unisets::NAME_START] = initNameStartChars(status);
Contributor

I think you can simplify this to:

UnicodeSet* nameStart = unisets::gUnicodeSets[unisets::NAME_START] = initNameStartChars(status);
UnicodeSet* digit = unisets::gUnicodeSets[unisets::DIGIT] = initDigits(status);
if (U_FAILURE(status)) {
    return nullptr;
}
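Checking `status` once after both calls is safe under the usual ICU convention that a function taking a `UErrorCode&` returns immediately when the code already indicates failure. The sketch below uses hypothetical mock types plus two hypothetical test hooks (`gAllocations`, `gFailNextAllocation`) to show that when the first allocation fails, `initDigits` does nothing and allocates nothing:

```cpp
#include <cassert>
#include <new>
#include <set>

// Simplified stand-ins for ICU types (not the real ICU headers).
enum UErrorCode { U_ZERO_ERROR = 0, U_MEMORY_ALLOCATION_ERROR = 7 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }
using MockUnicodeSet = std::set<char32_t>;

enum SetKey { NAME_START = 0, DIGIT, SET_COUNT };
static MockUnicodeSet* gUnicodeSets[SET_COUNT] = {};

// Hypothetical test hooks: count successful allocations and optionally fail the next one.
static int gAllocations = 0;
static bool gFailNextAllocation = false;

static MockUnicodeSet* allocSet(UErrorCode& status) {
    if (U_FAILURE(status)) {
        return nullptr;  // no-op once an error has been recorded
    }
    if (gFailNextAllocation) {
        gFailNextAllocation = false;
        status = U_MEMORY_ALLOCATION_ERROR;
        return nullptr;
    }
    MockUnicodeSet* result = new (std::nothrow) MockUnicodeSet();
    if (result == nullptr) {
        status = U_MEMORY_ALLOCATION_ERROR;
        return nullptr;
    }
    ++gAllocations;
    return result;
}

MockUnicodeSet* initNameStartChars(UErrorCode& status) {
    MockUnicodeSet* s = allocSet(status);
    if (s != nullptr) {
        s->insert(U'_');
        for (char32_t c = U'a'; c <= U'z'; ++c) {
            s->insert(c);
        }
    }
    return s;
}

MockUnicodeSet* initDigits(UErrorCode& status) {
    MockUnicodeSet* s = allocSet(status);
    if (s != nullptr) {
        for (char32_t c = U'0'; c <= U'9'; ++c) {
            s->insert(c);
        }
    }
    return s;
}

// One U_FAILURE check covers both calls: if initNameStartChars fails,
// initDigits sees the error on entry and returns nullptr without allocating.
bool initNameAndDigits(UErrorCode& status) {
    MockUnicodeSet* nameStart = gUnicodeSets[NAME_START] = initNameStartChars(status);
    MockUnicodeSet* digit = gUnicodeSets[DIGIT] = initDigits(status);
    if (U_FAILURE(status)) {
        return false;
    }
    return nameStart != nullptr && digit != nullptr;
}
```

If the first call succeeds and the second fails, the first set stays in the table, so the failure path still needs some cleanup step.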

Contributor Author

Fixed in e2990d2

}

unisets::gUnicodeSets[unisets::CONTENT] = initContentChars(status);
Contributor

I think you can simplify this to:

    UnicodeSet* content = unisets::gUnicodeSets[unisets::CONTENT] = initContentChars(status);
    UnicodeSet* whitespace = unisets::gUnicodeSets[unisets::WHITESPACE] = initWhitespace(status);

Contributor Author

Fixed in e2990d2

FrankYFTang
FrankYFTang previously approved these changes Jan 9, 2025
Contributor

@FrankYFTang FrankYFTang left a comment

Thanks

initTextChars depends on
initContentChars
initWhitespace
*/
Contributor

I think that at the end of this function, we should do the following:

if (U_FAILURE(status)) {
    cleanupMF2ParseUniSets();
}

That way, if the failure is caused by memory pressure during initialization, all the UnicodeSets already allocated in gUnicodeSets are deleted and gMF2ParseUniSetsInitOnce is reset, so initialization gets a second chance to succeed.
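A single-threaded sketch of the cleanup-and-retry idea, using mock types throughout: `SimpleInitOnce` and `runInitOnce` are hypothetical stand-ins for ICU's `UInitOnce`/`umtx_initOnce` (which likewise cache the initializer's error code), `cleanupSets` plays the role of `cleanupMF2ParseUniSets`, and `gAllocationsLeft` is a hypothetical test hook. In this sketch the cleanup runs in the caller after the once-guarded initializer returns, because a `reset()` issued from inside `initSets` would be overwritten when `runInitOnce` marks itself done.

```cpp
#include <cassert>
#include <new>
#include <set>

// Simplified stand-ins for ICU types (not the real ICU headers).
enum UErrorCode { U_ZERO_ERROR = 0, U_MEMORY_ALLOCATION_ERROR = 7 };
inline bool U_FAILURE(UErrorCode code) { return code > U_ZERO_ERROR; }
using MockUnicodeSet = std::set<char32_t>;

enum SetKey { CONTENT = 0, WHITESPACE, TEXT, SET_COUNT };
static MockUnicodeSet* gUnicodeSets[SET_COUNT] = {};

// Runs an initializer once, remembers its error code, and can be reset for a retry.
struct SimpleInitOnce {
    bool done = false;
    UErrorCode savedError = U_ZERO_ERROR;
    void reset() { done = false; savedError = U_ZERO_ERROR; }
};
static SimpleInitOnce gInitOnce;

static void runInitOnce(SimpleInitOnce& once, void (*fn)(UErrorCode&), UErrorCode& status) {
    if (U_FAILURE(status)) {
        return;
    }
    if (!once.done) {
        fn(status);
        once.savedError = status;
        once.done = true;
    } else {
        status = once.savedError;  // later callers see the cached result
    }
}

static int gAllocationsLeft = -1;  // -1 means unlimited

static MockUnicodeSet* allocSet(UErrorCode& status) {
    if (U_FAILURE(status)) {
        return nullptr;
    }
    if (gAllocationsLeft == 0) {
        status = U_MEMORY_ALLOCATION_ERROR;
        return nullptr;
    }
    if (gAllocationsLeft > 0) {
        --gAllocationsLeft;
    }
    MockUnicodeSet* result = new (std::nothrow) MockUnicodeSet();
    if (result == nullptr) {
        status = U_MEMORY_ALLOCATION_ERROR;
    }
    return result;
}

static void initSets(UErrorCode& status) {
    MockUnicodeSet* content = gUnicodeSets[CONTENT] = allocSet(status);
    MockUnicodeSet* whitespace = gUnicodeSets[WHITESPACE] = allocSet(status);
    MockUnicodeSet* text = gUnicodeSets[TEXT] = allocSet(status);
    if (U_FAILURE(status)) {
        return;
    }
    content->insert(U'a');
    whitespace->insert(U' ');
    // TEXT is derived from CONTENT and WHITESPACE, so it is filled only after both exist.
    text->insert(content->begin(), content->end());
    text->insert(whitespace->begin(), whitespace->end());
}

// Deletes every set allocated so far and resets the once-flag.
static void cleanupSets() {
    for (MockUnicodeSet*& set : gUnicodeSets) {
        delete set;
        set = nullptr;
    }
    gInitOnce.reset();
}

// On failure, clean up so that the next call gets a second chance.
const MockUnicodeSet* getTextSet(UErrorCode& status) {
    runInitOnce(gInitOnce, initSets, status);
    if (U_FAILURE(status)) {
        cleanupSets();
        return nullptr;
    }
    return gUnicodeSets[TEXT];
}
```

Without the cleanup, the cached failure would be returned to every later caller even after memory became available again.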

Contributor Author

Made the change in 7f19865 -- can you re-approve? Thanks!

Improve checking for OOM errors when allocating UnicodeSets,
per post-merge comments on unicode-org#3236
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@catamorphism catamorphism merged commit f8aa68b into unicode-org:main Jan 10, 2025
94 checks passed
3 participants