Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: support domains with empty labels in i18n #12707

Closed
wants to merge 1 commit into from

Conversation

watilde
Copy link
Contributor

@watilde watilde commented Apr 27, 2017

Summary

The domainToUnicode and the domainToASCII could be used in the middle of parsing the origin, and it means the input values could be a non-final domain name label. Then the converter should ignore the UIDNA_ERROR_EMPTY_LABEL error. Since long domain name labels and long domain names are invalid already, it might not need to ignore UIDNA_ERROR_LABEL_TOO_LONG and UIDNA_ERROR_DOMAIN_NAME_TOO_LONG.

Refs:

Updates
  • src: support domains with empty labels in i18n
  • test: synchronise WPT url test data
Checklist
Affected core subsystem(s)

src, test

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. i18n-api Issues and PRs related to the i18n implementation. labels Apr 27, 2017
@watilde watilde added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Apr 27, 2017
@watilde
Copy link
Contributor Author

watilde commented Apr 28, 2017

@watilde
Copy link
Contributor Author

watilde commented Apr 29, 2017

cc @nodejs/url

Copy link
Member

@TimothyGu TimothyGu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since long domain name labels and long domain names are invalid already, it might not need to ignore UIDNA_ERROR_LABEL_TOO_LONG and UIDNA_ERROR_DOMAIN_NAME_TOO_LONG.

Can you elaborate on this? What do you mean by "invalid already"?

src/node_i18n.cc Outdated
@@ -461,6 +461,9 @@ int32_t ToUnicode(MaybeStackBuffer<char>* buf,
&status);
}

if (info.errors != 0)
info.errors &= ~UIDNA_ERROR_EMPTY_LABEL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave off the if as the & will do nothing if the error is not present, and I'd add a reference to https://url.spec.whatwg.org/#concept-domain-to-ascii (especially the VerifyDnsLength bit) for why we are ignoring this error.

@watilde
Copy link
Contributor Author

watilde commented Apr 29, 2017

@TimothyGu Sorry for my unclear description. I meant we cannot ignore UIDNA_ERROR_LABEL_TOO_LONG and UIDNA_ERROR_DOMAIN_NAME_TOO_LONG that were mentioned in nodejs/Intl#44 since they are not allowed as is written in the test code.

--
PS
Oops, we should allow the long labels/domains as well in domainToASCII because of VerifyDnsLength set to false as you said. I'm going to update it.

@watilde watilde force-pushed the feature/url-test branch 3 times, most recently from dd3b613 to f06dd84 Compare April 30, 2017 00:59
@watilde
Copy link
Contributor Author

watilde commented Apr 30, 2017

src/node_i18n.cc Outdated
@@ -500,6 +503,11 @@ int32_t ToASCII(MaybeStackBuffer<char>* buf,
&status);
}

// https://url.spec.whatwg.org/#concept-domain-to-ascii
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add an explicit reference to the VerifyDnsLength option.

// The WHATWG URL "domain to ASCII" algorithm explicitly sets the
// VerifyDnsLength flag to false, which disables the domain name length
// verification step in ToASCII (as specified by UTS #46). Unfortunately,
// ICU4C's IDNA module does not support disabling this flag through `options`,
// so just filter out the errors that may be caused by the verification step
// afterwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the actual comment example! I put it into the commit :)

src/node_i18n.cc Outdated
@@ -461,6 +461,9 @@ int32_t ToUnicode(MaybeStackBuffer<char>* buf,
&status);
}

// https://url.spec.whatwg.org/#concept-domain-to-unicode
info.errors &= ~UIDNA_ERROR_EMPTY_LABEL;
Copy link
Member

@TimothyGu TimothyGu Apr 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is a bit tricky. A more detailed comment like the comment should be warranted:

// UTS #46's ToUnicode operation applies no validation of domain name length
// (nor a flag requesting it to do so, like VerifyDnsLength for ToASCII). For
// that reason, unlike ToASCII below, ICU4C correctly accepts long domain
// names. However, ICU4C still sets the EMPTY_LABEL error in contrary to UTS
// #46. Therefore, explicitly filters out that error here.

Follow the spec of domainToASCII/domainToUnicode in whatwg, and
synchronise WPT url test data.

Refs: web-platform-tests/wpt#5397
@watilde
Copy link
Contributor Author

watilde commented Apr 30, 2017

Copy link
Member

@TimothyGu TimothyGu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for upstreaming the WPT changes as well!

watilde added a commit that referenced this pull request May 1, 2017
Follow the spec of domainToASCII/domainToUnicode in whatwg, and
synchronise WPT url test data.

Refs: web-platform-tests/wpt#5397
PR-URL: #12707
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
@watilde
Copy link
Contributor Author

watilde commented May 1, 2017

landed in 0f58d3c

@watilde watilde closed this May 1, 2017
@watilde watilde deleted the feature/url-test branch May 1, 2017 17:24
anchnk pushed a commit to anchnk/node that referenced this pull request May 6, 2017
Follow the spec of domainToASCII/domainToUnicode in whatwg, and
synchronise WPT url test data.

Refs: web-platform-tests/wpt#5397
PR-URL: nodejs#12707
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
@TimothyGu
Copy link
Member

Looks like we should backport this to v7.x, no? /cc @watilde

@watilde
Copy link
Contributor Author

watilde commented May 11, 2017

Yes, we should backport it into v7. I'm not sure I should make a patch for it, will check it if it makes conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. i18n-api Issues and PRs related to the i18n implementation. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants