url: parsing should not serialize windows drive letter twice #15490

Closed

bcoe
Contributor

@bcoe bcoe commented Sep 20, 2017

es-modules do not currently work on Windows (we missed this when landing them initially because we'd missed adding a variable to vcbuild.bat/Makefile).

The underlying issue was that node::url::URL.path() was serializing windows style paths incorrectly if there was a drive letter in both the path and the base:

resolve("/D:/a/b/c.mjs", "file:///C:/a/b/c")

would result in an incorrect parse of /C:/D:/a/b/c.mjs.

The same parse in Chrome resolves to:

/D:/a/b/c.mjs.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

url,es-module

reviewer: @bmeck, @jasnell

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Sep 20, 2017
@joyeecheung joyeecheung added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Sep 20, 2017
@mscdex mscdex added the windows Issues and PRs related to the Windows platform. label Sep 20, 2017
@bcoe
Contributor Author

bcoe commented Sep 20, 2017

Have been talking with @bmeck; I think we've actually found an inconsistency in the URL parsing state machine:

new URL("/C:/a/b/c/module.mjs", "file://D:/foo/bar")

parses differently in Node vs. a browser like Chrome, resulting in the path:

/C:/D:/

which is an invalid Windows path. The actual fix is going to be in the URL parsing state machine; dropping the drive letter is not to spec.

@jasnell jasnell requested a review from TimothyGu September 20, 2017 16:38
@jasnell
Member

jasnell commented Sep 20, 2017

Will need to double check that this conforms with the url standard before signing off. Ping @TimothyGu

(should be fine, but just need to verify)

@bmeck
Member

bmeck commented Sep 20, 2017

@jasnell it looks like a bug in the URL Standard: the file slash state pushes the base's drive letter even if the specifier being resolved has a drive letter.

@jasnell
Member

jasnell commented Sep 20, 2017

Ping @domenic @annevk

@bcoe
Contributor Author

bcoe commented Sep 20, 2017

@TimothyGu, @jasnell, @bmeck and I were talking a bit before this pull; this seems to bring us closer to the current behavior of Chrome (and to @bmeck's interpretation of the spec). Please loop me into any tracking issues I should be following ... this was fun to debug 😛

@TimothyGu
Member

I think the central issue is:

The underlying issue was that node::url::URL.path() was serializing a Windows style file path like so:

/C:/My Awesome Module/foo.

which really shouldn't be the case if the dedicated function for converting a file URL to a path is used:

node/src/node_url.h

Lines 166 to 168 in 75606c4

// Get the path of the file: URL in a format consumable by native file system
// APIs. Returns an empty string if something went wrong.
std::string ToFilePath();
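
To illustrate the kind of conversion such a function performs on Windows, here is a minimal sketch. It is not Node's implementation: the names `PercentDecode` and `WindowsPathFromFileUrlPath` are hypothetical, and the sketch assumes the input is the path component of a file: URL with no host, leaving malformed percent-escapes untouched.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if not a hex digit.
static int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical helper: decode %XX sequences; malformed ones are copied as-is.
static std::string PercentDecode(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '%' && i + 2 < in.size()) {
      const int hi = HexValue(in[i + 1]);
      const int lo = HexValue(in[i + 2]);
      if (hi >= 0 && lo >= 0) {
        out.push_back(static_cast<char>(hi * 16 + lo));
        i += 2;
        continue;
      }
    }
    out.push_back(in[i]);
  }
  return out;
}

// Sketch only: turn a file: URL path such as "/C:/dir/file" into a
// Windows-style path such as "C:\dir\file".
static std::string WindowsPathFromFileUrlPath(const std::string& url_path) {
  std::string path = PercentDecode(url_path);
  const bool has_drive =
      path.size() >= 3 && path[0] == '/' &&
      ((path[1] >= 'a' && path[1] <= 'z') ||
       (path[1] >= 'A' && path[1] <= 'Z')) &&
      path[2] == ':';
  if (has_drive) path.erase(0, 1);  // drop the slash before the drive letter
  std::replace(path.begin(), path.end(), '/', '\\');
  return path;
}
```

Under these assumptions, `WindowsPathFromFileUrlPath("/C:/My%20Awesome%20Module/foo")` yields `C:\My Awesome Module\foo`.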

@bcoe
Contributor Author

bcoe commented Sep 20, 2017

@TimothyGu this turned out to be a red herring; the root issue was that the parse method incorrectly uses the Windows drive letter of both the path and the base when parsing, creating an invalid Windows path.

@bmeck
Member

bmeck commented Sep 20, 2017

@TimothyGu no, the behavior is in url, see:

new URL("/C:/in_c", "file:///D:/in_d")

// spec => file:///D:/in_c <- bug in question

// chrome => file:///C:/in_c
// doesn't perform https://url.spec.whatwg.org/#file-slash-state step 2.1.1. ?

// node => file:///D:/C:/in_c
// seems to improperly skip empty check of https://url.spec.whatwg.org/#path-state 1.4.1. ?

@bcoe bcoe changed the title from "url: path() should not serialize drive letter" to "url: url parsing should not serialize windows drive letter twice" Sep 20, 2017
@bcoe bcoe changed the title from "url: url parsing should not serialize windows drive letter twice" to "url: parsing should not serialize windows drive letter twice" Sep 20, 2017
Member

@TimothyGu TimothyGu left a comment

Ha! So this bug has just been fixed in the spec two days ago: whatwg/url#343. We haven't had a chance to implement the change yet, but this PR actually unwittingly implements a variant of that PR.

You might also want to import the Web Platform Tests associated with that change to test/fixtures/url-tests.js, available at web-platform-tests/wpt#7326.

src/node_url.cc Outdated
@@ -1698,7 +1699,8 @@ void URL::Parse(const char* input,
           } else {
             if (has_base &&
                 base->scheme == "file:") {
-              if (IsNormalizedWindowsDriveLetter(base->path[0])) {
+              if (IsNormalizedWindowsDriveLetter(base->path[0]) &&
+                  !(remaining > 0 && IsWindowsDriveLetter(ch, p[1]))) {
Member

The spec uses the "starts with a Windows drive letter" algorithm which checks a few other things. You might want to use that instead by factoring out

node/src/node_url.cc

Lines 1670 to 1676 in 75606c4

if ((remaining == 0 ||
!IsWindowsDriveLetter(ch, p[1]) ||
(remaining >= 2 &&
p[2] != '/' &&
p[2] != '\\' &&
p[2] != '?' &&
p[2] != '#'))) {

Contributor Author

@TimothyGu 👍 something like?

static inline bool StartsWithWindowsDriveLetter(char ch, const char* p, int remaining) {
  bool starts_with_windows_drive_letter = false;
  if (!IsWindowsDriveLetter(ch, p[1]) ||
      (remaining >= 2 &&
       p[2] != '/' &&
       p[2] != '\\' &&
       p[2] != '?' &&
       p[2] != '#')) {
    starts_with_windows_drive_letter = true;
  }
  return starts_with_windows_drive_letter;
}

Contributor Author

Whoops, the logic is reversed in that snippet, but you get the idea; we'd switch the code you shared to !StartsWithWindowsDriveLetter().

Member

You might want to check remaining before accessing p[1]?

@TimothyGu TimothyGu removed the lib / src Issues and PRs related to general changes in the lib or src directory. label Sep 20, 2017
@bcoe
Contributor Author

bcoe commented Sep 20, 2017

see fix in: whatwg/url#343, discussion in whatwg/url#345

@TimothyGu
Member

@bcoe Hey uh, have you seen #15490 (review) yet?

@bcoe
Contributor Author

bcoe commented Sep 20, 2017

@TimothyGu whoops missed that 👍 lol, all converging on the problem at the same time. I've got a work day at npm, Inc today, but would love to see this over the finish line tonight if you don't mind. It unblocks some other pulls I have open around getting es-module tests passing in CI.

Thanks for your help.

@bcoe
Contributor Author

bcoe commented Sep 21, 2017

@TimothyGu DRYd up as requested, switched tests over to test the same scenarios as whatwg.

Contributor

@rmisev rmisev left a comment

I added some suggestions on how to make this implementation closer to the URL spec.

src/node_url.cc Outdated
// https://url.spec.whatwg.org/#start-with-a-windows-drive-letter
static inline bool StartsWithWindowsDriveLetter(char ch,
const char* p,
int remaining) {
Contributor

The URL spec does not use remaining here, because it isn't (well) defined when c is EOF.
So I think this function can be rewritten in a somewhat simpler and more universal form:

static inline bool StartsWithWindowsDriveLetter(const char* p,
                                                const char* end) {
  const size_t length = end - p;
  return length >= 2 &&
    IsWindowsDriveLetter(p[0], p[1]) &&
    (length == 2 ||
     p[2] == '/' ||
     p[2] == '\\' ||
     p[2] == '?' ||
     p[2] == '#');
}
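
For readers who want to try the suggested form in isolation, here is a self-contained sketch. The `IsWindowsDriveLetter` helper shown is an assumption based on the spec's definition (an ASCII letter followed by `:` or `|`) and may differ in detail from Node's real helper; the `std::string` overload is a hypothetical convenience wrapper added only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static inline bool IsASCIIAlpha(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Assumed definition: ASCII letter followed by ':' or '|'.
static inline bool IsWindowsDriveLetter(char first, char second) {
  return IsASCIIAlpha(first) && (second == ':' || second == '|');
}

// Bounds-checked test over the half-open range [p, end): no byte is read
// unless the length check before it has already succeeded.
static inline bool StartsWithWindowsDriveLetter(const char* p,
                                                const char* end) {
  assert(p <= end);
  const std::size_t length = static_cast<std::size_t>(end - p);
  return length >= 2 &&
         IsWindowsDriveLetter(p[0], p[1]) &&
         (length == 2 ||
          p[2] == '/' ||
          p[2] == '\\' ||
          p[2] == '?' ||
          p[2] == '#');
}

// Hypothetical convenience overload for whole strings.
static inline bool StartsWithWindowsDriveLetter(const std::string& s) {
  return StartsWithWindowsDriveLetter(s.data(), s.data() + s.size());
}
```

Because `&&` short-circuits, an empty or one-character range returns false before `p[0]` or `p[1]` is touched, which is why a separate `remaining == 0` guard at the call site becomes redundant.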

src/node_url.cc Outdated
-                   p[2] != '?' &&
-                   p[2] != '#'))) {
+              if (remaining == 0 ||
+                  !StartsWithWindowsDriveLetter(ch, p, remaining)) {
Contributor

@rmisev rmisev Sep 21, 2017

The remaining == 0 is unnecessary, because StartsWithWindowsDriveLetter tests for an empty or too short substring, so:

-              if (remaining == 0 ||
-                  !StartsWithWindowsDriveLetter(ch, p, remaining)) {
+              if (!StartsWithWindowsDriveLetter(p, end)) {

src/node_url.cc Outdated
@@ -1698,7 +1710,8 @@ void URL::Parse(const char* input,
           } else {
             if (has_base &&
                 base->scheme == "file:") {
-              if (IsNormalizedWindowsDriveLetter(base->path[0])) {
+              if (IsNormalizedWindowsDriveLetter(base->path[0]) &&
+                  !StartsWithWindowsDriveLetter(ch, p, remaining)) {
Contributor

I see your implementation is a bit different from the spec. I thought about such an implementation too, but it has a performance problem. Try the test new URL("/c:/foo/bar", "file://host/path"): it sets the URL's host to the base's host here and later empties the URL's host in path state step 1.4.1.1.
So I think it is better to follow the spec and avoid unnecessary steps:

            if (has_base &&
-               base->scheme == "file:") {
+               base->scheme == "file:" &&
+               !StartsWithWindowsDriveLetter(p, end)) {
+            if (IsNormalizedWindowsDriveLetter(base->path[0])) {
-            if (IsNormalizedWindowsDriveLetter(base->path[0]) &&
-              !StartsWithWindowsDriveLetter(ch, p, remaining)) {
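
The ordering of the checks in this suggestion can be sketched in isolation as follows. This is a simplified model, not Node's parser: `FileUrlSketch` and `InheritFromFileBase` are hypothetical names, and the caller is assumed to have already verified that a base exists and that its scheme is "file:".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified URL record: just the fields this step touches.
struct FileUrlSketch {
  std::string host;
  std::vector<std::string> path;
};

static bool IsASCIIAlpha(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Normalized drive letter: exactly an ASCII letter followed by ':'.
static bool IsNormalizedWindowsDriveLetter(const std::string& segment) {
  return segment.size() == 2 && IsASCIIAlpha(segment[0]) && segment[1] == ':';
}

static bool StartsWithWindowsDriveLetter(const std::string& rest) {
  return rest.size() >= 2 && IsASCIIAlpha(rest[0]) &&
         (rest[1] == ':' || rest[1] == '|') &&
         (rest.size() == 2 || rest[2] == '/' || rest[2] == '\\' ||
          rest[2] == '?' || rest[2] == '#');
}

// Sketch of the non-slash branch of the file slash state. `remaining` is the
// input from the current pointer onward.
static void InheritFromFileBase(const FileUrlSketch& base,
                                const std::string& remaining,
                                FileUrlSketch* url) {
  // The input brings its own drive letter: take nothing from the base, so
  // the host is never copied only to be emptied again in the path state.
  if (StartsWithWindowsDriveLetter(remaining)) return;
  if (!base.path.empty() && IsNormalizedWindowsDriveLetter(base.path[0])) {
    url->path.push_back(base.path[0]);  // inherit the base's drive letter
  } else {
    url->host = base.host;              // otherwise inherit the base's host
  }
}
```

With a base of file:///D:/in_d, the remaining input "C:/in_c" leaves the path empty so the path state can add "C:" itself, while "in_c" inherits "D:". With a base of file://host/path, the remaining input "c:/foo/bar" leaves the host unset.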

Contributor Author

@rmisev thank you for the thorough review, I've implemented your suggestions. I think it might be good to eventually pull in all of url/urltestdata.json to our test suite; I might do that as a follow-up pull request (thoughts, @TimothyGu?).

@TimothyGu
Member

@bcoe We already have the entirety of urltestdata.json. It's called test/fixtures/url-tests.js. See #15490 (review).

Member

@TimothyGu TimothyGu left a comment

LGTM after url-tests.js is updated (and the nits are addressed).

@@ -104,3 +104,15 @@ TEST_F(URLTest, ToFilePath) {

#undef T
}

// https://github.com/w3c/web-platform-tests/pull/7326/files
TEST_F(URLTest, PathDriveLetter) {
Member

I don't think this test will be necessary once url-tests.js is updated.

@@ -552,6 +552,19 @@ static inline bool IsSpecial(std::string scheme) {
return false;
}

// https://url.spec.whatwg.org/#start-with-a-windows-drive-letter
Member

nit: add a https://

Contributor Author

@bcoe bcoe Sep 21, 2017

I think I'm missing something:

// https://url.spec.whatwg.org/#start-with-a-windows-drive-letter

Does have https, perhaps misread?

Member

Oh I'm sorry. It's probably a Chrome extension I'm using.

@TimothyGu
Member

Would you mind updating the commit hash in the file header for url-tests.js as well?

https://github.com/w3c/web-platform-tests/blob/8df7c9c215/url/urltestdata.json

Address issue with Windows drive letter handling that was
causing es-module test suite to fail, see:
whatwg/url#343
@bcoe
Contributor Author

bcoe commented Sep 21, 2017

@TimothyGu I think I've addressed your comments.

@BridgeAR
Member

@BridgeAR
Member

Landed in 456d8e2

@BridgeAR BridgeAR closed this Sep 24, 2017
BridgeAR pushed a commit that referenced this pull request Sep 24, 2017
Address issue with Windows drive letter handling that was
causing es-module test suite to fail.

PR-URL: #15490
Ref: whatwg/url#343
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
jasnell pushed a commit that referenced this pull request Sep 25, 2017
Address issue with Windows drive letter handling that was
causing es-module test suite to fail.

PR-URL: #15490
Ref: whatwg/url#343
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
@sam-github sam-github mentioned this pull request Sep 25, 2017
2 tasks
addaleax pushed a commit to addaleax/ayo that referenced this pull request Sep 30, 2017
Address issue with Windows drive letter handling that was
causing es-module test suite to fail.

PR-URL: nodejs/node#15490
Ref: whatwg/url#343
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. whatwg-url Issues and PRs related to the WHATWG URL implementation. windows Issues and PRs related to the Windows platform.
10 participants