Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Support requiring .mjs files #30891

Conversation

weswigham
Copy link

This implements the ability to use require on .mjs files, loaded via the esm loader, using the same tradeoffs that top level await makes in esm itself.

What this means: If possible, all execution and evaluation is done synchronously, via immediately unwrapping the execution's component promises. This means that any and all existing code should have no observable change in behavior, as there exist no asynchronous modules as of yet. The catch is that once a module which requires asynchronous execution is used, it must yield to the event loop to perform that execution, which, in turn, can allow other code to execute before the continuation after the async action, which is observable to callers of the now asynchronous module. If this matters to your callers, this means making your module execution asynchronous could be considered a breaking change to your library, however in practice, it will not matter for most callers. Moreover, as the ecosystem exists today, there are zero asynchronously executing modules, and so until there are, there are no downsides to this approach at all, as no execution is changed from what one would expect today (excepting, ofc, that it's no longer an error to require("./foo.mjs").

As of right now, this is missing:

  • Support for "type": "module" in the cjs loader. (where did that go? I thought we already had something that did that internally - getPackageType or something?)
  • New tests (I have no idea how to write tests for node, help appreciated)
  • Speculative integration tests for src: add support for top level await #30370 - the code is already written assuming ModuleWrap.evaluate may eventually return a promise.

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Dec 10, 2019
@ljharb
Copy link
Member

ljharb commented Dec 10, 2019

cc @nodejs/modules-active-members

@guybedford
Copy link
Contributor

Discussion for this PR in nodejs/modules#454.

Copy link
Member

@devsnek devsnek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this cannot be generalized to TLA modules because their execution could be waiting on something not owned by the libuv event loop, which would result in execution deadlocking.

@devsnek devsnek added esm Issues and PRs related to the ECMAScript Modules implementation. experimental Issues and PRs related to experimental features. module Issues and PRs related to the module subsystem. and removed c++ Issues and PRs that require attention from people who are familiar with C++. labels Dec 10, 2019
@weswigham
Copy link
Author

weswigham commented Dec 10, 2019

this cannot be generalized to TLA modules because their execution could be waiting on something not owned by the libuv event loop, which would result in execution deadlocking.

You have an example? TLA can already trivially deadlock in a huge number of ways (there's an faq entry defending it), what are you thinking of?

@devsnek
Copy link
Member

devsnek commented Dec 10, 2019

@weswigham literally any async behaviour that isn't a one-shot call to a libuv api. interestingly we had an issue opened a few days ago asking about deasync behaviour, you might find it interesting: #30634

* So long as the module doesn't contain TLA, this will be sync, otherwise it
* appears async
*/
promiseWait(instantiated.evaluate(-1, false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will throw if the last expression of a module source returns a rejected promise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooo spicy - is there a way to distinguish that from a TLA module that threw once #30370 is in?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once TLA is enabled, the result of module.evaluate() is either undefined or Promise<undefined>.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, yeah, I guess I just don't need this promiseWait call until #30370 is in. I'll remove it for now.

Copy link
Author

@weswigham weswigham Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once TLA is enabled, the result of module.evaluate() is either undefined or Promise.

Suppose I have

// @filename: mod.mjs
new Promise(resolve => resolve(undefined))

how am I to distinguish that from a module with TLA? I think conflating the evaluation of the final expression and the completion promise, API-wise, is problematic.

@weswigham
Copy link
Author

literally any async behaviour that isn't a one-shot call to a libuv api

Such as? AFAIK all the library ones go into the event loop or the microtask queue, no (both of which are being advanced)? As far as I know, this centralization of async handling is, like, a major feature of javascript/node.

Module._extensions['.mjs'] = function(module, filename) {
const ESMLoader = asyncESM.ESMLoader;
const url = `${pathToFileURL(filename)}`;
const job = ESMLoader.getModuleJobWorker(url, 'module');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this just skips the esm resolver entirely? this seems like a rather dangerous divergence.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because we already resolved the specifier, in order to know we needed the esm loader. The cjs resolver and the esm resolver are supposed to resolve the same specifier to the same thing, so it should be fine. This is part of why I kept saying it really should only be considered to be one loader.

@devsnek
Copy link
Member

devsnek commented Dec 10, 2019

@weswigham third party libraries can run their own async logic and then push the result into the microtask queue to be handled. additionally, node can't warn/throw/etc when it won't work, it will just sit there keeping your cpu pinned to 100% while it busy loops.

@weswigham
Copy link
Author

third party libraries can run their own async logic and then push the result into the microtask queue to be handled

That still works, no? Or does node somehow sleep in that scenario normally...? What's the normal behavior in that situation? node's event loop is empty - therefore I would think that the process dies, and the third party native module's async callback gets dropped? Tell me more.

@devsnek
Copy link
Member

devsnek commented Dec 10, 2019

i'm not entirely familiar with how these libraries work, there are more details of weirdness in #30634 and the deasync repo.

@weswigham
Copy link
Author

weswigham commented Dec 10, 2019

Looks like the reported problem in deasync tracked back to an issue in node that was never addressed: nodejs/node-v0.x-archive#25831 (because of the stigma associated with the deasync pattern itself)

Timer handles could mangle one another, apparently. The bug as originally described no longer appears to exist (I can freely unwrap a number of layers of nested async promises/setTimeouts just fine). So that shouldn't matter.

This implements the ability to use require on .mjs files, loaded
via the esm loader, using the same tradeoffs that top level await
makes in esm itself.

What this means: If possible, all execution and evaluation is done
synchronously, via immediately unwrapping the execution's component
promises. This means that any and all existing code should have no
observable change in behavior, as there exist no asynchronous modules as
of yet. The catch is that once a module which requires asynchronous
execution is used, it must yield to the event loop to perform that
execution, which, in turn, can allow other code to execute before the
continuation after the async action, which is observable to callers of
the now asynchronous module. If this matters to your callers, this means
making your module execution asynchronous could be considered a breaking
change to your library, however in practice, it will not matter for most
callers. Moreover, as the ecosystem exists today, there are zero
asynchronously executing modules, and so until there are, there are no
downsides to this approach at all, as no execution is changed from what
one would expect today (excepting, ofc, that it's no longer an error to
require("./foo.mjs").

Ref: nodejs/modules#308
Ref: https://github.com/nodejs/modules/issues/299
Ref: nodejs/modules#454
…pport is in, and until then, the results are a bit off
@weswigham weswigham force-pushed the support-require-esm-in-the-style-of-TLA branch from 5a8c931 to afe50b9 Compare December 10, 2019 23:06
@devsnek
Copy link
Member

devsnek commented Dec 10, 2019

@weswigham that's not the only issue... https://github.com/abbr/deasync/issues

there are several unsolved compatibility issues listed, and unless you could prove they're all fixed and that no new ones would pop up, i don't see how we can use this approach.

Copy link
Contributor

@jkrems jkrems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would introduce official support for deasync to node core. As in: It would be possible to write code that awaits arbitrary async work using a synchronous call. I don't think that there should be an official node API that allows awaiting in the middle of synchronous call chains. All node APIs are designed assuming that certain patterns are safe, like attaching event listeners on returned objects within the same synchronous flow. With this change, innocent looking code would be able to trigger error events before event listeners have been added etc..

I don't believe node's own APIs are compatible with the feature introduced here. Also, adding this API to node moves node away from the expected execution flow of JS which is that code runs to completion, one event loop tick at a time.

@weswigham
Copy link
Author

weswigham commented Dec 10, 2019

there are several unsolved compatibility issues listed, and unless you could prove they're all fixed and that no new ones would pop up, i don't see how we can use this approach.

Aside: you can't prove software is bug free, per sey, you can just test extensively to increase the your confidence that faults likely don't exist~

Also, this is a much, much higher level primitive than deasync exposed - this is essentially just an off-brand await, implemented in the c++ engine that attempts to avoid spinning event loop turns, if possible. deasync left people to compose their own sync methods, which often meant that exceptions weren't handled and got left hanging (and you do have to explicitly unwrap the exception out of the promise and rethrow it into the context to get it right, which is a notable thing missing in may of those issues). IMO, deasync's implementation is also crazy roundabout, since it relies on node-style-continuation callbacks, and quietly fails if the callback isn't actually node-style (eg, error is swapped, the the return value is the error). By only unwrapping promises, this is way, way simpler. And a solid half of it's issues are just related to it being a native module that doesn't always build when the node api changes.

I'm off topic - we shouldn't be discussing an external library that does a kinda-similarish-but-not-really thing to what some internals are doing in this PR. We should be looking to increase confidence in this PR with concrete tests that help us believe it's good, or provide concrete testable examples as to why it's not.

@weswigham
Copy link
Author

I don't believe node's own APIs are compatible with the feature introduced here

I don't believe node's whole ecosystem is reasonably compatible with es modules without the feature introduced here.

@devsnek
Copy link
Member

devsnek commented Dec 10, 2019

deasync internals and this pr do the same thing, even if the c++ is slightly different. I'm pointing to previous examples of this approach as being inherently unstable for generalized use in node.

@weswigham
Copy link
Author

weswigham commented Dec 10, 2019

deasync internals and this pr do the same thing

deasync internals just exports raw uv_run, which is absolutely not what this is doing - this has a fair bit more around that to handle unwrapping and executing promises appropriately (and actually checking loop completion state), all of which is also only doable in the c++ layer. node internals safely use uv_run elsewhere - it's a matter of maintaining the expected usage of the event loop - as is, this is way closer to what the platform is already doing in node_main_instance.cc (including using the caller's environment's loop, rather than the "default" loop, something deasync failed to do, which is likely the cause of the issue with Workers).

@weswigham
Copy link
Author

With this change, innocent looking code would be able to trigger error events before event listeners have been added etc..

Only if that code comes from an es module. And es modules can already do that amongst themselves, thanks to TLA. Ergo, nothing has changed there. This is just porting the ability to utilize that behavior to the cjs resolver, which is immensely useful for compatibility.

@bmeck
Copy link
Member

bmeck commented Dec 16, 2019

@weswigham

Spec-wise, would there anything preventing one, from, say, converting the cjs wrapper function to a wrapper generator and having an invocation of require yield that generator?

No, CJS isn't JS and isn't bound by any TC39 semantics; though, you would need to do a much more thorough transform. Every call site of require() needs every execution context to be in a yield position for such a transform to properly be written in JS (including WASM execution contexts [theoretically possible if you also transform all the WASM?], dynamic codegen execution contexts like Function/eval (would need another hook for V8 to intercept direct eval src and transform it), and C++ execution contexts [improbable?]). Either way, this kind of thinking is about how to take non-JS and make it JS which I don't really find compelling to say that the non-transformed source is JS if it requires a very invasive transform. JS is turing complete so we can compile pretty much anything to it; the questions in this PR's comments are more about the source passed to Node being JS, not if Node can transform something that isn't JS into JS.

@jkrems
Copy link
Contributor

jkrems commented Dec 16, 2019

Every call site of require() needs every execution context to be in a yield position

Just to make sure the scope of this is clear: require is a function call that currently may exist in any expression position, not only in the top-most function body. So a more extreme way to put it would be "if require calls turn into a yield expression, all functions running in node would have to become generators". "All functions" may be too broad but given how dynamic JS is, it's pretty close.

So I would say: Yes, there is something spec-wise to prevent it. Because that transformation would realistically also apply to all "real"/spec'd JS (ES modules), not just to CJS. Otherwise call sites into functions that use inline-require would break. That's of course unless there would be a way to isolate CJS being called from JS code.

@ljharb
Copy link
Member

ljharb commented Dec 16, 2019

There’s also makeRequireFunction in ESM that would need to act the same as CJS require.

@jkrems
Copy link
Contributor

jkrems commented Dec 18, 2019

One thing that we ran out of time for in our call: If the idea is that require(esm) can run a syncified version of the entire module loading pipeline, how does this affect concurrent import jobs? E.g. say an import job is currently active that touches on a subset of the same modules involved in the required esm. What happens to that job? Do we somehow prevent that job to yield back to the event loop? If so, how? Or would we effectively say that all import jobs become synchronous (in relation to the main thread) and accept that loading and compiling code in the background is no longer a feature of ESM that node supports?

@bmeck
Copy link
Member

bmeck commented Dec 18, 2019

@jkrems

say an import job is currently active that touches on a subset of the same modules involved in the required esm. What happens to that job?

Sharing and graph execution prior to the queued async job, same as what import() can do.

// imagine a and b share a module c
const a = import('will load after b (long fetch lets say)');
const b = import('will load before a');

Can lead to c => b => a even though a queued first.

Do we somehow prevent that job to yield back to the event loop? If so, how?

I don't fully understand this question. Off Realm resolvers would be a separate loop in the same way a Worker is. The resolver would yield back to its own event loop eventually still. The main realm would be paused via a mechanism like Atomics.wait.

Or would we effectively say that all import jobs become synchronous (in relation to the main thread) and accept that loading and compiling code in the background is no longer a feature of ESM that node supports?

Not necessarily, you only need to perform locking for synchronous call sites. This is similar to XHR differentiating sync vs async but having the same API (probably not the best example).

@jkrems
Copy link
Contributor

jkrems commented Dec 18, 2019

@bmeck What I meant is the following:

// Simple example, it becomes more complicated with deeper nesting

import('a');
// create import job for a but continue execution.
// create a callback to process the result of loading / instantiate the module on the main thread.

require('a');
// wants to reuse import job for a but that job is currently set up to yield back to the main thread
// when it's not actively interacting with the Isolate.
// What does it do? Does it modify the original job? Does it synchronously set the job's resolutions?
// Also implied: Does this mean that everything to do with import needs to happen in C++?

Right now import isn't one chunk of work that blocks the main thread for an extended period of time (or rather: it doesn't have to be). This is one of the major advantages over require: Loading and compiling code doesn't prevent execution on the main thread.

A possible import flow would be:

  MT                    OT
import
  |                  resolveUrl
(unblocked!)        loadResource
  |                  transform
CompileModule
  |
(unblocked)         compileSource
  |
execute

Multiple concurrent imports aren't an issue - none of them blocks the main thread and could prevent progress in the others. But if we use normal task scheduling on the main thread for loader jobs (which we do right now), then blocking the main thread while a loader job is running will be a problem.

One solution is indeed blocking the main thread, via Atomics.wait or any other mechanism doesn't really matter here. But I haven't seen anything saying out loud that this will mean that ESM loading in node will de-facto have the same main thread blocking issues that CJS currently has. If that's the case, we effectively have a sync loader for ESM that as an implementation detail makes blocking calls to another thread where technically async code is running.

@bmeck
Copy link
Member

bmeck commented Dec 18, 2019

// wants to reuse import job for a but that job is currently set up to yield back to the main thread

I don't understand this, the each ModuleJob is independent and only the entire graph yields back (though I'm unclear on the "yield back" phrasing).

// What does it do? Does it modify the original job? Does it synchronously set the job's resolutions?

Modify and resolve, both synchronously. I'm unclear on the problem being framed with that as ModuleJob is not a Promise in and of itself.

// Also implied: Does this mean that everything to do with import needs to happen in C++?

Not to my knowledge

@jkrems
Copy link
Contributor

jkrems commented Dec 18, 2019

I don't understand this, the each ModuleJob is independent and only the entire graph yields back (though I'm unclear on the "yield back" phrasing).

Yield back as "allow JS to run on the main thread". JS can run on the main thread while a file is loading for example. The way the module job is currently written, we do yield control of the main thread all the time, not just when the entire module graph has been loaded. Which means that the individual steps that need to run on the main thread (like creating objects in the Isolate) have to be scheduled somehow. Right now we use promises to do it. If we don't call them, nothing happens.

So if we can't use promises to schedule those main thread units of work, what is the replacement? Will we have a separate scheduler that needs to be called from the main event loop but also can be called blocking?

I'm unclear on the problem being framed with that as ModuleJob is not a Promise in and of itself.

The module job isn't a Promise but uses Promises to schedule work. That's why I used "resolutions" - a job depends on many resolutions right now, one for each time it needs to do work on the main thread.

@weswigham
Copy link
Author

So if we can't use promises to schedule those main thread units of work, what is the replacement?

ESM graph execution actually isn't "scheduled" at all yet. It's 100% sync (as TLA isn't actually in yet). And, to be clear: to be to spec for TLA, you can't naively hook it up with just chained promises, or you don't preserve sync module graphs, which is required. Not that any of that really matters, since v8 wires up all the executions anyway - we just call .execute on the root v8 module object. Today, that's always sync. Post-TLA, it'll return a promise or not. The non-promise return will indicate execution was purely sync, the promise return'll indicate async execution was required. We can then return that promise or throw (which seems preferable based on today's discussion) as desired. All that ModuleJob scheduling chicanery is actually handled by v8 itself - we just kick it off by specifying an entrypoint (and doing the resolution).

The module job isn't a Promise but uses Promises to schedule work. That's why I used "resolutions" - a job depends on many resolutions right now, one for each time it needs to do work on the main thread.

Yes and no. It does right now, but it doesn't need to. It's implemented as such to be API forward compatible with potential in-thread async resolver hooks; but those are problematic for other reasons, so running them off-thread (and simplifying the API to not be unnecessarily async tagged) is nicer anyway (plus, the API is internal anyway, so it was a bit of a premature over-design).

@jkrems
Copy link
Contributor

jkrems commented Dec 18, 2019

ESM graph execution actually isn't "scheduled" at all yet.

I wasn't talking about execution, I was talking about load and compile. My graph above ended with a single execute but I maybe should've been clear that it was only included for illustrative purposes as "the thing happening after the whole process is over". :)

All that ModuleJob scheduling chicanery is actually handled by v8 itself - we just kick it off by specifying an entrypoint (and doing the resolution).

It's not, the module job is handled by the embedder which in this case is us. We lifted the name "module job" from the HTML spec which is implemented in Chrome/Blink. V8 has no orchestration for resolution and resource loading built-in afaik.

Yes and no. It does right now, but it doesn't need to.

That's what I wrote above. :) But I also mentioned very specific advantages for allowing fetch&compile to happen while execution of the main thread continues. It's not just "because we can". It's because it's a known issue with require that blocking the main thread while dependencies load and parse can be costly. At least from my side running async resolvers in-thread was never a goal.

@BridgeAR
Copy link
Member

Without looking through all the comments: this has four -1 and there was no movement since a couple of weeks. Should this stay open for further discussions or would it be fine to close this?

@MylesBorins
Copy link
Contributor

@BridgeAR it is actively being worked on please do not close

@Raynos
Copy link
Contributor

Raynos commented May 28, 2020

From reading the test suite changes it looks like this PR will fix #33605

@GeoffreyBooth
Copy link
Member

As of right now, this is missing:

  • Support for "type": "module" in the cjs loader. (where did that go? I thought we already had something that did that internally - getPackageType or something?)

@weswigham I think #33229 should have resolved this part, at least. Are you still working on this PR?

@dfabulich
Copy link
Contributor

In a comment on Wes' proposal issue for this PR @weswigham wrote:

Ah, yep, nodejs/node#34558 landed yesterday. I might be able to remake/update nodejs/node#30891 in the hopefully nearish future.

So, yes, please don't close this PR!

@addaleax
Copy link
Member

I’ll close this as an approach that spins the event loop while it is already running is currently a non-starter purely from a technical point of view, as mentioned above, and there hasn’t been activity in a year here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. experimental Issues and PRs related to experimental features. module Issues and PRs related to the module subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.