[WIP] Support requiring .mjs files #30891
cc @nodejs/modules-active-members
Discussion for this PR in nodejs/modules#454.
this cannot be generalized to TLA modules because their execution could be waiting on something not owned by the libuv event loop, which would result in execution deadlocking.
You have an example? TLA can already trivially deadlock in a huge number of ways (there's an faq entry defending it), what are you thinking of?
@weswigham literally any async behaviour that isn't a one-shot call to a libuv api. interestingly we had an issue opened a few days ago asking about deasync behaviour, you might find it interesting: #30634
lib/internal/modules/cjs/loader.js
 * So long as the module doesn't contain TLA, this will be sync, otherwise it
 * appears async
 */
promiseWait(instantiated.evaluate(-1, false));
I believe this will throw if the last expression of a module source returns a rejected promise.
Oooo spicy - is there a way to distinguish that from a TLA module that threw once #30370 is in?
once TLA is enabled, the result of module.evaluate() is either undefined or Promise<undefined>.
Hah, yeah, I guess I just don't need this promiseWait call until #30370 is in. I'll remove it for now.
once TLA is enabled, the result of module.evaluate() is either undefined or Promise<undefined>.
Suppose I have
// @filename: mod.mjs
new Promise(resolve => resolve(undefined))
how am I to distinguish that from a module with TLA? I think conflating the evaluation of the final expression and the completion promise, API-wise, is problematic.
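The ambiguity raised here can be sketched in plain JavaScript. This is only an illustration under assumptions: `fromLastExpression`, `fromTopLevelAwait`, and `describeResult` are hypothetical names, and nothing below calls the real `evaluate()` API. It shows that a caller who only inspects the returned value cannot tell a module whose final expression is a promise from one that used top-level await.

```javascript
// Hypothetical stand-ins for what an evaluate() call might hand back.
// Case 1: the module's last expression happens to be a promise.
const fromLastExpression = new Promise((resolve) => resolve(undefined));
// Case 2: an async body, standing in for a module that uses top-level await.
const fromTopLevelAwait = (async () => { await null; })();

// Hypothetical helper: the only information available is the value itself.
function describeResult(result) {
  return result instanceof Promise ? 'promise' : typeof result;
}

console.log(describeResult(fromLastExpression)); // promise
console.log(describeResult(fromTopLevelAwait));  // promise
console.log(describeResult(undefined));          // undefined
```

Both promise cases look identical from the outside, which is the conflation the comment above objects to.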
Such as? AFAIK all the library ones go into the event loop or the microtask queue, no (both of which are being advanced)? As far as I know, this centralization of async handling is, like, a major feature of javascript/ |
Module._extensions['.mjs'] = function(module, filename) {
  const ESMLoader = asyncESM.ESMLoader;
  const url = `${pathToFileURL(filename)}`;
  const job = ESMLoader.getModuleJobWorker(url, 'module');
so this just skips the esm resolver entirely? this seems like a rather dangerous divergence.
That's because we already resolved the specifier, in order to know we needed the esm loader. The cjs resolver and the esm resolver are supposed to resolve the same specifier to the same thing, so it should be fine. This is part of why I kept saying it really should only be considered to be one loader.
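For readers unfamiliar with the conversion used in the diff above, here is a small sketch of turning an already-resolved filename into the file URL that an ESM loader keys on. The filename `foo.mjs` is a made-up example; `pathToFileURL` and `fileURLToPath` are the public `node:url` functions. It says nothing about whether the two resolvers actually agree, which is the point under dispute.

```javascript
const path = require('node:path');
const { pathToFileURL, fileURLToPath } = require('node:url');

// Hypothetical absolute filename, standing in for what the CJS resolver produced.
const filename = path.resolve('foo.mjs');

// Same conversion as in the diff: filename -> file: URL string.
const url = `${pathToFileURL(filename)}`;
console.log(url);

// The conversion is reversible, so no information about the file is lost.
const roundTripped = fileURLToPath(url);
console.log(roundTripped === filename);
```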
@weswigham third party libraries can run their own async logic and then push the result into the microtask queue to be handled. additionally, node can't warn/throw/etc when it won't work, it will just sit there keeping your cpu pinned to 100% while it busy loops.
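A rough analogue of the "busy loops forever" failure mode, written in plain JavaScript. This is not the mechanism in the PR, which spins the event loop from native code. It is the simplest case of the same shape: a synchronous wait whose condition can only be satisfied by work the waiter never runs. The 50 ms deadline exists only so the sketch terminates.

```javascript
// Sketch only: a synchronous wait on a flag that a promise callback would set.
let settled = false;
Promise.resolve().then(() => { settled = true; });

// The `.then` callback sits in the microtask queue, which is only drained
// once the current synchronous code returns. The loop therefore never sees it.
const deadline = Date.now() + 50;
while (!settled && Date.now() < deadline) {
  // busy loop: burns CPU and makes no progress
}

console.log(settled); // false
```

Without the deadline the loop would never exit, and nothing in the runtime could detect that and warn.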
That still works, no? Or does
i'm not entirely familiar with how these libraries work, there are more details of weirdness in #30634 and the deasync repo.
Looks like the reported problem in Timer handles could mangle one another, apparently. The bug as originally described no longer appears to exist (I can freely unwrap a number of layers of nested async promises/
This implements the ability to use require on .mjs files, loaded via the esm loader, using the same tradeoffs that top level await makes in esm itself.
What this means: If possible, all execution and evaluation is done synchronously, via immediately unwrapping the execution's component promises. This means that any and all existing code should have no observable change in behavior, as there exist no asynchronous modules as of yet. The catch is that once a module which requires asynchronous execution is used, it must yield to the event loop to perform that execution, which, in turn, can allow other code to execute before the continuation after the async action, which is observable to callers of the now asynchronous module. If this matters to your callers, this means making your module execution asynchronous could be considered a breaking change to your library, however in practice, it will not matter for most callers. Moreover, as the ecosystem exists today, there are zero asynchronously executing modules, and so until there are, there are no downsides to this approach at all, as no execution is changed from what one would expect today (excepting, ofc, that it's no longer an error to require("./foo.mjs")).
Ref: nodejs/modules#308
Ref: https://github.com/nodejs/modules/issues/299
Ref: nodejs/modules#454
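The "other code can execute before the continuation" point can be illustrated with a small sketch. The function `moduleBody` is a hypothetical stand-in for a module body containing a top-level await, and the `queueMicrotask` callback stands in for unrelated work that was already queued. None of this uses the loader itself.

```javascript
const order = [];

// Unrelated work that was queued before the "module" started running.
queueMicrotask(() => order.push('other queued work'));

// Hypothetical module body; `await null` stands in for a top-level await.
async function moduleBody() {
  order.push('module start');
  await null;
  order.push('module continuation');
}

moduleBody().then(() => console.log(order));
// [ 'module start', 'other queued work', 'module continuation' ]
```

If `moduleBody` had no `await`, both of its pushes would land before the queued work, which is the observable difference a caller could depend on.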
…pport is in, and until then, the results are a bit off
5a8c931 to afe50b9
@weswigham that's not the only issue... https://github.com/abbr/deasync/issues there are several unsolved compatibility issues listed, and unless you could prove they're all fixed and that no new ones would pop up, i don't see how we can use this approach.
This would introduce official support for deasync to node core. As in: It would be possible to write code that awaits arbitrary async work using a synchronous call. I don't think that there should be an official node API that allows awaiting in the middle of synchronous call chains. All node APIs are designed assuming that certain patterns are safe, like attaching event listeners on returned objects within the same synchronous flow. With this change, innocent-looking code would be able to trigger error events before event listeners have been added, etc.
I don't believe node's own APIs are compatible with the feature introduced here. Also, adding this API to node moves node away from the expected execution flow of JS which is that code runs to completion, one event loop tick at a time.
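The "attach listeners within the same synchronous flow" pattern mentioned above looks roughly like this. `startWork` is a hypothetical API written in the usual Node style, not a real core function. It reports its error on a later tick, relying on the caller's code running to completion first.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical API: returns an emitter and reports failure on a later tick,
// so callers have a chance to attach listeners first.
function startWork() {
  const emitter = new EventEmitter();
  process.nextTick(() => emitter.emit('error', new Error('boom')));
  return emitter;
}

const seen = [];
const work = startWork();
// Safe today: no tick can run between startWork() returning and this line.
work.on('error', (err) => seen.push(err.message));

setImmediate(() => console.log(seen)); // [ 'boom' ]
```

If a synchronous call placed between `startWork()` and `work.on(...)` were able to run pending ticks, the `'error'` event would fire with no listener attached, and an `EventEmitter` throws in that case.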
Aside: you can't prove software is bug free, per se, you can just test extensively to increase your confidence that faults likely don't exist~ Also, this is a much, much higher level primitive than I'm off topic - we shouldn't be discussing an external library that does a kinda-similarish-but-not-really thing to what some internals are doing in this PR. We should be looking to increase confidence in this PR with concrete tests that help us believe it's good, or provide concrete testable examples as to why it's not.
I don't believe node's whole ecosystem is reasonably compatible with es modules without the feature introduced here.
deasync internals and this pr do the same thing, even if the c++ is slightly different. I'm pointing to previous examples of this approach as being inherently unstable for generalized use in node.
Only if that code comes from an es module. And es modules can already do that amongst themselves, thanks to TLA. Ergo, nothing has changed there. This is just porting the ability to utilize that behavior to the cjs resolver, which is immensely useful for compatibility.
No, CJS isn't JS and isn't bound by any TC39 semantics; though, you would need to do a much more thorough transform. Every call site of |
Just to make sure the scope of this is clear: So I would say: Yes, there is something spec-wise to prevent it. Because that transformation would realistically also apply to all "real"/spec'd JS (ES modules), not just to CJS. Otherwise call sites into functions that use inline- |
There’s also makeRequireFunction in ESM that would need to act the same as CJS require. |
One thing that we ran out of time for in our call: If the idea is that |
Sharing and graph execution prior to the queued async job, same as what
// imagine a and b share a module c
const a = import('will load after b (long fetch lets say)');
const b = import('will load before a');
Can lead to
I don't fully understand this question. Off Realm resolvers would be a separate loop in the same way a Worker is. The resolver would yield back to its own event loop eventually still. The main realm would be paused via a mechanism like
Not necessarily, you only need to perform locking for synchronous call sites. This is similar to XHR differentiating sync vs async but having the same API (probably not the best example). |
@bmeck What I meant is the following:
// Simple example, it becomes more complicated with deeper nesting
import('a');
// create import job for a but continue execution.
// create a callback to process the result of loading / instantiate the module on the main thread.
require('a');
// wants to reuse import job for a but that job is currently set up to yield back to the main thread
// when it's not actively interacting with the Isolate.
// What does it do? Does it modify the original job? Does it synchronously set the job's resolutions?
// Also implied: Does this mean that everything to do with import needs to happen in C++? Right now A possible
Multiple concurrent One solution is indeed blocking the main thread, via |
I don't understand this, each ModuleJob is independent and only the entire graph yields back (though I'm unclear on the "yield back" phrasing).
Modify and resolve, both synchronously. I'm unclear on the problem being framed with that as ModuleJob is not a Promise in and of itself.
Not to my knowledge
Yield back as "allow JS to run on the main thread". JS can run on the main thread while a file is loading for example. The way the module job is currently written, we do yield control of the main thread all the time, not just when the entire module graph has been loaded. Which means that the individual steps that need to run on the main thread (like creating objects in the Isolate) have to be scheduled somehow. Right now we use promises to do it. If we don't call them, nothing happens. So if we can't use promises to schedule those main thread units of work, what is the replacement? Will we have a separate scheduler that needs to be called from the main event loop but also can be called blocking?
The module job isn't a Promise but uses Promises to schedule work. That's why I used "resolutions" - a job depends on many resolutions right now, one for each time it needs to do work on the main thread. |
ESM graph execution actually isn't "scheduled" at all yet. It's 100% sync (as TLA isn't actually in yet). And, to be clear: to be to spec for TLA, you can't naively hook it up with just chained promises, or you don't preserve sync module graphs, which is required. Not that any of that really matters, since v8 wires up all the executions anyway - we just call
Yes and no. It does right now, but it doesn't need to. It's implemented as such to be API forward compatible with potential in-thread async resolver hooks; but those are problematic for other reasons, so running them off-thread (and simplifying the API to not be unnecessarily async tagged) is nicer anyway (plus, the API is internal anyway, so it was a bit of a premature over-design). |
I wasn't talking about execution, I was talking about load and compile. My graph above ended with a single execute but I maybe should've been clear that it was only included for illustrative purposes as "the thing happening after the whole process is over". :)
It's not, the module job is handled by the embedder which in this case is us. We lifted the name "module job" from the HTML spec which is implemented in Chrome/Blink. V8 has no orchestration for resolution and resource loading built-in afaik.
That's what I wrote above. :) But I also mentioned very specific advantages for allowing fetch&compile to happen while execution of the main thread continues. It's not just "because we can". It's because it's a known issue with |
Without looking through all the comments: this has four -1 and there has been no movement for a couple of weeks. Should this stay open for further discussions or would it be fine to close this?
@BridgeAR it is actively being worked on, please do not close
From reading the test suite changes it looks like this PR will fix #33605
@weswigham I think #33229 should have resolved this part, at least. Are you still working on this PR?
8ae28ff to 2935f72
In a comment on Wes' proposal issue for this PR @weswigham wrote:
So, yes, please don't close this PR!
I’ll close this as an approach that spins the event loop while it is already running is currently a non-starter purely from a technical point of view, as mentioned above, and there hasn’t been activity in a year here.
This implements the ability to use require on .mjs files, loaded via the esm loader, using the same tradeoffs that top level await makes in esm itself.
What this means: If possible, all execution and evaluation is done synchronously, via immediately unwrapping the execution's component promises. This means that any and all existing code should have no observable change in behavior, as there exist no asynchronous modules as of yet. The catch is that once a module which requires asynchronous execution is used, it must yield to the event loop to perform that execution, which, in turn, can allow other code to execute before the continuation after the async action, which is observable to callers of the now asynchronous module. If this matters to your callers, this means making your module execution asynchronous could be considered a breaking change to your library, however in practice, it will not matter for most callers. Moreover, as the ecosystem exists today, there are zero asynchronously executing modules, and so until there are, there are no downsides to this approach at all, as no execution is changed from what one would expect today (excepting, ofc, that it's no longer an error to require("./foo.mjs")).
As of right now, this is missing:
"type": "module" in the cjs loader. (where did that go? I thought we already had something that did that internally - getPackageType or something?)
node, help appreciated)
ModuleWrap.evaluate may eventually return a promise.