-
Notifications
You must be signed in to change notification settings - Fork 30.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stream] - callback based API additions #153
Comments
@spion doesn't your first example negate many of the benefits of using streams? i.e. if you have to buffer the whole response, what is gained by using streams at all? Also, these things could easily be userland functions, so, why not create a module and see how it works? |
Answering point 1, I don't think so. The reality is that there are valid use cases with relatively small data where the costs (of complexity and performance due to incremental processing) far outweight the benefits. Just look at all the http request modules - the most popular ones always provide ways to collect all the data. Regarding point 2, I'm experimenting with a module that does some of this (and more) with promises. However I still believe that core should at least be self-consistent. edit: I'll try and make a module that implements the methods and test out some of the use cases I have in mind. |
@spion cool, looking forward to seeing your module. I agree about the lack of consistency but I think node should be looking to (eventually) move to the whatwg streams spec which has pretty good promise integration baked in. |
@phpnode I agree. I'm just suggesting a relatively cheap (in terms of complexity / backward compatibility) ergonomic improvement that could be done in the meantime. (I want to be able to tell bluebird users to promisifyAll their stream prototypes, haha) |
+1 this is an extremely common case that should be solved at the core level. Most other platforms that provide streams have this method and it is immensely useful for a lot of use cases. |
thunk-stream wrap stream to thunk function: var thunkStream = require('thunk-stream'),
stream = require('stream'),
fs = require('fs');
var readableStream = fs.createReadStream('index.js');
var passStream = new stream.PassThrough();
thunkStream(readableStream)(function (error) {
if (error) console.error('error', error);
else console.log('read file end.')
});
thunkStream(passStream)(function (error) {
console.log('file pass through finished.')
});
readableStream.pipe(passStream); |
Streams use callbacks where the operation in question is representable by a callback. A callback can only represent a single operation that eventually passes or fails. For instance, As a counterexample,
Because this can be done in userland with concat-stream; and there's currently no need to collect and buffer an entire stream internally in node, this is not a viable addition to the streams API. Additions to this API should either provide functionality unattainable through userland, or that is directly needed by other core modules.
Re:
I'm definitely interested in that approach, but unfortunately the whatwg spec is built around promises-as-a-core-primitive, so whatever (non-trivial) amount of work necessary to properly support promises as the primitive representing single-operation-pass-or-fail in node core would have to happen first. i.e., adopting the streams spec (or even cribbing from it) would be great, but it is a very long way off, IMO. |
I disagree. As it is, the current node API is in a no-person's land. Its neither
Since the API can't be made more low-level and less complex without breaking backward compatibility, features that don't cost much yet add immense benefit for consumers should be considered even if they don't provide any benefit to core. Streams are already a "batteries included" feature - so why not provide decent batteries while at it? Otherwise I don't see the point. That raises another interesting question, what is the actual strategy regarding the current core library? Will it be deprecated in favor of
You could also argue that I don't understand how this would interfere with var s1 = stream1(opts);
var s2 = stream2(opts);
var s3 = stream3(opts);
s1.pipe(s2).pipe(s3);
async.parallel(s1.ended.bind(s1), s2.ended.bind(s2), s3.ended.bind(s3), done); |
So do you mean from your example that each stream would need a define an Also this would work very well for people using the var s1 = stream1(opts);
var s2 = stream2(opts);
var s3 = stream3(opts);
s1.pipe(s2).pipe(s3);
s3.on('finish', done); Unless we inherently needed to inform the next stream that the previous one had ended, (which it already knows about internally anyway). It seems like we are missing the huge benefits of our To reiterate, the point with streams is that the whole data is not dealt with in one A Similarly doing OK so in other languages we can do the equivalent of Don't get me wrong having a way to do light-weight threading (in our case isolated super light-weight JS runtime) would be be awesome, however the single-threaded runtime is what make currently makes iojs / Node! This kind of defeats the whole flexibility of JavaScript's asynchronous model. There are also multiple ways to buffer a whole stream, some of which are async and others which would block the rest of the runtime. IMO it's like we are taking two steps backwards from everything that iojs / Node has given us with it's evented ( What we need is to agree on an abstraction that handles asynchronous and it's internals that deals with this. with regards to:
It may not be as simple as a callback but it is pretty low-level handling piece of data in such granular pieces, that is the point.
No one is saying that the streams API is perfect but I strongly disagree with it not being well crafted. @isaacs et al. put a great deal of time and work into forming what we currently know of as streams and what we have now is a pattern that makes all streams following that pattern automatically compatible with one another, which IMO is awesome! |
Streams are not a "batteries included" feature. They are an abstraction core uses internally to represent the
libuv.js would become a dependency of node/iojs, one which the current core library is built on top of. Alternate implementations of core functionality could be built in pure JS. However, since there's a massive package ecosystem built around the current set of abstractions, there's going to be a strong incentive to stay compatible with what exists at present. In other words, libuv.js won't solve the problem of "how to evolve core" so much as it will make maintaining core more straightforward, and open up the possibility of building different abstractions on top of libuv to more people. Those abstractions are not guaranteed to be "npm-ecosystem-compatible."
An errored stream is not the same as an errored callback operation. A stream may have a backing resource which may be inspected on error -- for example, to inspect the buffered items, or to look at a TCP socket's remote address -- before disposal. A callback operation can only represent the final state of the entire transaction. For example:
Callbacks can't represent the temporal difference between "end/error" and "close".
It's not that One other consideration to keep in mind is that it's not trivial to expand the API surface area of streams reliably: for instance, imagine getting a stream from a package that has a pegged version of Closing this issue for now:
If you'd like to discuss further, feel free to comment on this issue or find me in IRC on freenode (in #io.js) — I'm always happy to talk streams! |
@chrisdickinson That ascii diagram is a perfect explanation, and should be in the documentation somewhere, if it is not already! |
var s1 = stream1(opts);
var s2 = stream2(opts);
var s3 = stream3(opts);
s1.pipe(s2).pipe(s3);
s3.on('finish', done); And this is precisely the problem with core streams. The example looks perfectly fine - after all its no different than the examples given on the main page. But it contains one major flaw: your process will crash and you will have absolutely no idea why because the stack trace will contain no useful info whatsoever. Guess why? Because you forgot to add any error handling. I've seen this happen over and over again when people use streams. I've done it several times before, just because its awkward to handle all the errors. To me indicates that streams are quite user-hostile in their current state. (By the way, the echo server example given on the main nodejs website will crash on error too). Lets write the proper stateful, ugly wrapper for this function pipeline(done_) {
var isDone = false;
function done(err) {
if (isDone) return;
isDone = true
done(err);
}
var s1 = stream1(opts);
var s2 = stream2(opts);
var s3 = stream3(opts);
s1.pipe(s2).pipe(s3);
s3.on('finish', done);
s1.on('error', done);
s2.on('error', done);
s3.on('error', done);
} Of course, you have to be careful not to call that callback more than once. But you can't use the regular control flow abstractions from userland that ensure that for you, because the square pegs of streams wont fit the round holes of callbacks As far as I can see, the close event happening after the callback has no relation to this whatsoever, as this is just a basic usability enhancement feature. But yeah sure, I suppose we can solve all this in userland. |
Either way the situation would contain the same complexity, the point is that what you are looking for is something to abstract all of that away from you. The The problem with the approach of having one callback to handle all the errors is that you have no visibility of where the error even was, to handle it programatically. It could represent a connection or it could represent some other IO. You might not want to clear up everything based on one part of you pipeline throwing an error. The error could be parsing some chunk of data, that chunk could have corrupted somehow from an external service. There are more combinations and possibilities then there exists streams, and there always will be. The fact is that our current abstraction allows the scoping for implementing a solution to this one use case (even if it might seem like a common or obvious one). If we had only that abstraction we would have no way to retrofit the flexibility, which would be far more problematic. This is further an example of of the fact that this is in fact a lower level API, because there are more things to worry about. The components are more raw and you have to consider handling their errors. IMO that is how it should be. You have to deal with individual stream errors in most other languages as well, so it is no different here.
This is exactly where this problem should be solved if it is not already! |
Then that is an error in usage of the stream library. There should be differentiation between known unrecoverable errors that must end the stream (e.g. socket/handle closed), and intermittent recoverable failures. A case in point is dnode. It uses the Are you saying that since there are no clearly defined semantics (even though its implied that error is fatal to the stream, node core doesn't state this anywhere) for userland, streams are basically unfixable and we'd have to start all over again on top of libuv if we wanted abstractions with saner, well defined behavior? If that is so, I agree. But I don't think itsnecessary to go that far. Forgetting to add an error handler already stops your stream entirely (together with maybe hundreds of other streams by other users, might I add). I think that defines the meaning of the |
Well having a separate error to handle core protocol errors and one for stream specific designed errors could be useful. IMO I think it is better to set a
I am not saying we have to scrap what we have at all. I don't think that what we have in inherently broken either. The internals of streams are a little complex to get your head around at first glance. But I have scoured through every line of the current streams3 implementation and I believe it is incredibly well thought out. It is also very performant based on what it does! I mean if you have a stream, and it does not apply back pressure data is literally just passes straight through it. Yet the backing up of data in the case of back pressure works incredibly well. Some of the internal hooks are there serve core, but again I stand by the fact that they exist for managing low level behaviour. Perhaps a redesign of the internals with a fresh slate and experimentation with different programming paradigms could make the internals simpler, and even more performant. Particularly if they are based off of simpler core / libuv.js API's. However in terms of streams in core and userland, these are specific implementations of particular streams, not the core abstraction. The documentation is already good with resources like @substack's https://github.com/substack/stream-handbook although since there has been no releases of v0.12 regular users have not had the benefit of what streams3 has to offer, without using the (readable-stream)[https://github.com/isaacs/readable-stream] module, and so documenting new features from streams3 is not yet as widely available. (For example the use of It seems what you are looking for is a In terms of the actually streams API for handling low level To me it seems like there needs to be a clear distinction and discussion between a low level stream implementation with a flexible API and a higher-level abstraction for more common use-cases in application development. |
Streams don't have an API that adheres to the node callback convention. This makes them awkward to use with other control flow tools. The following two methods would be the most useful:
readAll([encoding], function(err, data) {})
- on readable streams it collects all data, or is called with the first errorended(function(err) {})
- Like the previous one, except it doesn't collect any data - its just called when everything ends (either successfully or at the first error).finished
could be used for writable streams.Both of the above would act as if an
.on('error', cb)
was attached.The following would also be useful:
readNext(function(err, data) { ... })
Read the next packet or object. The callback is called with the current packet on the "next tick", with the "current packet" changing for subsequent requests right before the callback is called.Would act as if a
.once('error', cb)
was attachedAnd the last method would also be useful, but I'm unsure of the semantics:
writeNext(data, [enc], function(err) { ...})
- The callback would be called when the source is allowed to continue writing. Now I'm not entirely sure what this would mean, but I suppose that for streams with buffers, this would mean "called immediately" if the buffer is not full and called after flush if it is, and for streams without buffers this would be called after flush.This should be taken as a rough sketch, as I'm not sure I understand enough about node streams internals to give a good proposal :)
edit: sorry for the close/open noise, I accidentally the button.
The text was updated successfully, but these errors were encountered: