Get an iterator of results #243
Tagging @V0ldek for notifications.
This is a really big feature. The current engine does not support pausing/resuming. It also doesn't play well with the current Engine-Recorder-Sink architecture: would the Recorder have to pause the engine? There are eight different places where a match might be reported in the current …

Not saying this is impossible, but it would almost certainly be an entirely new engine. In particular, I suspect that simply adding this capability to the main engine would screw with SIMD code generation, even when the caller intends to consume the entire iterator immediately anyway.

If the concern here is memory consumption, there is a workaround with multithreading. You can spin up one thread for the engine and another one as the consumer, and pass as the sink a wrapper around a bounded-capacity queue/channel (e.g. crossbeam's …).

I am going to file this into the "Future" category. We could explore adding multithreaded Sink support (maybe even an …).
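The multithreading workaround can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not rsonpath's API: the `Sink` trait, `ChannelSink`, `run_engine` and `spawn_engine` are hypothetical stand-ins (the real sink trait may have a different name and signature), and instead of crossbeam it swaps in the standard library's `std::sync::mpsc::sync_channel`, which also gives a bounded channel. A full channel makes the engine thread block in `send`, so unconsumed matches never pile up beyond the chosen capacity.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Hypothetical stand-in for the engine's result sink; the real rsonpath
/// trait may differ in name and signature.
pub trait Sink<D> {
    fn add_match(&mut self, data: D);
}

/// Sink that forwards each match into a bounded channel. `send` blocks while
/// the channel is full, so the engine thread stalls until the consumer
/// catches up, which bounds how many matches wait in memory.
pub struct ChannelSink<D> {
    tx: SyncSender<D>,
}

impl<D> Sink<D> for ChannelSink<D> {
    fn add_match(&mut self, data: D) {
        // If the consumer has hung up, further matches are discarded.
        let _ = self.tx.send(data);
    }
}

/// Hypothetical engine: reports every element containing `needle` as a match.
/// It only stands in for a real query evaluation.
fn run_engine<S: Sink<String>>(input: &[String], needle: &str, sink: &mut S) {
    for item in input {
        if item.contains(needle) {
            sink.add_match(item.clone());
        }
    }
}

/// Runs the engine on its own thread and returns the receiving end,
/// which the caller can consume like an iterator.
pub fn spawn_engine(input: Vec<String>, needle: String, capacity: usize) -> Receiver<String> {
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        let mut sink = ChannelSink { tx };
        run_engine(&input, &needle, &mut sink);
        // `sink` (and its sender) is dropped here, which ends the consumer's loop.
    });
    rx
}

/// Usage: consume matches one at a time on the calling thread.
pub fn count_kind_a() -> usize {
    let docs: Vec<String> = (0..10)
        .map(|i| {
            let kind = if i % 2 == 0 { "a" } else { "b" };
            format!("{{\"id\":{},\"kind\":\"{}\"}}", i, kind)
        })
        .collect();
    let mut count = 0;
    for m in spawn_engine(docs, "\"kind\":\"a\"".to_string(), 2) {
        // A slow consumer (e.g. a DB insert) would go here; the match is
        // freed as soon as it is dropped.
        drop(m);
        count += 1;
    }
    count
}
```

The design choice here is that backpressure comes for free from the bounded channel: the engine itself needs no pause/resume support, because blocking inside the sink effectively pauses it.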
I think iterating through high-level events would also allow a SAX-style API. That would be really nice for applications that need the underlying classifiers but not the query compilation. I suspect this is hard to do, but it would make it possible to build efficient validation, as in simdjson. The automaton construction could then be built on top of that API, and adding iterators would then simply mean changing this interface and the result-reporting code.
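As a rough illustration of the SAX-style idea, the sketch below lazily yields structural events from raw bytes and then builds a check purely on top of that event stream. `Event`, `Events` and `brackets_balanced` are hypothetical names, not rsonpath types; a real implementation would sit on the SIMD classifiers rather than a byte-by-byte loop, and the bracket check is far weaker than full JSON validation. It only shows the layering.

```rust
/// Hypothetical high-level structural events, roughly what a SAX-style layer
/// over the classifiers might expose.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Open(u8),  // `{` or `[`
    Close(u8), // `}` or `]`
    Colon,
    Comma,
}

/// Lazily yields structural events, skipping string contents (including
/// escaped quotes). All state needed to resume lives in the struct, so the
/// caller can stop between any two events and continue later.
pub struct Events<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Events<'a> {
    pub fn new(input: &'a str) -> Self {
        Events { bytes: input.as_bytes(), pos: 0 }
    }
}

impl Iterator for Events<'_> {
    type Item = Event;

    fn next(&mut self) -> Option<Event> {
        while self.pos < self.bytes.len() {
            let b = self.bytes[self.pos];
            self.pos += 1;
            match b {
                b'"' => {
                    // Skip to the closing quote, stepping over escape sequences.
                    while self.pos < self.bytes.len() {
                        match self.bytes[self.pos] {
                            b'\\' => self.pos += 2,
                            b'"' => {
                                self.pos += 1;
                                break;
                            }
                            _ => self.pos += 1,
                        }
                    }
                }
                b'{' | b'[' => return Some(Event::Open(b)),
                b'}' | b']' => return Some(Event::Close(b)),
                b':' => return Some(Event::Colon),
                b',' => return Some(Event::Comma),
                _ => {} // whitespace and atom bytes carry no structure here
            }
        }
        None
    }
}

/// A consumer built only on the event stream: checks that brackets are
/// balanced and correctly paired.
pub fn brackets_balanced(input: &str) -> bool {
    let mut stack = Vec::new();
    for ev in Events::new(input) {
        match ev {
            Event::Open(b) => stack.push(b),
            Event::Close(b) => {
                let expected = if b == b'}' { b'{' } else { b'[' };
                if stack.pop() != Some(expected) {
                    return false;
                }
            }
            _ => {}
        }
    }
    stack.is_empty()
}
```

Under this layering, a query automaton would be just another consumer of `Events`, and exposing results as an iterator would only change the reporting end.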
It would be nice to have an iterator of results, so that we can post-filter and/or handle each match with potentially slow external code, without loading all matches into RAM simultaneously.
Example use case: a JSON document with a very large list of objects, which we filter with JSONPath to obtain a sublist of them that has to be inserted into a DB. Loading them all into RAM is not possible (potentially too big), so we want to perform a slow operation on each of them and free it from memory afterwards.
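For this use case, the shape of a pull-based interface can be sketched with a hypothetical `ArrayElements` iterator: each `next` call resumes scanning from a saved cursor and yields one element of a top-level array as a borrowed slice, and the hypothetical `insert_matching` shows the post-filter plus a per-element "slow" consumer. This is a sketch under loud assumptions: it does not evaluate JSONPath at all, it assumes well-formed input, and it still holds the input document as a `&str`, so only the list of matches is never materialized.

```rust
/// Hypothetical pull-based iterator over the elements of a top-level JSON
/// array. Each `next` resumes from the saved cursor and returns one element,
/// so no collection of results is ever built.
pub struct ArrayElements<'a> {
    src: &'a str,
    pos: usize, // byte offset where scanning resumes
}

impl<'a> ArrayElements<'a> {
    /// Returns `None` if the input does not start with `[` (after whitespace).
    pub fn new(src: &'a str) -> Option<Self> {
        let start = src.find(|c: char| !c.is_whitespace())?;
        if src.as_bytes()[start] != b'[' {
            return None;
        }
        Some(ArrayElements { src, pos: start + 1 })
    }
}

impl<'a> Iterator for ArrayElements<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let src: &'a str = self.src;
        let bytes = src.as_bytes();
        // Skip whitespace and the comma separating elements.
        while self.pos < bytes.len()
            && (bytes[self.pos].is_ascii_whitespace() || bytes[self.pos] == b',')
        {
            self.pos += 1;
        }
        if self.pos >= bytes.len() || bytes[self.pos] == b']' {
            return None; // end of the array
        }
        let start = self.pos;
        let mut depth = 0usize;
        let mut in_string = false;
        while self.pos < bytes.len() {
            let b = bytes[self.pos];
            if in_string {
                match b {
                    b'\\' => self.pos += 1, // skip the escaped byte
                    b'"' => in_string = false,
                    _ => {}
                }
            } else {
                match b {
                    b'"' => in_string = true,
                    b'{' | b'[' => depth += 1,
                    b'}' | b']' if depth == 0 => break, // the outer array's `]`
                    b'}' | b']' => depth -= 1,
                    b',' if depth == 0 => break,
                    _ => {}
                }
            }
            self.pos += 1;
        }
        let end = self.pos.min(bytes.len());
        Some(src[start..end].trim_end())
    }
}

/// Post-filters elements and hands each one to `insert` (a hypothetical
/// stand-in for a slow DB call) before moving on to the next.
pub fn insert_matching<F: FnMut(&str)>(json: &str, needle: &str, mut insert: F) -> usize {
    let mut inserted = 0;
    if let Some(elements) = ArrayElements::new(json) {
        for element in elements.filter(|e| e.contains(needle)) {
            insert(element);
            inserted += 1;
        }
    }
    inserted
}
```

Because all scanning state lives in `pos`, the caller decides when the next element is produced, which is exactly the pause/resume property the current engine lacks.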