simdjson #1892
Comments
It'd be nice to do this, yes, but there are issues. First, jq's JSON parser is incremental, and so can consume JSON texts in chunks, whereas simdjson seems to need the entire text (and does two passes on the text). That doesn't mean that we couldn't use simdjson's SIMD techniques when we have the full text, but generally those won't be large texts. I'm not sure how easily we could adapt SIMD techniques to work incrementally, though at first glance it seems feasible.
Second, jq supports multiple JSON texts in sequence. This is less tricky.
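To make the chunked-input and multiple-texts points concrete, here is a minimal Python sketch. It is neither jq's parser nor simdjson code: the `ChunkedReader` class and its `feed` method are hypothetical names, and the approach is a simple buffer-and-retry built on the standard library's `json.JSONDecoder.raw_decode`, which re-parses the buffer on every chunk instead of keeping parser state the way a truly incremental parser would.

```python
import json

JSON_WS = " \t\r\n"  # the four whitespace characters JSON allows between tokens


class ChunkedReader:
    """Hypothetical helper: buffers chunks and returns complete top-level values."""

    def __init__(self):
        self._buf = ""
        self._decoder = json.JSONDecoder()

    def feed(self, chunk, final=False):
        """Add a chunk; return the values that are now known to be complete."""
        self._buf += chunk
        values = []
        while True:
            text = self._buf.lstrip(JSON_WS)
            if not text:
                self._buf = ""
                break
            try:
                value, end = self._decoder.raw_decode(text)
            except json.JSONDecodeError:
                if final:
                    raise  # no more input is coming, so this is a real error
                self._buf = text  # assume truncation; retry after the next chunk
                break
            rest = text[end:]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            followed_by_ws = bool(rest) and rest[0] in JSON_WS
            if is_number and not final and not followed_by_ws:
                # "12" may still grow into "123" or "12.5"; wait for a separator.
                self._buf = text
                break
            values.append(value)
            self._buf = rest
        return values


reader = ChunkedReader()
print(reader.feed('{"a": 1} [1, '))         # the array is still incomplete
print(reader.feed('2] 12'))                 # the trailing number might continue
print(reader.feed('3 "x"', final=True))     # flush everything that remains
```

The sketch cannot tell truncated input from malformed input until `final=True`, and its repeated re-parsing is quadratic in the worst case. Those are the costs a real incremental parser avoids by keeping state between chunks.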
Third, jq's parser is currently quite lenient -- perhaps too much so in some cases (e.g. allowing 00 for 0), but well within its rights in others (e.g. allowing duplicate keys within a JSON object). Fourth, there is a plan to allow jq to preserve arbitrarily big integers (in the sense that (If jq is to support stricter parsing, then for the sake of utility as well as backward compatibility, it should be possible to specify which mode is wanted.)
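As an illustration of what a selectable parsing mode could look like, here is a small Python sketch using the standard `json` module. It says nothing about how jq would implement this: the `parse` function, its `strict` flag, and the `_reject_duplicates` helper are hypothetical names, and duplicate keys are the only strictness check shown.

```python
import json


def _reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, refusing repeated keys."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def parse(text, strict=False):
    """Parse one JSON text; in strict mode, duplicate object keys are an error."""
    hook = _reject_duplicates if strict else None
    return json.loads(text, object_pairs_hook=hook)


doc = '{"a": 1, "a": 2}'
print(parse(doc))  # lenient: Python's json module keeps the last value for "a"
try:
    parse(doc, strict=True)
except ValueError as err:
    print("strict mode rejected the input:", err)
```

Keeping the lenient behaviour as the default and making strictness opt-in is one way to preserve backward compatibility.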
simdjson allows duplicate keys; the RFC does not require uniqueness: https://tools.ietf.org/html/rfc7159
Though simdjson is limited to 64-bit numbers at this time, there are plans to extend the support...
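To show why the 64-bit limit matters for big integers, here is a short Python sketch. It does not use simdjson: it relies on Python's arbitrary-precision integers and forces the same literal through a 64-bit double with the `parse_int` argument of `json.loads`. The sample value is an assumption chosen for illustration.

```python
import json

INT64_MAX = 2**63 - 1

text = "100000000000000000001"  # 10**20 + 1, beyond the signed 64-bit range

exact = json.loads(text)                        # Python ints are arbitrary precision
as_double = json.loads(text, parse_int=float)   # force the literal through a double

print(exact > INT64_MAX)        # the value does not fit in a signed 64-bit integer
print(exact - int(as_double))   # the error introduced by the round trip via a double
```

A parser that stores every number in a 64-bit integer or double must either reject such input or lose precision on it, which is the trade-off behind the plan to preserve big integers.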
This issue is being worked on in simdjson...
This issue is also being worked on in simdjson... cc @piotte13
simdjson now supports JSON documents in sequence. The JSON can be either line-separated or just in sequence with arbitrary white space between them. The input can be nearly infinite... The performance is quite good... (gigabytes per second) https://github.com/lemire/simdjson/blob/master/doc/JsonStream.md cc @piotte13
Is anyone working on this already (or has anyone finished some form of it)?
A few thoughts I would put forth for consideration:
That's still open. Is it likely we can get that?
Sweet! Maybe we could use simdjson for
Yes, this will be a lot of profiling and playing with options to find something that rocks perf-wise and isn't too hard to use.
Would it be possible to leverage simdjson to improve parsing speed?
https://github.com/lemire/simdjson
Apologies if this is a duplicate.