Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate allowing UTF8StreamJsonParser to be used without canonicalization (see #994) #995

Open
wants to merge 1 commit into
base: 2.16
Choose a base branch
from

Conversation

carterkozak
Copy link
Contributor

Note: I've created this PR against the default branch because I don't know where it is the best fit, I don't wish to cause contention around what should or shouldn't be shipped in 2.15

Previously, the ReaderBasedJsonParser was used instead, which is less performant when reading from an InputStream (and handling charset decoding in addition to json parsing).

This commit updates the JsonFactory factory methods to respect the canonicalization configuration, where previously a canonicalizing implementaiton was always used.

I have added guards around both _symbols.addName and _symbols.findName based on the existing implementation from SmileParser. For correctness, only the guards around addName are required, but we avoid unnecessary hashing by guarding both.

Note that several methods on the JsonFactory failed to take the canonicalization configuration into account, so I have updated them. I can extract that to a separate PR if you prefer.

Testing

Testing is tricky here because we don't expect any behavior changes, only for a different implementation to be used to get there. I could test that specific implementations are returned, but I suspect that would cause more problems than it would prevent in future refactors.

Benchmarking

Standard caveats apply, resulting numbers are specific to my system, which may have been running background tasks at the time, and isn't representative of all environments.

Testing using the standard benchmark suite from jackson-benchmarks on my workstation with a new JsonNoCanonicalizeReadVanilla benchmark that's identical to JsonStdReadVanilla except it turns off canonicalization. I haven't created a PR to add this class to the jackson-benchmarks project because I suspect it isn't necessary in addition to the arbitrary key benchmark, but I'd be happy to push it up if you like.

Both run with 4 iterations of 4 seconds a piece for both warmup and measurement, with 14 threads:

java -Xmx256m -jar target/perf.jar ".*JsonNoCanonicalizeReadVanilla.*" -wi 4 -w 4 -i 4 -r 4 -f 1 -t 14

Before:

Benchmark                                               Mode  Cnt        Score        Error  Units
JsonNoCanonicalizeReadVanilla.readCurrencyPojoDefault  thrpt    4   445300.635 ±  91483.999  ops/s
JsonNoCanonicalizeReadVanilla.readCurrencyPojoFast     thrpt    4   487798.218 ±  87985.973  ops/s
JsonNoCanonicalizeReadVanilla.readNodeCitmCatalog      thrpt    4      187.024 ±     61.912  ops/s
JsonNoCanonicalizeReadVanilla.readNodeMediaItem        thrpt    4  2388952.753 ± 175689.170  ops/s
JsonNoCanonicalizeReadVanilla.readNodeMenu             thrpt    4  2656010.453 ± 199132.061  ops/s
JsonNoCanonicalizeReadVanilla.readNodeWebxml           thrpt    4  1076557.786 ±  42247.400  ops/s
JsonNoCanonicalizeReadVanilla.readPojoMediaItem        thrpt    4  2559137.945 ±  79045.653  ops/s
JsonNoCanonicalizeReadVanilla.readUntypedCitmCatalog   thrpt    4      203.982 ±    125.292  ops/s
JsonNoCanonicalizeReadVanilla.readUntypedMediaItem     thrpt    4  2445168.708 ±  90202.054  ops/s
JsonNoCanonicalizeReadVanilla.readUntypedMenu          thrpt    4  2684316.144 ±  31097.705  ops/s
JsonNoCanonicalizeReadVanilla.readUntypedWebxml        thrpt    4  1065936.180 ±  48567.641  ops/s

After:

Benchmark                                               Mode  Cnt        Score        Error  Units
JsonNoCanonicalizeReadVanilla.readCurrencyPojoDefault  thrpt    4   410349.740 ±  75994.844  ops/s
JsonNoCanonicalizeReadVanilla.readCurrencyPojoFast     thrpt    4   456740.791 ±  31345.935  ops/s
JsonNoCanonicalizeReadVanilla.readNodeCitmCatalog      thrpt    4      174.713 ±     96.849  ops/s
JsonNoCanonicalizeReadVanilla.readNodeMediaItem        thrpt    4  3828160.077 ± 131464.635  ops/s
JsonNoCanonicalizeReadVanilla.readNodeMenu             thrpt    4  6616025.264 ± 360854.700  ops/s
JsonNoCanonicalizeReadVanilla.readNodeWebxml           thrpt    4   535143.154 ±  12656.456  ops/s
JsonNoCanonicalizeReadVanilla.readPojoMediaItem        thrpt    4  3385776.236 ± 157925.098  ops/s
JsonNoCanonicalizeReadVanilla.readUntypedCitmCatalog   thrpt    4      199.011 ±     50.430  ops/s
JsonNoCanonicalizeReadVanilla.readUntypedMediaItem     thrpt    4  3850479.540 ± 245234.565  ops/s
JsonNoCanonicalizeReadVanilla.readUntypedMenu          thrpt    4  7235989.633 ± 305962.992  ops/s
JsonNoCanonicalizeReadVanilla.readUntypedWebxml        thrpt    4   880906.262 ±  21751.634  ops/s

Values changed quite a bit, unclear if this is due to generally high variance, or the change itself.

JsonArbitraryFieldNameBenchmark shows improvements in the INPUT_STREAM cases without canonicalization, which were previously identical to the READER results:

java -Xmx256m -jar target/perf.jar ".*JsonArbitraryFieldNameBenchmark.*" -wi 4 -w 4 -i 4 -r 4 -f 1 -t 14
Benchmark                                       (mode)                   (shape)        (type)  Mode  Cnt    Score    Error  Units
JsonArbitraryFieldNameBenchmark.parse          DEFAULT            RANDOM_KEY_MAP  INPUT_STREAM  avgt    4  112.945 ± 12.050  us/op
JsonArbitraryFieldNameBenchmark.parse          DEFAULT            RANDOM_KEY_MAP        READER  avgt    4   10.490 ±  0.362  us/op
JsonArbitraryFieldNameBenchmark.parse          DEFAULT  BEAN_WITH_RANDOM_KEY_MAP  INPUT_STREAM  avgt    4  113.070 ± 20.401  us/op
JsonArbitraryFieldNameBenchmark.parse          DEFAULT  BEAN_WITH_RANDOM_KEY_MAP        READER  avgt    4   11.150 ±  0.397  us/op
JsonArbitraryFieldNameBenchmark.parse        NO_INTERN            RANDOM_KEY_MAP  INPUT_STREAM  avgt    4  105.848 ±  9.308  us/op
JsonArbitraryFieldNameBenchmark.parse        NO_INTERN            RANDOM_KEY_MAP        READER  avgt    4    4.536 ±  0.099  us/op
JsonArbitraryFieldNameBenchmark.parse        NO_INTERN  BEAN_WITH_RANDOM_KEY_MAP  INPUT_STREAM  avgt    4  106.422 ± 18.303  us/op
JsonArbitraryFieldNameBenchmark.parse        NO_INTERN  BEAN_WITH_RANDOM_KEY_MAP        READER  avgt    4    5.123 ±  0.175  us/op
JsonArbitraryFieldNameBenchmark.parse  NO_CANONICALIZE            RANDOM_KEY_MAP  INPUT_STREAM  avgt    4    0.499 ±  0.012  us/op
JsonArbitraryFieldNameBenchmark.parse  NO_CANONICALIZE            RANDOM_KEY_MAP        READER  avgt    4    4.131 ±  0.086  us/op
JsonArbitraryFieldNameBenchmark.parse  NO_CANONICALIZE  BEAN_WITH_RANDOM_KEY_MAP  INPUT_STREAM  avgt    4    1.057 ±  0.169  us/op
JsonArbitraryFieldNameBenchmark.parse  NO_CANONICALIZE  BEAN_WITH_RANDOM_KEY_MAP        READER  avgt    4    4.455 ±  0.042  us/op

@cowtowncoder
Copy link
Member

Ok thanks! This definitely needs to wait until 2.16; at which point also need to consider usage from other format modules (mostly Smile, I think).

@carterkozak
Copy link
Contributor Author

Great, thank you!
I'll rebase and retarget this PR for 2.16 once the branch is up, and prepare an update for the smile module.

@carterkozak carterkozak force-pushed the ckozak/UTF8StreamJsonParser_canonicalization branch from 024c3ca to 7854515 Compare April 24, 2023 13:27
@carterkozak carterkozak changed the base branch from 2.15 to 2.16 April 24, 2023 13:27
@carterkozak carterkozak force-pushed the ckozak/UTF8StreamJsonParser_canonicalization branch from 7854515 to 3d565bd Compare April 24, 2023 14:53
@carterkozak
Copy link
Contributor Author

I've rebased this and updated the @since placeholders to 2.16. On the Smile side, I don't think any changes are required as a result, but this factory doesn't respect the canonicalization setting: https://github.com/FasterXML/jackson-dataformats-binary/blob/19317f418ec475a0e752a7c29f58998f96f904b3/smile/src/main/java/com/fasterxml/jackson/dataformat/smile/SmileFactory.java#L415-L418
I've implemented boolean checks similar to this PR, but I think the smile parsers may make it easier to avoid quads in favor of direct byte-array decoding (this is what the blocking smile parser already does, but it's a bit more complex to implement). I'd like to spend a bit more time thinking about the smile change before I create a PR into dataformats-binary if that's alright.

@cowtowncoder
Copy link
Member

@carterkozak That sounds good.

Wrt Smile there's still some benefit from trying to find canonicalized instance by quad, although direct decoding is indeed faster (as there's no escaping). I vaguely remember part about canonicalization setting not being honored; would be nice to fix, so +1 for doing that if you figure out a way.

@cowtowncoder
Copy link
Member

On JMH benchmarks: variations are way high so I think you'd want -f 5 or such instead of -f 1. Would be nice to see more stable outputs.

@carterkozak
Copy link
Contributor Author

carterkozak commented Apr 26, 2023

I haven't had a chance to dig into the smile async parser changes yet, but I've re-run the JsonNoCanonicalizeReadVanilla benchmarks on an aws instance to avoid background noise on my system (though noisy neighbor is still possible) using a c6i.2xlarge and the following jmh args for a ~30 minute run in each configuration:

java -Xmx256m -jar perf.jar ".*JsonNoCanonicalizeReadVanilla.*" -wi 5 -w 3 -i 5 -r 3 -f 5 -t max -rf json

I've uploaded the structured json results here: https://gist.github.com/carterkozak/5f736d2348edbbc873b9629b18de6929
Where they can be visualized here: https://jmh.morethan.io/?gist=5f736d2348edbbc873b9629b18de6929

It's getting a bit late, I haven't had a chance to sit down and analyze the results yet, but I thought I'd share what I've collected nonetheless :-)

Screenshot of the visualization summary for posterity:
2023-04-25_1426x1283_screen

edit: I've shared the benchmark I used here: FasterXML/jackson-benchmarks#8

@carterkozak
Copy link
Contributor Author

carterkozak commented Apr 26, 2023

So far it looks like the original comment was correct, I suspect the performance difference for small inputs is based in part on the 8kb buffer created within InputStreamReader. If I add new BufferedInputStream(original, 8192) to the stream provided to UTF8StreamJsonParser, the reader-based approach slightly outperforms it. We could probably implement an InputStreamReader which uses buffers from the BufferRecycler to close the gap, reducing per-operation allocation by 8k across the board (would such an optimization be worth the implementation complexity? I defer entirely to your judgement. It would be fair to suggest users with that level of performance sensitivity should handle constructing an efficient Reader themselves).
The InputStreamReader uses a CharsetDecoder under the hood, which is able to take advantage of SIMD optimizations that we don't have access to (yet) in pure java. Claes Redestad has a great blog post on some of those optimizations here: https://cl4es.github.io/2021/02/23/Faster-Charset-Decoding.html

@pjfanning
Copy link
Member

@carterkozak thanks for the interesting analysis. Would you be able to try different Java LTS versions? 8, 11, 17 and 20 (as proxy for the forthcoming 21 LTS release). There might be different results - in particular, for the SIMD related support. jackson-core v2.15 is a multi-release jar and in theory, we can support different implementations of classes to suit different Java release.

@carterkozak
Copy link
Contributor Author

Happy to re-run benchmarks across jdk releases, I only tested with the latest build of jdk17 in the results above. It may be a few days before I have a chance to re-run benchmarks though.

regarding multi-release jars, I don't think that will help us quite yet because jep 448 is re-incubating the vector API as a preview feature for jdk 21, so it won't be available as a stable feature until a later release.

@cowtowncoder
Copy link
Member

On Multi-release jars: I don't have much appetite for JDK-specific variants at this point. Just fwtw.

I concur with the suggestion that buffer allocation of InputStreamReader explains much of benefits of UTF8StreamJsonParser for small documents.

So at this point I am not sure it makes sense to merge this PR, if it's not quite clear there are consistent performance improvements. And although we could try creating custom InputStreamReader I don't see that as a good approach (wrt not having optimized SIMD decoder f.ex).

Conversely making UTF8StreamJsonParser faster for non-canonicalizing case would be possible but only as a separate implementation, which would in some ways be more like ReaderBasedJsonParser, not using quads but converting from UTF-8 bytes to char as part of decoding. Not sure I have lots of interest for yet another backend either but if someone really wanted to try it I would of course help.

So I suspect this PR is sort of pending for now.

@carterkozak
Copy link
Contributor Author

I agree with your assessment that the original intent of this PR is unnecessary. It may still be helpful to ensure all factory methods respect the canonicalization setting (perhaps under a new issue+PR, if you agree).

I can update this PR+title to solely add information from this investigation to the comment here if you'd like:

/* and without canonicalization, byte-based approach is not performant; just use std UTF-8 reader
* (which is ok for larger input; not so hot for smaller; but this is not a common case)
*/

Conversely making UTF8StreamJsonParser faster for non-canonicalizing case would be possible but only as a separate implementation, which would in some ways be more like ReaderBasedJsonParser, not using quads but converting from UTF-8 bytes to char as part of decoding. Not sure I have lots of interest for yet another backend either but if someone really wanted to try it I would of course help.

I agree that another implementation likely isn't worthwhile. As an aside with regard to conversion from bytes to quads and chars: I suspect on jdk9+ after the string compactness changes from jep-254 it may be more efficient to create strings directly from byte-arrays rather than converting to an intermediate char-array because strings are no longer backed by chars, rather an underlying byte-array with metadata describing whether the encoding is LATIN1 or UTF-16. The conversion between char[] and byte[] within string is incredibly performant, so the potential upside of making changes here is likely small, but I figured I'd share in case anyone finds it useful or interesting.

@cowtowncoder
Copy link
Member

I agree with your assessment that the original intent of this PR is unnecessary. It may still be helpful to ensure all factory methods respect the canonicalization setting (perhaps under a new issue+PR, if you agree).

Yes, +1 for respecting canonicalization + intern settings.

I can update this PR+title to solely add information from this investigation to the comment here if you'd like:

/* and without canonicalization, byte-based approach is not performant; just use std UTF-8 reader
* (which is ok for larger input; not so hot for smaller; but this is not a common case)
*/

Sounds good.

Conversely making UTF8StreamJsonParser faster for non-canonicalizing case would be possible but only as a separate implementation, which would in some ways be more like ReaderBasedJsonParser, not using quads but converting from UTF-8 bytes to char as part of decoding. Not sure I have lots of interest for yet another backend either but if someone really wanted to try it I would of course help.

I agree that another implementation likely isn't worthwhile. As an aside with regard to conversion from bytes to quads and chars: I suspect on jdk9+ after the string compactness changes from jep-254 it may be more efficient to create strings directly from byte-arrays rather than converting to an intermediate char-array because strings are no longer backed by chars, rather an underlying byte-array with metadata describing whether the encoding is LATIN1 or UTF-16. The conversion between char[] and byte[] within string is incredibly performant, so the potential upside of making changes here is likely small, but I figured I'd share in case anyone finds it useful or interesting.

Yeah, although this is kind of... ugly and nasty, because the real challenge is UTF-8 encoding/decoding.
Meaning that it's not a question of dropping something -- UTF-8 codec is still needed, and JSON has some non-standard aspects wrt Surrogate handling too -- but changing internal buffers, it sounds like.
So change needed sounds like sizable thing, actually complicating things.

I don't know how other decoding libraries deal with this tho. Seems like a challenge for anything that couples tokenizing with Charset decoding (which is beneficial for performance but has the challenges of tight coupling).

@cowtowncoder cowtowncoder changed the title fix #994 Allow UTF8StreamJsonParser to be used without canonicalization Investigate allowing UTF8StreamJsonParser to be used without canonicalization (see #994) Apr 28, 2023
@carterkozak
Copy link
Contributor Author

I haven't forgotten about this, it has been a busy week. Planning to put together the updates as described tomorrow! Thanks for bearing with me :-)

@carterkozak carterkozak force-pushed the ckozak/UTF8StreamJsonParser_canonicalization branch from 3d565bd to 93bc91a Compare May 5, 2023 16:28
@carterkozak
Copy link
Contributor Author

I've updated this PR to expand upon the performance characteristics and point this discussion for additional context.

@schlosna
Copy link
Contributor

schlosna commented Aug 7, 2023

If there's anything I can do to help here, I'd be happy to assist.

Similar to #593 this pops up on JFRs when using Jackson to parse smaller JSON documents that are already in byte[].

image

@cowtowncoder
Copy link
Member

Is this PR still active? For time being it seems it might not be active any more; but don't want to close if others find it useful
(but if so, need to resolve the conflict)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants