Use SVT-AV1 PSY fork with --tune 4 #2412
Comments
Thank you for the suggestion. We may want to wait for other benchmarks to confirm the gains and/or for the fork to be merged into the base repository before using it in libavif.
Hi @y-guyon! I wanted to follow up on this issue with some updates to reassess the feature. Regarding benchmarks, we just released our first results, which can be found on this page. Depending on the image, SVT-AV1-PSY achieves 5-15% gains over aomenc on the CID22 validation set and the gb82 photographic dataset by the SSIMULACRA2 metric. Subjective evaluations corroborate the gain. As an example, here's a visual comparison for a typical photo image. The SVT-AV1-PSY encoded image (147 KB) is 90.5% of the size of aomenc's (162 KB) for comparable image quality, and overall better consistency. For the record, the aomenc image took 3x as long to encode on my machine. SVT-AV1-PSY also recently gained the ability to encode images with odd dimensions, and with sizes as small as 4x4 px (down from the previous 64x64 px minimum), both of which are especially useful for images, as there's no longer a need to crop or pad them for the encoder to accept them. I'd encourage you to give SVT-AV1-PSY's tune 4 a try!
FYI, the devs of the svt-av1 -psy fork have put up a comparison of their --tune 4 vs. other encoders: https://svt-av1-psy.com/avif/
Very nice! Thanks for the info! Is there anything we should merge in libavif? I am thinking about gianni-rosato@d53aa45#diff-29748a2db41273018a16b71ccb60bcd7b632b86c78cc69bb88784702122ebdde
I didn't realize there was a commit / fork in the meantime... Apart from using the -psy fork of svt-av1, the one important thing is to set --tune to 4 by default when using svt as the avif encoder. It seems the -psy devs have this covered in 85512d7. Setting the default output depth to 10 could make sense too - I dunno how many devices are around that fail to decode 10bpp avif though: a1c2afb. Since there is a fork, I guess/hope that the -psy devs will offer a pull request to the main libavif repo sooner or later - maybe an indication that this would be welcomed could be useful.
Here is another benchmark:
Thanks for the effort - I'd really like to see SVT-AV1-PSY's promising --tune 4 and maybe mainline SVT-AV1's new --avif included. SVT is currently 4:2:0 only though; for high-fidelity content that cannot do without 4:4:4, I'd rather try JPEG-XL :-) Especially because of SVT-AV1's higher performance, a "better" preset should be used vs. AOM: it shouldn't be about a "default" effort that was most likely chosen for video encoding, but about comparing the same (reasonable) encoding time for images.
Same for SVT-AV1: some presets turn on tools that seem essential for image encoding, for example better intra coding with preset 3 and lower.
That's really interesting, thanks - and it's especially important that these benchmarks measure the same-ish encoding time. So svt/aom are about the same on 420, but aom 444 has a significant effect on ssimulacra2, correct? What interests me is how "visual" the effect of chroma subsampling is - there is a reason reducing chroma fidelity has been done for ages in video / image encoding. I for one cannot tell any difference between 420 and 444 - are there synthetic scores that are more tuned to this aspect of human perception? I'm not the number one -psy fanboy :-) but I have to mention that you're measuring aom with the exact metric that it's tuned for, while svt's tuning is based on another metric (afaik it's ssim) - so that's a significant bias towards aom. => For fairness, it would be interesting to compare vmaf results - that's what I use personally, though I know vmaf is designed for video and not for images.
The question is which preset/speed is tuned 'best' for images, i.e. which intra tools are enabled. This might have a significant effect on the respective results, because aom/svt enable them at different presets/speeds.
Well I think 444 has a significant effect on very high quality (assuming the source content is not 420). So in that sense yes 444 has a significant effect on ssimulacra2 at very high quality.
IMO chroma subsampling is much harder to notice on natural content, which, when you are capturing video, is basically everything. Here is an original image from https://github.com/google-research-datasets/web-images and the same image encoded to high quality 420.
Actually, aom commit 37c5c4e6aa's tune=ssimulacra2 copies what was done in psy's tune=4 mode, plus enhancements for 444.
I'm not sure I would call vmaf more fair for images, as vmaf does not take chroma into account at all. Anyway, here are the graphs for vmaf: you can see that in the 420 vmaf graph, psy tune=4 and aom tune=ssimulacra2 are very close, much like in the ssimulacra2 graph. The 444 vmaf graph says psy 420 is a little better at the same bitrate, but again vmaf does not take chroma into account, so as the graphs reach high quality, both psy and aom trend to the same vmaf.
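Comparing encoders "at the same bitrate", as discussed above, amounts to interpolating each encoder's metric-vs-bitrate curve at a common point. Here is a minimal sketch of that interpolation step; the rate-distortion points below are invented placeholders, not actual benchmark data:

```python
import numpy as np

def metric_at_bitrate(bitrates, scores, target_bpp):
    """Linearly interpolate a quality score at a given bitrate.

    `bitrates` must be sorted ascending; targets outside the measured
    range are clamped to the endpoint values by np.interp.
    """
    return float(np.interp(target_bpp, bitrates, scores))

# Hypothetical rate-distortion points: (bits per pixel, SSIMULACRA2 score)
psy_bpp, psy_s2 = [0.2, 0.5, 1.0, 2.0], [55.0, 70.0, 80.0, 88.0]
aom_bpp, aom_s2 = [0.2, 0.5, 1.0, 2.0], [54.0, 69.5, 80.5, 88.5]

# Compare both encoders at the same bitrate, e.g. 0.75 bpp
target = 0.75
psy_score = metric_at_bitrate(psy_bpp, psy_s2, target)
aom_score = metric_at_bitrate(aom_bpp, aom_s2, target)
```

Real comparisons (e.g. BD-rate) integrate the gap between the two interpolated curves over a bitrate range rather than sampling a single point.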
True, I picked an svt preset (preset 5) that was close to libaom's default for images (speed 6), but you could pick one and try to match the other. You could also pick the best-quality presets from both encoders to see the best they can do.
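The preset-matching idea above can be sketched as a small search: time each preset once on a test image, then pick the one whose encode time is closest to the reference encoder's. The timings below are invented placeholders, not real measurements of either encoder:

```python
# Hypothetical per-preset encode times in seconds for one test image;
# real values would come from timing the actual encoder binaries.
aom_speed_times = {4: 9.8, 5: 6.1, 6: 3.9, 7: 2.4}   # aomenc --cpu-used
svt_preset_times = {3: 8.5, 4: 5.7, 5: 3.6, 6: 2.1}  # SvtAv1EncApp --preset

def closest_preset(times, reference_time):
    """Return the preset whose encode time is nearest the reference."""
    return min(times, key=lambda p: abs(times[p] - reference_time))

# Match SVT presets against libaom's image default (speed 6 here)
reference = aom_speed_times[6]                         # 3.9 s
matched = closest_preset(svt_preset_times, reference)  # -> preset 5
```

With these made-up numbers the search lands on svt preset 5, mirroring the pairing chosen in the comment above.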
4:2:0 subsampling can have an impact on image quality, even for natural images. For example, look at how blocky this red jacket is: AVIF doesn't specify a normative way to upscale chroma channels, so many viewers actually end up using nearest-neighbor interpolation.
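To illustrate why nearest-neighbor chroma upscaling can look blocky: a toy numpy round trip that averages each 2x2 block of a chroma plane (simulating 4:2:0) and then upscales by pixel duplication. This only simulates the sampling step; it is not what any particular encoder or viewer actually does:

```python
import numpy as np

def subsample_420(chroma):
    """Simulate 4:2:0 by averaging each 2x2 block of a chroma plane."""
    h, w = chroma.shape
    c = chroma[:h - h % 2, :w - w % 2].astype(np.float32)
    return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4.0

def upscale_nearest(chroma_half):
    """Nearest-neighbor upscale back to full resolution (2x2 duplication)."""
    return np.repeat(np.repeat(chroma_half, 2, axis=0), 2, axis=1)

# A sharp chroma edge (e.g. a red jacket against a neutral background)
# that does not align with the 2x2 grid gets smeared into 2 px blocks.
plane = np.zeros((4, 4), dtype=np.float32)
plane[:, 1:] = 255.0
round_trip = upscale_nearest(subsample_420(plane))
```

After the round trip the one-pixel edge column becomes a 2 px wide band of averaged values, which is the blockiness visible in the jacket example.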
I strongly recommend against using VMAF for image comparisons, for two main reasons: it was designed for video rather than still images, and it does not take chroma into account at all.
In my experience, the image metrics I've found useful so far are SSIMULACRA 2(.1), DSSIM, and Butteraugli (for distances ~<2.5). Fortunately, codec-compare supports all three of them. As previously said, libaom's new tune is an almost direct port of PSY's tune 4, despite the difference in name.
The main reason why I'm using vmaf even for images is that I can use ab-av1 to specify a target vmaf, and then let ab-av1 auto-search until the target avif (or other encoder) quality is found. Otherwise, I'd essentially have to script for myself what ab-av1 already does just fine. And right now, I know how to compute vmaf with ffmpeg, but I'd have to look at the codec-compare source to figure out how to calculate (and use in a script) SSIMULACRA, DSSIM or Butteraugli scores. It's off-topic, but are there any tools other than ab-av1 that test-encode a source image until a specific metric score is achieved?
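The ab-av1-style auto-search mentioned above is essentially a binary search over the encoder's quality setting. A minimal sketch, with a stand-in function in place of a real encode-and-measure run (a real tool would invoke the encoder and a metric like vmaf or ssimulacra2 at each step); the score curve below is made up for illustration:

```python
def search_quality(encode_and_score, target, lo=0, hi=63, tol=0.5, max_iters=10):
    """Binary-search a quality setting until the metric score is within
    `tol` of `target`.

    `encode_and_score(q)` must encode at quality q and return the metric
    score; it is assumed monotonic in q (higher q -> higher score).
    """
    best = None
    for _ in range(max_iters):
        q = (lo + hi) // 2
        score = encode_and_score(q)
        if abs(score - target) <= tol:
            return q, score
        if score < target:
            lo = q + 1
        else:
            hi = q - 1
        best = (q, score)  # remember the closest attempt so far
        if lo > hi:
            break
    return best

# Stand-in for a real encode+metric run: a made-up monotonic score curve.
fake = lambda q: 40.0 + q * 0.9
q, score = search_quality(fake, target=95.0)
```

With this fake curve the search converges in five encodes, which is why tools like ab-av1 are so much cheaper than sweeping every quality level.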
FYI, aom has now adopted all of the avif-related changes into their ssimulacra2 tune, so using the svt -psy fork isn't necessary anymore - it just threads better than aom.
The PSY fork of SVT-AV1 features a new --tune 4 which is designed to improve still picture (i.e. AVIF) encoding:
"--tune 4 - A new Tune called Still Picture has been introduced for AVIF encoding, with promising gains observed over aomenc, aomenc 4:4:4, and mainline SVT-AV1" https://github.com/AOMediaCodec/libavif/releases
It would be nice to have libavif binaries use this fork and tune out of the box (if SVT-AV1 encoder is selected).
There are no independent benchmarks / image comparisons though, as far as I can tell. psy-ex/svt-av1-psy@393cf6d