
Use SVT-AV1 PSY fork with --tune 4 #2412

Closed
gitoss opened this issue Aug 25, 2024 · 14 comments

@gitoss

gitoss commented Aug 25, 2024

The PSY fork of SVT-AV1 features a new --tune 4 which is designed to improve still picture (i.e. AVIF) encoding:

"--tune 4 - A new Tune called Still Picture has been introduced for AVIF encoding, with promising gains observed over aomenc, aomenc 4:4:4, and mainline SVT-AV1" https://github.com/AOMediaCodec/libavif/releases

It would be nice to have libavif binaries use this fork and tune out of the box (if SVT-AV1 encoder is selected).
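In the meantime, the tune can already be selected explicitly when avifenc is built against the PSY fork; a minimal sketch (file names are placeholders; -c selects the codec and -a forwards encoder-specific key=value options, as used later in this thread):

```shell
# Sketch, assuming an avifenc binary linked against the SVT-AV1-PSY fork.
# -c selects the codec, -a forwards encoder-specific options (here, tune=4).
cmd="avifenc -c svt -a tune=4 -q 60 input.png output.avif"

if command -v avifenc >/dev/null 2>&1 && [ -f input.png ]; then
  $cmd
else
  echo "avifenc or input.png missing; would run: $cmd"
fi
```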

There are no independent benchmarks / image comparisons though, as far as I can tell. psy-ex/svt-av1-psy@393cf6d

@y-guyon
Collaborator

y-guyon commented Aug 27, 2024

Thank you for the suggestion.

We may want to wait for other benchmarks to confirm the gains and/or for the fork to be merged into the base repository before using it in libavif.

@juliobbv-p

juliobbv-p commented Sep 28, 2024

Hi @y-guyon! I wanted to follow up on this issue with some updates to reassess the feature.

Regarding benchmarks, we just released our first results, which can be found on this page. Depending on the image, SVT-AV1-PSY gets 5-15% gains over aomenc on the CID22 validation set and the gb82 photographic dataset, as measured by the SSIMULACRA2 metric. Subjective evaluations corroborate the gains.

As an example, here's a visual comparison for a typical photo image. The SVT-AV1-PSY-encoded image (147 KB) is 90.5% the size of aomenc's (162 KB), with comparable image quality and overall better consistency. For the record, the aomenc image took 3x as long to encode on my machine.

SVT-AV1-PSY also recently gained the ability to encode images with odd dimensions, and with sizes as small as 4x4 px (down from the previous 64x64 px minimum). Both are especially useful for images, as there's no longer a need to crop or pad them for the encoder to accept them.
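For context, the old 64x64 floor meant small inputs had to be padded (or cropped) before the encoder would accept them; a sketch of that now-unnecessary step (hypothetical helper, not libavif API):

```python
def pad_to_minimum(width: int, height: int, minimum: int = 64) -> tuple[int, int]:
    """Round dimensions up to the encoder's minimum size (the old SVT-AV1 constraint)."""
    return max(width, minimum), max(height, minimum)

# A 17x33 thumbnail previously had to be padded to 64x64; with the new 4x4
# floor (and odd-dimension support), it can be passed through unchanged.
```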

I'd encourage you to give SVT-AV1-PSY's tune 4 a try!

@gitoss
Author

gitoss commented Oct 30, 2024

I'd encourage you to give SVT-AV1-PSY's tune 4 a try!

Fyi, the devs of the SVT-AV1-PSY fork have put up a comparison of their --tune 4 vs. other encoders: https://svt-av1-psy.com/avif/

@vrabaud
Collaborator

vrabaud commented Oct 30, 2024

Very nice! Thanks for the info! Is there anything we should merge in libavif? I am thinking about gianni-rosato@d53aa45#diff-29748a2db41273018a16b71ccb60bcd7b632b86c78cc69bb88784702122ebdde

@gitoss
Author

gitoss commented Oct 30, 2024

Very nice! Thanks for the info! Is there anything we should merge in libavif?

I didn't realize there was a commit / fork in the meantime...

Apart from using the PSY fork of SVT-AV1, the one thing that is important is to set --tune to 4 by default when SVT is used as the AVIF encoder. It seems the SVT-AV1-PSY devs have this covered in 85512d7

Setting the default output depth to 10 bits could make sense too - I don't know how many devices are around that fail to decode 10-bit AVIF though. a1c2afb

Since there is a fork, I guess/hope that the SVT-AV1-PSY devs will offer a pull request to the main libavif repo sooner or later - an indication that this would be welcome could be useful.

@y-guyon
Collaborator

y-guyon commented Nov 27, 2024

Here is another benchmark:
webmproject/codec-compare#17 (comment)

@gitoss
Author

gitoss commented Nov 28, 2024

Here is another benchmark: webmproject/codec-compare#17 (comment)

Here are my findings on a dataset from https://www.compression.cc with multiple codecs at default encoding effort. Let me know if you expect significantly different results at other speeds.

Thanks for the effort - I'd really like to see SVT-AV1-PSY's promising --tune 4 and maybe mainline SVT-AV1's new --avif included. SVT is currently 4:2:0 only though, but for high-fidelity content that can't do without 4:4:4, I'd rather try JPEG XL :-)

Especially because of SVT-AV1's higher speed, a "better" (slower) preset should be used vs. AOM: it shouldn't be about a "default" effort that was most likely chosen for video encoding, but about comparing the same (reasonable) encoding time for images.

That said: I'd also recommend testing speed 3, as it enables rectangular partitions, larger transforms, and restoration filters; while still being fast enough for some production scenarios. It'd be interesting to see how the tune's tweaks interact with the larger available tooling repertoire.

Same for SVT-AV1: some presets turn on tools that seem essential for image encoding - for example, better intra coding at preset 3 and below.

@FrankGalligan
Contributor

I ran some tests of PSY --tune 4 vs libaom -tune=ssimulacra2 mode. The dataset was subset1.

On my machine psy v2.3.0 speed 5 was in-between libaom speed 5 and speed 6.

[Graph: subset1, psy v2.3.0 vs aom 37c5c4e6aa tune=ssimulacra2, speed 5, 8 threads, 4:2:0, 2x2 tiles, SSIMULACRA2 BD-rate]
This first graph is psy v2.3.0 speed 5 vs libaom (hash 37c5c4e6aa) speed 5, 4:2:0, plotting SSIMULACRA2. You can see both curves are really close.

PSY command line:
avifenc INPUT.png OUTPUT.avif -q QUALITY --speed 5 --jobs 8 -y 420 --tilecolslog2 2 --tilerowslog2 2 -a tune=4

libaom command line:
avifenc INPUT.png OUTPUT.avif -q QUALITY --speed 5 --jobs 8 -y 420 --tilecolslog2 2 --tilerowslog2 2 -a tune=ssimulacra2

[Graph: subset1, psy v2.3.0 vs aom 37c5c4e6aa tune=ssimulacra2, speed 6, 8 threads, 4:4:4, 2x2 tiles, SSIMULACRA2 BD-rate]
This second graph is psy v2.3.0 speed 5 vs libaom (hash 37c5c4e6aa) speed 6, 4:4:4, plotting SSIMULACRA2. You can see both curves start close, and then as the quality gets higher, the 4:4:4 libaom encode has a better SSIMULACRA2 score.

PSY command line:
avifenc INPUT.png OUTPUT.avif -q QUALITY --speed 5 --jobs 8 -y 420 --tilecolslog2 2 --tilerowslog2 2 -a tune=4

libaom command line:
avifenc INPUT.png OUTPUT.avif -q QUALITY --speed 6 --jobs 8 -y 444 --tilecolslog2 2 --tilerowslog2 2 -a tune=ssimulacra2

Note, I did not run libaom speed 6 420.

@gitoss
Author

gitoss commented Dec 6, 2024

I ran some tests of PSY --tune 4 vs libaom -tune=ssimulacra2 mode. The dataset was subset1.

That's really interesting, thanks - and it's especially important that these benchmarks measure the same-ish encoding time.

So svt/aom are about the same on 420, but aom 444 has a significant effect on ssimulacra2, correct?

What interests me is how "visual" the effect of chroma subsampling is - there is a reason reducing chroma fidelity has been done for ages in video / image encoding. I for one cannot tell any difference between 420 and 444 - are there synthetic scores that are more tuned to this human perception?

I'm not the number one -psy fanboy :-) but I have to mention that you're measuring aom with the exact metric that it's tuned for, while svt tuning is based on another metric (afaik it's ssim) - so that's a significant bias towards aom.

=> for fairness, it would be interesting to compare vmaf results - that's what I use personally, though I know vmaf is designed for video and not for images.
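For reference, single-image VMAF can be computed with ffmpeg's libvmaf filter (assuming an ffmpeg build with libvmaf enabled; file names are placeholders, the first input is the distorted image and the second the reference):

```shell
# Sketch: score a decoded image against its source with ffmpeg's libvmaf filter.
cmd="ffmpeg -i decoded.png -i original.png -lavfi libvmaf -f null -"

if command -v ffmpeg >/dev/null 2>&1 && [ -f decoded.png ] && [ -f original.png ]; then
  $cmd
else
  echo "prerequisites missing; would run: $cmd"
fi
```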

On my machine psy v2.3.0 speed 5 was in-between libaom speed 5 and speed 6.

The question is what preset/speed is tuned 'best' for images, i.e. what intra tools are enabled. This might have a significant effect on the respective results, because aom/svt enable them at different presets/speeds.

@FrankGalligan
Contributor

FrankGalligan commented Dec 6, 2024

I ran some tests of PSY --tune 4 vs libaom -tune=ssimulacra2 mode. The dataset was subset1.

That's really interesting, thanks - and it's especially important that these benchmarks measure the same-ish encoding time.

So svt/aom are about the same on 420, but aom 444 has a significant effect on ssimulacra2, correct?

Well, I think 4:4:4 has a significant effect at very high quality (assuming the source content is not 4:2:0). So in that sense, yes, 4:4:4 has a significant effect on SSIMULACRA2 at very high quality.

What interests me is how "visual" the effect of chroma subsampling is - there is a reason reducing chroma fidelity has been done for ages in video / image encoding. I for one cannot tell any difference between 420 and 444 - are there synthetic scores that are more tuned to this human perception?

IMO chroma subsampling is much harder to notice on natural content, which, when you are capturing video, is what almost everything is. Here is an original image from https://github.com/google-research-datasets/web-images and an image encoded to high quality 4:2:0.

[Images: BFM-screenshot01 original PNG vs. high-quality 4:2:0 AVIF]
If you look closely, the green and red colors are much more muted, and the mouths and grass are blurry as well.

I'm not the number one -psy fanboy :-) but I have to mention that you're measuring aom with the exact metric that it's tuned for, while svt tuning is based on another metric (afaik it's ssim) - so that's a significant bias towards aom.

Actually, aom at hash 37c5c4e6aa with tune=ssimulacra2 copies what was done in psy's tune=4 mode, plus enhancements for 4:4:4.

=> for fairness, it would be interesting to compare vmaf results - that's what I use personally, though I know vmaf is designed for video and not for images.

I'm not sure I would call VMAF more fair for images, as VMAF does not take chroma into account at all. Anyway, here are the graphs for VMAF:
[Graph: subset1 VMAF, psy v2.3.0 speed 5 vs aom 37c5c4e6aa tune=ssimulacra2 speed 5, 4:2:0]
[Graph: subset1 VMAF, psy v2.3.0 speed 5 vs aom 37c5c4e6aa tune=ssimulacra2 speed 5, 4:4:4]

You can see that in the 4:2:0 VMAF graph, psy tune=4 and aom tune=ssimulacra2 are very close, much like the SSIMULACRA2 graph. The 4:4:4 VMAF graph says psy 4:2:0 is a little better for the same bitrate, but again, VMAF does not take chroma into account, so when the graph goes to a high quality, both psy and aom trend to the same VMAF.

On my machine psy v2.3.0 speed 5 was in-between libaom speed 5 and speed 6.

The question is what preset/speed is tuned 'best' for images, i.e. what intra tools are enabled. This might have a significant effect on the respective results, because aom/svt enable them at different presets/speeds.

True. I picked the svt preset (preset 5) that was closest to libaom's default for images (speed 6), but you could pick one and try to match the other. You could also pick the best-quality presets from both encoders to see the best they can do.

@juliobbv-p

What interests me is how "visual" the effect of chroma subsampling is - there is a reason reducing chroma fidelity has been done for ages in video / image encoding. I for one cannot tell any difference between 420 and 444 - are there synthetic scores that are more tuned to this human perception?

4:2:0 subsampling can have an impact on image quality, even for natural images. For example, look at how blocky this red jacket is:

AVIF doesn't specify a normative way to upscale chroma channels, so many viewers actually end up using nearest-neighbor interpolation.
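To make the nearest-neighbor point concrete, here is a pure-Python sketch of a 2x nearest-neighbor upscale of a subsampled chroma plane; each chroma sample ends up covering a 2x2 block of pixels, which is exactly the blockiness described above:

```python
def upsample_nearest_2x(chroma):
    """2x nearest-neighbor upscale of a chroma plane given as a list of rows."""
    out = []
    for row in chroma:
        wide = [v for v in row for _ in (0, 1)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                   # repeat the row vertically
    return out

# A 2x2 subsampled plane becomes a blocky 4x4 plane of constant 2x2 squares.
```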

@juliobbv-p

juliobbv-p commented Dec 7, 2024

for fairness, it would be interesting to compare vmaf results - that's what I use personally, though I know vmaf is designed for video and not for images.

I strongly recommend not using VMAF for image comparisons, for two main reasons:

  1. VMAF is a video metric. This means VMAF factors intra-frame and motion masking effects into its scoring. In practice, when VMAF is used as an image metric (i.e. on a single frame), it ends up not penalizing certain annoying artifacts (like blocking) as harshly as it should.
  2. VMAF prioritizes appeal over fidelity. You can boost VMAF scores with techniques like sharpening or contrast enhancement. You can even "hack" VMAF to get outrageously high scores while the actual image looks nothing like the original. Even the NEG version can be easily manipulated to inflate scores in a visually harmful way. See https://arxiv.org/pdf/2107.04510.

In my experience, the image metrics I've found useful so far are SSIMULACRA 2(.1), DSSIM, and Butteraugli (for distances ~<2.5). Fortunately, codec-compare supports all three of them. As previously mentioned, libaom's new tune is an almost direct port of PSY's tune 4, so despite the name (tune=ssimulacra2), it also improves DSSIM and Butteraugli (at d < 2.5).

@gitoss
Author

gitoss commented Dec 7, 2024

In my experience, the image metrics I've found useful so far are SSIMULACRA 2(.1), DSSIM, and Butteraugli (for distances ~<2.5).

The main reason why I'm using vmaf even for images is that I can use ab-av1 to specify a target vmaf, and then let ab-av1 auto-search until the avif (or other encoder) quality is found.

Otherwise, I'd have to essentially script for myself what ab-av1 already does just fine. And right now, I know how to do VMAF with ffmpeg, but I'd have to look at the codec-compare source to learn how to calculate (and extract a numeric value for scripting) SSIMULACRA2, DSSIM, or Butteraugli.

It's off-topic, but are there any tools other than ab-av1 available that test-encode a source image until a specific metric score is achieved?
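For what it's worth, the auto-search such tools perform can be sketched as a bisection over the encoder's quality parameter; encode_and_score below is a hypothetical stand-in for "encode at quality q, then measure the metric", not ab-av1's actual code:

```python
def find_quality(encode_and_score, target, lo=0, hi=100, tol=0.25):
    """Bisect the quality parameter until the metric score is close to target.

    Assumes the metric score is monotonically non-decreasing in quality."""
    best = hi
    while lo <= hi:
        q = (lo + hi) // 2
        score = encode_and_score(q)
        if abs(score - target) <= tol:
            return q
        if score < target:
            lo = q + 1
        else:
            best = q
            hi = q - 1
    return best  # smallest tried quality whose score met the target

# Toy metric for illustration: score rises linearly with quality.
q = find_quality(lambda q: q * 0.9, target=63.0)
```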

@gitoss
Author

gitoss commented Jan 6, 2025

Fyi, aom has now adopted all of the AVIF-related changes into their ssimulacra2 tune, so using the SVT PSY fork isn't necessary anymore - it just threads better than aom.
