
Use SVT-AV1 PSY fork with --tune 4 #2412

Closed
gitoss opened this issue Aug 25, 2024 · 14 comments

@gitoss

gitoss commented Aug 25, 2024

The PSY fork of SVT-AV1 features a new --tune 4 which is designed to improve still picture (i.e. AVIF) encoding:

"--tune 4 - A new Tune called Still Picture has been introduced for AVIF encoding, with promising gains observed over aomenc, aomenc 4:4:4, and mainline SVT-AV1" https://github.com/AOMediaCodec/libavif/releases

It would be nice to have libavif binaries use this fork and tune out of the box (if SVT-AV1 encoder is selected).
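In the meantime, the tune can already be selected explicitly when avifenc is built against the PSY fork; a minimal sketch (file names are placeholders; -c selects the codec and -a forwards encoder-specific key=value options, as used later in this thread):

```shell
# Sketch, assuming an avifenc binary linked against the SVT-AV1-PSY fork.
# -c selects the codec, -a forwards encoder-specific options (here, tune=4).
cmd="avifenc -c svt -a tune=4 -q 60 input.png output.avif"

if command -v avifenc >/dev/null 2>&1 && [ -f input.png ]; then
  $cmd
else
  echo "avifenc or input.png missing; would run: $cmd"
fi
```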

There are no independent benchmarks / image comparisons though, as far as I can tell. psy-ex/svt-av1-psy@393cf6d

@y-guyon
Collaborator

y-guyon commented Aug 27, 2024

Thank you for the suggestion.

We may want to wait for other benchmarks to confirm the gains and/or for the fork to be merged into the base repository before using it in libavif.

@juliobbv-p

juliobbv-p commented Sep 28, 2024

Hi @y-guyon! I wanted to follow up on this issue with some updates to reassess the feature.

Regarding benchmarks, we just released our first results, which can be found on this page. Depending on the image, SVT-AV1-PSY gets 5-15% gains over aomenc on the CID22 validation set and the gb82 photographic dataset, as measured by the SSIMULACRA2 metric. Subjective evaluations corroborate the gains.

As an example, here's a visual comparison for a typical photo image. The SVT-AV1-PSY-encoded image (147 KB) is 90.5% the size of aomenc's (162 KB), with comparable image quality and overall better consistency. For the record, the aomenc image took 3x as long to encode on my machine.

SVT-AV1-PSY also recently gained the ability to encode images with odd dimensions, and with sizes as small as 4x4 px (down from the previous 64x64 px minimum). Both are especially useful for images, as there's no longer a need to crop or pad them for the encoder to accept them.
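For context, the old 64x64 floor meant small inputs had to be padded (or cropped) before the encoder would accept them; a sketch of that now-unnecessary step (hypothetical helper, not libavif API):

```python
def pad_to_minimum(width: int, height: int, minimum: int = 64) -> tuple[int, int]:
    """Round dimensions up to the encoder's minimum size (the old SVT-AV1 constraint)."""
    return max(width, minimum), max(height, minimum)

# A 17x33 thumbnail previously had to be padded to 64x64; with the new 4x4
# floor (and odd-dimension support), it can be passed through unchanged.
```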

I'd encourage you to give SVT-AV1-PSY's tune 4 a try!

@gitoss
Author

gitoss commented Oct 30, 2024

I'd encourage you to give SVT-AV1-PSY's tune 4 a try!

Fyi, the devs of the SVT-AV1-PSY fork have put up a comparison of their --tune 4 vs. other encoders: https://svt-av1-psy.com/avif/

@vrabaud
Collaborator

vrabaud commented Oct 30, 2024

Very nice! Thanks for the info! Is there anything we should merge in libavif? I am thinking about gianni-rosato@d53aa45#diff-29748a2db41273018a16b71ccb60bcd7b632b86c78cc69bb88784702122ebdde

@gitoss
Author

gitoss commented Oct 30, 2024

Very nice! Thanks for the info! Is there anything we should merge in libavif?

I didn't realize there was a commit / fork in the meantime...

Apart from using the PSY fork of SVT-AV1, the one thing that is important is to set --tune to 4 by default when SVT is used as the AVIF encoder. It seems the SVT-AV1-PSY devs have this covered in 85512d7

Setting the default output depth to 10 bits could make sense too - I don't know how many devices are around that fail to decode 10-bit AVIF though. a1c2afb

Since there is a fork, I guess/hope that the SVT-AV1-PSY devs will offer a pull request to the main libavif repo sooner or later - an indication that this would be welcome could be useful.

@y-guyon
Collaborator

y-guyon commented Nov 27, 2024

Here is another benchmark:
webmproject/codec-compare#17 (comment)

@gitoss
Author

gitoss commented Nov 28, 2024

Here is another benchmark: webmproject/codec-compare#17 (comment)

Here are my findings on a dataset from https://www.compression.cc with multiple codecs at default encoding effort. Let me know if you expect significantly different results at other speeds.

Thanks for the effort - I'd really like to see SVT-AV1-PSY's promising --tune 4 and maybe mainline SVT-AV1's new --avif included. SVT is currently 4:2:0 only though, but for high-fidelity content that can't do without 4:4:4, I'd rather try JPEG XL :-)

Especially because of SVT-AV1's higher speed, a "better" (slower) preset should be used vs. AOM: it shouldn't be about a "default" effort that was most likely chosen for video encoding, but about comparing the same (reasonable) encoding time for images.

That said: I'd also recommend testing speed 3, as it enables rectangular partitions, larger transforms, and restoration filters; while still being fast enough for some production scenarios. It'd be interesting to see how the tune's tweaks interact with the larger available tooling repertoire.

Same for SVT-AV1: some presets turn on tools that seem essential for image encoding - for example, better intra coding at preset 3 and below.

@FrankGalligan
Contributor

I ran some tests of PSY --tune 4 vs libaom -tune=ssimulacra2 mode. The dataset was subset1.

On my machine psy v2.3.0 speed 5 was in-between libaom speed 5 and speed 6.

[Graph: subset1, psy v2.3.0 vs aom 37c5c4e6aa tune=ssimulacra2, speed 5, 8 threads, 4:2:0, 2x2 tiles, SSIMULACRA2 BD-rate]
This first graph is psy v2.3.0 speed 5 vs libaom (hash 37c5c4e6aa) speed 5, 4:2:0, plotting SSIMULACRA2. You can see both curves are really close.

PSY command line:
avifenc INPUT.png OUTPUT.avif -q QUALITY --speed 5 --jobs 8 -y 420 --tilecolslog2 2 --tilerowslog2 2 -a tune=4

libaom command line:
avifenc INPUT.png OUTPUT.avif -q QUALITY --speed 5 --jobs 8 -y 420 --tilecolslog2 2 --tilerowslog2 2 -a tune=ssimulacra2

[Graph: subset1, psy v2.3.0 vs aom 37c5c4e6aa tune=ssimulacra2, speed 6, 8 threads, 4:4:4, 2x2 tiles, SSIMULACRA2 BD-rate]
This second graph is psy v2.3.0 speed 5 vs libaom (hash 37c5c4e6aa) speed 6, 4:4:4, plotting SSIMULACRA2. You can see both curves start close, and then as the quality gets higher, the 4:4:4 libaom encode has a better SSIMULACRA2 score.

PSY command line:
avifenc INPUT.png OUTPUT.avif -q QUALITY --speed 5 --jobs 8 -y 420 --tilecolslog2 2 --tilerowslog2 2 -a tune=4

libaom command line:
avifenc INPUT.png OUTPUT.avif -q QUALITY --speed 6 --jobs 8 -y 444 --tilecolslog2 2 --tilerowslog2 2 -a tune=ssimulacra2

Note, I did not run libaom speed 6 420.

@gitoss
Author

gitoss commented Dec 6, 2024

I ran some tests of PSY --tune 4 vs libaom -tune=ssimulacra2 mode. The dataset was subset1.

That's really interesting, thanks - and it's especially important that these benchmarks measure the same-ish encoding time.

So svt/aom are about the same on 420, but aom 444 has a significant effect on ssimulacra2, correct?

What interests me is how "visual" the effect of chroma subsampling is - there is a reason reducing chroma fidelity has been done for ages in video / image encoding. I for one cannot tell any difference between 420 and 444 - are there synthetic scores that are more tuned to this human perception?

I'm not the number one -psy fanboy :-) but I have to mention that you're measuring aom with the exact metric that it's tuned for, while svt tuning is based on another metric (afaik it's ssim) - so that's a significant bias towards aom.

=> for fairness, it would be interesting to compare vmaf results - that's what I use personally, though I know vmaf is designed for video and not for images.
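For reference, single-image VMAF can be computed with ffmpeg's libvmaf filter (assuming an ffmpeg build with libvmaf enabled; file names are placeholders, the first input is the distorted image and the second the reference):

```shell
# Sketch: score a decoded image against its source with ffmpeg's libvmaf filter.
cmd="ffmpeg -i decoded.png -i original.png -lavfi libvmaf -f null -"

if command -v ffmpeg >/dev/null 2>&1 && [ -f decoded.png ] && [ -f original.png ]; then
  $cmd
else
  echo "prerequisites missing; would run: $cmd"
fi
```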

On my machine psy v2.3.0 speed 5 was in-between libaom speed 5 and speed 6.

The question is what preset/speed is tuned 'best' for images, i.e. what intra tools are enabled. This might have a significant effect on the respective results, because aom/svt enable them at different presets/speeds.

@FrankGalligan
Contributor

FrankGalligan commented Dec 6, 2024

I ran some tests of PSY --tune 4 vs libaom -tune=ssimulacra2 mode. The dataset was subset1.

That's really interesting, thanks - and it's especially important that these benchmarks measure the same-ish encoding time.

So svt/aom are about the same on 420, but aom 444 has a significant effect on ssimulacra2, correct?

Well, I think 4:4:4 has a significant effect at very high quality (assuming the source content is not 4:2:0). So in that sense, yes, 4:4:4 has a significant effect on SSIMULACRA2 at very high quality.

What interests me is how "visual" the effect of chroma subsampling is - there is a reason reducing chroma fidelity has been done for ages in video / image encoding. I for one cannot tell any difference between 420 and 444 - are there synthetic scores that are more tuned to this human perception?

IMO chroma subsampling is much harder to notice on natural content, which, when you are capturing video, is what almost everything is. Here is an original image from https://github.com/google-research-datasets/web-images and an image encoded to high quality 4:2:0.

[Images: BFM-screenshot01 original PNG vs. high-quality 4:2:0 AVIF]
If you look closely, the green and red colors are much more muted, and the mouths and grass are blurry as well.

I'm not the number one -psy fanboy :-) but I have to mention that you're measuring aom with the exact metric that it's tuned for, while svt tuning is based on another metric (afaik it's ssim) - so that's a significant bias towards aom.

Actually, aom at hash 37c5c4e6aa with tune=ssimulacra2 copies what was done in psy's tune=4 mode, plus enhancements for 4:4:4.

=> for fairness, it would be interesting to compare vmaf results - that's what I use personally, though I know vmaf is designed for video and not for images.

I'm not sure I would call VMAF more fair for images, as VMAF does not take chroma into account at all. Anyway, here are the graphs for VMAF:
[Graph: subset1 VMAF, psy v2.3.0 speed 5 vs aom 37c5c4e6aa tune=ssimulacra2 speed 5, 4:2:0]
[Graph: subset1 VMAF, psy v2.3.0 speed 5 vs aom 37c5c4e6aa tune=ssimulacra2 speed 5, 4:4:4]

You can see that in the 4:2:0 VMAF graph, psy tune=4 and aom tune=ssimulacra2 are very close, much like the SSIMULACRA2 graph. The 4:4:4 VMAF graph says psy 4:2:0 is a little better for the same bitrate, but again, VMAF does not take chroma into account, so when the graph goes to a high quality, both psy and aom trend to the same VMAF.

On my machine psy v2.3.0 speed 5 was in-between libaom speed 5 and speed 6.

The question is what preset/speed is tuned 'best' for images, i.e. what intra tools are enabled. This might have a significant effect on the respective results, because aom/svt enable them at different presets/speeds.

True. I picked the svt preset (preset 5) that was closest to libaom's default for images (speed 6), but you could pick one and try to match the other. You could also pick the best-quality presets from both encoders to see the best they can do.

@juliobbv-p

What interests me is how "visual" the effect of chroma subsampling is - there is a reason reducing chroma fidelity has been done for ages in video / image encoding. I for one cannot tell any difference between 420 and 444 - are there synthetic scores that are more tuned to this human perception?

4:2:0 subsampling can have an impact on image quality, even for natural images. For example, look at how blocky this red jacket is:

AVIF doesn't specify a normative way to upscale chroma channels, so many viewers actually end up using nearest-neighbor interpolation.
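To make the nearest-neighbor point concrete, here is a pure-Python sketch of a 2x nearest-neighbor upscale of a subsampled chroma plane; each chroma sample ends up covering a 2x2 block of pixels, which is exactly the blockiness described above:

```python
def upsample_nearest_2x(chroma):
    """2x nearest-neighbor upscale of a chroma plane given as a list of rows."""
    out = []
    for row in chroma:
        wide = [v for v in row for _ in (0, 1)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                   # repeat the row vertically
    return out

# A 2x2 subsampled plane becomes a blocky 4x4 plane of constant 2x2 squares.
```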

@juliobbv-p

juliobbv-p commented Dec 7, 2024

for fairness, it would be interesting to compare vmaf results - that's what I use personally, though I know vmaf is designed for video and not for images.

I strongly recommend not using VMAF for image comparisons, for two main reasons:

  1. VMAF is a video metric. This means VMAF factors intra-frame and motion masking effects into its scoring. In practice, when VMAF is used as an image metric (i.e. on a single frame), it ends up not penalizing certain annoying artifacts (like blocking) as harshly as it should.
  2. VMAF prioritizes appeal over fidelity. You can boost VMAF scores with techniques like sharpening or contrast enhancement. You can even "hack" VMAF to get outrageously high scores while the actual image looks nothing like the original. Even the NEG version can be easily manipulated to inflate scores in a visually harmful way. See https://arxiv.org/pdf/2107.04510.

In my experience, the image metrics I've found useful so far are SSIMULACRA 2(.1), DSSIM, and Butteraugli (for distances ~<2.5). Fortunately, codec-compare supports all three of them. As previously mentioned, libaom's new tune is an almost direct port of PSY's tune 4, so despite the name (tune=ssimulacra2), it also improves DSSIM and Butteraugli (at d < 2.5).

@gitoss
Author

gitoss commented Dec 7, 2024

In my experience, the image metrics I've found useful so far are SSIMULACRA 2(.1), DSSIM, and Butteraugli (for distances ~<2.5).

The main reason why I'm using vmaf even for images is that I can use ab-av1 to specify a target vmaf, and then let ab-av1 auto-search until the avif (or other encoder) quality is found.

Otherwise, I'd have to essentially script for myself what ab-av1 already does just fine. And right now, I know how to do VMAF with ffmpeg, but I'd have to look at the codec-compare source to learn how to calculate (and extract a numeric value for scripting) SSIMULACRA2, DSSIM, or Butteraugli.

It's off-topic, but are there any tools other than ab-av1 available that test-encode a source image until a specific metric score is achieved?
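For what it's worth, the auto-search such tools perform can be sketched as a bisection over the encoder's quality parameter; encode_and_score below is a hypothetical stand-in for "encode at quality q, then measure the metric", not ab-av1's actual code:

```python
def find_quality(encode_and_score, target, lo=0, hi=100, tol=0.25):
    """Bisect the quality parameter until the metric score is close to target.

    Assumes the metric score is monotonically non-decreasing in quality."""
    best = hi
    while lo <= hi:
        q = (lo + hi) // 2
        score = encode_and_score(q)
        if abs(score - target) <= tol:
            return q
        if score < target:
            lo = q + 1
        else:
            best = q
            hi = q - 1
    return best  # smallest tried quality whose score met the target

# Toy metric for illustration: score rises linearly with quality.
q = find_quality(lambda q: q * 0.9, target=63.0)
```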

@gitoss
Author

gitoss commented Jan 6, 2025

Fyi, aom has now adopted all of the AVIF-related changes into their ssimulacra2 tune, so using the SVT PSY fork isn't necessary anymore - it just threads better than aom.
