Mask2FormerImageProcessor support overlapping features #35536

mherzog01 · 2025-01-06T17:34:07Z

System Info

transformers version: 4.48.0.dev0
Python version: 3.13.1
OS: Linux (AWS CodeSpace) Linux default 5.10.228-219.884.amzn2.x86_64 #1 SMP Wed Oct 23 17:17:00 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Virtual environment: Conda

Output of pip list

Package                  Version
------------------------ -----------
aiohappyeyeballs         2.4.4
aiohttp                  3.11.11
aiosignal                1.3.2
asttokens                3.0.0
attrs                    24.3.0
Brotli                   1.1.0
certifi                  2024.12.14
cffi                     1.17.1
charset-normalizer       3.4.1
colorama                 0.4.6
comm                     0.2.2
datasets                 3.2.0
debugpy                  1.8.11
decorator                5.1.1
dill                     0.3.8
exceptiongroup           1.2.2
executing                2.1.0
filelock                 3.16.1
frozenlist               1.5.0
fsspec                   2024.9.0
h2                       4.1.0
hpack                    4.0.0
huggingface_hub          0.26.5
hyperframe               6.0.1
idna                     3.10
importlib_metadata       8.5.0
ipykernel                6.29.5
ipython                  8.31.0
jedi                     0.19.2
Jinja2                   3.1.5
jupyter_client           8.6.3
jupyter_core             5.7.2
MarkupSafe               3.0.2
matplotlib-inline        0.1.7
mpmath                   1.3.0
multidict                6.1.0
multiprocess             0.70.16
nest_asyncio             1.6.0
networkx                 3.4.2
numpy                    2.2.1
nvidia-cublas-cu12       12.4.5.8
nvidia-cuda-cupti-cu12   12.4.127
nvidia-cuda-nvrtc-cu12   12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.2.1.3
nvidia-curand-cu12       10.3.5.147
nvidia-cusolver-cu12     11.6.1.9
nvidia-cusparse-cu12     12.3.1.170
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.4.127
packaging                24.2
pandas                   2.2.3
parso                    0.8.4
pexpect                  4.9.0
pickleshare              0.7.5
pillow                   11.1.0
pip                      24.3.1
platformdirs             4.3.6
prompt_toolkit           3.0.48
propcache                0.2.1
psutil                   6.1.1
ptyprocess               0.7.0
pure_eval                0.2.3
pyarrow                  18.1.0
pycparser                2.22
Pygments                 2.18.0
PySocks                  1.7.1
python-dateutil          2.9.0.post0
pytz                     2024.1
PyYAML                   6.0.2
pyzmq                    26.2.0
regex                    2024.11.6
requests                 2.32.3
safetensors              0.5.0
setuptools               75.7.0
six                      1.17.0
stack_data               0.6.3
sympy                    1.13.1
tokenizers               0.21.0
torch                    2.5.1
tornado                  6.4.2
tqdm                     4.67.1
traitlets                5.14.3
transformers             4.48.0.dev0
typing_extensions        4.12.2
tzdata                   2024.2
urllib3                  2.3.0
wcwidth                  0.2.13
xxhash                   3.5.0
yarl                     1.18.3
zipp                     3.21.0
zstandard                0.23.0

Who can help?

@amyeroberts @qubvel

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

From

The code below raises "ValueError: Unable to infer channel dimension format". Other permutations of ChannelDimension, and of where the num_features axis sits in the segmentation map, give the same or similar errors.

import numpy as np

from transformers.image_utils import ChannelDimension
from transformers import Mask2FormerImageProcessor  # Assumes torchvision is installed

processor = Mask2FormerImageProcessor(do_rescale=False, do_resize=False, do_normalize=False)

num_classes = 2
num_features = 5
height, width = (16, 16)
images = [np.zeros((height, width, 3))]
segmentation_maps = [np.random.randint(0, num_classes, (height, width, num_features))]

batch = processor(images,
                  segmentation_maps=segmentation_maps,
                  return_tensors="pt",
                  input_data_format=ChannelDimension.LAST)

See https://stackoverflow.com/questions/79331752/does-the-huggingface-mask2formerimageprocessor-support-overlapping-features.
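
One plausible reading of the error, offered as an assumption rather than a confirmed diagnosis: the channel axis is guessed from the array shape, and only a first or last dimension of size 1 or 3 is accepted as a channel count. The helper below, `guess_channel_axis`, is a hypothetical stand-in written for illustration. It is not the library's code, and its default of `(1, 3)` is also an assumption. It shows why a `(16, 16, 5)` array would be rejected under that rule.

```python
def guess_channel_axis(shape, num_channels=(1, 3)):
    """Hypothetical stand-in for shape-based channel inference (not library code)."""
    if len(shape) != 3:
        raise ValueError(f"Unsupported number of dimensions: {len(shape)}")
    first, last = shape[0], shape[-1]
    if first in num_channels:
        return "first"
    if last in num_channels:
        return "last"
    # Neither the first nor the last axis looks like a channel count.
    raise ValueError("Unable to infer channel dimension format")


# A 3-channel image is recognised as channels-last.
image_axis = guess_channel_axis((16, 16, 3))

# A map with 5 stacked features matches neither 1 nor 3, so inference fails.
try:
    guess_channel_axis((16, 16, 5))
    stacked_map_error = None
except ValueError as err:
    stacked_map_error = str(err)
```

If this reading is right, the failure would depend only on the shape of the segmentation map, which would be consistent with reordering the axes not helping.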

Expected behavior

The processor accepts overlapping masks without raising an error.
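
For reference, here is a minimal sketch of the target layout I have in mind, built with NumPy only and without the processor. It assumes each slice along the last axis is a binary mask for one feature, and that the model consumes per-feature masks as `mask_labels` with shape `(num_features, height, width)` plus one class id per mask as `class_labels`. The class assignment (every feature is class 1) is made up for illustration, and whether building these arrays by hand is a supported path is an assumption.

```python
import numpy as np

height, width = 16, 16
num_features = 5

rng = np.random.default_rng(0)
# One binary mask per feature, stacked on the last axis; masks may overlap.
stacked = rng.integers(0, 2, size=(height, width, num_features))

# Move the feature axis first: (num_features, height, width).
mask_labels = np.transpose(stacked, (2, 0, 1)).astype(np.float32)

# Hypothetical class id per feature; here every feature belongs to class 1.
class_labels = np.ones(num_features, dtype=np.int64)

# Drop features whose mask is empty so masks and labels stay aligned.
keep = mask_labels.reshape(num_features, -1).any(axis=1)
mask_labels, class_labels = mask_labels[keep], class_labels[keep]
```

Because each feature keeps its own mask plane, a pixel can belong to several features at once, which a single 2D map of ids cannot express.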


Rocketknight1 · 2025-01-06T19:39:09Z

cc @zucchini-nlp !

qubvel · 2025-01-06T19:42:12Z

I will take a look, it's related to vision 👍

mherzog01 added the bug label Jan 6, 2025
