Custom 4D tensor caused shape mismatch error #35290

Open · 1 of 4 tasks
fingertap opened this issue Dec 16, 2024 · 5 comments · May be fixed by #35517

@fingertap

System Info

  • transformers version: 4.46.3
  • Platform: Linux-5.4.0-153-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • Huggingface_hub version: 0.26.3
  • Safetensors version: 0.4.5
  • Accelerate version: 1.1.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A800-SXM4-80GB

Who can help?

@ArthurZucker
Can you take a look at this? I want to pack samples using a custom attention mask. However, when I use a mask of shape [1, 1, seq_len, seq_len], it raises the following error:

  File "/data/miniconda3/envs/vllm/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 139, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (324) must match the size of tensor b (18) at non-singleton dimension 3
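For context, the error itself is a plain PyTorch broadcasting failure inside masked_fill: the causal 4D mask and the expanded user mask disagree on their last dimension. Here is a minimal standalone snippet that reproduces the same class of error (the shapes are illustrative, not taken from the transformers code path):

import torch

dtype = torch.bfloat16
causal_4d_mask = torch.zeros(1, 1, 18, 18, dtype=dtype)      # [bsz, 1, q_len, k_len]
expanded_attn_mask = torch.zeros(1, 1, 1, 324, dtype=dtype)   # last dim disagrees with k_len

# masked_fill requires the mask to broadcast against the tensor being filled;
# 324 vs. 18 at dimension 3 cannot broadcast, so this raises the same RuntimeError.
causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)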

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
import argparse

from contextlib import nullcontext
from transformers import AutoModelForCausalLM, AutoTokenizer


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, default="gpt2")
    parser.add_argument("--tokenizer_path", type=str, default="gpt2")
    parser.add_argument("--use-flash-attn", action="store_true")
    parser.add_argument("--no-grad", action="store_true")

    args = parser.parse_args()
    return args


def prepare_data(tokenizer, packing: bool, device: torch.device):
    texts = [
        "Hello, how are you?",
        "When is the next holiday?",
        "China is a great country.",
    ]
    encoded = tokenizer(texts)

    # Convert to tensors
    if packing:
        # Concatenate all samples into a single row and build a 4D additive mask:
        # -inf everywhere, 0 inside each sample's block (block-diagonal).
        res = torch.zeros((1, sum(len(x) for x in encoded["input_ids"])), dtype=torch.long)
        attention_mask = torch.full((1, 1, res.size(1), res.size(1)), dtype=torch.bfloat16, fill_value=float("-inf"))

        offset = 0
        for i, (input_ids, attn_mask) in enumerate(zip(encoded["input_ids"], encoded["attention_mask"])):
            res[0, offset: offset + len(input_ids)] = torch.tensor(input_ids)
            attention_mask[0, 0, offset: offset + len(attn_mask), offset: offset + len(attn_mask)] = 0.
            offset += len(attn_mask)
    else:
        # Standard right-padded batch with a 2D attention mask.
        max_length = max(len(x) for x in encoded["input_ids"])
        res = torch.zeros((len(encoded["input_ids"]), max_length), dtype=torch.long)
        attention_mask = torch.zeros((len(encoded["input_ids"]), max_length), dtype=torch.long)
        for i, (input_ids, attn_mask) in enumerate(zip(encoded["input_ids"], encoded["attention_mask"])):
            res[i, : len(input_ids)] = torch.tensor(input_ids)
            attention_mask[i, : len(attn_mask)] = torch.tensor(attn_mask)
    return res.to(device), attention_mask.to(device)


def main(args):
    model = AutoModelForCausalLM.from_pretrained(
        args.model_path, use_flash_attention_2=args.use_flash_attn, device_map="cuda", torch_dtype=torch.bfloat16
    )
    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_path)
    device = torch.device("cuda")

    context = torch.no_grad if args.no_grad else nullcontext
    with context():
        model.eval()
        input_ids_no_pack, attention_mask_no_pack = prepare_data(tokenizer, packing=False, device=device)
        input_ids_pack, attention_mask_pack = prepare_data(tokenizer, packing=True, device=device)

        logits_no_pack = model(input_ids=input_ids_no_pack, attention_mask=attention_mask_no_pack).logits
        logits_pack = model(input_ids=input_ids_pack, attention_mask=attention_mask_pack).logits

        logits_no_pack_flatten = torch.zeros_like(logits_pack)
        offset = 0
        for i in range(logits_no_pack.shape[0]):
            length = attention_mask_no_pack[i].sum().item()
            logits_no_pack_flatten[0, offset: offset + length] = logits_no_pack[i, :length]
            offset += length

        print((logits_no_pack_flatten - logits_pack).sum())


if __name__ == "__main__":
    args = parse_args()
    main(args)

Expected behavior

Run without error.

fingertap added the bug label on Dec 16, 2024
@ArthurZucker (Collaborator)

Hey! Are you running the script with gpt2 as the default model? I don't think it supports custom masks, but if so, a PR is welcome! 🤗

@fingertap (Author)

Can you explain why it does not support a custom mask? Also, is there a better way to implement packing, e.g. letting the mask be [1, 1, 1, 0, 0, 2, 2, 2, 2, 2, 3, 3, 3, 3], where 1, 2, 3 indicate different samples and 0 marks padding tokens?
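For reference, here is a minimal sketch in plain PyTorch (not an existing transformers API) of how such a segment-id vector could be turned into a 4D additive mask; the function name segment_ids_to_4d_mask and the example ids are illustrative only:

import torch

def segment_ids_to_4d_mask(segment_ids: torch.Tensor, dtype=torch.bfloat16) -> torch.Tensor:
    # segment_ids: [batch, seq_len]; 0 marks padding, 1..N mark packed samples.
    # Returns an additive mask [batch, 1, seq_len, seq_len]: 0 where attention is
    # allowed, the dtype's most negative value where it is blocked.
    bsz, seq_len = segment_ids.shape
    same_segment = segment_ids[:, :, None] == segment_ids[:, None, :]   # block-diagonal structure
    not_padding = (segment_ids != 0)[:, None, :]                        # block attention to padding keys
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=segment_ids.device))
    allowed = same_segment & not_padding & causal
    # Let every position attend to itself so fully masked (padding) rows don't softmax to NaN.
    allowed |= torch.eye(seq_len, dtype=torch.bool, device=segment_ids.device)
    mask = torch.full((bsz, seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype, device=segment_ids.device)
    return mask.masked_fill(allowed, 0.0)[:, None, :, :]

segment_ids = torch.tensor([[1, 1, 1, 0, 0, 2, 2, 2, 2, 2, 3, 3, 3, 3]])
mask_4d = segment_ids_to_4d_mask(segment_ids)   # shape [1, 1, 14, 14]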

@Rajatavaa

Try to add a dimension with torch.unsqueeze() for broadcasting the expanded_attn_mask.
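As a generic illustration of that idea (standalone PyTorch, not the actual transformers internals): a [batch, k_len] mask does not broadcast against a [batch, 1, q_len, k_len] tensor until singleton dimensions are inserted:

import torch

attn = torch.zeros(2, 1, 18, 18)        # [batch, heads, q_len, k_len]
pad_mask = torch.ones(2, 18).bool()      # [batch, k_len], True = keep

# attn.masked_fill(~pad_mask, float("-inf"))                         # RuntimeError: shapes don't broadcast
out = attn.masked_fill(~pad_mask[:, None, None, :], float("-inf"))   # [batch, 1, 1, k_len] broadcasts
print(out.shape)                         # torch.Size([2, 1, 18, 18])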

@Blackcipher101

Can I take this issue up?

@sambhavnoobcoder

Hey @ArthurZucker, I was having a look at this and I think I ended up resolving it in the process, so I raised a PR for it. I would really appreciate it if you could take a look and respond at your earliest convenience.
