Custom 4D tensor caused shape mismatch error #35290

Open · 1 of 4 tasks
fingertap opened this issue Dec 16, 2024 · 5 comments · May be fixed by #35517

@fingertap

System Info

  • transformers version: 4.46.3
  • Platform: Linux-5.4.0-153-generic-x86_64-with-glibc2.35
  • Python version: 3.10.15
  • Huggingface_hub version: 0.26.3
  • Safetensors version: 0.4.5
  • Accelerate version: 1.1.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A800-SXM4-80GB

Who can help?

@ArthurZucker
Can you take a look at this? I want to pack samples using a custom attention mask. However, when I use a mask of shape [1, 1, seq_len, seq_len], it raises the following error:

  File "/data/miniconda3/envs/vllm/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 139, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (324) must match the size of tensor b (18) at non-singleton dimension 3
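For context, the error itself is a plain PyTorch broadcasting failure inside masked_fill: the causal 4D mask and the expanded user mask disagree on their last dimension. Here is a minimal standalone snippet that reproduces the same class of error (the shapes are illustrative, not taken from the transformers code path):

import torch

dtype = torch.bfloat16
causal_4d_mask = torch.zeros(1, 1, 18, 18, dtype=dtype)      # [bsz, 1, q_len, k_len]
expanded_attn_mask = torch.zeros(1, 1, 1, 324, dtype=dtype)   # last dim disagrees with k_len

# masked_fill requires the mask to broadcast against the tensor being filled;
# 324 vs. 18 at dimension 3 cannot broadcast, so this raises the same RuntimeError.
causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)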

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
import argparse

from contextlib import nullcontext
from transformers import AutoModelForCausalLM, AutoTokenizer


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, default="gpt2")
    parser.add_argument("--tokenizer_path", type=str, default="gpt2")
    parser.add_argument("--use-flash-attn", action="store_true")
    parser.add_argument("--no-grad", action="store_true")

    args = parser.parse_args()
    return args


def prepare_data(tokenizer, packing: bool, device: torch.device):
    texts = [
        "Hello, how are you?",
        "When is the next holiday?",
        "China is a great country.",
    ]
    encoded = tokenizer(texts)

    # Convert to tensors
    if packing:
        # Concatenate all samples into a single row and build a 4D additive mask:
        # -inf everywhere, 0 inside each sample's block (block-diagonal).
        res = torch.zeros((1, sum(len(x) for x in encoded["input_ids"])), dtype=torch.long)
        attention_mask = torch.full((1, 1, res.size(1), res.size(1)), dtype=torch.bfloat16, fill_value=float("-inf"))

        offset = 0
        for i, (input_ids, attn_mask) in enumerate(zip(encoded["input_ids"], encoded["attention_mask"])):
            res[0, offset: offset + len(input_ids)] = torch.tensor(input_ids)
            attention_mask[0, 0, offset: offset + len(attn_mask), offset: offset + len(attn_mask)] = 0.
            offset += len(attn_mask)
    else:
        # Standard right-padded batch with a 2D attention mask.
        max_length = max(len(x) for x in encoded["input_ids"])
        res = torch.zeros((len(encoded["input_ids"]), max_length), dtype=torch.long)
        attention_mask = torch.zeros((len(encoded["input_ids"]), max_length), dtype=torch.long)
        for i, (input_ids, attn_mask) in enumerate(zip(encoded["input_ids"], encoded["attention_mask"])):
            res[i, : len(input_ids)] = torch.tensor(input_ids)
            attention_mask[i, : len(attn_mask)] = torch.tensor(attn_mask)
    return res.to(device), attention_mask.to(device)


def main(args):
    model = AutoModelForCausalLM.from_pretrained(
        args.model_path, use_flash_attention_2=args.use_flash_attn, device_map="cuda", torch_dtype=torch.bfloat16
    )
    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_path)
    device = torch.device("cuda")

    context = torch.no_grad if args.no_grad else nullcontext
    with context():
        model.eval()
        input_ids_no_pack, attention_mask_no_pack = prepare_data(tokenizer, packing=False, device=device)
        input_ids_pack, attention_mask_pack = prepare_data(tokenizer, packing=True, device=device)

        logits_no_pack = model(input_ids=input_ids_no_pack, attention_mask=attention_mask_no_pack).logits
        logits_pack = model(input_ids=input_ids_pack, attention_mask=attention_mask_pack).logits

        logits_no_pack_flatten = torch.zeros_like(logits_pack)
        offset = 0
        for i in range(logits_no_pack.shape[0]):
            length = attention_mask_no_pack[i].sum().item()
            logits_no_pack_flatten[0, offset: offset + length] = logits_no_pack[i, :length]
            offset += length

        print((logits_no_pack_flatten - logits_pack).sum())


if __name__ == "__main__":
    args = parse_args()
    main(args)

Expected behavior

Run without error.

fingertap added the bug label on Dec 16, 2024
@ArthurZucker (Collaborator)

Hey! Are you running the script with gpt2 as the default model? I don't think it supports custom masks, but if so, a PR is welcome! 🤗

@fingertap (Author)

Can you explain why it does not support a custom mask? Also, is there a better way to implement packing, e.g. letting the mask be [1, 1, 1, 0, 0, 2, 2, 2, 2, 2, 3, 3, 3, 3], where 1, 2, 3 indicate different samples and 0 marks padding tokens?
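For reference, here is a minimal sketch in plain PyTorch (not an existing transformers API) of how such a segment-id vector could be turned into a 4D additive mask; the function name segment_ids_to_4d_mask and the example ids are illustrative only:

import torch

def segment_ids_to_4d_mask(segment_ids: torch.Tensor, dtype=torch.bfloat16) -> torch.Tensor:
    # segment_ids: [batch, seq_len]; 0 marks padding, 1..N mark packed samples.
    # Returns an additive mask [batch, 1, seq_len, seq_len]: 0 where attention is
    # allowed, the dtype's most negative value where it is blocked.
    bsz, seq_len = segment_ids.shape
    same_segment = segment_ids[:, :, None] == segment_ids[:, None, :]   # block-diagonal structure
    not_padding = (segment_ids != 0)[:, None, :]                        # block attention to padding keys
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=segment_ids.device))
    allowed = same_segment & not_padding & causal
    # Let every position attend to itself so fully masked (padding) rows don't softmax to NaN.
    allowed |= torch.eye(seq_len, dtype=torch.bool, device=segment_ids.device)
    mask = torch.full((bsz, seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype, device=segment_ids.device)
    return mask.masked_fill(allowed, 0.0)[:, None, :, :]

segment_ids = torch.tensor([[1, 1, 1, 0, 0, 2, 2, 2, 2, 2, 3, 3, 3, 3]])
mask_4d = segment_ids_to_4d_mask(segment_ids)   # shape [1, 1, 14, 14]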

@Rajatavaa

Try to add a dimension with torch.unsqueeze() for broadcasting the expanded_attn_mask.
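As a generic illustration of that idea (standalone PyTorch, not the actual transformers internals): a [batch, k_len] mask does not broadcast against a [batch, 1, q_len, k_len] tensor until singleton dimensions are inserted:

import torch

attn = torch.zeros(2, 1, 18, 18)        # [batch, heads, q_len, k_len]
pad_mask = torch.ones(2, 18).bool()      # [batch, k_len], True = keep

# attn.masked_fill(~pad_mask, float("-inf"))                         # RuntimeError: shapes don't broadcast
out = attn.masked_fill(~pad_mask[:, None, None, :], float("-inf"))   # [batch, 1, 1, k_len] broadcasts
print(out.shape)                         # torch.Size([2, 1, 18, 18])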

@Blackcipher101

Can I take this issue up?

@sambhavnoobcoder

Hey @ArthurZucker, I was having a look at this and I think I ended up resolving it in the process, so I raised a PR for it. I would really appreciate it if you could take a look and respond at your earliest convenience.
