add qwen2.5vl #35569

Open · wants to merge 3 commits into main

Conversation

ShuaiBai623 (Contributor)

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@minostauros (Contributor) left a comment:

Brought changes from #35466
Thanks for the new model!


def get_rope_index(
    self,
    input_ids: torch.LongTensor,

Suggested change:
-    input_ids: torch.LongTensor,
+    input_ids: Optional[torch.LongTensor] = None,
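
For context, a minimal standalone sketch (a hypothetical helper with assumed names, not the PR's code) of the fallback this default enables when a caller provides only inputs_embeds:

from typing import Optional

import torch

def pick_device(
    input_ids: Optional[torch.LongTensor] = None,
    attention_mask: Optional[torch.Tensor] = None,
) -> torch.device:
    # Hypothetical helper, not part of the PR: with input_ids optional, an
    # embeds-only forward pass can still resolve a tensor device from the mask.
    if input_ids is not None:
        return input_ids.device
    if attention_mask is not None:
        return attention_mask.device
    return torch.device("cpu")

print(pick_device(attention_mask=torch.ones(1, 4)))  # falls back to the mask's device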

if attention_mask is not None:
    position_ids = attention_mask.long().cumsum(-1) - 1
    position_ids.masked_fill_(attention_mask == 0, 1)
    position_ids = position_ids.unsqueeze(0).expand(3, -1, -1).to(input_ids.device)

Suggested change:
-    position_ids = position_ids.unsqueeze(0).expand(3, -1, -1).to(input_ids.device)
+    position_ids = position_ids.unsqueeze(0).expand(3, -1, -1).to(attention_mask.device)

attention_mask = attention_mask.to(inputs_embeds.device)

# if we get 4D attention mask we cannot calculate rope deltas anymore. TODO @raushan fixme
if position_ids is None and input_ids is not None and (attention_mask is None or attention_mask.ndim == 2):

Suggested change:
-    if position_ids is None and input_ids is not None and (attention_mask is None or attention_mask.ndim == 2):
+    if position_ids is None and (attention_mask is None or attention_mask.ndim == 2):
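
Taken together, these suggestions let the position-id path run when only inputs_embeds is supplied. A minimal standalone sketch (assumed example shapes, not the PR's code) of the mask-only computation shown above:

import torch

# With input_ids absent, position ids are still derivable from a 2D mask, and
# taking the device from attention_mask avoids dereferencing a None input_ids
# while giving the same result whenever input_ids is present.
attention_mask = torch.tensor([[1, 1, 1, 0, 0]], dtype=torch.long)
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)
position_ids = position_ids.unsqueeze(0).expand(3, -1, -1).to(attention_mask.device)
print(position_ids.shape)  # torch.Size([3, 1, 5])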

@molbap (Contributor) commented Jan 9, 2025

Hey @ShuaiBai623, thanks for the addition! 🎉 Before reviewing, the main thing here is that since Qwen2.5VL is very similar to Qwen2VL, it's best to use modular transformers: build a shorter modular_qwen_2_5_vl.py file and call python utils/modular_model_converter.py --files_to_parse src/transformers/models/qwen_2_5_vl/modular_qwen_2_5_vl.py, which will automatically generate the modeling, configuration, and processing files from the inheriting modules. See #34157, or https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_next_video/modular_llava_next_video.py, for recent examples!
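
A hypothetical skeleton of what such a modular file could look like (the Qwen2_5_VL class names and overrides here are assumptions, not the PR's actual code); running the converter command quoted above then regenerates the full modeling, configuration, and processing files from it:

# modular_qwen_2_5_vl.py -- illustrative sketch only
from transformers.models.qwen2_vl.modeling_qwen2_vl import (
    Qwen2VLForConditionalGeneration,
    Qwen2VLModel,
)

class Qwen2_5_VLModel(Qwen2VLModel):
    # Inherit everything; override only the blocks that differ from Qwen2VL.
    pass

class Qwen2_5_VLForConditionalGeneration(Qwen2VLForConditionalGeneration):
    def get_rope_index(self, input_ids=None, attention_mask=None, **kwargs):
        # Qwen2.5VL-specific M-RoPE indexing would replace the parent's here.
        ...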
