HunyuanVideo w. BitsAndBytes (local): Expected all tensors to be on the same device #10500
Comments
Add `revision='refs/pr/18'`:

```python
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers.utils import export_to_video

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)

transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "tencent/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    revision="refs/pr/18",
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "tencent/HunyuanVideo",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
    revision="refs/pr/18",
)

prompt = "A cat walks on the grass, realistic style."
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=15)
```

@SahilCarterr
@tin2tin, this seems to run perfectly fine for me without errors:

```python
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers.utils import export_to_video

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)

transformer_8bit = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

print(pipeline.text_encoder.device)
print(pipeline.transformer.device)
print(pipeline.vae.device)

prompt = "A cat walks on the grass, realistic style."
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=15)
```

The OOM comes from the default height and width.

Note that you might still OOM despite these changes because of the resolution being used. Even if it doesn't, you need enough CPU RAM to hold the models during offloading. If it OOMs on the CPU, you could load the transformer directly in float8_e4m3fn and then enable layerwise upcasting; I believe this is similar to what makes it runnable in low VRAM in UIs.
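The float8 + layerwise-upcasting suggestion could look roughly like the sketch below. This is a hedged example, not a verified recipe: it assumes a diffusers version that exposes `enable_layerwise_casting` on models and that the fp8 checkpoint loading path works for this transformer; check both against your installed version.

```python
import torch
from diffusers import HunyuanVideoTransformer3DModel, HunyuanVideoPipeline
from diffusers.utils import export_to_video

# Load the transformer weights directly in fp8 to roughly halve storage memory
# compared to fp16/bf16.
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    subfolder="transformer",
    torch_dtype=torch.float8_e4m3fn,
)

# Layerwise upcasting: weights stay stored in fp8 and each layer is upcast to
# bf16 just-in-time for compute (assumes enable_layerwise_casting is available).
transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16
)

pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
# Offload idle components to CPU RAM instead of relying on device_map="balanced".
pipeline.enable_model_cpu_offload()

prompt = "A cat walks on the grass, realistic style."
video = pipeline(prompt=prompt, num_frames=61, num_inference_steps=30).frames[0]
export_to_video(video, "cat.mp4", fps=15)
```

Passing a lower `height`/`width` to the pipeline call on top of this should reduce peak activation memory further.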
It's probably caused by the text encoder being on the CPU while the rest is on CUDA?

```
text_encoder: cpu
```
cc @sayakpaul for device_map

Oh, my bad: this is because everything was on CUDA for me, since I was testing on an A100 😅 The text encoder should have been automatically moved to CUDA when its forward method was called in your case. I will try on a lower-VRAM GPU and try to replicate.
Well, the text encoder is being placed on the CPU because, with the "balanced" device_map, there was nothing else available. So, you could first compute the text embeddings and then completely delete the text encoders to free up space. https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi#reproducing-the-results-from-the-genmo-mochi-repo shows an example of how to do this.
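Adapting that Mochi two-stage pattern to HunyuanVideo might look roughly like this. Treat it as a sketch: the `encode_prompt` return values, the component names (`text_encoder_2`, `tokenizer_2`), and the keyword arguments accepted by the pipeline call are assumptions based on the Mochi example, so verify them against your diffusers version before relying on it.

```python
import gc
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

prompt = "A cat walks on the grass, realistic style."

# Stage 1: load only the text encoders, compute embeddings, then free them.
pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    transformer=None,
    vae=None,
    torch_dtype=torch.float16,
).to("cuda")

with torch.no_grad():
    # Assumed return order; check the pipeline's encode_prompt docstring.
    prompt_embeds, pooled_prompt_embeds, prompt_attention_mask = pipeline.encode_prompt(
        prompt=prompt, device="cuda"
    )

# Drop the text encoders and reclaim VRAM before loading the heavy components.
del pipeline
gc.collect()
torch.cuda.empty_cache()

# Stage 2: load the transformer and VAE only, then denoise with the
# precomputed embeddings.
pipeline = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    text_encoder=None,
    text_encoder_2=None,
    tokenizer=None,
    tokenizer_2=None,
    torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()

video = pipeline(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    prompt_attention_mask=prompt_attention_mask,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(video, "cat.mp4", fps=15)
```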
(The Mochi code causes an OOM crash on an RTX 4090.)
I provided that example as a reference for you to adapt to your use case.
Describe the bug
Errors in the HunyuanVideo examples here:
hunyuan_video
Reproduction
Run this code from the link:

Gives this error:

```
HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/tencent/HunyuanVideo/resolve/main/transformer/config.json
```

Changing the path to `hunyuanvideo-community/HunyuanVideo` gives this error:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```
And the other example crashes on an RTX 4090 due to OOM.
(I wanted to check whether FastHunyuan-diffusers would be more VRAM-friendly, but I couldn't because of these errors.)
Logs
System Info
Win 11
Who can help?
@DN6 @a-r-r-o-w