[feat] IP Adapters (author @okotaku ) #5713

Merged
merged 161 commits into from
Nov 21, 2023

Conversation

yiyixuxu
Collaborator

@yiyixuxu yiyixuxu commented Nov 8, 2023

the author of this PR is @okotaku
and the original PR: #4944

This is a demo of an alternative design (alternative to #4944) that adds the image_projection layer to the UNet.

works with SD, SDXL
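
To make the idea of an image projection layer concrete, here is a minimal sketch in plain PyTorch. It is not the code from this PR: the class name `ImageProjectionSketch`, the dimensions (1024-wide image embeddings, 768-wide cross-attention) and the choice of 4 image tokens are assumptions picked for illustration. The point is only that a pooled image embedding is mapped to a few extra tokens that cross-attention can attend to alongside the text tokens.

```python
import torch
from torch import nn


class ImageProjectionSketch(nn.Module):
    """Illustrative only: maps one image embedding to a few context tokens."""

    def __init__(self, image_embed_dim=1024, cross_attention_dim=768, num_tokens=4):
        super().__init__()
        self.num_tokens = num_tokens
        self.cross_attention_dim = cross_attention_dim
        # One linear layer produces all tokens at once; they are split by reshape.
        self.proj = nn.Linear(image_embed_dim, num_tokens * cross_attention_dim)
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, image_embeds):
        batch_size = image_embeds.shape[0]
        tokens = self.proj(image_embeds)
        tokens = tokens.reshape(batch_size, self.num_tokens, self.cross_attention_dim)
        return self.norm(tokens)


projection = ImageProjectionSketch()
image_embeds = torch.randn(2, 1024)  # stand-in for pooled image-encoder outputs
image_tokens = projection(image_embeds)
print(image_tokens.shape)  # torch.Size([2, 4, 768])
```

The resulting tokens have the same width as the text-encoder hidden states, which is what lets them serve as additional cross-attention context.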

It works with text-to-image, image-to-image, and inpainting; see the text-to-image example below. You can find examples for img2img here and inpainting here.

from diffusers import StableDiffusionPipeline
import torch
from diffusers.utils import load_image

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality', 
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_out.png")

yiyi_test_3_out

It works with LCM-LoRA out of the box

from diffusers import DiffusionPipeline, LCMScheduler
import torch
from diffusers.utils import load_image

model_id =  "sd-dreambooth-library/herge-style"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "best quality, high quality"
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = pipe(
    prompt=prompt,
    ip_adapter_image=image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]

yiyi_test_2_out

Works with ControlNet

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
from diffusers.utils import load_image

controlnet_model_path = "lllyasviel/control_v11f1p_sd15_depth"
controlnet = ControlNetModel.from_pretrained(controlnet_model_path, torch_dtype=torch.float16)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality', 
    image=depth_map,
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_2_out.png")
| ip_image | condition | output |
| --- | --- | --- |
| statue | depth | yiyi_test_2_out |

Works with AnimateDiff

# animate diff + ip adapter
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif, load_image

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "Lykon/DreamShaper"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)

# scheduler
scheduler = DDIMScheduler(
    clip_sample=False,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="linear",
    timestep_spacing="trailing",
    steps_offset=1
)
pipe.scheduler = scheduler

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

# load ip_adapter
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-tilt-up", adapter_name="tilt-up")
pipe.load_lora_weights("guoyww/animatediff-motion-lora-pan-left", adapter_name="pan-left")

seed = 42
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = [image] * 3
prompts = ["best quality, high quality"] * 3
negative_prompt = "bad quality, worst quality"
adapter_weights = [[0.75, 0.0, 0.0], [0.0, 0.0, 0.75], [0.0, 0.75, 0.75]]

output_frames = []
for prompt, image, adapter_weight in zip(prompts, images, adapter_weights):
    pipe.set_adapters(["zoom-out", "tilt-up", "pan-left"], adapter_weights=adapter_weight)
    output = pipe(
        prompt=prompt,
        num_frames=16,
        guidance_scale=7.5,
        num_inference_steps=30,
        ip_adapter_image=image,
        generator=torch.Generator("cpu").manual_seed(seed),
    )
    frames = output.frames[0]
    output_frames.extend(frames)

export_to_gif(output_frames, "test_out_animation.gif")

yiyi_test_2_out_animation

@yiyixuxu
Collaborator Author

@marianbastiUNRN

I think it is fine for now.
We would very much like to support face models. It won't take much work, and we will ask the community for some help soon :)

let me know if you're interested in working on this

Contributor

@patrickvonplaten patrickvonplaten left a comment

Great job! Let's merge 🚀

@yiyixuxu yiyixuxu merged commit ba352ae into main Nov 21, 2023
22 checks passed
@yiyixuxu yiyixuxu deleted the ip-adapter branch November 21, 2023 17:34
affromero pushed a commit to affromero/diffusers that referenced this pull request Nov 24, 2023
* add ip-adapter


---------

Co-authored-by: okotaku <[email protected]>
Co-authored-by: sayakpaul <[email protected]>
Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
@alexblattner

I've been working on this for 2 weeks and now it's built in.... Thanks haha

@sayakpaul
Member

Please open a new issue for this. It's not ideal for users to comment on PRs after they have been merged.

@TonyLianLong
Contributor

This PR seems to break positional arguments in super().__init__ calls, because it adds an image_encoder parameter before requires_safety_checker. An example of what breaks after the change, along with the commit that fixes it, is in #5993.

We might want to clarify this in the release notes for the next release.
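
The failure mode can be sketched without diffusers at all. The classes below are hypothetical stand-ins with shortened signatures, not the real pipeline classes; they only assume, as described above, that a new `image_encoder` parameter was inserted before `requires_safety_checker`.

```python
class OldBasePipeline:
    # Signature before the change (shortened, hypothetical).
    def __init__(self, unet, safety_checker, requires_safety_checker=True):
        self.requires_safety_checker = requires_safety_checker


class NewBasePipeline:
    # Signature after the change: image_encoder now sits in the third slot.
    def __init__(self, unet, safety_checker, image_encoder=None, requires_safety_checker=True):
        self.image_encoder = image_encoder
        self.requires_safety_checker = requires_safety_checker


class PositionalSubclass(NewBasePipeline):
    def __init__(self, unet, safety_checker):
        # Written against the old signature: the third positional value was
        # meant for requires_safety_checker but now lands in image_encoder.
        super().__init__(unet, safety_checker, False)


class KeywordSubclass(NewBasePipeline):
    def __init__(self, unet, safety_checker):
        # Keyword arguments are unaffected by the inserted parameter.
        super().__init__(unet, safety_checker, requires_safety_checker=False)


broken = PositionalSubclass("unet", "checker")
fixed = KeywordSubclass("unet", "checker")
print(broken.image_encoder, broken.requires_safety_checker)  # False True
print(fixed.image_encoder, fixed.requires_safety_checker)    # None False
```

Passing optional arguments by keyword in subclasses avoids this class of breakage whenever a base signature grows.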

@yiyixuxu
Collaborator Author

Hi @TonyLianLong,
I looked at #5993. I think it's because it used StableDiffusionPipeline as the base class instead of DiffusionPipeline.

@MackorLab

Hello, I'm just starting to program in Python and I still don't understand exactly how to do this correctly.
Could you please tell me whether I can save the finished file in mp4 format?

@okaris

okaris commented Dec 7, 2023

> @marianbastiUNRN
>
> I think it is fine for now. We would very much like to support face models. It won't take much work and we will ask some help from the community soon :)
>
> let me know if you're interested in working on this

@yiyixuxu I'm interested in implementing this. Could you guide me through the necessary steps, please?

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
@xhinker
Contributor

xhinker commented Dec 28, 2023

oh img2img is really cool

from transformers import CLIPVisionModelWithProjection, CLIPImageProcessor
from diffusers import StableDiffusionImg2ImgPipeline
import torch
from diffusers.utils import load_image
from PIL import Image

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter", 
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
).to("cuda")

pipeline = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    image_encoder = image_encoder, torch_dtype=torch.float16, safety_checker=None)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/vermeer.jpg")
ip_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/river.png")


pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality', 
    image = image,
    ip_adapter_image=ip_image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_images_per_prompt=1, 
    num_inference_steps=50,
    generator=generator,
    strength=0.6,
).images
images[0].save("yiyi_test_3_out.png")

| ip_image | image | output |
| --- | --- | --- |
| river | vermeer | yiyi_test_3_out |

Hi, @yiyixuxu

Could you also provide an img2img IP-Adapter sample for SDXL? I always get the error below when using SDXL. Thanks!

RuntimeError: mat1 and mat2 shapes cannot be multiplied (514x1664 and 1280x1280)

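Never mind, I figured it out. I need to pass the SD models' image encoder (the one under models/image_encoder) explicitly, like this:

from transformers import CLIPVisionModelWithProjection
from diffusers import StableDiffusionXLImg2ImgPipeline
import torch

# Load the image encoder explicitly from the "models/image_encoder" subfolder.
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "<IP-Adapter Model Path>",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
).to("cuda")

# Hand that encoder to the SDXL img2img pipeline.
pipeline = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "<pretrain model path>",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.to("cuda")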
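
For readers hitting the same RuntimeError, the shapes in the message can be reproduced in isolation. The sketch below is an assumption about the cause, not a trace through diffusers: it supposes that image-encoder features of width 1664 reached a projection layer built for width 1280, which is what the numbers in the error suggest. The layer and tensors are stand-ins.

```python
import torch
from torch import nn

# Stand-in for a projection layer that expects 1280-wide encoder features.
proj = nn.Linear(1280, 1280)

# Stand-in for features from an encoder whose width is 1664 instead.
wrong_features = torch.randn(514, 1664)

try:
    proj(wrong_features)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print(err)  # reports a matrix shape mismatch

# Features of the expected width pass through without error.
right_features = torch.randn(514, 1280)
projected = proj(right_features)
print(projected.shape)  # torch.Size([514, 1280])
```

Under that assumption, supplying an image encoder whose output width matches the loaded adapter weights, as in the fix above, is what removes the error.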

@thibaudart

Is it possible to load multiple images as references for IP-Adapter?

@patrickvonplaten
Contributor

Hey @thibaudart,

Hope you're doing well - we've just recently opened the Discussion tab on the Diffusers' repo: https://github.com/huggingface/diffusers/discussions
Would you mind posting your question there?

@thibaudart

> Hey @thibaudart,
>
> Hope you're doing well - we've just recently opened the Discussion tab on the Diffusers' repo: https://github.com/huggingface/diffusers/discussions Would you mind posting your question there?

of course

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
@xingyouxin

For ControlNet and IP-Adapter, I have a question about batched generation, e.g., batch_size = 4. I tried passing lists of images, prompts, generators, etc. into the pipeline, but it failed with an error: ValueError: ip_adapter_image must have same length as the number of IP Adapters. Got 4 images and 1 IP Adapters.

So maybe batched generation is not supported in this project, but I am not sure. Could anyone help me? Thanks.

@asomoza
Member

asomoza commented May 13, 2024

It would be better to open a new issue for this. You will also need to provide us with a minimal reproducible code example.

Without it, I can only say that the error message says it all: you are passing 4 images to the IP-Adapters, but you have loaded only one IP-Adapter.

The error probably lies in how you are passing the images for the batch.
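
As a rough illustration of that reading of the error, here is a hypothetical re-creation of such a length check. It is not the diffusers source; the function name and the wrapping of a single image into a list are assumptions, and only the message text is taken from the report above. It shows why a flat list of four images is counted as four adapter inputs rather than as a batch of four.

```python
def check_ip_adapter_images(ip_adapter_image, num_ip_adapters):
    """Hypothetical validator: one top-level entry is expected per loaded adapter."""
    if not isinstance(ip_adapter_image, list):
        ip_adapter_image = [ip_adapter_image]
    if len(ip_adapter_image) != num_ip_adapters:
        raise ValueError(
            "ip_adapter_image must have same length as the number of IP Adapters. "
            f"Got {len(ip_adapter_image)} images and {num_ip_adapters} IP Adapters."
        )
    return ip_adapter_image


# Strings stand in for PIL images here.
batch_of_images = ["img_0", "img_1", "img_2", "img_3"]

try:
    check_ip_adapter_images(batch_of_images, num_ip_adapters=1)
    message = ""
except ValueError as err:
    message = str(err)

print(message)

# A single image with a single adapter passes the check.
accepted = check_ip_adapter_images("img_0", num_ip_adapters=1)
```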

@xingyouxin

> it would be better if you open a new issue with this, also you will need to provide us with a minimal reproducible code.
>
> Without it, I can say that the error message says it all, you are passing 4 images to the ip adapters but you're only loading one ip adapter.
>
> Probably the error lies in how are you passing the images for the batch.

Hello, Mr. asomoza. Thanks for your reply. With your help, I ran some tests, but they still failed, so I opened an issue with the details.

@xingyouxin

xingyouxin commented May 14, 2024

> it would be better if you open a new issue with this, also you will need to provide us with a minimal reproducible code.
>
> Without it, I can say that the error message says it all, you are passing 4 images to the ip adapters but you're only loading one ip adapter.
>
> Probably the error lies in how are you passing the images for the batch.

Dear asomoza, it seems that I have figured out my problem. I found that the IP-Adapter embedding step does not handle the images in a batch separately; it processes all of them together as one input. So the better way is to embed the adapter images one by one and then concatenate the results with torch.cat. The concatenated embeddings can then be passed to the pipeline to generate a batch in which each image uses its own embedding. The details can be found in issue #7933. Thank you very much.
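
The "embed one by one, then concatenate" step can be sketched with plain tensors. Everything below is a stand-in: `encoder` is a small linear layer playing the role of the image encoder, and the 16- and 8-wide shapes are arbitrary. How the concatenated tensor is then handed to a pipeline is covered in #7933 and is not shown here.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for an image encoder: 16-wide input, 8-wide embedding.
encoder = torch.nn.Linear(16, 8)


def embed_one(image_features):
    """Embed a single image and keep a leading batch dimension of 1."""
    with torch.no_grad():
        return encoder(image_features.unsqueeze(0))  # shape (1, 8)


# Four stand-in "images", one per batch entry.
images = [torch.randn(16) for _ in range(4)]

# Embed each image separately, then stack along the batch dimension.
per_image_embeds = [embed_one(image) for image in images]
batch_embeds = torch.cat(per_image_embeds, dim=0)

print(batch_embeds.shape)  # torch.Size([4, 8])
```

Row `i` of `batch_embeds` is exactly the embedding of image `i`, so each batch entry keeps its own conditioning.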
