
Is fine-tuning RADIO possible? #100

Open
wuyouliaoxi opened this issue Nov 14, 2024 · 6 comments

Comments

@wuyouliaoxi

Hello, I got very good results on segmentation tasks with the frozen RADIO backbone. For comparison, I now want to fully fine-tune the backbone or use a ViT-Adapter.

However, I find this hard to implement: specifically, I can't find patch_embed and pos_embed, only the patch generator (with CPE?). So I would kindly like to ask: is it possible to fine-tune RADIO? If so, how can I do it?

@mranzinger
Collaborator

Yeah, you can definitely fine-tune RADIO. Can you explain why you're looking for patch_embed and pos_embed? We replace those modules with patch_generator, as you've noticed: https://github.com/NVlabs/RADIO/blob/main/radio/enable_cpe_support.py#L147-L150

But you should be able to unfreeze the model in the same way. I'm guessing the challenge is working with ViT-Adapter, which hooks itself into the model?
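
For what it's worth, full fine-tuning is just the usual PyTorch recipe once the model is loaded. A rough sketch (the torch.hub call and version string follow the README; adjust them to the version you're actually using):

    import torch

    # Load RADIO via torch.hub (version string is only an example; see the README for available versions)
    model = torch.hub.load('NVlabs/RADIO', 'radio_model', version='radio_v2.5-b',
                           progress=True, skip_validation=True)

    # Unfreeze every parameter so the whole backbone is trained
    for p in model.parameters():
        p.requires_grad = True
    model.train()

    # From here it's a normal fine-tuning loop on top of the spatial features
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)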

@wuyouliaoxi
Author

Thank you very much for your quick reply!
Exactly, I am trying to use ViT-Adapter. From their implementation:

Patch Embedding forward

    x, H, W = self.patch_embed(x)
    bs, n, dim = x.shape
    cls = self.cls_token.expand(bs, -1, -1)  # stole cls_tokens impl from Phil Wang, thanks

    if self.pos_embed is not None:
        pos_embed = self._get_pos_embed(self.pos_embed, H, W)
        x = x + pos_embed
    x = self.pos_drop(x)

patch_embed and pos_embed need to be taken from the pretrained ViT backbone, but they are set to None in RADIO: https://github.com/NVlabs/RADIO/blob/main/radio/enable_cpe_support.py#L105-L108.

Moreover, when I look at the patch generator, I can't explicitly find the patch embed (i.e. the large-kernel convolution layer of a standard ViT), only the vit_patch_generator loaded from the checkpoint:
https://github.com/NVlabs/RADIO/blob/main/radio/vit_patch_generator.py#L279

Am I misunderstanding something important? Thanks again for your help!

@mranzinger
Collaborator

I don't think ViT-Adapter is doing anything special prior to the transformer blocks, right? If so, then you can replace all of the above referenced lines with simply:

H, W = tuple(d // self.patch_size for d in x.shape[-2:])
x = self.patch_generator(x)

This is because we performed this replacement in our training harness, and so the weights that were actually trained belong to the patch_generator. Can you provide a little bit more code/detail for what you're trying to do so that I can help?
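
Concretely, the snippet you quoted might look something like this (just a sketch; it assumes self.patch_size and self.patch_generator are taken from the RADIO backbone, and that the cls-token handling stays inside the patch generator, as it does in RADIO):

    def forward_features(self, x):
        # Token grid size from the input resolution and the RADIO patch size
        H, W = tuple(d // self.patch_size for d in x.shape[-2:])

        # patch_generator replaces patch_embed, pos_embed (CPE), and pos_drop,
        # and in RADIO it also prepends the cls/summary tokens itself
        x = self.patch_generator(x)
        bs, n, dim = x.shape

        # ... continue with the adapter interaction / transformer blocks as before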

@wuyouliaoxi
Author

Ah, you are right. I was too obsessed with finding the position encoder...

But I am not sure about one thing: I think RADIOv2.5 uses CPE instead of absolute PE. That would be why I don't need to interpolate the pos_embed, as in normal fine-tuning, when the input image size differs from the pretraining image size.

@mranzinger
Collaborator

I think RADIOv2.5 uses CPE instead of absolute PE.

Yes, this is exactly correct. The patch_generator is dealing with all of that for you!
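
For example, something along these lines should run at several resolutions without any manual pos_embed interpolation (rough sketch; assumes the model is loaded via torch.hub and that the resolution is a multiple of the patch size):

    import torch

    model = torch.hub.load('NVlabs/RADIO', 'radio_model', version='radio_v2.5-b',
                           progress=True, skip_validation=True).eval()

    # The CPE inside the patch_generator adapts to whatever resolution you feed in
    with torch.no_grad():
        for size in (256, 512, 1024):
            x = torch.rand(1, 3, size, size)  # RADIO expects inputs in [0, 1]
            summary, spatial_features = model(x)
            print(size, spatial_features.shape)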

@wuyouliaoxi
Author

I can't thank you enough!
