
Is fine-tuning RADIO possible? #100

Open
wuyouliaoxi opened this issue Nov 14, 2024 · 6 comments

Comments

@wuyouliaoxi

Hello, I got very good results on segmentation tasks with the frozen RADIO backbone. For comparison, I now want to fully fine-tune the backbone or use a ViT-Adapter.

However, I find this hard to implement: specifically, I can't find patch_embed and pos_embed, only the patch generator (with CPE?). So I would kindly like to ask: is it possible to fine-tune RADIO? If so, how can I do it?

@mranzinger
Collaborator

Yeah, you can definitely fine-tune RADIO. Can you explain why you're looking for patch_embed and pos_embed? We replace those modules with patch_generator, as you've noticed: https://github.com/NVlabs/RADIO/blob/main/radio/enable_cpe_support.py#L147-L150

But you should be able to unfreeze the model in the same way. I'm guessing the challenge is working with ViT-Adapter, which hooks itself into the model?
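
For what it's worth, full fine-tuning is just the usual PyTorch recipe once the model is loaded. A rough sketch (the torch.hub call and version string follow the README; adjust them to the version you're actually using):

    import torch

    # Load RADIO via torch.hub (version string is only an example; see the README for available versions)
    model = torch.hub.load('NVlabs/RADIO', 'radio_model', version='radio_v2.5-b',
                           progress=True, skip_validation=True)

    # Unfreeze every parameter so the whole backbone is trained
    for p in model.parameters():
        p.requires_grad = True
    model.train()

    # From here it's a normal fine-tuning loop on top of the spatial features
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)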

@wuyouliaoxi
Author

Thank you very much for your quick reply!
Exactly, I am trying to use ViT-Adapter. From their implementation:

Patch Embedding forward

    x, H, W = self.patch_embed(x)
    bs, n, dim = x.shape
    cls = self.cls_token.expand(bs, -1, -1)  # stole cls_tokens impl from Phil Wang, thanks

    if self.pos_embed is not None:
        pos_embed = self._get_pos_embed(self.pos_embed, H, W)
        x = x + pos_embed
    x = self.pos_drop(x)

patch_embed and pos_embed need to be taken from the pretrained ViT backbone, but they are set to None in RADIO: https://github.com/NVlabs/RADIO/blob/main/radio/enable_cpe_support.py#L105-L108.

Moreover, when I look at the patch generator, I can't explicitly find the patch embed (i.e. the large-kernel convolution layer of a standard ViT), only the vit_patch_generator loaded from the checkpoint:
https://github.com/NVlabs/RADIO/blob/main/radio/vit_patch_generator.py#L279

Am I misunderstanding something important? Thanks again for your help!

@mranzinger
Collaborator

I don't think ViT-Adapter is doing anything special prior to the transformer blocks, right? If so, then you can replace all of the above referenced lines with simply:

H, W = tuple(d // self.patch_size for d in x.shape[-2:])
x = self.patch_generator(x)

This is because we performed this replacement in our training harness, and so the weights that were actually trained belong to the patch_generator. Can you provide a little bit more code/detail for what you're trying to do so that I can help?
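
Concretely, the snippet you quoted might look something like this (just a sketch; it assumes self.patch_size and self.patch_generator are taken from the RADIO backbone, and that the cls-token handling stays inside the patch generator, as it does in RADIO):

    def forward_features(self, x):
        # Token grid size from the input resolution and the RADIO patch size
        H, W = tuple(d // self.patch_size for d in x.shape[-2:])

        # patch_generator replaces patch_embed, pos_embed (CPE), and pos_drop,
        # and in RADIO it also prepends the cls/summary tokens itself
        x = self.patch_generator(x)
        bs, n, dim = x.shape

        # ... continue with the adapter interaction / transformer blocks as before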

@wuyouliaoxi
Author

Ah, you are right. I was too obsessed with finding the position encoder...

But I am not sure about one thing: I think RADIOv2.5 uses CPE instead of absolute PE. That would be why I don't need to interpolate the pos_embed, as in normal fine-tuning, when the input image size differs from the pretraining image size.

@mranzinger
Collaborator

I think RADIOv2.5 uses CPE instead of absolute PE.

Yes, this is exactly correct. The patch_generator is dealing with all of that for you!
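
For example, something along these lines should run at several resolutions without any manual pos_embed interpolation (rough sketch; assumes the model is loaded via torch.hub and that the resolution is a multiple of the patch size):

    import torch

    model = torch.hub.load('NVlabs/RADIO', 'radio_model', version='radio_v2.5-b',
                           progress=True, skip_validation=True).eval()

    # The CPE inside the patch_generator adapts to whatever resolution you feed in
    with torch.no_grad():
        for size in (256, 512, 1024):
            x = torch.rand(1, 3, size, size)  # RADIO expects inputs in [0, 1]
            summary, spatial_features = model(x)
            print(size, spatial_features.shape)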

@wuyouliaoxi
Author

I can't thank you enough!
