ImageBind with SAM

This is an experimental demo that combines ImageBind and SAM to generate masks from different modalities.

The basic idea follows IEA: Image Editing Anything and CLIP-SAM, which generate the referring mask in the following steps:

  • Step 1: Generate automatic masks with SamAutomaticMaskGenerator
  • Step 2: Crop the bounding-box region of each mask from the image
  • Step 3: Compute the similarity between the cropped images and the input from another modality
  • Step 4: Merge the mask regions with the highest similarity
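
Steps 2 to 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the demo's code: masks are nested lists of booleans, the embeddings are short hand-made vectors standing in for ImageBind outputs, cosine similarity with a fixed threshold is an assumed scoring rule, and the helper names (`mask_to_box`, `cosine_similarity`, `merge_best_masks`) are hypothetical. Step 1 (SamAutomaticMaskGenerator) and the embedding model are assumed to have run already.

```python
import math

def mask_to_box(mask):
    """Step 2: tight box (x0, y0, x1, y1) around the True pixels of a mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        raise ValueError("mask is empty")
    return min(cols), min(rows), max(cols) + 1, max(rows) + 1

def cosine_similarity(a, b):
    """Step 3: cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def merge_best_masks(masks, crop_embeddings, query_embedding, threshold):
    """Step 4: OR together every mask whose crop scores at least `threshold`.

    Hypothetical helper; the scoring rule is an assumption for illustration.
    """
    scores = [cosine_similarity(e, query_embedding) for e in crop_embeddings]
    height, width = len(masks[0]), len(masks[0][0])
    merged = [[False] * width for _ in range(height)]
    for mask, score in zip(masks, scores):
        if score >= threshold:
            for y in range(height):
                for x in range(width):
                    merged[y][x] = merged[y][x] or mask[y][x]
    return merged, scores

# Two toy 3x4 masks and made-up 2-D embeddings for their crops.
masks = [
    [[True, True, False, False], [True, True, False, False], [False] * 4],
    [[False] * 4, [False, False, True, True], [False, False, True, True]],
]
crop_embeddings = [[1.0, 0.0], [0.0, 1.0]]
query_embedding = [0.9, 0.1]  # stands in for a text or audio embedding

boxes = [mask_to_box(m) for m in masks]
merged, scores = merge_best_masks(masks, crop_embeddings, query_embedding, 0.5)
print(boxes, [round(s, 3) for s in scores])
```

Only the first crop scores above the threshold here, so the merged result equals the first mask. In a real pipeline the masks and embeddings would be arrays or tensors rather than nested lists.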

Installation

  • Download the pretrained checkpoints
cd playground/ImageBind_SAM

mkdir .checkpoints
cd .checkpoints

# download ImageBind and SAM weights
wget https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
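
Before running the demos, it can help to confirm that both downloads completed. The snippet below is a small optional check, not part of the repository: the function name `missing_checkpoints` is hypothetical, and it assumes the files keep the names from the URLs above and live in `.checkpoints`.

```python
from pathlib import Path

# File names taken from the download URLs above.
EXPECTED = ("imagebind_huge.pth", "sam_vit_h_4b8939.pth")

def missing_checkpoints(checkpoint_dir=".checkpoints"):
    """Return the expected checkpoint files that are absent or empty.

    Hypothetical helper for illustration; it does not validate file contents.
    """
    root = Path(checkpoint_dir)
    return [
        name
        for name in EXPECTED
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]

if __name__ == "__main__":
    missing = missing_checkpoints()
    print("all checkpoints present" if not missing else f"missing: {missing}")
```

Run it from playground/ImageBind_SAM so that the relative `.checkpoints` path resolves.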

Run the demo

python demo.py

This demo implements text-guided and audio-guided segmentation. The generated masks are saved as text_sam_merged_mask.jpg and audio_sam_merged_mask.jpg:

Input Model | Modality | Generated Mask
(image) | car audio | audio_sam_merged_mask.jpg
(image) | "A car" | text_sam_merged_mask.jpg

The choice of threshold can strongly affect the final results.
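
To see why the threshold matters, consider the following sketch. It assumes that per-crop similarity values are normalized with a softmax and that a mask is kept when its score exceeds the threshold; that selection rule, the function names, and the numbers are all assumptions made for illustration, not values taken from the demo.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_masks(logits, threshold):
    """Indices of crops whose softmax score exceeds `threshold` (assumed rule)."""
    return [i for i, p in enumerate(softmax(logits)) if p > threshold]

# Made-up similarity logits for four cropped regions.
logits = [2.0, 1.5, 0.2, -1.0]

for threshold in (0.05, 0.3, 0.6):
    print(threshold, select_masks(logits, threshold))
```

With these made-up values, the three thresholds keep three, two, and zero masks respectively, so a threshold that is too high can leave nothing to merge while one that is too low pulls in weakly related regions.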

Run image referring segmentation demo

# download the referring image
cd .assets
wget https://github.com/IDEA-Research/detrex-storage/releases/download/grounded-sam-storage/referring_car_image.jpg
cd ..

python image_referring_seg_demo.py

Run audio referring segmentation demo

python audio_referring_seg_demo.py

Run text referring segmentation demo

python text_referring_seg_demo.py