DepictQA: Depicted Image Quality Assessment with Vision Language Models


🌏 Project Page • 📀 Datasets ( huggingface / modelscope )

Official PyTorch implementation of the papers:

  • DepictQA-Wild (DepictQA-v2): paper, project page.

    Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue, "Descriptive Image Quality Assessment in the Wild," arXiv preprint arXiv:2405.18842, 2024.

  • DepictQA-v1: paper, project page.

    Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, Chao Dong, "Depicting beyond scores: Advancing image quality assessment through multi-modal language models," ECCV, 2024.

Update

📆 [2024.07] DepictQA datasets were released on huggingface / modelscope.

📆 [2024.07] DepictQA-v1 was accepted to ECCV 2024.

📆 [2024.05] We released DepictQA-Wild (DepictQA-v2): a multi-functional in-the-wild descriptive image quality assessment model.

📆 [2023.12] We released DepictQA-v1, a multi-modal image quality assessment model based on vision language models.

Installation

  • Create environment.

    # clone this repo
    git clone https://github.com/XPixelGroup/DepictQA.git
    cd DepictQA
    
    # create environment
    conda create -n depictqa python=3.10
    conda activate depictqa
    pip install -r requirements.txt
    
  • Download pretrained models.

    • CLIP-ViT-L-14. Required.
    • Vicuna-v1.5-7B. Required.
    • All-MiniLM-L6-v2. Required only for confidence estimation of detailed reasoning responses.
    • Our pretrained delta checkpoint (see Models). Optional for training. Required for demo and inference.
  • Ensure that all downloaded models are placed in the designated directories as follows.

    |-- DepictQA
    |-- ModelZoo
        |-- CLIP
            |-- clip
                |-- ViT-L-14.pt
        |-- LLM
            |-- vicuna
                |-- vicuna-7b-v1.5
        |-- SentenceTransformers
            |-- all-MiniLM-L6-v2
    

    If the models are stored in different directories, revise config.model.vision_encoder_path, config.model.llm_path, and config.model.sentence_model in config.yaml (under the experiments directory) to point to the new paths.

  • Move our pretrained delta checkpoint to a specific experiment directory (e.g., DQ495K, DQ495K_QPath) as follows.

    |-- DepictQA
        |-- experiments
            |-- a_specific_experiment_directory
                |-- ckpt
                    |-- ckpt.pt
    

    If the delta checkpoint is stored in another directory, revise config.model.delta_path in config.yaml (under the experiments directory) to point to the new path.
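
The directory layout above can be checked before launching anything. The following is a minimal sketch, not part of the repository: the helper name missing_paths and its arguments are hypothetical, and it assumes the default layout shown in the trees above (no custom paths set in config.yaml).

```python
from pathlib import Path

# Relative paths copied from the directory trees above.
# The helper below is a hypothetical convenience, not a repository function.
EXPECTED_MODEL_PATHS = [
    "ModelZoo/CLIP/clip/ViT-L-14.pt",
    "ModelZoo/LLM/vicuna/vicuna-7b-v1.5",
    "ModelZoo/SentenceTransformers/all-MiniLM-L6-v2",
]


def missing_paths(workspace, experiment=None):
    """Return the expected paths that do not exist under `workspace`.

    `workspace` is the directory that contains both DepictQA and ModelZoo.
    `experiment` is an optional experiment directory name (e.g. "DQ495K");
    when given, the delta checkpoint location is checked as well.
    """
    workspace = Path(workspace)
    expected = list(EXPECTED_MODEL_PATHS)
    if experiment is not None:
        expected.append(f"DepictQA/experiments/{experiment}/ckpt/ckpt.pt")
    return [rel for rel in expected if not (workspace / rel).exists()]
```

For example, missing_paths("..", "DQ495K") run from inside the DepictQA directory lists whatever is still missing. All-MiniLM-L6-v2 is needed only for confidence estimation, so a missing entry for it may be acceptable.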

Models

| Training Data | Tune | Hugging Face | Description |
| --- | --- | --- | --- |
| DQ-495K + KonIQ + SPAQ | Abstractor, LoRA | download | Uses a vision abstractor to reduce the number of tokens. Trained on the DQ-495K, KonIQ, and SPAQ datasets. Handles images with resolution larger than 1000 and can compare images with different contents. |
| DQ-495K + Q-Instruct | Projector, LoRA | download | Trained on the DQ-495K and Q-Instruct (see paper) datasets. Handles multiple-choice, yes-or-no, what, and how questions, but performance degrades on assessment and comparison tasks. |
| DQ-495K + Q-Pathway | Projector, LoRA | download | Trained on the DQ-495K and Q-Pathway (see paper) datasets. Performs well on real images, but performance degrades on comparison tasks. |
| DQ-495K | Projector, LoRA | download | Trained on the DQ-495K dataset. Used in our paper. |

Demos

Online Demo

An online demo deployed on huggingface spaces is coming soon.

Gradio Demo

We provide a gradio demo for local testing.

  • Change to a specific experiment directory: cd experiments/a_specific_experiment_directory

  • Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded and (3) their paths are set in config.yaml.

  • Launch controller: sh launch_controller.sh

  • Launch gradio server: sh launch_gradio.sh

  • Launch DepictQA worker: sh launch_worker.sh id_of_one_gpu

You can revise the server config in serve.yaml. The URL of the deployed demo will be http://{serve.gradio.host}:{serve.gradio.port}. The default URL is http://0.0.0.0:12345 if you do not revise serve.yaml.

Note that multiple workers can be launched simultaneously. For each worker, serve.worker.host, serve.worker.port, serve.worker.worker_url, and serve.worker.model_name should be unique.
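
The URL rule and the uniqueness requirement above can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, the nested dictionary mirrors the dotted names (serve.gradio.host, serve.worker.port, ...) rather than the actual contents of serve.yaml, and the worker settings are passed in as plain dictionaries instead of being read from files.

```python
from collections import Counter

# Fields that the note above says should be unique per worker.
UNIQUE_WORKER_FIELDS = ("host", "port", "worker_url", "model_name")


def demo_url(serve):
    """Build the demo URL from the serve.gradio.host and serve.gradio.port values."""
    gradio = serve["gradio"]
    return f"http://{gradio['host']}:{gradio['port']}"


def duplicated_worker_fields(workers):
    """Map each field name to the values shared by more than one worker.

    `workers` is a list of dicts, one per worker, holding the four
    serve.worker.* settings. An empty result means no conflicts.
    """
    duplicates = {}
    for field in UNIQUE_WORKER_FIELDS:
        counts = Counter(worker[field] for worker in workers)
        repeated = sorted(value for value, n in counts.items() if n > 1)
        if repeated:
            duplicates[field] = repeated
    return duplicates
```

Running duplicated_worker_fields on the settings of all planned workers before launching them shows which values still collide.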

Datasets

  • Source code for constructing the DQ-495K dataset (used in DepictQA-v2) is provided here.

  • Download the MBAPPS (used in DepictQA-v1) and DQ-495K (used in DepictQA-v2) datasets from huggingface / modelscope. Place the dataset directory next to this repository as follows.

    |-- DataDepictQA
    |-- DepictQA
    

    If the dataset is stored in another directory, revise config.data.root_dir in config.yaml (under the experiments directory) to point to the new path.

Training

  • Change to a specific experiment directory: cd experiments/a_specific_experiment_directory

  • Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14 and Vicuna-v1.5-7B are downloaded and (3) their paths are set in config.yaml.

  • Run training: sh train.sh ids_of_gpus.

Inference

Inference on Our Benchmark

  • Change to a specific experiment directory: cd experiments/a_specific_experiment_directory

  • Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded and (3) their paths are set in config.yaml.

  • Run a specific infer shell (e.g., infer_A_brief.sh): sh infer_A_brief.sh id_of_one_gpu.

Inference on Custom Dataset

  • Construct a *.json file for your dataset as follows.

    [
        {
            "id": unique id of each sample, required, 
            "image_ref": reference image, null if not applicable, 
            "image_A": image A, null if not applicable, 
            "image_B": image B, null if not applicable, 
            "query": input question, required
        }, 
        ...
    ]
    
  • Change to your experiment directory: cd your_experiment_directory

  • Check Installation to make sure (1) the environment is installed, (2) CLIP-ViT-L-14, Vicuna-v1.5-7B, and the pretrained delta checkpoint are downloaded and (3) their paths are set in config.yaml.

  • Construct your infer shell as follows.

    #!/bin/bash
    # Usage: sh your_infer_shell.sh id_of_one_gpu
    # Replace the placeholder values below with your own.
    src_dir=directory_of_src
    export PYTHONPATH=$src_dir:$PYTHONPATH
    export CUDA_VISIBLE_DEVICES=$1  # GPU id passed as the first argument
    
    python $src_dir/infer.py \
        --meta_path json_path_1_of_your_dataset \
                    json_path_2_of_your_dataset \
        --dataset_name your_dataset_name_1 \
                       your_dataset_name_2 \
        --task_name task_name \
        --batch_size batch_size
    

    --task_name can be set as follows.

    | Task Name | Description |
    | --- | --- |
    | quality_compare | AB comparison in full-reference |
    | quality_compare_noref | AB comparison in non-reference |
    | quality_single_A | Image A assessment in full-reference |
    | quality_single_A_noref | Image A assessment in non-reference |
    | quality_single_B | Image B assessment in full-reference |
    | quality_single_B_noref | Image B assessment in non-reference |
  • Run your infer shell: sh your_infer_shell.sh id_of_one_gpu.
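
The *.json file described above can be generated with a short script. The sketch below is illustrative only: the helper names are hypothetical, image values are assumed to be file names or paths (the format is not stated here), and the REQUIRED_IMAGES mapping is inferred from the task descriptions above, not taken from the inference code.

```python
import json

# Assumption: the images each task needs, inferred from the task descriptions
# above (full-reference tasks also use the reference image). Not taken from infer.py.
REQUIRED_IMAGES = {
    "quality_compare": ("image_ref", "image_A", "image_B"),
    "quality_compare_noref": ("image_A", "image_B"),
    "quality_single_A": ("image_ref", "image_A"),
    "quality_single_A_noref": ("image_A",),
    "quality_single_B": ("image_ref", "image_B"),
    "quality_single_B_noref": ("image_B",),
}


def make_sample(sample_id, query, image_ref=None, image_A=None, image_B=None):
    """Build one entry; images that do not apply stay None (written as JSON null)."""
    return {
        "id": sample_id,
        "image_ref": image_ref,
        "image_A": image_A,
        "image_B": image_B,
        "query": query,
    }


def samples_missing_images(samples, task_name):
    """Return ids of samples lacking an image that `task_name` is assumed to need."""
    required = REQUIRED_IMAGES[task_name]
    return [s["id"] for s in samples if any(s[key] is None for key in required)]


def write_meta(samples, path):
    """Write the samples as a JSON list in the layout described above."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(samples, f, indent=4)
```

Checking samples_missing_images(samples, task_name) before calling write_meta catches entries that would not match the chosen --task_name, under the stated assumption about which images each task uses.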

Evaluation

  • Change to the evaluation directory: cd src/eval.

  • The evaluation scripts are explained as follows.

    | Script | Description |
    | --- | --- |
    | cal_acc_single_distortion.py | Accuracy of single-distortion identification. |
    | cal_acc_multi_distortion.py | Accuracy of multi-distortion identification. |
    | cal_acc_rating.py | Accuracy of instant rating. |
    | cal_gpt4_score_detail_v1.py | GPT-4 score of detailed reasoning tasks in DepictQA-v1. Treats both the prediction and the ground truth as assistants and calculates the relative score of the prediction over the ground truth. |
    | cal_gpt4_score_detail_v2.py | GPT-4 score of detailed reasoning tasks in DepictQA-v2. Treats only the prediction as an assistant and directly assesses the consistency between the prediction and the ground truth. |
  • Run basic evaluation (e.g., cal_acc_single_distortion.py):

    python cal_acc_single_distortion.py --pred_path predict_json_path --gt_path ground_truth_json_path
    

    Some specific parameters are explained as follows.

    For the calculation of accuracy:

    • --confidence (store_true): whether to calculate accuracy within various confidence intervals.
    • --intervals (list of float, default [0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1]): the confidence intervals, only used when --confidence is set.

    For the calculation of GPT-4 score:

    • --save_path (str, required): *.json path to save the evaluation results including scores and reasons.
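
To make the --confidence and --intervals options more concrete, here is a minimal sketch of computing accuracy within confidence intervals. It is not the code of the evaluation scripts: the function name is hypothetical, and the binning rule (disjoint bins [lo, hi), with the last bin including its upper edge) is an assumption, since how the scripts bin samples is not specified here.

```python
def accuracy_by_confidence(samples, intervals=(0, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1)):
    """Group (confidence, is_correct) pairs into bins and compute accuracy per bin.

    Assumed binning: each bin is [lo, hi), except the last one, which also
    includes its upper edge. Returns {(lo, hi): (accuracy or None, sample_count)}.
    """
    results = {}
    last = len(intervals) - 2  # index of the final bin
    for i, (lo, hi) in enumerate(zip(intervals[:-1], intervals[1:])):
        in_bin = [
            correct
            for conf, correct in samples
            if lo <= conf < hi or (i == last and conf == hi)
        ]
        # Empty bins get None instead of a misleading 0.0 accuracy.
        accuracy = sum(in_bin) / len(in_bin) if in_bin else None
        results[(lo, hi)] = (accuracy, len(in_bin))
    return results
```

Reporting the sample count next to each accuracy helps when reading the result, because bins with few samples give noisy estimates.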

Acknowledgement

This repository is based on LAMM. Thanks for this awesome work.

BibTeX

If you find our work useful for your research and applications, please cite using the BibTeX:

@article{depictqa_v2,
    title={Descriptive Image Quality Assessment in the Wild},
    author={You, Zhiyuan and Gu, Jinjin and Li, Zheyuan and Cai, Xin and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
    journal={arXiv preprint arXiv:2405.18842},
    year={2024}
}


@inproceedings{depictqa_v1,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    booktitle={European Conference on Computer Vision},
    year={2024}
}
