The source of the dataset is described below:
-
Image Folder: Download 2017 Train images from
https://cocodataset.org/#download
. -
SFT Dataset: Download from LLaVA-Instruct. Like LLaVA, we can also pre-train for feature alignment using SFT datasets, such as CC3M.
-
RM Dataset: Download from LLaVA-Human-Preference. We use the
convert_to_llava_reward_dataset.py
script to convert it into the format of our reward training dataset. -
PPO Dataset: We use the same data format as SFT, but in this answer field we can leave it empty or give it a random string.