An official implementation of the code, including T5 models for VQA/VQG, can be found here (soon to be publicly available).
For reproducibility, we also make available the outputs used to compute the metrics in the paper.
@inproceedings{scialom2020bert,
  title={What BERT Sees: Cross-Modal Transfer for Visual Question Generation},
  author={Scialom, Thomas and Bordes, Patrick and Dray, Paul-Alexis and Staiano, Jacopo and Gallinari, Patrick},
  booktitle={Proceedings of the 13th International Conference on Natural Language Generation},
  pages={327--337},
  year={2020}
}