Unsupervised Video Summarization via Multi-source Features

This is the official GitHub page for the paper:

Hussain Kanafani, Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth. 2021. Unsupervised Video Summarization via Multi-source Features. In Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR ’21), August 21–24, 2021, Taipei, Taiwan. ACM, New York, NY, USA. https://doi.org/10.1145/3460426.3463597

The paper is available on:

Model architecture: Multi-Source Chunk and Stride Fusion (MCSF)

MCSF

Get started (Requirements and Setup)

Python 3.6

cd MCSF
conda create -n mcsf python=3.6
conda activate mcsf  
pip install -r requirements.txt

Project Structure

Directory: 
- /data
        - /plc_365 (Places365 features for SumMe and TVSum)
        - /splits (original and non-overlapping splits)
        - /SumMe (processed dataset h5)
        - /TVSum (processed dataset h5)
- /csnet (implementation of the CSNet method)
- /mcsf-places365-early-fusion 
- /mcsf-places365-late-fusion 
- /mcsf-places365-intermediate-fusion
- /src/evaluation (evaluation using F1-score)
- /src/visualization 
- /sum-ind (implementation of SUM-Ind method)


Datasets

Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available in the "data" folder. The GoogleNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao, and the h5 files were obtained from Kaiyang Zhou.

Download

wget https://zenodo.org/record/4884870/files/datasets.tar

Files Structure

The implemented models use the provided h5 files which have the following structure:

/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /gtscore                  1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
    /user_summary             2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_frame_per_seg          1D-array with shape (num_segments), indicates number of frames in each segment
    /n_frames                 number of frames in original video
    /picks                    positions of subsampled frames in original video
    /n_steps                  number of subsampled frames
    /gtsummary                1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
    /video_name (optional)    original video name, only available for SumMe dataset
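
The relationship between several of these fields can be illustrated with a small sketch. It assumes that each row of change_points holds inclusive [start, end] frame indices and that picks lists the original-frame index of each subsampled step. The helper names and the toy values are hypothetical and are not part of this repository.

```python
def frames_per_segment(change_points):
    """Derive n_frame_per_seg from change_points (assumed inclusive [start, end])."""
    return [end - start + 1 for start, end in change_points]


def upsample_scores(step_scores, picks, n_frames):
    """Spread per-step scores over all original frames.

    Each subsampled step covers the frames from its pick up to the next pick.
    """
    frame_scores = [0.0] * n_frames
    bounds = list(picks) + [n_frames]
    for i, score in enumerate(step_scores):
        for frame in range(bounds[i], bounds[i + 1]):
            frame_scores[frame] = score
    return frame_scores


# Toy entry mimicking one /key group (values are made up for illustration).
entry = {
    "n_frames": 12,
    "picks": [0, 4, 8],                  # n_steps = 3
    "gtscore": [0.2, 0.9, 0.5],          # one score per subsampled step
    "change_points": [[0, 5], [6, 11]],  # two segments
}

n_frame_per_seg = frames_per_segment(entry["change_points"])
frame_scores = upsample_scores(entry["gtscore"], entry["picks"], entry["n_frames"])
```

With the real data, the same fields can be read from one of the h5 files (for example with the h5py package) and passed to these helpers.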

Original videos and annotations for each dataset are also available on the authors' project webpages:

TVSum dataset: https://github.com/yalesong/tvsum

SumMe dataset: https://gyglim.github.io/me/vsum/index.html#benchmark


MCSF Variations and CSNet

We used the SUM-GAN method as a starting point for the implementation.


How to train

Run main.py with the configuration specified in configs.py to train the model. The same file defines the following training parameters:

| Parameter | Type | Default |
| --- | --- | --- |
| mode | string (train or test) | train |
| verbose | boolean | true |
| video_type | string (summe or tvsum) | summe |
| input_size | int | 1024 |
| hidden_size | int | 500 |
| split_index | int | 0 |
| n_epochs | int | 20 |
| m | int (number of divisions used for chunk and stride network) | 4 |
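
As a rough illustration of how these parameters could be exposed on the command line, the following sketch builds an argparse parser with the same names and defaults. It is not the repository's configuration file; the function name build_parser and the str2bool helper are hypothetical.

```python
import argparse


def str2bool(value):
    """Parse common textual booleans such as 'true' or 'false'."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Build a parser mirroring the parameter list (illustrative only)."""
    parser = argparse.ArgumentParser(description="Training options (sketch)")
    parser.add_argument("--mode", type=str, default="train", choices=["train", "test"])
    parser.add_argument("--verbose", type=str2bool, default=True)
    parser.add_argument("--video_type", type=str, default="summe", choices=["summe", "tvsum"])
    parser.add_argument("--input_size", type=int, default=1024)
    parser.add_argument("--hidden_size", type=int, default=500)
    parser.add_argument("--split_index", type=int, default=0)
    parser.add_argument("--n_epochs", type=int, default=20)
    # m: number of divisions used by the chunk and stride network
    parser.add_argument("--m", type=int, default=4)
    return parser


# Parse with all defaults, then with two overrides.
defaults = build_parser().parse_args([])
custom = build_parser().parse_args(["--video_type", "tvsum", "--split_index", "2"])
```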


For training the model using a single split, run:

python main.py --split_index N (with N being the index of the split)
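
To cover several splits, the command above can be repeated once per index. The sketch below only builds the command lines. The number of splits (five) is an assumption and should be matched to the split files in /data/splits; the function name training_commands is hypothetical.

```python
import sys

NUM_SPLITS = 5  # assumption: adjust to the number of splits in your split file


def training_commands(num_splits=NUM_SPLITS, python=sys.executable):
    """Return one 'main.py --split_index N' command per split index."""
    return [[python, "main.py", "--split_index", str(n)] for n in range(num_splits)]


commands = training_commands()
for cmd in commands:
    print(" ".join(cmd))
    # To actually launch training, uncomment the following lines:
    # import subprocess
    # subprocess.run(cmd, check=True)
```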

How to evaluate

Using multiple human-generated summaries per video: CSNet and all MCSF models are evaluated by comparing, after each training epoch, the generated summary for each test video against the set of reference human summaries available for that video (see the '/user_summary' entry in the h5 file structure described in the Datasets section above). To run this evaluation, specify which config file to use ('config_summe.yaml' or 'config_tvsum.yaml') and then run the 'src/evaluation/evaluate.py' script.
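
The comparison against multiple human summaries can be sketched as follows. This is a minimal illustration, not the code in the evaluation script: it assumes binary frame-level summaries of equal length, and the aggregation over users (maximum or average) is a common convention in the video summarization literature rather than something stated here. The function names are hypothetical.

```python
def f1_score(machine_summary, user_summary):
    """F1 between two binary frame-selection vectors of equal length."""
    overlap = sum(m and u for m, u in zip(machine_summary, user_summary))
    n_machine = sum(machine_summary)
    n_user = sum(user_summary)
    if overlap == 0 or n_machine == 0 or n_user == 0:
        return 0.0
    precision = overlap / n_machine
    recall = overlap / n_user
    return 2 * precision * recall / (precision + recall)


def evaluate_against_users(machine_summary, user_summaries, aggregate="avg"):
    """Score one generated summary against every row of /user_summary."""
    scores = [f1_score(machine_summary, user) for user in user_summaries]
    if aggregate == "max":
        return max(scores)
    return sum(scores) / len(scores)


# Toy example: one generated summary and two reference summaries.
machine = [1, 1, 0, 0, 1, 0]
users = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
]
avg_f1 = evaluate_against_users(machine, users, aggregate="avg")
max_f1 = evaluate_against_users(machine, users, aggregate="max")
```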


SUM-Ind

The training and testing code is in main.py. To see the full list of arguments, run python main.py -h.


How to train

python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --verbose

How to test

python main.py -d datasets/eccv16_dataset_summe_google_pool5.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results

Citation

@inproceedings{kanafani2021MCSF,
   title={Unsupervised Video Summarization via Multi-source Features},
   author={Kanafani, Hussain and Ghauri, Junaid Ahmed and Hakimov, Sherzod and Ewerth, Ralph},
   booktitle={ACM International Conference on Multimedia Retrieval (ICMR)},
   year={2021}
}