This repository provides a re-implementation of the UAVSal model.
Related Project
- Kao Zhang, Zhenzhong Chen, Shan Liu. A Spatial-Temporal Recurrent Neural Network for Video Saliency Prediction. IEEE Transactions on Image Processing (TIP), vol. 30, pp. 572-587, 2021.
  Github: https://github.com/zhangkao/IIP_STRNN_Saliency
- Kao Zhang, Zhenzhong Chen. Video Saliency Prediction Based on Spatial-Temporal Two-Stream Network. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 29, no. 12, pp. 3544-3557, 2019.
  Github: https://github.com/zhangkao/IIP_TwoS_Saliency
The code was developed with Python 3.6+, PyTorch 1.4+, and CUDA 10.0+. Other software versions may cause compatibility problems.
- Windows10/11 or Ubuntu20.04
- Anaconda latest, Python
- CUDA, CUDNN
You can create a new environment in Anaconda as follows:
*For GeForce GTX 10 series GPUs, such as the GTX 1080 and Titan Xp (PyTorch 1.4.0~1.7.1, Python 3.6~3.8):
conda create -n uavsal python=3.8
conda activate uavsal
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=10.1 -c pytorch
pip install numpy hdf5storage h5py==2.10.0 scipy matplotlib opencv-python scikit-image torchsummary
*For GeForce RTX 30 series GPUs, such as the RTX 3060 and RTX 3080:
conda create -n uavsal python=3.7
conda activate uavsal
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
pip install numpy hdf5storage h5py==2.10.0 scipy matplotlib opencv-python scikit-image torchsummary
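After creating the environment, a quick check that every package from the install commands can be located may save debugging time later. The script below is a minimal sketch and not part of the repository: the `REQUIRED` mapping and the `find_missing` helper are illustrative names, and the import names (`cv2` for opencv-python, `skimage` for scikit-image) are the standard ones for those packages.

```python
import importlib.util
import sys

# Package names from the install commands, mapped to their import names.
REQUIRED = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "numpy": "numpy",
    "hdf5storage": "hdf5storage",
    "h5py": "h5py",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "opencv-python": "cv2",
    "scikit-image": "skimage",
    "torchsummary": "torchsummary",
}


def find_missing(packages):
    """Return the package names whose import module cannot be located."""
    return [
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
    if "pytorch" not in missing:
        import torch  # imported only after confirming it is installed

        print("PyTorch", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
```

Run it inside the activated `uavsal` environment. If CUDA is reported as unavailable, the installed `cudatoolkit` version may not match the GPU driver.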
Download the pre-trained models and put them into the "weights" folder.
The parameters
- Please change the working directory "dataDir" to your own path in the "Demo_Test.py" and "Demo_Train_Test.py" files, for example:
  dataDir = '/home/name/DataSet/'
- More parameters are in the "train" and "test" functions.
- Run the demo "Demo_Test.py" and "Demo_Train_Test.py" to test or train the model.
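Before running the demos, it can help to confirm that the data directory and the weights folder are in place. The following is a minimal sketch under stated assumptions: `preflight` is a hypothetical helper that does not exist in the repository, and the weights folder is assumed to sit in the current working directory unless another path is passed.

```python
from pathlib import Path


def preflight(data_dir, weights_dir="weights"):
    """Collect human-readable problems with the data and weights folders."""
    problems = []

    data_path = Path(data_dir).expanduser()
    if not data_path.is_dir():
        problems.append(f"dataDir does not exist: {data_path}")

    weights_path = Path(weights_dir)
    if not weights_path.is_dir():
        problems.append(f"weights folder does not exist: {weights_path}")
    elif not any(weights_path.iterdir()):
        problems.append(f"weights folder is empty: {weights_path}")

    return problems


if __name__ == "__main__":
    # Example path from this README; replace it with your own.
    for problem in preflight("/home/name/DataSet/"):
        print("WARNING:", problem)
```

An empty list means both folders were found and the weights folder contains at least one file. The sketch does not check that the files are valid model checkpoints.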
The full training process:
- We initialize the SRF-Net with the pretrained MobileNet V2 and fine-tune the model on the SALICON dataset. Then we train the whole model on EyeTrackUAV2 and AVS1K separately.
The training and testing datasets:
- Training dataset: SALICON(2015), UAV2, and AVS1K
- Testing dataset: UAV2-TE and AVS1K-TE
The training and test data examples:
The output format is easy to change in our code.
- The results of the video task are saved in ".mat" (uint8) format.
- You can get the color visualization results with the "Visualization Tools".
- You can evaluate the performance with the "EvalScores Tools".
- You can get the parameter size of each component with the "Getmodelsize Tools".
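Since the results are stored as uint8, a float saliency map has to be quantized before saving and rescaled after loading. The sketch below shows one common convention, scaling so that the maximum value maps to 255. This is an assumption and may differ from what the repository does. The helper names `to_uint8` and `to_float` are illustrative, and reading or writing the ".mat" file itself is left out because the variable names inside the files are not stated here.

```python
import numpy as np


def to_uint8(saliency):
    """Scale a non-negative float saliency map so its maximum maps to 255."""
    saliency = np.asarray(saliency, dtype=np.float64)
    peak = saliency.max()
    if peak <= 0:
        # An all-zero map stays all zero instead of dividing by zero.
        return np.zeros(saliency.shape, dtype=np.uint8)
    return np.round(saliency / peak * 255.0).astype(np.uint8)


def to_float(saliency_u8):
    """Map uint8 values back to floats in the range [0, 1]."""
    return np.asarray(saliency_u8, dtype=np.float32) / 255.0


if __name__ == "__main__":
    example = np.random.rand(360, 640)  # stand-in for a predicted map
    stored = to_uint8(example)
    restored = to_float(stored)
    print(stored.dtype, stored.shape, float(restored.max()))
```

The quantization is lossy: values are recovered only to within one uint8 step, and the original absolute scale is not preserved.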
Results: ALL (5.8G):
The model is trained using Adam optimizer with lr=0.0001 and weight_decay=0.00005
The model is trained using Adam optimizer with lr=0.00001 and weight_decay=0.000005
The model can achieve a faster speed (85 FPS) with similar performance by slightly reducing the input size from the original 360 x 640 pixels to 288 x 512 pixels.
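To make the size reduction concrete, the arithmetic below compares the two resolutions: each side shrinks by a factor of 0.8, so each frame has 64% of the original pixel count. The helper `pixel_count` is an illustrative name and not part of the repository. The sketch only compares input sizes and does not model how the reported speed was measured.

```python
def pixel_count(size):
    """Return the number of pixels for a (height, width) pair."""
    height, width = size
    return height * width


ORIGINAL = (360, 640)  # original input size (height, width)
REDUCED = (288, 512)   # reduced input size (height, width)

side_scale = REDUCED[0] / ORIGINAL[0]
pixel_ratio = pixel_count(REDUCED) / pixel_count(ORIGINAL)

print(f"per-side scale: {side_scale:.2f}")
print(f"pixel ratio:    {pixel_ratio:.2f}")
```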
A video demo (OneDrive, 596M) is provided for comparison with state-of-the-art methods, including:
- DL based models: STRNN*, TwoS*, TASED*, UNISAL*
- Non-DL based models: GBVSm, AWSD.
- The models fine-tuned on the corresponding dataset (UAV2 and AVS1K) are marked with *.
- After fine-tuning, the performance of these models improved significantly.
- The first four scenes are from the UAV2-TE dataset; the rest are from AVS1K-TE.
If you use the UAVSal video saliency model, please cite the following paper:
@article{zhang2022an,
title={An Efficient Saliency Prediction Model for Unmanned Aerial Vehicle Video},
author={Zhang, Kao and Chen, Zhenzhong and Li, Songnan and Liu, Shan},
journal={ISPRS Journal of Photogrammetry and Remote Sensing},
volume={xxxx},
pages={xxxx},
year={2022}
}
Kao ZHANG
Laboratory of Intelligent Information Processing (LabIIP)
Wuhan University, Wuhan, China.
Email: [email protected]
Zhenzhong CHEN (Professor and Director)
Laboratory of Intelligent Information Processing (LabIIP)
Wuhan University, Wuhan, China.
Email: [email protected]
Web: http://iip.whu.edu.cn/~zzchen/