This SISR (Single Image Super-Resolution) architecture is based on the GatedCNNBlock introduced in the MambaOut repository. Although the architecture has no direct connection to Mamba, I kept the reference in the name in keeping with the original author's joke. I previously developed the MoSR architecture on this basis, and MoESR is essentially an extended version of it. The main goal of the extension is to compete with ESRGAN, which remains one of the strongest convolutional networks in the mid-range segment.
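As a rough illustration of the gating idea behind GatedCNNBlock (not the actual MoESR implementation — all names, shapes, and the 1-D depthwise convolution here are simplified assumptions), a minimal NumPy sketch:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def gated_cnn_block(x, w_in, w_out, dw_kernel):
    """Simplified gated block: expand, split into gate/value branches,
    depthwise-convolve the value branch, modulate it by the activated gate,
    project back, and add a residual connection.

    x: (seq_len, dim); w_in: (dim, 2*hidden); w_out: (hidden, dim);
    dw_kernel: (hidden, k) per-channel 1-D kernels (hypothetical shapes).
    """
    h = x @ w_in
    gate, value = np.split(h, 2, axis=-1)
    # depthwise convolution: each hidden channel gets its own kernel
    conv = np.stack(
        [np.convolve(value[:, c], dw_kernel[c], mode="same")
         for c in range(value.shape[1])],
        axis=-1,
    )
    return x + (gelu(gate) * conv) @ w_out
```

The gating multiplication lets the network suppress or pass features per position, which is the part the block shares with Mamba-style architectures despite lacking any state-space machinery.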
Special thanks to the-database for conducting the testing and creating the pre-trained model.
- Training framework: traiNNer-redux
- Hardware: RTX 4090 GPU
- Batch size: 32
- LQ size: 64
- EMA: 0.999
- Loss function: MS-SSIM_L1
- Val set: Urban100
- Train set: DF2K
```mermaid
xychart-beta
    title "B: MoESR vs ESRGAN"
    x-axis [5k, 50k, 100k, 150k, 200k, 250k, 300k, 350k, 400k, 450k, 500k]
    y-axis "SSIM (higher is better)"
    line [0.7481029033660889, 0.7973534464836121, 0.8052924275398254, 0.8097050786018372, 0.811339795589447, 0.8130168318748474, 0.8137122392654419, 0.8144168853759766, 0.814674973487854, 0.8149675130844116, 0.8150919079780579]
    line [0.7562176585197449, 0.8062689304351807, 0.8118854761123657, 0.8150879740715027, 0.8162787556648254, 0.8170512914657593, 0.8173067569732666, 0.8174617290496826, 0.8176395893096924, 0.8176536560058594, 0.8176558017730713]
```

The first series is ESRGAN and the second is MoESR (their final values match the table below).
| Model  | PSNR  | SSIM   | FPS (640×480) | VRAM (640×480) |
|--------|-------|--------|---------------|----------------|
| ESRGAN | 26.98 | 0.8151 | 3.34          | 2.50 GB        |
| MoESR  | 27.05 | 0.8176 | 4.02          | 0.93 GB        |
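For reference, the PSNR values above follow the standard definition (10·log10 of the squared peak value over the mean squared error). A minimal NumPy sketch, assuming images scaled to [0, max_val]:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val**2 / mse)

# e.g. a uniform error of 0.1 gives MSE = 0.01, i.e. PSNR = 20 dB
```

Note that reported numbers also depend on conventions not shown here (color space, border cropping), so exact reproduction requires matching the evaluation script.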
- Pretrained model, safetensors: original
- Pretrained model, pth: converted