NSF-HiFiGAN with 44.1 kHz sampling rate

@yqzhishen yqzhishen released this 11 Dec 06:34

This is the first formal public release of the DiffSinger Community Vocoder Project. It includes:

  • A pretrained model for inference
  • A pretrained model for fine-tuning
  • An ONNX model for lightweight and portable deployment

Overview

Architecture: NSF-HiFiGAN
Training data: ~93 hours of singing voice
Training steps: over 1 million
Sampling rate: 44100 Hz
Number of mel bins: 128
Hop size: 512 samples
Window size: 2048 samples
Mel frequency (input): 40-16000 Hz
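
The parameters above fix the time resolution of the mel input the vocoder expects. The sketch below is only illustrative arithmetic on those values and is not code from the project. The constant and function names are hypothetical, and the frame count assumes one frame per hop with no padding, which may differ from the actual feature extractor.

```python
# Values copied from the Overview; all names here are hypothetical.
SAMPLING_RATE = 44100    # Hz
HOP_SIZE = 512           # samples between consecutive mel frames
WIN_SIZE = 2048          # samples per analysis window
NUM_MELS = 128           # mel bins per frame
FMIN, FMAX = 40, 16000   # Hz, mel filterbank range

# Derived timing quantities.
frame_rate = SAMPLING_RATE / HOP_SIZE        # mel frames per second (~86.13)
hop_ms = 1000 * HOP_SIZE / SAMPLING_RATE     # time step per frame (~11.61 ms)
win_ms = 1000 * WIN_SIZE / SAMPLING_RATE     # analysis window length (~46.44 ms)
nyquist = SAMPLING_RATE / 2                  # 22050 Hz; FMAX must not exceed this


def frames_for_duration(seconds: float) -> int:
    """Approximate mel frame count for a clip (assumes no padding)."""
    return int(seconds * SAMPLING_RATE) // HOP_SIZE


def samples_for_frames(n_frames: int) -> int:
    """Number of waveform samples covered by n_frames hops."""
    return n_frames * HOP_SIZE


if __name__ == "__main__":
    print(f"{frame_rate:.2f} frames/s, hop {hop_ms:.2f} ms, window {win_ms:.2f} ms")
    print("frames in 10 s:", frames_for_duration(10.0))
```

Mel spectrograms produced by an acoustic model with a different hop size, sampling rate, or mel range will generally not match what this vocoder was trained on, so these values should be kept consistent across the pipeline.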

Notice

The pretrained models are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Please read the notice in the folder before redistributing these pretrained models.