PyTorch Sound is a modeling toolkit that lets engineers train custom models for sound-related tasks. It focuses on removing the repetitive patterns involved in building deep learning pipelines, so that related experiments run faster.
- Register models in one place and build them by name elsewhere.
- It is inspired by https://github.com/pytorch/fairseq
import torch.nn as nn
from pytorch_sound.models import register_model, register_model_architecture

@register_model('my_model')
class Model(nn.Module):
    ...

@register_model_architecture('my_model', 'my_model_base')
def my_model_base():
    return {'hidden_dim': 256}
from pytorch_sound.models import build_model
# build model
model_name = 'my_model_base'
model = build_model(model_name)
- Several dataset sources (preprocess, meta, general sound dataset)
LibriTTS, Maestro, VCTK and VoiceBank are currently supported.
Feel free to suggest a dataset; PRs are welcome!
- Abstract Training Process
- Build forward function (from data to loss, meta)
- Provide various logging types
- Tensorboard, Console
- scalar, plot, image, audio
import torch
from pytorch_sound.trainer import Trainer, LogType

class MyTrainer(Trainer):

    def forward(self, input: torch.Tensor, target: torch.Tensor, is_logging: bool):
        # forward model
        out = self.model(input)
        # calc your own loss (calc_loss is a placeholder for your loss function)
        loss = calc_loss(out, target)
        # build meta for logging
        meta = {
            'loss': (loss.item(), LogType.SCALAR),
            'out': (out[0], LogType.PLOT)
        }
        return loss, meta
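The `meta` dictionary returned by `forward` pairs each value with a log type. As a rough sketch of how a training loop might consume such a dictionary, the snippet below dispatches on the type tag and writes console lines. It is an assumption about the general idea, not the library's logging code: `LogKind` is a hypothetical stand-in for `LogType`, and `log_meta` is an illustrative helper.

```python
from enum import Enum
from typing import Any, Dict, List, Tuple


class LogKind(Enum):
    """Hypothetical stand-in for the library's LogType."""
    SCALAR = 'scalar'
    PLOT = 'plot'


def log_meta(meta: Dict[str, Tuple[Any, LogKind]], step: int) -> List[str]:
    """Turn a meta dictionary into console log lines, one per entry."""
    lines = []
    for key, (value, kind) in meta.items():
        if kind is LogKind.SCALAR:
            lines.append(f'step {step} | {key}: {value:.4f}')
        elif kind is LogKind.PLOT:
            # A real logger would draw the series; here we only summarize it.
            lines.append(f'step {step} | {key}: plot with {len(value)} points')
        else:
            raise ValueError(f'unsupported log kind: {kind}')
    return lines


meta = {
    'loss': (0.1234, LogKind.SCALAR),
    'out': ([0.1, 0.2, 0.3], LogKind.PLOT),
}
for line in log_meta(meta, step=10):
    print(line)
```

Keeping the type tag next to the value means the trainer subclass only declares what to log, while the choice of backend (console or Tensorboard) stays in one place.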
- English handler sources are adapted from https://github.com/keithito/tacotron
- Add types
- General sound settings and sources
- ffmpeg v4
$ sudo add-apt-repository ppa:jonathonf/ffmpeg-4
$ sudo apt update
$ sudo apt install ffmpeg
$ ffmpeg -version
- Install the package
$ pip install -e .
- Download data files
- For LibriTTS, check out its README
- Run commands (to change the sound settings, edit settings.py)
$ python pytorch_sound/scripts/preprocess.py [libri_tts / vctk / voice_bank] in_dir out_dir
- Check the preprocessed data and meta files.
- The Maestro dataset currently does not require running the preprocess code.
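If you preprocess several datasets, it can help to assemble the command from Python. The sketch below only builds the argument list for the script shown above and validates the dataset name. `preprocess_command` is a hypothetical helper, and the input and output paths are placeholders, not real locations.

```python
import sys
from typing import List

# Dataset names accepted by the preprocess command shown above.
DATASETS = ('libri_tts', 'vctk', 'voice_bank')


def preprocess_command(dataset: str, in_dir: str, out_dir: str) -> List[str]:
    """Build the argument list for the preprocess script (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f'unknown dataset {dataset!r}, expected one of {DATASETS}')
    return [sys.executable, 'pytorch_sound/scripts/preprocess.py',
            dataset, in_dir, out_dir]


# Placeholder paths; replace them with your own directories.
cmd = preprocess_command('vctk', '/data/VCTK-Corpus', '/data/vctk_preprocessed')
print(' '.join(cmd))

# To actually run it from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```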
- Source (Speech) Separation with audioset : https://github.com/AppleHolic/source_separation
- Python > 3.6
- PyTorch 1.0
- Ubuntu 16.04
- Data and its meta file
- Data Preprocess
- General functions and modules in sound tasks
- Abstract training process
- Preprocess docs in README.md
- Add test codes and CI
- Document website.
- This repository is released under the BSD 2-Clause license. See the LICENSE file.