Using the pyannote.audio open-source toolkit in production?
Make the most of it thanks to our consulting services.
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be further fine-tuned on your own data for even better performance.
- Install pyannote.audio 3.0 with `pip install pyannote.audio`
- Accept pyannote/segmentation-3.0 user conditions
- Accept pyannote/speaker-diarization-3.0 user conditions
- Create access token at hf.co/settings/tokens.
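Once the access token exists, it is usually safer to keep it out of source code. The sketch below reads it from an environment variable; the variable name `HF_TOKEN` and the helper `read_hf_token` are assumptions made for illustration, not part of pyannote.audio.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face access token stored in an environment variable.

    `HF_TOKEN` is an assumed variable name; use whatever name you exported.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token


# Possible usage (requires pyannote.audio and accepted user conditions):
# from pyannote.audio import Pipeline
# pipeline = Pipeline.from_pretrained(
#     "pyannote/speaker-diarization-3.0",
#     use_auth_token=read_hf_token())
```

Failing early with a clear message avoids a less obvious authentication error when the pipeline is downloaded.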
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")
# send pipeline to GPU (when available)
import torch
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))
# apply pretrained pipeline
diarization = pipeline("audio.wav")
# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
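A common next step is to summarize the turns printed above. The following is a minimal sketch that sums speaking time per speaker from plain `(start, stop, speaker)` tuples; the helper `speaking_time` is hypothetical and not part of pyannote.audio. With a real result, the tuples could be built as `[(turn.start, turn.end, speaker) for turn, _, speaker in diarization.itertracks(yield_label=True)]`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def speaking_time(turns: Iterable[Tuple[float, float, str]]) -> Dict[str, float]:
    """Sum turn durations (in seconds) per speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    for start, stop, speaker in turns:
        totals[speaker] += stop - start
    return dict(totals)


# Turns copied from the example output above.
turns = [(0.2, 1.5, "0"), (1.8, 3.9, "1"), (4.2, 5.7, "0")]
for speaker, seconds in sorted(speaking_time(turns).items()):
    print(f"speaker_{speaker}: {seconds:.1f}s")
# speaker_0: 2.8s
# speaker_1: 2.1s
```

Overlapping speech is counted once for each speaker involved, so the totals can add up to more than the audio duration.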
- 🤗 pretrained pipelines (and models) on 🤗 model hub
- 🤯 state-of-the-art performance (see Benchmark)
- 🐍 Python-first API
- ⚡ multi-GPU training with pytorch-lightning
- Changelog
- Frequently asked questions
- Models
  - Available tasks explained
  - Applying a pretrained model
  - Training, fine-tuning, and transfer learning
- Pipelines
  - Available pipelines explained
  - Applying a pretrained pipeline
  - Adapting a pretrained pipeline to your own data
  - Training a pipeline
- Contributing
  - Adding a new model
  - Adding a new task
  - Adding a new pipeline
  - Sharing pretrained models and pipelines
- Blog
- Videos
  - Introduction to speaker diarization / JSALT 2023 summer school / 90 min
  - Speaker segmentation model / Interspeech 2021 / 3 min
  - First release of pyannote.audio / ICASSP 2020 / 8 min
Out of the box, the pyannote.audio speaker diarization pipeline v3.0 is expected to be much better (and faster) than v2.x.
The numbers below are diarization error rates (in %), so lower is better:
Dataset \ Version | v1.1 | v2.0 | v2.1 | v3.0 | Premium |
---|---|---|---|---|---|
AISHELL-4 | - | 14.6 | 14.1 | 12.3 | 12.3 |
AliMeeting (channel 1) | - | - | 27.4 | 24.3 | 19.4 |
AMI (IHM) | 29.7 | 18.2 | 18.9 | 19.0 | 16.7 |
AMI (SDM) | - | 29.0 | 27.1 | 22.2 | 20.1 |
AVA-AVD | - | - | - | 49.1 | 42.7 |
DIHARD 3 (full) | 29.2 | 21.0 | 26.9 | 21.7 | 17.0 |
MSDWild | - | - | - | 24.6 | 20.4 |
REPERE (phase2) | - | 12.6 | 8.2 | 7.8 | 7.8 |
VoxConverse (v0.3) | 21.5 | 12.6 | 11.2 | 11.3 | 9.5 |
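For readers new to the metric, the sketch below uses the standard definition of diarization error rate: false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. It is an illustration only and does not reproduce the evaluation protocol behind the table; both function names are hypothetical, and the durations in the first call are made up.

```python
def diarization_error_rate(false_alarm: float, missed_detection: float,
                           confusion: float, total: float) -> float:
    """DER in % from error durations and total reference speech (seconds)."""
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


def relative_reduction(old: float, new: float) -> float:
    """Relative error reduction in % when going from `old` to `new`."""
    return 100.0 * (old - new) / old


# Made-up durations, for illustration only: 600 s of reference speech.
print(f"{diarization_error_rate(12.0, 30.0, 18.0, 600.0):.1f}")  # 10.0

# AMI (SDM) row of the table: v2.1 = 27.1, v3.0 = 22.2
print(f"{relative_reduction(27.1, 22.2):.1f}")  # 18.1
```

Because the three error durations are summed before dividing by the reference speech duration, the rate can exceed 100% when a system produces a lot of false alarms.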
If you use pyannote.audio, please use the following citations:
@inproceedings{Plaquet23,
author={Alexis Plaquet and Hervé Bredin},
title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
author={Hervé Bredin},
title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
year=2023,
booktitle={Proc. INTERSPEECH 2023},
}
The commands below will set up pre-commit hooks and the packages needed for developing the pyannote.audio library.
pip install -e .[dev,testing]
pre-commit install
pytest