Skip to content

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

License

Notifications You must be signed in to change notification settings

JeremyRippert/pyannote-audio

 
 

Repository files navigation

Using pyannote.audio open-source toolkit in production?
Make the most of it thanks to our consulting services.

pyannote.audio speaker diarization toolkit

pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines, that can be further finetuned to your own data for even better performance.

TL;DR

  1. Install pyannote.audio 3.0 with pip install pyannote.audio
  2. Accept pyannote/segmentation-3.0 user conditions
  3. Accept pyannote/speaker-diarization-3.0 user conditions
  4. Create access token at hf.co/settings/tokens.
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# send pipeline to GPU (when available)
import torch
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...

Highlights

Documentation

Benchmark

Out of the box, pyannote.audio speaker diarization pipeline v3.0 is expected to be much better (and faster) than v2.x.
Those numbers are diarization error rates (in %):

Dataset \ Version v1.1 v2.0 v2.1 v3.0 Premium
AISHELL-4 - 14.6 14.1 12.3 12.3
AliMeeting (channel 1) - - 27.4 24.3 19.4
AMI (IHM) 29.7 18.2 18.9 19.0 16.7
AMI (SDM) - 29.0 27.1 22.2 20.1
AVA-AVD - - - 49.1 42.7
DIHARD 3 (full) 29.2 21.0 26.9 21.7 17.0
MSDWild - - - 24.6 20.4
REPERE (phase2) - 12.6 8.2 7.8 7.8
VoxConverse (v0.3) 21.5 12.6 11.2 11.3 9.5

Citations

If you use pyannote.audio please use the following citations:

@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}

Development

The commands below will setup pre-commit hooks and packages needed for developing the pyannote.audio library.

pip install -e .[dev,testing]
pre-commit install

Test

pytest

About

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 70.2%
  • Python 29.8%