Realtime human head pose estimation with ONNX Runtime and OpenCV.
There are three major steps:
- Face detection. A face detector provides a bounding box containing a human face. The box is then expanded and converted to a square to suit the later steps.
- Facial landmark detection. A pre-trained deep learning model takes the face image as input and outputs 68 facial landmarks.
- Pose estimation. Given the 68 facial landmarks, the head pose is calculated with a Perspective-n-Point (PnP) algorithm.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
The code was tested on Ubuntu 22.04 with the following frameworks:
- ONNX Runtime: 1.17.1
- OpenCV: 4.5.4
Clone the repo:
git clone https://github.com/yinguobing/head-pose-estimation.git
Install dependencies with pip:
pip install -r requirements.txt
Pre-trained models are provided in the assets directory. Download them with Git LFS:
git lfs pull
Alternatively, download them manually from the release page.
Specify a video file or a webcam index through the command-line arguments. If no source is provided, the built-in webcam is used by default.
For any video format that OpenCV supports (mp4, avi, etc.):
python3 main.py --video /path/to/video.mp4
For a webcam, provide the device index:
python3 main.py --cam 0
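The source selection described above could be handled along these lines. This is a sketch under assumptions, not the actual contents of `main.py`: the helper name `parse_source`, the default index of 0, and the rule that `--video` takes priority over `--cam` are all hypothetical. Only the two flag names come from the commands shown.

```python
import argparse


def parse_source(argv=None):
    """Return a video path (str) or a webcam index (int)."""
    parser = argparse.ArgumentParser(description="Head pose estimation demo")
    parser.add_argument("--video", type=str, default=None,
                        help="Path to a video file.")
    parser.add_argument("--cam", type=int, default=0,
                        help="Webcam index; 0 is assumed to be the built-in one.")
    args = parser.parse_args(argv)
    # Assumption: an explicit video path wins over the webcam index.
    return args.video if args.video is not None else args.cam
```

Either return value can be passed directly to `cv2.VideoCapture`, which accepts both a file name and an integer device index.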
Tutorials: https://yinguobing.com/deeplearning/
Training code: https://github.com/yinguobing/cnn-facial-landmark
Note: PyTorch version coming soon!
This project is licensed under the MIT License - see the LICENSE file for details.
Meanwhile:
- The face detector is SCRFD from InsightFace.
- The pre-trained model was trained on various public datasets, each of which has its own license.
Please refer to them for details.
Yin Guobing (尹国冰) - yinguobing
All datasets used in the training process:
- 300-W: https://ibug.doc.ic.ac.uk/resources/300-W/
- 300-VW: https://ibug.doc.ic.ac.uk/resources/300-VW/
- LFPW: https://neerajkumar.org/databases/lfpw/
- HELEN: http://www.ifp.illinois.edu/~vuongle2/helen/
- AFW: https://www.ics.uci.edu/~xzhu/face/
- IBUG: https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/
The 3D face model is from OpenFace; you can find the original file here.
The built-in face detector is SCRFD from InsightFace.