TalkLipPlus net

This repo is a new SoTA method that adds several features on top of the official implementation of 'Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert', CVPR 2023. Paper

Author: Ironieser.

🔥Features🔥

  • 🛠️ Add post-processing to enhance the final performance.
  • 📊 Add data pre-processing to improve the quality of the training data.
  • 🧑‍🔬 Adjust the method that blends the generated face back into the original video, with higher quality.
  • 🏆 The results achieve SoTA compared with the original TalkLip and other methods published before 2023.08.
  • ⏱️ Train from scratch on a 5-minute video, and infer with 🎧 audio input of unlimited length.

The following sections are unchanged, copied from the official repo.
To run this repo, use:

python run_finetune_cctv.py # xxx is the data name; refer to the code.
python run_info_demo_cctv.py

Prerequisite

  1. pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html.
  2. Install AV-HuBERT by following its installation instructions.
  3. Install supplementary packages via pip install -r requirements.txt
  4. Install ffmpeg. We adopt version=4.3.2. Please double-check the waveforms extracted from mp4 files: extracted waveforms should not contain a prefix of zeros. If you use anaconda, you can refer to conda install -c conda-forge ffmpeg==4.2.3
  5. Download the pre-trained checkpoint of the face detection model and put it at face_detection/detection/sfd/s3fd.pth. Alternative link.
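As a quick way to perform the waveform check from step 4, here is a minimal sketch (the file path and probe length are hypothetical) that counts zero-valued samples at the start of an extracted 16-bit PCM wav file:

```python
import struct
import wave

def leading_zero_samples(wav_path, probe=1000):
    """Count zero-valued samples at the start of a 16-bit PCM wav file."""
    with wave.open(wav_path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        frames = wf.readframes(probe)
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    count = 0
    for s in samples:
        if s != 0:
            break
        count += 1
    return count

# A long run of zeros suggests the extraction padded the audio; re-extract if so.
# leading_zero_samples("some_video.wav")  # hypothetical path
```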

Dataset and pre-processing

  1. Download LRS2 for training and evaluation. Note that we do not use the pretrain set.
  2. Download LRW for evaluation.
  3. To extract waveforms from mp4 files:
python preparation/audio_extract.py --filelist $filelist  --video_root $video_root --audio_root $audio_root
  • $filelist: a txt file containing the names of videos. We provide the filelist of the LRW test set as an example in the datalist directory.
  • $video_root: root directory of videos. In the LRS2 dataset, $video_root should contain directories like "639XXX". In the LRW dataset, $video_root should contain directories like "ABOUT".
  • $audio_root: root directory for saving waveforms
  • other optional arguments: please refer to audio_extract.py
  4. To detect bounding boxes in videos and save them:
python preparation/bbx_extract.py --filelist $filelist  --video_root $video_root --bbx_root $bbx_root --gpu $gpu
  • $bbx_root: a root directory for saving detected bounding boxes

  • $gpu: run bbx_extract on a specific gpu. For example, 3.

    If you want to accelerate bbx_extract via multi-thread processing, you can use the following bash script:

    Please revise the variables on lines 2 to 9 to make it compatible with your own machine.

sh preprocess.sh
  • $file_list_dir: a directory which contains train.txt, valid.txt, test.txt of LRS2 dataset
  • $num_thread: number of threads to use. Please do not let it exceed 8 with a 24GB GPU, or 4 with a 12GB GPU.
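For intuition, the chunking behind such a multi-thread script can be sketched in Python. This is an assumption-laden sketch, not the repo's code: the per-chunk filelist names and the idea of re-invoking bbx_extract.py once per chunk are hypothetical, and preprocess.sh remains the supported path.

```python
import subprocess

def split_filelist(names, num_thread):
    """Split video names into num_thread roughly equal round-robin chunks."""
    return [names[i::num_thread] for i in range(num_thread)]

def launch_workers(chunks, video_root, bbx_root, gpu):
    """Hypothetical dispatcher: one bbx_extract.py process per chunk."""
    procs = []
    for i, chunk in enumerate(chunks):
        part = "filelist_part%d.txt" % i  # hypothetical per-chunk filelist
        with open(part, "w") as f:
            f.write("\n".join(chunk))
        procs.append(subprocess.Popen([
            "python", "preparation/bbx_extract.py",
            "--filelist", part,
            "--video_root", video_root,
            "--bbx_root", bbx_root,
            "--gpu", str(gpu),
        ]))
    for p in procs:
        p.wait()
```

The thread cap exists because each worker loads a face detector onto the GPU, so memory grows linearly with the number of workers.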

Checkpoints

Model                  | Description                                                        | Link
TalkLip (g)            | TalkLip net with the global audio encoder                          | Link
TalkLip (g+c)          | TalkLip net with the global audio encoder and contrastive learning | Link
Lip reading observer 1 | AV-HuBERT (large) fine-tuned on LRS2                               | Link
Lip reading observer 2 | Conformer lip-reading network                                      | Link
Lip reading expert     | lip-reading network for training of talking face generation        | Link

Train

python train.py --file_dir $file_list_dir --video_root $video_root --audio_root $audio_root \
--bbx_root $bbx_root --word_root $word_root --avhubert_root $avhubert_root --avhubert_path $avhubert_path \
--checkpoint_dir $checkpoint_dir --log_name $log_name --cont_w $cont_w --lip_w $lip_w --perp_w $perp_w \
--gen_checkpoint_path $gen_checkpoint_path --disc_checkpoint_path $disc_checkpoint_path
  • $file_list_dir: a directory which contains train.txt, valid.txt, test.txt of LRS2 dataset
  • $word_root: root directory of the text annotations. Normally it should be equal to $video_root, as the LRS2 dataset puts a video file (".mp4") and its corresponding text file (".txt") in the same directory.
  • $avhubert_root: root path of avhubert (should look like xxx/av_hubert/avhubert)
  • $avhubert_path: download the Lip reading expert checkpoint above and enter its path
  • $checkpoint_dir: a directory to save checkpoint of talklip
  • $log_name: name of log file
  • $cont_w: weight of contrastive learning loss (default: 1e-3)
  • $lip_w: weight of lip reading loss (default: 1e-5)
  • $perp_w: weight of perceptual loss (default: 0.07)
  • $gen_checkpoint_path(optional): enter the path of a generator checkpoint if you want to resume training from a checkpoint
  • $disc_checkpoint_path(optional): enter the path of a discriminator checkpoint if you want to resume training from a checkpoint

Note: discriminator losses may sometimes diverge during training (approaching 100). If this happens, please stop training and resume from a reliable checkpoint.
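For intuition, the three weights above plausibly combine into a weighted generator objective. The following is a hedged sketch only: the reconstruction term and all names are assumptions (see train.py for the real objective), plus a guard implementing the divergence note:

```python
def generator_loss(recon, perp, lip, cont,
                   perp_w=0.07, lip_w=1e-5, cont_w=1e-3):
    """Assumed form of the weighted objective: reconstruction plus
    perceptual, lip-reading, and contrastive terms with the default
    weights listed above. Not the repo's actual variable names."""
    return recon + perp_w * perp + lip_w * lip + cont_w * cont

def disc_diverged(disc_loss, threshold=100.0):
    """Heuristic from the note above: if the discriminator loss nears
    the threshold, stop and resume from a reliable checkpoint."""
    return disc_loss >= threshold
```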

Test

The command below synthesizes videos for the quantitative evaluation in our paper.

python inf_test.py --filelist $filelist --video_root $video_root --audio_root $audio_root \
--bbx_root $bbx_root --save_root $syn_video_root --ckpt_path $talklip_ckpt --avhubert_root $avhubert_root
  • $syn_video_root: root directory for saving synthesized videos
  • $talklip_ckpt: a trained checkpoint of TalkLip net

Demo

I updated inf_demo.py on 4 April, as it previously assumed that the height and width of output videos are equal when setting up cv2.VideoWriter(). Please ensure the sampling rate of the input audio file is 16000 Hz.

If you want to reenact the lip movement of a video with a different speech, you can use the following command.

python inf_demo.py --video_path $video_file --wav_path $audio_file --ckpt_path $talklip_ckpt --avhubert_root $avhubert_root
  • $video_file: a video file (ending with .mp4)
  • $audio_file: an audio file (ending with .wav)
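To satisfy the 16000 Hz requirement, you can check the input audio and, if needed, resample it beforehand. A minimal sketch using the standard wave module and the ffmpeg CLI (-ar and -ac are standard ffmpeg flags; the paths are hypothetical):

```python
import subprocess
import wave

def ensure_16khz(wav_in, wav_out):
    """Return wav_in if it is already 16 kHz; otherwise resample it
    to wav_out with ffmpeg and return that path."""
    with wave.open(wav_in, "rb") as wf:
        if wf.getframerate() == 16000:
            return wav_in
    # -ar 16000 sets the output sampling rate; -ac 1 keeps a single channel.
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_in, "-ar", "16000", "-ac", "1", wav_out],
        check=True,
    )
    return wav_out

# audio = ensure_16khz("speech.wav", "speech_16k.wav")  # hypothetical paths
```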

Evaluation

Please follow the README.md in the evaluation directory.
