This repository contains the essential code for fine-tuning BERT on the SQuAD 2.0 dataset. In addition, Knowledge Distillation is applied by fine-tuning DistilBERT on SQuAD 2.0 using BERT as the teacher model. All results were obtained on a single Tesla V100 GPU on Google Colab.
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
SQuAD 2.0 combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. For more information regarding the SQuAD dataset and the current leaderboard, you can visit the following link.
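The structure described above (answerable questions with gold answer spans, unanswerable questions with an empty answer list) can be inspected with the 🤗 `datasets` library. This is only an illustrative sketch and not part of this repository's notebooks:

```python
# Sketch: inspecting SQuAD 2.0 with the 🤗 `datasets` library (not part of this repo).
from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")           # "train" and "validation" splits
example = squad_v2["validation"][0]

print(example["question"])
print(example["context"][:200])

# In SQuAD 2.0, unanswerable questions simply have an empty list of gold answers.
print("unanswerable:", len(example["answers"]["text"]) == 0)
```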
- To fine-tune BERT on SQuAD 2.0, please run `Fine_Tune_BERT_SQuAD_2_0.ipynb`. This notebook will automatically save the fine-tuned BERT model in `./models/`.
- To evaluate the fine-tuned BERT model, please run `Eval_SQuAD_2_0.ipynb`.
- To use DistilBERT and apply Knowledge Distillation, please check the README in `./Knowledge_Distillation/`; an illustrative sketch of the distillation loss is given after this list.
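The exact training code lives in `./Knowledge_Distillation/`. As a rough illustration of the technique, the student (DistilBERT) is trained on a blend of the teacher's temperature-softened logits and the usual hard-label loss, in the spirit of Hinton et al. The `temperature` and `alpha` values below are assumed hyperparameters, not necessarily the ones used in this repo:

```python
# Illustrative sketch of a distillation loss for extractive QA.
# `temperature` and `alpha` are assumed hyperparameters, not the repo's exact values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_positions,
                      temperature=2.0, alpha=0.5):
    """Blend the soft-target loss (teacher -> student) with the usual hard-label loss."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 as in Hinton et al.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the annotated span positions.
    hard = F.cross_entropy(student_logits, gold_positions)
    return alpha * soft + (1.0 - alpha) * hard

# For span prediction the loss is applied to the start and end logits separately
# and averaged:
#   loss = 0.5 * (distillation_loss(s_start, t_start, start_positions)
#                 + distillation_loss(s_end, t_end, end_positions))
```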
Model | EM | F1 | HasAns_EM | HasAns_F1 | NoAns_EM | NoAns_F1 | No. of parameters (millions) |
---|---|---|---|---|---|---|---|
BERT-base-uncased | 72.43 | 75.74 | 72.54 | 79.15 | 72.33 | 72.33 | 110 |
DistilBERT-base-uncased (with distilled fine-tuning) | 70.05 | 73.23 | 70.95 | 77.32 | 69.15 | 69.15 | 66 |
DistilBERT-base-uncased (without distilled fine-tuning) | 66.93 | 70.26 | 67.09 | 73.76 | 66.78 | 66.78 | 66 |
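The EM/F1 scores and the HasAns/NoAns breakdown follow the official SQuAD 2.0 evaluation. A minimal sketch of computing them with the 🤗 `evaluate` library is shown below (the ids and texts are hypothetical, and the evaluation notebook in this repo may do this differently):

```python
# Sketch: computing the SQuAD 2.0 metrics (EM, F1, HasAns_*, NoAns_*) with the
# 🤗 `evaluate` library; ids and texts below are hypothetical.
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

predictions = [
    {"id": "q1", "prediction_text": "Tesla V100", "no_answer_probability": 0.0},
    {"id": "q2", "prediction_text": "", "no_answer_probability": 1.0},  # abstain
]
references = [
    {"id": "q1", "answers": {"text": ["Tesla V100"], "answer_start": [42]}},
    {"id": "q2", "answers": {"text": [], "answer_start": []}},          # unanswerable
]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])                  # overall EM / F1
print(results["HasAns_exact"], results["NoAns_exact"])  # per-subset breakdown
```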
- Part of the code is based on the publicly available code of the 🤗 HuggingFace Transformers library and the corresponding research project on DistilBERT (code).
- V. Sanh, L. Debut, J. Chaumond, T. Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019. (link)
- G. Hinton, O. Vinyals, J. Dean. Distilling the Knowledge in a Neural Network. NIPS 2014 Deep Learning Workshop. (link)
- Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT. (Medium Blog)