CheXpert NLP tool to extract observations from radiology reports.
Read more about our project here and our AAAI 2019 paper here.
Please install the following dependencies, or use the Dockerized labeler (see below).
- Clone the NegBio repository:
git clone https://github.com/ncbi-nlp/NegBio.git
- Add the NegBio directory to your PYTHONPATH:
export PYTHONPATH={path to negbio directory}:$PYTHONPATH
- Create the virtual environment:
conda env create -f environment.yml
- Activate the virtual environment:
conda activate chexpert-label
- Install NLTK data:
python -m nltk.downloader universal_tagset punkt wordnet
- Download the GENIA+PubMed parsing model:
>>> from bllipparser import RerankingParser
>>> RerankingParser.fetch_and_load('GENIA+PubMed')
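If you launch the labeler from a Python script rather than an interactive shell, the PYTHONPATH step above can be reproduced for a single child process. The sketch below is a minimal illustration and is not part of this repository: `env_with_negbio` is a hypothetical helper name, and `/opt/NegBio` is a placeholder for wherever you cloned NegBio.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_negbio(
    negbio_dir: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with negbio_dir prepended to PYTHONPATH.

    Hypothetical helper mirroring
    `export PYTHONPATH={path to negbio directory}:$PYTHONPATH`,
    except that no trailing separator is added when PYTHONPATH is unset or empty.
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # os.pathsep is ":" on Linux/macOS and ";" on Windows; empty parts are dropped.
    env["PYTHONPATH"] = os.pathsep.join(p for p in (negbio_dir, existing) if p)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run`, which keeps the change scoped to the labeler process instead of the whole shell session.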
Place reports in a headerless, single-column CSV file at {reports_path}. Each report must be enclosed in quotes if (1) it contains a comma or (2) it spans multiple lines. See sample_reports.csv (with output labeled_reports.csv) for an example.
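One way to produce a file in this format is Python's standard `csv` module, sketched below under the assumption that your reports are already available as a list of strings. `reports_to_csv` and `write_reports_csv` are hypothetical helper names, and the UTF-8 encoding is an assumption rather than a documented requirement of the labeler.

```python
import csv
import io
from typing import Iterable


def reports_to_csv(reports: Iterable[str]) -> str:
    """Serialize reports as a headerless, single-column CSV string."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes a field only if it contains the delimiter (a comma),
    # the quote character, or a line break; embedded quotes are doubled.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
    for report in reports:
        writer.writerow([report])  # one report per row, no header row
    return buf.getvalue()


def write_reports_csv(reports: Iterable[str], reports_path: str) -> None:
    """Write the CSV to reports_path (encoding is an assumption)."""
    # newline="" prevents Python from translating the csv module's "\r\n"
    # row terminators or the line breaks inside quoted reports.
    with open(reports_path, "w", newline="", encoding="utf-8") as f:
        f.write(reports_to_csv(reports))
```

Quoting minimally covers exactly the two cases listed above; single-line reports without commas are written bare, as in sample_reports.csv.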
python label.py --reports_path {reports_path}
Run python label.py --help for descriptions of all the command-line arguments.
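For batch jobs, the same invocation can be assembled and launched from Python. This is a hedged sketch: it uses only the `--reports_path`, `--output_path`, and `--verbose` flags that appear in this README, the helper names are hypothetical, and it assumes it runs inside the activated `chexpert-label` environment (`sys.executable` reuses the current interpreter for the child process).

```python
import subprocess
import sys
from typing import List, Optional


def build_label_command(
    reports_path: str,
    output_path: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble a label.py invocation using only flags shown in this README."""
    # sys.executable is the interpreter running this script, so launch it
    # from the activated chexpert-label environment.
    cmd = [sys.executable, "label.py", "--reports_path", reports_path]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_labeler(
    reports_path: str,
    output_path: Optional[str] = None,
    verbose: bool = False,
    repo_dir: str = ".",
) -> None:
    """Run label.py from repo_dir, the directory that contains label.py."""
    # Relative paths are resolved against repo_dir because it becomes the cwd.
    # check=True raises CalledProcessError if the labeler exits non-zero.
    subprocess.run(
        build_label_command(reports_path, output_path, verbose),
        cwd=repo_dir,
        check=True,
    )
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems when report paths contain spaces.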
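To use the Dockerized labeler instead, build the image, then run it with the current directory mounted at /data:
docker build -t chexpert-labeler:latest .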
docker run -v $(pwd):/data chexpert-labeler:latest \
python label.py --reports_path /data/sample_reports.csv --output_path /data/labeled_reports.csv --verbose
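Because the container only sees the mounted directory, the report and output files must sit inside it and be passed to label.py by their /data/... paths. The hypothetical helper below sketches that translation for the `docker run` command above; it assumes the image was built with the `chexpert-labeler:latest` tag, and it resolves relative file names against `host_dir` (the directory that `$(pwd)` supplies in the shell version).

```python
import os
from typing import List


def build_docker_command(
    reports_file: str,
    output_file: str,
    host_dir: str,
    image: str = "chexpert-labeler:latest",
    verbose: bool = True,
) -> List[str]:
    """Build a `docker run` call that mounts host_dir at /data (hypothetical helper)."""
    # docker -v needs an absolute host path.
    host_dir = os.path.abspath(host_dir)
    container_paths = []
    for path in (reports_file, output_file):
        # Relative names are interpreted relative to host_dir; absolute ones are kept.
        full = os.path.normpath(os.path.join(host_dir, path))
        rel = os.path.relpath(full, host_dir)
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            raise ValueError(f"{path} is not inside the mounted directory {host_dir}")
        # Paths inside the container are POSIX paths regardless of the host OS.
        container_paths.append("/data/" + rel.replace(os.sep, "/"))
    cmd = [
        "docker", "run", "-v", f"{host_dir}:/data", image,
        "python", "label.py",
        "--reports_path", container_paths[0],
        "--output_path", container_paths[1],
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd
```

Rejecting files outside the mount up front gives a clearer error than letting label.py fail inside the container on a path it cannot see.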
This repository builds upon the work of NegBio.
This tool was developed by Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, and Silviana Ciurea-Ilcus.
If you're using the CheXpert labeling tool, please cite this paper:
@inproceedings{irvin2019chexpert,
title={CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison},
author={Irvin, Jeremy and Rajpurkar, Pranav and Ko, Michael and Yu, Yifan and Ciurea-Ilcus, Silviana and Chute, Chris and Marklund, Henrik and Haghgoo, Behzad and Ball, Robyn and Shpanskaya, Katie and others},
booktitle={Thirty-Third AAAI Conference on Artificial Intelligence},
year={2019}
}