Contribute your own `hebspacy` model

You are welcome to contribute by training and packaging your own hebspacy model. Please follow the instructions below so that your model loads seamlessly.

Model Training

You may choose the training codebase that best fits your requirements as long as you save the following files:

- The fine-tuned transformer layers, saved separately from the NER heads. The file should be named `pytorch_model.bin`.
- All the files required for loading with `transformers.AutoModel`, including the standard `config.json`, `special_tokens.json`, `tokenizer_config.json`, and `vocab.txt`.
- The weights of each NER head, saved as a separate bin file with a corresponding index to class mapping json file (see instructions below). Files should follow the `ner_<name>.bin` and `ner_<name>.json` naming convention.

All weight files should be produced using the Hugging Face and PyTorch libraries.
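As a rough illustration of saving the transformer layers separately from the NER heads, here is a minimal sketch. It assumes your training code keeps everything in one PyTorch state dict and that each head's parameters share a key prefix. The prefixes (`heads.bmc.`, `heads.nemo.`) and the helper names are hypothetical, and the exact key layout expected by `package.py` is not specified here, so check it against the script before relying on this.

```python
from pathlib import Path
from typing import Dict, Mapping, Tuple


def split_state_dict(
    state_dict: Mapping[str, object],
    head_prefixes: Mapping[str, str],
) -> Tuple[Dict[str, object], Dict[str, Dict[str, object]]]:
    """Partition a joint state dict into transformer weights and per-head weights.

    head_prefixes maps a head name (e.g. "bmc") to the parameter-name prefix
    your training code uses for that head (hypothetical, e.g. "heads.bmc.").
    """
    transformer: Dict[str, object] = {}
    heads: Dict[str, Dict[str, object]] = {name: {} for name in head_prefixes}
    for key, tensor in state_dict.items():
        for name, prefix in head_prefixes.items():
            if key.startswith(prefix):
                # Strip the prefix so the head file holds only its own keys.
                heads[name][key[len(prefix):]] = tensor
                break
        else:
            # No head prefix matched: treat as a transformer parameter.
            transformer[key] = tensor
    return transformer, heads


def save_split(state_dict, head_prefixes, out_dir) -> None:
    """Write pytorch_model.bin plus one ner_<name>.bin per head."""
    import torch  # imported lazily so the splitting logic works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    transformer, heads = split_state_dict(state_dict, head_prefixes)
    torch.save(transformer, out / "pytorch_model.bin")
    for name, weights in heads.items():
        torch.save(weights, out / f"ner_{name}.bin")
```

A call might look like `save_split(model.state_dict(), {"bmc": "heads.bmc.", "nemo": "heads.nemo."}, "resources")`, where `model` is your own jointly trained module.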

For example, the following directory contains all the required files for a model that was jointly trained against the BMC and NEMO corpora:

resources/
├── config.json
├── pytorch_model.bin
├── special_tokens.json
├── tokenizer_config.json
├── vocab.txt
├── ner_bmc.bin
├── ner_bmc.json
├── ner_nemo.bin
└── ner_nemo.json
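Before packaging, it can help to confirm that the directory matches the layout above. The following is a small sketch using only the standard library. The helper name `missing_files` is hypothetical and not part of hebspacy.

```python
from pathlib import Path
from typing import List

# File names taken from the list of required files above.
REQUIRED_FILES = (
    "config.json",
    "pytorch_model.bin",
    "special_tokens.json",
    "tokenizer_config.json",
    "vocab.txt",
)


def missing_files(resources_dir) -> List[str]:
    """Return the names of required files that are absent from resources_dir."""
    root = Path(resources_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]

    # At least one NER head is expected, each with a matching mapping file.
    head_weights = sorted(root.glob("ner_*.bin"))
    if not head_weights:
        missing.append("ner_<name>.bin")
    for weights in head_weights:
        mapping = weights.with_suffix(".json")
        if not mapping.is_file():
            missing.append(mapping.name)
    return missing
```

An empty list from `missing_files("resources")` means every file named in this section is present. It does not check the contents of those files.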

Index to class mapping file

Each NER head should include a json file that maps each model class index to the corresponding token class name. Note that indices 0 and 1 should always be associated with the [PAD] and O classes, respectively. Also, the token annotation scheme should be IOB2.

Here is an example of the index to class mapping json file for ner_bmc:

{
	"0": "[PAD]",
	"1": "O",
	"2": "B-PERS",
	"3": "I-PERS",
	"4": "B-LOC",
	"5": "I-LOC",
	"6": "B-ORG",
	"7": "I-ORG",
	"8": "B-TIME",
	"9": "I-TIME",
	"10": "B-DATE",
	"11": "I-DATE",
	"12": "B-MONEY",
	"13": "I-MONEY",
	"14": "B-PERCENT",
	"15": "I-PERCENT",
	"16": "B-MISC__AFF",
	"17": "I-MISC__AFF",
	"18": "B-MISC__ENT",
	"19": "I-MISC__ENT",
	"20": "B-MISC_EVENT",
	"21": "I-MISC_EVENT"
}
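A mapping like the one above can be generated and sanity-checked programmatically. The sketch below builds the mapping from a list of entity types and checks the two rules stated in this section (indices 0 and 1, and IOB2 tags). Both function names are hypothetical, and the ordering of entity types is whatever your model was trained with.

```python
import json
from typing import Dict, List, Sequence


def build_label_map(entity_types: Sequence[str]) -> Dict[str, str]:
    """Build an index-to-class mapping with [PAD] at 0, O at 1, then B-/I- pairs."""
    labels = ["[PAD]", "O"]
    for entity in entity_types:
        labels += [f"B-{entity}", f"I-{entity}"]
    return {str(index): label for index, label in enumerate(labels)}


def validate_label_map(mapping: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if mapping.get("0") != "[PAD]":
        problems.append('index 0 must map to "[PAD]"')
    if mapping.get("1") != "O":
        problems.append('index 1 must map to "O"')

    labels = set(mapping.values())
    for index, label in mapping.items():
        if label in ("[PAD]", "O"):
            continue
        prefix, _, entity = label.partition("-")
        if prefix not in ("B", "I") or not entity:
            problems.append(f"index {index}: {label!r} is not an IOB2 tag")
        elif prefix == "I" and f"B-{entity}" not in labels:
            problems.append(f"index {index}: {label!r} has no matching B- tag")
    return problems


if __name__ == "__main__":
    label_map = build_label_map(["PERS", "LOC", "ORG"])
    print(json.dumps(label_map, indent=4))
    print(validate_label_map(label_map))
```

To write the file for a head, dump the mapping with `json.dump` to `ner_<name>.json` next to the matching `ner_<name>.bin`.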

Model Packaging

Once you have prepared the directory with all the required files, please follow these steps:

1. Fork this repo (if you haven't already).
2. Make sure that spaCy is installed in your active Python environment, at the same version as specified in `requirements.txt`.
3. Navigate to the repo's root.
4. Run `python setup.py develop`, which should create a `hebspacy.egg-info` directory.
5. Navigate to `scripts\model`.
6. Update `meta.json` accordingly (make sure to follow the spaCy package naming conventions).
7. Run `python package.py <RESOURCES_DIR> <OUTPUT_DIR>`, where `<RESOURCES_DIR>` should point to the directory with all the files from the previous section.
8. Run `python -m spacy package <OUTPUT_DIR> <WHEEL_DIR> --build wheel`.
9. Your `.whl` and `.tar.gz` files will be available under `<WHEEL_DIR>/<lang>_<name>-<version>/dist`.
10. Install the package by running `pip install XXXXX.whl`.
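To locate the build output without guessing at the directory name, the `<lang>_<name>-<version>` part can be derived from the `lang`, `name`, and `version` fields of `meta.json`. This is a minimal sketch; the helper name `expected_dist_dir` is hypothetical, and it assumes the `meta.json` you edited is the one the package was built from.

```python
import json
from pathlib import Path


def expected_dist_dir(meta_path, wheel_dir) -> Path:
    """Return <wheel_dir>/<lang>_<name>-<version>/dist based on meta.json."""
    meta = json.loads(Path(meta_path).read_text(encoding="utf-8"))
    package = f"{meta['lang']}_{meta['name']}-{meta['version']}"
    return Path(wheel_dir) / package / "dist"


def built_artifacts(meta_path, wheel_dir):
    """List the files found in the expected dist directory, if it exists."""
    dist = expected_dist_dir(meta_path, wheel_dir)
    return sorted(p.name for p in dist.iterdir()) if dist.is_dir() else []
```

Calling `built_artifacts("meta.json", "<WHEEL_DIR>")` after the packaging steps should list the wheel and source archive to pass to `pip install`.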
