REINDEER is a software package for structure-based protein-ligand feature generation.
Currently, REINDEER provides the following four feature vectors:
1- Occurrence of Interatomic Contact (OIC) - Ref
2- Distance-Weighted Interatomic Contact (DWIC) - Ref
3- Extended Connectivity Interaction Feature (ECIF) - Ref
4- Multi-Shell Occurrence of Interatomic Contact (MS-OIC) - Ref
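To give a feel for what the first two feature vectors measure, here is a minimal sketch that is not REINDEER's implementation. It assumes an OIC-like feature counts protein-ligand atom pairs within a distance cutoff and a DWIC-like feature sums 1 / d**exp over the same pairs. The function name `contact_features`, the element-pair keys, and the toy coordinates are all hypothetical, and the amino acid classes that REINDEER takes as input are ignored here.

```python
import math
from collections import Counter


def contact_features(protein_atoms, ligand_atoms, cutoff=12.0, exp=None):
    """Return {(protein_element, ligand_element): value} for pairs within cutoff.

    exp=None counts contacts (OIC-like); a numeric exp sums 1 / d**exp
    (DWIC-like). Both weightings are assumptions made for illustration.
    """
    features = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)  # Euclidean distance
            if 0.0 < d <= cutoff:
                features[(p_elem, l_elem)] += 1.0 if exp is None else 1.0 / d**exp
    return dict(features)


# Toy coordinates (hypothetical, not taken from a real complex).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("O", (30.0, 0.0, 0.0))]
ligand = [("C", (0.0, 4.0, 0.0)), ("O", (3.0, 4.0, 0.0))]

oic_like = contact_features(protein, ligand)           # plain contact counts
dwic_like = contact_features(protein, ligand, exp=2)   # inverse-square weights
```

The protein oxygen sits about 30 units from the ligand, so it falls outside the cutoff and contributes to neither dictionary.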
The project is still in progress. I will update the repository and the paper in the coming months. For now, you can cite the paper below:
REINDEER: A Protein-Ligand Feature Generator Software for Machine Learning Algorithms
Milad Rayka, [email protected]
1- First, install Python (3.9), then create a virtual environment and activate it.
python -m venv env
.\env\Scripts\activate
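Here, env is the directory in which the virtual environment is created. The activation command above is for Windows; on macOS or Linux, use source env/bin/activate instead.
2- Clone the reindeer_software GitHub repository.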
git clone https://github.com/miladrayka/reindeer_software.git
3- Change your directory to reindeer_software.
4- Install required packages with pip.
pip install -r requirements.txt
1- The provided protein-ligand complexes should include hydrogen atoms.
2- The protein and ligand file formats are .pdb and .mol2, respectively. For ECIF, an .sdf file should be provided instead of .mol2.
3- All protein-ligand complexes should be organized as in the example below:
./test
├── 1a1e
│ ├── 1a1e_ligand.mol2
│ ├── 1a1e_ligand.sdf
│ └── 1a1e_protein.pdb
├── 1a28
│ ├── 1a28_ligand.mol2
│ ├── 1a28_ligand.sdf
│ └── 1a28_protein.pdb
└── 1a30
  ├── 1a30_ligand.mol2
  ├── 1a30_ligand.sdf
  └── 1a30_protein.pdb
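Before generating features, it can help to confirm that a dataset follows this layout. The sketch below is not part of REINDEER. The helper name `missing_files` is hypothetical, and it assumes only the naming pattern shown above (`<id>_protein.pdb` and `<id>_ligand.<ext>` inside a folder named `<id>`). The demo builds a throwaway directory so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def missing_files(root, ligand_ext="mol2"):
    """Map each complex ID under root to the expected files it lacks."""
    problems = {}
    for complex_dir in sorted(Path(root).iterdir()):
        if not complex_dir.is_dir():
            continue
        pdb_id = complex_dir.name
        expected = [f"{pdb_id}_protein.pdb", f"{pdb_id}_ligand.{ligand_ext}"]
        missing = [name for name in expected if not (complex_dir / name).is_file()]
        if missing:
            problems[pdb_id] = missing
    return problems


# Demo on a temporary directory: 1a1e is complete for mol2, 1a28 lacks its ligand.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    demo = {"1a1e": ["1a1e_protein.pdb", "1a1e_ligand.mol2"], "1a28": ["1a28_protein.pdb"]}
    for pdb_id, names in demo.items():
        (root / pdb_id).mkdir()
        for name in names:
            (root / pdb_id / name).touch()
    report = missing_files(root)                         # check for .mol2 ligands
    report_sdf = missing_files(root, ligand_ext="sdf")   # check for ECIF inputs
```

Passing `ligand_ext="sdf"` checks for the ligand files that ECIF needs.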
REINDEER can generate features through a GUI, a CLI, or directly from Python code.
After changing your directory to reindeer_software, type the following command to run the GUI:
python ./gui_launcher.py
For an example, see the Tutorial file.
To access the CLI, type the following command (you should be in the reindeer_software directory):
python ./reindeer_software.py -h
The output looks like this:
usage: reindeer_software.py [-h] -m METHOD -d DIRECTORY -f FILE_NAME -n N_JOBS
Generate features for set of given structures
optional arguments:
-h, --help show this help message and exit
-m METHOD, --method METHOD
Feature generation method. Only OIC, DWIC, ECIF, and
MS-OIC are implemented for now.
-d DIRECTORY, --directory DIRECTORY
directory of structures files
-f FILE_NAME, --file_name FILE_NAME
Name for saving generated features.
-n N_JOBS, --n_jobs N_JOBS
Number of cpu cores for parallelization
Example for OIC:
python ./reindeer_software.py -m OIC -d ../test/ -f feature_vector_oic.csv -n -1
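To generate all four feature vectors in one go, the CLI can be driven from a short Python script. This is a minimal sketch, not part of REINDEER. The helpers `build_command` and `run_all` are hypothetical, the flags are the ones from the help output above, and the output file names other than feature_vector_oic.csv are made up for illustration.

```python
import subprocess
import sys

# Method names as listed in the CLI help text.
METHODS = ("OIC", "DWIC", "ECIF", "MS-OIC")


def build_command(method, directory, file_name, n_jobs=-1):
    """Build the argument list for one reindeer_software.py run."""
    if method not in METHODS:
        raise ValueError(f"Unknown method: {method!r}")
    return [
        sys.executable, "./reindeer_software.py",
        "-m", method,
        "-d", directory,
        "-f", file_name,
        "-n", str(n_jobs),
    ]


def run_all(commands):
    """Run each command in turn; stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# One command per method, e.g. feature_vector_ms_oic.csv for MS-OIC.
commands = [
    build_command(m, "../test/", f"feature_vector_{m.lower().replace('-', '_')}.csv")
    for m in METHODS
]
```

Calling `run_all(commands)` from inside the reindeer_software directory would then run the four jobs one after another. Remember that ECIF expects .sdf ligand files in the structure directory.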
REINDEER can also be used from within Python code.
Example for OIC:
from reindeer.feature_generators import oic_dwic
from reindeer.script import utils
oic = oic_dwic.InterAtomicContact(
    pathfiles="../test/",
    filename="oic_fv.csv",
    ligand_format="mol2",
    amino_acid_classes=utils.amino_acid_classes_OIC,
    cutoff=12.0,
    feature_type="OIC",
    exp=None,
)
The within_python_example.ipynb file provides examples of this usage.
CaseStudy.ipynb contains all the code needed to reproduce the case study section of the paper on Google Colab.
REINDEER is tested on the following system:
OS | RAM | CPU |
---|---|---|
Windows 10 | 8.00 GB | AMD FX-770K Quad Core Processor (3.5 GHz) |
We do not expect any problems when running REINDEER on macOS or Linux.
To ensure code quality and consistency, the following VSCode extensions are used during development:
- black
- isort
- pylance
- pylint
- flake8
- AI python docstring generators
The following repositories were used for the development of REINDEER:
- ECIF GitHub for the ECIF method.
- ET-Score and RF-Score for the DWIC and OIC methods.
- OnionNet-2 for the MS-OIC method.
Copyright (c) 2024, Milad Rayka