dockECR: open consensus docking and ranking protocol for virtual screening of small molecules
Authors: Rodrigo Ochoa, Karen Palacio-Rodriguez, Camila M. Clemente and Natalia Adler
The goal of dockECR is to implement multiple docking programs to predict scores between a single target or multiple targets (merging and shrinking aproach) and a library of compounds. The molecules are ranked based on the ECR ranking metric using as input the scores obtained by each program. The protocol is open source and requires to install third-party docking software, as well as information of the targets and ligands structures in PDB format
The script is written in python3 with system calls to the Ubuntu OS. The docking stand-alone programs can be put after installation in the folder auxiliar/software
(alternatively, the code can be modified to call the programs without local paths, in case they have been installed in the system path.). For instructions about how to download and install each program follow these websites:
- AutoDock Vina: http://vina.scripps.edu/download.html - stand-alone tools: vina
- Smina: https://sourceforge.net/projects/smina/files/ - stand-alone tools: smina.static
- LeDock: http://www.lephar.com/download.htm - stand-alone tools: ledock_linux_x86
- rDock: https://sourceforge.net/projects/rdock/files/ - stand-alone tools: rbcavity, rbdock and sdsort. To compile the program it is required gcc > 3 and < 6, plus additional requirements mentioned in the project website
- MGL Tools: http://mgltools.scripps.edu/downloads - stand-alone tools: pythonsh and the scripts prepare_ligand4.py and prepare_receptor4.py
- RDKit: http://rdkit.org - version for python3.
NOTE: To facilitate the compilation/installation of rDock and RDKit, it is recommended to create a conda (https://docs.conda.io/en/latest/) virtual environment. The packages can be installed by adding the corresponding channels after the environment is created:
conda config --add channels bioconda
conda config --add channels conda-forge
Then the packages are installed with:
conda install -c bioconda rdock
conda install -c conda-forge rdkit
In addition to these packages, it is required to install OpenBabel (https://sourceforge.net/projects/openbabel/), which can be obtained from the main OS package repository. The version 2.3 is required (the program called babel instead of obabel).
The workspace requires of the following folders:
auxiliar: Folder containing scripts and the stand-alone programs
target: Folder with the targets PDB files
ligands: Folder with the ligands PDB files
After running docking and ranking campaigns, the following additional folders will be created:
config: Folder that will store the configuration files created per program
results: Docking results per program used
ranks: Ranks per each target and docking program used, as well as the final ECR method.
An example of the script syntax is as follows::
python dockECR.py [-h] -l LIST_LIGANDS -s LIST_SOFTWARE -m MODE -t LIST_TARGETS
where the arguments are:
arguments:
-h, --help show this help message and exit
-l LIST_LIGANDS File with the list of ligands names available in the
ligand folder
-s LIST_SOFTWARE File with the list of software that will be included in
the consensus
-m MODE Mode the script will be run. Options: (i) docking, (ii)
ranking
-t LIST_TARGETS File with the list of the PDB structures used as targets
The script has two main modes. One is for running docking with any of the four programs mentioned in the list_software.txt
file, and the second is for ranking with same list of docking programs. In the following sections we show how to use the main python script. The examples provided in the code were run using the SARS-CoV-2 main protease with PDB id 6y2e and 5re4.
After preparing the list_software.txt
file, and the list_ligands.txt
file with the names of the PDB structures located in the ligands
folder, the command can be run as:
python dockECR.py -m docking -l list_ligands.txt -s list_software.txt -t list_targets.txt
NOTE: Examples of all the input files are included in the code.
For docking, we require to prepare a configuration file with the box center coordinates and the box size dimensions. The following is an example of a box located in the active site of the MPro structure 6y2e provided in the target
folder:
center_x: -11.6
center_y: 14.5
center_z: 68.3
size_x: 30
size_y: 30
size_z: 30
NOTE: The config file should be named config_[TARGET].txt, where the [TARGET] is the name given in the list_targets.txt file.
After running the protocol, the results with 10 poses per program are stored in the folder results/[program]
as files with extensions depending on the docking program output.
With the docking results available in the workspace, the same script can be called using the ranking mode to generate rankings per program from the pool of ligands used in the virtual screening. An example is provided here:
python dockECR.py -m ranking -l list_ligands.txt -s list_software.txt -t list_targets.txt
The list_targets.txt
file can contain the names of multiple targets, in order to do the ranking individually, and using a merging and shrinking approach (see Ref. Palacio-Rodriguez et. al. 2019) to combine the results of multiple version of the same target. The ECR ranking is calculated in both cases.
The ranking results are stored in the ranks
folder with a three-column format file, where the first column is the ranking position, the second column is the ligand id, and the third column the docking score:
1 mol3 -12.0649
2 mol2 -11.8435
3 mol1 -10.6486
With the rankings per software, it is possible to use the ECR method to generate a final consensus ranking. The results are sorted by the calculated ECR ranking in a file located in the ranks
folder with the following format:
1 mol1 1.05880939
2 mol3 0.94259210
3 mol2 0.66164496
NOTE: If multiple targets are included, an ECR ranking per target and for the combination of them is generated.
This final list can be used to prioritize candidates from a large number of molecules in a virtual screening campaign, using the outputs of multiple docking programs at the same time.
An example of the docking and ranking results for the two included systems is available in the folder: example_outputs
In case the protocol is useful for other research projects and require some advice, please contact us to the email: rodrigo.ochoa@udea.edu.co. If this protocol is implemented, please cite:
-
Ochoa, R., Palacio-Rodriguez, K., Clemente, C.M., & Adler, N.S. (2021). dockECR: Open consensus docking and ranking protocol for virtual screening of small molecules. Journal of Molecular Graphics and Modelling, 109, 108023.
-
Palacio-Rodriguez, K., Lans, I., Cavasotto, C. N., & Cossio, P. (2019). Exponential consensus ranking improves the outcome in docking and receptor ensemble docking. Scientific reports, 9(1), 1-14.