Batch Docking

Rowan's batch-docking workflow is a fast pipeline for scoring large ligand libraries against a single protein binding site. It is intentionally simpler than the full pose-refinement workflow: instead of building rich conformer ensembles and performing complex ligand filtering, it focuses on robust ligand preparation, standard AutoDock Vina docking, and efficient parallelization. It returns a single best docking score per compound for downstream triage.

Batch docking starts from a protein structure and a list of ligand structures given as SMILES strings. For each ligand SMILES, we generate a physically sensible three-dimensional starting geometry using the RDKit. Hydrogen atoms are added and multiple initial conformers are embedded and quickly optimized with the MMFF94 force field using parameters drawn from established best-practice docking protocols. From this small ensemble we select the lowest-energy conformer as the representative structure for docking.
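The ligand-preparation step can be sketched with the RDKit as follows. This is a minimal illustration, not Rowan's actual code: the function name `prepare_ligand` and the values for `num_confs`, `max_iters`, and `seed` are assumptions chosen for the example, since the exact parameters used in the workflow are not stated here.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def prepare_ligand(smiles, num_confs=10, max_iters=500, seed=42):
    """Return (mol, energy): a 3D molecule keeping only its lowest-energy
    MMFF94 conformer. All default values are illustrative assumptions."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for sensible 3D geometry

    # Embed a small ensemble of starting conformers.
    conf_ids = list(
        AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=seed)
    )
    if not conf_ids:
        raise RuntimeError(f"conformer embedding failed for {smiles!r}")

    # One (status, energy) pair per conformer; status -1 means the MMFF94
    # force field could not be set up for this molecule.
    results = AllChem.MMFFOptimizeMoleculeConfs(
        mol, maxIters=max_iters, mmffVariant="MMFF94"
    )
    usable = [
        (energy, cid)
        for cid, (status, energy) in zip(conf_ids, results)
        if status != -1
    ]
    if not usable:
        raise RuntimeError(f"MMFF94 parameters unavailable for {smiles!r}")

    # Keep only the lowest-energy conformer as the docking input.
    best_energy, best_id = min(usable)
    for cid in conf_ids:
        if cid != best_id:
            mol.RemoveConformer(cid)
    return mol, best_energy
```

Calling `prepare_ligand("CCO")` would return an ethanol molecule with explicit hydrogens and a single 3D conformer. Converting that structure to the PDBQT format read by Vina is a separate step not shown in this sketch.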

QVina2 (or a different AutoDock Vina-like executable requested by the user) is invoked with the predefined receptor, docking box, exhaustiveness, and number of threads. For each ligand, Rowan parses the Vina text output, extracts all reported binding affinities, and records the most favorable (lowest) score as that compound's docking result. If docking fails for a given SMILES, the error is caught and logged without interrupting the rest of the screen.
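As a rough sketch of the score-extraction step, the snippet below pulls affinities out of Vina-style text output. It assumes the executable prints the usual results table with four columns (mode, affinity in kcal/mol, and two RMSD values); the function names `parse_affinities` and `best_score` are hypothetical and not taken from Rowan's code.

```python
import re

# A result row holds four numbers: mode, affinity, rmsd l.b., rmsd u.b.
_ROW = re.compile(r"\d+\s+(-?\d+(?:\.\d+)?)\s+\d+(?:\.\d+)?\s+\d+(?:\.\d+)?")


def parse_affinities(vina_output):
    """Return every affinity (kcal/mol) found in the results table."""
    scores = []
    for line in vina_output.splitlines():
        match = _ROW.fullmatch(line.strip())
        if match:
            scores.append(float(match.group(1)))
    return scores


def best_score(vina_output):
    """Return the most favorable (lowest) affinity, or None if none was found."""
    scores = parse_affinities(vina_output)
    return min(scores) if scores else None


if __name__ == "__main__":
    sample = """\
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -7.5      0.000      0.000
   2         -7.2      1.834      2.911
   3         -6.9      2.105      4.370
"""
    print(best_score(sample))
```

Returning `None` rather than raising when no table is present lets the caller treat an empty result as a failed docking run and log it alongside other errors.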

To keep every core busy even on large machines, the machine's total core count is detected at runtime and divided into a pool of worker processes. Each worker is assigned a fixed number of internal Vina threads, so that multiple ligands are docked in parallel without oversubscribing the hardware. All workers operate in an isolated temporary directory that already contains the receptor PDBQT, ensuring consistent inputs and avoiding redundant preprocessing. As each job completes, its SMILES string, best score, and wall time are collected, and the results are stored in input order so they can be directly associated with the original input list. The final output is a vector of best docking scores, one per input compound, together with logs that record failures and timings.
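The scheduling logic described above might look roughly like the following sketch. Everything named here (`plan_workers`, `run_screen`, the `dock_one` callable, and the default of 4 threads per job) is a hypothetical stand-in; the source does not specify the real interface or thread count. The sketch assumes `dock_one` takes a SMILES string and returns a best score, raising an exception on failure.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, as_completed


def plan_workers(total_cores, threads_per_job):
    """Number of docking jobs that can run at once without oversubscription."""
    if threads_per_job < 1:
        raise ValueError("threads_per_job must be at least 1")
    return max(1, total_cores // threads_per_job)


def _timed(dock_one, smiles):
    """Run one job; return (score or None, wall seconds, error text or None)."""
    start = time.perf_counter()
    try:
        score, error = dock_one(smiles), None
    except Exception as exc:  # a failed ligand must not stop the screen
        score, error = None, str(exc)
    return score, time.perf_counter() - start, error


def run_screen(smiles_list, dock_one, threads_per_job=4, total_cores=None,
               executor_cls=ProcessPoolExecutor):
    """Dock every ligand in parallel; results come back in input order
    as (smiles, score, seconds, error) tuples."""
    cores = total_cores or os.cpu_count() or 1
    n_workers = plan_workers(cores, threads_per_job)

    results = [None] * len(smiles_list)
    with executor_cls(max_workers=n_workers) as pool:
        # Remember each job's input position so completion order does not matter.
        futures = {
            pool.submit(_timed, dock_one, smiles): index
            for index, smiles in enumerate(smiles_list)
        }
        for future in as_completed(futures):
            index = futures[future]
            score, seconds, error = future.result()
            results[index] = (smiles_list[index], score, seconds, error)
    return results
```

On a 32-core machine with 4 threads per job, `plan_workers(32, 4)` gives 8 concurrent workers. With the default process pool, `dock_one` must be a picklable top-level function; the `executor_cls` parameter is included only so the sketch can be exercised with a thread pool or another executor.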