Many medicinally relevant molecules exist as multiple tautomers, and knowing which tautomer predominates can be key to subsequent computational tasks: for instance, Hu et al. found in 2016 that correct assignment of tautomeric state dramatically improved relative binding affinity predictions. Rowan's tautomer prediction workflow uses machine-learned interatomic potentials to predict the relative stability of different tautomers quickly and with minimal empirical parameterization.
Tautomers are generated using RDKit, and a subsequent conformational search is performed on each tautomer. The low-energy conformers are collected and optimized using AIMNet2, and the gas-phase Gibbs free energy from AIMNet2 is combined with CPCM-X aqueous solvation free energy predictions from GFN2-xTB to yield a final solvated free energy for each tautomer.
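The energy bookkeeping in this step can be sketched as below. This is a minimal illustration, not Rowan's implementation: it assumes all energies are already in kcal/mol, and it assumes each tautomer is represented by its lowest-energy conformer (the workflow may instead average over the conformer ensemble). The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Conformer:
    """One optimized conformer of a tautomer (all energies in kcal/mol)."""

    g_gas: float    # gas-phase Gibbs free energy, e.g. from AIMNet2
    dg_solv: float  # aqueous solvation free energy, e.g. GFN2-xTB with CPCM-X

    @property
    def g_solvated(self) -> float:
        # Solvated free energy = gas-phase free energy + solvation free energy
        return self.g_gas + self.dg_solv


def tautomer_free_energy(conformers: list[Conformer]) -> float:
    """Solvated free energy of one tautomer.

    Assumption (hypothetical): each tautomer is represented by its
    lowest-energy conformer.
    """
    if not conformers:
        raise ValueError("a tautomer needs at least one conformer")
    return min(c.g_solvated for c in conformers)


# Illustrative numbers only, not real AIMNet2 or xTB output
confs = [Conformer(g_gas=-100.0, dg_solv=-5.0), Conformer(g_gas=-101.0, dg_solv=-3.5)]
print(tautomer_free_energy(confs))  # -105.0
```

Note that the conformer with the lower gas-phase energy is not necessarily the one with the lower solvated energy, which is why the solvation term is added before selecting the minimum.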
These energies are then corrected through a linear free-energy relationship and converted to relative free energies. Rowan ranks the tautomers by free energy and reports the Boltzmann weight of each one to enable easy downstream analysis.
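A sketch of the correction, ranking, and weighting steps follows. The slope and intercept of the linear free-energy relationship are hypothetical placeholders, not Rowan's fitted values, and the temperature of 298.15 K is an assumption. Boltzmann weights are computed as $w_i = e^{-\Delta G_i/RT} / \sum_j e^{-\Delta G_j/RT}$ from the relative free energies.

```python
from __future__ import annotations

import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def lfer_correct(g: float, slope: float, intercept: float) -> float:
    # Linear free-energy relationship: G_corr = slope * G + intercept.
    # slope and intercept are hypothetical fit parameters.
    return slope * g + intercept


def relative_free_energies(energies: dict[str, float]) -> dict[str, float]:
    # Shift so the most stable tautomer sits at 0 kcal/mol
    g_min = min(energies.values())
    return {name: g - g_min for name, g in energies.items()}


def boltzmann_weights(rel: dict[str, float], temperature: float = 298.15) -> dict[str, float]:
    # Normalized Boltzmann populations from relative free energies (kcal/mol)
    rt = R_KCAL * temperature
    factors = {name: math.exp(-dg / rt) for name, dg in rel.items()}
    total = sum(factors.values())
    return {name: f / total for name, f in factors.items()}


# Illustrative solvated free energies (kcal/mol), not real predictions
raw = {"keto": -105.0, "enol": -103.0}
corrected = {k: lfer_correct(v, slope=0.8, intercept=1.0) for k, v in raw.items()}
rel = relative_free_energies(corrected)
weights = boltzmann_weights(rel)
ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking, weights)
```

Because only relative free energies enter the weights, the intercept of a linear correction cancels out; only the slope changes the predicted populations.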
On the aqueous subset of the TautoBase benchmark set, Rowan's tautomer workflow achieves a mean absolute error (MAE) of 2.10 kcal/mol and a root-mean-square error (RMSE) of 2.99 kcal/mol. This is comparable to the performance of high-level quantum chemical methods reported by Chodera and co-workers: B3LYP/cc-pVTZ/SMD(water) was reported to give an RMSE of 3.1 kcal/mol vs. TautoBase (on a slightly smaller subset).
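For readers reproducing these statistics on their own data, the two error metrics can be computed as below. The input values in the example are made up for illustration and are not TautoBase data.

```python
from __future__ import annotations

import math
from typing import Sequence


def error_metrics(predicted: Sequence[float], experimental: Sequence[float]) -> tuple[float, float]:
    """Return (MAE, RMSE) in the units of the inputs (here kcal/mol)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - e for p, e in zip(predicted, experimental)]
    mae = sum(abs(x) for x in errors) / len(errors)
    rmse = math.sqrt(sum(x * x for x in errors) / len(errors))
    return mae, rmse


# Illustrative values only
mae, rmse = error_metrics([1.0, -2.0, 3.0], [0.0, 0.0, 0.0])
print(mae, rmse)  # MAE = 2.0, RMSE ≈ 2.16
```

RMSE is always at least as large as MAE and penalizes large errors more heavily, which is why the RMSE reported above exceeds the MAE.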
A more relevant benchmark for real-world usage is classification accuracy: how often does Rowan predict the correct lowest-energy tautomer? On the full dataset, Rowan predicts the correct tautomer 89% of the time. Some of these comparisons are not particularly challenging, but even for the harder compounds with an experimental ΔΔG of less than 3 kcal/mol (shown in red), Rowan is still correct 77% of the time.
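The accuracy metric, including the restriction to hard cases, can be sketched as follows. The `Case` record and its fields are hypothetical, and the example cases are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Case:
    predicted_best: str     # tautomer the model ranks lowest in free energy
    experimental_best: str  # experimentally preferred tautomer
    exp_ddg: float          # experimental ΔΔG between the tautomers, kcal/mol


def accuracy(cases: list[Case], max_ddg: float | None = None) -> float:
    # Optionally restrict to "hard" cases with experimental |ΔΔG| below max_ddg
    subset = [c for c in cases if max_ddg is None or abs(c.exp_ddg) < max_ddg]
    if not subset:
        raise ValueError("no cases in the selected subset")
    correct = sum(c.predicted_best == c.experimental_best for c in subset)
    return correct / len(subset)


# Invented cases for illustration
cases = [
    Case("A", "A", 5.0),
    Case("B", "A", 1.0),
    Case("A", "A", 2.0),
    Case("B", "B", 4.0),
]
print(accuracy(cases))               # 0.75 on all cases
print(accuracy(cases, max_ddg=3.0))  # 0.5 on the hard subset
```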
Rowan's tautomer workflow can be run in three modes: careful, rapid, or reckless. Each mode sets the following parameters:
| Mode | Careful | Rapid | Reckless |
|---|---|---|---|
| Number of initial conformers | 250 | 100 | 50 |
| Initial energy cutoff (kcal/mol) | 15 | 10 | 5 |
| RMSD similarity cutoff (Å) | 0.10 | 0.25 | 0.50 |
| Max. number of conformers (xTB) | 20 | 10 | 3 |
| Final energy cutoff (kcal/mol) | 5 | 5 | 3 |
| Max. number of conformers (AIMNet2) | 10 | 3 | 1 |
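The mode settings in the table can be expressed as a configuration object, together with a sketch of how an energy cutoff and a conformer cap might be applied. The field names and the `prune` helper are hypothetical; the order in which Rowan applies the cutoffs is assumed here, and RMSD-based deduplication is omitted because it requires 3D coordinates.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSettings:
    n_initial_conformers: int
    initial_energy_cutoff: float  # kcal/mol
    rmsd_cutoff: float            # Å
    max_xtb_conformers: int
    final_energy_cutoff: float    # kcal/mol
    max_aimnet2_conformers: int


# Values taken from the mode table; field names are hypothetical.
MODES = {
    "careful": ModeSettings(250, 15.0, 0.10, 20, 5.0, 10),
    "rapid": ModeSettings(100, 10.0, 0.25, 10, 5.0, 3),
    "reckless": ModeSettings(50, 5.0, 0.50, 3, 3.0, 1),
}


def prune(energies: list[float], cutoff: float, max_keep: int) -> list[float]:
    """Keep conformer energies within `cutoff` kcal/mol of the minimum,
    lowest first, and at most `max_keep` of them (assumed order of operations)."""
    e_min = min(energies)
    kept = sorted(e for e in energies if e - e_min <= cutoff)
    return kept[:max_keep]


# Illustrative relative conformer energies (kcal/mol)
energies = [0.0, 2.0, 4.0, 6.0, 20.0]
reckless = MODES["reckless"]
print(prune(energies, reckless.final_energy_cutoff, reckless.max_aimnet2_conformers))  # [0.0]
print(prune(energies, MODES["rapid"].initial_energy_cutoff, MODES["rapid"].max_xtb_conformers))
```

Tighter cutoffs and smaller caps in the faster modes shrink the number of conformers that reach the expensive optimization steps, trading some risk of missing the true lowest-energy conformer for speed.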