Tautomer Search

Many medicinally relevant molecules exist as multiple tautomers, and knowing which tautomer predominates can be key to subsequent computational tasks. For instance, Hu et al. found in 2016 that correct assignment of tautomeric state dramatically improved relative binding affinity predictions. Rowan's tautomer prediction workflow uses machine-learned interatomic potentials to enable fast and minimally empirical prediction of the relative stability of different tautomers.

How It Works

Tautomers are generated using RDKit, and a subsequent conformational search is performed on each tautomer. The low-energy conformers are collected and optimized using AIMNet2, and the gas-phase Gibbs free energy from AIMNet2 is combined with CPCM-X aqueous solvation free energy predictions from GFN2-xTB to yield a final solvated free energy for each tautomer.

These energies are then corrected through a linear free-energy relationship and converted to relative free energies. Rowan ranks and scores the tautomers and reports the Boltzmann weight of each one to enable easy downstream analysis.
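As a rough illustration of these last steps, the sketch below turns a list of solvated free energies into relative free energies and Boltzmann weights. The function name, the temperature of 298.15 K, the example energies, and the placeholder `slope` and `intercept` of the linear correction are all assumptions made for illustration; they are not Rowan's fitted parameters or API.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_weights(free_energies, temperature=298.15, slope=1.0, intercept=0.0):
    """Return (relative free energies, Boltzmann weights) for a set of tautomers.

    free_energies: solvated free energies in kcal/mol, one per tautomer.
    slope, intercept: placeholder linear correction (hypothetical values;
    the defaults apply no correction at all).
    """
    corrected = [slope * g + intercept for g in free_energies]
    g_min = min(corrected)
    relative = [g - g_min for g in corrected]
    factors = [math.exp(-dg / (R_KCAL * temperature)) for dg in relative]
    total = sum(factors)
    return relative, [f / total for f in factors]


# Illustrative energies only (kcal/mol), not real results.
relative, weights = boltzmann_weights([-120.0, -118.6, -115.0])
for dg, w in zip(relative, weights):
    print(f"{dg:5.2f} kcal/mol  weight {w:.3f}")
```

Because the weights depend exponentially on the relative free energy, a tautomer a few kcal/mol above the minimum contributes very little at room temperature.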

Accuracy

On the aqueous subset of the TautoBase benchmark set, Rowan's tautomer workflow displays a mean absolute error of 2.10 kcal/mol and a root mean squared error of 2.99 kcal/mol. This is comparable to the performance of high-level quantum chemical methods reported by Chodera and co-workers: B3LYP/cc-pVTZ/SMD(water) was reported to give an RMSE of 3.1 kcal/mol vs. TautoBase (on a slightly smaller subset).
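For readers who want to compute the same two error metrics on their own data, here is a minimal sketch. The function name and the sample values are hypothetical and are not taken from TautoBase.

```python
import math


def mae_and_rmse(predicted, experimental):
    """Return (MAE, RMSE) between two equal-length lists of values in kcal/mol."""
    if len(predicted) != len(experimental):
        raise ValueError("inputs must have the same length")
    errors = [p - e for p, e in zip(predicted, experimental)]
    mae = sum(abs(x) for x in errors) / len(errors)
    rmse = math.sqrt(sum(x * x for x in errors) / len(errors))
    return mae, rmse


# Illustrative numbers only, not benchmark data.
mae, rmse = mae_and_rmse([1.0, -2.0, 4.0], [0.0, 0.0, 0.0])
print(f"MAE = {mae:.2f} kcal/mol, RMSE = {rmse:.2f} kcal/mol")
```

The RMSE is never smaller than the MAE, and the gap between them grows when a few predictions have large errors.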

A more relevant benchmark for real-world usage is classification accuracy: how often does Rowan predict the correct lowest-energy tautomer? On the full dataset, Rowan predicts the correct tautomer 89% of the time. Some of these comparisons are not particularly challenging, but even for the harder cases, compounds with an experimental ∆∆G of less than 3 kcal/mol (shown in red), Rowan is still correct 77% of the time.
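One way to score this kind of classification is sketched below. It assumes each benchmark entry is a tautomer pair and counts a prediction as correct when the predicted and experimental ∆∆G have the same sign; that scoring rule, the function name, and the sample values are assumptions for illustration rather than a description of how the figures above were computed.

```python
def classification_accuracy(predicted, experimental, max_abs_ddg=None):
    """Fraction of tautomer pairs whose predicted ddG has the same sign as experiment.

    If max_abs_ddg is given, only pairs with |experimental ddG| below it are scored.
    """
    pairs = [
        (p, e)
        for p, e in zip(predicted, experimental)
        if max_abs_ddg is None or abs(e) < max_abs_ddg
    ]
    if not pairs:
        raise ValueError("no pairs to score")
    correct = sum(1 for p, e in pairs if (p > 0) == (e > 0))
    return correct / len(pairs)


# Illustrative numbers only (kcal/mol), not benchmark data.
pred = [5.2, -0.8, 1.1, -6.0]
expt = [4.0, 1.5, 2.0, -7.5]
overall = classification_accuracy(pred, expt)
hard = classification_accuracy(pred, expt, max_abs_ddg=3.0)
print(f"all pairs: {overall:.0%}, |ddG| < 3 kcal/mol: {hard:.0%}")
```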

Figure: Our performance on SAMPL7

Modes

Rowan's tautomer workflow can be run in three modes: careful, rapid, or reckless. Here's what selecting each mode tunes:

| Mode | Careful | Rapid | Reckless |
| --- | --- | --- | --- |
| number of initial conformations | 250 | 100 | 50 |
| initial energy cutoff (kcal/mol) | 15 | 10 | 5 |
| RMSD similarity cutoff (Å) | 0.10 | 0.25 | 0.50 |
| max number of conformers (xTB) | 20 | 10 | 3 |
| final energy cutoff (kcal/mol) | 5 | 5 | 3 |
| max number of conformers (AIMNet2) | 10 | 3 | 1 |
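
For scripting purposes, the per-mode settings can be captured in a small lookup structure. The sketch below is only one possible representation: the class name, field names, and dictionary are hypothetical and are not part of Rowan's API, while the numeric values are those listed for each mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSettings:
    initial_conformations: int
    initial_energy_cutoff: float  # kcal/mol
    rmsd_cutoff: float  # angstrom
    max_conformers_xtb: int
    final_energy_cutoff: float  # kcal/mol
    max_conformers_aimnet2: int


# Hypothetical container; values copied from the mode settings above.
MODES = {
    "careful": ModeSettings(250, 15.0, 0.10, 20, 5.0, 10),
    "rapid": ModeSettings(100, 10.0, 0.25, 10, 5.0, 3),
    "reckless": ModeSettings(50, 5.0, 0.50, 3, 3.0, 1),
}

print(MODES["rapid"])
```

Moving from careful to reckless shrinks every stage of the search: fewer starting conformations, tighter energy windows, a looser RMSD cutoff for treating conformers as duplicates, and fewer conformers passed on to xTB and AIMNet2.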