Hydrogen-Bond-Basicity Prediction

Understanding and controlling the strength of hydrogen bonds is key to modulating a molecule's interactions and physicochemical properties. Rowan's hydrogen-bond-basicity prediction workflow uses neural network potentials, low-cost quantum chemical methods, and molecular electrostatic potential calculations to efficiently and robustly predict the strength of different hydrogen-bond acceptors.

How It Works

Rowan's hydrogen-bond-basicity workflow begins by running a conformer search on the input molecule with the ETKDG algorithm. After an initial MMFF94 optimization, we filter the resulting conformational ensemble using the CREST screening protocol and GFN2-xTB energies to de-duplicate structures and remove high-energy conformers. For CREST screening, we use a 2% rotational-constant threshold, a 0.25 Å RMSD similarity threshold, and a 50 kcal/mol energy window. We then optimize and score the output conformers with the AIMNet2 neural network potential.
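The filtering step can be illustrated with a minimal sketch. This is not CREST's actual implementation: it assumes that two structures count as duplicates when all three rotational constants agree within a relative 2% and their RMSD is below 0.25 Å, and that the RMSD comes from a user-supplied function. The `Conformer` class, the `screen` function, and the toy values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Conformer:
    name: str
    energy: float  # kcal/mol, e.g. a GFN2-xTB energy
    rot_consts: Tuple[float, float, float]  # any consistent unit


def rot_consts_match(a: Conformer, b: Conformer, rel_tol: float) -> bool:
    """True if every rotational constant agrees within the relative tolerance."""
    return all(
        abs(x - y) <= rel_tol * max(abs(x), abs(y))
        for x, y in zip(a.rot_consts, b.rot_consts)
    )


def screen(
    conformers: List[Conformer],
    rmsd: Callable[[Conformer, Conformer], float],
    energy_window: float = 50.0,  # kcal/mol above the lowest-energy conformer
    rmsd_tol: float = 0.25,       # angstrom
    rot_tol: float = 0.02,        # relative, i.e. 2%
) -> List[Conformer]:
    """Drop high-energy conformers, then drop duplicates of lower-energy ones."""
    if not conformers:
        return []
    ranked = sorted(conformers, key=lambda c: c.energy)
    e_min = ranked[0].energy
    kept: List[Conformer] = []
    for conf in ranked:
        if conf.energy - e_min > energy_window:
            break  # everything after this point is even higher in energy
        is_duplicate = any(
            rot_consts_match(conf, k, rot_tol) and rmsd(conf, k) < rmsd_tol
            for k in kept
        )
        if not is_duplicate:
            kept.append(conf)
    return kept


# Toy ensemble with made-up numbers; "b" duplicates "a", "d" is too high in energy.
ensemble = [
    Conformer("a", 0.0, (1.000, 0.500, 0.400)),
    Conformer("b", 0.3, (1.005, 0.502, 0.401)),
    Conformer("c", 2.0, (1.200, 0.450, 0.380)),
    Conformer("d", 60.0, (0.900, 0.600, 0.350)),
]
# Precomputed pairwise RMSDs (angstrom), also made up for the example.
toy_rmsd = {
    frozenset(("a", "b")): 0.10,
    frozenset(("a", "c")): 1.50,
    frozenset(("b", "c")): 1.45,
}
kept = screen(ensemble, lambda p, q: toy_rmsd[frozenset((p.name, q.name))])
print([c.name for c in kept])  # ['a', 'c']
```

Sorting by energy first means that each group of duplicates is represented by its lowest-energy member.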

We then locate electrostatic potential minima in the region of each lone pair of the lowest-energy conformer, using the low-cost r2SCAN-3c composite density-functional-theory method. pKBHX values are predicted by linearly scaling Vmin, the value of the molecular electrostatic potential at the minimum, with a slope and intercept specific to each functional group.
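As a minimal sketch of the linear scaling, the snippet below applies the carbonyl slope and intercept from the table of fitted constants later on this page. The function name `predict_pkbhx` and the Vmin value are hypothetical; the Vmin is a placeholder, not a computed result.

```python
def predict_pkbhx(v_min: float, slope: float, intercept: float) -> float:
    """Linear scaling of Vmin (in E_H/e) with a slope in e/E_H."""
    return slope * v_min + intercept


# Carbonyl constants copied from the table of fitted scaling constants.
CARBONYL_SLOPE = -57.2911
CARBONYL_INTERCEPT = -3.5271

v_min = -0.08  # hypothetical electrostatic potential minimum, in E_H/e
pkbhx = predict_pkbhx(v_min, CARBONYL_SLOPE, CARBONYL_INTERCEPT)
print(round(pkbhx, 3))  # 1.056
```

Because the slopes are negative, a deeper (more negative) electrostatic potential minimum gives a larger predicted pKBHX, which corresponds to a stronger hydrogen-bond acceptor.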

An example molecular electrostatic potential

Visualization of the −0.04 $E_\text{H}/e$ electrostatic potential isosurface at the r2SCAN-3c level of theory for an example drug-like molecule.

To determine the scaling constants, we performed Levenberg–Marquardt least-squares fitting against a database of experimentally measured pKBHX values. The fit predicts each measured per-molecule pKBHX value by combining the predicted pKBHX values from each distinct electrostatic potential minimum (up to three per hydrogen-bond acceptor). This procedure gave the following scaling constants:

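| Functional Group | Number | Slope ($e/E_\text{H}$) | Intercept | MAE | RMSE |
|---|---|---|---|---|---|
| Amine | 171 | -34.4386 | -1.4884 | 0.212 | 0.324 |
| Aromatic N | 71 | -52.8126 | -3.1376 | 0.113 | 0.150 |
| Imine | 28 | -48.4007 | -2.3309 | 0.180 | 0.236 |
| Nitrile | 28 | -50.1167 | -3.2273 | 0.144 | 0.198 |
| N-oxide | 16 | -74.3261 | -4.4159 | 0.455 | 0.589 |
| Chalcogen oxide | 17 | -47.7009 | -2.2794 | 0.186 | 0.224 |
| Pnictogen oxide | 16 | -61.1141 | -3.3839 | 0.437 | 0.549 |
| Carbonyl | 128 | -57.2911 | -3.5271 | 0.160 | 0.208 |
| Ether/hydroxyl | 99 | -35.9245 | -2.0338 | 0.188 | 0.239 |
| Thiocarbonyl | 10 | -51.8837 | -2.2649 | 0.330 | 0.384 |
| Divalent S | 17 | -39.1666 | -2.1243 | 0.086 | 0.127 |
| Aromatic O | 11 | -35.9245 | -2.0338 | 0.125 | 0.158 |
| Fluorine | 23 | -16.4441 | -1.2540 | 0.202 | 0.276 |
| Total | 434 | | | 0.188 | 0.270 |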
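To illustrate how a slope, an intercept, and the MAE and RMSE columns relate to one another, here is a simplified sketch. It swaps in closed-form ordinary least squares for the Levenberg–Marquardt procedure described above, which is only equivalent in spirit when each molecule contributes a single electrostatic potential minimum. The data are synthetic, generated from the nitrile constants in the table rather than taken from experiment, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def fit_line(v_mins: List[float], pkbhx_values: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for pKBHX = slope * Vmin + intercept."""
    n = len(v_mins)
    if n < 2 or n != len(pkbhx_values):
        raise ValueError("need at least two (Vmin, pKBHX) pairs of equal length")
    mean_v = sum(v_mins) / n
    mean_y = sum(pkbhx_values) / n
    sxx = sum((v - mean_v) ** 2 for v in v_mins)
    if sxx == 0.0:
        raise ValueError("all Vmin values are identical; slope is undefined")
    sxy = sum((v - mean_v) * (y - mean_y) for v, y in zip(v_mins, pkbhx_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_v


def mae_rmse(
    v_mins: List[float], pkbhx_values: List[float], slope: float, intercept: float
) -> Tuple[float, float]:
    """Mean absolute error and root-mean-square error of the linear model."""
    residuals = [slope * v + intercept - y for v, y in zip(v_mins, pkbhx_values)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return mae, rmse


# Synthetic, noise-free data built from the nitrile row of the table.
synthetic_v = [-0.070, -0.075, -0.080, -0.085, -0.090]
synthetic_y = [-50.1167 * v - 3.2273 for v in synthetic_v]

slope, intercept = fit_line(synthetic_v, synthetic_y)
mae, rmse = mae_rmse(synthetic_v, synthetic_y, slope, intercept)
print(round(slope, 4), round(intercept, 4))
```

With noise-free input the fit recovers the constants it was generated from and both error metrics are essentially zero. Real experimental data would leave residuals of the size reported in the MAE and RMSE columns.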

For a more detailed discussion of outliers and potential sources of error, see the preprint.