Pose-Analysis Molecular Dynamics

This workflow constructs and simulates a protein–ligand complex in explicit solvent, then quantifies pose stability and key protein–ligand contacts from molecular dynamics trajectories. The implementation is fully automated but follows standard practices that a general chemistry audience will recognize.

System Preparation

Rowan's pose-analysis MD workflow begins from a holo crystal structure and an associated ligand SMILES string. The ligand is first separated from the protein and converted to a chemically consistent representation. Stereochemistry is assigned from the three-dimensional coordinates, and the resulting model is converted into an OpenFF Molecule. All ligand atoms are tagged with a dedicated residue name (typically "LIG") so that the ligand can be cleanly identified throughout topology construction, simulation setup, and trajectory analysis. Force field parameters for the ligand are obtained using the SMIRNOFF formalism with the OpenFF 2.2.1 small-molecule force field; the SMIRNOFF template is registered with OpenMM so that the ligand is treated on the same footing as the protein and solvent during system creation.

Pruning

To reduce system size while preserving the local environment of interest, we optionally prune the protein around the ligand. In this mode, we read the input PDB into an internal PDB object and compute the minimum distance between each protein residue and the ligand atoms. Only residues with at least one atom within a user-defined cutoff are retained. For each protein chain, the retained residues are sorted, broken into contiguous sequence segments, and regrouped so that very short segments are merged into larger neighboring segments. Each contiguous segment is then assigned a new chain identifier, yielding a compact protein model focused on the binding site but maintaining chemically sensible chains.

Whether pruned or not, the resulting structure is passed through PDBFixer to identify and reconstruct missing heavy atoms, add missing atoms where needed, and add hydrogens appropriate for pH 7.0. The cleaned structure is exported and used as the starting protein coordinates and topology.

The simulation system is constructed by combining this prepared protein with the parameterized ligand and solvating the complex. Rowan uses AMBER ff14SB for the protein and TIP3P water, together with the OpenFF-derived SMIRNOFF parameters for the ligand. The ligand topology and coordinates are added directly to the protein using the OpenMM Modeller interface. The complex is then solvated in a rectangular box of explicit water with a buffer region around the solute and neutralizing ions plus additional ions to match the target ionic strength (typically 0.1 M). The final solvated system is converted into an OpenMM system with the following default settings:

Particle mesh Ewald (PME) electrostatics
A real-space nonbonded cutoff in the range of 8 Å
SHAKE-style constraints on bonds involving hydrogens when requested
Rigid water
Modest hydrogen mass repartitioning (1.5 amu) to facilitate stable integration
And a 2 fs timestep. These settings can be tuned by the end user as requested.

To prevent unphysical drift of distal regions while allowing local relaxation around the ligand, we impose positional restraints on selected protein backbone atoms. Specifically, we identify C_α atoms belonging to non-ligand residues and compute their minimum distance to any ligand atom. Atoms closer than a short cutoff are left unrestrained to allow the binding site to relax. Atoms beyond this region are restrained to their initial positions using a harmonic potential with a tunable force constant. This strategy preserves the integrity of the overall fold and boundary of the truncated system while leaving the binding pocket and ligand environment flexible on simulation timescales.

Running Molecular Dynamics

Molecular dynamics simulations are run using OpenMM in a GPU-accelerated environment. After system construction, we initialize a Langevin integrator (with the Langevin middle scheme) at the desired temperature, typically 300 K, with a user-defined friction timescale and 2 fs timestep. The solvated, restrained system is subjected to energy minimization until convergence or a maximum number of iterations is reached. The minimized coordinates are written out, and a solvent-stripped version (protein plus ligand only) is saved separately for downstream structural reference and database storage. The minimized system is then gently heated: velocities are initialized at a low temperature and increased in stages to the target temperature using short MD segments, adjusting the integrator temperature stepwise. Following heating, we perform isothermal–isobaric (NPT) equilibration by adding a Monte Carlo barostat (1 atm, 300 K) and propagating dynamics for the requested equilibration time. The equilibrated state (positions, velocities, and box vectors) is saved and reused as a common starting point for production simulations.

Production dynamics are carried out as multiple independent replicas to probe the robustness of the pose. For each replica, the saved equilibrated state is reloaded, the simulation time and step counters are reset, and a new trajectory is generated for the specified production length. Trajectories are recorded at regular intervals in the compact DCD format.

Trajectory Analysis

After each simulation, trajectories are post-processed to focus on the behavior of the complex: water and ions are removed, periodic images are treated so that the ligand remains centered and continuous, and the protein is used as the alignment frame. The heavy-atom RMSD of the ligand (relative to the first frame) is computed without refitting on the ligand itself, providing a straightforward measure of pose stability over time for each replicate.

Finally, we quantify persistent protein–ligand interactions using a simple occupancy-based contact analysis. For each frame, we compute all pairwise distances between protein and ligand atoms (typically restricted to heavy atoms) and record contacts where the distance falls below a chosen cutoff (e.g., 3.0 Å). For every protein–ligand atom pair, we evaluate the fraction of frames in which it is in contact; pairs that exceed a minimum occupancy threshold (for example, 20% of frames) are retained as meaningful contacts.

The result is a concise description of the pose in terms of both geometric stability (ligand RMSD) and frequently maintained interactions (high-occupancy contacts), returned for each replica along with the corresponding trajectories and minimized structure identifiers. This combination of restrained, focused explicit-solvent MD and systematic analysis is designed to provide a practical, interpretable assessment of docking poses or initial models within a computationally efficient but chemically transparent framework.

Final outputs can be viewed through Rowan's web-based trajectory viewer or retrieved throguh Rowan's Python API.