Protein–Ligand Co-Folding

Rowan's co-folding workflow provides a unified interface to several modern structure-prediction and co-folding models—Chai-1r, Boltz-1, and Boltz-2—and wraps them in a way that is reproducible, pose-aware for ligands, and directly integrated into Rowan's data model. Conceptually, it takes a mixture of protein, DNA, RNA, and small-molecule inputs; encodes them into the model-specific format; runs the chosen model on dedicated GPU infrastructure; and then parses the outputs into structured confidence metrics, optional affinity scores, and (when ligands are present) validated poses suitable for downstream physics-based refinement.

Input Construction

The workflow begins from a user-specified co-folding input, which may contain multiple protein sequences, nucleic acid sequences, and an optional list of ligand SMILES, together with configuration settings such as the co-folding model and whether to compute ligand affinity or refine poses. All polymer sequences and ligands are mapped to concise chain identifiers (A, B, C, etc.) using one consistent scheme across proteins, DNA, RNA, and small molecules.
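The chain-labeling step can be sketched as follows. This is a minimal illustration, not Rowan's implementation: the function names, the ordering (proteins, then DNA, then RNA, then ligands), and the rollover to two-letter labels after Z are all assumptions made for the example.

```python
from string import ascii_uppercase


def chain_label(index: int) -> str:
    """Return A..Z, then AA, AB, ... for a zero-based index (assumed scheme)."""
    label = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        label = ascii_uppercase[rem] + label
    return label


def assign_chain_ids(proteins, dnas, rnas, ligand_smiles):
    """Assign chain IDs in a fixed order so the mapping is deterministic.

    The ordering used here is an assumption for illustration only.
    Returns a list of (chain_id, kind, value) tuples.
    """
    entities = (
        [("protein", s) for s in proteins]
        + [("dna", s) for s in dnas]
        + [("rna", s) for s in rnas]
        + [("ligand", s) for s in ligand_smiles]
    )
    return [(chain_label(i), kind, value) for i, (kind, value) in enumerate(entities)]
```

Because the mapping depends only on the order of the inputs, the same input always yields the same chain IDs, which is what allows a ligand to be located again in the predicted structure.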

For Chai-1r, the input is assembled into a multi-record FASTA file in which each sequence is labeled by type and index; small-molecule ligands, if present, are included as "ligand" entries by SMILES.
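A minimal sketch of this FASTA assembly is shown below. It assumes the ">type|name=..." header convention used in Chai-1's public examples; whether the Rowan workflow emits exactly these labels is not stated here, and the function name and the "kind-index" naming pattern are hypothetical.

```python
def build_chai_fasta(entities):
    """Build a multi-record FASTA string.

    entities: list of (kind, value) pairs, where kind is one of
    "protein", "dna", "rna", or "ligand", and value is a sequence
    (for polymers) or a SMILES string (for ligands).
    The header layout is an assumption based on Chai-1's public examples.
    """
    counters = {}
    records = []
    for kind, value in entities:
        # Index each entity within its own type: protein-1, protein-2, ...
        counters[kind] = counters.get(kind, 0) + 1
        records.append(f">{kind}|name={kind}-{counters[kind]}\n{value}")
    return "\n".join(records) + "\n"


fasta_text = build_chai_fasta([("protein", "MKTAYIAKQR"), ("ligand", "CCO")])
print(fasta_text)
```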

For Boltz-based models, the workflow constructs a Boltz-compatible YAML specification encoding all polymer chains, their sequences and cyclicity, and all unique ligand SMILES with their associated chain IDs. Optional contact and pocket constraints specified at the Rowan level—e.g., residue–residue contacts, ligand–residue contacts, or pocket definitions—are translated into Boltz's JSON/YAML-style constraint objects using the same chain and atom naming scheme, so that geometric priors expressed through the UI are faithfully enforced during co-folding.
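The shape of such a specification can be sketched as a plain Python dictionary. The key names below ("sequences", "constraints", "pocket", "binder", "contacts", "properties", "affinity") follow Boltz's public input documentation as best understood and should be checked against the installed Boltz version; the helper function and its arguments are hypothetical. The sketch prints JSON, which is intended to be readable as YAML since JSON is a subset of YAML 1.2.

```python
import json


def build_boltz_spec(polymers, ligands, pocket=None, affinity_binder=None):
    """Assemble a Boltz-style input specification (illustrative only).

    polymers: list of (chain_id, kind, sequence, cyclic), kind in protein/dna/rna
    ligands:  list of (chain_id, smiles)
    pocket:   optional (binder_chain_id, [(chain_id, residue_index), ...])
    """
    sequences = []
    for chain_id, kind, sequence, cyclic in polymers:
        sequences.append({kind: {"id": chain_id, "sequence": sequence, "cyclic": cyclic}})

    # Group chain IDs by SMILES so that each unique ligand appears once.
    ids_by_smiles = {}
    for chain_id, smiles in ligands:
        ids_by_smiles.setdefault(smiles, []).append(chain_id)
    for smiles, chain_ids in ids_by_smiles.items():
        sequences.append({"ligand": {"id": chain_ids, "smiles": smiles}})

    spec = {"version": 1, "sequences": sequences}
    if pocket is not None:
        binder, contacts = pocket
        spec["constraints"] = [
            {"pocket": {"binder": binder, "contacts": [list(c) for c in contacts]}}
        ]
    if affinity_binder is not None:
        spec["properties"] = [{"affinity": {"binder": affinity_binder}}]
    return spec


spec = build_boltz_spec(
    polymers=[("A", "protein", "MKTAYIAKQR", False)],
    ligands=[("B", "CC(=O)Oc1ccccc1C(=O)O")],
    pocket=("B", [("A", 5), ("A", 8)]),
    affinity_binder="B",
)
print(json.dumps(spec, indent=2))
```

Reusing the same chain IDs in the sequence entries and in the constraints is what ties a pocket or contact definition to the intended ligand and residues.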

Running Co-Folding

Once the input has been constructed, the workflow dispatches the calculation to suitable GPU hardware, which runs the job and writes standard Boltz/Chai outputs. These outputs are then post-processed into a compact, compressed binary containing the best predicted complex structure in mmCIF format, model confidence metrics (including pTM/ipTM-style scores and average or per-residue lDDT-like values), and, for Boltz-2 affinity runs, predicted affinity outputs.

Additional Parsing and Validation

When ligand SMILES are provided and one ligand is marked for affinity prediction, the workflow additionally attempts to extract and validate the corresponding ligand pose from the predicted complex. Using a deterministic mapping from ligands to chain IDs, it identifies the ligand chain in the co-folded structure, removes it to recover an apo-like protein, and converts the ligand coordinates into a standalone molecule. It then checks that the generated ligand matches the requested chemistry: first at the level of connectivity (same canonical SMILES), and then at the level of stereochemistry, emitting user-facing warnings if the co-folded pose appears to represent the wrong molecule or incorrect stereochemistry.
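The two-level identity check can be sketched as below. To keep the example free of external dependencies, canonicalization is delegated to a caller-supplied function; in practice this would wrap a cheminformatics toolkit. The function name, the callable's signature, and the warning texts are all hypothetical.

```python
def check_ligand_identity(requested_smiles, generated_smiles, canonicalize):
    """Compare the requested ligand with the one recovered from the pose.

    canonicalize(smiles, stereo) is a hypothetical callable that returns a
    canonical SMILES string, with stereochemistry included only if stereo
    is True. Returns a list of user-facing warning strings.
    """
    warnings = []
    # Level 1: connectivity, compared with stereochemistry stripped.
    if canonicalize(requested_smiles, False) != canonicalize(generated_smiles, False):
        warnings.append("Co-folded ligand does not match the requested molecule.")
    # Level 2: stereochemistry, checked only when connectivity agrees.
    elif canonicalize(requested_smiles, True) != canonicalize(generated_smiles, True):
        warnings.append("Co-folded ligand has different stereochemistry than requested.")
    return warnings
```

Checking connectivity first keeps the messages specific: a stereochemistry warning is only meaningful once the two structures are known to be the same constitutional isomer.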

If pose refinement is enabled, the workflow runs a conformer search and a harmonically constrained optimization using the AIMNet2 neural network potential to relax the ligand while keeping it close to the co-folded geometry, optionally computing a strain energy relative to the conformer ensemble. The refined pose is checked against the protein with PoseBusters structural sanity checks, and a refined complex structure is saved for downstream use. If refinement is disabled, a single-point energy is computed and PoseBusters checks are still applied, providing at least a minimal physical consistency screen on the raw co-folded pose.
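The two quantities involved can be illustrated with a short sketch. It assumes the usual harmonic form, 0.5 * k * sum(|r_i - r_i0|^2), for the restraint and takes the lowest-energy conformer as the strain reference; the exact functional form, force constant, units, and reference used by the workflow are not specified here, and the function names are hypothetical.

```python
def harmonic_restraint_energy(coords, reference, force_constant):
    """Penalty that grows as atoms move away from the co-folded geometry.

    coords, reference: equal-length lists of (x, y, z) tuples.
    Assumed form: 0.5 * k * sum of squared atomic displacements.
    """
    total = 0.0
    for (x, y, z), (x0, y0, z0) in zip(coords, reference):
        total += (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
    return 0.5 * force_constant * total


def strain_energy(pose_energy, conformer_energies):
    """Energy of the bound pose relative to the lowest-energy conformer.

    Using the ensemble minimum as the reference is an assumption.
    """
    return pose_energy - min(conformer_energies)


# One atom displaced by 1.0 along x with k = 2.0 gives a penalty of 1.0.
penalty = harmonic_restraint_energy([(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], 2.0)
strain = strain_energy(-10.0, [-12.5, -11.0])
```

During a restrained optimization, a penalty of this kind would be added to the potential energy so that the relaxed ligand cannot drift far from the predicted pose.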

Finally, the workflow includes a variety of checks for physical validity. For example, when Boltz-based models are used with large ligands (more than ~50 heavy atoms), it records explicit warnings that predictions may be unreliable given the models' training distribution. Errors from the external tools (e.g., failed MSAs, ligand Kekulization issues, missing output files) are surfaced as clear runtime errors rather than silent failures.
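A minimal sketch of these two behaviors follows. The threshold constant mirrors the approximate figure quoted above; the model-name strings, function names, and message texts are hypothetical, and heavy-atom counts are taken as precomputed inputs rather than derived from SMILES.

```python
from pathlib import Path

HEAVY_ATOM_LIMIT = 50  # approximate threshold quoted in the text


def ligand_size_warnings(model, heavy_atom_counts):
    """Return warnings for large ligands when a Boltz-based model is used.

    model: a model-name string such as "boltz-2" (naming is an assumption).
    heavy_atom_counts: dict mapping ligand chain ID to heavy-atom count.
    """
    if not model.lower().startswith("boltz"):
        return []
    return [
        f"Ligand {chain_id} has {n} heavy atoms; predictions may be unreliable."
        for chain_id, n in heavy_atom_counts.items()
        if n > HEAVY_ATOM_LIMIT
    ]


def require_outputs(paths):
    """Raise a clear error if any expected output file is missing."""
    missing = [str(p) for p in paths if not Path(p).exists()]
    if missing:
        raise RuntimeError("Missing co-folding output files: " + ", ".join(missing))
```

Raising an exception for missing files, rather than returning an empty result, is one way to ensure that a failed run cannot be mistaken for a successful one.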