Protein Binder Design

Rowan's protein-binder-design workflow encodes the user's inputs in BoltzGen's YAML format, runs the BoltzGen calculation on GPU hardware, and parses the resulting designs back into a structured, machine-readable data model for visualization and analysis.

The process starts from the user's design specification, which describes the design problem in terms of entities and constraints. Entities can be existing 3D structures ("file" entities), abstract protein chains to be designed ("protein" entities), or ligands ("ligand" entities). For structural inputs, Rowan stores protein coordinates in its internal PDB representation. Alongside these coordinates, the specification can define which parts of each structure BoltzGen should see (include/exclude regions), where binding is allowed or disallowed, proximity-based design regions, desired secondary-structure patterns, and optional insertion sites for adding new residues. Rowan keeps all of these as structured data and translates them into the compact dictionaries that BoltzGen expects.
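To make the entity model concrete, here is a minimal Python sketch of how such a specification might be represented before translation. All class and field names (`DesignSpec`, `FileEntity`, `ResidueRange`, and so on) are hypothetical, not Rowan's actual internal types, and only a subset of the options is shown: proximity regions, secondary structure, and insertion sites are omitted. The `validate` method illustrates the kind of consistency check that is convenient to run on the structured side, before anything is written to YAML.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical Rowan-side model of a design specification. Class and field
# names are illustrative; only a subset of the available options is shown.

@dataclass
class ResidueRange:
    chain: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

@dataclass
class FileEntity:
    path: str  # PDB file with existing coordinates
    include: List[ResidueRange] = field(default_factory=list)      # visible to BoltzGen
    exclude: List[ResidueRange] = field(default_factory=list)      # hidden from BoltzGen
    binding: List[ResidueRange] = field(default_factory=list)      # binding allowed
    not_binding: List[ResidueRange] = field(default_factory=list)  # binding disallowed

@dataclass
class ProteinEntity:
    id: str
    min_length: int  # allowed length range of the designed segment
    max_length: int
    cyclic: bool = False

@dataclass
class LigandEntity:
    id: str
    smiles: str

Entity = Union[FileEntity, ProteinEntity, LigandEntity]

@dataclass
class DesignSpec:
    entities: List[Entity]

    def validate(self) -> None:
        """Reject inverted residue/length ranges and duplicate chain or ligand IDs."""
        seen_ids = set()
        for entity in self.entities:
            if isinstance(entity, FileEntity):
                regions = entity.include + entity.exclude + entity.binding + entity.not_binding
                for r in regions:
                    if r.start < 1 or r.start > r.end:
                        raise ValueError(f"bad residue range {r}")
            else:
                if entity.id in seen_ids:
                    raise ValueError(f"duplicate entity id {entity.id!r}")
                seen_ids.add(entity.id)
                if isinstance(entity, ProteinEntity) and not 0 < entity.min_length <= entity.max_length:
                    raise ValueError(f"bad length range for {entity.id!r}")

spec = DesignSpec([
    FileEntity("target.pdb",
               include=[ResidueRange("A", 1, 120)],
               binding=[ResidueRange("A", 40, 55)]),
    ProteinEntity("B", 80, 140),
])
spec.validate()  # passes; an inverted range such as ResidueRange("A", 55, 40) would raise
```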

To launch a design campaign, the workflow converts this internal representation into a BoltzGen-compatible YAML specification. Each file entity becomes a file block pointing at its PDB file, annotated as needed with include/exclude masks, binding regions, proximity selections, or design insertions. Protein entities become protein blocks carrying sequence ranges for de novo segments, cyclicity flags, and optional aggregate secondary-structure preferences. Ligands become ligand blocks with identifiers and SMILES strings, and any covalent linkages between atoms are translated into bond constraints. The result is a human-readable YAML document that follows the schema used in the BoltzGen repository and encodes the same system and constraints the user specified through Rowan's interface.
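The translation step can be sketched as a handful of small functions that each emit one block as a plain Python dictionary. The key names (`include`, `res_index`, `binding_types`, `cyclic`, `bond`, `atom1`/`atom2`) reflect our reading of the example specifications in the BoltzGen repository and are assumptions to check against its current schema; the helper names are hypothetical and do not correspond to Rowan's code.

```python
import json
from typing import Optional

# Sketch of emitting a BoltzGen-style design spec as plain dictionaries.
# Key names follow our reading of the BoltzGen example specs (verify them
# against the current schema); helper function names are hypothetical.

def chain_sel(chain_id: str, res_index: Optional[str] = None) -> dict:
    """Chain selection, optionally limited to residues such as '2..50'."""
    sel = {"id": chain_id}
    if res_index is not None:
        sel["res_index"] = res_index
    return {"chain": sel}

def file_block(path, include=(), exclude=(), binding=None) -> dict:
    """File entity: existing structure plus visibility and binding annotations."""
    block = {"path": path}
    if include:
        block["include"] = [chain_sel(*s) for s in include]
    if exclude:
        block["exclude"] = [chain_sel(*s) for s in exclude]
    if binding is not None:
        chain_id, residues = binding
        block["binding_types"] = [{"chain": {"id": chain_id, "binding": residues}}]
    return {"file": block}

def protein_block(entity_id: str, sequence: str, cyclic: bool = False) -> dict:
    """Protein entity; in BoltzGen's examples '80..140' is a designed length range."""
    block = {"id": entity_id, "sequence": sequence}
    if cyclic:
        block["cyclic"] = True
    return {"protein": block}

def ligand_block(entity_id: str, smiles: str) -> dict:
    return {"ligand": {"id": entity_id, "smiles": smiles}}

def bond_constraint(atom1, atom2) -> dict:
    """Covalent link; each atom is (chain_id, residue_index, atom_name)."""
    return {"bond": {"atom1": list(atom1), "atom2": list(atom2)}}

def build_spec(entities, constraints=()) -> dict:
    spec = {"entities": list(entities)}
    if constraints:
        spec["constraints"] = list(constraints)
    return spec

spec = build_spec([
    protein_block("B", "80..140"),
    file_block("target.pdb", include=[("A", "1..120")], binding=("A", "40..55")),
])
# JSON is (almost) a subset of YAML, so json.dumps is a dependency-free
# stand-in here; a real implementation would more likely call yaml.safe_dump.
print(json.dumps(spec, indent=2))
```

Building dictionaries first and serializing at the end keeps the translation testable: each block can be compared against an expected dictionary without parsing YAML.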

This YAML specification, together with the relevant PDB files, is then submitted to run on GPU hardware. The workflow passes along key runtime parameters: which BoltzGen protocol to use (for example, protein-anything, peptide-anything, protein-small_molecule, or nanobody-anything), how many designs to generate in total, and the "budget", which sets how many diverse, high-quality candidates are retained after BoltzGen's internal filtering and ranking. The computationally heavy steps (diffusion-based structure generation, inverse folding, co-folding validation with Boltz-2, affinity prediction where applicable, and BoltzGen's own post-processing) are performed by the open-source models and logic described in the BoltzGen documentation.
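The runtime parameters can be illustrated by assembling a command line. The BoltzGen command-line interface accepts invocations along the lines of `boltzgen run <spec.yaml> --output <dir> --protocol <name> --num_designs <N> --budget <B>`; treat the exact flag names as assumptions to verify against the installed version. How Rowan actually dispatches jobs to its GPU workers is not described here, and `boltzgen_command` is a hypothetical helper.

```python
import shlex

# Protocol names taken from the list in the text; BoltzGen may support others.
PROTOCOLS = {"protein-anything", "peptide-anything", "protein-small_molecule", "nanobody-anything"}

def boltzgen_command(spec_path: str, output_dir: str, protocol: str,
                     num_designs: int, budget: int) -> list:
    """Assemble a BoltzGen CLI invocation (flag names are assumptions to verify)."""
    if protocol not in PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol!r}")
    if num_designs < 1 or budget < 1:
        raise ValueError("num_designs and budget must be positive")
    # Illustrative sanity check: the retained set cannot exceed what was generated.
    if budget > num_designs:
        raise ValueError("budget cannot exceed num_designs")
    return [
        "boltzgen", "run", spec_path,
        "--output", output_dir,
        "--protocol", protocol,
        "--num_designs", str(num_designs),
        "--budget", str(budget),
    ]

cmd = boltzgen_command("spec.yaml", "runs/binder1", "protein-anything", 10000, 100)
print(shlex.join(cmd))
# On a GPU host this list could be passed to subprocess.run(cmd, check=True).
```

Validating the protocol and budget before submission fails fast on the CPU side rather than after a GPU job has been queued.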

After BoltzGen finishes, Rowan converts each output binder design into a structured result with three core elements: the designed binder sequence, a reference to the corresponding bound complex structure (stored in Rowan and identified by a UUID), and a score record that captures BoltzGen's own numerical assessments of that design (such as model confidence or predicted affinity, depending on the protocol). These results are attached to the original workflow, so they are immediately available within Rowan for visualization, comparison, and follow-up calculations such as docking, MD, or additional filtering.
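The result-parsing step can be sketched as reading a table of per-design metrics into small records. The CSV layout, the column names (`sequence`, `iptm`, `design_file`), and the `BinderDesign`/`parse_designs` names are all hypothetical; BoltzGen's actual output files and metric names depend on the version and protocol. A freshly generated `uuid4` stands in for the identifier a stored complex structure would receive.

```python
import csv
import io
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class BinderDesign:
    sequence: str              # designed binder sequence
    structure_uuid: uuid.UUID  # reference to the stored bound-complex structure
    scores: Dict[str, float] = field(default_factory=dict)  # numeric metrics by name

def parse_designs(csv_text: str, sequence_column: str = "sequence") -> List[BinderDesign]:
    """Turn a (hypothetical) per-design metrics table into BinderDesign records."""
    designs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sequence = row.pop(sequence_column)
        scores = {}
        for name, value in row.items():
            try:
                scores[name] = float(value)
            except (TypeError, ValueError):
                continue  # skip non-numeric columns such as file names
        # uuid4() is a placeholder for the ID assigned when the structure is stored.
        designs.append(BinderDesign(sequence, uuid.uuid4(), scores))
    return designs

metrics_csv = """sequence,iptm,design_file
MKVLA,0.81,design_0.cif
GSHEL,0.64,design_1.cif
"""
designs = parse_designs(metrics_csv)
best = max(designs, key=lambda d: d.scores["iptm"])
print(best.sequence, best.scores)
```

Keeping scores as a name-to-value mapping, rather than fixed fields, lets the same record hold whichever metrics a given protocol produces.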