Running Protein–Ligand Co-Folding with Boltz-1 and Chai-1

Transcript

Hello, it's Corin here and today I want to look at running protein–ligand co-folding through Rowan. So protein–ligand co-folding is one of the most exciting new technologies in structure-based drug discovery and what this essentially allows us to do is not only to generate protein 3D structures from 1D sequences, so that's what AlphaFold does, that's protein folding, but also to generate these 3D structures in the presence of small molecule ligands, so co-folding.

So at once, given a 1D sequence of a protein and a string representation of a ligand, we can simultaneously solve the two structures to predict the structure of the protein–ligand complex. Now, this is very, very cool. And so you can access protein–ligand co-folding workflows through Rowan by scrolling down, going to the drug discovery section, and clicking on the icon labeled protein–ligand co-folding.

So to do this, obviously, we want to start with a 1D sequence of our protein because, you know, that's what we're doing here. The whole point is that we don't need to start with a 3D structure. So we can paste in this protein sequence. Here, this is for TYK2, and we say "Add Sequence." Now, if we want, we can add multiple sequences, but here we're just going to delete the extra one and just do one. You can study complexes of multiple proteins here, but we're not going to do that for this demo.

And then here we can input a ligand. So I'll paste in the SMILES string and say "Add." This is Xeljanz (tofacitinib), so this is a Janus kinase inhibitor, just a random compound I chose. So now we have our protein sequence representation. We have our ligand representation. Again, we can put in another ligand if we want to. We can supply ligands in different ways, as always in Rowan. But for now, I think this is good enough as a demo.
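If you're preparing these two inputs outside the interface, it can help to sanity-check them before pasting them in. Here's a minimal Python sketch of that idea. To be clear, none of these function names come from Rowan, Boltz-1x, or Chai-1r; they're hypothetical helpers. The peptide is a made-up placeholder rather than the TYK2 sequence from the demo, the SMILES is tofacitinib written without stereochemistry (worth verifying against a trusted database), and the SMILES check only looks at whitespace and balanced brackets, so it is not a real parser.

```python
# Illustrative pre-submission checks. Every name here is hypothetical;
# nothing below is part of Rowan, Boltz-1x, or Chai-1r.

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_sequence(raw: str) -> str:
    """Drop FASTA header lines and whitespace, then check the residue alphabet."""
    residues = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        residues.append("".join(line.split()).upper())
    sequence = "".join(residues)
    if not sequence:
        raise ValueError("no residues found")
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"unexpected residue codes: {unexpected}")
    return sequence


def passes_basic_smiles_checks(smiles: str) -> bool:
    """Crude check only: non-empty, no whitespace, balanced () and []."""
    if not smiles or any(ch.isspace() for ch in smiles):
        return False
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def build_cofolding_inputs(protein_sequences, ligand_smiles):
    """Bundle cleaned sequences and checked SMILES into one plain dict."""
    proteins = [clean_protein_sequence(seq) for seq in protein_sequences]
    for smiles in ligand_smiles:
        if not passes_basic_smiles_checks(smiles):
            raise ValueError(f"SMILES failed basic checks: {smiles!r}")
    return {"proteins": proteins, "ligands": list(ligand_smiles)}


# Placeholder peptide, NOT the TYK2 sequence used in the demo.
example_sequence = ">placeholder\nMKTAYIAKQR\nQISFVKSHFS"
# Tofacitinib without stereochemistry (assumed; verify before real use).
example_smiles = "CC1CCN(CC1N(C)C2=NC=NC3=C2C=CN3)C(=O)CC#N"

inputs = build_cofolding_inputs([example_sequence], [example_smiles])
print(inputs)
```

For a real check on the ligand you'd want a proper cheminformatics toolkit instead of bracket counting; this just catches the most common copy-and-paste mistakes.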

And what we're going to do now is select our model. So Rowan supports both Chai-1r and Boltz-1x. So these are foundation models for protein–ligand co-folding. We use Boltz-1x here and say "Submit Co-Folding." OK, so right now, this job status is shown as Running. We have here the initial sequence, the initial SMILES, and we will wait a few minutes for this job to finish.

Okay, so I fast-forwarded just a few minutes into the future, just to save us from having to look at a blank screen while this runs. So the protein–ligand co-folding job using Boltz-1x ran in only two minutes and six seconds, so it's by no means a slow calculation in the grand scheme of things, but nevertheless it doesn't make for a very interesting video just refreshing and waiting for it to finish.

So if we look, we have indeed successfully gone from our initial sequence and ligand SMILES string to a 3D structure. So we see here, you know, we can move it around. This is a protein: we have beta sheets over here, we've got alpha helices over here. It looks more or less like a protein. And indeed we have a ligand now that's sitting in here. So we've got an aromatic ring, a tertiary amine, a little ring conformation here if we zoom in on the piperidine, and then we've got an amide.

So this, you know, this all looks very sane and reasonable. And it looks like we've successfully constructed something that is at least a potential protein–ligand complex. Now, given that I just chose random things, it's not easy to actually assess how accurate this is. So obviously you can find your own benchmarks for Boltz-1x and Chai-1r all over the place. There are lots of papers that you can find. These models are not perfect, but they're getting quite good. And I think future generations of these will become the standard for lots of structure-based drug design tasks.

In the meantime, while we wait for those future models, we can actually use the co-folding scores to assess how good this complex is likely to be. And we can see here that there are these different scores that Boltz-1x gives us back. So there's a predicted TM score that's sort of an overall confidence metric. There's an interface predicted TM score, which is a confidence metric for multi-chain structure predictions. And then there's just sort of an overall aggregate confidence score.

And we can see for this particular case that the values are actually all quite low, which is why they're shown in red. So although we've gotten a 3D structure back from this model, we're not actually very confident in this prediction. So, you know, this 0.23, 0.30, 0.34, this might be right, but Boltz-1x is saying, yeah, I'm not so sure about this. Maybe you should double check this, try running a different model, see if the predictions align, et cetera, et cetera.
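That kind of triage is easy to sketch in Python if you're looking at scores from many jobs. This is only an illustration: the 0.5 cutoff is an assumption I've picked for the example, not a threshold documented by Rowan or Boltz-1x, the function names are hypothetical, and the keys are placeholder labels because the pairing of the three values with the three score names isn't stated here.

```python
from typing import Dict, List

# Assumed cutoff for illustration only; not a documented Rowan or Boltz-1x value.
ASSUMED_LOW_CUTOFF = 0.5


def low_confidence_scores(
    scores: Dict[str, float], cutoff: float = ASSUMED_LOW_CUTOFF
) -> List[str]:
    """Return the names of all scores that fall below the cutoff."""
    return [name for name, value in scores.items() if value < cutoff]


def triage(scores: Dict[str, float], cutoff: float = ASSUMED_LOW_CUTOFF) -> str:
    """Turn a set of confidence scores into a short recommendation."""
    flagged = low_confidence_scores(scores, cutoff)
    if flagged:
        return "cross-check with another model: low " + ", ".join(flagged)
    return "no scores below cutoff"


# The three values from the demo, under placeholder labels.
demo_scores = {"score_1": 0.23, "score_2": 0.30, "score_3": 0.34}
print(triage(demo_scores))
```

With the demo's values, all three scores fall under the assumed cutoff, so the sketch recommends a cross-check, which matches the point above about running a different model and seeing whether the predictions align.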

And that I think is one of the nicest things about this family of ML models: while they're not yet perfectly accurate, they give you tools to help you understand when they are and are not likely to be accurate, which makes them, I think, very useful for practicing scientists because you can use them when they are useful and ignore them when they're not. So anyhow, this is just a brief overview of how to run protein–ligand co-folding workflows through Rowan. Obviously, feel free to make a free account, try these out on your own structures, and see if this can save you time in the lab or complement your other computational efforts. Cheers.