How to Run Free-Energy Perturbation

Transcript

In this video I'm going to be demonstrating how to run free-energy-perturbation calculations through Rowan. Now what this video is not going to do is give a big overview of the theory behind FEP: the Zwanzig equation, how it works, adaptive lambda scheduling, all these nitty-gritty under-the-hood details. This stuff is very interesting. I recommend you learn about it or pick up a statistical mechanics textbook or read a bunch of papers if you're into that, but we're just not going to do that here. We could spend many many videos talking about this and frankly I'm not sure I'm the right guy.

So what we're going to do instead is just go through, play-by-play, what it looks like to run relative binding free-energy calculations through Rowan's interface. And the way that this works, so just briefly if you have no idea what FEP is, it's a way using molecular dynamics and some very nice theory which I alluded to just now to quite accurately predict protein–ligand binding affinity along a congeneric series.

So the flavor of FEP we're doing does not predict absolute binding affinity. It predicts the relative binding affinity between separate analogs. And it does so by alchemically mutating these analogs into one another and seeing how the binding affinity changes. So obviously there are limitations to FEP, there's things that aren't handled, but nevertheless it is the “gold standard” for computer-assisted drug design when you want to predict binding affinity.

And in Rowan, running FEP actually takes three steps. So first we use analog docking or other methods, but we prefer analog docking where possible, to generate a bunch of bound poses. Next, we actually build the alchemical perturbation graph, and then finally we run the simulations and get the results back.

So to run analog docking, we need a protein, we need a bound reference pose, and we need a bunch of analogs. So I'm going to come over here to these Uni-FEP benchmarks. I've selected one for this tutorial roughly at random. We've tested a bunch of these. And what we're going to do here is we're just going to take this protein structure. We'll download it. We will take the ligand structure. We will also download it. And then we'll go over here to their list of suggested analogs.

And so now within Rowan's view, we'll upload the PDB file. And so right away, we get this error, which is very common, showing that we're missing hydrogens. So we'll enter the “Prepare Protein” view. We'll add hydrogens. We'll say pH 7.4. There's all these default things that we'll leave on because they're not usually necessary but a good idea.

This protein preparation step is ubiquitous because many experimental and computational methods don't predict protein hydrogens. They don't resolve very well crystallographically because there's just not a lot of electrons, a lot of electron density to resolve where the hydrogens are. And so you need to get down to really fine, extreme resolution to actually see them, which we don't typically do for proteins.

So protein preparation is just finished, and it seems to have gone well. We can actually manage the protein view and look at the... Manage protein view, show polymer atoms, show hydrogens. If we look at all the hydrogens here, we can see that they exist. So that protein has hydrogens now, which is great. And there's no short interatomic distances, so it doesn't seem like we blew anything up.

So now we've got our protein that's all ready for analog docking. And we'll go here and we'll upload our bound reference pose. So, yep, looks good. And then we'll come over here and we'll select all of the ligand SMILES or enough of them to keep us entertained for the course of this video. We will go here. We will say “input SMILES file,” 21 valid SMILES strings, and say “Add Structures.” And we can just spot check to make sure: does this look like an analog? Yeah, it does.

OK, so let's go here. We'll say “Submit Analog Docking.” And we'll rename this now to “generate poses.” And so what this is going to do is for each analog that we submitted, 21 in total: we're going to use the maximum common structure with the bound reference pose to initialize the conformation, generate a lot of conformations based around that, and then locally optimize them within the docking pocket to try to prevent clashes and generally see what the output poses we get are. So the goal is to have high overlay, which will create well-behaved FEP simulations. And if we don't see that, then we'll have to ask ourselves some tough questions, like will the simulation be well-behaved or not? Will we have issues? So on and so forth.

This is typically what we do in cases where we have some prior knowledge about the disposition of the protein that we want to use in a gainful way so that we're not just blind docking every compound and hoping we get the same pose for each one. And we can see that analog docking actually runs nice and quick. So we're getting poses floating in. Most of these are in the good docking-score regime. Between -5 and -10 for docking score is pretty typical. We see in most cases pretty low core RMSD, so good overlay.

And then every now and then we do get something like with this bromine that seems to be clipping the protein structure. Yeah, so if we look here, what we're seeing is that the bromine is actually bumping into things on this, because it's too long and too large. And so this is failing PoseBusters and is not happy. So we're probably not going to run FEP on this pose. We could run a separate docking function from scratch if we wanted to get a bound conformer for this pose. But for the purposes of this tutorial, we'll probably just omit it.

Okay, so what we're going to do here, um we'll take all the PoseBusters, we'll filter only to cases where PoseBusters was successful. So we'll show all of these. We will resubmit these. We'll take our initial protein structure, the 13 structures, and we'll submit these to the RBFE graph workflow. I'll quickly renumber these so that their names are a little bit shorter. And again, if we had actual compound identifiers, we could put these in earlier or put them in here so that we can actually correlate these to whatever's in different software. But since this is just a demonstration system, I don't actually have compound identifiers for these. I just pasted them from the internet, as you can see.

And I'll just keep the default settings here. So Greedy is generally recommended. You can use StarMap if you're in a hurry, which will essentially run single edge FEP. The default scoring method is good and then “k min cut” tells us how many edges we'll have at minimum per node, and three is a standard default. So we'll run this and we'll rename this to “generate alchemical perturbation graph.” And this again should run very quickly for us. What's going on behind the scenes is it's trying to do some similarity matching to figure out which ligands to connect to which without creating disconnected graphs. And it's also going to be doing some atom mapping, so figuring out when we actually do the alchemical simulation which atoms in one ligand are gonna actually go to which atoms in a different ligand.

And what we can see here is that we've done this okay. So we can click on the edges: this one is a very simple mapping because we only have a few changes here. You know, this one might be a little bit more complicated. Here it thinks there's a lot of dummy atoms probably because the poses are not overlaid very well. And now we can actually resubmit this and see where we're at.

So we have a few things going on. We have our protein target again, if you remember. That autoloads because we ran that as part of this sequence. We have our graph we just generated. We can check that all of these things are actually part of the graph. Yeah, good. We can edit the graph if we want to. So if I think, “wow, I really, really care about knowing the difference in free energy between five and three”, I can actually go here and I can add this edge. So I don't need to just use the graph that the computer gives me. I can edit it myself.

We can go here and adjust our RBFE settings. So here I'm going to change the charge method to NAGL, which is just a little bit faster than AM1 BCC. It's a machine-learning model. And I can adjust any of these other parameters I want to as well. So if I want to run longer simulations, if I want to adjust the window overlap, toggle local resampling, I have full control over all of that here. Although I'm not going to mess with it because the defaults are pretty good, if I do say so myself.

The last thing we have to do before we submit is actually just check before we go and queue an expensive GPU runner and start trying to run all this stuff and sit in the queue. We want to actually make sure that the system can be parameterized because it's very frustrating to wait to queue an expensive GPU and then to die within the first 10 seconds because the protein residues aren't named in a way that AMBER expects. And so I'll just say check here. And it says “passed,” so the protein is parameterizable by our forcefield. If we had misnamed a residue or we were missing hydrogen somewhere, this would fail and make us go back and properly prepare the protein.

And so now we'll just say “Submit RBFE.” This is going to queue a big L40S 4 GPU runner and hopefully this will run very well. So I'll catch you after this finishes.

Hey folks, just checking back in here. It's been a couple hours and we're starting to see results pour in. So I wanted to just show what a partly completed FEP graph looks like because it does stay in this state for a few hours. So if we look at these edges, we can zoom in here and see that some of them are actually fully completed. So right here, this is a ∆∆G of 0.66. You can see the full values here. Every leg, there's both a complex leg, so the protein leg and complex binding energy, and then a solvent leg, so the change in just ligand free energy and solvent.

And so we can see what this is predicted to be going from four to five. You know, we can look at different ones of these. We can rearrange. Sometimes they hide a little bit. There's also some of these green legs where just one leg is finished. And typically the solvent leg finishes first because it's a little bit easier. And so this is, this is kind of the state that we find ourselves in.

So there's no ligand results yet because we can't get ligand results until we finish all the edges. But we can look at some preliminary cycle closure results, and see that because free energy is a state function, we expect that going from four to five to six, sorry, from five to four to six back to five will sum to zero. And here we find that it doesn't quite sum to zero, but it could be worse. It's not that far from zero. And this is just a way to guess how close we are. So this is shown as yellow. It's not amazing, but it's not terrible. And we can probably dial in simulation parameters if we want to get a little bit better to find errors.

The last thing that we can look at here is actually just to actually view the lambda overlaps. And so here we can see the lambda windows. Lambda going from zero to one is how we mutate these ligands into each other. Again, sorry, not really going into the theory of FEP here, but 0 corresponds to the starting ligand and 1 corresponds to the ending ligand. By default, there's 48 Lambda windows, but in practice, TMD uses this adaptive-lambda-scheduling-type technique. So we reduce the number for easy transformations because the sampling isn't that hard.

And so what we can do here is actually see what lambda values it picked, which is 13, a lot faster than 48. And then we can look at the overlap between adjacent lambda regions. And they're color-coded green, showing that the sampling is all good here, which is what we want. Yeah, so we're about two hours in. We've completed part of the simulation, but not all. And we will check back in when this finishes.

All right, folks, just checking back here. So the FEP calculation just finished. And in total, this took six hours. And that works out to about a half hour per ligand, or about 17 minutes per leg. So pretty fast. We've seen faster. And I think a better set of ligands would probably run faster. But all the same, these are nice, nice quick results by the standards of typical FEP runs.

So if we look here, we can see that the ligands now, we have all of the ∆∆G values per edge, and then we have final ∆G values per ligand. And so we can see if we go to this ordered list, that ligand 3 is predicted to be the best relative to our initial structure, a kcal and a half better. Ligand 4 is a little bit better, maybe about the same: 4, 2, 6, 1, so on and so forth. And then some of these ligands are predicted to be really, really bad.

One of the things I didn't realize when I was just copying and pasting structures in, as I said, we don't really rehearse these demo videos too much, so I guess that's on me. There's actually different enantiomers mixed together here. So the ones that bind well typically have on this piperidine, at the 3 position, you have this coming out. This—I don't even know how to name this on the fly, but this nitrogen biaryl system—is coming out of the screen. Whereas over here, all these compounds are actually going into the screen. And that explains why some of these FEP edges were so difficult and actually so slow to run. So if we go to the log file, most of these finished a while ago. But the last couple took another two hours beyond the other ones.

And we can see this if we look at the lambda overlap values. So an edge like this, 27 lambda windows, 33. Okay, 34. Yeah, some of these are a little slow. Some of these are quicker. This one's only 15. This one's 34. This one's 9. But as we get over here, we actually see this is 48, and even 48 isn't quite enough for some of these difficult samplings, so the diagonal overlap is not quite what it should be. Yeah, for this as well, that's 11, so that's a pretty easy one, but this is 48. And again, we're starting to see some of these bad adjacent diagonal overlaps.

So, okay, so this is maybe not a picture-perfect FEP run. Maybe if we were gonna run more calculations on this system, we'd run the two enantiomers separately. We're getting really poor overlay here, as we did see visually earlier and just decided to ignore. But that's okay. So yeah, so we have our graph, we have ligand results here. We have cycle closure. And some of these cycle closures are really bad, the ones that are over here, again, because these ligands are all different shapes. But most of the cycles, the majority of these are really summing to zero on this right half with all the same enantiomer.

This is FEP through Rowan. So we've got fast results. We can download these, choose which ones to synthesize. We can use these to help train a QSAR model if we want to. And then we could take these, the same protein, take our next set of analogs, maybe based around this, maybe do some aza scanning, maybe some methyl or fluoro scanning, some other stuff, and just use this to help make better decisions, triage compounds, and hopefully discover drugs faster.