Neural Network Potentials: Towards Accurate, Scalable Simulation of Matter

This is a video recording of a workshop given by Corin Wagen, co-founder of Rowan, at the AI for Materials and Molecules conference in Montreal.

Transcript

Ameer: Okay, so let me just say a couple of words. All right, everybody. So this is part two of our workshop for AI for materials. So this one focuses on neural network potentials. Last week we had it on our last session was on graph neural networks in general. So this is a bit more specific and on topic to what a lot of your domains are.

And we have here with us Corin, who has a startup called RowanSci. They integrate some of these neural network potentials and have worked a lot with them. So he definitely has some expertise, particularly in the medicinal domain from what I saw. Without further ado, Corin, I'll leave the floor to you.

Corin: Thanks, Ameer, for the intro. And thanks for putting this all together. Is it possible for me to visually see who's in the room, just for context, just so I can wave? It's kind of abstract to be giving a talk to the ceiling. Oh, it doesn't need to be prolonged. I just want to see your beautiful smiling faces.

Ameer: The few that did manage to show up are here.

Corin: Hi, folks. Nice to meet you. Yeah, so I was given some instructions here. I'm going to talk a little bit about neural network potentials sort of in general: a little bit about who I am, a little bit about what neural network potentials are and why we care, trying to ground this scientifically. Then we'll do some code workshop, and then some more higher-level, visual workshops. And then I'll talk about where I think this field is going and stuff to pay attention to. So that's the high-level overview of what we're going to try to cover today in the time that we have. And please, at any point, just yell at Ameer or Mehdi or whomever, or just yell at me to interrupt. So this can be dynamic. It's much more fun that way. More of a workshop, less of a State of the Union address. Do you guys have that in Canada? We have that in the US. It's pretty boring.

Ok, I'm going to share my screen if I'm able to. All right, is this coming through ok? Yep. All right. So yeah, this is about neural network potentials, which are sort of a subclass of… so you guys have already talked about geometric deep learning and graph neural networks. This is intended to be a specific deep dive into one area of ML for science: materials science and chemistry. I'm a chemist, so I think of this in terms of chemistry, but it's equally important for materials science, and I'll try to focus on that much more today.

So just briefly about me. As I mentioned, I'm actually a chemist. I worked on ion pair sensing before, homogeneous catalysis, reaction mechanism, analytical chemistry. My journey to computation and simulation was very roundabout. I used to work in a lab and derived a lot of joy from that. But the more I studied reaction mechanisms, the clearer it became to me that simulation, being able to actually model what was going on, was going to be one of the defining and, I think, transformative technologies for how we approach the atomistic sciences: chemistry, biology, materials science, chemical engineering. And the way I got there, and got so interested in ML, was just by encountering very harshly the limits of conventional simulation.

Sorry, I promise this is not an organic chemistry talk. I won't do very much organic chemistry, but I did used to be an organic chemist. One of the things we were trying to study was this mechanism, so how this reaction happens. Those of you who have some experience in reaction mechanism can probably guess that like you can either do this in a concerted one step, sort of two plus two process, or a stepwise process. We did a bunch of various NMR experiments, and we convinced ourselves that it was happening through this two step, stepwise mechanism.

But all of our calculations essentially showed that it should be operating through a concerted mechanism. And so what we had to do in the computer to finally get the correct mechanism out was we had to explicitly model hundreds of solvent molecules surrounding this. So we had to actually use the sort of the nuclear option of explicit solvation to model this mechanism correctly. More abstractly, and ok, I think this is the second of two organic chemistry slides, you can actually model the free energy surface and show that not modeling the solvent, so doing what people usually do, gets it wrong. But when you actually throw in hundreds of solvent atoms, you actually predict an intermediate in the top left here, like a minimum on the energy surface.

The problem was that even for this tiny, tiny system, where we weren't able to model more than like 20 atoms this way with these explicit solvent simulations on the right here, a single MD time step took about five minutes, and that gives you one femtosecond of simulated time. To converge the surface even barely (it's not very well converged), we had to run almost three nanoseconds of simulation. If you crunch that math, this works out to about 25 CPU-years of simulation time, or about $200K worth of money, just to model this one simple reaction.
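As a sanity check, the arithmetic behind that figure can be crunched directly. The constants below are just the approximate numbers quoted in the talk (five minutes of CPU time per one-femtosecond step, about three nanoseconds total), not exact values from the study:

```python
# Rough cost estimate for the explicit-solvent simulation described above.
# Figures are the approximate ones quoted in the talk.
seconds_per_step = 5 * 60           # ~5 minutes of CPU time per MD step
femtoseconds_simulated = 3_000_000  # ~3 ns = 3,000,000 fs, one step per fs

total_cpu_seconds = seconds_per_step * femtoseconds_simulated
cpu_years = total_cpu_seconds / (365 * 24 * 3600)
print(round(cpu_years, 1))  # on the order of 25-30 CPU-years
```

Which lands right in the "about 25 CPU-years" ballpark mentioned above.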

And so I think this is emblematic of a lot of challenges that simulation in the chemical sciences faces: it's just really hard, and it's hard to get answers that are worth having. So practically, we often won't do simulation at all. I think this is even more apparent if we compare industries a long time ago versus today. This is Fairchild Semiconductor in the 1960s, and this is how semiconductors work now: super computerized, with a ton of computation involved in laying out chips, designing all this stuff, positioning elements, monitoring heat flow.

This is chemistry in the 1910s. And this is what my lab bench looked like when I was doing antibiotic discovery. You know, the flasks are different, the instruments have gotten a lot better, there are more things you can buy now, but fundamentally it's still an experiment-first field, as is true for a lot of polymer science and a lot of materials discovery. It's sort of: do experiments first, and maybe do computation second to figure out what we did, if we can do that at all.

In a broad sense, I think this is just because it's really hard, like I mentioned. Modeling heat flow and fluid flow, these are hard problems, but chemistry and materials science are harder. Reactions, biomolecules, surfaces and catalysis, solutions, electrolytes, ionic liquids, dissolution: even just modeling sodium chloride in water turns out to be really, really, really hard to get right. And this is pretty much everything we care about at a high level, right? Solids, liquids, polymers, reactions. This is the whole field on some level. And so I think, you know, faced with this, a lot of people have sort of given up on simulation a little bit, like it's impossible, we'll never be able to do it. But it turns out that a lot of deep learning approaches are still struggling too: deep learning methods that try to circumvent physics often behave in very unphysical ways.

So this has been the most studied, I think, in protein-ligand docking, where you're trying to model how biomolecules interact, and where you can show that all of these AlphaFold 3-type methods suffer from really bad data leakage. As you move away from similarity to the training set, the success rate approaches zero. So we're not learning physics correctly; we're almost just memorizing. Even when we do get the correct pose, often it's not actually physically valid: the bonds are in weird places, the geometries are insane.

And then it turns out that even when we get it right, we can actually remove the binding site and see that the prediction is still the same. So even the correct pose for the correct ligand is still not quite learning the physics of protein-ligand interactions. If you fill up the binding site with bulky residues, or you delete everything and take away all of the key interactions, it still predicts the same pose. And I think what this shows is that it's an appealing idea that we can just get rid of actually simulating things and caring about the physics, but it's not yet really possible.

It turns out that actually being able to simulate things is still really valuable. And so this talk is going to be about how we can address this quandary, this big issue that faces our field: simulation doesn't work. We'll talk about why it doesn't work, how we can try to solve this with neural network potentials, and then what this actually looks like in practice.

So, two state-of-the-art simulation methods, just to give context for what it looks like to simulate something right now and why it doesn't work. One way is we actually do all this quantum mechanics: we do density functional theory or something similar. I'm sure you guys have heard of that if you've done any computational work. You generate orbitals, electron densities, all this fun stuff. There's a ton of software here; if you've ever used one of these packages, it's doing some quantum chemistry under the hood.

Yeah, and this field has come a long way, right? It used to be that like eight atoms would get you a state-of-the-art paper at a very low level of theory. And now, you know, four decades later, we can do hundreds of atoms with pretty good accuracy and actually match experimental values. The problem is that this is still just so, so, so slow. In the US, the UK, and Japan, this accounts for a sizable fraction of all national supercomputer time in national labs and supercomputing centers. We did a study on our blog: it's like 20 to 40% on average, sometimes more, mostly just VASP. It just turns out that scaling this stuff to the big systems, to the stuff we actually care about, is really, really expensive. You can scale it, if you sort of pour more GPUs on the fire. You can use 500 tensor processing units from Google and get to 10,000 water molecules. You can use 27,000 GPUs and get to like 600,000 electrons, you know, 150,000 atoms. And people have actually been able to break the one-exaflop-per-second barrier (exa is 10^18, a billion billion operations per second), and you need like 10,000 different compute nodes to do that.

And so this is fine, but it's clearly not a fantastic solution, because it takes a ton of power, a ton of money, a ton of resources. Average researchers will never be able to do this: we don't have exaflops of compute lying around to give to graduate students, or startups for that matter. So this is clearly not a very good solution for the simulation problem.

The other solution is molecular mechanics. This is where things go to the other end. If you want to study big boxes of solvent or big proteins or big polymers, we say, well, we know what bonds are like, we know what angles are like, we know what dihedrals are like. Why not just put those in? We'll have a simple function, we don't have to do all this coupled differential Schrödinger equation math, and we can just put in the physics that we know and understand well. And this will probably behave in a sane manner.

So this works. It makes things really, really fast. But it's a little too simple, and what you get out of it is massive errors. At this point, there are pretty big molecular-mechanics-related issues with the shapes of molecules, the structure of polymers, the dynamics of protein folding, free energies in solution. So, you know, it qualitatively gives things that are semi-sane, but when you actually scratch the surface and start trying to make predictions out of these simulations, you find that they're, in many respects, quite disappointing. Not in every case; there are a lot of people who do really good work in this space. But it doesn't quite feel like a solution that gets us to where we want to go in terms of predictively accurate simulations.

And honestly, I think materials science does this to a much lesser extent than drug design, simply because the field is more complicated: there are more things to study, and the shortcomings of this solution are much more apparent. So when I started doing computational science, this was the state of the art.

So this is a plot of how slow things are; the time axis is on a log scale. This is for evaluating the shapes of some molecules and ranking them, and this is the accuracy relative to experiment: the R-squared, the correlation with the truth. You can sort of smash your piggy bank and spend weeks on each molecule and get the right answer. Or you can get answers really, really fast, virtually instantly, but very bad ones: an R-squared of 0.4 on an easy task like this is pretty disappointing. And then you can go anywhere along this frontier you want. You can be fast and inaccurate, you can be extremely slow and accurate, or you can be pretty slow and pretty accurate.

But none of this is really where we want to be, which is in this top-left corner. We'd like to be fast and accurate, because that's how we can actually do useful science. It seemed like there was this unbreakable relationship where every linear increase in accuracy demanded an exponential, order-of-magnitude increase in cost. Basically, this sucks.

Ok, so this is where we sort of bring in neural network potentials. So against this backdrop, this is why NNPs are so exciting. But before we talk about NNPs, I think it's worth acknowledging that sort of the core idea behind an NNP has been used since the 1960s, actually. And this is to say, well, if I just want to study one system for a really, really long time, I don't need to solve quantum mechanics at every single step. I can just solve a bunch of quantum mechanics once and then fit a polynomial to the potential energy surface.

And so we might say, for a very simple system like H3, where we have three hydrogen atoms, an H atom and an H2 molecule, and those can react and combine in different combinations, we can actually plot, with a fairly simple function, what the potential energy surface looks like. Say this is one distance, this is another distance; there's an angle term that's not captured here. But we can basically just make a plot of the energy, and then we can run molecular dynamics, we can predict vibrational frequencies, we can find minima. We can do anything we want once we've put in all the work to build the potential energy surface analytically.

And indeed, you can find papers that just put down a functional form, fit all these constants, and say: this is what we got. These are the seven constants, and we can pretty much get the exact thing perfectly right, within the limits of our precision. And then you can do a ton of really interesting science on the H3 system with this, because you've essentially got a solution that is both fast and accurate.
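To make the "solve QM once, fit a cheap function, reuse it forever" workflow concrete, here's a deliberately tiny one-dimensional sketch. The Morse-style potential and its constants are invented placeholders standing in for expensive quantum chemistry, not values from any real H3 paper:

```python
import math

def V(r):
    # "Expensive" potential: one call = one imagined QM calculation.
    # Morse-style curve with made-up constants; minimum is at r = 0.74.
    return 4.5 * (1 - math.exp(-1.9 * (r - 0.74))) ** 2

# Three "QM calculations" near the well, then an exact quadratic fit through them.
r0, r1, r2 = 0.70, 0.74, 0.78
v0, v1, v2 = V(r0), V(r1), V(r2)

# Coefficients of the interpolating parabola a*r^2 + b*r + c (equal spacing h)
h = 0.04
a = (v0 - 2 * v1 + v2) / (2 * h * h)
b = (v2 - v0) / (2 * h) - 2 * a * r1

# The fitted surface can now be minimized cheaply, with no more "QM" calls.
r_min = -b / (2 * a)
print(round(r_min, 2))  # close to the true equilibrium distance of 0.74
```

The real papers fit much fancier functional forms, but the economics are the same: all the cost is in building the fit, and everything downstream is nearly free.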

The problem is, one, it takes a lot of work to get that potential energy surface fit. And two, this approach really, really, really, really does not scale. So this is again a system I showed earlier that I worked on: a single metal atom, a single ligand, not that complicated in the grand scheme of things, but it has 342 internal degrees of freedom. To think about writing a polynomial or some other function that has 342 individual variables, each of which influences the energy in some way, is just a very, very difficult task. It seems impossible.

Furthermore, there's this issue of permutational invariance. I shouldn't care which carbon is in which spot; you want the function to have this permutational invariance property. You maybe want it to have rotational invariance too. And you start encoding all these things: how do you write a function that satisfies all these constraints? How do you make it scalable? How do you make it so you can even fit all these parameters with so much data?

And by this point, this is probably starting to sound pretty familiar. Like, oh, wait, I think we know functions that can do things like this. And this is actually where we sort of bring in the idea of graph neural networks. So graph neural networks applied to potential energy surface prediction is just a neural network potential. So that's all it is. A neural network potential is just a GNN that takes atomic positions and gives you properties out, most commonly the energy and forces.

And believe it or not, this was actually done as long ago as 2007. This is like the first true NNP, from Jörg Behler and Michele Parrinello, the fantastic physical chemist. And he basically outlines these exact reasons. He says, you know, DFT is too slow; we can create a high-dimensional potential energy surface, so not a low-dimensional one like H3, but a high-dimensional one, by using a neural network to represent the DFT potential energy surface. And this has a number of really good advantages. It's flexible, it's permutationally invariant, and we can construct it in such a way that it's local: we can use a cutoff graph method so that each atom's energy only depends on its neighbors and not on the whole system, which makes this much easier to train.
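A stripped-down sketch of that construction: the total energy is a sum of per-atom contributions, each computed only from neighbors within a cutoff. The `atomic_energy` function below is an arbitrary stand-in for the trained per-atom network, not a real model:

```python
import math

CUTOFF = 3.0  # each atom only "sees" neighbors inside this radius

def atomic_energy(neighbor_distances):
    # Stand-in for the per-atom neural network. Because it consumes distances
    # (not raw coordinates), the result is automatically invariant to
    # rotations, translations, and the ordering of neighbors.
    return sum(math.exp(-d) for d in neighbor_distances)

def total_energy(positions):
    total = 0.0
    for i, pi in enumerate(positions):
        dists = [math.dist(pi, pj) for j, pj in enumerate(positions)
                 if j != i and math.dist(pi, pj) < CUTOFF]
        total += atomic_energy(dists)  # local contribution, Behler-Parrinello style
    return total

atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (9.0, 9.0, 9.0)]
relabeled = [atoms[2], atoms[0], atoms[3], atoms[1]]
# Swapping atom labels leaves the energy unchanged: permutational invariance.
print(abs(total_energy(atoms) - total_energy(relabeled)) < 1e-9)  # True
```

The locality is what makes training tractable: each atom's contribution has a fixed-size receptive field, so the model learned on small systems transfers to big ones.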

Ok, I've been yapping for a while. Do you guys have any questions or thoughts? I can't actually see you guys right now, so let me back off and take a quick break. Does anybody have any questions?

Ameer: No, Ok, so far in person, nobody has their hands up.

Corin: Ok, Ameer, can I ask you to just yell at me if people have questions?

Ameer: Yeah, sure, sure will do.

Corin: All right, very well. Ok, so I'm not going to go through everything; you know, history is good up to a certain point, but I think it's worth just fast-forwarding from 2007 to today. So 2007, I'd say, is the first NNP, for amorphous silicon. This is a few thousand learnable parameters and then a few thousand DFT calculations of training data.

In 2012, Behler strikes again with a neural network potential for liquid water. This is now obviously a bit more complicated: you have multiple types of atoms, and a much more directionally oriented system.

But then I think 2017 is the real next breakthrough for the field, and this is the release of ANI-1. This is a general neural network model for neutral closed-shell organic species, with C, H, N, and O as supported elements. Essentially what this means is that now we don't have to train a new neural network potential for every molecule under study, which takes a while and demands specialized expertise. We can use this sort of foundation-type model that can generalize to unseen molecules comprised of the same building blocks.

So we're not generalizing to unseen elements. ANI doesn't know what to do with fluorine, for instance, or iron. But if you have a molecule that fits within the scope of non-charged, closed-shell organic molecules, you can probably get pretty good performance with ANI. This is about half a million learnable parameters, trained on almost 20 million DFT calculations.

So it's actually pretty cool just to look at the results from their paper. These are torsional scans, so rotating around a bond and seeing how the energy changes. You can see that the training data, the ground truth, at least for ANI, is in black. The ANI performance is in red, and then two other sort of low-cost computational methods are shown in blue and orange. And you can see that ANI is getting things substantially more correct than the other low-cost methods. So this is a pretty nice success.

And it's worth saying that all of the DFT calculations here were on molecules of eight or fewer heavy atoms. So these molecules shown here aren't just not in the training set; no molecules their size are in the training set. So you can't generalize to unseen elements, but you can generalize from small to large, which I think is pretty interesting. And thinking through which dimensions are easy or hard to generalize along is, I think, an open question in NNPs, one that deserves careful study and that we can talk about more, maybe at the end.

So where are we at now? I'd say there's like two…

Ameer: So Corin, sorry, think we have a question.

Audience Member: Can you go back to that picture of all of the different models modeling the rotation? You said ANI did the best, but what are you comparing it to? Like, do you have an actual chemical reaction that you can compare and say ANI is doing the best?

Corin: Right. So in this case, they're focused on just learning this sort of quantum chemical surrogate function, DFT. We can benchmark different DFT methods against very high-level computations, or against gas-phase values in this case, and we can show that if you spend enough money in the computer, you can pretty much approach the ground truth. So this is a task for which, in the gas phase, DFT does very, very well. The method that the ANI authors used is fine: it's wB97X/6-31G(d), if that means anything to you.

It's not amazing, frankly, but it's decent, and it's definitely within the realm where you expect it to be much more accurate than the blue or orange here. More broadly, there are always two questions with an NNP: how accurate is the model at fitting the data, and how accurate is the training data relative to the objective truth? Obviously the observed error of an NNP versus truth is the sum of those two errors.

And what this shows is just that, given a set of data, ANI can learn it very, very well. It doesn't really address how good the data was in the first place. Does that make sense? Or does that answer your question?

Audience Member: Yeah, partly.
Corin: What would answer the other part of your question? Or like do you want to?

Audience Member: Well, you're just showing a graph here of different models, and you mentioned ANI is doing far better than the DFT or the other two, the blue and orange.

Corin: Oh sorry, I understand what you're saying. No, so ANI is trying to fit the black, the DFT. The black is the one that we believe is truth, for the purposes of this experiment. In real life it's pretty close, but not exactly perfect. You can see here they quantify the RMSE relative to the oracle function, essentially, which is in black. In this case, ANI is 0.4 away from the oracle, the blue, DFTB, is 0.7, and then PM6, the stars, is 1.5. And over here, for lisdexamphetamine, it's even more apparent: ANI is 0.9, whereas the other two are over 4 kcal RMSE.

Audience Member: Ok, makes sense. Thank you.

Corin: Yeah. So I think the question is, if we use really, really high-quality data, can we still train to that? And the answer, and I can send you the papers, I don't have them in these slides, is basically yes. The ANI authors actually did this later. The problem is that acquiring the really high-quality data ends up being so expensive that you can't acquire very much of it, so you don't get the same generality. I think one of the big open questions is basically how you balance getting enough data versus getting good enough data.

And we can talk about that; I have a slide on fine-tuning later. Basically there are a few ways to approach it, but it's something you have to think about pretty carefully in general, I'd say. Yeah, good question.

Okay, so where are we at now? This is my summary, so, you know, take it with a grain of salt. In my mind, there are two big buckets of neural network potentials right now. And this field has blown up in the past two years, by the way. This is probably one of the hottest, if not the hottest, areas in computational chemistry and materials science. I'm clearly biased, but I think it probably has the most activity, because I have to read all the frigging papers.

So there are materials-science-focused NNPs. This is things like MACE-MP-0. A bunch of the big tech companies have gotten involved, so MatterSim, GNoME, OMat24, and then some startups like Orbital Materials. These are usually trained on the entire periodic table: a lot of inorganic materials, a lot of battery-looking stuff, a lot of perovskites, etc.

And then the other category is these bio-organic NNPs, which are much more like ANI: you're focusing on carbon-based molecules, water, stuff like that, much more the things that would be in biology. And you usually can't support a random lead or iron atom thrown in there. So these are different training data constructions: if you have a given amount of training data, do you put all your points into different-looking organic molecules, or do you try to spread them evenly throughout the periodic table? And they end up being good at different things.

The scale today is big for chemistry but pretty small for ML: usually 200 million learnable parameters or fewer, and about that many DFT structures. It might seem weird to have a one-to-one ratio of parameters to training structures, but it actually isn't, because each DFT structure carries a ton of per-atom information. If you imagine what the token equivalent is, there are many more tokens than 200 million, but that's how many structures there are in general.

And I think it's worth saying there's just a ton of progress in this field, and over the next 12 months I'm sure this list will probably almost completely turn over. Yeah, pretty fun. And, you know, one of the big things here is just that it's a zoo; people have made jokes about how hard it is to keep up. So benchmarking and figuring out what's good for given applications is an incredibly valuable and useful activity, as with sort of all other fields of ML right now.

So I want to share one story on benchmarking, just to show what this can look like, and just because it's the work that we did and I got invited to speak, so we'll talk about my benchmarking work; there's a lot of good benchmarking work out there. One of the questions for these models is always: how generalizable are they? If you give one a weird-looking molecule, will it just freak out? Will it start hallucinating? When do these things break down and stop being reliable?

And so one of the ways we tested that, in collaboration with the Gair lab at Michigan State University, was we took these three pretty simple molecules that have a bunch of different structural features in them, and then we just tormented them. We ran metadynamics and extracted the weirdest possible poses we could find. What this gives you is huge differences in bond distance, up to over 10% of the bond length, which you never see ordinarily; big differences in bond angles; big differences in dihedrals; and really massive relative energies. So these are very, very unhappy and strange structures that are pretty likely to be underrepresented, or not represented at all, in the training data. And we compared our set, which we called Wiggle150, in green, to the conventional conformer sets in black. You can show that their dihedrals are all very close to zero relative to the ground state, so we're really creating much weirder geometries.

And what we find, and there are a ton of different methods on this slide, but it's essentially another version of the same kind of plot: time, with fast on the left and slow on the right, and RMSE, with good on the bottom and bad on the top. What we see, if we ignore the green, which is the neural network potentials, is that force fields are fast and bad, semi-empirical methods are medium good and medium slow, and DFT is good but very slow.

And what we can actually see now is that if we look at all of these green dots, some of them are not great, but some of them are really, really good. So this is where we want to be, right down here, where we're fast and accurate. Specifically, these two are the organic-focused models we studied, and these three are the materials ones. For this task, the organic ones do well and the materials ones do much worse, and that kind of makes sense, because it's an organic task. If we looked at a materials task, the materials ones would do well and the organic ones would do poorly.

So you do sacrifice some generality, at least with today's models. But in exchange, we're like eight orders of magnitude faster than a comparable conventional method: the conventional method here would take about 10^4 seconds, and this takes like 10^-4 seconds. That's a pretty good speedup, I'd say. And it's part of why this field is so exciting. Yeah, you have to check that you're doing everything right, but you can legitimately speed something up by eight orders of magnitude at the same accuracy. You really don't see that very much in science, so it's worth taking seriously when you do.

Ok, so I think maybe now is a good time to move over to what this actually looks like in practice. How do we do this? I thought we could look at this solid-state crystal benchmark set, because it's some of the cleanest solid-state benchmark data out there, and then look at MACE-MP-0, which is one of the most commonly used materials models, and try to reproduce that data. Does that sound good to everybody? Any questions before we switch gears a little bit?

Very well, that's great. I'll take your silence as acceptance. Ok, so here. Oh man, all right, Google Colab is what I was trying to use for this; I usually don't use Google Colab. All right, so here's a link to the data. This benchmark set is X23: a set of molecular structures and solid-state structures, and the task is to try to predict the lattice energy, which is, I don't know, a pretty good test of how well you're capturing the stabilization from crystallinity. Also, comparing lattice energies lets you find the right polymorph, so it's a pretty important crystal structure prediction task, both in pharma and in various inorganic materials type stuff. So let me share my screen again.

All right. So, we'll be using ASE for this, the Atomic Simulation Environment, which is probably the most common materials science simulation package. And then here we're going to look at two different NNPs. There's mace-torch, which is the pip-installable package for the MACE-MP-0 model, and then there's the orb-models package, which is from Orbital Materials. These are just two of the common foundation models for materials simulation. If we install these, it all kind of works; you need pynanoflann for orb-models, but they didn't include it in the requirements for whatever reason. And then we can import all of our stuff from ASE and from these models.

Okay, and so here's sort of what this is going to look like. Again, apologies if you guys haven't used ASE before, but what we're going to do here is essentially: we take in a molecule, an isolated molecule, we optimize that and get the energy. We take in a solid, a crystal structure, and optimize that, including the box volume, and get the energy from that. And then we take the energy of the solid minus the energy of the molecule, accounting for however many molecules there are per cell, to make sure we do our math right on the energies. And that will return both the lattice energy and the cell volume. So, sort of, how tightly things want to be together and how big the cell ends up being.
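The energy bookkeeping described here fits in a few lines of plain Python. This is just an illustration of the arithmetic with made-up energies, not code from the workshop notebook:

```python
# Sketch of the lattice-energy bookkeeping: take the energy of the
# optimized solid, divide by the number of molecules per cell, and
# subtract the energy of the optimized isolated molecule.
# All numbers below are invented for illustration; units are eV.

EV_TO_KJ_PER_MOL = 96.485  # 1 eV per particle is about 96.485 kJ/mol

def lattice_energy(e_solid, e_molecule, n_molecules_per_cell):
    """Lattice energy per molecule: E_solid / Z - E_molecule.

    A negative value means the crystal is bound relative to the
    isolated molecule.
    """
    return e_solid / n_molecules_per_cell - e_molecule

# Toy example: a cell with 4 molecules and total energy -41.0 eV,
# versus an isolated-molecule energy of -10.0 eV.
e_latt = lattice_energy(-41.0, -10.0, 4)
print(e_latt)                     # -0.25 eV per molecule
print(e_latt * EV_TO_KJ_PER_MOL)  # about -24.1 kJ/mol
```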

And what's nice about this benchmark is there are actually really good experimental values for all of these compounds. And so this ends up being pretty easy to check and see how we're doing. So what we're going to do here to load in a neural network potential is take in this ASE calculator object, which I actually left out of the docstring. There we go. So, you know, ASE calculators can be anything. It could be DFT: it could be VASP, it could be Quantum Espresso.

But in this case, almost all neural network potentials ship with an ASE calculator interface. And since they're pre-trained, you actually don't need to do anything. You basically just load in the model. And then when you call inference on the model, it returns you an energy or a force. And it's basically just the same as working with DFT, except it's way faster.
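That uniform calculator interface is the whole trick: the driver code doesn't care whether energies come from DFT or an NNP. Here is a duck-typed mock of the shape of that pattern. MockNNP and MockDFT are invented stand-ins, not real packages, and real ASE calculators have a much richer interface than this:

```python
# Duck-typed sketch of the calculator-swapping pattern described
# above. MockNNP and MockDFT are invented stand-ins; the point is
# only that the calling code is identical for either backend.

class MockNNP:
    def get_potential_energy(self, atoms):
        return -1.23  # pretend NNP inference: instant, approximate

class MockDFT:
    def get_potential_energy(self, atoms):
        return -1.21  # pretend SCF cycle: slow, reference-quality

def run_study(atoms, calc):
    # The driver never needs to know which backend is underneath.
    return calc.get_potential_energy(atoms)

atoms = "H2O"  # placeholder for a real structure object
for calc in (MockNNP(), MockDFT()):
    print(type(calc).__name__, run_study(atoms, calc))
```

Swapping backends is then a one-line change, which is exactly what the demo does later when it switches from MACE to Orb.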

Ok, and this is the folder of sample data. I'm just going to rename it X23, just so I don't have to change any of my scripts. So now what we should have here, if you ignore all this other junk, is a folder called X23. It has a folder called mols, which has the 23 molecules, and solids has the 23 solids. Boom.

Okay, yeah, so this is the function. I mean, it's really not very complicated. You just optimize both of them. The one thing, and I'm not a huge periodic-calculations guy, I do mostly drug discovery, is you have to use this FrechetCellFilter. And you can choose two different optimizers, quasi-Newton or FIRE. For whatever reason, some work better than others with different neural network potentials, because the gradients end up being a little noisy relative to DFT. And so you have to choose somewhat noise-tolerant methods.
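The noise-tolerance point can be illustrated with a toy version of the FIRE idea: keep a velocity, accelerate while the force points downhill, and kill the velocity when it starts pointing uphill. This is a simplified sketch of the concept on a 2-D quadratic bowl, not the full FIRE algorithm or ASE's implementation:

```python
import numpy as np

# Toy sketch of the FIRE ("fast inertial relaxation engine") idea:
# steer the velocity toward the force while moving downhill, and
# restart from rest when the force opposes the motion. This mild use
# of inertia is part of why FIRE tends to tolerate slightly noisy
# NNP gradients better than quasi-Newton line searches.

def fire_minimize(grad, x0, dt=0.1, alpha=0.1, steps=200):
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        f = -grad(x)  # force is the negative gradient
        if np.dot(f, v) > 0:
            # Moving downhill: mix the velocity toward the force.
            v = (1 - alpha) * v + alpha * np.linalg.norm(v) * f / (np.linalg.norm(f) + 1e-12)
        else:
            # Moving uphill: stop and restart from rest.
            v = np.zeros_like(x)
        v = v + dt * f
        x = x + dt * v
    return x

# Quadratic bowl with its minimum at (1, 2).
grad = lambda x: 2.0 * (x - np.array([1.0, 2.0]))
print(fire_minimize(grad, [0.0, 0.0]))  # converges to roughly [1. 2.]
```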

So it's a little bit of trial and error. I think later generations of models are going to fix that, but for now, there's a little chaos. So what this is, here we just enumerate how many molecules there are per unit cell. So if we actually look here at what the molecules are, the unit cells have varying numbers of things in them, and we just have to account for that when we call the function. So then what we do here is we say calc equals mace_mp.

So that loads in the MACE-MP model. So it's mace-torch on GitHub, just to show you guys. There are whole directions around this, and you can set different options here if you want to, but the default mace_mp sort of has sane defaults. And this loads in the MACE-MP-0 model. We call our function and then we just click run, and it starts to work. So this is tracking the energy, the maximum force to check for convergence, and the step. So this is quasi-Newton, so it's doing a little BFGS line search.

And what I think is just noteworthy about this is, one, it just works. So this is running on literally a potato in the cloud, and we're still getting multiple steps per second. And if you've ever done periodic DFT, or non-periodic DFT, each step here could easily take like 10 minutes or an hour. So this is just remarkable, how fast this is. And I mean, I really think that's the beauty of neural network potentials.

So if you were going to run a benchmark set like this with DFT and you wanted to actually do it properly, you'd need a high-performance compute cluster. You would need a bunch of resources. If you did it on AWS, it'd cost you a bunch of money. And here, literally, again, we have this potato in the cloud, and yet it still works. We've already done the first three values in our benchmark set. You could do this on your laptop. You could probably do this on your phone or your smart fridge or whatever; it requires very few resources. And what that lets you do is actually do big MD studies on large systems, which is awesome.

All right, it's going to take a sec to get through all 23 of these. Any questions while this is running?

Ameer: No questions on our end. No questions.

Corin: All right. No one wants to take pity on me.

Alright, well, we can look at the results as they come out then. So here are the first few. So this is energy, lattice volume, and then time. So it's taking like 20, 23 or so seconds to run each one. I have the reference values here; I scraped them from the literature. The paper is easy to find. But let me actually delete the ones that I had checked earlier.

Audience Member: So I have a question. So, yeah, we're feeding it 23 structures, and it's going to just output three parameters here, volume, E, and time, right?

Corin: Yeah, that's right.

Audience Member: Ok, so it's not being trained. It's already been trained.

Corin: Yeah, so this is sort of the beauty, at least. What we're going to find out through this is how well does this work out of the box, versus would we need to fine-tune it? And I think that's worth asking for any application, right?

I mean, this is always true with computations. You always need to benchmark them. It's even true for DFT, and it's also true for this. You need to compare carefully to what you want to do and see: is this tool going to waste my time, or is it going to give me something I can work with? Will this give me insight? Will this help me get somewhere? But for this, you know what's nice is there's sort of a menu of pre-trained ones out there. They're all a little different, so they have different architectures. We haven't talked about architecture at all, you'll notice, and that's because the architectures are wildly different.

And I'm very much a pragmatist, so I'm not much of an ML theory guy. I understand some of it, but I just care about what works. So, you know, some are equivariant tensor operations, others are global attention, others are message passing, blah blah blah. Different featurization, different cutoff distances. All sorts of strategies out there are being tried in parallel. And what's nice about being an applications person is you just try them all on what you want to do and then you pick the one that works best.

And then if none of them work well, then you try to build a better one.

Audience Member: Ok, yeah, that was my next question. It seems more of an art than a science.

Corin: Yes and no. A little bit. I mean, picking good computational methods, I suppose there's a little art to it. But it's quite literally science, too, because you run the experiment to see how good it is, and then you follow the data and pick the best one. So, I don't know, same as screening electrodes or something.

We do host some benchmarks here at Rowan, just to try to keep track of what's good. So some speed benchmarks, some quality benchmarks for different models. These are some thermochemistry benchmarks here, where we compare errors on different properties across models. So this might be helpful. We're always trying to think of new benchmarks to add, whatever is useful. Another one is Matbench Discovery, which is sort of a different thing; they benchmark different properties, but they do a bunch of benchmarking there as well. So there is data you can use to make these decisions instead of just vibes.

Ok, let's see where we're at data-wise. Ok, some progress. So if we go here, we can start maybe drawing some preliminary conclusions even before we have all of it finished. So split the columns. Boom. Ok. So if we look, we have two values essentially that we have from the reference. And so we have what we expect the reference energy to be and then what we expect the reference volume to be.

So how big does the unit cell end up being, which obviously tells you what the density will be and some measure of the non-covalent interaction strength. And then there's how strongly are these things predicted to bind. And what we see here is actually is kind of interesting. So we're seeing the volume is predicted to be pretty much too big across the board. So we're about 61% too big in the unit cell volume, which is actually a pretty substantial difference.

And then the binding energy is strikingly overestimated, it seems like, if we look at this. You know, where this carbon dioxide is predicted to have a lattice energy of seven, here we've got 11. And it's predicted to have a volume of 41 cubic angstroms, and here we're at 55 cubic angstroms. So I think it's safe to say it's not horrible, but it's not amazing. This is not a fantastically performing method, I'd say.

So we'll crunch through all of these sooner or later. Some of these are just like a crazy overbound. So ammonia, right, is like almost double the lattice energy here, which is, I think, kind of interesting.

Ok, for the sake of time, I might pause this before we run through the whole set so we can look at another one. Is that ok with you guys?

Audience Member: Ok, so we are dealing with gases, but we have lattice constants, right? So how are those lattice constants determined? Obviously, if we decrease the lattice constants, the interactions will be higher and the energy will increase. If we increase the lattice constants, the interactions will be weaker and the energy will decrease. So where is that number, that lattice constant, coming from?

Corin: Yeah, well, so, yes. Part of the problem with running things through Colab is it's actually pretty hard to visualize the outputs. But if you look at what's happening here, we actually are doing an optimization on the cell parameters, on the lattice constants, if I understand what you're saying properly.

So we're not just optimizing the positions of the atoms, we're also optimizing A, B, and C, the very dimensions of the cell, based on the stress. So things want to be together a little bit, but they don't want to be too much together. And so we update the Hessian and try to find the right value for the size of the cell, essentially. And things that overestimate or underestimate weak interactions, we'll see that show up in the predicted cell volume. So that is, in some sense, a probe of how accurate these things are. Does that answer your question?

Audience Member: Oh, yes. And so you have like four molecules inside the box?

Corin: Yeah

Audience Member: How were those numbers chosen?

Corin: Um, let me just show you. The paper talked about it a little bit. This is, I think, one of the best benchmarks. So these are what the cells look like, visualized. I can load them into Rowan later if we want, to see what the unit cells are. And these are just experimentally measured values for these structures. So this isn't a computational benchmark. People took experimental data and tried to actually measure the lattice energy for these crystal structures. So this is just what we have data for.

Ameer: And the name of that paper just to show it for the video recording?

Corin: "Revised values for the X23 benchmark set of molecular crystals," from Physical Chemistry Chemical Physics. I'll send it in the chat. Thank you. Yeah, there are others, so you can do a lot of stuff. You can benchmark thermal conductivity. Pretty much anything you can think of and find data for, you can usually find a benchmark for.

This is just one that fits nicely within a Colab and doesn't involve a ton of MD. So, you know, another thing you can do is liquid densities, right? So you create a box of liquid, you run an NPT simulation where you keep the pressure at one atmosphere and the temperature at room temperature, and you try to match. You know, we know water has a density of 1.00; when we run a box of water with this model, what density do we get out? Or what about 50/50 water-ethanol? Or what about an ionic liquid? Or what about uranium at 1000 Kelvin?
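Once an NPT run gives you an average box volume, the density check itself is just unit bookkeeping. Here's a sketch of that arithmetic, with an invented box volume chosen to land near water's experimental density:

```python
# Density from a simulated liquid box, as described above. The box
# volume here is an invented illustrative number; in practice it
# would be the average volume from an NPT trajectory.

AVOGADRO = 6.02214076e23  # molecules per mole

def density_g_per_cm3(n_molecules, molar_mass_g_per_mol, volume_angstrom3):
    mass_g = n_molecules * molar_mass_g_per_mol / AVOGADRO
    volume_cm3 = volume_angstrom3 * 1e-24  # 1 cubic angstrom = 1e-24 cm^3
    return mass_g / volume_cm3

# 216 water molecules (a common box size) in roughly 6465 cubic
# angstroms should come out near the experimental 1.00 g/cm^3.
print(density_g_per_cm3(216, 18.015, 6465.0))  # about 0.999
```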

You know, so you can do all this stuff to try to find where things break and don't break, just the same way as you can with DFT. Like, what's the boiling point of this? What's the freezing point of this? The problem is just that some of these things are easier to put in a little function and loop over than others. So trying to run like 300 picoseconds of liquid MD simulation and converge that in a live demo is just a little bit harder, if that makes sense.

All right. Let's see where we're at. And if it's not done, we'll probably just cut it off around here. Oh, hey. Look at that. Ok. So I'll copy those in in a sec. What I wanted to do is just show how easy it is to use a different calculator. So here we loaded in MACE-MP-0 from that repo; now we've loaded in the Orb calculator from the Orbital Materials repo.

I'll just show the repo, and I can send the link in the chat as well. This is Orbital Materials. They put their model out there. It's free, it's open source, and it's good for academic and commercial use. So you just swap calc equals ORBCalculator, and then you click Run. And Orb is actually notable because it's significantly faster. So Orb is a speed-optimized model, so you can do things like MD for longer time periods. And so this should be a little faster to run.

Yeah, so I mean, I think one of the themes that I want to get across here is like if you're doing computational stuff, basically it's like a little bit too easy not to try this. Somehow we skipped some numbers in the middle there. I don't know what happened. We went from 12 to 17. That's Ok. It's probably a bug in my code. I'm going to worry about it later.

I think the point comes across pretty well. So if you do any sort of computational stuff, you should definitely be looking at these, because when it works, it's just extremely good. And even if it doesn't work for the final answer, it can help you clean things up or get reasonable results much faster than a conventional simulation, or scale to things that a conventional simulation could not. Ok, oh yeah, so here are some Orb results. We're almost done here, so we'll just let it finish, and then we'll see who does better and if either of them is any good at all in an absolute sense.

Yeah, and I promise the next demo I do will be much more visual, because this is really just staring at numbers on a screen. Yeah, it's kind of like the Matrix. It leaves something to be desired from the intuition-of-chemistry perspective.

I wonder if Google stopped maintaining Google Colab or something. Oh, there we go. I feel like I've had a not very pleasant time working with it over the past day.

All right, so this is, did we skip one here? We skipped 18. All right, yeah, so I think when it can't converge, it's just skipping it. We could add a smarter convergence check, but really, this is going to be fine for the sake of this. All right, so we'll say here, this is MACE-MP-0, this is orb-d3-v2. Ok, and so what do we have? What's our conclusion here?

So if we go to the bottom, these are our reference energies for the lattice energy, and this is the reference volume in cubic angstroms for the cell. Oh, sorry, this is the volume per atom; this is the volume per cell, just divided by the number. But it doesn't matter. And so here we can see that MACE, if we just ignore these rows that didn't run, I think these are even counted as zeros in this formula, but MACE overall has like a 13.

Ok, next plan. Let's delete all of the ones that didn't run, because they're going to mess up our comparison. Boom. Did that work? Yeah. Ok. So yeah, we'll delete these. All right. Debugging on the fly. Fantastic.

So we have here, for lattice energy, a mean error of 11 versus a mean error of like 24 for Orb. So the energies seem pretty not amazing here. In particular, this one could not be worse: cyanamide. For those of you keeping track at home, that's this molecule. Yeah, cyanamide. Something about this molecule Orb really, really does not like, and we're way overestimating the lattice energy.

So this is sort of what I mean, where you want to check that the system you're studying is normal, because you do sort of see these hallucinations with today's models sometimes. On the flip side, the volumes from Orb are really, really good. So even though the energies are bad, the volumes are pretty solid. And we'd obviously want to do much, much bigger tests than this, but it depends on what you care about.

So if you're trying to screen for a specific cell volume or structure, this might be decent. I'd say neither of these is fantastic at this task, because this is really more of an organic task; it's all these non-covalent interactions. But there is also a big speed difference here if you look. So the average time for MACE, we can just put that formula in, is like 24 seconds, whereas here it's four seconds. So Orb is like 5x faster, which about matches what they say in the literature. And that's kind of cool.

Ok, so questions about this demo at all?

Ameer: No, I don't. None here.

Corin: Yeah, these results are, like, fine. They're not amazing. This is, I think, a task I chose bad models for, but what it does show, I think, is how easy it is to plug this in. Especially using these foundation models, stuff pretty much just works: you plug it in. You have to benchmark it carefully, but you don't have to be a deep ML expert with access to a battery of GPUs to deploy this stuff. So this was just running on CPU and it's still very fast. Obviously it'd be faster on GPU.

So what was I going to say next? Where am I at? Oh yeah, I wanted to show briefly. This is not an advertisement for my company. My point here is not to talk about Rowan, but I did want to show we do have like a web interface for all this stuff, which makes it a little bit more visual. And so you guys can still see my screen, right? Yeah, Ok, so one of the things that we make it easy to do is sort of like try to run and visualize all this stuff. Like just like live, sort of in real time.

So we can do like, this is for instance, like a MOF optimization that we did to try to get the lattice constants for this MOF that I pulled from the materials bench. So you can see if we click play, like we load in the structure and it like, it was pretty close. It's pretty good. So it goes down like 20 kcals in energy. And like the cell volume increases a little bit. You know, we can, one of the nice things here is you can pretty much pull in like whatever you want, because you have support for the whole periodic table.

Um, so this is like a nickel arsenide structure, and you can run the optimizations in, you know, a minute or so; this took a minute and 21 seconds. And then you can also do some significantly fancier stuff. So, for instance, this was a scan we ran for this carbon monoxide binding to a copper surface with various hydroxyls decorating it. I have no idea if that shows up well, but this is sort of the structure.

So you have this like carbon monoxide here. You have all these hydroxyls decorating the surface of this copper slab. Then what you want to do is study: how does this actually happen? We want to form a formic acid looking thing here. How does this work? We want to oxidize this carbon monoxide. We can actually run the scan with the neural network potential and show it binding and then hopping over. We can try to estimate from this CO bound minimum here.

you know, up to here, we estimate like a 16 kcals per mole barrier for sort of the highest energy point here. You know, if you think that's sort of a weird structure, like maybe this is a better transition state mimic, but this is, you know, like 10 or 12 kcals. So 11 above the minimum. And this is like, I'm not trying to say that this is like the world's most elegantly done calculation. I was just doing random stuff the other day to sort of try out our copper slab handling.

But this whole thing takes under 20 minutes and can be done like sort of just for free, it doesn't take massive compute or anything. And it lets you quickly, I think, evaluate and look at geometries and structures in a very reasonable way. And so it's just to show how easy this is. And this is mainly enabled by the people who make the ML potentials. We just visualize it.

We can upload a file. So we can go to upload. I remember I had some CIF file lying around. Yeah, so this is a random cadmium MOF I pulled from the Materials Project. Yeah, that's cadmium. And so we can go here. We support a lot of different neural network potentials. So we can say, let's do this one with Orb, I guess. Actually, let's do MACE; I think it's more robust. And then we can say, let's optimize it. Let's optimize the cell constants here. We'll submit it as a job in the cloud, and then it starts running.

And so if we refresh the view here, we'll start to see steps pile in as they finish running. I think just being able to view these things is pretty nice too, frankly. There we go. So you can see there are little things sort of flickering in and out on the sides of the cell. We see that the cell volume increases a little bit, and the energy goes down a little bit. And one of the other things we can do in these optimizations is track key bond distances. For instance, if we think this contact is important somehow, we can visualize how this length is changing along the optimization, if we care about these structural parameters.

Yeah, so we've done five steps now, and this has taken about 45 seconds. So you can do the math. If you've done computations like this where you have to spend days, being able to do things this fast is pretty awesome. Yeah, so here we started from an experimental crystal structure, so it's pretty close to convergence already; it only takes six steps. But I think this is just the real power of neural network potentials.

Any questions on any of these shenanigans here?

Audience Member: Yeah, I'll ask. So, you know, let's say you've done this optimization step with a neural network potential. What would you then suggest afterwards? Do you then take maybe a much more sophisticated functional and do, you know, a single-point calculation? Or should there be maybe further optimization steps, to see if there's any massive change and if it all goes wrong?

Corin: So, I mean, I think one of the difficulties, and one of the things I think is underappreciated about materials science as a field (I don't know why I'm just looking at myself on the screen), is that the diversity and litany of things that you care about doing is so much larger. Coming from the drug design world, there, people want to do the same four things over and over and over again. And so you can get hyper-specific on what people actually want to do.

In materials science, what people want to do for polymers versus batteries versus OLEDs versus catalysis is all very, very, very different. So it's tough to answer that in the abstract, I'd say. But in general, I think people use this in two ways.

One is, if you're not sure if this is good, or you don't think these results are going to be reliable or accurate enough for your application, then you can use this as a pre-screening or pre-filtering step. So this is used a lot, for instance, in crystal structure prediction. Right now, this is starting to make it into the big companies, where if you're evaluating a ton of different structures or polymorphs, you do a pre-filtering step like this, and then you proceed to go on with the big guns. So you can essentially save the big DFT guns for when you actually need them.

The other way is if you're confident in your area of simulation that you have a model that works really, really well, let's say you've fine-tuned it or you've just run a ton of really good benchmarks, then you can use it as a high-throughput calculation thing where you don't need to go back and recheck the results.

But either way, I think it can still be pretty useful: as a pre-filtering step that saves you money but gets you stuff at the same accuracy, or as something that gets you stuff much faster than you would otherwise. I think both can matter. And, for instance, just the ability to optimize stuff, to bind things. For instance, here I just wanted to stick this CO to the surface of this copper.

And, frankly, I'm not very good at drawing. So just the ability to say, I'll run a scan and watch what happens as it binds, and watch the surface relax. I could then choose this for a DFT calculation, but I wouldn't have to do all of these essentially trivial steps on my massive, expensive computer where I'm waiting in the queue to get time. I can just do this on my laptop or wherever and essentially get these results for free. And then when I need the big results, I'll go get those. So I think there are actually a lot of places where this can be useful in workflows.

Does that answer your question?

Audience Member: Oh yeah, for sure, for sure. I just, you know, I've heard about the functionals and things like that. And I've thought, should I throw these composite functionals with the Gaussian basis sets at my periodic metal structures? I don't know, is that too different? Because normally you just do a periodic plane-wave calculation and go with that. But there's not a nice pre-optimizer step. I mean, maybe I just make the energy cutoff lower and that goes a bit faster, or I use fewer k-points for my periodicity, but sometimes the gains in speed aren't actually that great. These are questions, and I don't know if there's an answer yet, because I think this field is still so developing that there's not a well-established "Ok, if you want to go a bit faster for high throughput, this is what you do."

Corin: So I do think the future of high throughput is neural network potentials, right? I think we'll always want these hyper-accurate physics-based methods; I don't think the goal is to get rid of those, but we don't always need them. And, for instance, when you reduce the cutoff, or in molecular land, where we just use smaller Gaussian basis sets, we make them as small as we can. I don't have this data on this plot, but essentially you can't really get it that fast; you just start increasing the error. The curve of speeding up DFT essentially bends like this on the plot. Whoa.

So you sort of hit some limit where there's only so fast you can make it before the errors just get horrible. And so if you do want high throughput stuff that's still somewhat accurate for any simulation task, I think the ML surrogate function, the NNP approach is significantly better. It seems, at least right now, it seems like that's the direction things are going. So let me maybe just run through any other questions.

Audience Member: I saw that you can do frequency calculations with these as well. How good are they, or can you not?

Corin: Yeah, so, I mean, you get frequencies, or phonons, right from the second derivatives of the energy, so the Hessian. And, you know, an underrated thing about these is you can get that through backprop. Getting Hessians from DFT is very, very difficult, but getting Hessians through backpropagation, you know, it's slower than a single inference call, but it's not that slow in the grand scheme of things.

So it works. We haven't done phonon benchmarks for materials; other people have done those, and they're on Matbench Discovery. A link I probably should send in the chat, huh? What we have done is the vibrational IR frequency benchmarks. And there we find that the good neural network potentials are about as good as DFT. Not maxed-out DFT, the best you can buy, but average DFT; we're about as good as that for vibrational frequencies. And so that does seem to work.
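The frequencies-from-second-derivatives point can be sketched with a toy potential where the answer is known exactly. With an NNP you'd get the Hessian via backprop; here it's finite differences on a two-mode harmonic energy, with unit masses assumed so the mass-weighting step drops out:

```python
import numpy as np

# Vibrational frequencies come from the Hessian, the matrix of
# second derivatives of the energy: diagonalize it (mass-weighted;
# here, unit masses) and take square roots of the eigenvalues. The
# toy energy below has two uncoupled harmonic modes with spring
# constants 1.0 and 4.0, so the angular frequencies are 1.0 and 2.0.

def hessian_fd(energy, x, h=1e-4):
    """Central-finite-difference Hessian of a scalar energy function."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (energy(x + ei + ej) - energy(x + ei - ej)
                       - energy(x - ei + ej) + energy(x - ei - ej)) / (4 * h * h)
    return H

energy = lambda x: 0.5 * (1.0 * x[0] ** 2 + 4.0 * x[1] ** 2)
H = hessian_fd(energy, np.zeros(2))
freqs = np.sqrt(np.linalg.eigvalsh(H))
print(freqs)  # roughly [1. 2.]
```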

I think there was a paper on arXiv actually yesterday evening exploring training to Hessians, so using Hessian fitting in the loss function as an interesting way to try to get robust neural network potentials, from Justin Smith and coworkers at NVIDIA. So it's worth keeping an eye on this space, I think.

Audience Member: Thank you.

Corin: Yeah, so what I want to talk about now, just to close this out, and then maybe do a bigger Q&A thing, is: so what now? What are we missing? What's this going to look like, and what's practically useful for people in different walks of life? So we did a few little tiny demos just to show where things are. A question that we've all been thinking about is fine-tuning, right? What does this look like if the models don't work out of the box, like the stuff we looked at didn't work out of the box for lattice energies? What do we do?

Ameer did want me to do a fine-tuning demo during the presentation, and I said I thought that was too ambitious, but he was absolutely right that this is super, super important, and so it definitely deserves talking about. Yeah, so basically, fine-tuning, if you guys don't know, is just: you take a pre-trained model, like one of these big foundation models, and then you say, let's train it again on a specific subset of data. And the idea is you need much less data than if you just train something from scratch.

The model sort of already learns chemistry, what bonds are, et cetera. It just needs to specifically study for this specific application. And what people have shown is, here, this is a benchmark on one of the polymorphs of ice, ice Ih. I forget which one that is. And essentially it shows that if you're training a MACE model from scratch, to get the correct density and energy, it takes you hundreds of examples.

That's not many for a foundation model, but for a specific molecule, that's a decent number; you'd have to be running hundreds of DFT calculations. Whereas if you're fine-tuning from a pre-trained MACE, only a handful of examples already pretty much gets you to where you want to be. So with 50 examples, you get the density exactly right and you get the potential energy almost right. And at 100, you're there, and you don't need any more data. So I think fine-tuning is a better solution than training your own bespoke models from scratch.

There's code for this, so it exists and it's pretty easy to do. This is from the MACE docs, mace-docs; you can get to it from the GitHub I linked, and there's just an executable to fine-tune on your dataset. You do usually want to do multi-head replay so that you don't forget everything you learned previously, so they do recommend that. But yeah, if you have a GPU and you have some data, you should definitely consider fine-tuning a model.
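For reference, the fine-tuning entry point in mace-torch is a command-line script. The invocation below is an illustrative sketch: the flag names are written from memory and may not match your installed version, so check the mace-docs fine-tuning page for the exact options before running anything.

```shell
# Illustrative sketch of fine-tuning a MACE foundation model on your
# own data. Flag names are assumptions; consult mace-docs for the
# exact, current options.
mace_run_train \
    --name="my_finetuned_model" \
    --foundation_model="medium" \
    --multiheads_finetuning=True \
    --train_file="my_dft_data.xyz" \
    --valid_fraction=0.05 \
    --max_num_epochs=50 \
    --device=cuda
```

Here `--foundation_model="medium"` would start from the MACE-MP-0 medium checkpoint, and the multi-head option is the replay mechanism mentioned above.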

I think it's a really robust way to do it. And then you have more confidence that stuff works for your specific domain.

Audience Member: So when you feed it data, do you just feed it that one best locally optimized structure? Or do you also give it some of the steps leading up to it, like the trajectory file? Does it also want some of the not-so-good structures?

Corin: I think an important thing with neural network potentials is that they're much more robust when they see structures that are not just minima. There's been a big move over the past, like, maybe six months to try to do non-equilibrium fitting, where you show it structures where the forces aren't all zero, because it's just much more information-rich. So yeah, I think the trajectory file would be good. Another thing you can do is sort of just randomly shake things up. So you add Gaussian noise to every atom, compute all those calculations, get the forces and energies, and charges if you want, and then fit to those as well. So that the model learns, like, oh, if these two get close together, they get really unhappy. And this leads to much more stable simulations, sort of more robust, more fault-tolerant.
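The "shake things up" step can be sketched in a few lines of plain Python; the water geometry, the 0.05 Å noise width, and the function name `rattle` are all just illustrative (ASE's `Atoms.rattle` does essentially this for real structures):

```python
import random

def rattle(coords, sigma=0.05, seed=0):
    """Return a copy of coords (a list of (x, y, z) tuples, in Angstroms)
    with Gaussian noise of width sigma added to every Cartesian component."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]

# A rough water geometry (Angstroms); each perturbed copy would then get
# a single-point DFT calculation to produce training energies and forces.
water = [(0.000, 0.000, 0.119),
         (0.000, 0.763, -0.477),
         (0.000, -0.763, -0.477)]
perturbed = [rattle(water, sigma=0.05, seed=s) for s in range(10)]
```

Each perturbed structure is slightly off-equilibrium, so its DFT forces are nonzero, which is exactly the information-rich data the fitting benefits from.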

Audience Member: Yeah, okay. I think for, like, adsorption, some people just literally bombarded it randomly and fed that to a neural net. And I know the third iteration of Meta and Carnegie Mellon's hackathon was specifically just globally optimized structures. So I don't know how they ever did it, but yeah, there's been different discussions about that.

Corin: I mean, it's possible. I think if you're going to be doing optimizations or you're going to be doing MD, you're going to be dealing with structures that are not at minima. And I think it's important to try to consider all of this through the information-theory lens: am I asking my model to learn things that it just has no way of learning? I think we understand that if you only show a model certain elements, like if you don't show it any metals, and then suddenly you start asking it to predict on lead or iron complexes, right? It's not going to be able to succeed at that. You're asking too much. Even the best ML algorithm in the world with infinite parameters, I don't think, can just extrapolate to totally unseen elements where it's never seen anything like that before.

I think there's a similar argument where, if you've only seen perfect bond geometries, perfect angles, everything just as it should be, and then you start running MD, it's not surprising that you see pretty poor performance in those cases, because the information content of the training data doesn't prepare it for those tasks.

Audience Member: Ok, thank you.

Corin: That's a very qualitative explanation. Someone, I'm sure, could quantify that. But that's what makes sense to me, and I think is borne out by the data.

Ok, so I think it's just worth backing up and like trying to fit this into a conceptual model a little bit. So two slides on conceptual model and then two slides on future directions and then I promise I'm done.

One is just, like, you know, for me, I was doing simulation for years before this was possible, and, like, how does this work? We're essentially going from having to wait weeks to get an answer to now saying, oh, just put it on your laptop, you get an answer right away. And sure, in some cases it's not great, but that's true for DFT as well. It seems like we're able to approach the DFT limit, or at least like it should be possible with more data. I think this breaks people's brains a little bit, especially the people who have been doing computation for a long time.

I think it's worth trying a metaphor here. This is the Python logo; this is the C logo. We know that Python, right, is an interpreted language, so you run a Python script, it's pretty slow, and then you get a result. With C, obviously you have to first compile it, which is usually slower than just running the script once would be, if it's a simple script. You generate some assembly code, right? But then once you've done that, the compiled code is blazing fast to get the right answer. So hundreds, thousands, 10,000 times faster.

I think we can think of neural network potentials in a similar way. If we just have a single structure that we care about, we can solve the Schrödinger equation, or an approximation thereof with DFT, right? And then we can get an answer. So here's our energy, whatever this is in eV; I don't know what this means, but it's just an example of an answer you get from DFT. This could take a few hours. When we do a neural network potential, we're actually doing something that's way, way slower. We first have to run like 100, 1,000, 10,000 calculations to generate training data. And then we have to train a model. And we get just a bunch of parameters, right? A GNN, like you guys have been dealing with.

But then what we've essentially done is we've taken all the information that was in the DFT and we've compiled it into this thing that we can then infer on really fast. And so that is, for me, a pretty useful mental model for thinking about how all this is done. It's not magic. We just run a bunch of DFT calculations at once, so we don't have to run them individually every single time, and then we essentially learn some representation of all of that knowledge in the parameters and are able to infer over that.
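To make the compile-then-amortize picture concrete, here's a back-of-the-envelope sketch in Python; every number is a made-up placeholder for illustration, not a benchmark:

```python
# "Compiling" DFT into an NNP: a big one-time cost, then cheap inference.
# All timings below are invented placeholders, not measured values.
dft_seconds_per_structure = 3600.0   # assume ~1 hour per DFT single-point
nnp_seconds_per_structure = 0.1      # assume ~0.1 s per NNP evaluation
training_set_size = 10_000           # DFT calls spent generating training data
training_seconds = 48 * 3600.0       # assume ~2 days of GPU time to train

upfront_cost = training_set_size * dft_seconds_per_structure + training_seconds
saving_per_call = dft_seconds_per_structure - nnp_seconds_per_structure
break_even_calls = upfront_cost / saving_per_call
print(f"NNP pays for itself after ~{break_even_calls:,.0f} evaluations")
```

With these placeholder numbers the break-even is around ten thousand evaluations, roughly the size of the training set itself; past that point, every additional structure is effectively free compared to DFT, which is why the one-time "compile" is worth it for high-throughput work.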

Another way to think about this: usually, when we try to do approximate models like semiempirical theories or force fields, it's theory first. So we think of some approximation, like, oh, what if we modeled torsions as a cosine, or, oh, what if we do a 12-6-4 non-covalent thing? And then you try it out, you run some simulations, and you see if it works. So you might say, oh, well, this works fine for this, but we get the lattice constants wrong, blah, blah, blah. And then you say, well, what if we tried a different theory? So you think of the theory, run simulations, and see how it works.

With NNPs, it's almost the exact opposite. So first, we take the data that we want to model, and we don't come up with a theory at all. Instead, we just train an NNP to match that data. So instead of being theory first, it's data first. And I think this lets us come up with much, much more flexible and efficient domain-specific theories, almost, that we would never have been able to come up with through manual parameterization.

So if you want to train an NNP that just handles ionic liquids, you can do that, whereas it's very difficult to think of what a DFT functional that only works on ionic liquids would even be. It's really not easy to think about what that looks like. But just by choosing the data we want to model, we can come up with these hyper-efficient, hyper-specific models that have really, really good performance. And, you know, it seems like there are limits to all of these models; we can't do, like, crazy, massive metal clusters or electronic excited states.

And I think that's actually why we're able to get such good performance. So it's not like there's a free lunch; it's not that physics is just bad. It's that we're able to apply a really powerful inductive bias through selecting our training data, and that's what gives us such good efficiency. And I think this really does tie in with a big idea in machine learning, one which is maybe getting a little bit too famous.

But it's this idea of the bitter lesson, right? AI researchers always want to build knowledge into their methods. Whenever you encode some piece of your personal knowledge, it always helps in the short term and you feel smart. But then later on you realize that scaling is always the better solution; encoding expert knowledge doesn't scale as well as just adding more data. And so we can think of conventional force fields as adding our expert knowledge about bonds and non-covalent interactions and charge terms and all this. And NNPs are the contrasting solution, where we just choose a flexible architecture and dump in a bunch of data. And that actually lets us scale to things we could never do before. And so I think that's actually a pretty, pretty cool…

So where are we at? What are the big challenges for the field? I think one is just that things aren't accurate enough, so they should be more accurate. I think we'll get there. But, you know, solvent interactions aren't great. Big bulk materials are sort of still a challenge. Things that aren't covalently connected, like the non-covalent interactions in molecular crystals, are sort of tough. And everything could just be more accurate.

Sort of a more philosophical question is how local atomistic simulation is able to be. One of the things that we traditionally think of as a long-range effect is electrostatics: charge-charge interactions in electrolytes, et cetera. This is something which some models handle explicitly and others don't. So in some you actually learn, like, a charge function, and in others you do not. I think it's a big open question whether that's necessary, or whether, if you just scale learning, you can implicitly learn that as well. This is really, really hotly debated in the literature right now.

Point three: NNPs are a lot faster than DFT, but they're still not as fast as conventional force fields. And so it's very difficult to scale to huge systems like proteins, polymers, big boxes of liquid, et cetera. A ton of people are working on this: specialized GPU kernels, coarse-graining, better architectures, quantization, distillation, et cetera, et cetera. There are probably four papers published on this in the past week that I've seen. So this is an open question. I think it'll be solved, but I don't know how.

Point four is very pragmatic, which is that in this field we can actually generate simulated training data: we can run DFT calculations. So we have the luxury that we can generate data in silico. How do we do that most strategically? Like Ameer was saying, do we want to do trajectories? Do we want to do random perturbations? Is it better to have more poses of fewer structures, or one pose for a lot of structures? Is it better to have a ton of low-quality data or a tiny amount of very high-quality data? And how do we mix in experimental data? If we have experimental densities or frequencies, how do we train to that?

So this is sort of fun. I think another question is, how do we combine these? I think of NNPs as physics-ish methods, because they act a lot like physics and they reproduce a lot of the behavior of physics. Clearly they're not fully physical, because it is an ML model. But it's different from, like, MatterGen or diffusion models, which are much less physical, I think. And these two things are, I think, in many ways very complementary. So how do we combine them for various tasks? Can we generate trial structures with a diffusion model and then score them with an NNP? I think this is something that is going to demand a lot of focus and attention.

And then I think the biggest question for people like you guys is just creativity. We're able to make a lot of simulation tasks a million-plus times faster now. What do we do with that? This is something that people have been dreaming about literally for 100 years in the quantum physics, quantum chemistry regime. What do we do with this? This is such a fantastic time to be in this field.

And I think one of the things, sort of our answer to that here at Rowan, is that we want to build, like, the Microsoft Word or the SolidWorks for chemistry. You know, back in the day, you needed a big computer to do word processing or these sorts of computational tasks, and it was very clunky and specialized. And now we have these user-centered applications where everyone can have this on their laptop; it's just part of how you interface with the world. In our case, we want to take computational chemistry off the supercomputer and make it something that any scientist can do for their research. So that's sort of our vision of what we're trying to build with Rowan, and why we're so excited about NNPs, because they actually make things fast enough for us to do this.

But I think there are many things to be excited about beyond this, and I'm curious to talk about that or hear what you guys think. Yeah, anyhow: DFT is everywhere, and more people should be able to do it and use it for their research. So yeah, we're a team of five; this slide is more appropriate for an academic talk. I'm really curious, in the remainder of the time we have, just to discuss this, hear what you guys think, and sort of dialogue about how this can be useful for you guys and where.

So that's all I prepared. Okay, I guess we're clapping. Does anybody have a question? I'll let the audience…

Ameer: Okay, yeah, I guess I'll go. For me it's always been the high-throughput screening. I'm in lithium-sulfur, so our adsorbent is what's called polysulfide. The issue with that is it's, you know, exotic. If you look at Meta's big database that they made with Carnegie Mellon, they did CO2, CO, oxygen, ammonia. So the people in those fields, you know, they had a field day; they got it all done for them by a big company.

But for the rest of us in adsorption, you know, maybe if you're in water treatment you've got some chemical runoff from agriculture, like, we're kind of on our own here. And yeah, the models learn a lot, but you probably have to fine-tune, because you're probably extrapolating or something. And you just don't know how to navigate that field.

And I don't feel confident to just publish this and be like, yeah, this is how I did it. I think it's worth showing, but I don't know if people will say, well, this is not legitimate, this is not standard protocol. And it's just like, well, what is the protocol, right? When you're going into an exotic, you know, metal slab, or like a periodic structure. So that's been one consideration.

Corin: Well, I mean, I think there are two ways that we think of interfacing with computation, right? And one is much more difficult to think about from the publication perspective. One is, the computations come at the end of the project, where you learn something from them that you want to publish as, like, figure four or whatever. That was always the joke in organic reactions: you develop the reaction, and then you send it to Ken Houk, and he makes you your figure four so it gets into JACS. You know, you've already finished the project; you just need some computations to make it look good. And I think you actually can learn a lot that way. I make jokes, but it can be useful. And then you need to be really, really careful about how you run the calculations, how you benchmark it, all that stuff. And you want to adhere to best practices, because the calculations are not easily testable.

I think when you're doing calculations for high-throughput screening, it's very different, because in some ways you're the person who needs to care most about whether they're correct. If you find a new hit from a virtual screen, you almost don't even need to publish it. You can publish the method if you want to; it does make you look smart. But fundamentally, you're the person who you need to not be tricking, not the reviewer. Because if you find a good result, then, you know, it works. It's already done what you wanted it to.

Right. So, yeah, I think you can look at various papers that have done fine-tuning; there is a consensus around this. You probably want to run good DFT calculations on the systems you care about. r2SCAN is probably what's emerging as the standard, or PBE, right, with the appropriate cutoffs and everything. You make a set of those, you wiggle them around, you choose some candidate structures, and then you compare the error against that. And then you can fine-tune and show that the error relative to the reference approaches zero. And then, I think, calculations are never 100%, right? The true proof is always in the experiment. But I think at that point, you've been responsible, in my mind. Yeah.
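That validation loop, benchmark a pretrained model against your own DFT references and check that fine-tuning closes the gap, boils down to comparing mean absolute errors. A minimal sketch, where all the energies are invented placeholders:

```python
def mae(pred, ref):
    """Mean absolute error between predicted and reference energies."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

# Hypothetical energies (eV) for five held-out structures you care about,
# with the reference computed at a good level of theory (e.g. r2SCAN).
reference  = [-10.20, -9.80, -11.10, -10.50, -9.90]
pretrained = [-10.50, -9.40, -11.60, -10.10, -10.40]  # foundation model, out of the box
finetuned  = [-10.25, -9.78, -11.12, -10.48, -9.93]   # after fine-tuning on similar data

print(f"MAE vs DFT, pretrained: {mae(pretrained, reference):.3f} eV")
print(f"MAE vs DFT, fine-tuned: {mae(finetuned, reference):.3f} eV")
```

If the fine-tuned error approaches your reference's own uncertainty on held-out structures, you've made the case that the model is trustworthy for that domain, which is the "responsible" bar described above.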

Audience Member: The next thing, too, is the big tech players and stuff; everybody's focus seems to be on just generating massive swaths of periodic crystal structures, or 2D crystal structures. Surely they'll make a million of them and then say, okay, there's a good 40, 50 here that are good candidates for some application. But are we moving towards models that are more applied, more for adsorption, more for, I don't know, another application? I find the focus is to make all these crystals, and I'd say you're making too many crystals; we need a bit more application of it. I don't know if that's the next thing.

Corin: I mean, I'm inclined to agree with you, right? Frankly, most of the people wandering into this space from Big Tech don't really care about the applications that much. The reason why Google or FAIR Chemistry focuses on this, from what it seems to me, or Microsoft AI, whatever their team is called: I don't think it's that Google plans to open up a materials-discovery business. Maybe that's what they're working on, but that seems unlikely to me. I think it's that they see AI as a core competency for these big tech companies, which in many ways it is, and they want to make sure that they're on top of AI in every field and that they're learning. It's like a gym for them. They're going to work out at the AI gym to make sure that when they optimize the YouTube ads next, they get it as good as possible.

So, yeah, maybe this is just wrong, but I do think a lot of it is that the reason this stuff gets funded at Google is that it's sort of interesting, and it seems hard, and it's good to have smart AI people working at Google, and you have to keep them doing something when they're not doing the YouTube ads. So, might as well.

So part of the question is, what does the future of this look like? Is there going to be a lithium-sulfur model fine-tuned for that in the future that's super applications-focused? That's one answer. Another answer is maybe the models will just get big and really, really smart, and then they'll start to work for everything. They'll be as good as DFT at virtually everything once we hit the 1 billion or 10 billion parameter mark, and then we won't need specialized models. I think we don't know. What we are seeing is very analogous to what we're seeing in the LLM space.

So I think if we watch the LLM space, we can imagine the dynamics playing out here as well. You know, fine-tuned small models are always better than the non-fine-tuned small models. But then while someone is fine-tuning a model, someone else is making a bigger model, and then the bigger model is just as good as the fine-tuned small model and you don't have to fine-tune it. So again, I think we don't really know how that's going to finish playing out. Big models are good; Google also dropped their distilled small coding model, Gemma 3, today, a couple hours ago, and it seems really, really good. So it's sort of tough to know.

And I think the same thing will be true here. My guess is you'll see a spectrum. There'll probably be, like, a protein-specific model just for proteins that's really small and really good at proteins, so you can run your mega-long protein structures. And then you'll have some Swiss-army-knife models that are good for everything. And maybe some researchers will have hyper-specialized models just for their own specific area of work.

Ameer: Yeah, I think, or at least I heard, they started an actual physical laboratory where they're doing stuff for CO2. So they have their own personal end application in mind, at least for one project. So that is interesting to see. I don't think anybody else has any questions, so yeah, I'll ask one more. I don't know if you've seen it in your medicinal chemistry; I keep seeing it here and there, and there was this MRS AI event last week. People are doing these quantum variational autoencoders to train, I think, a neural network potential. And for me, I'm like, that's at least five-plus years ahead; I'm not going to bother yet to learn quantum computation. I don't know, maybe it's closer than I think. What have you seen?

Corin: Yeah. So, me personally (I don't want this to be the opinion of Rowan, or of anyone who matters), but me personally, I put myself down as pretty skeptical of the quantum computing applications I've seen. I'll share a blog post I wrote where I laid out all my thoughts with sources. A lot of people say, oh, quantum computing is going to be great for drug discovery, it's going to be crazy, it's going to be so good for drug discovery. And then you sort of look at the things they actually say, and you're like, wow, this doesn't seem like it matters that much.

I think, fundamentally, a lot of the quantum computing applications make things much slower and somewhat more accurate. But in many cases, at least in drug discovery, the accuracy of DFT is already not the issue; it's the speed and the scale. Yeah, you can get 50 atoms right, but what in drug discovery is 50 atoms? Nothing. It needs to be big. You want to model proteins, you want to model membranes. You don't want to model 50 atoms. So NNPs, for me, are the direction of maybe slightly less accurate but much faster, and that to me is a much, much more interesting direction scientifically.

I don't know about the quantum ML stuff, like you were saying, quantum variational autoencoders. I've seen a couple of presentations on this and I've never been that impressed, because regular ML is just pretty good. Usually the speed of training is not the problem; it's the data, and the quantum training stuff doesn't get you more data.

So at least for science, I think we're so often data-limited that I don't really see throwing qubits at the variational autoencoder as giving you that much edge. But maybe people will think of some clever stuff and I'll look stupid. I try to keep an eye on the space, but I have yet to see anything that makes me think, oh wow, that's awesome.

Ameer: I'm sort of similar. I just look at it, and their circuits look like sheets of music or whatever, and I'm like, where are we going with this?

Corin: It's cool. It's undeniably cool science. Reading it all, it's really impressive. I just don't really see the urgent point, at least for chemistry. I don't really see the application; it doesn't seem to me like it's a billion-dollar opportunity. I do think, on the variational autoencoder point, that representation learning for molecules and materials is pretty underrated.

This is not an NNP task, but trying to figure out better, data-rich representations of things, I think, is still underrated and deserves more attention. It's something we've bounced around here at Rowan, trying to figure out how to do that well. We don't have any genius ideas yet. One of the things people have talked about is taking an intermediate layer of an NNP and using that as a representation, as an embedding for a molecule. And I think even taking a step back: interpretability. What are these things doing? What can we learn? If an NNP is learning chemistry, can we crack its head open and figure that out at all? People have done that a bit in the protein language model space, but not yet in the NNP space. So I imagine people are working on that, and I'm excited to see what they come up with, because it might be cool.

Audience Member: Thank you for presenting. Can you hear me?

Corin: I can

Audience Member: You mind showing your website and kind of doing like a quick preview?

Corin: Yeah, for sure. Let me let me do that.

Audience Member: Is it free to use?

Corin: It is free.

Audience Member: No way, nothing is paid?

Ameer: Well, there is a premium, but you can also get more credits as a student. That's what we do.

Corin: Yeah, so we try to give a lot to academics. We can't give away unlimited free compute, because that's a great way to run out of money as a business, and letting people run unlimited DFT on your website is an expensive proposition. But we have a ton of graduate student users, and most of them never hit their limits. So we're trying to give away as much as we can without running out of money, basically.

So this is labs.rowansci.com; I'll put the link in here. You can just sign up, or sign in with your Gmail, so you don't have to give us your password or anything if you don't want to. Not that we would steal it, but you know.

And we have a lot of essentially different workflows here that you can pin or do things with. We can predict redox potentials, bond strengths, spin states, solubility. This is kind of a fun one, actually.

Ameer: This is new by the way.

Corin: Oh yeah, this is pretty new. Oh no, hold on, I clicked the wrong button here. This is so new I can't remember how to do it.

So we have an ML model here for predicting solubility, which is kind of cool. And the idea here is it's sort of supposed to be like a Google Drive for computational chemistry, to make it easy enough to use that you don't have to be an expert in the software; you just have to know what's going on with the chemistry. So we try to document everything we're doing here in various log files and make it transparent and reproducible for the world.

I'm trying to think what else. Oh yeah, you can store everything in folders. So these are the calculations for the workshop that I did just in preparation, and this is what the transition state looks like for this one. The goal is that it all just sort of works: it's fast, it works out of the box, and you don't need to waste time trying stuff out or fighting your software.

Yeah, I'm trying to think what other demos… you can tune the visuals, you can do all sorts of stuff, you can load and edit 2D and 3D structures. Oh, yeah. So here's the predicted solubility for this molecule in a variety of different solvents. It's supposed to be soluble in ethanol and not very soluble in hexanes, which makes sense. And this works on polymers too, for the polymer lovers out there.

I think that's my ultra-fast demo. You can go to our website; we have video walkthroughs for a lot more stuff if you do want to dig in, and we have a documentation site where we document how everything works, for SIs and stuff. So we are doing things in a way that we hope is rigorous and reproducible.

Audience Member: Ok, thank you. Appreciate it. Looks very cool. I have one other question. You mind going back to your slideshow to show off your team?

Corin: Oh yeah, for sure. Wow, so many questions that I was not expecting; I'm trying to make sure this is useful and not just talk about me and our company the whole time. But yeah, our team is five people. So here we are, or here some of us are. This is my co-founder Ari, who handles the business, front-end, and graphic-design side. Jonathon, who was at Schrödinger's materials science team, a former polymer chemist. Spencer, a software engineer from Meta, who keeps everything working. And then Eli, who's in charge of training our own models and the whole ML side of things. And then this is me in my short shorts. I do mostly the chemistry and drug-design-facing stuff.

Yeah, and we've had the privilege of working with a bunch of collaborators on various things. So Merck, MIT, Harvard, Michigan, Michigan State, Colorado State. So we're working on a lot of collaborative projects with various academics trying to benchmark or develop new models. So yeah, just trying to build useful tools and stay on top of the space. And hopefully one day make something that helps people do science better.

Audience Member: Thank you. Appreciate it.

Corin: For sure. We also work with classes; a lot of classes are using us now to help students run calculations. So if you're a TF and you want to do that for a p-set or something, happy to email and we can try to set that up. I don't know if that's relevant at all for what you guys do, but, you know, it's kind of nice.

Ameer: Yeah, we have a couple of courses, or one or two graduate modeling courses, for that.

Corin: Yeah, I know we partnered with the MIT physical organic course. It's just kind of nice. I mean, I think computations are in, like, over a third of papers right now, at least in JACS, the Journal of the American Chemical Society. When I was in school, we basically didn't cover computations at all, and then when you start doing research, you're like, why does every paper have computations in it? I was taught virtually nothing about this.

So I think part of it is that it's tough to integrate, right? Because it's tough to teach someone to use VASP or Gaussian or whatever in, like, a week. And so, you know, Rowan at least solves the problem of being really, really hard to use. You can give an undergraduate who's never run computations before a link to one of our YouTube videos, and they can figure out how to do it, which I think is cool. And ideally it works as a virtual lab a little bit. I think you do conceptually learn a lot about chemistry when you start having to run calculations, because you actually have to think about where all the atoms are.

Which sounds stupid, but you don't always have to think about that when you're just drawing things on paper. You can have a little bit of magical thinking, and computation sort of holds a mirror up to that and forces you to think everything through in detail.

Ameer: Okay, I think that's it for questions. I don't see any questions from the virtual audience; nobody's raised their hand. I know a lot of people don't talk. I think they're experimentalists, so they don't want to ask; they're scared of looking like they don't know.

Corin: I mean, that's the whole point of Rowan: to make software that experimentalists can use. Because I did a PhD with experimentalists; my lab was like 50/50, and they couldn't do any calculations. So our hope is that we can fix that someday, right?

That's the whole point: you guys are very smart people. I can't see your smiling faces, but I'm sure they look very smart. You should be able to use tools for your research without someone yelling at you or you having to learn how to program in Fortran. So that's the goal.

Thanks so much for the time, and feel free, anyone, to reach out. I'll send my email in the chat, or Ameer can give it to you. I'm happy to talk to anybody, answer questions, or just bounce ideas around. So yeah, if I can be useful, please let me know.