Running Rowan's Descriptors Calculation Workflow

Transcript

Hi, I'm Ari, one of the founders of Rowan, where I lead product and strategy. In today's video, I'm going to be talking about Rowan's descriptors calculation workflow. We'll go over how to submit a workflow, what the workflow does, and how you can use the results, both inside Rowan and exported for use in further projects.

Now descriptors, at a very high level, are a way of generating a feature vector for your molecules so that you can run data science and machine learning workflows on them. And we think that this is especially good when you're working with properties that are hard to predict with physics-based simulations.
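As a rough illustration of what "a feature vector for your molecules" means, the sketch below turns a few hand-entered descriptors into fixed-order numeric rows. The molecules, the descriptor names, and the `to_feature_vector` helper are illustrative assumptions; they are not taken from the video or from Rowan's output format.

```python
# Illustrative only: a handful of well-known alcohols with hand-entered descriptors.
# The descriptor names below are hypothetical labels, not Rowan or Mordred identifiers.
MOLECULES = [
    {"name": "methanol", "molecular_weight": 32.04, "heavy_atoms": 2,
     "hbond_donors": 1, "hbond_acceptors": 1},
    {"name": "ethanol", "molecular_weight": 46.07, "heavy_atoms": 3,
     "hbond_donors": 1, "hbond_acceptors": 1},
    {"name": "ethylene glycol", "molecular_weight": 62.07, "heavy_atoms": 4,
     "hbond_donors": 2, "hbond_acceptors": 2},
]

# A fixed column order is what makes rows comparable across molecules.
FEATURE_ORDER = ("molecular_weight", "heavy_atoms", "hbond_donors", "hbond_acceptors")


def to_feature_vector(molecule, order=FEATURE_ORDER):
    """Return the molecule's descriptors as a list of floats in a fixed order."""
    return [float(molecule[key]) for key in order]


# One row per molecule, one column per descriptor: the usual input shape
# for data science and machine learning tools.
feature_matrix = [to_feature_vector(m) for m in MOLECULES]
```

Each row of `feature_matrix` can then be passed to whatever modeling tool is in use, as long as every molecule is described by the same descriptors in the same order.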

So to submit a descriptors calculation workflow on Rowan, we'll go ahead and click "descriptors calculation" and we'll upload the files that we're going to be looking at today. For this example, we're just going to look at this super basic set of alcohols. You can see that there's nothing fancy here, and that these structures loaded in all right.

And so we'll just go ahead and submit these six molecules. We can submit them all at once, and they'll start running. And what's going to happen is that Rowan uses Mordred to generate some molecular-graph and conformer-dependent features, as well as xTB to generate some per-atom descriptors.

These are small molecules and the descriptors workflow is super fast, so it's already finished running. We'll go ahead and hide the sidebar, and we can view these results. So at a glance, you see that Rowan gives you this table view. You can sort by any descriptor. You can look and see what the descriptor actually calculates. And of course, you can filter.

I think we should look for hydrogen bond donors today. So I'll just type in "hydrogen bond." You can search, and we can see that there are two features related to hydrogen bonds. That's this hydrogen bond acceptor feature and this hydrogen bond donor feature. When we're working with these features, often we'll want to take them and use them somewhere else. And so you can always download all of this data as a CSV and load it into whatever software you're using for data science or machine learning.
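Once the data is downloaded, the same "hydrogen bond" search can be repeated in a script. The sketch below is a minimal example using only Python's standard library. The CSV layout, the column names (`nHBDon` and `nHBAcc` follow Mordred's naming, but the exported header may differ), and the `DESCRIPTIONS` lookup are all assumptions made for illustration, and the rows are stand-in values rather than results from the video.

```python
import csv
import io

# Stand-in for an exported descriptors CSV; the real export's columns may differ.
CSV_TEXT = """name,MW,nHBDon,nHBAcc
methanol,32.04,1,1
ethanol,46.07,1,1
ethylene glycol,62.07,2,2
"""

# Hypothetical human-readable descriptions keyed by column name.
DESCRIPTIONS = {
    "MW": "molecular weight",
    "nHBDon": "number of hydrogen bond donors",
    "nHBAcc": "number of hydrogen bond acceptors",
}


def search_columns(query, descriptions):
    """Return the column names whose description contains the query text."""
    needle = query.lower()
    return [col for col, text in descriptions.items() if needle in text.lower()]


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per molecule."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = load_rows(CSV_TEXT)
hbond_columns = search_columns("hydrogen bond", DESCRIPTIONS)

# Keep only the matching descriptors for each molecule.
hbond_table = {
    row["name"]: {col: int(row[col]) for col in hbond_columns}
    for row in rows
}
```

For a real export, `io.StringIO(csv_text)` would be replaced by an open file handle, and the descriptions would come from wherever the descriptor documentation is kept.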

One cool thing though is that inside of the Rowan platform, you can run principal component analysis on a library of descriptors that you've calculated. And PCA is a statistical method that projects our data onto the directions along which it varies the most. So we're looking at principal components 1 and 2, and that gives us this nice XY plot. And it spreads our data out so we can see similarities and differences and make sure that whatever library we're working with covers chemical space nicely.
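For working with exported descriptors outside the platform, a two-component PCA can be sketched as follows. This is a minimal pure-Python version (standardize, build the covariance matrix, extract two components by power iteration with deflation) and is not a description of how Rowan computes its plot. In practice a numerical library would be the usual choice. The descriptor matrix is illustrative: molecular weight, heavy-atom count, and hydrogen bond donor count for six common alcohols, not the molecules from the video.

```python
import math

# Rows: methanol, ethanol, 1-propanol, 1-butanol, ethylene glycol, glycerol.
# Columns: molecular weight, heavy atoms, hydrogen bond donors (illustrative values).
DESCRIPTORS = [
    [32.04, 2, 1],
    [46.07, 3, 1],
    [60.10, 4, 1],
    [74.12, 5, 1],
    [62.07, 4, 2],
    [92.09, 6, 3],
]


def standardize(matrix):
    """Center each column and scale it to unit variance (constant columns become 0)."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
            for row in matrix]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def top_eigenpair(sym, iterations=500):
    """Dominant eigenvalue and eigenvector of a symmetric matrix via power iteration."""
    d = len(sym)
    v = [1.0 + i for i in range(d)]  # arbitrary non-uniform starting vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(sym[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    sv = [sum(sym[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, sv)), v


def pca_2d(matrix):
    """Return (scores, components, variances) for the first two principal components."""
    z = standardize(matrix)
    cov = covariance(z)
    d = len(cov)
    components, variances = [], []
    for _ in range(2):
        lam, v = top_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the component just found so the next one is orthogonal.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in z]
    return scores, components, variances


scores, components, variances = pca_2d(DESCRIPTORS)
```

Each entry of `scores` is the (PC 1, PC 2) coordinate of one molecule, which is what an XY plot like the one in the video displays. The entries of `components[0]` indicate how strongly each descriptor contributes to the first axis.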

And so you can see that one thing that's really separating our compounds in this example is just this atom bond connectivity index. And if we want to see the actual values for each of these data points, we can click on this feature and it'll color the points by that value. And by hovering over the points, we can sort of visually inspect and get a sense for, okay, what's contributing to the variance in my data right now? And if we want to look at the number of hydrogen bond donors, again, we can do that too. And we'll see the data colors nicely. And, you know, with this basic example, of course, this isn't super impressive, but I think it is really cool. And if you want to work with these PCA coordinates outside of Rowan, you can download the data as a CSV right here again.