In this tutorial we will cover:
how we can optimize conformations in biobuild
When building a molecule we don’t want to have any clashes in there. While Biobuild tries its best to avoid clashes during building, sometimes they still happen, especially when merging larger structures together.
To get rid of them again, Biobuild offers a suite of optimization techniques to obtain better conformations for your molecules. Biobuild optimized molecules by rotating around bonds within the structures. This ensures that the resulting conformations are never skewed and is much faster to compute than classical “atom wiggling”. The downside of it is that small changes in rotational angles can have large effects on the evaluation of a conformation, which can be challenging for some algorithms.
We will go through how we can optimize molecules in Biobuild and discuss some good and bad practices.
Short summary#
Biobuild allows users to customize optimization workflows for better results. We encourage users to do so in order to improve the results. However, users not wishing to go to the trouble can make use of default settings and use short-cuts to get “one-line” optimized structures.
Let’s work on the following molecule:
[1]:
import plotly
plotly.offline.init_notebook_mode()
[2]:
import biobuild as bb
mol = bb.molecule("files/x_wing.json")
mol.show()
Quick Optimization#
Each Biobuild Molecule has an optimize method that we can call. This will perform all necessary steps automatically for you. Naturally, this method is very convenient, but not nearly as precise as an optimization setup that we create manually. Still, for most molecules that are not too large and not too crowded it should perform reasonably well. Also, we can modify the default settings, thus guiding the optimization to some degree.
If we want to improve the conformation we can simply call on the optimize method and that’s it…
[3]:
mol_opt = mol.optimize(inplace=False)
mol_opt.show()
As you can see, the structure is different now. The x-shaped branches are further away now, which is why we like to call this molecule the “X-Wing” (from Star Wars) :-)
Of course that the call to optimize worked so nicely is in part because the molecule is easy to optimize and in part because the default settings are tailored to molecules of this size and complexity. If you have a molecule that is markedly larger of much more branched you may find the output of a blank optimize-call to look less pleasing.
So, let’s check out how we can customize the optimization procedure abit. To that end we will look at a full custom workflow and in the end circle back to how we can use Biobuild’s functions and methods to automate parts of the process.
Custom Optimization#
The workflow of setting up a custom optimization workflow is not difficult, but involves a number of decisions we need to make.
Decide on a graph representation of the molecule (AtomGraph or ResidueGraph)
Choose bonds around which to rotate when optimizing.
Decide on an environment to use for evaluating conformation candidates
choose an optimization algorithm to solve our environment
Graphs#
Biobuild has two types of graph representations for molecules: the AtomGraph (which every molecule uses to handle its connectivity), and the ResidueGraph. The ResidueGraph is an abstraction of the molecule with a subset of nodes to approximate residues’ shapes. For optimization, AtomGraphs are much more precise than ResidueGraphs (after all, ResidueGraphs are always missing some atoms) and thus it is more likely that you may find some clashes in optimized structures that worked with the
Residuegraph than with an AtomGraph. However, the ResidueGraph is faster to compute. In most cases a ResidueGraph representation will suffice for getting a good result! First try with a ResidueGraph, you can always switch later on…
In our example X-Wing molecule we will use a ResidueGraph.
[4]:
graph = mol.make_residue_graph()
# make sure to populate it with a
# subsample of atom nodes!!!!
graph.make_detailed()
graph.show()
Environments#
Biobuild implements a torsional optimization scheme. That is to say, Biobuild rotates around bonds in the structure until it finds a nice conformation. We implemented three different Environments called Rotatrons to handle this. The three Rotatrons are:
DistanceRotatronThe universally usable environment that performs well with both small and very large structures. It uses purely geometric considerations and is the fastes environment to work with.
OverlapRotatronThis environment fits multi-variant Gaussians to approximate the structure and minimizes overlaps between the fitted distributions. It performs well with ResidueGraph inputs. It can be very fast on smaller molecules and can work with large structures but is usually slower than the DistanceRotatron.
ForceFieldRotatronThis environment uses RDKit to actually compute a conformation’s energy. This environment is the slowest but very accurate when working with AtomGraphs. If given a ResidueGraph the environment is usually useless, however! Technically, the environment works with large structures as well, but keep the computation times in mind! It often does not outperform the DistanceRotatron!
For this example we will use the default DistanceRotatron. We can set it up like this:
[5]:
env = bb.optimizers.DistanceRotatron(graph)
If you look at the arguments the DistanceRotatron can accept you will see the second argument rotatable_edges. This expects a list of bonds from the molecule (edges from the graph). We do not need to specify anything there if we want the Rotatron to automatically identify all edges that can be rotated around. However, it is good practice to think of the bonds beforehand, because most likely not every bond will need to be considered during optimization. The fewer bonds to evaluate, the
quicker the optimization!
We can either visually inspect the structure, find parts that we don’t like and manually identify some bonds we want to optimize for, or we use the graphs’ find_rotatable_edges method that allows us to perform some fine-tuning in selecting the edges.
[6]:
# find edges that have at least 10 nodes downstream (nodes that will be affected by rotation)
# when directed so that all edges point away from the central node (if we have the rotate the wrong parts we might implode the molecule!)
edges = graph.find_rotatable_edges(graph.central_node, min_descendants=10)
len(edges), len(env.rotatable_edges)
[6]:
(16, 18)
So, with our manual selection of edges we were able to save two edges to compute, because the Rotatron automatically found 18 edges to optimize, while our own filtering reduced the number to 16. The save is not large, but it can be substantial in larger structures!
Let’s make a new environment with the edges we like…
[7]:
env = bb.optimizers.DistanceRotatron(graph, edges)
Hyperparameters of the DistanceRotatron#
The DistanceRotatron uses an evaluation function that favors extending the structure to maximize spacial occupancy. This is driven by two factors called unfold and pushback. Unfold guides the global unfolding and gives weight to increasing the global distance from one node to all others. Pushback adds weight on a local basis, stressing that close-by nodes must not get too close. There is one more important parameter that we might want to consider, that’s the radius. To save some
computations the DistanceRotatron will only consider node-distances that are within the radius. If the radius is very large additional weight is given to unfold automatically because far-away nodes will impact the average distance. On the other hand, small radii automatically favor pushback.
In fact, the art of using the DistanceRotatron is usually to find a suitable combination of these parameters that work well for your structure! As a general comment, first try tweaking the pushback a little values between 1-5 often work well, but feel free to experiment even with values below 1. Unfold should not be raised too much since it is used as an exponent during computation! Values between 1-4 are probably enough in most cases. As for the radius, values between 10-25 have worked well in our experiments.
Keep in mind, there are default values for all these parameters, so we do not have to set our own. But the moment the molecules we want to optimize become larger and highly branched, we definitely want to tweak around with them. There are also more parameters than the three mentioned above, be sure to quickly read through the docstring of the Rotatron.
Optimization algorithms#
The Rotatron environments hold the problem, not the solution. In order to get a solution we need to apply an optimization algorithm. Biobuild offers four pre-implemented algorithms that can be called directly:
swarm_optimize: A particle-swarm optimization (PSO). This is usually very fast and performs very well even on larger structures.genetic_optimize: A genetic algorithm. This is often the slowest but often performs very well. However, it is often outperformed by the PSO, especially in terms of computation speed!anneal_optimize: A simulated annealing algorithm, for those who like both PSO and genetic algorithms. If is slower than the PSO but performs on a similar (maybe slightly worse) scale.scipy_optimize: A gradient-based optimization through all of scipy’s implemented optmizations. This one sometimes performs really great, sometimes really badly, depending how “rugged” the evaluation space is. The larger the molecule, the less likely you want to use this one!
Let’s say we want to stick with the good-ol’ PSO to optimize our molecule, we can do this like so:
[8]:
# solve the optimization problem
sol, _eval = bb.optimizers.swarm_optimize(env, n_particles=30)
# sol is a numpy array of angles
# _eval is the final (best) optimization score
PS: notice the
n_bestargument. If you provide a higher number, let’s say5, you will get the five best solutions and can thus generate 5 new conformers for the price of running one optimization!
Once we have angles, we need to rotate our molecule itself. We can use the apply_solution function to do this like so:
[9]:
# apply the solution to a copy of the molecule
mol_opt_manual = bb.optimizers.apply_solution(sol, env, mol.copy())
# show the optimized molecule
mol_opt_manual.show()
Combining custom and quick#
So, now we have seen how we can manually set up a complete optimization workflow. In parctice we do not need to perform each step that thoroughly, however. For once, we can use the optimize function to pass a Rotatron we want to optimize, specify an algorithm, and a molecule, and at least the entire “apply-solution” procedure is handled for us automatically. We can also pass Rotatron environments to the Molecule.optimize method as well. If you check out the docstrings of the optimize
function and method you will see how easily we can specify our own settings to fine tune optimizations without needing to setup everything on our own.
As one final example let’s optimize the molecule again but using a higher unfold=3.
[10]:
# optimize the molecule using the DistanceRotatron (default anyway, so actually no need to specify)
# and the swarm algorithm (default as well)
# but this time, pass unfold=3 to the rotatron (default would be 1)
mol_optimized_high_unfold = mol.optimize(rotatron="distance", algorithm="swarm", rotatron_kws={"unfold":3}, inplace=False)
# show the optimized molecule
mol_optimized_high_unfold.show()
So that’s it, it doesn’t really look all that different from the others…
Anyway, this concludes this little introduction to conformational optimization with Biobuild. Of course we only covered the use of one Rotatron environment, but using the others is no different - only hyperparameters change…
Thank you for reading all this way and good luck with your project involving Biobuild!