Conformational Optimization#

In this tutorial we will cover:

  • how we can optimize conformations in BuildAMol

When building a molecule we usually prefer having conformations without clashing atoms. While BuildAMol tries its best to avoid clashes during fragment assembly, sometimes they still happen, especially when merging larger structures together.

To get rid of them again, BuildAMol offers a suite of optimization techniques to obtain better conformations for your molecules. BuildAMol optimized molecules by rotating around bonds within the structures. This ensures that the resulting conformations are never skewed and can be faster than classical “atom wiggling” techniques. The downside of it is that small changes in rotational angles can have large effects on the evaluation of a conformation, which can be challenging to optimize for some algorithms.

We will go through how we can optimize molecules in BuildAMol and discuss some good and bad practices.

Let’s work on the following molecule:

[1]:
import plotly
plotly.offline.init_notebook_mode()
[2]:
import buildamol as bam

mol = bam.Molecule.from_json("files/x_wing.json")
mol.show()
/Users/noahhk/anaconda3/envs/glyco2/lib/python3.11/site-packages/plotly/express/_core.py:1985: FutureWarning:

When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.

Overview#

Each Molecule is equipped with an optimize method which can be used to optimize it’s conformation. A number of customizations can be optionally passed to this method, or it can be run without any arguments using default settings. Alternatively, there is the optimizers.optimize function which can be used to optimize a molecule.

Energy Minimization with RDKit#

As a general note, BuildAMol relies on heuristics to optimize conformations and does not perform a direct energy minimization! Naturally, however, an energy optimization would yield more accurate results. Thus, if you have the computational power use RDKit’s energy minimization to optimize your conformers. BuildAMol offers a quick-link the form of the optimizers.rdkit_optimize function or the argument Molecule.optimize(algorithm="rdkit") when using the Molecule’s optimize method to perform energy minimization using RDKit. If for some reason this is not feasable or takes too long, you can rely on BuildAMol’s less accurate but more robust optimization techniques.

[3]:
# setup some visualization to compare before-after
v = bam.utils.visual.MoleculeViewer3D()
v.draw_edges(*mol.get_bonds(), color="red", linewidth=2, showlegend=False)

# optimize with RDKit
rdkit_optimized = mol.optimize(algorithm="rdkit")

# visualize the optimized molecule
v.draw_edges(*rdkit_optimized.get_bonds(), color="blue", linewidth=2, showlegend=False)
v.show()

Optimization without RDKit#

If we run the Molecule.optimize method without any arguments, it will default to using a rotational optimization environment named the DistanceRotatron to optimize the molecular structure. This environment will rotate around bonds and try to find the conformation that will maximize inter-atomic distances. There are three rotatron environments available:

Rotatron Optimization Environments#

Let us talk some more about BuildAMol’s optimization environments. BuildAMol implements a torsional optimization scheme. That is to say, BuildAMol rotates around bonds in the structure until it finds a nice conformation. We implemented three different Environments called Rotatrons to handle this. The three Rotatrons are:

  • DistanceRotatron

    • The universally usable environment that performs reasonably well with both small and very large structures. It uses purely geometric considerations and tries to maximize inter-atomic distances. It uses two hyper-parameters named pushback and unfold. They can be tuned to give more weight to either “globally unfolding/extending” a structure (at the potential cost of incurring local clashes), versus focussing on avoiding local clashes while sacrificing “the big picture” of structure extension. It is usually not necessary to fiddle around with them but but it may be necessary for large and highly branched molecules!

  • OverlapRotatron

    • This environment fits multi-variant Gaussians to approximate the structure and minimizes overlaps between the fitted distributions. It performs well with ResidueGraph inputs. It can be very fast on smaller molecules and can work with large structures but it has a tendency to run into trouble when the gaussians are too far apart because in this case the overlap converges to zero, which makes optimization nearly impossible. To mitigate this effect one can adjust the hyper-parameter artificial_spread which will articially enlarge the gaussians to allow for a longer range overlap computation.

  • ForceFieldRotatron

    • This environment actually uses RDKit to compute a conformation’s energy. This environment is the slowest but very accurate when working with AtomGraphs. If given a ResidueGraph the environment is usually useless, however (after all a ResidueGraph is not a valid chemical structure so optimizing this metric yields no good results)! Technically, the environment works with large structures as well, but keep the computation times in mind!

We can describe the conformational optimization as either of these three problems to be optimized by an optimization algorithm.

Optimization Algorithms#

Four optimization algorithms are available as stand-alone functions. These are:

  • swarm_optimize: global-best particle swarm optimization

  • anneal_optimize: simulated-annealing algorithm

  • genetic_optimize: a genetic algorithm (usually the slowest of all algorithms)

  • scipy_optimize: a scipy-forward to a stochastic-gradient descent optimization (could be changed to any other scipy-implemented algorithm). This one can struggle with very rugged optimization landscapes so it may not perform as well as a particle-swarm.

They will sample rotational angles to optimize a molecular conformation and eventually return one or multiple solutions.

Quick Optimizations using the Rotatron Environments#

The optimize function and method accept arguments for which kinds of rotatron to use and what algorithms to employ when finding solutions. So, if we want to improve the conformation using a DistanceRotatron and a particle-swarm optimization we can simply call on the optimize method and that’s it…

[4]:
# optimize with the swarm algorithm and the distance rotatron
improved = mol.optimize(rotatron="distance", algorithm="swarm")

# and add the optimized bonds to the visualization
v.draw_edges(*improved.get_bonds(), color="green", linewidth=2, showlegend=False)
v.show()

The optimize method is convenient because it does all the environment-setup for us. However, there may be cases where we want or need to be precise in what we want to optimize - maybe we only care about one particular part of a molecule or the molecule is very large so optimizing all of it in full-detail would take a long time. Let’s therefore discuss how we can make a fully custom optimization workflow

Custom Optimization#

The workflow of setting up a custom optimization workflow is not difficult, but involves a number of decisions we need to make.

  1. Decide on a graph representation of the molecule (AtomGraph or ResidueGraph)

  2. Choose bonds around which to rotate when optimizing.

  3. Decide on an environment to use for evaluating conformation candidates

  4. choose an optimization algorithm to solve our environment

BuildAMol does not actually optimize “Molecules” but instead it optimizes moleculars “graphs”. BuildAMol has two types of graph representations for molecules: the AtomGraph (which every molecule uses to handle its connectivity), and the ResidueGraph. The ResidueGraph is an abstraction of the molecule with a subset of nodes to approximate residues’ shapes. For optimization, AtomGraphs are much more precise than ResidueGraphs (after all, ResidueGraphs are always missing some atoms) and thus it is more likely that you may find some clashes in optimized structures that worked with the Residuegraph than with an AtomGraph. However, the ResidueGraph is faster to compute. In most cases a ResidueGraph representation will suffice for getting a good result! If your molecule isn’t highly branched, you may try with a ResidueGraph, you can always switch later on…

We can either select specific bonds that we care about or use methods to obtain subsets of bonds. Both types of graphs have methods that will help you obtain edges / bonds around which to optimize. Of interest are probably these methods:

  • <GRAPH>.find_rotatable_edges: as the name implies

  • <GRAPH>.sample_edges: cluster and sub-sample edges to reduce the number of edges to optimize

  • <MOLECULE>.get_residue_connections: find bonds that connect different residues / fragments

  • <MOLECULE>.get_bonds: get particular bonds (e.g. maybe from a particular residue)

Example: Optimize the ResidueGraph of our Molecule#

Let us optimize the X-Wing molecule using it’s ResidueGraph-representation and the DistanceRotatron again from before…

[5]:
# step 1: make a residue graph
# (IMPORTANT: We must make it detailed to also have some atoms in there.
# Otherwise there would be no valid rotatable edges in the graph...)
graph = mol.make_residue_graph(detailed=True)

# step 2: get some edges to optimize
# (we will just take all residue-connections here)
edges = mol.get_residue_connections()
edges = graph.direct_edges(edges=edges) # important to make sure the edges do not cancel out each other in the optimization!

# step 3: make the rotatron environment
env = bam.optimizers.DistanceRotatron(graph, edges)

# make a new visualization for before-after
v = bam.utils.visual.MoleculeViewer3D()
v.draw_edges(*mol.get_bonds(), color="red", linewidth=2, showlegend=False)

# step 4: optimize the molecule
# (here we use the stand-alone optimize function!)
optimized = bam.optimizers.optimize(mol, env, algorithm="swarm")

# and add the optimized structure to the visualization
v.draw_edges(*optimized.get_bonds(), color="blue", linewidth=2, showlegend=False)
v.show()

Example: Optimizing the ResidueGraph using an OverlapRotatron#

Let us also showcase the OverlapRotatron. This time we will also sub-sample the rotatable edges to reduce the search space even further and thus make the optimization faster…

[6]:
# step 1: make a residue graph
graph = mol.make_residue_graph(detailed=True)

# step 2: get some edges to optimize
# (we filter for edges with a certain number of descendant nodes)
edges = graph.find_rotatable_edges(min_descendants=2, max_descendants=20)

# step 3: make the rotatron environment
env = bam.optimizers.OverlapRotatron(graph, edges, artificial_spread=5)

# step 4: optimize the molecule with the swarm algorithm
optimized = bam.optimizers.optimize(mol, env, algorithm="swarm")

# and visualize the optimized molecule
v.draw_edges(*optimized.get_bonds(), color="green", linewidth=2, showlegend=False)
v.show()

Example: Optimizing with the ForceFieldRotatron#

Finally, let’s showcase the ForceFieldRotatron. We will optimize both the ResidueGraph and AtomGraph to compare the outputs…

[9]:
# step 1: make the graphs
atom_graph = mol.make_atom_graph()
residue_graph = mol.make_residue_graph(detailed=True)

# step 2: get some edges to optimize
# (the residue connections will be in both graphs, so we can use them for both)
edges = mol.get_residue_connections()
edges = atom_graph.direct_edges(edges=edges)

# step 3: make the rotatron environment
atom_env = bam.optimizers.ForceFieldRotatron(atom_graph, edges)
residue_env = bam.optimizers.ForceFieldRotatron(residue_graph, edges)

# step 4: optimize the molecule with the swarm algorithm
atom_optimized = bam.optimizers.optimize(mol.copy(), atom_env, algorithm="swarm")
residue_optimized = bam.optimizers.optimize(mol.copy(), residue_env, algorithm="swarm")

# and visualize the optimized molecules
v = bam.utils.visual.MoleculeViewer3D()
v.draw_edges(*mol.get_bonds(), color="red", linewidth=2, showlegend=False)
v.draw_edges(*atom_optimized.get_bonds(), color="blue", linewidth=2, showlegend=False)
v.draw_edges(*residue_optimized.get_bonds(), color="green", linewidth=2, showlegend=False)
v.show()

Okey, seems like this time the optimization worked even with the residue graph, but don’t trust it! The only meaningful input for the ForceFieldRotatron is an AtomGraph!

Anyway, that’s it for this tutorial on the conformational optimization suite. Hopefully you are now able to optimize your molecules using BuildAMol! Also check out the conformational sampling tutorial which is directly related to what we have discussed here.

Thanks and good luck with your project using BuildAMol!