Building Workflow#

In this tutorial we will cover:

  • the basic “workflow” of building a larger molecule in BuildAMol from smaller components

Building molecules is a straightforward task in BuildAMol. It involves always the same steps:

flowchart TB
  node_1["identify suitable sub-structure A"]
  node_2["identify suitable sub-structure B"]
  node_3["identify what you want to build"]
  node_4["identify atom to connect to B"]
  node_5["identify atom to connect to A"]
  node_6["identify atom(s) to\nremove when connecting to B"]
  node_7["identify atom(s) to\nremove when connecting to A"]
  node_8(("connect"))
  node_9("repeat")
  node_3 --> node_1
  node_3 --> node_2
  node_1 --> node_4
  node_2 --> node_5
  node_1 --> node_6
  node_2 --> node_7
  node_6 ---> node_8
  node_4 ---> node_8
  node_5 --> node_8
  node_7 --> node_8
  node_8 --> node_9

As for steps 1 and 2, finding a structure to build and identifying subcomponents - that’s on you, BuildAMol cannot help you there! As for steps 3 and 4, thats where BuildAMol can start to help you. How do you identify which atoms to connect and which to remove? Well, by looking at the structures, of course!

If BuildAMol has a linkage already defined for your specific task, great! But most of the time we want to build custom molecules for which the CHARMM force field does not conveniently have a patch defined. So we need to define the linkage ourselves. We can do this by looking at the structure, checking out what the atoms we want to connect are called, and then define our linkage. We can either connect molecules using the Molecule.attach method where we can directly provide the information regarding atoms-to-connect and -remove, or we first set up a Linkage which can then use instead.

The advantage of setting up a Linkage first is that we can then re-use the same linkage time and time again, without needing to keep filling in the information manually. We can even add custom linkages to the default settings to make them permanently available whenever we use BuildAMol (checkout the tutorials regarding built-in resources and setting defaults).

Working by example is usually best, so let us build the following molecule:

image.png

We can start by identifying the different components that we might want to work with. For example we can choose these:

image0

With our components decided, we can start to build. The first step is getting the molecular structures we need. PubChem is sure to have the structures available, so we are not going to load any built-in structures.

[1]:
import plotly
plotly.offline.init_notebook_mode()
[2]:
import buildamol as bam

# get the molecules we need
phenyl_methanol = bam.molecule("phenylmethanol")
aldehyde = bam.molecule("butyraldehyde")
amine = bam.molecule("methoxyethanamine")
naphthalene = bam.molecule("1,2,4a,8a-tetrahydronaphthalene")

# because PubChem structures use a simple enumeration rather than hierarchical atom labelling
# we can (optionally) call `autolabel` to assign more descriptive atom labels which is useful for
# defining linkages
phenyl_methanol.autolabel()
aldehyde.autolabel()
amine.autolabel()
naphthalene.autolabel()
[2]:
Molecule(1,2,4a,8a-tetrahydronaphthalene)

Now that our molecules are all ready, we can start building. We start by attaching the phenyl_methanol to the aldehyde. To do this, we first visualize both structures to identify the atom ids of the atoms we want to connect and remove in both cases.

[3]:
# visulize the methanol
phenyl_methanol.draw2d(atoms="id").draw(draw_hydrogens=True)
[3]:
../_images/examples_building_workflow_11_0.png

From the inspection we can tell that the Oxygen is called O7, it is connected to C7 and HO7. C7 has two hydrogens called H71 and H72. That’s all we needed to know at this point. Next up is the aldehyde:

[4]:
aldehyde.draw2d(atoms="id").draw(draw_hydrogens=True)
[4]:
../_images/examples_building_workflow_13_0.png

Creating Linkages manually#

We can tell that the tail carbon is called C4 with its hydrogen atoms H41, H42, H43. The carbon before that was called C3 and it had hydrogens H31 and H32. In fact, the labelling is quite intuitive, and that’s thanks to autolabel. It uses a hierarchical ranking scheme to determine atom labels, starting with the highest ranked carbon and then iteratively labelling lesser carbons and hetero atoms. As a rule of thumb, start with the most connected carbon in a circular structure, that’s C1, then try forming the longest possible chain of consequtive carbons. If you reach a dead-end find the next most connected carbon again and continue from there. Using this scheme, you won’t even need to look at small structures like these to know their atom labelling.

Equipped with this information we can attach the aldehyde to the methanol like so:

[5]:
# define a linkage for use with bam.connect
# (let's say we don't care about stereochemistry and therefore don't specify and atoms to delete.
# In this case just one of the hydrogens of C7 and C3 will be deleted, which one is random; we can control which one's though, if we wanted to).
link1 = bam.linkage(atom1="C7", # the atom in phenyl_methanol to connect
                    atom2="C3", # the atom in aldehyde to connect
                   )

# now connect the molecules
mol = bam.connect(phenyl_methanol, aldehyde, link1)
[6]:
# visualize the result (no hydrogens this time)
mol.draw2d(atoms="id").highlight_bonds(mol.get_bonds("C7", "C3"), color="cyan").draw()
[6]:
../_images/examples_building_workflow_18_0.png

And there we have connected the two fragments by a new bond between atoms C7 and C3. And this is what it looks like in 3d:

[7]:
mol.draw3d().draw_bond(*mol.get_bonds("C3", "C7")[0], linewidth=5, color="cyan").show()
/Users/noahhk/anaconda3/envs/glyco2/lib/python3.11/site-packages/plotly/express/_core.py:1985: FutureWarning:

When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.

Now we can go about attaching a second phenyl-methanol to the structure. This time by connecting it’s oxygen O7 to the C1 of the aldehyde. We will use the method-syntax Molecule.attach this time, as opposed to the function connect. Assuming that we do not actually care much about which Hydrogen atom is getting replaced (i.e. we don’t care about stereochemistry) or if there is only a single Hydrogen atom to replace anyway, we do not actually need to specify any atoms to delete. If we leave them blank, BuildAMol will just look for the first Hydrogen it can find.

[8]:
# make a second linkage
# this time both atoms only have one hydrogen so we don't need to specify which to delete either
link2 = bam.linkage(atom1="C1", atom2="O7")

# connect a second phenyl-methanol
mol.attach(phenyl_methanol, link2)

# visualize the result
mol.draw2d(atoms="id").highlight_bonds(mol.get_bonds("C1", "O7"), color="cyan").draw()
[8]:
../_images/examples_building_workflow_22_0.png

Now it’s time for the amine to join the show. We can savely predict that the highest connected carbon is the one between the nitrogen and oxygen, so we can predict that the N will be called N1 and the O O1. But let’s look at the structure all the same to be sure:

[9]:
amine.draw2d(atoms="id").draw(draw_hydrogens=True)
[9]:
../_images/examples_building_workflow_24_0.png

Phew, lucky we checked the structure. The oxygen was called O3 because hetero atoms are labelled by the highest labelled neighbor. Nevertheless, we learned the next connection will need to be between N1 in the amine and C7 of the second phenyl-methanol in the molecule we are building.

Since we are building “from one end to the other” we do not need to bother with speciying in which residue to search for the atoms to connect, because BuildAMol will by default use the latest residue for attaching new molecules. However, if we do find ourselves working with branched structures, we can always use the at_residue and other_residue arguments to specify which residues to use when attaching different molecules.

[10]:
# connect the amine to the molecule
# (this time we care about stereochemistry and specify that HN1A should be deleted)
link3 = bam.linkage("C7", "N1", delete_in_source=["HN1A"])

# attach and visualize
mol.attach(amine, link3)

mol.draw2d(atoms="id").highlight_bonds(mol.get_bonds("C7", "N1"), color="cyan").draw()
[10]:
../_images/examples_building_workflow_26_0.png

Almost there, the only thing left is the naphthalene. We know the drill by now, so let’s just quickly visualize to find out what the carbon we want to connect is called.

[11]:
naphthalene.draw2d(atoms="id").draw(True)
[11]:
../_images/examples_building_workflow_28_0.png

With the information that we are looking for C6 and its hydrogens, we can finish our structure:

[12]:
link4 = bam.linkage("C3", "C6", delete_in_source=["H62"])

# attach (using another whacky syntax) and visualize
final = mol % link4 + naphthalene
final.draw2d(atoms="id").highlight_bonds(final.get_bonds("C3", "C6"), color="cyan").draw()
[12]:
../_images/examples_building_workflow_30_0.png

Optimizing a molecule#

Alrighty, that is the structure we just built! Admitedly, it looks a bit crammed. So, let’s optimize the conformation a little. Optimizing conformations in BuildAMol is a topic all to its own, so go check out the tutorial on optimizing structures for a better introduction. Here we will just use the Molecule.optimize method with default settings and won’t worry anymore about it…

[13]:
# optimize the conformation
final_optimized = final.optimize(inplace=False)

# visualize the optimized conformation
v = final.plotly(atoms=False, line_color="red")
v += final_optimized.plotly(atoms=False, line_color="blue")
v.show()

That one looks a little better. Let’s check if we have any clashes left in our final structure using:

[14]:
print("clashes before:", final.find_clashes())
print("clashes after:", final_optimized.find_clashes())
clashes before: []
clashes after: []

With that out of the way, we can think of what to do next with our molecule. Perhaps our project involves molecular docking simulations that require a PDB input. We can easily export our new molecule using the to_pdb method or the write_pdb function:

[15]:
bam.write_pdb(final_optimized, "./files/final_optimized.pdb")

With that we have reached the end of this little walkthrough to the basic work-routine of building structures in buildamol. As you hopefully agree, it is quite straightforward to use BuildAMol to create larger structures from small components. Of course, even though we have only used single-residue components in this tutorial, there is (in principle) no limit to what BuildAMol can connect. It becomes especially useful if you build parts of a molecule separately and then simply join them together. With that, good luck in your project using BuildAMol!