Getting Started#
Welcome to the first biobuild tutorial, where we shall learn the basics of operating with biobuild!
In this tutorial we will cover:
which (main) classes and modules exist and when to use them
how to read input to make
Moleculeshow to connect two molecules together
Main Classes in biobuild#
The most important class is the Molecule class which houses ~90% of all functionality that the average user is likely to use. Next, comes the Linkage class that defines how multiple molecules can be connected, so the user can build larger structures. Third, are the ResidueGraph and Rotatron environments of the optimizers module - to optimize conformations. Finally, the MoleculeViewer3D may come in quite handy. It is essentially just a plotly 3D-plot but it can be
selectively colored easily so as to highlight specific parts of your molecule - it is a great tool to check if your building process is working (or to debug why it may not be). Finally, the resources module houses a wealth of reference structures for users to use when building their favourite structures.
That’s already about it! Of course, there is a lot more - but everything else is mostly underneath the surface and will most likely not bother the average user directly.
flowchart TB
node_0["core"]
node_1("Molecule")
node_2("ResidueGraph")
node_3("MoleculeViewer3D")
node_4["Optimizers"]
node_6("Rotatron")
node_7{"{}_optimize"}
node_5["Linkage"]
node_8["resources"]
node_1 -.-> node_2
node_1 -.-> node_3
node_0 --- node_1
node_4 --- node_6
node_4 --- node_7
node_6 -.-> node_7
node_2 -.-> node_7
node_0 --- node_5
[1]:
import plotly
plotly.offline.init_notebook_mode()
import biobuild as bb
Resources#
biobuild includes the PDBE component library for small molecules (or at least a pretty large part of it). Since this is quite large, biobuild will not by default load any data, but offers a buch of functions for the user to load what they are interested in using selectively. We can also set defaults that are always loaded if we want to re-use our own custom builds in different sessions.
Available pre-loadable sets are:
# to load the amino acid and derivative compounds
bb.load_amino_acids()
# to load nucleotide and derivative compounds
bb.load_nucleic_acids()
# to load sugar and derivative compounds
bb.load_sugars()
# to load miscallaneous small molecules of less than 40 atoms
bb.load_small_molecules()
Molecules#
Molecules are the essential data unit in biobuild. Each Molecule houses atoms that form a molecular structure. We can generate Molecules from: - a PDB or CIF file (e.g. “my_structure.pdb”) - a PDB ID (e.g. “GLC” - the PDB id for alpha-D-glucose) - a trivial name (e.g. “triacetamide”) - a chemical formula (e.g. “C2H6O” - caution, may produce ambiguities!) - SMILES and InChI/InChIKey
The Molecule class has classmethods for all of these named from_pdb, from_compound (for PDB ID, name etc.), from_pubchem and so forth. But there also exists the toplevel molecule function that will try to automatically figure out the user input and generate a molecule for you.
[2]:
# make sure we have amino acids available
bb.load_amino_acids()
# get a serine
ser = bb.molecule("SER") # (using the PDB id)
# check what it looks like
ser.show()
We can always check out if a compound is available in a loaded set using the has_compound function. But even if a specific compound is not available biobuild can retrieve the molecular structure directly from PubChem to guarantee a smooth workflow!
[3]:
# check if we have diacetamide
bb.has_compound("diacetamide")
[3]:
False
[4]:
# get a diacetamide (using its name)
# since diacetamide was not available in the loaded set of compounds
# biobuild will automatically download it from PubChem
tri = bb.molecule("diacetamide")
# check what it looks like
tri.show()
Linkage#
Now assuming we want to take diacetamide and connect the serine to it. We can define a Linkage to specify which atoms to link between the two. To do so, we need to know what the atoms are labelled as (another reason to always visualize the structures). In the Linkage definition we specify, for instance, that the diacetamide-N1 should be connected to serine-C. In the process, we remove the diacetamide-H1 and the serine’s OXT and HXT.
Note
The linkages do not need to follow chemical reasoning - they only connect structures by adding and removing connections. Hence, we could connect the molecules by one of the methyl-groups making one of the hydrogens the leaving group (even though chemically such a reaction would probably never occur) - biobuild builds structures, but it does not imitate chemical reactions!
[5]:
# define the linkage between the two molecules
# remember to check the atom labelling as these atom labels must match
# the existing atom labels in both molecules
link = bb.linkage(atom1="N1", atom2="C", delete_in_target=["H1"], delete_in_source=["OXT", "HXT"])
Now we can connect the two molecules for example using the connect function:
[6]:
# connect the two molecules
new = bb.connect(tri, ser, link)
# and check what it looks like
new.show()
Optimization#
If we observe a clash in the resulting conformation, we can apply an optimization. This is where the optimizers module comes into play. Biobuild uses a rotational optimization system to find conformations without the need for molecular dynamics. This can be efficient or very expensive (up to unsolvable, in fact!) depending on the pre-processing / pre-thinking that the user provided. But don’t worry, just don’t try to optimize around 100 bonds at once (that won’t work, trust me!).
In general the workflow of optimization works like this: - decide on how “detailed” (=computationally expensive) your structure should be represented during optimization - decide on which bonds to rotate around - choose an optimization algorithm - run and check out the structure
For this example we will choose: - full details (since it is a small structure) - only bonds that are relatively “central” - a genetic algorithm
[7]:
from biobuild import optimizers
# step (1)
# make a graph representation of the molecule
graph = new.get_atom_graph()
# step (2)
# get edges/bonds to rotate around
# we choose only edges/bonds that have at least 10 atoms before and after them
rotatable_edges = graph.find_rotatable_edges(min_descendants=10, min_ancestors=10)
# step (3)
# make an environment
# we use the DistanceRotatron environment
env = optimizers.DistanceRotatron(graph, rotatable_edges)
# step (4) and (5)
# run the optimization algorithm
new_optimized = optimizers.optimize(new, env, algorithm="genetic")
# and check what it looks like
new.show()
Formats and Integrations#
Now that we have a structure we like we can for instance save it as a PDB or mmCIF file. If we have more intricate interests in the molecules we have built we can export them directly to rdkit or openbabel objects for further analysis.
To make working with different data formats as easy as possible, each biobuild Molecule can be created and saved to the following formats:
flowchart TB
node_1("Molecule")
node_2(["PDB"])
node_3(["mmCIF"])
node_9(["Molfile"])
node_11(("SMILES"))
node_4(["JSON\n(biobuild)"])
node_5(["pickle\n(biobuild)"])
node_6{{"rdkit\n(Chem.rdchem.Mol)"}}
node_7{{"openbabel\n(pybel.Molecule)"}}
node_8{{"biopython\n(PDB.Structure)"}}
node_10{{"openmm\n(openmm.PDBFile)"}}
node_2 <--> node_1
node_3 <--> node_1
node_9 <-.-> node_1
node_11 <-.-> node_1
node_4 <--> node_1
node_5 <--> node_1
node_6 <--> node_1
node_7 <--> node_1
node_8 <--> node_1
node_10 <--> node_1
Here are is a table with the corresponding methods available in the Molecule class:
Format |
As input |
As output |
|---|---|---|
PDB |
|
|
mmCIF |
|
|
Molfile |
|
|
SMILES |
|
|
JSON |
|
|
pickle |
|
|
rdkit |
|
|
openbabel |
|
|
openmm |
|
|
biopython |
|
|
[8]:
# so let's save our small molecule to a PDB file
new.to_pdb("our_first_molecule.pdb")
With that we have reached already the end of this first little introduction into what biobuild can do, and how it feels to work with biobuild. Thanks a lot for reading this far! And good luck with whatever you may do with biobuild in your projects!