Syntax Flavors#

In this tutorial we will cover:

  • the different syntax flavors for building molecules

If you have worked through some of the other tutorials, you will have come across the bb.connect function, which connects two molecules together. However, biobuild offers three different syntaxes for building molecules: a functional, a method-based, and an operator-based syntax.

Functional API#

Biobuild contains many functions spread over a number of modules; the structural module in particular contains a great number of them. Most biobuild functions are intended for end users, so it is completely fine to import whatever functions you need and use them directly. This gives a more R-like user experience. Many functions are also imported automatically when loading biobuild.

Examples#

  • bb.connect

  • bb.read_pdb

  • bb.structural.autolabel

  • bb.get_compound

Method API#

For convenience, most functions that a user is likely to need on a regular basis have already been integrated as methods of the Molecule class or other classes. Hence, most users will probably find the method-based API the most convenient, since it saves the time of importing the necessary modules. Most methods support the full range of parameters of the functions they wrap, but this is not always the case! There are also many operations that are only implemented as methods and are not available as stand-alone functions. Additionally, some synonymous methods exist, such as bb.Molecule.get_residue_graph and bb.Molecule.make_residue_graph, for historic reasons and for compatibility within the code base.
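The relationship between a function, the method that wraps it, and a synonymous method can be sketched in plain Python. This is a toy illustration only: ToyMolecule, the prefix parameter, and the labelling logic are hypothetical and do not reproduce biobuild's actual code.

```python
def autolabel(mol, prefix="C"):
    """Stand-alone function (hypothetical): relabel all atoms of `mol`."""
    mol.labels = [f"{prefix}{i}" for i in range(1, len(mol.atoms) + 1)]
    return mol


class ToyMolecule:
    """Hypothetical stand-in; not biobuild's Molecule class."""

    def __init__(self, atoms):
        self.atoms = list(atoms)
        self.labels = list(atoms)

    def autolabel(self):
        # Thin wrapper around the module-level function above.
        # It does not expose `prefix`, mirroring the remark that a
        # method may accept fewer parameters than its function.
        return autolabel(self)

    def get_residue_graph(self):
        return {"nodes": list(self.labels)}

    # Synonym kept for backwards compatibility: same function object,
    # reachable under a second name.
    make_residue_graph = get_residue_graph


mol = ToyMolecule(["x", "y", "z"])
mol.autolabel()                      # method call, no import of the function needed
graph = mol.make_residue_graph()     # identical to mol.get_residue_graph()
```

Both names of the synonymous method point to the same implementation, so either can be used interchangeably.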

Examples#

  • bb.Molecule.attach

  • bb.Molecule.from_pdb

  • bb.Molecule.autolabel

  • bb.PDBECompounds.get

Operator API#

Operators are a great way of writing very short code. Biobuild implements a syntax called “molecular arithmetics” that supports the basic operations for connecting molecules together. For instance, it allows us to connect two molecules by writing mol_c = mol_a + mol_b. Naturally, this syntax is the most constrained of the three, but it offers a wonderfully short way of creating larger structures.

Available Operators#

| Function | Method | Attribute | Operator |
|---|---|---|---|
| | mol_a.set_linkage(link) | mol_a.linkage = link | mol_a % link |
| | mol_a.set_attach_residue(res) | mol_a.attach_residue = res | mol_a @ res |
| | mol_a.set_root(root_atom) | mol_a.root_atom = root_atom | mol_a ^ root_atom |
| mol_c = bb.connect(mol_a, mol_b, link) | mol_c = mol_a.attach(mol_b, link, inplace=False) | | mol_c = mol_a + mol_b |
| bb.connect(mol_a, mol_b, link, copy_a=False) | mol_a.attach(mol_b, link) | | mol_a += mol_b |
| mol_c = bb.polymerize(mol_a, n, link) | mol_c = mol_a.repeat(n, link, inplace=False) | | mol_c = mol_a * n |
| bb.polymerize(mol_a, n, link, inplace=True) | mol_a.repeat(n, link) | | mol_a *= n |

A general note on functions versus methods

The functional API is usually tailored toward non-inplace operations, while the method-based API is tailored toward inplace operations. If you look closely at the table above, you will notice that the defaults of the copy and inplace arguments are always opposite between the two. So, calling Molecule.attach performs the operation in-place by default, while calling connect returns a copy by default.
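The difference in defaults can be illustrated with a small, self-contained sketch. The Toy class and this connect function are hypothetical stand-ins that only mimic the copy_a and inplace arguments listed in the table; they are not biobuild's code.

```python
import copy


class Toy:
    """Hypothetical stand-in for a molecule; not a biobuild class."""

    def __init__(self, parts):
        self.parts = list(parts)

    def attach(self, other, inplace=True):
        # method style: modifies self unless inplace=False is given
        target = self if inplace else copy.deepcopy(self)
        target.parts.extend(list(other.parts))
        return target


def connect(a, b, copy_a=True):
    # function style: works on a copy of `a` unless copy_a=False is given
    return a.attach(b, inplace=not copy_a)


glc = Toy(["GLC"])
gal = Toy(["GAL"])

new_mol = connect(glc, gal)   # default: glc itself is left untouched
same_mol = glc.attach(gal)    # default: glc itself is modified and returned
```

This is why reusing the same building block several times requires no extra arguments with the functional syntax, but needs inplace=False with the method syntax.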

Examples#

Let’s look at an example. Because we like sugars so much, we’ll build a glycan structure (yep, not very creative, but it does the trick)… If you are unfamiliar with glycans, just tag along for the ride and don’t think too much about it. It’s just an example…

flowchart TB
  node_1["Glucose"]
  node_2["Glucose"]
  node_3["Galactose"]
  node_4["Galactose"]
  node_5["Galactose"]
  node_6["Mannose"]
  node_7["Mannose"]
  node_8["Glucose"]
  node_1 --"beta 1-4"--> node_2
  node_2 --"alpha 1-3"--> node_3
  node_3 --"alpha 1-4"--> node_4
  node_4 --"alpha 1-3"--> node_5
  node_6 --"beta 1-4"--> node_7
  node_7 --"alpha 1-4"--> node_8
  node_2 --"beta 1-2"--> node_6
[1]:
import plotly
plotly.offline.init_notebook_mode()
[2]:
import biobuild as bb
# first get some compounds
bb.load_sugars()

glc = bb.molecule("GLC")
gal = bb.molecule("GAL")
man = bb.molecule("MAN")
fuc = bb.molecule("FUC")

Using functional syntax#

Now we will build the glycan only using the functional syntax of biobuild:

[3]:
# start with the glucose-glucose at the top
# remember that biobuild has pre-available linkages that
# can be referenced by their string id directly;
# glycosidic linkages are among them.
glycan = bb.connect(glc, glc, link="14bb")

# now build the galactose branch
gal_branch = bb.connect(gal, gal, link="14aa")
gal_branch = bb.connect(gal_branch, gal, link="13aa")

# now build the mannose branch
man_branch = bb.connect(man, man, "14bb")
man_branch = bb.connect(man_branch, glc, "14ab")

# now attach both branches to the glucose-glucose
# this time we need to specify at which residues to connect
# since we don't want to use the default last residue
glycan = bb.connect(glycan, gal_branch, link="13ab", at_residue_b=1)
glycan = bb.connect(glycan, man_branch, link="12bb", at_residue_a=2, at_residue_b=1)

# that's it! now we can visualize the glycan
glycan.show()

Using method syntax#

Now let’s repeat the same using the method syntax. Here we need to add inplace=False in several places since we want to reuse the individual molecules multiple times. We did not have to bother with this in the functional syntax since copies were generated automatically there. On the other hand, we can now work a little more efficiently since we don’t create unnecessary copies of objects (of course, we could have achieved the same with the functional syntax by adjusting the copy arguments).

[4]:
glycan2 = glc.attach(glc, link="14bb", inplace=False)

# now build the galactose branch
gal_branch2 = gal.attach(gal, link="14aa", inplace=False)
gal_branch2.attach(gal, link="13aa")

# now build the mannose branch
man_branch2 = man.attach(man, "14bb", inplace=False)
man_branch2.attach(glc, "14ab")

# now attach both branches to the glucose-glucose
glycan2.attach(gal_branch2, link="13ab", other_residue=1)
glycan2.attach(man_branch2, link="12bb", at_residue=2, other_residue=1)

# that's it! now we can visualize the glycan
glycan2.show()

Using operator syntax#

Now for one final version using the operator-based syntax. This one is the shortest but also the most cryptic. It requires us to first set the linkage and attach residues before we can use the + or * operators.

[5]:
# start again with the glucose-glucose at the top
# first set the linkage using %, then add the next molecule
glycan3 = glc % "14bb" + glc

# now build the galactose branch
# we can separate the statements into multiple lines
gal % "14aa"
gal_branch3 = gal + gal
gal_branch3 % "13aa"
gal_branch3 += gal

# now build the mannose branch
# or we can "chain" the statements together using ()
man_branch3 = (man % "14bb" + man) % "14ab" + glc

# now attach both branches to the glucose-glucose
# now we also need to set the attach residues using @
glycan3 = glycan3 % "13ab" + gal_branch3 @ 1
glycan3 = glycan3 % "12bb" @ 2 + man_branch3 @ 1

# that's it! now we can visualize the glycan
glycan3.show()
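The chained statements above rely on Python's standard operator precedence: %, @ and * share one precedence level and bind more tightly than +, whereas ^ binds more loosely than +. The following sketch uses the standard-library ast module (Python 3.9 or newer for ast.unparse) to make the implicit grouping visible; the helper name grouping is hypothetical and not part of biobuild.

```python
import ast

# operator node types mapped to their source symbols
SYMBOLS = {
    ast.Add: "+",
    ast.Mult: "*",
    ast.Mod: "%",
    ast.MatMult: "@",
    ast.BitXor: "^",
}


def grouping(expr: str) -> str:
    """Return `expr` with explicit parentheses around every binary operation."""

    def render(node):
        if isinstance(node, ast.BinOp):
            op = SYMBOLS[type(node.op)]
            return f"({render(node.left)} {op} {render(node.right)})"
        return ast.unparse(node)

    return render(ast.parse(expr, mode="eval").body)


# linkage and attach residue are set on each side before + connects them
chained = grouping('glycan3 % "12bb" @ 2 + man_branch3 @ 1')

# caution: ^ is evaluated AFTER +, so the root atom would be combined
# with mol_b first; use parentheses or a separate statement instead
pitfall = grouping("mol_a ^ root_atom + mol_b")

print(chained)
print(pitfall)
```

In practice this means that % and @ can be chained freely in front of +, while ^ is best written in its own statement or wrapped in parentheses.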

Naturally, we can combine all three syntaxes if we like, picking whichever bits and pieces we like best from each. So, if you don’t like the mol_a % "14bb" syntax, just use mol_a.set_linkage("14bb") instead; it will not affect your ability to use mol_a + mol_b, mol_a.attach(mol_b), or bb.connect(mol_a, mol_b) later on. By the way, if you set the linkage beforehand, there is no need to specify it again when calling the attach method or the connect function. The same goes for the attach residue.

With that, we are at the end of this little tour through the three different ways to construct molecules in biobuild. Of course, there are many more functions and methods than there are operators, and it may not help readability to rely on the operators too much, especially when chaining many statements together. Nevertheless, they are a handy way to construct molecules very concisely. Please feel free to use whichever syntax you prefer. Good luck with your project using biobuild!