Adding Resources

In this tutorial we will cover:

  • how we can set custom default resources

If you checked out the tutorial on built-in data resources, you know that biobuild comes with a variety of molecular structures that can be used directly, such as compounds containing amino acids or sugars. These can be loaded easily using functions such as load_sugars(). However, what if we are working on a drug discovery project and would like to engineer small variations of a promising compound? We may have a structure of the compound itself, but we don’t want to re-load it every time we open a new notebook to use biobuild. Instead, we would like it to be available by default.

Luckily, biobuild allows us to set our own defaults that are loaded in every future session. We can add custom linkages to the default CHARMM topology and custom structures to the default PDBE compounds, using the functions add_linkage and add_compound, respectively. Here’s how:

Adding custom linkages

You may know that you can refer to any registered linkage directly via its string identifier. For example, we can refer to a beta 1->4 glycosidic linkage by its id 14bb. We do not need to get the linkage object first because biobuild will interpret the id and look up the linkage object automatically. To make a custom linkage available in the same way, we simply define it and give it an id (which is otherwise optional when creating a Linkage object). Then we call add_linkage with our new linkage and that’s it!

[1]:
import plotly
plotly.offline.init_notebook_mode()
[2]:
import biobuild as bb

# define a custom link
my_link = bb.linkage("C1", "O4", ["O1", "HO1"], ["HO4"], id="my_14_glyco_link")

# now add the link to the default CHARMM topology
bb.add_linkage(my_link)

# now we find it at the bottom of the list of 'out-of-the-box' available linkages
bb.available_linkages()
[2]:
[Linkage(SCK0),
 Linkage(SCK1),
 Linkage(LLLO),
 Linkage(CERA),
 Linkage(CERB),
 Linkage(DAGA),
 Linkage(DAGB),
 Linkage(INS2A),
 Linkage(INS2B),
 Linkage(INS6A),
 Linkage(INS6B),
 Linkage(SGPA),
 Linkage(TGPA),
 Linkage(SGPB),
 Linkage(TGPB),
 Linkage(NGLA),
 Linkage(11aa),
 Linkage(11ab),
 Linkage(11bb),
 Linkage(12aa),
 Linkage(12ab),
 Linkage(12ba),
 Linkage(12bb),
 Linkage(13aa),
 Linkage(13ab),
 Linkage(13ba),
 Linkage(13bb),
 Linkage(14aa),
 Linkage(14ab),
 Linkage(14ba),
 Linkage(14bb),
 Linkage(16aa),
 Linkage(16ab),
 Linkage(16ba),
 Linkage(SUCR),
 Linkage(LCTL),
 Linkage(AB15),
 Linkage(SA23AB),
 Linkage(LINK),
 Linkage(my_14_glyco_link)]

We can now make use of our custom linkage just as we would use any of the predefined ones. So, to connect a glucose and a mannose using our new linkage, we can do:

[3]:
bb.load_sugars()

glc = bb.molecule("GLC")
man = bb.molecule("MAN")

# connect using only the string identifier
# instead of the actual Linkage object
new = bb.connect(glc, man, "my_14_glyco_link")
new.show()

If we close the notebook, however, we will lose "my_14_glyco_link". To ensure it is available in all future sessions, we can call add_linkage with the additional argument overwrite=True. This permanently saves the modified CHARMM topology as the new default.

[4]:
# now the linkage is available even the next time we open a new notebook or restart the kernel
bb.add_linkage(my_link, overwrite=True)
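
To make the difference between a transient and a permanent addition concrete, here is a toy model in plain Python. It is not biobuild's implementation: the `Registry` class, the JSON file, and every name in it are hypothetical and only mimic the behaviour described above, where an addition lives in memory unless `overwrite=True` also writes the modified state to the file that the next session reads.

```python
import json
import tempfile
from pathlib import Path


class Registry:
    """Toy stand-in for a defaults store (hypothetical, not biobuild code)."""

    def __init__(self, path):
        self.path = Path(path)
        # a new "session" starts from whatever was last saved to disk
        self.items = json.loads(self.path.read_text()) if self.path.exists() else {}

    def add(self, key, value, overwrite=False):
        self.items[key] = value  # always available in the current session
        if overwrite:
            # persist the whole in-memory state as the new default
            self.path.write_text(json.dumps(self.items))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "defaults.json"

        session = Registry(store)
        session.add("link_a", "transient")
        # opening a new session: the transient entry is gone
        after_transient = "link_a" in Registry(store).items

        session.add("link_b", "permanent", overwrite=True)
        # opening another session: everything held in memory was saved
        after_permanent = sorted(Registry(store).items)
        return after_transient, after_permanent
```

In this sketch `demo()` returns `(False, ['link_a', 'link_b'])`: the first entry is missing from a fresh session until a later call with `overwrite=True` writes the complete in-memory state, which then includes both entries.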

Adding custom structures

While it is true that biobuild does not load sugar compounds and the like automatically, it does load a default PDBE compounds instance (it just happens to be empty by default). We can populate it with anything we want using the add_compound function. The function expects a molecule along with some additional metadata, such as identifiers and name synonyms by which the molecule can be found. The logic is identical to add_linkage, and we can make the additions permanent by passing overwrite=True when calling add_compound.

[5]:
# first we need to make sure the molecule has a good id
# the id can be anything, but it needs to be unique
new.id = "my_sugar"

# now let's add our small sugar compound to the PDBE compounds
# we provide as metadata only two names for the compound (more options are available)
bb.add_compound(new, names=["my first sugar", "glucose-mannose"])

Now we can check if "my_sugar" is available as a compound using bb.has_compound:

[6]:
# the has_compound function accepts both the
# registered id as well as any of the name synonyms or identifiers
bb.has_compound("my_sugar"), bb.has_compound("my first sugar")
[6]:
(True, True)
[7]:
# again we can make the compound available for the next time we open a new notebook or restart the kernel
# by using the overwrite=True option
bb.add_compound(new, names=["my first sugar", "glucose-mannose"], overwrite=True)

Warning

Calling add_compound with overwrite=True can have an unwanted consequence: all other currently loaded compounds are also saved to the default compounds! Since we called load_sugars before, the defaults will now include not only our my_sugar molecule but also all the sugars that we loaded. To prevent this, it is better to first create a new instance of PDBECompounds and add the compound there, then use the save_as_default_compounds function to permanently set the defaults.

[8]:
# make a new empty PDBECompounds object
new_compounds = bb.PDBECompounds()

# now we can add our compound to the new object
new_compounds.add(new, type="CUSTOM-MOLECULES", names=["my first sugar", "glucose-mannose"])

# check what compounds are now available in the new object
new_compounds.ids
[8]:
['my_sugar']

Now we can save this object as the future default using save_as_default_compounds, without saving the 1000-something sugar compounds that are currently also loaded.

[9]:
# save as defaults
bb.save_as_default_compounds(new_compounds)

# print the number of currently loaded compounds
len(bb.get_default_compounds())
[9]:
1069

What if we have already added multiple compounds to our defaults and would like to keep them? The easiest way to handle this scenario is to use subset_compounds_by_types to get a new PDBECompounds object that contains only the compounds of interest. For instance, if we always set the type of our custom molecules to CUSTOM-MOLECULES, then we can always retrieve our custom defaults using subset_compounds_by_types("CUSTOM-MOLECULES"), add any new compound to this database, and use save_as_default_compounds to make sure our molecules are available in the future.

All this leaves the currently loaded default compounds, i.e. the 1000-something sugars and our custom molecules, completely unaffected.

[10]:
# get all custom compounds we may have already added in the past
my_compounds = bb.subset_compounds_by_types("CUSTOM-MOLECULES")

# now add the new compounds to the existing ones
my_compounds.add(new, type="CUSTOM-MOLECULES", names=["my first sugar", "glucose-mannose"])

# (if we have not already done so) also add it to the current defaults (just don't use overwrite=True)
bb.add_compound(new, type="CUSTOM-MOLECULES", names=["my first sugar", "glucose-mannose"])

# now save the new compounds as defaults
bb.save_as_default_compounds(my_compounds)
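
If this pattern is needed repeatedly, the subset, add, and save steps can be wrapped in a small helper. The following is a minimal sketch that uses only the calls shown in this tutorial; the function name `add_custom_default` and the `CUSTOM_TYPE` constant are hypothetical, and the biobuild module is passed in as the `bb` argument so the helper does not import anything itself.

```python
CUSTOM_TYPE = "CUSTOM-MOLECULES"


def add_custom_default(bb, molecule, names, type_label=CUSTOM_TYPE):
    """Add `molecule` to the saved custom defaults without saving anything else.

    `bb` is the imported biobuild module (hypothetical helper, not part of
    biobuild itself).
    """
    # start from the custom compounds that are already registered
    custom = bb.subset_compounds_by_types(type_label)

    # only add the molecule if its id is not in the subset yet
    if molecule.id not in custom.ids:
        custom.add(molecule, type=type_label, names=names)

    # persist just this subset as the future default
    bb.save_as_default_compounds(custom)
    return custom
```

With biobuild imported as `bb`, a call such as `add_custom_default(bb, new, ["my first sugar", "glucose-mannose"])` would then replace the subset, add, and save lines of the cell above.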

Handling updates

Setting defaults like this will overwrite some files in your local biobuild package directory. Since biobuild ships with these same files, every update will overwrite your custom defaults again. To make sure your custom settings are not lost, be sure to call export_custom_resources before updating. This will export your CHARMM topology and PDBECompounds database to external files in a directory of your choice. After the update you can use import_custom_resources to point to these files and reinstate your settings (again, import_custom_resources can be used transiently or permanently, depending on whether we pass overwrite=True).

Note

These functions will use the currently loaded default compounds.

[11]:
# export the custom defaults we just made
# to the current directory
bb.export_custom_resources(".")

# now we can load the custom resources from the current directory
bb.import_custom_resources(".")
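
For an actual update it may be safer to export to a dedicated folder rather than the current directory. The sketch below wraps the two calls from the cell above in hypothetical helpers (`backup_resources` and `reinstate_resources` are not part of biobuild). It assumes that `import_custom_resources` accepts `overwrite` as a keyword argument, as described in the text, and again takes the biobuild module as the `bb` argument.

```python
from pathlib import Path


def backup_resources(bb, directory):
    """Export the current custom defaults to `directory` before an update."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)  # make sure the folder exists
    bb.export_custom_resources(str(target))
    return target


def reinstate_resources(bb, directory, permanent=True):
    """Re-import previously exported resources after an update."""
    target = Path(directory)
    if not target.is_dir():
        raise FileNotFoundError(f"no exported resources found in {target}")
    # assumption: overwrite=True makes the imported resources the new defaults
    bb.import_custom_resources(str(target), overwrite=permanent)
```

A typical sequence would be `backup_resources(bb, "biobuild_backup")` before updating the package and `reinstate_resources(bb, "biobuild_backup")` afterwards, where the folder name is only an example.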

Restoring biobuild defaults

Made a mistake and now your compounds are messed up? You can always call restore_default_compounds or restore_default_topology to reset them. This restores your settings to the biobuild factory defaults, i.e. an empty component library! Make sure to first export any compounds you would like to keep, either using subset_compounds_by_types or, more thoroughly, using export_compounds, and later use the PDBECompounds.merge method to create a custom library to your liking.
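
Because a reset discards custom entries, one cautious pattern is to export first and restore second. The helper below is a hypothetical sketch (`reset_to_factory_defaults` is not a biobuild function). It assumes that `restore_default_topology` and `restore_default_compounds` can be called without arguments, which this tutorial does not show explicitly, and it takes the biobuild module as the `bb` argument.

```python
from pathlib import Path


def reset_to_factory_defaults(bb, backup_dir):
    """Export the current resources, then restore the shipped defaults."""
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)

    # keep a copy of the current topology and compounds first
    bb.export_custom_resources(str(target))

    # assumption: both restore functions work without arguments
    bb.restore_default_topology()
    bb.restore_default_compounds()
    return target
```

The exported files in `backup_dir` can later be re-imported with import_custom_resources if the reset turns out to have removed something that is still needed.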