{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> ### In this tutorial we will cover:\n", "> - which built-in resources are available\n", "> - how to set your own default settings" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Built-in resources\n", "\n", "BuildAMol has three built-in data resources: the _CHARMM_ force field, the _PDBE compound library_, and _PubChem_ (remotely queried)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "```mermaid\n", "\n", "flowchart TB\n", " node_1((\"CHARMM\"))\n", " node_2[\"pre-defined linkages\"]\n", " node_3((\"PDBE Compounds\"))\n", " node_4[\"small molecules\"]\n", " node_5((\"PubChem\"))\n", " node_7[\"amino acids\"]\n", " node_8[\"sugars\"]\n", " node_9[\"lipids\"]\n", " node_6[\"any other available molecule\"]\n", " node_10[\"nucleotides\"]\n", " node_1 --> node_2\n", " node_3 --> node_4\n", " node_3 --> node_7\n", " node_3 --> node_8\n", " node_3 --> node_9\n", " node_5 --> node_6\n", " node_3 --> node_10\n", "\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import buildamol as bam\n", "bam.visual.set_backend(\"py3dmol\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### CHARMM Force Field\n", "\n", "In order to connect molecules together, the user may define their own `Linkage` by specifying which atoms to connect and which atoms to remove in the process. However, to make life easier, BuildAMol references the CHARMM force field, which already specifies a number of linkage types, so-called `patches`. Each _patch_ specifies the atoms to connect and remove as well as the _internal coordinates_ around the newly formed bond. This allows BuildAMol to generate structures by pure matrix transformation, as the resulting geometry is already specified." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can check what linkages are available by default using:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Linkage(SCK0), Linkage(SCK1), Linkage(LLLO), Linkage(CERA), Linkage(CERB), Linkage(DAGA), Linkage(DAGB), Linkage(INS2A), Linkage(INS2B), Linkage(INS6A), Linkage(INS6B), Linkage(SGPA), Linkage(TGPA), Linkage(SGPB), Linkage(TGPB), Linkage(NGLA), Linkage(11aa), Linkage(11ab), Linkage(11bb), Linkage(12aa), Linkage(12ab), Linkage(12ba), Linkage(12bb), Linkage(13aa), Linkage(13ab), Linkage(13ba), Linkage(13bb), Linkage(14aa), Linkage(14ab), Linkage(14ba), Linkage(14bb), Linkage(16aa), Linkage(16ab), Linkage(16ba), Linkage(SUCR), Linkage(LCTL), Linkage(AB15), Linkage(SA23AB), Linkage(LINK)]\n" ] } ], "source": [ "print(bam.available_linkages())" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Each linkage is identified by an ID within the CHARMM force field, e.g. `12aa` stands for the `1->2 alpha glycosidic linkage`. Each of the pre-defined linkages can be referenced by its (string) ID when connecting molecules together.\n", "\n", "For example, we can connect two mannoses using a `12aa` linkage by:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
\n", "text/html": [ "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "man = bam.molecule(\"./files/man.pdb\")\n", "\n", "# use pre-defined 12aa linkage\n", "man2 = bam.connect(man, man, \"12aa\")\n", "man2.show()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# define a custom 1->2 glycosidic linkage\n", "my_12aa = bam.linkage(atom1=\"O2\", atom2=\"C1\", delete_in_target=[\"HO2\"], delete_in_source=[\"O1\", \"HO1\"], id=\"my_12aa\")\n", "\n", "# add the linkage to the library\n", "bam.add_linkage(my_12aa)\n", "\n", "# and check that it was added by printing the last five entries\n", "# (a newly added linkage is expected at the end of the list)\n", "print(bam.available_linkages()[-5:])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "If the set of available linkages is large, we can use the `has_linkage` function to check whether a linkage with a given ID is loaded in the current session." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bam.has_linkage(\"my_12aa\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### The `CHARMMTopology` class\n", "\n", "The data from the CHARMM force field is handled by the `CHARMMTopology` class, which parses (you guessed it) a CHARMM topology file (**not** parameter file) and stores its data in a dictionary structure. Its purpose is to store linkages.\n", "\n", "The default instance of this class can be accessed using the `get_default_topology` function. Why is this useful? Well, if there is a \"get\"-default topology function there may be a \"set\"-version as well (which is totally the case). If you have your own CHARMM topology file with defined linkages and molecules, you can use `read_topology` to parse it and pass `set_default=True` to make your topology the default, thus tailoring BuildAMol to your specific needs." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Linkage(my_14bb), Linkage(my_16ab)]\n" ] } ], "source": [ "# read a custom topology file to make a CHARMMTopology\n", "# (but don't set it as the default topology)\n", "my_top = bam.read_topology(\"./files/my_top.top\", set_default=False)\n", "\n", "# check out the patches / linkages in the topology\n", "print(my_top.patches)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "If we want to use a non-default topology, we can either pass it to functions and methods that accept a `_topology` argument, or directly provide the linkage objects we obtain from the topology." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
\n", "text/html": [ "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# connect the two mannoses using the `my_16ab` linkage from my_top\n", "# which we can even specify just via the ID string\n", "my_man2 = bam.connect(man, man, \"my_16ab\", _topology=my_top)\n", "\n", "# or\n", "my_16ab = my_top.get_patch(\"my_16ab\")\n", "my_man2 = bam.connect(man, man, my_16ab)\n", "\n", "my_man2.show()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### PDBE Compounds\n", "\n", "BuildAMol includes a part of the PDBE component library, covering small molecules as well as common sugars, lipids, nucleotides, and amino acids and their derivatives. This makes it possible to obtain molecular structures directly while coding, without downloading any PDB files externally. Molecules can be obtained from the library using their _PDB ID_, their names, or some other available identifier such as SMILES." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To reduce memory load, BuildAMol loads an empty library by default, which can then be populated according to the user's needs using the functions:\n", "\n", "- `load_small_molecules()` \n", "- `load_sugars()`\n", "- `load_lipids()`\n", "- `load_amino_acids()`\n", "- `load_nucleotides()`\n", "- or `load_all_compounds()` (to perform all of the above)\n", "\n", "All of the above also have an `unload_` equivalent that removes the respective compounds from the currently loaded default compounds.\n", "\n", "Of course, we can set a custom default library to load automatically using the `set_default_compounds` function (see below)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
\n", "text/html": [ "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# we are interested in working with sugars so we load the sugars\n", "bam.load_sugars()\n", "\n", "# now we are able to refer to sugar compounds we want to use directly by their PDB ID or name\n", "# for example, we can connect two glucose molecules using the 1->6 linkage\n", "glc = bam.molecule(\"GLC\")\n", "glc2 = bam.connect(glc, glc, \"16ab\")\n", "\n", "glc2.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we would like to speed up molecule creation by removing the overhead of first figuring out what kind of input we provided, we can use the `get_compound` function instead of the `molecule` function; it also handles all sorts of inputs automatically. One step further is the `Molecule.from_compound` classmethod, which requires the user to specify which type of input they use:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# get a galactose (slowest, but most convenient)\n", "gal = bam.molecule(\"GAL\")\n", "# or (faster)\n", "gal = bam.get_compound(\"GAL\")\n", "# or (fastest, but most tedious)\n", "gal = bam.Molecule.from_compound(\"GAL\", by=\"id\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Similar to the CHARMM topology, we can get the default instance of the `PDBECompounds` class that handles the database using:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1068" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "comp = bam.get_default_compounds()\n", "# print how many (currently sugar-only) compounds are available in the library\n", "print(len(comp))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding custom compounds\n", "\n", "If a user has built a specific molecule that they are going to use a lot and do not wish to save to and load from their own file all the time, they can add it to the default compounds using the `add_compound` function.\n", "\n", "Let's say our current project revolves around our two-glucose molecule. We can add it to the default compounds to ensure that it will always be available in the future." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
\n", "text/html": [ "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "glc2.id = \"glc2\" # we need to give the molecule a unique ID\n", "bam.add_compound(glc2, \n", " type=\"my_project_basic_molecules\", # we can add some descriptive tag here (optional)\n", " names=[\"di-glucose\"] # provide some name aliases (optional)\n", " )\n", "\n", "# now we can refer to the compound by our self-given name 'di-glucose'\n", "bam.molecule(\"di-glucose\").show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting default compounds\n", "\n", "If we have a custom set of compounds that we always use, we can set them to be the defaults that are loaded by BuildAMol in every session. We can do so using the `set_default_compounds` function, just like with the CHARMM topology. Using the option `overwrite=True`, we can ensure that BuildAMol will load the currently set default PDBE compounds library automatically in every future session." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "# get the currently loaded compounds (including our newly added one)\n", "current_compounds = bam.get_default_compounds()\n", "\n", "# save the current compounds as defaults for every future session\n", "bam.set_default_compounds(current_compounds, overwrite=True)\n", "\n", "# if we realize later that we would like to reset our defaults again\n", "# we can use:\n", "bam.restore_default_compounds()\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PubChem\n", "\n", "PubChem maintains an enormous database of readily available 3D structures for molecules. Its API can be accessed from Python through the `pubchempy` package, which is integrated into BuildAMol. In fact, any call to `molecule` will eventually end up calling on PubChem if no compound or file could be identified for the given inputs. Hence, to use PubChem the user does not need to learn anything new. Alternatively, there is a classmethod `Molecule.from_pubchem` that can be used as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get an aspirin molecule:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bam.has_compound(\"aspirin\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, apparently, aspirin is not available from the loaded sugar compounds (no surprise there), but it is also not part of any of the other pre-defined sets. So, we need to refer to PubChem to get it..." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
\n", "text/html": [ "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "aspirin = bam.Molecule.from_pubchem(\"aspirin\")\n", "aspirin.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And there it is! Again, using the `molecule` function would be all we need. The call to PubChem is automatically handled by BuildAMol if a reference to PDBECompounds does not yield any results." ] } ], "metadata": { "kernelspec": { "display_name": "glyco2", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }