The core package
The core classes and functions of BuildAMol
The Molecule module
The Molecule class is the heart of the buildamol package. It handles molecular structures and allows the user to easily assemble them into larger constructs. Molecule is a child of the BaseEntity class, which provides most of its functionality. In addition, the Molecule module defines a number of top-level functions to easily create new molecules by querying databases or reading files.
The Molecule class is a wrapper around a biopython structure and a core part of BuildAMol functionality. It provides a convenient interface to molecular structures and their properties, such as atoms, bonds, residues, chains, etc.
Note
To help with identifying individual atoms, residues, etc., BuildAMol uses a different identification scheme than biopython does. Therefore, BuildAMol comes with its own child classes of the biopython classes that are used to represent the structure. These classes are called Atom, Residue, Chain, etc. and can be used as drop-in replacements for the biopython classes; they should not break any existing code. However, if an incompatibility is observed anyway, the classes are equipped with a to_biopython method that removes the BuildAMol overhead and returns pure biopython objects (this is not an in-place operation, however, and will return a new object).
Making Molecules
The easiest way to make a new molecule is to use the top-level molecule function, which automatically tries to detect the type of the user-provided input and generate a molecule from it. Currently supported inputs are:
- A biopython structure
- An RDKit molecule
- An OpenBabel molecule
- An OpenMM topology
- An STK molecule
- A PDB id
- A PDB file
- A CIF file
- A MOL file
- A PDBQT file
- A JSON file
- An XML file
- An XYZ file
- A SMILES string
- An InChI string
- An IUPAC name or abbreviation, or any name that matches a known compound synonym associated with the PubChem database
from buildamol import molecule
my_glucose = molecule("GLC") # use the PDB id
# or
my_glucose = molecule("GLC.pdb") # use the PDB file
# or
my_glucose = molecule("alpha-d-glucose") # use the name
# ...
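To illustrate why a trial-and-error entry point is convenient but slower than a dedicated constructor, here is a minimal sketch of an ordered fallback dispatcher. It is not BuildAMol's implementation: the parser functions (parse_file, parse_pdb_id, parse_name) and the try_in_order helper are hypothetical placeholders that only label the input instead of building a molecule.

```python
from typing import Callable, List, Tuple

# Hypothetical parsers: each returns a result or raises ValueError.
def parse_file(query: str) -> str:
    if not query.lower().endswith((".pdb", ".cif", ".mol", ".pdbqt", ".json", ".xml", ".xyz")):
        raise ValueError("no known file extension")
    return f"file:{query}"

def parse_pdb_id(query: str) -> str:
    # PDB chemical component ids are short upper-case codes such as "GLC"
    if not (1 <= len(query) <= 3 and query.isalnum() and query.isupper()):
        raise ValueError("not a PDB component id")
    return f"pdb_id:{query}"

def parse_name(query: str) -> str:
    # Last resort: treat the input as a compound name (accepts anything)
    return f"name:{query}"

PARSERS: List[Tuple[str, Callable[[str], str]]] = [
    ("file", parse_file),
    ("pdb_id", parse_pdb_id),
    ("name", parse_name),
]

def try_in_order(query: str, parsers=PARSERS) -> Tuple[str, str]:
    """Return (parser_name, result) from the first parser that does not raise."""
    failures = []
    for name, parse in parsers:
        try:
            return name, parse(query)
        except ValueError:
            failures.append(name)  # every failed attempt is wasted work
    raise ValueError(f"could not interpret {query!r}; tried {failures}")

print(try_in_order("GLC"))              # ('pdb_id', 'pdb_id:GLC')
print(try_in_order("GLC.pdb"))          # ('file', 'file:GLC.pdb')
print(try_in_order("alpha-d-glucose"))  # ('name', 'name:alpha-d-glucose')
```

The order of the parser list matters: cheap, specific checks go first and the catch-all name lookup goes last, so an ambiguous string resolves to the earliest parser that accepts it.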
Since the molecule function works by trial and error, it is convenient but not the most efficient. Hence, the Molecule class also offers a number of methods to generate molecules directly from specific data sources. Available methods are:
Molecule.from_pdb to generate a molecule from a PDB file
Molecule.from_cif to generate a molecule from a CIF file
Molecule.from_smiles to generate a molecule from a SMILES string
Molecule.from_pubchem to generate a molecule from a PubChem entry
Molecule.from_compound to generate a molecule from a PDBECompounds entry
Molecule.from_rdkit to generate a molecule from an RDKit molecule object
Molecule.from_openmm to generate a molecule from an OpenMM topology object
Molecule.from_stk to generate a molecule from an STK molecule object
Molecule.from_molfile to generate a molecule from a MOL file
Molecule.from_json to generate a molecule from a JSON file
Molecule.from_xml to generate a molecule from an XML file
Molecule.from_pdbqt to generate a molecule from a PDBQT file
Molecule.from_pybel to generate a molecule from an OpenBabel molecule object
Molecule.from_xyz to generate a molecule from an XYZ file
Molecule.empty to generate an empty molecule (contains a model and chain)
Molecule.new to generate an empty molecule (contains a model, chain and residue)
Hence, if we know that “glucose” is already available in our local PDBECompounds database, we can also generate the molecule as follows:
from buildamol import Molecule
my_glucose = Molecule.from_compound("GLC") # use the PDB id
The quickest way to query the local PDBECompounds database is to use the PDB id of the desired compound. However, from_compound accepts other inputs as well. The field to query is selected using the by parameter, which can be one of the following:
- “id” for the PDB id (default)
- “name” for the name of the compound (must match any known synonym of the IUPAC name)
- “formula” for the chemical formula (usually ambiguous and will therefore often raise an error)
- “smiles” for the SMILES string (also accepts InChI)
# create a new glucose molecule
glc = Molecule.from_compound("alpha-d-glucose", by="name") # use the name
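The sketch below shows, with a tiny in-memory table, how a by-field lookup behaves and why searching by formula is often ambiguous. COMPOUNDS and find_compound are hypothetical stand-ins, not the PDBECompounds API, and the SMILES field is omitted for brevity.

```python
from typing import Dict, List

# Hypothetical miniature stand-in for the local PDBECompounds database.
COMPOUNDS: List[Dict] = [
    {"id": "GLC", "names": {"alpha-d-glucose", "alpha-d-glucopyranose"}, "formula": "C6H12O6"},
    {"id": "GAL", "names": {"beta-d-galactose", "beta-d-galactopyranose"}, "formula": "C6H12O6"},
]

def find_compound(query: str, by: str = "id") -> Dict:
    """Return the single compound whose `by` field matches `query`."""
    if by == "id":
        hits = [c for c in COMPOUNDS if c["id"] == query.upper()]
    elif by == "name":
        hits = [c for c in COMPOUNDS if query.lower() in c["names"]]  # any synonym
    elif by == "formula":
        hits = [c for c in COMPOUNDS if c["formula"] == query]
    else:
        raise ValueError(f"unsupported search field: {by!r}")
    if not hits:
        raise KeyError(f"no compound with {by}={query!r}")
    if len(hits) > 1:  # e.g. all hexoses share the formula C6H12O6
        raise ValueError(f"{by}={query!r} is ambiguous: {[c['id'] for c in hits]}")
    return hits[0]

print(find_compound("GLC")["id"])                         # GLC
print(find_compound("alpha-d-glucose", by="name")["id"])  # GLC
```

Calling find_compound("C6H12O6", by="formula") raises a ValueError here, because glucose and galactose share the same formula.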
Saving Molecules
If a molecule is created from a large PDB file and has undergone a lot of preprocessing, it may be useful to save the molecule to a pickle file which can be loaded again later to avoid repeated preprocessing. This can be done using the save and load methods.
# save the molecule to a pickle file
my_molecule.save("my_molecule.pkl")
# load the molecule from the pickle file
my_molecule = Molecule.load("my_molecule.pkl")
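The caching pattern behind save and load can be sketched with the standard pickle module alone. This assumes only what the text states (a pickle file round-trip); the save_state and load_state helpers and the dictionary layout are hypothetical. Since unpickling can execute arbitrary code, only load pickle files from sources you trust.

```python
import pickle
import tempfile
from pathlib import Path

def save_state(state: dict, path) -> None:
    """Write preprocessed data (here a plain dict) to a pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(state, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_state(path) -> dict:
    """Read the data back; only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Hypothetical preprocessed data: atom names and bonds by serial number
state = {"atoms": ["C1", "O1", "HO1"], "bonds": [(1, 2), (2, 3)]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_molecule.pkl"
    save_state(state, path)
    restored = load_state(path)

print(restored == state)  # True
```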
Modifying Molecules
Once a molecule is created, it can be modified in a number of ways. The most common modifications are:
- adding bonds (because when loading from a PDB file, bonds are not inferred automatically!)
- adding additional residues and atoms
- removing residues and atoms
- adjusting labelling (e.g. changing the chain names and residue seqids)
Adding bonds
To add a bond between two atoms, we use the add_bond method, which expects two arguments for the two connected atoms. The atoms can be specified by their full_id tuple, their id string, their serial_number (always starting at 1) or directly (the biopython.Atom object).
glc = Molecule.from_compound("GLC")
# add a bond between the first and second atom
glc.add_bond(1, 2)
# and also add a bond between "O1" and "HO1" atoms
glc.add_bond("O1", "HO1")
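The flexible atom arguments of add_bond can be pictured as a small resolver that maps a serial number, an id string, a full_id tuple or an atom object to one atom. ToyAtom, resolve_atom and add_bond below are hypothetical stand-ins, and the full_id tuples are simplified compared to biopython's.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Union

@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical minimal atom record."""
    id: str
    serial_number: int
    full_id: tuple

AtomRef = Union[ToyAtom, int, str, tuple]

def resolve_atom(atoms: List[ToyAtom], ref: AtomRef) -> ToyAtom:
    if isinstance(ref, ToyAtom):
        return ref
    if isinstance(ref, int):        # serial numbers start at 1
        matches = [a for a in atoms if a.serial_number == ref]
    elif isinstance(ref, str):      # atom id such as "O1"
        matches = [a for a in atoms if a.id == ref]
    elif isinstance(ref, tuple):    # full_id tuple
        matches = [a for a in atoms if a.full_id == ref]
    else:
        raise TypeError(f"cannot interpret {ref!r} as an atom reference")
    if len(matches) != 1:
        raise LookupError(f"{ref!r} matches {len(matches)} atoms, expected exactly 1")
    return matches[0]

def add_bond(bonds: Set[FrozenSet[ToyAtom]], atoms: List[ToyAtom], a: AtomRef, b: AtomRef) -> None:
    # a frozenset makes (a, b) and (b, a) the same bond
    bonds.add(frozenset((resolve_atom(atoms, a), resolve_atom(atoms, b))))

atoms = [ToyAtom(name, i, ("GLC", 0, "A", 1, name))
         for i, name in enumerate(["C1", "C2", "O1", "HO1"], start=1)]
bonds: Set[FrozenSet[ToyAtom]] = set()
add_bond(bonds, atoms, 1, 2)
add_bond(bonds, atoms, "O1", "HO1")
add_bond(bonds, atoms, ("GLC", 0, "A", 1, "C2"), atoms[0])  # same bond as (1, 2)
print(len(bonds))  # 2
```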
Even for small molecules such as glucose, with only 24 atoms, adding all bonds manually would be very tedious. Fortunately, molecules created using from_compound or from_pdb already contain all the default bonds!
However, in case the bonds are missing, or the PDB file did not specify any to begin with, the Molecule class offers two methods: apply_standard_bonds and infer_bonds. The former uses reference connectivity information from the PDBECompounds database or CHARMMTopology to add all bonds that are known for the compound (if it exists in the database). The latter will use a simple distance-based approach to infer bonds between atoms that are closer than a specified threshold (default: 1.6 Å), which can be restricted further to a min-max window.
# add all standard bonds for Glucose
glc.apply_standard_bonds()
# add all bonds that are closer than 1.6 Å
glc.infer_bonds(bond_length=1.6)
# add all bonds that are closer than 1.6 Å, but not closer than 1.0 Å
glc.infer_bonds(bond_length=(1.0, 1.6))
Note
By default infer_bonds will not attempt to add bonds between atoms that belong to different residues. This is because in cases of suboptimal conformations or in very large structures atoms that are close in space may not be connected in the structure. To override this behaviour, set the restrict_residues parameter to False.
# add all bonds that are closer than 1.6 Å, even if they belong to different residues
glc.infer_bonds(bond_length=1.6, restrict_residues=False)
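A minimal sketch of distance-based bond inference with a single cutoff or a (min, max) window and an optional same-residue restriction follows. The function mirrors the parameter names used above, but the atom tuple layout and the inclusive bounds are assumptions, not BuildAMol's implementation; it checks all pairs, whereas a production version would likely use a spatial index for large structures.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple, Union

# (atom name, residue number, xyz coordinates in Å)
AtomRecord = Tuple[str, int, Tuple[float, float, float]]

def infer_bonds(atoms: List[AtomRecord],
                bond_length: Union[float, Tuple[float, float]] = 1.6,
                restrict_residues: bool = True) -> Set[Tuple[str, str]]:
    """Bond every atom pair whose distance falls inside the allowed window."""
    if isinstance(bond_length, tuple):
        min_len, max_len = bond_length
    else:
        min_len, max_len = 0.0, bond_length
    bonds = set()
    for (name1, res1, xyz1), (name2, res2, xyz2) in combinations(atoms, 2):
        if restrict_residues and res1 != res2:
            continue  # skip pairs that span two residues
        if min_len <= math.dist(xyz1, xyz2) <= max_len:
            bonds.add((name1, name2))
    return bonds

atoms = [
    ("C1", 1, (0.0, 0.0, 0.0)),
    ("O1", 1, (1.43, 0.0, 0.0)),   # C-O distance 1.43 Å
    ("HO1", 1, (2.39, 0.0, 0.0)),  # O-H distance 0.96 Å
    ("O4", 2, (0.0, 1.45, 0.0)),   # close to C1, but in another residue
]
print(sorted(infer_bonds(atoms)))                           # [('C1', 'O1'), ('O1', 'HO1')]
print(sorted(infer_bonds(atoms, bond_length=(1.0, 1.6))))   # [('C1', 'O1')]
print(sorted(infer_bonds(atoms, restrict_residues=False)))  # [('C1', 'O1'), ('C1', 'O4'), ('O1', 'HO1')]
```

The (1.0, 1.6) window drops the 0.96 Å O-H bond, which shows why a lower bound should be chosen with care when hydrogens are present.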
Residue Connections
As an alternative to infer_bonds, one may use infer_residue_connections to obtain bonds that connect different residues. This method infers all bonds between atoms from different residues based on the distance between them. The inferred bonds are saved in the Molecule and also returned in a list. If the optional argument triplet is set to True, the method will also return the bonds immediately adjacent to the inferred bonds.
Take the following example of a molecule with two residues A and B that are connected by a bond between OA and (1)CB:
connection -->  OA      OB --- H
               /  \    /
 (1)CA --- (2)CA   (1)CB
  /           \        \
(6)CA       (3)CA       (2)CB --- (3)CB
  \           /
 (5)CA --- (4)CA
If triplet=False the method will only return the bond between OA and (1)CB. However, if triplet=True it will also return the bond between (2)CA and OA - thus forming a triplet of atoms (2)CA, OA and (1)CB that connect the two residues A and B.
>>> # infer all bonds between atoms from different residues
>>> mol.infer_residue_connections(triplet=False)
[(OA, (1)CB)]
>>> mol.infer_residue_connections(triplet=True)
[(OA, (1)CB), ((2)CA, OA)]
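The triplet idea can be sketched on the two-residue diagram above. The sketch skips the distance-based inference and starts from a list of existing bonds; the rule of adding same-residue bonds that touch the first atom of each connection is an assumption chosen to reproduce the documented output, and BuildAMol's actual selection may differ. residue_connections, BONDS and RESIDUE_OF are hypothetical names.

```python
from typing import Dict, List, Tuple

Bond = Tuple[str, str]

def residue_connections(bonds: List[Bond], residue_of: Dict[str, str],
                        triplet: bool = False) -> List[Bond]:
    """Bonds whose two atoms sit in different residues (optionally plus neighbours)."""
    connections = [(a, b) for a, b in bonds if residue_of[a] != residue_of[b]]
    if not triplet:
        return connections
    extra = []
    for first, _ in connections:
        for x, y in bonds:
            # same-residue bonds touching the first atom of the connection
            if first in (x, y) and residue_of[x] == residue_of[y] == residue_of[first]:
                extra.append((x, y))
    return connections + extra

# The two-residue example from the diagram
RESIDUE_OF = {atom: "A" for atom in ("(1)CA", "(2)CA", "(3)CA", "(4)CA", "(5)CA", "(6)CA", "OA")}
RESIDUE_OF.update({atom: "B" for atom in ("(1)CB", "(2)CB", "(3)CB", "OB", "H")})
BONDS = [
    ("(1)CA", "(2)CA"), ("(2)CA", "(3)CA"), ("(3)CA", "(4)CA"),
    ("(4)CA", "(5)CA"), ("(5)CA", "(6)CA"), ("(6)CA", "(1)CA"),
    ("(2)CA", "OA"), ("OA", "(1)CB"), ("(1)CB", "OB"), ("OB", "H"),
    ("(1)CB", "(2)CB"), ("(2)CB", "(3)CB"),
]

print(residue_connections(BONDS, RESIDUE_OF))                # [('OA', '(1)CB')]
print(residue_connections(BONDS, RESIDUE_OF, triplet=True))  # [('OA', '(1)CB'), ('(2)CA', 'OA')]
```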
Adding residues and atoms
To add one or more new residue(s), we use the add_residues method, which expects a number of buildamol.Residue objects as unnamed arguments. Similarly, to add one or more new atom(s), we use the add_atoms method, which expects a number of buildamol.Atom objects as unnamed arguments. Both methods allow specifying the parent object (chain or residue) via an optional argument and will automatically choose the last chain or residue if none is specified.
from buildamol import Residue, Atom
new_residue = Residue("XYZ", 1, " ")
# do things with the residue here
# ...
# add the residue to the molecule
# (add it to the last chain, whichever that may be)
glc.add_residues(new_residue)
new_atom = Atom("X", [0, 0, 0])
# add the atom to the first residue in the `glc` molecule
glc.add_atoms(new_atom, residue=1)
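The "default to the last parent" rule can be sketched with two hypothetical container classes, ToyResidue and ToyChain. The 1-based residue numbering matches the residue=1 example above; everything else is an illustrative assumption rather than BuildAMol's code.

```python
from typing import List, Optional

class ToyResidue:
    """Hypothetical residue: a name and a list of atom names."""
    def __init__(self, name: str):
        self.name = name
        self.atoms: List[str] = []

class ToyChain:
    """Hypothetical chain showing the 'default to the last parent' rule."""
    def __init__(self) -> None:
        self.residues: List[ToyResidue] = []

    def add_residues(self, *residues: ToyResidue) -> None:
        self.residues.extend(residues)

    def add_atoms(self, *atoms: str, residue: Optional[int] = None) -> None:
        if not self.residues:
            raise ValueError("the chain has no residue to add atoms to")
        if residue is None:
            target = self.residues[-1]            # default: last residue
        elif 1 <= residue <= len(self.residues):
            target = self.residues[residue - 1]   # 1-based residue number
        else:
            raise IndexError(f"no residue number {residue}")
        target.atoms.extend(atoms)

chain = ToyChain()
chain.add_residues(ToyResidue("GLC"), ToyResidue("XYZ"))
chain.add_atoms("X")              # lands in the last residue, XYZ
chain.add_atoms("Y", residue=1)   # lands in the first residue, GLC
print([(r.name, r.atoms) for r in chain.residues])  # [('GLC', ['Y']), ('XYZ', ['X'])]
```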
Removing residues and atoms
In order to remove residues, atoms or bonds, we can use the remove_residues, remove_atoms and remove_bond (yes, singular!) methods. They work exactly like their add_ counterparts, but instead of adding, they remove the specified objects.
# remove the first residue
glc.remove_residues(1)
# remove the first atom
glc.remove_atoms(1)
# remove the bond between the first and second atom
glc.remove_bond(1, 2)
Adjusting labelling
Single-residue molecules that were loaded from a PDB file may not use the same atom labelling as the PDBE and CHARMM databases. To quickly adjust the labelling, use the autolabel method. autolabel uses the atom connectivity and a rule-based algorithm to infer the most likely atom labels. However, since this method is not perfect, it is recommended to check the labelling afterwards and adjust it manually if necessary.
If working with large molecules that follow another labelling scheme, it may be more efficient to simply define your own linkage recipes or patches (see the documentation of linkages) that use the appropriate labelling scheme.
# load a molecule from a PDB file
glc = Molecule.from_pdb("glucose.pdb")
# adjust the labelling
glc.autolabel()
# save the molecule to a new PDB file
glc.to_pdb("glucose_autolabelled.pdb")
Another common operation is the adjustment of chain names and residue seqids. This can be done using the reindex method. This method accepts three starting values for the chain name, residue seqid and atom serial number and will then reindex all chains, residues and atoms to ensure they are continuously numbered and labelled. Some internal methods used when connecting different molecules are reliant on a continuous numbering scheme, so this method should be called before connecting molecules that were loaded from PDB files.
# load a molecule from a PDB file
glc = Molecule.from_pdb("glucose.pdb")
# reindex the molecule
glc.reindex()
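A minimal sketch of continuous reindexing on a nested-dict structure follows. The dict layout and the parameter names start_chain, start_resid and start_atom are hypothetical; it also assumes that residue seqids keep counting across chains rather than restarting per chain, which may not match BuildAMol's behaviour.

```python
from string import ascii_uppercase

def reindex(chains, start_chain="A", start_resid=1, start_atom=1):
    """Relabel chains A, B, C, ... and renumber residues and atoms continuously (in place)."""
    first = ascii_uppercase.index(start_chain)
    resid, serial = start_resid, start_atom
    for offset, chain in enumerate(chains):
        chain["id"] = ascii_uppercase[first + offset]  # IndexError past chain "Z"
        for residue in chain["residues"]:
            residue["seqid"] = resid
            resid += 1
            for atom in residue["atoms"]:
                atom["serial"] = serial
                serial += 1

# Hypothetical structure with gaps and odd labels, as it might come from a PDB file
chains = [
    {"id": "X", "residues": [{"seqid": 7, "atoms": [{"serial": 40}, {"serial": 41}]},
                             {"seqid": 3, "atoms": [{"serial": 12}]}]},
    {"id": "Q", "residues": [{"seqid": 99, "atoms": [{"serial": 5}]}]},
]
reindex(chains)
print([c["id"] for c in chains])                            # ['A', 'B']
print([r["seqid"] for c in chains for r in c["residues"]])  # [1, 2, 3]
```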
We can also use one molecule as a “reference” for reindexing another molecule to make sure there are no labelling conflicts between them in case we want to connect them together later (this is usually done internally by BuildAMol automatically).
# load a molecule from a PDB file
glc = Molecule.from_pdb("glucose.pdb")
# load another molecule from a PDB file
cel = Molecule.from_pdb("cellulose.pdb")
cel.reindex() # make sure we have a continuous numbering scheme
# reindex the glucose molecule using the cellulose molecule as a reference
cel.adjust_indexing(glc)
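Reference-based reindexing can be pictured as renumbering the second molecule so that its residue seqids and atom serials continue after the largest values in the reference. The sketch below uses hypothetical single-chain dicts and assumes continuous renumbering of the adjusted molecule; it is not BuildAMol's adjust_indexing implementation.

```python
def adjust_indexing(reference, other):
    """Renumber `other` so its residues and atoms come after those of `reference` (in place)."""
    residues = reference["residues"]
    max_resid = max((r["seqid"] for r in residues), default=0)
    max_serial = max((a["serial"] for r in residues for a in r["atoms"]), default=0)
    serial = max_serial
    for offset, residue in enumerate(other["residues"], start=1):
        residue["seqid"] = max_resid + offset
        for atom in residue["atoms"]:
            serial += 1
            atom["serial"] = serial

cel = {"residues": [{"seqid": 1, "atoms": [{"serial": 1}, {"serial": 2}]},
                    {"seqid": 2, "atoms": [{"serial": 3}]}]}
glc = {"residues": [{"seqid": 1, "atoms": [{"serial": 1}, {"serial": 2}]}]}
adjust_indexing(cel, glc)
print(glc)  # {'residues': [{'seqid': 3, 'atoms': [{'serial': 4}, {'serial': 5}]}]}
```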
Connecting Molecules
Since most structures of interest are not single residues but rather complex assemblies, the second main purpose of a Molecule is to be easily connected to other Molecules to form a larger structure. To this end, the Molecule class provides a number of methods to easily assemble complex structures from small single-residue molecules.
Forming Polymers
The simplest way to generate a large structure is probably the repeat method, which will repeat the given molecule n times to form a homo-polymer.
# create a glucose molecule
glc = Molecule.from_compound("GLC")
# create cellulose from glucose
# using a 1-4 beta-beta glycosidic linkage
# which is pre-defined in the default CHARMMTopology
glc.repeat(10, "14bb")
# Now we have a cellulose of 10 glucoses
In the above example we used the repeat method explicitly, but we could also achieve the same with the shorthand *=. For this to work, we need to specify the linkage type beforehand. We do this by setting the linkage attribute before using any operator.
# specify the "default" linkage type for connecting
# other molecules to this glucose
glc.linkage = "14bb"
# now make a cellulose by multiplying glucoses
glc *= 20
# Now we have a cellulose of 20 glucoses
If we wish to keep glc as a single residue Glucose and still get our desired cellulose, we can set inplace=False when calling repeat or simply use the * operator, both of which will have the same effect of creating a new copy that houses the appropriate residues.
cel = glc.repeat(10, "14bb", inplace=False)
# or (equivalently)
glc.linkage = "14bb"
cel = glc * 10
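The relationship between repeat, * and *= can be sketched with a hypothetical ToyPolymer class that only tracks residue names and the links used. It illustrates the documented in-place versus copy semantics through Python's __mul__ and __imul__ hooks; it is not BuildAMol's implementation.

```python
import copy
from typing import List, Optional

class ToyPolymer:
    """Hypothetical stand-in that only tracks residue names and the links used."""

    def __init__(self, residues: List[str], linkage: Optional[str] = None):
        self.residues = list(residues)
        self.links: List[str] = []
        self.linkage = linkage  # default linkage, like `glc.linkage = "14bb"`

    def repeat(self, n: int, link: Optional[str] = None, inplace: bool = True) -> "ToyPolymer":
        link = link if link is not None else self.linkage
        if link is None:
            raise ValueError("no linkage given and no default linkage set")
        target = self if inplace else copy.deepcopy(self)
        unit = list(self.residues)  # snapshot of one repeating unit
        for _ in range(n - 1):      # n units in total -> n - 1 new connections
            target.links.append(link)
            target.residues.extend(unit)
        return target

    def __mul__(self, n: int) -> "ToyPolymer":   # glc * n  -> new object
        return self.repeat(n, inplace=False)

    def __imul__(self, n: int) -> "ToyPolymer":  # glc *= n -> modifies glc
        return self.repeat(n, inplace=True)

glc = ToyPolymer(["GLC"])
cel = glc.repeat(10, "14bb", inplace=False)
print(len(cel.residues), len(glc.residues))  # 10 1
glc.linkage = "14bb"
glc *= 20
print(len(glc.residues))  # 20
```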
Connecting different Molecules
What if we want to connect different molecules? For example, we may want to connect a Galactose to a Glucose to form Lactose. This can be achieved using the attach method, which attaches a given molecule to another molecule.
glc = Molecule.from_compound("GLC")
gal = Molecule.from_compound("GAL")
# attach the galactose to the glucose
# (we want a copy, so we set inplace=False just like with 'repeat')
lac = glc.attach(gal, "14bb", inplace=False)
# Now we have a lactose molecule
In the above example, the attach method is used to attach the galactose molecule to the glucose, but for those among us who prefer a shorter syntax, the + operator does the same thing.
# specify that incoming molecules shall be
# attached using a 1-4 beta linkage
glc.linkage = "14bb"
# now attach the galactose to the glucose
lac = glc + gal
Of course, if there is a + operator there should also be a += operator, which is simply the equivalent of attach with inplace=True.
glc.linkage = "14bb"
glc += gal
# Now 'glc' is a lactose molecule
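The same operator pattern applies to attach, + and +=. The hypothetical ToyMolecule below records connections as 1-based (residue_i, residue_j, linkage) tuples and attaches at its last residue by default; it is an illustration of the documented semantics, not BuildAMol code.

```python
import copy
from typing import List, Optional, Tuple

class ToyMolecule:
    """Hypothetical stand-in: residue names plus (residue_i, residue_j, linkage) records."""

    def __init__(self, residues: List[str], linkage: Optional[str] = None):
        self.residues = list(residues)
        self.connections: List[Tuple[int, int, str]] = []
        self.linkage = linkage

    def attach(self, other: "ToyMolecule", link: Optional[str] = None,
               inplace: bool = True) -> "ToyMolecule":
        link = link if link is not None else self.linkage
        if link is None:
            raise ValueError("no linkage given and no default linkage set")
        target = self if inplace else copy.deepcopy(self)
        last = len(target.residues)  # attach at the last residue
        shifted = [(i + last, j + last, l) for i, j, l in other.connections]
        target.connections.append((last, last + 1, link))
        target.connections.extend(shifted)
        target.residues.extend(list(other.residues))  # `other` itself is untouched
        return target

    def __add__(self, other: "ToyMolecule") -> "ToyMolecule":   # glc + gal  -> new molecule
        return self.attach(other, inplace=False)

    def __iadd__(self, other: "ToyMolecule") -> "ToyMolecule":  # glc += gal -> modifies glc
        return self.attach(other, inplace=True)

glc, gal = ToyMolecule(["GLC"]), ToyMolecule(["GAL"])
lac = glc.attach(gal, "14bb", inplace=False)
print(lac.residues, lac.connections)  # ['GLC', 'GAL'] [(1, 2, '14bb')]
glc.linkage = "14bb"
glc += gal
print(glc.residues)  # ['GLC', 'GAL']
```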
Setting default Modifiers
So far, we have always worked with a 1-4 beta-beta glycosidic linkage, which we apparently could select using the string “14bb”. But what if we want to use a different linkage type, for example a 1-4 alpha-beta glycosidic linkage? You have of course noticed that attach and repeat accept an argument link which allows you to specify the linkage type, and that if you leave it blank the default linkage type is used. But how do we set the default linkage type?
Let’s first check which linkage types are available by default anyway. Have you noticed an argument named _topology at the end of the attach or repeat methods? The topology refers to the underlying CHARMM topology, which hosts the linkage type information. By default a topology is already loaded in BuildAMol’s framework, so it is not necessary for the user to specify anything here, but we can check which linkage types are available by:
import buildamol as bam
# get the default topology
topology = bam.get_default_topology()
print(topology.patches)
Any of these linkages can be referenced by their name, e.g. “14bb” or “14ab”.
Wait a second, my desired linkage is not in the list! What now?! Well, you can always define a new Linkage to suit your needs. Check out the documentation on linkages for more information on how to do this. If you have your desired linkage ready to go, set it as the default by:
my_molecule.linkage = my_linkage
# or if you feel "super correct"
my_molecule.set_linkage(my_linkage)
# or if you feel "extra cocky"
my_molecule % my_linkage # <- the modulo operator assigns the "modifier" to the molecule
Now any call to attach, repeat, or any of their operator proxies will use your defined linkage by default.
Setting the default Residue for attachment
When defining a Linkage we specify which atoms are supposed to be connected and removed, but we do not specify which residues these atoms belong to. We specify this as arguments inside the attach method for instance, but we can also leave this blank, in which case the last residue in the molecule is used by default. This is obviously not always what we want, however! Hence, if we do not want to specify the residue for attachment at every attach call or if we want to use the + operator, we can set the default residue for attachment by setting the attach_residue attribute:
# set the default attachment residue to the first residue
my_molecule.attach_residue = 1
# or
my_molecule.set_attach_residue(1)
# or (if you feel "extra cocky")
my_molecule @ 1 # <- the 'at' operator sets the attachment residue
# serial number indexing also works in reverse
# (set the second last residue as the default attachment residue)
my_molecule.attach_residue = -2
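The bookkeeping behind the % and @ operators and negative attach_residue values can be sketched as follows. ToyModifiable is hypothetical; it assumes, as the usage above suggests, that both operators modify the molecule itself and that -1 means the last residue and -2 the second last.

```python
from typing import Optional

class ToyModifiable:
    """Hypothetical stand-in for the default-linkage / default-residue bookkeeping."""

    def __init__(self, n_residues: int):
        self.n_residues = n_residues
        self.linkage: Optional[str] = None
        self._attach_residue: Optional[int] = None

    @property
    def attach_residue(self) -> int:
        # without an explicit choice, fall back to the last residue
        if self._attach_residue is None:
            return self.n_residues
        return self._attach_residue

    @attach_residue.setter
    def attach_residue(self, value: int) -> None:
        if value < 0:  # -1 = last, -2 = second last, ...
            value = self.n_residues + 1 + value
        if not 1 <= value <= self.n_residues:
            raise IndexError(f"residue {value} out of range 1..{self.n_residues}")
        self._attach_residue = value

    def __mod__(self, linkage: str) -> "ToyModifiable":   # mol % linkage
        self.linkage = linkage
        return self

    def __matmul__(self, residue: int) -> "ToyModifiable":  # mol @ residue
        self.attach_residue = residue
        return self

mol = ToyModifiable(5)
print(mol.attach_residue)  # 5 (the last residue)
mol @ 1
mol % "14bb"
print(mol.attach_residue, mol.linkage)  # 1 14bb
mol.attach_residue = -2
print(mol.attach_residue)  # 4
```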
- class buildamol.core.Molecule.Molecule(structure, root_atom: str | int | Atom = None, model: int = 0, chain: str = None)
Bases: BaseEntity
A molecule to add onto a scaffold. A molecule consists of a single chain.
- Parameters:
structure (bio.PDB.Structure) – A biopython structure object
root_atom (str or int or Atom) – The id or the serial number of the root atom at which the molecule would be attached to another structure such as a protein scaffold or another Molecule.
model (int) – The model to use from the structure. Defaults to 0. This may be any valid identifier for a model in the structure, such as an integer or string.
chain (str) – The chain to use from the structure. Defaults to the first chain in the structure.
- attach(other: Molecule, link: str | Linkage = None, at_residue: int | Residue = None, other_residue: int | Residue = None, use_patch: bool = None, inplace: bool = True, other_inplace: bool = False, _topology=None)
Attach another structure to this one using a Patch or a Recipe.
- Parameters:
other (Molecule) – The other molecule to attach to this one
link (str or Linkage) – Either a Patch to apply when attaching or a Recipe to use when stitching. If None, the default patch or recipe that was set earlier on the molecule is used.
at_residue (int or Residue) – The residue to attach the other molecule to. If None, the defined attach_residue is used.
other_residue (int or Residue) – The residue in the other molecule to attach this molecule to. If None, the defined attach_residue of the other molecule is used.
use_patch (bool) – If the specified linkage is a patch (has internal coordinates) it can and is by default applied as a patch. However, it can also be used as a recipe. Set this to false if you want to use the patch as a recipe.
inplace (bool) – If True the molecule is directly modified, otherwise a copy of the molecule is returned.
other_inplace (bool) – If True, all atoms from the other molecule are integrated into this one, leaving the other molecule empty. If False, a copy of the other molecule is used, leaving the original molecule intact.
_topology (Topology) – The topology to use when attaching. If None, the topology of the molecule is used. Only used if the patch is a string.
- classmethod empty(id: str = None) → Molecule
Create a Molecule without any atoms in it. It will, however, contain a single Model and Chain by default.
- Parameters:
id (str) – An id of the Molecule.
- Returns:
An empty Molecule object
- Return type:
Molecule
- classmethod from_compound(compound: str, by: str = 'id', root_atom: str | int = None) → Molecule
Create a Molecule from a reference compound from the PDBECompounds database
- Parameters:
compound (str) – The compound to search for
by (str) – The field to search by. This can be “id” for the PDB id, “name” for the name of the compound (must match any known synonym of the IUPAC name), “formula” for the chemical formula, or “smiles” for the SMILES string (also accepts InChI).
root_atom (str or int) – The id or the serial number of the root atom (optional)
- classmethod from_geometry(geometry: Geometry, atoms: list, id: str = None, resname: str = 'UNK', direction: str = None)
Create a new Molecule using a molecular geometry and a list of starting atoms. This places the atoms at the right spatial coordinates and fills up the provided starting atoms with hydrogens to match the number of atoms defined by the geometry.
- Parameters:
geometry (Geometry) – The geometry object
atoms (list) – A list of Atom objects
id (str) – The id of the Molecule.
resname (str) – The resname of the residue to add
direction (str) – The direction of the atoms (in case of a geometry that has planar and axial directions). This can be either “planar” or “axial”.
- Returns:
The Molecule object
- Return type:
Molecule
- classmethod from_pubchem(query: str, root_atom: str | int = None, by: str = 'name', idx: int = 0) → Molecule
Create a Molecule from PubChem
Note
PubChem follows a different atom labelling scheme than the CHARMM force field! This means that atom names may not match the names required by the default patches that are integrated in buildamol. It is advisable to run autolabel or relabel_hydrogens on the molecule. Naturally, custom patches or recipes working with adjusted atom names will always work.
- Parameters:
query (str) – The query to search for in the PubChem database
root_atom (str or int) – The id or the serial number of the root atom (optional)
by (str) – The method to search by. This can be any of the following: cid, name, smiles, sdf, inchi, inchikey, formula.
idx (int) – The index of the result to use if multiple are found. By default, the first result is used.
- Returns:
The Molecule object
- Return type:
Molecule
- classmethod from_smiles(smiles: str, id: str = None, root_atom: str | int = None, add_hydrogens: bool = True) → Molecule
Read a Molecule from a SMILES string
- Parameters:
smiles (str) – The SMILES string
id (str) – The id of the Molecule. By default the provided smiles string is used.
root_atom (str or int) – The id or the serial number of the root atom (optional)
add_hydrogens (bool) – Whether to add hydrogens to the molecule
- Returns:
The Molecule object
- Return type:
Molecule
- get_residue_connections(residue_a=None, residue_b=None, triplet: bool = True, direct_by: str = None)
Get bonds between atoms that connect different residues in the structure. This method differs from infer_residue_connections in that it works with the bonds already present in the molecule instead of computing new ones.
- Parameters:
residue_a (Union[int, str, tuple, bio.Residue.Residue]) – The residues to consider. If None, all residues are considered. Otherwise, only bonds between the specified residues are considered.
residue_b (Union[int, str, tuple, bio.Residue.Residue]) – The residues to consider. If None, all residues are considered. Otherwise, only bonds between the specified residues are considered.
triplet (bool) – Whether to include bonds between atoms that are in the same residue but neighboring a bond that connects different residues. This is useful for residues that have a side chain that is connected to the main chain. This is mostly useful if you intend to use the returned list for some purpose, because the additionally returned bonds are already present in the structure from inference or standard-bond applying and therefore do not actually add any particular information to the Molecule object itself.
direct_by (str) – The attribute to sort by. Can be either “serial”, “resid” or “root”. In the case of “serial”, the bonds are sorted by the serial number of the first atom. In the case of “resid”, the bonds are sorted by the residue id of the first atom. In the case of “root”, the bonds are sorted so that the first atom is graph-closer to the root than the second atom. Set to None to not sort the bonds.
- Returns:
A set of tuples of atom pairs that are bonded and connect different residues
- Return type:
set
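To make the “root” option of direct_by concrete: a sketch that measures graph distance from the root with breadth-first search and flips each bond so that the atom nearer the root comes first. The adjacency-dict representation and the function names are hypothetical, and BuildAMol's internals may differ, for example in how ties or atoms unreachable from the root are handled.

```python
from collections import deque
from typing import Dict, List, Tuple

def graph_depth(adjacency: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Breadth-first search: number of bonds between the root and every reachable atom."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency.get(atom, []):
            if neighbour not in depth:
                depth[neighbour] = depth[atom] + 1
                queue.append(neighbour)
    return depth

def direct_by_root(bonds: List[Tuple[str, str]], adjacency: Dict[str, List[str]],
                   root: str) -> List[Tuple[str, str]]:
    """Orient each bond so its first atom is at least as close to the root as the second."""
    depth = graph_depth(adjacency, root)
    far = float("inf")  # atoms not connected to the root count as infinitely far
    return [(a, b) if depth.get(a, far) <= depth.get(b, far) else (b, a) for a, b in bonds]

# Hypothetical linear chain R - A - B - C rooted at R
ADJ = {"R": ["A"], "A": ["R", "B"], "B": ["A", "C"], "C": ["B"]}
print(direct_by_root([("B", "A"), ("B", "C")], ADJ, "R"))  # [('A', 'B'), ('B', 'C')]
```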
- classmethod new(id: str = None, resname: str = 'UNK', atoms: list = None, bonds: list = None) → Molecule
Create a new Molecule with a single residue
- Parameters:
id (str) – The id of the Molecule. By default an id is inferred from the filename.
resname (str) – The resname of the residue to add
atoms (list) – A list of Atom objects to add to the residue
bonds (list) – A list of bonds to add to the residue
- Returns:
A new Molecule object
- Return type:
Molecule
- optimize(residue_graph: bool = None, algorithm: str = None, rotatron: str = None, rotatron_kws: dict = None, algorithm_kws: dict = None, inplace: bool = True)
Optimize the molecule’s conformation. This is a convenience method with less customizability than a manual optimization using the optimizers module.
- Parameters:
residue_graph (bool) – Whether to use the residue graph or the full atom graph for optimization. The residue graph is faster but less accurate. If the molecule is larger than 100 atoms, the residue graph is used by default.
algorithm (str) – The optimization algorithm to use. If not provided, an algorithm is determined based on the molecule’s size. This can be one of the following:
  - “genetic” for a genetic algorithm
  - “scipy” for a scipy-implemented gradient-based optimization
  - “swarm” for a particle swarm optimization
  - “anneal” for a simulated annealing optimization
  - “rdkit” for an RDKit-implemented force-field-based optimization (defaults to MMFF, if RDKit is installed)
  - “mmff” for an RDKit-implemented MMFF94 force-field-based optimization (if RDKit is installed)
  - “uff” for an RDKit-implemented UFF force-field-based optimization (if RDKit is installed)
rotatron (str) – The rotatron to use. This can be one of the following:
  - “distance” for a distance-based rotatron (default)
  - “overlap” for an overlap-based rotatron
  - “forcefield” for a force-field-based rotatron
algorithm_kws (dict) – Keyword arguments to pass to the optimization algorithm
rotatron_kws (dict) – Keyword arguments to pass to the rotatron
inplace (bool) – Whether to optimize the molecule in place or return a copy.
- Returns:
The optimized molecule (either the original object or a copy)
- Return type:
Molecule
- patch_attach(other: Molecule, patch: Linkage | str = None, at_residue: int | Residue = None, other_residue: int | Residue = None, _topology=None)
Attach another structure to this one using a Patch.
- Parameters:
other (Molecule) – The other molecule to attach to this one
patch (str or Linkage) – A linkage to apply when attaching. If none is given, the default link that was set earlier on the molecule is used. If no patch was set, an AttributeError is raised. If a string is given, it is interpreted as the name of a patch in the topology.
at_residue (int or Residue) – The residue to attach the other molecule to. If None, the last residue of the molecule is used.
other_residue (int or Residue) – The residue of the other molecule to attach. If None, the first residue of the other molecule is used.
_topology – A specific topology to use for referencing. If None, the default CHARMM topology is used.
- react_with(other: Molecule, egroup: FunctionalGroup, ngroup: FunctionalGroup, as_electrophile: bool = True, at_residue: int | entity.base_classes.Residue = None, other_residue: int | entity.base_classes.Residue = None, inplace: bool = True, other_inplace: bool = False) → Molecule
React this molecule with another molecule using functional groups to automatically create a linkage.
- Parameters:
other (Molecule) – The other molecule to react with
egroup (FunctionalGroup) – The electrophilic functional group to use
ngroup (FunctionalGroup) – The nucleophilic functional group to use
as_electrophile (bool) – Whether to use this molecule as the electrophile or the nucleophile
at_residue (int or Residue) – The residue to attach the other molecule to. If None, the last residue of the molecule is used.
other_residue (int or Residue) – The residue of the other molecule to attach. If None, the first residue of the other molecule is used.
inplace (bool) – If True the molecule is directly modified, otherwise a copy of the molecule is returned.
other_inplace (bool) – If True, all atoms from the other molecule are integrated into this one, leaving the other molecule empty. If False, a copy of the other molecule is used, leaving the original molecule intact.
- Returns:
The modified molecule (either the original object or a copy)
- Return type:
Molecule
- repeat(n: int, link=None, inplace: bool = True)
Repeat the molecule n times into a homo-polymer.
- Parameters:
n (int) – The number of units in the final polymer.
link (str or Patch or Recipe) – The patch or recipe to use when connecting individual units. If none is given, the default patch or recipe is used (if defined).
inplace (bool) – If True the molecule is directly modified, otherwise a copy of the molecule is returned.
- Returns:
The modified molecule (either the original object or a copy)
- Return type:
Molecule
- stitch_attach(other: Molecule, recipe: Linkage = None, remove_atoms=None, other_remove_atoms=None, at_atom=None, other_at_atom=None, at_residue=None, other_residue=None)
Stitch two molecules together by removing atoms and connecting them with a bond. This works without a pre-defined patch.
- Parameters:
other (Molecule) – The other molecule to attach to this one
recipe (Recipe) – The recipe to use when stitching. If None, the default recipe that was set earlier on the molecule is used (if defined).
remove_atoms (list of int) – The atoms to remove from this molecule. Only used if no recipe is provided.
other_remove_atoms (list of int) – The atoms to remove from the other molecule. Only used if no recipe is provided.
at_atom (int or str or Bio.PDB.Atom) – The atom forming the bond in this molecule. If a string is provided, an at_residue needs to be defined from which to get the atom. If None is provided, the root atom is used (if defined). Only used if no recipe is provided.
other_at_atom (int or str or Bio.PDB.Atom) – The atom to attach to in the other molecule. If a string is provided, an other_residue needs to be defined from which to get the atom. If None is provided, the root atom is used (if defined). Only used if no recipe is provided.
at_residue (int or Residue) – The residue to attach the other molecule to. If None, the attach_residue is used. Only used if a recipe is provided and the atoms
- to_smiles(isomeric: bool = True, write_hydrogens: bool = False) → str
Convert the molecule to a SMILES string
- Parameters:
isomeric (bool) – Whether to include stereochemistry information in the SMILES string
write_hydrogens (bool) – Whether to include hydrogens in the SMILES string
- Returns:
The SMILES string
- Return type:
str
- buildamol.core.Molecule.acetylate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Acetylate a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to acetylate
at_atom (int or str or Atom) – The atom to acetylate. This can be any input that will allow to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided as well.
delete (int or str or Atom) – The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to acetylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the acetyl group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to acetylate the molecule in place or return a new molecule
- Returns:
The acetylated molecule
- Return type:
Molecule
- buildamol.core.Molecule.amidate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Amidate a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to amidate
at_atom (int or str or Atom) – The atom to amidate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to amidate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the amide group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to amidate the molecule in place or return a new molecule
- Returns:
The amidated molecule
- Return type:
Molecule
- buildamol.core.Molecule.aminate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Aminate a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to aminate
at_atom (int or str or Atom) – The atom to aminate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to aminate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the amine group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to aminate the molecule in place or return a new molecule
- Returns:
The aminated molecule
- Return type:
Molecule
- buildamol.core.Molecule.benzylate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Add a benzyl group to a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to benzylate
at_atom (int or str or Atom) – The atom to benzylate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to benzylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the benzyl group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to benzylate the molecule in place or return a new molecule
- Returns:
The benzylated molecule
- Return type:
Molecule
- buildamol.core.Molecule.carboxylate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Carboxylate a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to carboxylate
at_atom (int or str or Atom) – The atom to carboxylate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to carboxylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the carboxyl group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to carboxylate the molecule in place or return a new molecule
- Returns:
The carboxylated molecule
- Return type:
Molecule
- buildamol.core.Molecule.connect(mol_a: Molecule, mol_b: Molecule, link: str | Linkage, at_residue_a: int | Residue = None, at_residue_b: int | Residue = None, copy_a: bool = True, copy_b: bool = True, _topology=None, use_patch: bool = True) → Molecule
Connect two molecules together
- Parameters:
mol_a (Molecule) – The first (target) molecule
mol_b (Molecule) – The second (source) molecule
link (Linkage or str) – The linkage to use for connection. This can be either an instance of the Linkage class or a string identifier of a pre-defined patch in the (currently loaded default or specified) CHARMMTopology.
at_residue_a (int or bio.PDB.Residue) – The residue of the first molecule to connect to. If an integer is provided, the seqid must be used, starting at 1.
at_residue_b (int or bio.PDB.Residue) – The residue of the second molecule to connect to. If an integer is provided, the seqid must be used, starting at 1.
copy_a (bool) – Whether to copy the first molecule before connecting
copy_b (bool) – Whether to copy the second molecule before connecting. If False, all atoms of the second molecule will be added to the first molecule.
_topology (CHARMMTopology) – A specific topology to use in case a pre-existing patch is used as link and only the string identifier is supplied.
use_patch (bool) – If the linkage has internal coordinates available (i.e. is a “patch”), these are used by default. Set this to False to force the use of stitching and its associated conformational optimization instead.
- Returns:
The connected molecule
- Return type:
Molecule
- buildamol.core.Molecule.hydroxylate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Hydroxylate a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to hydroxylate
at_atom (int or str or Atom) – The atom to hydroxylate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to hydroxylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the hydroxyl group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to hydroxylate the molecule in place or return a new molecule
- Returns:
The hydroxylated molecule
- Return type:
Molecule
- buildamol.core.Molecule.make_smiles(mol: Molecule, isomeric: bool = True, write_hydrogens: bool = False) → str
Generate a SMILES string from a molecule.
- Parameters:
mol (Molecule) – The molecule
isomeric (bool) – Whether to include stereochemistry information
write_hydrogens (bool) – Whether to include hydrogens in the SMILES string
- Returns:
smiles – The SMILES string
- Return type:
str
- buildamol.core.Molecule.methylate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Methylate a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to methylate
at_atom (int or str or Atom) – The atom to methylate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to methylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the methyl group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to methylate the molecule in place or return a new molecule
- Returns:
The methylated molecule
- Return type:
Molecule
- buildamol.core.Molecule.molecule(mol=None) → Molecule
Generate a molecule from an input. If the input is a string, it can be a PDB id, a filename, a SMILES or InChI string, or an IUPAC name or abbreviation. This function tries its best to generate the molecule automatically with minimal user effort. However, using a dedicated classmethod is recommended for more efficient and predictable results.
- Parameters:
mol (str or structure-like object) – An input string or structure-like object such as a BioPython Structure or RDKit Molecule, etc. If nothing is provided, a new empty molecule is generated.
- Returns:
molecule – The generated molecule
- Return type:
Molecule
Examples
>>> from buildamol import molecule
>>> mol = molecule("GLC")
>>> mol = molecule("GLC.pdb")
>>> mol = molecule("alpha-d-glucose")
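Part of why a dedicated reader is more predictable is that a bare string is often ambiguous. The plain-Python sketch below shows what a first pass at classifying string input could look like; guess_string_input and its rules are hypothetical and do not reproduce the detection logic inside molecule. It recognizes file paths by the extensions of the supported file formats and InChI strings by their InChI= prefix, and reports everything else as ambiguous.

```python
from pathlib import Path

# Extensions of the file formats listed as supported inputs for molecule().
FILE_SUFFIXES = {".pdb", ".cif", ".mol", ".pdbqt", ".json", ".xml", ".xyz"}


def guess_string_input(text: str) -> str:
    """Hypothetical first-pass classification of a string given to molecule()."""
    # InChI strings always start with this prefix.
    if text.startswith("InChI="):
        return "inchi"
    if Path(text).suffix.lower() in FILE_SUFFIXES:
        return "file"
    # Compound IDs, SMILES strings, and names often cannot be told apart by
    # syntax alone; resolving them needs lookups or parsing attempts.
    return "ambiguous"
```

Strings such as "GLC" or "alpha-d-glucose" end up in the ambiguous bucket, which is where calling a dedicated function such as read_smiles or query_pubchem removes the guesswork.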
- buildamol.core.Molecule.phenolate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, how: str = 'para', as_new_residue: bool = True, inplace: bool = True) → Molecule
Add a phenol group to a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to phenolate
at_atom (int or str or Atom) – The atom to phenolate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to phenolate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
how (str) – The position of the hydroxyl group on the phenol. Can be one of “ortho”, “meta”, or “para”.
as_new_residue (bool) – Whether to attach the phenol group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to phenolate the molecule in place or return a new molecule
- Returns:
The phenolated molecule
- Return type:
Molecule
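The how argument uses the standard ring nomenclature: numbering the phenyl ring from the carbon bonded to the attachment point (C1), ortho corresponds to C2 or C6, meta to C3 or C5, and para to C4. The hypothetical helper below only encodes this textbook mapping; the atom names BuildAMol uses in its phenol template may differ.

```python
# Ring locants relative to the ring carbon bonded to the attachment point (C1).
RING_LOCANTS = {"ortho": (2, 6), "meta": (3, 5), "para": (4,)}


def hydroxyl_ring_locants(how: str = "para") -> tuple:
    """Hypothetical helper: ring positions where the phenol OH can sit for `how`."""
    try:
        return RING_LOCANTS[how.lower()]
    except KeyError:
        raise ValueError(
            f"how must be one of {sorted(RING_LOCANTS)}, got {how!r}"
        ) from None
```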
- buildamol.core.Molecule.phosphorylate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Phosphorylate a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to phosphorylate
at_atom (int or str or Atom) – The atom to phosphorylate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to phosphorylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the phosphate as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to phosphorylate the molecule in place or return a new molecule
- Returns:
The phosphorylated molecule
- Return type:
Molecule
- buildamol.core.Molecule.polymerize(mol: Molecule, n: int, link: str | Linkage = None, inplace: bool = False) → Molecule
Polymerize a molecule
- Parameters:
- Returns:
The polymerized molecule
- Return type:
Molecule
- buildamol.core.Molecule.query_pubchem(query: str, by: str = 'name') → Molecule
Query the PubChem database for a given query string to obtain a Molecule object.
- Parameters:
query (str) – The query string
by (str) –
The type of query to perform. This can be any of the following:
cid
name
smiles
sdf
inchi
inchikey
formula
- Returns:
The molecule or None if no match was found
- Return type:
Molecule or None
- buildamol.core.Molecule.react(mol_a: Molecule, mol_b: Molecule, egroup: FunctionalGroup = None, ngroup: FunctionalGroup = None, a_is_electrophile: bool = True, at_residue_a: int | bio.Residue.Residue = None, at_residue_b: int | bio.Residue.Residue = None, reaction: Reaction = None, copy_a: bool = True, copy_b: bool = True) → Molecule
Connect two molecules together by imitating a chemical reaction based on functional groups.
- Parameters:
mol_a (Molecule) – The first (target) molecule
mol_b (Molecule) – The second (source) molecule
egroup (FunctionalGroup) – The functional group of the first molecule to connect to.
ngroup (FunctionalGroup) – The functional group of the second molecule to connect to.
a_is_electrophile (bool) – Whether the first molecule is the electrophile (True) or nucleophile (False)
at_residue_a (int or bio.PDB.Residue) – The residue of the first molecule to connect to. If an integer is provided, the seqid must be used, starting at 1.
at_residue_b (int or bio.PDB.Residue) – The residue of the second molecule to connect to. If an integer is provided, the seqid must be used, starting at 1.
reaction (Reaction) – A specific Reaction instance to use instead of functional groups. If given, the functional group and residue arguments are ignored.
copy_a (bool) – Whether to copy the first molecule before connecting
copy_b (bool) – Whether to copy the second molecule before connecting. If False, all atoms of the second molecule will be added to the first molecule.
- Returns:
The connected molecule
- Return type:
Molecule
- buildamol.core.Molecule.read_cif(filename: str, id: str = None) → Molecule
Read a CIF file and return a molecule.
- Parameters:
filename (str) – The path to the CIF file
id (str) – The id of the molecule
- Returns:
molecule – The molecule
- Return type:
Molecule
- buildamol.core.Molecule.read_molfile(filename: str, id: str = None) → Molecule
Read a MOL file and return a molecule.
- Parameters:
filename (str) – The path to the MOL file
id (str) – The id of the molecule
- Returns:
molecule – The molecule
- Return type:
Molecule
- buildamol.core.Molecule.read_pdb(filename: str, id: str = None, multimodel: bool = False, model: int = None, has_atom_ids: bool = True) → Molecule
Read a PDB file and return a molecule.
- Parameters:
filename (str) – The path to the PDB file
id (str) – The id of the molecule
multimodel (bool) – Whether to read all models and return a list of molecules. If False, only one model is read by default. To read all models into a single molecule rather than into a list of separate molecules, set multimodel=False and model='all' (this does not work when model is a list).
model (int or str) – The model number to read. If None, all models are read. This can be a list or tuple of integers or strings to read multiple models.
has_atom_ids (bool) – Whether the PDB file contains atom ids. If the file does not contain them, set this to False so that atom ids are auto-generated.
- Returns:
molecule – The molecule or a list of molecules if multimodel is True
- Return type:
Molecule or list
- buildamol.core.Molecule.read_smiles(smiles: str, id: str = None) → Molecule
Read a SMILES string and return a molecule.
- Parameters:
smiles (str) – The SMILES string
id (str) – The id of the molecule
- Returns:
molecule – The molecule
- Return type:
Molecule
- buildamol.core.Molecule.thiolate(mol: Molecule, at_atom: int | str | Atom, delete: int | str | Atom = None, as_new_residue: bool = True, inplace: bool = True) → Molecule
Add a thiol group to a molecule at one or more specific atoms
- Parameters:
mol (Molecule) – The molecule to thiolate
at_atom (int or str or Atom) – The atom to thiolate. This can be any input that can be used to obtain an Atom object from the molecule. Alternatively, a list of such inputs can be provided.
delete (int or str or Atom) – The atom to delete. This can be any input that can be used to obtain an Atom object from the molecule. This atom needs to be in the same residue as the atom to thiolate. If not provided, any Hydrogen atom attached to the at_atom will be deleted. If at_atom is a list, delete can be a list of the same length or None.
as_new_residue (bool) – Whether to attach the thiol group as a new residue or merge it into the same residue as at_atom.
inplace (bool) – Whether to thiolate the molecule in place or return a new molecule
- Returns:
The thiolated molecule
- Return type:
Molecule
- buildamol.core.Molecule.write_cif(mol: Molecule, filename: str) → None
Write a molecule to a CIF file.
- Parameters:
mol (Molecule) – The molecule to write
filename (str) – The path to the CIF file
The Linkage module
The Linkage module defines the Linkage class that is used to connect two molecules in a specific way.
Linkage definitions
A linkage is a connection between two molecules. At its core, each linkage simply defines two atoms that should be connected and which atoms to remove in the process. It is, so to speak, a “pseudo” chemical reaction.
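To make the idea concrete, the following toy model treats each molecule as a plain set of atom IDs and bonds and applies a linkage by deleting atoms and adding one bond. It is a hypothetical illustration in plain Python, not BuildAMol's implementation: it ignores geometry entirely (which is what patches and recipes add on top), and the 1/2 prefixes are borrowed from the convention that Linkage.add_delete uses for target and source atoms. The atom names mirror the recipe example in this section.

```python
from typing import Iterable, Set, Tuple

Bond = Tuple[str, str]


def apply_toy_linkage(
    target_atoms: Set[str],
    target_bonds: Set[Bond],
    source_atoms: Set[str],
    source_bonds: Set[Bond],
    atom1: str,
    atom2: str,
    delete_in_target: Iterable[str] = (),
    delete_in_source: Iterable[str] = (),
) -> Tuple[Set[str], Set[Bond]]:
    """Toy model of a linkage: remove atoms, then bond atom1 to atom2.

    Geometry is ignored completely; only the bond graph is updated.
    """
    # Prefix IDs with 1 (target) or 2 (source) so identical atom names from
    # the two molecules cannot collide in the merged graph.
    atoms = {"1" + a for a in set(target_atoms) - set(delete_in_target)}
    atoms |= {"2" + a for a in set(source_atoms) - set(delete_in_source)}
    bonds = {("1" + a, "1" + b) for a, b in target_bonds}
    bonds |= {("2" + a, "2" + b) for a, b in source_bonds}
    # Bonds to deleted atoms disappear together with those atoms.
    bonds = {(a, b) for a, b in bonds if a in atoms and b in atoms}
    # Finally, form the new bond between the two linker atoms.
    bonds.add(("1" + atom1, "2" + atom2))
    return atoms, bonds
```

With the target fragment C1-O1-HO1, the source fragment C4-O4-HO4, atom1="C1", atom2="O4", delete_in_target=["O1", "HO1"] and delete_in_source=["HO4"], the result keeps 1C1, 2C4 and 2O4 with the bonds 2C4-2O4 and 1C1-2O4.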
Building on the CHARMM force field, BuildAMol distinguishes two kinds of linkages: patches and recipes.
A patch is a linkage that can be applied purely geometrically and does not require numeric optimization. This is because a patch includes geometric data in the form of internal coordinates of the atoms in the immediate vicinity of the newly formed bond. Using this data, BuildAMol is able to attach molecules to one another through simple matrix transformations. Consequently, patches are the most efficient way to connect molecules and are preferable to recipes, the other type of linkage.
A recipe, on the other hand, is a linkage that requires numeric optimization, because it does not include any geometric data but only the atoms that should be connected. The numeric optimization is then used to find an optimal (or at least suitable) conformation. Recipes are useful for most users who wish to define their own linkage types but will likely not want to painstakingly define the detailed geometry of angles and dihedrals in the atom neighborhood.
The distinction between patches and recipes is purely nominal, as both are represented by the Linkage class. However, there are functional wrappers available to create either a patch or a recipe, respectively, which require different arguments (to make sure none are forgotten and to make the code more readable).
from buildamol import recipe
# Create a custom recipe
my_link = recipe(
    atom1="C1",
    atom2="O4",
    delete_in_target=["O1", "HO1"],
    delete_in_source=["HO4"],
    id="my_link",
)
Pre-defined patches
BuildAMol comes with a number of pre-defined patches from the CHARMM force field. These can be accessed through the resources module:
from buildamol import resources
# Get a list of all pre-defined patches
patches = resources.available_patches()
# Check for a specific patch
resources.has_patch("some_patch")
# Get a specific patch
my_patch = resources.get_patch("some_patch")
A custom linkage can be added to the list of pre-defined patches by using the add_patch function:
# add the above defined my_link to the list of pre-defined patches
resources.add_patch(my_link)
Note
Despite the use of “patch” in the function nomenclature, there is no difference between a patch and a recipe in terms of how they are used. Patches and recipes are represented by the same data class and thus behave identically. Hence, there are also functional wrappers with “linkage” in their name that can be used instead (if a user feels more comfortable with this); they perform the same function.
resources.add_linkage(my_link)
# performs the same as
resources.add_patch(my_link)
# check for a specific linkage
resources.has_linkage("my_link")
# performs the same as
resources.has_patch("my_link")
# etc.
Pre-defined patches can be accessed directly by their id and need not be obtained first through the resources module. They can be passed directly to the Molecule’s attach method or any other function that requires a linkage:
import buildamol as bam
mol1 = bam.read_pdb("my_molecule.pdb")
mol2 = bam.read_pdb("my_other_molecule.pdb")
# Attach mol2 to mol1 using the pre-defined patch "some_patch"
mol1.attach(mol2, "some_patch")
# works the same as doing
some_patch = bam.get_patch("some_patch")
mol1.attach(mol2, some_patch)
- class buildamol.core.Linkage.Linkage(id=None, description: str = None, automatically_delete_downstream_atoms: bool = False)
Bases: AbstractEntity_with_IC
The Linkage class stores a template reaction instruction for attaching molecules to one another.
- Parameters:
id (str, optional) – The ID of the linkage.
description (str, optional) – An additional description of the linkage.
automatically_delete_downstream_atoms (bool, optional) – Whether to automatically delete all atoms downstream of a deleted atom, i.e. on the far side of the deleted atom as seen from the linker atom. This is useful for linkers that are part of a larger group that should be removed (e.g. a carboxyl group) without having to specify all atoms to delete manually.
- id
The ID of the linkage.
- Type:
str
- bond
The bond to form between the two molecules.
- Type:
tuple of str
- internal_coordinates
The internal coordinates of the atoms in the immediate vicinity of the newly formed bond.
- Type:
list of InternalCoordinate
- deletes
The atom IDs to delete in a tuple of lists where the first list contains the atom IDs to delete from the first structure (target) and the second one from the second structure (source)
- Type:
tuple of list of str
- atoms
The atom IDs of the atoms in the linkage.
- Type:
list of str
- add_delete(id, _from: str = None)
Add an atom ID to delete
- Parameters:
id (str) – The atom ID to delete.
_from (str, optional) – The structure from which to delete the atom. Can be either “source” or “target”. If not provided, the structure is inferred from the atom ID, in which case either 1 (target) or 2 (source) must be the first character of the ID.
- add_id_to_delete(id, _from: str = None)
Add an atom ID to delete
- Parameters:
id (str) – The atom ID to delete.
_from (str, optional) – The structure from which to delete the atom. Can be either “source” or “target”. If not provided, the structure is inferred from the atom ID, in which case either 1 (target) or 2 (source) must be the first character of the ID.
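The rule for inferring _from, together with the (target, source) layout of the deletes property, can be sketched with the hypothetical class below. It is plain Python, not the Linkage implementation; in particular, stripping the leading 1 or 2 before storing the ID is an assumption of this sketch.

```python
from typing import List, Optional, Tuple


class ToyDeletes:
    """Hypothetical stand-in for the (target, source) delete lists of a Linkage."""

    def __init__(self) -> None:
        self._target: List[str] = []
        self._source: List[str] = []

    def add_delete(self, atom_id: str, _from: Optional[str] = None) -> None:
        """Record an atom ID to delete; atom_id plays the role of `id`."""
        if _from is None:
            # Infer the structure from the first character: 1 = target, 2 = source.
            # Assumption of this sketch: the prefix is stripped before storing.
            _from = {"1": "target", "2": "source"}.get(atom_id[:1])
            if _from is None:
                raise ValueError("atom ID must start with 1 (target) or 2 (source)")
            atom_id = atom_id[1:]
        if _from == "target":
            self._target.append(atom_id)
        elif _from == "source":
            self._source.append(atom_id)
        else:
            raise ValueError("_from must be 'target' or 'source'")

    @property
    def deletes(self) -> Tuple[List[str], List[str]]:
        # Same layout as Linkage.deletes: (target IDs, source IDs).
        return self._target, self._source
```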
- apply(target, source, target_residue=None, source_residue=None)
Apply the linkage to the two molecules. This will delete the atoms that should be deleted and form the bond between the two molecules.
Note that this method does NOT perform any kind of geometric changes to the molecules themselves. It only adds the bond between the two molecules. It also does NOT merge the two molecules into one! Use the Molecule.attach method (or the connect toplevel function) to actually connect two molecules!
- Parameters:
target (Molecule) – The first molecule.
source (Molecule) – The second molecule.
target_residue (Residue, optional) – The residue in the target molecule to which the source molecule will be patched. By default, the attach_residue in the target molecule will be used.
source_residue (Residue, optional) – The residue in the source molecule that will be patched into the target molecule. By default, the attach_residue in the source molecule will be used.
- apply_bond(target, source, target_residue=None, source_residue=None)
Form the bond between the two molecules.
Note that this method does NOT perform any kind of geometric changes to the molecules themselves. It only adds the bond between the two molecules. It also does NOT merge the two molecules into one! Use the Molecule.attach method (or the connect toplevel function) to actually connect two molecules!
- Parameters:
target (Molecule) – The first molecule.
source (Molecule) – The second molecule.
target_residue (Residue, optional) – The residue in the target molecule to which the source molecule will be patched. By default, the attach_residue in the target molecule will be used.
source_residue (Residue, optional) – The residue in the source molecule that will be patched into the target molecule. By default, the attach_residue in the source molecule will be used.
- apply_deletes(target=None, source=None, target_residue=None, source_residue=None)
Delete atoms that should be deleted from the molecules as part of the linkage.
- Parameters:
target (Molecule) – The first molecule.
source (Molecule) – The second molecule.
target_residue (Residue, optional) – The residue in the target molecule to which the source molecule will be patched. By default, the attach_residue in the target molecule will be used.
source_residue (Residue, optional) – The residue in the source molecule that will be patched into the target molecule. By default, the attach_residue in the source molecule will be used.
- property atom1: str
The atom ID of the first atom in the bond.
- property atom2: str
The atom ID of the second atom in the bond.
- property bond: tuple
The bond to form between the two molecules.
- can_apply(target, source, target_residue=None, source_residue=None) → bool
Check if the linkage can be applied to the two molecules.
- Parameters:
target (Molecule) – The first molecule.
source (Molecule) – The second molecule.
target_residue (Residue, optional) – The residue in the target molecule to which the source molecule will be patched. By default, the attach_residue in the target molecule will be used.
source_residue (Residue, optional) – The residue in the source molecule that will be patched into the target molecule. By default, the attach_residue in the source molecule will be used.
- Returns:
True if the linkage can be applied, False otherwise.
- Return type:
bool
- can_be_source(molecule, residue=None)
Check if the linkage can be applied to the molecule as the source.
- Parameters:
- Returns:
True if the linkage can be applied to the molecule, False otherwise.
- Return type:
bool
- can_be_target(molecule, residue=None)
Check if the linkage can be applied to the molecule as the target.
- Parameters:
- Returns:
True if the linkage can be applied to the molecule, False otherwise.
- Return type:
bool
- copy() → Linkage
Create a copy of the Linkage instance.
- Returns:
A new Linkage instance that is a copy of the original.
- Return type:
Linkage
- property deletes
Returns the atom IDs to delete in a tuple of lists where the first list contains the atom IDs to delete from the first structure (target) and the second one from the second structure (source)
- classmethod from_bond(bond: Bond, id: str = None, description: str = None, automatically_delete_downstream_atoms: bool = True) → Linkage
Make a new Linkage instance from a bond.
- Parameters:
bond (Bond) – The bond to form between the two atoms.
id (str, optional) – The ID of the linkage.
description (str, optional) – An additional description of the linkage.
- classmethod from_functional_groups(emol: Molecule, egroup: FunctionalGroup, nmol: Molecule, ngroup: FunctionalGroup, automatically_delete_downstream_atoms: bool = True)
Create a new Linkage instance from two functional groups.
- Parameters:
emol (Molecule) – The first (target) molecule that houses the electrophile. The attach residue will be used as reference residue to match atoms to the functional group.
egroup (FunctionalGroup) – The electrophile functional group.
nmol (Molecule) – The second (source) molecule that houses the nucleophile. The attach residue will be used as reference residue to match atoms to the functional group.
ngroup (FunctionalGroup) – The nucleophile functional group.
- classmethod from_json(filename: str)
Make a new Linkage instance from a JSON file.
- Parameters:
filename (str) – The JSON filename.
- classmethod from_xml(filename: str)
Make a new Linkage instance from an XML file.
- Parameters:
filename (str) – The XML filename.
- identify_atoms(target, source, target_residue=None, source_residue=None)
Identify the atoms in the two molecules that are part of the linkage. If any of the binder or deleter atoms could not be identified, this method will raise a ValueError.
- Parameters:
target (Molecule) – The first molecule.
source (Molecule) – The second molecule.
target_residue (Residue, optional) – The residue in the target molecule to which the source molecule will be patched. By default, the attach_residue in the target molecule will be used.
source_residue (Residue, optional) – The residue in the source molecule that will be patched into the target molecule. By default, the attach_residue in the source molecule will be used.
- Returns:
atom1 (Atom) – The first atom in the bond.
atom2 (Atom) – The second atom in the bond.
delete_in_target (list of Atom) – The atoms to delete in the target molecule.
delete_in_source (list of Atom) – The atoms to delete in the source molecule.
- reverse(inplace: bool = True) → Linkage
Reverse the linkage, i.e. swap the atoms in the bond and the deletes.
- Parameters:
inplace (bool, optional) – If True, the linkage will be reversed in place. If False, a new reversed linkage will be returned. Default is True.
- Returns:
The reversed linkage if inplace is False, otherwise None.
- Return type:
Linkage
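Reversing swaps the roles of target and source. The hypothetical dataclass below sketches the documented behavior of reverse, including the inplace switch (returning None when reversing in place, as the Returns entry states); it is plain Python and not the Linkage implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyLinkage:
    """Hypothetical minimal linkage: one bond plus the atoms to delete on each side."""

    atom1: str
    atom2: str
    delete_in_target: List[str] = field(default_factory=list)
    delete_in_source: List[str] = field(default_factory=list)

    def reverse(self, inplace: bool = True) -> Optional["ToyLinkage"]:
        """Swap the bond atoms and the delete lists between target and source."""
        if not inplace:
            # Build a new, reversed instance and leave this one untouched.
            return ToyLinkage(
                self.atom2,
                self.atom1,
                list(self.delete_in_source),
                list(self.delete_in_target),
            )
        self.atom1, self.atom2 = self.atom2, self.atom1
        self.delete_in_target, self.delete_in_source = (
            self.delete_in_source,
            self.delete_in_target,
        )
        return None
```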
- buildamol.core.Linkage.linkage(atom1, atom2, delete_in_target=None, delete_in_source=None, internal_coordinates: dict = None, id: str = None, description: str = None, automatically_delete_downstream_atoms: bool = True) → Linkage
Make a new Linkage instance to connect two molecules together.
- Parameters:
atom1 (str) – The atom in the first (target) molecule to connect.
atom2 (str) – The atom in the second (source) molecule to connect.
delete_in_target (str or tuple of str, optional) – The atom(s) in the first molecule to delete. If not provided, any Hydrogen atom bound to atom1 will be deleted.
delete_in_source (str or tuple of str, optional) – The atom(s) in the second molecule to delete. If not provided, any Hydrogen atom bound to atom2 will be deleted.
internal_coordinates (dict, optional) –
The internal coordinates of the atoms in the immediate vicinity of the newly formed bond. If provided, the link can be applied purely geometrically and will not require numeric optimization. In that case, this must be a dictionary where keys are tuples of four atom IDs and values are tuples containing (in order):
the bond length between the first and second atom (first and third in case of an improper)
the bond length between the third and fourth atom
the bond angle between the first, second and third atom
the bond angle between the second, third and fourth atom
the dihedral angle between the first, second, third and fourth atom
True if the internal coordinate is improper, False otherwise
id (str, optional) – The ID of the linkage.
description (str, optional) – A description of the linkage.
automatically_delete_downstream_atoms (bool, optional) – Whether to automatically delete all atoms downstream of a deleted atom, i.e. on the far side of the deleted atom as seen from the linker atom. This is useful for linkers that are part of a larger group that should be removed (e.g. a carboxyl group) without having to specify all atoms to delete manually.
- Returns:
The new linkage instance.
- Return type:
Linkage
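The internal_coordinates dictionary described above can be checked for the expected shape with a small validator. The sketch below is hypothetical plain Python, not a BuildAMol function; the atom IDs (using the 1/2 target/source prefix convention) and the numbers in the example are dummy placeholders that only demonstrate the layout, not real geometry, and no units are implied.

```python
from typing import Dict, Tuple

# Key: four atom IDs. Value, in order: bond length 1-2 (1-3 for impropers),
# bond length 3-4, angle 1-2-3, angle 2-3-4, dihedral 1-2-3-4, is_improper.
ICKey = Tuple[str, str, str, str]
ICValue = Tuple[float, float, float, float, float, bool]


def check_internal_coordinates(ics: Dict[ICKey, ICValue]) -> None:
    """Hypothetical validator: raise ValueError if the layout is wrong."""
    for key, value in ics.items():
        if not (isinstance(key, tuple) and len(key) == 4
                and all(isinstance(atom, str) for atom in key)):
            raise ValueError(f"key must be a tuple of four atom IDs, got {key!r}")
        if not (isinstance(value, tuple) and len(value) == 6):
            raise ValueError(f"value for {key!r} must have six entries")
        *numbers, improper = value
        if not isinstance(improper, bool):
            raise ValueError(f"last entry for {key!r} must be True or False")
        if not all(isinstance(x, (int, float)) for x in numbers):
            raise ValueError(f"first five entries for {key!r} must be numbers")


# Dummy values that only show the shape of one entry.
example = {("1O5", "1C1", "2O4", "2C4"): (1.0, 1.0, 90.0, 90.0, 180.0, False)}
check_internal_coordinates(example)  # passes silently
```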
- buildamol.core.Linkage.patch(atom1, atom2, delete_in_target, delete_in_source, internal_coordinates: dict, id: str = None, description: str = None) → Linkage
Make a new Linkage instance that describes a “patch” between two molecules. A patch is a linkage that can be applied purely geometrically and does not require numeric optimization. As such, it requires the internal coordinates of the atoms in the immediate vicinity of the newly formed bond.
- Parameters:
atom1 (str or tuple of str) – The atom in the first (target) molecule to connect.
atom2 (str or tuple of str) – The atom in the second (source) molecule to connect.
delete_in_target (str or tuple of str) – The atom(s) in the first molecule to delete.
delete_in_source (str or tuple of str) – The atom(s) in the second molecule to delete.
internal_coordinates (dict) –
The internal coordinates of the atoms in the immediate vicinity of the newly formed bond. These are required, because a patch is applied purely geometrically and without numeric optimization. This must be a dictionary whose keys are tuples of four atom ids and whose values are tuples containing (in order):
the bond length between the first and second atom (first and third in case of an improper)
the bond length between the third and fourth atom
the bond angle between the first, second and third atom
the bond angle between the second, third and fourth atom
the dihedral angle between the first, second, third and fourth atom
True if the internal coordinate is improper, False otherwise
id (str, optional) – The id of the linkage.
description (str, optional) – A description of the linkage.
- Returns:
The new linkage.
- Return type:
Linkage
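The internal_coordinates layout described above is easy to get wrong, so the following minimal sketch builds one such dictionary and checks its shape. The atom ids and numeric values are made up for illustration (Angstrom and degrees are assumed as units), and validate_internal_coordinates is a hypothetical helper, not part of BuildAMol.

```python
from typing import Dict, Tuple

# Key: four atom ids. Value: (bond length 1-2 (1-3 if improper), bond length 3-4,
# angle 1-2-3, angle 2-3-4, dihedral 1-2-3-4, is_improper)
ICKey = Tuple[str, str, str, str]
ICValue = Tuple[float, float, float, float, float, bool]


def make_internal_coordinates() -> Dict[ICKey, ICValue]:
    """Build an example mapping; ids and values are hypothetical."""
    return {
        # proper: the four atoms form a chain of consecutive bonds
        ("C3", "C4", "O4", "C1"): (1.52, 1.43, 109.5, 112.0, -120.0, False),
        # improper: the first bond length refers to atoms 1 and 3
        ("C4", "O4", "C1", "H1"): (1.43, 1.10, 112.0, 109.5, 180.0, True),
    }


def validate_internal_coordinates(ic: dict) -> None:
    """Raise ValueError if `ic` does not follow the documented layout."""
    for key, value in ic.items():
        if not (isinstance(key, tuple) and len(key) == 4):
            raise ValueError(f"key {key!r} must be a tuple of four atom ids")
        if not (isinstance(value, tuple) and len(value) == 6):
            raise ValueError(f"value for {key!r} must be a 6-tuple")
        *numbers, improper = value
        if not all(isinstance(x, (int, float)) for x in numbers):
            raise ValueError(f"lengths and angles for {key!r} must be numeric")
        if not isinstance(improper, bool):
            raise ValueError(f"last entry for {key!r} must be True or False")
```

Checking the structure up front gives a clear error message instead of a failure deep inside the geometric placement step.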
- buildamol.core.Linkage.recipe(atom1, atom2, delete_in_target=None, delete_in_source=None, id: str = None, description: str = None, automatically_delete_downstream_atoms: bool = True) Linkage[source]#
Make a new Linkage instance that describes a “recipe” to connect two molecules. Unlike a patch, a recipe does not include the internal coordinates of the atoms in the immediate vicinity of the newly formed bond, so applying it requires numeric optimization.
- Parameters:
atom1 (str or tuple of str) – The atom in the first (target) molecule to connect.
atom2 (str or tuple of str) – The atom in the second (source) molecule to connect.
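delete_in_target (str or tuple of str, optional) – The atom(s) in the first molecule to delete. If not provided, any Hydrogen atom bound to atom1 will be deleted.
delete_in_source (str or tuple of str, optional) – The atom(s) in the second molecule to delete. If not provided, any Hydrogen atom bound to atom2 will be deleted.
id (str, optional) – The id of the linkage.
description (str, optional) – A description of the linkage.
automatically_delete_downstream_atoms (bool, optional) – Whether to automatically delete all atoms downstream of a linker and deleted atom. This is useful for linkers that are part of a larger group that should be removed (e.g. a carboxyl group) without having to specify all atoms to delete manually. Default is True.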
- Returns:
The new linkage.
- Return type:
Linkage
The Reaction module#
The Reaction module defines the Reaction class that is used to model chemical reactions between molecules.
- class buildamol.core.Reaction(atom1: Atom | callable, atom2: Atom | callable, delete_in_target: List | callable = None, delete_in_source: List | callable = None, bond_order: int = 1)[source]#
Bases: object
A class representing a (pseudo-)chemical reaction between two molecules. It serves as a factory to create Linkages from callables rather than direct atom identifiers.
- Parameters:
atom1 (Union[Atom, callable]) – The atom in the target molecule to which the source molecule will be connected. This can be an Atom object or a callable that takes a Molecule and returns an Atom.
atom2 (Union[Atom, callable]) – The atom in the source molecule which will be connected to the target molecule. This can be an Atom object or a callable that takes a Molecule and returns an Atom.
delete_in_target (Union[List, callable], optional) – A list of atoms in the target molecule to be deleted upon connection, or a callable that takes the atom1 and the target molecule and returns such a list. Default Hydrogen-deletion is applied if None.
delete_in_source (Union[List, callable], optional) – A list of atoms in the source molecule to be deleted upon connection, or a callable that takes the atom2 and the source molecule and returns such a list. Default Hydrogen-deletion is applied if None.
bond_order (int, optional) – The bond order of the new bond formed between atom1 and atom2. Default is 1 (single bond).
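Accepting either an atom or a callable is what lets one Reaction be reused across many molecules: the callable picks the reactive atom from whichever molecule it is given. The sketch below illustrates only that resolution step. ToyMolecule (a plain dict of atom id to element), resolve_atom and first_oxygen are hypothetical stand-ins, not BuildAMol API.

```python
from typing import Callable, Dict, Union

# Hypothetical stand-in for a Molecule: maps atom ids to element symbols.
ToyMolecule = Dict[str, str]
AtomSpec = Union[str, Callable[[ToyMolecule], str]]


def resolve_atom(spec: AtomSpec, mol: ToyMolecule) -> str:
    """Use the atom as given, or call the spec to pick one from `mol`."""
    atom = spec(mol) if callable(spec) else spec
    if atom not in mol:
        raise KeyError(f"atom {atom!r} not found in molecule")
    return atom


def first_oxygen(mol: ToyMolecule) -> str:
    """Example selector: the first oxygen atom (in insertion order)."""
    return next(atom_id for atom_id, element in mol.items() if element == "O")


ethanol = {"C1": "C", "C2": "C", "O1": "O"}
other = {"C1": "C", "O7": "O"}
print(resolve_atom(first_oxygen, ethanol), resolve_atom(first_oxygen, other))
```

The same selector resolves to "O1" in the first molecule and "O7" in the second, whereas a fixed atom id would only fit one of them.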
- apply(target: Molecule, source: Molecule, inplace: bool = False) Molecule[source]#
Apply the reaction to two molecules, creating a new molecule with the linkage applied. This is the same as calling the Reaction object directly.
- Parameters:
target (Molecule) – The target molecule to which the source molecule will be connected.
source (Molecule) – The source molecule which will be connected to the target molecule.
inplace (bool, optional) – If True, modify the target molecule in place. If False, create a copy of the target molecule. Default is False.
- Returns:
A Molecule with the linkage applied (a copy of the target unless inplace is True).
- Return type:
Molecule
- can_apply(target: Molecule, source: Molecule) bool[source]#
Check if the reaction can be applied to the given target and source molecules. This checks if the specified atoms and deletions are valid in the context of the provided molecules and stores the resolved atoms and deletions in memory for later use.
This method is automatically called by the apply method.
- create_linkage(target: Molecule, source: Molecule) Linkage[source]#
Create one or more Linkage object(s) based on the current reaction parameters and the provided molecules. This does not modify the molecules, it only creates the Linkage object(s).
This method is automatically called by the apply method. It requires that can_apply has been called beforehand to ensure that the reaction can be applied.
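- Parameters:
target (Molecule) – The target molecule to which the source molecule will be connected.
source (Molecule) – The source molecule which will be connected to the target molecule.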
- Returns:
A Linkage object or a list of Linkage objects representing the connection(s) to be made between the target and source molecules.
- Return type:
Linkage or list of Linkage
- classmethod from_reactivities(nucleophile: Reactivity, electrophile: Reactivity, bond_order: int = 1, target_is_electrophile: bool = True)[source]#
Set up a Reaction from two Reactivity objects, one for the nucleophile (source) and one for the electrophile (target). This is a convenience method to quickly create a Reaction from predefined Reactivity patterns.
- Parameters:
nucleophile (Reactivity) – The Reactivity object defining the nucleophilic behavior of the source molecule.
electrophile (Reactivity) – The Reactivity object defining the electrophilic behavior of the target molecule.
bond_order (int, optional) – The bond order of the new bond formed between the nucleophile and electrophile. Default is 1 (single bond).
target_is_electrophile (bool, optional) – Set to False to swap the roles of nucleophile and electrophile, i.e. the target molecule then acts as the nucleophile and the source molecule as the electrophile. Default is True.
- set_reactivity(atom1: Atom | callable = None, atom2: Atom | callable = None, delete_in_target: List | callable = None, delete_in_source: List | callable = None, bond_order: int = None)[source]#
Set new parameters for the reaction, modifying this Reaction in place.
- Parameters:
atom1 (Union[Atom, callable], optional) – The atom in the target molecule to which the source molecule will be connected. This can be an Atom object or a callable that takes a Molecule and returns an Atom.
atom2 (Union[Atom, callable], optional) – The atom in the source molecule which will be connected to the target molecule. This can be an Atom object or a callable that takes a Molecule and returns an Atom.
delete_in_target (Union[List, callable], optional) – A list of atoms in the target molecule to be deleted upon connection, or a callable that takes the atom1 and the target molecule and returns such a list. Default Hydrogen-deletion is applied if None.
delete_in_source (Union[List, callable], optional) – A list of atoms in the source molecule to be deleted upon connection, or a callable that takes the atom2 and the source molecule and returns such a list. Default Hydrogen-deletion is applied if None.
bond_order (int, optional) – The bond order of the new bond formed between atom1 and atom2. Default is 1 (single bond).
- with_reactivity(atom1: Atom | callable = None, atom2: Atom | callable = None, delete_in_target: List | callable = None, delete_in_source: List | callable = None, bond_order: int = None)[source]#
Create a new Reaction with modified parameters.
- Parameters:
atom1 (Union[Atom, callable], optional) – The atom in the target molecule to which the source molecule will be connected. This can be an Atom object or a callable that takes a Molecule and returns an Atom.
atom2 (Union[Atom, callable], optional) – The atom in the source molecule which will be connected to the target molecule. This can be an Atom object or a callable that takes a Molecule and returns an Atom.
delete_in_target (Union[List, callable], optional) – A list of atoms in the target molecule to be deleted upon connection, or a callable that takes the atom1 and the target molecule and returns such a list. Default Hydrogen-deletion is applied if None.
delete_in_source (Union[List, callable], optional) – A list of atoms in the source molecule to be deleted upon connection, or a callable that takes the atom2 and the source molecule and returns such a list. Default Hydrogen-deletion is applied if None.
bond_order (int, optional) – The bond order of the new bond formed between atom1 and atom2. Default is 1 (single bond).
- Returns:
A new Reaction object with the modified parameters.
- Return type:
Reaction
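set_reactivity and with_reactivity follow a common pairing: one mutates the existing object, the other returns a modified copy and leaves the original untouched. The sketch below shows that pattern with a hypothetical ToyReaction dataclass (not BuildAMol code). It assumes that arguments left as None keep their current value, which the documentation does not state explicitly.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


@dataclass
class ToyReaction:
    """Hypothetical stand-in with the same parameters as Reaction."""
    atom1: Any
    atom2: Any
    delete_in_target: Any = None
    delete_in_source: Any = None
    bond_order: int = 1

    def _changes(self, **kwargs: Any) -> dict:
        # Reject unknown parameter names, then (assumption) drop None values
        # so that omitted arguments keep their current setting.
        unknown = set(kwargs) - {f.name for f in fields(self)}
        if unknown:
            raise TypeError(f"unknown reaction parameters: {sorted(unknown)}")
        return {k: v for k, v in kwargs.items() if v is not None}

    def set_reactivity(self, **kwargs: Any) -> None:
        """Update this object in place."""
        for name, value in self._changes(**kwargs).items():
            setattr(self, name, value)

    def with_reactivity(self, **kwargs: Any) -> "ToyReaction":
        """Return an updated copy; this object is left untouched."""
        return replace(self, **self._changes(**kwargs))
```

The copy-returning variant is handy for deriving several related reactions from one template. A side effect of treating None as "unchanged" is that a parameter cannot be reset to None this way.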
The base module#
The entity module defines the BaseEntity class that is the base class for Molecules (and whatever other classes a user may wish to define that are similar in concept).
The BaseEntity class
- class buildamol.core.entity.BaseEntity(structure, model: int = 0)[source]#
Bases: object
The base class for all classes that store and handle molecular structures. This class is not meant to be used directly but serves as the base for the Molecule class.
- Parameters:
structure (Structure or Bio.PDB.Structure) – A BuildAMol or Biopython structure
model (int) – The index of the model to use (default: 0)
- add_atoms(*atoms: Atom, residue=None, _copy: bool = False)[source]#
Add atoms to the structure. This will automatically adjust the atom’s serial number to fit into the structure.
- Parameters:
atoms (base_classes.Atom) – The atoms to add
residue (int or str) – The residue to which the atoms should be added; this may be either the seqid or the residue name. If None, the atoms are added to the last residue. Note that if multiple identically named residues are present, the first one is chosen, so using the seqid is the safer option.
_copy (bool) – If True, the atoms are copied and then added to the structure. This will leave the original atoms (and their parent structures) untouched.
- add_bond(atom1: int | str | tuple | Atom, atom2: int | str | tuple | Atom, order: int = 1)[source]#
Add a bond between two atoms
- Parameters:
atom1 – The first atom of the bond, given either directly (as an Atom object) or by its serial number, full_id, or id.
atom2 – The second atom of the bond, given in the same way as atom1.
order (int) – The order of the bond, i.e. 1 for single, 2 for double, 3 for triple, etc.
- add_bonds(*bonds)[source]#
Add multiple bonds at once.
- Parameters:
bonds – The bonds to add, each bond is a tuple of two atoms. Each atom may be specified directly (BuildAMol object) or by providing the serial number, the full_id or the id of the atoms.
- add_chains(*chains: Chain, adjust_seqid: bool = True, _copy: bool = False)[source]#
Add chains to the structure
- Parameters:
chains (base_classes.Chain) – The chains to add
adjust_seqid (bool) – If True, the seqid of the chains is adjusted to match the current number of chains in the structure (i.e. a new chain can be given seqid A, and it will be adjusted to the correct value of C if there are already two other chains in the molecule).
_copy (bool) – If True, the chains are copied before adding them to the molecule. This is useful if you want to add the same chain to multiple molecules while leaving the chains and their original parent structures intact.
- add_hydrogens(*atoms: int | str | Atom)[source]#
Infer missing hydrogens in the structure.
- Parameters:
atoms – The atoms to infer hydrogens for. If no atoms are given, all atoms are considered.
- add_model(model: int | Model = None)[source]#
Add a new model to the molecule’s structure
- Parameters:
model (int or Model) – If not given, a new completely blank model is created. If an integer is given, an existing model is copied and added to the molecule. If a Model object is given, it is added to the molecule.
- add_residues(*residues: Residue, adjust_seqid: bool = True, _copy: bool = False)[source]#
Add residues to the structure
- Parameters:
residues (base_classes.Residue) – The residues to add
adjust_seqid (bool) – If True, the seqid of the residues is adjusted to match the current number of residues in the structure (i.e. a new residue can be given seqid 1, and it will be adjusted to the correct value of 3 if there are already two other residues in the molecule).
_copy (bool) – If True, the residues are copied before adding them to the molecule. This is useful if you want to add the same residue to multiple molecules while leaving the residues and their original parent structures intact.
- adjust_bond_length(atom1, atom2, length: float, move_descendants: bool = False)[source]#
Adjust the bond length between two atoms
- Parameters:
atom1 – The first atom of the bond, given either directly (as an Atom object) or by its serial number, full_id, or id.
atom2 – The second atom of the bond, given in the same way as atom1.
length (float) – The new bond length
move_descendants (bool) – If True, all descendant atoms are inferred and moved accordingly to preserve the overall geometry of the molecule. This makes the operation slower.
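Geometrically, changing a bond length amounts to sliding atom2 along the atom1→atom2 direction. The sketch below shows one way to do this with plain coordinate tuples; the function name, the dict-of-coordinates input and the explicit descendants list are hypothetical, and the rigid translation of descendants is an assumption about what "move them accordingly" means, not BuildAMol's implementation.

```python
import math
from typing import Dict, Iterable, Tuple

Vec = Tuple[float, float, float]


def adjust_bond_length(coords: Dict[str, Vec], atom1: str, atom2: str,
                       length: float, descendants: Iterable[str] = ()) -> None:
    """Move atom2 (and any given descendants) so that |atom1-atom2| == length."""
    a, b = coords[atom1], coords[atom2]
    bond = [b[i] - a[i] for i in range(3)]
    current = math.sqrt(sum(c * c for c in bond))
    if current == 0:
        raise ValueError("atoms overlap; the bond direction is undefined")
    # Translation that places atom2 at the requested distance along the bond axis.
    shift = [(length / current - 1.0) * c for c in bond]
    # Descendants get the same shift, so their geometry relative to atom2 is kept.
    for name in (atom2, *descendants):
        x, y, z = coords[name]
        coords[name] = (x + shift[0], y + shift[1], z + shift[2])
```

Passing no descendants corresponds to the faster move_descendants=False case, where only atom2 moves and the bonds from atom2 to its other neighbours are stretched or compressed instead.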
- adjust_indexing(mol)[source]#
Adjust the indexing of a molecule to match the scaffold index
- Parameters:
mol (Molecule) – The molecule to adjust the indexing of
- adjust_to_ph(ph: float | int | tuple, inplace: bool = True, **kwargs)[source]#
Adjust the protonation state and charges to match a certain pH
Note
This requires the rdkit and molscrub packages to be installed.
- Parameters:
ph (float, int, or tuple) – The pH value to adjust the structure to. A pH range can be specified as a tuple (low, high).
inplace (bool) – If True, the structure is modified in place, otherwise a new structure is returned.
**kwargs – Additional keyword arguments to pass to the scrub class of the molscrub package.
- align_to(axis: str | ndarray)[source]#
Align the structure (via its primary axis, i.e. the axis perpendicular to the main plane) to some other axis. This rotates the molecule so that its primary axis is aligned with the given axis. This only works for (more or less) planar molecules.
- Parameters:
axis (str or np.ndarray) – The axis to align to. This can be either a unit vector or one of the strings “x”, “y”, or “z” to align to the respective axes.
- apply_standard_bonds(_compounds=None) list[source]#
Use reference compounds to infer bonds in the structure. This will be exclusively based on the residue and atom ids and not on the actual distances between atoms.
- Parameters:
_compounds – The compounds to use for the standard bonds. If None, the default compounds are used.
- Returns:
A list of tuples of atom pairs that are bonded
- Return type:
list
- apply_standard_bonds_for(*residues, _compounds=None) list[source]#
Use reference compounds to infer bonds in the structure for specific residues. This will be exclusively based on the residue and atom ids and not on the actual distances between atoms.
- Parameters:
residues – The residues to consider
_compounds – The compounds to use for the standard bonds. If None, the default compounds are used.
- Returns:
A list of tuples of atom pairs that are bonded
- Return type:
list
- property atoms#
A sorted list of all atoms in the structure
- property attach_residue#
The residue at which to attach other molecules to this one.
- autolabel(atoms: list = None)[source]#
Automatically label atoms in the structure to match the CHARMM force field atom nomenclature. This is useful if you want to use some pre-generated PDB file that may have used a different labelling scheme for atoms.
- Parameters:
atoms (list) – Optionally restrict the autolabelling to a specific set of atoms. If None, all atoms are considered.
- Returns:
The molecule with the autolabelled atoms (in-place modification).
- Return type:
Note
The labels are inferred and may therefore occasionally be incorrect. It is advisable to check the labels after using this method.
- bend_at_bond(atom1: str | int | Atom, atom2: str | int | Atom, angle: float, neighbor: str | int | Atom = None, angle_is_degrees: bool = True)[source]#
Bend the molecule at a specific bond. This will rotate the atoms downstream of the bond in direction atom1->atom2 by the given angle. The axis of rotation will be the plane vector specified by the two atoms and one neighboring atom. A specific neighbor can be provided to ensure a specific plane is used (recommended), otherwise a random neighbor of atom1 will be used (preference is given to non-Hydrogens but a Hydrogen will be used if no other neighbor is found).
- Parameters:
atom1 (Union[str, int, base_classes.Atom]) – The first atom of the bond
atom2 (Union[str, int, base_classes.Atom]) – The second atom of the bond
angle (float) – The angle to bend by
neighbor (Union[str, int, base_classes.Atom], optional) – The atom to use as a neighbor for the plane vector, by default None, in which case a random neighbor of atom1 will be used. It is recommended to specify this to ensure a specific plane is used.
angle_is_degrees (bool, optional) – Whether the angle is given in degrees (default) or radians
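Bending at a bond comes down to rotating points about an axis: here, the normal of the plane spanned by atom1, atom2 and a neighbour of atom1. The sketch below uses Rodrigues' rotation formula to do this in plain Python. The helper names are hypothetical, and using atom1 as the pivot is an assumption; it is not a description of how BuildAMol computes the rotation internally.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bend_axis(atom1: Vec, atom2: Vec, neighbor: Vec) -> Vec:
    """Normal of the plane spanned by atom1, atom2 and a neighbour of atom1."""
    return _cross(_sub(atom2, atom1), _sub(neighbor, atom1))


def rotate_about_axis(point: Vec, pivot: Vec, axis: Vec, angle_deg: float) -> Vec:
    """Rotate `point` around the line through `pivot` along `axis` (Rodrigues)."""
    norm = math.sqrt(_dot(axis, axis))
    if norm == 0:
        raise ValueError("axis is undefined (collinear atoms?)")
    k = tuple(c / norm for c in axis)
    v = _sub(point, pivot)
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_x_v = _cross(k, v)
    k_dot_v = _dot(k, v)
    rotated = [v[i] * cos_t + k_x_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
               for i in range(3)]
    return (pivot[0] + rotated[0], pivot[1] + rotated[1], pivot[2] + rotated[2])
```

Applying rotate_about_axis to every downstream atom with the same pivot, axis and angle moves that part of the molecule rigidly. This is also why an explicit neighbour is recommended: a different neighbour defines a different plane, and therefore a different bending direction.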
- property bonds#
All bonds in the molecule
- property center_of_geometry#
The center of geometry of the molecule
- property center_of_mass#
The center of mass of the molecule
- property chains#
A sorted list of all chains in the molecule
- change_element(atom: int | Atom, element: str, adjust_bond_length: bool = True)[source]#
Change the element of an atom. This will automatically add or remove hydrogens if the new element has a different valency.
- Parameters:
atom (int or base_classes.Atom) – The atom whose element to change, either the object itself or its serial number
element (str) – The new element
adjust_bond_length (bool) – If True, adjust the bond length to match the new element. This may slow down the process if the atom is central in a very large molecule.
- property charge#
The total charge of the molecule
- chem2dview(linewidth: float = None, atoms: str = None, highlight_color: str = None, **kwargs)[source]#
View the molecule in 2D through RDKit
- Parameters:
linewidth (float) – The linewidth of the bonds.
atoms (str or callable) – The label to use for the atoms. This can be any of the following:
- None (default): element symbols, except for carbon
- “element”: element symbols, including carbon
- “serial”: the atom serial number
- “id”: the atom id / name
- “resid”: the residue serial number + atom id
- “off”: no labels
- any function that takes an (rdkit) atom and returns a string
highlight_color (str) – The color to use for highlighting atoms
- cis(*bond: Atom | tuple | Bond)[source]#
Rotate the molecule such that the atoms in the bond are in a cis configuration.
- cleanup(remove_empty_models: bool = True, remove_empty_chains: bool = True, remove_empty_residues: bool = True, reindex: bool = True, remove_hydrogens: bool = False, add_hydrogens: bool = False, apply_standard_bonds: bool = False, infer_bonds: bool = False)[source]#
Clean up the molecule by removing empty models, chains, and residues. This can optionally also reindex the atoms and residues, remove or add hydrogen atoms, and apply standard bonds or infer bonds.
- Parameters:
remove_empty_models (bool) – Whether to remove empty models
remove_empty_chains (bool) – Whether to remove empty chains
remove_empty_residues (bool) – Whether to remove empty residues
reindex (bool) – Whether to reindex the atoms and residues after cleaning up
remove_hydrogens (bool) – Whether to remove all hydrogen atoms
add_hydrogens (bool) – Whether to add missing hydrogen atoms
apply_standard_bonds (bool) – Whether to apply standard connectivity based on loaded compounds (see load_compounds)
infer_bonds (bool) – Whether to infer bonds from the atom positions and element types
- collapse_chains(resnames: list = None)[source]#
Turn each chain of the molecule into a single residue while preserving the chains.
- Parameters:
resnames (list, optional) – A list of residue names to use for the residues. If None, the residue names are taken from the first residue in each chain. A string can also be given to use the same name for all residues.
- compute_angle(atom1: str | int | Atom, atom2: str | int | Atom, atom3: str | int | Atom)[source]#
Compute the angle between three atoms where atom2 is the middle atom.
- Parameters:
atom1 – The first atom
atom2 – The second atom
atom3 – The third atom
- Returns:
The angle in degrees
- Return type:
float
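compute_angle measures the angle at the middle atom. The sketch below shows the standard computation from the dot product of the two bond vectors that leave the vertex. It is a plain-Python illustration on raw coordinates (the function name is hypothetical), not BuildAMol's code.

```python
import math
from typing import Sequence


def angle_deg(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle a-b-c in degrees, with b as the vertex (middle atom)."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("two of the atoms share the same position")
    dot = sum(x * y for x, y in zip(u, v))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))
```

Swapping atom1 and atom3 gives the same result, but moving a different atom into the middle position does not, which is why the argument order matters.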
- compute_angles()[source]#
Compute all angles of consecutively bonded atom triplets within the molecule.
- Returns:
angles – A dictionary of the form {atom_triplet: angle}
- Return type:
dict
- compute_dihedral(atom1: str | int | Atom, atom2: str | int | Atom, atom3: str | int | Atom, atom4: str | int | Atom)[source]#
Compute the dihedral angle between four atoms
- Parameters:
atom1 – The first atom
atom2 – The second atom
atom3 – The third atom
atom4 – The fourth atom
- Returns:
The dihedral angle in degrees
- Return type:
float
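A dihedral is the angle between the plane through atoms 1-2-3 and the plane through atoms 2-3-4. The sketch below uses the common atan2 formulation, which returns a signed value in (-180, 180]. The helper names are hypothetical, and whether BuildAMol uses the same sign convention is not stated here.

```python
import math
from typing import List, Sequence


def _sub(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral_deg(p0, p1, p2, p3) -> float:
    """Signed dihedral p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

Using atan2 instead of acos keeps the sign, so the sketch can tell the two rotation directions apart, and it stays numerically stable near 0° and 180°.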
- compute_dihedrals()[source]#
Compute all dihedrals of consecutively bonded atom quartets within the molecule.
- Returns:
dihedrals – A dictionary of the form {atom_quartet: dihedral}
- Return type:
dict
- compute_length_along_axis(axis: str | ndarray) float[source]#
Compute the length of the molecule along a specific axis. This can be computed on any molecule but may not be meaningful in all cases (e.g. circular or branched molecules).
- Parameters:
axis (str or np.ndarray) – The axis to compute the length along. This can be either a unit vector or one of the strings “x”, “y”, or “z” to use the respective coordinate axis.
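One natural way to define "length along an axis" is the extent of the atom coordinates projected onto that axis, i.e. the largest minus the smallest projection. The sketch below implements that definition. It is an assumption used for illustration; the documentation does not say how BuildAMol computes this value, and the function name is hypothetical.

```python
from typing import Iterable, Sequence, Union

AXES = {"x": (1.0, 0.0, 0.0), "y": (0.0, 1.0, 0.0), "z": (0.0, 0.0, 1.0)}


def length_along_axis(coords: Iterable[Sequence[float]],
                      axis: Union[str, Sequence[float]]) -> float:
    """Extent (max - min) of the coordinates projected onto a unit axis."""
    unit = AXES[axis] if isinstance(axis, str) else axis
    projections = [sum(c * u for c, u in zip(point, unit)) for point in coords]
    return max(projections) - min(projections)
```

Under this definition a folded or branched molecule gets a short extent even when its backbone is long, which matches the caveat above that the value may not be meaningful for such molecules.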
- compute_normal_axis() ndarray[source]#
Compute the normal axis of the molecule. This is the axis that is perpendicular to the main plane of the molecule. This can be computed on any molecule but will only be meaningful for (more or less) planar molecules.
- compute_principal_axis() ndarray[source]#
Compute the principal axis of the molecule. This is the axis that shows the most variance in the coordinates. This can be computed on any molecule but will only be meaningful for (more or less) linear molecules.
- copy(n: int = 1) list[source]#
Create one or more deep copies of the molecule
- Parameters:
n (int, optional) – The number of copies to make, by default 1
- Returns:
The copied molecule(s)
- Return type:
Molecule or list
- count_atoms() int[source]#
Count the number of atoms in the structure
- Returns:
The number of atoms
- Return type:
int
- count_bonds() int[source]#
Count the number of bonds in the structure
- Returns:
The number of bonds
- Return type:
int
- count_chains() int[source]#
Count the number of chains in the structure
- Returns:
The number of chains
- Return type:
int
- count_clashes(clash_threshold: float = 1.0, ignore_hydrogens: bool = True, coarse_precheck: bool = True) int[source]#
Count all clashes in the molecule.
- Parameters:
clash_threshold (float, optional) – The minimal allowed distance between two atoms (in Angstrom).
ignore_hydrogens (bool, optional) – Whether to ignore clashes with hydrogen atoms (default: True)
coarse_precheck (bool, optional) – If True, a coarse-grained pre-screening at the residue level is done to speed up the computation. However, this may cause the system to overlook clashes if individual residues are particularly large (e.g. lipids with long carbon chains).
- Returns:
The number of clashes.
- Return type:
int
- count_models() int[source]#
Count the number of models in the structure
- Returns:
The number of models
- Return type:
int
- count_residues() int[source]#
Count the number of residues in the structure
- Returns:
The number of residues
- Return type:
int
- double(atom1, atom2, adjust_hydrogens: bool = False)[source]#
Set a double bond between two atoms
- Parameters:
atom1 – The first atom
atom2 – The second atom
adjust_hydrogens (bool) – Whether to adjust the number of hydrogens on the atoms based on the bond order
- draw2d(linewidth: float = None, atoms: str = None, highlight_color: str = None, **kwargs)#
View the molecule in 2D through RDKit
- Parameters:
linewidth (float) – The linewidth of the bonds.
atoms (str or callable) – The label to use for the atoms. This can be any of the following:
- None (default): element symbols, except for carbon
- “element”: element symbols, including carbon
- “serial”: the atom serial number
- “id”: the atom id / name
- “resid”: the residue serial number + atom id
- “off”: no labels
- any function that takes an (rdkit) atom and returns a string
highlight_color (str) – The color to use for highlighting atoms
- draw3d(*args, **kwargs)#
- drop_atoms(*atoms: int | str | tuple | Atom)[source]#
Remove one or more atoms from the structure. This method returns the Molecule object itself rather than the removed atoms. Use remove_atoms if you need the removed atoms.
- Parameters:
atoms – The atoms to remove, given either directly (as Atom objects) or by their serial number, full_id, or id.
- drop_bond(atom1: int | str | tuple | Atom, atom2: int | str | tuple | Atom)[source]#
Remove a bond between two atoms
- Parameters:
atom1 – The first atom of the bond, given either directly (as an Atom object) or by its serial number, full_id, or id.
atom2 – The second atom of the bond, given in the same way as atom1.
- drop_chains(*chains: int | Chain)[source]#
Remove chains from the structure. This method returns the structure itself rather than the removed chains. If you want to get the removed chains, use remove_chains.
- Parameters:
chains (int or Chain) – The chains to remove, either the object itself or its id
- drop_hydrogens(*atoms: int | str | Atom)[source]#
Remove hydrogens from the molecule.
- Parameters:
atoms – The atoms to remove hydrogens from. If no atoms are given, all atoms are considered.
- drop_model(model: int | Model)[source]#
Drop a model from the molecule without removing its chains from the molecule. The chains of the dropped model will be removed from the model but remain in the molecule.
- Parameters:
model (int or Model) – The model to drop
- drop_residues(*residues: int | Residue)[source]#
Remove residues from the molecule. This method returns the molecule itself rather than the removed residues. If you want to get the removed residues, use remove_residues.
- Parameters:
residues (int or base_classes.Residue) – The residues to remove, either the object itself or its seqid
- find_clashes(clash_threshold: float = 1.0, ignore_hydrogens: bool = True, coarse_precheck: bool = True) list[source]#
Find all clashes in the molecule.
- Parameters:
clash_threshold (float, optional) – The minimal allowed distance between two atoms (in Angstrom).
ignore_hydrogens (bool, optional) – Whether to ignore clashes with hydrogen atoms (default: True)
coarse_precheck (bool, optional) – If True, a coarse-grained pre-screening at the residue level is done to speed up the computation. However, this may cause the system to overlook clashes if individual residues are particularly large (e.g. lipids with long carbon chains).
- Returns:
A list of tuples of atoms that clash.
- Return type:
list
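At its core, clash detection is a pairwise distance check against clash_threshold, optionally skipping hydrogens. The brute-force sketch below shows only that core. It uses hypothetical (id, element, xyz) tuples, omits the residue-level coarse_precheck, and does not exclude bonded pairs, because bonds between heavy atoms are typically longer than the default 1.0 Å threshold. It is not BuildAMol's implementation.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Hypothetical atom record: (atom id, element symbol, (x, y, z)).
ToyAtom = Tuple[str, str, Tuple[float, float, float]]


def find_clashes(atoms: List[ToyAtom], clash_threshold: float = 1.0,
                 ignore_hydrogens: bool = True) -> List[Tuple[str, str]]:
    """Return id pairs of atoms closer than clash_threshold (brute force, O(n^2))."""
    candidates = [a for a in atoms if not (ignore_hydrogens and a[1] == "H")]
    clashes = []
    for (id1, _, xyz1), (id2, _, xyz2) in combinations(candidates, 2):
        if math.dist(xyz1, xyz2) < clash_threshold:
            clashes.append((id1, id2))
    return clashes
```

Checking every pair scales quadratically with the number of atoms. That cost is what a residue-level pre-screen is meant to reduce, at the risk noted above of missing clashes inside very large residues.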
- find_clashes_with(other, clash_threshold: float = 1.0, ignore_hydrogens: bool = True, coarse_precheck: bool = True) list[source]#
Find all clashes between this molecule and another one.
- Parameters:
other (Molecule) – The other molecule to compare with
clash_threshold (float, optional) – The minimal allowed distance between two atoms (in Angstrom).
ignore_hydrogens (bool, optional) – Whether to ignore clashes with hydrogen atoms (default: True)
coarse_precheck (bool, optional) – If True, a coarse-grained pre-screening at the residue level is done to speed up the computation. However, this may cause the system to overlook clashes if individual residues are particularly large (e.g. lipids with long carbon chains).
- Returns:
A list of tuples of atoms that clash.
- Return type:
list
- flip(plane_vector: ndarray, center: ndarray = None)[source]#
Flip (mirror) the molecule across a plane
- Parameters:
plane_vector (np.ndarray or str) – The vector defining the plane to flip across. This must be a unit vector. It may also be one of the strings “xy”, “xz”, or “yz” to flip across the respective planes.
center (np.ndarray) – The center of the flip
- classmethod from_cif(filename: str, id: str = None)[source]#
Load a Molecule from a CIF file
- Parameters:
filename (str) – Path to the CIF file
id (str) – The id of the Molecule. By default an id is inferred from the filename.
- classmethod from_json(filename: str)[source]#
Make a Molecule from a JSON file
- Parameters:
filename (str) – Path to the JSON file
- classmethod from_molfile(filename: str)[source]#
Make a Molecule from a molfile
- Parameters:
filename (str) – Path to the molfile
- classmethod from_openmm(topology, positions)[source]#
Load a Molecule from an OpenMM topology and positions
- Parameters:
topology (simtk.openmm.app.Topology) – The OpenMM topology
positions (simtk.unit.Quantity) – The OpenMM positions
- classmethod from_pdb(filename: str, id: str = None, model: int = 0, has_atom_ids: bool = True)[source]#
Read a Molecule from a PDB file
- Parameters:
filename (str) – Path to the PDB file
id (str) – The id of the Molecule. By default an id is inferred from the filename.
model (int) – The index of the model to use (default: 0)
has_atom_ids (bool) – If the PDB file provides no atom ids, set this to False in order to autolabel the atoms.
- classmethod from_pdbqt(filename: str)[source]#
Make a Molecule from a PDBQT file
- Parameters:
filename (str) – Path to the PDBQT file
- classmethod from_pybel(mol)[source]#
Load a Molecule from a Pybel molecule
- Parameters:
mol (pybel.Molecule) – The Pybel molecule
- classmethod from_rdkit(mol, id: str = None)[source]#
Load a Molecule from an RDKit molecule
- Parameters:
mol (rdkit.Chem.rdchem.Mol) – The RDKit molecule
id (str) – The id of the Molecule. By default an id is inferred from the “_Name” property of the mol object (if present).
- classmethod from_stk(obj)[source]#
Load a Molecule from an stk ConstructedMolecule
- Parameters:
obj (stk.ConstructedMolecule) – The stk ConstructedMolecule
- classmethod from_xml(filename: str)[source]#
Make a Molecule from an XML file
- Parameters:
filename (str) – Path to the XML file
- classmethod from_xyz(filename: str)[source]#
Make a Molecule from an XYZ file
- Parameters:
filename (str) – Path to the XYZ file
- get_ancestors(atom1: str | int | Atom, atom2: str | int | Atom) set[source]#
Get the atoms upstream of a bond. This will return the set of all atoms that are connected before the bond atom1-atom2 in the direction of atom1, the selection can be reversed by reversing the order of atoms (atom2-atom1).
- Parameters:
atom1 – The first atom
atom2 – The second atom
- Returns:
A set of atoms
- Return type:
set
Examples
```
           OH
          /
(1)CH3 --- CH
            \
             CH2 --- (2)CH3

>>> mol.get_ancestors("(1)CH3", "CH")
set()
>>> mol.get_ancestors("CH", "CH2")
{"(1)CH3", "OH"}
>>> mol.get_ancestors("CH2", "CH")
{"(2)CH3"}
```
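The upstream atoms of a bond can be found with a breadth-first search that starts at atom1 and is never allowed to step onto atom2. The sketch below does this on a plain adjacency dict that encodes the example molecule above, and it reproduces the three results shown. It assumes the bond is not part of a ring (otherwise both sides are connected and no clean split exists). The function is illustrative, not BuildAMol's implementation.

```python
from collections import deque
from typing import Dict, Set


def get_ancestors(graph: Dict[str, Set[str]], atom1: str, atom2: str) -> Set[str]:
    """Atoms reachable from atom1 without crossing the atom1-atom2 bond."""
    seen = {atom1, atom2}  # atom2 blocks the path across the bond
    queue = deque([atom1])
    ancestors: Set[str] = set()
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                ancestors.add(neighbor)
                queue.append(neighbor)
    return ancestors


# Adjacency of the example molecule (undirected bonds).
graph = {
    "(1)CH3": {"CH"},
    "CH": {"(1)CH3", "OH", "CH2"},
    "OH": {"CH"},
    "CH2": {"CH", "(2)CH3"},
    "(2)CH3": {"CH2"},
}
```

Reversing the argument order selects the other side of the bond, as in the last two calls of the example.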
- get_atom(atom: int | str | tuple, by: str = None, residue: int | Residue = None)[source]#
Get an atom from the structure based on its id, serial number, or full_id. Note that if multiple atoms match the requested criteria (for instance, multiple ‘C1’ atoms from different residues), only the first one is returned. To get all atoms matching the criteria, use the get_atoms method.
- Parameters:
atom – The atom id, serial number, full_id tuple, or element symbol.
by (str) – The type of parameter to search for. Can be either ‘id’, ‘serial’, ‘full_id’, or ‘element’. Because this looks for one specific atom, this parameter can be inferred from the datatype of the atom parameter. If it is an integer, it is assumed to be the serial number, if it is a string, it is assumed to be the atom id and if it is a tuple, it is assumed to be the full_id.
residue (int or Residue) – A specific residue to search in. If None, the entire structure is searched.
- Returns:
atom – The atom
- Return type:
- get_atom_graph(_copy: bool = True) AtomGraph[source]#
Get an AtomGraph for the Molecule
- Parameters:
_copy (bool) – If True, the method returns a new AtomGraph object instead of the “original” one that the Molecule relies on. The molecule remains linked to the new graph. This is useful if you want to change the graph itself. It does not cover changes to the graph nodes, i.e. the atoms themselves, such as rotations.
- Returns:
The generated graph
- Return type:
AtomGraph
- get_atom_quartets() list[source]#
Compute quartets of four consecutively bonded atoms
- Returns:
atom_quartets – A list of atom quartets
- Return type:
list
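A quartet is a path a-b-c-d of four consecutively bonded atoms, which is the unit that defines a dihedral angle. The following sketch enumerates such paths on a hypothetical adjacency dictionary. It assumes that a path and its reverse count as one quartet. That deduplication rule is our assumption and is not stated by BuildAMol.

```python
# Hypothetical adjacency dictionary for butane's carbon backbone.
GRAPH = {
    "C1": {"C2"},
    "C2": {"C1", "C3"},
    "C3": {"C2", "C4"},
    "C4": {"C3"},
}


def atom_quartets(graph):
    """Return each path a-b-c-d of four consecutively bonded atoms exactly once."""
    quartets = set()
    for b in graph:
        for c in graph[b]:  # central bond b-c
            for a in graph[b]:
                if a == c:
                    continue
                for d in graph[c]:
                    if d in (a, b):  # skip backtracking and three-membered rings
                        continue
                    path = (a, b, c, d)
                    # A path and its reverse describe the same quartet; keep one of them.
                    quartets.add(min(path, path[::-1]))
    return sorted(quartets)
```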
- get_atoms(*atoms: int | str | tuple, by: str = None, keep_order: bool = False, residue: int | Residue = None, filter: callable = None) list[source]#
Get one or more atoms from the structure based on their id, serial number or full_id. If multiple atoms match the requested criteria, all of them are returned in a list. This happens, for instance, when several ‘C1’ atoms come from different residues. To retrieve one specific atom, the full_id or serial number is the safer option. If no search parameters are provided, the underlying atom-generator of the structure is returned.
Note
This does not support mixed queries. I.e. you cannot query for an atom with id ‘C1’ and serial number 1 at the same time. Each call can only query for one type of parameter.
- Parameters:
atoms – The atom id, serial number, full_id tuple, or element symbol string. Several atoms can be searched for at once, but only one type of parameter is supported per call. If left empty, the underlying generator is returned.
by (str) – The type of parameter to search for. Can be either ‘id’, ‘serial’, ‘full_id’, or ‘element’. If None is given, the parameter is inferred from the datatype of the atoms argument: ‘serial’ for an int, ‘id’ for a str, and ‘full_id’ for a tuple.
keep_order (bool) – Whether to return the atoms in the order they were queried. If False, the atoms are returned in the order they appear in the structure.
residue (int or Residue) – A specific residue to search in. If None, the entire structure is searched.
filter (callable) – A filter function that is applied to the atoms. If the filter returns True, the atom is included in the result. The filter function must take an atom as its only argument and return a boolean.
- Returns:
atom – The atom(s)
- Return type:
list or generator
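The following toy filter makes the inference rule for `by` concrete. It runs over a list of hypothetical `ToyAtom` records and mimics the documented behaviour:

- int → serial, str → id, tuple → full_id;
- one query type per call;
- optional `filter` and `keep_order`.

It is only a sketch. The real method works on BuildAMol's own `Atom` objects and internal indexing, and `ToyAtom`, `infer_by` and this `get_atoms` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAtom:
    """Hypothetical stand-in for an atom record (not BuildAMol's Atom class)."""
    serial: int
    id: str
    element: str
    full_id: tuple


def infer_by(query):
    # Documented rule: int -> serial, str -> id, tuple -> full_id.
    # 'element' is never inferred and has to be requested explicitly.
    if isinstance(query, int):
        return "serial"
    if isinstance(query, str):
        return "id"
    if isinstance(query, tuple):
        return "full_id"
    raise TypeError(f"cannot infer search type for {query!r}")


def get_atoms(atoms, *queries, by=None, keep_order=False, filter=None):
    if not queries:
        return iter(atoms)  # no query: hand back a plain iterator
    by = by or infer_by(queries[0])  # one query type per call
    wanted = set(queries)
    hits = [atom for atom in atoms if getattr(atom, by) in wanted]
    if filter is not None:
        hits = [atom for atom in hits if filter(atom)]
    if keep_order:
        rank = {query: i for i, query in enumerate(queries)}
        hits.sort(key=lambda atom: rank[getattr(atom, by)])  # stable: ties keep structure order
    return hits
```

The parameter is named `filter` to mirror the documented signature, even though the name shadows Python's builtin inside the function.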
- get_atoms_within(anchor: Atom | ndarray, distance: float) set[source]#
Get all atoms within a certain distance from an anchor point.
- Parameters:
anchor (Atom or np.ndarray) – The anchor point. This can be either an Atom object or a 3D coordinate as a numpy array.
distance (float) – The distance threshold.
- Returns:
A set of atoms within the specified distance from the anchor point.
- Return type:
set
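A distance query like this reduces to one vectorised norm over the coordinate array. The sketch below works on a plain NumPy array and returns row indices rather than Atom objects. It includes points exactly at the threshold (`<=`). That choice is an assumption, as is the fact that an anchor taken from an atom's own coordinates returns that atom too.

```python
import numpy as np


def atoms_within(coords, anchor, distance):
    """Indices of rows in `coords` no farther than `distance` from `anchor`."""
    coords = np.asarray(coords, dtype=float)
    anchor = np.asarray(anchor, dtype=float)
    dists = np.linalg.norm(coords - anchor, axis=1)  # one distance per atom
    # Inclusive comparison (<=) is an assumption of this sketch.
    return set(np.flatnonzero(dists <= distance).tolist())
```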
- get_attach_residue()[source]#
Get the residue that is used for attaching other molecules to this one.
- get_axial_hydrogen(atom: int | str | tuple | Atom) Atom[source]#
Get the axial hydrogen neighbor of an atom, if the atom is in a ring structure.
- Parameters:
atom – The atom
- Returns:
The axial hydrogen, if it exists, None otherwise
- Return type:
Atom
- get_axial_neighbor(atom: int | str | tuple | Atom) Atom[source]#
Get the axial neighbor of an atom, if the atom is in a ring structure.
- Parameters:
atom – The atom
- Returns:
The axial neighbor, if it exists, None otherwise
- Return type:
Atom
- get_bond(atom1: int | str | tuple | Atom, atom2: int | str | tuple | Atom, add_if_not_present: bool = True) Bond[source]#
Get/make a bond between two atoms.
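- Parameters:
atom1 – The first atom of the bond
atom2 – The second atom of the bond
add_if_not_present (bool) – If True (default), the bond is created if it does not exist yet.
- Returns:
bond – The bond object. If the bond is not present and add_if_not_present is False, None is returned.
- Return type:
Bond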
- get_bond_array() ndarray[source]#
Get the bonds of the atoms in the molecule as an array of atom1, atom2, bond_order
- Returns:
The bonds
- Return type:
np.ndarray
- get_bond_mask() ndarray[source]#
Get the bonds of the atoms in the molecule as a 2D mask where fields with 1 indicate a bond between the atoms of row and column.
- Returns:
The bond mask
- Return type:
np.ndarray
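The two representations above can be built from the same bond list. The sketch below uses a hypothetical list of (atom1, atom2, bond_order) index triples. It assumes the mask is symmetric because bonds are undirected, and it does not reproduce BuildAMol's own atom indexing.

```python
import numpy as np

# Hypothetical bonds of ethanol's heavy atoms (C-C, C-O) as (atom1, atom2, bond_order).
BONDS = [(0, 1, 1), (1, 2, 1)]


def bond_array(bonds):
    """One row per bond: atom1, atom2, bond_order."""
    return np.array(bonds, dtype=int).reshape(-1, 3)


def bond_mask(n_atoms, bonds):
    """2D mask with 1 where the row atom and column atom are bonded."""
    mask = np.zeros((n_atoms, n_atoms), dtype=int)
    for i, j, _order in bonds:
        mask[i, j] = 1
        mask[j, i] = 1  # bonds are undirected, so the mask is symmetric
    return mask
```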
- get_bonds(atom1: int | str | tuple | Atom | Residue = None, atom2: int | str | tuple | Atom = None, residue_internal: bool = True, either_way: bool = True)[source]#
Get one or multiple bonds from the molecule. If only one atom is provided, all bonds that are connected to that atom are returned.
- Parameters:
atom1 – The atom id, serial number or full_id tuple of the first atom. This may also be a residue, in which case all bonds between atoms in that residue are returned.
atom2 – The atom id, serial number or full_id tuple of the second atom
residue_internal (bool) – If True, only bonds where both atoms are in the given residue (if atom1 is a residue) are returned. If False, all bonds where either atom is in the given residue are returned.
either_way (bool) – If True, the order of the atoms does not matter; if False, it does. Setting this to False makes it possible to search for bonds that have a specific atom in position 1 or 2, depending on which argument was set, while leaving the other argument as None.
- Returns:
bond – The bond(s). If no input is given, all bonds are returned as a generator.
- Return type:
list or generator
- get_chain(chain: str)[source]#
Get a chain from the structure based on its id.
- Parameters:
chain – The chain id
- Returns:
chain – The chain
- Return type:
- get_coords(*atom_selector, **atom_selectors) ndarray[source]#
Get the coordinates of the atoms in the molecule
- Parameters:
atom_selectors – Arguments or keyword arguments to pass to get_atoms(). If None, all atoms are selected.
- Returns:
The coordinates
- Return type:
np.ndarray
- get_degree(atom: int | str | Atom)[source]#
Get the degree of an atom in the structure
- Parameters:
atom – The atom to get the degree of, which can either be directly provided (biopython object) or by providing the serial number, the full_id or the id of the atoms.
- Returns:
The degree of the atom’s connectivity as the sum of the bond orders that connect it to its neighbors
- Return type:
int
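Because the degree is defined here as a sum of bond orders rather than a neighbor count, a double bond counts twice. Ethene makes this concrete: each carbon has three neighbors but a degree of 4. The small sketch below works on a hypothetical list of (atom1, atom2, bond_order) tuples and shows both numbers side by side.

```python
# Hypothetical bond list for ethene: (atom1, atom2, bond_order).
BONDS = [
    ("C1", "C2", 2),
    ("C1", "H1", 1),
    ("C1", "H2", 1),
    ("C2", "H3", 1),
    ("C2", "H4", 1),
]


def degree(atom, bonds):
    """Sum of the orders of all bonds touching `atom`."""
    return sum(order for a, b, order in bonds if atom in (a, b))


def neighbor_count(atom, bonds):
    """Number of bonds touching `atom`, ignoring their order."""
    return sum(1 for a, b, _order in bonds if atom in (a, b))
```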
- get_descendants(atom1: str | int | Atom, atom2: str | int | Atom) set[source]#
Get the atoms downstream of a bond. This will return the set of all atoms that are connected after the bond atom1-atom2 in the direction of atom2, the selection can be reversed by reversing the order of atoms (atom2-atom1).
- Parameters:
atom1 – The first atom
atom2 – The second atom
- Returns:
A set of atoms
- Return type:
set
Examples
```
             OH
            /
(1)CH3 --- CH
            \
             CH2 --- (2)CH3
```
>>> mol.get_descendants("(1)CH3", "CH")
{"OH", "CH2", "(2)CH3"}
>>> mol.get_descendants("CH", "CH2")
{"(2)CH3"}
>>> mol.get_descendants("CH2", "CH")
{"OH", "(1)CH3"}
- get_equatorial_hydrogen(atom: int | str | tuple | Atom) Atom[source]#
Get the equatorial hydrogen neighbor of an atom, if the atom is in a ring structure.
- Parameters:
atom – The atom
- Returns:
The equatorial hydrogen, if it exists, None otherwise
- Return type:
Atom
- get_equatorial_neighbor(atom: int | str | tuple | Atom) Atom[source]#
Get the equatorial neighbor of an atom, if the atom is in a ring structure.
- Parameters:
atom – The atom
- Returns:
The equatorial neighbor, if it exists, None otherwise
- Return type:
Atom
- get_hydrogen(atom: int | str | tuple | Atom) Atom[source]#
Get any hydrogen neighbor of an atom.
- Parameters:
atom – The atom
- Returns:
The hydrogen, if it exists, None otherwise
- Return type:
Atom
- get_hydrogens(atom: int | str | tuple | Atom = None) set[source]#
Get multiple hydrogen atoms
- Parameters:
atom – A specific atom whose hydrogen neighbors should be returned. If None, all hydrogen atoms in the molecule are returned.
- Returns:
A set of hydrogen atoms
- Return type:
set
- get_left_hydrogen(atom: int | str | tuple | Atom) Atom[source]#
Get the “left-protruding” hydrogen neighbor of an atom with two hydrogens and two non-hydrogen neighbors.
- Parameters:
atom – The atom
- Returns:
The left hydrogen, if it exists, None otherwise
- Return type:
Atom
Example
```
        H_B
        |
CH3 --- C --- CH2 --- OH
        |
        H_A
```
We want the left and right hydrogens of the central C atom (labeled only C). The method borrows part of the logic behind R/S nomenclature for chiral centers. It first ranks the non-H neighbors by priority. It then rotates the molecule so that the highest-priority non-H neighbor points toward the viewer and the other non-H neighbor points away. The left and right hydrogens are read off from their orientation in this view.
In this case, the left hydrogen is H_A and the right hydrogen is H_B.
- get_linkage()[source]#
Get the linkage that is currently set as the default attachment specification for this molecule
- get_model(model: int = None) Model[source]#
Get a model from the molecule.
- Parameters:
model (int) – The id of the model to get. If not provided, the current working model is returned.
- Returns:
The model
- Return type:
Model
- get_neighbors(atom: int | str | tuple | Atom, n: int = 1, mode: str = 'upto', filter: callable = None) set[source]#
Get the neighbors of an atom.
- Parameters:
atom – The atom
n – The number of bonds that may separate the atom from its neighbors.
mode – The mode to use. Can be “upto” or “at”. If upto, all neighbors that are at most n bonds away are returned. If at, only neighbors that are exactly n bonds away are returned.
filter – A filter function that is applied to the neighbors. If the filter returns True, the atom is included in the result.
- Returns:
A set of atoms
- Return type:
set
Examples
```
             O --- (2)CH2
            /            \
(1)CH3 --- CH             OH
            \
             (1)CH2 --- (2)CH3
```
>>> mol.get_neighbors("(2)CH2", n=1)
{"O", "OH"}
>>> mol.get_neighbors("(2)CH2", n=2, mode="upto")
{"O", "OH", "CH"}
>>> mol.get_neighbors("(2)CH2", n=2, mode="at")
{"CH"}
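Both modes can be derived from shortest bond distances found by a breadth-first search. "upto" keeps every atom with a distance between 1 and n, and "at" keeps only distance n. The sketch below assumes that "n bonds away" means shortest-path distance. It runs on a hypothetical adjacency dictionary for the example molecule, hydrogens omitted, and reproduces the three documented results.

```python
from collections import deque

# Hypothetical adjacency dictionary for the example molecule (hydrogens omitted).
GRAPH = {
    "(1)CH3": {"CH"},
    "CH": {"(1)CH3", "O", "(1)CH2"},
    "O": {"CH", "(2)CH2"},
    "(2)CH2": {"O", "OH"},
    "OH": {"(2)CH2"},
    "(1)CH2": {"CH", "(2)CH3"},
    "(2)CH3": {"(1)CH2"},
}


def neighbors(graph, atom, n=1, mode="upto"):
    if mode not in ("upto", "at"):
        raise ValueError("mode must be 'upto' or 'at'")
    dist = {atom: 0}  # shortest number of bonds from `atom`
    queue = deque([atom])
    while queue:
        current = queue.popleft()
        if dist[current] == n:
            continue  # no need to look further than n bonds
        for nb in graph[current]:
            if nb not in dist:
                dist[nb] = dist[current] + 1
                queue.append(nb)
    if mode == "upto":
        return {a for a, d in dist.items() if 0 < d <= n}
    return {a for a, d in dist.items() if d == n}
```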
- get_residue(residue: int | str | tuple | Residue, by: str = None, chain=None)[source]#
Get a residue from the structure either based on its name, serial number or full_id. Note, if multiple residues match the requested criteria, for instance there are multiple ‘MAN’ from different chains, only the first one is returned.
- Parameters:
residue – The residue id, seqid or full_id tuple
by (str) – The type of parameter to search for. Can be either ‘name’, ‘serial’ (or ‘seqid’), or ‘full_id’. By default, this is inferred from the datatype of the residue parameter: an integer is taken as the sequence identifying number (serial number), a string as the residue name, and a tuple as the full_id.
chain (str) – Further restrict to a residue from a specific chain.
- Returns:
residue – The residue
- Return type:
- get_residue_connections(residue_a=None, residue_b=None, triplet: bool = True, rotatable_only: bool = False)[source]#
Get bonds between atoms that connect different residues in the structure. This method differs from infer_residue_connections in that it works with the bonds already present in the molecule instead of computing new ones.
- Parameters:
residue_a (Union[int, str, tuple, base_classes.Residue]) – The residues to consider. If None, all residues are considered. Otherwise, only bonds between the specified residues are considered.
residue_b (Union[int, str, tuple, base_classes.Residue]) – The residues to consider. If None, all residues are considered. Otherwise, only bonds between the specified residues are considered.
triplet (bool) – Whether to include bonds between atoms that are in the same residue but neighboring a bond that connects different residues. This is useful for residues that have a side chain that is connected to the main chain. This is mostly useful if you intend to use the returned list for some purpose, because the additionally returned bonds are already present in the structure from inference or standard-bond applying and therefore do not actually add any particular information to the Molecule object itself.
rotatable_only (bool) – Whether to only return bonds that are rotatable. This is useful if you want to use the returned bonds for optimization.
- Returns:
A list of tuples of atom pairs that are bonded and connect different residues
- Return type:
list
- get_residue_graph(detailed: bool = False, locked: bool = True) ResidueGraph#
Generate a ResidueGraph for the Molecule
- Parameters:
detailed (bool) – If True, the graph will include the residues and all atoms that form bonds connecting different residues. If False, the graph will only include the residues and their connections, without the actual bonds between any existing atoms.
locked (bool) – If True, the graph will also migrate the information on any locked bonds into the graph. This is only relevant if detailed is True.
- get_residues(*residues: int | str | tuple | Residue, by: str = None, chain=None, filter: callable = None)[source]#
Get residues from the structure either based on their name, serial number or full_id.
- Parameters:
residues – The residues’ id, seqid or full_id tuple. If none are passed, the iterator over all residues is returned.
by (str) – The type of parameter to search for. Can be either ‘name’, ‘seqid’ (or ‘serial’), or ‘full_id’. By default, this is inferred from the datatype of the residue parameter: an integer is taken as the sequence identifying number (serial number), a string as the residue name, and a tuple as the full_id.
chain (str) – Further restrict to residues from a specific chain.
- Returns:
The residue(s)
- Return type:
list or generator
- get_right_hydrogen(atom: int | str | tuple | Atom) Atom[source]#
Get the “right-protruding” hydrogen neighbor of an atom with two hydrogens and two non-hydrogen neighbors.
- Parameters:
atom – The atom
- Returns:
The right hydrogen, if it exists, None otherwise
- Return type:
Atom
Example
```
        H_B
        |
CH3 --- C --- CH2 --- OH
        |
        H_A
```
We want the left and right hydrogens of the central C atom (labeled only C). The method borrows part of the logic behind R/S nomenclature for chiral centers. It first ranks the non-H neighbors by priority. It then rotates the molecule so that the highest-priority non-H neighbor points toward the viewer and the other non-H neighbor points away. The left and right hydrogens are read off from their orientation in this view.
In this case, the left hydrogen is H_A and the right hydrogen is H_B.
- get_root() Atom[source]#
Get the root atom of the molecule. The root atom is the atom at which it is attached to another molecule.
- has_clashes(clash_threshold: float = 1.0, ignore_hydrogens: bool = True, coarse_precheck: bool = True) bool[source]#
Check if the molecule has any clashes.
- Parameters:
clash_threshold (float, optional) – The minimal allowed distance between two atoms (in Angstrom).
ignore_hydrogens (bool, optional) – Whether to ignore clashes with hydrogen atoms (default: True)
coarse_precheck (bool, optional) – If set to True, a coarse-grained pre-screening at residue level speeds up the computation. However, this may cause the system to overlook clashes if individual residues are particularly large (e.g. lipids with long carbon chains).
- Returns:
True if there are clashes, False otherwise.
- Return type:
bool
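At its core, a clash check is a search for any pair of atoms closer than clash_threshold. The NumPy sketch below does this brute-force over all pairs and leaves out the residue-level coarse pre-check. It also does not skip bonded pairs. With ignore_hydrogens=False it would therefore flag short X-H bonds such as O-H (about 0.96 Å) as clashes, so a real implementation would presumably have to exclude bonded atoms. The function and its arguments are hypothetical.

```python
import numpy as np


def has_clashes(coords, elements, clash_threshold=1.0, ignore_hydrogens=True):
    coords = np.asarray(coords, dtype=float)
    if ignore_hydrogens:
        keep = np.array([element != "H" for element in elements], dtype=bool)
        coords = coords[keep]
    if len(coords) < 2:
        return False
    # All pairwise distances; only the upper triangle (i < j) is inspected.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(coords), k=1)
    return bool((dists[i, j] < clash_threshold).any())
```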
- has_hydrogens() bool[source]#
Check if the structure has hydrogen atoms
- Returns:
True if the structure has hydrogen atoms, False otherwise
- Return type:
bool
- property id#
- index_by_chain()[source]#
Reindex the residues in the structure by chain, so that each chain starts with residue 1. Only the residues are reindexed; the atoms keep their numbering.
- infer_bonds(max_bond_length: float = None, restrict_residues: bool = True, infer_bond_orders: bool = False) list[source]#
Infer bonds between atoms in the structure
- Parameters:
max_bond_length (float) – The maximum distance between atoms to consider them bonded. If None, the default value is 1.6 Angstroms.
restrict_residues (bool) – Whether to restrict bonds to only those in the same residue. If False, bonds between atoms in different residues are also inferred.
infer_bond_orders (bool) – Whether to infer the bond orders (double and triple bonds) based on registered functional groups. This will slow the inference down, however.
- Returns:
A list of tuples of atom pairs that are bonded
- Return type:
list
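Stripped of bond-order inference, bond inference reduces to a distance cutoff, optionally restricted to atom pairs in the same residue. The sketch below assumes that plain cutoff, using the 1.6 Å default stated above. It may differ from BuildAMol's real logic, which could also apply a minimum distance (infer_residue_connections, for instance, documents a 0.8 Å lower bound). It returns index pairs on a plain coordinate array.

```python
import numpy as np


def infer_bonds(coords, residue_ids, max_bond_length=1.6, restrict_residues=True):
    """Return (i, j) index pairs whose distance is within max_bond_length."""
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    bonds = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if restrict_residues and residue_ids[i] != residue_ids[j]:
                continue  # only intra-residue bonds were requested
            if dists[i, j] <= max_bond_length:
                bonds.append((i, j))
    return bonds
```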
- infer_bonds_for(*residues_or_atoms: Residue | Atom, max_bond_length: float = None, infer_bond_orders: bool = False)[source]#
Infer bonds between atoms in the structure for a specific set of residues or atoms
- Parameters:
residues_or_atoms – The residues or atoms to consider
max_bond_length (float) – The maximum distance between atoms to consider them bonded. If None, the default value is 1.6 Angstroms.
infer_bond_orders (bool) – Whether to infer the bond orders (double and triple bonds) based on registered functional groups. This will slow the inference down, however.
- Returns:
A list of tuples of atom pairs that are bonded
- Return type:
list
Changed in version 1.2.10: infer_bonds_for now works with both residues and individual atoms. However, it only accepts Residue and Atom objects as input and cannot look them up via serial numbers or ids. To keep the old behavior, where only residues were supported but via any identifier, use the infer_bonds_for_residues method instead.
- infer_bonds_for_atoms(*atoms: Atom, max_bond_length: float = None, infer_bond_orders: bool = False)[source]#
Infer bonds between atoms in the structure for a specific set of atoms
- Parameters:
atoms – The atoms to consider
max_bond_length (float) – The maximum distance between atoms to consider them bonded. If None, the default value is 1.6 Angstroms.
infer_bond_orders (bool) – Whether to infer the bond orders (double and triple bonds) based on registered functional groups. This will slow the inference down, however.
- Returns:
A list of tuples of atom pairs that are bonded
- Return type:
list
- infer_bonds_for_residues(*residues, max_bond_length: float = None, infer_bond_orders: bool = False)[source]#
Infer bonds between atoms in the structure for a specific set of residues
- Parameters:
residues – The residues to consider
max_bond_length (float) – The maximum distance between atoms to consider them bonded. If None, the default value is 1.6 Angstroms.
infer_bond_orders (bool) – Whether to infer the bond orders (double and triple bonds) based on registered functional groups. This will slow the inference down, however.
- Returns:
A list of tuples of atom pairs that are bonded
- Return type:
list
- infer_residue_connections(bond_length: float | tuple = None, triplet: bool = True) list[source]#
Infer bonds between atoms that connect different residues in the structure
- Parameters:
bond_length (float or tuple) – If a float is given, the maximum distance between atoms to consider them bonded. If a tuple, the minimal and maximal distance between atoms. If None, the default value is min 0.8 Angstrom, max 1.6 Angstroms.
triplet (bool) – Whether to include bonds between atoms that are in the same residue but neighboring a bond that connects different residues. This is useful for residues that have a side chain that is connected to the main chain. This is mostly useful if you intend to use the returned list for some purpose, because the additionally returned bonds are already present in the structure from inference or standard-bond applying and therefore do not actually add any particular information to the Molecule object itself.
- Returns:
A list of bonds that link atoms from different residues.
- Return type:
list
Examples
For a molecule with the following structure:
```
connection -->  OA      OB --- H
               /       /
     (1)CA --- (2)CA   (1)CB
      /           \       \
  (6)CA          (3)CA    (2)CB --- (3)CB
      \           /
     (5)CA --- (4)CA
```
The circular residue A and the linear residue B are connected through the oxygen OA, which bridges (1)CA and (1)CB. Because OA originally belongs to residue A, only the bond OA --- (1)CB is returned when triplet=False. With triplet=True (the default), the bond OA --- (1)CA is also returned, because the connecting "bridge" between residues A and B spans both bonds around OA.
>>> mol.infer_residue_connections(triplet=False)
[("OA", "(1)CB")]
>>> mol.infer_residue_connections(triplet=True)
[("OA", "(1)CB"), ("OA", "(2)CA")]
- is_cis(*bond: Atom | tuple | Bond) bool[source]#
Check if the atoms in the bond are in a cis configuration.
- is_locked(atom1: int | str | tuple | Atom, atom2: int | str | tuple | Atom)[source]#
Check if a bond is locked
- Parameters:
atom1 – The first atom of the bond. It can be provided directly (biopython object) or via its serial number, full_id, or id.
atom2 – The second atom of the bond. It can be provided directly (biopython object) or via its serial number, full_id, or id.
- Returns:
True if the bond is locked, False otherwise
- Return type:
bool
- is_trans(*bond: Atom | tuple | Bond) bool[source]#
Check if the atoms in the bond are in a trans configuration.
- link_atoms(*atoms: Atom, residue=None)[source]#
Softlink atoms to the structure. This will add the atoms to the index of the maintained structure but it will not adjust the atoms’ own parent references. This is useful if you want to have atoms be accessible from multiple Molecule objects.
- Parameters:
atoms (base_classes.Atom) – The atoms to link
- link_chains(*chains: Chain)[source]#
Softlink chains to the structure. This will add the chains to the index of the maintained structure but it will not adjust the chains’ own parent references. This is useful if you want to have chains be accessible from multiple Molecule objects.
- Parameters:
chains (base_classes.Chain) – The chains to link
- link_residues(*residues: Residue, chain=None)[source]#
Softlink residues to the structure. This will add the residues to the index of the maintained structure but it will not adjust the residues’ own parent references. This is useful if you want to have residues be accessible from multiple Molecule objects.
- Parameters:
residues (base_classes.Residue) – The residues to link
chain (str or base_classes.Chain) – The chain to which the residues should be linked. If None, the residues are linked to the current working chain.
- property linkage#
The patch or recipe to use for attaching other molecules to this one
- classmethod load(filename: str)[source]#
Load a Molecule from a pickle file
- Parameters:
filename (str) – Path to the file
- lock_bond(atom1: int | str | tuple | Atom, atom2: int | str | tuple | Atom)[source]#
Lock a bond between two atoms
- Parameters:
atom1 – The first atom of the bond to lock. It can be provided directly (biopython object) or via its serial number, full_id, or id.
atom2 – The second atom of the bond to lock. It can be provided directly (biopython object) or via its serial number, full_id, or id.
- property locked_bonds#
All bonds that are locked and cannot be rotated around.
- make_atom_graph(_copy: bool = True) AtomGraph#
Get an AtomGraph for the Molecule
- Parameters:
_copy (bool) – If True, the method returns a new AtomGraph object instead of the “original” one that the Molecule relies on. The molecule remains linked to the new graph. This is useful if you want to change the graph itself. It does not cover changes to the graph nodes, i.e. the atoms themselves, such as rotations.
- Returns:
The generated graph
- Return type:
AtomGraph
- make_residue_graph(detailed: bool = False, locked: bool = True) ResidueGraph[source]#
Generate a ResidueGraph for the Molecule
- Parameters:
detailed (bool) – If True, the graph will include the residues and all atoms that form bonds connecting different residues. If False, the graph will only include the residues and their connections, without the actual bonds between any existing atoms.
locked (bool) – If True, the graph will also migrate the information on any locked bonds into the graph. This is only relevant if detailed is True.
- property mass#
The total mass of the molecule
- merge(other, adjust_indexing: bool = True)[source]#
Merge another molecule into this one. This will simply add all chains, residues, and atoms of the other molecule to this one. It will NOT perform any kind of geometrical alignment or anything like that.
- Parameters:
other (Molecule) – The other molecule to merge into this one
adjust_indexing (bool) – Whether to adjust the indexing of the atoms and residues in the merged molecule
- property model#
The working model of the structure
- property models#
A list of all models in the base-structure
- move(vector: ndarray)[source]#
Move the molecule in 3D space
- Parameters:
vector (np.ndarray) – The vector to move the molecule by
- move_to(pos: ndarray)[source]#
Move the molecule to a specific position in 3D space
- Parameters:
pos (np.ndarray) – The position to move the molecule to. This will be the new center of geometry.
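move_to amounts to a translation by the difference between the target position and the current center of geometry. The minimal NumPy sketch below works on a plain coordinate array. It assumes an unweighted centroid, since the text says center of geometry rather than center of mass. `move` and `move_to` here are standalone illustrative helpers, not the Molecule methods.

```python
import numpy as np


def move(coords, vector):
    """Translate every coordinate by the same vector."""
    return np.asarray(coords, dtype=float) + np.asarray(vector, dtype=float)


def move_to(coords, pos):
    """Translate so that the (unweighted) center of geometry lands on `pos`."""
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    return move(coords, np.asarray(pos, dtype=float) - center)
```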
- property patch#
The patch to use for attaching other molecules to this one (synonym for recipe)
- place(pos: ndarray)#
Move the molecule to a specific position in 3D space
- Parameters:
pos (np.ndarray) – The position to move the molecule to. This will be the new center of geometry.
- plotly(residue_graph: bool = False, atoms: bool = True, line_color: str = 'black')[source]#
Prepare a view of the molecule in 3D using Plotly but do not open a browser window.
- Parameters:
residue_graph (bool) – If True, a residue graph is shown instead of the full structure.
atoms (bool) – Whether to draw the atoms (default: True)
line_color (str) – The color of the lines connecting the atoms
- Returns:
viewer – The viewer object
- Return type:
- purge_bonds(atom: int | str | Atom = None)[source]#
Remove all bonds connected to an atom
- Parameters:
atom – The atom to remove the bonds from, which can either be directly provided (biopython object) or by providing the serial number, the full_id or the id of the atoms. If None, all bonds are removed.
- py3dmol(style: str = 'stick', color: str = None, size: tuple = None)[source]#
View the molecule in 3D through py3Dmol
- Parameters:
style (str) – The style to use for the visualization. Can be “line”, “stick”, “sphere”, “cartoon”, “surface”, or “label”
color (str) – A specific color to use for the visualization
size (tuple) – The size of the view as a tuple of (width, height) in pixels.
- quartet(atom1: str | int | Atom, atom2: str | int | Atom, atom3: str | int | Atom, atom4: str | int | Atom)[source]#
Make an atom quartet from four atoms.
- Parameters:
atom1 – The first atom of the quartet.
atom2 – The second atom of the quartet.
atom3 – The third atom of the quartet.
atom4 – The fourth atom of the quartet.
- property recipe#
The recipe to use for stitching other molecules to this one (synonym for patch)
- reindex(start_chainid: int = 1, start_resid: int = 1, start_atomid: int = 1)[source]#
Reindex the atoms and residues in the structure. You can use this method if you made substantial changes to the molecule and want to be sure that there are no gaps in the atom and residue numbering.
- Parameters:
start_chainid (int) – The starting chain id (default: 1=A, 2=B, …, 26=Z, 27=AA, 28=AB, …)
start_resid (int) – The starting residue id
start_atomid (int) – The starting atom id
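The chain numbering listed for start_chainid (1=A, …, 26=Z, 27=AA, 28=AB) is a bijective base-26 encoding, like spreadsheet column names. The small sketch below implements that mapping. It assumes the pattern continues the same way past AB, and `chain_label` is a hypothetical helper, not part of BuildAMol.

```python
def chain_label(index: int) -> str:
    """Map 1 -> 'A', 26 -> 'Z', 27 -> 'AA', 28 -> 'AB', ... (bijective base 26)."""
    if index < 1:
        raise ValueError("chain indices start at 1")
    label = ""
    while index > 0:
        # Shift to 0-based before dividing; there is no "zero" digit in this system.
        index, remainder = divmod(index - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label
```

For example, `[chain_label(i) for i in (1, 26, 27, 28)]` gives `['A', 'Z', 'AA', 'AB']`, matching the list above.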
- relabel_hydrogens()[source]#
Relabel hydrogen atoms in the structure to match the standard labelling according to the CHARMM force field. This is useful if you want to use some pre-generated PDB file that may have used a different labelling scheme for atoms.
- remove_atoms(*atoms: int | str | tuple | Atom) list[source]#
Remove one or more atoms from the structure and return them.
- Parameters:
atoms – The atoms to remove, which can either be directly provided (biopython object) or by providing the serial number, the full_id or the id of the atoms.
- Returns:
The removed atoms
- Return type:
list
- remove_chains(*chains: int | Chain) list[source]#
Remove chains from the structure and return them.
- Parameters:
chains (int or Chain) – The chains to remove, either the object itself or its id
- Returns:
The removed chains
- Return type:
list
- remove_model(model: int | Model)[source]#
Remove a model and all its chains from the molecule, and return the removed model.
- remove_residues(*residues: int | Residue) list[source]#
Remove residues from the molecule and return them.
- Parameters:
residues (int or base_classes.Residue) – The residues to remove, either the object itself or its seqid
- Returns:
The removed residues
- Return type:
list
- rename_atom(atom: int | Atom, name: str, residue: int | Residue = None)[source]#
Rename an atom
- Parameters:
atom (int or base_classes.Atom) – The atom to rename, either the object itself or its serial number
name (str) – The new name (id)
residue (int or base_classes.Residue) – The residue to which the atom belongs, either the object itself or its seqid. Useful when giving a possibly redundant id as identifier in multi-residue molecules.
- rename_atoms(old_name: str, new_name: str, residue_name: str = None)[source]#
Rename multiple atoms to the same name
- Parameters:
old_name (str) – The name of the atoms to rename
new_name (str) – The new name
residue_name (str) – The name of the residue of the atoms to rename (if only atoms from a specific type of residue should be renamed).
- rename_chain(chain: str | Chain, name: str)[source]#
Rename a chain
- Parameters:
chain (str or Chain) – The chain to rename, either the object itself or its id
name (str) – The new name
- rename_residue(residue: int | Residue, name: str)[source]#
Rename a residue
- Parameters:
residue (int or Residue) – The residue to rename, either the object itself or its seqid
name (str) – The new name
- rename_residues(old_name: str, new_name: str)[source]#
Rename multiple residues to the same name
- Parameters:
old_name (str) – The name of the residues to rename
new_name (str) – The new name
- property residues#
A sorted list of all residues in the molecule
- property root_atom#
The root atom of this molecule/scaffold at which it is attached to another molecule/scaffold
- property root_residue#
The residue of the root atom
- rotate(angle: float, axis: ndarray, center: ndarray = None, angle_is_degrees: bool = True)[source]#
Rotate the molecule around an axis
- Parameters:
angle (float) – The angle to rotate by
axis (np.ndarray or str) – The axis to rotate around. This must be a unit vector. Alternatively, it may be one of the strings “x”, “y”, or “z” to rotate around the respective axes.
center (np.ndarray) – The center of the rotation. By default the center of geometry is used to achieve relative rotations (i.e. without translation). Use “absolute” if you want to rotate around the literal axes.
angle_is_degrees (bool) – Whether the angle is given in degrees (default) or radians
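A rigid rotation about an axis through a chosen center can be written with Rodrigues' rotation formula. The NumPy sketch below follows the parameters listed above: "x"/"y"/"z" shortcuts, the center of geometry by default, and degrees unless angle_is_degrees is False. Treating center="absolute" as rotation about the origin is our reading of the description, not confirmed behavior. The sketch also normalises the axis itself instead of requiring a unit vector.

```python
import numpy as np

AXES = {"x": (1.0, 0.0, 0.0), "y": (0.0, 1.0, 0.0), "z": (0.0, 0.0, 1.0)}


def rotate(coords, angle, axis, center=None, angle_is_degrees=True):
    coords = np.asarray(coords, dtype=float)
    k = np.asarray(AXES[axis] if isinstance(axis, str) else axis, dtype=float)
    k = k / np.linalg.norm(k)  # tolerate non-unit input in this sketch
    theta = np.radians(angle) if angle_is_degrees else angle
    if center is None:
        center = coords.mean(axis=0)  # relative rotation: the centroid stays put
    elif isinstance(center, str) and center == "absolute":
        center = np.zeros(3)  # assumed meaning: rotate about the coordinate origin
    # Rodrigues' rotation formula: R = I + sin(t) K + (1 - cos(t)) K^2
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    return (coords - center) @ R.T + center
```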
- rotate_ancestors(atom1: str | int | Atom, atom2: str | int | Atom, angle: float, angle_is_degrees: bool = True)[source]#
Rotate all ancestor atoms (atoms before atom1) of a bond
- Parameters:
atom1 (Union[str, int, base_classes.Atom]) – The first atom (whose upstream neighbors are rotated)
atom2 (Union[str, int, base_classes.Atom]) – The second atom
angle (float) – The angle to rotate by
angle_is_degrees (bool) – Whether the angle is given in degrees (default) or radians
- rotate_around_bond(atom1: str | int | Atom, atom2: str | int | Atom, angle: float, descendants_only: bool = False, angle_is_degrees: bool = True)[source]#
Rotate the structure around a bond
- Parameters:
atom1 – The first atom
atom2 – The second atom
angle – The angle to rotate by (in degrees, unless angle_is_degrees is False)
descendants_only – Whether to only rotate the descendants of the bond, i.e. only atoms that come after atom2 (sensible only for linear molecules, or bonds that are not part of a circular structure).
angle_is_degrees – Whether the angle is given in degrees (default) or radians
Examples
For a molecule starting as:
```
            OH
           /
(1)CH3 — CH
           \
            CH2 — (2)CH3
```
we can rotate around the bond (1)CH3 — CH by 180° using
>>> import numpy as np
>>> angle = 180
>>> mol.rotate_around_bond("(1)CH3", "CH", angle)
and thus achieve the following:
```
            CH2 — (2)CH3
           /
(1)CH3 — CH
           \
            OH
```
- rotate_descendants(atom1: str | int | Atom, atom2: str | int | Atom, angle: float, angle_is_degrees: bool = True)[source]#
Rotate all descendant atoms (atoms after atom2) of a bond.
- Parameters:
atom1 (Union[str, int, base_classes.Atom]) – The first atom
atom2 (Union[str, int, base_classes.Atom]) – The second atom (whose downstream neighbors are rotated)
angle (float) – The angle to rotate by
angle_is_degrees (bool) – Whether the angle is given in degrees (default) or radians
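To make "atoms after atom2" concrete, the sketch below assumes that the descendants are the atoms reachable from atom2 without crossing the atom1–atom2 bond, and rotates them about the bond axis through atom2. descendants and rotate_about_bond are hypothetical helpers working on a plain adjacency dict and a coordinate dict; they are not BuildAMol's implementation.

```python
from collections import deque

import numpy as np


def descendants(adjacency: dict, atom1, atom2) -> set:
    """Atoms reachable from atom2 without crossing the atom1-atom2 bond (atom2 excluded)."""
    seen = {atom1, atom2}  # atom1 is blocked so the search cannot cross the bond
    queue = deque([atom2])
    found = set()
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                queue.append(neighbor)
    return found


def rotate_about_bond(coords: dict, adjacency: dict, atom1, atom2, angle_deg: float) -> dict:
    """Return new coordinates with the descendants of atom2 rotated about the atom1->atom2 axis."""
    origin = np.asarray(coords[atom2], dtype=float)
    axis = origin - np.asarray(coords[atom1], dtype=float)
    axis /= np.linalg.norm(axis)
    theta = np.radians(angle_deg)
    new_coords = {name: np.asarray(xyz, dtype=float) for name, xyz in coords.items()}
    for name in descendants(adjacency, atom1, atom2):
        v = new_coords[name] - origin
        # Rodrigues' rotation formula, applied to the vector relative to atom2
        v_rot = (
            v * np.cos(theta)
            + np.cross(axis, v) * np.sin(theta)
            + axis * np.dot(axis, v) * (1.0 - np.cos(theta))
        )
        new_coords[name] = origin + v_rot
    return new_coords
```

For a bond inside a ring, the search wraps around the ring back to atom1's side, which is why rotations of this kind are only meaningful for bonds that are not part of a circular structure.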
- save(filename: str)[source]#
Save the object to a pickle file
- Parameters:
filename (str) – Path to the pickle file
- search_by_constraints(constraints: list) list[source]#
Search for atoms based on a list of constraints. The constraints must be constraint functions from structural.neighbors.constraints, and each entry in the constraints list holds the constraints for one specific atom. Note that constraints apply to atom neighborhoods, not to the atom graph as a whole: during the search they are evaluated on the neighbors of each candidate atom.
- Parameters:
constraints (list) – A list of constraint functions
- Returns:
A list of matching atoms. Each entry in this list is a dictionary that maps the index of a constraint function (key) to the atom matching it (value).
- Return type:
list
Examples
For a molecule starting as:
```
            OH
           /
(1)CH3 — CH
           \
            CH2 — (2)CH3
```
we can search for the methyl groups using the following constraints:
>>> from buildamol.core.structural.neighbors import constraints
>>> search_constraints = [
...     # the first atom must be a carbon with three hydrogen neighbors
...     # (we only search for the methyl carbons)
...     constraints.multi_constraint(
...         constraints.has_element("C"),
...         constraints.has_neighbor_hist({"H": 3}),
...     ),
... ]
>>> mol.search_by_constraints(search_constraints)
[{0: (1)C}, {0: (2)C}]
- set_attach_residue(residue: int | Residue = None)[source]#
Set the residue that is used for attaching other molecules to this one.
- Parameters:
residue – The residue to be used for attaching other molecules to this one
- set_bond(atom1: int | str | tuple | Atom, atom2: int | str | tuple | Atom, order: int = 1)[source]#
Specify a bond between two atoms. Unlike add_bond, which can be used to increment bond orders (e.g. calling it twice on the same pair of atoms turns a single bond into a double bond), this method always sets the bond order to the provided value.
- Parameters:
atom1 – The first atom of the bond, provided either directly (as an Atom object) or by its serial number, full_id, or id.
atom2 – The second atom of the bond, provided in the same ways as atom1.
order (int) – The order of the bond, i.e. 1 for single, 2 for double, 3 for triple, etc.
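The difference between incrementing and setting a bond order can be shown on a toy bond table. BondTable and its add_bond/set_bond methods are hypothetical stand-ins that only mirror the documented semantics; in particular, incrementing by the order argument is an assumption.

```python
class BondTable:
    """Toy bond store keyed by an unordered atom pair (hypothetical, for illustration)."""

    def __init__(self):
        self.orders = {}

    @staticmethod
    def _key(atom1, atom2):
        # bonds are undirected, so (a, b) and (b, a) map to the same key
        return frozenset((atom1, atom2))

    def add_bond(self, atom1, atom2, order=1):
        # incremental: each call raises the stored order
        key = self._key(atom1, atom2)
        self.orders[key] = self.orders.get(key, 0) + order

    def set_bond(self, atom1, atom2, order=1):
        # absolute: overwrite whatever order was stored before
        self.orders[self._key(atom1, atom2)] = order

    def order(self, atom1, atom2):
        """Return the stored bond order, or 0 if the atoms are not bonded."""
        return self.orders.get(self._key(atom1, atom2), 0)
```

Calling add_bond("C1", "C2") twice yields a double bond, whereas any number of set_bond("C1", "C2", 1) calls leaves a single bond.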
- set_bond_order(atom1, atom2, order: int, adjust_hydrogens: bool = False)[source]#
Set the order of a bond between two atoms
- Parameters:
atom1 – The first atom
atom2 – The second atom
order (int) – The order of the bond
adjust_hydrogens (bool) – Whether to adjust the number of hydrogens on the atoms based on the bond order
- set_bonds(*bonds)[source]#
Specify multiple bonds at once. Unlike add_bonds, which increments bond orders (e.g. a bond passed in two separate calls, or listed more than once in the arguments, becomes a double bond), this method always sets the bond order to the provided value.
- Parameters:
bonds – The bonds to add, each bond is a tuple of two atoms. Each atom may be specified directly (BuildAMol object) or by providing the serial number, the full_id or the id of the atoms.
- set_charge(atom: str | int | tuple | Atom, charge: int, adjust_protonation: bool = True)[source]#
Set the charge of an atom. If adjust_protonation is True (the default), the number of protons on the atom is adjusted to match the new charge.
- Parameters:
atom (str or int or tuple or Atom) – The atom whose charge should be changed
charge (int) – The new charge. This is NOT the charge difference to apply but the final charge of the atom.
adjust_protonation (bool) – If True, adjust the number of protons on the atom to match the charge.
- set_coords(coords: ndarray, *atom_selector, **atom_selectors)[source]#
Set the coordinates of the atoms in the molecule
- Parameters:
coords (np.ndarray) – The new coordinates
atom_selectors – Positional or keyword arguments passed to get_atoms() to select atoms. If no selectors are given, all atoms are selected. The number and order of the selected atoms must match the number and order of the coordinates.
- set_linkage(link: str | Linkage = None, _topology=None)[source]#
Set a linkage to be used for attaching other molecules to this one
- Parameters:
link (str or Linkage) – The linkage to be used. Can be either a string with the name of a known Linkage in the loaded topology, or an instance of the Linkage class. If None is given, the currently loaded default linkage is removed.
_topology – The topology to use for referencing the link.
- set_model(model: int)[source]#
Set the current working model of the molecule
- Parameters:
model (int) – The id of the model to set as active
- set_parent(obj: Atom | Residue | Chain | Model, parent: Residue | Chain | Model)[source]#
Reassign a structural component like an Atom to a new parent object.
- set_root(atom)[source]#
Set the root atom of the molecule
- Parameters:
atom (Atom or int or str or tuple) – The atom to be used as the root atom. This may be an Atom object, an atom serial number, an atom id (must be unique), or the full-id tuple.
- show3d(*args, **kwargs)#
- single(atom1, atom2, adjust_hydrogens: bool = False)[source]#
Set a single bond between two atoms
- Parameters:
atom1 – The first atom
atom2 – The second atom
adjust_hydrogens (bool) – Whether to adjust the number of hydrogens on the atoms based on the bond order
- split_contiguous(target_residues: list = None)[source]#
Split residues that contain multiple contiguous atom groups into separate residues. Residues that are split will be removed from the molecule and replaced with the new residues labeled “UNL_X” where X is a counter. The indexing is not affected by this operation (i.e. atom serials are not changed).
- Parameters:
target_residues (list) – A list of residues to split. If None, all residues are split.
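Finding the "contiguous atom groups" of a residue amounts to computing the connected components of its internal bond graph. The sketch below shows that step on plain atom names and bond pairs; contiguous_groups is a hypothetical helper, and ignoring bonds to atoms outside the residue is an assumption about how BuildAMol defines contiguity.

```python
from collections import defaultdict


def contiguous_groups(atoms, bonds):
    """Group `atoms` into connected sets, using only bonds between atoms of that set."""
    atom_set = set(atoms)
    adjacency = defaultdict(set)
    for a, b in bonds:
        if a in atom_set and b in atom_set:  # ignore bonds that leave the residue
            adjacency[a].add(b)
            adjacency[b].add(a)
    groups, seen = [], set()
    for start in atoms:  # follow the input order so the result is deterministic
        if start in seen:
            continue
        group, stack = set(), [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            current = stack.pop()
            group.add(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(group)
    return groups
```

A residue would only need splitting when this returns more than one group.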
- split_models(_copy: bool = False) list[source]#
Split the molecule into multiple molecules, each containing one of the models.
- split_residues()[source]#
Split the molecule into separate residues, creating a list of new molecules, each with a single residue.
- squash(chain_id: str = 'A', resname: str = 'UNK')[source]#
Turn the entire molecule into a single chain with a single residue.
- squash_chains(chain_id: str = 'A')[source]#
Turn all chains of the molecule into a single chain but preserve the residues.
- stack(axis: str | ndarray, n: int, pad: float = 0)[source]#
Stack the molecule along an axis. This will create n copies of the molecule along the axis with a padding of pad between them. This method is a convenience wrapper for move and merge and will not perform any kind of alignment or rotation.
- Parameters:
axis (str or np.ndarray) – The axis to stack along. This can be either a unit vector or one of the strings “x”, “y”, or “z” to stack along the respective axes.
n (int) – The number of copies to stack
pad (float) – The padding between the copies
- property structure#
The buildamol base-structure
- superimpose_to_atom(ref_atom: Atom | int | str, other_atom: Atom | ndarray)[source]#
Superimpose the molecule to another molecule based on two atoms. This will move this molecule so that the atom in ref_atom is superimposed to the atom in other_atom.
- superimpose_to_bond(ref_bond: tuple | Bond, other_bond: tuple | Bond)[source]#
Superimpose the molecule to another molecule based on two bonds. This will move this molecule so that the atoms in ref_bond are superimposed to the atoms in other_bond.
- superimpose_to_pair(pair1, pair2)[source]#
Superimpose the molecule to another molecule based on two atom pairs (they do not need to be bonded). This will move this molecule so that the atoms in pair1 are superimposed to the atoms in pair2.
- Parameters:
pair1 (tuple) – The pair to superimpose in this molecule. These may either be Atom objects or any input which can be used to get atoms in this molecule.
pair2 (tuple) – The pair to superimpose to. These must be either Atom objects or arbitrary coordinates (np.ndarray).
- superimpose_to_residue(ref_residue, other_residue)[source]#
Superimpose the molecule to another molecule based on two residues. This will move this molecule so that the residues are superimposed.
- superimpose_to_triplet(ref_triplet: tuple, other_triplet: tuple)[source]#
Superimpose the molecule to another molecule based on two atom triplets. This will move this molecule so that the atoms in ref_triplet are superimposed to the atoms in other_triplet.
- Parameters:
ref_triplet (tuple) – The triplet in this molecule to superimpose. These may either be Atom objects or any input which can be used to get atoms in this molecule.
other_triplet (tuple) – The triplet to superimpose onto. These must be either Atom objects or arbitrary coordinates (np.ndarray).
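Superimposing on a triplet fixes both the position and the orientation of the molecule. One standard way to compute such a rigid fit is the Kabsch algorithm. The sketch uses it purely as an assumption, since this reference does not state which algorithm BuildAMol applies; kabsch_transform and apply_transform are hypothetical helpers on plain numpy arrays.

```python
import numpy as np


def kabsch_transform(mobile, target):
    """Return (R, t) minimising ||(mobile @ R.T + t) - target|| for paired (N, 3) point sets."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # covariance matrix of the centered point sets
    H = (mobile - mobile_center).T @ (target - target_center)
    U, _, Vt = np.linalg.svd(H)
    # flip the last axis if needed so that R is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - mobile_center @ R.T
    return R, t


def apply_transform(coords, R, t):
    """Apply rotation R and translation t to every row of an (N, 3) array."""
    return np.asarray(coords, dtype=float) @ R.T + t
```

In this picture, mobile holds the coordinates of ref_triplet, target those of other_triplet, and the resulting (R, t) would then be applied to every atom of the molecule. The three atoms must not be collinear, otherwise the orientation is undetermined.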
- to_biopython()[source]#
Convert the molecule to a Biopython structure
- Returns:
The Biopython structure
- Return type:
Bio.PDB.Structure.Structure
- to_cif(filename: str)[source]#
Write the molecule to a CIF file
- Parameters:
filename (str) – Path to the CIF file
- to_json(filename: str, type: str = None, names: list = None, identifiers: list = None, one_letter_code: str = None, three_letter_code: str = None)[source]#
Write the molecule to a JSON file
- Parameters:
filename (str) – Path to the JSON file
type (str) – The type of the molecule to be written to the JSON file (e.g. “protein”, “ligand”, etc.).
names (list) – A list of names of the molecules to be written to the JSON file.
identifiers (list) – A list of identifiers of the molecules to be written to the JSON file (e.g. SMILES, InChI, etc.).
one_letter_code (str) – A one-letter code for the molecule to be written to the JSON file.
three_letter_code (str) – A three-letter code for the molecule to be written to the JSON file.
- to_molfile(filename: str)[source]#
Write the molecule to a Molfile
- Parameters:
filename (str) – Path to the Mol file
- to_numpy(export_bonds: bool = True)[source]#
Convert the molecule to numpy arrays
- Parameters:
export_bonds (bool) – If True, the bonds are also exported. If False, the bond array will remain empty.
- Returns:
The atomic numbers and atomic coordinates in one array and the bonds with atom serial numbers and bond order in a second array
- Return type:
tuple
- to_pdb(filename: str, symmetric: bool = True)[source]#
Write the molecule to a PDB file
- Parameters:
filename (str) – Path to the PDB file
symmetric (bool) – If True, bonds are written symmetrically - i.e. if atom A is bonded to atom B, then atom B is also bonded to atom A, and both atoms will get an entry in the “CONECT” section. If False, only one of the atoms will get an entry in the “CONECT” section.
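The effect of symmetric can be sketched on plain serial-number pairs. conect_records is a hypothetical helper, not BuildAMol's writer; it assumes the fixed-width PDB layout in which a CONECT record holds the atom serial in columns 7-11, followed by up to four bonded serials in 5-character fields.

```python
from collections import defaultdict


def conect_records(bonds, symmetric=True):
    """Build PDB CONECT lines from (serial1, serial2) bond pairs."""
    partners = defaultdict(list)
    for a, b in bonds:
        partners[a].append(b)
        if symmetric:
            # list the same bond again from the other atom's side
            partners[b].append(a)
    lines = []
    for serial in sorted(partners):
        bonded = sorted(partners[serial])
        # one record holds at most four bonded atoms; overflow goes to extra records
        for i in range(0, len(bonded), 4):
            chunk = bonded[i : i + 4]
            lines.append("CONECT" + f"{serial:5d}" + "".join(f"{s:5d}" for s in chunk))
    return lines
```

For a single bond between atoms 1 and 2, the symmetric variant emits one record for each atom, while the asymmetric one only emits the record for atom 1.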
- to_pdbqt(filename: str)[source]#
Write the molecule to a PDBQT file
- Parameters:
filename (str) – Path to the PDBQT file
- to_pybel()[source]#
Convert the molecule to a Pybel molecule
- Returns:
The Pybel molecule
- Return type:
pybel.Molecule
- to_rdkit()[source]#
Convert the molecule to an RDKit molecule
- Returns:
The RDKit molecule
- Return type:
rdkit.Chem.rdchem.Mol
- to_stk()[source]#
Convert the molecule to an STK molecule
- Returns:
The STK molecule
- Return type:
stk.BuildingBlock
- to_xml(filename: str, atom_attributes: list = None)[source]#
Write the molecule to an XML file
- Parameters:
filename (str) – Path to the XML file
atom_attributes (list) –
- A list of attributes to include in the XML file. Always included are:
serial_number
id
element
- to_xyz(filename: str)[source]#
Write the molecule to an XYZ file
- Parameters:
filename (str) – Path to the XYZ file
- trans(*bond: Atom | tuple | Bond)[source]#
Rotate the molecule such that the atoms in the bond are in a trans configuration.
- transpose(vector: ndarray, angle: float, axis: ndarray, center: ndarray = None, angle_is_degrees: bool = True)[source]#
Move the molecule by a vector and rotate it around an axis in 3D space
- Parameters:
vector (np.ndarray) – The vector to move the molecule by
angle (float) – The angle to rotate by
axis (np.ndarray) – The axis to rotate around. This must be a unit vector.
center (np.ndarray) – The center of the rotation
angle_is_degrees (bool) – Whether the angle is given in degrees (default) or radians
- triple(atom1, atom2, adjust_hydrogens: bool = False)[source]#
Set a triple bond between two atoms
- Parameters:
atom1 – The first atom
atom2 – The second atom
adjust_hydrogens (bool) – Whether to adjust the number of hydrogens on the atoms based on the bond order
- unlink_atoms(*atoms: Atom)[source]#
Unlink atoms from the structure. This will remove the atoms from the index of the maintained structure but it will not adjust the atoms’ own parent references. This is useful if you want to have atoms be accessible from multiple Molecule objects.
- Parameters:
atoms (base_classes.Atom) – The atoms to unlink
- unlink_chains(*chains: Chain)[source]#
Unlink chains from the structure. This will remove the chains from the index of the maintained structure but it will not adjust the chains’ own parent references. This is useful if you want to have chains be accessible from multiple Molecule objects.
- Parameters:
chains (base_classes.Chain) – The chains to unlink
- unlink_residues(*residues: Residue)[source]#
Unlink residues from the structure. This will remove the residues from the index of the maintained structure but it will not adjust the residues’ own parent references. This is useful if you want to have residues be accessible from multiple Molecule objects.
- Parameters:
residues (base_classes.Residue) – The residues to unlink
- unlock_bond(atom1: int | str | tuple | Atom, atom2: int | str | tuple | Atom)[source]#
Unlock a bond between two atoms
- Parameters:
atom1 – The first atom of the bond, provided either directly (as an Atom object) or by its serial number, full_id, or id.
atom2 – The second atom of the bond, provided in the same ways as atom1.
In addition, the base_classes module defines wrappers around native Biopython classes such as Atom, Residue, etc. BuildAMol uses these classes to facilitate atom identification in situations where multiple identical molecules are connected to each other. All these classes support from_biopython and to_biopython conversions.
The base_classes module
The base_classes are derivatives of the original Biopython classes, with the change that they use a UUID4 as their identifier (full_id) instead of a hierarchical tuple. This makes each object unique and allows for easy comparison, where a == b is akin to a is b. Consequently, the __hash__ method is overridden to use the UUID4 as the hash.
Warning
Each class has its own copy method that returns a deep copy of the object with a new UUID4. So a.copy() == a is False, while a standard deepcopy(a) == a is True since the UUID4 will not have been updated automatically.
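The identity semantics described above can be reproduced with a few lines of plain Python. UUIDEntity is a hypothetical toy class that only mimics the documented behaviour; it is not BuildAMol's implementation.

```python
import copy
import uuid


class UUIDEntity:
    """Toy object whose identity is a UUID4, mimicking the documented behaviour."""

    def __init__(self, name: str):
        self.name = name
        self._uuid = str(uuid.uuid4())

    def __eq__(self, other):
        # two objects are equal only if they carry the same UUID
        return isinstance(other, UUIDEntity) and self._uuid == other._uuid

    def __hash__(self):
        return hash(self._uuid)

    def copy(self):
        # deep copy the data, then assign a fresh identity
        new = copy.deepcopy(self)
        new._uuid = str(uuid.uuid4())
        return new
```

Here a.copy() != a holds because the copy receives a new UUID4, while copy.deepcopy(a) == a holds because deepcopy duplicates the stored UUID unchanged. As a consequence, a and its deepcopy also collapse into one entry when placed in a set.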
Converting to and from biopython#
Each BuildAMol class can be generated from a biopython class using the from_biopython class method, and each BuildAMol class has a to_biopython method that returns the pure-biopython equivalent. Note, however, that for most purposes the BuildAMol classes should work fine as drop-in replacements for the original biopython classes.
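import numpy as np
from Bio.PDB.Atom import Atom as BioAtom
from buildamol.base_classes import Atom
# Biopython's Atom requires bfactor, occupancy, altloc, fullname and serial_number
bio_atom = BioAtom("CA", np.zeros(3), 0.0, 1.0, " ", " CA ", 1, element="C")
atom = Atom.from_biopython(bio_atom)
print(atom == bio_atom)  # False, since atom uses a UUID4 as its identifier
print(atom.to_biopython() == bio_atom)  # True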
The conversion from and to biopython works hierarchically, so if an entire biopython structure is converted to BuildAMol then all atoms, residues, chains and models will be converted to their BuildAMol equivalents.
import Bio.PDB as bio
from buildamol.base_classes import Structure
bio_structure = bio.PDBParser().get_structure("test", "test.pdb")
structure = Structure.from_biopython(bio_structure)
atoms = list(structure.get_atoms())
bio_atoms = list(bio_structure.get_atoms())
assert len(atoms) == len(bio_atoms) # True
- class buildamol.base_classes.Atom(id: str, coord: ndarray, serial_number: int = 1, bfactor: float = 0.0, occupancy: float = 1.0, fullname: str = None, element: str = None, altloc=' ', pqr_charge=None, radius=None)[source]#
Bases: ID, Atom
An Atom object that inherits from Biopython’s Atom class.
- Parameters:
id (str) – The atom identifier
coord (ndarray) – The atom coordinates
serial_number (int, optional) – The atom serial number. The default is 1.
bfactor (float, optional) – The atom bfactor. The default is 0.0.
occupancy (float, optional) – The atom occupancy. The default is 1.0.
fullname (str, optional) – The atom fullname. The default is None, in which case the id is used again.
element (str, optional) – The atom element. The default is None, in which case it is inferred based on the id.
altloc (str, optional) – The atom altloc. The default is “ “.
pqr_charge (float, optional) – The atom pqr_charge. The default is None.
radius (float, optional) – The atom radius. The default is None.
- altloc#
- anisou_array#
- property atomic_number#
The atomic number of the atom’s element.
- bfactor#
- property charge#
The atom charge.
- coord#
- disordered_flag#
- element#
- equals(other, include_coord: bool = False) bool[source]#
Check if the atom is equal to another atom. This will return True if the two atoms match and have the same parent serial number.
- classmethod from_biopython(atom) Atom[source]#
Convert a Biopython atom to an Atom object
- Parameters:
atom – The Biopython atom
- Returns:
The Atom object
- Return type:
Atom
- classmethod from_element(element: str, **kwargs)[source]#
Create a blank atom with a given element and coordinates.
- Parameters:
element (str) – The atom element.
**kwargs – Additional keyword arguments to pass to the new Atom initializer.
- property full_id#
A self-adjusting full_id for a Biopython Atom
- fullname#
- get_axial_hydrogen()[source]#
Get the axial hydrogen neighbor of this atom, if the atom is in a ring structure.
- Returns:
The axial hydrogen, if it exists, None otherwise
- Return type:
- get_axial_neighbor()[source]#
Get the axial neighbor of this atom, if the atom is in a ring structure.
- Returns:
The axial neighbor, if it exists, None otherwise
- Return type:
- get_equatorial_hydrogen()[source]#
Get the equatorial hydrogen neighbor of this atom, if the atom is in a ring structure.
- Returns:
The equatorial hydrogen, if it exists, None otherwise
- Return type:
- get_equatorial_neighbor()[source]#
Get the equatorial neighbor of this atom, if the atom is in a ring structure.
- Returns:
The equatorial neighbor, if it exists, None otherwise
- Return type:
- get_hydrogens() set[source]#
Get all hydrogen neighbors of this atom.
- Returns:
A set of hydrogen neighbors, if they exist, an empty set otherwise
- Return type:
set
- get_left_hydrogen()[source]#
Get the “left-protruding” hydrogen neighbor of this atom (applies to atoms with two hydrogen and two non-hydrogen neighbors).
- Returns:
The left hydrogen, if it exists, None otherwise
- Return type:
Example
```
      H_B
      |
CH3 – C – CH2 – OH
      |
      H_A
```
We want to get the left and right hydrogens of the central C atom (labeled only C). Using part of the logic behind R/S nomenclature for chiral centers, we prioritize the non-H neighbors and then rotate the molecule such that the highest order non-H neighbor points toward the user and the other non-H neighbor points away. The left and right hydrogens are then determined based on their orientation in this view.
In this case, the left hydrogen is H_A and the right hydrogen is H_B.
- get_neighbors(n: int = 1, mode: str = 'upto', filter: callable = None) set[source]#
Get the neighboring atoms of this atom within n bonds.
- Parameters:
n (int) – The number of bonds to search for neighbors.
mode (str, optional) – The mode to use for searching for neighbors. The default is “upto”, which will return all neighbors within n bonds. Other options are “at” which will return only neighbors that are exactly n bonds away.
filter (callable, optional) – A function that takes an atom as input and returns True if the atom should be included in the output. The default is None.
- Returns:
A set of neighboring atoms.
- Return type:
set
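The "upto" and "at" modes can be sketched as a breadth-first search over the bond graph. neighbors_within is a hypothetical helper on a plain adjacency dict, and measuring distance as the shortest path in bonds is an assumption about how get_neighbors counts.

```python
from collections import deque


def neighbors_within(adjacency: dict, start, n: int = 1, mode: str = "upto", filter=None) -> set:
    """Atoms within n bonds of `start` ("upto") or exactly n bonds away ("at")."""
    if mode not in ("upto", "at"):
        raise ValueError(f"unknown mode: {mode!r}")
    # breadth-first search records the shortest bond distance to every reached atom
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == n:
            continue  # no need to expand beyond n bonds
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    if mode == "upto":
        result = {a for a, d in distance.items() if 0 < d <= n}
    else:
        result = {a for a, d in distance.items() if d == n}
    if filter is not None:
        result = {a for a in result if filter(a)}
    return result
```

Because breadth-first search reaches every atom first along a shortest path, an atom that is both two and three bonds away (as in a ring) counts as two bonds away.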
- get_right_hydrogen()[source]#
Get the “right-protruding” hydrogen neighbor of this atom (applies to atoms with two hydrogen and two non-hydrogen neighbors).
- Returns:
The right hydrogen, if it exists, None otherwise
- Return type:
Example
```
      H_B
      |
CH3 – C – CH2 – OH
      |
      H_A
```
We want to get the left and right hydrogens of the central C atom (labeled only C). Using part of the logic behind R/S nomenclature for chiral centers, we prioritize the non-H neighbors and then rotate the molecule such that the highest order non-H neighbor points toward the user and the other non-H neighbor points away. The left and right hydrogens are then determined based on their orientation in this view.
In this case, the left hydrogen is H_A and the right hydrogen is H_B.
- id#
- level#
- mass#
- matches(other, include_id: bool = True, include_coord: bool = False) bool[source]#
Check if the atom matches another atom. This will return True if the two atoms have the same element, id, parent-residue name, and altloc.
- property molecule#
- move(vector)[source]#
Move the atom by a vector.
- Parameters:
vector (ndarray) – The vector to move the atom by.
- property name#
Synonym for id.
- classmethod new(element_or_id: str, coord: ndarray = None, generate_id: bool = True, **kwargs) Atom[source]#
Create a blank atom with a given element and coordinates.
- Parameters:
element_or_id (str) –
The atom element. If the element is not found in the periodic table, it will be used as the atom id. To ensure that the correct element is assigned, either pass it directly as the element keyword argument, or let BuildAMol infer it by using one of the following patterns for the atom id:
<element+><number> -> C1, O2, CA2 (calcium id=CA2)
<number><element+> -> 1C, 2O, 1CA (calcium id=1CA)
<space><element><string|number+> -> “ CA” (Carbon id=CA), “ ND2 “ (Nitrogen id=ND2), “ OXT” (Oxygen id=OXT)
<element+><space+> -> “CA “ (Calcium id=CA), “FE “ (Iron id=FE)
<element+>_<string|number+> -> “CA_” (Calcium id=CA), “C_A” (Carbon id=CA)
(the + indicates multiple characters)
coord (ndarray, optional) – The atom coordinates. The default is None.
generate_id (bool, optional) – Whether to automatically generate a new id for the atom to avoid identically named atoms. The default is True.
**kwargs – Additional keyword arguments to pass to the Atom initializer
- Returns:
The blank atom.
- Return type:
Atom
- occupancy#
- parent: Residue | None#
- pqr_charge#
- radius#
- serial_number#
- set_element(element, adjust_id: bool = True)[source]#
Set the atom element.
- Parameters:
element (str) – The element to set.
adjust_id (bool, optional) – Whether to adjust the atom id to the new element. The default is True.
- sigatm_array#
- siguij_array#
- to_biopython()[source]#
Convert the Atom object to a Biopython atom
- Returns:
The Biopython atom
- Return type:
- property weight#
The atom mass (synonym for mass).
- xtra: dict#
- class buildamol.base_classes.Bond(*atoms)[source]#
Bases: object
A class representing a bond between two atoms.
- atom1#
- atom2#
- compute_length() float[source]#
Compute the bond length.
- Returns:
The bond length.
- Return type:
float
- is_cis() bool[source]#
Check if the bond is a cis bond.
- Returns:
True if the bond is a cis bond, False otherwise.
- Return type:
bool
- is_double() bool[source]#
Check if the bond is a double bond.
- Returns:
True if the bond is a double bond, False otherwise.
- Return type:
bool
- is_single() bool[source]#
Check if the bond is a single bond.
- Returns:
True if the bond is a single bond, False otherwise.
- Return type:
bool
- is_trans() bool[source]#
Check if the bond is a trans bond.
- Returns:
True if the bond is a trans bond, False otherwise.
- Return type:
bool
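This reference does not state how is_cis and is_trans decide. A common criterion, assumed here, is the dihedral angle defined by one neighbor on each side of the bond: close to 0° means cis, close to 180° means trans. dihedral and cis_or_trans are hypothetical helpers, and the 30° tolerance is an arbitrary choice for illustration.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)  # unit vector along the central bond
    b2 = p3 - p2
    # components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def cis_or_trans(p0, p1, p2, p3, tolerance: float = 30.0) -> str:
    """Hypothetical classifier: 'cis' near 0 deg, 'trans' near 180 deg, else 'neither'."""
    angle = abs(dihedral(p0, p1, p2, p3))
    if angle <= tolerance:
        return "cis"
    if angle >= 180.0 - tolerance:
        return "trans"
    return "neither"
```

Here p1 and p2 are the two bonded atoms, while p0 and p3 are the chosen neighbors on either side; which neighbors BuildAMol picks (e.g. the heaviest ones) is not specified in this reference.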
- is_triple() bool[source]#
Check if the bond is a triple bond.
- Returns:
True if the bond is a triple bond, False otherwise.
- Return type:
bool
- property length: float#
- order#
- to_list() list[source]#
Convert the bond to a list of atom1, atom2, bond_order.
- Returns:
The bond as a list.
- Return type:
list
- to_tuple() tuple[source]#
Convert the bond to a tuple of atom1, atom2, bond_order.
- Returns:
The bond as a tuple.
- Return type:
tuple
- class buildamol.base_classes.Chain(id)[source]#
Bases: ID, Chain
A Chain object that inherits from Biopython’s Chain class.
- Parameters:
id (str) – The chain identifier
- property atoms#
- child_dict: dict[Any, _Child]#
- child_list: list[_Child]#
- copy()[source]#
Return a deep copy of the chain with a new UUID4.
- Returns:
The copied chain.
- Return type:
- equals(other) bool[source]#
Check if the chain is equal to another chain. This will check if the two chains have the same id, the same parent-model id, and have equal residues.
- classmethod from_biopython(chain) Chain[source]#
Convert a BioPython Chain object to a Chain object.
- Parameters:
chain (BioPython Chain object) – The chain to convert.
- Returns:
The converted chain.
- Return type:
Chain
- property full_id#
A self-adjusting full_id for a Biopython Chain
- get_residue(residue: str | int) Residue[source]#
Get a residue by its name or serial number.
Note
If there are multiple residues with the same name, the first one will be returned.
- Parameters:
residue (str or int) – The residue name or serial number.
- Returns:
The residue.
- Return type:
Residue
- get_residues(*residues: str | int) List[Residue][source]#
Get all residues in the chain.
- Parameters:
residues (str or int, optional) – The residue name or serial number to filter by.
- Returns:
The list of residues. If no residues argument is specified the default generator is returned.
- Return type:
List[Residue]
- internal_coord#
- level: str#
- link(residue)[source]#
Softlink a residue into this chain’s child_list without touching the residue’s own parent references.
- matches(other) bool[source]#
Check if the chain matches another chain. This will return True if the two chains have matching residues.
- property molecule#
- move(vector)[source]#
Move the chain by a vector.
- Parameters:
vector (ndarray) – The vector to move the chain by.
- property name#
Synonym for id.
- classmethod new(id: str) Chain[source]#
Create a blank chain with a given id.
- Parameters:
id (str, optional) – The chain identifier. The default is None.
- Returns:
The blank chain.
- Return type:
Chain
- parent: _Parent | None#
- property residues#
- to_biopython() Chain[source]#
Convert a Chain object to a pure BioPython Chain object.
- Parameters:
with_children (bool, optional) – Whether to convert the residues of the chain as well. The default is True.
- Returns:
The converted chain.
- Return type:
bio.Chain.Chain
- unlink(residue)[source]#
Unlink a residue from this chain’s child_list without touching the residue’s own parent references.
- xtra#
- class buildamol.base_classes.Model(id)[source]#
Bases: Model, ID
A Model object that inherits from Biopython’s Model class.
- Parameters:
id (int or str) – The model identifier
- property chains#
Get the chains in the model.
- child_dict: dict[Any, _Child]#
- child_list: list[_Child]#
- copy()[source]#
Return a deep copy of the model with a new UUID4.
- Returns:
The copied model.
- Return type:
Model
- equals(other) bool[source]#
Check if the model is equal to another model. This will return True if the two models have the same id, same parent-structure id, and have matching chains.
- classmethod from_biopython(model)[source]#
Convert a BioPython Model object to a Model object.
- Parameters:
model (BioPython Model object) – The model to convert.
- Returns:
The converted model.
- Return type:
Model
- property full_id#
A self-adjusting full_id for a Biopython Model
- get_chain(chain: str | int) Chain[source]#
Get a chain by its id.
- Parameters:
chain (str or int) – The chain id.
- Returns:
The chain.
- Return type:
Chain
- get_chains(*chains: str | int) List[Chain][source]#
Get all chains in the model.
- Parameters:
chains (str or int, optional) – The chain id to filter by.
- Returns:
The list of chains. If no chains argument is specified the default generator is returned.
- Return type:
List[Chain]
- level: str#
- link(chain)[source]#
Softlink a chain into this model’s child_list without touching the chain’s own parent references.
- matches(other) bool[source]#
Check if the model matches another model. This will return True if the two models have matching chains.
- property molecule#
- move(vector)[source]#
Move the model by a vector.
- Parameters:
vector (ndarray) – The vector to move the model by.
- classmethod new(id: int = None) Model[source]#
Create a blank model with a given id.
- Parameters:
id (int) – The model identifier.
- Returns:
The blank model.
- Return type:
Model
- parent: _Parent | None#
- property serial_num#
- property serial_number#
- to_biopython()[source]#
Convert a Model object to a pure BioPython Model object.
- Returns:
The converted model.
- Return type:
bio.Model.Model
- unlink(chain)[source]#
Unlink a chain from this model’s child_list without touching the chain’s own parent references.
- xtra#
- class buildamol.base_classes.Residue(resname, segid=' ', icode=1)[source]#
Bases: ID, Residue
A Residue object that inherits from Biopython’s Residue class.
- Parameters:
resname (str) – The residue name
segid (str) – The residue segid.
icode (int) – The residue icode. This is the residue serial number.
- add(atom)[source]#
Add an Atom object.
Checks for adding duplicate atoms, and raises a PDBConstructionException if so.
- property atoms#
- child_dict: dict[Any, _Child]#
- child_list: list[_Child]#
- property coord#
- copy() Residue[source]#
Return a deep copy of the residue with a new UUID4.
- Returns:
The copied residue.
- Return type:
Residue
- disordered#
- equals(other, include_serial: bool = False) bool[source]#
Check if the residue is equal to another residue. This will check if the two residues are in the same parent and if all atoms are matching.
- classmethod from_biopython(residue) Residue[source]#
Convert a BioPython Residue object to a Residue object.
- Parameters:
residue (BioPython Residue object) – The residue to convert.
- Returns:
The converted residue
- Return type:
Residue
- property full_id#
A self-adjusting full_id for a Biopython Residue
- get_atom(atom: str | int) Atom[source]#
Get an atom by its name or serial number.
- Parameters:
atom (str or int) – The atom name or serial number.
- Returns:
The atom.
- Return type:
Atom
- get_atoms(*atoms: str | int) List[Atom][source]#
Get all atoms in the residue.
- Parameters:
atoms (str or int, optional) – One or more atom names or serial numbers to filter by.
- Returns:
The list of matching atoms. If no atoms are specified, the default generator over all atoms is returned instead.
- Return type:
List[Atom]
- get_bonds(residue_internal: bool = True) list
Get a list of the bonds in which atoms of this residue participate.
- Parameters:
residue_internal (bool, optional) – If True (the default), only bonds between atoms of this residue are returned; if False, bonds to atoms outside of this residue are also included.
- get_neighbors(n: int = 1, mode: str = 'upto', filter: callable = None) set
Get the neighboring residues of this residue as they appear in the topology ResidueGraph.
- Parameters:
n (int) – The number of bonds to search for neighbors.
mode (str, optional) – The search mode. The default, “upto”, returns all neighbors within n bonds; “at” returns only neighbors that are exactly n bonds away.
filter (callable, optional) – A function that takes a residue as input and returns True if the residue should be included in the output. The default is None.
- property id
Return identifier.
- internal_coord
- level: str
- link(atom)
Softlink an atom into this residue’s child_list without touching the atom’s own parent references.
- matches(other) bool
Check if the residue matches another residue. This will return True if the two residues have the same resname, segid, and parent-chain id.
- property molecule
- move(vector)
Move the residue by a vector.
- Parameters:
vector (ndarray) – The vector to move the residue by.
- property name
Synonym for resname.
- classmethod new(resname: str, segid: str = ' ', icode: int = None) Residue
Create a blank residue with a given name and segid.
- Parameters:
resname (str) – The residue name.
segid (str, optional) – The residue segid. The default is a single space (' ').
icode (int, optional) – The residue icode. The default is None.
- Returns:
The blank residue.
- Return type:
Residue
- parent: _Parent | None
- resname
- segid
- to_biopython() Residue
Convert a Residue object to a pure BioPython Residue object.
- Returns:
The converted residue.
- Return type:
bio.Residue.Residue
- unlink(atom)
Unlink an atom from this residue’s child_list without touching the atom’s own parent references.
- xtra
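Residue.get_neighbors searches the topology ResidueGraph either for every residue within n bonds (“upto”) or only for residues exactly n bonds away (“at”). The following is a minimal breadth-first sketch of those two modes. It assumes the graph is a plain dict of bonded residue names, that “exactly n bonds away” means shortest-path distance, and that the starting residue is excluded from the result. The function name neighbors and the graph format are hypothetical.

```python
from collections import deque


def neighbors(graph, start, n=1, mode="upto", filter=None):
    """Hypothetical sketch of an 'upto' / 'at' neighbor search.

    graph maps every residue (here a plain string) to the residues bonded to it.
    """
    if mode not in ("upto", "at"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Breadth-first search stores the shortest bond distance to each residue.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == n:
            continue  # never search further than n bonds
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    if mode == "upto":
        hits = {r for r, d in dist.items() if 0 < d <= n}
    else:  # mode == "at"
        hits = {r for r, d in dist.items() if d == n}
    if filter is not None:
        hits = {r for r in hits if filter(r)}
    return hits


# A small branched chain: A - B - C - D, with E also bonded to B.
graph = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}
print(sorted(neighbors(graph, "A", n=2)))             # ['B', 'C', 'E']
print(sorted(neighbors(graph, "A", n=2, mode="at")))  # ['C', 'E']
```

In this sketch the filter callable runs after the search. It removes residues from the result but does not block paths through them. The reference text above does not say whether buildamol behaves the same way.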
- class buildamol.base_classes.Structure(id)
Bases: ID, Structure
A Structure object that inherits from Biopython’s Structure class.
- Parameters:
id (str) – The structure identifier
- child_dict: dict[Any, _Child]
- child_list: list[_Child]
- copy()
- Returns:
The copied structure.
- Return type:
Structure
- equals(other) bool
Check if the structure is equal to another structure. This will return True if the two structures have the same id and have equal models.
- classmethod from_biopython(structure: Structure) Structure
Convert a BioPython Structure object to a Structure object.
- Parameters:
structure (BioPython Structure object) – The structure to convert.
- Returns:
The converted structure.
- Return type:
Structure
- property full_id
- level: str
- link(model)
Softlink a model into this structure’s child_list without touching the model’s own parent references.
- matches(other) bool
Check if the structure matches another structure. This will return True if the two structures have the same id.
- property molecule
- move(vector)
Move the structure by a vector.
- Parameters:
vector (ndarray) – The vector to move the structure by.
- classmethod new(id: str) Structure
Create a blank structure with a given id.
- Parameters:
id (str) – The structure identifier.
- Returns:
The blank structure.
- Return type:
Structure
- parent: _Parent | None
- to_biopython() Structure
Convert a Structure object to a pure BioPython Structure object.
- Returns:
The converted structure.
- Return type:
bio.Structure.Structure
- unlink(model)
Unlink a model from this structure’s child_list without touching the model’s own parent references.
- xtra
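The move methods of Structure, Model and Residue each take a vector. Calling move on a container is expected to shift everything it contains. The following is a minimal sketch of that cascade through the Structure > Model > Chain > Residue > Atom hierarchy. It assumes move is an in-place rigid translation of every atom coordinate. The Atom and Entity classes are hypothetical stand-ins, and plain tuples replace the ndarray the documented method takes.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    coord: tuple  # (x, y, z); the real classes store an ndarray


@dataclass
class Entity:
    # Hypothetical container standing in for Structure, Model, Chain or Residue.
    children: list = field(default_factory=list)

    def get_atoms(self):
        # Walk the hierarchy depth-first and yield every atom below this entity.
        for child in self.children:
            if isinstance(child, Atom):
                yield child
            else:
                yield from child.get_atoms()

    def move(self, vector):
        # Translate every atom below this entity by the same vector, in place.
        for atom in self.get_atoms():
            atom.coord = tuple(c + v for c, v in zip(atom.coord, vector))


a1 = Atom((0.0, 0.0, 0.0))
a2 = Atom((1.0, 2.0, 3.0))
# Structure > Model > Chain > Residue > two atoms
structure = Entity([Entity([Entity([Entity([a1, a2])])])])
structure.move((1.0, 0.0, -1.0))
print(a1.coord, a2.coord)  # (1.0, 0.0, -1.0) (2.0, 2.0, 2.0)
```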