The graphs package
This module defines classes that represent PDB structures as graph objects, which store the connectivity information of a molecule. Two graphs are available: the AtomGraph, an atomic-scale representation of a molecule, and the ResidueGraph, which represents each residue as a single node in the graph and thereby abstracts away the atomic details.
The graphs provide many of the analytical methods used to analyze and manipulate molecular structures, as long as the operation does not involve adding or removing atoms.
The graphs also serve an important function in buildamol’s conformational optimization process as they are the main data structure to which the optimization algorithms are applied.
The BaseGraph module
The BaseGraph is the common base class of both the AtomGraph and the ResidueGraph.
The BaseGraph class
- class buildamol.graphs.base_graph.BaseGraph(*args, backend=None, **kwargs)
Bases: Graph
The basic class for molecular graphs.
- property atoms
Returns the atoms in the molecule
- property bonds
Returns the bonds in the molecule
- property central_node
Returns the most central node of the graph, computed based on the mean of all node coordinates.
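One plausible reading of this computation, sketched under the assumption that node coordinates are available as a plain dict mapping node labels to (x, y, z) tuples (a hypothetical stand-in for the graph's atoms, not the buildamol API): average all coordinates, then return the node closest to that mean.

```python
import math

def central_node(coords):
    """Return the node whose coordinates lie closest to the mean of all coordinates.

    coords: dict mapping node -> (x, y, z); a hypothetical stand-in for graph nodes.
    """
    n = len(coords)
    # Mean (centroid) of all node coordinates
    center = tuple(sum(c[i] for c in coords.values()) / n for i in range(3))
    # Pick the node with the smallest Euclidean distance to the centroid
    return min(coords, key=lambda node: math.dist(coords[node], center))


coords = {
    "A": (0.0, 0.0, 0.0),
    "B": (1.5, 0.0, 0.0),
    "C": (3.0, 0.0, 0.0),
}
print(central_node(coords))  # B
```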
- property chains
Returns the chains in the molecule
- direct_edges(root_node=None, edges: list = None) list
Sort the edges such that the first node in each edge is the one closer to the root node. If no root node is provided, the central node is used.
- Parameters:
root_node – The root node to use for sorting the edges. If not provided, the central node is used.
edges (list, optional) – The edges to sort, by default None, in which case all edges are sorted.
- Returns:
The sorted edges
- Return type:
list
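The directing step can be sketched with a breadth-first search from the root: each edge is oriented so that its first node has the smaller edge distance to the root. This is a minimal illustration on a plain adjacency dict (hypothetical helper names, not buildamol internals); it assumes a connected graph and an explicit root, whereas the documented method falls back to the central node.

```python
from collections import deque

def bfs_distances(adjacency, root):
    """Return the number of edges between root and every reachable node."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def direct_edges(adjacency, root, edges):
    """Orient each edge so that its first node is the one closer to root."""
    dist = bfs_distances(adjacency, root)
    directed = []
    for u, v in edges:
        # Swap the pair if the second node is closer to the root
        if dist[v] < dist[u]:
            u, v = v, u
        directed.append((u, v))
    return directed


adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(direct_edges(adjacency, "C", [("A", "B"), ("B", "C"), ("C", "D")]))
# [('B', 'A'), ('C', 'B'), ('C', 'D')]
```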
- draw()
Prepare a 3D view of the graph but do not show it yet
- Returns:
A 3D viewer
- Return type:
- find_cycles() list
Find all cycles in the graph
- Returns:
A list of cycles in the graph, where each cycle is a list of nodes
- Return type:
list
- find_edges(root_node=None, min_descendants: int = 1, min_ancestors: int = 1, max_descendants: int = None, max_ancestors: int = None, bond_order: int = None, exclude_cycles: bool = False, only_cycles: bool = False, exclude_locked: bool = False, only_locked: bool = False) list
Find edges in the graph according to the given criteria. Unlike find_rotatable_edges, this does not restrict the search to rotatable edges.
- Parameters:
root_node – A root node by which to direct the edges (closer to further).
min_descendants (int, optional) – The minimum number of descendants that an edge must have to be included.
min_ancestors (int, optional) – The minimum number of ancestors that an edge must have to be included.
max_descendants (int, optional) – The maximum number of descendants that an edge may have to be included.
max_ancestors (int, optional) – The maximum number of ancestors that an edge may have to be included.
bond_order (int or tuple, optional) – The bond order to filter by. If a tuple is given, the bond order must be one of the values in the tuple.
exclude_cycles (bool, optional) – Whether to exclude edges that are in cycles, by default False
only_cycles (bool, optional) – Whether to only include edges that are in cycles, by default False
exclude_locked (bool, optional) – Whether to exclude locked edges, by default False
only_locked (bool, optional) – Whether to only include locked edges, by default False
- Returns:
A list of edges matching the given criteria
- Return type:
list
- find_edges_in_cycles() set
Find all edges that connect nodes in cycles, where both nodes are in the same cycle
- Returns:
The edges in cycles
- Return type:
set
- find_nodes_in_cycles() set
Find all nodes that are in cycles
- Returns:
The nodes in cycles
- Return type:
set
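Both cycle queries can be illustrated with a standard graph-theory fact: an edge lies on a cycle exactly when it is not a bridge. The sketch below finds bridges with Tarjan's low-link depth-first search on a plain adjacency dict and keeps the remaining edges and their nodes. It is a swapped-in textbook technique for illustration, not necessarily how buildamol computes cycles, and its recursion suits small, simple graphs.

```python
def cycle_edges_and_nodes(adjacency):
    """Return (edges_in_cycles, nodes_in_cycles) for a simple undirected graph.

    An edge lies on a cycle exactly when it is not a bridge, so bridges are
    found with Tarjan's low-link DFS and every remaining edge is kept.
    """
    index, low = {}, {}
    bridges = set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        for neighbor in adjacency[node]:
            if neighbor == parent:
                continue
            if neighbor in index:
                # Back edge: a cycle reaches up to this already-visited node
                low[node] = min(low[node], index[neighbor])
            else:
                dfs(neighbor, node)
                low[node] = min(low[node], low[neighbor])
                # Nothing below neighbor reaches node or higher: a bridge
                if low[neighbor] > index[node]:
                    bridges.add(frozenset((node, neighbor)))

    for node in adjacency:
        if node not in index:
            dfs(node, None)

    all_edges = {frozenset((u, v)) for u in adjacency for v in adjacency[u]}
    edges_in_cycles = all_edges - bridges
    nodes_in_cycles = set().union(*edges_in_cycles)
    return edges_in_cycles, nodes_in_cycles


# A three-membered ring C1-C2-C3 with a tail atom C4 on C3
ring_with_tail = {
    "C1": {"C2", "C3"},
    "C2": {"C1", "C3"},
    "C3": {"C1", "C2", "C4"},
    "C4": {"C3"},
}
edges, nodes = cycle_edges_and_nodes(ring_with_tail)
print(sorted(nodes))  # ['C1', 'C2', 'C3']
```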
- find_rotatable_edges(root_node=None, min_descendants: int = 1, min_ancestors: int = 1, max_descendants: int = None, max_ancestors: int = None) list
Find all edges in the graph that are rotatable (i.e. not locked, single bonds, and not part of a cycle). The edges can also be filtered and directed.
- Parameters:
root_node – A root node by which to direct the edges (closer to further).
min_descendants (int, optional) – The minimum number of descendants that an edge must have to be considered rotatable.
min_ancestors (int, optional) – The minimum number of ancestors that an edge must have to be considered rotatable.
max_descendants (int, optional) – The maximum number of descendants that an edge must have to be considered rotatable.
max_ancestors (int, optional) – The maximum number of ancestors that an edge must have to be considered rotatable.
- Returns:
A list of rotatable edges
- Return type:
list
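The rotatability criteria listed above can be combined into a small filter. This sketch assumes hypothetical inputs: a plain adjacency dict, a dict of bond orders keyed by directed (node_1, node_2) tuples, and sets of locked and in-cycle edges stored as frozensets. It counts the nodes on each side of an edge to apply the min_descendants and min_ancestors thresholds and omits the max_* filters for brevity.

```python
from collections import deque

def side_of(adjacency, node_1, node_2):
    """Nodes reachable from node_2 without walking back over node_1."""
    seen = {node_1, node_2}
    queue = deque([node_2])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {node_1, node_2}

def rotatable_edges(adjacency, bond_order, locked, cycle_edges,
                    min_descendants=1, min_ancestors=1):
    """Keep single, unlocked edges outside cycles with enough nodes on each side.

    bond_order maps directed (node_1, node_2) tuples to an integer bond order;
    locked and cycle_edges are sets of frozenset({node_1, node_2}).
    """
    result = []
    for (u, v), order in bond_order.items():
        key = frozenset((u, v))
        if order != 1 or key in locked or key in cycle_edges:
            continue
        n_descendants = len(side_of(adjacency, u, v))
        n_ancestors = len(side_of(adjacency, v, u))
        if n_descendants >= min_descendants and n_ancestors >= min_ancestors:
            result.append((u, v))
    return result


# A five-atom chain A-B-C=D-E with a double bond between C and D
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
bond_order = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 2, ("D", "E"): 1}
print(rotatable_edges(adjacency, bond_order, locked=set(), cycle_edges=set()))
# [('B', 'C')]
```

Terminal bonds such as A-B are dropped because one side has no nodes to rotate, which is why the default thresholds of 1 matter.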
- get_ancestors(node_1, node_2, use_cache: bool = True)
Get all ancestor nodes that come before a specific edge, defined in the direction from node_1 to node_2 (i.e. all nodes that come before node_1). This method is directed, in contrast to the get_neighbors() method, which gets all neighboring nodes of an anchor node irrespective of direction.
- Parameters:
node_1 – The nodes that define the edge
node_2 – The nodes that define the edge
use_cache (bool, optional) – Whether to use the cache for the ancestors, by default True. If True and the graph has not received new nodes since the last time the cache was updated, a simple lookup is performed. Otherwise the ancestor nodes are recursively calculated again.
- Returns:
The ancestor nodes
- Return type:
set
Examples
In case of this graph:
```
A---B---C---D---E
     \
      F---H
          |
          G
```
>>> graph.get_ancestors("B", "C")
{"A", "F", "G", "H"}
>>> graph.get_ancestors("F", "B")
{"H", "G"}
>>> graph.get_ancestors("A", "B")
set()  # because in this direction there are no other nodes
- get_cycle(node, cycles=None) set
Get the cycle that a node is in
- Parameters:
node – The node to check
- Returns:
The nodes in the cycle that the node is in. If the node is not in a cycle, None is returned.
- Return type:
set
- get_descendants(node_1, node_2, use_cache: bool = True)
Get all descendant nodes that come after a specific edge defined in the direction from node1 to node2 (i.e. get all nodes that come after node2). This method is directed in contrast to the get_neighbors() method, which will get all neighboring nodes of an anchor node irrespective of direction.
- Parameters:
node_1 – The nodes that define the edge
node_2 – The nodes that define the edge
use_cache (bool, optional) – Whether to use the cache for the descendants, by default True. If True and the graph has not received new nodes since the last time the cache was updated, a simple lookup is performed. Otherwise the descendant nodes are recursively calculated again.
- Returns:
The descendant nodes
- Return type:
set
Examples
In case of this graph:
```
A---B---C---D---E
     \
      F---H
          |
          G
```
>>> graph.get_descendants("B", "C")
{"D", "E"}
>>> graph.get_descendants("B", "F")
{"H", "G"}
>>> graph.get_descendants("B", "A")
set()  # because in this direction there are no other nodes
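The directed traversal behind get_descendants and get_ancestors can be sketched as a breadth-first search that starts at one end of the edge and is never allowed to walk back across it; ancestors are then the descendants of the reversed edge. This minimal sketch uses a plain adjacency dict instead of a buildamol graph, ignores the caching controlled by use_cache, and only gives meaningful results for edges outside rings. It reproduces the example graph shown in the docstrings.

```python
from collections import deque

def get_descendants(adjacency, node_1, node_2):
    """Nodes reached from node_2 when the edge node_1 -> node_2 may not be walked back."""
    seen = {node_1, node_2}
    queue = deque([node_2])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {node_1, node_2}

def get_ancestors(adjacency, node_1, node_2):
    """Ancestors of node_1 -> node_2 are the descendants of the reversed edge."""
    return get_descendants(adjacency, node_2, node_1)


# The example graph from the docstrings
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "F"), ("F", "H"), ("H", "G")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

print(sorted(get_descendants(adjacency, "B", "C")))  # ['D', 'E']
print(sorted(get_ancestors(adjacency, "B", "C")))    # ['A', 'F', 'G', 'H']
print(get_descendants(adjacency, "B", "A"))          # set()
```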
- abstractmethod get_neighbors(node, n: int = 1, mode='upto')
Get the neighbors of a node
- Parameters:
node – The target node
n (int, optional) – The number of edges to separate the node from its neighbors.
mode (str, optional) – The mode to use for getting the neighbors, by default “upto”:
  - “upto”: get all neighbors up to a distance of n edges
  - “exact”: get all neighbors exactly n edges away
- Returns:
The neighbors of the node
- Return type:
set
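The two neighbor modes can be sketched with a depth-limited breadth-first search. This illustration works on a plain adjacency dict and assumes that “exactly n edges away” means a shortest-path distance of n; the concrete AtomGraph and ResidueGraph implementations may define it differently.

```python
from collections import deque

def get_neighbors(adjacency, node, n=1, mode="upto"):
    """Neighbors of node within n edges ("upto") or at exactly n edges ("exact")."""
    if mode not in ("upto", "exact"):
        raise ValueError(f"unknown mode: {mode!r}")
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if dist[current] == n:
            continue  # do not expand beyond n edges
        for neighbor in adjacency[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    if mode == "upto":
        return {other for other, d in dist.items() if 0 < d <= n}
    return {other for other, d in dist.items() if d == n}


adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(sorted(get_neighbors(adjacency, "A", n=2)))      # ['B', 'C']
print(get_neighbors(adjacency, "A", n=2, mode="exact"))  # {'C'}
```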
- in_cycle(node, cycles=None) bool
Check if a node is in a cycle
- Parameters:
node – The node to check
- Returns:
True if the node is in a cycle, False otherwise
- Return type:
bool
- in_same_cycle(node_1, node_2, cycles=None) bool
Check if two nodes are in the same cycle
- Parameters:
node_1 – The nodes to check
node_2 – The nodes to check
- Returns:
True if the nodes are in the same cycle, False otherwise
- Return type:
bool
- is_locked(node_1, node_2)
Check if an edge is locked
- Parameters:
node_1 – The nodes that define the edge
node_2 – The nodes that define the edge
- Returns:
Whether the edge is locked
- Return type:
bool
- lock_edge(node_1, node_2)
Lock an edge, preventing it from being rotated.
- Parameters:
node_1 – The nodes that define the edge
node_2 – The nodes that define the edge
- property nodes_in_cycles: set
Returns the nodes in cycles
- property residues
Returns the residues in the molecule
- rotate_around_edge(node_1, node_2, angle: float, descendants_only: bool = False, update_coords: bool = True)
Rotate descending nodes around a specific edge by a given angle.
- Parameters:
node_1 – The nodes that define the edge around which to rotate.
node_2 – The nodes that define the edge around which to rotate.
angle (float) – The angle to rotate by, in radians.
descendants_only (bool, optional) – Whether to only rotate the descending nodes, by default False, in which case the entire graph will be rotated.
update_coords (bool, optional) – Whether to update the coordinates of the nodes after rotation, by default True.
- Returns:
new_coords – The new coordinates of the nodes after rotation.
- Return type:
dict
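Rotating nodes about a bond axis is commonly done with Rodrigues' rotation formula. The sketch below applies it in pure Python to a hypothetical coordinate dict; which nodes to rotate (for example, the descendants of the edge) is passed in explicitly by the caller, and the function name and inputs are illustrative rather than buildamol's internals.

```python
import math

def rotate_around_edge(coords, node_1, node_2, angle, to_rotate):
    """Rotate the given nodes about the axis node_1 -> node_2 by angle (radians).

    Uses Rodrigues' rotation formula; returns a new dict of coordinates.
    """
    p1, p2 = coords[node_1], coords[node_2]
    axis = [b - a for a, b in zip(p1, p2)]
    length = math.sqrt(sum(c * c for c in axis))
    k = [c / length for c in axis]  # unit rotation axis
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    new_coords = dict(coords)
    for node in to_rotate:
        # Vector from a point on the axis (node_1) to the node
        v = [c - o for c, o in zip(coords[node], p1)]
        k_dot_v = sum(a * b for a, b in zip(k, v))
        k_cross_v = [
            k[1] * v[2] - k[2] * v[1],
            k[2] * v[0] - k[0] * v[2],
            k[0] * v[1] - k[1] * v[0],
        ]
        rotated = [
            v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1 - cos_a)
            for i in range(3)
        ]
        new_coords[node] = tuple(r + o for r, o in zip(rotated, p1))
    return new_coords


coords = {"A": (0.0, 0.0, 0.0), "B": (1.0, 0.0, 0.0), "C": (1.0, 1.0, 0.0)}
new = rotate_around_edge(coords, "A", "B", math.pi / 2, ["C"])
print([round(x, 6) for x in new["C"]])  # [1.0, 0.0, 1.0]
```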
- sample_edges(edges: list = None, n: int = 3, m: int = 3) list
Sample a number of rotatable edges from the graph. This is done by clustering the nodes to sample “representative” edges from each cluster, which is useful for subsampling the rotatable edges for an optimization in order to reduce the search space.
- Parameters:
edges (list, optional) – The edges to sample from, by default None, in which case all rotatable edges are sampled.
n (int) – The number of clusters to sample from.
m (int) – The number of edges to sample from each cluster
root_node – A root node to direct the edges (optional)
- Returns:
A list of sampled edges
- Return type:
list
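The sampling idea can be sketched with plain k-means. As a simplification, this version clusters edge midpoints rather than nodes, which is a swapped-in choice and not necessarily what buildamol does. It assumes hypothetical inputs (a list of edge tuples and a coordinate dict), runs Lloyd's algorithm for a fixed number of iterations, and then draws up to m edges at random from each of the n clusters with a seeded generator.

```python
import math
import random

def sample_edges(edges, coords, n=3, m=3, iterations=20, seed=0):
    """Return up to m edges from each of n spatial clusters of edge midpoints.

    edges: list of (node_1, node_2) tuples; coords: dict node -> (x, y, z).
    """
    rng = random.Random(seed)
    midpoints = {
        edge: tuple((a + b) / 2 for a, b in zip(coords[edge[0]], coords[edge[1]]))
        for edge in edges
    }
    # Plain k-means (Lloyd's algorithm) seeded with randomly chosen midpoints
    centers = rng.sample(list(midpoints.values()), min(n, len(edges)))
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for edge, point in midpoints.items():
            nearest = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(edge)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [
            tuple(sum(midpoints[e][d] for e in cluster) / len(cluster) for d in range(3))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    # Draw up to m edges at random from every cluster
    sampled = []
    for cluster in clusters:
        sampled.extend(rng.sample(cluster, min(m, len(cluster))))
    return sampled


# Two well-separated groups of three edges each along the x axis
coords = {i: (float(i), 0.0, 0.0) for i in (0, 1, 2, 3, 100, 101, 102, 103)}
edges = [(0, 1), (1, 2), (2, 3), (100, 101), (101, 102), (102, 103)]
print(len(sample_edges(edges, coords, n=2, m=1)))  # 2
```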
- search_by_constraints(constraints: list) list
Search for neighboring nodes that match a set of constraints.
- Parameters:
constraints (list) – A list of constraint functions, where each entry represents the constraints for a specific node. All constraints must be satisfied for all nodes in the neighborhood to be considered a match.
- Returns:
A list of dictionaries where each dictionary contains nodes that match the constraints. The keys represent the constraint index which the nodes satisfy and the values are the nodes themselves.
- Return type:
list
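One possible reading of this search, sketched under an explicit assumption: the constraints describe a chain of bonded nodes, so node i of the chain must satisfy constraints[i], and each match is returned as a dict from constraint index to node. The adjacency dict, node labels, and the starts_with helper below are hypothetical illustrations, not the buildamol API.

```python
def search_by_constraints(adjacency, constraints):
    """Find chains of bonded nodes where node i satisfies constraints[i].

    Each constraint is a callable taking a node and returning a bool.
    Returns a list of dicts mapping constraint index -> matching node.
    """
    matches = []

    def extend(path):
        if len(path) == len(constraints):
            matches.append(dict(enumerate(path)))
            return
        for neighbor in adjacency[path[-1]]:
            # Never revisit a node; the next node must satisfy the next constraint
            if neighbor not in path and constraints[len(path)](neighbor):
                extend(path + [neighbor])

    for start in adjacency:
        if constraints[0](start):
            extend([start])
    return matches


def starts_with(prefix):
    """Hypothetical constraint factory: match nodes whose label starts with prefix."""
    return lambda node: node.startswith(prefix)


# Heavy-atom chain of ethanol plus the hydroxyl hydrogen: C1-C2-O1-H1
adjacency = {"C1": {"C2"}, "C2": {"C1", "O1"}, "O1": {"C2", "H1"}, "H1": {"O1"}}
constraints = [starts_with("C"), starts_with("O"), starts_with("H")]
print(search_by_constraints(adjacency, constraints))
# [{0: 'C2', 1: 'O1', 2: 'H1'}]
```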
- property structure
Returns the underlying bio.PDB.Structure object
The AtomGraph module
The AtomGraph handles the atom connectivity within Molecule objects. It provides the bulk of the connectivity-related methods, such as get_neighbors.
The AtomGraph class
- class buildamol.graphs.AtomGraph(*args, backend=None, **kwargs)
Bases: BaseGraph
A graph representation of atoms and bonds in a contiguous molecule.
- draw()
Prepare a 3D view of the graph but do not show it yet
- Returns:
A 3D viewer
- Return type:
- classmethod from_biopython(structure, apply_standard_bonds: bool = True, infer_residue_connections: bool = True, infer_bonds: bool = False, max_bond_length: float = None, restrict_residues: bool = True, _topology=None)
Create an AtomGraph from a biopython structure
- Parameters:
structure – The biopython structure. This can be any biopython object that houses atoms.
infer_residue_connections (bool) – Whether to infer residue connecting bonds based on atom distances.
infer_bonds (bool) – Whether to infer bonds from the distance between atoms. If this is set to True, standard bonds cannot also be applied!
max_bond_length (float) – The maximum distance between atoms to infer a bond. If none is given, a default bond length is assumed.
restrict_residues (bool) – Whether to restrict to atoms of the same residue when inferring bonds. If set to False, this will also infer residue connecting bonds.
_topology – A specific reference topology to use when reconstructing any missing parts. By default, the CHARMM topology is used.
- Returns:
The AtomGraph representation of the molecule
- Return type:
AtomGraph
- classmethod from_molecule(mol, locked: bool = False)
Create an AtomGraph from a molecule
- Parameters:
mol (buildamol.molecule.Molecule) – The molecule to convert
locked (bool, optional) – If True, any information about locked bonds will also be transferred to the AtomGraph, by default False.
- Returns:
The AtomGraph representation of the molecule
- Return type:
AtomGraph
- get_neighbors(atom: Atom, n: int = 1, mode='upto')
Get the neighbors of a node
- Parameters:
atom (Atom) – The atom
n (int, optional) – The number of bonds to separate the atom from its neighbors.
mode (str, optional) – The mode to use for getting the neighbors, by default “upto”:
  - “upto”: get all neighbors up to a distance of n bonds
  - “exact”: get all neighbors exactly n bonds away
- Returns:
The neighbors of the atom
- Return type:
set
- migrate_bonds(other)
Migrate bonds from another graph
- Parameters:
other (AtomGraph) – The other graph to migrate bonds from
- search_by_constraints(constraints: list) list
Search for neighboring nodes that match a set of constraints.
- Parameters:
constraints (list) – A list of constraint functions, where each entry represents the constraints for a specific node. All constraints must be satisfied for all nodes in the neighborhood to be considered a match.
- Returns:
A list of dictionaries where each dictionary contains nodes that match the constraints. The keys represent the constraint index which the nodes satisfy and the values are the nodes themselves.
- Return type:
list
The ResidueGraph module
The ResidueGraph handles the residue connectivity within Molecule objects. It is an abstraction of the AtomGraph and provides many of the same methods. ResidueGraph objects serve as the primary input for the structural optimization algorithms in buildamol's optimizers package.
The ResidueGraph class
- class buildamol.graphs.ResidueGraph(*args, backend=None, **kwargs)
Bases: BaseGraph
A graph representation of residues bonded together, as an abstraction of a large contiguous molecule.
- add_atomic_bonds(*edges)
Add atom-level bonds to the graph.
- Parameters:
*edges – The edges to add
- property atomic_bonds
Get the atomic-level bonds in the molecule.
- Returns:
The atomic-level bonds in the molecule
- Return type:
dict
- centers_of_mass()
Get the centers of mass of the residues in the molecule.
- Returns:
The centers of mass of the residues in the molecule
- Return type:
dict
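A residue's center of mass is the mass-weighted mean of its atom positions. This sketch assumes a hypothetical input format (a dict from residue id to a list of (element, (x, y, z)) tuples) and a small table of standard atomic weights for common elements; buildamol's own implementation and mass table may differ.

```python
# Standard atomic weights (g/mol) for a few common elements; extend as needed
ATOMIC_MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def centers_of_mass(residues):
    """Mass-weighted mean position of each residue's atoms.

    residues: dict mapping residue id -> list of (element, (x, y, z)) tuples.
    """
    result = {}
    for res_id, atoms in residues.items():
        total = sum(ATOMIC_MASSES[element] for element, _ in atoms)
        result[res_id] = tuple(
            sum(ATOMIC_MASSES[element] * xyz[i] for element, xyz in atoms) / total
            for i in range(3)
        )
    return result


# A C=O fragment: the center of mass is pulled toward the heavier oxygen
com = centers_of_mass({"CO": [("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0))]})
print(round(com["CO"][0], 3))
```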
- draw()
Prepare a 3D view of the graph but do not show it yet
- Returns:
A 3D viewer
- Return type:
- find_rotatable_edges(root_node=None, min_descendants: int = 1, min_ancestors: int = 1, max_descendants: int = None, max_ancestors: int = None)
Find all edges in the graph that are rotatable (i.e. not locked, single bonds, and not part of a cycle). The edges can also be filtered and directed.
- Parameters:
root_node – A root node by which to direct the edges (closer to further).
min_descendants (int, optional) – The minimum number of descendants that an edge must have to be considered rotatable.
min_ancestors (int, optional) – The minimum number of ancestors that an edge must have to be considered rotatable.
max_descendants (int, optional) – The maximum number of descendants that an edge must have to be considered rotatable.
max_ancestors (int, optional) – The maximum number of ancestors that an edge must have to be considered rotatable.
- Returns:
A list of rotatable edges
- Return type:
list
- classmethod from_AtomGraph(atom_graph, infer_connections: bool = None)
Create a ResidueGraph from an AtomGraph.
- Parameters:
atom_graph (AtomGraph) – The AtomGraph representation of the molecule
infer_connections (bool) – Whether to infer the bonds between residues from the atom-level bonds. If the AtomGraph already contains atom-level bonds that connect different residues, this is not necessary. If this is set to None, connections will be inferred automatically if no atom-level bonds are present in the AtomGraph.
- Returns:
The ResidueGraph representation of the molecule
- Return type:
ResidueGraph
- classmethod from_molecule(mol, detailed: bool = False, locked: bool = True)
Create a ResidueGraph from a molecule object.
- Parameters:
mol (Molecule) – The molecule object
detailed (bool) – Whether to make a “detailed” residue graph representation including the atomic-scale bonds between residues. If True, locked bonds can be directly migrated from the molecule.
locked (bool) – Whether to migrate locked bonds from the molecule. This is only possible if detailed is True.
- Returns:
The ResidueGraph representation of the molecule
- Return type:
ResidueGraph
- get_neighbors(residue: Residue, n: int = 1, mode='upto')
Get the neighbors of a residue
- Parameters:
residue (bio.Residue.Residue) – The target residue
n (int, optional) – The number of connections to separate the residue from its neighbors.
mode (str, optional) – The mode to use for getting the neighbors, by default “upto”:
  - “upto”: get all neighbors up to a distance of n bonds
  - “exact”: get all neighbors exactly n bonds away
- Returns:
The neighbors of the residue
- Return type:
set
- lock_centers()
Lock any edges that connect residue centers of mass to their constituent atoms. This only applies to detailed graphs.
- make_detailed(include_samples: bool = True, include_far_away: bool = False, include_heteroatoms: bool = False, include_clashes: bool = True, n_samples: int | float = 0.5, f: float = 1.0, no_hydrogens: bool = False) ResidueGraph
Use a detailed representation of the residues in the molecule by adding the specific atoms that connect the residues together. This is useful for visualization and analysis.
Note
This function is not reversible. It is applied in-place.
- Parameters:
include_samples (bool) – If True, a number of atoms are sampled from each residue and included in the detailed representation.
include_far_away (bool) – If True, atoms that are not involved in residue connections are also included if their distance to the residue’s center of mass is greater than f * the 75th percentile of atom distances to the residue’s center of mass.
include_heteroatoms (bool) – If True, all hetero-atoms are included in the detailed representation, regardless of their distance to the residue center of mass.
include_clashes (bool) – If True, all atoms that are involved in a clash are included in the detailed representation.
n_samples (int or float) – The number or fraction of atoms to sample from each residue if include_samples is True. If a fraction in range (0,1) is given instead of an integer, the number of atoms to sample is adjusted according to the residue size.
f (float) – The factor by which the 75th percentile of atom distances to the residue’s center of mass is multiplied to determine the cutoff distance for outlier atoms. This is only used if include_far_away is True.
no_hydrogens (bool) – If True, hydrogens are not included in the detailed representation.
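The far-away cutoff described for include_far_away and f can be sketched as follows: measure each atom's distance to the residue's center of mass, take the 75th percentile of those distances, multiply by f, and keep the atoms beyond that cutoff. The function name and dict-based input are hypothetical, and the percentile uses linear interpolation between sorted values (statistics.quantiles with method="inclusive"), which may differ slightly from buildamol's own choice.

```python
import math
import statistics

def far_away_atoms(atom_coords, center, f=1.0):
    """Atoms whose distance to center exceeds f * the 75th percentile distance.

    atom_coords: dict mapping atom name -> (x, y, z); needs at least two atoms.
    """
    distances = {name: math.dist(xyz, center) for name, xyz in atom_coords.items()}
    # method="inclusive" interpolates linearly between the sorted distances
    q75 = statistics.quantiles(distances.values(), n=4, method="inclusive")[2]
    cutoff = f * q75
    return {name for name, d in distances.items() if d > cutoff}


atoms = {
    "a": (1.0, 0.0, 0.0),
    "b": (2.0, 0.0, 0.0),
    "c": (3.0, 0.0, 0.0),
    "d": (4.0, 0.0, 0.0),
    "e": (10.0, 0.0, 0.0),
}
print(far_away_atoms(atoms, (0.0, 0.0, 0.0)))  # {'e'}
```

Raising f widens the cutoff, so fewer atoms qualify as far away.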
- prune_triplets()
Prune bond triangles where two nodes from the same residue are connected to each other and the residue…
- property residues
Get the residues in the molecule.
- Returns:
The residues in the molecule
- Return type:
list
- search_by_constraints(constraints: list) list
Search for neighboring nodes that match a set of constraints.
- Parameters:
constraints (list) – A list of constraint functions, where each entry represents the constraints for a specific node. All constraints must be satisfied for all nodes in the neighborhood to be considered a match.
- Returns:
A list of dictionaries where each dictionary contains nodes that match the constraints. The keys represent the constraint index which the nodes satisfy and the values are the nodes themselves.
- Return type:
list