The graphs package

This module defines classes that represent PDB structures as graph objects, which store the molecule's connectivity information. Two graphs are available: the AtomGraph, an atomic-scale representation of a molecule, and the ResidueGraph, which represents each residue as a single node and thereby abstracts away the atomic details.

The graphs contain many of the analytical methods for molecular structure analysis and manipulation that do not involve adding or removing atoms.

The graphs also serve an important function in buildamol’s conformational optimization process as they are the main data structure to which the optimization algorithms are applied.

The BaseGraph module

The BaseGraph is the common base class of both the AtomGraph and the ResidueGraph.

The BaseGraph class
class buildamol.graphs.base_graph.BaseGraph(*args, backend=None, **kwargs)[source]#

Bases: Graph

The basic class for molecular graphs

property atoms#

Returns the atoms in the molecule

property bonds#

Returns the bonds in the molecule

property central_node#

Returns the most central node of the graph. This is computed based on the mean of all node coordinates.
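As an illustration of the idea, the following plain-Python sketch picks the node closest to the mean of all coordinates. It is not buildamol's implementation: the function name, the `coords` dictionary layout, and the choice of "nearest to the mean" as the tie-breaking rule are assumptions made for this example.

```python
import math

def central_node(coords):
    """Return the node whose coordinates lie closest to the mean of all coordinates."""
    n = len(coords)
    # Component-wise mean of all node positions
    center = tuple(sum(c[i] for c in coords.values()) / n for i in range(3))
    # Pick the node nearest to that mean position
    return min(coords, key=lambda node: math.dist(coords[node], center))

# Hypothetical coordinates for four nodes
coords = {
    "A": (0.0, 0.0, 0.0),
    "B": (1.0, 0.0, 0.0),
    "C": (2.0, 0.0, 0.0),
    "D": (4.0, 1.0, 0.0),
}
print(central_node(coords))  # C
```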

property chains#

Returns the chains in the molecule

clear_cache()[source]#

Clear the descendant cache

direct_edges(root_node=None, edges: list = None) list[source]#

Sort the edges such that the first node in each edge is the one closer to the root node. If no root node is provided, the central node is used.

Parameters:
  • root_node – The root node to use for sorting the edges. If not provided, the central node is used.

  • edges (list, optional) – The edges to sort, by default None, in which case all edges are sorted.

Returns:

The sorted edges

Return type:

list
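The sorting can be pictured with a small stand-alone sketch. It assumes that "closer" means fewer edges away from the root, which the description above does not spell out; the adjacency dictionary and the function signature are hypothetical and do not mirror the buildamol API.

```python
from collections import deque

def direct_edges(adjacency, edges, root):
    """Orient each edge so that the node closer to `root` comes first."""
    # Breadth-first search yields the number of edges between root and every node
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    # Flip every edge whose first node is further from the root than its second
    return [(a, b) if dist[a] <= dist[b] else (b, a) for a, b in edges]

adjacency = {"A": ["B"], "B": ["A", "C", "F"], "C": ["B", "D"], "D": ["C"], "F": ["B"]}
edges = [("B", "A"), ("D", "C"), ("B", "F")]
directed = direct_edges(adjacency, edges, root="A")
print(directed)  # [('A', 'B'), ('C', 'D'), ('B', 'F')]
```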

draw()[source]#

Prepare a 3D view of the graph but do not show it yet

Returns:

A 3D viewer

Return type:

PlotlyViewer3D

find_cycles() list[source]#

Find all cycles in the graph

Returns:

A list of cycles in the graph, where each cycle is a list of nodes

Return type:

list

find_edges(root_node=None, min_descendants: int = 1, min_ancestors: int = 1, max_descendants: int = None, max_ancestors: int = None, bond_order: int = None, exclude_cycles: bool = False, only_cycles: bool = False, exclude_locked: bool = False, only_locked: bool = False) list[source]#

Find edges in the graph according to the given criteria. Unlike find_rotatable_edges, this does not restrict the search to rotatable edges.

Parameters:
  • root_node – A root node by which to direct the edges (closer to further).

  • min_descendants (int, optional) – The minimum number of descendants that an edge must have to be included.

  • min_ancestors (int, optional) – The minimum number of ancestors that an edge must have to be included.

  • max_descendants (int, optional) – The maximum number of descendants that an edge may have to be included.

  • max_ancestors (int, optional) – The maximum number of ancestors that an edge may have to be included.

  • bond_order (int or tuple, optional) – The bond order to filter by. If a tuple is given, the bond order must be one of the values in the tuple.

  • exclude_cycles (bool, optional) – Whether to exclude edges that are in cycles, by default False

  • only_cycles (bool, optional) – Whether to only include edges that are in cycles, by default False

  • exclude_locked (bool, optional) – Whether to exclude locked edges, by default False

  • only_locked (bool, optional) – Whether to only include locked edges, by default False

Returns:

A list of edges that match the criteria

Return type:

list

find_edges_in_cycles() set[source]#

Find all edges that connect nodes in cycles, where both nodes are in the same cycle

Returns:

The edges in cycles

Return type:

set
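One simple criterion for "an edge lies in a cycle" is that its two end nodes remain connected after the edge itself is removed. The sketch below applies that criterion to a plain adjacency dictionary. It is an illustration under that assumption, not the algorithm buildamol uses, and the helper names are made up for the example.

```python
def reachable(adjacency, start, blocked_edge):
    """Collect all nodes reachable from `start` without crossing `blocked_edge`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nb in adjacency[node]:
            if {node, nb} == set(blocked_edge) or nb in seen:
                continue
            seen.add(nb)
            stack.append(nb)
    return seen

def edges_in_cycles(adjacency):
    """Return every edge whose two ends stay connected once the edge is removed."""
    found = set()
    for a in adjacency:
        for b in adjacency[a]:
            if b in reachable(adjacency, a, (a, b)):
                found.add(frozenset((a, b)))
    return found

# A three-membered ring (A, B, C) with a tail C-D
adjacency = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
ring_edges = edges_in_cycles(adjacency)
```

In this example the tail edge C-D is not reported, because D becomes unreachable from C as soon as that edge is blocked.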

find_nodes_in_cycles() set[source]#

Find all nodes that are in cycles

Returns:

The nodes in cycles

Return type:

set

find_rotatable_edges(root_node=None, min_descendants: int = 1, min_ancestors: int = 1, max_descendants: int = None, max_ancestors: int = None) list[source]#

Find all edges in the graph that are rotatable (i.e. not locked, single, and not in a circular constellation). You can also filter and direct the edges.

Parameters:
  • root_node – A root node by which to direct the edges (closer to further).

  • min_descendants (int, optional) – The minimum number of descendants that an edge must have to be considered rotatable.

  • min_ancestors (int, optional) – The minimum number of ancestors that an edge must have to be considered rotatable.

  • max_descendants (int, optional) – The maximum number of descendants that an edge may have to be considered rotatable.

  • max_ancestors (int, optional) – The maximum number of ancestors that an edge may have to be considered rotatable.

Returns:

A list of rotatable edges

Return type:

list
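The criteria listed above (unlocked, single bond, not part of a cycle, enough nodes on either side) can be combined as in the following sketch. All names and data structures here (`orders` as a mapping from edge to bond order, `locked` as a set of frozensets) are assumptions for illustration only; the actual method works on the graph object itself.

```python
def side(adjacency, start, other):
    """Nodes on the `start` side of the edge (start, other), excluding `start`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nb in adjacency[node]:
            if nb in seen or {node, nb} == {start, other}:
                continue  # never cross the edge under consideration
            seen.add(nb)
            stack.append(nb)
    return seen - {start}

def rotatable_edges(adjacency, orders, locked, min_descendants=1, min_ancestors=1):
    result = []
    for (a, b), order in orders.items():
        if order != 1 or frozenset((a, b)) in locked:
            continue  # only unlocked single bonds qualify
        ancestors, descendants = side(adjacency, a, b), side(adjacency, b, a)
        if b in ancestors:
            continue  # both ends are still connected, so the edge is part of a ring
        if len(ancestors) >= min_ancestors and len(descendants) >= min_descendants:
            result.append((a, b))
    return result

adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D", "E"], "D": ["C"], "E": ["C"]}
orders = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 2, ("C", "E"): 1}
print(rotatable_edges(adjacency, orders, locked=set()))  # [('B', 'C')]
```

Here A-B and C-E are dropped because one side of the edge is empty, and C-D is dropped because it is a double bond.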

get_ancestors(node_1, node_2, use_cache: bool = True)[source]#

Get all ancestor nodes that come before a specific edge defined in the direction from node_1 to node_2 (i.e. get all nodes that come before node_1). Unlike the get_neighbors() method, which returns all neighboring nodes of an anchor node irrespective of direction, this method is directed.

Parameters:
  • node_1 – The nodes that define the edge

  • node_2 – The nodes that define the edge

  • use_cache (bool, optional) – Whether to use the cache for the ancestors, by default True. If True and the graph has not received new nodes since the last time the cache was updated, a simple lookup is performed. Otherwise the ancestor nodes are recursively calculated again.

Returns:

The ancestor nodes

Return type:

set

Examples

In case of this graph:

A---B---C---D---E
    \
    F---H
    |
    G


>>> graph.get_ancestors("B", "C")
{"A", "F", "G", "H"}
>>> graph.get_ancestors("F", "B")
{"H", "G"}
>>> graph.get_ancestors("A", "B")
set() # because in this direction there are no other nodes
get_cycle(node, cycles=None) set[source]#

Get the cycle that a node is in

Parameters:

node – The node to check

Returns:

The nodes in the cycle that the node is in. If the node is not in a cycle, None is returned.

Return type:

set

get_descendants(node_1, node_2, use_cache: bool = True)[source]#

Get all descendant nodes that come after a specific edge defined in the direction from node_1 to node_2 (i.e. get all nodes that come after node_2). Unlike the get_neighbors() method, which returns all neighboring nodes of an anchor node irrespective of direction, this method is directed.

Parameters:
  • node_1 – The nodes that define the edge

  • node_2 – The nodes that define the edge

  • use_cache (bool, optional) – Whether to use the cache for the descendants, by default True. If True and the graph has not received new nodes since the last time the cache was updated, a simple lookup is performed. Otherwise the descendant nodes are recursively calculated again.

Returns:

The descendant nodes

Return type:

set

Examples

In case of this graph:

A---B---C---D---E
    \
    F---H
    |
    G


>>> graph.get_descendants("B", "C")
{"D", "E"}
>>> graph.get_descendants("B", "F")
{"H", "G"}
>>> graph.get_descendants("B", "A")
set() # because in this direction there are no other nodes
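
The directed traversal behind get_descendants and get_ancestors can be reproduced on the example graph with a few lines of plain Python. This is a minimal sketch without caching, valid only for edges that are not part of a cycle; the free functions and the adjacency dictionary are stand-ins for the graph methods, not the library code.

```python
def get_descendants(adjacency, node_1, node_2):
    """Nodes that come after node_2 when walking away from node_1."""
    seen, stack = {node_1, node_2}, [node_2]
    while stack:
        node = stack.pop()
        for nb in adjacency[node]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen - {node_1, node_2}

def get_ancestors(adjacency, node_1, node_2):
    """Ancestors of (node_1, node_2) are the descendants of the reversed edge."""
    return get_descendants(adjacency, node_2, node_1)

# The example graph from above: A-B-C-D-E with the branch B-F, F-H, F-G
adjacency = {
    "A": ["B"], "B": ["A", "C", "F"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"],
    "F": ["B", "G", "H"], "G": ["F"], "H": ["F"],
}
print(get_descendants(adjacency, "B", "C") == {"D", "E"})          # True
print(get_ancestors(adjacency, "B", "C") == {"A", "F", "G", "H"})  # True
```
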
get_locked_edges()[source]#

Get all locked edges

Returns:

The locked edges

Return type:

set

abstractmethod get_neighbors(node, n: int = 1, mode='upto')[source]#

Get the neighbors of a node

Parameters:
  • node – The target node

  • n (int, optional) – The number of edges to separate the node from its neighbors.

  • mode (str, optional) – The mode to use for getting the neighbors, by default “upto”. With “upto”, all neighbors up to a distance of n edges are returned; with “exact”, only neighbors exactly n edges away are returned.

Returns:

The neighbors of the node

Return type:

set
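The difference between the two modes is easiest to see in a small breadth-first search. The sketch below assumes that the distance between two nodes is the length of the shortest path between them; it uses a plain adjacency dictionary rather than a graph object, so the signature differs from the abstract method documented here.

```python
from collections import deque

def get_neighbors(adjacency, node, n=1, mode="upto"):
    """Return the neighbors of `node` within (or exactly at) n edges."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if dist[current] == n:
            continue  # do not expand beyond n edges
        for nb in adjacency[current]:
            if nb not in dist:
                dist[nb] = dist[current] + 1
                queue.append(nb)
    if mode == "upto":
        return {k for k, d in dist.items() if 0 < d <= n}
    if mode == "exact":
        return {k for k, d in dist.items() if d == n}
    raise ValueError(f"unknown mode: {mode!r}")

# A linear chain A-B-C-D
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(get_neighbors(adjacency, "A", n=2, mode="upto") == {"B", "C"})  # True
print(get_neighbors(adjacency, "A", n=2, mode="exact") == {"C"})      # True
```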

get_unlocked_edges()[source]#

Get all unlocked edges

Returns:

The unlocked edges

Return type:

set

in_cycle(node, cycles=None) bool[source]#

Check if a node is in a cycle

Parameters:

node – The node to check

Returns:

True if the node is in a cycle, False otherwise

Return type:

bool

in_same_cycle(node_1, node_2, cycles=None) bool[source]#

Check if two nodes are in the same cycle

Parameters:
  • node_1 – The nodes to check

  • node_2 – The nodes to check

Returns:

True if the nodes are in the same cycle, False otherwise

Return type:

bool

is_locked(node_1, node_2)[source]#

Check if an edge is locked

Parameters:
  • node_1 – The nodes that define the edge

  • node_2 – The nodes that define the edge

Returns:

Whether the edge is locked

Return type:

bool

lock_all()[source]#

Lock all edges

lock_edge(node_1, node_2)[source]#

Lock an edge, preventing it from being rotated.

Parameters:
  • node_1 – The nodes that define the edge

  • node_2 – The nodes that define the edge

property nodes_in_cycles: set#

Returns the nodes in cycles

property residues#

Returns the residues in the molecule

rotate_around_edge(node_1, node_2, angle: float, descendants_only: bool = False, update_coords: bool = True)[source]#

Rotate descending nodes around a specific edge by a given angle.

Parameters:
  • node_1 – The nodes that define the edge around which to rotate.

  • node_2 – The nodes that define the edge around which to rotate.

  • angle (float) – The angle to rotate by, in radians.

  • descendants_only (bool, optional) – Whether to only rotate the descending nodes, by default False, in which case the entire graph will be rotated.

  • update_coords (bool, optional) – Whether to update the coordinates of the nodes after rotation, by default True.

Returns:

new_coords – The new coordinates of the nodes after rotation.

Return type:

dict
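Geometrically, rotating nodes around an edge means rotating their coordinates about the line through the two edge nodes. The following sketch uses Rodrigues' rotation formula with the standard library only. It is not the buildamol implementation: the function names, the `moving` argument (the set of nodes to rotate, for instance the descendants of the edge), and the right-handed sense of rotation about the axis from node_1 to node_2 are assumptions of this example.

```python
import math

def rotate_about_axis(point, origin, axis_end, angle):
    """Rotate `point` by `angle` radians about the line from `origin` to `axis_end`."""
    # Unit vector along the rotation axis
    k = [e - o for e, o in zip(axis_end, origin)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]
    # Position of the point relative to a point on the axis
    v = [p - o for p, o in zip(point, origin)]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    dot = sum(a * b for a, b in zip(k, v))
    cross = [k[1] * v[2] - k[2] * v[1], k[2] * v[0] - k[0] * v[2], k[0] * v[1] - k[1] * v[0]]
    # Rodrigues' formula: v*cos + (k x v)*sin + k*(k.v)*(1 - cos)
    rotated = [v[i] * cos_a + cross[i] * sin_a + k[i] * dot * (1 - cos_a) for i in range(3)]
    return tuple(r + o for r, o in zip(rotated, origin))

def rotate_around_edge(coords, node_1, node_2, angle, moving):
    """Return new coordinates with the `moving` nodes rotated about the edge."""
    new_coords = dict(coords)  # the input dictionary is left untouched
    for node in moving:
        new_coords[node] = rotate_about_axis(coords[node], coords[node_1], coords[node_2], angle)
    return new_coords

coords = {"A": (0.0, 0.0, 0.0), "B": (1.0, 0.0, 0.0), "C": (1.0, 1.0, 0.0)}
# Rotate C by 90 degrees about the A-B axis; it moves from the y to the z direction
new_coords = rotate_around_edge(coords, "A", "B", math.pi / 2, moving={"C"})
```

Returning a new dictionary instead of modifying the input loosely corresponds to calling the method with update_coords set to False.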

sample_edges(edges: list = None, n: int = 3, m: int = 3) list[source]#

Sample a number of rotatable edges from the graph. This is done by clustering the nodes together to sample “representative” edges from each cluster. This is useful for subsampling the rotatable edges for an optimization to reduce the search space.

Parameters:
  • edges (list, optional) – The edges to sample from, by default None, in which case all rotatable edges are sampled.

  • n (int) – The number of clusters to sample from.

  • m (int) – The number of edges to sample from each cluster

  • root_node – A root node to direct the edges (optional)

Returns:

A list of sampled edges

Return type:

list

search_by_constraints(constraints: list) list[source]#

Search for neighboring nodes that match a set of constraints.

Parameters:

constraints (list) – A list of constraint functions, where each entry represents the constraints for a specific node. All constraints must be satisfied for all nodes in the neighborhood to be considered a match.

Returns:

A list of dictionaries where each dictionary contains nodes that match the constraints. The keys represent the constraint index which the nodes satisfy and the values are the nodes themselves.

Return type:

list

show()[source]#

Show the graph

property structure#

Returns the underlying bio.PDB.Structure object

unlock_all()[source]#

Unlock all edges

unlock_edge(node_1, node_2)[source]#

Unlock an edge, allowing it to be rotated.

Parameters:
  • node_1 – The nodes that define the edge

  • node_2 – The nodes that define the edge

The AtomGraph module

The AtomGraph handles the atom connectivity within Molecule objects. It provides the bulk of the connectivity-related methods, such as get_neighbors.

The AtomGraph class
class buildamol.graphs.AtomGraph(*args, backend=None, **kwargs)[source]#

Bases: BaseGraph

A graph representation of atoms and bonds in a contiguous molecule.

draw()[source]#

Prepare a 3D view of the graph but do not show it yet

Returns:

A 3D viewer

Return type:

PlotlyViewer3D

classmethod from_biopython(structure, apply_standard_bonds: bool = True, infer_residue_connections: bool = True, infer_bonds: bool = False, max_bond_length: float = None, restrict_residues: bool = True, _topology=None)[source]#

Create an AtomGraph from a Biopython structure

Parameters:
  • structure – The Biopython structure. This can be any Biopython object that houses atoms.

  • infer_residue_connections (bool) – Whether to infer residue connecting bonds based on atom distances.

  • infer_bonds (bool) – Whether to infer bonds from the distance between atoms. If this is set to True, standard bonds cannot also be applied.

  • max_bond_length (float) – The maximum distance between atoms to infer a bond. If none is given, a default bond length is assumed.

  • restrict_residues (bool) – Whether to restrict to atoms of the same residue when inferring bonds. If set to False, this will also infer residue connecting bonds.

  • _topology – A specific reference topology to use when re-constructing any missing parts. By default the default CHARMM topology is used.

Returns:

The AtomGraph representation of the molecule

Return type:

AtomGraph

classmethod from_molecule(mol, locked: bool = False)[source]#

Create an AtomGraph from a molecule

Parameters:
  • mol (buildamol.molecule.Molecule) – The molecule to convert

  • locked (bool, optional) – If True, any information about locked bonds will also be transferred to the AtomGraph, by default False.

Returns:

The AtomGraph representation of the molecule

Return type:

AtomGraph

get_neighbors(atom: Atom, n: int = 1, mode='upto')[source]#

Get the neighbors of a node

Parameters:
  • atom (Atom) – The atom

  • n (int, optional) – The number of bonds to separate the atom from its neighbors.

  • mode (str, optional) – The mode to use for getting the neighbors, by default “upto”. With “upto”, all neighbors up to a distance of n bonds are returned; with “exact”, only neighbors exactly n bonds away are returned.

Returns:

The neighbors of the atom

Return type:

set

migrate_bonds(other)[source]#

Migrate bonds from another graph

Parameters:

other (AtomGraph) – The other graph to migrate bonds from

search_by_constraints(constraints: list) list[source]#

Search for neighboring nodes that match a set of constraints.

Parameters:

constraints (list) – A list of constraint functions, where each entry represents the constraints for a specific node. All constraints must be satisfied for all nodes in the neighborhood to be considered a match.

Returns:

A list of dictionaries where each dictionary contains nodes that match the constraints. The keys represent the constraint index which the nodes satisfy and the values are the nodes themselves.

Return type:

list

The ResidueGraph module

The ResidueGraph handles the residue connectivity within Molecule objects. It is an abstraction of the AtomGraph and provides many of the same methods. ResidueGraph objects serve as the primary input for the structural optimization algorithms in the optimizers package of buildamol.

The ResidueGraph class
class buildamol.graphs.ResidueGraph(*args, backend=None, **kwargs)[source]#

Bases: BaseGraph

A graph representation of residues bonded together as an abstraction of a large contiguous molecule.

add_atomic_bonds(*edges)[source]#

Add atom-level bonds to the graph.

Parameters:

*edges – The edges to add

property atomic_bonds#

Get the atomic-level bonds in the molecule.

Returns:

The atomic-level bonds in the molecule

Return type:

dict

centers_of_mass()[source]#

Get the centers of mass of the residues in the molecule.

Returns:

The centers of mass of the residues in the molecule

Return type:

dict
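A center of mass is the mass-weighted mean of the atom positions of a residue. The sketch below computes it for a hypothetical input layout (a dictionary from residue id to a list of `(mass, xyz)` pairs, with rounded atomic masses); how the real method keys its result dictionary is not stated here, so the residue ids used as keys are an assumption.

```python
def centers_of_mass(residues):
    """Map each residue id to the mass-weighted mean position of its atoms."""
    centers = {}
    for res_id, atoms in residues.items():
        total = sum(mass for mass, _ in atoms)
        centers[res_id] = tuple(
            sum(mass * xyz[i] for mass, xyz in atoms) / total for i in range(3)
        )
    return centers

# Hypothetical residues with (mass, coordinates) per atom
residues = {
    "GLC1": [(12.0, (0.0, 0.0, 0.0)), (12.0, (2.0, 0.0, 0.0))],
    "GLC2": [(16.0, (4.0, 0.0, 0.0)), (1.0, (4.0, 17.0, 0.0))],
}
centers = centers_of_mass(residues)
print(centers["GLC1"])  # (1.0, 0.0, 0.0)
```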

draw()[source]#

Prepare a 3D view of the graph but do not show it yet

Returns:

A 3D viewer

Return type:

PlotlyViewer3D

find_rotatable_edges(root_node=None, min_descendants: int = 1, min_ancestors: int = 1, max_descendants: int = None, max_ancestors: int = None)[source]#

Find all edges in the graph that are rotatable (i.e. not locked, single, and not in a circular constellation). You can also filter and direct the edges.

Parameters:
  • root_node – A root node by which to direct the edges (closer to further).

  • min_descendants (int, optional) – The minimum number of descendants that an edge must have to be considered rotatable.

  • min_ancestors (int, optional) – The minimum number of ancestors that an edge must have to be considered rotatable.

  • max_descendants (int, optional) – The maximum number of descendants that an edge may have to be considered rotatable.

  • max_ancestors (int, optional) – The maximum number of ancestors that an edge may have to be considered rotatable.

Returns:

A list of rotatable edges

Return type:

list

classmethod from_AtomGraph(atom_graph, infer_connections: bool = None)[source]#

Create a ResidueGraph from an AtomGraph.

Parameters:
  • atom_graph (AtomGraph) – The AtomGraph representation of the molecule

  • infer_connections (bool) – Whether to infer the bonds between residues from the atom-level bonds. If the AtomGraph already contains atom-level bonds that connect different residues, this is not necessary. If this is set to None, connections will be inferred automatically if no atom-level bonds are present in the AtomGraph.

Returns:

The ResidueGraph representation of the molecule

Return type:

ResidueGraph
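Conceptually, a residue-level graph can be derived from atom-level bonds by keeping only the bonds whose two atoms belong to different residues. The following sketch shows that reduction on plain dictionaries; the function name, the atom and residue labels, and the returned structures are hypothetical and only illustrate the abstraction, not the classmethod's internals.

```python
def residue_graph_from_bonds(atom_residue, bonds):
    """Collapse atom-level bonds into residue-level edges.

    Returns the residue edges and, per residue edge, the atomic bonds behind it.
    """
    atomic_bonds = {}
    for a, b in bonds:
        res_a, res_b = atom_residue[a], atom_residue[b]
        if res_a == res_b:
            continue  # bonds inside one residue disappear in the abstraction
        atomic_bonds.setdefault(frozenset((res_a, res_b)), []).append((a, b))
    return set(atomic_bonds), atomic_bonds

# Two residues joined by a single bond between O4 and C1'
atom_residue = {"C1": "R1", "O4": "R1", "C1'": "R2", "O5'": "R2"}
bonds = [("C1", "O4"), ("O4", "C1'"), ("C1'", "O5'")]
residue_edges, atomic_bonds = residue_graph_from_bonds(atom_residue, bonds)
```

Keeping the atom pair behind each residue edge is what makes a lookup such as get_atomic_bond possible in a design like this.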

classmethod from_molecule(mol, detailed: bool = False, locked: bool = True)[source]#

Create a ResidueGraph from a molecule object.

Parameters:
  • mol (Molecule) – The molecule object

  • detailed (bool) – Whether to make a “detailed” residue graph representation including the atomic-scale bonds between residues. If True, locked bonds can be directly migrated from the molecule.

  • locked (bool) – Whether to migrate locked bonds from the molecule. This is only possible if detailed is True.

Returns:

The ResidueGraph representation of the molecule

Return type:

ResidueGraph

get_atomic_bond(residue1, residue2) tuple[source]#

Get the atomic-level bond between two residues.

Parameters:
  • residue1 (Residue or str) – The first residue or its id

  • residue2 (Residue or str) – The second residue or its id

Returns:

The atomic bond between the two residues

Return type:

tuple

get_neighbors(residue: Residue, n: int = 1, mode='upto')[source]#

Get the neighbors of a residue

Parameters:
  • residue (bio.Residue.Residue) – The target residue

  • n (int, optional) – The number of connections to separate the residue from its neighbors.

  • mode (str, optional) – The mode to use for getting the neighbors, by default “upto”. With “upto”, all neighbors up to a distance of n bonds are returned; with “exact”, only neighbors exactly n bonds away are returned.

Returns:

The neighbors of the residue

Return type:

set

get_residue(r)[source]#

Get a residue in the molecule.

Parameters:

r (str or Residue) – The residue or its id

Returns:

The residue

Return type:

Residue

lock_centers()[source]#

Lock any edges that connect residue centers of mass to their constituent atoms. This only applies to detailed graphs.

make_detailed(include_samples: bool = True, include_far_away: bool = False, include_heteroatoms: bool = False, include_clashes: bool = True, n_samples: int | float = 0.5, f: float = 1.0, no_hydrogens: bool = False) ResidueGraph[source]#

Use a detailed representation of the residues in the molecule by adding the specific atoms that connect the residues together. This is useful for visualization and analysis.

Note

This function is not reversible. It is applied in-place.

Parameters:
  • include_samples (bool) – If True, a number of atoms are sampled from each residue and included in the detailed representation.

  • include_far_away (bool) – If True, atoms that are not involved in residue connections are also included if their distance to the residue’s center of mass is greater than f * the 75th percentile of atom distances to the residue’s center of mass.

  • include_heteroatoms (bool) – If True, all hetero-atoms are included in the detailed representation, regardless of their distance to the residue center of mass.

  • include_clashes (bool) – If True, all atoms that are involved in a clash are included in the detailed representation.

  • n_samples (int or float) – The number or fraction of atoms to sample from each residue if include_samples is True. If a fraction in range (0,1) is given instead of an integer, the number of atoms to sample is adjusted according to the residue size.

  • f (float) – The factor by which the 75th percentile of atom distances to the residue’s center of mass is multiplied to determine the cutoff distance for outlier atoms. This is only used if include_far_away is True.

  • no_hydrogens (bool) – If True, hydrogens are not included in the detailed representation.
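
The cutoff used by include_far_away and f can be illustrated as follows. This is a sketch under assumptions: the percentile is computed with linear interpolation (`method="inclusive"` in the standard library), atoms are selected with a strict "greater than" comparison, and the function name and inputs are hypothetical. The library may compute the percentile differently.

```python
import math
import statistics

def far_away_atoms(atom_coords, center, f=1.0):
    """Select atoms farther from `center` than f times the 75th percentile distance."""
    dist = {atom: math.dist(xyz, center) for atom, xyz in atom_coords.items()}
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is the 75th percentile
    q75 = statistics.quantiles(dist.values(), n=4, method="inclusive")[2]
    return {atom for atom, d in dist.items() if d > f * q75}

# Five atoms at distances 1, 2, 3, 4 and 10 from the center
atom_coords = {
    "A": (1.0, 0.0, 0.0),
    "B": (2.0, 0.0, 0.0),
    "C": (3.0, 0.0, 0.0),
    "D": (4.0, 0.0, 0.0),
    "E": (10.0, 0.0, 0.0),
}
center = (0.0, 0.0, 0.0)
print(far_away_atoms(atom_coords, center))  # {'E'}
```

Lowering f makes the selection more permissive: with f=0.5 the cutoff in this example drops from 4.0 to 2.0, so atoms C, D and E are all selected.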

prune_triplets()[source]#

Prune bond triangles where two nodes from the same residue are connected to each other and the residue…

property residues#

Get the residues in the molecule.

Returns:

The residues in the molecule

Return type:

list

search_by_constraints(constraints: list) list[source]#

Search for neighboring nodes that match a set of constraints.

Parameters:

constraints (list) – A list of constraint functions, where each entry represents the constraints for a specific node. All constraints must be satisfied for all nodes in the neighborhood to be considered a match.

Returns:

A list of dictionaries where each dictionary contains nodes that match the constraints. The keys represent the constraint index which the nodes satisfy and the values are the nodes themselves.

Return type:

list

to_AtomGraph()[source]#

Convert the ResidueGraph to an AtomGraph.

Returns:

The AtomGraph representation of the molecule

Return type:

AtomGraph