Molecular Factories#

The molecular_factories package provides a collection of tools for the automatic generation of molecular structures.

Specifically implemented are:

The Assembler class is a class that can be to assemble molecules from a library of fragments.

It requires a list of Molecules that serve as the fragments to be assembled. The class will then generate random molecules by randomly selecting fragments from the library and attaching them to each other.

Usage#

  1. Create a list of fragments to be used for assembly

  2. Create an instance of the Assembler class with the list of fragments

  3. Use the sample method to generate random molecules or the make method to create a specific fragment from an instruction matrix

Example

Let’s make a little toy example

import buildamol as bam
from buildamol.extensions.molecular_factories import Assembler
import matplotlib.pyplot as plt

# get some molecules to serve as fragments
fragments = [
    bam.Molecule.from_smiles("C1=CC=CC=C1", id="A").autolabel(),
    bam.Molecule.from_smiles("CC=O", id="B").autolabel(),
    bam.Molecule.from_smiles("COC=C", id="C").autolabel(),
    bam.Molecule.from_smiles("C1=CCC=C1", id="D").autolabel(),
    bam.Molecule.from_smiles("C(C)N", id="E").autolabel(),
]

# make the assembler
assembler = Assembler(fragments)

# generate some molecules from 3 fragments each
# let's make 9 molecules
molecules = assembler.sample(n_fragments=3, n=9)

fig, axs = plt.subplots(3, 3, figsize=(12, 12))
for mol, ax in zip(molecules, axs.flat):
        ax.imshow(
            mol.draw2d().draw(),
        )
        ax.axis("off")
plt.show()
_images/assembler_example1.png

Making Molecules from Arrays#

We can also use the make method to create a specific molecule from an instruction matrix This matrix is a 2D numpy array where each row corresponds to an instruction for attaching the next fragment onto the molecule. The columns are as follows:

[
[incoming_fragment_global_index, incoming_atom_index, target_fragment_atom],
[incoming_fragment_global_index, incoming_atom_index, target_fragment_atom]
...
]

The incoming_fragment_global_index is the index of the fragment in the fragment library (i.e. in the list). The incoming_atom_index is the index of the atom in the incoming fragment that will be attached to the target fragment (i.e. the attachment point). The target_fragment_atom is the index of the atom in the target fragment that will be attached to the incoming fragment.

Let’s make a molecule from an instruction matrix. Let’s take the fourth fragment molecule as a start. Then attach the second fragment molecule to it, by attaching the its second atom to the first atom of already present molecule. Then attach again the fourth fragment onto the molecule by attaching its first atom to the first atom of the second fragment in the molecule.

matrix = np.array([
[3, 0, 0],
[1, 1, 0],
[3, 0, 0],
])

mol = assembler.make(matrix)
mol.draw2d().show()
_images/assembler_example2.jpg

If including this into an automatic pipeline or an optimization loop it is recommended to wrap the whole thing into a try-except block to catch any errors that might occur due to invalid matrices. The clue is that the atoms used for attachment should not be used more than once in the matrix. If they are used more than once, the molecule will not be able to be assembled leading to an error.

class buildamol.extensions.molecular_factories.assembler.Assembler(fragments: list)[source]#

Bases: object

The Assembler class is a class that can be to assemble molecules from a library of fragments. Each molecule is a linear chain of fragments that are attached to each other.

Parameters:

fragments (list) – A list of Molecules that serve as the fragments to be assembled.

make(matrix: ndarray) Molecule[source]#

Assemble a molecule based on an instruction matrix

Parameters:

matrix (np.ndarray) – The matrix encoding for the molecule

Returns:

The assembled molecule

Return type:

Molecule

random(n_fragments: int) ndarray[source]#

Make a random matrix encoding for a molecule assembled from fragments.

Parameters:

n_fragments (int) – The number of fragments to use for the molecule

Returns:

A matrix encoding for the molecule

Return type:

np.ndarray

sample(n_fragments: int, n: int = 1)[source]#

Generate n random molecules from the fragment library

Parameters:
  • n_fragments (int) – The number of fragments to use for each molecule

  • n (int) – The number of molecules to generate

Yields:

Molecule – A molecule assembled from the fragments

specify_attachment_points(fragment_or_index, points: list)[source]#

Specify the attachment points for a fragment

Parameters:
  • fragment_or_index (int or Molecule) – The fragment for which to specify the attachment points

  • points (list) – The attachment points to specify. These must be the indices of the atoms in the fragment as they appear in fragment.get_atoms() (NOT the serial_number!).

The Derivator class can be used to create stochastic and/or combinatorical derivatives of a molecule.

Usage#

  1. Create a Derivator object with a molecule as an argument.

  2. Use the element_changable, functional_group_addable, bond_order_changable, and modifiable methods to specify the possible changes that can be made to the molecule.

  3. Use the sample, all, or make method to generate one or more derivative molecule(s).

Example

Let’s say we have a benzene ring which we want to derivatize. Let’s say further that we are interested in changing the elements of two atoms and adding one functional group to an atom. We can do this as follows:

  1. Create the benzene molecule and derivator object

import buildamol as bam
bam.load_small_molecules()
benzene = bam.molecule("benzene")

# Create the derivator object
D = bam.Derivator(benzene)
  1. Specify the possible changes that can be made to the molecule

Let’s say that C1 can be either a carbon, a nitrogen, or an oxygen. And C5 can be either a carbon or a nitrogen. Also, we are interested in possibly adding a functional group to C3. Namely, we want to check with a hydroxyl group and a phosphate group.

# set the possible elements for C1 and C5
D.element_changable(
    "C1",
    ("C", "N", "O"),
    )
D.element_changable(
    "C5",
    ("C", "N"),
    )

# set the possible functional groups for C3
# (the groups are functions that modify the molecule in place)
D.functional_group_addable(
    "C3",
    (D.nothing, bam.hydroxylate, bam.phosphorylate),
    )
  1. Generate derivatives

Since we only have a small number of changes, we can generate all possible derivatives using the all method.

# generate all possible derivatives
derivatives = list(D.all())
  1. Visualize the derivatives

Now we can visualize the derivatives using the draw2d method.

import matplotlib.pyplot as plt

# compute the layout for the plots (looks nicer like this)
rows = np.sqrt(len(derivatives))
cols = np.ceil(len(derivatives) / rows)
rows = np.round(rows)

fig, axs = plt.subplots(int(rows), int(cols), figsize=(12, 12))
for i, molecule in enumerate(derivatives):
    molecule.squash()
    try:
        img = molecule.draw2d().draw()
        axs.flat[i].imshow(img)
    except Exception as e:
        print(e)
for ax in axs.flat:
    ax.axis("off")

plt.show()
_images/derivator_example1.png
class buildamol.extensions.molecular_factories.derivator.Derivator(molecule: Molecule)[source]#

Bases: object

A class to generate derivatives of a molecule.

Parameters:

molecule (Molecule) – The molecule to generate derivatives of

property N: int#

The number of possible derivatives

all() Generator[Molecule][source]#

Generate all possible derivatives

Yields:

Molecule – The generated derivative molecule

bond_order_changable(bond, bond_orders: tuple, probabilities: tuple = None)[source]#

Specify a bond to be derivable to a set of possible bond orders

Parameters:
  • bond (Bond) – The bond

  • bond_orders (set) – The set of possible bond orders

  • probabilities (tuple) – The probabilities of each bond order being used

element_changable(atom, elements: tuple, probabilities: tuple = None)[source]#

Specify an atom to be derivable to a set of possible elements

Parameters:
  • atom (Atom) – The atom

  • elements (set) – The set of possible elements

  • probabilities (tuple) – The probabilities of each element being used

functional_group_addable(atom, group_modifiers: tuple, probabilities: tuple = None)[source]#

Specify an atom onto which a functional group can be added

Parameters:
  • atom (Atom) – The atom

  • group_modifiers (tuple) – The set of possible functional groups that can be added onto the atom. Each entry must be a function that will modify the molecule in place, taking only the molecule and target atom as arguments!

  • probabilities (tuple) – The probabilities of each functional group being added

global_modifiers(modifiers: tuple, probabilities: tuple = None)[source]#

Specify a set of possible modifications to the molecule as a whole.

Parameters:
  • modifiers (tuple) – The set of possible modifications that can be applied to the molecule. Each entry must be a function that will modify the molecule in place, taking only the molecule as an argument!

  • probabilities (tuple) – The probabilities of each modification being applied

make(elements: tuple, bonds: tuple, groups: tuple, modifiers: tuple) Molecule[source]#

Make a specific derivative molecule. Provide a tuple or other ordered iterable of the same length as the number of atoms/bonds specified during setup for each editable category. Each element must be an integer denoting the index of the desired derivative application.

Parameters:
  • elements (tuple) – The elements to apply.

  • bonds (tuple) – The bond orders to apply.

  • groups (tuple) – The group modifiers to apply.

  • modifiers (tuple) – The molecule modifiers to apply.

Returns:

The generated derivative molecule

Return type:

Molecule

nothing(*args)[source]#

A function to do nothing (can be passed as a functional group modifier)

sample(n: int = 1, use_probabilities: bool = True, apply_all_modifiers: bool = False) Generator[Molecule][source]#

Create a derivative molecule

Parameters:
  • n (int) – The number of derivatives to generate

  • use_probabilities (bool) – Whether to use the probabilities when generating derivatives

  • apply_all_modifiers (bool) – If True all modifiers are applied to the sampled molecule(s). Otherwise one modifier is sampled and applied to each molecule.

Yields:

Molecule – The generated derivative molecule