Source code for buildamol.core.Molecule

"""
The `Molecule` class is a wrapper around a biopython structure and a core part 
of BuildAMol functionality. It provides a convenient interface to molecular structures
and their properties, such as atoms, bonds, residues, chains, etc. 

.. note:: 

    To help with identifying individual atoms, residues, etc. BuildAMol uses a different identification scheme than biopython does.
    Therefore BuildAMol comes with its own child classes of the biopython classes
    that are used to represent the structure. These classes are called `Atom`, `Residue`, `Chain`, etc.
    and can be used as drop-in replacements for the biopython classes and should not break any existing code.
    However, in case any incompatibility is observed anyway, the classes are equipped with a `to_biopython` method
    that will remove the BuildAMol overhead and return pure biopython objects (this is not an in-place operation, however,
    and will return a new object). 


Making Molecules
================

The easiest way to make a new molecule is to use the toplevel `molecule` function, which will automatically
try to detect the type of user provided input and generate a molecule from it. Currently supported inputs are:
- A biopython structure
- An RDKit molecule
- An OpenBabel molecule
- An OpenMM topology
- An STK molecule
- A PDB id
- A PDB file
- A CIF file
- A MOL file
- A PDBQT file
- A JSON file
- An XML file
- An XYZ file
- A SMILES string
- An InChI string
- An IUPAC name or abbreviation, or any name that matches a known compound synonym that is associated with the PubChem database

.. code-block:: python

    from buildamol import molecule

    my_glucose = molecule("GLC") # use the PDB id

    # or
    my_glucose = molecule("GLC.pdb") # use the PDB file

    # or
    my_glucose = molecule("alpha-d-glucose") # use the name

    # ...

Since the `molecule` function is a try-and-error function it is convenient but not the most efficient. Hence, the `Molecule` class
offers already a number of convenient methods to easily generate molecules directly from specified data sources. Available methods are:

- `Molecule.from_pdb` to generate a molecule from a PDB file
- `Molecule.from_cif` to generate a molecule from a CIF file
- `Molecule.from_smiles` to generate a molecule from a SMILES string
- `Molecule.from_pubchem` to generate a molecule from a PubChem entry
- `Molecule.from_compound` to generate a molecule from a PDBECompounds entry
- `Molecule.from_rdkit` to generate a molecule from an RDKit molecule object
- `Molecule.from_openmm` to generate a molecule from an OpenMM topology object
- `Molecule.from_stk` to generate a molecule from an STK molecule object
- `Molecule.from_molfile` to generate a molecule from a MOL file
- `Molecule.from_json` to generate a molecule from a JSON file
- `Molecule.from_xml` to generate a molecule from an XML file
- `Molecule.from_pdbqt` to generate a molecule from a PDBQT file
- `Molecule.from_pybel` to generate a molecule from an OpenBabel molecule object
- `Molecule.from_xyz` to generate a molecule from an XYZ file
- `Molecule.empty` to generate an empty molecule (contains a model and chain)
- `Molecule.new` to generate an empty molecule (contains a model, chain and residue)


Hence, if we know that "glucose" is already available in our local PDBECompounds database, we can generate the molecule also as follows:

.. code-block:: python

    from buildamol import Molecule

    my_glucose = Molecule.from_compound("GLC") # use the PDB id


The quickest way to query the local PDBECompounds database is to use the `PDB Id` of the desired compounds.
However, the `from_compound` accepts other inputs as well. The database is queried using the `by` parameter, which can be one of the following:
- "id" for the PDB id (default)
- "name" for the name of the compound (must match any known synonym of the iupac name)
- "formula" for the chemical formula (usually ambiguous and will therefore often raise an error)
- "smiles" for the SMILES string (also accepts InChI)

.. code-block:: python

    # create a new glucose molecule
    glc = Molecule.from_compound("alpha-d-glucose", by="name") # use the name


Saving Molecules
----------------
If a molecule is created from a large PDB file and has undergone a lot of preprocessing, it may be useful to `save` the molecule to a pickle file
which can be loaded again later to avoid repeated preprocessing. This can be done using the `save` and `load` methods.

.. code-block:: python

    # save the molecule to a pickle file
    my_molecule.save("my_molecule.pkl")

    # load the molecule from the pickle file
    my_molecule = Molecule.load("my_molecule.pkl")

    

Modifying Molecules
===================

Once a molecule is created, it can be modified in a number of ways. The most common modifications are
- adding bonds (because when loading from a PDB file, bonds are not inferred automatically!)
- adding additional residues and atoms
- removing residues and atoms
- adjusting labelling (e.g. changing the chain names and residue seqids)

Adding bonds
------------

To add a bond between two atoms, we use the `add_bond` method, which expects two arguments for the two connected atoms.
The atoms can be specified by their `full_id` tuple, their `id` string, their `serial_number` (always starting at 1) or directly (the `biopython.Atom` object).

.. code-block:: python

    glc = Molecule.from_compound("GLC")

    # add a bond between the first and second atom
    glc.add_bond(1, 2)

    # and also add a bond between "O1" and "HO1" atoms
    glc.add_bond("O1", "HO1")


Already for small molecules such as glucose with only 24 atoms, it would be very tedious to add all bonds manually. Good thing that the molecules
created using `from_compound` or `from_pdb` already contain all the default bonds!

However, in case the bonds are missing, or the PDB file did not specify any to begin with, the `Molecule` class offers two methods: `apply_standard_bonds` and `infer_bonds`. 
The former uses reference connectivity information from the PDBECompounds database or CHARMMTopology to add all bonds that are known for the compound (if it exists in the database).
The latter will use a simple distance-based approach to infer bonds between atoms that are closer than a specified threshold (default: 1.6 Å), which can be restricted
further to a min-max window.

.. code-block:: python

    # add all standard bonds for Glucose
    glc.apply_standard_bonds()

    # add all bonds that are closer than 1.6 Å
    glc.infer_bonds(bond_length=1.6)

    # add all bonds that are closer than 1.6 Å, but not closer than 1.0 Å
    glc.infer_bonds(bond_length=(1.0, 1.6))


.. note::

    By default `infer_bonds` will not attempt to add bonds between atoms that belong to different residues.
    This is because in cases of suboptimal conformations or in very large structures atoms that are close in space may not be connected in the structure.
    To override this behaviour, set the `restrict_residues` parameter to `False`.

    .. code-block:: python

        # add all bonds that are closer than 1.6 Å, even if they belong to different residues
        glc.infer_bonds(bond_length=1.6, restrict_residues=False)

        
Residue Connections
-------------------

Alternatively, instead of `infer_bonds` one may use `infer_residue_connections` to get bonds that connect different residues.
This method will infer all bonds between atoms from different residues based on the distance between them. The inferred bonds are
saved in the `Molecule` and also returned in a `list`. If the optional argument `triplet` is set to `True`, the methodd will also
return the bonds immediately adjacent to the inferred bonds.

Take the following example of a molecule with two residues `A` and `B` that are connected by a bond between `OA` and `(1)CB`:

.. code-block:: text    

     connection -->  OA     OB --- H
                    /  \\   /
     (1)CA --- (2)CA   (1)CB 
        /         \\        \\
    (6)CA         (3)CA    (2)CB --- (3)CB
        \\         /
     (5)CA --- (4)CA
 
     
If `triplet=False` the method will only return the bond between `OA` and `(1)CB`. However, if `triplet=True` it will also return the bond
between `(2)CA` and `OA` - thus forming a triplet of atoms `(2)CA`, `OA` and `(1)CB` that connect the two residues A and B.

.. code-block:: python

    # infer all bonds between atoms from different residues
    >>> glc.infer_residue_connections(triplet=False)
    [(OA, (1)CB)]
    >>> glc.infer_residue_connections(triplet=True)
    [(OA, (1)CB), ((2)CA, OA)]
    


Adding residues and atoms
-------------------------

To add one or more new residue(s), we use the `add_residues` method, which expects a number `buildamol.Residue` objects as unnamed arguments.
Similarly, to add one or more new atom(s), we use the `add_atoms` method, which expects a number of `buildamol.Atom` objects as unnamed arguments.
Both methods allow to specify the parent object (`chain` or `residue`) via an optional argument and will automatically choose the last
chain or residue if none is specified.

.. code-block:: python

    from buildamol import Residue, Atom

    new_residue = Residue("XYZ", 1, " ")
    
    # do things with the residue here
    # ...

    # add the residue to the molecule
    # (add it to the last chain, whichever that may be)
    glc.add_residues(new_residue)

    new_atom = Atom("X", [0, 0, 0])

    # add the atom to the first residue in the `glc` molecule
    glc.add_atoms(new_atom, residue=1)

Removing residues and atoms
---------------------------

In order to remove residues or atoms or bonds, we can use the `remove_residues`, `remove_atoms` and `remove_bond`(yes, singluar!) methods.
They work exactly like their `add_` counterparts, but instead of adding, they remove the specified objects.

.. code-block:: python

    # remove the first residue
    glc.remove_residues(1)

    # remove the first atom
    glc.remove_atoms(1)

    # remove the bond between the first and second atom
    glc.remove_bond(1, 2)


Adjusting labelling
-------------------

Single-residue molecules that were loaded from a PDB file may not use the same atom labelling as the `PDBE` and `CHARMM` databases.
In order to quickly adjust the labelling, a method `autolabel` exists. `autolabel` uses the atom connectivity and a rule-based algorithm to infer 
the most likely atom labels. However, since this method is not perfect, it is recommended to check the labelling afterwards and adjust it manually if necessary.

If working with large molecules that follow another labelling scheme it may be more efficient to simply
define your own linkage recipies or patches (see the documentation of `linkages`) that use the the appropriate labelling scheme.

.. code-block:: python

    # load a molecule from a PDB file
    glc = Molecule.from_pdb("glucose.pdb")

    # adjust the labelling
    glc.autolabel()

    # save the molecule to a new PDB file
    glc.to_pdb("glucose_autolabelled.pdb")


Another common operation is the adjustment of chain names and residue seqids. This can be done using the `reindex` method.
This method accepts three starting values for the chain name, residue seqid and atom serial number and will then reindex all chains, residues and atoms
to ensure they are continuously numbered and labelled. Some internal methods used when connecting different molecules are reliant on a continuous
numbering scheme, so this method should be called before connecting molecules that were loaded from PDB files.

.. code-block:: python

    # load a molecule from a PDB file
    glc = Molecule.from_pdb("glucose.pdb")

    # reindex the molecule
    glc.reindex()

We can also use one molecule as a "reference" for reindexing another molecule to make sure there are now labelling conflicts between them in case we want to connect them together later (this is usually done internally by BuildAMol automatically).

.. code-block:: python  

    # load a molecule from a PDB file
    glc = Molecule.from_pdb("glucose.pdb")

    # load another molecule from a PDB file
    cel = Molecule.from_pdb("cellulose.pdb")
    cel.reindex() # make sure we have a continuous numbering scheme

    # reindex the glucose molecule using the cellulose molecule as a reference
    cel.adjust_indexing(glc)



Connecting Molecules
====================

Since most modifications are not simply single residues but rather complex structures, the second main purpose of a `Molecule` is to be easily 
connected to other Molecules to form a larger structure. To this end, the `Molecule` class provides a number of methods to easily assemble complex structures from small single residue molecules.


Forming Polymers
----------------

The simplest way to generate a large structure is probably the `repeat` method, which will repeat the given molecule `n` times
to form a homo-polymer.

.. code-block:: python

    # create a glucose molecule
    glc = Molecule.from_compound("GLC")

    # create cellulose from glucose
    # using a 1-4 beta-beta glycosidic linkage
    # which is pre-defined in the default CHARMMTopology
    glc.repeat(10, "14bb")

    # Now we have a cellulose of 10 glucoses

In the above example we used the `repeat` method explicitly, but we could also achieve the same with the short-hand `*=`. For this to work, we need to specify 
the linkage type beforehand. We do this by setting the `patch` attribute before using any operator.

.. code-block:: python

    # specify the "default" linkage type for connecting 
    # other molecules to this glucose
    glc.linkage = "14bb"

    # now make a cellulose by multiplying glucoses
    glc *= 20

    # Now we have a cellulose of 20 glucoses

    
If we wish to keep `glc` as a single residue Glucose and still get our desired cellulose, 
we can set `inplace=False` when calling `repeat` or simply use the `*` operator, both of which 
will have the same effect of creating a new copy that houses the appropriate residues.

.. code-block:: python

    cel = glc.repeat(10, "14bb", inplace=False)

    # or (equivalently)
    glc.patch = "14bb"
    cel = glc * 10


    
Connecting different Molecules
------------------------------

What if we want to connect different molecules? For example, we may want to connect a Galactose to a Glucose to form Lactose.
This can be achieved using the `attach` method, which will attach a given molecule to to another molecule.

.. code-block:: python

    glc = Molecule.from_compound("GLC")
    gal = Molecule.from_compound("GAL")

    # attach the galactose to the glucose
    # (we want a copy, so we set inplace=False just like with 'repeat')
    lac = glc.attach(gal, "14bb", inplace=False)

    # Now we have a lactose molecule


In the above example, the `attach` method is used to attach the galactose molecule to the glucose, but for those
among us who prefer a more shorty syntax, the `+` operator will do the same thing. 

.. code-block:: python

    # specify that incoming molecules shall be
    # attached using a 1-4 beta linkage
    glc.linkage = "14bb"

    # now attach the galactose to the glucose
    lac = glc + gal


Of course, if there is a `+` operator there should also be a `+=` operator, which is simply the equivalent of `attach` with `inplace=True`.

.. code-block:: python

    glc.linkage = "14bb"
    glc += gal


    # Now 'glc' is a lactose molecule


Setting default Modifiers
-------------------------

So far, we have always worked with a 1-4 beta-beta glycosidic linkage, which we apparently could select using the string `"14bb"`. 
But what if we want to use a different linkage type? For example, a 1-4 alpha-beta glycosidic linkage?  You of course noticed, 
that `attach` and `repeat` accept an argument `link` which allows you to specify the linkage type, and that if you leave it blank
the default linkage type is used. But how do we set the default linkage type?

Let's first check what linkage types are available by default anyway. Have you noticed an argument named `_topology`
at the end of the `attach` or `repeat` methods? The `topology` refers to the underlying _CHARMM_ topology which hosts the linkage type information.
By default a topology is already loaded in BuildAMol's framework so it is not necessary for the user to specify anything here, but we can check which linkage types are available by:

.. code-block:: python

    import buildamol as bam

    # get the default topology
    topology = bam.get_default_topology()

    print(topology.patches)


Any of these linkages can be referenced by their name, e.g. `"14bb"` or `"14ab"`. 

Wait a second, my desired linkage is not in the list! What now?! Well, you can always define a new `Linkage` to suit your needs.
Check out the documentation on :ref:`linkages` for more information on how to do this. If you have your desired linkage 
ready to go, set it as the default by:

.. code-block:: python

    my_molecule.linkage = my_linkage
    
    # or if you feel "super correct"
    my_molecule.set_linkage(my_linkage)

    # or if you feel "extra cocky"
    my_molecule % my_linkage # <- the modulo operator assignes the "modifier" to the molecule


Now any call to `attach`, `repeat`, or any of its operator proxies will use your defined linkage by default.

Setting the default Residue for attachment
------------------------------------------

When defining a `Linkage` we specify which atoms are supposed to be connected and removed, but we do not specify which residues
these atoms belong to. We specify this as arguments inside the `attach` method for instance, but we can also leave this blank, in which case the
last residue in the molecule is used by default. This is obviously not always what we want, however! Hence, if we do not want to specify the residue for attachment
at every `attach` call or if we want to use the `+` operator, we can set the default residue for attachment by setting the `attach_residue` attribute:

.. code-block:: python

    # set the default attachment residue to the first residue
    my_molecule.attach_residue = 1

    # or
    my_molecule.set_attach_residue(1)

    # or (if you feel "extra cocky")
    my_molecule @ 1 # <- the 'at' operator sets the attachment residue


    # serial number indexing also works in reverse
    # (set the second last residue as the default attachment residue)
    my_molecule.attach_residue = -2  
"""

from copy import deepcopy
import os
from typing import Union

import numpy as np
import Bio.PDB as bio

import buildamol.core.entity as entity
import buildamol.core.Linkage as Linkage
import buildamol.core.reaction as reaction
import buildamol.utils as utils
import buildamol.structural as structural
import buildamol.resources as resources

__all__ = [
    "Molecule",
    "read_pdb",
    "write_pdb",
    "read_molfile",
    "write_molfile",
    "read_cif",
    "write_cif",
    "read_smiles",
    "molecule",
    "connect",
    "react",
    "polymerize",
    "make_smiles",
    "query_pubchem",
    "phosphorylate",
    "methylate",
    "acetylate",
    "amidate",
    "aminate",
    "hydroxylate",
    "carboxylate",
    "phenolate",
    "benzylate",
    "thiolate",
]



[docs]
def read_pdb(
    filename: str,
    id: str = None,
    multimodel: bool = False,
    model: int = None,
    has_atom_ids: bool = True,
) -> "Molecule":
    """
    Read a PDB file and return a molecule.

    Parameters
    ----------
    filename : str
        The path to the PDB file
    id : str
        The id of the molecule
    multimodel : bool
        Whether to read all models and return a list of molecules. If False only one model is read by default.
        If you desire that all models are read into a single molecule rather than into a list of separate molecules you can use the following tweak: `multimodel=False, model='all'` (this does not work with model=<some list> though...)
    model : int or str
        The model number to read. If None, all models are read. This can be a list or tuple of integers or strings to read multiple models.
    has_atom_ids : bool
        Whether the PDB file contains atom ids. If the file does not, the atom ids can be auto-generated if this is set to false.

    Returns
    -------
    molecule : Molecule or list
        The molecule or a list of molecules if multimodel is True
    """
    if multimodel:
        if model is None:
            models = utils.pdb.find_models(filename)
        elif isinstance(model, (int, str)):
            models = [str(model)]
        elif isinstance(model, (list, tuple, set)):
            models = [str(m) for m in model]
        else:
            raise ValueError(
                "model must be an integer (or string), a list of integers (or strings) or None (to read all models)"
            )
        molecules = []
        for model in models:
            if model.isdigit():
                model = int(model)
            new = Molecule.from_pdb(
                filename, id=id, model=model, has_atom_ids=has_atom_ids
            )
            molecules.append(new)
        return molecules
    return Molecule.from_pdb(filename, id=id, model=model, has_atom_ids=has_atom_ids)




[docs]
def write_pdb(mol: "Molecule", filename: str) -> None:
    """
    Write a molecule to a PDB file.

    Parameters
    ----------
    mol : Molecule
        The molecule to write
    filename : str
        The path to the PDB file
    """
    mol.to_pdb(filename)




[docs]
def read_cif(filename: str, id: str = None) -> "Molecule":
    """
    Read a CIF file and return a molecule.

    Parameters
    ----------
    filename : str
        The path to the CIF file
    id : str
        The id of the molecule

    Returns
    -------
    molecule : Molecule
        The molecule
    """
    return Molecule.from_cif(filename, id=id)




[docs]
def write_cif(mol: "Molecule", filename: str) -> None:
    """
    Write a molecule to a CIF file.

    Parameters
    ----------
    mol : Molecule
        The molecule to write
    filename : str
        The path to the CIF file
    """
    mol.to_cif(filename)




[docs]
def read_molfile(filename: str, id: str = None) -> "Molecule":
    """
    Read a MOL file and return a molecule.

    Parameters
    ----------
    filename : str
        The path to the MOL file
    id : str
        The id of the molecule

    Returns
    -------
    molecule : Molecule
        The molecule
    """
    return Molecule.from_molfile(filename, id=id)




[docs]
def write_molfile(mol: "Molecule", filename: str) -> None:
    """
    Write a molecule to a MOL file.

    Parameters
    ----------
    mol : Molecule
        The molecule to write
    filename : str
        The path to the MOL file
    """
    mol.to_molfile(filename)




[docs]
def read_smiles(smiles: str, id: str = None) -> "Molecule":
    """
    Read a SMILES string and return a molecule.

    Parameters
    ----------
    smiles : str
        The SMILES string

    Returns
    -------
    molecule : Molecule
        The molecule
    """
    return Molecule.from_smiles(smiles, id=id)




[docs]
def make_smiles(
    mol: "Molecule", isomeric: bool = True, write_hydrogens: bool = False
) -> str:
    """
    Generate a SMILES string from a molecule.

    Parameters
    ----------
    mol : Molecule
        The molecule
    isomeric : bool
        Whether to include stereochemistry information
    write_hydrogens : bool
        Whether to include hydrogens in the SMILES string

    Returns
    -------
    smiles : str
        The SMILES string
    """
    return mol.to_smiles(isomeric, write_hydrogens)




[docs]
def query_pubchem(query: str, by: str = "name") -> "Molecule":
    """
    Query the PubChem database for a given
    query string to obtain a Molecule object.

    Parameters
    ----------
    query : str
        The query string
    by : str
        The type of query to perform. Can be one of:
        he method to search by. This can be any of the following:

        - cid
        - name
        - smiles
        - sdf
        - inchi
        - inchikey
        - formula

    Returns
    -------
    Molecule or None
        The molecule or None if no match was found
    """
    try:
        return Molecule.from_pubchem(query, by=by)
    except:
        return None




[docs]
def molecule(mol=None) -> "Molecule":
    """
    Generate a molecule from an input. If the input is a string, the string can be a PDB id, some filename, SMILES or InChI string, IUPAC name or abbreviation.
    This function will try its best to automatically generate the molecule with minimal user effort. However, using a dedicated classmethod is
    recommended for more efficient and predictable results.

    Parameters
    ----------
    mol : str or structure-like object
        An input string or structure-like object such as a BioPython Structure or RDKit Molecule, etc.
        If nothing is provided, a new empty molecule is generated.

    Returns
    -------
    molecule : Molecule
        The generated molecule

    Examples
    --------
    >>> from buildamol import molecule
    >>> mol = molecule("GLC")
    >>> mol = molecule("GLC.pdb")
    >>> mol = molecule("alpha-d-glucose")
    """
    if mol is None:
        return Molecule.empty()

    if isinstance(mol, bio.Structure.Structure) or isinstance(
        mol, entity.base_classes.Structure
    ):
        return Molecule(mol)
    elif isinstance(mol, Molecule):
        return mol.copy()
    elif "openbabel" in str(type(mol).__mro__[0]):
        return Molecule.from_pybel(mol)
    elif "rdkit" in str(type(mol).__mro__[0]):
        return Molecule.from_rdkit(mol)
    elif (
        "openmm" in str(type(mol).__mro__[0])
        and hasattr(mol, "topology")
        and hasattr(mol, "positions")
    ):
        return Molecule.from_openmm(mol.topology, mol.positions)
    elif "stk" in str(type(mol).__mro__[0]):
        return Molecule.from_stk(mol)

    if not isinstance(mol, str):
        raise ValueError("input must be a structure object or a string")

    # ------------------
    # mol may be a PDB id
    # ------------------
    if resources.has_compound(mol):
        return resources.get_compound(mol)

    if os.path.isfile(mol):
        _mol = mol.lower().strip()
        if _mol.endswith(".pdb"):
            return Molecule.from_pdb(mol)
        elif _mol.endswith(".cif"):
            return Molecule.from_cif(mol)
        elif _mol.endswith(".pkl"):
            return Molecule.load(mol)
        elif _mol.endswith(".json"):
            return Molecule.from_json(mol)
        elif _mol.endswith(".xml"):
            return Molecule.from_xml(mol)
        elif (
            _mol.endswith(".mol")
            or _mol.endswith(".mol2")
            or _mol.endswith(".sdf")
            or _mol.endswith(".sd")
        ):
            return Molecule.from_molfile(mol)
        elif _mol.endswith(".pdbqt"):
            return Molecule.from_pdbqt(mol)
        elif _mol.endswith(".xyz"):
            return Molecule.from_xyz(mol)
        elif _mol.endswith(".smi") or _mol.endswith(".smiles"):
            return Molecule.from_smiles(open(mol).read().strip())

    if " " not in mol:
        try:
            return Molecule.from_smiles(mol)
        except:
            pass

    try:
        return Molecule.from_pubchem(mol)
    except:
        pass

    raise ValueError(f"Could not generate molecule from input: {mol}")




[docs]
def polymerize(
    mol: "Molecule",
    n: int,
    link: Union[str, "Linkage.Linkage"] = None,
    inplace: bool = False,
) -> "Molecule":
    """
    Polymerize a molecule

    Parameters
    ----------
    mol : Molecule
        The molecule to polymerize
    n : int
        The number of monomers to add
    link : str or Linkage
        The linkage to use for polymerization. If None, the default linkage of the molecule is used.
    inplace : bool
        Whether to polymerize the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The polymerized molecule
    """
    if link is None and mol._linkage is None:
        raise ValueError(
            "No patch or recipe provided and no default is set on the molecule"
        )
    return mol.repeat(n, link, inplace=inplace)




[docs]
def connect(
    mol_a: "Molecule",
    mol_b: "Molecule",
    link: Union[str, "Linkage.Linkage"],
    at_residue_a: Union[int, "bio.Residue.Residue"] = None,
    at_residue_b: Union[int, "bio.Residue.Residue"] = None,
    copy_a: bool = True,
    copy_b: bool = True,
    _topology=None,
    use_patch: bool = True,
) -> "Molecule":
    """
    Connect two molecules together

    Parameters
    ----------
    mol_a : Molecule
        The first (target) molecule
    mol_b : Molecule
        The second (source) molecule
    link : Linkage or str
        The linkage to use for connection. This can be either an instance of the Linkage class or a string identifier
        of a pre-defined patch in the (currently loaded default or specified) CHARMMTopology.
    at_residue_a : int or bio.PDB.Residue
        The residue of the first molecule to connect to. If an integer is provided, the seqid must be used, starting at 1.
    at_residue_b : int or bio.PDB.Residue
        The residue of the second molecule to connect to. If an integer is provided, the seqid must be used, starting at 1.
    copy_a : bool
        Whether to copy the first molecule before connecting
    copy_b : bool
        Whether to copy the second molecule before connecting.
        If False, all atoms of the second molecule will be added to the first molecule.
    _topology : CHARMMTopology
        A specific topology to use in case a pre-existing patch is used as link and only the string identifier
        is supplied.
    use_patch : bool
        If the linkage has internal coordinates available (i.e. is a "patch") these are used by default. Set this to False
        to force-use stitching and its associated conformational optimization instead.

    Returns
    -------
    Molecule
        The connected molecule
    """
    if copy_b:
        mol_b = mol_b.copy()
    new = mol_a.attach(
        mol_b,
        link,
        at_residue=at_residue_a,
        other_residue=at_residue_b,
        inplace=not copy_a,
        _topology=_topology,
        use_patch=use_patch,
    )
    return new




[docs]
def react(
    mol_a: "Molecule",
    mol_b: "Molecule",
    egroup: "FunctionalGroup" = None,
    ngroup: "FunctionalGroup" = None,
    a_is_electrophile: bool = True,
    at_residue_a: Union[int, "bio.Residue.Residue"] = None,
    at_residue_b: Union[int, "bio.Residue.Residue"] = None,
    reaction: "Reaction" = None,
    copy_a: bool = True,
    copy_b: bool = True,
) -> "Molecule":
    """
    Connect two molecules together by imitating a chemical reaction based on functional groups.

    Parameters
    ----------
    mol_a : Molecule
        The first (target) molecule
    mol_b : Molecule
        The second (source) molecule
    egroup : FunctionalGroup
        The functional group of the first molecule to connect to.
    ngroup : FunctionalGroup
        The functional group of the second molecule to connect to.
    a_is_electrophile : bool
        Whether the first molecule is the electrophile (True) or nucleophile (False)
    at_residue_a : int or bio.PDB.Residue
        The residue of the first molecule to connect to. If an integer is provided, the seqid must be used, starting at 1.
    at_residue_b : int or bio.PDB.Residue
        The residue of the second molecule to connect to. If an integer is provided, the seqid must be used, starting at 1.
    reaction : Reaction
        A specific Reaction instance to use **instead** of functional groups.
        If given the other functional group and residue arguments are ignored.
    copy_a : bool
        Whether to copy the first molecule before connecting
    copy_b : bool
        Whether to copy the second molecule before connecting.
        If False, all atoms of the second molecule will be added to the first molecule.

    Returns
    -------
    Molecule
        The connected molecule
    """
    if reaction is not None:
        if not "Reaction" in str(type(reaction).__mro__[0]):
            raise ValueError(
                f"reaction must be an instance of the Reaction class, got {type(reaction)}"
            )
        if not reaction.can_apply(mol_a, mol_b):
            raise ValueError(
                "The provided reaction cannot be applied to the given molecules"
            )
        if copy_a:
            mol_a = mol_a.copy()
        if copy_b:
            mol_b = mol_b.copy()
        return reaction.apply(mol_a, mol_b, inplace=True)

    new = mol_a.react_with(
        mol_b,
        egroup,
        ngroup,
        a_is_electrophile,
        at_residue_a,
        at_residue_b,
        not copy_a,
        not copy_b,
    )
    return new



def _modify(
    mol: "Molecule",
    modifier: "Molecule",
    at_atom,
    delete,
    modifier_at_atom: "str",
    modifier_deletes: "List[str]",
    as_new_residue: bool = True,
    inplace: bool = True,
):
    """
    The core function of all modification functions. This function is used to modify a molecule by attaching a modifier (e.g. a functional group) molecule to it.

    Parameters
    ----------
    mol : Molecule
        The molecule to modify
    modifier : Molecule
        The modifier molecule
    at_atom
        The atom of the molecule to modify. This can be any identifier that will allow to get an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete
        The atom to delete in the molecule. This can be any identifier that will allow to get an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well. If a list is provided as at_atom, use either None to specify no deleters
        or a list of equal length to specify the deleters for each atom in at_atom. None is allowed for individual entries in the list but the list
        needs the same length as at_atom.
    modifier_at_atom : str
        The atom of the modifier molecule to attach to the molecule
    modifier_deletes : List[str]
        The atoms to delete in the modifier molecule. The first entry needs to be a direct neighbor of the modifier_at_atom.
    as_new_residue : bool
        Whether to attach the modifier as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to modify the molecule in place or return a new molecule
    """
    if isinstance(at_atom, (list, tuple)):
        if delete is None:
            delete = (None for i in at_atom)
        elif isinstance(delete, (list, tuple)) and len(delete) != len(at_atom):
            raise ValueError(
                f"delete must be a list of same length if at_atom is a list. at_atom has length {len(at_atom)} but delete has length {len(delete)}"
            )
        elif not isinstance(delete, (list, tuple)):
            raise ValueError(
                f"delete must be a list if at_atom is a list, got {type(delete)} instead"
            )

        if not inplace:
            mol = mol.copy()
        for a, d in zip(at_atom, delete):
            _modify(
                mol,
                modifier,
                a,
                d,
                modifier_at_atom,
                modifier_deletes,
                as_new_residue=as_new_residue,
                inplace=True,
            )
        return mol

    at_atom = mol.get_atom(at_atom)
    if at_atom is None:
        raise ValueError(f"Could not find at_atom in molecule: {at_atom}")
    at_residue = at_atom.get_parent()
    if delete is not None:
        delete = mol.get_atom(delete)
    else:
        delete = mol.get_hydrogen(at_atom)
        if delete is None:
            mol.add_hydrogens(at_atom)
            delete = mol.get_hydrogen(at_atom)
            if delete is None:
                raise ValueError(
                    f"No hydrogen found/inferred on atom {at_atom.id} ({at_atom.serial_number}) to delete during modification and no delete atom was provided."
                )

    modifier_at_atom = modifier.get_atom(modifier_at_atom)

    dist = structural.single_bond_lengths[modifier_at_atom.element].get(
        at_atom.element, None
    )
    if dist:
        mol.adjust_bond_length(at_atom, delete, dist)
        modifier.adjust_bond_length(modifier_at_atom, modifier_deletes[0], dist)

    l = Linkage.linkage(
        at_atom.id, modifier_at_atom, [delete], modifier_deletes, id="MODIFY"
    )
    mol = mol.attach(modifier, l, at_residue=at_residue, inplace=inplace)

    if not as_new_residue:
        modifier_residue = mol.get_residue(-1)  # the last residue is the modifier
        names_already_taken = set(atom.id for atom in at_residue.get_atoms())
        for atom in list(modifier_residue.get_atoms()):
            # just some simple renaming scheme
            if atom.id in names_already_taken:
                element_count = sum(
                    1 for a in at_residue.get_atoms() if a.element == atom.element
                )
                new_name = f"{atom.element}{element_count + 1}"
                atom.id = new_name
            mol.set_parent(atom, at_residue)
        mol.remove_residues(modifier_residue)
    return mol



[docs]
def phosphorylate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Phosphorylate a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to phosphorylate
    at_atom : int or str or Atom
        The atom to phosphorylate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to phosphorylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the phosphate as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to phosphorylate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The phosphorylated molecule
    """
    resources.load_small_molecules()
    phos = Molecule.from_compound("PO4").add_hydrogens()
    return _modify(
        mol,
        phos,
        at_atom,
        delete,
        "O2",
        ["HO2"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def methylate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Methylate a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to methylate
    at_atom : int or str or Atom
        The atom to methylate.This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
       The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to methylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the methyl group as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to methylate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The methylated molecule
    """
    resources.load_small_molecules()
    methyl = Molecule.from_compound("CH3")
    return _modify(
        mol,
        methyl,
        at_atom,
        delete,
        "C",
        ["HC1"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def acetylate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Acetylate a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to acetylate
    at_atom : int or str or Atom
        The atom to acetylate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to acetylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the acetyl group as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to acetylate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The acetylated molecule
    """
    resources.load_small_molecules()
    acetyl = Molecule.from_compound("ACE")
    return _modify(
        mol,
        acetyl,
        at_atom,
        delete,
        "C",
        ["H"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def hydroxylate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Hydroxylate a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to hydroxylate
    at_atom : int or str or Atom
        The atom to hydroxylate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to hydroxylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the hydroxyl group as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to hydroxylate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The hydroxylated molecule
    """
    resources.load_small_molecules()
    hydroxyl = Molecule.from_compound("HOH")
    hydroxyl.rename_residue("HOH", "OH").rename_atom("H2", "HO")
    return _modify(
        mol,
        hydroxyl,
        at_atom,
        delete,
        "O",
        ["H1"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def aminate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Aminate a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to amidate
    at_atom : int or str or Atom
        The atom to amidate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to amidate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the amine group as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to amidate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The aminated molecule
    """
    resources.load_small_molecules()
    amide = Molecule.from_compound("NH3")
    return _modify(
        mol,
        amide,
        at_atom,
        delete,
        "N",
        ["HN1"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def amidate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Amidate a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to amidate
    at_atom : int or str or Atom
        The atom to amidate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to amidate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the amide group as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to amidate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The amidated molecule
    """
    resources.load_small_molecules()
    amide = Molecule.from_compound("ARF")
    return _modify(
        mol,
        amide,
        at_atom,
        delete,
        "C",
        ["H"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def carboxylate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Carboxylate a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to carboxylate
    at_atom : int or str or Atom
        The atom to carboxylate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to carboxylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the carboxyl group as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to carboxylate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The carboxylated molecule
    """
    resources.load_small_molecules()
    carboxyl = Molecule.from_compound("CBX")
    carboxyl.rename_atom("O1", "O").rename_atom("O2", "OXT").rename_atom("HO2", "HXT")
    return _modify(
        mol,
        carboxyl,
        at_atom,
        delete,
        "C",
        ["H"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def benzylate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Add a benzyl group to a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to benzylate
    at_atom : int or str or Atom
        The atom to benzylate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to benzylate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the benzyl group as a new residue or merge it into the same residue
        as at_atom.
    inplace : bool
        Whether to benzylate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The benzylated molecule
    """
    resources.load_small_molecules()
    benzyl = Molecule.from_compound("BNZ")
    return _modify(
        mol,
        benzyl,
        at_atom,
        delete,
        "C1",
        ["H1"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def phenolate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    how: str = "para",
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Add a phenol group to a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to phenolate
    at_atom : int or str or Atom
        The atom to phenolate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to phenolate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    how: str
        The position of the hydroxyl group on the phenol. Can be one of "ortho", "meta", or "para".
    as_new_residue : bool
        Whether to attach the phenol group as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to phenolate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The phenolated molecule
    """
    resources.load_small_molecules()
    phenol = Molecule.from_compound("MBN")
    if how == "ortho":
        a, d = "C2", "H2"
    elif how == "meta":
        a, d = "C3", "H3"
    elif how == "para":
        a, d = "C4", "H4"
    else:
        raise ValueError(
            "Invalid value for 'how'. Must be one of 'ortho', 'meta', or 'para'"
        )
    return _modify(
        mol,
        phenol,
        at_atom,
        delete,
        a,
        [d],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
def thiolate(
    mol: "Molecule",
    at_atom: Union[int, str, entity.base_classes.Atom],
    delete: Union[int, str, entity.base_classes.Atom] = None,
    as_new_residue: bool = True,
    inplace: bool = True,
) -> "Molecule":
    """
    Add a thiol group to a molecule at one or more specific atoms

    Parameters
    ----------
    mol : Molecule
        The molecule to thiolate
    at_atom : int or str or Atom
        The atom to thiolate. This can be any input that will allow to obtain an Atom object from the molecule.
        Alternatively, a list of such inputs can be provided as well.
    delete : int or str or Atom
        The atom to delete. This can be any input that will allow to obtain an Atom object from the molecule.
        This atom needs to be in the same residue as the atom to thiolate. If not provided, any Hydrogen atom attached to the at_atom will be deleted.
        If at_atom is a list, delete can be a list of the same length or None.
    as_new_residue : bool
        Whether to attach the thiol group as a new residue or merge it into the same residue as at_atom.
    inplace : bool
        Whether to thiolate the molecule in place or return a new molecule

    Returns
    -------
    Molecule
        The thiolated molecule
    """
    resources.load_small_molecules()
    thiol = Molecule.from_compound("HOH")
    o = thiol.get_atom("O")
    o.set_element("S")
    o.id = "S"
    return _modify(
        mol,
        thiol,
        at_atom,
        delete,
        "S",
        ["H1"],
        as_new_residue=as_new_residue,
        inplace=inplace,
    )




[docs]
class Molecule(entity.BaseEntity):
    """
    A molecule to add onto a scaffold.
    A molecule consists of a single chain.

    Parameters
    ----------
    structure : bio.PDB.Structure
        A biopython structure object
    root_atom : str or int or Atom
        The id or the serial number of the root atom
        at which the molecule would be attached to a another
        structure such as protein scaffold or another Molecule.
    model : int
        The model to use from the structure. Defaults to 0. This may be any
        valid identifier for a model in the structure, such as an integer or string.
    chain : str
        The chain to use from the structure. Defaults to the first chain in the structure.
    """

    def __init__(
        self,
        structure,
        root_atom: Union[str, int, entity.base_classes.Atom] = None,
        model: int = 0,
        chain: str = None,
    ):
        super().__init__(structure, model)

        if not chain or len(self._model.child_list) == 1:
            self._working_chain = self._model.child_list[0]
        else:
            self._working_chain = self._model.child_dict.get(chain)
            if not self._working_chain:
                raise ValueError("The chain {} is not in the structure".format(chain))

        if root_atom:
            self.set_root(root_atom)
            if not root_atom in self._working_chain.get_atoms():
                raise ValueError("The root atom is not in the structure")
        self._root_atom = root_atom


[docs]
    @classmethod
    def empty(cls, id: str = None) -> "Molecule":
        """
        Create an Molecule without any atoms in it. This will have a single Model and Chain by default, however.

        Parameters
        ----------
        id : str
            An id of the Molecule.

        Returns
        -------
        Molecule
            An empty Molecule object
        """
        structure = structural.make_empty_structure(id)
        new = cls(structure)
        return new



[docs]
    @classmethod
    def new(
        self,
        id: str = None,
        resname: str = "UNK",
        atoms: list = None,
        bonds: list = None,
    ) -> "Molecule":
        """
        Create a new Molecule with a single residue

        Parameters
        ----------
        id : str
            The id of the Molecule. By default an id is inferred from the filename.
        resname : str
            The resname of the residue to add
        atoms : list
            A list of Atom objects to add to the residue
        bonds : list
            A list of bonds to add to the residue

        Returns
        -------
        Molecule
            A new Molecule object
        """
        new = self.empty(id)
        new.add_residues(entity.base_classes.Residue(resname))
        if atoms:
            new.add_atoms(atoms)
        if bonds:
            new.set_bonds(bonds)
        return new


    # @classmethod
    # def from_pdb(
    #     cls,
    #     filename: str,
    #     root_atom: Union[str, int] = None,
    #     id: str = None,
    #     model: int = 0,
    #     chain: str = None,
    # ) -> "Molecule":
    #     """
    #     Read a Molecule from a PDB file

    #     Parameters
    #     ----------
    #     filename : str
    #         Path to the PDB file
    #     root_atom : str or int
    #         The id or the serial number of the root atom (optional)
    #     id : str
    #         The id of the Molecule. By default an id is inferred from the filename.
    #     model : int
    #         The model to use from the structure. Defaults to 0. This may be any
    #         valid identifier for a model in the structure, such as an integer or string.
    #     chain : str
    #         The chain to use from the structure. Defaults to the first chain in the structure.

    #     Returns
    #     -------
    #     Molecule
    #         The Molecule object
    #     """
    #     if id is None:
    #         id = utils.filename_to_id(filename)
    #     struct = utils.defaults.__bioPDBParser__.get_structure(id, filename)
    #     new = cls(struct, root_atom, model=model, chain=chain)
    #     bonds = utils.pdb.parse_connect_lines(filename)
    #     if len(bonds) != 0:
    #         for b in bonds:
    #             new.add_bond(*b)
    #     return new


[docs]
    @classmethod
    def from_compound(
        cls,
        compound: str,
        by: str = "id",
        root_atom: Union[str, int] = None,
    ) -> "Molecule":
        """
        Create a Molecule from a reference compound from the PDBECompounds database

        Parameters
        ----------
        compound : str
            The compound to search for
        by : str
            The field to search by. This can be
            - "id" for the PDB id
            - "name" for the name of the compound (must match any known synonym of the iupac name)
            - "formula" for the chemical formula
            - "smiles" for the SMILES string (also accepts InChI)
        root_atom : str or int
            The id or the serial number of the root atom (optional)
        """
        mol = resources.get_default_compounds().get(compound, by=by)
        if mol is None:
            raise ValueError(f"Could not find compound '{compound}' using '{by}'")
        elif isinstance(mol, list):
            raise ValueError(
                f"Multiple compounds found using '{by}={compound}', choose any of these ids specifically {[i.id for i in mol]}"
            )
        if root_atom:
            mol.set_root(root_atom)
        return mol



[docs]
    @classmethod
    def from_smiles(
        cls,
        smiles: str,
        id: str = None,
        root_atom: Union[str, int] = None,
        add_hydrogens: bool = True,
    ) -> "Molecule":
        """
        Read a Molecule from a SMILES string

        Parameters
        ----------
        smiles : str
            The SMILES string
        id : str
            The id of the Molecule. By default the provided smiles string is used.
        root_atom : str or int
            The id or the serial number of the root atom (optional)
        add_hydrogens : bool
            Whether to add hydrogens to the molecule

        Returns
        -------
        Molecule
            The Molecule object
        """
        obj = structural.read_smiles(smiles, add_hydrogens)
        if structural.smiles.use_rdkit:
            new = cls.from_rdkit(obj, id=id)
        elif structural.smiles.use_openbabel:
            obj.title = id
            new = cls.from_pybel(obj)
        if root_atom:
            new.set_root(root_atom)
        if id is not None:
            new.rename_residue(1, id[:3])
        return new



[docs]
    @classmethod
    def from_pubchem(
        cls,
        query: str,
        root_atom: Union[str, int] = None,
        by: str = "name",
        idx: int = 0,
    ) -> "Molecule":
        """
        Create a Molecule from PubChem

        Note
        ----
        PubChem follows a different atom labelling scheme than the CHARMM force field!
        This means that atom names may not match the names required by the default patches that are
        integrated in buildamol. It is advisable to run `autolabel` or `relabel_hydrogens` on the molecule.
        Naturally, custom patches or recipies working with adjusted atom names will always work.

        Parameters
        ----------
        query : str
            The query to search for in the PubChem database
        root_atom : str or int
            The id or the serial number of the root atom (optional)
        by : str
            The method to search by. This can be any of the following:
            - cid
            - name
            - smiles
            - sdf
            - inchi
            - inchikey
            - formula
        idx : int
            The index of the result to use if multiple are found. By default, the first result is used.

        Returns
        -------
        Molecule
            The Molecule object
        """
        _compound_2d, _compound_3d = resources.pubchem.query(query, by=by, idx=idx)
        new = _molecule_from_pubchem(_compound_2d.iupac_name, _compound_3d)
        _new = cls(new.structure)
        _new.add_bonds(*(i.to_tuple() for i in new._bonds))
        new = _new
        new.id = _compound_2d.iupac_name
        if root_atom:
            new.set_root(root_atom)
        return new



[docs]
    @classmethod
    def from_geometry(
        cls,
        geometry: "structural.geometry.Geometry",
        atoms: list,
        id: str = None,
        resname: str = "UNK",
        direction: str = None,
    ):
        """
        Create a new Molecule using a molecular geometry and a list of starting atoms.
        This will place the atoms in the right spacial coordinates and fill up the provided starting atoms with hydrogens to match the geometry's defined number of atoms.

        Parameters
        ----------
        geometry : Geometry
            The geometry object
        atoms : list
            A list of Atom objects
        id : str
            The id of the Molecule.
        resname : str
            The resname of the residue to add
        direction : str
            The direction of the atoms (in case of a geometry that has planar and axial directions).
            This can be either "planar" or "axial".

        Returns
        -------
        Molecule
            The Molecule object
        """
        if len(atoms) > geometry.max_points:
            n = sum(1 for idx, atom in enumerate(atoms) if atom.coord.sum() != 0)
            n = n or 1
            _atoms = atoms[:n]
            geometry.make_and_apply(_atoms, atoms, direction=direction)

        if len(atoms) == geometry.size:
            bonds = [entity.base_classes.Bond(atoms[0], a) for a in atoms[1:]]
        else:
            atoms, bonds = geometry.fill_hydrogens(*atoms, direction=direction)
        new = cls.new(id=id, atoms=atoms, bonds=bonds, resname=resname)
        return new



[docs]
    def to_smiles(self, isomeric: bool = True, write_hydrogens: bool = False) -> str:
        """
        Convert the molecule to a SMILES string

        Parameters
        ----------
        isomeric : bool
            Whether to include stereochemistry information in the SMILES string
        write_hydrogens : bool
            Whether to include hydrogens in the SMILES string

        Returns
        -------
        str
            The SMILES string
        """
        return structural.make_smiles(
            self, isomeric=isomeric, add_hydrogens=write_hydrogens
        )



[docs]
    def get_residue_connections(
        self,
        residue_a=None,
        residue_b=None,
        triplet: bool = True,
        direct_by: str = None,
    ):
        """
        Get bonds between atoms that connect different residues in the structure
        This method is different from `infer_residue_connections` in that it works
        with the already present bonds in the molecule instead of computing new ones.

        Parameters
        ----------
        residue_a, residue_b : Union[int, str, tuple, bio.Residue.Residue]
            The residues to consider. If None, all residues are considered.
            Otherwise, only between the specified residues are considered.
        triplet : bool
            Whether to include bonds between atoms that are in the same residue
            but neighboring a bond that connects different residues. This is useful
            for residues that have a side chain that is connected to the main chain.
            This is mostly useful if you intend to use the returned list for some purpose,
            because the additionally returned bonds are already present in the structure
            from inference or standard-bond applying and therefore do not actually add any
            particular information to the Molecule object itself.
        direct_by : str
            The attribute to sort by. Can be either "serial", "resid" or "root".
            In the case of "serial", the bonds are sorted by the serial number of the first atom.
            In the case of "resid", the bonds are sorted by the residue id of the first atom.
            In the case of "root", the bonds are sorted so that the first atom is graph-closer to the root than the second atom.
            Set to None to not sort the bonds.

        Returns
        -------
        set
            A set of tuples of atom pairs that are bonded and connect different residues
        """
        bonds = super().get_residue_connections(
            residue_a=residue_a, residue_b=residue_b, triplet=triplet
        )
        if direct_by is not None:
            direct_connections = None
            if direct_by == "resid":
                direct_connections = super().get_residue_connections(
                    residue_a=residue_a, residue_b=residue_b, triplet=False
                )
                direct_connections1, direct_connections2 = set(
                    a for a, b in direct_connections
                ), set(b for a, b in direct_connections)
                direct_connections = direct_connections1.union(direct_connections2)
            bonds = self._direct_bonds(bonds, direct_by, direct_connections)
        return bonds



[docs]
    def repeat(self, n: int, link=None, inplace: bool = True):
        """
        Repeat the molecule n times into a homo-polymer.

        Parameters
        ----------
        n : int
            The number or units of the final polymer.
        link : str or Patch or Recipe
            The patch or recipe to use when patching individual units together.
            If noe is given, the default patch or recipe is used (if defined).
        inplace : bool
            If True the molecule is directly modified, otherwise a copy of the molecule is returned.

        Returns
        -------
        molecule
            The modified molecule (either the original object or a copy)
        """
        if not isinstance(n, int):
            raise TypeError("Can only multiply a molecule by an integer")
        if n <= 0:
            raise ValueError("Can only multiply a molecule by a positive integer")
        if not link and not self._linkage:
            raise RuntimeError("Cannot multiply a molecule without a patch defined")

        _other = self.copy()
        if not inplace:
            obj = self.copy()
        else:
            obj = self

        _patch = obj._linkage
        if link:
            obj % link

        for i in range(n - 1):
            obj += _other

        if _patch:
            obj % _patch

        return obj



[docs]
    def attach(
        self,
        other: "Molecule",
        link: Union[str, "Linkage.Linkage"] = None,
        at_residue: Union[int, "entity.base_classes.Residue"] = None,
        other_residue: Union[int, "entity.base_classes.Residue"] = None,
        use_patch: bool = None,
        inplace: bool = True,
        other_inplace: bool = False,
        _topology=None,
    ):
        """
        Attach another structure to this one using a Patch or a Recipe.

        Parameters
        ----------
        other : Molecule
            The other molecule to attach to this one
        link : str or Linkage
            Either a Patch to apply when attaching or a Recipe to use when stitching.
            If None is defined, the default patch or recipe that was set earlier on the molecule is used.
        at_residue : int or Residue
            The residue to attach the other molecule to. If None, the defined `attach_residue` is used.
        other_residue : int or Residue
            The residue in the other molecule to attach this molecule to. If None, the defined `attach_residue` of the other molecule is used.
        use_patch : bool
            If the specified linkage is a patch (has internal coordinates) it can and is by default applied as a patch. However, it can also be used as a recipe.
            Set this to false if you want to use the patch as a recipe.
        inplace : bool
            If True the molecule is directly modified, otherwise a copy of the molecule is returned.
        other_inplace : bool
            All atoms from the other molecule are integrated into this one. Hence, the other molecule is left empty. If False, a copy of the other molecule is used.
            Thus leaving the original molecule intact.
        _topology : Topology
            The topology to use when attaching. If None, the topology of the molecule is used. Only used if the patch is a string.
        """
        if not isinstance(other, Molecule):
            raise TypeError("Can only attach a Molecule to another Molecule")

        if use_patch is None:
            use_patch = utils.auxiliary.USE_IC

        if not inplace:
            obj = self.copy()
        else:
            obj = self

        if not link:
            link = obj._linkage
            if not link:
                raise ValueError("Cannot attach a molecule without a patch defined")

        if isinstance(link, str):
            if not _topology:
                _topology = resources.get_default_topology()
            link = _topology.get_patch(link)

        # check if link can be applied
        other_is_source = link.can_be_source(other, residue=other_residue)
        self_is_target = link.can_be_target(obj, residue=at_residue)

        forward_is_fine = other_is_source and self_is_target

        # check if the link can be applied in reverse
        reverse_is_fine = False
        if not forward_is_fine:
            other_is_target = link.can_be_target(other, residue=other_residue)
            self_is_source = link.can_be_source(obj, residue=at_residue)
            reverse_is_fine = other_is_target and self_is_source

        if not (forward_is_fine or reverse_is_fine):
            raise ValueError(
                "It looks like this link cannot be applied to the given molecules. Please check the link definitions again and make sure anchors and deleters are present."
            )

        # automatically reverse the link if it is not applicable in the current direction
        if not forward_is_fine and reverse_is_fine:
            print(
                "[info] The link in the current direction is not applicable to the molecules but the reverse is. Automatically reversing the link. If this is not intended, please review the linkage definition or molecules (target vs source)."
            )
            if link.has_IC:
                print(
                    "[warning] Reversing a patch with internal coordinates! This will remove the internal coordinates!"
                )
                link = link.copy()
                link.remove_internal_coordinates()
                link.reverse(inplace=True)
            else:
                link = link.reverse(inplace=False)

        if not other_inplace:
            _other = other.copy()
        else:
            _other = other

        if link.has_IC and use_patch:
            obj.patch_attach(
                _other,
                link,
                at_residue=at_residue,
                other_residue=other_residue,
                _topology=_topology,
            )
        else:
            obj.stitch_attach(
                _other,
                link,
                at_residue=at_residue,
                other_residue=other_residue,
            )
        return obj



[docs]
    def patch_attach(
        self,
        other: "Molecule",
        patch: Union["Linkage.Linkage", str] = None,
        at_residue: Union[int, bio.Residue.Residue] = None,
        other_residue: Union[int, bio.Residue.Residue] = None,
        _topology=None,
    ):
        """
        Attach another structure to this one using a Patch.

        Parameters
        ----------
        other : Molecule
            The other molecule to attach to this one
        patch : str or Linkage
            A linkage to apply when attaching. If none is given, the default link that was set earlier
            on the molecule is used. If no patch was set, an AttributeError is raised. If a string is
            given, it is interpreted as the name of a patch in the topology.
        at_residue : int or Residue
            The residue to attach the other molecule to. If None, the last residue of the molecule.
        other_residue : int or Residue
            The residue of the other molecule to attach. If None, the first residue of the other molecule.
        _topology
            A specific topology to use for referencing.
            If None, the default CHARMM topology is used.
        """

        if not patch:
            patch = self._linkage
        if not patch:
            raise AttributeError(
                "No patch was set for this molecule. Either set a default patch or provide a patch when attaching."
            )
        if isinstance(patch, str):
            if not _topology:
                _topology = resources.get_default_topology()
            patch = _topology.get_patch(patch)
        if not patch:
            raise ValueError(
                "No patch was found with the given name. Either set a default patch or provide a patch when attaching."
            )

        p = structural.__default_keep_keep_patcher__
        p.apply(patch, self, other, at_residue, other_residue)
        p.merge()
        return self



[docs]
    def stitch_attach(
        self,
        other: "Molecule",
        recipe: "Linkage.Linkage" = None,
        remove_atoms=None,
        other_remove_atoms=None,
        at_atom=None,
        other_at_atom=None,
        at_residue=None,
        other_residue=None,
    ):
        """
        Stitch two molecules together by removing atoms and connecting them with a bond. This works without a pre-defined patch.

        Parameters
        ----------
        other : Molecule
            The other molecule to attach to this one
        recipe : Recipe
            The recipe to use when stitching. If None, the default recipe that was set earlier on the molecule is used (if defined).
        remove_atoms : list of int
            The atoms to remove from this molecule. Only used if no recipe is provided.
        other_remove_atoms : list of int
            The atoms to remove from the other molecule. Only used if no recipe is provided.
        at_atom : int or str or Bio.PDB.Atom
            The atom forming the bond in this molecule. If a string is provided, an `at_residue` needs to be defined from which to get the atom. If None is provided, the root atom is used (if defined). Only used if no recipe is provided.
        other_at_atom : int or str or Bio.PDB.Atom
            The atom to attach to in the other molecule. If a string is provided, an `other_residue` needs to be defined from which to get the atom. If None is provided, the root atom is used (if defined). Only used if no recipe is provided.
        at_residue : int or Residue
            The residue to attach the other molecule to. If None, the `attach_residue` is used. Only used if a recipe is provided and the atoms
        """
        if not recipe and remove_atoms is None:
            if not self._linkage:
                raise AttributeError(
                    "No recipe was set for this molecule and no manual instructions were found. Either set a default recipe, provide a recipe when stitching, or provide the information about removed and bonded atoms directly."
                )
            recipe = self._linkage

        if recipe:
            target_atom, source_atom = recipe._stitch_ref_atoms

            # for compatibility with older versions
            if not hasattr(recipe, "_automatically_delete_downstream_atoms"):
                recipe._automatically_delete_downstream_atoms = True

            if recipe._automatically_delete_downstream_atoms:
                _, _, remove_atoms, other_remove_atoms = recipe.identify_atoms(
                    self, other, at_residue, other_residue
                )
            else:
                remove_atoms, other_remove_atoms = recipe.deletes
            return self.stitch_attach(
                other,
                remove_atoms=remove_atoms,
                other_remove_atoms=other_remove_atoms,
                at_atom=target_atom,
                other_at_atom=source_atom,
                at_residue=at_residue,
                other_residue=other_residue,
            )

        if not isinstance(other, Molecule):
            raise TypeError("Can only stitch two molecules together")

        if not at_atom:
            at_atom = self._root_atom
            if not at_atom:
                raise ValueError(
                    "No atom to attach to was provided and no root atom was defined in this molecule"
                )

        if not other_at_atom:
            other_at_atom = other._root_atom
            if not other_at_atom:
                raise ValueError(
                    "No atom to attach to was provided and no root atom was defined in the other molecule"
                )

        if not at_residue:
            at_residue = self.attach_residue

        if not other_residue:
            other_residue = other.attach_residue

        # since we are already copying before we can use the keep-keep stitcher, actually...
        p = structural.__default_keep_keep_stitcher__
        p.apply(
            self,
            other,
            remove_atoms,
            other_remove_atoms,
            at_atom,
            other_at_atom,
            at_residue,
            other_residue,
        )
        self = p.merge()
        return self



[docs]
    def react_with(
        self,
        other: "Molecule",
        egroup: "FunctionalGroup",
        ngroup: "FunctionalGroup",
        as_electrophile: bool = True,
        at_residue: Union[int, "entity.base_classes.Residue"] = None,
        other_residue: Union[int, "entity.base_classes.Residue"] = None,
        inplace: bool = True,
        other_inplace: bool = False,
    ) -> "Molecule":
        """
        React this molecule with another molecule using functional groups to
        automatically create a linkage.

        Parameters
        ----------
        other : Molecule
            The other molecule to react with
        egroup : FunctionalGroup
            The electrophilic functional group to use
        ngroup : FunctionalGroup
            The nucleophilic functional group to use
        as_electrophile : bool
            Whether to use this molecule as the electrophile or the nucleophile
        at_residue : int or Residue
            The residue to attach the other molecule to. If None, the last residue of the molecule.
        other_residue : int or Residue
            The residue of the other molecule to attach. If None, the first residue of the other molecule.
        inplace : bool
            If True the molecule is directly modified, otherwise a copy of the molecule is returned.
        other_inplace : bool
            All atoms from the other molecule are integrated into this one. Hence, the other molecule is left empty. If False, a copy of the other molecule is used.
            Thus leaving the original molecule intact.
        Returns
        -------
        molecule
            The modified molecule (either the original object or a copy)
        """
        if not inplace:
            obj = self.copy()
        else:
            obj = self

        if not other_inplace:
            _other = other.copy()
        else:
            _other = other

        _backup_attach_residue = obj.attach_residue
        if at_residue is not None:
            obj.set_attach_residue(at_residue)
        if other_residue is not None:
            _other.set_attach_residue(other_residue)

        if as_electrophile:
            link = Linkage.Linkage.from_functional_groups(obj, egroup, _other, ngroup)
        else:
            link = Linkage.Linkage.from_functional_groups(_other, egroup, obj, ngroup)

        obj.stitch_attach(_other, link)
        obj.set_attach_residue(_backup_attach_residue)
        return obj



[docs]
    def optimize(
        self,
        residue_graph: bool = None,
        algorithm: str = None,
        rotatron: str = None,
        rotatron_kws: dict = None,
        algorithm_kws: dict = None,
        inplace: bool = True,
    ):
        """
        Optimize the molecule's conformation. This is a convenience method with less
        customizability than a manual optimization using the `optimizers` module.

        Parameters
        ----------
        residue_graph : bool
            Whether to use the residue graph or the full atom graph for optimization.
            The residue graph is faster but less accurate. If the molecule is larger than 100 atoms,
            the residue graph is used by default.
        algorithm : str
            The optimization algorithm to use. If not provided, an algorithm is determined based on the molecule's size.
            This can be one of the following:
            - "genetic" for a genetic algorithm
            - "scipy" for a scipy-implemented gradient-based optimization
            - "swarm" for a particle swarm optimization
            - "anneal" for a simulated annealing optimization
            - "rdkit" for an RDKit-implemented force-field-based optimization (defaults to MMFF, if RDKit is installed)
            - "mmff" for an RDKit-implemented MMFF94 force-field-based optimization (if RDKit is installed)
            - "uff" for an RDKit-implemented UFF force-field-based optimization (if RDKit is installed)
        rotatron : str
            The rotatron to use. This can be one of the following:
            - "distance" for a distance-based rotatron (default)
            - "overlap" for an overlap-based rotatron
            - "forcefield" for a force-field-based rotatron
        algorithm_kws : dict
            Keyword arguments to pass to the optimization algorithm
        rotatron_kws : dict
            Keyword arguments to pass to the rotatron
        inplace : bool
            Whether to optimize the molecule in place or return a copy.

        Returns
        -------
        molecule
            The optimized molecule (either the original object or a copy)
        """
        import buildamol.optimizers as optimizers

        if residue_graph is None:
            residue_graph = self.count_atoms() > 300 and self.count_residues() > 3

        if not algorithm_kws:
            algorithm_kws = {}

        if not rotatron_kws:
            rotatron_kws = {}

        algorithm = algorithm or optimizers.auto_algorithm(self)

        if algorithm == "rdkit":
            opt = optimizers.rdkit_optimize(self)
            out = self.copy() if not inplace else self
            for a, b in zip(out.get_atoms(), opt.get_atoms()):
                a.set_coord(b.coord)
            return out
        elif algorithm == "mmff":
            opt = optimizers.mmff_optimize(self)
            out = self.copy() if not inplace else self
            for a, b in zip(out.get_atoms(), opt.get_atoms()):
                a.set_coord(b.coord)
            return out
        elif algorithm == "uff":
            opt = optimizers.uff_optimize(self)
            out = self.copy() if not inplace else self
            for a, b in zip(out.get_atoms(), opt.get_atoms()):
                a.set_coord(b.coord)
            return out
        elif algorithm not in ("scipy", "genetic", "swarm", "anneal"):
            raise ValueError(f"Unknown algorithm {algorithm}")

        rotatron = rotatron or "distance"
        if rotatron == "distance":
            rotatron = optimizers.DistanceRotatron
        elif rotatron == "overlap":
            rotatron = optimizers.OverlapRotatron
        elif rotatron == "forcefield":
            rotatron = optimizers.ForceFieldRotatron
        else:
            raise ValueError(f"Unknown rotatron {rotatron}")

        if not inplace:
            new = self.copy()
        else:
            new = self

        if residue_graph:
            graph = new.make_residue_graph(detailed=True)
        else:
            graph = new.make_atom_graph(_copy=False)

        edges = graph.find_rotatable_edges(
            graph.central_node, min_ancestors=1, min_descendants=1
        )
        if len(edges) > 150:
            edges = graph.sample_edges(edges, edges // 5)

        env = rotatron(graph, edges, **rotatron_kws)

        return optimizers.optimize(new, env, algorithm, **algorithm_kws)


    def __add__(self, other) -> "Molecule":
        """
        Add two molecules together. This will return a new molecule.
        """
        if isinstance(other, (tuple, list)):
            new = self + other[0]
            for o in other[1:]:
                new += o
            return new

        elif isinstance(other, Molecule):
            patch = self._linkage
            if not patch:
                patch = other._linkage
            if not patch:
                raise RuntimeError(
                    "Cannot add two molecules together without a patch, set a default patch on either of them (preferably on the one you are adding to, i.e. mol1 in mol1 + mol2)"
                )

            # if linkages are functional groups instead, we use the react_with method
            if (
                self._linkage.__class__.__name__ == "FunctionalGroup"
                and other._linkage.__class__.__name__ == "FunctionalGroup"
            ):
                new = self.react_with(
                    other, self._linkage, other._linkage, inplace=False
                )
                return new

            # otherwise we use the attach method
            new = self.attach(other, patch, inplace=False, other_inplace=False)
            return new

        new = self.copy()
        if isinstance(other, entity.base_classes.Atom):
            new.add_atoms(other)

        elif isinstance(other, entity.base_classes.Residue):
            new.add_residues(other)

        elif isinstance(other, entity.base_classes.Chain):
            new._model.add(other)

        elif isinstance(other, entity.base_classes.Model):
            new._base_struct.add(other)

        return new

    def __iadd__(self, other) -> "Molecule":
        """
        Attach another molecule to this one
        """
        if isinstance(other, (tuple, list)):
            for o in other:
                self += o
            return self

        elif isinstance(other, Molecule):
            patch = self._linkage
            if not patch:
                patch = other._linkage
            if not patch:
                raise RuntimeError(
                    "Cannot add two molecules together without a patch, set a default patch on either of them (preferably on the one you are adding to, i.e. mol1 in mol1 += mol2)"
                )

            # if linkages are functional groups instead, we use the react_with method
            if (
                self._linkage.__class__.__name__ == "FunctionalGroup"
                and other._linkage.__class__.__name__ == "FunctionalGroup"
            ):
                self.react_with(other, self._linkage, other._linkage)
                return self

            # otherwise we use the attach method
            self.attach(other, patch)
            return self

        elif isinstance(other, entity.base_classes.Atom):
            self.add_atoms(other)

        elif isinstance(other, entity.base_classes.Residue):
            self.add_residues(other)

        elif isinstance(other, entity.base_classes.Chain):
            self._model.add(other)

        elif isinstance(other, entity.base_classes.Model):
            self._base_struct.add(other)

        return self

    def __sub__(self, other) -> "Molecule":
        """
        Remove atoms, residues, chains or models from the molecule
        """
        if isinstance(other, (tuple, list)):
            new = self.copy()
            for o in other:
                new -= o
            return new

        new = self.copy()
        if isinstance(other, entity.base_classes.Atom):
            new.remove_atoms(other)

        elif isinstance(other, entity.base_classes.Residue):
            new.remove_residues(other)

        elif isinstance(other, entity.base_classes.Chain):
            new._model.detach_child(other.get_id())

        elif isinstance(other, entity.base_classes.Model):
            new._base_struct.detach_child(other.get_id())

        elif isinstance(other, Molecule):
            new.remove_residues(*other.residues)

        return new

    def __isub__(self, other) -> "Molecule":
        """
        Remove atoms, residues, chains or models from the molecule
        """
        if isinstance(other, (tuple, list)):
            for o in other:
                self -= o
            return self

        elif isinstance(other, entity.base_classes.Atom):
            self.remove_atoms(other)

        elif isinstance(other, entity.base_classes.Residue):
            self.remove_residues(other)

        elif isinstance(other, entity.base_classes.Chain):
            self._model.detach_child(other.get_id())

        elif isinstance(other, entity.base_classes.Model):
            self._base_struct.detach_child(other.get_id())

        elif isinstance(other, Molecule):
            self.remove_residues(*other.residues)

        return self

    def __mul__(self, n) -> "Molecule":
        """
        Add multiple identical molecules together using the * operator (i.e. mol * 3)
        This requires that the molecule has a patch defined
        """
        if not self._linkage:
            raise RuntimeError("Cannot multiply a molecule without a patch defined")

        if not isinstance(n, int):
            raise TypeError("Can only multiply a molecule by an integer")
        if n <= 0:
            raise ValueError("Can only multiply a molecule by a positive integer")

        new = self.attach(self, self._linkage, inplace=False, other_inplace=False)
        for i in range(n - 2):
            new = new.attach(self, self._linkage, inplace=True, other_inplace=False)

        return new

    def __imul__(self, n) -> "Molecule":
        """
        Add multiple identical molecules together using the *= operator (i.e. mol *= 3)
        This requires that the molecule has a patch defined
        """
        self.repeat(n)
        return self

    def __repr__(self):
        return f"Molecule({self.id})"

    def __str__(self):
        string = f"Molecule {self.id}:\n"
        if len(self.residues) > 1:
            string += str(self.make_residue_graph())
        else:
            string += str(self.residues[0])
        return string



def _molecule_from_pubchem(id, comp):
    """
    Convert a PubChem compound to a Molecule.

    Parameters
    ----------
    id : str
        The id of the molecule
    comp : pcp.Compound
        The PubChem compound

    Returns
    -------
    Molecule
        The molecule
    """
    bonds = comp.bonds
    mol = Molecule.empty(id)
    residue = entity.base_classes.Residue("UNK", " ", 1)
    mol.add_residues(residue)
    # structure = bio.Structure.Structure(id)
    # model = bio.Model.Model(0)
    # structure.add(model)
    # chain = bio.Chain.Chain("A")
    # model.add(chain)
    # residue = bio.Residue.Residue((" ", 1, " "), "UNK", 1)
    # chain.add(residue)

    element_counts = {}
    adx = 1
    for atom in comp.atoms:
        element = atom.element
        element_counts[element] = element_counts.get(element, 0) + 1

        id = f"{element}{element_counts[element]}"
        _atom = entity.base_classes.Atom(
            id,
            coord=np.array((atom.x, atom.y, atom.z)),
            serial_number=adx,
            bfactor=0.0,
            occupancy=0.0,
            element=element.upper(),
            fullname=id,
            altloc=" ",
            pqr_charge=0.0,
        )
        residue.add(_atom)
        adx += 1

    for bond in bonds:
        mol.add_bond(bond.aid1, bond.aid2, bond.order)

    return mol

    # def _molecule_from_ctfile(id, ctfile):
    #     """
    #     Convert a CTFile object to a Molecule.

    #     Parameters
    #     ----------
    #     id : str
    #         The id of the molecule
    #     ctfile : CTFile
    #         The CTFile object

    #     Returns
    #     -------
    #     Molecule
    #         The molecule
    #     """
    #     new = Molecule.empty(id)
    #     residue = entity.base_classes.Residue("UNK", " ", 1)
    #     new.add_residues(residue)

    # element_counts = {}
    # for atom in ctfile.atoms:
    #     element_counts.setdefault(atom.atom_symbol, 1)
    #     count = element_counts[atom.atom_symbol]
    #     element_counts[atom.atom_symbol] += 1

    #     x = float(atom._ctab_data["x"])
    #     y = float(atom._ctab_data["y"])
    #     z = float(atom._ctab_data["z"])

    #     _atom = entity.base_classes.Atom(
    #         id=f"{atom.atom_symbol}{count}",
    #         coord=np.array((x, y, z)),
    #         serial_number=int(atom.atom_number),
    #         pqr_charge=float(atom.charge),
    #     )
    #     residue.add(_atom)

    # for bond in ctfile.bonds:
    #     a, b = bond.first_atom, bond.second_atom
    #     a = int(a.atom_number)
    #     b = int(b.atom_number)
    #     order = int(bond._ctab_data["bond_type"])
    #     for i in range(order):
    #         new.add_bond(a, b)

    # return new


if __name__ == "__main__":
    # ser = Molecule.from_json("ser.json")
    # link = Linkage.from_json("peptide_linkage.json")

    # pep = ser.attach(ser, link, use_patch=False)
    # pep.show()

    # # ser = Molecule.from_pubchem("SER")
    # ser.to_cif("ser.cif")
    # ser.to_pdb("ser.pdb")
    # recipe = Linkage()
    # recipe.add_delete("O1", "target")
    # recipe.add_delete("HO1", "target")
    # recipe.add_delete("HO4", "source")
    # recipe.add_bond(("C1", "O4"))

    # glc = Molecule.from_compound("GLC")
    # glc.set_linkage(recipe)

    # glc2 = glc.copy()

    # _current_residues = len(glc.residues)
    # glc3 = glc + glc2
    # glc3.show()
    # glc3 = glc % "14bb" + glc
    pass
    # import pickle

    # tmp = pickle.load(open("/Users/noahhk/GIT/biobuild/tmp.pickle", "rb"))
    # tmp.get_residue_connections()
    # exit()
    # from timeit import timeit

    # # man = Molecule.from_compound("MAN")
    # glc = Molecule.from_compound("GLC")

    # # man.adjust_indexing(glc)
    # # assert glc.atoms[0].serial_number == len(man.atoms) + 1

    # t1 = timeit()

    # glc.repeat(5, "14bb")
    # # glc % "14bb"
    # # glc *= 10

    # t2 = timeit()

    # print(t2 - t1)

    from buildamol.utils import visual

    # v = visual.MoleculeViewer3D(glc)
    # v.show()

    # man = Molecule.from_pdb("support/examples/membrane.pdb", model=4)
    # man.infer_bonds()
    # man.

    man = Molecule.from_pubchem("benzene")
    man.infer_bonds(restrict_residues=False)

    g = man.make_residue_graph(True)
    v = visual.MoleculeViewer3D(g)

    for c in man.get_residue_connections():
        v.draw_vector(
            None, c[0].coord, 1.3 * (c[1].coord - c[0].coord), color="magenta"
        )
    v.show()