Bioinformatics

The bio package provides a collection of tools for working with biomolecules. The following molecule classes are implemented:

Proteins

buildamol.extensions.bio.proteins.peptides.amino_acid_names_1letter = {'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y'}: The 1-letter codes of the standard amino acids. This set contains only the codes and does not load any molecule objects. Use the amino_acids object to access the actual molecules.

buildamol.extensions.bio.proteins.peptides.amino_acid_names_3letter = {'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL'}: The 3-letter codes of the standard amino acids. This set contains only the codes and does not load any molecule objects. Use the amino_acids object to access the actual molecules.

buildamol.extensions.bio.proteins.peptides.amino_acids = amino_acids(ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL)

Access point for the standard amino acids. Molecules can be accessed by 1-letter code, 3-letter code, or full name.

Example

>>> from buildamol.extensions.bio.proteins.peptides import amino_acids
>>> amino_acids.ALA  # Access by 3-letter code
Molecule(ALA)
>>> amino_acids.D  # Access by 1-letter code
Molecule(ASP)
>>> amino_acids.proline  # Access by full name
Molecule(PRO)

Each amino acid is a new and unique Molecule object!

>>> amino_acids.arginine == amino_acids.arginine  # Each access returns a new Molecule object
False
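The lookup behaviour described above (several aliases resolving to one amino acid, with a fresh object on every access) can be illustrated with a small stand-alone sketch. This is not BuildAMol's implementation: `FakeMolecule`, `AminoAcidAccess`, and the underscore spelling `aspartic_acid` are hypothetical names used only to keep the example runnable without the library.

```python
import copy


class FakeMolecule:
    """Hypothetical stand-in for a molecule; not the BuildAMol Molecule class."""

    def __init__(self, id):
        self.id = id

    def __repr__(self):
        return f"Molecule({self.id})"


# Excerpt only: 3-letter code -> (1-letter code, full name).
# The spelling of multi-word full names is an assumption of this sketch.
_NAMES = {
    "ALA": ("A", "alanine"),
    "ASP": ("D", "aspartic_acid"),
    "PRO": ("P", "proline"),
    "ARG": ("R", "arginine"),
}


class AminoAcidAccess:
    """Resolve 1-letter codes, 3-letter codes, and full names to fresh copies."""

    def __init__(self, names):
        self._templates = {code3: FakeMolecule(code3) for code3 in names}
        self._aliases = {}
        for code3, (code1, full) in names.items():
            for alias in (code3, code1, full):
                self._aliases[alias.upper()] = code3

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            code3 = self._aliases[name.upper()]
        except KeyError:
            raise AttributeError(name) from None
        # Copy the template so callers never share one object.
        return copy.deepcopy(self._templates[code3])


amino_acids_sketch = AminoAcidAccess(_NAMES)
print(amino_acids_sketch.ALA, amino_acids_sketch.D, amino_acids_sketch.proline)
```

Returning a copy on each access means a caller can modify the molecule it received without affecting later lookups, which is consistent with the comparison above evaluating to False.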

buildamol.extensions.bio.proteins.peptides.omega(mol: Molecule, res: int | Residue = None) → float | ndarray

Compute the omega angle of a residue in a protein

Parameters:

mol (Molecule) – The protein
res (int or Residue, optional) – The residue (or its residue number) that contains the carboxyl carbon. If not provided, all residues are considered.

Returns:

The omega angle(s) in degrees

Return type:

float or ndarray
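The backbone angles on this page are dihedral (torsion) angles over four consecutive backbone atoms: omega spans CA(i), C(i), N(i+1), CA(i+1); phi spans C(i-1), N(i), CA(i), C(i); psi spans N(i), CA(i), C(i), N(i+1). The following is a minimal sketch of how such an angle can be computed from four coordinates with the common atan2 formulation. The function name `dihedral` and the plain-tuple coordinates are assumptions of this sketch, not the library's internals.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points (hypothetical helper)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# A planar cis arrangement gives 0 degrees, a planar trans arrangement 180 degrees.
cis_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis_angle, trans_angle)
```

Using atan2 rather than an arccosine of the normals keeps the sign of the angle, so the result covers the full range from -180 to 180 degrees.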

buildamol.extensions.bio.proteins.peptides.peptide(seq: str) → Molecule

Create a peptide from a sequence

Parameters: seq (str) – The sequence of the peptide in one-letter code
Returns: The peptide
Return type: Molecule

buildamol.extensions.bio.proteins.peptides.phi(mol: Molecule, res: int | Residue = None) → float | ndarray

Compute the phi angle of a residue in a protein

Parameters:

mol (Molecule) – The protein
res (int or Residue, optional) – The residue (or its residue number) that contains the alpha carbon. If not provided, all residues are considered.

Returns:

The phi angle(s) in degrees

Return type:

float or ndarray

buildamol.extensions.bio.proteins.peptides.psi(mol: Molecule, res: int | Residue = None) → float | ndarray

Compute the psi angle of a residue in a protein

Parameters:

mol (Molecule) – The protein
res (int or Residue, optional) – The residue (or its residue number) that contains the alpha carbon. If not provided, all residues are considered.

Returns:

The psi angle(s) in degrees

Return type:

float or ndarray

buildamol.extensions.bio.proteins.peptides.sequence(mol: Molecule, unknown: str | callable = 'X') → str

Get the 1-letter code sequence of a peptide. This also works for proteins with multiple chains. Chains are separated by a colon.

Parameters:

mol (Molecule) – The peptide
unknown (str or callable, optional) – The character to use for unknown residues (default: 'X'). This can also be a function that takes the molecule and the residue object and returns a string. Set to None to ignore unknown residues.

Returns:

The sequence of the peptide in one-letter code

Return type:

str
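To make the chain separator and the three modes of `unknown` concrete, here is a small stand-alone sketch. It works on plain lists of 3-letter residue names rather than a Molecule, uses only an excerpt of the code table, and its callable receives just the residue name (the real function is documented to pass the molecule and the residue object). `sequence_sketch` is a hypothetical name.

```python
# Excerpt of the 3-letter to 1-letter table; enough for the example below.
THREE_TO_ONE_EXCERPT = {"ALA": "A", "GLY": "G", "SER": "S", "LYS": "K"}


def sequence_sketch(chains, unknown="X"):
    """Build a 1-letter sequence from lists of residue names, one list per chain."""
    parts = []
    for residues in chains:
        letters = []
        for name in residues:
            letter = THREE_TO_ONE_EXCERPT.get(name.upper())
            if letter is None:
                if unknown is None:
                    continue  # ignore unknown residues entirely
                # Simplification: the callable only gets the residue name here.
                letter = unknown(name) if callable(unknown) else unknown
            letters.append(letter)
        parts.append("".join(letters))
    # Chains are separated by a colon.
    return ":".join(parts)


chains = [["ALA", "GLY"], ["MSE", "SER"]]  # MSE is not in the table
print(sequence_sketch(chains))
print(sequence_sketch(chains, unknown=None))
print(sequence_sketch(chains, unknown=lambda name: name[0].lower()))
```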

buildamol.extensions.bio.proteins.peptides.sequence_to_1letter(seq: str, sep=' ') → str

Convert a sequence in three-letter code to one-letter code.

Parameters:

seq (str) – The sequence in three-letter code.
sep (str, optional) – The separator used between the three-letter codes (default: space)

Returns:

The sequence in one-letter code.

Return type:

str

Example

>>> sequence_to_1letter("ALA CYS ASP GLU")
'ACDE'

buildamol.extensions.bio.proteins.peptides.sequence_to_3letter(seq: str, sep=' ') → str

Convert a sequence in one-letter code to three-letter code.

Parameters:

seq (str) – The sequence in one-letter code.
sep (str, optional) – The separator to use between the three-letter codes (default: space)

Returns:

The sequence in three-letter code.

Return type:

str

Example

>>> sequence_to_3letter("ACDE")
'ALA CYS ASP GLU'
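Both conversions amount to a table lookup over the 20 standard amino acids. The sketch below shows one way to write them in plain Python; `to_1letter` and `to_3letter` are hypothetical names, and the handling of case and empty tokens is an assumption rather than documented library behaviour.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_1letter(seq, sep=" "):
    """Convert e.g. 'ALA CYS' to 'AC'; empty tokens are skipped (assumption)."""
    return "".join(THREE_TO_ONE[code.upper()] for code in seq.split(sep) if code)


def to_3letter(seq, sep=" "):
    """Convert e.g. 'AC' to 'ALA CYS', joining the codes with sep."""
    return sep.join(ONE_TO_THREE[letter.upper()] for letter in seq)


print(to_1letter("ALA CYS ASP GLU"))
print(to_3letter("ACDE"))
print(to_3letter("ACDE", sep="-"))
```

In this sketch an unrecognized code raises a KeyError; how the library treats non-standard residues is not stated on this page.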

Lipids

buildamol.extensions.bio.lipids.simple_lipids.fatty_acid(length: int, double_bonds: int | tuple, cis: float | tuple, id: str = 'UNK') → Molecule

Create a fatty acid molecule.

Parameters:

length (int) – The length of the fatty acid chain.
double_bonds (int or tuple) – Double bonds to make. This can be either a single integer (the number of double bonds to make at random positions) or a tuple of integers (the positions of the double bonds).
cis (float or tuple) – The cis configuration of the double bonds. This can be either a single float (the probability of a double bond being cis) or a tuple of floats (the probability of each double bond being cis). Boolean values are also accepted instead of floats.
id (str) – The id of the fatty acid molecule.

Returns:

The fatty acid molecule.

Return type:

Molecule

Examples

# Create a fatty acid with 20 carbons and 4 double bonds, each with a 50% probability of being in cis configuration:
>>> mol = fatty_acid(20, 4, 0.5)
# Create a fatty acid with 16 carbons and double bonds at positions 6 and 12, in trans and cis configuration respectively:
>>> mol = fatty_acid(16, (6, 12), (False, True))
# Create a fatty acid with 12 carbons and two double bonds, both in cis configuration:
>>> mol = fatty_acid(12, 2, True)
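The `double_bonds` and `cis` parameters each accept a single value or a tuple. The following sketch shows one plausible way to normalize them into a list of (position, is_cis) pairs. It is an illustration only: `resolve_double_bonds`, the `rng` argument, and the range of candidate positions (carbons 2 to length - 1) are assumptions, not the library's actual rules.

```python
import random


def resolve_double_bonds(length, double_bonds, cis, rng=None):
    """Return a list of (position, is_cis) pairs (hypothetical helper)."""
    rng = rng or random.Random()

    if isinstance(double_bonds, int):
        # A single integer: choose that many distinct random positions.
        # Assumption: candidate positions are carbons 2 .. length - 1.
        positions = sorted(rng.sample(range(2, length), double_bonds))
    else:
        # A tuple: the positions are given explicitly.
        positions = list(double_bonds)

    if isinstance(cis, (tuple, list)):
        if len(cis) != len(positions):
            raise ValueError("need one cis value per double bond")
        probabilities = [float(p) for p in cis]
    else:
        # A single value applies to every double bond.
        probabilities = [float(cis)] * len(positions)

    # Booleans behave as probabilities 1.0 (always cis) and 0.0 (never cis),
    # because rng.random() returns values in [0.0, 1.0).
    return [(pos, rng.random() < p) for pos, p in zip(positions, probabilities)]


print(resolve_double_bonds(16, (6, 12), (False, True)))
print(resolve_double_bonds(12, 2, True, rng=random.Random(0)))
```

Treating booleans as the probabilities 0.0 and 1.0 lets one code path cover both the float and the boolean forms of `cis`.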

buildamol.extensions.bio.lipids.simple_lipids.phospholipid(chain1: Molecule, chain2: Molecule, headgroup: Molecule, headgroup_link: Linkage, id: str = 'UNK')

Create a phospholipid molecule from two fatty acid chains and a headgroup.

Parameters:

chain1 (Molecule) – The first fatty acid chain. Pass None to leave this position empty.
chain2 (Molecule) – The second fatty acid chain. Pass None to leave this position empty.
headgroup (Molecule) – The headgroup of the phospholipid. This will be attached to the phosphate group. It is assumed that the headgroup does NOT have its own phosphate group. Pass None to leave this position empty.
headgroup_link (Linkage) – The linkage to use to attach the headgroup to the phosphate group. The phosphate group is treated as “target” and the headgroup as “source”.
id (str) – The id of the phospholipid molecule.

Returns:

The phospholipid molecule.

Return type:

Molecule

buildamol.extensions.bio.lipids.simple_lipids.sphingolipid(chain: Molecule, headgroup: Molecule, headgroup_link: Linkage, id: str = 'UNK')

Create a sphingolipid molecule from a fatty acid chain and a headgroup.

Parameters:

chain (Molecule) – The fatty acid chain. Pass None to leave this position empty.
headgroup (Molecule) – The headgroup of the sphingolipid. Pass None to leave this position empty.
headgroup_link (Linkage) – The linkage to use to attach the headgroup to the sphingosine. The sphingosine is treated as “target” and the headgroup as “source”.
id (str) – The id of the sphingolipid molecule.

Returns:

The sphingolipid molecule.

Return type:

Molecule

buildamol.extensions.bio.lipids.simple_lipids.triacylglycerol(chain1: Molecule, chain2: Molecule, chain3: Molecule) → Molecule

Create a triacylglycerol molecule from three fatty acid chains.

Parameters:

chain1 (Molecule) – The first fatty acid chain. Pass None to leave this position empty.
chain2 (Molecule) – The second fatty acid chain (the one in the middle). Pass None to leave this position empty.
chain3 (Molecule) – The third fatty acid chain. Pass None to leave this position empty.

Returns:

The triacylglycerol molecule.

Return type:

Molecule

Glycans

buildamol.extensions.bio.glycans.glycan.glycan(iupac: str, id: str = 'UNK') → Molecule

Create a glycan molecule from an IUPAC string

Parameters:

iupac (str) – The IUPAC string of the glycan
id (str) – The id of the glycan molecule

Returns:

The glycan molecule

Return type:

Molecule

Functions to work with the IUPAC glycan nomenclature.

class buildamol.extensions.bio.glycans.iupac.IUPACParser

Bases: object

A parser for condensed IUPAC glycan nomenclature strings. The parser converts a string into a list of connected glycan segments, from which a Molecule can be built.

parse(string)

Parse a string of IUPAC glycan nomenclature into a list of glycan segments.

Parameters: string (str) – The IUPAC glycan nomenclature string.
Returns: A list of segments, where each segment is a tuple of (residue1, residue2, linkage).
Return type: list

reset(): Reset the parser.
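To give a feel for the (residue1, residue2, linkage) segments, the sketch below splits an unbranched condensed IUPAC string into such tuples. It is a simplified illustration, not the IUPACParser algorithm: `parse_linear` is a hypothetical function, branches in square brackets are rejected, and the order of the two residues within each tuple is an assumption.

```python
import re

# Matches one parenthesized linkage such as "(b1-4)" and captures its content.
_LINKAGE = re.compile(r"\(([^()]*)\)")


def parse_linear(string):
    """Split an unbranched condensed IUPAC string into segment tuples."""
    if "[" in string or "]" in string:
        raise ValueError("branched glycans are not handled by this sketch")

    # re.split with a capture group alternates residues and linkages:
    # "Gal(b1-4)GlcNAc" -> ["Gal", "b1-4", "GlcNAc"]
    parts = _LINKAGE.split(string)
    residues = parts[0::2]
    linkages = parts[1::2]
    if any(not residue for residue in residues):
        raise ValueError("every linkage must sit between two residues")

    return [
        (residues[i], residues[i + 1], linkages[i])
        for i in range(len(linkages))
    ]


print(parse_linear("Gal(b1-4)GlcNAc(b1-2)Man"))
```

Each tuple names two neighbouring residues and the linkage between them, so consecutive segments share a residue and together describe the whole chain.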

Nucleic Acids

Functions to work with simple DNA and RNA molecules

buildamol.extensions.bio.nucleic_acids.simple_sequences.dna(sequence: str) → Molecule

Create a DNA molecule from a sequence

Parameters: sequence (str) – The DNA sequence
Returns: The DNA molecule
Return type: Molecule

buildamol.extensions.bio.nucleic_acids.simple_sequences.get_3prime(mol: Molecule) → Residue

Get the 3’ residue of a nucleic acid

Parameters: mol (Molecule) – The nucleic acid molecule
Returns: The 3’ residue
Return type: Residue

buildamol.extensions.bio.nucleic_acids.simple_sequences.get_5prime(mol: Molecule) → Residue

Get the 5’ residue of a nucleic acid

Parameters: mol (Molecule) – The nucleic acid molecule
Returns: The 5’ residue
Return type: Residue

buildamol.extensions.bio.nucleic_acids.simple_sequences.nucleic_acid(sequence: str) → Molecule

Create a generic nucleic acid molecule from a sequence (DNA or RNA)

Parameters: sequence (str) – The nucleic acid sequence
Returns: The nucleic acid molecule
Return type: Molecule

buildamol.extensions.bio.nucleic_acids.simple_sequences.rna(sequence: str) → Molecule

Create an RNA molecule from a sequence

Parameters: sequence (str) – The RNA sequence
Returns: The RNA molecule
Return type: Molecule