Bioinformatics
The bio package provides a collection of tools for working with biomolecules. The following are implemented:
- buildamol.extensions.bio.proteins.peptides.amino_acid_names_1letter = {'A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'Y'}
The 1-letter codes of the standard amino acids. This set contains only the codes; it does not load any Molecule objects. Use the amino_acids object to access the actual molecules.
- buildamol.extensions.bio.proteins.peptides.amino_acid_names_3letter = {'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL'}
The 3-letter codes of the standard amino acids. This set contains only the codes; it does not load any Molecule objects. Use the amino_acids object to access the actual molecules.
- buildamol.extensions.bio.proteins.peptides.amino_acids = amino_acids(ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL)
Access point for the standard amino acids. Supports 1-letter codes, 3-letter codes, and full names.
Example
>>> from buildamol.extensions.bio.proteins.peptides import amino_acids
>>> amino_acids.ALA  # Access by 3-letter code
Molecule(ALA)
>>> amino_acids.D  # Access by 1-letter code
Molecule(ASP)
>>> amino_acids.proline  # Access by full name
Molecule(PRO)
Each amino acid is a new and unique Molecule object!
>>> amino_acids.arginine == amino_acids.arginine  # Each access returns a new Molecule object
False
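The access behaviour described above (several names resolving to the same amino acid, with a fresh object on every access) can be illustrated with a small stand-in. This is only a sketch of the pattern, not buildamol's implementation: `FakeMolecule` and `AminoAcidAccess` are hypothetical names, and only four amino acids are listed to keep it short.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based equality, like distinct molecules
class FakeMolecule:
    """Hypothetical stand-in for a Molecule; it only stores an id."""
    id: str


class AminoAcidAccess:
    """Hypothetical access point resolving 1-letter, 3-letter, and full names."""

    # (1-letter, 3-letter, full name); a subset of the twenty standard amino acids
    _rows = [
        ("A", "ALA", "alanine"),
        ("R", "ARG", "arginine"),
        ("G", "GLY", "glycine"),
        ("P", "PRO", "proline"),
    ]

    def __init__(self):
        self._lookup = {}
        for one, three, full in self._rows:
            for key in (one, three, full):
                self._lookup[key] = three

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        lookup = self.__dict__.get("_lookup", {})
        if name not in lookup:
            raise AttributeError(name)
        # Build a new object on every access instead of caching one.
        return FakeMolecule(lookup[name])


aa = AminoAcidAccess()
print(aa.ALA, aa.P, aa.proline)
print(aa.arginine == aa.arginine)  # two separate objects
```

Returning a new object per access means a caller can modify one copy without affecting later lookups, which matches the behaviour shown in the example above.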
- buildamol.extensions.bio.proteins.peptides.omega(mol: Molecule, res: int | Residue = None) -> float | ndarray
Compute the omega angle of a residue in a protein
- Parameters:
mol (Molecule) – The protein
res (int or Residue, optional) – The residue (or its residue number) that holds the carboxyl carbon. If not provided, all residues are considered.
- Returns:
The omega angle(s) in degrees
- Return type:
float or ndarray
- buildamol.extensions.bio.proteins.peptides.peptide(seq: str) -> Molecule
Create a peptide from a sequence
- Parameters:
seq (str) – The sequence of the peptide in one-letter code
- Returns:
The peptide
- Return type:
Molecule
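Since peptide expects a one-letter sequence, it can help to check the input against the standard codes first. The sketch below copies the twenty codes listed for amino_acid_names_1letter; `invalid_positions` is a hypothetical helper, and whether peptide itself validates its input or accepts lowercase letters is not stated here.

```python
# The twenty 1-letter codes, as listed for amino_acid_names_1letter.
AMINO_ACID_NAMES_1LETTER = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_positions(seq):
    """Return (index, letter) pairs for letters outside the standard set."""
    return [
        (index, letter)
        for index, letter in enumerate(seq.upper())
        if letter not in AMINO_ACID_NAMES_1LETTER
    ]


print(invalid_positions("ACDE"))
print(invalid_positions("ACBZ"))
```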
- buildamol.extensions.bio.proteins.peptides.phi(mol: Molecule, res: int | Residue = None) -> float | ndarray
Compute the phi angle of a residue in a protein
- Parameters:
mol (Molecule) – The protein
res (int or Residue, optional) – The residue (or its residue number) that holds the alpha carbon. If not provided, all residues are considered.
- Returns:
The phi angle(s) in degrees
- Return type:
float or ndarray
- buildamol.extensions.bio.proteins.peptides.psi(mol: Molecule, res: int | Residue = None) -> float | ndarray
Compute the psi angle of a residue in a protein
- Parameters:
mol (Molecule) – The protein
res (int or Residue, optional) – The residue (or its residue number) that holds the alpha carbon. If not provided, all residues are considered.
- Returns:
The psi angle(s) in degrees
- Return type:
float or ndarray
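The omega, phi, and psi angles are all dihedral angles over four consecutive backbone atoms. As a rough illustration of the underlying geometry, the sketch below computes a signed dihedral in degrees from four 3D points using only the standard library. The helper names are hypothetical, the atom quadruples in the comments follow the usual backbone convention, and how buildamol computes the angles internally is not stated here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Usual backbone conventions for residue i:
#   phi:   C(i-1), N(i),  CA(i),  C(i)
#   psi:   N(i),   CA(i), C(i),   N(i+1)
#   omega: CA(i),  C(i),  N(i+1), CA(i+1)
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(quarter_turn, trans)
```

A planar trans peptide bond corresponds to an omega of magnitude 180 degrees, which is the second case above.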
- buildamol.extensions.bio.proteins.peptides.sequence(mol: Molecule, unknown: str | callable = 'X') -> str
Get the 1-letter code sequence of a peptide. This also works for proteins with multiple chains. Chains are separated by a colon.
- Parameters:
mol (Molecule) – The peptide
unknown (str or callable, optional) – The character to use for unknown residues (default: 'X'). This can also be set to a function that takes the molecule and the residue object and returns a string. Set to None to ignore unknown residues.
- Returns:
The sequence of the peptide in one-letter code
- Return type:
str
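The handling of the unknown argument and of multiple chains can be sketched on plain lists of 3-letter residue names. This is not the library's code: `chains_to_sequence` is a hypothetical helper, and for simplicity its callable receives only the residue name, whereas the documented callable receives the molecule and the residue object.

```python
_THREE = (
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
    "LEU LYS MET PHE PRO SER THR TRP TYR VAL"
).split()
_ONE = "ARNDCQEGHILKMFPSTWYV"
THREE_TO_ONE = dict(zip(_THREE, _ONE))


def chains_to_sequence(chains, unknown="X"):
    """Join per-chain 1-letter sequences with ':' between chains."""
    parts = []
    for chain in chains:
        letters = []
        for name in chain:
            letter = THREE_TO_ONE.get(name.upper())
            if letter is None:
                if unknown is None:
                    continue  # ignore unknown residues entirely
                # Simplification: the callable here gets only the residue name.
                letter = unknown(name) if callable(unknown) else unknown
            letters.append(letter)
        parts.append("".join(letters))
    return ":".join(parts)


chains = [["ALA", "GLY"], ["MSE", "SER"]]
print(chains_to_sequence(chains))
print(chains_to_sequence(chains, unknown=None))
print(chains_to_sequence(chains, unknown=lambda name: name[0].lower()))
```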
- buildamol.extensions.bio.proteins.peptides.sequence_to_1letter(seq: str, sep=' ') -> str
Convert a sequence in three-letter code to one-letter code.
- Parameters:
seq (str) – The sequence in three-letter code.
sep (str, optional) – The separator to use between the three-letter codes (default: space)
- Returns:
The sequence in one-letter code.
- Return type:
str
Example
>>> sequence_to_1letter("ALA CYS ASP GLU")
'ACDE'
- buildamol.extensions.bio.proteins.peptides.sequence_to_3letter(seq: str, sep=' ') -> str
Convert a sequence in one-letter code to three-letter code.
- Parameters:
seq (str) – The sequence in one-letter code.
sep (str, optional) – The separator to use between the three-letter codes (default: space)
- Returns:
The sequence in three-letter code.
- Return type:
str
Example
>>> sequence_to_3letter("ACDE")
'ALA CYS ASP GLU'
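The two conversions are inverses of each other for the standard amino acids. A minimal stand-alone sketch of that round trip is shown below; `three_to_one_sketch` and `one_to_three_sketch` are hypothetical names chosen to avoid confusion with the library functions, and the handling of non-standard codes (here a KeyError) is an assumption.

```python
_THREE = (
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
    "LEU LYS MET PHE PRO SER THR TRP TYR VAL"
).split()
_ONE = "ARNDCQEGHILKMFPSTWYV"
TO_ONE = dict(zip(_THREE, _ONE))
TO_THREE = dict(zip(_ONE, _THREE))


def three_to_one_sketch(seq, sep=" "):
    """Split on `sep`, then map each 3-letter code to its 1-letter code."""
    return "".join(TO_ONE[code] for code in seq.split(sep))


def one_to_three_sketch(seq, sep=" "):
    """Map each letter to its 3-letter code and join with `sep`."""
    return sep.join(TO_THREE[letter] for letter in seq)


short = three_to_one_sketch("ALA CYS ASP GLU")
print(short)
print(one_to_three_sketch(short))
print(one_to_three_sketch(short, sep="-"))
```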
- buildamol.extensions.bio.lipids.simple_lipids.fatty_acid(length: int, double_bonds: int | tuple, cis: float | tuple, id: str = 'UNK') -> Molecule
Create a fatty acid molecule.
- Parameters:
length (int) – The length of the fatty acid chain.
double_bonds (int or tuple) – Double bonds to make. This can be either a single integer (the number of double bonds to make at random positions) or a tuple of integers (the positions of the double bonds).
cis (float or tuple) – The cis configuration of the double bonds. This can be either a single float (the probability of a double bond being cis) or a tuple of floats (the probability of each double bond being cis). Boolean values are also accepted instead of floats.
id (str) – The id of the fatty acid molecule.
- Returns:
The fatty acid molecule.
- Return type:
Molecule
Examples
# Create a fatty acid with 20 carbons and 4 double bonds, each with a 50% chance of being cis:
>>> mol = fatty_acid(20, 4, 0.5)
# Create a fatty acid with 16 carbons and double bonds at positions 6 and 12, in trans and cis configuration respectively:
>>> mol = fatty_acid(16, (6, 12), (False, True))
# Create a fatty acid with 12 carbons and two double bonds, both in cis configuration:
>>> mol = fatty_acid(12, 2, True)
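One way to read the double_bonds and cis arguments is as a two-step resolution: first fix the positions, then decide cis or trans for each. The sketch below follows the parameter descriptions above but is not buildamol's code. `resolve_double_bonds` and its `rng` argument are hypothetical, and the range of candidate positions for random placement (carbons 2 to length-1) is an assumption; the sketch also does not prevent adjacent double bonds.

```python
import random


def resolve_double_bonds(length, double_bonds, cis, rng=None):
    """Return a list of (position, is_cis) pairs."""
    rng = rng or random.Random()

    if isinstance(double_bonds, int):
        # An integer means "this many double bonds at random positions".
        # Assumption: candidate positions are carbons 2 .. length-1.
        positions = tuple(sorted(rng.sample(range(2, length), double_bonds)))
    else:
        # A tuple gives the positions explicitly.
        positions = tuple(double_bonds)

    if isinstance(cis, (tuple, list)):
        if len(cis) != len(positions):
            raise ValueError("need one cis value per double bond")
        probabilities = tuple(cis)
    else:
        # A single value applies to every double bond.
        probabilities = (cis,) * len(positions)

    # Booleans work as probabilities: True -> 1.0 (always cis), False -> 0.0.
    return [
        (position, rng.random() < float(p))
        for position, p in zip(positions, probabilities)
    ]


print(resolve_double_bonds(16, (6, 12), (False, True)))
print(resolve_double_bonds(12, 2, True, rng=random.Random(0)))
```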
- buildamol.extensions.bio.lipids.simple_lipids.phospholipid(chain1: Molecule, chain2: Molecule, headgroup: Molecule, headgroup_link: Linkage, id: str = 'UNK')
Create a phospholipid molecule from two fatty acid chains and a headgroup.
- Parameters:
chain1 (Molecule) – The first fatty acid chain. None can be provided, to leave this position empty.
chain2 (Molecule) – The second fatty acid chain. None can be provided, to leave this position empty.
headgroup (Molecule) – The headgroup of the phospholipid. This will be attached to the phosphate group. It is assumed that the headgroup does NOT have its own phosphate group. None can be provided, to leave this position empty.
headgroup_link (Linkage) – The linkage to use to attach the headgroup to the phosphate group. The phosphate group is treated as the “target” and the headgroup as the “source”.
id (str) – The id of the phospholipid molecule.
- Returns:
The phospholipid molecule.
- Return type:
Molecule
- buildamol.extensions.bio.lipids.simple_lipids.sphingolipid(chain: Molecule, headgroup: Molecule, headgroup_link: Linkage, id: str = 'UNK')
Create a sphingolipid molecule from a fatty acid chain and a headgroup.
- Parameters:
chain (Molecule) – The fatty acid chain. None can be provided, to leave this position empty.
headgroup (Molecule) – The headgroup of the sphingolipid. None can be provided, to leave this position empty.
headgroup_link (Linkage) – The linkage to use to attach the headgroup to the sphingosine. The sphingosine is treated as the “target” and the headgroup as the “source”.
id (str) – The id of the sphingolipid molecule.
- Returns:
The sphingolipid molecule.
- Return type:
Molecule
- buildamol.extensions.bio.lipids.simple_lipids.triacylglycerol(chain1: Molecule, chain2: Molecule, chain3: Molecule) -> Molecule
Create a triacylglycerol molecule from three fatty acid chains.
- Parameters:
chain1 (Molecule) – The first fatty acid chain. None can be provided, to leave this position empty.
chain2 (Molecule) – The second fatty acid chain (the one in the middle). None can be provided, to leave this position empty.
chain3 (Molecule) – The third fatty acid chain. None can be provided, to leave this position empty.
- Returns:
The triacylglycerol molecule.
- Return type:
Molecule
- buildamol.extensions.bio.glycans.glycan.glycan(iupac: str, id: str = 'UNK') -> Molecule
Create a glycan molecule from an IUPAC string
- Parameters:
iupac (str) – The IUPAC string of the glycan
id (str) – The id of the glycan molecule
- Returns:
The glycan molecule
- Return type:
Molecule
Functions to work with the IUPAC glycan nomenclature.
- class buildamol.extensions.bio.glycans.iupac.IUPACParser
Bases: object
A parser for condensed IUPAC glycan nomenclature strings. This class generates a list of connecting glycan segments from a string, from which a Molecule can be built.
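To give an idea of what "connecting segments" can look like, the sketch below splits a linear condensed string such as "Gal(b1-4)GlcNAc" into (residue, next residue, linkage) triples. It is a deliberately simplified illustration, not the IUPACParser implementation: `parse_linear` is a hypothetical function, the output format is an assumption, and branches in square brackets are rejected rather than parsed.

```python
import re

# A residue name followed by its linkage to the next residue, e.g. "Gal(b1-4)".
_LINKED = re.compile(r"([A-Za-z0-9]+)\(([ab]\d-\d)\)")
_RESIDUE = re.compile(r"[A-Za-z0-9]+")


def parse_linear(iupac):
    """Split a linear condensed IUPAC string into connecting segments."""
    residues, linkages = [], []
    pos = 0
    while True:
        match = _LINKED.match(iupac, pos)
        if match is None:
            break
        residues.append(match.group(1))
        linkages.append(match.group(2))
        pos = match.end()

    # Whatever remains must be the final residue, which carries no linkage here.
    tail = iupac[pos:]
    if _RESIDUE.fullmatch(tail) is None:
        raise ValueError(f"cannot parse {tail!r} (branches are not supported)")
    residues.append(tail)

    return [
        (residues[i], residues[i + 1], linkages[i])
        for i in range(len(linkages))
    ]


print(parse_linear("Gal(b1-4)GlcNAc"))
print(parse_linear("Man(a1-3)Man(b1-4)GlcNAc"))
```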
Functions to work with simple DNA and RNA molecules
- buildamol.extensions.bio.nucleic_acids.simple_sequences.dna(sequence: str) -> Molecule
Create a DNA molecule from a sequence
- Parameters:
sequence (str) – The DNA sequence
- Returns:
The DNA molecule
- Return type:
Molecule
- buildamol.extensions.bio.nucleic_acids.simple_sequences.get_3prime(mol: Molecule) -> Residue
Get the 3’ residue of a nucleic acid
- buildamol.extensions.bio.nucleic_acids.simple_sequences.get_5prime(mol: Molecule) -> Residue
Get the 5’ residue of a nucleic acid