Setting up PAMs using PAModelpy
This guide provides step-by-step instructions on how to use the provided Python scripts to build Protein Allocation Models (PAMs) from a genome-scale model and parameter datasets.
Overview
The pam_generation.py script allows users to set up a PAM based on a genome-scale metabolic model and an enzyme database. It integrates genome-protein-reaction (GPR) rules with enzyme kinetics to partition cellular proteins into functional sectors.
The accompanying test_pam_generation.py provides unit tests to validate the setup process.
Prerequisites
Required python libraries
Ensure you have PAModelpy installed
Install these dependencies via pip:
pip install cobra PAModelpy
Input files
Genome-scale model: the path to metabolic model in SBML format (e.g.,
iML1515.xml) or a cobra.Model instance (e.g., build from a json file likee_coli_core.json).Parameter Excel file: Contains data on enzymatic properties. The file should have at least the following sheets:
ActiveEnzymes: Enzyme-specific parameters such as reaction IDs, kcat values, and molar masses.
Translational (optional): Parameters related to the translational protein fraction.
UnusedEnzyme (optional): Parameters for the unused enzyme sector.
Dataset requirements
Excel file structure
ActiveEnzymes sheet
Column |
Description |
|---|---|
|
Reaction ID from the genome-scale model. |
|
Unique enzyme identifier*. |
|
List of genes associated with the enzyme. |
|
Gene-protein-reaction association (logical expression). |
|
Molar mass of the enzyme (kDa). |
|
Turnover number (s⁻¹). |
|
Reaction direction ( |
A unique enzyme identifier is defined as a single identifier per catalytically active unit. This means that an enzyme complex has a single identifier.
Enzyme-complex identifiers can be parsed from peptide identifiers and gene-to-protein mapping using the
merge_enzyme_complexesfunction.
Translational sheet
Parameter |
Description |
|---|---|
|
Identifier related to protein fraction associated with translational proteins |
|
Translational protein fraction at zero growth rate. [g_protein/g_CDW] |
|
Change in translational protein fraction per unit change of the associated reaction. |
|
Molar mass of the translational enzymes. [kDa] |
UnusedEnzyme sheet
Parameter |
Description |
|---|---|
|
Identifier related to protein fraction associated with the unused enzyme sector |
|
Unused enzyme fraction at zero growth rate. [g_protein/g_CDW] |
|
Change in unused enzyme fraction per unit change of the associated reaction. |
|
Molar mass of unused enzymes. [kDa] |
Building a PAM
Steps to create a PAM
Prepare the genome-scale model and parameter file.
Optional: change defaults in the Config object to match your model identifiers
Use the
set_up_pamfunction to initialize the model.Optimize the PAM
Example usage for E. coli
The model defaults are set to generate a PAM for the iML1515 model of Escherichia coli K-12.
from Scripts.pam_generation import set_up_pam
#1. Define input paths
model_path = "Models/iML1515.xml"
param_file = "Data/proteinAllocationModel_iML1515_EnzymaticData_new.xlsx"
#2. Config is not required for iML1515
#3. Build the PAM
pam = set_up_pam(pam_info_file=param_file,
model=model_path,
total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw)
active_enzymes=True,
translational_enzymes=True,
unused_enzymes=True,
sensitivity=True,
adjust_reaction_ids=False)
#4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h
pam.optimize()
print(f"Objective Value: {pam.objective.value}")
Example usage for other microorganisms
The model defaults have to be adapted to the identifiers for another microorganism. In case of a model in the BiGG namespace,
only the biomass reaction has to be adapted (which is the case for e.g. iJN1463,
the genome-scale model for Pseudomonas putida KT2440). For some other models, the entire namespace has to be adapted (e.g. for the Yeast9
model of Saccharomyces cerevisiae). This can be accomplished using a Config object.
Example for BiGG namespace: P. putida iJN1463
Please note that this is only an example, both the model, as the parameter file do not exist in this repository.
from Scripts.pam_generation import set_up_pam
from PAModelpy import Config
#1. Define input paths
model_path = "Models/iJN1463.xml"
param_file = "Data/proteinAllocationModel_iJN1463_EnzymaticData.xlsx"
#2. Change biomass reaction id in config
config = Config()
config.BIOMASS_REACTION = 'BIOMASS_KT2440_WT3'
#3. Build the PAM
pam = set_up_pam(pam_info_file=param_file,
model=model_path,
config=config,
total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw)
active_enzymes=True,
translational_enzymes=True,
unused_enzymes=True,
sensitivity=True,
adjust_reaction_ids=False)
#4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h
pam.optimize()
print(f"Objective Value: {pam.objective.value}")
Example for non-BiGG namespace: S. cerevisia Yeast9
Please note that this is only an example, both the model, as the parameter file do not exist in this repository.
from Scripts.pam_generation import set_up_pam
from PAModelpy import Config
#1. Define input paths
model_path = "Models/Yeast9.xml"
param_file = "Data/proteinAllocationModel_yeast9_EnzymaticData.xlsx"
#2. Change all the reaction ids in config and the protein regex
config = Config()
config.TOTAL_PROTEIN_CONSTRAINT_ID = "TotalProteinConstraint"
config.P_TOT_DEFAULT = 0.388 # g_protein/g_cdw
config.CO2_EXHANGE_RXNID = "r_1672"
config.GLUCOSE_EXCHANGE_RXNID = "r_1714"
config.BIOMASS_REACTION = "r_2111"
config.OXYGEN_UPTAKE_RXNID = "r_1992"
config.ACETATE_EXCRETION_RXNID = "r_1634"
config.PHYS_RXN_IDS = [
config.BIOMASS_REACTION,
config.GLUCOSE_EXCHANGE_RXNID,
config.ACETATE_EXCRETION_RXNID,
config.CO2_EXHANGE_RXNID,
config.OXYGEN_UPTAKE_RXNID]
config.ENZYME_ID_REGEX = r'(Y[A-P][LR][0-9]{3}[CW])'
#3. Build the PAM
pam = set_up_pam(pam_info_file=param_file,
model=model_path,
config=config,
total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw)
active_enzymes=True,
translational_enzymes=True,
unused_enzymes=True,
sensitivity=True,
adjust_reaction_ids=False)
#4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h
pam.optimize()
print(f"Objective Value: {pam.objective.value}")
A short note on enzyme identifiers
The Config object has an entry which enables the framework to find and extract enzyme identifiers from the PAModel:
Config.ENZYME_ID_REGEX = r'(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})'
This is the default regular expression to find UniProt identifiers,
as derived from the UniProt database (obtained 2024-08-07). Two other default placeholder identifiers (E1, E10, Enzyme_GLC_D, etx)
are always included in any regex search with the PAModel:
default_enzyme_regex = r'E[0-9][0-9]*|Enzyme_*'
In case you would like to use other placeholders or another form of protein identifiers, please adapt the Config.ENZYME_ID_REGEX
attribute with the proper regular expression.
Testing the setup
To verify the correctness of the PAM generation process, run the tests provided in tests/unit_tests/test_utils/test_pam_generation.py:
python -m pytest tests/unit_tests/test_utils/test_pam_generation.py.py
Additional features
Increasing kcat values
The script includes a utility to scale up kcat values in the parameter file:
from PAModelpy.utils.pam_generation import increase_kcats_in_parameter_file
increase_kcats_in_parameter_file(
kcat_increase_factor=2,
pam_info_file_path_ori="Data/old_param_file.xlsx",
pam_info_file_path_out="Data/new_param_file.xlsx"
)
Generate enzyme-complex identifiers from peptide ids (e.g. uniprot annotation)
This helps with setting up a novel parameter file using identifiers obtained from uniprot and a gene-to-protein mapping
from PAModelpy.utils.pam_generation import get_protein_gene_mapping, merge_enzyme_complexes
from cobra.io import read_sbml_model
model = read_sbml_model('Models/iML1515.xml')
enzyme_db = "Data/old_param_file.xlsx"
protein2gene, gene2protein = get_protein_gene_mapping(enzyme_db, model)
# Ensure the enzyme complexes are merged on one row
eco_enzymes_mapped = merge_enzyme_complexes(enzyme_db, gene2protein)
Troubleshooting
Issue: Missing reaction in enzyme database
Solution: Ensure all reactions in the model are represented in the parameter file or adjustreaction_idsusingadjust_reaction_ids=True.Issue: Objective value is zero after optimization
Solution: Check the input parameter file for consistency and ensure that all reactions are correctly annotated.