# Setting up PAMs using PAModelpy This guide provides step-by-step instructions on how to use the provided Python scripts to build Protein Allocation Models (PAMs) from a genome-scale model and parameter datasets. --- ## Overview The `pam_generation.py` script allows users to set up a PAM based on a genome-scale metabolic model and an enzyme database. It integrates genome-protein-reaction (GPR) rules with enzyme kinetics to partition cellular proteins into functional sectors. The accompanying `test_pam_generation.py` provides unit tests to validate the setup process. --- ## Prerequisites ### Required python libraries Ensure you have `PAModelpy` installed Install these dependencies via pip: ```bash pip install cobra PAModelpy ``` ### Input files 1. **Genome-scale model**: the path to metabolic model in SBML format (e.g., `iML1515.xml`) or a cobra.Model instance (e.g., build from a json file like `e_coli_core.json`). 2. **Parameter Excel file**: Contains data on enzymatic properties. The file should have at least the following sheets: - **ActiveEnzymes**: Enzyme-specific parameters such as reaction IDs, kcat values, and molar masses. - **Translational** (optional): Parameters related to the translational protein fraction. - **UnusedEnzyme** (optional): Parameters for the unused enzyme sector. --- ## Dataset requirements ### Excel file structure #### ActiveEnzymes sheet | **Column** | **Description** | |---------------------|---------------------------------------------------------| | `rxn_id` | Reaction ID from the genome-scale model. | | `enzyme_id` | Unique enzyme identifier*. | | `gene` | List of genes associated with the enzyme. | | `GPR` | Gene-protein-reaction association (logical expression). | | `molMass` | Molar mass of the enzyme (kDa). | | `kcat_values` | Turnover number (s⁻¹). | | `direction` | Reaction direction (`f` for forward, `b` for backward). | * A unique enzyme identifier is defined as a single identifier per catalytically active unit. This means that an enzyme complex has a single identifier. * Enzyme-complex identifiers can be parsed from peptide identifiers and gene-to-protein mapping using the `merge_enzyme_complexes` function. #### Translational sheet | **Parameter** | **Description** | |---------------------|-------------------------------------------------------------------------------| | `id_list` | Identifier related to protein fraction associated with translational proteins | | `tps_0` | Translational protein fraction at zero growth rate. [g_protein/g_CDW] | | `tps_mu` | Change in translational protein fraction per unit change of the associated reaction. | | `mol_mass` | Molar mass of the translational enzymes. [kDa] | #### UnusedEnzyme sheet | **Parameter** | **Description** | |---------------------|---------------------------------------------------------------------------------| | `id_list` | Identifier related to protein fraction associated with the unused enzyme sector | | `ups_0` | Unused enzyme fraction at zero growth rate. [g_protein/g_CDW] | | `ups_mu` | Change in unused enzyme fraction per unit change of the associated reaction. | | `mol_mass` | Molar mass of unused enzymes. [kDa] | --- ## Building a PAM ### Steps to create a PAM 1. Prepare the genome-scale model and parameter file. 2. Optional: change defaults in the Config object to match your model identifiers 3. Use the `set_up_pam` function to initialize the model. 4. Optimize the PAM #### Example usage for *E. coli* The model defaults are set to generate a PAM for the iML1515 model of *Escherichia coli* K-12. ```python from Scripts.pam_generation import set_up_pam #1. Define input paths model_path = "Models/iML1515.xml" param_file = "Data/proteinAllocationModel_iML1515_EnzymaticData_new.xlsx" #2. Config is not required for iML1515 #3. Build the PAM pam = set_up_pam(pam_info_file=param_file, model=model_path, total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw) active_enzymes=True, translational_enzymes=True, unused_enzymes=True, sensitivity=True, adjust_reaction_ids=False) #4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h pam.optimize() print(f"Objective Value: {pam.objective.value}") ``` #### Example usage for other microorganisms The model defaults have to be adapted to the identifiers for another microorganism. In case of a model in the BiGG namespace, only the biomass reaction has to be adapted (which is the case for e.g. [iJN1463](https://onlinelibrary.wiley.com/doi/abs/10.1111/1462-2920.14843), the genome-scale model for *Pseudomonas putida* KT2440). For some other models, the entire namespace has to be adapted (e.g. for the [Yeast9](https://www.embopress.org/doi/full/10.1038/s44320-024-00060-7) model of *Saccharomyces cerevisiae*). This can be accomplished using a [`Config`](api_reference/configuration.md) object. ##### Example for BiGG namespace: *P. putida* iJN1463 Please note that this is only an example, both the model, as the parameter file do not exist in this repository. ```python from Scripts.pam_generation import set_up_pam from PAModelpy import Config #1. Define input paths model_path = "Models/iJN1463.xml" param_file = "Data/proteinAllocationModel_iJN1463_EnzymaticData.xlsx" #2. Change biomass reaction id in config config = Config() config.BIOMASS_REACTION = 'BIOMASS_KT2440_WT3' #3. Build the PAM pam = set_up_pam(pam_info_file=param_file, model=model_path, config=config, total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw) active_enzymes=True, translational_enzymes=True, unused_enzymes=True, sensitivity=True, adjust_reaction_ids=False) #4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h pam.optimize() print(f"Objective Value: {pam.objective.value}") ``` ##### Example for non-BiGG namespace: *S. cerevisia* Yeast9 Please note that this is only an example, both the model, as the parameter file do not exist in this repository. ```python from Scripts.pam_generation import set_up_pam from PAModelpy import Config #1. Define input paths model_path = "Models/Yeast9.xml" param_file = "Data/proteinAllocationModel_yeast9_EnzymaticData.xlsx" #2. Change all the reaction ids in config and the protein regex config = Config() config.TOTAL_PROTEIN_CONSTRAINT_ID = "TotalProteinConstraint" config.P_TOT_DEFAULT = 0.388 # g_protein/g_cdw config.CO2_EXHANGE_RXNID = "r_1672" config.GLUCOSE_EXCHANGE_RXNID = "r_1714" config.BIOMASS_REACTION = "r_2111" config.OXYGEN_UPTAKE_RXNID = "r_1992" config.ACETATE_EXCRETION_RXNID = "r_1634" config.PHYS_RXN_IDS = [ config.BIOMASS_REACTION, config.GLUCOSE_EXCHANGE_RXNID, config.ACETATE_EXCRETION_RXNID, config.CO2_EXHANGE_RXNID, config.OXYGEN_UPTAKE_RXNID] config.ENZYME_ID_REGEX = r'(Y[A-P][LR][0-9]{3}[CW])' #3. Build the PAM pam = set_up_pam(pam_info_file=param_file, model=model_path, config=config, total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw) active_enzymes=True, translational_enzymes=True, unused_enzymes=True, sensitivity=True, adjust_reaction_ids=False) #4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h pam.optimize() print(f"Objective Value: {pam.objective.value}") ``` ##### A short note on enzyme identifiers The Config object has an entry which enables the framework to find and extract enzyme identifiers from the PAModel: ```python Config.ENZYME_ID_REGEX = r'(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})' ``` This is the default [regular expression](https://www.uniprot.org/help/accession_numbers) to find UniProt identifiers, as derived from the UniProt database (obtained 2024-08-07). Two other default placeholder identifiers (`E1`, `E10`, `Enzyme_GLC_D`, etx) are always included in any regex search with the PAModel: ```python default_enzyme_regex = r'E[0-9][0-9]*|Enzyme_*' ``` In case you would like to use other placeholders or another form of protein identifiers, please adapt the `Config.ENZYME_ID_REGEX` attribute with the proper regular expression. --- ## Testing the setup To verify the correctness of the PAM generation process, run the tests provided in `tests/unit_tests/test_utils/test_pam_generation.py`: ```bash python -m pytest tests/unit_tests/test_utils/test_pam_generation.py.py ``` --- ## Additional features ### Increasing kcat values The script includes a utility to scale up kcat values in the parameter file: ```python from PAModelpy.utils.pam_generation import increase_kcats_in_parameter_file increase_kcats_in_parameter_file( kcat_increase_factor=2, pam_info_file_path_ori="Data/old_param_file.xlsx", pam_info_file_path_out="Data/new_param_file.xlsx" ) ``` ### Generate enzyme-complex identifiers from peptide ids (e.g. uniprot annotation) This helps with setting up a novel parameter file using identifiers obtained from uniprot and a gene-to-protein mapping ```python from PAModelpy.utils.pam_generation import get_protein_gene_mapping, merge_enzyme_complexes from cobra.io import read_sbml_model model = read_sbml_model('Models/iML1515.xml') enzyme_db = "Data/old_param_file.xlsx" protein2gene, gene2protein = get_protein_gene_mapping(enzyme_db, model) # Ensure the enzyme complexes are merged on one row eco_enzymes_mapped = merge_enzyme_complexes(enzyme_db, gene2protein) ``` --- ## Troubleshooting - **Issue: Missing reaction in enzyme database** Solution: Ensure all reactions in the model are represented in the parameter file or adjust `reaction_ids` using `adjust_reaction_ids=True`. - **Issue: Objective value is zero after optimization** Solution: Check the input parameter file for consistency and ensure that all reactions are correctly annotated.