# Setting up PAMs using PAModelpy This guide provides step-by-step instructions on how to use the provided Python scripts to build Protein Allocation Models (PAMs) from a genome-scale model and parameter datasets. --- ## Overview The `pam_generation.py` script allows users to set up a PAM based on a genome-scale metabolic model and an enzyme database. It integrates genome-protein-reaction (GPR) rules with enzyme kinetics to partition cellular proteins into functional sectors. The accompanying `test_pam_generation.py` provides unit tests to validate the setup process. --- ## Prerequisites ### Required python libraries Ensure you have `PAModelpy` installed on Python >=3.9 and <=3.11 Install these dependencies via pip: ```bash pip install cobra PAModelpy ``` Note that the package has been tested with the [Gurobi](https://www.mathworks.com/products/connections/product_detail/gurobi-optimizer.html) solver. In order for Gurobi to work properly, please install [gurobipy](https://pypi.org/project/gurobipy/) with a version matching your license. For example for the version used for the development of PAModelpy: `pip install gurobipy==9.5.2` ### Input files 1. **Genome-scale model**: the path to metabolic model in SBML format (e.g., `iML1515.xml`) or a cobra.Model instance (e.g., build from a json file like `e_coli_core.json`). 2. **Parameter Excel file**: Contains data on enzymatic properties. The file should have at least the following sheets: - **ActiveEnzymes**: Enzyme-specific parameters such as reaction IDs, kcat values, and molar masses. - **Translational** (optional): Parameters related to the translational protein fraction. - **UnusedEnzyme** (optional): Parameters for the unused enzyme sector. --- ## Dataset requirements ### Excel file structure #### ActiveEnzymes sheet | **Column** | **Description** | |---------------------|---------------------------------------------------------| | `rxn_id` | Reaction ID from the genome-scale model. | | `enzyme_id` | Unique enzyme identifier*. | | `gene` | List of genes associated with the enzyme. | | `GPR` | Gene-protein-reaction association (logical expression). | | `molMass` | Molar mass of the enzyme (kDa). | | `kcat_values` | Turnover number (s⁻¹). | | `direction` | Reaction direction (`f` for forward, `b` for backward). | * A unique enzyme identifier is defined as a single identifier per catalytically active unit. This means that an enzyme complex has a single identifier. * Enzyme-complex identifiers can be parsed from peptide identifiers and gene-to-protein mapping using the `merge_enzyme_complexes` function. #### Translational sheet | **Parameter** | **Description** | |---------------------|-------------------------------------------------------------------------------| | `id_list` | Identifier related to protein fraction associated with translational proteins | | `tps_0` | Translational protein fraction at zero growth rate. [g_protein/g_CDW] | | `tps_mu` | Change in translational protein fraction per unit change of the associated reaction. | | `mol_mass` | Molar mass of the translational enzymes. [kDa] | #### UnusedEnzyme sheet | **Parameter** | **Description** | |---------------------|---------------------------------------------------------------------------------| | `id_list` | Identifier related to protein fraction associated with the unused enzyme sector | | `ups_0` | Unused enzyme fraction at zero growth rate. [g_protein/g_CDW] | | `ups_mu` | Change in unused enzyme fraction per unit change of the associated reaction. | | `mol_mass` | Molar mass of unused enzymes. [kDa] | --- ## Building a PAM ### Steps to create a PAM 1. Prepare the genome-scale model and parameter file. 2. Optional: change defaults in the Config object to match your model identifiers 3. Use the `set_up_pam` function to initialize the model. 4. Optimize the PAM #### Example usage for *E. coli* The model defaults are set to generate a PAM for the iML1515 model of *Escherichia coli* K-12. ```python from PAModelpy.utils.pam_generation import set_up_pam #1. Define input paths model_path = "Models/iML1515.xml" param_file = "Data/proteinAllocationModel_iML1515_EnzymaticData_new.xlsx" #2. Config is not required for iML1515 #3. Build the PAM pam = set_up_pam(pam_info_file=param_file, model=model_path, total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw) active_enzymes=True, translational_enzymes=True, unused_enzymes=True, sensitivity=True, adjust_reaction_ids=False) #4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h pam.optimize() print(f"Objective Value: {pam.objective.value}") ``` #### Example usage for other microorganisms The model defaults have to be adapted to the identifiers for another microorganism. In case of a model in the BiGG namespace, only the biomass reaction has to be adapted (which is the case for e.g. [iJN1463](https://onlinelibrary.wiley.com/doi/abs/10.1111/1462-2920.14843), the genome-scale model for *Pseudomonas putida* KT2440). For some other models, the entire namespace has to be adapted (e.g. for the [Yeast9](https://www.embopress.org/doi/full/10.1038/s44320-024-00060-7) model of *Saccharomyces cerevisiae*). This can be accomplished using a [`Config`](api_reference/configuration.md) object. ##### Example for BiGG namespace: *P. putida* iJN1463 Please note that this is only an example, both the model, as the parameter file do not exist in this repository. ```python from PAModelpy.utils.pam_generation import set_up_pam from PAModelpy import Config #1. Define input paths model_path = "Models/iJN1463.xml" param_file = "Data/proteinAllocationModel_iJN1463_EnzymaticData.xlsx" #2. Change biomass reaction id in config config = Config() config.BIOMASS_REACTION = 'BIOMASS_KT2440_WT3' #3. Build the PAM pam = set_up_pam(pam_info_file=param_file, model=model_path, config=config, total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw) active_enzymes=True, translational_enzymes=True, unused_enzymes=True, sensitivity=True, adjust_reaction_ids=False) #4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h pam.optimize() print(f"Objective Value: {pam.objective.value}") ``` ##### Example for non-BiGG namespace: *S. cerevisia* Yeast9 Please note that this is only an example, both the model, as the parameter file do not exist in this repository. ```python from PAModelpy.utils.pam_generation import set_up_pam from PAModelpy import Config #1. Define input paths model_path = "Models/Yeast9.xml" param_file = "Data/proteinAllocationModel_yeast9_EnzymaticData.xlsx" #2. Change all the reaction ids in config and the protein regex config = Config() config.TOTAL_PROTEIN_CONSTRAINT_ID = "TotalProteinConstraint" config.P_TOT_DEFAULT = 0.388 # g_protein/g_cdw config.CO2_EXHANGE_RXNID = "r_1672" config.GLUCOSE_EXCHANGE_RXNID = "r_1714" config.BIOMASS_REACTION = "r_2111" config.OXYGEN_UPTAKE_RXNID = "r_1992" config.ACETATE_EXCRETION_RXNID = "r_1634" config.PHYS_RXN_IDS = [ config.BIOMASS_REACTION, config.GLUCOSE_EXCHANGE_RXNID, config.ACETATE_EXCRETION_RXNID, config.CO2_EXHANGE_RXNID, config.OXYGEN_UPTAKE_RXNID] config.ENZYME_ID_REGEX = r'(Y[A-P][LR][0-9]{3}[CW])' #3. Build the PAM pam = set_up_pam(pam_info_file=param_file, model=model_path, config=config, total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw) active_enzymes=True, translational_enzymes=True, unused_enzymes=True, sensitivity=True, adjust_reaction_ids=False) #4. Optimize to find the max growth rate at a substrate uptake rate of -10 mmol/gcdw/h pam.optimize() print(f"Objective Value: {pam.objective.value}") ``` ##### A short note on enzyme identifiers The Config object has an entry which enables the framework to find and extract enzyme identifiers from the PAModel: ```python Config.ENZYME_ID_REGEX = r'(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})' ``` This is the default [regular expression](https://www.uniprot.org/help/accession_numbers) to find UniProt identifiers, as derived from the UniProt database (obtained 2024-08-07). Two other default placeholder identifiers (`E1`, `E10`, `Enzyme_GLC_D`, etx) are always included in any regex search with the PAModel: ```python default_enzyme_regex = r'E[0-9][0-9]*|Enzyme_*' ``` In case you would like to use other placeholders or another form of protein identifiers, please adapt the `Config.ENZYME_ID_REGEX` attribute with the proper regular expression. --- ## Testing the setup To verify the correctness of the PAM generation process, run the tests provided in `tests/unit_tests/test_utils/test_pam_generation.py`: ```bash python -m pytest tests/unit_tests/test_utils/test_pam_generation.py.py ``` --- ## Additional features ### Increasing kcat values The script includes a utility to scale up kcat values in the parameter file: ```python from PAModelpy.utils.pam_generation import increase_kcats_in_parameter_file increase_kcats_in_parameter_file( kcat_increase_factor=2, pam_info_file_path_ori="Data/old_param_file.xlsx", pam_info_file_path_out="Data/new_param_file.xlsx" ) ``` ### Generate enzyme-complex identifiers from peptide ids (e.g. uniprot annotation) This helps with setting up a novel parameter file using identifiers obtained from uniprot and a gene-to-protein mapping ```python from PAModelpy.utils.pam_generation import get_protein_gene_mapping, merge_enzyme_complexes from cobra.io import read_sbml_model model = read_sbml_model('Models/iML1515.xml') enzyme_db = "Data/old_param_file.xlsx" protein2gene, gene2protein = get_protein_gene_mapping(enzyme_db, model) # Ensure the enzyme complexes are merged on one row eco_enzymes_mapped = merge_enzyme_complexes(enzyme_db, gene2protein) ``` ### Saving the PAM Saving the PAM is not often required, as all the parameters are stored in Excel format and can thus be easily modified. COBRApy's tools for saving models DO NOT WORK for saving PAMs: in fact the generated models won't be executable. In case you do want to store a PAM for later use, you can store it using pickle, provided the underlying metabolic model is picklable.. ```python import pickle pickle.dump(pam, "path/to/model.pickle") ``` Does pickling not work? Check if one of the reactions or metabolites has LP standard language (such as 'St') as id. ### Flux Variability Analysis for Enzymes Similar to the FVA from COBRApy, PAModelpy has implemented a FVA. You can provide a object type, which will be varied. In this manner, you can perform FVA on proteins. Please note that the enzyme variables in the model are corrected for the feasibility tolerance with a factor of 1e6 and that the computed units is in mmol/gCDW. To get interpretable units, unit corrections are required. The result is a dataframe with the minimum and maximum value of the variable which was studied. ```python from PAModelpy import Enzyme from PAModelpy.flux_analysis import flux_variability_analysis other_variables_to_analyze = [pam.reactions.get_by_id(rid) for rid in ['TPI', 'ACONTa', 'EX_ac_e']] fva_results = flux_variability_analysis(pam, variable_type=Enzyme, variable_list = other_variables_to_analyze #optional, provide a list with other variables to perform fva on which are not enzymes ) ``` ### Recombinant protein overexpression Example for eGFP overexpression. ```python import os from PAModelpy import Enzyme from PAModelpy.utils.recombinant_protein_expression import (read_sequence_from_file, add_recombinant_intracellular_protein_to_pam) aa_seq = read_sequence_from_file(os.path.join('Data', 'eGFP_protein_sequence.txt')) gfp_enzyme = Enzyme('eGFP', {}, molmass=2.8*1e4) protein_production_reaction = add_recombinant_intracellular_protein_to_pam(pam, protein=gfp_enzyme, aa_seq=aa_seq) ``` You can also determine protein export. In this case, ust adding and exporting a protein is not enough, as the exported protein requires ribosomes. In the normal PAM, the translational sector accounts for all ribosomes to make proteins for enzymes inside the cell. If you want to generate more proteins you need to allocate ribosomes specifically to make the exported protein. To determine the amount of ribosomes needed to make one mmol/gCDW of exported recombinant protein, a reverence growth rate is required. ```python import os from PAModelpy import Enzyme from PAModelpy.utils.recombinant_protein_expression import (read_sequence_from_file, add_recombinant_protein_production_and_export,) protein_production_reaction = add_recombinant_protein_production_and_export(aa_txt_file=os.path.join('Data', 'eGFP_protein_sequence.txt'), pam=pam, protein_name='eGFP', molecular_weight=2.8*1e4, reference_growth_rate=0.5) ``` --- ## Troubleshooting - **Issue: Missing reaction in enzyme database** Solution: Ensure all reactions in the model are represented in the parameter file or adjust `reaction_ids` using `adjust_reaction_ids=True`. - **Issue: Objective value is zero after optimization** Solution: Check the input parameter file for consistency and ensure that all reactions are correctly annotated. - **Issue: The PAM can be pickled, but not unpickled** Solution: Most likely one of the reaction in the metabolic models has a name like 'St'. Renaming this to something which is not part of the standard LP language will solve the issue.