# Example usage of PAModelpy ## Example 1: setting up an *Escherichia coli* Protein Allocation model (PAM) *Escherichia coli* (*E.coli*) is a commonly used model organism in Microbiology. When this microorganism is grown on increasing glucose concentration, it shifts from a purely respiratory metabolism to a respiro-fermentative metabolic phenotype. This phenomenon is called 'overflow metabolism'. Interestingly, overflow metabolism cannot be simulated using normal genome-scale models without additional constraints. With properly parametrized protein-constrained models however, we are able to simulate this metabolic phenotype. In this example, we'll set-up the *E.coli* PAM, and we'll study the predicted metabolic phenotypes for a range of glucose uptake rates. For this entire tutorial, you'll need to load the following packages: ```python #importing the packages import os from cobra.io import read_sbml_model import pandas as pd #load PAMpy modules from PAModelpy.EnzymeSectors import ActiveEnzymeSector, TransEnzymeSector, UnusedEnzymeSector from PAModelpy.PAModel import PAModel from PAModelpy.PAMValidator import PAMValidator from PAModelpy.configuration import Config from PAModelpy.utils import (merge_enzyme_complexes, parse_reaction2protein, _get_genes_for_proteins) ``` ### Step 1: Initiate the protein sectors Each protein-allocation model has three sectors: active enzyme sector (enzymes catalyzing the metabolic reactions), translational protein sectors (i.e. ribosomal proteins required for translation) and unused proteins (idle proteins which help the cell adapt to new conditions). The total of these three sectors is limited by an upperbound. This upperbound is determined by the sum of all non-maintenance enzymes, which is assumed to be constant for prokaryotes. For examples on how to parametrize these sectors, refer to `Scripts/create_ecolicore_pam_inclUE.ipynb`. - **Important note**: This tutorial aims to teach you how to build a PAM from scratch, so that you can adapt the sectors where needed and you get to understand the logic of the PAModelpy package. However, in the `utils` toolbox, there is a dedicated method for building a pam with the following syntax: ```python def set_up_pam(pam_info_file:str = '', model:Union[str, cobra.Model] = 'Models/iML1515.xml', config:Config = None, total_protein: Union[bool, float] = True, active_enzymes: bool = True, translational_enzymes: bool = True, unused_enzymes: bool = True, sensitivity:bool = True, enzyme_db:pd.DataFrame = None, adjust_reaction_ids:bool = False) -> PAModel ``` You can call this function as follows: ```python from PAModelpy.utils import set_up_pam import os pam = set_up_pam(os.path.join('Data', 'proteinAllocationModel_iML1515_EnzymaticData_py_uniprot.xlsx'), os.path.join('Models', 'iML1515.xml')) ``` More information can be found in the :ref:`PAM_setup_guide` #### 1.1: Active enzyme sector The active enzyme sector will be build using information about which enzymes catalyzes a specific reaction, the turnover rate of the catalysis and the molar mass of the enzyme. In this example we'll use the parameters as published by [Alter et al. (2021)](https://journals.asm.org/doi/10.1128/mSystems.00625-20), which can be found in the `Data` folder of the PAModelpy repository First, we'll define the paths we'll download the data ```python protein_sector_info_path = 'Data/proteinAllocationModel_iML1515_EnzymaticData_py_uniprot.xlsx' active_enzyme_data = pd.read_excel(protein_sector_info_path, sheet_name='ActiveEnzymes')) ``` The data is now in a dataframe with the following columns: ` rxn_id - gpr - gene - enzyme_id - molMass ` We need to collect the data from this table and put it in the correct structure to be parsed into the ActiveEnzymeSector object. The main input in this object is the rxn2protein dictionary, where all the information about protein-reaction associations required to build the protein-reaction relations in the model. It has the following format: ```json { "R1": { "E1": { "f": "forward kcat", "b": "backward kcat", "molmass": "molar mass", "protein_reaction_relation": [["E1"]] }, "E2": { "f": "forward kcat", "b": "backward kcat", "molmass": "molar mass", "protein_reaction_relation": [["E1"]] } } } ``` If you have a information about the gene-protein-reaction associations (e.g. 'AND'/'OR' relations between different peptides/proteins for one or more reactions), this information can be added in the `protein_reaction_relation` entry of the reaction2protein dictionary. This entry is a list of lists, in which each sublist represent one functional enzyme (complex). This means if E1 and E2 catalyze the same reaction, the `protein_reaction_relation` becomes `[['E1','E2']]` for an enzyme complex ('AND' relation), and `[['E1']['E2']]` for isozymes ('OR' relation). In this example we will use the peptide ids as defined by [UniProt](https://www.uniprot.org/). The [paper introducting sEnz](https://doi.org/10.1093/bioinformatics/btae691) uses a different system, based on EC numbers. How to build those PAMs can be found in the scripts associated to the publication and in the `Script/pam_generation.py` file. Now we will use gene-protein-reaction relations obtained from a genome-scale model and uniprot to include different enzyme relations. Fortunately, in the `utils` directory, there are functions available which help you parse the information to the reaction2protein and protein2gene dictionaries. This also includes a gapfilling step which assigns a 'dummy' identifier to each reaction which is not associated with an enzyme id. Before we start we have to ensure each row contains one 'catalytical unit', e.g. a single enzyme or enzyme complex which is able to catalyze a reaction. This means that we have to parse the dataframe. For this, we can use some tools provided in the `utils` directory. ```python #load the genome-scale information model = read_sbml_model(os.path.join('Models', 'iML1515.xml')) #get the mapping between the protein and genes protein2gene, gene2protein = _get_genes_for_proteins(active_enzyme_data, model) #parse the dataframe active_enzyme_per_cu = merge_enzyme_complexes(active_enzyme_data, gene2protein) reaction2protein, protein2gpr = parse_reaction2protein(active_enzyme_data, model) #Use the mapping between reactions and proteins to generate the active enzymes sector active_enzyme_sector = ActiveEnzymeSector(rxn2protein=rxn2protein, protein2gene=protein2gene) ``` #### 1.2: Translational protein sector The translational sector requires less parameters. We only need to define the reaction to which this section is proportional, (for example the biomass pseudoreaction or substrate uptake rate), defined by `id_list`, and the slope (`tps_mu`) and intercept (`tps_0`) of this relation. ```python translational_info = pd.read_excel(protein_sector_info_path, sheet_name='Translational') id_list = [translational_info[translational_info.Parameter == 'id_list'].loc[0,'Value']] translation_enzyme_sector = TransEnzymeSector(id_list=id_list, tps_0=[translational_info[translational_info.Parameter == 'tps_0'].loc[1,'Value']], tps_mu=[translational_info[translational_info.Parameter == 'tps_mu'].loc[2,'Value']], mol_mass=[translational_info[translational_info.Parameter == 'mol_mass'].loc[3,'Value']]) ``` #### 1.3 Unused enzyme sector The unused enzyme sector is defined in a very similar way as the translational protein sector. We assume that this section is absent when the microbe is growing at it's highest growth rate. We'll use this assumption to define the slope: ```python unused_protein_info = pd.read_excel(pam_info_file, sheet_name='ExcessEnzymes') ups_0 = unused_protein_info[unused_protein_info.Parameter == 'ups_0'].loc[2,'Value'] smax = unused_protein_info[unused_protein_info.Parameter == 's_max_uptake'].loc[1,'Value'] id_list =[unused_protein_info[unused_protein_info.Parameter == 'id_list'].loc[0,'Value']] unused_protein_sector = UnusedEnzymeSector(id_list=id_list, ups_mu=[ups_0/smax], ups_0=[ups_0], mol_mass=[unused_protein_info[unused_protein_info.Parameter == 'mol_mass'].loc[3,'Value']]) ``` ### Step 2: Building the model Now we are ready to build the model! We'll need to determine a maximal protein concentration. Following [Alter et al. (2021)](https://journals.asm.org/doi/10.1128/mSystems.00625-20), let's take 0.258 mmol/gcdw/h. As a basis, we'll use the iML1515 model, created by [Monk et al. (2017)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6521705/). NB: for the more advanced users who want to run the sensitivity analysis, please ensure that the `sensitivity` argument is set to `True`. If you are not interested in the sensitivity analysis, you can set `sensitivity` to `False`, which will speed up the computation time. ```python #load the genome-scale information model = read_sbml_model(os.path.join('Models', 'iML1515.xml')) #load the PAM with the genome-scale information and the information about the enzyme sectors pamodel = PAModel(id_or_model=model, p_tot=0.258, active_sector=active_enzyme_sector, translational_sector=translation_enzyme_sector, unused_sector=unused_protein_sector, sensitivity =True ) ``` ### Step 3: run and validate the model results To see if the PAM is working we can run some dummy simulations. Also, the PAMValidator module has functions which allow for easy visualization of the model predictions vs measured experimental data. ```python pamodel.test() ``` This is a simulation with a glucose uptake rate set to 10 mmol/gcdw/h. We can easily change to a different substrate uptake rate, e.g. 5 mmol/gcdw/h by putting that in as an function argument ```python pamodel.test(5) ``` In the PAMValidator object, you can find functions to run simulations over a range of substrate uptake rates. To initiate the PAMValidator, you need to provide the model and a path to an excel file with experimental data. In this example,we'll use the experimental data which can be found in `Data/Ecoli_phenotypes/Ecoli_phenotypes_py_rev.xls`. ```python from PAModelpy.PAMValidator import PAMValidator validator = PAMValidator(pamodel,'Data/Ecoli_phenotypes/Ecoli_phenotypes_py_rev.xls') #model flux rates of biomass formation, acetate, CO2 and O2 vs glucose uptake rate for a range of growth rates validator.validate_range() ``` Alternatively, you can run simulations in a good old-fashioned for-loop. For examples on how to do that look at the jupyter notebooks in the `Figures` directory. ### Step 4: interpreting the results What does the result tell you? What is the predicted metabolic phenotype and how does this relate to the experimental results. Did the model capture overflow metabolism? ### Outlook After this tutorial, you know how to apply PAModelpy to this very well-studied *E.coli* example. But how to address you're specific issue with your specific microbe? In the next example we'll show you how to use the Config() object to give the PAModel the right naming conventions for your specific microbe. Are you more interested in performing modifications to the model, such as deleting or adding enzymes, changing kcats, changing enzymes upper- and lowerbounds? Then have a look at the following jupyter notebook: `Examples/PAModel_example_script.ipynb`. Have fun! ## Example 2: working with the *Escherichia coli* Protein Allocation model (PAM) In Example 1, a detailed step-by-step guide for building a PAM is given. In this example we use methods available in PAModelpy to build a PAM and give you an overview of things you can do with the PAM once it is build. ### 1. Building the PAM using PAModelpy.utils In the example above, you've learned that there are quite some steps required to build a PAM. However, in PAModelpy, there are methods available which help you do this. For these methods, you need a specific way of structuring the information about the enzymes sectors. This makes it easier to automate generation of PAMs. The details are discussed in the [PAM_setup_guide](PAM_setup_guide.md). For this example, we use the methods describes in the [PAM_setup_guide](PAM_setup_guide.md) to build a pam. ```python from PAModelpy.utils import set_up_pam # Define input paths model_path = "Models/iML1515.xml" param_file = "Data/proteinAllocationModel_iML1515_EnzymaticData_new.xlsx" # Build the PAM pam = set_up_pam(pam_info_file=param_file, model=model_path, total_protein=0.258, # Optional: Total protein concentration (g_prot/g_cdw) active_enzymes=True, # Do you want to include an active enzyme sector? translational_enzymes=True, # Do you want to include a translational protein sector? unused_enzymes=True, # Do you want to include an unused enzyme sector? sensitivity=False, # Do you want to perform a sensitivity analysis? adjust_reaction_ids=False) # Does your model ignore suffixes like 'pp' and 'ex'? (ofter the case in core models) ``` ### 2. Modifying the pam Here are some example modifications to the PAM #### Changing kcat values For changing a kcat value, you need to provide the enzyme-reaction-direction mapping. For example, you want to change the kcat relating the forward reaction of `13PPDH2` with the enzyme abundance of `Q46856`: ```python rxn2kcat = {'13PPDH2':{'f':10}} pam.change_kcat_value(enzyme_id = 'Q46856', kcats = rxn2kcat) ``` NB: ensure your kcat values are in the right units (1/s). The change_kcat_value function will convert the kcat values to the model units for you. #### Changing enzyme concentrations ```python enzyme = pam.enzymes.get_by_id('Q46856') enzyme.concentration = 0.1 #mmol/gCDW ``` #### Changing the enzyme sector linear equations If you want to do modifications to the linear relations of a sector, you can use the buiild-in function to do this. This is for example usefull when you are changing carbon source and want to modify the translational sector. The units of the intercept should be in g_protein/g_CDW. The units of the slope*lin_rxn should yield g_protein/g_CDW. ```python translation_sector = pam.sectors.get_by_id('TranslationalProteinSector') pam. change_sector_parameters(sector = translation_sector, slope = 0.1, #in this case: g_p*h/(g_cdw*mmol_glc) intercept=0.1, # g_p/g_cdw lin_rxn_id='EX_glc__D_e', #optional: change the reaction to which the sector is related print_change = True #do you want to see the change? False by default ) ``` The output will look like: `Changing the slope and intercept of the TranslationalProteinSector` `Changing slope from to 10 mg/gcdw/h"` `Changing intercept from to 10 mg/gcdw` #### Changing the total protein content You can change the total protein content available for the enzyme sectors (in grams_protein/g_CDW) ```python pam.change_total_protein_constraint(p_tot=0.3) ``` #### Changing reaction bounds In order to calculate sensitivities, the mathematical structure with which the reaction bounds are defines are altered. It is therefore recommendable to use the build-in functions to change reaction bounds. Besides being more robust, this is also more straightforward. The units are similar to the model units (mmol/gCDW/h). ```python pam.change_reaction_bounds(rxn_id='13PPDH2', lower_bound= 0.1, upper_bound=0.11) ``` #### Toggling sensitivity For some applications you need a sensitivity analysis, for some you don't. As the sensitivity analysis can slow the computation down, it is possible to toggle the sensitivity on and off between simulations: ```python pam.sensitivity = True #sensitivity is on ``` ### 3. Perfoming simulations with the PAM If you are happy with the PAM, you off course want to perform simulations! With the `change_reaction_bounds` function you can change the bounds of for example the biomass production reaction or the substrate uptake rate. Changing the objective can be done in the same way as you are used to with a [cobrapy model](https://cobrapy.readthedocs.io/en/latest/building_model.html#Objective). The only thing you have to do is optimize! ```python solution = pam.optimize() ``` ### 4. Getting the results of the simulations The solution which is returned by the `optimize()` function is the same solution object as returned by a [cobrapy model](https://cobrapy.readthedocs.io/en/latest/simulating.html). This solution object only includes the metabolic reaction fluxes and the objective value, and NOT the enzyme concentration and sensitivities. To access the enzyme concentrations you can use the attribute `enzyme.concentration`. If you enabled the sensitivity analysis, you can access the capacity sensitivity coefficients and enzyme sensitivity coefficients after solving the model. ```python csc = pam.capacity_sensitivity_coefficients #pd.DataFrame with columns: ["rxn_id", "enzyme_id", "constraint", "coefficient"] esc = pam.enzyme_sensitivity_coefficients #pd.DataFrame with columns: ["rxn_id", "enzyme_id", "coefficient"] ``` ### 5. Saving the PAM Saving the PAM is not often required, as all the parameters are stored in Excel format and can thus be easily modified. COBRApy's tools for saving models DO NOT WORK for saving PAMs: in fact the generated models won't be executable. In case you do want to store a PAM for later use, you can store it using pickle, provided the underlying metabolic model is picklable.. ```python import pickle pickle.dump(pam, "path/to/model.pickle") ``` Does pickling not work? Check if one of the reactions or metabolites has LP standard language (such as 'St') as id. ## Example 3: Determining the most sensitive enzymes in a toy model When looking at the flux distribution resulting from our simulations, we do not get any information about which enzymes played an important role in prediciting the specific metabolic phenotype. However, with the right model configurations, we get the sensitivity of the objective function to slight changes in the enzyme availability (enzyme sensitivity coefficients, ESC) as a result from the model simulations. In this example we'll use a toy model to illustrate how these sensitivities can help us explain concepts of protein allocation. ![toy_model_image](assets/toy-model.png)
**Figure 1. Toy model network and parameters** *This toy model represents a schematic overview of a microbial metabolism, with an energy efficient (R1-R2-R4+R5-R6-R7) and an enzyme efficient (R1-R2-R3+R5-R6-R7) pathway. Besides the enzymes catalyzing the reactions (denoted with an 'E') and corresponding catalytic efficiency (kcat), also the relation with the reactions and the enzyme sectors are given. UES: Unused Enzyme Sector, TES: Translational Enzyme Sector, AES: Active Enzyme Sector.*
First, all import statements you'll need in this example: ```python import numpy as np from cobra.io import load_json_model import plotly.express from PAModelpy.EnzymeSectors import ActiveEnzymeSector, TransEnzymeSector, UnusedEnzymeSector from PAModelpy.PAModel import PAModel from PAModelpy.PAMValidator import PAMValidator from PAModelpy.configuration import Config ``` ### Step 1: Build the toy model Obviously, we first have to build the toy model. To make it easy, we have provided the toy model structure in a .json file in the `Models` directory. As the PAModelpy package makes working with real-life data easy, it performs units conversions to some inputs. For example, the kcat value is normally published in per sec, while we need per hour in our calculations. Furthermore, some inputs are scaled in order to decrease the order of magnitude difference between the variables. When we want to use 'dummy' data in a toy model, we need to take this into account. But before we start building the model, we need to be aware of one thing: the PAModel object assumes you want to analyse the *E.coli* iML1515 model by default. How can we make the model aware that we are using another model, and that we thus need other identifiers for substrate uptake rate, growth rate, etc? The Config object helps you with just that! You can use this object to configure all the identifiers you need. Don't forget to pass this object to all the PAModel objects you'll initialize, so all the information is passed on! ```python config = Config() config.BIOMASS_REACTION = 'R7' config.GLUCOSE_EXCHANGE_RXNID = 'R1' config.CO2_EXHANGE_RXNID = 'R8' config.ACETATE_EXCRETION_RXNID = 'R9' ``` With these defaults defined, we can start building our model. ```python nmbr_reactions = 9 # Building Active Enzyme Sector kcat_fwd = [1, 0.5, 1, 1, 0.5 ,0.45, 1.5] kcat_rev = [kcat for kcat in kcat_fwd] rxn2kcat = {} for i in range(nmbr_reactions-3): # all reactions have an enzyme, except excretion reactions rxn_id = f'R{i+1}' # 1e-6 to correct for the unit transformation in the model (meant to make the calculations preciser for different unit dimensions) #dummy molmass rxn2kcat = {**rxn2kcat, **{rxn_id: {f'E{i+1}':{'f': kcat_fwd[i]/(3600*1e-6), 'b': kcat_rev[i]/(3600*1e-6), 'molmass': 1e6, 'protein_reaction_relation': [[f'E{i+1}']]} }}} active_enzyme = ActiveEnzymeSector(rxn2protein = rxn2kcat, configuration=config) # Building Tranlational Protein Sector translation_enzyme = TransEnzymeSector(id_list = ['R7'], tps_mu=[0.01*1e-3], tps_0=[0.01*1e-3], mol_mass= [1], configuration=config) # Building Unused Enzyme Sector unused_enzyme = UnusedEnzymeSector(id_list = ['R1'], ups_mu=[-0.01*1e-3], ups_0=[0.1*1e-3], mol_mass= [1], configuration=config) # Building the toy_pam model = load_json_model('Models/toy_model.json') toy_pam = PAModel(model, name='toy model MCA with enzyme constraints', active_sector=active_enzyme, translational_sector = translation_enzyme, unused_sector = unused_enzyme, p_tot=0.6*1e-3, configuration=config) ``` ### Step 2: Perform the model simulations With the model in place, we can start our analysis. Since we are interested in which enzymes are important in different metabolic phenotypes, we want to run simulations over a range of growth rates. After each simulation we need to retrieve and store the enzyme sensitivity coefficients, so we can study them. We also will save the capacity sensitivity coefficients, which will give us information about which factor is limiting metabolism (substrate or enzyme availability). We directly save all the information we need later for plotting. ```python substrate_axis = list() Ccsc = list() Cesc = list() x_axis_csc = list() mu_list = list() for substrate in list(np.arange(1e-3, 1e-1, 1e-2)): toy_pam.change_reaction_bounds(rxn_id='R1', lower_bound=0, upper_bound=substrate) toy_pam.optimize() if toy_pam.solver.status == 'optimal' and toy_pam.objective.value>0: print('Running simulations with ', substrate, 'mmol/g_cdw/h of substrate going into the system') substrate_axis += [substrate] mu_list += [toy_pam.objective.value] Ccsc_new = list() for csc in ['flux_ub', 'flux_lb', 'enzyme_max', 'enzyme_min', 'proteome', 'sector']: Ccsc_new += toy_pam.capacity_sensitivity_coefficients[toy_pam.capacity_sensitivity_coefficients['constraint'] == csc].coefficient.to_list() Ccsc += [Ccsc_new] Cesc += [toy_pam.enzyme_sensitivity_coefficients.coefficient.to_list()] print('Sum of capacity sensitivity coefficients: \t \t \t \t \t \t \t ', round(sum(Ccsc_new),6)) print('Sum of variable sensitivity coefficients: \t \t \t \t \t \t \t ', round(sum(Cesc[-1]), 6), '\n') for csc in ['flux_ub', 'flux_lb', 'enzyme_max', 'enzyme_min', 'proteome', 'sector']: if csc == 'flux_ub' or csc == 'flux_lb': x_axis_csc += [rid +'_' + csc for rid in toy_pam.capacity_sensitivity_coefficients[toy_pam.capacity_sensitivity_coefficients['constraint'] == csc].rxn_id.to_list()] else: x_axis_csc += [rid +'_' + csc for rid in toy_pam.capacity_sensitivity_coefficients[toy_pam.capacity_sensitivity_coefficients['constraint'] == csc].enzyme_id.to_list()] x_axis_esc = toy_pam.enzyme_sensitivity_coefficients.enzyme_id.to_list() ``` ### Step 3: Plot the enzyme and capacity sensitivty coefficients heatmaps By plotting our results, we learn which individual reactions and enzymes contribute the most to which metabolic phenotype. ```python def print_heatmap(xaxis, matrix, yaxis = None): if yaxis is None: yaxis = list() for i in range(1, n + 1): yaxis += [f'R{i}'] fig = plotly.express.imshow(matrix, aspect="auto", x = xaxis, y = yaxis, labels = dict(x = 'sensitivity coefficients', y='substrate uptake')) fig.show() print_heatmap(x_axis_csc, Ccsc, yaxis=substrate_axis) print_heatmap(x_axis_esc, Cesc, yaxis=substrate_axis) ``` ### Step 4: Interpret the results Compare the toy model network structure :ref:`toy_model_image` with the results from the heatmap. Did you expect these results? Do they make sense? Which mechanisms to explain these observations. If the observations are not inline with you're expectations, you can use the enzyme sensitivities to point to the enzymatic parameters which might need to be adjusted (in this dummy example this makes no sense off course, but in reality this is a very plausible outcome). ### Outlook This tastes like more? In our publication we use the sensitivity analysis to explain metabolic phenotypes and to pinpoint genetic engineering examples. In the `Figures` folder you can find the code we used to generate these results.