Skip to content

bgc-mod/MCoM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MCoM

The Microbial Community Model

MCoM serves as a platform to model interactions between phototrophic and heterotrophic organisms. It models the impact of phototrophs (P) that release organic carbon and nitrogen compounds (DOC, DON), which are taken up by heterotrophs (H) that release inorganic nitrogen (N) and metabolites (M) that impact the net growth (i.e. mortality) of P. The "interaction network" is specified with scalable matrices. We are using Python for simulation setup, result analysis and plotting, and Julia for the simulation runs. MCoM is developed and tested in a Linux environment and instructions below assume the user is working in a bash terminal.

Installation

Requirements:

  • git
  • python > 3.7
  • Julia >= 10.0.3

Clone repository:

cd <path-to-workspace>
git clone https://github.com/bgc-mod/MCoM.git mcom

Set environment variable MCOM_HOME to the git directory (probably, you want to add this to your bash configuration in $HOME/.bashrc):

export MCOM_HOME=<path-to-workspace>/mcom

Register mcom package as development package:

cd $MCOM_HOME
julia
julia> using Pkg
julia> Pkg.develop(path=ENV["MCOM_HOME"]*"/src/mcom.jl")

Install Python Dependencies (we recommend to install these in a virtual environment - see below):

pip install -r requirements.txt

Quickstart

Run a test run with:

cd $MCOM_HOME
python src/runner.py -c configs/test.toml

This runs the simulation, creates an output directory $MCOM_HOME/output/run_test, and saves results there. If it already exists, a backup of the existing data is made within $MCOM_HOME/output/backups (only the previous run is backed up for each run-ID). In the output directory, you should find different subdirectories:

  • log: contains stdout-log, a copy of the parameter file, and the instantiated community interaction network
  • figs: contains plots of the time course of different system components (see plot option, below)
  • dat: contains the time series of the system components (see options dt_out and save_offset, below)

To use other than the test parameters, duplicate one of the configuration files in the folder $MCOM_HOME/configs (with extension ".toml"). Then edit the parameters in the file for your purposes. At least, you should provide a new run ID (entry runid in the [run] section of the parameter file). For a full reference of the configurable parameters, see section Model and run configuration below. By default, the simulation output is stored in a directory <OUTDIR>/run_<runid>. Here, OUTDIR defaults to $MCOM_HOME/output, but can also be specified via the -o option of runner.py.

To start the simulation, run

cd $MCOM_HOME
python src/runner.py -c <your-new-config-file> -o <OUTPUT_BASEDIR>

Here, the option -o permits to specify a base directory for the output files (replacing the default destination $MCOM_HOME/output).

See below if you want to use MCoM from a python script.

Model and run configuration

MCoM configuration files use the TOML format. See the TOML-files in the directory $MCOM_HOME/configs for examples. The are structured into several sections (the beginning of each section is indicated by the header [<section-name>]). Each section defines a number of parameters:

Parameter section [run]

General options for configuration of the simulation run

  • runid: Name for the run. This defines the sub-directory of the output directory, where data generated by the run is stored. (String)
  • system_seed: Random seed for system generation, i.e., for parameter distributions and network connectivity. (Integer)
  • dynamics_seed: Random seed for stochasticity in dynamics. Currently only used for noise in nitrogen-supply. (Integer)
  • t_end: Simulation end time [days]. (Float)
  • dt: Integration step width [days]. (Float)
  • dt_out: Interval between saved system states [days]. (Float)
  • save_offset: Time, after which saving the state to disc starts [days]. (Float, default: 0.0)
  • recompute: Whether to load simulation results if these exist. (Bool, default: True) This only checks for the existence of the output directory. If the directory exists but the data is missing or corrupted, this will fail.
  • plot: Controls the plotting the julia results (figure files will be written to <OUTDIR>/run_<runid>/figs). Can be given as Boolean (default:false) or as Dict - in this case:
    • plot.ext: determines graphics format (String, default: "svg")
    • plot.logscale: triggers plotting additional figures with logscale y-axis (List of strings, e.g. ["P","H"], default: [])
    • plot.minlog: sets the lower y-axis limit to the given value for logscale figures of the specified components (dict: String → Float, default:{}, example: {P = -4.0, H = -10.0})
  • only_generate_network: Whether to just generate the network and not perform the simulation. (Bool, default: False)
  • verb: Verbosity level (Integer) or list [tstart, tend, level] (Float, Float, Int) defining a temporal interval, where the run should be at given verbosity level. Outside [tstart, tend], verbosity is set to 0.
  • debug: Flag to turn on additional sanity checks at runtime. (Bool, default:false)
  • scan.npoints: Specify the number of points for a scan grid. (Integer > 0)
  • scan.<PARAMETER_NAME>: Specify a scan range for the parameter <PARAMETER_NAME>. For a scalar variables, provide the specification in form of a pair [minval, maxval], for a ranged variable as a pair of ranges [[(minminrange, minmaxrange)], [(maxminrange, maxmaxrange)]]. For the scan, the parameter is varied linearly along scan.npoints points within the specified range. (tuple of range-limits and number of gridpoints, default: None)
  • scan.reseed: Whether to generate the network for each parameter value, each time using a new seed. To make a scan, which only varies the seeds, set this to true and specify a scan parameter with minval==maxval. (Bool, default: false)
  • scan.parallel: Whether the different runs of the scan shall be executed in parallel (Bool, default: False). The number of processes used for parallel computation is determined as minimum(scan.reseed, os.cpu_count()-1)

Parameter section [dims]

Dimensions of the different state components (all integer values).

  • P: Number nP of phytoplankton populations
  • H: Number nH of heterotrophic populations
  • DOC: Number nDOC of different DOC compounds
  • DON: Number nDON of different DON compounds
  • M: Number M of different metabolites

Parameter section [variant]

Model component activation flags, either true or false (might be removed in the future)

  • use_PI_curve: Whether to use PI-curve dependent photosynthesis. If not, the irradiance (see parameter section [environment]) directly determines the photosynthesis rate.
  • use_exudation: Whether phytoplankton species exudate nutrient-rich (high DON fraction) exudation in case of light limitation.
  • surplus_remineralization: Whether heterotrophs should take up DON at maximal rate always and remineralize surplus DON if not required for growth. (Default: true)
  • competing_pairs: Whether to consider a community of competing P-H-pairs. This implies that the interaction matrices are diagonals, i.e., that all state components have the same dimensions (nH == nP == nDOC == nDOC == nM).

Parameter section [network]

Specification of in- and out-degrees of P- and H-nodes in the community network. Each quantity can be specified as an integer or as a range of two integers. The latter induces the degree of the corresponding H- or P-node to be randomly drawn from a uniform distribution including the two given numbers.

  • exPDOC: Number of DOC exudates of a phytoplankton species.
  • exPDON: Number of DON exudates of a phytoplankton species.
  • exHDOC: Number of DOC exudates of a heterotroph species.
  • exHDON: Number of DON exudates of a heterotroph species.
  • upDOCH: Number of DOC compounds, which can be consumed by a heterotroph species.
  • upDONH: Number of DON compounds, which can be consumed by a heterotroph species.
  • prodHM: Number of metabolites produced by a heterotroph species.
  • effsMP: Number of metabolites affecting a phytoplankton species.

The connectivities can as well be specified explicitly, by providinge a list of lists that specifies for each species the indices of the connected nodes (these can be DOC-, DON-, or M-nodes). If the network connectivity is to be defined explicitly, the parameter explicit=true must be declared in this section. (This may change in the future to allow the explicit definition of individual components as for other parameters sections - see below.) Here's an example for the explicit network definition:

[network]
    explicit = true
    exPDOC = [[1,2], [3,4,5], [1,5]]
    ...

This specification of exPDOC means that the phytoplankton species P_1 exudes organic carbon compounds DOC_1 and DOC_2, species P_2 exudes compounds DOC_3, DOC_4, and DOC_5, and P_3 exudes compounds DOC_1 and DOC_5. This definition will only succeed if exactly three pytoplankton species exist (dims.P = 3). Analogously, the explicit lists for the other variables under [network] must contain exactly as many lists as there are species of the involved type (i.e. dims.P for all variables containing a P and dims.H for all variables containing an H).

Parameter section [init]

Initial values of state variables. All initial states are non-negative floating point numbers, which may be specified as a single value, a range defined as a list of two values, or, for vector-valued parameters (P, H, DOC, DON, and M), as an explicit list of the corresponding values. If a vector-valued parameter should be defined explicitly, this has to be declared by providing an additional variable wihtin this parameter section, e.g.

[init]
    explicit = ["P_init", "H_init"]
    P_init = [0.1, 0.2, 0.3]
    ... 

If a non-explicit parameter is defined by a range, its value (or values for vectors) is drawn from a uniform distribution within the specified interval. If a non-explicit vector-valued parameter is defined by a single value, its components are homogeneously initialized with that value. Initial states that should be specified are:

  • t_init: Initial time [days].
  • P_init: Initial phytoplankton population densities [10^6 cells/milliliter]
  • H_init: Initial heterotroph population densities [10^6 cells/milliliter]
  • DOC_init: Initial concentrations of DOC compounds [mM C]
  • DON_init: Initial concentrations of DON compounds [mM N]
  • M_init: Initial presence of metabolites [1/liter]
  • N_init: Initial concentrations of DIN [mM N]

Parameter section [environment]

Parameters describing the environmental conditions. Some expect constant values specific as floating point numbers, and some allow the specification as temporally varying functions (nutrient and irradiance at the moment). The possible specification schemes (value, range, and explicit) for vectors are the same as for parameters in section [init] (see above). However, the only vector valued parameter in this section is d_M. If it is to be provided explicitly, this must be declared by a parameter explicit = ["d_M"] within this section.

Floating point numbers:

  • D: Chemostat dilution rate [1/day].
  • d_M: Decay rate of metabolites [1/day].

Temporally varying quantities:

  • nutrient: Chemostat source (i.e., external) concentration of DIN [mM N]
  • irradiance: Irradiance [µmol Q / m² / s] (if variant.use_PI_curve=false, this directly parametrizes the photosynthesis rate in [mol C / mol Chl / day])

For each of the varying quantities the temporal fluctuations can be determined in the following form of the right hand side of the parameter definition:

  • <numeric value> or "constant(<numeric value>)": Defines the quantity as a constant.
  • "sinus(<amplitude>,<period=365>,<offset=0>)": Defines the quantity as a sinusoidal fluctuation between zero and the amplitude. The argument order is fixed, no argument keywords are allowed.
  • "pulse(<start_offset>,<pulse_duration>,<amplitude>,<period=365>)": Defines the quantity as a series of pulses of given duration and amplitude. The argument order is fixed, no argument keywords are allowed.
  • Any sum of the above. For instance, sinusoidal oscillations of the nutrient concentration between 1 and 2 could be defined as
nutrient = "1 + sinus(1)"

Parameter section [dynamics]

Parameters determining the population dynamics of P and H. All values are expected to be floating point numbers. Similar as described for the section [init], each parameter can be specified by a single value, a range of values, or an explicit definition (to be declared by adding explicit = ["<parname>", ...]). Importantly, this section contains not only scalar or vector parameters but also different types of matrices, of which each comes with its own semantics for randomized initialization:

  • Release matrices (R_PDOC, R_PDON, R_HDOC, R_HDON): These specify the fractional distribution of the total exuded DOM to different compounds. Therefore, they must be normalized along the compound index. The provided value or range specifies the fraction of the total exudation, which is uniformly distributed across all exuded compounds. The remaining part is distributed randomly, where the fractions assigned to individual compounds follow a Dirichlet distribution (uniform distribution on the simplex).

  • Rate matrices (V_DOCH, V_DONH, and P_HM): These specify the fractional distribution of a total rate over different produced or utilized compounds. The provided value or range specifies the total rate, which is randomly distributed across all exuded compounds with individual rates following a Dirichlet distribution (uniform distribution on the simplex).

  • Other matrices (A_MP, h_M, K_DONH, K_DOCH, Y_DOC, Y_DON): These are matrices, whose entries are drawn independently from a uniform distribution (or set to the provided value) analogous to the initialization of vector-valued parameters.

For all matrix types, only entries, which correspond to existing links (see [network] above) are set to non-zero values (others to zero, resp. to an ineffective value in case of half-saturation constants).

If parameter matrices are provided explicitly, they are to be provided as lists of lists. The i-th contained list corresponds to the parameters associated to the links, which connect to the i-th population. The length of the i-th list must equal the number of connected (DOC-, DON-, or M-)nodes (not their total number). The index-parameter-correspondence assumes ascendingly ordered indices.

The following definition of the release matrix R_PDOC would match the example of an explicit definition of the connectivity parameter exPDOC given for the [network]-section above.

[dynamics]
    explicit = ["R_PDOC", ...]
    R_PDOC = [[0.3, 0.7], [0.3, 0.2, 0.5], [0.4, 0.6]]

Here is the list of all parameters, that should be provided

Phytoplankton params

Vectors (nP-dimensional)
  • V_NP: Maximal nitrogen uptake rate [femto moles N / day / cell]
  • K_NP: Half-saturation constants for N uptake [mM N]
  • XP_C: Phytoplankton cell carbon content [femto moles C / cell]
  • d_P: Death rate [1/day]
  • d_P2: Quadratic death rate [1/(day*mM C)]
  • lambda: Metabolite induced death rate [1/day]
  • rCN_P: C:N-ratio [molC / molN]
  • rChlC: Chlorophyll:C-ratio [mol Chl / mol C]
  • q_ex: Maximal fraction of DOC-uptake released to allow DON-exudation in DOC-limited conditions. (dimensionless)
  • rCN_ex: Minimal C:N-ratio of exuded DOM. (molC / molN)
  • I_max: PI-curve parameter "maximal photosynthesis rate" (if no photoinhibition would take effect), PI-curve params are only effective if variant.use_PI_curve = true, units: [mol C / mol Chl / day]
  • I_alpha: Initial slope of the PI-curve [ mol C * m² / mol Chl / µmol Q / 86400]
  • I_beta: Photoinhibition coefficient [ mol C * m² / mol Chl / µmol Q / 86400]
Matrices
  • R_PDOC: Distribution of total exuded DOC to individual compounds. (list of nP lists containing release proportions of DOC compounds for the different phytoplankta, dimensionless, sums of proportions must equal one for each species)
  • R_PDON: Distribution of total exuded DON to individual compounds. (list of nP lists containing release proportions of DON compounds)
  • h_M: Half-saturation constants controlling the effect of metabolites on affected phytoplankton populations. (list of nP lists, [1/liter])
  • A_MP: Rate constants controlling the effect of metabolites on affected phytoplankton populations. (list of nP lists, [1/day])

Heterotroph params

Vectors (nH-dimensional)
  • V_NH: Maximal nitrogen uptake rate [femto moles N / day / cell]
  • K_NH: Half-saturation constants for N uptake [mM N]
  • XH_C: Phytoplankton cell carbon content [femto moles C / cell]
  • d_H: Death rate [1/day]
  • d_H2: Quadratic death rate [1 / day / mM C (biomass)]
  • rCN_H: C:N-ratio [mol C / mol N]
Matrices
  • Y_DOC: Yield constants, i.e. fractions of DOC-uptake assimilated into biomass. (list of nH lists, dimensionless)
  • Y_DON: Yield constants, i.e. fractions of DON-uptake assimilated into biomass. (list of nH lists, dimensionless)
  • V_DOCH: Maximal DOC uptake rates (list of nH lists, [femto moles C / day / cell])
  • V_DONH: Maximal DOC uptake rates (list of nH lists, [femto moles N / day / cell])
  • K_DOCH: Half-saturation constants for DOC-uptake. (list of nH lists, [mM C])
  • K_DONH: Half-saturation constants for DON-uptake. (list of nH lists, [mM N])
  • R_HDOC: Distribution of total exuded DOC to individual compounds. (list of nH lists containing release proportions of DOC compounds for the different heterotrophs, dimensionless, sums of proportions must equal one for each species)
  • R_HDON: Distribution of total exuded DON to individual compounds. (list of nH lists containing release proportions of DON compounds)
  • P_HM: Production effort (Proportion of population growth) of individual metabolites. (list of nH lists, dimensionless, row sums < 1)

Tests

We are using TextTest for integration tests. To run the tests, first install TextTest. It is recommended to set up a dedicated environment for this. I will provide the steps here for virtualenv but this should also work with other tools like conda for instance. You could also skip "Setup Virtual Environment" and directly go on at "Install TextTest" if you are not afraid of altering your python environment.

Setup Virtual Environment

First install virtualenv and prerequisites for the TextTest GUI:

sudo apt install virtualenv
sudo apt install libgirepository-2.0-dev gcc libcairo2-dev pkg-config python3-dev gir1.2-gtk-4.0

Then, create a new environment "ttest" (the following will create the environment in $MCOM_HOME/ttest, but you can choose any other folder), create an alias for its activation and activate it:

virtualenv $HOME/ENV/ttest
alias ttest="source $HOME/ENV/ttest/bin/activate"
ttest

Install TextTest

Update pip and setuptools, and install TextTest in the environment:

pip install -r requirements_dev.txt

Now, you are set to run the tests. For this, change to $MCOM_HOME/tests and run runTests.py:

python $MCOM_HOME/tests/runTests.py

You should see a window (called the "static gui" of TextTest) with tests in a list on its left area. You can mark tests and then click on run to start them. This opens the "dynamic gui", where test progress and success or failures are reported and can be examined. Output of the test runs is written into $MCOM_HOME/output/test_output/.

Code overview and using MCoM programmatically

The recommended workflow is to use the python interface provided by the script src/runner.py. When running MCoM by

python src/runner.py -c <CONFIG_FILE>

the script runner.py calls Julia code (the entry point is main.jl). The Julia part implements random network generation and numerical simulation of the community dynamics. To use MCoM programmatically, the code in runner.py, plotter.py, and aux.py is probably useful. As a prerequisite, the environmental variable PYTHONPATH should be extended to contain $MCOM_HOME/src:

export PYTHONPATH=PYTHONPATH:"$MCOM_HOME/src"

A few useful functions:

  • py.aux.load_config(configfile): Loads the contents of a MCoM config file configfile
  • runner.run_single_or_load(cfg, outdir): Starts Julia and runs a simulation, writing the output to the data directory <outdir>/"run_<runID>", which is returned.
  • py.aux.load_data(datadir): Loads and returns the simulation results from the data directory datadir.
  • py.plotter.plot_data(outdir, data): Plots the simulation results contained in data in the specified output directory outdir
  • py.aux.load_nx_network(network_fn): Loads the community network from the TOML file, which is generated during the MCoM run. The file is called network_pars.toml is written to the log-folder in the corresponding data directory. Returns a networkx.DiGraph representing the MCoM system, which can be manipulated using the networkx-API.
  • insert_explicit_nw(cfg, nw): Modifies the given MCoM configuration cfg (loaded with load_config(...)) to describe the network nw by inserting explicit parameter definitions in the appropriate parameter sections. Returns the modified configuration, which can be used with run_single_or_load(...).

About

The Microbial Community Model

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors