Get in touch: This repository is actively being developed. If you're interested in using or contributing to this work, please open an issue or contact me at alison.peard@ouce.ox.ac.uk. Always happy to discuss!
This repository contains a snakemake workflow to generate spatially coherent climate multi-hazard event sets using extreme value theory and generative adversarial networks. The workflow is modular to facilitate new applications. It depends on a PyTorch implementation of StyleGAN2-ADA with differentiable augmentation, which provides stable training on small (~100 sample) datasets.
The workflow is described in this manuscript and the rest of this README outlines basic usage.
The git branches are organised as follows:
- `main`: stable version
- `development`:
  - minor changes and bug fixes are added here first
  - feature branches are merged into here
  - periodically merged into `main` when stable
- `wip/<newfeature>`: branches for new features with potentially breaking changes
Use tags to mark successful StyleGAN training runs, making it easy to return to working versions after breaking changes.
Tag a successful run:
```bash
git tag -a success-<yyyy-mm-dd>
```
In the editor, document what worked and what didn't:
```
Successful run: feature 1, feature 2
Results:
- result 1
- result 2
- ...
```
Push the tag:
```bash
git push origin success-<yyyy-mm-dd>
```
View tag details:
```bash
git show success-<yyyy-mm-dd>
```
To find all checkpoints:
```bash
git tag -l "success-*"
```
- works with OUCE ERA5 reanalysis data ✓
- working with generalised Pareto marginal distributions ✓
- soft rescaling as a function of sample size (needs to be changed) ✓
- rescaling as a function of reduced variate return levels
- using heavy-tailed latents to match reduced variate tails better
- implement Weibull-XIMIS marginal fitting for more reliable wind speeds
- switch to Reiss and Thomas (2007) tail dependence estimation
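Several items above concern the marginal transforms: data are mapped to approximately uniform margins using an empirical CDF in the bulk and a generalised Pareto tail above a threshold. Below is a minimal sketch of that semi-parametric probability integral transform; the function names, tie handling, and plotting positions are illustrative assumptions, not the repository's actual implementation:

```python
import math

def gpd_cdf(x: float, u: float, sigma: float, xi: float) -> float:
    """CDF of a generalised Pareto distribution with threshold u,
    scale sigma and shape xi, evaluated at x >= u."""
    z = (x - u) / sigma
    if abs(xi) < 1e-12:  # exponential limit as xi -> 0
        return 1.0 - math.exp(-z)
    return 1.0 - (1.0 + xi * z) ** (-1.0 / xi)

def to_uniform(data: list[float], u: float, sigma: float, xi: float) -> list[float]:
    """Map data to roughly uniform margins: empirical CDF (plotting
    position i/(n+1)) below the threshold u, GPD tail above it."""
    n = len(data)
    ranks = {x: (i + 1) / (n + 1) for i, x in enumerate(sorted(data))}
    zeta = sum(x > u for x in data) / n  # fraction of threshold exceedances
    out = []
    for x in data:
        if x <= u:
            out.append(ranks[x])
        else:
            out.append(1.0 - zeta * (1.0 - gpd_cdf(x, u, sigma, xi)))
    return out
```

In practice the bulk and tail pieces would also be matched at the threshold for continuity; this sketch omits that step for brevity.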
While the snakemake workflow can be run anywhere, StyleGAN2-ADA requires a CUDA-enabled NVIDIA GPU to train the GANs in a reasonable time. StyleGAN2-ADA is no longer officially maintained, so it only works with CUDA versions up to 11.1. The recommended GPU is an NVIDIA V100. This workflow has been tested on the Oxford University Centre for the Environment's (OUCE) Linux cluster with 1080Ti GPUs and on the University of Oxford's ARC cluster with V100 and RTX GPUs. See the official PyTorch implementation for more information.
Because of the strict GPU requirements, it may be necessary to spread rules across different machines. The Snakefile and profiles have been hardcoded to work with OUCE and ARC computing clusters, but a new profile can be added for other machines, and the main Snakefile (in workflow/Snakefile) modified appropriately. See the profiles/ directory for existing profiles.
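A new profile is simply a directory containing a `config.yaml`. As a hypothetical example for a SLURM machine (the directory name `mycluster`, partition name, and resource values are placeholders; the keys follow standard Snakemake v8+ profile conventions, so check them against your Snakemake version):

```yaml
# profiles/mycluster/config.yaml -- hypothetical example, adapt to your scheduler
executor: slurm
jobs: 10
use-conda: true
default-resources:
  slurm_partition: "gpu"
  runtime: 720      # minutes
  mem_mb: 32000
```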
The OUCE GPUs are currently broken, so while data processing is still done on the OUCE cluster, StyleGAN2 training is done on ARC.
To get started on a new machine, clone this repo and set up a micromamba environment with snakemake:
```bash
micromamba create -c conda-forge -c bioconda -n snakemake snakemake conda=24.7.1 -y
micromamba activate snakemake
# conda config --set channel_priority strict  # (or change in .condarc file)
python -m pip install snakemake-executor-plugin-slurm  # snakemake >= 9.0.0, if using SLURM

# set up the required conda environments
snakemake --profile profiles/cluster --conda-create-envs-only
snakemake --profile profiles/slurm --conda-create-envs-only

# dry run to view the workflow for current config
snakemake --profile profiles/local -n
```
This workflow is only for generating event sets. To keep it tidy, downstream analysis should be done externally with code in a `projects/<project>/analysis` directory. This includes:
- generating a `params.nc` file for the project
- all extra analysis (in `<project>/analysis`)
For Apple Silicon, the R package r-extremes is not available on the conda osx-arm64 subdirectory, so installation must be manually set to the osx-64 subdirectory. If running a rule that will install the R environment for the first time, prefix the command with CONDA_SUBDIR=osx-64, e.g.,
```bash
CONDA_SUBDIR=osx-64 snakemake --profile profiles/local/ process_all_data --use-conda --cores 2
```
When running for the first time, note that login nodes are extremely slow at creating conda environments. It's best to create the environments on an interactive compute node first:
```bash
srun -p Short --pty /bin/bash
micromamba activate snakemake
snakemake --profile profiles/cluster/ --conda-create-envs-only
```
After that, you need to run snakemake from the login node (as per the current SoGE cluster configuration). To do this, it's best to open a `screen` session so that the job continues running if the connection drops:
```bash
screen -S snakemake
micromamba activate snakemake
cd path/to/hazGAN2
snakemake --profile profiles/slurm/ my_rule
# Ctrl+A then D to detach from the screen session
# screen -r snakemake  # to reattach
```
How jobs are sent to SLURM is defined in the `config.yaml` file in each profile (`local`, `cluster`, `arc`, `slurm`). You can modify any of the profiles or make a new one to suit your machine. To run a rule on an interactive compute node, use:
```bash
snakemake --profile profiles/cluster/ my_rule
```
or to send the job to the SLURM scheduler:
```bash
snakemake --profile profiles/slurm/ my_rule
```
The ARC cluster is the opposite of OUCE in that jobs must be submitted from a compute node, not the login node. To do this, use the `arc-submit.sh` script in the main directory and modify it as needed. For now, ARC is only used for StyleGAN training.
To create a new project you need to make the following changes:
- `config/config.yaml`: change the `project` value
- `config/projects/`: add a YAML file named `{myproject}.yaml` with the same structure as existing project YAMLs
- `resources/params/`: use `resources/grids/era5.nc` to make `{myproject}.nc` in `resources/params/` with any spatial parameters for variable construction from raw ERA5 variables (see other param files for examples)
For each project, new Python and R functions can be added to the src/ directory. See projects/poweruk_winter for an example.
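As an illustration of the kind of helper that might live in `src/`, the sketch below derives 10 m wind speed from the raw ERA5 u- and v-components; the function name and list-based signature are hypothetical, not the repository's actual API:

```python
import math

def wind_speed(u10: list[float], v10: list[float]) -> list[float]:
    """Hypothetical project helper: 10 m wind speed (m/s) from ERA5
    u- and v-components. Real src/ functions would operate on gridded
    arrays, but the arithmetic is the same."""
    return [math.hypot(u, v) for u, v in zip(u10, v10)]
```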
If you don't have access to the SoGE filestore, you should set up the input data structure as follows:
```
input/
└── {variable_long_name}/
    └── nc/
        └── {variable_long_name}_{year}.nc
```
and the output (results) folder has the following structure:
```
results/
├── .gitignore
└── {projectname}/
    ├── processing/
    ├── training/
    ├── generated/
    └── analysis/
```