Standalone workflow to create national-scale open-data packages from global open datasets.
Get the latest code by cloning this repository:
git clone git@github.com:nismod/irv-datapkg.git
or
git clone https://github.com/nismod/irv-datapkg.git
Install Python and the required packages. We suggest using micromamba:
micromamba create -f environment.yml
Activate the environment:
micromamba activate datapkg
The data packages are produced using a snakemake workflow.
The workflow expects ZENODO_TOKEN, CDSAPI_KEY and CDSAPI_URL to be set as environment variables. These must be set before running any workflow steps.
If you are not interacting with Zenodo or the Copernicus Climate Data Store, these can be dummy strings, saved to files named after each variable:
echo "placeholder" > ZENODO_TOKEN
echo "https://cds-beta.climate.copernicus.eu/api" > CDSAPI_URL
echo "test" > CDSAPI_KEY
See the Climate Data Store API docs and the Zenodo API docs for access details.
Export the values from these files to the environment:
export ZENODO_TOKEN=$(cat ZENODO_TOKEN)
export CDSAPI_KEY=$(cat CDSAPI_KEY)
export CDSAPI_URL=$(cat CDSAPI_URL)
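Because all three variables must be set before any workflow step runs, a quick pre-flight check can catch a missing one early. The following is a minimal sketch and not part of this repository; the helper name `missing_variables` is hypothetical.

```python
import os

# Variable names taken from the setup instructions above.
REQUIRED = ("ZENODO_TOKEN", "CDSAPI_KEY", "CDSAPI_URL")


def missing_variables(environ=None, required=REQUIRED):
    """Return the required variable names that are unset or blank."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name, "").strip()]


missing = missing_variables()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

The check only confirms that each variable is non-empty; it does not validate the token or key against Zenodo or the Climate Data Store.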
Before running the workflow for real, check what would be run if we ask for everything produced by the rule all:
snakemake --dry-run all
Run the workflow, asking for all, using 8 cores, with verbose log messages:
snakemake --cores 8 --verbose all
To publish, first create a Zenodo token, save it and export it as the ZENODO_TOKEN environment variable.
Upload a single data package:
snakemake --cores 1 zenodo/GBR.deposited
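To request several packages in one invocation, the same target pattern can be generated for a list of country codes. This is a sketch under the assumption that other ISO 3166-1 alpha-3 codes follow the `zenodo/GBR.deposited` pattern shown above (`KEN` is used purely as an illustration). The helper `deposit_command` is hypothetical; it only builds the command and does not run it.

```python
import shlex


def deposit_command(iso_codes, cores=1):
    """Build a snakemake command requesting one deposit target per country code."""
    # Target pattern assumed from the single-package example: zenodo/<ISO3>.deposited
    targets = [f"zenodo/{code.upper()}.deposited" for code in iso_codes]
    return ["snakemake", "--cores", str(cores), *targets]


command = deposit_command(["GBR", "KEN"])
print(shlex.join(command))
# prints: snakemake --cores 1 zenodo/GBR.deposited zenodo/KEN.deposited
```

The returned list can be passed to `subprocess.run(command, check=True)` once the printed command looks right.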
Publish (cannot be undone) either programmatically:
snakemake --cores 1 zenodo/GBR.published
Or, after reviewing the deposit online, through the Zenodo website (sandbox or live).
To get a quick list of DOIs from the Zenodo package json:
cat zenodo/*.deposition.json | jq '.metadata.prereserve_doi.doi'
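If jq is not available, the same lookup can be done in Python. This is a minimal sketch, assuming each `zenodo/*.deposition.json` file holds a single JSON object with the `metadata.prereserve_doi.doi` key used in the jq filter above; the function name `list_dois` is hypothetical.

```python
import json
from pathlib import Path


def list_dois(directory="zenodo"):
    """Return (filename, DOI) pairs for each deposition file in `directory`."""
    dois = []
    for path in sorted(Path(directory).glob("*.deposition.json")):
        with path.open(encoding="utf-8") as handle:
            deposition = json.load(handle)
        # Same key path as the jq filter: .metadata.prereserve_doi.doi
        doi = deposition.get("metadata", {}).get("prereserve_doi", {}).get("doi")
        dois.append((path.name, doi))
    return dois


for name, doi in list_dois():
    print(name, doi)
```

Files that lack the key are reported with a DOI of `None` rather than raising an error, which mirrors jq printing `null`.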
To generate records.csv with details of published packages:
python scripts/published_metadata.py
In case of warnings about GDAL_DATA not being set, try running:
export GDAL_DATA=$(gdal-config --datadir)
To format the workflow definition Snakefile:
snakefmt Snakefile
To format the Python helper scripts:
black scripts
These Python libraries may be a useful place to start analysis of the data in the packages produced by this workflow:
snkit helps clean network data
nismod-snail is designed to help implement infrastructure exposure, damage and risk calculations
The open-gira repository contains a larger workflow for global-scale open-data infrastructure risk and resilience analysis.
MIT License, Copyright (c) 2023 Tom Russell and irv-datapkg contributors
This research received funding from the FCDO Climate Compatible Growth Programme. The views expressed here do not necessarily reflect the UK government's official policies.