Prepare hierarchical boundary datasets for the APEX coverage statistics pipeline, which produces the data used by the APEX Geospatial Explorer.
apex-convert-boundaries converts and normalises boundary datasets into the consistent format required by the companion project, apex-build-statistics.
The tool standardises boundary datasets and enriches them with the metadata required by the downstream raster statistics workflow.
The APEX coverage pipeline processes global raster datasets (e.g. land cover or soil datasets) and generates statistics aggregated by administrative or thematic regions.
Before statistics can be calculated, boundary datasets must be:
- standardised
- cleaned
- enriched with hierarchical metadata
- exported into a format compatible with downstream tools
This repository provides the tooling required to perform that preparation step.
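As a rough illustration of the standardise and enrich steps listed above, the sketch below renames source-specific attributes to one common schema. The common field names (`region_id`, `name`, `level`) and the per-source column mappings are assumptions made for this example and are not taken from the tool itself.

```python
from typing import Any, Dict

# Hypothetical per-source mappings: source column name -> common field name.
COLUMN_MAPS: Dict[str, Dict[str, str]] = {
    "nuts": {"NUTS_ID": "region_id", "NAME_LATN": "name", "LEVL_CODE": "level"},
    "gaul": {"ADM1_CODE": "region_id", "ADM1_NAME": "name"},
}


def normalise_attributes(dataset: str, row: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known source columns to the common schema and drop the rest."""
    mapping = COLUMN_MAPS[dataset]
    out = {target: row[source] for source, target in mapping.items() if source in row}
    # Store identifiers as trimmed strings so later joins behave consistently.
    if "region_id" in out:
        out["region_id"] = str(out["region_id"]).strip()
    return out
```

Keeping the mapping in a plain dictionary means a new source only needs a new entry rather than new code.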
Typical datasets include:
- administrative boundaries
- statistical regions
- conservation areas
The prepared datasets are then consumed by apex-build-statistics to generate raster coverage summaries.
The APEX workflow consists of two complementary repositories:
- apex-convert-boundaries: Prepares hierarchical boundary datasets.
- apex-build-statistics: Generates raster coverage statistics using those prepared boundaries.
Together they form the complete geospatial processing pipeline to prepare statistics data for use in the APEX Geospatial Explorer.
    flowchart LR
        A[Source Boundary Datasets<br>GAUL / NUTS / Natura2000] --> B[apex-convert-boundaries<br>Standardise and enrich boundaries]
        B --> C[APEX Boundary Dataset<br>FlatGeobuf + hierarchical metadata]
        C --> D[apex-build-statistics<br>Raster summarisation engine]
        E[Raster Coverage Datasets<br>WorldCover / WorldSoils / others] --> D
        D --> F[Coverage Statistics Outputs<br>Area-weighted summaries<br>per region and coverage]
apex-convert-boundaries converts raw boundary datasets into a standardised format required by the APEX processing pipeline.
Key tasks include:
- geometry cleaning
- CRS standardisation
- hierarchy generation
- attribute normalisation
- area calculation
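To make the area calculation task concrete, here is a dependency-free sketch using the shoelace formula. It assumes coordinates are already in a projected, equal-area CRS, so the result is in that CRS's squared units. How the tool itself computes area is not described here; it most likely relies on GeoPandas and Shapely, so treat the function names below as hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ring_area(ring: Sequence[Point]) -> float:
    """Unsigned planar area of a ring (shoelace formula).

    Works whether or not the ring repeats its first point at the end.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior: Sequence[Point], holes: Sequence[Sequence[Point]] = ()) -> float:
    """Area of the exterior ring minus the area of any interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)
```

Applying this to longitude/latitude degrees would give meaningless numbers, which is one reason CRS standardisation has to happen before area calculation.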
Outputs include:
- FlatGeobuf boundary datasets
- hierarchical metadata fields
- region identifiers required by downstream tools
These outputs are designed to be consumed directly by apex-build-statistics.
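One way to picture hierarchy generation is with NUTS codes, where each level appends one character to its parent's code (for example `DE` → `DE2` → `DE21`). The sketch below derives a level and parent identifier from the code alone. The output field names `level` and `parent_id` are assumptions for this example, and the tool may derive its hierarchy differently.

```python
from typing import Dict, Iterable, Optional


def nuts_level(code: str) -> int:
    """NUTS level implied by code length: 2 chars -> 0, 3 -> 1, 4 -> 2, 5 -> 3."""
    return len(code) - 2


def nuts_parent(code: str) -> Optional[str]:
    """The parent is the code minus its last character; country codes have none."""
    return code[:-1] if len(code) > 2 else None


def build_hierarchy(codes: Iterable[str]) -> Dict[str, Dict[str, object]]:
    """Attach hypothetical 'level' and 'parent_id' fields to each code."""
    return {
        code: {"level": nuts_level(code), "parent_id": nuts_parent(code)}
        for code in codes
    }
```

Datasets without prefix-structured codes, such as conservation areas, would need an explicit parent column or a spatial containment test instead.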
Features:
- Converts multiple boundary datasets into a standardised APEX format
- Adds hierarchical metadata required for downstream processing
- Ensures consistent geometry and CRS handling
- Produces outputs suitable for the APEX statistics pipeline
- Command-line interface with progress output
- Easily extendable dataset registry
Requirements:
- Python 3.9+
- GDAL / OGR
- GeoPandas
- Shapely
Installation:

    git clone https://github.com/sparkgeo/apex-convert-boundaries.git
    cd apex-convert-boundaries
    pip install -e .

For development dependencies:

    pip install -e ".[dev]"

List the available datasets:

    apex_boundaries datasets

Example output:

    Available datasets:
    - gaul
    - nuts
    - natura2000

Process a dataset:

    apex_boundaries process <dataset>

Example:

    apex_boundaries process nuts-boundary-2024-10M

Processing progress is written to stderr for logging, while the final output path is written to stdout so it can be used in pipelines.
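Because only the output path goes to stdout, a calling script can capture it without parsing the progress messages. The following is a minimal sketch of that pattern. `run_and_capture_path` is a hypothetical helper, and `DEMO_COMMAND` is a stand-in that mimics the documented behaviour with a made-up path, so the example runs without the tool installed.

```python
import subprocess
import sys
from typing import List


def run_and_capture_path(command: List[str]) -> str:
    """Run a command and return what it wrote to stdout, stripped of whitespace."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,  # capture only stdout (the output path)
        stderr=None,             # progress messages pass through to the terminal
        text=True,
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()


# Stand-in for the real CLI: progress on stderr, a made-up path on stdout.
DEMO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; print('working...', file=sys.stderr); print('output/example.fgb')",
]

# With the tool installed, the call would look like:
# path = run_and_capture_path(["apex_boundaries", "process", "nuts-boundary-2024-10M"])
```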
Exit codes:
| Code | Meaning |
|---|---|
| 0 | Success |
| -1 | Failure |
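When checking these exit codes from a script, note that POSIX systems truncate exit statuses to 8 bits, so a process that exits with -1 is reported as 255. Testing for a non-zero status is therefore more robust than comparing against -1. A small sketch, where `succeeded` is a hypothetical helper and `FAILING_COMMAND` is a stand-in for a failed run:

```python
import subprocess
import sys
from typing import List


def succeeded(command: List[str]) -> bool:
    """Return True only when the command exits with status 0."""
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0


# Stand-in for a failing run: a child process that exits with -1.
FAILING_COMMAND = [sys.executable, "-c", "import sys; sys.exit(-1)"]
```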
Current implementations include:
| Dataset | Description |
|---|---|
| gaul | FAO Global Administrative Unit Layers |
| nuts | EU NUTS statistical regions |
| natura2000 | Natura 2000 conservation areas |
Additional datasets can be added by implementing and registering a new dataset class in src/apex_boundaries/boundaries/.
Processed datasets are exported as standardised boundary files containing metadata required by the APEX statistics pipeline.
These datasets include:
- hierarchical identifiers
- region names
- parent / child relationships
- polygon geometries
- total area metadata
The resulting datasets can be passed directly to apex-build-statistics for raster coverage analysis.
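For readers consuming these outputs, the sketch below shows one way to model a record and rebuild the parent/child relationships from it. The class and all field names (`region_id`, `name`, `parent_id`, `area_km2`) are assumptions for this example, not the actual output schema, and geometry is omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class RegionRecord:
    # All field names here are hypothetical.
    region_id: str
    name: str
    parent_id: Optional[str]  # None for top-level regions
    area_km2: float


def children_index(records: Iterable[RegionRecord]) -> Dict[str, List[str]]:
    """Map each parent identifier to the identifiers of its direct children."""
    index: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if record.parent_id is not None:
            index[record.parent_id].append(record.region_id)
    return dict(index)
```

Storing only the parent on each record keeps the data flat, and the child lists can be rebuilt on demand as shown.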
    src/apex_boundaries/
    │
    ├── cli.py
    ├── process.py
    ├── models.py
    ├── spinner.py
    ├── exceptions.py
    └── boundaries/
        ├── gaul.py
        ├── nuts.py
        └── natura2000.py
Boundary datasets are implemented as classes in:
src/apex_boundaries/boundaries/
Example structure:
    class ExampleBoundaryDataset:
        registry_key = "example"
        source_url = "..."

        def process(self):
            ...

Once implemented and registered, the dataset becomes automatically available through the CLI.
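The registration mechanism itself is not shown here. One common way to get this kind of automatic availability is a base class that records each subclass under its `registry_key`. The sketch below illustrates that pattern only: the `BoundaryDataset` base class, the `list_datasets` helper, and the returned file name are hypothetical and may not match the project's implementation.

```python
from typing import Dict, List, Type


class BoundaryDataset:
    """Hypothetical base class: subclasses self-register via registry_key."""

    registry: Dict[str, Type["BoundaryDataset"]] = {}
    registry_key: str = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only register classes that declare a non-empty key.
        if cls.registry_key:
            BoundaryDataset.registry[cls.registry_key] = cls

    def process(self) -> str:
        raise NotImplementedError


class ExampleBoundaryDataset(BoundaryDataset):
    registry_key = "example"
    source_url = "..."

    def process(self) -> str:
        # Placeholder: a real implementation would download, clean and export.
        return "example.fgb"


def list_datasets() -> List[str]:
    """Keys that a CLI could offer as dataset names."""
    return sorted(BoundaryDataset.registry)
```

With this pattern, a CLI only needs to look up `BoundaryDataset.registry[key]` and call `process()` on an instance.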
Run formatting and linting:

    pre-commit run --all-files

Install development dependencies:

    pip install -e ".[dev]"

Contributions are welcome.
Typical workflow:
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests or documentation
- Submit a pull request
This project is licensed under the terms of the license included in the repository.
Related projects:
- apex-build-statistics: Generates area-weighted statistics from raster datasets using the prepared boundary datasets produced by this tool.