sparkgeo/apex-convert-boundaries

APEX Boundary Converter

Prepare hierarchical boundary datasets for the APEX coverage statistics pipeline, which in turn produces data for the APEX Geospatial Explorer.

apex-convert-boundaries converts and normalises boundary datasets into the consistent format required by the companion project:

👉 apex-build-statistics

The tool standardises boundary datasets and enriches them with metadata required by the downstream raster statistics workflow used by the APEX Geospatial Explorer.


Overview

The APEX coverage pipeline processes global raster datasets (e.g. land cover or soil datasets) and generates statistics aggregated by administrative or thematic regions.

Before statistics can be calculated, boundary datasets must be:

  • standardised
  • cleaned
  • enriched with hierarchical metadata
  • exported into a format compatible with downstream tools

This repository provides the tooling required to perform that preparation step.

Typical datasets include:

  • administrative boundaries
  • statistical regions
  • conservation areas

The prepared datasets are then consumed by apex-build-statistics to generate raster coverage summaries.


APEX Data Processing Architecture

The APEX workflow consists of two complementary repositories:

  1. apex-convert-boundaries
    Prepares hierarchical boundary datasets.

  2. apex-build-statistics
    Generates raster coverage statistics using those prepared boundaries.

Together they form the complete geospatial processing pipeline that prepares statistics data for the APEX Geospatial Explorer.

flowchart LR

A[Source Boundary Datasets<br>GAUL / NUTS / Natura2000]
--> B[apex-convert-boundaries<br>Standardise and enrich boundaries]

B --> C[APEX Boundary Dataset<br>FlatGeobuf + hierarchical metadata]

C --> D[apex-build-statistics<br>Raster summarisation engine]

E[Raster Coverage Datasets<br>WorldCover / WorldSoils / others]
--> D

D --> F[Coverage Statistics Outputs<br>Area-weighted summaries<br>per region and coverage]

Step 1 — Boundary Preparation (this repository)

apex-convert-boundaries converts raw boundary datasets into a standardised format required by the APEX processing pipeline.

Key tasks include:

  • geometry cleaning
  • CRS standardisation
  • hierarchy generation
  • attribute normalisation
  • area calculation

Outputs include:

  • FlatGeobuf boundary datasets
  • hierarchical metadata fields
  • region identifiers required by downstream tools

These outputs are designed to be consumed directly by apex-build-statistics.


Features

  • Converts multiple boundary datasets into a standardised APEX format
  • Adds hierarchical metadata required for downstream processing
  • Ensures consistent geometry and CRS handling
  • Produces outputs suitable for the APEX statistics pipeline
  • Command-line interface with progress output
  • Easily extendable dataset registry

Installation

Requirements

  • Python 3.9+
  • GDAL / OGR
  • GeoPandas
  • Shapely

Install from Source

git clone https://github.com/sparkgeo/apex-convert-boundaries.git
cd apex-convert-boundaries
pip install -e .

For development dependencies:

pip install -e ".[dev]"

Quick Start

List Available Boundary Datasets

apex_boundaries datasets

Example output:

Available datasets:
- gaul
- nuts
- natura2000

Process a Dataset

apex_boundaries process <dataset>

Example:

apex_boundaries process nuts-boundary-2024-10M

Processing progress is written to stderr for logging, while the final output path is written to stdout so it can be used in pipelines.

Exit codes:

Code  Meaning
0     Success
-1    Failure
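
Because progress goes to stderr and the output path goes to stdout, the tool can be driven from a script. The following is a minimal sketch of that pattern, not part of this repository: the helper name run_and_capture_path is hypothetical, and it assumes the output path is the last line printed on stdout. It treats any non-zero status as failure, since an exit status of -1 is usually reported as 255 on POSIX systems.

```python
import subprocess
import sys
from pathlib import Path


def run_and_capture_path(command: list[str]) -> Path:
    """Run a command and return the path it prints on stdout.

    Hypothetical helper: assumes the last stdout line is the output path.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,  # capture only the final output path
        stderr=None,             # progress messages pass through to the terminal
        text=True,
        check=False,
    )
    # Any non-zero status counts as failure (-1 typically surfaces as 255).
    if result.returncode != 0:
        raise RuntimeError(f"command failed with exit status {result.returncode}")
    lines = result.stdout.strip().splitlines()
    if not lines:
        raise RuntimeError("command succeeded but printed no output path")
    return Path(lines[-1])


# Example call (requires the tool to be installed):
# output = run_and_capture_path(
#     ["apex_boundaries", "process", "nuts-boundary-2024-10M"]
# )
```

The returned path could then be handed to the next stage of the pipeline.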

Supported Boundary Datasets

Current implementations include:

Dataset     Description
gaul        FAO Global Administrative Unit Layers
nuts        EU NUTS statistical regions
natura2000  Natura 2000 conservation areas

Additional datasets can be added by implementing a new dataset class, as described under Adding a New Dataset.


Output

Processed datasets are exported as standardised boundary files containing metadata required by the APEX statistics pipeline.

These datasets include:

  • hierarchical identifiers
  • region names
  • parent / child relationships
  • polygon geometries
  • total area metadata

The resulting datasets can be passed directly to apex-build-statistics for raster coverage analysis.
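
To illustrate what hierarchical identifiers and parent / child relationships can look like, here is a minimal sketch based on NUTS codes, where each level appends one character to its parent's code (for example DE, DE1, DE11). The Region class, its field names, and build_hierarchy are hypothetical and do not reflect the actual schema written by this tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Region:
    """Illustrative record; field names are assumptions, not the APEX schema."""
    region_id: str
    name: str
    level: int
    parent_id: Optional[str]
    child_ids: list[str] = field(default_factory=list)


def build_hierarchy(names_by_code: dict[str, str]) -> dict[str, Region]:
    """Derive level, parent and children from NUTS-style codes."""
    regions: dict[str, Region] = {}
    for code, name in names_by_code.items():
        level = len(code) - 2                      # two-letter country code is level 0
        parent = code[:-1] if level > 0 else None  # drop the last character
        regions[code] = Region(code, name, level, parent)
    # Second pass: attach each region to its parent when the parent is present.
    for region in regions.values():
        if region.parent_id in regions:
            regions[region.parent_id].child_ids.append(region.region_id)
    return regions


hierarchy = build_hierarchy(
    {"DE": "Deutschland", "DE1": "Baden-Württemberg", "DE11": "Stuttgart"}
)
```

Other datasets, such as GAUL, carry their hierarchy in separate attribute columns, so the derivation would differ per dataset.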


Repository Structure

src/apex_boundaries/
│
├── cli.py
├── process.py
├── models.py
├── spinner.py
├── exceptions.py
└── boundaries/
    ├── gaul.py
    ├── nuts.py
    └── natura2000.py

Adding a New Dataset

Boundary datasets are implemented as classes in:

src/apex_boundaries/boundaries/

Example structure:

class ExampleBoundaryDataset:
    registry_key = "example"

    source_url = "..."

    def process(self):
        ...

Once implemented and registered, the dataset becomes automatically available through the CLI.
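
How registration works is not described here. One common approach is a decorator-based registry keyed on registry_key, sketched below under that assumption. DATASET_REGISTRY, register, available_datasets, and process_dataset are hypothetical names, not the repository's API, and the process body is a placeholder.

```python
# Hypothetical registry mapping CLI dataset names to dataset classes.
DATASET_REGISTRY: dict[str, type] = {}


def register(cls: type) -> type:
    """Class decorator that records a dataset class under its registry_key."""
    key = cls.registry_key
    if key in DATASET_REGISTRY:
        raise ValueError(f"duplicate dataset key: {key}")
    DATASET_REGISTRY[key] = cls
    return cls


@register
class ExampleBoundaryDataset:
    registry_key = "example"
    source_url = "..."

    def process(self) -> str:
        # Placeholder: a real implementation would download, clean and export
        # the boundaries, then return the output path.
        return f"{self.registry_key}.fgb"


def available_datasets() -> list[str]:
    """Names a CLI could list for the user."""
    return sorted(DATASET_REGISTRY)


def process_dataset(key: str) -> str:
    """Look up a dataset by name and run its process step."""
    if key not in DATASET_REGISTRY:
        raise KeyError(f"unknown dataset: {key}")
    return DATASET_REGISTRY[key]().process()
```

With a registry like this, a CLI only needs to read the mapping, so adding a dataset does not require changes to the command-line code.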


Development

Install development dependencies:

pip install -e ".[dev]"

Run formatting and linting:

pre-commit run --all-files

Contributing

Contributions are welcome.

Typical workflow:

  1. Fork the repository
  2. Create a feature branch
  3. Implement your changes
  4. Add tests or documentation
  5. Submit a pull request

License

This project is licensed under the terms of the license included in the repository.


Related Projects

  • apex-build-statistics
    Generates area-weighted statistics from raster datasets using the prepared boundary datasets produced by this tool.
