Feat/abdul/yolo by AbdelrahmanKatkat · Pull Request #27 · hotosm/fAIr-models

AbdelrahmanKatkat · 2026-04-09T14:08:43Z

What does this PR do?

Adds `yolo_v8_segmentation`, an instance segmentation model for building footprints in very high resolution (VHR) RGB aerial imagery. It is built on the Ultralytics YOLOv8-Seg architecture (PAN-FPN neck with detection and mask-prototype heads). The output is GeoJSON polygons in EPSG:4326, one per detected building, each with a confidence score.

Summary

Field	Value
Task	Building footprint extraction
Input	3-band RGB GeoTIFF chips, VHR (~30–50 cm GSD)
Output	GeoJSON `FeatureCollection` of `Polygon` features in EPSG:4326, each with `class` and `confidence`
Coverage	Global base weights + per-area fine-tuning on user-labelled chips
Use cases	Edge / constrained-infra inference, batch building inventory, AOI-specific fine-tuning
License	Apache-2.0

When to pick this model

Fast, widely used instance segmentation baseline with per-polygon confidence and an ONNX inference path.
Good fit when fine-tuning on a small AOI is expected and polygon outputs are required for mapping workflows.
Inference thresholds (confidence_threshold, iou_threshold) provide practical control over precision vs recall.

Intended use

Direct inference on OpenAerialMap tiles, and optional fine-tuning on small downstream labelled sets (~100–200 chips) when local imagery differs from the base training distribution.

Typical workflow

Bring a TMS URL plus a bounding box, or a directory of georeferenced RGB chips.
Send a request to the inference container, or invoke the inference pipeline programmatically.
Receive building polygons with per-polygon confidence scores, ready to merge into OSM or another vector store.

Where it works best

VHR satellite or aerial imagery at ~30–50 cm GSD, sourced from OpenAerialMap.
3-band RGB at the platform’s 256×256 chip contract.
Scenes with reasonably separable building instances; the confidence and NMS IoU thresholds tune precision vs recall.

Where to use caution

Dense urban areas with many touching buildings: NMS and instance separation can merge or split adjacent buildings.
Regions visually distinct from the base distribution benefit from a per-area fine-tune.
Non-RGB inputs (multispectral, SAR, DEM, grayscale) fall outside the model's design envelope.

How to use

Live inference

A long-running container exposes POST /predict on port 8080.

{
  "model_uri":  "https://huggingface.co/hotosm/yolo/resolve/ff7f436c881a3fa02ce574f9e4cab6ac2f0a16da/yolov8s-seg.onnx",
  "image_uri":  "https://tiles.openaerialmap.org/.../{z}/{x}/{y}",
  "bbox":       [west, south, east, north],
  "zoom":       18,
  "params":     {"confidence_threshold": 0.5, "iou_threshold": 0.3}
}

The server fetches tiles for the requested bbox, runs inference, and returns a GeoJSON FeatureCollection. Each feature has properties.class = 1 and properties.confidence in [0, 1].
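A minimal client sketch for the request above, using only the Python standard library. The `localhost` host, the helper names `build_predict_payload` / `post_predict`, and the 300 s timeout are illustrative assumptions, not part of the container's published interface; the payload fields simply mirror the JSON body shown.

```python
import json
import urllib.request

# Assumption: the inference container is reachable locally on port 8080.
PREDICT_URL = "http://localhost:8080/predict"


def build_predict_payload(model_uri, image_uri, bbox, zoom=18,
                          confidence_threshold=0.5, iou_threshold=0.3):
    """Assemble a /predict request body following the documented schema."""
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must be [west, south, east, north]")
    return {
        "model_uri": model_uri,
        "image_uri": image_uri,
        "bbox": [west, south, east, north],
        "zoom": zoom,
        "params": {
            "confidence_threshold": confidence_threshold,
            "iou_threshold": iou_threshold,
        },
    }


def post_predict(payload, url=PREDICT_URL, timeout=300):
    """POST the payload as JSON and return the decoded GeoJSON FeatureCollection."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `post_predict(build_predict_payload(...))` returns the FeatureCollection; validating the bbox on the client catches swapped corners before any tiles are fetched.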

Fine-tuning on a local area

Drop a directory of RGB OAM chips and a single labels.geojson of OSM building polygons into the platform, then trigger training_pipeline. The pipeline produces a fine-tuned ONNX and a metrics report.
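A quick local check of the expected inputs can save a failed training run. This is a sketch assuming chips sit flat in one directory and the labels are a single GeoJSON FeatureCollection; `check_finetune_inputs` is a hypothetical helper, not a fAIr-models function.

```python
import json
from pathlib import Path

CHIP_SUFFIXES = {".tif", ".tiff", ".png"}


def check_finetune_inputs(chips_dir, labels_path):
    """Return (number of chips, number of polygon labels); raise if the layout looks wrong."""
    # .aux.xml sidecars end in ".xml", so they are not counted as chips.
    chips = [p for p in Path(chips_dir).iterdir() if p.suffix.lower() in CHIP_SUFFIXES]
    if not chips:
        raise ValueError(f"no RGB chips found in {chips_dir}")
    with open(labels_path, encoding="utf-8") as f:
        labels = json.load(f)
    if labels.get("type") != "FeatureCollection":
        raise ValueError("labels must be a GeoJSON FeatureCollection")
    polygons = [
        feat for feat in labels.get("features", [])
        if (feat.get("geometry") or {}).get("type") in {"Polygon", "MultiPolygon"}
    ]
    if not polygons:
        raise ValueError("labels contain no building polygons")
    return len(chips), len(polygons)
```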

Inference parameters

Catalog defaults come from the model STAC item.

Parameter	Default	Meaning
`confidence_threshold`	0.5	Minimum instance confidence to keep a predicted building
`iou_threshold`	0.3	IoU threshold used during NMS for instance filtering
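To illustrate what the two parameters control, here is a generic sketch of confidence filtering followed by greedy non-maximum suppression on bounding boxes. It assumes box-based NMS, as Ultralytics uses; the repository's own postprocessing may differ in details such as whether the confidence comparison is strict. `filter_detections` and `box_iou` are illustrative names.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, confidence_threshold=0.5, iou_threshold=0.3):
    """Return indices of kept detections, highest score first."""
    # 1. Drop low-confidence instances.
    candidates = [i for i, s in enumerate(scores) if s >= confidence_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    # 2. Greedy NMS: keep a box only if it does not overlap any kept box too much.
    kept = []
    for i in candidates:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

Raising `iou_threshold` suppresses fewer overlapping instances (helpful for touching buildings, at the risk of duplicates); raising `confidence_threshold` trades recall for precision.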

Inputs and outputs

Input contract

A directory of georeferenced RGB chips: GeoTIFFs (.tif / .tiff) or PNGs with .aux.xml georeferencing sidecars. The platform tile downloader produces this layout automatically for any TMS URL plus a bounding box.
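Which tiles a bbox maps to can be sketched with the standard XYZ (Web Mercator) tiling formulas, assuming the `{z}/{x}/{y}` TMS URLs follow the usual slippy-map scheme. `tiles_for_bbox` is an illustrative helper; it only enumerates tile indices and does not download or georeference anything.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator XYZ tile indices containing (lon, lat) at the given zoom."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(bbox, zoom):
    """All (z, x, y) tiles intersecting bbox = [west, south, east, north]."""
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]
```

`len(tiles_for_bbox(bbox, 18))` gives the tile count N used in the estimates under Compute footprint.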

Output contract

A GeoJSON FeatureCollection in EPSG:4326. Each feature:

{
  "type": "Feature",
  "properties": {"class": 1, "confidence": 0.84},
  "geometry":   {"type": "Polygon", "coordinates": [...]}
}

The confidence is the per-instance detection score from the YOLOv8-Seg head after thresholding and NMS.
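A sketch of the final step, turning a building outline in tile pixel coordinates into one output feature. It assumes 256×256 XYZ tiles and a ring already mapped back from the 640×640 model input to chip pixels (scale by 256/640); for GeoTIFF chips the chip's own affine geotransform would replace `tile_pixel_to_lonlat`. Both helper names are hypothetical.

```python
import math


def tile_pixel_to_lonlat(px, py, z, x, y, tile_size=256):
    """Convert a pixel position inside XYZ tile (z, x, y) to EPSG:4326 lon/lat."""
    n = 2 ** z
    gx = (x + px / tile_size) / n  # fraction of the world width, from the west
    gy = (y + py / tile_size) / n  # fraction of the world height, from the north
    lon = gx * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * gy))))
    return lon, lat


def polygon_to_feature(pixel_ring, confidence, z, x, y, tile_size=256):
    """Build one output Feature (class 1) from a pixel-space polygon ring."""
    coords = [list(tile_pixel_to_lonlat(px, py, z, x, y, tile_size)) for px, py in pixel_ring]
    if coords[0] != coords[-1]:
        coords.append(coords[0])  # GeoJSON linear rings must be closed
    return {
        "type": "Feature",
        "properties": {"class": 1, "confidence": round(float(confidence), 4)},
        "geometry": {"type": "Polygon", "coordinates": [coords]},
    }
```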

Compute footprint

Model size

Artifact	Size
Base checkpoint (`.pt`)	23.84 MB
Baseline ONNX	47.34 MB
Total parameters (from ONNX initializers)	~11.78 M
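The parameter count can be reproduced by summing the element counts of the ONNX graph initializers; the sketch uses `onnx.load` and `model.graph.initializer` from the `onnx` package, imported lazily. As a sanity check, ~11.78 M float32 parameters occupy about 11.78 M × 4 bytes ≈ 47.1 MB (taking MB as 10⁶ bytes), close to the 47.34 MB ONNX; the `.pt` at roughly half that size would be consistent with the checkpoint storing weights in float16. Helper names are illustrative.

```python
import math


def count_initializer_params(initializers):
    """Sum the element counts of ONNX initializer tensors (each exposes a `dims` list)."""
    return sum(math.prod(init.dims) for init in initializers)


def onnx_param_count(path):
    """Parameter count for an ONNX file; requires the `onnx` package."""
    import onnx  # imported lazily so the other helpers work without onnx installed

    model = onnx.load(path)
    return count_initializer_params(model.graph.initializer)


def fp32_size_mb(n_params):
    """Approximate size in MB (10**6 bytes) if every parameter is stored as float32."""
    return n_params * 4 / 1e6
```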

Reference inference benchmark

Standardised CPU-only baseline for capacity planning. Single-threaded ONNX Runtime, cold session, synthetic RGB input. Measured on Intel Core i7-14650HX.

Workload: one OAM chip per forward pass (256×256 chip resized internally to the exported ONNX input 640×640).

Metric	Baseline ONNX
Cold session load	0.134 s
Per-chip forward (640×640)	0.316 s
End-to-end one chip (load once)	~0.45 s

Note: training and dataset chips use a 256×256 contract (training.imgsz=256), while the exported baseline ONNX input tensor is 640×640, so inference resizes each chip before the forward pass.

Estimating larger AOIs

For a bbox that requires N OAM tiles at a given zoom:

total time ≈ session_load + N × per_chip_forward + polygonize_overhead

Examples on this baseline (single-thread, session loaded once):

Tiles in bbox	Forwards	Estimated time
1	1	~0.45 s
9	9	~3.0 s
25	25	~8.0 s
49	49	~15.6 s
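The estimate formula as a small function, using the benchmark numbers above and treating polygonization overhead as negligible; this reproduces the table to rounding (for example 0.134 + 49 × 0.316 = 15.618 s). The constants are specific to the reference CPU and a single-threaded session, and `estimate_seconds` is an illustrative name.

```python
SESSION_LOAD_S = 0.134  # cold ONNX Runtime session load (reference CPU, 1 thread)
PER_CHIP_S = 0.316      # one 640x640 forward pass


def estimate_seconds(n_tiles, session_load=SESSION_LOAD_S,
                     per_chip=PER_CHIP_S, polygonize_overhead=0.0):
    """total ≈ session_load + N × per_chip_forward + polygonize_overhead"""
    return session_load + n_tiles * per_chip + polygonize_overhead
```

For capacity planning on other hardware, re-measure `session_load` and `per_chip` and pass them in; the linear form stays the same as long as tiles are processed sequentially in one session.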

Architecture

Component	Description
Backbone	YOLOv8s backbone (small variant) from Ultralytics YOLOv8-Seg
Neck	PAN-FPN style feature aggregation (Ultralytics implementation)
Heads	Detection head + segmentation mask prototype head
Output	Per-instance masks polygonized to GeoJSON, each with a confidence score
Native chip contract	256×256 RGB (inference resizes to exported ONNX resolution)

Training data and recipe

Base weights (published checkpoint)

Item	Value
Source	HOT fAIr utilities published checkpoint `yolov8s_v2-seg.pt` — YOLOv8s-seg initialized for single-class building footprint segmentation
ONNX export	HuggingFace `yolov8s-seg.onnx` (baseline inference artifact)

Per-area fine-tune (quality Banepa run)

Item	Value
Dataset	`buildings-banepa-instance-segmentation`
Imagery	OpenAerialMap RGB chips (Banepa, Nepal)
Labels	OSM building footprint polygons
Split	120 chips → 96 train / 24 val (`val_ratio=0.2`, `split_seed=42`)

Normalisation

Stage	Behaviour
Training	Ultralytics default preprocessing at `imgsz=256` (chip PNG/TIF → model input)
Inference (ONNX)	RGB bands scaled to [0, 1] by dividing by 255; each chip resized to ONNX input (640×640 for baseline export)
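A numpy sketch of the ONNX inference preprocessing described above: resize a 256×256 chip to the 640×640 input, scale to [0, 1], and return it in NCHW order. It uses nearest-neighbour resampling to stay dependency-free; the actual pipeline may use bilinear interpolation or Ultralytics letterboxing (which adds no padding here, since both sizes are square). `preprocess_chip` is a hypothetical name.

```python
import numpy as np


def preprocess_chip(chip, size=640):
    """HWC uint8 RGB chip -> contiguous NCHW float32 tensor in [0, 1] at size x size."""
    if chip.ndim != 3 or chip.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    h, w = chip.shape[:2]
    # Nearest-neighbour resize via index mapping (keeps the sketch numpy-only).
    rows = (np.arange(size) * h // size).astype(np.intp)
    cols = (np.arange(size) * w // size).astype(np.intp)
    resized = chip[rows[:, None], cols[None, :]]  # (size, size, 3)
    x = resized.astype(np.float32) / 255.0        # scale RGB bands to [0, 1]
    x = np.transpose(x, (2, 0, 1))[None, ...]     # (1, 3, size, size)
    return np.ascontiguousarray(x)
```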

Loss (Ultralytics YOLOv8 segmentation)

Composite loss from the utilities HYPERPARAM_CHANGES recipe:

Component	Weight / role
Box regression	`box = 7.48109`
Classification	`cls = 0.775`
Distribution focal loss (DFL)	`dfl = 1.5`
Instance mask segmentation	Ultralytics default mask branch (enabled for `yolov8*-seg`)

Optimiser and schedule

Parameter	Utilities default	STAC / quality Banepa finetune
optimizer	`auto`	`auto` (Ultralytics selects AdamW, effective lr ≈ 0.002)
lr0	`0.00854`	patched from STAC `learning_rate` when optimizer ≠ `auto`
lrf	`0.01232`	—
momentum	`0.95275`	—
weight_decay	`0.00058`	—
warmup_epochs	`3.82177`	—
warmup_momentum	`0.81423`	—

fAIr-models patches only optimizer and lr0 from STAC; all other hyperparameters keep their utilities-recipe values.
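The patching rule can be sketched as follows, assuming the STAC hyperparameters have already been read into a plain dict with `optimizer` and `learning_rate` keys. The key names, the `RECIPE` subset, and `apply_stac_overrides` are illustrative, not the fAIr-models implementation. With `optimizer="auto"`, Ultralytics picks its own optimizer and learning rate, so `lr0` is left untouched.

```python
# Illustrative subset of the utilities recipe values listed in the table above.
RECIPE = {
    "optimizer": "auto",
    "lr0": 0.00854,
    "lrf": 0.01232,
    "momentum": 0.95275,
    "weight_decay": 0.00058,
}


def apply_stac_overrides(recipe, stac_params):
    """Copy the recipe, patching only optimizer and (when not auto) lr0 from STAC."""
    merged = dict(recipe)  # never mutate the shared recipe
    if "optimizer" in stac_params:
        merged["optimizer"] = stac_params["optimizer"]
    if merged["optimizer"] != "auto" and "learning_rate" in stac_params:
        merged["lr0"] = stac_params["learning_rate"]
    return merged
```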

Regularisation and augmentation

Setting	Value
AMP	`true`
mosaic	`0` (off)
translate / scale / shear	`0`
degrees	`15.75`
flipud / fliplr	`0.5` / `0.255`
HSV jitter	`hsv_h=0.01269`, `hsv_s=0.68143`, `hsv_v=0.27`
erasing	`0`
overlap_mask	`false`
cache	`true`
positive class weight (`pc`)	`3.0` (STAC; applied via `YOLOSegWithPosWeight`)

Batch size and schedule (quality Banepa finetune)

Parameter	Value
epochs	30
batch_size	16
imgsz	256

Evaluation

Object-level metrics use polymetrics (IoU@0.5, Hungarian matching).

Truth: 2720 OSM building polygons (Banepa standard test patch)
Inference defaults: confidence_threshold=0.5, iou_threshold=0.3
Zero-shot: baseline ONNX (yolov8s-seg.onnx)
Fine-tuned: promoted local-model ONNX from quality Banepa finetune (30 epochs, batch 16, pc=3.0)

Banepa, Nepal (standard test patch, 2720 OSM GT polygons)

Metric	Zero-shot (baseline ONNX)	Per-area fine-tuned (quality run)
TP / FP / FN	924 / 861 / 1796	1291 / 898 / 1429
Precision	0.5176	0.5898
Recall	0.3397	0.4746
F1@0.5	0.4102	0.5260
Mean IoU (matched)	0.7262	0.6807
mAP@0.5	0.2092	0.3435
mAP@0.5:0.95	0.0877	0.1219
avg vertices (pred)	24.38	28.53
orthogonality (pred)	0.689	0.730

The fine-tune improves object-level footprint matching on this patch: F1 rises by 11.6 pp (0.526 vs 0.410), driven mainly by higher recall (+13.5 pp) and, to a lesser extent, higher precision (+7.2 pp). Mean IoU of matched pairs drops slightly (0.681 vs 0.726).
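Precision, recall, and F1 in the table follow directly from the TP / FP / FN counts; note that TP + FN = 2720 in both columns, the number of ground-truth polygons. The matching itself (Hungarian assignment at IoU 0.5) is done by polymetrics and is not reproduced here; `object_metrics` is an illustrative helper.

```python
def object_metrics(tp, fp, fn):
    """Precision, recall and F1 from matched (TP), unmatched predicted (FP)
    and unmatched ground-truth (FN) polygon counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, `object_metrics(924, 861, 1796)` gives 0.5176 / 0.3397 / 0.4102 after rounding to four decimals, matching the zero-shot column.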

Chip-level validation (training signal, not polymetrics)

Metric	Value (quality Banepa finetune)
Mask mAP50 (`best.pt`)	0.624
Mask mAP50-95 (`best.pt`)	0.311

Train/val split (for the local finetune step)

Setting	Value
Strategy	Seeded random split
`val_ratio`	0.2
`split_seed`	42
Banepa sample	120 chips → 96 train / 24 val
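A seeded random split of this kind can be sketched with Python's `random.Random`; with 120 chips and `val_ratio=0.2` it yields 96 train and 24 val items. This is an assumption about the strategy rather than the fAIr-models code, so the exact chip membership will not match the real run; `seeded_split` is a hypothetical helper.

```python
import random


def seeded_split(items, val_ratio=0.2, split_seed=42):
    """Shuffle a copy of items with a fixed seed and cut off a validation slice."""
    items = sorted(items)  # sort first so the result does not depend on listing order
    rng = random.Random(split_seed)
    rng.shuffle(items)
    n_val = int(round(len(items) * val_ratio))
    return items[n_val:], items[:n_val]  # (train, val)
```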

Fine-tuning details

The train_model ZenML step expects a directory of OAM RGB chips plus a single GeoJSON of OSM building polygons. Labels are preprocessed into the YOLO training layout; training runs through hot_fair_utilities.training.yolo_v8.train with the HYPERPARAM_CHANGES recipe.

Default fine-tuning budget (STAC catalog): 30 epochs, batch 16, pc=3.0, optimizer auto, learning rate 0.002 (applied as lr0 only when optimizer ≠ auto), imgsz=256, val_ratio=0.2, split_seed=42.
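Converting labels into the YOLO training layout ultimately means writing one text line per polygon in the Ultralytics segmentation label format: a class index followed by vertex coordinates normalised by image width and height. The sketch assumes polygons are already in chip pixel coordinates and a single class with index 0; the real conversion lives in `hot_fair_utilities`, and `polygon_to_yolo_seg_line` is an illustrative name.

```python
def polygon_to_yolo_seg_line(ring, width, height, class_index=0):
    """Format one pixel-space polygon as a YOLO segmentation label line:
    class x1 y1 x2 y2 ... with coordinates normalised to [0, 1]."""
    if ring and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the GeoJSON-style closing vertex
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three vertices")
    parts = [str(class_index)]
    for px, py in ring:
        # Clip to the chip so vertices outside the image stay in range.
        nx = min(max(px / width, 0.0), 1.0)
        ny = min(max(py / height, 0.0), 1.0)
        parts.append(f"{nx:.6f} {ny:.6f}")
    return " ".join(parts)
```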

License

Apache-2.0

Citation

Ultralytics YOLOv8 segmentation: https://docs.ultralytics.com/models/yolov8/#instance-segmentation
Independent object-level metric reproduction: polymetrics

…files, and tests
- Introduced YOLOv8-v1 and YOLOv8-v2 for building footprint segmentation.
- Added ZenML pipelines for training and inference.
- Created Dockerfiles for isolated runtime environments.
- Implemented comprehensive smoke tests to validate functionality.
- Updated .gitignore to include new sample data directories.

…from STAC
- Added functions to load model weights and hyperparameters from STAC Item JSON files.
- Updated preprocess and training_pipeline functions to utilize loaded hyperparameters.
- Enhanced stac-item.json files for both YOLOv8-v1 and YOLOv8-v2 with additional metadata and structure.
- Improved documentation for clarity on model configuration and usage.

- Deleted YOLOv8-v1 Dockerfile, pipeline, README, STAC item, and tests to streamline the model repository.
- Updated .gitignore to exclude new directories for runs and weights.
- Consolidated focus on YOLOv8-v2 for building footprint segmentation.

Feature/yolo

…artifact tracking
- Modified the run_preprocessing function to return a list of tuples containing image data and corresponding label data.
- Enhanced error handling in train_model to raise an error if the data loader is empty.
- Updated training_pipeline to accommodate the new data loader structure.

…into feature/yolo

- Removed unnecessary line breaks and consolidated code for better clarity.
- Updated error message formatting for consistency.
- Minor adjustments in the test file for improved readability.

refactor(pipeline): streamline code formatting and improve readability

- Introduced a new step to split an existing YOLO dataset into train and validation sets.
- Implemented shuffling and validation fraction control via hyperparameters.
- Ensured proper directory structure and error handling for dataset integrity.
- Updated the training pipeline to include the dataset splitting step.

- Updated the `run_preprocessing` function to return a list of tuples containing image data and corresponding label data for ZenML artifact tracking.
- Added error handling to ensure the data loader is not empty before proceeding with model training.
- Adjusted the training pipeline to utilize the new data loader structure.

- Added a new function to resolve input directories for local and remote datasets.
- Updated the `preprocess` function to return the preprocessed directory path.
- Refactored the `split_dataset` function to generate YOLO train/val splits and return split metadata.
- Adjusted the smoke test to validate the new preprocessing and dataset splitting workflow.
- Added `split_seed` parameter to the configuration for reproducibility.

- Added a missing comma in the metadata dictionary returned by the split_dataset function for improved syntax correctness.

Merge : Master

… into feat/abdul/yolo

kshitijrajsharma

Rename yolov8v2 to yolo_v8_segmentation, as we will not have multiple YOLO versions anymore.
Get rid of the bash scripts in tests and move the tests to production test cases with pytest, as defined in the instructions. Check here: https://hotosm.github.io/fAIr-models/contributing/model/#testing and an example here: https://github.com/hotosm/fAIr-models/tree/master/models/yolo11n_detection/tests. Tests should validate each function defined in the pipeline.
Separate the Dockerfile into 3 stages (builder, runtime, and test). Check here: https://hotosm.github.io/fAIr-models/contributing/model/#dockerfile, with an example here: https://github.com/hotosm/fAIr-models/blob/master/models/yolo11n_detection/Dockerfile
Slim down README.md into a user-friendly model card rather than a record of development decisions; those can live in the PR description.
I haven't reviewed the pipeline yet, but the CI will validate the pipeline first and then I will have a look in the near future!

…ing and update p_val parameter

…related documentation

…ne, and tests

…rovider info to stac-item.json

codecov · 2026-04-20T09:25:47Z

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

… into feat/abdul/yolo

… and improve GeoJSON processing

…rained_model_artifact

…eoJSON/JSON files and improved error handling

…rties in label processing

…rror handling for EPSG:4326 normalization

…n and enhance logging in training processes

… evaluation functions; update CI workflow to prevent cancellation of long-running builds

…ter adjustments; update CI workflow to allow cancellation of in-progress runs

…est script with dynamic model URI and increased predict timeout

… for training module

…O model training

…for YOLO segmentation

…o improve performance

…mes in image preparation

…mance and correct is_identity property usage in image preparation

…eturn GeoJSON features. Update prediction logic to utilize the new postprocess function for improved clarity and maintainability.

…n pipeline to streamline the codebase.

… improved code readability.

AbdelrahmanKatkat and others added 14 commits March 2, 2026 00:03

Merge branch 'master' into feature/yolo

c395a79

Merge pull request #22 from hotosm/feature/yolo

db47d07

Feature/yolo

Merge branch 'feature/yolo' of https://github.com/hotosm/fAIr-models …

57d432f

…into feature/yolo

refactor(pipeline): streamline code formatting and improve readability

95b566b


Merge pull request #24 from hotosm/yolo_ci_fix

8397616


Merge branch 'yolo_ci_fix' into feat/abdul/yolo

c76c07a

fix(pipeline): correct dataset_yaml formatting in split_dataset function

79735a8


AbdelrahmanKatkat requested a review from kshitijrajsharma April 12, 2026 19:31

Merge pull request #32 from hotosm/master

b695f5c
