
Feat/abdul/yolo #27

Open: AbdelrahmanKatkat wants to merge 74 commits into master from feat/abdul/yolo


AbdelrahmanKatkat commented Apr 9, 2026

What does this PR do?

Adds YOLOv8 building-footprint segmentation (yolo_v8_segmentation): instance segmentation of building footprints in very-high-resolution (VHR) RGB aerial imagery, built on the Ultralytics YOLOv8-Seg architecture (PAN-FPN neck with detection and mask-prototype heads). The output is GeoJSON polygons in EPSG:4326, one per detected building, each with a confidence score.


Summary

| Field | Value |
|---|---|
| Task | Building footprint extraction |
| Input | 3-band RGB GeoTIFF chips, VHR (~30–50 cm GSD) |
| Output | GeoJSON FeatureCollection of Polygon features in EPSG:4326, each with class and confidence |
| Coverage | Global base weights + per-area fine-tuning on user-labelled chips |
| Use cases | Edge / constrained-infra inference, batch building inventory, AOI-specific fine-tuning |
| License | Apache-2.0 |

When to pick this model

  • Fast, widely used instance segmentation baseline with per-polygon confidence and an ONNX inference path.
  • Good fit when fine-tuning on a small AOI is expected and polygon outputs are required for mapping workflows.
  • Inference thresholds (confidence_threshold, iou_threshold) provide practical control over precision vs recall.

Intended use

Direct inference on OpenAerialMap tiles, and optional fine-tuning on small downstream labelled sets (~100–200 chips) when local imagery differs from the base training distribution.

Typical workflow

  1. Bring a TMS URL plus a bounding box, or a directory of georeferenced RGB chips.
  2. Send a request to the inference container, or invoke the inference pipeline programmatically.
  3. Receive building polygons with per-polygon confidence scores, ready to merge into OSM or another vector store.

Where it works best

  • VHR satellite/aerial imagery, ~30–50 cm GSD, sourced from OpenAerialMap.
  • 3-band RGB at the platform’s 256×256 chip contract.
  • Reasonably separable building instances; confidence + NMS thresholds tune precision/recall.

Where to use caution

  • Dense urban areas with many touching buildings: NMS and instance separation can merge or split adjacent buildings.
  • Regions visually distinct from the base training distribution benefit from a per-area fine-tune.
  • Inputs outside the 3-band RGB envelope (multispectral, SAR, DEM, grayscale) are outside the model's design envelope.

How to use

Live inference

A long-running container exposes POST /predict on port 8080.

{
  "model_uri":  "https://huggingface.co/hotosm/yolo/resolve/ff7f436c881a3fa02ce574f9e4cab6ac2f0a16da/yolov8s-seg.onnx",
  "image_uri":  "https://tiles.openaerialmap.org/.../{z}/{x}/{y}",
  "bbox":       [west, south, east, north],
  "zoom":       18,
  "params":     {"confidence_threshold": 0.5, "iou_threshold": 0.3}
}

The server fetches tiles for the requested bbox, runs inference, and returns a GeoJSON FeatureCollection. Each feature has properties.class = 1 and properties.confidence in [0, 1].
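For scripted use, the request above can be sent from Python with only the standard library. This is a minimal sketch: `PREDICT_URL`, `build_predict_payload` and `predict` are hypothetical names, the `localhost:8080` host assumes the container runs locally, and the timeout is an arbitrary choice.

```python
import json
import urllib.request

# Assumption: the inference container is reachable here; adjust host/port for your deployment.
PREDICT_URL = "http://localhost:8080/predict"


def build_predict_payload(model_uri, image_uri, bbox, zoom=18,
                          confidence_threshold=0.5, iou_threshold=0.3):
    """Assemble the POST /predict body; bbox is [west, south, east, north] in degrees."""
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must be [west, south, east, north] with west < east, south < north")
    return {
        "model_uri": model_uri,
        "image_uri": image_uri,  # TMS template with {z}/{x}/{y} placeholders
        "bbox": [west, south, east, north],
        "zoom": zoom,
        "params": {
            "confidence_threshold": confidence_threshold,
            "iou_threshold": iou_threshold,
        },
    }


def predict(payload, url=PREDICT_URL, timeout=300):
    """POST the payload as JSON and return the decoded GeoJSON FeatureCollection."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`image_uri` must be a real OAM tile template; the response has the shape described under Output contract.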

Fine-tuning on a local area

Drop a directory of RGB OAM chips and a single labels.geojson of OSM building polygons into the platform, then trigger training_pipeline. The pipeline produces a fine-tuned ONNX and a metrics report.


Inference parameters

Catalog defaults come from the model STAC item.

| Parameter | Default | Meaning |
|---|---|---|
| confidence_threshold | 0.5 | Minimum instance confidence to keep a predicted building |
| iou_threshold | 0.3 | IoU threshold used during NMS for instance filtering |

Inputs and outputs

Input contract

A directory of georeferenced RGB chips: GeoTIFFs (.tif / .tiff), or PNGs (.png) with .aux.xml sidecars carrying the georeferencing. The platform tile downloader produces this layout automatically for any TMS URL plus a bounding box.
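A client that prepares its own chip directory can check it against this contract before submitting. This is a minimal sketch: `list_input_chips` is a hypothetical helper, it assumes GDAL-style sidecar naming (`chip.png.aux.xml`), and it does not open the rasters to confirm they are actually georeferenced.

```python
from pathlib import Path


def list_input_chips(chip_dir):
    """Return chip paths that match the input contract, sorted by file name."""
    chips = []
    for path in sorted(Path(chip_dir).iterdir()):
        suffix = path.suffix.lower()
        if suffix in {".tif", ".tiff"}:
            # GeoTIFFs carry their georeferencing internally.
            chips.append(path)
        elif suffix == ".png":
            # Assumption: GDAL PAM sidecar named "<chip>.png.aux.xml".
            sidecar = path.with_name(path.name + ".aux.xml")
            if sidecar.exists():
                chips.append(path)
    return chips
```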

Output contract

A GeoJSON FeatureCollection in EPSG:4326. Each feature:

{
  "type": "Feature",
  "properties": {"class": 1, "confidence": 0.84},
  "geometry":   {"type": "Polygon", "coordinates": [...]}
}

The confidence is the per-instance detection score from the YOLOv8-Seg head after thresholding and NMS.
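Because the server has already applied `confidence_threshold`, a stricter cut can be applied client-side to a saved response without rerunning inference (a looser one cannot). A minimal sketch; `filter_by_confidence` is a hypothetical helper that works on the decoded GeoJSON dict.

```python
def filter_by_confidence(feature_collection, min_confidence):
    """Return a new FeatureCollection keeping features with confidence >= min_confidence."""
    kept = [
        feature
        for feature in feature_collection.get("features", [])
        if feature.get("properties", {}).get("confidence", 0.0) >= min_confidence
    ]
    # Build a fresh dict so the input response is left unchanged.
    return {"type": "FeatureCollection", "features": kept}
```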


Compute footprint

Model size

| Artifact | Size |
|---|---|
| Base checkpoint (.pt) | 23.84 MB |
| Baseline ONNX | 47.34 MB |
| Total parameters (from ONNX initializers) | ~11.78 M |
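The parameter count and file sizes can be cross-checked with simple arithmetic: 11.78 M float32 weights at 4 bytes each come to about 47.1 MB, close to the ONNX size, while the .pt size is roughly what 2-byte (half-precision) weights would give. That half-precision reading is an inference from the numbers, not something the export states. The sketch counts initializer elements; `count_parameters`, `count_onnx_parameters` and `approx_size_mb` are hypothetical helpers, and only the optional ONNX path needs the `onnx` package (`onnx.load`, `model.graph.initializer`).

```python
import math


def count_parameters(initializers):
    """Sum element counts over tensors exposing a `dims` shape (e.g. onnx TensorProto)."""
    # math.prod of an empty shape is 1, so scalar initializers count as one element.
    return sum(math.prod(init.dims) for init in initializers)


def count_onnx_parameters(onnx_path):
    """Count parameters stored as graph initializers in an ONNX file."""
    import onnx  # imported lazily so the arithmetic helpers work without onnx installed

    model = onnx.load(onnx_path)
    return count_parameters(model.graph.initializer)


def approx_size_mb(num_params, bytes_per_param):
    """Raw weight payload in decimal megabytes, ignoring graph and metadata overhead."""
    return num_params * bytes_per_param / 1e6
```

For example, `approx_size_mb(11.78e6, 4)` gives about 47.12 and `approx_size_mb(11.78e6, 2)` about 23.56.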

Reference inference benchmark

Standardised CPU-only baseline for capacity planning. Single-threaded ONNX Runtime, cold session, synthetic RGB input. Measured on Intel Core i7-14650HX.

Workload: one OAM chip per forward pass (256×256 chip resized internally to the exported ONNX input 640×640).

| Metric | Baseline ONNX |
|---|---|
| Cold session load | 0.134 s |
| Per-chip forward (640×640) | 0.316 s |
| End-to-end, one chip (load once) | ~0.45 s |

Note: training and dataset chips use a 256×256 contract (training.imgsz=256), whereas the exported baseline ONNX input tensor is 640×640, so inference resizes each chip before the forward pass.

Estimating larger AOIs

For a bbox that requires N OAM tiles at a given zoom:

total time ≈ session_load + N × per_chip_forward + polygonize_overhead
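The formula is straightforward to script for capacity planning. A minimal sketch using the benchmark constants above; `estimate_seconds` is a hypothetical helper, and `polygonize_overhead_s` defaults to zero because no polygonization timing is reported.

```python
SESSION_LOAD_S = 0.134      # cold ONNX Runtime session load (benchmark table)
PER_CHIP_FORWARD_S = 0.316  # single-threaded forward pass at 640×640 (benchmark table)


def estimate_seconds(num_tiles, polygonize_overhead_s=0.0,
                     session_load_s=SESSION_LOAD_S, per_chip_s=PER_CHIP_FORWARD_S):
    """total ≈ session_load + N × per_chip_forward + polygonize_overhead."""
    return session_load_s + num_tiles * per_chip_s + polygonize_overhead_s
```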

Examples on this baseline (single thread, session loaded once; polygonization overhead not included):

| Tiles in bbox | Forward passes | Estimated time |
|---|---|---|
| 1 | 1 | ~0.45 s |
| 9 | 9 | ~3.0 s |
| 25 | 25 | ~8.0 s |
| 49 | 49 | ~15.6 s |
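N itself depends on the bbox and zoom. Assuming the TMS follows the standard XYZ (Web Mercator) tiling scheme implied by `{z}/{x}/{y}` URLs, it can be computed as below. `lonlat_to_tile` and `tiles_in_bbox` are hypothetical helpers, and the bbox order matches the request body (`[west, south, east, north]`).

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """XYZ tile indices (x, y) containing a lon/lat point in Web Mercator."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so the east/south edges of the world map onto the last valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(bbox, zoom):
    """Number of XYZ tiles intersecting [west, south, east, north] at the given zoom."""
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # north-west corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # south-east corner
    return (x_max - x_min + 1) * (y_max - y_min + 1)
```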

Architecture

| Component | Description |
|---|---|
| Backbone | Ultralytics YOLOv8 segmentation (YOLOv8-Seg, small variant) |
| Neck | PAN-FPN style feature aggregation (Ultralytics implementation) |
| Heads | Detection head + segmentation mask prototype head |
| Output | Per-instance masks polygonized to GeoJSON, each with a confidence score |
| Native chip contract | 256×256 RGB (inference resizes to the exported ONNX resolution) |

Training data and recipe

Base weights (published checkpoint)

| Item | Value |
|---|---|
| Source | HOT fAIr utilities published checkpoint yolov8s_v2-seg.pt: YOLOv8s-seg initialized for single-class building footprint segmentation |
| ONNX export | Hugging Face yolov8s-seg.onnx (baseline inference artifact) |

Per-area fine-tune (quality Banepa run)

| Item | Value |
|---|---|
| Dataset | buildings-banepa-instance-segmentation |
| Imagery | OpenAerialMap RGB chips (Banepa, Nepal) |
| Labels | OSM building footprint polygons |
| Split | 120 chips → 96 train / 24 val (val_ratio=0.2, split_seed=42) |

Normalisation

| Stage | Behaviour |
|---|---|
| Training | Ultralytics default preprocessing at imgsz=256 (chip PNG/TIF → model input) |
| Inference (ONNX) | RGB bands scaled to [0, 1] by dividing by 255; each chip resized to the ONNX input size (640×640 for the baseline export) |
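The inference normalisation can be sketched in NumPy. Assumptions: the chip arrives as an H×W×3 uint8 array, a plain nearest-neighbour resize stands in for whatever interpolation (or letterboxing) the pipeline actually uses, and the model expects NCHW float32 input. `prepare_chip` is a hypothetical name.

```python
import numpy as np


def prepare_chip(chip_hwc, size=640):
    """Turn an HxWx3 uint8 RGB chip into a 1x3xSIZExSIZE float32 tensor in [0, 1]."""
    if chip_hwc.ndim != 3 or chip_hwc.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")
    h, w, _ = chip_hwc.shape
    # Nearest-neighbour resize by index sampling (assumption; the pipeline may differ).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = chip_hwc[rows[:, None], cols[None, :]]
    # Scale 0..255 to 0..1, then reorder HWC -> CHW and add a batch axis.
    scaled = resized.astype(np.float32) / 255.0
    return np.ascontiguousarray(scaled.transpose(2, 0, 1)[None, ...])
```

With a plain resize like this, predicted mask coordinates in the 640×640 input space scale back to the 256×256 chip by 256/640 = 0.4 before georeferencing.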

Loss (Ultralytics YOLOv8 segmentation)

Composite loss from the utilities HYPERPARAM_CHANGES recipe:

| Component | Weight / role |
|---|---|
| Box regression | box = 7.48109 |
| Classification | cls = 0.775 |
| Distribution focal loss (DFL) | dfl = 1.5 |
| Instance mask segmentation | Ultralytics default mask branch (enabled for yolov8*-seg) |
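Written out, the composite objective is a weighted sum of the four terms. The three gains come from the table above; $\lambda_{\text{mask}}$ is a placeholder for whatever weighting the Ultralytics mask branch applies internally, which this recipe does not set.

```latex
\mathcal{L} = \lambda_{\text{box}}\,\mathcal{L}_{\text{box}}
            + \lambda_{\text{cls}}\,\mathcal{L}_{\text{cls}}
            + \lambda_{\text{dfl}}\,\mathcal{L}_{\text{dfl}}
            + \lambda_{\text{mask}}\,\mathcal{L}_{\text{mask}},
\qquad
\lambda_{\text{box}} = 7.48109,\quad
\lambda_{\text{cls}} = 0.775,\quad
\lambda_{\text{dfl}} = 1.5
```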

Optimiser and schedule

| Parameter | Utilities default | STAC / quality Banepa fine-tune |
|---|---|---|
| optimizer | auto | auto (Ultralytics selects AdamW, effective lr ≈ 0.002) |
| lr0 | 0.00854 | patched from STAC learning_rate when optimizer ≠ auto |
| lrf | 0.01232 | unchanged |
| momentum | 0.95275 | unchanged |
| weight_decay | 0.00058 | unchanged |
| warmup_epochs | 3.82177 | unchanged |
| warmup_momentum | 0.81423 | unchanged |
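The "effective lr ≈ 0.002" entry follows from how Ultralytics resolves `optimizer=auto`. The sketch reproduces that rule as found in recent Ultralytics trainers (nominal batch size `nbs=64`, a 10,000-iteration cutoff, `lr = 0.002 × 5 / (4 + nc)`); it is not project code, and the rule should be checked against the installed Ultralytics version. If it holds, `auto` also fixes momentum at 0.9, so the recipe's `lr0` and `momentum` values are not used in this mode. `resolve_auto_optimizer` is a hypothetical name.

```python
import math


def resolve_auto_optimizer(num_classes, num_train_images, batch_size, epochs, nbs=64):
    """Return (optimizer, lr, momentum) that optimizer='auto' would pick (assumed rule)."""
    # Iterations are counted against max(batch, nominal batch size), not the batch alone.
    iterations = math.ceil(num_train_images / max(batch_size, nbs)) * epochs
    if iterations > 10_000:
        return "SGD", 0.01, 0.9
    lr_fit = round(0.002 * 5 / (4 + num_classes), 6)
    return "AdamW", lr_fit, 0.9
```

For the Banepa fine-tune (1 class, 96 training chips, batch 16, 30 epochs) this gives 60 iterations and therefore AdamW at lr 0.002.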

fAIr-models patches only optimizer and lr0 from STAC; all other hyperparameters remain the utilities recipe.
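This patching rule amounts to a small dictionary merge. A minimal sketch; `apply_stac_overrides` and the `stac_params` keys (`optimizer`, `learning_rate`) are hypothetical stand-ins for the fAIr-models code.

```python
def apply_stac_overrides(recipe, stac_params):
    """Copy the utilities recipe, patching only optimizer and (conditionally) lr0."""
    merged = dict(recipe)  # leave the shared recipe untouched
    optimizer = stac_params.get("optimizer", merged.get("optimizer", "auto"))
    merged["optimizer"] = optimizer
    if optimizer != "auto" and "learning_rate" in stac_params:
        # lr0 only matters for an explicit optimizer; 'auto' picks its own learning rate.
        merged["lr0"] = stac_params["learning_rate"]
    return merged
```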

Regularisation and augmentation

| Setting | Value |
|---|---|
| AMP | true |
| mosaic | 0 (off) |
| translate / scale / shear | 0 |
| degrees | 15.75 |
| flipud / fliplr | 0.5 / 0.255 |
| HSV jitter | hsv_h=0.01269, hsv_s=0.68143, hsv_v=0.27 |
| erasing | 0 |
| overlap_mask | false |
| cache | true |
| positive class weight (pc) | 3.0 (STAC; applied via YOLOSegWithPosWeight) |

Batch size and schedule (quality Banepa finetune)

| Parameter | Value |
|---|---|
| epochs | 30 |
| batch_size | 16 |
| imgsz | 256 |

Evaluation

Object-level metrics use polymetrics (IoU@0.5, Hungarian matching).

  • Truth: 2720 OSM building polygons (Banepa standard test patch)
  • Inference defaults: confidence_threshold=0.5, iou_threshold=0.3
  • Zero-shot: baseline ONNX (yolov8s-seg.onnx)
  • Fine-tuned: promoted local-model ONNX from quality Banepa finetune (30 epochs, batch 16, pc=3.0)
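The one-to-one matching step can be illustrated on a precomputed IoU matrix (predictions as rows, ground truth as columns). This sketch assumes SciPy is available and that pairs below 0.5 IoU are excluded before solving; it is not polymetrics' implementation, and building the matrix from polygons (for example with Shapely) is left out. `match_counts` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_counts(iou, threshold=0.5):
    """Hungarian one-to-one matching by IoU; returns (TP, FP, FN)."""
    iou = np.asarray(iou, dtype=float)
    n_pred, n_gt = iou.shape
    # Pairs below the threshold can never count as matches, so zero them before solving.
    gain = np.where(iou >= threshold, iou, 0.0)
    rows, cols = linear_sum_assignment(gain, maximize=True)
    tp = int(np.sum(gain[rows, cols] >= threshold))
    return tp, n_pred - tp, n_gt - tp
```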

Banepa, Nepal (standard test patch, 2720 OSM GT polygons)

| Metric | Zero-shot (baseline ONNX) | Per-area fine-tuned (quality run) |
|---|---|---|
| TP / FP / FN | 924 / 861 / 1796 | 1291 / 898 / 1429 |
| Precision | 0.5176 | 0.5898 |
| Recall | 0.3397 | 0.4746 |
| F1@0.5 | 0.4102 | 0.5260 |
| Mean IoU (matched) | 0.7262 | 0.6807 |
| mAP@0.5 | 0.2092 | 0.3435 |
| mAP@0.5:0.95 | 0.0877 | 0.1219 |
| Avg. vertices (pred) | 24.38 | 28.53 |
| Orthogonality (pred) | 0.689 | 0.730 |

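The fine-tune improves object-level footprint matching on this patch: F1 rises by 11.6 pp over the baseline (0.526 vs 0.410), driven mainly by higher recall (+13.5 pp) and, to a lesser extent, higher precision (+7.2 pp). Mean IoU over matched pairs drops slightly (0.681 vs 0.726), so matched footprints align somewhat less tightly with the OSM outlines on average.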
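Precision, recall and F1 in the table follow directly from the TP / FP / FN counts, with recall measured against the 2720 ground-truth polygons (TP + FN = 2720 in both columns). A short sketch; `detection_scores` is a hypothetical helper. mAP and mean IoU need per-instance scores and IoUs, so they cannot be recovered this way.

```python
def detection_scores(tp, fp, fn):
    """Object-level precision, recall and F1 from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2PR / (P + R), written in count form to avoid dividing by zero twice.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1
```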

Chip-level validation (training signal, not polymetrics)

| Metric | Value (quality Banepa fine-tune) |
|---|---|
| Mask mAP50 (best.pt) | 0.624 |
| Mask mAP50-95 (best.pt) | 0.311 |

Train/val split (for the local finetune step)

| Setting | Value |
|---|---|
| Strategy | Seeded random split |
| val_ratio | 0.2 |
| split_seed | 42 |
| Banepa sample | 120 chips → 96 train / 24 val |
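A seeded split with these settings can be sketched with Python's `random` module. This is an illustration, not the pipeline's split code: `split_chips` is hypothetical, and a different RNG or ordering in the real step means the same seed will not reproduce the same chip assignment, although the 96 / 24 sizes follow from `val_ratio=0.2`.

```python
import random


def split_chips(chip_ids, val_ratio=0.2, split_seed=42):
    """Return (train_ids, val_ids) from a reproducible shuffle."""
    ids = sorted(chip_ids)  # sort first so the result does not depend on listing order
    random.Random(split_seed).shuffle(ids)
    n_val = int(round(len(ids) * val_ratio))
    return ids[n_val:], ids[:n_val]
```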

Fine-tuning details

The train_model ZenML step expects a directory of OAM RGB chips plus a single GeoJSON of OSM building polygons. Labels are preprocessed into the YOLO training layout; training runs through hot_fair_utilities.training.yolo_v8.train with the HYPERPARAM_CHANGES recipe.

Default fine-tuning budget (STAC catalog): 30 epochs, batch 16, pc=3.0, optimizer auto, learning rate 0.002, imgsz=256, val_ratio=0.2, split_seed=42.


License

Apache-2.0


Citation

AbdelrahmanKatkat and others added 14 commits March 2, 2026 00:03
…files, and tests

- Introduced YOLOv8-v1 and YOLOv8-v2 for building footprint segmentation.
- Added ZenML pipelines for training and inference.
- Created Dockerfiles for isolated runtime environments.
- Implemented comprehensive smoke tests to validate functionality.
- Updated .gitignore to include new sample data directories.
…from STAC

- Added functions to load model weights and hyperparameters from STAC Item JSON files.
- Updated preprocess and training_pipeline functions to utilize loaded hyperparameters.
- Enhanced stac-item.json files for both YOLOv8-v1 and YOLOv8-v2 with additional metadata and structure.
- Improved documentation for clarity on model configuration and usage.
- Deleted YOLOv8-v1 Dockerfile, pipeline, README, STAC item, and tests to streamline the model repository.
- Updated .gitignore to exclude new directories for runs and weights.
- Consolidated focus on YOLOv8-v2 for building footprint segmentation.
…artifact tracking

- Modified the run_preprocessing function to return a list of tuples containing image data and corresponding label data.
- Enhanced error handling in train_model to raise an error if the data loader is empty.
- Updated training_pipeline to accommodate the new data loader structure.
- Removed unnecessary line breaks and consolidated code for better clarity.
- Updated error message formatting for consistency.
- Minor adjustments in the test file for improved readability.
refactor(pipeline): streamline code formatting and improve readability
- Introduced a new step to split an existing YOLO dataset into train and validation sets.
- Implemented shuffling and validation fraction control via hyperparameters.
- Ensured proper directory structure and error handling for dataset integrity.
- Updated the training pipeline to include the dataset splitting step.
- Updated the `run_preprocessing` function to return a list of tuples containing image data and corresponding label data for ZenML artifact tracking.
- Added error handling to ensure the data loader is not empty before proceeding with model training.
- Adjusted the training pipeline to utilize the new data loader structure.
- Added a new function to resolve input directories for local and remote datasets.
- Updated the `preprocess` function to return the preprocessed directory path.
- Refactored the `split_dataset` function to generate YOLO train/val splits and return split metadata.
- Adjusted the smoke test to validate the new preprocessing and dataset splitting workflow.
- Added `split_seed` parameter to the configuration for reproducibility.
- Added a missing comma in the metadata dictionary returned by the split_dataset function for improved syntax correctness.

@kshitijrajsharma kshitijrajsharma left a comment



codecov Bot commented Apr 20, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

… evaluation functions; update CI workflow to prevent cancellation of long-running builds
…ter adjustments; update CI workflow to allow cancellation of in-progress runs
…est script with dynamic model URI and increased predict timeout
…mance and correct is_identity property usage in image preparation
…eturn GeoJSON features. Update prediction logic to utilize the new postprocess function for improved clarity and maintainability.