
Hanzo ML Framework Integration Guide

Overview

Hanzo ML is the official ML framework for the Hanzo ecosystem. It is based on Hugging Face's Candle, with optimizations for edge AI, multimodal workloads, and integration with Hanzo Engine.

Integration with Hanzo Engine

Repository Structure

~/work/hanzo/
├── ml/                 # hanzoai/ml - ML framework (based on HF candle)
├── engine/             # hanzoai/engine - Inference engine (based on mistral-rs)
├── jin/                # Jin multimodal models
└── llm/                # LLM Gateway proxy

Dependencies in Hanzo Engine

Add to ~/work/hanzo/engine/Cargo.toml:

[dependencies]
hanzo-ml = { git = "https://github.com/hanzoai/ml", branch = "main" }
hanzo-nn = { git = "https://github.com/hanzoai/ml", branch = "main" }
hanzo-transformers = { git = "https://github.com/hanzoai/ml", branch = "main" }

Feature Alignment

Both projects support consistent feature flags:

[features]
default = ["metal"]
metal = ["hanzo-ml/metal", "hanzo-nn/metal"]
cuda = ["hanzo-ml/cuda"] 
mkl = ["hanzo-ml/mkl"]
accelerate = ["hanzo-ml/accelerate"]
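Feature flags like these are resolved at compile time, so a downstream crate can branch on them with `cfg!`. A minimal, hypothetical sketch (std-only; the function name `backend_name` is illustrative, not part of either crate's API) of how an engine might report which backend was compiled in:

```rust
// Hypothetical sketch: map the Cargo feature flags above to a backend
// label at compile time. With no features enabled this falls through
// to plain "cpu".
fn backend_name() -> &'static str {
    if cfg!(feature = "metal") {
        "metal"
    } else if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "mkl") {
        "cpu (mkl)"
    } else if cfg!(feature = "accelerate") {
        "cpu (accelerate)"
    } else {
        "cpu"
    }
}

fn main() {
    println!("selected backend: {}", backend_name());
}
```

Because `cfg!` evaluates during compilation, keeping the two crates' feature names aligned (as in the table above) means one `--features` flag configures both consistently.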

Model Loading Integration

In Hanzo Engine (mistral-rs fork)

use hanzo_ml::{Device, Tensor};
use hanzo_transformers::models::llama::LlamaConfig;

// Load the model onto the Metal device using Hanzo ML
let device = Device::new_metal(0)?;
let model = LlamaConfig::load(&device, &config_path)?;

// Hand the loaded model to the mistral-rs pipeline
let pipeline = Pipeline::new(model, tokenizer)?;

Quantization Support

Both frameworks support:

  • AFQ (Affine Quantization) - Optimized for Metal/Apple Silicon
  • GGUF/GGML - Universal quantization format
  • GPTQ/AWQ - GPU-optimized quantization
  • In-Situ Quantization (ISQ) - Runtime quantization
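The common idea behind these affine schemes can be sketched in a few lines: values are split into fixed-size groups, and each group stores a scale and zero-point alongside 4-bit codes. This is a hypothetical, std-only illustration of the arithmetic (not the actual AFQ/GGUF bit layout); `quantize_group` and `dequantize_group` are names invented for the example.

```rust
// Sketch of group-wise affine 4-bit quantization: each group of GROUP
// values gets its own (scale, zero_point), and values become codes in 0..=15.
const GROUP: usize = 64; // group size; 64 is the balanced default suggested below

/// Quantize one group of f32 values to 4-bit codes plus (scale, zero_point).
fn quantize_group(vals: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = vals.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = vals.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / 15.0 } else { 1.0 };
    let codes = vals
        .iter()
        .map(|v| (((v - min) / scale).round() as i32).clamp(0, 15) as u8)
        .collect();
    (codes, scale, min)
}

/// Dequantize back to f32: v ≈ code * scale + zero_point.
fn dequantize_group(codes: &[u8], scale: f32, zero: f32) -> Vec<f32> {
    codes.iter().map(|&c| c as f32 * scale + zero).collect()
}

fn main() {
    let vals: Vec<f32> = (0..GROUP).map(|i| i as f32 / GROUP as f32).collect();
    let (codes, scale, zero) = quantize_group(&vals);
    let deq = dequantize_group(&codes, scale, zero);
    // Round-trip error is bounded by half a quantization step.
    let max_err = vals
        .iter()
        .zip(&deq)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    assert!(max_err <= scale / 2.0 + 1e-6);
    println!("max round-trip error: {max_err}");
}
```

Smaller groups lower the round-trip error (each scale fits a narrower range) at the cost of more per-group metadata, which is the speed/accuracy trade-off behind the group-size recommendation in the performance section.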

Development Workflow

1. Update ML Framework

cd ~/work/hanzo/ml
git fetch upstream
git merge upstream/main  # Merge HF candle updates
cargo test --workspace
git push origin main

2. Update Engine Dependencies

cd ~/work/hanzo/engine
cargo update hanzo-ml hanzo-nn hanzo-transformers
cargo test

3. Test Integration

cd ~/work/hanzo/engine
cargo run --features metal --release -- \
    -i --isq 4 plain -m meta-llama/Llama-3.2-3B-Instruct

Publishing to Crates.io

Hanzo ML Crates

The framework publishes these crates:

  • hanzo-ml - Core tensor operations
  • hanzo-nn - Neural network layers
  • hanzo-transformers - Transformer models
  • hanzo-datasets - Dataset utilities
  • hanzo-ml-pyo3 - Python bindings

Release Process

cd ~/work/hanzo/ml
cargo release --workspace minor
git push --tags

Sync Status

Latest Upstream Sync

  • HF Candle: a2029da3 (Jan 2025)
  • Features Added: SmolLM3, Qwen3 WASM, Mamba2, PaddleOCR-VL

Engine Integration Status

  • ✅ Metal backend support
  • ✅ AFQ quantization compatibility
  • ✅ SIMD optimizations
  • ✅ Memory introspection
  • 🔄 Jin model integration (in progress)

Performance Optimizations

Apple Silicon (Metal)

  • Use AFQ4 quantization for best performance
  • Enable --features "metal accelerate"
  • Set group size to 64 for balanced speed/accuracy

CUDA

  • Use GPTQ or AWQ quantization
  • Enable Flash Attention for long sequences
  • Use PagedAttention for memory efficiency
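PagedAttention's memory efficiency comes from splitting the KV cache into fixed-size blocks and mapping logical token positions to physical blocks through a per-sequence block table, so memory is allocated on demand instead of reserved up front. A hypothetical, std-only sketch of that bookkeeping (the `BlockTable` type here is invented for illustration, not an engine API):

```rust
// Sketch of PagedAttention-style KV-cache bookkeeping: a block table maps
// logical token indices to (physical block, offset) pairs.
const BLOCK_SIZE: usize = 16; // tokens per KV-cache block

struct BlockTable {
    blocks: Vec<usize>, // physical block ids, in logical order
}

impl BlockTable {
    fn new() -> Self {
        Self { blocks: Vec::new() }
    }

    /// Grow to hold `num_tokens` tokens, taking blocks from a free list
    /// as needed; returns how many new blocks were allocated.
    fn grow_to(&mut self, num_tokens: usize, free: &mut Vec<usize>) -> usize {
        let needed = (num_tokens + BLOCK_SIZE - 1) / BLOCK_SIZE;
        let mut taken = 0;
        while self.blocks.len() < needed {
            self.blocks.push(free.pop().expect("out of KV-cache blocks"));
            taken += 1;
        }
        taken
    }

    /// Translate a logical token index into (physical block, offset).
    fn locate(&self, token_idx: usize) -> (usize, usize) {
        (self.blocks[token_idx / BLOCK_SIZE], token_idx % BLOCK_SIZE)
    }
}

fn main() {
    // Free list popping physical blocks 0, 1, 2, ... in order.
    let mut free: Vec<usize> = (0..8).rev().collect();
    let mut table = BlockTable::new();
    table.grow_to(40, &mut free); // 40 tokens -> ceil(40/16) = 3 blocks
    let (block, offset) = table.locate(33);
    println!("token 33 lives in block {block} at offset {offset}");
}
```

Because sequences only hold the blocks they actually fill, many concurrent requests can share one physical KV-cache pool without fragmentation.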

CPU

  • Use GGUF models with appropriate quantization
  • Enable mkl feature for Intel optimizations
  • Consider accelerate on Apple platforms

Troubleshooting

Build Issues

# Clean and rebuild
cd ~/work/hanzo/ml
cargo clean
cargo build --workspace

# Check feature alignment
cd ~/work/hanzo/engine  
cargo tree | grep hanzo-ml

Runtime Issues

# Metal validation
cd ~/work/hanzo/engine
cargo run --features metal -- --help

# Check device detection
RUST_LOG=debug cargo run --features metal

Future Roadmap

  1. Model Format Standardization - Universal model interchange
  2. Joint Training Pipeline - Train models for both frameworks
  3. Distributed Inference - Multi-device model serving
  4. WebAssembly Optimization - Browser-based inference
  5. MCP Integration - Model Context Protocol support

Contact

For issues with Hanzo ML integration: