v0.1.0 — Hanzo Engine
Initial release of Hanzo Engine — high-performance cloud inference engine.
Features
- 60+ model architectures (Llama, Qwen, Phi, Gemma, Mistral, etc.)
- CUDA 12+, Metal, and CPU backends
- Paged attention and continuous batching
- Speculative decoding and tensor parallelism
- OpenAI-compatible REST API
- Docker and Kubernetes native deployment
Links
- Documentation: https://engine.hanzo.ai
- Docker: ghcr.io/hanzoai/engine