Releases: hanzoai/engine
Releases · hanzoai/engine
v1.0.1
What's Changed
Features
- High-performance LLM inference engine
- Support for Qwen3 models (8B/4B variants)
- Native embeddings and reranking
- CUDA support for NVIDIA GPUs
- Metal support for Apple Silicon
- Flash Attention v3 support
Installation
# Download for your platform
curl -L https://github.com/hanzoai/engine/releases/latest/download/hanzoai-$(uname -s | tr "[:upper:]" "[:lower:]")-$(uname -m).tar.gz | tar -xz
sudo mv hanzo-engine /usr/local/bin/hanzoaiFull Changelog: v1.0.0...v1.0.1
v0.8.0
What's Changed
Features
- High-performance LLM inference engine
- Support for Qwen3 models (8B/4B variants)
- Native embeddings and reranking
- CUDA support for NVIDIA GPUs
- Metal support for Apple Silicon
- Flash Attention v3 support
Installation
# Download for your platform
curl -L https://github.com/hanzoai/engine/releases/latest/download/hanzoai-$(uname -s | tr "[:upper:]" "[:lower:]")-$(uname -m).tar.gz | tar -xz
sudo mv hanzo-engine /usr/local/bin/hanzoaiv0.1.0 — Hanzo Engine
Initial release of Hanzo Engine — high-performance cloud inference engine.
Features
- 60+ model architectures (Llama, Qwen, Phi, Gemma, Mistral, etc.)
- CUDA 12+, Metal, and CPU backends
- Paged attention and continuous batching
- Speculative decoding and tensor parallelism
- OpenAI-compatible REST API
- Docker and Kubernetes native deployment
Links
- Documentation: https://engine.hanzo.ai
- Docker: ghcr.io/hanzoai/engine