Skip to content

Releases: hanzoai/engine

v1.0.1

07 May 16:29

Choose a tag to compare

What's Changed

Features

  • High-performance LLM inference engine
  • Support for Qwen3 models (8B/4B variants)
  • Native embeddings and reranking
  • CUDA support for NVIDIA GPUs
  • Metal support for Apple Silicon
  • Flash Attention v3 support

Installation

# Download for your platform
curl -L https://github.com/hanzoai/engine/releases/latest/download/hanzoai-$(uname -s | tr "[:upper:]" "[:lower:]")-$(uname -m).tar.gz | tar -xz
sudo mv hanzo-engine /usr/local/bin/hanzoai

Full Changelog: v1.0.0...v1.0.1

v0.8.0

01 Apr 03:53

Choose a tag to compare

What's Changed

Features

  • High-performance LLM inference engine
  • Support for Qwen3 models (8B/4B variants)
  • Native embeddings and reranking
  • CUDA support for NVIDIA GPUs
  • Metal support for Apple Silicon
  • Flash Attention v3 support

Installation

# Download for your platform
curl -L https://github.com/hanzoai/engine/releases/latest/download/hanzoai-$(uname -s | tr "[:upper:]" "[:lower:]")-$(uname -m).tar.gz | tar -xz
sudo mv hanzo-engine /usr/local/bin/hanzoai

v0.1.0 — Hanzo Engine

25 Feb 08:11

Choose a tag to compare

Initial release of Hanzo Engine — high-performance cloud inference engine.

Features

  • 60+ model architectures (Llama, Qwen, Phi, Gemma, Mistral, etc.)
  • CUDA 12+, Metal, and CPU backends
  • Paged attention and continuous batching
  • Speculative decoding and tensor parallelism
  • OpenAI-compatible REST API
  • Docker and Kubernetes native deployment

Links