Feature request
PoC of Metal Flash Attention with Python, C, Rust bindings for non-MLX models on Apple Silicon.
https://github.com/bghira/universal-metal-flash-attention
Motivation
Faster inference on Apple for gguf models
Your contribution
Documentation