Olive-ai 0.9.2
New Features:
- Selective Mixed Precision. (#1898)
- Native GPTQ Implementation with support for Selective Mixed Precision. (#1949)
- Blockwise RTN Quantization for ONNX models. (#1899)
- Ability to add custom metadata in ONNX model. (#1900)
- New simplified `olive optimize` CLI command and `olive.quantize()` Python API for effortless model optimization with minimal developer input. See the CLI usage and Python API docs for more details. (#1996)
- New command line `olive run-pass` gives advanced users the ability to run individual passes. (#1904)
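As a rough illustration of the new commands, the sketch below shows how they might be invoked. The model path, output path, and flag names here are assumptions for illustration only, not taken from the release notes; check `olive optimize --help` and `olive run-pass --help` for the actual interface.

```shell
# Hypothetical usage sketch -- flag names and values are assumptions,
# not documented options. Consult the CLI docs for the real interface.

# Simplified one-shot optimization of a model (new in 0.9.2):
olive optimize -m my-org/my-model -o ./optimized-model

# Advanced: run a single pass directly (new in 0.9.2):
olive run-pass --help
```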
New Integrations
- GPTQModel. (#1999)
- AIMET (#2028). This is a work in progress.
- ONNX model support while targeting OpenVINO. (#2019)
- `QuarkQuantization`: AMD Quark quantization for LLMs. (#2010)
- `VitisGenerateModelLLM`: optimized LLM model generation for the Vitis AI Execution Provider. (#2010)
Improvements
- New graph surgeries, including `dla transformers`, `DecomposeRotaryEmbedding`, and `DecomposeQuickGelu`. (#2018, #1972, #2000)
- Exposed `WorkflowOutput` in the Python API and added unified APIs for CLI commands. (#1907)
- Refactored the Docker system for simplified setup and execution. (#1990)
- ExtractAdapters:
  - Added support for DORA and LoHA adapters. (#1611)
- NVMO quantization:
- OnnxPeepholeOptimizer:
  - Removed `fuse_transpose_qat` and `patch_unsupported_argmax_operator`. (#1976)
Deprecation
Azure ML support will be deprecated in the next release, including:
- Azure ML system
- Azure ML workspace model
- Remote workflow
Recipes Migration
All recipes are being migrated to the olive-recipes repository. New recipes will be added and maintained there going forward.