A fully functional iOS app for quantizing Hugging Face AI models directly on your device. It is built with SwiftUI and performs real ML quantization on-device.
- Actual Quantization: Converts models to GGUF format with Q2_K through FP32 quantization types
- Hugging Face Integration: Search and download models directly from Hugging Face Hub
- Architecture Support: Llama, Mistral, Qwen2, Gemma, Phi, Falcon, GPT-2, BERT
- Real Progress: Live progress tracking during download, analysis, and quantization
- GGUF Export: Outputs industry-standard GGUF format for use with llama.cpp
- Automatic Device Scanning: Detects your iPhone/iPad model, RAM, CPU cores, GPU capabilities, and Neural Engine
- Smart Recommendations: Suggests optimal quantization settings based on your device's capabilities
- Thermal Monitoring: Adjusts settings based on device temperature and battery state
- Liquid Glass Design: iOS 26-inspired glassmorphism with animated backgrounds
- Dark Mode First: Optimized for OLED displays with deep blacks
- Smooth Animations: Spring-based transitions and shimmer effects
- Responsive Layout: Adapts to all iPhone and iPad sizes
- Real Model Search: Search Hugging Face's entire model repository
- Curated Models: Pre-loaded with popular open-source models
- Detailed Info: View parameters, downloads, likes, and supported quantizations
- One-Tap Quantize: Start quantization directly from model details
- iOS 18.0+
- iPhone 11 or later (recommended)
- Metal-capable device
- At least 4GB RAM for 7B models
- Hugging Face token for gated models (like Llama)
- Download the latest IPA from GitHub Releases
- Use AltStore, Sideloadly, or TrollStore to install
- Trust the developer certificate in Settings
```bash
git clone https://github.com/NightVibes3/ModelQuantizer-iOS.git
cd ModelQuantizer-iOS
open ModelQuantizer.xcodeproj
```
Build and run on your device (requires an Apple Developer account for signing).
- View your device capabilities at a glance
- See recommended quantization settings
- Access your quantized models
- View recent activity
- Tap "Quantize" in the tab bar
- Search for a model on Hugging Face (or select from popular models)
- Select quantization type (or use recommended)
- Adjust context length if needed
- Tap "Start Quantization"
- Wait for completion
Some models (like Llama) require authentication:
- Go to Settings tab
- Enter your Hugging Face token (get it from huggingface.co/settings/tokens)
- Now you can download gated models
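
As a rough illustration of what the token is used for, the sketch below attaches it to a download request as a Bearer token in the `Authorization` header, which is how the Hugging Face Hub authenticates HTTP requests. The helper name `makeModelFileRequest`, the `resolve/main` URL layout, and the example repository ID are assumptions made for this example and are not taken from the app's source.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives in this module on Linux
#endif

/// Builds a download request for one file in a Hugging Face model repository.
/// Hypothetical helper: the name and the `resolve/main` URL layout are assumptions of this sketch.
func makeModelFileRequest(repoID: String, fileName: String, token: String?) -> URLRequest? {
    guard let url = URL(string: "https://huggingface.co/\(repoID)/resolve/main/\(fileName)") else {
        return nil
    }
    var request = URLRequest(url: url)
    // Gated models need the token; public models can be fetched without it.
    if let token, !token.isEmpty {
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
    return request
}
```

Calling `makeModelFileRequest(repoID: "example-org/example-model", fileName: "config.json", token: savedToken)` would yield a request that can be passed to `URLSession`; passing `nil` for the token leaves the header out.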
- Detailed hardware specifications
- ML capabilities (Neural Engine, Metal features)
- Performance recommendations
- Supported model sizes
- View all your quantized models
- Share or export models
- Delete unwanted models
| Type | Nominal bits | Compression (vs. FP32) | Quality | Use Case |
|---|---|---|---|---|
| Q2_K | 2 | 16× | Low | Entry-level devices |
| Q3_K_M | 3 | 10.7× | Fair | Limited RAM |
| Q4_K_M | 4 | 8× | Good | Balanced (Recommended) |
| Q5_K_M | 5 | 6.4× | Very Good | High-end devices |
| Q6_K | 6 | 5.3× | Excellent | Premium devices |
| Q8_0 | 8 | 4× | Near-Perfect | Maximum quality |
| FP16 | 16 | 2× | Original | Research/development |
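
The compression figures in the table follow from dividing 32 (the bit width of FP32) by each type's bit count. The sketch below reproduces that arithmetic and derives a rough file-size estimate from a parameter count. The `QuantType` enum and `estimatedSizeGB` function are illustrative names, not part of the app, and the estimate is a lower bound: real GGUF files also store per-block scales and metadata, so they come out somewhat larger.

```swift
import Foundation

/// Quantization types with the nominal bit widths listed in the table above.
/// Illustrative type, not taken from the app's source.
enum QuantType: String, CaseIterable {
    case q2K = "Q2_K", q3KM = "Q3_K_M", q4KM = "Q4_K_M", q5KM = "Q5_K_M"
    case q6K = "Q6_K", q8_0 = "Q8_0", fp16 = "FP16"

    /// Nominal bits per weight, ignoring per-block overhead.
    var nominalBits: Double {
        switch self {
        case .q2K: return 2
        case .q3KM: return 3
        case .q4KM: return 4
        case .q5KM: return 5
        case .q6K: return 6
        case .q8_0: return 8
        case .fp16: return 16
        }
    }

    /// Compression ratio relative to 32-bit floats.
    var compressionVsFP32: Double { 32.0 / nominalBits }
}

/// Rough lower bound on the weight data size in gigabytes (10^9 bytes).
func estimatedSizeGB(parameterCount: Double, type: QuantType) -> Double {
    parameterCount * type.nominalBits / 8.0 / 1_000_000_000
}
```

For a 7-billion-parameter model, `estimatedSizeGB(parameterCount: 7e9, type: .q4KM)` gives about 3.5 GB of weight data before overhead.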
- Max Model Size: 24GB
- Recommended: Q5-Q6 quantization
- Context: Up to 32K tokens
- Features: Full Neural Engine, all GPU layers
- Max Model Size: 12GB
- Recommended: Q4-Q5 quantization
- Context: Up to 16K tokens
- Features: Neural Engine, most GPU layers
- Max Model Size: 7GB
- Recommended: Q4 quantization
- Context: Up to 8K tokens
- Features: GPU acceleration
- Max Model Size: 4GB
- Recommended: Q3-Q4 quantization
- Context: Up to 4K tokens
- Features: Limited GPU
- Max Model Size: 2GB
- Recommended: Q2-Q3 quantization
- Context: Up to 2K tokens
- Features: CPU only
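
To show how limits like these could drive a recommendation, here is a minimal sketch that stores the five sets of values listed above and picks the least capable one that still fits a given model size. The `DeviceTier` type and `smallestTier` function are hypothetical; the app's actual selection logic may weigh other signals such as thermal state or battery level.

```swift
import Foundation

/// One capability tier, using the values from the lists above.
/// Hypothetical type for illustration only.
struct DeviceTier: Sendable {
    let maxModelSizeGB: Double
    let recommendedQuantization: String
    let maxContextKTokens: Int  // context limit in thousands of tokens
    let features: String

    /// The five tiers, ordered from most to least capable.
    static let all: [DeviceTier] = [
        DeviceTier(maxModelSizeGB: 24, recommendedQuantization: "Q5-Q6",
                   maxContextKTokens: 32, features: "Full Neural Engine, all GPU layers"),
        DeviceTier(maxModelSizeGB: 12, recommendedQuantization: "Q4-Q5",
                   maxContextKTokens: 16, features: "Neural Engine, most GPU layers"),
        DeviceTier(maxModelSizeGB: 7, recommendedQuantization: "Q4",
                   maxContextKTokens: 8, features: "GPU acceleration"),
        DeviceTier(maxModelSizeGB: 4, recommendedQuantization: "Q3-Q4",
                   maxContextKTokens: 4, features: "Limited GPU"),
        DeviceTier(maxModelSizeGB: 2, recommendedQuantization: "Q2-Q3",
                   maxContextKTokens: 2, features: "CPU only"),
    ]
}

/// Returns the least capable tier whose size limit still fits the model,
/// or nil when the model exceeds every tier's limit.
func smallestTier(fittingModelSizeGB sizeGB: Double) -> DeviceTier? {
    DeviceTier.all
        .filter { $0.maxModelSizeGB >= sizeGB }
        .min { $0.maxModelSizeGB < $1.maxModelSizeGB }
}
```

A 3.5 GB model, for instance, fits the tier capped at 4 GB, while anything above 24 GB returns `nil`.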
- Swift 6.0: Modern Swift with concurrency support
- SwiftUI: Declarative UI; the codebase is over 95% Swift
- Metal: GPU acceleration for quantization operations
- Core ML: Neural Engine utilization where available
- Custom GGUF writer implementation
- Real tensor analysis and quantization
- Memory-mapped file I/O
- Progressive quantization with checkpointing
- Support for Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, FP16, FP32
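
For readers unfamiliar with block quantization, the sketch below shows the idea behind a Q8_0-style block as used in ggml: 32 weights share one scale, chosen so that the largest magnitude maps to 127, and each weight is stored as a signed 8-bit integer. This is a simplified illustration and not the app's implementation; in particular, ggml stores the scale as a 16-bit float, while this example keeps it as a `Float`. The names `Q8Block`, `quantizeQ8`, and `dequantizeQ8` are made up for the example.

```swift
import Foundation

/// One Q8_0-style block: a shared scale plus 32 signed 8-bit values.
/// Simplified sketch; ggml stores the scale as a 16-bit float.
struct Q8Block {
    static let size = 32
    let scale: Float
    let values: [Int8]
}

/// Quantizes exactly 32 weights into one block.
func quantizeQ8(_ weights: [Float]) -> Q8Block {
    precondition(weights.count == Q8Block.size, "expects exactly 32 weights")
    // The largest magnitude in the block maps to 127.
    let maxAbs = weights.map { abs($0) }.max() ?? 0
    let scale = maxAbs / 127
    let values: [Int8] = weights.map { weight in
        guard scale > 0 else { return 0 }  // all-zero block
        let q = (weight / scale).rounded()
        return Int8(max(-127, min(127, q)))  // clamp against rounding drift
    }
    return Q8Block(scale: scale, values: values)
}

/// Reconstructs approximate weights from a block.
func dequantizeQ8(_ block: Q8Block) -> [Float] {
    block.values.map { Float($0) * block.scale }
}
```

Round-tripping a block through `quantizeQ8` and `dequantizeQ8` recovers each weight to within about half the scale, which is why blocks containing one large outlier lose more precision on their smaller values.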
- Background processing with progress callbacks
- Thermal throttling awareness
- Battery level monitoring
- Automatic memory management
- Hugging Face Hub integration
- Real model quantization
- Cloud quantization (offload heavy models)
- Model comparison tool
- Benchmark suite
- Custom model import
- Batch quantization
- iCloud sync for models
Contributions are welcome! Please read our Contributing Guide for details.
This project is licensed under the MIT License - see LICENSE for details.
- llama.cpp for GGUF format
- Hugging Face for the model hub
- ggml for quantization algorithms
This app is for educational and research purposes. Respect model licenses and terms of use. Some models require authentication or have commercial use restrictions.
Built with ❤️ for the AI community