NightVibes33/ModelQuantizer-iOS-Distributed


ModelQuantizer

A fully functional iOS app for quantizing Hugging Face AI models directly on your device, built with SwiftUI and featuring real ML quantization capabilities.

Features

Real ML Quantization

  • Actual Quantization: Converts models to GGUF format with Q2_K through FP32 quantization types
  • Hugging Face Integration: Search and download models directly from Hugging Face Hub
  • Architecture Support: Llama, Mistral, Qwen2, Gemma, Phi, Falcon, GPT-2, BERT
  • Real Progress: Live progress tracking during download, analysis, and quantization
  • GGUF Export: Outputs industry-standard GGUF format for use with llama.cpp
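
As a rough illustration of what GGUF export involves, the sketch below builds only the fixed-size header that opens a GGUF file: the 4-byte magic "GGUF", a 32-bit version, then 64-bit tensor and metadata counts, all little-endian, as described in the public GGUF specification. The names appendLE and makeGGUFHeader are hypothetical and are not taken from this app's source; a real writer would go on to emit metadata key/value pairs, tensor descriptors, and aligned tensor data.

```swift
import Foundation

/// Appends a fixed-width integer to `data` in little-endian byte order.
func appendLE<T: FixedWidthInteger>(_ value: T, to data: inout Data) {
    withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
}

/// Builds the fixed-size GGUF header: magic, version, tensor count, metadata count.
/// Hypothetical helper; version 3 is assumed as the default.
func makeGGUFHeader(tensorCount: UInt64, metadataCount: UInt64, version: UInt32 = 3) -> Data {
    var data = Data()
    data.append(contentsOf: Array("GGUF".utf8))  // magic bytes
    appendLE(version, to: &data)                 // 4 bytes
    appendLE(tensorCount, to: &data)             // 8 bytes
    appendLE(metadataCount, to: &data)           // 8 bytes
    return data
}

// Illustrative counts only.
let header = makeGGUFHeader(tensorCount: 2, metadataCount: 3)
print(header.count)  // 4 + 4 + 8 + 8 bytes
```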

Device Intelligence

  • Automatic Device Scanning: Detects your iPhone/iPad model, RAM, CPU cores, GPU capabilities, and Neural Engine
  • Smart Recommendations: Suggests optimal quantization settings based on your device's capabilities
  • Thermal Monitoring: Adjusts settings based on device temperature and battery state
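
A minimal sketch of how a device scan could feed a recommendation, assuming RAM and core count are read from ProcessInfo. The ThermalLevel enum, the DeviceSnapshot type, and every RAM threshold below are illustrative assumptions, not the app's actual logic. On a real device the thermal level would likely be mapped from ProcessInfo.processInfo.thermalState, and battery state from UIDevice.

```swift
import Foundation

/// Hypothetical stand-in for the system thermal state, kept separate so the sketch is portable.
enum ThermalLevel { case nominal, fair, serious, critical }

/// Hypothetical summary of the values a device scan might collect.
struct DeviceSnapshot {
    let ramGB: Double
    let cpuCores: Int
    let thermal: ThermalLevel
}

/// Reads physical memory and active core count from ProcessInfo.
func scanDevice(thermal: ThermalLevel = .nominal) -> DeviceSnapshot {
    let info = ProcessInfo.processInfo
    return DeviceSnapshot(
        ramGB: Double(info.physicalMemory) / 1_073_741_824,
        cpuCores: info.activeProcessorCount,
        thermal: thermal
    )
}

/// Quantization types ordered from most to least aggressive compression.
let ladder = ["Q2_K", "Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K"]

/// Picks a type from RAM, then steps down one level when the device is hot.
/// The thresholds are assumptions chosen for illustration.
func recommendedQuantization(for device: DeviceSnapshot) -> String {
    var index: Int
    if device.ramGB < 3 { index = 0 }
    else if device.ramGB < 4 { index = 1 }
    else if device.ramGB < 8 { index = 2 }
    else if device.ramGB < 12 { index = 3 }
    else { index = 4 }
    if device.thermal == .serious || device.thermal == .critical {
        index = max(0, index - 1)
    }
    return ladder[index]
}

print(recommendedQuantization(for: scanDevice()))
```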

Beautiful UI/UX

  • Liquid Glass Design: iOS 26-inspired glassmorphism with animated backgrounds
  • Dark Mode First: Optimized for OLED displays with deep blacks
  • Smooth Animations: Spring-based transitions and shimmer effects
  • Responsive Layout: Adapts to all iPhone and iPad sizes

Model Library

  • Real Model Search: Search Hugging Face's entire model repository
  • Curated Models: Pre-loaded with popular open-source models
  • Detailed Info: View parameters, downloads, likes, and supported quantizations
  • One-Tap Quantize: Start quantization directly from model details
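
For orientation, model search could be built on the public Hugging Face Hub endpoint https://huggingface.co/api/models with search and limit query parameters. The sketch below only constructs the URL and decodes a small hand-written sample response; modelSearchURL and ModelSummary are hypothetical names, and the decoded fields are an assumed subset of what the Hub returns.

```swift
import Foundation

/// Builds a Hub search URL. Hypothetical helper, not the app's API.
func modelSearchURL(query: String, limit: Int = 20) -> URL? {
    var components = URLComponents(string: "https://huggingface.co/api/models")
    components?.queryItems = [
        URLQueryItem(name: "search", value: query),
        URLQueryItem(name: "limit", value: String(limit))
    ]
    return components?.url
}

/// Assumed subset of the fields in each search result.
struct ModelSummary: Decodable {
    let id: String
    let downloads: Int?
    let likes: Int?
}

// Hand-written sample payload, not real Hub data.
let sampleJSON = Data("""
[{"id": "example-org/example-model", "downloads": 10, "likes": 2}]
""".utf8)

let models = (try? JSONDecoder().decode([ModelSummary].self, from: sampleJSON)) ?? []
print(models.map { $0.id })
```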

Requirements

  • iOS 18.0+
  • iPhone 11 or later (recommended)
  • Metal-capable device
  • At least 4GB RAM for 7B models
  • Hugging Face token for gated models (like Llama)

Installation

Sideloading (Recommended)

  1. Download the latest IPA from GitHub Releases
  2. Use AltStore, Sideloadly, or TrollStore to install
  3. Trust the developer certificate in Settings

Building from Source

git clone https://github.com/NightVibes3/ModelQuantizer-iOS.git
cd ModelQuantizer-iOS
open ModelQuantizer.xcodeproj

Build and run on your device (requires an Apple Developer account for signing).

Usage

1. Home Dashboard

  • View your device capabilities at a glance
  • See recommended quantization settings
  • Access your quantized models
  • View recent activity

2. Quantize a Model

  1. Tap "Quantize" in the tab bar
  2. Search for a model on Hugging Face (or select from popular models)
  3. Select quantization type (or use recommended)
  4. Adjust context length if needed
  5. Tap "Start Quantization"
  6. Wait for completion

3. Hugging Face Authentication

Some models (like Llama) require authentication:

  1. Go to the Settings tab
  2. Enter your Hugging Face token (get it from huggingface.co/settings/tokens)
  3. Now you can download gated models
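
As a sketch of what the token is used for, a gated file download typically sends it as a Bearer token in the Authorization header against the Hub's resolve URL. The function name downloadRequest, the repository, the file name, and the token below are placeholders, not values from this project.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Builds a download request for one file in a Hub repository.
/// Hypothetical helper; the token is attached only when one is provided.
func downloadRequest(repo: String, file: String, revision: String = "main", token: String?) -> URLRequest? {
    guard let url = URL(string: "https://huggingface.co/\(repo)/resolve/\(revision)/\(file)") else {
        return nil
    }
    var request = URLRequest(url: url)
    if let token, !token.isEmpty {
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
    return request
}

// Placeholder repository, file, and token.
let request = downloadRequest(
    repo: "example-org/example-model",
    file: "model.safetensors",
    token: "hf_example"
)
print(request?.url?.absoluteString ?? "invalid URL")
```

Public models should download without the header, which is why the token parameter is optional here.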

4. View Device Info

  • Detailed hardware specifications
  • ML capabilities (Neural Engine, Metal features)
  • Performance recommendations
  • Supported model sizes

5. Browse Model Library

  • View all your quantized models
  • Share or export models
  • Delete unwanted models

Quantization Types

Type | Bits | Compression | Quality | Use Case
Q2_K | 2 | 16× | Low | Entry-level devices
Q3_K_M | 3 | 10.7× | Fair | Limited RAM
Q4_K_M | 4 | 8× | Good | Balanced (Recommended)
Q5_K_M | 5 | 6.4× | Very Good | High-end devices
Q6_K | 6 | 5.3× | Excellent | Premium devices
Q8_0 | 8 | 4× | Near-Perfect | Maximum quality
FP16 | 16 | 2× | Original | Research/development
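
The Compression column appears to follow 32 divided by the bit width (32 / 2 = 16×, 32 / 5 = 6.4×), that is, compression relative to 32-bit floats. The sketch below applies the same arithmetic to estimate output size. Both function names are hypothetical, and the result is a lower bound: real GGUF files are somewhat larger because each block also stores scale values.

```swift
/// Nominal compression relative to 32-bit floating point weights.
func compressionRatio(bits: Double) -> Double {
    32.0 / bits
}

/// Rough output size in GB (10^9 bytes), ignoring per-block scales and metadata.
func estimatedSizeGB(parameters: Double, bits: Double) -> Double {
    parameters * bits / 8.0 / 1_000_000_000
}

// A 7-billion-parameter model at a nominal 4 bits per weight.
print(compressionRatio(bits: 4))                     // 8.0
print(estimatedSizeGB(parameters: 7e9, bits: 4))     // 3.5
```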

Device Compatibility

Ultra (iPhone 16 Pro/Max, iPad Pro M4)

  • Max Model Size: 24GB
  • Recommended: Q5-Q6 quantization
  • Context: Up to 32K tokens
  • Features: Full Neural Engine, all GPU layers

Flagship (iPhone 16/15 Pro)

  • Max Model Size: 12GB
  • Recommended: Q4-Q5 quantization
  • Context: Up to 16K tokens
  • Features: Neural Engine, most GPU layers

High-End (iPhone 14/13 Pro)

  • Max Model Size: 7GB
  • Recommended: Q4 quantization
  • Context: Up to 8K tokens
  • Features: GPU acceleration

Mid-Range (iPhone 12/11)

  • Max Model Size: 4GB
  • Recommended: Q3-Q4 quantization
  • Context: Up to 4K tokens
  • Features: Limited GPU

Entry-Level

  • Max Model Size: 2GB
  • Recommended: Q2-Q3 quantization
  • Context: Up to 2K tokens
  • Features: CPU only
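
The tiers above can be read as a lookup table. A minimal sketch, with the figures copied from this section and the DeviceTier type and smallestTier function introduced purely for illustration:

```swift
/// One row of the compatibility list above. Hypothetical type.
struct DeviceTier {
    let name: String
    let maxModelSizeGB: Double
    let recommended: String
    let maxContextK: Int  // context limit in thousands of tokens
}

/// Tiers ordered from smallest to largest model budget.
let tiers = [
    DeviceTier(name: "Entry-Level", maxModelSizeGB: 2, recommended: "Q2-Q3", maxContextK: 2),
    DeviceTier(name: "Mid-Range", maxModelSizeGB: 4, recommended: "Q3-Q4", maxContextK: 4),
    DeviceTier(name: "High-End", maxModelSizeGB: 7, recommended: "Q4", maxContextK: 8),
    DeviceTier(name: "Flagship", maxModelSizeGB: 12, recommended: "Q4-Q5", maxContextK: 16),
    DeviceTier(name: "Ultra", maxModelSizeGB: 24, recommended: "Q5-Q6", maxContextK: 32)
]

/// Returns the lowest tier whose size budget covers a model file, or nil if none does.
func smallestTier(fitting sizeGB: Double) -> DeviceTier? {
    tiers.first { $0.maxModelSizeGB >= sizeGB }
}

if let tier = smallestTier(fitting: 3.5) {
    print("\(tier.name): \(tier.recommended), up to \(tier.maxContextK)K tokens")
}
```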

Technical Details

Architecture

  • Swift 6.0: Modern Swift with concurrency support
  • SwiftUI: Declarative UI with 95%+ Swift code
  • Metal: GPU acceleration for quantization operations
  • Core ML: Neural Engine utilization where available

Quantization Engine

  • Custom GGUF writer implementation
  • Real tensor analysis and quantization
  • Memory-mapped file I/O
  • Progressive quantization with checkpointing
  • Support for Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, FP16, FP32
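
To make the block formats concrete, here is a simplified sketch of Q8_0-style quantization: weights are split into blocks of 32, each block stores one scale (the largest absolute value divided by 127), and every weight becomes a signed 8-bit integer. This is an illustration of the general scheme, not the app's engine. The Q8Block type and both functions are hypothetical, and the scale is kept as a Float here even though the GGUF format stores it as a 16-bit float.

```swift
/// One quantized block: 32 weights sharing a single scale. Hypothetical type.
struct Q8Block {
    let scale: Float
    let values: [Int8]
}

let blockSize = 32

/// Quantizes weights block by block; the count must be a multiple of 32.
func quantizeQ8(_ weights: [Float]) -> [Q8Block] {
    precondition(weights.count % blockSize == 0, "weight count must be a multiple of 32")
    return stride(from: 0, to: weights.count, by: blockSize).map { (start: Int) -> Q8Block in
        let block = weights[start..<(start + blockSize)]
        let maxAbs = block.map { abs($0) }.max() ?? 0
        let scale = maxAbs / 127
        let inverse: Float = scale == 0 ? 0 : 1 / scale
        // Round to the nearest integer and clamp to the symmetric Int8 range.
        let values = block.map { Int8(max(-127, min(127, ($0 * inverse).rounded()))) }
        return Q8Block(scale: scale, values: values)
    }
}

/// Reconstructs approximate weights from quantized blocks.
func dequantize(_ blocks: [Q8Block]) -> [Float] {
    blocks.flatMap { block in block.values.map { Float($0) * block.scale } }
}

// Round-trip a small synthetic tensor and report the worst-case error.
let weights: [Float] = (0..<64).map { Float($0) * 0.25 - 8 }
let restored = dequantize(quantizeQ8(weights))
let maxError = zip(weights, restored).map { abs($0 - $1) }.max() ?? 0
print(maxError)
```

The round-trip error per weight is bounded by roughly half the block's scale, which is why blocks with one large outlier lose more precision than blocks with uniform magnitudes.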

Performance

  • Background processing with progress callbacks
  • Thermal throttling awareness
  • Battery level monitoring
  • Automatic memory management

Roadmap

  • Hugging Face Hub integration
  • Real model quantization
  • Cloud quantization (offload heavy models)
  • Model comparison tool
  • Benchmark suite
  • Custom model import
  • Batch quantization
  • iCloud sync for models

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

License

This project is licensed under the MIT License - see LICENSE for details.

Acknowledgments

Disclaimer

This app is for educational and research purposes. Respect model licenses and terms of use. Some models require authentication or have commercial use restrictions.


Built with ❤️ for the AI community
