Smart LLM routing for Gemini models. Automatically selects the most cost-efficient Gemini model for your prompt using KNN-based routing.
Install:

    pip install orkestra-gemini

Usage:

    import orkestra

    client = orkestra.Client(key="YOUR_GEMINI_API_KEY")
    response = client.generate("Explain quantum computing")
    print(response.text)
    print(f"Model: {response.model}")
    print(f"Cost: ${response.cost:.6f}")
    print(f"Saved: {response.savings_percent:.1f}% vs {response.base_model}")

Orkestra uses a KNN (K-Nearest Neighbors) router trained on 5,608 query embeddings to predict which Gemini model will perform best for your specific prompt. Simple queries are routed to cheaper models, complex ones to premium models.
Each call:
- Embeds your prompt using Longformer (768-dim)
- KNN finds the 5 nearest training queries
- Routes to the model that performed best on similar queries
- Calls the selected Gemini model via `google-genai`
- Returns the response with cost savings vs your base model
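
The routing steps above can be sketched roughly as follows. This is a minimal illustration, not Orkestra's actual implementation: the 2-D vectors stand in for 768-dimensional Longformer embeddings, the training pairs are made up, and the `route` helper and its majority-vote rule are assumptions about how "performed best on similar queries" might be decided.

```python
import math
from collections import Counter

# Hypothetical training data: (embedding, best-performing model) pairs.
# Real embeddings would be 768-dimensional; 2-D toy vectors keep this readable.
TRAINING = [
    ((0.10, 0.20), "gemini-2.5-flash-lite"),
    ((0.20, 0.10), "gemini-2.5-flash-lite"),
    ((0.15, 0.25), "gemini-2.5-flash-lite"),
    ((0.50, 0.50), "gemini-3-flash"),
    ((0.55, 0.45), "gemini-3-flash"),
    ((0.90, 0.90), "gemini-3-pro"),
    ((0.95, 0.85), "gemini-3-pro"),
]

def route(query_embedding, training=TRAINING, k=5):
    """Return the model that wins a majority vote among the k nearest neighbors."""
    # Sort training points by Euclidean distance to the query and keep the closest k.
    nearest = sorted(training, key=lambda item: math.dist(query_embedding, item[0]))[:k]
    # Assumed decision rule: the most common "best model" label among the neighbors.
    votes = Counter(model for _, model in nearest)
    return votes.most_common(1)[0][0]

simple_choice = route((0.12, 0.18))      # lands among the budget-labelled points
hard_choice = route((0.92, 0.88), k=3)   # lands among the premium-labelled points
print(simple_choice, hard_choice)
```

A smaller `k` follows the closest neighbors more tightly, while a larger `k` smooths over outliers at the risk of ties between tiers.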
| Tier | Model | Input $/1M | Output $/1M | Free Tier | Features |
|---|---|---|---|---|---|
| Budget | `gemini-2.5-flash-lite` | $0.10 | $0.40 | ✅ Yes | Fast, low-cost (fallback until Gemini 3 lite) |
| Balanced | `gemini-3-flash` | $0.50 | $3.00 | ❌ No | Balanced speed & capability for general use |
| Premium | `gemini-3-pro` | $2.00 | $12.00 | ❌ No | Highest capability with Deep Think reasoning |
Note: Gemini 3 models were released in late 2025 with 80%+ better reasoning on complex tasks, enhanced multimodal understanding, and a 1M-token context window. The router automatically selects the model based on your query's complexity.
On the free tier, requests routed to a paid model automatically fall back to `gemini-2.5-flash-lite`.
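
The fallback rule could look something like the sketch below. The `resolve_model` helper and the `on_free_tier` flag are hypothetical names used only for illustration. The set of free-tier models is taken from the pricing table, and how Orkestra actually detects the free tier is not described here.

```python
# Free-tier availability, taken from the pricing table above.
FREE_TIER_MODELS = {"gemini-2.5-flash-lite"}
FALLBACK_MODEL = "gemini-2.5-flash-lite"

def resolve_model(selected_model: str, on_free_tier: bool) -> str:
    """Hypothetical helper: swap paid-only models for the free fallback."""
    if on_free_tier and selected_model not in FREE_TIER_MODELS:
        return FALLBACK_MODEL
    return selected_model

free_choice = resolve_model("gemini-3-pro", on_free_tier=True)
paid_choice = resolve_model("gemini-3-pro", on_free_tier=False)
print(free_choice, paid_choice)
```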
Create a client with your Gemini API key.

    client = orkestra.Client(key="YOUR_GEMINI_API_KEY")

Set the baseline model for cost comparison (default: `gemini-3-pro`).

    client.set_base_model("gemini-3-pro")  # Compare savings against Gemini 3 Pro

Generate a response with automatic model routing.

    response = client.generate("What is 2+2?")

The response exposes the following fields:

| Field | Type | Description |
|---|---|---|
| `text` | `str` | Generated response text |
| `model` | `str` | Model that was selected |
| `cost` | `float` | Actual cost in dollars |
| `input_tokens` | `int` | Input token count |
| `output_tokens` | `int` | Output token count |
| `savings` | `float` | Dollars saved vs base model |
| `savings_percent` | `float` | Percentage saved vs base model |
| `base_model` | `str` | Comparison baseline model |
| `base_cost` | `float` | What it would have cost with base model |
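
The cost-related fields can be related to the pricing table with some simple arithmetic. The sketch below assumes that cost is token counts multiplied by the per-million prices, and that the base cost uses the same token counts at the base model's prices. The exact formula Orkestra uses is not stated here, and `PRICES`, `estimate_cost`, and `savings_percent` are illustrative names, not part of the library.

```python
# Prices in USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3-pro": (2.00, 12.00),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Assumed cost model: tokens times the per-million price, summed over both directions."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def savings_percent(model, base_model, input_tokens, output_tokens):
    """Percentage saved relative to running the same tokens on the base model."""
    cost = estimate_cost(model, input_tokens, output_tokens)
    base_cost = estimate_cost(base_model, input_tokens, output_tokens)
    return 100.0 * (base_cost - cost) / base_cost

# 1,000 input tokens and 500 output tokens on the balanced tier vs the premium baseline.
pct = savings_percent("gemini-3-flash", "gemini-3-pro", 1_000, 500)
print(f"{pct:.1f}%")
```

Under these assumptions, `gemini-3-flash` always comes out 75% below `gemini-3-pro`, because both of its prices are a quarter of the premium prices. That is consistent with the 75.0% figures in the sample output below.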
[ 1/25] What is 2+2?
Model: gemini-2.5-flash-lite
Tier: budget (expected: budget)
Cost: $0.000004 | Saved: 96.4%
[ 2/25] What's the capital of France?
Model: gemini-3-flash-preview
Tier: balanced (expected: budget)
Cost: $0.000029 | Saved: 75.0%
[ 3/25] Define photosynthesis in one sentence.
Model: gemini-2.5-flash-lite
Tier: budget (expected: budget)
Cost: $0.000011 | Saved: 96.6%
[ 4/25] List 5 primary colors.
Model: gemini-2.5-flash-lite
Tier: budget (expected: budget)
Cost: $0.000098 | Saved: 96.7%
[ 5/25] What year did WW2 end?
Model: gemini-3-flash-preview
Tier: balanced (expected: budget)
Cost: $0.000193 | Saved: 75.0%
[ 6/25] Convert 100 Celsius to Fahrenheit.
Model: gemini-2.5-flash-lite
Tier: budget (expected: budget)
Cost: $0.000063 | Saved: 96.6%
[ 7/25] What is the chemical symbol for water?
Model: gemini-2.5-flash-lite
Tier: budget (expected: budget)
Cost: $0.000005 | Saved: 96.5%
[ 8/25] Name the largest planet in our solar system.
Model: gemini-2.5-flash-lite
Tier: budget (expected: budget)
Cost: $0.000005 | Saved: 96.4%
[ 9/25] Write a Python function to check if a number is prime.
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.001437 | Saved: 75.0%
[10/25] Explain the difference between TCP and UDP.
Model: gemini-2.5-flash-lite
Tier: budget (expected: balanced)
Cost: $0.000558 | Saved: 96.7%
[11/25] Write a SQL query to find duplicate emails in a users table.
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.001372 | Saved: 75.0%
[12/25] Explain how a hash table works with collision handling.
Model: gemini-2.5-flash-lite
Tier: budget (expected: balanced)
Cost: $0.000698 | Saved: 96.7%
[13/25] Write a regex to validate email addresses.
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.002260 | Saved: 75.0%
[14/25] Explain the CAP theorem in distributed systems.
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.002992 | Saved: 75.0%
[15/25] Design a REST API for a todo application with authentication...
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.003459 | Saved: 75.0%
[16/25] Implement a binary search tree with insert, delete, and sear...
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.004176 | Saved: 75.0%
[17/25] Explain how garbage collection works in Java vs Python.
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.003236 | Saved: 75.0%
[18/25] Design a rate limiter for an API using the token bucket algo...
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.003835 | Saved: 75.0%
[19/25] Implement a LRU cache from scratch with O(1) operations.
Model: gemini-3-flash-preview
Tier: balanced (expected: balanced)
Cost: $0.003533 | Saved: 75.0%
[20/25] You are given a small tabular dataset for binary classificat...
Model: gemini-3-pro-preview
Tier: premium (expected: premium)
Cost: $0.021294 | Saved: 0.0%
[21/25] # Grand Challenge: Implement a Differentiable Memory-Augment...
Model: gemini-3-pro-preview
Tier: premium (expected: premium)
Cost: $0.069814 | Saved: 0.0%
[22/25] Design and implement a distributed consensus algorithm simil...
Model: gemini-3-pro-preview
Tier: premium (expected: premium)
Cost: $0.031868 | Saved: 0.0%
[23/25] Implement a complete compiler frontend for a subset of Pytho...
Model: gemini-3-pro-preview
Tier: premium (expected: premium)
Cost: $0.069468 | Saved: 0.0%
[24/25] Design a lock-free concurrent data structure for a work-stea...
Model: gemini-3-pro-preview
Tier: premium (expected: premium)
Cost: $0.034744 | Saved: 0.0%
[25/25] Implement a differentiable neural architecture search system...
Model: gemini-3-pro-preview
Tier: premium (expected: premium)
Cost: $0.046606 | Saved: 0.0%
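
Output in this format can be summarized with a small script. The sketch below is one possible approach, not part of Orkestra: it parses the first three entries copied from the run above and counts how often the selected tier matched the expected tier. The `summarize` helper and the regular expressions are illustrative.

```python
import re

# Three entries copied from the run above; a full run would be parsed the same way.
SAMPLE_LOG = """\
[ 1/25] What is 2+2?
Model: gemini-2.5-flash-lite
Tier: budget (expected: budget)
Cost: $0.000004 | Saved: 96.4%
[ 2/25] What's the capital of France?
Model: gemini-3-flash-preview
Tier: balanced (expected: budget)
Cost: $0.000029 | Saved: 75.0%
[ 3/25] Define photosynthesis in one sentence.
Model: gemini-2.5-flash-lite
Tier: budget (expected: budget)
Cost: $0.000011 | Saved: 96.6%
"""

TIER_LINE = re.compile(r"^Tier: (\w+) \(expected: (\w+)\)$", re.MULTILINE)
COST_LINE = re.compile(r"^Cost: \$([0-9.]+) \|", re.MULTILINE)

def summarize(log_text):
    """Count queries, tier matches, and total cost from a log in the format above."""
    tiers = TIER_LINE.findall(log_text)
    matches = sum(1 for actual, expected in tiers if actual == expected)
    total_cost = sum(float(cost) for cost in COST_LINE.findall(log_text))
    return {"queries": len(tiers), "tier_matches": matches, "total_cost": total_cost}

summary = summarize(SAMPLE_LOG)
print(summary)
```

Counting the full 25-query run above by hand, the selected tier matches the expected tier in 21 cases. The four mismatches are queries 2 and 5 (routed up to balanced) and queries 10 and 12 (routed down to budget).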
License: MIT