imperativelabs/orkestra-gemini

orkestra-gemini

Smart LLM routing for Gemini models. Automatically selects the most cost-efficient Gemini model for your prompt using KNN-based routing.

Install

pip install orkestra-gemini

Quick Start

import orkestra

# Create a client and let the router pick a model for the prompt
client = orkestra.Client(key="YOUR_GEMINI_API_KEY")
response = client.generate("Explain quantum computing")

# Inspect the answer, the model that was chosen, and the cost comparison
print(response.text)
print(f"Model: {response.model}")
print(f"Cost: ${response.cost:.6f}")
print(f"Saved: {response.savings_percent:.1f}% vs {response.base_model}")

How It Works

Orkestra uses a KNN (K-Nearest Neighbors) router trained on 5,608 query embeddings to predict which Gemini model will perform best for your specific prompt. Simple queries get routed to cheaper models, complex ones to premium models.

Each call:

  1. Embeds your prompt using Longformer (768-dim)
  2. Finds the 5 nearest training queries using KNN
  3. Routes to the model that performed best on similar queries
  4. Calls the selected Gemini model via google-genai
  5. Returns the response with cost savings vs your base model
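
The routing steps above can be sketched in a few lines. This is a minimal illustration, not Orkestra's implementation: it assumes cosine similarity as the distance measure, uses toy 3-dimensional vectors in place of the 768-dimensional Longformer embeddings, and assumes each training query carries a hypothetical per-model score that already reflects the cost/quality trade-off. The names `route`, `cosine_similarity`, and `TRAINING` are made up for this example.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def route(query_embedding, training_set, k=5):
    """Pick the model with the highest summed score among the k nearest
    training queries. training_set is a list of (embedding, scores) pairs,
    where scores maps a model name to a hypothetical score."""
    neighbors = sorted(
        training_set,
        key=lambda item: cosine_similarity(query_embedding, item[0]),
        reverse=True,
    )[:k]
    totals = defaultdict(float)
    for _, scores in neighbors:
        for model, score in scores.items():
            totals[model] += score
    return max(totals, key=totals.get)


# Toy training data: three "simple" queries and three "complex" ones.
SIMPLE = {"gemini-2.5-flash-lite": 0.9, "gemini-3-flash": 0.8, "gemini-3-pro": 0.7}
COMPLEX = {"gemini-2.5-flash-lite": 0.2, "gemini-3-flash": 0.6, "gemini-3-pro": 0.9}
TRAINING = [
    ([1.0, 0.1, 0.0], SIMPLE),
    ([0.9, 0.0, 0.1], SIMPLE),
    ([0.8, 0.2, 0.0], SIMPLE),
    ([0.1, 1.0, 0.0], COMPLEX),
    ([0.0, 0.9, 0.2], COMPLEX),
    ([0.2, 0.8, 0.1], COMPLEX),
]

print(route([0.9, 0.2, 0.0], TRAINING, k=3))  # nearest neighbors are the simple queries
print(route([0.1, 0.9, 0.1], TRAINING, k=3))  # nearest neighbors are the complex queries
```

With real embeddings the same idea applies with k=5 and 5,608 training points; how the library actually scores models per neighbor is not documented here.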

Model Tiers (Gemini 3)

| Tier | Model | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Free Tier | Features |
|---|---|---|---|---|---|
| Budget | gemini-2.5-flash-lite | $0.10 | $0.40 | ✅ Yes | Fast, low-cost (fallback until Gemini 3 lite) |
| Balanced | gemini-3-flash | $0.50 | $3.00 | ❌ No | Balanced speed & capability for general use |
| Premium | gemini-3-pro | $2.00 | $12.00 | ❌ No | Highest capability with Deep Think reasoning |
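
To make the pricing concrete, the sketch below computes a per-request cost and the saving against a base model from the prices in the table. It is an illustration only: the helper names `estimate_cost` and `savings_percent` are hypothetical, and it assumes both models would consume the same number of input and output tokens, which may not hold in practice.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-3-flash": (0.50, 3.00),
    "gemini-3-pro": (2.00, 12.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Cost in dollars for one request at the listed per-1M-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def savings_percent(model, base_model, input_tokens, output_tokens):
    """Percentage saved by using `model` instead of `base_model`,
    assuming identical token counts for both (a simplification)."""
    cost = estimate_cost(model, input_tokens, output_tokens)
    base_cost = estimate_cost(base_model, input_tokens, output_tokens)
    return 100.0 * (base_cost - cost) / base_cost


print(f"{savings_percent('gemini-3-flash', 'gemini-3-pro', 1000, 500):.1f}%")
```

Because both gemini-3-flash prices are exactly one quarter of the gemini-3-pro prices, the saving under these assumptions is 75% for any token mix, which is consistent with the 75.0% figures in the validation log. For gemini-2.5-flash-lite the saving falls between 95% (input-only) and about 96.7% (output-only).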

Note: Gemini 3 models were released in late 2025 with 80%+ better reasoning on complex tasks, enhanced multimodal understanding, and 1M token context window. The router automatically selects the optimal model based on your query complexity.

On the free tier, queries routed to a paid model automatically fall back to gemini-2.5-flash-lite.
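
The fallback rule can be pictured as a small lookup. This is a sketch under assumptions: the names `FREE_TIER_MODELS`, `FALLBACK_MODEL`, and `resolve_model` are hypothetical, and how the library detects a free-tier key is not described in this README.

```python
# Models available on the free tier, per the tier table (assumed complete).
FREE_TIER_MODELS = {"gemini-2.5-flash-lite"}
FALLBACK_MODEL = "gemini-2.5-flash-lite"


def resolve_model(selected_model, on_free_tier):
    """Return the model to call: the router's choice, unless the key is on
    the free tier and the choice is a paid model."""
    if on_free_tier and selected_model not in FREE_TIER_MODELS:
        return FALLBACK_MODEL
    return selected_model


print(resolve_model("gemini-3-pro", on_free_tier=True))
print(resolve_model("gemini-3-pro", on_free_tier=False))
```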

API

orkestra.Client(key)

Create a client with your Gemini API key.

client = orkestra.Client(key="YOUR_GEMINI_API_KEY")

client.set_base_model(model)

Set the baseline model for cost comparison (default: gemini-3-pro).

client.set_base_model("gemini-3-pro")  # Compare savings against Gemini 3 Pro

client.generate(prompt, *, max_tokens=8192, temperature=1.0)

Generate a response with automatic model routing.

response = client.generate("What is 2+2?")

orkestra.Response

| Field | Type | Description |
|---|---|---|
| text | str | Generated response text |
| model | str | Model that was selected |
| cost | float | Actual cost in dollars |
| input_tokens | int | Input token count |
| output_tokens | int | Output token count |
| savings | float | Dollars saved vs base model |
| savings_percent | float | Percentage saved vs base model |
| base_model | str | Comparison baseline model |
| base_cost | float | What it would have cost with base model |
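
The relationship between the cost fields can be shown with a small stand-in class. `ResponseSketch` is a hypothetical name, not the library's `orkestra.Response`; whether the real class stores `savings` and `savings_percent` or derives them is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResponseSketch:
    """Illustrative stand-in mirroring the documented fields."""
    text: str
    model: str
    cost: float
    input_tokens: int
    output_tokens: int
    base_model: str
    base_cost: float

    @property
    def savings(self) -> float:
        # Dollars saved relative to the base model.
        return self.base_cost - self.cost

    @property
    def savings_percent(self) -> float:
        # Guard against a zero base cost to avoid division by zero.
        if self.base_cost == 0:
            return 0.0
        return 100.0 * self.savings / self.base_cost


example = ResponseSketch(
    text="4",
    model="gemini-3-flash",
    cost=0.000025,
    input_tokens=10,
    output_tokens=5,
    base_model="gemini-3-pro",
    base_cost=0.0001,
)
print(f"Saved: {example.savings_percent:.1f}% vs {example.base_model}")
```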

Validation Results

[ 1/25] What is 2+2?
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: budget)
       Cost: $0.000004 | Saved: 96.4%

[ 2/25] What's the capital of France?
       Model: gemini-3-flash-preview
       Tier: balanced (expected: budget)
       Cost: $0.000029 | Saved: 75.0%

[ 3/25] Define photosynthesis in one sentence.
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: budget)
       Cost: $0.000011 | Saved: 96.6%

[ 4/25] List 5 primary colors.
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: budget)
       Cost: $0.000098 | Saved: 96.7%

[ 5/25] What year did WW2 end?
       Model: gemini-3-flash-preview
       Tier: balanced (expected: budget)
       Cost: $0.000193 | Saved: 75.0%

[ 6/25] Convert 100 Celsius to Fahrenheit.
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: budget)
       Cost: $0.000063 | Saved: 96.6%

[ 7/25] What is the chemical symbol for water?
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: budget)
       Cost: $0.000005 | Saved: 96.5%

[ 8/25] Name the largest planet in our solar system.
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: budget)
       Cost: $0.000005 | Saved: 96.4%

[ 9/25] Write a Python function to check if a number is prime.
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.001437 | Saved: 75.0%

[10/25] Explain the difference between TCP and UDP.
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: balanced)
       Cost: $0.000558 | Saved: 96.7%

[11/25] Write a SQL query to find duplicate emails in a users table.
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.001372 | Saved: 75.0%

[12/25] Explain how a hash table works with collision handling.
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: balanced)
       Cost: $0.000698 | Saved: 96.7%

[13/25] Write a regex to validate email addresses.
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.002260 | Saved: 75.0%

[14/25] Explain the CAP theorem in distributed systems.
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.002992 | Saved: 75.0%

[15/25] Design a REST API for a todo application with authentication...
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.003459 | Saved: 75.0%

[16/25] Implement a binary search tree with insert, delete, and sear...
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.004176 | Saved: 75.0%

[17/25] Explain how garbage collection works in Java vs Python.
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.003236 | Saved: 75.0%

[18/25] Design a rate limiter for an API using the token bucket algo...
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.003835 | Saved: 75.0%

[19/25] Implement a LRU cache from scratch with O(1) operations.
       Model: gemini-3-flash-preview
       Tier: balanced (expected: balanced)
       Cost: $0.003533 | Saved: 75.0%

[20/25] You are given a small tabular dataset for binary classificat...
       Model: gemini-3-pro-preview
       Tier: premium (expected: premium)
       Cost: $0.021294 | Saved: 0.0%

[21/25] # Grand Challenge: Implement a Differentiable Memory-Augment...
       Model: gemini-3-pro-preview
       Tier: premium (expected: premium)
       Cost: $0.069814 | Saved: 0.0%

[22/25] Design and implement a distributed consensus algorithm simil...
       Model: gemini-3-pro-preview
       Tier: premium (expected: premium)
       Cost: $0.031868 | Saved: 0.0%

[23/25] Implement a complete compiler frontend for a subset of Pytho...
       Model: gemini-3-pro-preview
       Tier: premium (expected: premium)
       Cost: $0.069468 | Saved: 0.0%

[24/25] Design a lock-free concurrent data structure for a work-stea...
       Model: gemini-3-pro-preview
       Tier: premium (expected: premium)
       Cost: $0.034744 | Saved: 0.0%

[25/25] Implement a differentiable neural architecture search system...
       Model: gemini-3-pro-preview
       Tier: premium (expected: premium)
       Cost: $0.046606 | Saved: 0.0%
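
In the log above, 21 of the 25 prompts were routed to the expected tier; prompts 2, 5, 10, and 12 were not. A tally like this can be reproduced by parsing the `Tier:` lines. The sketch below runs on a three-entry excerpt of the log, and the names `TIER_LINE` and `tier_agreement` are hypothetical helpers written for this example.

```python
import re

# Three entries copied from the validation log above.
LOG_EXCERPT = """\
[ 1/25] What is 2+2?
       Model: gemini-2.5-flash-lite
       Tier: budget (expected: budget)
       Cost: $0.000004 | Saved: 96.4%

[ 2/25] What's the capital of France?
       Model: gemini-3-flash-preview
       Tier: balanced (expected: budget)
       Cost: $0.000029 | Saved: 75.0%

[20/25] You are given a small tabular dataset for binary classificat...
       Model: gemini-3-pro-preview
       Tier: premium (expected: premium)
       Cost: $0.021294 | Saved: 0.0%
"""

# Captures the routed tier and the expected tier from each "Tier:" line.
TIER_LINE = re.compile(r"Tier:\s*(\w+)\s*\(expected:\s*(\w+)\)")


def tier_agreement(log_text):
    """Return (matches, total) for routed vs expected tiers in the log."""
    pairs = TIER_LINE.findall(log_text)
    matches = sum(1 for actual, expected in pairs if actual == expected)
    return matches, len(pairs)


matches, total = tier_agreement(LOG_EXCERPT)
print(f"{matches}/{total} routed to the expected tier")  # prints "2/3 routed to the expected tier"
```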

License

MIT
