Status: Open
Labels: enhancement (New feature or request)
Description
Just pitching this idea to see whether we'd like to try it; I'm also happy to develop it in a dev branch so we can see how it feels in practice. But first I'd like your thoughts on this strategy, @AmenRa, @milyenpabo and @andersonbcdefg.
Problem
Numba is currently a hard dependency that significantly impacts the user experience:
- Code readability: Numba decorators and type hints make the codebase harder to understand
- Performance inconsistency: JIT compilation adds overhead on the first call, and the compiled code can sometimes be slower than pure NumPy for certain workloads
- Process spawning issues: Numba's parallel backend can spawn too many worker threads, leading to crashes in some environments
- Installation complexity: Numba adds significant build complexity and binary size
- Debugging difficulty: JIT-compiled code is harder to debug and profile
Proposed Solution
Make Numba an optional dependency through a progressive two-step migration strategy.
Implementation Approach
Strategy: Progressive Migration to Dual Implementation
Step 1: Conditional Decorators (Quick Win)
- Replace `@njit` with `@maybe_njit`, which falls back to an identity function when Numba is disabled
- Immediate benefit: Users can disable Numba globally with minimal code changes
- Handles Numba-specific types (`numba.typed.Dict`/`List`) with fallbacks
- Convert `prange` to `range` when Numba is disabled
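A minimal sketch of what `ranx/decorators.py` could contain for Step 1, assuming the `RANX_USE_NUMBA` environment variable from the configuration proposal is the only switch. The helper `_numba_enabled()` and the typed-list fallback `typed_list()` are hypothetical names, not existing ranx API. In this sketch the choice is made when a function is decorated, i.e. at import time.

```python
# ranx/decorators.py (sketch; helper names are hypothetical)
import os

try:
    import numba
    from numba.typed import List as _TypedList
    NUMBA_AVAILABLE = True
except ImportError:  # Numba not installed: everything falls back to Python
    numba = None
    _TypedList = None
    NUMBA_AVAILABLE = False


def _numba_enabled():
    # Numba is used only if it is installed and not disabled via
    # RANX_USE_NUMBA=false (assumed variable name from the proposal).
    if not NUMBA_AVAILABLE:
        return False
    return os.environ.get("RANX_USE_NUMBA", "true").lower() != "false"


def maybe_njit(*args, **kwargs):
    """Like numba.njit, but returns the function unchanged when Numba is off."""
    # Bare usage: @maybe_njit
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return maybe_njit()(args[0])

    # Parametrized usage: @maybe_njit(cache=True)
    def decorator(func):
        if _numba_enabled():
            return numba.njit(*args, **kwargs)(func)
        return func  # identity fallback: plain Python/NumPy

    return decorator


# Without Numba, prange degrades to the builtin range.
prange = numba.prange if _numba_enabled() else range


def typed_list():
    # Hypothetical fallback for numba.typed.List: a plain list without Numba.
    return _TypedList() if _numba_enabled() else []
```

Because `_numba_enabled()` is consulted at decoration time, changing the switch after `ranx` has been imported would not affect functions that are already decorated; a lazier variant could defer the decision to the first call.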
Step 2: Dual Implementation Pattern (Long-term Solution)
- Keep existing Numba implementations for performance
- Add clean, readable NumPy implementations as fallbacks
- Runtime selection based on Numba availability and user preference
- Much better code readability and debugging experience
Code Evolution Example: Precision Metric
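For reference, precision at cutoff $k$ is the fraction of the top-$k$ retrieved documents that are relevant. Here $\mathrm{Rel}$ is the set of documents judged with relevance ≥ 1, $d_1, \dots, d_k$ are the top-$k$ documents of the run, and $k$ is reset to the run length when it is 0 or larger than the run (the behavior the NumPy version spells out); all versions return 0 when there are no relevant documents.

$$
\mathrm{P@}k = \frac{\lvert \mathrm{Rel} \cap \{d_1, \dots, d_k\} \rvert}{k}
$$

With the data from the usage example (relevant documents d1 and d2; run ranked d1, d2, d3): $\mathrm{P@}2 = \lvert \{d_1, d_2\} \cap \{d_1, d_2\} \rvert / 2 = 2/2 = 1$.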
Current:
```python
import numpy as np
from numba import njit

from .common import clean_qrels, fix_k


@njit(cache=True)
def precision_at_k(qrels, run, k):
    qrels = clean_qrels(qrels, 1)  # drop judgments below relevance 1
    run = run[:fix_k(k, run)]  # truncate the run at the cutoff
    if qrels.shape[0] == 0:
        return 0.0
    return np.intersect1d(qrels[:, 0], run[:, 0]).shape[0] / run.shape[0]
```

Step 1: Conditional Decorators
```python
import numpy as np

from ..decorators import maybe_njit
from .common import clean_qrels, fix_k


@maybe_njit(cache=True)  # falls back to plain Python when Numba is disabled
def precision_at_k(qrels, run, k):
    qrels = clean_qrels(qrels, 1)
    run = run[:fix_k(k, run)]
    if qrels.shape[0] == 0:
        return 0.0
    return np.intersect1d(qrels[:, 0], run[:, 0]).shape[0] / run.shape[0]
```

Step 2: Dual Implementation (Future)
```python
import numpy as np

from ..config import NUMBA_AVAILABLE, use_numba
from ..decorators import maybe_njit


def precision_at_k_numpy(qrels, run, k):
    """Clean, readable NumPy implementation."""
    relevant_docs = qrels[qrels[:, 1] >= 1][:, 0]
    if k == 0 or k > len(run):
        k = len(run)
    if len(relevant_docs) == 0 or k == 0:  # k == 0 here means an empty run
        return 0.0
    top_k_docs = run[:k, 0]
    relevant_retrieved = np.intersect1d(relevant_docs, top_k_docs)
    return len(relevant_retrieved) / k


# maybe_njit instead of njit, so this module still imports without Numba
@maybe_njit(cache=True)
def precision_at_k_numba(qrels, run, k):
    # Existing optimized implementation
    ...


def precision_at_k(qrels, run, k):
    """Auto-select the best available implementation."""
    if NUMBA_AVAILABLE and use_numba():
        return precision_at_k_numba(qrels, run, k)
    return precision_at_k_numpy(qrels, run, k)
```

Configuration System
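```python
# ranx/config.py
import os

try:
    import numba  # noqa: F401
    NUMBA_AVAILABLE = True
except ImportError:
    NUMBA_AVAILABLE = False

_USE_NUMBA = None  # resolved lazily from the environment on first use


def use_numba():
    """Return True unless RANX_USE_NUMBA is set to 'false'."""
    global _USE_NUMBA
    if _USE_NUMBA is None:
        _USE_NUMBA = os.environ.get("RANX_USE_NUMBA", "true").lower() != "false"
    return _USE_NUMBA


def set_numba_enabled(enabled: bool):
    """Override the environment setting (re-exported from ranx/__init__.py)."""
    global _USE_NUMBA
    _USE_NUMBA = enabled
```

Usage Examples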
```python
import ranx

# Option 1: disable Numba at runtime
# (functions already JIT-decorated at import time in Step 1 may only honor Option 2)
ranx.set_numba_enabled(False)

# Option 2: environment variable, set before importing ranx
# export RANX_USE_NUMBA=false

# Usage remains identical; the fallback is automatic
qrels = ranx.Qrels.from_dict({"q1": {"d1": 1, "d2": 1}})
run = ranx.Run.from_dict({"q1": {"d1": 0.9, "d2": 0.8, "d3": 0.7}})
result = ranx.evaluate(qrels, run, ["precision@2"])  # uses the best available implementation
```

Benefits
Step 1 Benefits:
- Immediate relief: Users can disable Numba right away
- Simplified debugging: Pure Python stack traces
- Easier development: No JIT compilation delays
- Zero breaking changes: Existing API unchanged
Step 2 Benefits:
- Clean, readable code: NumPy versions are self-documenting
- Educational value: Clean implementations help users understand metrics
- Better maintenance: Easier to debug and modify NumPy versions
- Performance flexibility: Users choose speed vs simplicity
Impact Areas
The change would affect ~132 functions across:
- Metrics (50+ functions): ndcg, precision, recall, etc.
- Fusion algorithms (40+ functions): bordafuse, bayesfuse, etc.
- Data structures (15+ functions): Qrels, Run operations
- Normalization (10+ functions)
- Utilities and statistical tests (15+ functions)
Performance Considerations
- Step 1: Same algorithms, just without JIT compilation (slower but functional)
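- Step 2: Well-vectorized NumPy implementations could often be as fast as Numba (to be confirmed by benchmarks)
- Vectorized NumPy might even outperform Numba on small datasets, where JIT compilation and per-call dispatch overhead dominate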
- Users get to choose their performance/readability tradeoff
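A rough sketch of how the benchmarks could keep JIT compilation out of the measurement: each implementation is called once as a warm-up (which triggers compilation for a Numba function) before `timeit.repeat` measures it, and the best run is kept. The `benchmark` helper is hypothetical; any implementations sharing a signature can be passed in.

```python
import timeit


def benchmark(impls, args, number=1000, repeat=5):
    """Best-of-`repeat` seconds per call for each named implementation.

    Each function is called once before timing, so Numba's JIT compilation
    (triggered by the first call for a given argument-type signature) is
    excluded from the measurement.
    """
    results = {}
    for name, func in impls.items():
        func(*args)  # warm-up call
        times = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
        results[name] = min(times) / number
    return results
```

For example, `benchmark({"numba": precision_at_k_numba, "numpy": precision_at_k_numpy}, (qrels, run, 10))` with the two Step 2 versions; repeating it over small and large runs would test the claims above. Timing the first call separately would quantify the compilation cost that the warm-up hides.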
Implementation Plan
Phase 1: Foundation (Step 1)
- Create configuration system (`ranx/config.py`)
- Add conditional decorators (`ranx/decorators.py`)
- Migrate decorators across the codebase (can be done incrementally)
- Update `__init__.py` for conditional Numba setup
- Add tests for both Numba and non-Numba modes
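For the tests covering both modes, one option is a parity check that the two implementations of a metric agree on random inputs. A sketch assuming pytest-style plain `assert`s: `precision_at_k_loop` is a hypothetical stand-in for the Numba version (loop-style code of the kind Numba compiles, run here as plain Python), and qrels/run are assumed to be 2-D arrays of `[doc_id, relevance]` and `[doc_id, score]` rows with unique doc ids.

```python
import numpy as np


def precision_at_k_numpy(qrels, run, k):
    relevant = qrels[qrels[:, 1] >= 1][:, 0]
    if k == 0 or k > len(run):
        k = len(run)
    if len(relevant) == 0 or k == 0:
        return 0.0
    return len(np.intersect1d(relevant, run[:k, 0])) / k


def precision_at_k_loop(qrels, run, k):
    # Loop-style code like the existing Numba kernels (not JIT-compiled here).
    relevant = set()
    for i in range(qrels.shape[0]):
        if qrels[i, 1] >= 1:
            relevant.add(qrels[i, 0])
    if k == 0 or k > run.shape[0]:
        k = run.shape[0]
    if len(relevant) == 0 or k == 0:
        return 0.0
    hits = 0
    for i in range(k):
        if run[i, 0] in relevant:
            hits += 1
    return hits / k


def test_precision_parity(n_cases=200, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_cases):
        n_docs = int(rng.integers(1, 50))
        qrels = np.column_stack([
            rng.choice(100, size=n_docs, replace=False),  # unique doc ids
            rng.integers(0, 3, size=n_docs),  # relevance 0..2
        ]).astype(np.float64)
        n_run = int(rng.integers(0, 60))  # includes empty runs
        run = np.column_stack([
            rng.choice(100, size=n_run, replace=False),
            rng.random(n_run),
        ])
        k = int(rng.integers(0, 80))  # includes k == 0 and k > len(run)
        assert np.isclose(precision_at_k_numpy(qrels, run, k),
                          precision_at_k_loop(qrels, run, k))
```

In ranx itself, the same check could call the real Numba kernel directly, or run the suite a second time with `RANX_USE_NUMBA=false`.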
Phase 2: Clean Implementations (Step 2)
- Start with high-impact metrics (precision, recall, ndcg)
- Add dual implementations incrementally
- Create comprehensive benchmarks
- Update documentation with examples
Alternative Approaches Considered
- Pure NumPy rewrite: Would regress performance for existing users
- Separate packages: Splitting into `ranx` and `ranx-numba` is too complex
- Lazy imports: Importing Numba only when needed doesn't solve the core readability issues
This progressive approach gives immediate relief to users experiencing Numba issues while working toward a long-term solution with clean, readable implementations.