[Feature Request] Make Numba Optional with Pure NumPy Fallbacks #75

@diegoceccarelli


I'm pitching this idea to see whether we would like to try it, and I'm happy to develop it in a dev branch so we can see how it feels in practice. First, though, I'd like to hear your thoughts on this strategy, @AmenRa, @milyenpabo and @andersonbcdefg.

Problem

Numba is currently a hard dependency that significantly impacts the user experience:

  • Code readability: Numba decorators and type hints make the codebase harder to understand
  • Performance inconsistency: JIT compilation overhead can make some workloads slower than pure NumPy
  • Process spawning issues: Numba can spawn too many processes, which leads to crashes in some environments
  • Installation complexity: Numba adds significant build complexity and binary size
  • Debugging difficulty: JIT-compiled code is harder to debug and profile

See also #74 and #64.


Proposed Solution

Make Numba an optional dependency through a progressive two-step migration strategy.


Implementation Approach

Strategy: Progressive Migration to Dual Implementation


Step 1: Conditional Decorators (Quick Win)

  • Replace @njit with @maybe_njit, which falls back to an identity decorator (returning the function unchanged) when Numba is disabled
  • Immediate benefit: users can disable Numba globally with minimal code changes
  • Provide fallbacks for Numba-specific types (numba.typed.Dict/List), e.g. plain dict and list
  • Convert prange to range when Numba is disabled
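A minimal sketch of what ranx/decorators.py could look like for Step 1, assuming the RANX_USE_NUMBA variable proposed below is read once at import time (decorators run when ranx modules are imported). The helpers new_typed_list and new_str_float_dict are hypothetical names illustrating the typed-container fallbacks; only numba.njit, numba.prange, numba.types and numba.typed.Dict/List are real Numba APIs.

```python
# ranx/decorators.py -- a minimal sketch; module layout and helper names are assumptions
import os

# Hypothetical switch: honour the RANX_USE_NUMBA variable proposed for ranx/config.py.
# Read once at import time, because decorators are applied when ranx modules are imported.
_DISABLED = os.environ.get("RANX_USE_NUMBA", "true").strip().lower() in ("0", "false", "no", "off")

try:
    from numba import njit as _njit, prange as _prange, types as _types
    from numba.typed import Dict as _TypedDict, List as _TypedList
    NUMBA_AVAILABLE = True
except ImportError:
    NUMBA_AVAILABLE = False

USE_NUMBA = NUMBA_AVAILABLE and not _DISABLED

# prange degrades to the built-in range when Numba is off
prange = _prange if USE_NUMBA else range


def maybe_njit(*args, **kwargs):
    """Behave like numba.njit when enabled, otherwise like an identity decorator.

    Works both bare (@maybe_njit) and with options (@maybe_njit(cache=True)).
    """
    if USE_NUMBA:
        return _njit(*args, **kwargs)
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]  # bare @maybe_njit: return the function unchanged
    return lambda func: func  # @maybe_njit(...): options are ignored


def new_typed_list():
    """numba.typed.List when enabled, else a plain list (hypothetical helper)."""
    return _TypedList() if USE_NUMBA else []


def new_str_float_dict():
    """numba.typed.Dict[str, float64] when enabled, else a plain dict (hypothetical helper)."""
    if USE_NUMBA:
        return _TypedDict.empty(key_type=_types.unicode_type, value_type=_types.float64)
    return {}
```

Because the switch is evaluated at import time, toggling it afterwards does not un-compile functions that were already decorated; that is one reason Step 2 moves the decision to call time.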

Step 2: Dual Implementation Pattern (Long-term Solution)

  • Keep existing Numba implementations for performance
  • Add clean, readable NumPy implementations as fallbacks
  • Runtime selection based on Numba availability and user preference
  • Much better code readability and debugging experience

Code Evolution Example: Precision Metric

Current:

import numpy as np
from numba import njit
from .common import clean_qrels, fix_k

@njit(cache=True)
def precision_at_k(qrels, run, k):
    qrels = clean_qrels(qrels, 1)
    run = run[:fix_k(k, run)]
    if qrels.shape[0] == 0:
        return 0.0
    return np.intersect1d(qrels[:, 0], run[:, 0]).shape[0] / run.shape[0]

Step 1: Conditional Decorators

import numpy as np
from ..decorators import maybe_njit
from .common import clean_qrels, fix_k

@maybe_njit(cache=True)  # Falls back to plain Python when Numba is disabled
def precision_at_k(qrels, run, k):
    qrels = clean_qrels(qrels, 1)
    run = run[:fix_k(k, run)]
    if qrels.shape[0] == 0:
        return 0.0
    return np.intersect1d(qrels[:, 0], run[:, 0]).shape[0] / run.shape[0]

Step 2: Dual Implementation (Future)

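import numpy as np
from ..config import NUMBA_AVAILABLE, use_numba
from ..decorators import maybe_njit

def precision_at_k_numpy(qrels, run, k):
    """Clean, readable NumPy implementation."""
    # Mirrors clean_qrels(qrels, 1): keep documents with relevance >= 1
    relevant_docs = qrels[qrels[:, 1] >= 1][:, 0]
    # Mirrors fix_k: k == 0 or k > len(run) means "use the whole run"
    if k == 0 or k > len(run):
        k = len(run)
    top_k_docs = run[:k, 0]
    if len(relevant_docs) == 0:
        return 0.0
    relevant_retrieved = np.intersect1d(relevant_docs, top_k_docs)
    return len(relevant_retrieved) / k

@maybe_njit(cache=True)  # a bare @njit would break imports when Numba is not installed
def precision_at_k_numba(qrels, run, k):
    # Existing optimized implementation
    ...

def precision_at_k(qrels, run, k):
    """Auto-select the best available implementation."""
    if NUMBA_AVAILABLE and use_numba():
        return precision_at_k_numba(qrels, run, k)
    else:
        return precision_at_k_numpy(qrels, run, k)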

Configuration System

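# ranx/config.py
import importlib.util
import os

# True if Numba is installed (checked without importing it)
NUMBA_AVAILABLE = importlib.util.find_spec("numba") is not None

_USE_NUMBA = None  # None = not decided yet; RANX_USE_NUMBA is read lazily

def use_numba():
    """Return the user's preference (default: enabled)."""
    global _USE_NUMBA
    if _USE_NUMBA is None:
        value = os.environ.get('RANX_USE_NUMBA', 'true').strip().lower()
        _USE_NUMBA = value not in ('0', 'false', 'no', 'off')
    return _USE_NUMBA

def set_numba_enabled(enabled: bool):
    """Override RANX_USE_NUMBA at runtime."""
    global _USE_NUMBA
    _USE_NUMBA = bool(enabled)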

Usage Examples

import ranx

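# Option 1: Disable Numba at runtime (honoured by Step 2 dispatchers; Step 1
# decorators are applied at import time, so prefer Option 2 for them)
ranx.set_numba_enabled(False)

# Option 2: Environment variable, set before Python starts
# export RANX_USE_NUMBA=false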

# Usage remains identical - automatic fallback
qrels = ranx.Qrels.from_dict({"q1": {"d1": 1, "d2": 1}})
run = ranx.Run.from_dict({"q1": {"d1": 0.9, "d2": 0.8, "d3": 0.7}})
result = ranx.evaluate(qrels, run, ["precision@2"])  # Uses best available implementation

Benefits

Step 1 Benefits:

  • Immediate relief: Users can disable Numba right away
  • Simplified debugging: Pure Python stack traces
  • Easier development: No JIT compilation delays
  • Zero breaking changes: Existing API unchanged

Step 2 Benefits:

  • Clean, readable code: NumPy versions are self-documenting
  • Educational value: Clean implementations help users understand metrics
  • Better maintenance: Easier to debug and modify NumPy versions
  • Performance flexibility: Users choose speed vs simplicity

Impact Areas

The change would affect ~132 functions across:

  • Metrics (50+ functions): ndcg, precision, recall, etc.
  • Fusion algorithms (40+ functions): bordafuse, bayesfuse, etc.
  • Data structures (15+ functions): Qrels, Run operations
  • Normalization (10+ functions)
  • Utilities and statistical tests (15+ functions)
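To give a feel for what a Normalization fallback could look like, here is a minimal NumPy min-max scaler for a single query's scores. The function name is hypothetical, and mapping constant scores to 0.0 is an assumption that may differ from the existing Numba version.

```python
import numpy as np


def min_max_norm_numpy(scores):
    """Min-max scale one query's scores to [0, 1] with vectorized NumPy.

    Hypothetical fallback; mapping constant scores to 0.0 is an assumption,
    not necessarily what the current Numba implementation does.
    """
    scores = np.asarray(scores, dtype=np.float64)
    if scores.size == 0:
        return scores
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)
```

The whole computation is one vectorized expression, which is the readability gain Step 2 is after.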

Performance Considerations

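  • Step 1: Same algorithms, just without JIT compilation; functional, but explicit Python loops will be noticeably slower
  • Step 2: Vectorized NumPy implementations could approach Numba's speed for many metrics (to be confirmed by benchmarks)
  • On small datasets, vectorized NumPy might even outperform Numba, since it pays no JIT compilation cost on the first call
  • Users get to choose their performance/readability trade-off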
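Checking the small-dataset claim requires separating the first call, where Numba pays its compilation cost, from steady-state calls. A minimal timing helper, with a hypothetical name, could look like this:

```python
import time
from typing import Callable, Dict


def time_first_and_steady(fn: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Time the first call (where Numba JIT compilation would be paid) separately
    from the best of `repeats` later calls (steady state)."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    steady = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        steady = min(steady, time.perf_counter() - start)
    return {"first_call_s": first, "steady_state_s": steady}
```

With cache=True, later processes may load compiled code from disk instead of recompiling. A fair comparison therefore measures fresh processes both with and without a warm cache.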

Implementation Plan

Phase 1: Foundation (Step 1)

  1. Create configuration system (ranx/config.py)
  2. Add conditional decorators (ranx/decorators.py)
  3. Migrate decorators across codebase (can be done incrementally)
  4. Update __init__.py for conditional Numba setup
  5. Add tests for both Numba and non-Numba modes
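One way to cover both modes without requiring Numba in every CI job is a randomized parity check. It compares the Step 2 NumPy version against an explicit-loop reference written in the style Numba compiles well. In a real suite the loop version would be the @njit function; here it runs as plain Python. The sketch assumes doc IDs are encoded as floats in 2-column arrays and generates runs without duplicate IDs, because np.intersect1d counts unique IDs. The function names are hypothetical.

```python
import numpy as np


def precision_at_k_numpy(qrels, run, k):
    """Step 2 NumPy version (copied so the sketch is self-contained)."""
    relevant_docs = qrels[qrels[:, 1] >= 1][:, 0]
    if k == 0 or k > len(run):
        k = len(run)
    if len(relevant_docs) == 0:
        return 0.0
    return len(np.intersect1d(relevant_docs, run[:k, 0])) / k


def precision_at_k_loop(qrels, run, k):
    """Explicit-loop reference in Numba style (hypothetical stand-in for the @njit code)."""
    relevant = set()
    for i in range(qrels.shape[0]):
        if qrels[i, 1] >= 1:
            relevant.add(qrels[i, 0])
    if k == 0 or k > run.shape[0]:
        k = run.shape[0]
    if not relevant:
        return 0.0
    hits = 0
    for i in range(k):
        if run[i, 0] in relevant:
            hits += 1
    return hits / k


def check_parity(n_trials=200, seed=0):
    """Compare both versions on random qrels/runs; raises AssertionError on mismatch."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        n_docs = int(rng.integers(1, 20))  # non-empty run, unique doc IDs
        doc_ids = rng.permutation(50)[:n_docs].astype(np.float64)
        run = np.column_stack([doc_ids, np.sort(rng.random(n_docs))[::-1]])
        n_qrels = int(rng.integers(0, 10))  # may be empty
        qrels = np.column_stack([
            rng.permutation(50)[:n_qrels].astype(np.float64),
            rng.integers(0, 3, n_qrels).astype(np.float64),
        ])
        k = int(rng.integers(0, 25))  # includes k == 0 and k > len(run)
        a, b = precision_at_k_numpy(qrels, run, k), precision_at_k_loop(qrels, run, k)
        assert abs(a - b) < 1e-12, (k, a, b)
    return True
```

The same check can later be pointed at the dispatcher's Numba and NumPy variants once both exist.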

Phase 2: Clean Implementations (Step 2)

  1. Start with high-impact metrics (precision, recall, ndcg)
  2. Add dual implementations incrementally
  3. Create comprehensive benchmarks
  4. Update documentation with examples

Alternative Approaches Considered

  1. Pure NumPy rewrite: would regress performance for existing users
  2. Separate packages (split into ranx and ranx-numba): too complex
  3. Lazy imports (import Numba only when needed): doesn't solve the core readability issues

This progressive approach gives immediate relief to users experiencing Numba issues while working toward a long-term solution with clean, readable implementations.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
