[Feature Request] Make Numba Optional with Pure NumPy Fallbacks #75

I'm pitching this idea to see whether we'd like to try it. I'm happy to develop it in a dev branch so we can see how it feels in practice, but first I'd like your thoughts on the strategy, @AmenRa, @milyenpabo, and @andersonbcdefg.

## Problem

Numba is currently a hard dependency that significantly impacts the user experience:

- **Code readability**: Numba decorators and type hints make the codebase harder to understand  
- **Performance inconsistency**: JIT compilation can sometimes be slower than pure NumPy for certain workloads  
- **Process spawning issues**: Numba can create too many processes leading to crashes in some environments  
- **Installation complexity**: Numba adds significant build complexity and binary size  
- **Debugging difficulty**: JIT-compiled code is harder to debug and profile  

See also https://github.com/AmenRa/ranx/issues/74 and https://github.com/AmenRa/ranx/issues/64.

---

## Proposed Solution

Make Numba an optional dependency through a progressive two-step migration strategy.

---

### Implementation Approach

#### Strategy: Progressive Migration to Dual Implementation

---

#### Step 1: Conditional Decorators (Quick Win)

- Replace `@njit` with `@maybe_njit`, which falls back to an identity decorator (returning the function unchanged) when Numba is disabled  
- Immediate benefit: Users can disable Numba globally with minimal code changes  
- Handles Numba-specific types (`numba.typed.Dict/List`) with fallbacks  
- Convert `prange` to `range` when Numba is disabled  
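
A minimal sketch of what the conditional decorator and the `prange` fallback could look like. The names `NUMBA_AVAILABLE` and `maybe_prange` and the inlined `use_numba()` are assumptions for illustration, not existing ranx code; only `numba.njit` and `numba.prange` are real Numba APIs.

```python
import os

try:  # Numba becomes an optional import
    import numba
    NUMBA_AVAILABLE = True
except ImportError:
    numba = None
    NUMBA_AVAILABLE = False


def use_numba():
    """Hypothetical switch: honours RANX_USE_NUMBA, enabled by default."""
    return os.environ.get("RANX_USE_NUMBA", "true").lower() != "false"


def maybe_njit(*args, **kwargs):
    """Apply numba.njit when Numba is installed and enabled, else do nothing.

    Supports both `@maybe_njit` and `@maybe_njit(cache=True)`.
    """
    # Bare usage: the decorated function arrives as the only positional arg.
    bare = len(args) == 1 and callable(args[0]) and not kwargs
    jit_args = () if bare else args

    def decorate(func):
        if NUMBA_AVAILABLE and use_numba():
            return numba.njit(*jit_args, **kwargs)(func)
        return func  # identity fallback: the plain Python function

    return decorate(args[0]) if bare else decorate


# Outside Numba there is no parallel loop, so plain range is the fallback.
maybe_prange = numba.prange if NUMBA_AVAILABLE and use_numba() else range
```

One consequence of deciding at decoration time: functions decorated at import would not react to a later `set_numba_enabled(False)` call. Honouring a runtime switch would need the check to happen at call time instead, which is closer to the Step 2 dispatch pattern.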

---

#### Step 2: Dual Implementation Pattern (Long-term Solution)

- Keep existing Numba implementations for performance  
- Add clean, readable NumPy implementations as fallbacks  
- Runtime selection based on Numba availability and user preference  
- Much better code readability and debugging experience  

---

### Code Evolution Example: Precision Metric

#### Current:

```python
import numpy as np
from numba import njit
from .common import clean_qrels, fix_k

@njit(cache=True)
def precision_at_k(qrels, run, k):
    qrels = clean_qrels(qrels, 1)
    run = run[:fix_k(k, run)]
    if qrels.shape[0] == 0:
        return 0.0
    return np.intersect1d(qrels[:, 0], run[:, 0]).shape[0] / run.shape[0]
```

---

#### Step 1: Conditional Decorators

```python
import numpy as np
from ..decorators import maybe_njit
from .common import clean_qrels, fix_k

@maybe_njit(cache=True)  # Falls back to pure Python when Numba disabled
def precision_at_k(qrels, run, k):
    qrels = clean_qrels(qrels, 1)
    run = run[:fix_k(k, run)]
    if qrels.shape[0] == 0:
        return 0.0
    return np.intersect1d(qrels[:, 0], run[:, 0]).shape[0] / run.shape[0]
```

---

#### Step 2: Dual Implementation (Future)

```python
import numpy as np


def precision_at_k_numpy(qrels, run, k):
    """Clean, readable NumPy implementation."""
    relevant_docs = qrels[qrels[:, 1] >= 1][:, 0]
    if k == 0 or k > len(run):
        k = len(run)  # k == 0 or an oversized k falls back to the whole run
    if len(relevant_docs) == 0 or k == 0:
        return 0.0  # no relevant documents, or an empty run
    top_k_docs = run[:k, 0]
    relevant_retrieved = np.intersect1d(relevant_docs, top_k_docs)
    return len(relevant_retrieved) / k

@njit(cache=True)
def precision_at_k_numba(qrels, run, k):
    # Existing optimized implementation
    ...

def precision_at_k(qrels, run, k):
    """Auto-select best implementation."""
    if NUMBA_AVAILABLE and use_numba():
        return precision_at_k_numba(qrels, run, k)
    else:
        return precision_at_k_numpy(qrels, run, k)
```

---

### Configuration System

```python
# ranx/config.py
import os

_USE_NUMBA = None

def use_numba():
    global _USE_NUMBA
    if _USE_NUMBA is None:
        _USE_NUMBA = os.environ.get('RANX_USE_NUMBA', 'true').lower() != 'false'
    return _USE_NUMBA

def set_numba_enabled(enabled: bool):
    global _USE_NUMBA
    _USE_NUMBA = enabled
```
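
To make it easier to exercise both modes in tests, the configuration module could also offer a temporary override. The sketch below repeats the two proposed functions so it runs on its own; the `numba_enabled` context manager is a hypothetical addition, not part of the proposal above.

```python
import os
from contextlib import contextmanager

_USE_NUMBA = None


def use_numba():
    global _USE_NUMBA
    if _USE_NUMBA is None:
        _USE_NUMBA = os.environ.get("RANX_USE_NUMBA", "true").lower() != "false"
    return _USE_NUMBA


def set_numba_enabled(enabled: bool):
    global _USE_NUMBA
    _USE_NUMBA = enabled


@contextmanager
def numba_enabled(enabled: bool):
    """Hypothetical helper: override the setting, then restore the old value."""
    global _USE_NUMBA
    previous = _USE_NUMBA
    _USE_NUMBA = enabled
    try:
        yield
    finally:
        _USE_NUMBA = previous


set_numba_enabled(True)
with numba_enabled(False):
    inside = use_numba()  # False while the override is active
after = use_numba()       # back to True once the block exits
```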

---

### Usage Examples

```python
import ranx

# Option 1: Disable Numba globally
ranx.set_numba_enabled(False)

# Option 2: Environment variable
# export RANX_USE_NUMBA=false

# Usage remains identical - automatic fallback
qrels = ranx.Qrels.from_dict({"q1": {"d1": 1, "d2": 1}})
run = ranx.Run.from_dict({"q1": {"d1": 0.9, "d2": 0.8, "d3": 0.7}})
result = ranx.evaluate(qrels, run, ["precision@2"])  # Uses best available implementation
```
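
As a worked check of the numbers in that example, here is the Step 2 NumPy implementation applied by hand. The array layout (rows of `[doc_id, value]`, run sorted by descending score) and the integer encoding of document IDs are assumptions inferred from the snippets above, not a description of ranx internals.

```python
import numpy as np


def precision_at_k_numpy(qrels, run, k):
    """Precision@k following the Step 2 NumPy sketch."""
    relevant_docs = qrels[qrels[:, 1] >= 1][:, 0]
    if k == 0 or k > len(run):
        k = len(run)
    if len(relevant_docs) == 0 or k == 0:
        return 0.0
    top_k_docs = run[:k, 0]
    return len(np.intersect1d(relevant_docs, top_k_docs)) / k


# Assumed encoding: d1 -> 1, d2 -> 2, d3 -> 3.
qrels = np.array([[1, 1], [2, 1]])               # d1 and d2 are relevant
run = np.array([[1, 0.9], [2, 0.8], [3, 0.7]])   # sorted by score

p_at_2 = precision_at_k_numpy(qrels, run, 2)  # top-2 is {d1, d2}: 2 / 2
p_at_3 = precision_at_k_numpy(qrels, run, 3)  # top-3 adds d3: 2 / 3
```

With both retrieved documents relevant, `precision@2` comes out as 1.0; extending to the third, non-relevant document lowers it to 2/3.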

---

### Benefits

**Step 1 Benefits:**

- Immediate relief: Users can disable Numba right away  
- Simplified debugging: Pure Python stack traces  
- Easier development: No JIT compilation delays  
- Zero breaking changes: Existing API unchanged  

**Step 2 Benefits:**

- Clean, readable code: NumPy versions are self-documenting  
- Educational value: Clean implementations help users understand metrics  
- Better maintenance: Easier to debug and modify NumPy versions  
- Performance flexibility: Users choose speed vs simplicity  

---

### Impact Areas

The change would affect ~132 functions across:

- Metrics (50+ functions): ndcg, precision, recall, etc.  
- Fusion algorithms (40+ functions): bordafuse, bayesfuse, etc.  
- Data structures (15+ functions): Qrels, Run operations  
- Normalization (10+ functions)  
- Utilities and statistical tests (15+ functions)  

---

### Performance Considerations

- Step 1: Same algorithms, just without JIT compilation (slower but functional)  
- Step 2: NumPy implementations could often be as fast as Numba  
- Vectorized NumPy might sometimes outperform Numba on small datasets  
- Users get to choose their performance/readability tradeoff  
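
When comparing implementations, it may help to time the first call separately from later calls, since a JIT-compiled function pays its compilation cost on first use. A rough sketch using only the standard library; `benchmark` and `overlap_ratio` are hypothetical names, and the toy metric merely stands in for a real implementation.

```python
import timeit


def benchmark(func, *args, repeat=5, number=1000):
    """Time one cold call, then the best average over repeated warm calls."""
    call = lambda: func(*args)
    cold = timeit.timeit(call, number=1)  # would include JIT compilation
    warm = min(timeit.repeat(call, repeat=repeat, number=number)) / number
    return cold, warm


def overlap_ratio(relevant, retrieved):
    """Toy stand-in for a metric implementation (hypothetical)."""
    return len(set(relevant) & set(retrieved)) / len(retrieved)


cold, warm = benchmark(overlap_ratio, [1, 2], [1, 2, 3])
print(f"cold call: {cold:.2e}s, warm call: {warm:.2e}s")
```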

---

### Implementation Plan

**Phase 1: Foundation (Step 1)**

1. Create configuration system (`ranx/config.py`)  
2. Add conditional decorators (`ranx/decorators.py`)  
3. Migrate decorators across codebase (can be done incrementally)  
4. Update `__init__.py` for conditional Numba setup  
5. Add tests for both Numba and non-Numba modes  

**Phase 2: Clean Implementations (Step 2)**

1. Start with high-impact metrics (precision, recall, ndcg)  
2. Add dual implementations incrementally  
3. Create comprehensive benchmarks  
4. Update documentation with examples  

---

### Alternative Approaches Considered

1. **Pure NumPy rewrite**: Would hurt performance for existing users  
2. **Separate packages** (split into `ranx` and `ranx-numba`): Too complex  
3. **Lazy imports** (import Numba only when needed): Doesn't solve the core readability issues  

---

This progressive approach gives immediate relief to users experiencing Numba issues while working toward a long-term solution with clean, readable implementations.


