⚡️ Speed up method BRD.time_series by 23% #114

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-BRD.time_series-mkp5t5k4
@codeflash-ai codeflash-ai bot commented Jan 22, 2026

📄 23% (0.23x) speedup for BRD.time_series in quantecon/game_theory/brd.py

⏱️ Runtime: 47.9 milliseconds → 38.9 milliseconds (best of 137 runs)
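
For reference, the headline percentage follows from the two runtimes above if speedup is taken as old/new - 1. The sketch below uses hypothetical helper names (`speedup_pct`, `runtime_reduction_pct`) that are not part of Codeflash or quantecon; it also shows the complementary "runtime reduction" figure, since the two are easy to confuse.

```python
def speedup_pct(old_time, new_time):
    """Speedup as a percentage: how much more work fits in the same time."""
    return (old_time / new_time - 1.0) * 100.0


def runtime_reduction_pct(old_time, new_time):
    """Share of the original runtime that was removed, as a percentage."""
    return (old_time - new_time) / old_time * 100.0


# Overall benchmark figures reported for BRD.time_series (milliseconds).
overall_speedup = speedup_pct(47.9, 38.9)
overall_reduction = runtime_reduction_pct(47.9, 38.9)

# Line-profiler figures reported for _set_action_dist (milliseconds).
set_dist_speedup = speedup_pct(1.23, 0.515)
set_dist_reduction = runtime_reduction_pct(1.23, 0.515)

print(f"time_series: {overall_speedup:.1f}% speedup, "
      f"{overall_reduction:.1f}% less runtime")
print(f"_set_action_dist: {set_dist_speedup:.1f}% speedup, "
      f"{set_dist_reduction:.1f}% less runtime")
```

Under this convention the 23% headline is a speedup, while the 58% quoted for `_set_action_dist` corresponds to a runtime reduction.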

📝 Explanation and details

The optimized code achieves a 23% speedup through three key optimizations:

1. Vectorized _set_action_dist using np.bincount (58% less runtime)

The original implementation uses a Python loop to populate the action distribution array:

for i in range(self.N):
    action_dist[actions[i]] += 1

The optimized version replaces this with NumPy's vectorized bincount:

counts = np.bincount(np.asarray(actions, dtype=np.intp), minlength=self.num_actions)
action_dist[:counts.shape[0]] = counts

Why it's faster: np.bincount is implemented in optimized C code and processes the entire array in one operation, avoiding Python loop overhead. The line profiler shows this function's time dropping from 1.23 ms to 0.515 ms, a 58% reduction.
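
The equivalence of the two approaches can be checked in isolation. This is a minimal sketch, assuming standalone functions (`action_dist_loop` and `action_dist_bincount` are hypothetical names, not quantecon methods) that take the action profile and the number of actions as plain arguments rather than reading them from `self`.

```python
import numpy as np


def action_dist_loop(actions, num_actions):
    """Count players per action with an explicit Python loop."""
    dist = np.zeros(num_actions, dtype=int)
    for a in actions:
        dist[a] += 1
    return dist


def action_dist_bincount(actions, num_actions):
    """Same counts via np.bincount; minlength pads unused actions with 0."""
    dist = np.zeros(num_actions, dtype=int)
    counts = np.bincount(np.asarray(actions, dtype=np.intp),
                         minlength=num_actions)
    dist[:counts.shape[0]] = counts
    return dist


# Five players, four available actions; action 3 is never chosen.
actions = [2, 0, 2, 1, 2]
loop_result = action_dist_loop(actions, 4)
bincount_result = action_dist_bincount(actions, 4)
print(loop_result, bincount_result)
```

The `minlength` argument matters here: without it, `np.bincount` would return only as many bins as the largest action index plus one, and trailing actions with no players would be missing.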

2. Inlined play() logic in time_series loop (Eliminates function call overhead)

The original code calls self.play() inside the main loop, which:

  • Incurs function call overhead for 2,881 iterations
  • Performs redundant check_random_state() calls (taking 13.3% of total time)
  • Creates unnecessary intermediate variables

The optimized version inlines the critical operations directly:

action_dist[action] -= 1
next_action = self.player.best_response(...)
action_dist[next_action] += 1

Why it's faster: The line profiler shows the redundant check_random_state calls in the original loop consumed 27.9 ms (13.3% of the total time measured under the profiler). Eliminating these calls and the per-iteration function-call overhead provides significant savings.
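
The shape of the inlined loop can be illustrated without quantecon. The sketch below is an assumption-laden stand-in, not the PR's code: `toy_best_response` is a hypothetical replacement for `self.player.best_response` (smallest-index argmax, no tolerance or random tie-breaking), and `run_inlined` is a hypothetical free function in place of `BRD.time_series`.

```python
import numpy as np


def toy_best_response(payoff, opponents_dist):
    """Hypothetical stand-in for player.best_response.

    Picks the smallest-index action maximizing payoff against the
    opponents' action counts.
    """
    return int(np.argmax(payoff @ opponents_dist))


def run_inlined(payoff, init_action_dist, player_ind_seq):
    """Time series with the revision step written directly in the loop."""
    dist = np.array(init_action_dist, dtype=int)
    out = np.empty((len(player_ind_seq), dist.size), dtype=int)
    for t, idx in enumerate(player_ind_seq):
        out[t] = dist
        # Map the revising player's index to the action they currently play.
        action = np.searchsorted(dist.cumsum(), idx, side='right')
        # Inlined body of play(): remove the player, respond, re-add.
        dist[action] -= 1
        next_action = toy_best_response(payoff, dist)
        dist[next_action] += 1
    return out


payoff = np.array([[2, 0], [0, 1]])
series = run_inlined(payoff, [1, 2], player_ind_seq=[0, 1, 2])
print(series)
```

Every row of the result sums to the number of players, because each step removes exactly one player from one action and adds one back. Note that this sketch has no random tie-breaking, so it says nothing about whether hoisting the random-state normalization preserves the random stream in the real method.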

3. Incremental cumulative sum updates (Avoids redundant recomputation)

The original code recomputes the cumulative sum on every iteration:

action = np.searchsorted(action_dist.cumsum(), player_ind_seq[t], side='right')

The optimized version maintains the cumulative sum and updates it incrementally:

cum = action_dist.cumsum()  # Once before loop
# In loop:
action = np.searchsorted(cum, idx, side='right')
# Update cum based on what changed
if action != next_action:
    if action < next_action:
        cum[action:next_action] -= 1
    else:
        cum[next_action:action] += 1

Why it's faster: Computing cumsum() takes 13.5% of loop time in the original code. The incremental update only modifies the affected range, which is cheaper than full recomputation, especially when action == next_action (no update needed).
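
The incremental rule can be checked against a full recomputation. The following is a minimal sketch under stated assumptions: `move_player` is a hypothetical helper, and the destination action is drawn at random instead of coming from a best response, so that moves in both directions are exercised.

```python
import numpy as np


def move_player(action_dist, cum, action, next_action):
    """Move one player and patch the cumulative sum in place.

    Only entries in the half-open range between the two actions change:
    cum[i] counts players on actions 0..i, so it loses one when the
    player leaves that prefix and gains one when the player enters it.
    """
    action_dist[action] -= 1
    action_dist[next_action] += 1
    if action < next_action:
        cum[action:next_action] -= 1
    elif next_action < action:
        cum[next_action:action] += 1
    # action == next_action: nothing changes.


rng = np.random.RandomState(0)
dist = np.array([3, 0, 2, 4, 1])
cum = dist.cumsum()

for _ in range(200):
    idx = rng.randint(dist.sum())          # index of the revising player
    action = int(np.searchsorted(cum, idx, side='right'))
    next_action = int(rng.randint(dist.size))
    move_player(dist, cum, action, next_action)
    # The patched cumulative sum must match a full recomputation.
    assert np.array_equal(cum, dist.cumsum())
```

Because `searchsorted` with `side='right'` only ever selects an action whose count is positive, the decrement never drives a count below zero.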

Impact on Workloads

The test results show speedups in nearly all scenarios; the only exceptions are the two shortest runs (ts_length=0 and ts_length=1), which measure about 1% slower:

  • Small games (N=2-5, ts_length=5-30): 3-21% faster
  • Large-scale (N up to 500 or ts_length up to 500): 16-33% faster, with the best gain on the longest time series (32.8% for ts_length=500)

The optimizations are particularly effective for:

  • Long time series where loop overhead and redundant cumsum computations accumulate
  • Games with many players where _set_action_dist is more expensive
  • Scenarios where actions frequently don't change (action == next_action), avoiding cumsum updates entirely

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 103 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import numbers

import numpy as np
# imports
import pytest  # used for our unit tests
from quantecon.game_theory.brd import BRD

def test_time_series_basic_deterministic_init():
    # Simple 2x2 payoff matrix for a symmetric 2-player game.
    payoff = [[0.0, 1.0], [1.0, 0.0]]
    # Create BRD with 3 players
    brd = BRD(payoff, N=3)
    # Provide an explicit initial action distribution (2 players choose action 0, 1 chooses action 1)
    init_dist = np.array([2, 1])
    # Run a short time series with deterministic random_state
    codeflash_output = brd.time_series(4, init_action_dist=init_dist.copy(), random_state=42); ts = codeflash_output # 274μs -> 253μs (8.09% faster)
    # Each row of the time series must sum to the number of players (conservation of players)
    for row in ts:
        pass

def test_time_series_random_init_reproducible():
    # If random_state is fixed, two runs without init_action_dist should be identical
    payoff = [[0.0, 0.0], [0.0, 0.0]]  # ties everywhere, but reproducibility is the focus
    brd = BRD(payoff, N=4)
    codeflash_output = brd.time_series(5, random_state=123); ts1 = codeflash_output # 300μs -> 287μs (4.45% faster)
    codeflash_output = brd.time_series(5, random_state=123); ts2 = codeflash_output # 281μs -> 271μs (3.41% faster)

def test_time_series_zero_length_returns_empty_array():
    # ts_length of zero should return an array with zero rows and num_actions columns
    payoff = [[0.0, 1.0], [1.0, 0.0]]
    brd = BRD(payoff, N=2)
    codeflash_output = brd.time_series(0, init_action_dist=np.array([1, 1]), random_state=0); out = codeflash_output # 180μs -> 181μs (0.580% slower)

def test_time_series_negative_length_raises():
    # Negative ts_length should result in an error from numpy.empty called inside.
    payoff = [[0.0, 1.0], [1.0, 0.0]]
    brd = BRD(payoff, N=2)
    with pytest.raises(Exception):
        # Expecting a ValueError or similar from numpy for negative dimensions
        brd.time_series(-1, init_action_dist=np.array([1, 1]), random_state=0) # 163μs -> 162μs (0.838% faster)

def test_time_series_invalid_init_action_dist_length_raises():
    # Provide an init_action_dist with length not matching num_actions
    payoff = [[0.0, 1.0, 2.0],
              [1.0, 0.0, 2.0],
              [2.0, 2.0, 0.0]]
    brd = BRD(payoff, N=3)
    # init_action_dist has wrong length (only 1 element instead of 3)
    bad_init = np.array([3])
    # Running should raise an exception at some point (IndexError or similar)
    with pytest.raises(Exception):
        brd.time_series(3, init_action_dist=bad_init, random_state=7) # 207μs -> 199μs (3.78% faster)

def test_tie_breaking_smallest_keeps_stable_distribution_but_random_changes():
    # Build a payoff matrix where all payoffs are equal -> best responses tie across all actions
    payoff = np.ones((3, 3))  # every action yields identical payoff
    brd_smallest = BRD(payoff, N=5)
    # Set an initial distribution that concentrates all players in action 0
    init_dist = np.array([5, 0, 0])
    # With tie_breaking 'smallest' and this init distribution, player selection will always pick action 0
    codeflash_output = brd_smallest.time_series(10, init_action_dist=init_dist.copy(),
                                           tie_breaking='smallest', random_state=0); out_smallest = codeflash_output # 358μs -> 315μs (13.6% faster)
    # All rows should be identical to the initial distribution in this configuration
    for row in out_smallest:
        pass

    # Now with random tie breaking, randomness may cause players to move away from action 0
    brd_random = BRD(payoff, N=5)
    codeflash_output = brd_random.time_series(10, init_action_dist=init_dist.copy(),
                                        tie_breaking='random', random_state=0); out_random = codeflash_output # 392μs -> 381μs (3.06% faster)
    # It's extremely unlikely (given randomness) that all rows remain identical to init_dist for tie-breaking random,
    # so assert that not every row equals the initial distribution. This checks that tie_breaking parameter is respected.
    all_equal = all((row.tolist() == init_dist.tolist()) for row in out_random)

def test_time_series_large_ts_reproducible_and_conserves_players():
    # Use moderate sizes to simulate large-ish run without exceeding the stated test constraints.
    num_actions = 4
    N = 10
    ts_length = 200  # 200 * 4 = 800 elements < 1000 element guideline
    # Simple payoff matrix to keep player.best_response stable
    payoff = np.zeros((num_actions, num_actions))
    brd = BRD(payoff, N=N)
    # Use a deterministic seed
    codeflash_output = brd.time_series(ts_length, random_state=999); ts = codeflash_output # 2.77ms -> 2.11ms (31.3% faster)
    # Each row must sum to N (conservation of players)
    for row in ts:
        pass
    # Reproducibility: running again with same seed yields same result
    codeflash_output = brd.time_series(ts_length, random_state=999); ts2 = codeflash_output # 2.77ms -> 2.09ms (32.2% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import numpy as np
# imports
import pytest
from quantecon.game_theory.brd import BRD
from quantecon.util import check_random_state

class TestBRDTimeSeriesBasic:
    """Basic functionality tests for BRD.time_series"""
    
    def test_basic_2x2_game_small_ts(self):
        """Test basic 2x2 game with small time series"""
        # Create a simple 2x2 coordination game
        payoff_matrix = np.array([[2, 0], [0, 1]])
        brd = BRD(payoff_matrix, N=2)
        
        # Generate short time series
        codeflash_output = brd.time_series(ts_length=5); ts = codeflash_output # 140μs -> 132μs (5.94% faster)
    
    def test_output_dtype_is_int(self):
        """Verify output dtype is integer"""
        payoff_matrix = np.array([[3, 1], [1, 3]])
        brd = BRD(payoff_matrix, N=3)
        
        codeflash_output = brd.time_series(ts_length=10); ts = codeflash_output # 211μs -> 185μs (14.1% faster)
    
    def test_output_shape_matches_parameters(self):
        """Verify output shape is (ts_length, num_actions)"""
        payoff_matrix = np.array([[1, 2, 0], [1, 1, 2], [0, 1, 1]])
        brd = BRD(payoff_matrix, N=4)
        
        ts_length = 15
        codeflash_output = brd.time_series(ts_length=ts_length); ts = codeflash_output # 281μs -> 255μs (10.4% faster)
    
    def test_action_distribution_sums_to_N(self):
        """Verify that each time step's action distribution sums to N"""
        payoff_matrix = np.array([[4, 0], [0, 3]])
        N = 5
        brd = BRD(payoff_matrix, N=N)
        
        codeflash_output = brd.time_series(ts_length=20); ts = codeflash_output # 345μs -> 290μs (18.9% faster)
        
        # Each row should sum to N (total number of players)
        for t in range(ts.shape[0]):
            pass
    
    def test_all_action_counts_non_negative(self):
        """Verify all action counts are non-negative"""
        payoff_matrix = np.array([[2, 1], [1, 2]])
        brd = BRD(payoff_matrix, N=3)
        
        codeflash_output = brd.time_series(ts_length=25); ts = codeflash_output # 408μs -> 336μs (21.5% faster)
    
    def test_custom_init_action_dist(self):
        """Test with custom initial action distribution"""
        payoff_matrix = np.array([[1, 0], [0, 1]])
        brd = BRD(payoff_matrix, N=4)
        
        # Provide initial distribution: 3 players on action 0, 1 on action 1
        init_dist = np.array([3, 1])
        codeflash_output = brd.time_series(ts_length=10, init_action_dist=init_dist); ts = codeflash_output # 210μs -> 178μs (18.5% faster)
    
    def test_init_dist_preserved_at_t0(self):
        """Verify initial distribution appears in first row"""
        payoff_matrix = np.array([[2, 0], [0, 2]])
        brd = BRD(payoff_matrix, N=6)
        
        init_dist = np.array([4, 2])
        codeflash_output = brd.time_series(ts_length=12, init_action_dist=init_dist); ts = codeflash_output # 238μs -> 197μs (20.6% faster)
    
    def test_reproducibility_with_random_state_int(self):
        """Verify same random_state produces identical results (int seed)"""
        payoff_matrix = np.array([[3, 1], [1, 3]])
        brd = BRD(payoff_matrix, N=2)
        
        # Generate two time series with same random seed
        codeflash_output = brd.time_series(ts_length=20, random_state=42); ts1 = codeflash_output # 505μs -> 445μs (13.4% faster)
        codeflash_output = brd.time_series(ts_length=20, random_state=42); ts2 = codeflash_output # 489μs -> 430μs (13.5% faster)
    
    def test_reproducibility_with_random_state_object(self):
        """Verify same RandomState object produces identical results"""
        payoff_matrix = np.array([[2, 0], [0, 2]])
        brd = BRD(payoff_matrix, N=3)
        
        # Create RandomState with fixed seed
        rs1 = np.random.RandomState(123)
        codeflash_output = brd.time_series(ts_length=15, random_state=rs1); ts1 = codeflash_output # 277μs -> 235μs (18.0% faster)
        
        # New RandomState with same seed
        rs2 = np.random.RandomState(123)
        codeflash_output = brd.time_series(ts_length=15, random_state=rs2); ts2 = codeflash_output # 249μs -> 209μs (19.0% faster)
    
    def test_different_random_states_differ(self):
        """Verify different random_states produce different results"""
        payoff_matrix = np.array([[1, 2], [2, 1]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=30, random_state=1); ts1 = codeflash_output # 638μs -> 538μs (18.5% faster)
        codeflash_output = brd.time_series(ts_length=30, random_state=2); ts2 = codeflash_output # 621μs -> 527μs (17.9% faster)
    
    def test_tie_breaking_smallest(self):
        """Test tie_breaking='smallest' option"""
        # Payoff matrix with ties: all actions have same payoff
        payoff_matrix = np.array([[1, 1], [1, 1]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=20, tie_breaking='smallest', 
                            random_state=99); ts = codeflash_output # 503μs -> 447μs (12.5% faster)
    
    def test_tie_breaking_random(self):
        """Test tie_breaking='random' option"""
        payoff_matrix = np.array([[2, 2], [2, 2]])
        brd = BRD(payoff_matrix, N=3)
        
        codeflash_output = brd.time_series(ts_length=15, tie_breaking='random',
                            random_state=77); ts = codeflash_output # 516μs -> 486μs (6.24% faster)

class TestBRDTimeSeriesEdgeCases:
    """Edge case tests for BRD.time_series"""
    
    def test_single_action_game(self):
        """Test with 1x1 payoff matrix (single action)"""
        payoff_matrix = np.array([[5.0]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=10); ts = codeflash_output # 193μs -> 160μs (20.0% faster)
    
    def test_single_player_game(self):
        """Test with single player (N=1)"""
        payoff_matrix = np.array([[1, 0], [0, 1]])
        brd = BRD(payoff_matrix, N=1)
        
        codeflash_output = brd.time_series(ts_length=8); ts = codeflash_output # 179μs -> 153μs (16.9% faster)
    
    def test_single_time_step(self):
        """Test with ts_length=1"""
        payoff_matrix = np.array([[2, 0], [0, 2]])
        brd = BRD(payoff_matrix, N=3)
        
        init_dist = np.array([2, 1])
        codeflash_output = brd.time_series(ts_length=1, init_action_dist=init_dist); ts = codeflash_output # 59.1μs -> 59.7μs (1.06% slower)
    
    def test_negative_payoffs(self):
        """Test with negative payoff values"""
        payoff_matrix = np.array([[-1, -5], [-5, -2]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=12); ts = codeflash_output # 236μs -> 205μs (14.9% faster)
    
    def test_zero_payoffs(self):
        """Test with all zero payoffs"""
        payoff_matrix = np.array([[0, 0], [0, 0]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=10); ts = codeflash_output # 208μs -> 180μs (15.3% faster)
    
    def test_mixed_positive_negative_payoffs(self):
        """Test with mixed positive and negative payoffs"""
        payoff_matrix = np.array([[3, -1], [-2, 4]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=15); ts = codeflash_output # 276μs -> 234μs (18.0% faster)
    
    def test_large_payoff_values(self):
        """Test with very large payoff values"""
        payoff_matrix = np.array([[1e10, 1e10], [1e10, 1e10]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=10); ts = codeflash_output # 198μs -> 172μs (14.8% faster)
    
    def test_small_payoff_values(self):
        """Test with very small payoff values"""
        payoff_matrix = np.array([[1e-10, 1e-10], [1e-10, 1e-10]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=10); ts = codeflash_output # 197μs -> 173μs (13.7% faster)
    
    def test_many_actions(self):
        """Test with larger action space"""
        payoff_matrix = np.eye(5)  # 5x5 identity matrix
        brd = BRD(payoff_matrix, N=3)
        
        codeflash_output = brd.time_series(ts_length=10); ts = codeflash_output # 200μs -> 180μs (11.0% faster)
    
    def test_init_dist_array_like_input(self):
        """Test init_action_dist accepts list input"""
        payoff_matrix = np.array([[1, 0], [0, 1]])
        brd = BRD(payoff_matrix, N=4)
        
        # Pass list instead of array
        init_dist = [2, 2]
        codeflash_output = brd.time_series(ts_length=8, init_action_dist=init_dist); ts = codeflash_output # 182μs -> 160μs (14.0% faster)
    
    def test_tol_parameter_option(self):
        """Test tol parameter is accepted in options"""
        payoff_matrix = np.array([[1, 1], [1, 1]])
        brd = BRD(payoff_matrix, N=2)
        
        # Pass tol option
        codeflash_output = brd.time_series(ts_length=10, tol=1e-6); ts = codeflash_output # 208μs -> 180μs (15.5% faster)
    
    def test_3x3_coordination_game(self):
        """Test 3x3 coordination game"""
        payoff_matrix = np.array([[3, 0, 0],
                                 [0, 2, 0],
                                 [0, 0, 1]])
        brd = BRD(payoff_matrix, N=3)
        
        codeflash_output = brd.time_series(ts_length=20); ts = codeflash_output # 343μs -> 285μs (20.2% faster)

class TestBRDTimeSeriesLargeScale:
    """Large scale tests for BRD.time_series"""
    
    def test_large_number_of_players(self):
        """Test with large N (500 players)"""
        payoff_matrix = np.array([[2, 0], [0, 1]])
        N = 500
        brd = BRD(payoff_matrix, N=N)
        
        codeflash_output = brd.time_series(ts_length=50); ts = codeflash_output # 1.72ms -> 1.48ms (16.3% faster)
    
    def test_long_time_series(self):
        """Test with long time series (ts_length=500)"""
        payoff_matrix = np.array([[3, 1], [1, 3]])
        brd = BRD(payoff_matrix, N=2)
        
        codeflash_output = brd.time_series(ts_length=500); ts = codeflash_output # 6.71ms -> 5.05ms (32.8% faster)
    
    def test_large_action_space(self):
        """Test with large number of actions"""
        # 10x10 payoff matrix
        payoff_matrix = np.eye(10)
        brd = BRD(payoff_matrix, N=5)
        
        codeflash_output = brd.time_series(ts_length=50); ts = codeflash_output # 708μs -> 555μs (27.6% faster)
    
    def test_large_N_long_ts(self):
        """Test with both large N and long time series"""
        payoff_matrix = np.array([[2, 0], [0, 2]])
        brd = BRD(payoff_matrix, N=100)
        
        codeflash_output = brd.time_series(ts_length=100); ts = codeflash_output # 1.59ms -> 1.33ms (19.4% faster)
    
    def test_memory_efficiency_large_ts(self):
        """Verify reasonable memory allocation for large time series"""
        payoff_matrix = np.array([[1, 1], [1, 1]])
        brd = BRD(payoff_matrix, N=50)
        
        codeflash_output = brd.time_series(ts_length=200); ts = codeflash_output # 2.80ms -> 2.24ms (25.1% faster)
    
    def test_numerical_stability_long_series(self):
        """Test numerical stability over long time series"""
        payoff_matrix = np.array([[4, 1], [1, 4]])
        N = 10
        brd = BRD(payoff_matrix, N=N)
        
        codeflash_output = brd.time_series(ts_length=300); ts = codeflash_output # 4.07ms -> 3.08ms (32.1% faster)
    
    def test_various_payoff_values_large_scale(self):
        """Test large scale with various payoff magnitudes"""
        payoff_matrix = np.array([[1e6, 1e-6], [1e-6, 1e6]])
        brd = BRD(payoff_matrix, N=20)
        
        codeflash_output = brd.time_series(ts_length=100); ts = codeflash_output # 1.34ms -> 1.04ms (29.2% faster)
    
    def test_repeated_large_scale_runs_reproducible(self):
        """Test that repeated runs with same seed are reproducible"""
        payoff_matrix = np.array([[2, 1], [1, 2]])
        brd = BRD(payoff_matrix, N=50)
        
        codeflash_output = brd.time_series(ts_length=150, random_state=12345); ts1 = codeflash_output # 2.34ms -> 1.89ms (24.1% faster)
        codeflash_output = brd.time_series(ts_length=150, random_state=12345); ts2 = codeflash_output # 2.34ms -> 1.90ms (23.1% faster)
    
    def test_large_payoff_matrix(self):
        """Test with larger payoff matrix (8x8)"""
        payoff_matrix = np.random.rand(8, 8)
        brd = BRD(payoff_matrix, N=3)
        
        codeflash_output = brd.time_series(ts_length=75); ts = codeflash_output # 1.02ms -> 787μs (29.0% faster)

class TestBRDTimeSeriesIntegration:
    """Integration tests combining multiple features"""
    
    def test_custom_init_with_random_state(self):
        """Test custom init_action_dist combined with random_state"""
        payoff_matrix = np.array([[3, 0], [0, 3]])
        brd = BRD(payoff_matrix, N=4)
        
        init_dist = np.array([3, 1])
        
        codeflash_output = brd.time_series(ts_length=25, init_action_dist=init_dist, 
                             random_state=55); ts1 = codeflash_output # 584μs -> 499μs (17.1% faster)
        codeflash_output = brd.time_series(ts_length=25, init_action_dist=init_dist,
                             random_state=55); ts2 = codeflash_output # 569μs -> 480μs (18.6% faster)
    
    def test_custom_init_with_tol_and_tie_breaking(self):
        """Test multiple options together"""
        payoff_matrix = np.array([[1, 1], [1, 1]])
        brd = BRD(payoff_matrix, N=3)
        
        init_dist = np.array([1, 2])
        codeflash_output = brd.time_series(ts_length=20, init_action_dist=init_dist,
                            tol=1e-7, tie_breaking='smallest',
                            random_state=88); ts = codeflash_output # 516μs -> 449μs (14.9% faster)
    
    def test_monotonic_like_dynamics(self):
        """Test pure coordination game dynamics"""
        # Pure coordination game: both prefer same action
        payoff_matrix = np.array([[2, 0], [0, 1]])
        brd = BRD(payoff_matrix, N=2)
        
        # Start with all on action 0
        init_dist = np.array([2, 0])
        codeflash_output = brd.time_series(ts_length=30, init_action_dist=init_dist,
                            random_state=42); ts = codeflash_output # 655μs -> 544μs (20.6% faster)
    
    def test_full_diversity_initial(self):
        """Test starting from diverse initial distribution"""
        payoff_matrix = np.eye(5)
        brd = BRD(payoff_matrix, N=5)
        
        # One player on each action
        init_dist = np.ones(5, dtype=int)
        codeflash_output = brd.time_series(ts_length=40, init_action_dist=init_dist,
                            random_state=99); ts = codeflash_output # 752μs -> 624μs (20.4% faster)
    
    def test_asymmetric_payoffs_large_scale(self):
        """Test asymmetric payoff matrix at scale"""
        payoff_matrix = np.array([[5, 2], [1, 3]])
        brd = BRD(payoff_matrix, N=100)
        
        codeflash_output = brd.time_series(ts_length=200, random_state=77); ts = codeflash_output # 3.08ms -> 2.56ms (20.2% faster)
    
    def test_different_init_dists_same_brd(self):
        """Test same BRD with different initial distributions"""
        payoff_matrix = np.array([[2, 0], [0, 2]])
        brd = BRD(payoff_matrix, N=4)
        
        init_dist1 = np.array([4, 0])
        init_dist2 = np.array([2, 2])
        init_dist3 = np.array([0, 4])
        
        codeflash_output = brd.time_series(ts_length=15, init_action_dist=init_dist1,
                             random_state=1); ts1 = codeflash_output # 442μs -> 385μs (14.8% faster)
        codeflash_output = brd.time_series(ts_length=15, init_action_dist=init_dist2,
                             random_state=1); ts2 = codeflash_output # 423μs -> 378μs (11.8% faster)
        codeflash_output = brd.time_series(ts_length=15, init_action_dist=init_dist3,
                             random_state=1); ts3 = codeflash_output # 414μs -> 360μs (14.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-BRD.time_series-mkp5t5k4, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from aseembits93 January 22, 2026 07:58
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Jan 22, 2026