⚡️ Speed up function _get_mixed_actions by 11%#118
**Open** — codeflash-ai[bot] wants to merge 1 commit into `main` from `codeflash/optimize-_get_mixed_actions-mkp8bi4n`
Conversation
The optimized code achieves an **11% speedup** by reducing overhead in the Numba-compiled function through three key strategies:

## Key Optimizations

**1. Localized tuple indexing**

The original code repeatedly accessed `equation_tup[0]`, `equation_tup[1]`, `trans_recips[0]`, and `trans_recips[1]` inside loops. The optimized version hoists these into local variables (`eq0`, `eq1`, `tr0`, `tr1`, `last0`, `last1`) at the start. In Numba's nopython mode, this eliminates redundant tuple indexing overhead on every iteration.

**2. Loop unrolling and explicit normalization**

Instead of iterating over a tuple of `(start, stop, skip)` parameters, the optimized code handles each player block separately, eliminating the overhead of unpacking tuples in the loop. Additionally, normalization changes from in-place slice division (`out[start:stop] /= sum_`) to an explicit loop with a precomputed inverse (`inv = 1.0 / sum_`), which Numba can optimize more effectively while avoiding potential slice-operation overhead.

**3. Reduced bit-manipulation overhead**

The code keeps a local copy of `labeling_bits` as `lb` and reuses a constant `mask = np.uint64(1)` instead of recreating it each iteration, reducing per-iteration constant-creation overhead.

## Impact Analysis

From the `function_references`, `_get_mixed_actions` is called within a generator that iterates over potentially many vertex pairs in game-theoretic equilibrium computation. The function sits in the **hot path** of vertex enumeration, being called once per matching labeling pair.
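For illustration, the normalization change in optimization 2 can be sketched as below. This is a minimal sketch, not the actual quantecon implementation: the function names are hypothetical, and the `numba` import is guarded so the example also runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python if numba is absent
    def njit(f):
        return f

@njit
def normalize_slice(out, start, stop):
    # Original style: in-place slice division inside nopython code.
    sum_ = 0.0
    for i in range(start, stop):
        sum_ += out[i]
    out[start:stop] /= sum_

@njit
def normalize_unrolled(out, start, stop):
    # Optimized style: compute the inverse once, then multiply in an
    # explicit loop, which Numba lowers to tight scalar operations.
    sum_ = 0.0
    for i in range(start, stop):
        sum_ += out[i]
    inv = 1.0 / sum_
    for i in range(start, stop):
        out[i] *= inv

a = np.array([1.0, 2.0, 3.0, 4.0])
b = a.copy()
normalize_slice(a, 0, 4)
normalize_unrolled(b, 0, 4)
print(np.allclose(a, b))  # → True: both produce the same normalized block
```

Both variants normalize the block to sum to 1; the difference is purely in how Numba compiles the loop body.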
Given that:

- Tests show **5-14% speedups** across various input sizes (most consistently 7-11%)
- The `test_large_scale_many_iterations` test (100 calls) shows a **14.6% speedup** (183μs → 160μs), confirming cumulative benefits
- Larger action spaces (m=50, n=50; m=10, n=90) maintain **6-8% gains**

the optimization is particularly valuable when:

- the equilibrium enumeration involves many vertices (common in games with multiple actions)
- the function is called repeatedly in batch computations
- players have moderate to large action spaces (n, m > 10), where the reduced per-iteration overhead compounds

The changes preserve exact numerical behavior (all tests pass) while delivering consistent performance gains across edge cases, including extreme coefficient ranges and various bit patterns.
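The hoisted-mask bit pattern from optimization 3 can be sketched in plain Python as follows. The helper name `extract_set_bits` is hypothetical; in the real code the equivalent loop runs inside the Numba-compiled function.

```python
import numpy as np

def extract_set_bits(labeling_bits, n):
    """Return the positions of set bits among the lowest n bits."""
    # Hoist the labeling into a local and reuse one uint64 mask constant
    # instead of constructing np.uint64(1) on every iteration.
    lb = labeling_bits
    mask = np.uint64(1)
    positions = []
    for i in range(n):
        if lb & mask:
            positions.append(i)
        lb = lb >> mask  # shift by a uint64 to keep the dtype stable
    return positions

print(extract_set_bits(np.uint64(0b1011), 4))  # → [0, 1, 3]
```

Shifting by the `uint64` mask rather than a Python `int` also sidesteps NumPy's mixed-type casting rules, which is why the constant is worth keeping around.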
📄 **11% (0.11x) speedup** for `_get_mixed_actions` in `quantecon/game_theory/vertex_enumeration.py`

⏱️ Runtime: 345 microseconds → 311 microseconds (best of 250 runs)
✅ Correctness verification report:
To edit these changes, run `git checkout codeflash/optimize-_get_mixed_actions-mkp8bi4n` and push.