fix: use parallelized numba functions if possible #155
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Coverage diff:

| | main | #155 | +/- |
|---|---|---|---|
| Coverage | 99.14% | 97.10% | -2.04% |
| Files | 19 | 20 | +1 |
| Lines | 466 | 519 | +53 |
| Hits | 462 | 504 | +42 |
| Misses | 4 | 15 | +11 |
Merging this PR will degrade performance by 47.48%.

| | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| 👁 | `test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-float32-is_constant]` | 1.8 ms | 2.9 ms | -38.18% |
| 👁 | `test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-float64-is_constant]` | 1.8 ms | 2.9 ms | -37.6% |
| 👁 | `test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-int32-is_constant]` | 1.7 ms | 2.8 ms | -38.88% |
| 👁 | `test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-float32-is_constant]` | 1.7 ms | 2.8 ms | -38.12% |
| 👁 | `test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-int32-is_constant]` | 1.9 ms | 3.5 ms | -47.48% |
| 👁 | `test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-float64-is_constant]` | 1.7 ms | 2.8 ms | -37.98% |

Comparing `ig/parallel_kernels` (5e82a76) with `main` (e50c44f)
The CodSpeed regression looks to be entirely overhead, which I attribute to the size of the benchmark data. Otherwise, the actual function calls are faster. WDYT @flying-sheep?
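One way to make the "it's just overhead" argument concrete is a toy cost model: the serial kernel costs `n * c`, the parallel one costs a fixed launch/synchronisation overhead `L` plus `n * c / p` on `p` threads. Below the break-even size the fixed cost dominates and the parallel version loses. The constants are hypothetical placeholders, not measurements from the benchmarks above.

```python
"""Toy cost model: when does a parallel kernel beat a serial one?

All constants are hypothetical placeholders, not measured values.
"""


def serial_time(n: float, per_elem: float) -> float:
    """Serial cost: every element is processed by a single thread."""
    return n * per_elem


def parallel_time(n: float, per_elem: float, threads: int, launch: float) -> float:
    """Parallel cost: fixed launch/sync overhead plus the work split across threads."""
    return launch + n * per_elem / threads


def break_even(per_elem: float, threads: int, launch: float) -> float:
    """Input size at which both models take the same time.

    Solves n * c = L + n * c / p  =>  n = L / (c * (1 - 1 / p)).
    """
    return launch / (per_elem * (1 - 1 / threads))


if __name__ == "__main__":
    # Assumed: 1 ns per element, 8 threads, 50 µs fixed overhead per call.
    c, p, launch = 1e-9, 8, 50e-6
    print(f"parallel only pays off above ~{break_even(c, p, launch):,.0f} elements")
    for n in (10_000, 1_000_000, 100_000_000):
        print(n, serial_time(n, c), parallel_time(n, c, p, launch))
```

The model ignores memory bandwidth and load imbalance; it only shows why a benchmark on small inputs can report a slowdown while larger calls get faster.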
flying-sheep left a comment:
Can you explain `dask_single_threaded`? Isn’t that a red flag? Shouldn’t numba just work in dask with our decorator?
OK, running with the right flags shows the warning as users would see it:

    ❯ hatch test -- -s -p no:warnings tests/test_stats.py::test_is_constant
    […]
    tests/test_stats.py ...sss...............sss./Users/philipp.angerer/Dev/Python/fast-array-utils/src/fast_array_utils/stats/_is_constant.py:66: UserWarning: Detected unsupported threading environment. Trying to run _is_constant_cs_major in serial mode. In case of problems, install `tbb`.
      return _is_constant_cs_major(a, shape)
    Numba workqueue threading layer is terminating: Concurrent access has been detected.
    - The workqueue threading layer is not threadsafe and may not be accessed concurrently by multiple threads. Concurrent access typically occurs through a nested parallel region launch or by calling Numba parallel=True functions from multiple Python threads.
    - Try using the TBB threading layer as an alternative, as it is, itself, threadsafe. Docs: https://numba.readthedocs.io/en/stable/user/threading-layer.html
    Fatal Python error: Aborted
    […]

So numba actually has more to say, and it makes the same recommendation (use TBB).
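The `UserWarning` above comes from a fallback that reruns the kernel in serial mode when the threading environment looks unsafe. A minimal pure-Python sketch of that dispatch idea follows. The name `parallel_if_safe` and the "not on the main thread" heuristic are hypothetical assumptions for illustration, not necessarily how this package's decorator decides; it is deliberately conservative, since numba's message says the problem is concurrent access from multiple threads, not a single worker thread.

```python
"""Hypothetical sketch: prefer a parallel kernel, fall back to serial when unsafe.

Not fast-array-utils' real decorator. "Unsafe" is approximated here as
"called from a thread other than the main thread", which covers e.g. dask's
threaded scheduler calling parallel=True kernels from several workers.
"""
from __future__ import annotations

import functools
import threading
import warnings
from collections.abc import Callable
from typing import TypeVar

R = TypeVar("R")


def parallel_if_safe(serial: Callable[..., R]) -> Callable[[Callable[..., R]], Callable[..., R]]:
    """Decorate a parallel implementation, supplying its serial twin as fallback."""

    def decorator(parallel: Callable[..., R]) -> Callable[..., R]:
        @functools.wraps(parallel)
        def wrapper(*args, **kwargs) -> R:
            # Main thread: assume no concurrent entry into the threading layer.
            if threading.current_thread() is threading.main_thread():
                return parallel(*args, **kwargs)
            # Worker thread: warn and run the serial variant instead.
            warnings.warn(
                f"Unsupported threading environment; running {parallel.__name__} in serial mode.",
                UserWarning,
                stacklevel=2,
            )
            return serial(*args, **kwargs)

        return wrapper

    return decorator
```

In real code, `serial` and `parallel` would be two compilations of the same kernel (e.g. with and without `parallel=True`); here they are plain functions so the dispatch logic can be read in isolation.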
Whew, it seems there’s no way to get a thread-safe numba threading layer on macOS: numba/numba#10492
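On platforms where TBB or OpenMP is available, numba can be asked for a thread-safe layer up front instead of discovering the problem at runtime. The sketch assumes numba's documented `NUMBA_THREADING_LAYER` environment variable and `numba.threading_layer()` function; per the comment above, it is not expected to help on macOS. The helper name `active_threading_layer` is hypothetical.

```python
"""Sketch: ask numba for a thread-safe threading layer.

Assumes numba's documented NUMBA_THREADING_LAYER environment variable; valid
values include "default", "safe", "threadsafe", "forksafe", "tbb", "omp" and
"workqueue". "threadsafe" restricts numba to TBB or OpenMP.
"""
from __future__ import annotations

import os

# Simplest to set before the first `import numba`, because numba reads its
# environment-based configuration at import time. setdefault keeps any value
# the user already exported.
os.environ.setdefault("NUMBA_THREADING_LAYER", "threadsafe")


def active_threading_layer() -> str | None:
    """Return the layer numba actually picked, or None if unknown.

    numba.threading_layer() only works after a parallel function has run;
    before that it raises ValueError.
    """
    try:
        import numba
    except ImportError:
        return None
    try:
        return numba.threading_layer()
    except ValueError:
        return None
```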
Regarding these too-fast benchmarks: kinda interesting that our numba function for rowwise counting is 200× faster than … 10M elements in 20 µs is crazy, no? (That’s the wall time on my CPU for these.)
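The "crazy" reaction is easy to quantify: 10M elements in 20 µs is 5 × 10^11 elements per second, or 2 TB/s if every element were actually read and each were 4 bytes. The 4-byte element size (e.g. float32 or int32) is an assumption used only for the bytes-per-second conversion.

```python
"""Back-of-the-envelope throughput for '10M elements in 20 µs'.

The element size is an assumption (4 bytes, e.g. float32/int32).
"""


def throughput(n_elements: int, seconds: float, bytes_per_element: int = 4) -> tuple[float, float]:
    """Return (elements per second, bytes per second) for one timed run."""
    elems_per_s = n_elements / seconds
    return elems_per_s, elems_per_s * bytes_per_element


elems, nbytes = throughput(10_000_000, 20e-6)
print(f"{elems:.1e} elements/s, {nbytes / 1e12:.1f} TB/s if every element were read")
```

A number this high is a hint to check whether the kernel really touches every element, or whether the timing is measuring something else.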
Seems like good news :)
Wait, sorry, but running …
Isn’t the whole purpose of the decorator to use the same code that is currently on …?
HELL YEAH! So all we needed to do was make it actually use different caches: https://github.com/numba/numba/blob/2f464e5deb07071bd365db971b4a4ae57dca5153/numba/core/caching.py#L388
A more robust solution might be to override …
After the change: […]
Foolish of me to think they would support caching the same function twice with different …
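The cache collision can be illustrated with a toy compile cache. A cache keyed only by the function's qualified name hands back whichever variant was compiled first, while adding the compile options to the key keeps the serial and parallel variants apart. This is a hypothetical illustration of the idea; `ToyCache` and `compile_variant` are made up and are not numba's caching implementation.

```python
"""Toy compile cache: two variants of one function need distinct cache keys.

Hypothetical: `compile_variant` stands in for a JIT compile step; nothing here
is numba's real caching code.
"""
from __future__ import annotations

from collections.abc import Callable, Hashable


def compile_variant(func: Callable, parallel: bool) -> str:
    """Pretend to compile `func`; return a label for the produced artifact."""
    return f"{func.__qualname__}[{'parallel' if parallel else 'serial'}]"


class ToyCache:
    def __init__(self, key_includes_options: bool) -> None:
        self.key_includes_options = key_includes_options
        self._store: dict[Hashable, str] = {}

    def get(self, func: Callable, *, parallel: bool) -> str:
        # Naive key: the function name only, so options are ignored.
        key: Hashable = func.__qualname__
        if self.key_includes_options:
            # Fixed key: options are part of the identity of the artifact.
            key = (func.__qualname__, parallel)
        if key not in self._store:
            self._store[key] = compile_variant(func, parallel)
        return self._store[key]


def kernel(x):
    return x


naive = ToyCache(key_includes_options=False)
naive.get(kernel, parallel=True)
print(naive.get(kernel, parallel=False))  # collision: returns the parallel artifact

fixed = ToyCache(key_includes_options=True)
fixed.get(kernel, parallel=True)
print(fixed.get(kernel, parallel=False))
```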

This is basically a one-to-one port of the scanpy function. It might make sense to export this function from this package, but maybe it's a weird fit as well. Not sure!