fix: use parallelized numba functions if possible#155

Merged
flying-sheep merged 20 commits into main from ig/parallel_kernels
Mar 27, 2026

Conversation

@ilan-gold
Contributor

@ilan-gold ilan-gold commented Mar 2, 2026

This is basically a one-to-one port of the scanpy function. It might make sense to export this function from this package but maybe it's a weird fit as well. Not sure!

@ilan-gold ilan-gold added the run-gpu-ci Apply this label to run GPU CI once label Mar 2, 2026
@codecov

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.10%. Comparing base (e50c44f) to head (5e82a76).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #155      +/-   ##
==========================================
- Coverage   99.14%   97.10%   -2.04%     
==========================================
  Files          19       20       +1     
  Lines         466      519      +53     
==========================================
+ Hits          462      504      +42     
- Misses          4       15      +11     

☔ View full report in Codecov by Sentry.

@github-actions github-actions bot removed the run-gpu-ci Apply this label to run GPU CI once label Mar 2, 2026
@codspeed-hq

codspeed-hq bot commented Mar 2, 2026

Merging this PR will degrade performance by 47.48%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 6 (👁 6) regressed benchmarks
✅ 226 untouched benchmarks

Performance Changes

Benchmark BASE HEAD Efficiency
👁 test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-float32-is_constant] 1.8 ms 2.9 ms -38.18%
👁 test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-float64-is_constant] 1.8 ms 2.9 ms -37.6%
👁 test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-int32-is_constant] 1.7 ms 2.8 ms -38.88%
👁 test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-float32-is_constant] 1.7 ms 2.8 ms -38.12%
👁 test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-int32-is_constant] 1.9 ms 3.5 ms -47.48%
👁 test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-float64-is_constant] 1.7 ms 2.8 ms -37.98%

Comparing ig/parallel_kernels (5e82a76) with main (e50c44f)

Open in CodSpeed

@ilan-gold ilan-gold added the run-gpu-ci Apply this label to run GPU CI once label Mar 19, 2026
@ilan-gold
Contributor Author

ilan-gold commented Mar 19, 2026

The CodSpeed regression looks to be entirely overhead, which I take to be on account of the size of the data. Otherwise, the actual function calls are faster. WDYT @flying-sheep ?

@github-actions github-actions bot removed the run-gpu-ci Apply this label to run GPU CI once label Mar 19, 2026
Member

@flying-sheep flying-sheep left a comment


Can you explain dask_single_threaded? Isn't that a red flag? Shouldn't numba just work in dask with our decorator?

@flying-sheep
Member

OK, running with the right flags shows the warning how users would see it:

hatch test -- -s -p no:warnings tests/test_stats.py::test_is_constant
[…]                                                                                                                                                                                                                                                                                                                      
tests/test_stats.py ...sss...............sss./Users/philipp.angerer/Dev/Python/fast-array-utils/src/fast_array_utils/stats/_is_constant.py:66: UserWarning: Detected unsupported threading environment. Trying to run _is_constant_cs_major in serial mode. In case of problems, install `tbb`.
  return _is_constant_cs_major(a, shape)
Numba workqueue threading layer is terminating: Concurrent access has been detected.

 - The workqueue threading layer is not threadsafe and may not be accessed concurrently by multiple threads. Concurrent access typically occurs through a nested parallel region launch or by calling Numba parallel=True functions from multiple Python threads.
 - Try using the TBB threading layer as an alternative, as it is, itself, threadsafe. Docs: https://numba.readthedocs.io/en/stable/user/threading-layer.html

Fatal Python error: Aborted
[…]

So numba actually has more to say here, and it makes the same recommendation: install `tbb`.
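Following that recommendation means installing the `tbb` package and selecting the TBB threading layer before numba compiles anything. A sketch of how that could look for this test run, assuming a `tbb` wheel exists for your platform (per the next comment, apparently not macOS):

```shell
pip install tbb
# select the thread-safe TBB layer before any parallel numba function runs
NUMBA_THREADING_LAYER=tbb hatch test -- -s -p no:warnings tests/test_stats.py::test_is_constant
```

`NUMBA_THREADING_LAYER` is numba's documented environment variable for forcing a specific layer; without it, numba picks the first available layer (which falls back to the non-threadsafe workqueue).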

@flying-sheep
Member

Whew, seems like there’s no way to get a thread-safe numba backend on macOS: numba/numba#10492

@flying-sheep flying-sheep marked this pull request as ready for review March 23, 2026 15:27
@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Mar 23, 2026
@github-actions github-actions bot removed the run-gpu-ci Apply this label to run GPU CI once label Mar 23, 2026
@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@github-actions github-actions bot removed the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@flying-sheep
Member

flying-sheep commented Mar 27, 2026

regarding these too-fast benchmarks: kinda interesting that our numba function for rowwise counting is 200× faster than bool((a == a.flat[0]).all())

10M elements in 20µs is crazy, no? (that’s walltime on my CPU for these)
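For context, here is a minimal sketch of the two approaches being compared. The function names are illustrative, not the package's actual API; the real kernel in this PR is a parallel numba implementation, but even in plain Python the key difference is visible: the naive check materializes a full boolean array, while a compiled loop can stop at the first mismatch.

```python
import numpy as np

def is_constant_naive(a: np.ndarray) -> bool:
    # materializes an entire boolean array before reducing it
    return bool((a == a.flat[0]).all())

def is_constant_early_exit(a: np.ndarray) -> bool:
    # what a compiled (numba) kernel can do instead: bail on the first mismatch
    first = a.flat[0]
    for x in a.flat:
        if x != first:
            return False
    return True

a = np.zeros(1_000)
assert is_constant_naive(a) and is_constant_early_exit(a)
a[500] = 1.0
assert not is_constant_naive(a) and not is_constant_early_exit(a)
```

On a constant array the early exit buys nothing, but numba compiles the loop to machine code and (with `parallel=True`) splits it across cores, which is where speedups of this magnitude become plausible.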

@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@github-actions github-actions bot removed the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@ilan-gold
Contributor Author

Seems like good news :)

@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@github-actions github-actions bot removed the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@flying-sheep
Member

flying-sheep commented Mar 27, 2026

wait, sorry, but running hatch test tests/test_stats.py::test_is_constant on macOS

  1. works on main (@numba.njit(parallel=False))
  2. crashes when removing the dask_single_threaded fixture (our @njit)

Isn’t the whole purpose of the decorator to use the same code that is currently on main when in a threadpool? Why doesn’t it work?

@flying-sheep
Member

flying-sheep commented Mar 27, 2026

HELL YEAH!

So all we needed to do was to make it actually use different caches. https://github.com/numba/numba/blob/2f464e5deb07071bd365db971b4a4ae57dca5153/numba/core/caching.py#L388

A more robust solution might be to override nb.config.CACHE_LOCATOR_CLASSES, which would keep working even if the cache location stopped being qualname-based. But it also doesn’t save us from having to modify the Python function, since numba’s cache only sees that.

After the change:

(screenshot: benchmark results after the change)

Foolish of me to assume they would support caching the same function twice with different jit parameters 🙄

@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@github-actions github-actions bot removed the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@flying-sheep flying-sheep added the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@github-actions github-actions bot removed the run-gpu-ci Apply this label to run GPU CI once label Mar 27, 2026
@flying-sheep flying-sheep merged commit b4241d4 into main Mar 27, 2026
4 of 7 checks passed
@flying-sheep flying-sheep deleted the ig/parallel_kernels branch March 27, 2026 15:19