fix: use parallelized `numba` functions if possible by ilan-gold · Pull Request #155 · scverse/fast-array-utils

ilan-gold · 2026-03-02T14:22:47Z

This is basically a one-to-one port of the scanpy function. It might make sense to export this function from this package but maybe it's a weird fit as well. Not sure!

codecov · 2026-03-02T14:24:23Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.10%. Comparing base (e50c44f) to head (5e82a76).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #155      +/-   ##
==========================================
- Coverage   99.14%   97.10%   -2.04%     
==========================================
  Files          19       20       +1     
  Lines         466      519      +53     
==========================================
+ Hits          462      504      +42     
- Misses          4       15      +11

codspeed-hq · 2026-03-02T14:34:02Z

Merging this PR will degrade performance by 47.48%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

❌ 6 (👁 6) regressed benchmarks
✅ 226 untouched benchmarks

Performance Changes

| | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|
| 👁 | `test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-float32-is_constant]` | 1.8 ms | 2.9 ms | -38.18% |
| 👁 | `test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-float64-is_constant]` | 1.8 ms | 2.9 ms | -37.6% |
| 👁 | `test_stats_benchmark[scipy.sparse.csc_array-2d-ax0-int32-is_constant]` | 1.7 ms | 2.8 ms | -38.88% |
| 👁 | `test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-float32-is_constant]` | 1.7 ms | 2.8 ms | -38.12% |
| 👁 | `test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-int32-is_constant]` | 1.9 ms | 3.5 ms | -47.48% |
| 👁 | `test_stats_benchmark[scipy.sparse.csr_array-2d-ax1-float64-is_constant]` | 1.7 ms | 2.8 ms | -37.98% |

Comparing ig/parallel_kernels (5e82a76) with main (e50c44f)

ilan-gold · 2026-03-19T15:51:26Z

The CodSpeed regression looks to be entirely overhead, which I attribute to the small size of the benchmark data. Otherwise, the actual function calls are faster. WDYT @flying-sheep?

src/testing/fast_array_utils/pytest.py

flying-sheep

Can you explain `dask_single_threaded`? Isn’t that a red flag? Shouldn’t numba just work in dask with our decorator?

flying-sheep · 2026-03-23T12:45:37Z

OK, running with the right flags shows the warning as users would see it:

❯ hatch test -- -s -p no:warnings tests/test_stats.py::test_is_constant
[…]
tests/test_stats.py ...sss...............sss./Users/philipp.angerer/Dev/Python/fast-array-utils/src/fast_array_utils/stats/_is_constant.py:66: UserWarning: Detected unsupported threading environment. Trying to run _is_constant_cs_major in serial mode. In case of problems, install `tbb`.
  return _is_constant_cs_major(a, shape)
Numba workqueue threading layer is terminating: Concurrent access has been detected.

 - The workqueue threading layer is not threadsafe and may not be accessed concurrently by multiple threads. Concurrent access typically occurs through a nested parallel region launch or by calling Numba parallel=True functions from multiple Python threads.
 - Try using the TBB threading layer as an alternative, as it is, itself, threadsafe. Docs: https://numba.readthedocs.io/en/stable/user/threading-layer.html

Fatal Python error: Aborted
[…]

So numba actually has more to say here, and it makes the same recommendation: use TBB.

flying-sheep · 2026-03-23T14:41:03Z

Wew, seems like there’s no way to get a thread-safe numba backend on macOS: numba/numba#10492
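
For readers who want to follow the TBB recommendation from the error message, the sketch below shows one way to request a threading layer through the `NUMBA_THREADING_LAYER` environment variable. The helper name `request_threading_layer` and the policy of skipping macOS are assumptions made for illustration, based only on the comments above; this is not the project's code, and the `tbb` package still has to be installed for the request to succeed.

```python
import os
import sys
from typing import Optional


def request_threading_layer(platform: str = sys.platform) -> Optional[str]:
    """Ask numba for the TBB threading layer, except on macOS.

    Hypothetical helper: call it before the first `import numba`,
    because numba reads its environment configuration on import.
    """
    if platform == "darwin":
        # Assumption taken from the thread: no thread-safe layer is available here.
        return None
    # "tbb" is one of the layer names numba accepts for this variable.
    os.environ["NUMBA_THREADING_LAYER"] = "tbb"
    return "tbb"
```

Calling `request_threading_layer()` at the very top of a test session (before numba is imported) would be one place to use it; whether that is sufficient on a given platform is not something this thread settles.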

flying-sheep · 2026-03-27T09:47:19Z

Regarding these too-fast benchmarks: kinda interesting that our numba function for rowwise counting is 200× faster than `bool((a == a.flat[0]).all())`.

10M elements in 20µs is crazy, no? (That’s walltime on my CPU for these.)
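
One plausible explanation, which is an assumption and not something confirmed in this thread, is that a compiled loop can stop at the first differing element, while the NumPy expression always builds a full boolean array first. A minimal sketch of the two approaches follows; `is_constant_loop` is a hypothetical kernel, not the function from this PR, and it falls back to plain Python if numba is not installed.

```python
import numpy as np

try:  # numba is optional for this sketch
    from numba import njit
except ImportError:  # fall back to plain Python so the example still runs

    def njit(func):
        return func


def is_constant_numpy(a: np.ndarray) -> bool:
    # Baseline quoted above: compares every element, then reduces.
    return bool((a == a.flat[0]).all())


@njit
def is_constant_loop(flat):
    # Hypothetical kernel: returns as soon as one element differs.
    first = flat[0]
    for x in flat:
        if x != first:
            return False
    return True


constant = np.zeros((3, 4))
mixed = np.zeros(10)
mixed[7] = 1.0

# Both approaches agree; the loop version expects a 1-D (raveled) array.
same_on_constant = is_constant_numpy(constant) == is_constant_loop(constant.ravel())
same_on_mixed = is_constant_numpy(mixed) == is_constant_loop(mixed)
```

On non-constant data the loop touches only a handful of elements, so its runtime would be largely independent of array size, which would be consistent with the surprisingly low walltime.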

ilan-gold · 2026-03-27T10:06:34Z

Seems like good news :)

flying-sheep · 2026-03-27T10:53:52Z

wait, sorry, but running `hatch test tests/test_stats.py::test_is_constant` on macOS

- works on main (`@numba.njit(parallel=False)`)
- crashes when removing the `dask_single_threaded` fixture (our `@njit`)

Isn’t the whole purpose of the decorator to use the same code that is currently on main when in a threadpool? Why doesn’t it work?
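
To make the question concrete, here is a minimal sketch of the dispatching idea: keep a parallel and a serial variant of the same function and pick one per call depending on the calling thread. Everything here is hypothetical. `dispatching_jit` and `fake_compile` are illustrative names, `compile_variant(func, parallel)` stands in for something like `numba.njit(func, parallel=parallel)`, and "main thread means parallel is safe" is a simplifying assumption, not the project's actual check.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import wraps


def dispatching_jit(compile_variant):
    """Decorator factory; `compile_variant(func, parallel)` is a stand-in compiler."""

    def decorator(func):
        # Build both variants up front, keyed by the `parallel` flag.
        variants = {flag: compile_variant(func, flag) for flag in (True, False)}

        @wraps(func)
        def wrapper(*args, **kwargs):
            # Simplifying assumption: only the main thread may go parallel.
            parallel = threading.current_thread() is threading.main_thread()
            return variants[parallel](*args, **kwargs)

        return wrapper

    return decorator


used = []  # records which variant handled each call


def fake_compile(func, parallel):
    # Stand-in "compiler": records the flag, then calls the original function.
    def compiled(*args, **kwargs):
        used.append(parallel)
        return func(*args, **kwargs)

    return compiled


@dispatching_jit(fake_compile)
def total(xs):
    return sum(xs)


# A call from a worker thread is routed to the serial variant.
with ThreadPoolExecutor(max_workers=1) as pool:
    from_worker = pool.submit(total, [1, 2, 3]).result()
```

The sketch only works if the two variants really are independent objects; if both end up resolving to the same compiled artifact, the serial path can still launch a parallel region.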

flying-sheep · 2026-03-27T11:42:22Z

HELL YEAH!

So all we needed to do was make it actually use different caches. https://github.com/numba/numba/blob/2f464e5deb07071bd365db971b4a4ae57dca5153/numba/core/caching.py#L388

A more robust solution might be to override `nb.config.CACHE_LOCATOR_CLASSES`, which would keep working if the cache location stopped being qualname-based. But that also doesn’t save us from having to modify the Python function, as numba’s cache only gets that.

After the change:

Foolish of me to assume that they would support caching the same function twice with different jit parameters 🙄
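
The "different caches" fix can be illustrated without numba. Assuming the on-disk cache is keyed on the Python function's `__qualname__`, as the linked line suggests, one way to get two cache entries is to hand each jit variant its own copy of the function under a distinct name. `with_qualname_suffix` is a hypothetical helper written for this sketch, not the code merged in this PR.

```python
import types


def with_qualname_suffix(func, suffix):
    """Return a copy of `func` whose `__name__` and `__qualname__` end in `suffix`."""
    clone = types.FunctionType(
        func.__code__,
        func.__globals__,
        func.__name__ + suffix,
        func.__defaults__,
        func.__closure__,
    )
    # Copy the remaining metadata so the clone behaves like the original.
    clone.__qualname__ = func.__qualname__ + suffix
    clone.__kwdefaults__ = func.__kwdefaults__
    clone.__module__ = func.__module__
    clone.__doc__ = func.__doc__
    return clone


def kernel(x, scale=2):
    return x * scale


# One copy per jit variant; a qualname-keyed cache would see two functions.
serial = with_qualname_suffix(kernel, "_serial")
parallel = with_qualname_suffix(kernel, "_parallel")
```

Each copy shares the original code object, so behavior is identical; only the identity that a name-based cache would see differs.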

fix: use parallelized numba functions if possible

e91911a

ilan-gold added the run-gpu-ci label Mar 2, 2026

github-actions bot removed the run-gpu-ci label Mar 2, 2026

Merge branch 'main' into ig/parallel_kernels

f5f9bb6

ilan-gold added the run-gpu-ci label Mar 19, 2026

github-actions bot removed the run-gpu-ci label Mar 19, 2026

ilan-gold mentioned this pull request Mar 20, 2026

perf: parallel downsample scverse/scanpy#4004

Draft

flying-sheep reviewed Mar 20, 2026

src/testing/fast_array_utils/pytest.py (outdated)

flying-sheep reviewed Mar 20, 2026

ilan-gold added 3 commits March 20, 2026 14:33

fix: no dask threaded

ba7da13

Merge branch 'ig/parallel_kernels' of https://github.com/scverse/fast-array-utils into ig/parallel_kernels

54584b2

Merge branch 'main' into ig/parallel_kernels

b073c49

flying-sheep added 2 commits March 23, 2026 15:22

hopefully fix tests

9bbae70

only on linux and win

e5ee035

flying-sheep added 2 commits March 23, 2026 16:09

re-introduce workaround conditionally

4db019b

whoops

a8256ea

flying-sheep marked this pull request as ready for review March 23, 2026 15:27

flying-sheep added the run-gpu-ci label Mar 23, 2026

github-actions bot removed the run-gpu-ci label Mar 23, 2026

flying-sheep added 2 commits March 24, 2026 11:07

Merge branch 'main' into ig/parallel_kernels

3c15c24

better docs

4b7e580

flying-sheep added the run-gpu-ci label Mar 27, 2026

github-actions bot removed the run-gpu-ci label Mar 27, 2026

Update coverage report exclusions in pyproject.toml

b6c6b2f

flying-sheep added the run-gpu-ci label Mar 27, 2026

github-actions bot removed the run-gpu-ci label Mar 27, 2026

flying-sheep added 2 commits March 27, 2026 11:24

Merge branch 'main' into ig/parallel_kernels

0046aac

coverage

01c2a5f

flying-sheep added the run-gpu-ci label Mar 27, 2026

github-actions bot removed the run-gpu-ci label Mar 27, 2026

fix: actually fix crash

dfa9985

flying-sheep added 3 commits March 27, 2026 12:51

prettier cache name

9aeb18c

export

1c24679

whoops

dee7880

flying-sheep added the run-gpu-ci label Mar 27, 2026

github-actions bot removed the run-gpu-ci label Mar 27, 2026

maybe fix GPU code

867411a

flying-sheep added the run-gpu-ci label Mar 27, 2026

github-actions bot removed the run-gpu-ci label Mar 27, 2026

cov

5e82a76

flying-sheep merged commit b4241d4 into main Mar 27, 2026
4 of 7 checks passed

flying-sheep deleted the ig/parallel_kernels branch March 27, 2026 15:19

flying-sheep merged 20 commits into main from ig/parallel_kernels

2 participants
