Fix flaky test_max_pending_count by preventing premature GC #857
xyaz1313 wants to merge 1 commit into
…mature GC

The test creates DeviceNDArray objects without storing references, allowing Python's GC to non-deterministically collect them between loop iterations. This causes extra deallocations to appear in the pending queue, making assertions like 'len(deallocs) == i + 1' fail intermittently.

Fix: store all arrays in a list, then explicitly delete them one by one. Added gc.collect() calls to ensure finalizers fire before assertions.

Fixes NVIDIA#856
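The loop the commit message describes can be modeled without a GPU. The sketch below is a hypothetical stand-in, not numba-cuda code: `HypotheticalDeviceArray` owns a `HypotheticalPointer`, whose `weakref.finalize` callback appends to a `pending` list in place of the real deallocation queue. Each iteration creates an array and discards it without keeping a reference, as the test does, and records the queue length.

```python
import weakref


class HypotheticalPointer:
    """Stand-in for an owned device pointer; its finalizer enqueues a dealloc."""

    def __init__(self, handle, pending):
        # The callback must not reference `self`, otherwise the pointer could
        # never become unreachable.
        weakref.finalize(self, pending.append, handle)


class HypotheticalDeviceArray:
    """Stand-in for a DeviceNDArray that owns exactly one pointer."""

    def __init__(self, handle, pending):
        self.gpu_data = HypotheticalPointer(handle, pending)


def pending_counts(n):
    """Create n arrays without keeping references; record the queue length each time."""
    pending = []
    counts = []
    for i in range(n):
        HypotheticalDeviceArray(i, pending)  # result is discarded immediately
        counts.append(len(pending))
    return counts


print(pending_counts(4))
```

On CPython this prints `[1, 2, 3, 4]`: with no reference cycle anywhere in the model's ownership chain, each discarded array is finalized by reference counting as soon as its statement finishes, so the queue grows by exactly one per iteration.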
I don't understand the root cause analysis here. My understanding is that by storing no reference to the … My concern is that the real underlying issue could be that, with all the changes we've made to leverage cuda-core / cuda-python more, we've accidentally introduced reference cycles that keep device memory alive even when nothing "downstream" of the call to … Perhaps I have some misunderstanding about the Python GC or about how the deallocation behaviour is supposed to work; if so, could you help me understand how things fit together, please?
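The distinction the review draws can be shown in a few lines of plain Python. In CPython, an object with no remaining references and no reference cycle is freed by reference counting immediately; only objects caught in a cycle wait for the cyclic garbage collector. The `Tracked` class below is hypothetical and only records when its `weakref.finalize` callback runs.

```python
import gc
import weakref


class Tracked:
    """Hypothetical object whose finalizer appends its name to a shared list."""

    def __init__(self, name, finalized):
        weakref.finalize(self, finalized.append, name)


def compare_refcount_and_cycle():
    finalized = []
    was_enabled = gc.isenabled()
    gc.disable()  # stop automatic cyclic collections so every step is explicit
    try:
        # No reference kept and no cycle: reference counting frees it right away.
        Tracked("temporary", finalized)
        after_temporary = list(finalized)

        # A self-reference keeps the refcount above zero after `del`.
        obj = Tracked("cycle", finalized)
        obj.self_ref = obj
        del obj
        after_del = list(finalized)

        # Only the cyclic collector can reclaim the cycle.
        gc.collect()
        after_collect = list(finalized)
    finally:
        if was_enabled:
            gc.enable()
    return after_temporary, after_del, after_collect


print(compare_refcount_and_cycle())
```

On CPython this prints `(['temporary'], ['temporary'], ['temporary', 'cycle'])`. If device arrays were being finalized at unpredictable points, that would suggest something in their ownership chain sits in a cycle, which is the possibility the review raises.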
Problem
TestDeallocation.test_max_pending_count is flaky: it intermittently fails with …
Root Cause
The test creates DeviceNDArray objects without storing references.
Each cuda.to_device() call returns a DeviceNDArray → OwnedPointer → weakref.finalize chain. When Python's GC non-deterministically collects these temporaries between loop iterations, their finalizers fire early and add deallocations to the pending queue. This makes len(deallocs) exceed the expected i + 1.
Fix
Store all arrays in a list.
Explicitly delete them one by one.
Call gc.collect() after each deletion to ensure finalizers fire before the assertion.
This makes deallocation timing fully deterministic.
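The three steps of the fix can be sketched as follows. This models the pattern rather than reproducing the actual diff: `FakeDeviceArray` and the `pending` list are hypothetical stand-ins for `cuda.to_device()` results and the deallocation queue.

```python
import gc
import weakref


class FakeDeviceArray:
    """Hypothetical stand-in for a DeviceNDArray; its finalizer enqueues a dealloc."""

    def __init__(self, ident, pending):
        weakref.finalize(self, pending.append, ident)


def run_fixed_pattern(n):
    pending = []
    # Step 1: keep a strong reference to every array so none is reclaimed early.
    arrays = [FakeDeviceArray(i, pending) for i in range(n)]
    assert pending == []

    observed = []
    for i in range(n):
        # Step 2: drop exactly one array per iteration (always the oldest).
        del arrays[0]
        # Step 3: force a full collection so any finalizer that is waiting on
        # the cyclic collector has run before the check.
        gc.collect()
        observed.append(len(pending))
        assert len(pending) == i + 1
    return observed


print(run_fixed_pattern(4))
```

In this model the `gc.collect()` call changes nothing, because no object is part of a cycle and `del` alone triggers each finalizer; it only makes a difference when something in the real ownership chain forms a cycle.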
Reproduction
The flakiness depends on GC timing: it happens more frequently under high memory pressure or when running the full test suite, where previous tests accumulate GC-tracked objects.
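One way to test the timing hypothesis directly, rather than only making the test pass, is to log when the cyclic collector runs. If extra deallocations show up in iterations where no collection ran, the cyclic collector cannot be what enqueued them. The `CollectorLog` helper below is a hypothetical diagnostic built on the standard `gc.callbacks` hook; it is not part of the test suite.

```python
import gc


class CollectorLog:
    """Context manager that records cyclic-GC runs via gc.callbacks (diagnostic sketch)."""

    def __init__(self):
        self.runs = []

    def _on_gc(self, phase, info):
        # The collector calls each callback with phase "start" and then "stop";
        # on "stop", info also reports how many objects were collected.
        if phase == "stop":
            self.runs.append((info["generation"], info["collected"]))

    def __enter__(self):
        gc.callbacks.append(self._on_gc)
        return self

    def __exit__(self, exc_type, exc, tb):
        gc.callbacks.remove(self._on_gc)
        return False


with CollectorLog() as log:
    gc.collect()
print(len(log.runs) >= 1)
```

In the real test, one could record `len(log.runs)` alongside each loop iteration and check whether the failing iterations coincide with a collector run. Lowering the collection thresholds with `gc.set_threshold()` is one way to make collections more frequent while trying to reproduce the failure.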
Fixes #856