Open
Labels
bug (Something isn't working)
Description
Component Selection
- Core Engine (Expression eval, Memory, Vector)
- Connectors / File Formats (Hive, Parquet, etc.)
- API / Bindings (Python, etc.)
- Build
- Other
Describe the Bug
I have seen some CI runs where bolt_thrustjit_test fails with a segmentation fault.
99% tests passed, 1 tests failed out of 415
Total Test time (real) = 164.95 sec
The following tests did not run:
90 - */MemoryAllocatorTest.allocContiguousVsize/* (Disabled)
128 - */MemoryCapExceededTest.singleDriver/* (Disabled)
Errors while running CTest
130 - */MemoryCapExceededTest.allocatorCapacityExceededError/* (Disabled)
151 - */MemoryPoolTest.memoryLeakCheck/* (Disabled)
190 - */MemoryPoolTest.concurrentUpdateToSharedPools/* (Disabled)
282 - */SharedArbitrationTestWithThreadingModes.raceBetweenTaskTerminateAndReclaim/* (Disabled)
The following tests FAILED:
315 - bolt_thrustjit_test (SEGFAULT)
make: *** [Makefile:360: unittest_release] Error 8
The root cause of the failures has varied. One example is reproduced below:
302/421 Test #315: bolt_thrustjit_test ..............................................................................***Exception: SegFault 0.27 sec
Running main() from /github/home/.conan2/p/b/gtest9f9ec4a65659c/b/src/googletest/src/gtest_main.cc
[==========] Running 10 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 7 tests from RowContainerJitTest
[ RUN ] RowContainerJitTest.all_types
[ OK ] RowContainerJitTest.all_types (95 ms)
[ RUN ] RowContainerJitTest.two_float_point
[ OK ] RowContainerJitTest.two_float_point (5 ms)
[ RUN ] RowContainerJitTest.float_with_nulls
[ OK ] RowContainerJitTest.float_with_nulls (16 ms)
[ RUN ] RowContainerJitTest.float_point_nan_test
[ OK ] RowContainerJitTest.float_point_nan_test (0 ms)
[ RUN ] RowContainerJitTest.timestamp
[ OK ] RowContainerJitTest.timestamp (5 ms)
[ RUN ] RowContainerJitTest.singleKey
[ OK ] RowContainerJitTest.singleKey (4 ms)
[ RUN ] RowContainerJitTest.stringview
[ OK ] RowContainerJitTest.stringview (8 ms)
[----------] 7 tests from RowContainerJitTest (136 ms total)
[----------] 3 tests from JitEngineTest
[ RUN ] JitEngineTest.basic
[ OK ] JitEngineTest.basic (3 ms)
[ RUN ] JitEngineTest.cacheLimit
JIT session error: Resource tracker 0x38f7e190 became defunct
We need to find the root cause and fix it so that this test no longer fails. Doing so will improve CI stability and ensure that the same failure does not occur in production workloads.
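Because the failure is intermittent, a single local run may well pass. One way to chase it is to re-run the test binary until it fails. The sketch below is only an illustration: `repeat_until_fail` is a hypothetical helper, not part of the Bolt repository, and the run count in the usage comment is arbitrary. The binary path and the test name `JitEngineTest.cacheLimit` are taken from this report. `--gtest_filter` is the standard GoogleTest flag for selecting tests.

```shell
# Hypothetical helper (not part of Bolt): re-run a command until it fails
# or until max_runs attempts have passed.
# Usage: repeat_until_fail MAX_RUNS COMMAND [ARGS...]
repeat_until_fail() {
  max_runs="$1"
  shift
  i=1
  while [ "$i" -le "$max_runs" ]; do
    status=0
    # Capture the exit status without aborting under `set -e`.
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
      echo "run $i failed with exit status $status"
      return "$status"
    fi
    i=$((i + 1))
  done
  echo "no failure in $max_runs runs"
  return 0
}

# Example invocation (assumed run count of 200; path from the reproduction steps):
# repeat_until_fail 200 _build/Release/bolt/jit/tests/bolt_thrustjit_test \
#   --gtest_filter='JitEngineTest.cacheLimit'
```

On Linux, a shell typically reports a process killed by SIGSEGV with exit status 139 (128 + 11), so that value in the helper's output would point to the segfault rather than an ordinary test assertion failure.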
Reproduction Steps
$ make release_with_test
$ _build/Release/bolt/jit/tests/bolt_thrustjit_test
Bolt Version / Commit ID
System Configuration
- **OS**: Debian Bookworm
- **Compiler**: GCC 12.5.0
- **Build Type**: Release
- **CPU Arch**: x86
- **Framework**: N/A
Logs / Stack Trace
[----------] 3 tests from JitEngineTest
[ RUN ] JitEngineTest.basic
[ OK ] JitEngineTest.basic (3 ms)
[ RUN ] JitEngineTest.cacheLimit
JIT session error: Resource tracker 0x38f7e190 became defunct
Sample run: https://github.com/bytedance/bolt/actions/runs/21076158907/job/60618467545
Expected Behavior
No SEGFAULT
Additional context
No response