Currently, each device-scope algorithm in CUB has a debug_synchronous parameter, which defaults to false. Because it is a runtime variable, it leads to a few issues:
- With -rdc compilation, there's a higher memory footprint (issue).
- _CubLog has to inject code into the binary, which might affect performance.
- Thrust uses a THRUST_DEBUG_SYNC macro instead of a runtime variable. It'd be better to have consistency between the libraries.
Since 2.0 is a breaking release, I suggest we drop debug_synchronous support outright. Deprecating it instead would lead to code bloat, because we'd have to introduce new template parameters into the dispatch/agent layers while also maintaining the new scheme (see below). In general, the option should only be used for debugging CUB itself, so I don't think that's a big issue. To inform users about the breaking change, I suggest we keep the old API where possible, but use a static assert to point to the replacement macro:
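// New overload: no debug_synchronous parameter.
cub::Device::API(...) { dispatch(...); }
// Old overload: kept only for the diagnostic. Before C++23 the condition must depend on a template parameter, otherwise the assert fires even when this overload is never called.
cub::Device::API(..., bool debug_synchronous) { static_assert(/* dependent false */, "Use new approach"); }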
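A minimal sketch of the overload-plus-static-assert idea above, assuming the public entry points are function templates. `DeviceExample`, `Sum`, and `dependent_false` are hypothetical names used for illustration only, not CUB APIs. The assertion condition is made dependent on a template parameter so that it fires only when the old overload is actually instantiated:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: always false, but dependent on T, so a static_assert
// using it is only evaluated when the enclosing template is instantiated.
template <typename T>
struct dependent_false
{
  static constexpr bool value = false;
};

// Hypothetical stand-in for a device-scope algorithm (host-only for brevity).
struct DeviceExample
{
  // New signature: no debug_synchronous parameter.
  template <typename InputIt>
  static int Sum(InputIt first, std::size_t n)
  {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
      total += first[i];
    }
    return total;
  }

  // Old signature: kept only so that existing callers get a readable
  // compile-time error instead of an overload-resolution failure.
  template <typename InputIt>
  static int Sum(InputIt, std::size_t, bool /* debug_synchronous */)
  {
    static_assert(dependent_false<InputIt>::value,
                  "debug_synchronous is no longer supported; "
                  "define CUB_DEBUG_SYNC instead");
    return 0;
  }
};
```

With this sketch, `DeviceExample::Sum(data, 3)` compiles as before, while `DeviceExample::Sum(data, 3, true)` stops compilation with the message above.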
To replace the parameter, I suggest we introduce a CUB_DEBUG_LEVEL macro with various convenience aliases. Setting it to 1 gives the same behaviour as debug_synchronous=true does today: device synchronization after each kernel invocation and logging of kernel launches. Setting it to 2 enables precondition checks. For instance, we can check that pointers are device-accessible, that segments in a segmented sort don't overlap, etc. More importantly, because the level is known at compile time, this approach would allow us to embed precondition checks into kernels with no overhead when the checks are disabled.
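As a rough illustration of how a compile-time level could gate both behaviours, here is a host-only sketch. It assumes that an undefined CUB_DEBUG_LEVEL means 0 (no debugging) and that levels are cumulative, neither of which the proposal states; `example_debug_after_launch`, `EXAMPLE_PRECONDITION`, and `example_last_element` are hypothetical names, not CUB APIs.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: an undefined level means "no debugging".
#ifndef CUB_DEBUG_LEVEL
#define CUB_DEBUG_LEVEL 0
#endif

// Hypothetical hook that a dispatch layer could call after each launch.
// Returns true when the debug path was compiled in.
inline bool example_debug_after_launch(const char* kernel_name)
{
#if CUB_DEBUG_LEVEL >= 1
  std::printf("launched %s\n", kernel_name);
  // A real implementation would synchronize the device here.
  return true;
#else
  (void)kernel_name; // compiled out: no logging, no synchronization
  return false;
#endif
}

// Hypothetical precondition macro: expands to nothing below level 2,
// so disabled checks cost nothing at run time.
#if CUB_DEBUG_LEVEL >= 2
#define EXAMPLE_PRECONDITION(cond) assert(cond)
#else
#define EXAMPLE_PRECONDITION(cond) ((void)0)
#endif

// Example of a function guarded by a precondition.
inline int example_last_element(const int* data, int n)
{
  EXAMPLE_PRECONDITION(data != nullptr && n > 0);
  return data[n - 1];
}
```

Because the branches are selected by the preprocessor, a build with the level left at 0 contains neither the logging call nor the checks.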
The convenience macros might be:
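// Map the aliases onto CUB_DEBUG_LEVEL unless the user set it explicitly.
// If both aliases are defined, the higher level is chosen here (an assumption);
// defining the macro twice with different values would be a redefinition error.
#ifndef CUB_DEBUG_LEVEL
#  if defined(CUB_DEBUG_ASSERTIONS)
#    define CUB_DEBUG_LEVEL 2
#  elif defined(CUB_DEBUG_SYNC)
#    define CUB_DEBUG_LEVEL 1
#  endif
#endif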