Commit 58d3593
bugfix: fix cu118 cub usage (#410)
Related issue: sgl-project/sglang#771
This PR fixes the usage of the `FlagHeads` cub API in the sampling kernels.
As
[documented](https://nvidia.github.io/cccl/cub/api/classcub_1_1BlockDiscontinuity.html),
the default `FlagHeads` API always flags the first element, which is
not the expected behavior when the first element is not `true`.
> For thread0, item input[0] is always flagged.
This PR sets the `tile_predecessor_item` argument to 0, so that
`input[0]` is compared against it instead of being flagged unconditionally.
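
The difference between the two behaviors can be illustrated with a small host-side model. This is only a sketch in plain C++, not the CUB implementation or the code changed in this commit: the function names `flag_heads_default` and `flag_heads_with_predecessor` and the `NotEqual` flag operator are hypothetical, and the choice of an inequality flag operator is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag operator: an item is a "head" when it differs
// from its predecessor.
struct NotEqual {
  bool operator()(int prev, int cur) const { return prev != cur; }
};

// Sequential model of the documented default behavior:
// the first item has no predecessor and is always flagged.
template <typename FlagOp>
std::vector<int> flag_heads_default(const std::vector<int>& input, FlagOp flag_op) {
  std::vector<int> flags(input.size(), 0);
  for (std::size_t i = 0; i < input.size(); ++i) {
    flags[i] = (i == 0) ? 1 : (flag_op(input[i - 1], input[i]) ? 1 : 0);
  }
  return flags;
}

// Sequential model of the variant that takes a tile predecessor:
// the first item is compared against tile_predecessor_item.
template <typename FlagOp>
std::vector<int> flag_heads_with_predecessor(const std::vector<int>& input,
                                             FlagOp flag_op,
                                             int tile_predecessor_item) {
  std::vector<int> flags(input.size(), 0);
  for (std::size_t i = 0; i < input.size(); ++i) {
    const int prev = (i == 0) ? tile_predecessor_item : input[i - 1];
    flags[i] = flag_op(prev, input[i]) ? 1 : 0;
  }
  return flags;
}
```

Under these assumptions, the input `{0, 0, 1, 1}` yields flags `{1, 0, 1, 0}` with the default model and `{0, 0, 1, 0}` with a predecessor of 0. Only the second result marks the single position where the value actually becomes `true`.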
CUDA 12+ does not have this issue because we are using the new
`SubtractLeft` API instead of `FlagHeads`.
1 parent aaa929a commit 58d3593
1 file changed
Lines changed: 1 addition & 1 deletion (line 121)