[Question] Correct implementation of Fused Conv3d + Bias on SM80 Tensor Cores #2958

Crinton · 2026-01-14T13:28:31Z


I am trying to implement a fused Conv3d (NDHWC) + Bias operator using CUTLASS on an Ampere (SM80) GPU with Tensor Cores.

The goal is to achieve: Output = alpha * Conv3d(Input, Weight) + Bias.
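For reference, the intended per-element result can be written down in plain C++ without any CUTLASS types. This is only a minimal sketch: `apply_bias_epilogue` is a hypothetical helper name, `acc` is assumed to hold the raw convolution result in packed NDHWC order, and `bias` is assumed to have one entry per output channel. Because the channel is the innermost dimension in NDHWC, the channel index of a flat element `i` is `i % K`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference epilogue (not a CUTLASS API):
//   out[i] = alpha * acc[i] + bias[channel(i)]
// acc is assumed to be a packed NDHWC buffer, so the channel is the
// fastest-varying index and channel(i) == i % K, with K = bias.size().
std::vector<float> apply_bias_epilogue(const std::vector<float>& acc,
                                       const std::vector<float>& bias,
                                       float alpha) {
    const std::size_t K = bias.size();
    std::vector<float> out(acc.size());
    for (std::size_t i = 0; i < acc.size(); ++i) {
        out[i] = alpha * acc[i] + bias[i % K];
    }
    return out;
}
```

Running the no-bias kernel output through a helper like this gives a host-side reference to compare the fused kernel against.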

Without bias, the results match torch.conv3d exactly. With bias enabled, the output is wrong, and the mismatch is not accumulated rounding error.

The Problem

Implicit broadcast (stride 0): I set Tensor C's strides to 0 so that it behaves like a per-channel bias vector. This works for Conv2d in the official examples but fails for Conv3d, possibly because of how the 5D coordinates are mapped.
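To make the stride-0 idea concrete, here is a small standalone sketch of the address arithmetic it relies on. It is not CUTLASS code: `Coord5`, `ndhwc_offset`, and `packed_strides` are hypothetical names, and the stride order (n, d, h, w) is an assumption of this sketch, not necessarily the order a library layout uses internally. The channel stride is taken to be 1. With all four strides set to 0, every (n, d, h, w) position maps to the same K-element vector, which is the broadcast behaviour a bias needs.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 5D coordinate for an NDHWC tensor.
struct Coord5 {
    std::size_t n, d, h, w, c;
};

// Strides are given in elements for (n, d, h, w); the channel stride is 1.
std::size_t ndhwc_offset(const Coord5& p,
                         const std::array<std::size_t, 4>& stride_ndhw) {
    return p.n * stride_ndhw[0] + p.d * stride_ndhw[1] +
           p.h * stride_ndhw[2] + p.w * stride_ndhw[3] + p.c;
}

// Strides of a densely packed NDHWC tensor with extents D, H, W, C.
std::array<std::size_t, 4> packed_strides(std::size_t D, std::size_t H,
                                          std::size_t W, std::size_t C) {
    return {D * H * W * C, H * W * C, W * C, C};
}
```

With `packed_strides`, each coordinate gets its own element. With `{0, 0, 0, 0}`, the offset collapses to `p.c`, so the tensor reads as `bias[c]` everywhere. If a Conv3d path produces wrong results with zero strides, one thing worth checking is whether all four spatial and batch strides are really zero at the point where the epilogue computes its offsets.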

Request
Could the maintainers provide a clear example or confirm the correct Arguments structure for Conv3d + Bias on SM80 Tensor Cores? Specifically:

Is LinearCombination fully supported for OpClassTensorOp in 3D convolutions?

What is the correct way to pass the Bias vector?

Replies: 0 comments

Select a reply

Uh oh!

[Question] Correct implementation of Fused Conv3d + Bias on SM80 Tensor Cores #2958

Uh oh!

Crinton Jan 14, 2026

Replies: 0 comments

Crinton
Jan 14, 2026