NVBit loops infinitely when instrumenting a cuDNN function (#146)

I am trying to use NVBit to instrument the following TensorFlow program.
```python
import tensorflow as tf
from keras import layers
import os
os.environ["TF_DISABLE_RZ_CHECK"] = "1"
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"
tf.keras.backend.set_image_data_format('channels_first')
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
tf.config.run_functions_eagerly(True)

tensor = tf.zeros([1, 2, 859043])
model = layers.Conv1D(filters=2, kernel_size=524287, strides=1, groups=2)
model(tensor)

print("DONE")
```

It gets stuck after launching the following kernel, which is a cuDNN kernel.
```
MEMTRACE: CTX 0x00000000050f8db0 - LAUNCH - Kernel pc 0x00007ff9a038f900 - Kernel name sm80_xmma_fprop_implicit_gemm_indexed_tf32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize1x4x1_g16_tensor16x8x8_execute_kernel__5x_cudnn - grid launch id 12 - grid size 1,5231,1 - block size 128,1,1 - nregs 166 - shmem 132096 - cuda stream id 1276264096
```

Also, after viewing the memory access pattern produced by NVBit, I found that a single address is accessed multiple times by the same CTA/warp. Thus, I suspect that NVBit introduces some sort of infinite loop.
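For reference, here is a rough sketch of how the repeated accesses can be counted from the trace. It assumes each access line looks roughly like `... - CTA x,y,z - warp N - OPCODE - 0xADDR 0xADDR ...`; that line shape, the regular expression, and the function names `count_repeated_accesses` and `hottest` are my own assumptions for illustration, not something NVBit guarantees, so adjust them to the actual tool output. Zero addresses are skipped on the assumption that they mark inactive lanes.

```python
import re
from collections import Counter

# Assumed (hypothetical) shape of one access line in the trace:
#   "... - CTA 0,3,0 - warp 2 - LDG.E - 0x10 0x10 0x20 "
# LAUNCH lines have no "CTA ... - warp ..." part and are ignored.
LINE_RE = re.compile(
    r"CTA (?P<cta>\d+,\d+,\d+) - warp (?P<warp>\d+) - (?P<op>\S+) - "
    r"(?P<addrs>(?:0x[0-9a-fA-F]+\s*)+)"
)


def count_repeated_accesses(lines):
    """Count, per (CTA, warp, address), how many traced instructions touch it."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not an access line (e.g. a LAUNCH line)
        cta, warp = m.group("cta"), int(m.group("warp"))
        # Deduplicate within a line so several lanes hitting the same
        # address in one instruction count only once.
        for text in set(m.group("addrs").split()):
            addr = int(text, 16)
            if addr == 0:
                continue  # assumption: zero means the lane was inactive
            counts[(cta, warp, addr)] += 1
    return counts


def hottest(counts, n=5):
    """Return the n most frequently accessed (CTA, warp, address) keys."""
    return counts.most_common(n)
```

Calling `hottest(count_repeated_accesses(open("trace.log")))` on a saved trace (the file name is a placeholder) lists the most frequently touched addresses per warp. Repeated accesses alone do not prove an infinite loop, since a convolution kernel can legitimately reuse data, but a count that keeps growing for as long as the trace runs would support the suspicion.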

The program is OK if run without NVBit/compute-sanitizer; it finishes in a minute.

But it fails when instrumented with either NVBit or compute-sanitizer.

Since both cuDNN and compute-sanitizer belong to NVIDIA, I thought perhaps you could help find the root cause.
