Fix rawBufferVectorLoad/Store to widen min precision types to 32-bit #8274

Draft
alsepkow wants to merge 1 commit into microsoft:main from alsepkow:user/alsepkow/fix-min-precision-vector-load

alsepkow commented Mar 18, 2026

Summary

Fixes RawBufferVectorLoad/Store to use 32-bit element types (i32/f32) for min precision types (min16int, min16uint, min16float) instead of 16-bit (i16/f16). This matches how pre-SM6.9 RawBufferLoad handles min precision.

Resolves #8273

Root Cause

TranslateBufLoad in HLOperationLower.cpp creates the vector type directly from the min precision element type (i16/f16) without widening it to i32/f32. As a result, WARP (and potentially other drivers) loads and stores 2 bytes per element instead of 4, which mismatches the buffer layout when the CPU writes 32-bit values.
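
To make the layout mismatch concrete, here is a minimal host-side sketch in plain C++. It does not use DXC or any driver code; the helper names (`fillBuffer32`, `readStride2`, `readStride4ThenTrunc`) are hypothetical and only model the byte strides described above, assuming the CPU writes one 32-bit value per element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: models the CPU filling a raw buffer with one
// 32-bit value per element.
std::vector<std::uint8_t> fillBuffer32(const std::vector<std::int32_t> &values) {
  std::vector<std::uint8_t> bytes(values.size() * sizeof(std::int32_t));
  if (!values.empty())
    std::memcpy(bytes.data(), values.data(), bytes.size());
  return bytes;
}

// Models a 16-bit vector load (e.g. v3i16): consecutive elements are
// assumed to sit 2 bytes apart.
std::vector<std::int16_t> readStride2(const std::vector<std::uint8_t> &bytes,
                                      std::size_t count) {
  assert(count * sizeof(std::int16_t) <= bytes.size());
  std::vector<std::int16_t> out(count);
  for (std::size_t i = 0; i < count; ++i)
    std::memcpy(&out[i], bytes.data() + i * sizeof(std::int16_t),
                sizeof(std::int16_t));
  return out;
}

// Models a 32-bit vector load (e.g. v3i32) followed by a per-element
// truncation to 16 bits.
std::vector<std::int16_t> readStride4ThenTrunc(const std::vector<std::uint8_t> &bytes,
                                               std::size_t count) {
  assert(count * sizeof(std::int32_t) <= bytes.size());
  std::vector<std::int16_t> out(count);
  for (std::size_t i = 0; i < count; ++i) {
    std::int32_t wide;
    std::memcpy(&wide, bytes.data() + i * sizeof(std::int32_t), sizeof(wide));
    out[i] = static_cast<std::int16_t>(wide); // values here fit in 16 bits
  }
  return out;
}
```

For a buffer filled from `{1, 2, 3}`, `readStride4ThenTrunc(bytes, 3)` recovers `{1, 2, 3}`, while `readStride2(bytes, 3)` yields `{1, 0, 2}` on a little-endian host, because every second 16-bit slot holds the upper half of a 32-bit value.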

Fix

Apply the same widening pattern used for bool types:

  • Load: Load as v_i32/v_f32, then trunc/fptrunc back to i16/half
  • Store: sext/fpext to i32/f32, then store as v_i32/v_f32
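
The two steps above can be sketched as a scalar model in plain C++. This is not the code from HLOperationLower.cpp; the type aliases and function names are hypothetical, and only the integer path (sext on store, trunc on load) is modeled. The float path would follow the same shape with fpext/fptrunc, but standard C++ has no portable half type to illustrate it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 3-element min precision vector and its
// widened 32-bit form.
using Vec3i16 = std::array<std::int16_t, 3>;
using Vec3i32 = std::array<std::int32_t, 3>;

// Store path: sign-extend each element (models sext i16 -> i32) so the
// buffer receives 4 bytes per element.
Vec3i32 widenForStore(const Vec3i16 &v) {
  Vec3i32 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = static_cast<std::int32_t>(v[i]);
  return out;
}

// Load path: the buffer is read as 32-bit elements, then each element is
// narrowed back (models trunc i32 -> i16).
Vec3i16 narrowAfterLoad(const Vec3i32 &v) {
  Vec3i16 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = static_cast<std::int16_t>(v[i]);
  return out;
}
```

Any value representable in 16 bits survives the round trip `narrowAfterLoad(widenForStore(v))` unchanged, which is the property the widening pattern relies on.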

Testing

Added FileCheck test verifying all 3 min precision types produce i32/f32 vector load/store ops.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

RawBufferVectorLoad/Store for min precision types (min16int, min16uint,
min16float) was emitting i16/f16 vector operations (e.g., v3i16) which
causes WARP and potentially other drivers to load/store 2 bytes per
element instead of 4. This mismatches the buffer layout when the CPU
writes 32-bit values.

Pre-SM6.9 RawBufferLoad correctly handles this by loading as i32/f32
and truncating. Apply the same pattern for SM6.9 vector variants:
- RawBufferVectorLoad: load as v_i32/v_f32, then trunc/fptrunc to i16/half
- RawBufferVectorStore: sext/fpext to i32/f32, then store as v_i32/v_f32

This matches the existing bool widening pattern already in TranslateBufLoad.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Linked issue: rawBufferVectorLoad/Store emits i16/f16 for min precision types instead of i32/f32