
fix: unaligned memcpy #1004

Open
greenhat wants to merge 25 commits into next from i1003-fix-byte-memcpy

Conversation

@greenhat
Contributor

@greenhat greenhat commented Mar 12, 2026

Close #1003

Summary

  • factor the counted loop setup used by memcpy/memset and add regression coverage for aligned and unaligned byte copies/sets, zero-length operations, and unaligned u16/i16 memory accesses
  • tighten MASM memory emission for byte-oriented memcpy fast paths so they only convert byte pointers to element addresses when the inputs are word-aligned
  • handle unaligned 16-bit loads and stores that cross a 32-bit element boundary by routing offset == 3 through the split-word intrinsics while preserving the existing within-element path for offset <= 2
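The word-alignment guard described in the second bullet can be sketched as a small predicate. This is a model with assumed 4-byte elements and hypothetical names, not the emitter's actual code: the fast path may convert byte pointers to element addresses only when the destination, source, and count are all element-aligned.

```rust
// Model of the memcpy fast-path guard (hypothetical helper; the real
// emitter works on MASM, not Rust values). Assumes 4 bytes per element.
const ELEMENT_BYTES: u32 = 4;

/// Returns `(dst_elem, src_elem, elem_count)` when the copy can take
/// the element-addressed fast path, or `None` to force the byte loop.
fn element_fast_path(dst: u32, src: u32, count: u32) -> Option<(u32, u32, u32)> {
    let aligned = |x: u32| x % ELEMENT_BYTES == 0;
    if aligned(dst) && aligned(src) && aligned(count) {
        Some((dst / ELEMENT_BYTES, src / ELEMENT_BYTES, count / ELEMENT_BYTES))
    } else {
        None
    }
}

fn main() {
    // Fully aligned: convert to element addresses.
    assert_eq!(element_fast_path(8, 16, 12), Some((2, 4, 3)));
    // A misaligned count alone is enough to require the byte fallback.
    assert_eq!(element_fast_path(8, 16, 13), None);
    assert_eq!(element_fast_path(9, 16, 12), None);
    println!("ok");
}
```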

I suggest reviewing commit by commit, skipping the uninteresting ones (refactors, etc.).

@greenhat greenhat changed the title fix: byte-version of the memcpy (iteration guard) fix: byte-version of the unaligned memcpy Mar 12, 2026
Fix the byte-addressed memory paths that cross a 32-bit element boundary.
This keeps the `memcpy`/`memset` fallback coverage added in this branch
working for short unaligned copies, including scalarized `u16` loads and
stores at byte offset 3.
Zero-length memory operations must be no-ops, but both loop headers
seeded `while.true` with `count >= 0`, which executes one iteration
when `count == 0`.

Switch the entry condition to a strict unsigned `count > 0` check and
add regressions for zero-count unaligned copy/set paths.
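The off-by-one can be modeled in a few lines. This is a sketch of the counted-loop protocol, not the emitter itself: the seed condition is evaluated once before `while.true`, and the back edge re-checks after decrementing.

```rust
// Model of the counted-loop entry bug. A signed counter is used here
// purely so the buggy variant terminates in the model.
fn copied_bytes(count: i64, strict_seed: bool) -> i64 {
    let mut remaining = count;
    let mut copied = 0;
    // Seed: buggy `count >= 0` vs. fixed strict `count > 0`.
    let mut cond = if strict_seed { remaining > 0 } else { remaining >= 0 };
    while cond {
        copied += 1;          // body: copy/set one byte
        remaining -= 1;
        cond = remaining > 0; // back-edge check (unchanged by the fix)
    }
    copied
}

fn main() {
    // Buggy seed: a zero-length memcpy/memset still runs one iteration.
    assert_eq!(copied_bytes(0, false), 1);
    // Strict `count > 0` seed: zero-length is a no-op.
    assert_eq!(copied_bytes(0, true), 0);
    // Non-zero counts behave identically under both seeds.
    assert_eq!(copied_bytes(3, false), 3);
    assert_eq!(copied_bytes(3, true), 3);
    println!("ok");
}
```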
The unaligned `u16` regressions assert the compiler's memory layout, so they must not depend on the host's endianness.

Use `to_le_bytes()` in the expected byte construction to keep the tests
portable and aligned with the byte-addressable memory model.
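A minimal illustration of the portable expectation, assuming (as the commit above states) that the byte-addressable memory model is little-endian regardless of the host:

```rust
fn main() {
    let value: u16 = 0xBEEF;
    // Build the expected byte image explicitly in the model's
    // (little-endian) order, independent of the host.
    let expected: [u8; 2] = value.to_le_bytes();
    assert_eq!(expected, [0xEF, 0xBE]);
    // A host-dependent construction such as `value.to_ne_bytes()`
    // would flip on a big-endian host and break the test there.
    println!("ok");
}
```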
`memset` and fallback `memcpy` were carrying separate copies of the
same counted `while.true` control flow, which makes fixes easy to miss
in one path.

Extract the shared loop header and back-edge emission so the counted
loop protocol is defined once and reused by both sites.
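The shape of that refactor can be sketched with a toy op stream (hypothetical types; the real emitter produces MASM): one helper owns the seed and back-edge emission, and each call site supplies only the loop body.

```rust
// Toy model of sharing the counted-loop protocol between memset and
// the fallback memcpy. `Op` stands in for emitted MASM instructions.
#[derive(Debug, PartialEq, Clone)]
enum Op {
    EntryCheckGtZero, // seed: strict unsigned `count > 0`
    WhileTrue,
    Body(&'static str),
    DecrementCount,
    BackEdgeCheckGtZero,
    End,
}

/// Defines the counted-loop protocol once; callers provide the body.
fn emit_counted_loop(body: &[Op]) -> Vec<Op> {
    let mut ops = vec![Op::EntryCheckGtZero, Op::WhileTrue];
    ops.extend_from_slice(body);
    ops.push(Op::DecrementCount);
    ops.push(Op::BackEdgeCheckGtZero);
    ops.push(Op::End);
    ops
}

fn main() {
    let memset = emit_counted_loop(&[Op::Body("store byte")]);
    let memcpy = emit_counted_loop(&[Op::Body("load byte"), Op::Body("store byte")]);
    // Both sites now inherit the same entry and back-edge conditions,
    // so a fix to the protocol cannot miss one of them.
    assert_eq!(memset.first(), Some(&Op::EntryCheckGtZero));
    assert_eq!(memcpy.first(), Some(&Op::EntryCheckGtZero));
    assert_eq!(memset.last(), Some(&Op::End));
    println!("ok");
}
```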
@greenhat greenhat force-pushed the i1003-fix-byte-memcpy branch from 3f5b5d0 to 30b783a on March 17, 2026 05:02
greenhat added 20 commits March 17, 2026 07:29
Only offset 3 spans two elements for a `u16` load/store. Route the
other unaligned offsets through the existing single-element logic so we
don't spuriously touch `addr + 1` at the end of memory.
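The routing decision reduces to simple offset arithmetic. A sketch with hypothetical names, assuming the 4-byte elements described above: a `u16` at byte offset `o` occupies bytes `o` and `o + 1`, so only `o == 3` spills into the next element.

```rust
// Model of the unaligned-u16 routing (hypothetical helper names).
const ELEMENT_BYTES: u32 = 4;

#[derive(Debug, PartialEq)]
enum U16LoadPath {
    /// Both bytes fit in one 32-bit element (offsets 0, 1, 2).
    WithinElement { element: u32, offset: u32 },
    /// Offset 3 straddles `first_element` and `first_element + 1`,
    /// so it must go through the split-word intrinsics.
    SplitAcrossElements { first_element: u32 },
}

fn classify_u16_load(byte_addr: u32) -> U16LoadPath {
    let element = byte_addr / ELEMENT_BYTES;
    let offset = byte_addr % ELEMENT_BYTES;
    if offset <= 2 {
        U16LoadPath::WithinElement { element, offset }
    } else {
        U16LoadPath::SplitAcrossElements { first_element: element }
    }
}

fn main() {
    // Offsets 0..=2 stay on the single-element path.
    assert_eq!(classify_u16_load(4), U16LoadPath::WithinElement { element: 1, offset: 0 });
    assert_eq!(classify_u16_load(6), U16LoadPath::WithinElement { element: 1, offset: 2 });
    // Only offset 3 crosses the element boundary.
    assert_eq!(classify_u16_load(7), U16LoadPath::SplitAcrossElements { first_element: 1 });
    println!("ok");
}
```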
Add regression cases for byte offsets 1 and 2 in the integration suite,
and add emitter-level tests that exercise unaligned `load_imm` and
`store_imm` for `u16` addresses.
Cover the aligned byte-copy fast path, plus a case where only `count` is
misaligned, so that the fast-path predicate itself is regression-tested.
@greenhat greenhat marked this pull request as ready for review March 18, 2026 14:03
@greenhat greenhat changed the title fix: byte-version of the unaligned memcpy fix: unaligned memcpy Mar 18, 2026
@greenhat greenhat requested a review from bitwalker March 18, 2026 14:04


Development

Successfully merging this pull request may close these issues.

Fix byte-version of memcpy
