memcpy (iteration guard)memcpy
Fix the byte-addressed memory paths that cross a 32-bit element boundary. This keeps the `memcpy`/`memset` fallback coverage added in this branch working for short unaligned copies, including scalarized `u16` loads and stores at byte offset 3.
Zero-length memory operations must be no-ops, but both loop headers seeded `while.true` with `count >= 0`, which executes one iteration when `count == 0`. Switch the entry condition to a strict unsigned `count > 0` check and add regressions for zero-count unaligned copy/set paths.
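The off-by-one can be reproduced with a small host-side model. This is only a sketch: `run_counted_loop` and its `seed` parameter are hypothetical names, and the back-edge re-check (`remaining > 0` after a saturating decrement) is an assumption about how the counted loop behaves, not the emitter's actual code.

```rust
/// Models a counted `while.true` loop (hypothetical helper, not emitter code).
/// `seed` is the entry condition evaluated once before the first iteration;
/// the back-edge is assumed to re-check `remaining > 0` after each iteration.
fn run_counted_loop(count: u32, seed: impl Fn(u32) -> bool) -> u32 {
    let mut iterations = 0;
    let mut remaining = count;
    let mut cond = seed(remaining);
    while cond {
        iterations += 1;
        // Saturating so the model stays well-defined when the buggy seed
        // lets the body run with `remaining == 0`.
        remaining = remaining.saturating_sub(1);
        cond = remaining > 0;
    }
    iterations
}

/// Old entry condition: `count >= 0` is always true for an unsigned count.
fn buggy_seed(_count: u32) -> bool {
    true
}

/// New entry condition: strict `count > 0`.
fn fixed_seed(count: u32) -> bool {
    count > 0
}
```

Under this model, `run_counted_loop(0, buggy_seed)` runs the body once, while `run_counted_loop(0, fixed_seed)` runs it zero times. Non-zero counts behave the same under both seeds.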
The unaligned `u16` regressions are asserting compiler memory layout, so they should not depend on the host endianness. Use `to_le_bytes()` in the expected byte construction to keep the tests portable and aligned with the byte-addressable memory model.
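As an illustration of the portable expectation, the sketch below builds the expected bytes for a `u16` store into a zeroed window covering two 32-bit elements. The helper name `expected_bytes_after_store` and the 8-byte window are assumptions for this example, not the actual test code.

```rust
/// Expected contents of an 8-byte window (two 32-bit elements, initially
/// zero) after storing `value` at `byte_offset`. Hypothetical test helper.
/// Panics if `byte_offset > 6`, since the value would not fit in the window.
fn expected_bytes_after_store(value: u16, byte_offset: usize) -> [u8; 8] {
    let mut bytes = [0u8; 8];
    // `to_le_bytes` yields the same bytes on every host, unlike
    // `to_ne_bytes` or a raw transmute.
    bytes[byte_offset..byte_offset + 2].copy_from_slice(&value.to_le_bytes());
    bytes
}
```

For example, `expected_bytes_after_store(0xBEEF, 3)` places `0xEF` at index 3 and `0xBE` at index 4, regardless of host endianness.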
`memset` and fallback `memcpy` were carrying separate copies of the same counted `while.true` control flow, which makes fixes easy to miss in one path. Extract the shared loop header and back-edge emission so the counted loop protocol is defined once and reused by both sites.
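The shape of that refactor might look roughly like the following. Everything here is illustrative: the `Op` variants are abstract placeholders rather than real emitter instructions, and the function names are hypothetical.

```rust
/// Abstract loop operations; placeholder names, not real instructions.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    PushCountGtZero,
    WhileTrue,
    Body(&'static str),
    DecrementCount,
    End,
}

/// Shared loop header: seed the loop with a strict `count > 0` check.
fn emit_loop_header(ops: &mut Vec<Op>) {
    ops.push(Op::PushCountGtZero);
    ops.push(Op::WhileTrue);
}

/// Shared back-edge: decrement, re-check, and close the loop.
fn emit_loop_back_edge(ops: &mut Vec<Op>) {
    ops.push(Op::DecrementCount);
    ops.push(Op::PushCountGtZero);
    ops.push(Op::End);
}

/// Both call sites differ only in the loop body they emit.
fn emit_memset(ops: &mut Vec<Op>) {
    emit_loop_header(ops);
    ops.push(Op::Body("store byte"));
    emit_loop_back_edge(ops);
}

fn emit_memcpy_fallback(ops: &mut Vec<Op>) {
    emit_loop_header(ops);
    ops.push(Op::Body("copy byte"));
    emit_loop_back_edge(ops);
}
```

With the header and back-edge defined once, a change to the entry condition reaches both `memset` and the fallback `memcpy` automatically.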
3f5b5d0 to 30b783a
Only offset 3 spans two elements for a `u16` load/store. Route the other unaligned offsets through the existing single-element logic so we don't spuriously touch `addr + 1` at the end of memory.
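A host-side sketch of the offset dispatch, assuming memory is modeled as a slice of 32-bit elements with little-endian byte order inside each element. The names `spans_two_elements` and `load_u16` are hypothetical and do not correspond to the compiler's intrinsics.

```rust
/// True when an access of `size` bytes starting at `byte_offset` within a
/// 4-byte element crosses into the next element.
fn spans_two_elements(byte_offset: usize, size: usize) -> bool {
    byte_offset % 4 + size > 4
}

/// Loads a `u16` from byte-addressed memory backed by 32-bit elements.
/// Only the offset-3 case reads the following element.
fn load_u16(mem: &[u32], byte_addr: usize) -> u16 {
    let elem = byte_addr / 4;
    let offset = byte_addr % 4;
    if !spans_two_elements(offset, 2) {
        // Offsets 0..=2: both bytes live in the same element.
        (mem[elem] >> (8 * offset)) as u16
    } else {
        // Offset 3: low byte is the top byte of this element,
        // high byte is the bottom byte of the next one.
        let lo = (mem[elem] >> 24) as u16;
        let hi = (mem[elem + 1] & 0xFF) as u16;
        lo | (hi << 8)
    }
}
```

With a one-element memory such as `[0x4433_2211]`, loads at offsets 0, 1, and 2 succeed without indexing `mem[1]`, which is the property the fix restores at the end of memory.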
Add regression cases for byte offsets 1 and 2 in the integration suite, and add emitter-level tests that exercise unaligned `load_imm` and `store_imm` for `u16` addresses.
Cover the aligned byte-copy fast path and a case where only `count` is misaligned so the fast-path predicate is regression-tested as well.
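For reference, a predicate of the kind these tests exercise could be sketched as below. The function name and the exact set of checks are assumptions: the sketch takes the fast path only when the source pointer, destination pointer, and byte count are all multiples of 4.

```rust
/// Hypothetical fast-path predicate: byte pointers may be converted to
/// element addresses only when every input is 4-byte aligned.
fn can_use_element_fast_path(src: u32, dst: u32, count: u32) -> bool {
    src % 4 == 0 && dst % 4 == 0 && count % 4 == 0
}
```

The case where only `count` is misaligned (for example `src = 8`, `dst = 16`, `count = 6`) must fall back to the byte-wise loop, which is why it gets its own regression test.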
Close #1003
Summary
- … `memcpy`/`memset` and add regression coverage for aligned and unaligned byte copies/sets, zero-length operations, and unaligned `u16`/`i16` memory accesses
- … `memcpy` fast paths so they only convert byte pointers to element addresses when the inputs are word-aligned
- … `offset == 3` through the split-word intrinsics while preserving the existing within-element path for `offset <= 2`

I suggest reviewing on a per-commit basis, skipping uninteresting commits (refactors, etc.).