compile: fail gracefully instead of SIGBUS on truncated standalone binary#29665
compile: fail gracefully instead of SIGBUS on truncated standalone binary#29665dylan-conway wants to merge 1 commit into
Conversation
…truncated Since #26923, Linux standalone executables read their embedded module graph via a PT_LOAD segment that the kernel maps at execve. The kernel does not validate p_offset + p_filesz against st_size, so if the binary on disk is shorter than the segment header claims (incomplete download, partial copy), the mapping succeeds and the process SIGBUSes on first access to a page past EOF — typically while reading the trailer in StandaloneModuleGraph.fromExecutable. Before #26923 the same condition surfaced as a read() error. Restore that gracefulness: locate the segment via getauxval(AT_PHDR/AT_PHNUM), stat /proc/self/exe, and exit with a clear "binary is incomplete, re-download" message when the file is shorter than required. Also bound-check the embedded length header against the segment so a corrupt header cannot walk off the mapping. stat() needs no read permission, so the chmod-111 case that motivated the original change is unaffected; if /proc or auxv is unavailable the check is skipped rather than failing closed.
|
Updated 12:05 AM PT - Apr 24th, 2026
❌ @dylan-conway, your commit 565e67c has 1 failures in
🧪 To try this PR locally: bunx bun-pr 29665That installs a local version of the PR into your bun-29665 --bun |
WalkthroughThe changes add truncation detection to the ELF standalone loader. The loader now validates embedded module data integrity using Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/StandaloneModuleGraph.zig`:
- Around line 225-238: The call to reportTruncated(0, vaddr) incorrectly passes
an in-memory virtual address (vaddr) as the required file size; update the call
site so reportTruncated receives a meaningful required-bytes value (or 0 if the
precise file-size requirement cannot be determined) instead of vaddr. Locate the
call that passes vaddr and change the second argument to the correct
file-offset/size (or 0) and ensure reportTruncated(have: u64, required: u64)
continues to log "have" and "required" as file byte counts rather than virtual
addresses.
- Around line 191-198: The code currently calls std.mem.readInt(u64,
target[0..8], .little) (via target and payload_len) before proving the 8-byte
header is inside the segment; move or add a guard that computes the available
bytes in the segment (using seg.p_memsz, seg.p_vaddr and vaddr as used in
seg_avail) and require seg_avail >= 8 before dereferencing target[0..8]; only
after that safe-check read the 8-byte length header into payload_len and then
perform the existing payload_len > seg_avail clamp check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a909b21f-aae0-41af-8a56-c01fb2b87d0b
📒 Files selected for processing (2)
src/StandaloneModuleGraph.zigtest/bundler/bun-build-compile.test.ts
| const target: [*]const u8 = @ptrFromInt(vaddr); | ||
| const payload_len = std.mem.readInt(u64, target[0..8], .little); | ||
| if (payload_len < 8) return null; | ||
|
|
||
| // Clamp to the segment so a corrupted length header cannot walk | ||
| // off the end of the mapping into adjacent (or absent) pages. | ||
| const seg_avail = seg.p_memsz -| (vaddr - seg.p_vaddr) -| 8; | ||
| if (payload_len > seg_avail) { |
There was a problem hiding this comment.
Guard the 8-byte length header before readInt().
Line 192 dereferences target[0..8] before proving that vaddr has 8 bytes left inside seg. If BUN_COMPILED is stale and lands in the last few bytes of a PT_LOAD, this can still fault before the later payload_len > seg_avail check runs.
🔧 Suggested fix
const target: [*]const u8 = `@ptrFromInt`(vaddr);
+ const seg_offset = vaddr - seg.p_vaddr;
+ if (seg_offset > seg.p_memsz or seg.p_memsz - seg_offset < `@sizeOf`(u64)) {
+ reportTruncated(
+ seg.p_offset + `@min`(seg.p_filesz, seg_offset),
+ seg.p_offset + seg_offset + `@sizeOf`(u64),
+ );
+ }
const payload_len = std.mem.readInt(u64, target[0..8], .little);
if (payload_len < 8) return null;
// Clamp to the segment so a corrupted length header cannot walk
// off the end of the mapping into adjacent (or absent) pages.
- const seg_avail = seg.p_memsz -| (vaddr - seg.p_vaddr) -| 8;
+ const seg_avail = seg.p_memsz - seg_offset - `@sizeOf`(u64);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/StandaloneModuleGraph.zig` around lines 191 - 198, The code currently
calls std.mem.readInt(u64, target[0..8], .little) (via target and payload_len)
before proving the 8-byte header is inside the segment; move or add a guard that
computes the available bytes in the segment (using seg.p_memsz, seg.p_vaddr and
vaddr as used in seg_avail) and require seg_avail >= 8 before dereferencing
target[0..8]; only after that safe-check read the 8-byte length header into
payload_len and then perform the existing payload_len > seg_avail clamp check.
| // vaddr is set but lies outside every PT_LOAD: the segment was | ||
| // dropped (e.g. by an aggressive ELF rewriter) while BUN_COMPILED | ||
| // kept its stale pointer. Treat as truncation. | ||
| reportTruncated(0, vaddr); | ||
| } | ||
|
|
||
| fn reportTruncated(have: u64, required: u64) noreturn { | ||
| Output.errGeneric( | ||
| "this single-file executable is incomplete — its embedded module data is not fully present on disk (have {d} bytes, need at least {d}).\n" ++ | ||
| " This usually means the binary was only partially downloaded or copied. Re-download or reinstall and try again.", | ||
| .{ have, required }, | ||
| ); | ||
| bun.Global.exit(1); | ||
| } |
There was a problem hiding this comment.
Don’t format a virtual address as a required file size.
Line 228 passes vaddr into reportTruncated(), but vaddr is an in-memory address, not a byte offset in /proc/self/exe. On this path the error can report absurd “need at least … bytes” numbers instead of a meaningful corruption message.
🔧 Suggested fix
// vaddr is set but lies outside every PT_LOAD: the segment was
// dropped (e.g. by an aggressive ELF rewriter) while BUN_COMPILED
// kept its stale pointer. Treat as truncation.
- reportTruncated(0, vaddr);
+ Output.errGeneric(
+ "this single-file executable is invalid — its embedded module pointer does not belong to any PT_LOAD segment.",
+ .{},
+ );
+ bun.Global.exit(1);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/StandaloneModuleGraph.zig` around lines 225 - 238, The call to
reportTruncated(0, vaddr) incorrectly passes an in-memory virtual address
(vaddr) as the required file size; update the call site so reportTruncated
receives a meaningful required-bytes value (or 0 if the precise file-size
requirement cannot be determined) instead of vaddr. Locate the call that passes
vaddr and change the second argument to the correct file-offset/size (or 0) and
ensure reportTruncated(have: u64, required: u64) continues to log "have" and
"required" as file byte counts rather than virtual addresses.
| const seg = findLoadSegment(vaddr) orelse { | ||
| // No auxv (static? unusual loader). Fall back to the unchecked | ||
| // path rather than fail closed. | ||
| return getDataUnchecked(vaddr); | ||
| }; | ||
| const required = seg.p_offset + seg.p_filesz; | ||
| switch (bun.sys.stat("/proc/self/exe")) { | ||
| .result => |st| { | ||
| const have: u64 = @intCast(@max(st.size, 0)); | ||
| if (have < required) reportTruncated(have, required); | ||
| }, | ||
| // /proc unavailable (rare: minimal chroot). Best-effort skip. | ||
| .err => {}, | ||
| } |
There was a problem hiding this comment.
🟡 hintSourcePagesDontNeed() (called at src/bun.js.zig:469) also goes through ELF.getData(), so after this change every standalone-binary startup does a second redundant auxv/phdr walk + stat("/proc/self/exe"), and a function whose own comment says it's a "best-effort hint" (and which deliberately avoids std.posix.madvise's unreachable) can now in principle hit reportTruncated() → exit(1). Consider having it reuse the already-validated Instance.instance.?.bytes (or cache the slice) instead of re-running the validation.
Extended reasoning...
What changed for the second caller
ELF.getData() has two call sites: fromExecutable() at process start, and hintSourcePagesDontNeed() at src/bun.js.zig:469 (fired right after loadEntryPoint() returns). Before this PR getData() was a pure pointer-dereference of the kernel-mapped segment. After this PR it additionally calls getauxval(AT_PHDR)/getauxval(AT_PHNUM), walks every program header, calls bun.sys.stat("/proc/self/exe"), and may call reportTruncated() → bun.Global.exit(1). hintSourcePagesDontNeed() inherits all of that.
Why it's worth changing
hintSourcePagesDontNeed() is explicitly documented as a best-effort kernel hint — it even goes out of its way to call libc madvise directly so that an unexpected errno doesn't hit std.posix.madvise's unreachable. Having that same function transitively reach a noreturn exit(1) is at odds with its own contract. And the validation work it now repeats is pure waste: by the time the hint runs, fromExecutable() has already walked the phdrs, stat'd /proc/self/exe, validated the segment, and stored the resulting slice in Instance.instance.?.bytes. Re-deriving the same slice from scratch on every standalone startup adds two getauxval calls, an O(phnum) loop, and a stat() syscall for no benefit.
Step-by-step
- Standalone binary starts;
fromExecutable()callsELF.getData()→ walks auxv/phdrs, stats/proc/self/exe, validates, returnself_bytes; graph is built andInstance.instance.?.bytes = elf_bytes[0..byte_count]. loadEntryPoint()runs the entrypoint's synchronous prelude.- src/bun.js.zig:469 calls
hintSourcePagesDontNeed()→ on Linux this callsELF.getData()again →getauxval(AT_PHDR)+getauxval(AT_PHNUM)+ phdr loop +bun.sys.stat("/proc/self/exe")all re-run; if the size check orfindLoadSegmentfalls through,reportTruncated()→exit(1)from inside the "best-effort" hint. - In the normal case the redundant work succeeds and
madvise(DONTNEED)is issued on the same range step 1 already computed.
On the "can it actually abort?" objection
It's fair to say the abort path is very hard to reach in practice: the hint fires synchronously during startup (not at arbitrary runtime), Linux returns ETXTBSY on in-place writes to a running executable's inode, and /proc/self/exe resolves to the original exec'd inode so atomic-rename deploys still stat the old size. And if the backing file were somehow truncated in place, the mmap'd module bytes would SIGBUS on the next page-in anyway, so exit(1) wouldn't be making things worse. So the "a healthy process gets killed" scenario is largely theoretical.
That objection doesn't address the part that does happen on every run, though: the redundant phdr walk + stat() in a function that already has the answer available, and the contract mismatch of a "best-effort hint" gaining a noreturn callee. Both go away with a one-line change.
Suggested fix
Have hintSourcePagesDontNeed() use the slice that was already validated — e.g. const bytes = (Instance.instance orelse return).bytes; on Linux instead of calling ELF.getData() again (or cache the getData() result from the first call). That removes the extra syscalls and keeps the hint genuinely best-effort.
Jarred-Sumner
left a comment
There was a problem hiding this comment.
This is not the correct fix? We should not need to stat or read the executable, right?
What
Since #26923, Linux
bun build --compileoutput reads its embedded module graph from aPT_LOADsegment the kernel maps atexecve. The kernel does not checkp_offset + p_filesz <= st_size, so if the binary on disk is shorter than the program header claims (incomplete download, partial copy, etc.), the mapping succeeds and the process SIGBUSes on first access to a page past EOF — inStandaloneModuleGraph.fromExecutable, while reading the trailer:Before #26923 this surfaced as a
read()error. This PR restores the graceful failure: before touching the mapping, walkgetauxval(AT_PHDR/AT_PHNUM)to find the segment,stat("/proc/self/exe"), and if the file is shorter than the segment requires, print:and exit 1. Also bound-checks the embedded length header against the segment extent so a corrupted header can't walk off the end of the mapping.
stat()needs no read permission, so thechmod 111case that motivated #26923 is unaffected. If/procor auxv is unavailable the check is best-effort skipped.Test plan
bun bd test test/bundler/bun-build-compile.test.ts— newtruncated compiled binary fails gracefully instead of SIGBUStestUSE_SYSTEM_BUN=1 bun test ...reproduces the SIGBUS panic on the same testchmod 111tests still passbun run zig:check-all