Skip to content

compile: fail gracefully instead of SIGBUS on truncated standalone binary#29665

Closed
dylan-conway wants to merge 1 commit into
mainfrom
claude/dreamy-kirch-4c0fd1
Closed

compile: fail gracefully instead of SIGBUS on truncated standalone binary#29665
dylan-conway wants to merge 1 commit into
mainfrom
claude/dreamy-kirch-4c0fd1

Conversation

@dylan-conway

Copy link
Copy Markdown
Member

What

Since #26923, Linux bun build --compile output reads its embedded module graph from a PT_LOAD segment the kernel maps at execve. The kernel does not check p_offset + p_filesz <= st_size, so if the binary on disk is shorter than the program header claims (incomplete download, partial copy, etc.), the mapping succeeds and the process SIGBUSes on first access to a page past EOF — in StandaloneModuleGraph.fromExecutable, while reading the trailer:

panic(main thread): Bus error at address 0x...

Before #26923 this surfaced as a read() error. This PR restores the graceful failure: before touching the mapping, walk getauxval(AT_PHDR/AT_PHNUM) to find the segment, stat("/proc/self/exe"), and if the file is shorter than the segment requires, print:

error: this single-file executable is incomplete — its embedded module data is not fully present on disk (have N bytes, need at least M).
  This usually means the binary was only partially downloaded or copied. Re-download or reinstall and try again.

and exit 1. Also bound-checks the embedded length header against the segment extent so a corrupted header can't walk off the end of the mapping.

stat() needs no read permission, so the chmod 111 case that motivated #26923 is unaffected. If /proc or auxv is unavailable the check is best-effort skipped.

Test plan

  • bun bd test test/bundler/bun-build-compile.test.ts — new truncated compiled binary fails gracefully instead of SIGBUS test
  • USE_SYSTEM_BUN=1 bun test ... reproduces the SIGBUS panic on the same test
  • existing chmod 111 tests still pass
  • bun run zig:check-all

…truncated

Since #26923, Linux standalone executables read their embedded module
graph via a PT_LOAD segment that the kernel maps at execve. The kernel
does not validate p_offset + p_filesz against st_size, so if the binary
on disk is shorter than the segment header claims (incomplete download,
partial copy), the mapping succeeds and the process SIGBUSes on first
access to a page past EOF — typically while reading the trailer in
StandaloneModuleGraph.fromExecutable.

Before #26923 the same condition surfaced as a read() error. Restore
that gracefulness: locate the segment via getauxval(AT_PHDR/AT_PHNUM),
stat /proc/self/exe, and exit with a clear "binary is incomplete,
re-download" message when the file is shorter than required. Also
bound-check the embedded length header against the segment so a corrupt
header cannot walk off the mapping.

stat() needs no read permission, so the chmod-111 case that motivated
the original change is unaffected; if /proc or auxv is unavailable the
check is skipped rather than failing closed.
@robobun

robobun commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator
Updated 12:05 AM PT - Apr 24th, 2026

@dylan-conway, your commit 565e67c has 1 failures in Build #47644 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29665

That installs a local version of the PR into your bun-29665 executable, so you can run:

bun-29665 --bun

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

The changes add truncation detection to the ELF standalone loader. The loader now validates embedded module data integrity using PT_LOAD segment metadata from getauxval(), checks executable file size against computed requirements, and reports incomplete executables with specific error messages. A new Linux-only integration test verifies this detection behavior.

Changes

Cohort / File(s) Summary
ELF Loader Truncation Detection
src/StandaloneModuleGraph.zig
Added segment-based validation for embedded module data using getauxval(AT_PHDR/AT_PHNUM). Validates executable size against minimum computed byte requirements from p_offset + p_filesz, detects truncated executables, and reports "incomplete executable" errors. Includes fallback unchecked read path and helper functions for segment lookup and truncation reporting.
Integration Tests
test/bundler/bun-build-compile.test.ts
Added Linux-only test verifying truncation detection by building a binary, truncating it by 16KB, and asserting the process exits with status 1 and reports specific "single-file executable is incomplete" error messages with empty stdout.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately summarizes the main change: preventing SIGBUS on truncated standalone binaries by implementing graceful failure detection.
Description check ✅ Passed The PR description provides comprehensive coverage of both required template sections: detailed 'What' explaining the problem and solution, and 'Test plan' covering verification approach.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/StandaloneModuleGraph.zig`:
- Around line 225-238: The call to reportTruncated(0, vaddr) incorrectly passes
an in-memory virtual address (vaddr) as the required file size; update the call
site so reportTruncated receives a meaningful required-bytes value (or 0 if the
precise file-size requirement cannot be determined) instead of vaddr. Locate the
call that passes vaddr and change the second argument to the correct
file-offset/size (or 0) and ensure reportTruncated(have: u64, required: u64)
continues to log "have" and "required" as file byte counts rather than virtual
addresses.
- Around line 191-198: The code currently calls std.mem.readInt(u64,
target[0..8], .little) (via target and payload_len) before proving the 8-byte
header is inside the segment; move or add a guard that computes the available
bytes in the segment (using seg.p_memsz, seg.p_vaddr and vaddr as used in
seg_avail) and require seg_avail >= 8 before dereferencing target[0..8]; only
after that safe-check read the 8-byte length header into payload_len and then
perform the existing payload_len > seg_avail clamp check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a909b21f-aae0-41af-8a56-c01fb2b87d0b

📥 Commits

Reviewing files that changed from the base of the PR and between 5c6d6d5 and 565e67c.

📒 Files selected for processing (2)
  • src/StandaloneModuleGraph.zig
  • test/bundler/bun-build-compile.test.ts

Comment on lines 191 to +198
const target: [*]const u8 = @ptrFromInt(vaddr);
const payload_len = std.mem.readInt(u64, target[0..8], .little);
if (payload_len < 8) return null;

// Clamp to the segment so a corrupted length header cannot walk
// off the end of the mapping into adjacent (or absent) pages.
const seg_avail = seg.p_memsz -| (vaddr - seg.p_vaddr) -| 8;
if (payload_len > seg_avail) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Guard the 8-byte length header before readInt().

Line 192 dereferences target[0..8] before proving that vaddr has 8 bytes left inside seg. If BUN_COMPILED is stale and lands in the last few bytes of a PT_LOAD, this can still fault before the later payload_len > seg_avail check runs.

🔧 Suggested fix
             const target: [*]const u8 = `@ptrFromInt`(vaddr);
+            const seg_offset = vaddr - seg.p_vaddr;
+            if (seg_offset > seg.p_memsz or seg.p_memsz - seg_offset < `@sizeOf`(u64)) {
+                reportTruncated(
+                    seg.p_offset + `@min`(seg.p_filesz, seg_offset),
+                    seg.p_offset + seg_offset + `@sizeOf`(u64),
+                );
+            }
             const payload_len = std.mem.readInt(u64, target[0..8], .little);
             if (payload_len < 8) return null;
 
             // Clamp to the segment so a corrupted length header cannot walk
             // off the end of the mapping into adjacent (or absent) pages.
-            const seg_avail = seg.p_memsz -| (vaddr - seg.p_vaddr) -| 8;
+            const seg_avail = seg.p_memsz - seg_offset - `@sizeOf`(u64);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/StandaloneModuleGraph.zig` around lines 191 - 198, The code currently
calls std.mem.readInt(u64, target[0..8], .little) (via target and payload_len)
before proving the 8-byte header is inside the segment; move or add a guard that
computes the available bytes in the segment (using seg.p_memsz, seg.p_vaddr and
vaddr as used in seg_avail) and require seg_avail >= 8 before dereferencing
target[0..8]; only after that safe-check read the 8-byte length header into
payload_len and then perform the existing payload_len > seg_avail clamp check.

Comment on lines +225 to +238
// vaddr is set but lies outside every PT_LOAD: the segment was
// dropped (e.g. by an aggressive ELF rewriter) while BUN_COMPILED
// kept its stale pointer. Treat as truncation.
reportTruncated(0, vaddr);
}

fn reportTruncated(have: u64, required: u64) noreturn {
Output.errGeneric(
"this single-file executable is incomplete — its embedded module data is not fully present on disk (have {d} bytes, need at least {d}).\n" ++
" This usually means the binary was only partially downloaded or copied. Re-download or reinstall and try again.",
.{ have, required },
);
bun.Global.exit(1);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Don’t format a virtual address as a required file size.

Line 228 passes vaddr into reportTruncated(), but vaddr is an in-memory address, not a byte offset in /proc/self/exe. On this path the error can report absurd “need at least … bytes” numbers instead of a meaningful corruption message.

🔧 Suggested fix
             // vaddr is set but lies outside every PT_LOAD: the segment was
             // dropped (e.g. by an aggressive ELF rewriter) while BUN_COMPILED
             // kept its stale pointer. Treat as truncation.
-            reportTruncated(0, vaddr);
+            Output.errGeneric(
+                "this single-file executable is invalid — its embedded module pointer does not belong to any PT_LOAD segment.",
+                .{},
+            );
+            bun.Global.exit(1);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/StandaloneModuleGraph.zig` around lines 225 - 238, The call to
reportTruncated(0, vaddr) incorrectly passes an in-memory virtual address
(vaddr) as the required file size; update the call site so reportTruncated
receives a meaningful required-bytes value (or 0 if the precise file-size
requirement cannot be determined) instead of vaddr. Locate the call that passes
vaddr and change the second argument to the correct file-offset/size (or 0) and
ensure reportTruncated(have: u64, required: u64) continues to log "have" and
"required" as file byte counts rather than virtual addresses.

Comment on lines +176 to +189
const seg = findLoadSegment(vaddr) orelse {
// No auxv (static? unusual loader). Fall back to the unchecked
// path rather than fail closed.
return getDataUnchecked(vaddr);
};
const required = seg.p_offset + seg.p_filesz;
switch (bun.sys.stat("/proc/self/exe")) {
.result => |st| {
const have: u64 = @intCast(@max(st.size, 0));
if (have < required) reportTruncated(have, required);
},
// /proc unavailable (rare: minimal chroot). Best-effort skip.
.err => {},
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 hintSourcePagesDontNeed() (called at src/bun.js.zig:469) also goes through ELF.getData(), so after this change every standalone-binary startup does a second redundant auxv/phdr walk + stat("/proc/self/exe"), and a function whose own comment says it's a "best-effort hint" (and which deliberately avoids std.posix.madvise's unreachable) can now in principle hit reportTruncated()exit(1). Consider having it reuse the already-validated Instance.instance.?.bytes (or cache the slice) instead of re-running the validation.

Extended reasoning...

What changed for the second caller

ELF.getData() has two call sites: fromExecutable() at process start, and hintSourcePagesDontNeed() at src/bun.js.zig:469 (fired right after loadEntryPoint() returns). Before this PR getData() was a pure pointer-dereference of the kernel-mapped segment. After this PR it additionally calls getauxval(AT_PHDR)/getauxval(AT_PHNUM), walks every program header, calls bun.sys.stat("/proc/self/exe"), and may call reportTruncated()bun.Global.exit(1). hintSourcePagesDontNeed() inherits all of that.

Why it's worth changing

hintSourcePagesDontNeed() is explicitly documented as a best-effort kernel hint — it even goes out of its way to call libc madvise directly so that an unexpected errno doesn't hit std.posix.madvise's unreachable. Having that same function transitively reach a noreturn exit(1) is at odds with its own contract. And the validation work it now repeats is pure waste: by the time the hint runs, fromExecutable() has already walked the phdrs, stat'd /proc/self/exe, validated the segment, and stored the resulting slice in Instance.instance.?.bytes. Re-deriving the same slice from scratch on every standalone startup adds two getauxval calls, an O(phnum) loop, and a stat() syscall for no benefit.

Step-by-step

  1. Standalone binary starts; fromExecutable() calls ELF.getData() → walks auxv/phdrs, stats /proc/self/exe, validates, returns elf_bytes; graph is built and Instance.instance.?.bytes = elf_bytes[0..byte_count].
  2. loadEntryPoint() runs the entrypoint's synchronous prelude.
  3. src/bun.js.zig:469 calls hintSourcePagesDontNeed() → on Linux this calls ELF.getData() againgetauxval(AT_PHDR) + getauxval(AT_PHNUM) + phdr loop + bun.sys.stat("/proc/self/exe") all re-run; if the size check or findLoadSegment falls through, reportTruncated()exit(1) from inside the "best-effort" hint.
  4. In the normal case the redundant work succeeds and madvise(DONTNEED) is issued on the same range step 1 already computed.

On the "can it actually abort?" objection

It's fair to say the abort path is very hard to reach in practice: the hint fires synchronously during startup (not at arbitrary runtime), Linux returns ETXTBSY on in-place writes to a running executable's inode, and /proc/self/exe resolves to the original exec'd inode so atomic-rename deploys still stat the old size. And if the backing file were somehow truncated in place, the mmap'd module bytes would SIGBUS on the next page-in anyway, so exit(1) wouldn't be making things worse. So the "a healthy process gets killed" scenario is largely theoretical.

That objection doesn't address the part that does happen on every run, though: the redundant phdr walk + stat() in a function that already has the answer available, and the contract mismatch of a "best-effort hint" gaining a noreturn callee. Both go away with a one-line change.

Suggested fix

Have hintSourcePagesDontNeed() use the slice that was already validated — e.g. const bytes = (Instance.instance orelse return).bytes; on Linux instead of calling ELF.getData() again (or cache the getData() result from the first call). That removes the extra syscalls and keeps the hint genuinely best-effort.

@Jarred-Sumner Jarred-Sumner left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct fix? We should not need to stat or read the executable, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants