fix(intel): resolve syscall numbers via basic-block backtracking #138

Open
r0ny123 wants to merge 3 commits into danielplohmann:master from r0ny123:fix/syscall-backtracking

r0ny123 commented Jun 10, 2026

Contributor

Summary

SMDA's Intel backend classifies a Linux syscall as process-ending only when the syscall number register holds 60 (exit, i.e. rax == 60) and that value is set by the immediately preceding mov rax/eax, <imm>. If the number was set less directly, for example with an unrelated instruction in between or with the mov in an earlier look-ahead window, the old code logged a debug message and gave up; that was the standing TODO in the except ValueError path.

Addresses upstream issue danielplohmann/smda#119.

Changes

  • Add _resolveSyscallNumber(preceding_instructions, bitness) in src/smda/intel/IntelDisassembler.py: a conservative, block-local backtracking resolver that walks the already-analyzed block instructions in reverse to find the mov/movabs that sets the syscall register.
  • Add SYSCALL_BACKTRACK_BOUNDARY (reuses the existing CALL/JMP/CJMP/LOOP/RET sets + syscall/sysenter/int/int3/hlt) so backtracking never crosses a control-flow edge.
  • Accumulate block_instructions across cache look-ahead refills (the disasm cache is only a ~15-byte window), so resolution isn't limited to one window and the direct case is preserved across refills.
  • Replace the inline immediate parse (and its TODO) with a call to the resolver.
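
To illustrate the accumulation and boundary points above, here is a minimal sketch, not the PR's code. It assumes instructions are plain `(mnemonic, op_str)` tuples, that each look-ahead refill arrives as its own list, and that prefixes show up inside the mnemonic string (as in `bnd ret`). The names `BOUNDARY`, `is_boundary`, and `backtrack_candidates` are hypothetical, and the boundary set is abridged.

```python
# Hypothetical sketch: names and the (mnemonic, op_str) shape are assumptions,
# not SMDA's actual data structures.
BOUNDARY = {
    "call", "jmp", "je", "jne", "loop", "ret",      # abridged control-flow sets
    "syscall", "sysenter", "int", "int3", "hlt",
}


def is_boundary(mnemonic):
    # For a prefixed form such as "bnd ret", the real mnemonic is the last token.
    return mnemonic.split()[-1] in BOUNDARY


def backtrack_candidates(windows):
    """Return the in-block instructions preceding the final one, newest first."""
    block_instructions = []
    for window in windows:
        # Accumulate across refills instead of keeping only the latest window.
        block_instructions.extend(window)

    candidates = []
    for instruction in reversed(block_instructions[:-1]):
        if is_boundary(instruction[0]):
            break  # never look past a control-flow edge
        candidates.append(instruction)
    return candidates


windows = [
    [("mov", "eax, 0x3c")],                    # first look-ahead window
    [("xor", "edi, edi"), ("syscall", "")],    # second window after a refill
]
print(backtrack_candidates(windows))
```

Because the windows are concatenated before the reverse walk, the `mov` from the first window is still visible when the `syscall` in the second window is inspected.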

Conservatism (no false endings)

The resolver returns None — never a guess — when the register is written by anything it can't track:

  • non-mov writes (xor eax, eax, lea, pop rax, …),
  • sub-register writes (al/ah/ax),
  • register/memory/expression sources (mov rax, rbx, mov rax, [rbp-8]),
  • or when a control-flow boundary is reached first.

So it only ever reports "process ending" when an actual mov/movabs <rax|eax>, 0x3c precedes the syscall in the same block with no untracked write to the register in between. In 64-bit mode the resolver honors mov eax, 0x3c, because a write to eax zero-extends into rax. Immediates are parsed with base 0, which handles both capstone's 0x.. hex and decimal.
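
The rules above can be sketched roughly as follows. This is an illustrative approximation, not the PR's implementation: it assumes `(mnemonic, op_str)` tuples ordered newest first, and the names `RAX_FAMILY` and `resolve_number` are hypothetical.

```python
# Hypothetical sketch of the conservative rules; not SMDA's actual resolver.
RAX_FAMILY = {"rax", "eax", "ax", "ah", "al"}


def resolve_number(candidates, bitness):
    """candidates: (mnemonic, op_str) pairs from one basic block, newest first."""
    # In 64-bit mode a write to eax zero-extends into rax, so both are accepted.
    full_width = {"rax", "eax"} if bitness == 64 else {"eax"}
    for mnemonic, op_str in candidates:
        dest, _, src = (part.strip() for part in op_str.partition(","))
        if dest not in RAX_FAMILY:
            continue  # first operand is unrelated, keep walking backwards
        if mnemonic not in ("mov", "movabs") or dest not in full_width:
            return None  # non-mov write or sub-register write
        try:
            return int(src, 0)  # base 0 accepts both "0x3c" and "60"
        except ValueError:
            return None  # register, memory, or expression source
    return None  # reached the start of the block without finding a write


print(resolve_number([("xor", "edi, edi"), ("mov", "eax, 0x3c")], 64))
```

This sketch treats any instruction whose first operand is in the rax family as a write, so it is stricter than necessary for read-only instructions such as `cmp`, and it does not see implicit writes such as `cpuid`.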

Tests

tests/testIntelDisassembler.py: unit tests for the direct case (64- and 32-bit), backtracking over unrelated instructions, the movabs and zero-extending-eax forms, and the unresolved paths (clobber, control-flow boundary incl. a prefixed bnd ret, register source, empty input).

Validation

  • ruff check . / ruff format --check . clean
  • make test → 116 passed
  • Reviewed by a Sonnet 4.6 subagent (extra-high reasoning). Findings addressed: fixed a cross-cache-refill regression by accumulating block_instructions (instead of slicing the look-ahead window), added movabs handling, and switched to base-0 immediate parsing.

Refs: #119

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved syscall detection with conservative backtracking for more accurate identification of program termination within basic blocks.
  • Tests

    • Added comprehensive test coverage for syscall resolution across various instruction patterns and boundary conditions.

r0ny123 and others added 3 commits June 10, 2026 17:06
The Intel backend only classified a Linux `syscall` as process-ending when
the syscall number register was set by the *immediately preceding* `mov
rax/eax, <imm>`; any indirection (an unrelated instruction in between, or
the mov landing in a previous look-ahead window) left it unresolved.

Add a conservative, block-local resolver `_resolveSyscallNumber()` that
backtracks over the instructions already analyzed in the current block to
find the `mov`/`movabs` that sets the syscall register. It stops at any
control-flow boundary (reusing the CALL/JMP/CJMP/LOOP/RET sets) or any
untrackable write to the register family (sub-register writes, computed
values, register/memory sources), returning None rather than guessing so
speculative recovery never introduces false process endings.

Instructions are accumulated into `block_instructions` across cache-window
refills (the cache is only a ~15-byte look-ahead), so backtracking is not
limited to a single window and the direct case is preserved across refills.

Refs: danielplohmann#119
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…acktracking

Harden _resolveSyscallNumber() against false process endings and false
negatives raised in review:

- Stop backtracking at instructions that implicitly write the rax/eax
  family without an explicit destination operand (cpuid, rdtsc(p), rd*,
  lods*, cbw/cwde/cdqe, lahf, xlat, mul/div/idiv, in(s), cmpxchg*, the
  legacy bcd ops), one-operand imul, and xchg when rax is any operand.
  Previously these were skipped, so a stale syscall number could be read
  past a real clobber and wrongly mark the syscall as process-ending.
- Skip read-only instructions whose first operand is a source (cmp, test,
  push, bt) instead of treating the target register there as a clobber,
  so legitimate exit sequences still resolve.

Adds unit tests for both behaviors. Addresses PR review feedback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ktracking

xgetbv reads an extended control register into edx:eax with no explicit
destination operand, so (like cpuid/rdtsc/rdpmc) it must stop syscall-number
backtracking to avoid reading a stale value past the clobber. Addresses a
CodeRabbit review note on PR #66.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>