fix(intel): resolve syscall numbers via basic-block backtracking #138
Open
r0ny123 wants to merge 3 commits into
The Intel backend only classified a Linux `syscall` as process-ending when the syscall number register was set by the *immediately preceding* `mov rax/eax, <imm>`; any indirection (an unrelated instruction in between, or the mov landing in a previous look-ahead window) left it unresolved.

Add a conservative, block-local resolver `_resolveSyscallNumber()` that backtracks over the instructions already analyzed in the current block to find the `mov`/`movabs` that sets the syscall register. It stops at any control-flow boundary (reusing the CALL/JMP/CJMP/LOOP/RET sets) or any untrackable write to the register family (sub-register writes, computed values, register/memory sources), returning `None` rather than guessing, so speculative recovery never introduces false process endings.

Instructions are accumulated into `block_instructions` across cache-window refills (the cache is only a ~15-byte look-ahead), so backtracking is not limited to a single window and the direct case is preserved across refills.

Refs: danielplohmann#119
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
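The backtracking described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `resolve_syscall_number`, `BOUNDARY_MNEMONICS`, and the `(mnemonic, op_str)` tuple representation are hypothetical stand-ins, and the boundary set here is a small hand-written subset rather than SMDA's actual CALL/JMP/CJMP/LOOP/RET sets.

```python
from typing import List, Optional, Tuple

# Hypothetical, abbreviated boundary set (assumption: the real code reuses
# SMDA's existing control-flow mnemonic sets instead of this literal).
BOUNDARY_MNEMONICS = {"call", "jmp", "je", "jne", "loop", "ret",
                      "syscall", "sysenter", "int", "int3", "hlt"}
RAX_FAMILY = {"rax", "eax", "ax", "ah", "al"}


def resolve_syscall_number(preceding: List[Tuple[str, str]],
                           bitness: int) -> Optional[int]:
    """Walk (mnemonic, op_str) pairs backwards; return the immediate or None."""
    # Only full-width writes are trusted; in 64-bit mode `mov eax, imm`
    # zero-extends into rax, so it counts as well.
    full_regs = {"rax", "eax"} if bitness == 64 else {"eax"}
    for mnemonic, op_str in reversed(preceding):
        # Drop prefixes such as "bnd" so "bnd ret" is still seen as "ret".
        base = mnemonic.split()[-1]
        if base in BOUNDARY_MNEMONICS:
            return None  # never cross a control-flow edge
        operands = [op.strip() for op in op_str.split(",")] if op_str else []
        if not operands or operands[0] not in RAX_FAMILY:
            continue  # first operand is not the syscall register family
        if base in ("mov", "movabs") and len(operands) == 2 \
                and operands[0] in full_regs:
            try:
                return int(operands[1], 0)  # base 0: "0x3c" and "60" both parse
            except ValueError:
                return None  # register or memory source
        return None  # sub-register write or computed value: do not guess
    return None  # ran out of block instructions
```

For example, `resolve_syscall_number([("mov", "eax, 0x3c"), ("xor", "edi, edi")], 64)` yields 60, while placing a `call` between the `mov` and the `syscall` yields `None`. The sketch deliberately treats every instruction whose first operand is in the `rax` family as a write and ignores implicit writers; it makes no attempt to cover the refinements from the follow-up commits.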
…acktracking

Harden `_resolveSyscallNumber()` against false process endings and false negatives raised in review:

- Stop backtracking at instructions that implicitly write the rax/eax family without an explicit destination operand (cpuid, rdtsc(p), rd*, lods*, cbw/cwde/cdqe, lahf, xlat, mul/div/idiv, in(s), cmpxchg*, the legacy bcd ops), one-operand imul, and xchg when rax is any operand. Previously these were skipped, so a stale syscall number could be read past a real clobber and wrongly mark the syscall as process-ending.
- Skip read-only instructions whose first operand is a source (cmp, test, push, bt) instead of treating the target register there as a clobber, so legitimate exit sequences still resolve.

Adds unit tests for both behaviors. Addresses PR review feedback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
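The two hardening rules in this commit can be illustrated with a small classifier. This is only a sketch under assumptions: `writes_rax_family` and both sets are hypothetical names, and the sets are abbreviated subsets of the mnemonics listed in the commit message, not the PR's actual lists.

```python
# Abbreviated, illustrative subsets (assumption: the PR's lists are longer).
IMPLICIT_RAX_WRITERS = {"cpuid", "rdtsc", "rdtscp", "cbw", "cwde", "cdqe",
                        "lahf", "xlatb", "mul", "div", "idiv"}
READ_ONLY_FIRST_OPERAND = {"cmp", "test", "push", "bt"}
RAX_FAMILY = {"rax", "eax", "ax", "ah", "al"}


def writes_rax_family(mnemonic: str, op_str: str) -> bool:
    """Return True if the instruction may write any part of rax."""
    operands = [op.strip() for op in op_str.split(",")] if op_str else []
    if mnemonic in IMPLICIT_RAX_WRITERS:
        return True  # writes rax/eax with no explicit destination operand
    if mnemonic == "imul" and len(operands) == 1:
        return True  # one-operand imul stores its result in rdx:rax
    if mnemonic == "xchg":
        return any(op in RAX_FAMILY for op in operands)  # either side is written
    if mnemonic in READ_ONLY_FIRST_OPERAND:
        return False  # first operand is only read, so it is not a clobber
    # Default: treat the first operand as the destination.
    return bool(operands) and operands[0] in RAX_FAMILY
```

With this split, `cmp eax, 1` between the `mov` and the `syscall` no longer blocks resolution, while `cpuid` or `xchg rbx, rax` does. A resolver would check for a trackable `mov`/`movabs` first and fall back to a predicate like this one to decide whether to stop.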
…ktracking

xgetbv reads an extended control register into edx:eax with no explicit destination operand, so (like cpuid/rdtsc/rdpmc) it must stop syscall-number backtracking to avoid reading a stale value past the clobber.

Addresses a CodeRabbit review note on PR #66.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
SMDA's Intel backend classifies a Linux `syscall` as process-ending only when the syscall number register (`rax == 60`, i.e. `exit`) is set by the immediately preceding `mov rax/eax, <imm>`. If the number is set a little less directly, the old code logged a debug message and gave up; that was the standing TODO in the `except ValueError` path.

Addresses upstream issue danielplohmann/smda#119.
Changes
- `_resolveSyscallNumber(preceding_instructions, bitness)` in `src/smda/intel/IntelDisassembler.py`: a conservative, block-local backtracking resolver that walks the already-analyzed block instructions in reverse to find the `mov`/`movabs` that sets the syscall register.
- `SYSCALL_BACKTRACK_BOUNDARY` (reuses the existing `CALL`/`JMP`/`CJMP`/`LOOP`/`RET` sets plus `syscall`/`sysenter`/`int`/`int3`/`hlt`) so backtracking never crosses a control-flow edge.
- Instructions are accumulated into `block_instructions` across cache look-ahead refills (the disasm cache is only a ~15-byte window), so resolution isn't limited to one window and the direct case is preserved across refills.

Conservatism (no false endings)
The resolver returns `None` (never a guess) when the register is written by anything it can't track:

- non-`mov` writes (`xor eax, eax`, `lea`, `pop rax`, …),
- sub-register writes (`al`/`ah`/`ax`),
- register or memory sources (`mov rax, rbx`, `mov rax, [rbp-8]`).

So it only ever reports "process ending" when an actual `mov`/`movabs <rax|eax>, 0x3c` precedes the `syscall` in the same block. In 64-bit mode it honors the zero-extending `mov eax, 0x3c`; immediates are parsed with base 0 (handles capstone's `0x..` hex and decimal).

Tests
`tests/testIntelDisassembler.py`: unit tests for the direct case (64- and 32-bit), backtracking over unrelated instructions, the `movabs` and zero-extending-`eax` forms, and the unresolved paths (clobber, control-flow boundary incl. a prefixed `bnd ret`, register source, empty input).

Validation
- `ruff check .` / `ruff format --check .` clean
- `make test` → 116 passed
- Reworked to backtrack over `block_instructions` (instead of slicing the look-ahead window), added `movabs` handling, and switched to base-0 immediate parsing.

Refs: #119
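As a brief illustration of the base-0 immediate parsing mentioned in this description, the sketch below shows how Python's built-in `int(text, 0)` accepts both hex and decimal operand strings and rejects register or memory operands. The helper names `parse_immediate` and `is_exit_number` are hypothetical and are not taken from the PR.

```python
from typing import Optional

LINUX_X64_EXIT = 60  # 0x3c, the syscall number this description refers to


def parse_immediate(text: str) -> Optional[int]:
    """Parse an operand string as an immediate, or return None."""
    try:
        # Base 0 lets the "0x" prefix select hex; plain digits stay decimal.
        return int(text.strip(), 0)
    except ValueError:
        return None  # e.g. "rbx" or "[rbp - 8]" is not an immediate


def is_exit_number(text: str) -> bool:
    """True only when the operand is an immediate equal to exit's number."""
    return parse_immediate(text) == LINUX_X64_EXIT
```

Here `is_exit_number("0x3c")` and `is_exit_number("60")` both hold, while a register source such as `"rbx"` parses to `None` and is never treated as an exit.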
🤖 Generated with Claude Code