fix: MIR analysis soundness bugs + emit simplifications#3126
fix: MIR analysis soundness bugs + emit simplifications#3126antoniosarosi merged 1 commit intocanaryfrom
Conversation
Fix three miscompilation bugs in the MIR analysis pass and simplify the emit infrastructure with several targeted refactors. Bug fixes: - Fix cross-block Virtual inlining unsoundness: locals whose rvalue dependencies have multiple definitions (including parameters with implicit entry defs) are no longer virtualized across blocks. Uses transitive dependency tracking to catch indirect mutations. - Fix CopyOf propagation for mutable parameters: parameters that are reassigned anywhere in the function are no longer eligible for copy propagation. - Add debug_assert guards at MIR construction boundaries to enforce that Call/Await/DispatchFuture destinations are always local places. Refactors: - Replace TERMINATOR_IDX (usize::MAX) sentinel with StatementRef enum for type-safe, exhaustiveness-checked statement/terminator references. - Unify 6 duplicate rvalue/operand/place walker functions into 3 generic walk_*_locals walkers + 3 thin convenience wrappers. - Merge collect_def_use and collect_all_definitions into a single pass by adding all_defs field to LocalDefUse. - Deduplicate patch_jumps/patch_jump_to logic in emit.rs. - Remove dead code: always-true is_inlinable_rvalue, unused LocalDefUse::local field, unreachable emit_store_place arms. - Remove dominators/predecessors from AnalysisResult (only used internally during classification, computation unchanged). - Simplify entry/or_insert_with boilerplate to get_mut/unwrap.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughThis pull request refactors the analysis module to replace implicit statement indices with an explicit Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
Merging this PR will not alter performance
|
Binary size checks passed✅ 7 passed
Generated by |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
baml_language/crates/baml_compiler_emit/src/analysis.rs (2)
1006-1078:⚠️ Potential issue | 🟠 MajorGuard virtualization against locals with multiple definitions.
LocalClassification::Virtualskips all assignments for a local. Without adu.all_defs.len() == 1gate, a multi-def local can drop earlier side effects (e.g.,x = fail(); x = 2; x). Add a hard check before inlining.🛠️ Suggested fix
fn can_be_virtual( du: &LocalDefUse, dominators: &Dominators, mir: &MirFunction, def_use: &HashMap<Local, LocalDefUse>, predecessors: &HashMap<BlockId, Vec<BlockId>>, ) -> bool { - // Must have exactly one definition - let Some(def) = &du.def else { - return false; - }; + // Must have exactly one definition + if du.all_defs.len() != 1 { + return false; + } + let Some(def) = &du.def else { + return false; + };
145-174:⚠️ Potential issue | 🟡 MinorRun the workspace test suite for baml_language.
This crate is part of the baml_language workspace which has specific testing practices. Run
cargo test --package baml_teststo verify your changes against the full compiler pipeline, including integration tests. Seebaml_language/TEST_INSTRUCTIONS.mdfor detailed testing guidance.
| // Regression: Virtual inlining must not re-evaluate rvalue when a dependency | ||
| // is modified in an intermediate block between def and use. | ||
| #[test] | ||
| fn virtual_cross_block_soundness() -> anyhow::Result<()> { | ||
| assert_vm_executes(Program { | ||
| source: r#" | ||
| function main(c: bool) -> int { | ||
| let a = 1; | ||
| let b = a; | ||
| if (c) { | ||
| a = 2; | ||
| } | ||
| b | ||
| } | ||
|
|
||
| function entry() -> int { | ||
| main(true) | ||
| } | ||
| "#, | ||
| function: "entry", | ||
| expected: ExecState::Complete(Value::Int(1)), | ||
| }) | ||
| } | ||
|
|
||
| // Regression: Virtual inlining of a rvalue that reads a parameter must not | ||
| // re-evaluate when the parameter is reassigned on an intermediate path. | ||
| // Parameters have an implicit entry definition NOT tracked in `all_defs`, | ||
| // so even a single explicit reassignment means multiple definitions. | ||
| #[test] | ||
| fn virtual_cross_block_param_mutation_soundness() -> anyhow::Result<()> { | ||
| assert_vm_executes(Program { | ||
| source: r#" | ||
| function main(c: bool, p: int) -> int { | ||
| let x = p; | ||
| if (c) { | ||
| p = 2; | ||
| } | ||
| x | ||
| } | ||
|
|
||
| function entry() -> int { | ||
| main(true, 42) | ||
| } | ||
| "#, | ||
| function: "entry", | ||
| expected: ExecState::Complete(Value::Int(42)), | ||
| }) | ||
| } | ||
|
|
||
| // Regression: CopyOf must not propagate through mutable parameters. | ||
| #[test] | ||
| fn copy_of_mutable_param_soundness() -> anyhow::Result<()> { | ||
| assert_vm_executes(Program { | ||
| source: r#" | ||
| function main(x: int) -> int { | ||
| let y = x; | ||
| x = 2; | ||
| y | ||
| } | ||
|
|
||
| function entry() -> int { | ||
| main(42) | ||
| } | ||
| "#, | ||
| function: "entry", | ||
| expected: ExecState::Complete(Value::Int(42)), | ||
| }) | ||
| } | ||
|
|
||
| // Regression: cross-block virtualization must account for transitive | ||
| // dependencies; `x = t` where `t = p` must not be re-evaluated as latest `p`. | ||
| #[test] | ||
| fn virtual_cross_block_transitive_param_mutation_soundness() -> anyhow::Result<()> { | ||
| assert_vm_executes(Program { | ||
| source: r#" | ||
| function main(c: bool, p: int) -> int { | ||
| let t = p; | ||
| let x = t; | ||
| if (c) { | ||
| p = 2; | ||
| } | ||
| x | ||
| } | ||
|
|
||
| function entry() -> int { | ||
| main(true, 42) | ||
| } | ||
| "#, | ||
| function: "entry", | ||
| expected: ExecState::Complete(Value::Int(42)), | ||
| }) | ||
| } | ||
|
|
||
| // Regression: multiple definitions of a local must not allow inlining that | ||
| // skips side-effecting call evaluation. | ||
| #[test] | ||
| fn virtual_multiple_defs_preserve_side_effects() -> anyhow::Result<()> { | ||
| assert_vm_fails(FailingProgram { | ||
| source: r#" | ||
| function fail() -> int { | ||
| assert(false); | ||
| 1 | ||
| } | ||
|
|
||
| function main() -> int { | ||
| let x = fail(); | ||
| x = 2; | ||
| x | ||
| } | ||
| "#, | ||
| function: "main", | ||
| expected: RuntimeError::AssertionError.into(), | ||
| }) | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider unit-test placement if feasible.
These are integration tests under bex_vm/tests; if the same coverage can live as crate unit tests, prefer that to reduce overhead. If integration scope is required, a brief note is enough.
As per coding guidelines, "Prefer writing Rust unit tests over integration tests where possible".
Integrates 4 canary PRs: - #3126: MIR analysis soundness (StatementRef, unified walkers) - #3124: Type variants for type expressions (parse takes Type) - #3122: InitLocals(n) instruction - #3107: Full tracing system (bex_events, Collector) Conflict resolutions: - llm.baml: keep orchestration loop with panic, update parse() to use get_return_type - baml_builtins: keep both Enum and Type TypePattern variants, add get_return_type alongside orchestration builtins - baml_compiler_emit: keep HIDDEN_LLM_BUILTINS removal - baml_compiler_tir: add both Enum and Type arms in substitute functions - bex_vm_types: merge both ClientBuild* and CollectorRef re-exports - llm_render tests: use new 4-arg call_function signature, fix PromptAst FQN
Summary
Fixes three miscompilation bugs in the MIR analysis pass and simplifies the emit infrastructure with several targeted refactors. All changes are in
analysis.rs,emit.rs,builder.rs, and VM regression tests.Bug fixes (3 correctness issues)
can_be_virtual()would re-evaluate rvalues at use sites without checking whether dependencies (including transitive ones and parameters with implicit entry defs) could be modified on intermediate paths. Three related fixes:all_defs, so any explicit reassignment means multiple defscollect_transitive_readsinstead of directwalk_rvalue_localsto catchlet t = p; let x = t; p = 2; xchainsget_copy_source()assumed parameters are never reassigned, but BAML allowsx = 3wherexis a param. Added check that source parameter has no entries inall_defs.debug_assert!guards at MIR builder boundaries enforcing that Call/Await/DispatchFuture destinations are alwaysPlace::Local.Refactors
StatementRefenum — ReplacedTERMINATOR_IDX = usize::MAXsentinel withenum StatementRef { Statement(usize), Terminator }for type-safe, exhaustiveness-checked references. All comparison sites now use pattern matching.walk_{rvalue,operand,place}_localsfunctions + 3 thin wrappers. Single source of truth for tree traversal.collect_def_use()andcollect_all_definitions()into one pass by addingall_defs: Vec<(BlockId, StatementRef)>toLocalDefUse. Eliminated a redundant full-MIR walk.patch_jumps—patch_jumps()now delegates topatch_jump_to()instead of duplicating the match logic.is_inlinable_rvalue(), unusedLocalDefUse::localfield, and madeemit_store_placeField/Index armsunreachable!().dominators/predecessorsfromAnalysisResult(computation unchanged, just not stored in return value since callers never read them). Simplifiedentry().or_insert_with()boilerplate toget_mut().unwrap().Test plan
virtual_cross_block_soundness— cross-block variable modificationvirtual_cross_block_param_mutation_soundness— parameter mutation on intermediate pathvirtual_cross_block_transitive_param_mutation_soundness— transitivet = p; x = tchaincopy_of_mutable_param_soundness— CopyOf with reassigned paramvirtual_multiple_defs_preserve_side_effects— multiple defs must not skip side effectscargo fmtandcargo clippycleanSummary by CodeRabbit
Refactor
Tests