Add tracing capability#3107

Merged
aaronvg merged 16 commits into canary from aaron/tracing
Feb 17, 2026

@aaronvg
Contributor

@aaronvg aaronvg commented Feb 12, 2026

Note

High Risk
Touches core execution (call_function/VM exec) and heap object model by adding tracing notifications, a new object type, and global event buffering, which can affect runtime behavior and performance across all calls.

Overview
Adds end-to-end runtime tracing by introducing a new bex_events crate (span IDs/contexts, in-memory event store with JSONL flush, and Collector/FunctionLog views) and wiring it into bex_engine.

Extends BexEngine::call_function to accept an optional HostSpanContext plus attached collectors, emits root + nested traced span start/end events, and deep-copies VM values for event payloads.

Updates the VM/bytecode pipeline to support tracing via a per-function trace flag (enabled for LLM functions) and new VmExecState::SpanNotify enter/exit notifications; adds a new heap object/ADT type Collector to pass collector handles through the runtime. Tests are updated and new tracing tests are added; workspace/build config is adjusted (new deps, new bridge_python crate, size-gate baselines, and ignore debug_events*).

Written by Cursor Bugbot for commit bf1143c. This will update automatically on new commits.

Summary by CodeRabbit

  • New Features

    • Python package for running BAML (sync/async) with Collector and FunctionLog types; host-span manager and context helpers.
    • Runtime tracing with span start/end events, JSONL export, per-span collectors, and an in-memory event store.
  • Improvements

    • Engine call API accepts host span context and collectors to propagate tracing and collect logs.
  • Tests

    • Large end-to-end and unit test suites for tracing, collectors, and Python bridge.

@vercel

vercel bot commented Feb 12, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| beps | Ready | Preview, Comment | Feb 17, 2026 0:43am |
| promptfiddle | Ready | Preview, Comment | Feb 17, 2026 0:43am |

@coderabbitai
Contributor

coderabbitai bot commented Feb 12, 2026

Walkthrough

Introduces span-based tracing and collectors across the runtime and bridges: a new bex_events crate (events, event store, collector, serialization), VM span notifications and Collector handles, engine-level per-call span plumbing, Python PyO3 bridge + HostSpanManager and Collector bindings, and accompanying tests and manifest updates.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Size Gate Configuration | baml_language/.cargo/size-gate.toml, baml_language/.ci/size-gate/*.toml | Bumped artifact size thresholds for bridge artifacts across multiple target manifests. |
| Workspace & Manifests | baml_language/Cargo.toml, baml_language/crates/*/Cargo.toml | Added workspace deps (bex_events, bridge_python, uuid, pyo3-*), and adjusted crate-types/deps for bridge crates. |
| New Events Crate | baml_language/crates/bex_events/* (Cargo.toml, src/lib.rs, src/types.rs, src/span_id.rs, src/event_store.rs, src/collector.rs, src/serialize.rs) | Adds RuntimeEvent model, SpanId/SpanContext/HostSpanContext, global EventStore (publisher + in-memory collector tracking), Collector API and FunctionLog views, and JSONL serialization. |
| VM Types & Collector Handle | baml_language/crates/bex_vm_types/src/types.rs, .../bytecode.rs, .../lib.rs | Introduces CollectorRef, Object::Collector, ObjectType::Collector, and adds Function.trace: bool; updates displays and re-exports. |
| VM Core & Notifications | baml_language/crates/bex_vm/src/vm.rs, .../native.rs, .../lib.rs | Adds VmExecState::SpanNotify, SpanNotification enum, traced_frames tracking, alloc_collector/as_collector APIs, emits span enter/exit notifications, and re-exports SpanNotification/NativeFunction. |
| Heap & GC Adjustments | baml_language/crates/bex_heap/src/* (accessor.rs, gc.rs, heap_debugger/real.rs) | Adds as_collector_owned helper, treats Collector as skip object in GC tracing, and ignores Collector in invariant checks. |
| External ADT & Conversion | baml_language/crates/bex_external_types/src/bex_external_value.rs, baml_language/crates/bex_engine/src/conversion.rs | Adds BexExternalAdt::Collector variant, round-trip conversion support, deep-copy helper vm_value_to_owned, and type-matching updates for Collector ADT. |
| Engine: Tracing & Collectors | baml_language/crates/bex_engine/src/lib.rs, .../conversion.rs | Re-exports HostSpanContext/RuntimeEvent/SpanId, adds per-invocation SpanState/EngineSpan, extends call_function and event-loop signatures to accept host_ctx and collectors, emits FunctionStart/End and handles VmExecState::SpanNotify, and wires collectors per-call. |
| Factory & Bridge CFFI | baml_language/crates/bex_factory/src/lib.rs, baml_language/crates/bridge_cffi/* | Exposes EngineError, returns concrete Arc<BexEngine> from new_engine, updates internal calls to new call_function signature, makes bridge modules public, and adds bridge-side Collector wrapper and HostSpanManager shim. |
| Bridge Python (PyO3) | baml_language/crates/bridge_python/* (Cargo.toml, build.rs, src/*, python_src/*, tests/*) | New PyO3 crate exposing BamlRuntime (from_files, async/sync call), HostSpanManager and Collector Py bindings, FunctionResult, error mapping, proto helpers, Python package surface (baml_py), and extensive pytest-based integration tests. |
| Bridge Ctypes & Jinja/Tools | baml_language/crates/bridge_ctypes/src/value_encode.rs, crates/sys_llm/src/jinja/value_conversion.rs, crates/tools_onionskin/src/compiler.rs | Maps Collector ADT to null for CFFI, disallows Collector in Jinja conversion, and ignores SpanNotify in VM runner output/formatting. |
| Tests & Test Helpers | baml_language/crates/bex_engine/tests/*, baml_language/crates/baml_tests/*, baml_language/crates/bridge_python/tests/* | Updated BexEngine call sites to new signature (pass None, &[]), added Rust tracing tests (engine-level), filtered SpanNotify in test harness, and added large Python integration tests for tracing & collectors. |
| Tooling / Serialization & Protos | baml_language/crates/bex_events/src/serialize.rs, bridge_python/python_src/baml/cffi/v1/*.py | Event JSONL serializer and generated protobuf stubs plus Python helpers for bridge protobuf encoding/decoding. |
| Miscellaneous | .gitignore, stow.toml, size-gate entries, small formatting/match-arm additions | Minor updates: ignore patterns, stow exclusion update, extra Instruction Display arms, and match-arm additions to handle Return/Assert/NotifyBlock and SpanNotify filtering. |

Sequence Diagram(s)

sequenceDiagram
    participant Host as Host (Python)
    participant HostMgr as HostSpanManager
    participant Engine as BexEngine
    participant VM as BexVm
    participant EventStore as EventStore

    Host->>HostMgr: enter(function, args)
    HostMgr->>EventStore: emit(FunctionStart)
    HostMgr->>Engine: call_function(name, args, host_ctx, collectors)
    Engine->>Engine: init span_state
    Engine->>EventStore: emit(FunctionStart with host prefix)
    Engine->>VM: execute(bytecode)

    alt traced frame
        VM->>VM: push traced_frame
        VM->>Engine: SpanNotify(FunctionEnter)
        Engine->>EventStore: emit(FunctionStart for nested span)
    end

    VM->>VM: run instructions

    alt function return
        VM->>Engine: SpanNotify(FunctionExit)
        Engine->>EventStore: emit(FunctionEnd for nested span)
    end

    VM-->>Engine: result
    Engine->>EventStore: emit(FunctionEnd with duration)
    Engine-->>Host: result
    Host->>HostMgr: exit_ok()
    HostMgr->>EventStore: emit(FunctionEnd)
    Host->>EventStore: flush()
    EventStore->>EventStore: write JSONL
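
The nesting shown in the diagram can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `EventStore` and `Span` here are hypothetical stand-ins written for illustration, not the `bex_events` or `bridge_python` APIs, and the event fields are reduced to a type, a call stack, and a display name.

```python
import io
import json
import uuid


class EventStore:
    """Hypothetical in-memory buffer; the real store lives in the bex_events crate."""

    def __init__(self):
        self.events = []

    def emit(self, kind, call_stack, name):
        self.events.append(
            {"type": kind, "call_stack": list(call_stack), "function_display_name": name}
        )

    def flush(self, sink):
        # Write one JSON object per line (JSONL), then clear the buffer.
        for event in self.events:
            sink.write(json.dumps(event) + "\n")
        self.events.clear()


class Span:
    """Context manager standing in for a host span or an engine span."""

    def __init__(self, store, name, parent_stack=()):
        self.store = store
        self.name = name
        # A child's call stack is its parent's stack plus its own fresh span ID.
        self.call_stack = [*parent_stack, str(uuid.uuid4())]

    def __enter__(self):
        self.store.emit("function_start", self.call_stack, self.name)
        return self

    def __exit__(self, exc_type, exc, tb):
        self.store.emit("function_end", self.call_stack, self.name)
        return False  # never swallow exceptions


store = EventStore()
with Span(store, "parent_py") as host:
    # The engine-level root span receives the host span's stack as its prefix.
    with Span(store, "OuterPipeline", host.call_stack) as root:
        # A traced nested call extends the stack by one more span ID.
        with Span(store, "SummarizeInfo", root.call_stack):
            pass

order = [(e["type"], e["function_display_name"]) for e in store.events]
sink = io.StringIO()
store.flush(sink)
```

In this model the end events come out innermost first, which matches the ordering of the `function_end` lines quoted later in the review thread.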

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add tracing capability' is concise and directly describes the main purpose of the changeset, which introduces end-to-end runtime tracing across the VM and engine with span events, event stores, and collector APIs. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 92.60% which is sufficient. The required threshold is 80.00%. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into canary |

{"call_id":"16d19392-a0dc-4e7c-b7b3-290ca870fc28","call_stack":["ae83f415-8dd2-46a8-ba04-89b5aca2c561","4b5b39b6-bd73-4f87-83f4-1cd749c56410","41c787dd-05c1-402e-8095-8240fcababf6","16d19392-a0dc-4e7c-b7b3-290ca870fc28"],"content":{"data":{"duration_ms":1,"function_display_name":"SummarizeInfo","result":"<handle>"},"type":"function_end"},"function_event_id":"7912535a-2790-4b2e-8678-c86fbb1d6684","timestamp_epoch_ms":1770875826293}
{"call_id":"41c787dd-05c1-402e-8095-8240fcababf6","call_stack":["ae83f415-8dd2-46a8-ba04-89b5aca2c561","4b5b39b6-bd73-4f87-83f4-1cd749c56410","41c787dd-05c1-402e-8095-8240fcababf6"],"content":{"data":{"duration_ms":20,"function_display_name":"OuterPipeline","result":"<handle>"},"type":"function_end"},"function_event_id":"3dfd1fcd-c0c1-424c-ac4d-d2e71a1e291f","timestamp_epoch_ms":1770875826293}
{"call_id":"4b5b39b6-bd73-4f87-83f4-1cd749c56410","call_stack":["ae83f415-8dd2-46a8-ba04-89b5aca2c561","4b5b39b6-bd73-4f87-83f4-1cd749c56410"],"content":{"data":{"duration_ms":21,"function_display_name":"child_py","result":null},"type":"function_end"},"function_event_id":"3b51e2c5-5ec0-484f-820d-1bd260780994","timestamp_epoch_ms":1770875826294}
{"call_id":"ae83f415-8dd2-46a8-ba04-89b5aca2c561","call_stack":["ae83f415-8dd2-46a8-ba04-89b5aca2c561"],"content":{"data":{"duration_ms":22,"function_display_name":"parent_py","result":null},"type":"function_end"},"function_event_id":"c8bfd012-0ba3-4028-a121-060341eb66a6","timestamp_epoch_ms":1770875826294}

Debug output files accidentally committed to repository

Low Severity

debug_events.json and debug_events.jsonl contain trace output generated during development (specific UUIDs, timestamps, function names like parent_py, child_py). These are not referenced by any code and appear to be debug artifacts that were accidentally included in the commit.
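
For readers who want to inspect event lines of this shape, here is a minimal Python sketch that reads them back and recovers the nesting. It assumes only the field layout visible in the four quoted lines (`call_stack`, `content.data.duration_ms`, `content.data.function_display_name`); the sample below abbreviates the UUIDs to single letters, and `summarize` is a hypothetical helper, not part of the PR.

```python
import json

# Shortened stand-ins for the four quoted events: same field layout,
# but each UUID is abbreviated to a single letter for readability.
SAMPLE = """\
{"call_id":"d","call_stack":["a","b","c","d"],"content":{"data":{"duration_ms":1,"function_display_name":"SummarizeInfo","result":"<handle>"},"type":"function_end"}}
{"call_id":"c","call_stack":["a","b","c"],"content":{"data":{"duration_ms":20,"function_display_name":"OuterPipeline","result":"<handle>"},"type":"function_end"}}
{"call_id":"b","call_stack":["a","b"],"content":{"data":{"duration_ms":21,"function_display_name":"child_py","result":null},"type":"function_end"}}
{"call_id":"a","call_stack":["a"],"content":{"data":{"duration_ms":22,"function_display_name":"parent_py","result":null},"type":"function_end"}}
"""


def summarize(jsonl_text):
    """Return (depth, name, duration_ms) tuples, outermost span first."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        data = event["content"]["data"]
        # In the quoted lines the last call_stack entry is the span itself,
        # so everything before it is an ancestor.
        depth = len(event["call_stack"]) - 1
        rows.append((depth, data["function_display_name"], data["duration_ms"]))
    return sorted(rows)


for depth, name, ms in summarize(SAMPLE):
    print("  " * depth + f"{name} ({ms} ms)")
```

Sorting by depth is enough here because the sample is a single chain of calls; a trace with sibling spans would need grouping by parent ID instead.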

Resolve merge conflicts between the tracing feature branch and canary:

- bex_engine: adapt call_function signature with tracing params (host_ctx, collectors)
- bex_vm: keep traced_frames and CallWithTrace instruction support
- bridge_cffi: integrate bex_factory pattern while preserving collector module
- bridge_python: use bex_factory::new_engine for direct engine access, remove env_vars param
- bex_factory: add new_engine() for concrete Arc<BexEngine> access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +292 to +301
/// Call a function AND notify the engine that this call is traced.
///
/// Behaves exactly like `Call(n)`: pushes a frame, sets up locals.
/// Additionally, when `tracing_enabled` is true:
/// 1. Snapshots the arguments from the eval stack
/// 2. Records the new frame's depth in `traced_frames`
/// 3. Yields `SpanNotification::FunctionEnter` to the engine
///
/// When `tracing_enabled` is false, behaves identically to `Call(n)`.
CallWithTrace(usize),
Contributor

I think this is unnecessary: when you Call a function in the VM you get access to a FunctionKind enum, so if you add an Llm variant there you don't need a new instruction.
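
To make the suggestion concrete, here is a small Python model of a single Call handler that decides whether to trace from the callee's kind. Everything in it is an illustrative assumption: the real VM is Rust, the variant names other than `Llm` are invented, and the `notifications` list merely stands in for yielding a `SpanNotification` to the engine.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FunctionKind(Enum):
    # Variant names are illustrative; only an Llm variant is suggested in the review.
    BYTECODE = auto()
    NATIVE = auto()
    LLM = auto()


@dataclass
class Function:
    name: str
    kind: FunctionKind


class Vm:
    def __init__(self, tracing_enabled=True):
        self.tracing_enabled = tracing_enabled
        self.frames = []
        self.traced_frames = []  # depths of frames that must notify on return
        self.notifications = []  # stand-in for yielding SpanNotify to the engine

    def call(self, function):
        """Single Call handler: the callee's kind decides whether to trace."""
        self.frames.append(function)
        if self.tracing_enabled and function.kind is FunctionKind.LLM:
            self.traced_frames.append(len(self.frames))
            self.notifications.append(("FunctionEnter", function.name))

    def ret(self):
        depth = len(self.frames)
        function = self.frames.pop()
        # Only frames recorded at call time emit an exit notification.
        if self.traced_frames and self.traced_frames[-1] == depth:
            self.traced_frames.pop()
            self.notifications.append(("FunctionExit", function.name))


vm = Vm()
vm.call(Function("helper", FunctionKind.BYTECODE))
vm.call(Function("SummarizeInfo", FunctionKind.LLM))
vm.ret()
vm.ret()
```

With this shape, one call path covers both cases and no separate `CallWithTrace` opcode is needed; whether that fits the real bytecode pipeline is for the authors to judge.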

@codspeed-hq

codspeed-hq bot commented Feb 13, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 15 untouched benchmarks
⏩ 84 skipped benchmarks [1]


Comparing aaron/tracing (bf1143c) with canary (c913c7e)

Footnotes

  1. 84 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions

github-actions bot commented Feb 13, 2026

Binary size checks passed

7 passed

| Artifact | Platform | Gzip | Baseline | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| bridge_cffi | Linux | 4.0 MB | 4.0 MB | +140 B (+0.0%) | OK |
| bridge_cffi-stripped | Linux | 2.1 MB | 2.1 MB | +135 B (+0.0%) | OK |
| bridge_cffi | macOS | 3.3 MB | 3.3 MB | +324 B (+0.0%) | OK |
| bridge_cffi-stripped | macOS | 1.7 MB | 1.7 MB | +194 B (+0.0%) | OK |
| bridge_cffi | Windows | 3.3 MB | 3.3 MB | +394 B (+0.0%) | OK |
| bridge_cffi-stripped | Windows | 1.8 MB | 1.8 MB | +776 B (+0.0%) | OK |
| bridge_wasm | WASM | 1.3 MB | 1.3 MB | +153 B (+0.0%) | OK |

Generated by cargo size-gate · workflow run


/// Whether this function should be traced (emit span notifications on call/return).
/// Set to `true` for LLM functions by the compiler.
pub trace: bool,

Redundant trace field duplicates body_meta information

Low Severity

The trace: bool field on Function is set to true exactly when body_meta is Some(FunctionMeta::Llm { .. }), making it fully redundant. Having two fields that must stay in sync risks divergence during future changes. As the reviewer noted, the FunctionKind enum or body_meta already encodes this information and could be checked directly in the VM's Call handler.
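
One way to avoid the two-field sync problem, sketched in Python rather than the crate's Rust, is to derive the flag from the metadata on demand. `LlmMeta` and its `client` field are hypothetical stand-ins for `FunctionMeta::Llm { .. }`; nothing here reflects the actual struct layout in `bex_vm_types`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LlmMeta:
    """Stand-in for FunctionMeta::Llm { .. }; the field is hypothetical."""

    client: str


@dataclass
class Function:
    name: str
    body_meta: Optional[LlmMeta] = None

    @property
    def trace(self) -> bool:
        # Derived on demand, so it cannot drift out of sync with body_meta.
        return isinstance(self.body_meta, LlmMeta)


llm_fn = Function("SummarizeInfo", LlmMeta(client="example-client"))
plain_fn = Function("helper")
```

Because `trace` is computed rather than stored, changing `body_meta` later is automatically reflected in the flag.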

@aaronvg aaronvg dismissed antoniosarosi’s stale review February 17, 2026 00:02

Addressed Antonio's feedback (but need to dismiss the review to be able to merge).

@aaronvg aaronvg enabled auto-merge February 17, 2026 00:03
    pub fn id(&self, span_id_str: &str) -> Option<FunctionLog> {
        self.inner.id(span_id_str)
    }
}

Unused bridge_cffi Collector wrapper is dead code

Low Severity

The bridge_cffi::collector::Collector wrapper struct is exported but never imported or used anywhere in the codebase. The bridge_python crate wraps bex_events::Collector directly via its own types::collector::Collector. This entire file is dead code that adds unnecessary indirection.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


@property
def calls(self):
    return [_wrap_log(c) for c in self._inner.calls]

FunctionLog.calls wraps LLMCall objects with wrong type

High Severity

The FunctionLog.calls property passes each LLMCall from the Rust layer through _wrap_log, which wraps it in a Python FunctionLog. Since LLMCall lacks id, result, and tags getters, accessing those properties on items returned by calls raises AttributeError at runtime. The items from self._inner.calls are LLMCall objects and need to remain unwrapped or be wrapped in an appropriate LLMCall wrapper.
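
A possible shape for the second option (a dedicated wrapper) is sketched below. It is an assumption-laden illustration, not the project's code: the `LLMCall` wrapper simply delegates to the underlying object, the Rust-side objects are faked with `SimpleNamespace`, and the `model` attribute is invented for the demo.

```python
from types import SimpleNamespace


class LLMCall:
    """Hypothetical wrapper that exposes only what the underlying LLMCall offers."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Delegate lookups; attributes the inner object lacks (id, result, tags)
        # raise AttributeError from the inner object, as they should.
        return getattr(self._inner, name)


class FunctionLog:
    def __init__(self, inner):
        self._inner = inner

    @property
    def id(self):
        return self._inner.id

    @property
    def calls(self):
        # Wrap each item as an LLMCall instead of as a FunctionLog.
        return [LLMCall(c) for c in self._inner.calls]


# Fake inner objects standing in for the PyO3 types; "model" is an invented attribute.
fake_inner = SimpleNamespace(id="span-1", calls=[SimpleNamespace(model="fake-model")])
log = FunctionLog(fake_inner)
first_call = log.calls[0]
```

Here `first_call.model` resolves through the wrapper, while `first_call.id` fails on the wrapper's real surface rather than on a `FunctionLog` property that was never valid for this object.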

    pub fn id(&self, span_id_str: &str) -> Option<FunctionLog> {
        self.inner.id(span_id_str)
    }
}

Unused bridge_cffi::collector module duplicates bex_events::Collector

Low Severity

The bridge_cffi::collector::Collector struct is a 1:1 delegation wrapper around bex_events::Collector that adds no logic. It's exported as a public module but never imported or used anywhere in the codebase. bridge_python wraps bex_events::Collector directly via its own PyO3 types instead.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

@aaronvg aaronvg added this pull request to the merge queue Feb 17, 2026
Merged via the queue into canary with commit 9215d51 Feb 17, 2026
45 checks passed
@aaronvg aaronvg deleted the aaron/tracing branch February 17, 2026 00:54
antoniosarosi added a commit that referenced this pull request Feb 17, 2026
Integrates 4 canary PRs:
- #3126: MIR analysis soundness (StatementRef, unified walkers)
- #3124: Type variants for type expressions (parse takes Type)
- #3122: InitLocals(n) instruction
- #3107: Full tracing system (bex_events, Collector)

Conflict resolutions:
- llm.baml: keep orchestration loop with panic, update parse() to use get_return_type
- baml_builtins: keep both Enum and Type TypePattern variants, add get_return_type alongside orchestration builtins
- baml_compiler_emit: keep HIDDEN_LLM_BUILTINS removal
- baml_compiler_tir: add both Enum and Type arms in substitute functions
- bex_vm_types: merge both ClientBuild* and CollectorRef re-exports
- llm_render tests: use new 4-arg call_function signature, fix PromptAst FQN
2 participants