Skip to content

Potential perf win: Use SmallVec for parse tree context children #35

@jonhoo

Description

@jonhoo

As discussed on Bluesky, here's one of the LLM-generated performance improvement suggestions from profiling https://github.com/jonhoo/avdl on (synthetic) bigger .avdl files with the ANTLR-generated parser. I'd be happy to test out changes if you don't have a benchmark handy!

Symptom

BaseParserRuleContext::new_parser_ctx accounts for 2.3% of self-time (86/3785 samples). BaseParser::add_context_to_parse_tree (which calls add_childVec::push) adds 0.7% (25 samples). BaseParser::exit_rule adds 1.4% (53 samples). Dropping parse tree nodes (drop_in_place<BaseParserRuleContext<...>>) adds ~0.3% (11 samples). alloc::raw_vec::RawVecInner::finish_grow at 0.3% (11 samples) confirms allocation growth in Vec internals.

Combined, parse tree construction and teardown account for ~5% of total self-time, driven by per-rule heap allocations for the children Vec.

Root cause

BaseParserRuleContext stores children as RefCell<Vec<Rc<...>>> (parser_rule_context.rs:241). The constructor new_parser_ctx (line 448) initializes this with RefCell::new(vec![]) — zero capacity.

Every grammar rule invocation allocates a new BaseParserRuleContext, and the first add_child call triggers a heap allocation for the Vec's backing buffer. For a 1 MB Avro IDL input (~28,000 lines), the parser invokes thousands of grammar rules, each creating a Vec that typically holds 1–5 children.

The allocation pattern is: rule enters → new_parser_ctx creates empty Vec → first child added → Vec allocates (capacity 4) → rule exits. For rules with 1–4 children, this means one allocation and one deallocation per rule invocation that could be avoided entirely.

Affected files

  • runtime/Rust/src/parser_rule_context.rs:241BaseParserRuleContext.children field type
  • runtime/Rust/src/parser_rule_context.rs:448–459new_parser_ctx constructor
  • runtime/Rust/src/parser_rule_context.rs:377–379add_child, remove_last_child
  • runtime/Rust/src/parser_rule_context.rs:411–417get_child, get_child_count (in Tree impl)
  • runtime/Rust/Cargo.toml — add smallvec dependency (if SmallVec approach is chosen)

Suggested fix

Option A: SmallVec (preferred)

Replace Vec<Rc<T>> with SmallVec<[Rc<T>; 4]>:

use smallvec::SmallVec;

pub(crate) children: RefCell<SmallVec<[Rc<<Ctx::Ctx as ParserNodeType<'input>>::Type>; 4]>>,

This stores up to 4 children inline (4 × 8 bytes = 32 bytes on x86-64) without a separate heap allocation. Rules with more than 4 children spill to the heap transparently. SmallVec implements Deref<Target=[T]>, so most call sites (iteration, indexing, len(), get()) require no changes.

A capacity of 4 covers the common cases:

  • Leaf rules (token matches): 1 child (the terminal node)
  • Simple field declarations: 2–3 children (type + name + semicolon)
  • Record/enum bodies: variable, but the body rule itself typically
    has many children (spills to heap, which is fine)

Option B: Vec::with_capacity

If adding a smallvec dependency is undesirable, change the constructor to pre-allocate:

children: RefCell::new(Vec::with_capacity(4)),

This still heap-allocates on every rule invocation but avoids the reallocation on first push. The SmallVec approach is strictly better since it avoids the initial allocation entirely for small child counts.

Trade-offs

  • SmallVec adds a crate dependency (~lightweight, widely used).
  • SmallVec increases the inline size of BaseParserRuleContext by ~32 bytes (for the inline buffer). Since these are Rc-allocated anyway, the size increase is on the heap and has minimal stack impact.
  • Vec::with_capacity(4) is a zero-dependency alternative but only reduces reallocation, not the initial allocation.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions