-
Notifications
You must be signed in to change notification settings - Fork 3
Potential perf win: Use SmallVec for parse tree context children #35
Description
As discussed on Bluesky, here's one of the LLM-generated performance improvement suggestions from profiling https://github.com/jonhoo/avdl on (synthetic) bigger .avdl files with the ANTLR-generated parser. I'd be happy to test out changes if you don't have a benchmark handy!
Symptom
BaseParserRuleContext::new_parser_ctx accounts for 2.3% of self-time (86/3785 samples). BaseParser::add_context_to_parse_tree (which calls add_child → Vec::push) adds 0.7% (25 samples). BaseParser::exit_rule adds 1.4% (53 samples). Dropping parse tree nodes (drop_in_place<BaseParserRuleContext<...>>) adds ~0.3% (11 samples). alloc::raw_vec::RawVecInner::finish_grow at 0.3% (11 samples) confirms allocation growth in Vec internals.
Combined, parse tree construction and teardown account for ~5% of total self-time, driven by per-rule heap allocations for the children Vec.
Root cause
BaseParserRuleContext stores children as RefCell<Vec<Rc<...>>> (parser_rule_context.rs:241). The constructor new_parser_ctx (line 448) initializes this with RefCell::new(vec![]) — zero capacity.
Every grammar rule invocation allocates a new BaseParserRuleContext, and the first add_child call triggers a heap allocation for the Vec's backing buffer. For a 1 MB Avro IDL input (~28,000 lines), the parser invokes thousands of grammar rules, each creating a Vec that typically holds 1–5 children.
The allocation pattern is: rule enters → new_parser_ctx creates empty Vec → first child added → Vec allocates (capacity 4) → rule exits. For rules with 1–4 children, this means one allocation and one deallocation per rule invocation that could be avoided entirely.
Affected files
runtime/Rust/src/parser_rule_context.rs:241—BaseParserRuleContext.childrenfield typeruntime/Rust/src/parser_rule_context.rs:448–459—new_parser_ctxconstructorruntime/Rust/src/parser_rule_context.rs:377–379—add_child,remove_last_childruntime/Rust/src/parser_rule_context.rs:411–417—get_child,get_child_count(inTreeimpl)runtime/Rust/Cargo.toml— addsmallvecdependency (if SmallVec approach is chosen)
Suggested fix
Option A: SmallVec (preferred)
Replace Vec<Rc<T>> with SmallVec<[Rc<T>; 4]>:
use smallvec::SmallVec;
pub(crate) children: RefCell<SmallVec<[Rc<<Ctx::Ctx as ParserNodeType<'input>>::Type>; 4]>>,This stores up to 4 children inline (4 × 8 bytes = 32 bytes on x86-64) without a separate heap allocation. Rules with more than 4 children spill to the heap transparently. SmallVec implements Deref<Target=[T]>, so most call sites (iteration, indexing, len(), get()) require no changes.
A capacity of 4 covers the common cases:
- Leaf rules (token matches): 1 child (the terminal node)
- Simple field declarations: 2–3 children (type + name + semicolon)
- Record/enum bodies: variable, but the body rule itself typically
has many children (spills to heap, which is fine)
Option B: Vec::with_capacity
If adding a smallvec dependency is undesirable, change the constructor to pre-allocate:
children: RefCell::new(Vec::with_capacity(4)),This still heap-allocates on every rule invocation but avoids the reallocation on first push. The SmallVec approach is strictly better since it avoids the initial allocation entirely for small child counts.
Trade-offs
- SmallVec adds a crate dependency (~lightweight, widely used).
- SmallVec increases the inline size of
BaseParserRuleContextby ~32 bytes (for the inline buffer). Since these are Rc-allocated anyway, the size increase is on the heap and has minimal stack impact. Vec::with_capacity(4)is a zero-dependency alternative but only reduces reallocation, not the initial allocation.