auto-capture should strip subagent runtime wrappers before smart extraction #443
Description
Summary
auto-capture can store subagent runtime wrappers such as [Subagent Context] and [Subagent Task] as long-term memories instead of filtering them out before smart extraction.
Impact
This pollutes both LanceDB and the md mirror with execution scaffolding that should never be treated as user memory.
Observed side effects:
- noisy recall results
- lower-quality smart extraction artifacts
- wrapper text persisted as source=auto-capture
- lightweight extraction models are more likely to store the wrapper verbatim
Reproduction
A subagent was given a wrapper-prefixed task similar to:
[Subagent Context] You are running as a subagent (depth 1/1)...
[Subagent Task] Reply with a brief acknowledgment only. Facts for automatic memory extraction quality test: ...
After auto-capture ran, one of the resulting memories was effectively the wrapper text itself, including [Subagent Context] / [Subagent Task], instead of only storing the actual user facts.
Expected behavior
Runtime orchestration wrappers should be treated as execution metadata and removed before smart extraction. They should never be persisted as memories.
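As a minimal sketch of that expected behavior, the snippet below drops wrapper lines from a text block before it would be handed to extraction. The helper name stripRuntimeWrapperLines and the marker pattern are hypothetical, not the project's actual code. Only the two markers named in this issue are covered, and the final sample line is made-up illustration data. The sketch assumes a wrapper occupies a whole line, so anything on the same line as a marker is dropped with it.

```typescript
// Hypothetical pattern: matches lines that begin with one of the two
// wrapper markers named in this issue. Other markers are not covered.
const WRAPPER_LINE = /^\s*\[Subagent (?:Context|Task)\]/;

// Hypothetical helper (not the project's implementation).
// Removes every line that starts with a subagent wrapper marker.
function stripRuntimeWrapperLines(text: string): string {
  return text
    .split(/\r?\n/)
    .filter((line) => !WRAPPER_LINE.test(line))
    .join("\n")
    .trim();
}

// Illustrative input: two wrapper lines followed by one made-up user fact.
const sampleInput = [
  "[Subagent Context] You are running as a subagent (depth 1/1)...",
  "[Subagent Task] Reply with a brief acknowledgment only.",
  "The user prefers dark mode.",
].join("\n");

console.log(stripRuntimeWrapperLines(sampleInput));
// prints "The user prefers dark mode."
```

If real payloads put user facts on the same line as [Subagent Task], a variant that removes only the marker and keeps the rest of the line may be a better fit; that choice depends on how the runtime formats its wrappers.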
Root cause
There are two filtering gaps:
- normalizeAutoCaptureText() / stripAutoCaptureInjectedPrefix() do not strip subagent runtime wrappers.
- stripEnvelopeMetadata() strips channel envelope metadata, but does not strip subagent/task runtime wrapper lines.
So wrapper text can survive all the way into the extraction prompt.
Proposed fix
Add a double guard (ingress and extractor-side filtering), backed by prompt hardening and regression coverage:
- Auto-capture ingress filtering
  - drop messages that are runtime wrapper payloads such as [Subagent Context]... / [Subagent Task]...
- Extractor-side fallback filtering
  - strip runtime wrapper lines in stripEnvelopeMetadata() as a second defense
- Prompt hardening
  - explicitly tell the extraction model not to store runtime scaffolding / orchestration wrappers as memories
- Regression coverage
  - add tests proving wrapper lines are removed from extractor input
  - add an auto-capture regression showing wrapper text does not reach the extracted memory
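The ingress guard and the regression check from the list above could look roughly like the sketch below. All names here (isRuntimeWrapperPayload, filterAutoCaptureMessages) are hypothetical and are not taken from the project. The sketch assumes messages arrive as plain strings and treats a message as a wrapper payload only when every non-empty line starts with a wrapper marker. The second sample message is made-up illustration data.

```typescript
// Hypothetical patterns covering only the two markers named in this issue.
const WRAPPER_LINE_START = /^\[Subagent (?:Context|Task)\]/;
const ANY_WRAPPER_MARKER = /\[Subagent (?:Context|Task)\]/;

// Hypothetical predicate: true when every non-empty line of the message
// starts with a wrapper marker, i.e. the message is pure runtime scaffolding.
function isRuntimeWrapperPayload(message: string): boolean {
  const lines = message
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length > 0 && lines.every((line) => WRAPPER_LINE_START.test(line));
}

// Hypothetical ingress guard: drop wrapper-only messages before they are
// passed on to smart extraction.
function filterAutoCaptureMessages(messages: string[]): string[] {
  return messages.filter((message) => !isRuntimeWrapperPayload(message));
}

// Regression-style check: no wrapper marker may survive into extractor input.
const captured = filterAutoCaptureMessages([
  "[Subagent Context] You are running as a subagent (depth 1/1)...\n" +
    "[Subagent Task] Reply with a brief acknowledgment only.",
  "My favorite editor is Vim.",
]);
const leaked = captured.some((message) => ANY_WRAPPER_MARKER.test(message));

console.log(captured.length, leaked);
// prints "1 false"
```

Messages that mix wrapper lines with ordinary content pass this ingress guard unchanged, which is why the extractor-side line stripping is still needed as the second defense.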
Notes
This looks like a preprocessing bug first, not primarily a model-selection problem. A stronger model may fail less often, but the root issue is that orchestration metadata is being passed downstream as if it were conversation content.