
auto-capture should strip subagent runtime wrappers before smart extraction #443

@slj130

Description

Summary

auto-capture can store subagent runtime wrappers such as [Subagent Context] and [Subagent Task] as long-term memories instead of filtering them out before smart extraction.

Impact

This pollutes both LanceDB and the md mirror with execution scaffolding that should never be treated as user memory.

Observed side effects:

  • noisy recall results
  • lower-quality smart extraction artifacts
  • wrapper text persisted as source=auto-capture
  • lightweight extraction models are more likely to store the wrapper verbatim

Reproduction

A subagent was given a wrapper-prefixed task similar to:

[Subagent Context] You are running as a subagent (depth 1/1)...
[Subagent Task] Reply with a brief acknowledgment only. Facts for automatic memory extraction quality test: ...

After auto-capture ran, one of the resulting memories was effectively the wrapper text itself, including [Subagent Context] / [Subagent Task], instead of only storing the actual user facts.

Expected behavior

Runtime orchestration wrappers should be treated as execution metadata and removed before smart extraction. They should never be persisted as memories.

Root cause

There are two filtering gaps:

  1. normalizeAutoCaptureText() / stripAutoCaptureInjectedPrefix() do not strip subagent runtime wrappers.
  2. stripEnvelopeMetadata() strips channel envelope metadata, but does not strip subagent/task runtime wrapper lines.

As a result, wrapper text can survive all the way into the extraction prompt.
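For illustration, the extractor-side fallback could be a line-level filter that runs alongside the existing envelope stripping. This is a minimal sketch, not the project's code: the helper name stripRuntimeWrapperLines and the tag regex are hypothetical, and the sketch assumes wrappers always appear at the start of a line as [Subagent Context] or [Subagent Task].

```typescript
// Hypothetical pattern: covers only the two wrapper tags observed in this issue.
// Anchored at line start so text that merely mentions a tag mid-line is kept.
const RUNTIME_WRAPPER_LINE = /^\s*\[Subagent (?:Context|Task)\]/;

/**
 * Remove every line that begins with a subagent runtime wrapper tag.
 * Intended as a second line of defense next to envelope-metadata stripping.
 */
function stripRuntimeWrapperLines(text: string): string {
  return text
    .split("\n")
    .filter((line) => !RUNTIME_WRAPPER_LINE.test(line))
    .join("\n")
    .trim();
}
```

Removing whole lines matches the proposed fix below. If real user facts can share a line with a wrapper tag (as in the reproduction, where the facts follow [Subagent Task]), stripping only the tag prefix might be the safer variant.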

Proposed fix

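Add a double guard (ingress filtering plus an extractor-side fallback), backed by prompt hardening and regression coverage: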

  1. Auto-capture ingress filtering

    • drop messages that are runtime wrapper payloads such as [Subagent Context]... / [Subagent Task]...
  2. Extractor-side fallback filtering

    • strip runtime wrapper lines in stripEnvelopeMetadata() as a second defense
  3. Prompt hardening

    • explicitly tell the extraction model not to store runtime scaffolding / orchestration wrappers as memories
  4. Regression coverage

    • add tests proving wrapper lines are removed from extractor input
    • add an auto-capture regression showing wrapper text does not reach the extracted memory
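A sketch of the ingress guard from item 1, assuming captured messages can be modeled as a minimal { role, text } record (the real auto-capture message shape isn't shown in this issue). CapturedMessage, isRuntimeWrapperPayload, and filterAutoCaptureInput are hypothetical names, and the sketch follows the issue's "drop" wording rather than stripping wrapper lines in place.

```typescript
// Hypothetical message shape; the project's actual type may differ.
interface CapturedMessage {
  role: "user" | "assistant";
  text: string;
}

// Assumption: wrapper payloads begin (after optional whitespace) with one of these tags.
const WRAPPER_PREFIX = /^\s*\[Subagent (?:Context|Task)\]/;

/** True if the message is a runtime wrapper payload rather than conversation content. */
function isRuntimeWrapperPayload(message: CapturedMessage): boolean {
  return WRAPPER_PREFIX.test(message.text);
}

/** Ingress guard: drop wrapper payloads before anything reaches smart extraction. */
function filterAutoCaptureInput(messages: CapturedMessage[]): CapturedMessage[] {
  return messages.filter((m) => !isRuntimeWrapperPayload(m));
}
```

Checking only the start of the message keeps ordinary conversation that merely mentions a tag (for example, a user asking what [Subagent Task] means) from being dropped.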

Notes

This looks primarily like a preprocessing bug rather than a model-selection problem. A stronger model may fail less often, but the root issue is that orchestration metadata is being passed downstream as if it were conversation content.

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests