fix: filter subagent runtime wrappers from auto-capture #444
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: effabe284b
index.ts (outdated)
  if (AUTO_CAPTURE_RUNTIME_WRAPPER_RE.test(trimmed)) {
    return "";
Limit runtime-wrapper drop to metadata-only messages
AUTO_CAPTURE_RUNTIME_WRAPPER_RE matches any text that starts with [Subagent Context] or [Subagent Task] and then stripAutoCaptureRuntimeWrappers() returns an empty string, so normalizeAutoCaptureText() drops the whole message. In agent_end ingestion this will discard legitimate user content whenever wrapper lines are prepended to the same payload (a format already seen in subagent envelopes), causing real facts to never reach extraction. The filter should only remove wrapper-only payloads (or strip wrapper lines) instead of nulling mixed-content messages.
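The difference between the two behaviors described in this comment can be sketched as follows. This is an illustration only: the regexes and the function names `dropIfWrapperPrefixed` and `stripWrapperLines` are assumptions made for the example, not the code under review, and the sketch assumes each wrapper tag occupies a single line.

```typescript
// Illustrative sketch only: names and regexes are assumptions, not the PR's code.

// Whole-message match: any text that starts with a wrapper tag is dropped entirely.
const WRAPPER_PREFIX_RE = /^\[(?:Subagent Context|Subagent Task)\][\s\S]*$/i;

// Line-level match: only lines that begin with a wrapper tag are removed.
const WRAPPER_LINE_RE = /^\[(?:Subagent Context|Subagent Task)\].*$/gim;

// Behavior flagged in the review: mixed-content messages are nulled.
function dropIfWrapperPrefixed(text: string): string {
  const trimmed = text.trim();
  return WRAPPER_PREFIX_RE.test(trimmed) ? "" : trimmed;
}

// Suggested alternative: strip wrapper lines, keep whatever else remains.
function stripWrapperLines(text: string): string {
  return text.replace(WRAPPER_LINE_RE, "").trim();
}

const wrapperOnly = "[Subagent Context] You are running as a subagent.";
const mixed = "[Subagent Task] Summarize the thread.\nThe user prefers dark mode.";

console.log(JSON.stringify(dropIfWrapperPrefixed(mixed)));   // ""
console.log(JSON.stringify(stripWrapperLines(mixed)));       // "The user prefers dark mode."
console.log(JSON.stringify(stripWrapperLines(wrapperOnly))); // ""
```

With the line-level variant, a wrapper-only payload still collapses to an empty string, so a caller can skip it, while a mixed payload keeps its real content.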
Review: APPROVE (rebase needed)
Good fix: filtering subagent runtime wrappers from auto-capture directly improves recall quality. The three-layer defense (ingress regex,
Before merge: rebase onto
Two things to be aware of (not blocking):
AliceLJY left a comment
Clean three-layer defense against subagent runtime wrapper leaking into durable memory. Reviewed:
index.ts — AUTO_CAPTURE_RUNTIME_WRAPPER_RE + stripAutoCaptureRuntimeWrappers(): The regex correctly matches messages that are entirely wrapper content and returns "" to skip auto-capture. The [\s\S]*$ anchoring is intentional here — at the auto-capture level, if a message starts with [Subagent Context] or [Subagent Task], the whole message is runtime scaffolding and should be discarded. Integration point in stripAutoCaptureInjectedPrefix() is in the right position (after metadata stripping, before further processing).
smart-extractor.ts — stripEnvelopeMetadata() gets a new step 0 that strips wrapper lines (not the whole text) using /gim flags. This is the correct granularity for the extraction stage — preserve real conversation that follows wrapper lines.
extraction-prompts.ts — Explicit LLM instruction to never store runtime scaffolding. Good safety net.
Tests — Both the unit test (strip-envelope-metadata.test.mjs) and the integration test (smart-extractor-branches.mjs) cover the right scenarios: wrapper-only messages get filtered at ingress, wrapper lines get stripped before extraction, and the LLM extraction prompt doesn't see wrapper content.
LGTM. Closes #443 cleanly.
effabe2 to 91e6828
…se 2)
- Extend stripEnvelopeMetadata() with 8 new patterns: <<<EXTERNAL_UNTRUSTED_CONTENT, <<<END EXTERNAL_UNTRUSTED_CONTENT, Sender/Conversation info (untrusted metadata), Thread starter, Forwarded message context, [Queued messages while agent was busy]
- Add ENVELOPE_NOISE_PATTERNS to noise-filter.ts for pre-embedding guard
- Add memory_store tool guard in tools.ts
- Add 8 regression test cases in strip-envelope-metadata.test.mjs
- Fix PR CortexReach#444 regex bug: subagent wrapper lines now stripped via entire-line matching (was leaving boilerplate on same line)

Fixes CortexReach#446
…se 2)
- Extend stripEnvelopeMetadata() with 8 new patterns: <<<EXTERNAL_UNTRUSTED_CONTENT, <<<END_EXTERNAL_UNTRUSTED_CONTENT, Sender/Conversation info (untrusted metadata), Thread starter, Forwarded message context, [Queued messages while agent was busy]
- Add ENVELOPE_NOISE_PATTERNS to noise-filter.ts for pre-embedding guard
- Add memory_store tool guard in tools.ts (strip-then-check approach)
- Add 8 regression test cases in strip-envelope-metadata.test.mjs
- Fix PR CortexReach#444 regex bug: subagent wrapper lines now stripped via entire-line matching (/^\[Subagent Context|Subagent Task\].*$/gm)
- P1 fix: remove pre-filter from filterNoiseByEmbedding (runs before stripEnvelopeMetadata in extraction path, would cause false positives)
- P2 fix: memory_store guard now strips first then checks if empty, preserving mixed-content messages

Fixes CortexReach#446
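One detail worth checking against the actual source: taken literally, the pattern quoted in the commit message has an ungrouped alternation, so it parses as `^\[Subagent Context` OR `Subagent Task\].*$` rather than as a single whole-line match. The commit message may simply abbreviate the real regex. The sketch below compares the quoted form with a grouped variant; `GROUPED_RE` and the sample text are hypothetical and not taken from the repository.

```typescript
// The pattern exactly as quoted in the commit message; the repository's code may differ.
const QUOTED_RE = /^\[Subagent Context|Subagent Task\].*$/gm;

// Hypothetical grouped variant: the alternation is scoped inside the brackets,
// so the whole line is matched for either tag.
const GROUPED_RE = /^\[(?:Subagent Context|Subagent Task)\].*$/gm;

const sample =
  "[Subagent Context] depth=1\n[Subagent Task] Summarize.\nUser lives in Oslo.";

// Ungrouped: line 1 loses only "[Subagent Context"; line 2 keeps its leading "[".
const withQuoted = sample.replace(QUOTED_RE, "");

// Grouped: both wrapper lines are emptied, and trimming leaves the real content.
const withGrouped = sample.replace(GROUPED_RE, "").trim();

console.log(JSON.stringify(withQuoted));  // "] depth=1\n[\nUser lives in Oslo."
console.log(JSON.stringify(withGrouped)); // "User lives in Oslo."
```

If the repository's regex is already grouped, this is a non-issue; a regression case containing both tags on separate lines would confirm it either way.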
Summary
This PR prevents smart extraction from storing subagent runtime scaffolding such as [Subagent Context] and [Subagent Task] as long-term memories.
Closes #443.
What changed
stripEnvelopeMetadata() as a second defense
Why this approach
This is primarily a preprocessing bug, not just a model-quality issue.
A stronger model might fail less often, but wrapper metadata should never reach the extraction stage in the first place.
Tests
node --test test/strip-envelope-metadata.test.mjs
node test/smart-extractor-branches.mjs