fix: filter subagent runtime wrappers from auto-capture #444

Merged
rwmjhb merged 1 commit into CortexReach:master from slj130:fix/filter-subagent-runtime-wrappers
Apr 3, 2026
Conversation

@slj130
Contributor

@slj130 slj130 commented Apr 1, 2026

Summary

This PR prevents smart extraction from storing subagent runtime scaffolding such as [Subagent Context] and [Subagent Task] as long-term memories.

Closes #443.

What changed

  • filter wrapper-only subagent payloads during auto-capture ingress
  • strip runtime wrapper lines in stripEnvelopeMetadata() as a second defense
  • harden the extraction prompt so runtime scaffolding is explicitly excluded
  • add regression coverage for both envelope stripping and the auto-capture path

Why this approach

This is primarily a preprocessing bug, not just a model-quality issue.
A stronger model might fail less often, but wrapper metadata should never reach the extraction stage in the first place.

Tests

  • node --test test/strip-envelope-metadata.test.mjs
  • node test/smart-extractor-branches.mjs


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: effabe284b

index.ts Outdated
Comment on lines +803 to +804
if (AUTO_CAPTURE_RUNTIME_WRAPPER_RE.test(trimmed)) {
return "";
P1: Limit runtime-wrapper drop to metadata-only messages

AUTO_CAPTURE_RUNTIME_WRAPPER_RE matches any text that starts with [Subagent Context] or [Subagent Task] and then stripAutoCaptureRuntimeWrappers() returns an empty string, so normalizeAutoCaptureText() drops the whole message. In agent_end ingestion this will discard legitimate user content whenever wrapper lines are prepended to the same payload (a format already seen in subagent envelopes), causing real facts to never reach extraction. The filter should only remove wrapper-only payloads (or strip wrapper lines) instead of nulling mixed-content messages.
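
A minimal sketch of the line-level alternative suggested above, for illustration only. The names `WRAPPER_LINE_RE` and `stripWrapperLines` are hypothetical and are not taken from the PR; the sketch assumes each wrapper occupies a single line that starts with one of the two tags.

```typescript
// Hypothetical pattern: a single line that begins with one of the two wrapper tags.
// No `g` flag, so repeated .test() calls are stateless.
const WRAPPER_LINE_RE = /^\[(?:Subagent Context|Subagent Task)\].*$/;

// Drop wrapper lines only; keep any real content that shares the payload.
function stripWrapperLines(text: string): string {
  return text
    .split(/\r?\n/)
    .filter((line) => !WRAPPER_LINE_RE.test(line.trim()))
    .join("\n")
    .trim();
}

// A wrapper-only payload collapses to "", so the caller can still skip it.
const wrapperOnly = "[Subagent Context] You are running as a subagent.";

// A mixed payload keeps the factual line that follows the wrapper.
const mixed = "[Subagent Task] Summarize the notes.\nThe user prefers dark mode.";

const wrapperOnlyResult = stripWrapperLines(wrapperOnly);
const mixedResult = stripWrapperLines(mixed);
```

With this shape, an empty result still signals "nothing to capture", while mixed payloads reach extraction with their real content intact.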

@rwmjhb
Collaborator

rwmjhb commented Apr 2, 2026

Review: APPROVE (rebase needed)

Good fix — filtering subagent runtime wrappers from auto-capture directly improves recall quality. The three-layer defense (ingress regex, stripEnvelopeMetadata, prompt guidance) is solid.

Before merge: rebase onto main — the reflection-bypass-hook test failure is pre-existing, unrelated to your changes.

Two things to be aware of (not blocking):

  1. Ingress filter drops entire messages: stripAutoCaptureRuntimeWrappers returns an empty string for the whole message when the first line matches [Subagent Context] or [Subagent Task]. If a message mixes wrapper lines with real user content (e.g., wrapper header + factual statements), the facts are silently lost. Consider stripping only the matching lines instead of the entire message.

  2. stripEnvelopeMetadata only strips first line of multiline wrappers — the regex is line-scoped, so continuation lines from a multi-line [Subagent Context] block survive into the extraction prompt.
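
The second point can be illustrated with a small sketch. Everything here is an assumption for demonstration: `LINE_SCOPED_RE` only approximates the kind of line-scoped pattern described, `stripWrapperBlocks` is a hypothetical helper, and the rule that a wrapper block ends at the next blank line is a guess about the envelope format, not something the PR confirms.

```typescript
// Assumed line-scoped pattern, similar in spirit to the one discussed (not copied from the PR).
const LINE_SCOPED_RE = /^\[(?:Subagent Context|Subagent Task)\].*$/gim;

// Assumption: a wrapper block runs from its tag line up to the next blank line or end of text.
function stripWrapperBlocks(text: string): string {
  const kept: string[] = [];
  let insideWrapper = false;
  for (const line of text.split(/\r?\n/)) {
    if (/^\[(?:Subagent Context|Subagent Task)\]/i.test(line.trim())) {
      insideWrapper = true; // tag line opens a wrapper block
      continue;
    }
    if (insideWrapper && line.trim() === "") {
      insideWrapper = false; // blank line closes the block
      continue;
    }
    if (!insideWrapper) kept.push(line);
  }
  return kept.join("\n").trim();
}

const multiline = [
  "[Subagent Context] You are a subagent.",
  "Depth 1 of 2. Report results to the parent.",
  "",
  "The user's timezone is UTC+8.",
].join("\n");

// Line-scoped stripping removes only the tag line; the continuation line survives.
const lineScoped = multiline.replace(LINE_SCOPED_RE, "").trim();

// Block-scoped stripping removes the tag line and its continuation lines.
const blockScoped = stripWrapperBlocks(multiline);
```

Whether block-level stripping is safe depends on how the runtime actually delimits these wrappers; if real content can follow a tag line without a blank line in between, this approach would drop it.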

Collaborator

@AliceLJY AliceLJY left a comment

Clean three-layer defense against subagent runtime wrapper leaking into durable memory. Reviewed:

index.ts: AUTO_CAPTURE_RUNTIME_WRAPPER_RE + stripAutoCaptureRuntimeWrappers(). The regex correctly matches messages that are entirely wrapper content and returns "" to skip auto-capture. The [\s\S]*$ anchoring is intentional here: at the auto-capture level, if a message starts with [Subagent Context] or [Subagent Task], the whole message is runtime scaffolding and should be discarded. The integration point in stripAutoCaptureInjectedPrefix() is in the right position (after metadata stripping, before further processing).

smart-extractor.ts: stripEnvelopeMetadata() gets a new step 0 that strips wrapper lines (not the whole text) using /gim flags. This is the correct granularity for the extraction stage: preserve real conversation that follows wrapper lines.

extraction-prompts.ts — Explicit LLM instruction to never store runtime scaffolding. Good safety net.

Tests — Both the unit test (strip-envelope-metadata.test.mjs) and the integration test (smart-extractor-branches.mjs) cover the right scenarios: wrapper-only messages get filtered at ingress, wrapper lines get stripped before extraction, and the LLM extraction prompt doesn't see wrapper content.

LGTM. Closes #443 cleanly.

@slj130 slj130 force-pushed the fix/filter-subagent-runtime-wrappers branch from effabe2 to 91e6828 Compare April 2, 2026 13:09
@rwmjhb rwmjhb merged commit 56dcc0a into CortexReach:master Apr 3, 2026
2 of 3 checks passed
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 3, 2026
…se 2)

- Extend stripEnvelopeMetadata() with 8 new patterns:
  <<<EXTERNAL_UNTRUSTED_CONTENT, <<<END EXTERNAL_UNTRUSTED_CONTENT,
  Sender/Conversation info (untrusted metadata), Thread starter,
  Forwarded message context, [Queued messages while agent was busy]
- Add ENVELOPE_NOISE_PATTERNS to noise-filter.ts for pre-embedding guard
- Add memory_store tool guard in tools.ts
- Add 8 regression test cases in strip-envelope-metadata.test.mjs
- Fix PR CortexReach#444 regex bug: subagent wrapper lines now stripped via
  entire-line matching (was leaving boilerplate on same line)

Fixes CortexReach#446
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 3, 2026
…se 2)

- Extend stripEnvelopeMetadata() with 8 new patterns:
  <<<EXTERNAL_UNTRUSTED_CONTENT, <<<END_EXTERNAL_UNTRUSTED_CONTENT,
  Sender/Conversation info (untrusted metadata), Thread starter,
  Forwarded message context, [Queued messages while agent was busy]
- Add ENVELOPE_NOISE_PATTERNS to noise-filter.ts for pre-embedding guard
- Add memory_store tool guard in tools.ts (strip-then-check approach)
- Add 8 regression test cases in strip-envelope-metadata.test.mjs
- Fix PR CortexReach#444 regex bug: subagent wrapper lines now stripped via
  entire-line matching (/^\[Subagent Context|Subagent Task\].*$/gm)
- P1 fix: remove pre-filter from filterNoiseByEmbedding (runs before
  stripEnvelopeMetadata in extraction path, would cause false positives)
- P2 fix: memory_store guard now strips first then checks if empty,
  preserving mixed-content messages

Fixes CortexReach#446
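
A note on the pattern as quoted in the commit message above: without a group, the `|` splits the whole expression, so the alternatives are `^\[Subagent Context` and `Subagent Task\].*$`. The sketch below only exercises the pattern as written in the message (the committed source may differ) and compares it with a hypothetical grouped variant, `GROUPED_RE`, which is not from the repository.

```typescript
// Pattern exactly as quoted in the commit message.
const QUOTED_RE = /^\[Subagent Context|Subagent Task\].*$/gm;

// Hypothetical grouped variant: both tags share the ^\[ prefix and the \].*$ suffix.
const GROUPED_RE = /^\[(?:Subagent Context|Subagent Task)\].*$/gm;

const sample = "[Subagent Context] depth 1\n[Subagent Task] summarize\nReal fact.";

// Quoted pattern: line 1 loses only "[Subagent Context"; line 2 keeps its leading "[".
const withQuoted = sample.replace(QUOTED_RE, "");

// Grouped pattern: both wrapper lines are removed in full.
const withGrouped = sample.replace(GROUPED_RE, "").trim();
```

Here `withQuoted` is `"] depth 1\n[\nReal fact."`, while `withGrouped` is `"Real fact."`, which is the entire-line behavior the commit message describes.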
Development

Successfully merging this pull request may close these issues.

auto-capture should strip subagent runtime wrappers before smart extraction

3 participants