Skip to content

fix: stop session summaries polluting LanceDB tools#486

Open
AliceLJY wants to merge 1 commit intofix/subagent-wrapper-continuation-linesfrom
fix/485-stop-session-summary-pollution
Open

fix: stop session summaries polluting LanceDB tools#486
AliceLJY wants to merge 1 commit intofix/subagent-wrapper-continuation-linesfrom
fix/485-stop-session-summary-pollution

Conversation

@AliceLJY
Copy link
Copy Markdown
Collaborator

@AliceLJY AliceLJY commented Apr 3, 2026

Summary

  • stop systemSessionMemory from writing session-summary rows into the main LanceDB store
  • filter legacy session-summary rows from memory_recall and memory_list
  • add regression coverage for /new handling and user-facing tool filtering

Testing

  • node --test test/session-summary-before-reset.test.mjs
  • node --test test/recall-text-cleanup.test.mjs
  • node test/plugin-manifest-regression.mjs

Fixes #485.

@github-actions github-actions bot requested a review from rwmjhb April 3, 2026 13:43
@AliceLJY AliceLJY force-pushed the fix/485-stop-session-summary-pollution branch 2 times, most recently from a20ad65 to a66a8bc Compare April 3, 2026 16:31
@AliceLJY
Copy link
Copy Markdown
Collaborator Author

AliceLJY commented Apr 3, 2026

Cleaned up the branch and force-pushed to keep this PR scoped to #485 only. Removed one unrelated commit so the diff now contains just the session-summary filtering fix and its regression coverage.

Re-ran the targeted checks:

  • node --test test/session-summary-before-reset.test.mjs
  • node --test test/recall-text-cleanup.test.mjs
  • node test/plugin-manifest-regression.mjs

@AliceLJY AliceLJY changed the base branch from master to fix/subagent-wrapper-continuation-lines April 3, 2026 16:44
@AliceLJY
Copy link
Copy Markdown
Collaborator Author

AliceLJY commented Apr 3, 2026

Retargeted this PR onto #495 so the session-summary fix stays cleanly scoped while the current subagent-wrapper regression is fixed in a separate baseline PR.

Once #495 merges, this PR can be retargeted back to master with the same single-purpose diff.

@AliceLJY AliceLJY force-pushed the fix/485-stop-session-summary-pollution branch from a66a8bc to 4f77407 Compare April 3, 2026 16:45
@AliceLJY AliceLJY force-pushed the fix/485-stop-session-summary-pollution branch from 4f77407 to c61a9e0 Compare April 3, 2026 16:47
Copy link
Copy Markdown
Collaborator Author

@AliceLJY AliceLJY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review passed. Summary:

Write path removed: deleted storeSystemSessionSummary + complex before_reset handler (-108 lines). Replaced with noop logger.

Read-time filtering for legacy data:

  • isSessionSummaryEntry() — metadata type check
  • filterSessionSummaryResults() — post-filter for recall
  • listVisibleEntries() — batch over-fetch + re-page so hidden rows don't consume visible offsets

Recall over-fetch buffer: min(includeFullText ? 40 : 24, max(safeLimit*4, safeLimit+8)).slice(0, safeLimit)

Tests: recall filtering, list pagination with interleaved session-summary rows, offset handling.

Fixes #485. mergeStateStatus: CLEAN.

@rwmjhb ready for your review.

@rwmjhb
Copy link
Copy Markdown
Collaborator

rwmjhb commented Apr 4, 2026

Review: fix: stop session summaries polluting LanceDB tools

Good problem — session summaries shouldn't appear in memory_recall / memory_list results. The filtering approach is reasonable.

Must Fix

1. systemSessionMemory silently becomes a no-op

The PR removes session-memory entries from tool results, but also appears to disable the systemSessionMemory pathway. Legacy configurations and plugin-only setups that rely on systemSessionMemory to inject context into prompts will silently lose that behavior.

Should Fix

  • Over-fetch neutralized by retriever cap: memory_recall tries to over-fetch by 20% to compensate for filtered session-summary rows, but src/retriever.ts clamps limits to 1..20. At high limits, the headroom is zero — session-summary rows in top-20 directly reduce visible results.
  • listVisibleEntries performance: the loop repeatedly calls store.list() which does a full-table scan + in-memory sort each iteration. O(iterations × full-scan) on polluted stores.

Also

  • Rebase needed (stale base)

@AliceLJY
Copy link
Copy Markdown
Collaborator Author

AliceLJY commented Apr 4, 2026

Thanks for the detailed review! All points addressed:

Must Fix — systemSessionMemory no-op: Restored the full storeSystemSessionSummary + before_reset handler. Session summaries are now stored as before (for auto-recall context), but filtered from memory_recall / memory_list output — which is the actual pollution fix.

Should Fix — Over-fetch vs retriever cap: Good catch. The retriever clamps to [1, 20], so requesting 40 was dead code. Changed to Math.min(20, safeLimit + 4) — stays within the cap while still leaving headroom for filtered session-summary rows.

Should Fix — listVisibleEntries performance: Replaced the iterative while loop (O(iterations × full-scan)) with a single over-fetch: (offset + limit) * 2 + 20 in one store.list() call, then filter + slice. One scan instead of N.

Also: Rebased onto current upstream/master. All tests green.

Copy link
Copy Markdown
Collaborator

@rwmjhb rwmjhb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix: stop session summaries polluting LanceDB tools

session-summary 污染 LanceDB 导致数据库膨胀和 recall 结果质量下降,这是高价值修复。

Must Fix

  1. systemSessionMemory 被静默禁用: 当前实现把 systemSessionMemory 变成了 no-op,会 break 依赖这个 strategy 的 legacy 和 plugin-only 配置。应该保留功能但修复写入路径,而不是直接禁用。

  2. Build failure: stale_base=true,rebase 后应该能解决。

Notes

  • PR comment 提到 retarget 到 #495——base branch 是否正确?
  • 已有的污染数据没有清理方案,建议在后续 PR 中加 migration。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sessionStrategy: "systemSessionMemory" 会把无用信息写进lancedb数据库

2 participants