feat(i18n): add German memory trigger patterns and retrieval triggers #489

Open
Banger455 wants to merge 8 commits into CortexReach:master from Banger455:feat/german-i18n-triggers

Banger455 commented Apr 3, 2026

Summary

Split from #406 – this PR focuses only on German i18n support.

Changes

index.ts

  • Expand AUTO_CAPTURE_EXPLICIT_REMEMBER_RE with German (merk dir, vergiss nicht, nicht vergessen) and English (remember this) patterns
  • Add 5 German trigger regexes to MEMORY_TRIGGERS: explicit-remember, preferences, decisions, personal facts, temporal markers
  • All patterns use \b word boundaries to prevent false positives on compound words (e.g. Zimmermann, Schwimmerin)

src/adaptive-retrieval.ts

  • Add 2 German retrieval trigger patterns to FORCE_RETRIEVE_PATTERNS: conversational recall (erinnerst du dich, weißt du noch) and temporal cues (gestern, neulich, kürzlich)

test/german-i18n-triggers.test.mjs (new)

  • 54 test cases covering: 27 positive capture triggers, 10 compound-word false-positive guards, 12+ retrieval triggers, explicit-remember consistency checks
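
To illustrate why the \b boundaries matter for compound words, the sketch below compares a bare substring pattern with a boundary-anchored one. The constant names and the patterns themselves are illustrative assumptions; the actual MEMORY_TRIGGERS entries in index.ts may look different.

```typescript
// Hypothetical patterns for illustration only; the real MEMORY_TRIGGERS
// entries in index.ts may differ.
const bareImmer: RegExp = /immer/i;
const boundedImmer: RegExp = /\bimmer\b/i;
const boundedMerkDir: RegExp = /\bmerke? dir\b/i;

const samples: string[] = [
  "Herr Zimmermann kommt morgen",      // surname that contains "immer"
  "Die Schwimmerin trainiert täglich", // noun that contains "immer"
  "Ich nutze immer pnpm",              // standalone "immer"
];

// The bare pattern fires on all three samples, including both compounds.
const bareHits: boolean[] = samples.map((s) => bareImmer.test(s));

// The bounded pattern fires only on the standalone word.
const boundedHits: boolean[] = samples.map((s) => boundedImmer.test(s));

// An explicit-remember phrase still matches with boundaries in place.
const merkHit: boolean = boundedMerkDir.test("Merk dir bitte: Deploy ist freitags");
```

One caveat: JavaScript's \b is defined in terms of ASCII word characters, so ü, ä, ö, and ß do not count as word characters. A pattern that begins or ends with one of those letters likely needs a different guard.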

Test plan

Incidental Changes (noted for reviewers)

src/adaptive-retrieval.ts: normalizeQuery()

  • Removed stale blank lines and reordered trailing comment; no logic change.

src/adaptive-retrieval.ts: SKIP_PATTERNS regex flag /i → /iu

  • Required for correct Unicode case-folding on the new German patterns (e.g. ß, ü, ä).
  • Existing CJK patterns are unaffected — none rely on case-insensitive matching.
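
The effect of the flag change can be checked in isolation. The sketch below uses throwaway literals rather than the real SKIP_PATTERNS entries, and the variable names are made up for illustration. It suggests that the u flag mainly changes how ß relates to the capital sharp s (U+1E9E), while umlauts already fold without it.

```typescript
// Capital sharp s (U+1E9E) folds to ß (U+00DF) only in Unicode mode.
const sharpSWithoutU: boolean = /ß/i.test("\u1E9E");
const sharpSWithU: boolean = /ß/iu.test("\u1E9E");

// Umlauts are already matched case-insensitively without the u flag.
const umlautWithoutU: boolean = /ü/i.test("Ü");

// Neither flag combination maps ß to the two-letter spelling "SS",
// so all-caps input written as "HEISST" would need its own alternative.
const doubleSWithU: boolean = /heißt/iu.test("HEISST");
```

If all-caps input such as "HEISST" should be recognized, an explicit alternative like hei(?:ß|ss)t is one possible approach.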

Collaborator

AliceLJY left a comment

The German trigger additions in index.ts are well-crafted (good use of \b word boundaries, solid false-positive guards in tests). However, src/adaptive-retrieval.ts has a critical encoding corruption that breaks existing functionality.

Blocking: UTF-8 mojibake in adaptive-retrieval.ts

The existing CJK characters and emoji in SKIP_PATTERNS and FORCE_RETRIEVE_PATTERNS have been corrupted. The diff shows:

SKIP_PATTERNS — emoji corrupted:

-  /^(...|👍|👎|✅|❌)\s*[.!]?$/i,
+  /^(...|ð|ð|â|â)\s*[.!]?$/i,

SKIP_PATTERNS — CJK corrupted:

-  /^(...|实施|實施|开始|開始|继续|繼續|好的|可以|行)\s*[.!]?$/i,
+  /^(...|宿½|實æ½|å¼å§|éå§|ç»§ç»­|ç¹¼çº|好ç|å¯ä»¥|è¡)\s*[.!]?$/i,

FORCE_RETRIEVE_PATTERNS — CJK corrupted:

-  /(你记得|[你妳]記得|之前|上次|以前|还记得|還記得|提到过|提到過|说过|說過)/i,
+  /(ä½ è®°å¾|[ä½ å¦³]è¨å¾|ä¹å|䏿¬¡|以å|è¿è®°å¾|éè¨å¾|æå°è¿|æå°é|说è¿|說é)/i,

Also corrupted: the comment on L77 ("你记得吗" → "ä½ è®°å¾å"), the full-width question mark on L72/L81 (？ → ï¼), and German umlauts in the new patterns (weißt → weiÃt, früher → frÃ¼her, kürzlich → kÃ¼rzlich).

This is classic UTF-8-as-Latin-1 mojibake. The file was likely saved or committed with wrong encoding. This will break all existing Chinese retrieval triggers and emoji skip patterns at runtime.
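
The failure mode can be reproduced in a few lines. The sketch below reinterprets the UTF-8 bytes of a string as Latin-1 code points and shows that a trigger pattern stops matching afterwards. The helper names and the stand-in pattern are hypothetical, not taken from the repository.

```typescript
// Reinterpret the UTF-8 bytes of a string as Latin-1 code points,
// which is the mis-decoding described above.
function toMojibake(text: string): string {
  const utf8Bytes = new TextEncoder().encode(text);
  return Array.from(utf8Bytes, (byte) => String.fromCharCode(byte)).join("");
}

// Reverse the damage: treat each code point as a byte and decode as UTF-8.
function fromMojibake(garbled: string): string {
  const bytes = Uint8Array.from(garbled, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const original = "weißt du noch";
const garbled = toMojibake(original);

// Hypothetical stand-in for a retrieval trigger pattern.
const trigger = /wei(?:ß|ss)t du noch/i;

const matchesOriginal = trigger.test(original); // the intact phrase matches
const matchesGarbled = trigger.test(garbled);   // the garbled phrase does not
const repaired = fromMojibake(garbled);         // round-trips back to the original
```

The round trip only works while every garbled character still maps to a single byte. Once an editor has dropped or replaced the invisible control characters, the original cannot be recovered this way.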

What needs to happen

  1. Re-save src/adaptive-retrieval.ts with correct UTF-8 encoding. Make sure your editor/git config preserves UTF-8.
  2. Verify the German additions (weißt du noch, früher, kürzlich) also come through as proper UTF-8.
  3. The test file has the same mojibake in test descriptions (the à artifacts), though those are cosmetic since assertions test against the imported functions, not source text.

Non-blocking notes on index.ts (looks good)

  • The German trigger regexes in MEMORY_TRIGGERS are well-structured
  • \b word boundaries correctly prevent Zimmermann/Schwimmerin false positives
  • The scoped immer pattern (immer\s+(?:wenn|daran|denken|merken|beachten)) is a smart improvement over the bare immer suggested in #393
  • Test coverage is thorough (27 positive, 10 negative, 12 retrieval, explicit-remember consistency)

Please fix the encoding issue and force-push. Happy to re-review after.

@Banger455
Author

Thanks for the thorough review @AliceLJY! Fixed in the latest commit — src/adaptive-retrieval.ts has been rewritten with correct UTF-8 encoding. All CJK characters, emoji (👍👎✅❌), and German umlauts (weißt, früher, kürzlich) are now properly encoded. The German retrieval trigger patterns are in place as intended.

Collaborator

AliceLJY left a comment

Thanks for fixing the encoding in the source files (index.ts, adaptive-retrieval.ts) — those are clean now ✅

However, the test file test/german-i18n-triggers.test.mjs still has mojibake:

| Line | Corrupted | Should be |
|------|-----------|-----------|
| 33 | fÃ¼r spÃ¤ter | für später |
| 41 | Ã¼ber CI | über CI |
| 45 | fÃ¼r dev | für dev |
| 193 | WeiÃt du noch | Weißt du noch |
| 229 | erwÃ¤hnt | erwähnt |

Critical: Line 193 tests shouldSkipRetrieval("WeiÃt du noch...") — this passes by coincidence (returns false because it doesn't match ANY pattern, not because the German weißt du noch trigger fired). The test gives false confidence without actually validating the German retrieval trigger.

Please re-save the test file with proper UTF-8 encoding. The fix for the source files worked perfectly — just need to apply the same treatment to the test file.
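
The "passes by coincidence" problem can be sketched as follows. The function and pattern lists are heavily simplified, hypothetical stand-ins; the real shouldSkipRetrieval() in src/adaptive-retrieval.ts very likely contains more logic than this.

```typescript
// Hypothetical, simplified stand-ins for the real module.
const forcePatterns: RegExp[] = [/\bwei(?:ß|ss)t du noch\b/i];
const skipPatterns: RegExp[] = [/^(?:ok|ja|danke)\s*[.!]?$/i];

function shouldSkipRetrievalSketch(query: string): boolean {
  // A force pattern always wins: retrieval must not be skipped.
  if (forcePatterns.some((p) => p.test(query))) return false;
  // Otherwise skip only when a skip pattern matches.
  return skipPatterns.some((p) => p.test(query));
}

const corrupted = "Wei\u00C3\u009Ft du noch, was wir besprochen haben?";
const correct = "Weißt du noch, was wir besprochen haben?";

// Both calls return false, so asserting `=== false` cannot tell them apart.
const corruptedResult = shouldSkipRetrievalSketch(corrupted);
const correctResult = shouldSkipRetrievalSketch(correct);

// A stricter check asserts that a force pattern itself matched.
const forcedOnCorrupted = forcePatterns.some((p) => p.test(corrupted));
const forcedOnCorrect = forcePatterns.some((p) => p.test(correct));
```

A test that checks the force pattern directly, or that uses a query which would otherwise be skipped, fails loudly when the input string is corrupted.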

@rwmjhb
Collaborator

rwmjhb commented Apr 4, 2026

Review: feat(i18n): add German memory trigger patterns and retrieval triggers

Good direction — German users currently get zero capture/retrieval support. The implementation code (index.ts, adaptive-retrieval.ts) looks correct for proper UTF-8 input.

Must Fix

1. Test file has UTF-8 mojibake — tests pass by coincidence

test/german-i18n-triggers.test.mjs contains corrupted strings: fÃ¼r instead of für, WeiÃt instead of Weißt, heiÃt instead of heißt. The shouldCapture("Mein Projekt heiÃt OpenClaw") test actually fails when run directly. Other mojibake tests pass by coincidence (the corrupted suffix doesn't affect the trigger match).

AliceLJY flagged this in their second review — still unfixed.

2. German memories classified as other instead of proper categories

detectCategory() has no German branches. German preferences, facts, and decisions all return "other" → mapped to patterns / working layer. English equivalents get preference / durable. This means German memories have weaker retention and category-aware recall.

Nice to Have

  • intent-analyzer.ts also has no German rules — German queries miss category boost in auto-recall ranking
  • Undocumented /iu flag change on SKIP_PATTERNS regex

@Banger455
Author

Both issues addressed in latest commits:

@AliceLJY: test/german-i18n-triggers.test.mjs encoding fixed (9e56f31). All umlauts/ß/em-dashes are now correct UTF-8. Line 193 Weißt du noch now properly validates the German retrieval trigger.

@rwmjhb: detectCategory() now has German branches in all four categories (1ca6c2b):

  • preference: bevorzuge, ich mag, ich hasse, ich will, ich brauche, lieber, am liebsten
  • decision: entschieden, wir nutzen, ab jetzt, ab sofort, in zukunft, wechseln zu, umsteigen
  • entity: heiße, heißt, mein name
  • fact: ist, sind, hat, haben (inside existing \b group)
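
A minimal sketch of how keyword branches like these could be arranged is shown below. The function name, the branch order, the word-boundary guards, and the exact regexes are assumptions for illustration; the actual detectCategory() is not shown in this thread.

```typescript
type Category = "preference" | "decision" | "entity" | "fact" | "other";

// Hypothetical German-only classifier built from the keywords listed above.
function detectCategorySketch(text: string): Category {
  if (/\bich (?:bevorzuge|mag|hasse|will|brauche)\b|\blieber\b|\bam liebsten\b/i.test(text)) {
    return "preference";
  }
  if (/\bentschieden\b|\bwir nutzen\b|\bab (?:jetzt|sofort)\b|\bin zukunft\b|\bwechseln zu\b|\bumsteigen\b/i.test(text)) {
    return "decision";
  }
  // The u flag is used here because the pattern contains ß.
  if (/\bhei(?:ße|ßt)\b|\bmein name\b/iu.test(text)) {
    return "entity";
  }
  if (/\b(?:ist|sind|hat|haben)\b/i.test(text)) {
    return "fact";
  }
  return "other";
}

const categories: Category[] = [
  "Ich bevorzuge TypeScript",              // preference keyword
  "Wir haben uns für LanceDB entschieden", // decision keyword wins over "haben"
  "Mein Projekt heißt OpenClaw",           // entity keyword
  "Der Server hat 64 GB RAM",              // fact keyword
  "Das Spiel endete unentschieden",        // substring only, no boundary match
].map(detectCategorySketch);
```

Because the fact keywords are very common verbs, the order of the branches matters: in this sketch the more specific categories are tested first.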

Ready for re-review.

Collaborator

rwmjhb left a comment

Review: feat(i18n): add German memory trigger patterns and retrieval triggers

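Addressing the silent loss of memory triggers for German users is the right direction. However, the test file has a UTF-8 encoding problem that needs to be fixed first:

Must Fix

  1. Test file mojibake: several German characters in test/german-i18n-triggers.test.mjs are corrupted: fÃ¼r (should be für), WeiÃt (should be Weißt), erwÃ¤hnt (should be erwähnt). These tests pass only because the strings happen not to match any pattern, not because they actually validate the German triggers. AliceLJY already pointed this out in the 4/4 review.

  2. Category classification: German auto-captured memories are still classified as other and are downgraded to generic working-memory, which weakens the practical effect of this feature.

Scope Drift

  • src/adaptive-retrieval.ts: the blank-line removal and comment reordering in normalizeQuery() are unrelated to German i18n.
  • Changing the SKIP_PATTERNS regex flag from /i to /iu is a functional change that could affect matching of the existing CJK patterns, but the PR description does not mention it.

Please fix the test file encoding so that the German triggers are actually tested.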

@Banger455
Author

Three follow-up commits addressing all review feedback:

9e56f31 fix(i18n): restore correct UTF-8 encoding in test file
All 27 mojibake instances in test/german-i18n-triggers.test.mjs replaced (ü/ä/ö/ß/—). Tests now actually validate German patterns instead of passing by coincidence.

1ca6c2b feat(i18n): add German patterns to detectCategory()
German auto-captured memories are now classified as preference/decision/entity/fact instead of falling through to other. Four pattern groups added to existing branches.

cbf9f2b fix(ci): sync stripEnvelopeMetadata with master — two-step subagent stripping
The branch was forked before a master fix reached stripEnvelopeMetadata(). Synced to the two-step approach (strip prefix, then strip boilerplate lines separately). This unblocks the cli-smoke CI job.

Also updated the PR description with an Incidental Changes section documenting the normalizeQuery() whitespace cleanup and the /i → /iu flag rationale (Unicode case-folding for ß/ü/ä; no impact on existing CJK patterns).

Banger455 closed this Apr 5, 2026
Banger455 reopened this Apr 5, 2026
@Banger455
Author

Follow-up fix after deep review:

fix(ci): restore full stripEnvelopeMetadata - Previous commit used a brace-counter that matched } inside a regex literal as the function closing brace, truncating steps 1-4 (System timestamps, metadata sections, JSON block stripping, blank-line collapse). This restores the complete function with the two-step subagent fix plus all original metadata stripping steps.
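
The brace-counter pitfall can be reproduced with a small scanner. The tooling used in the earlier commit is not shown in this thread, so the function below is a hypothetical reconstruction of the general failure mode rather than the actual script.

```typescript
// Naive scanner: returns the index of the "}" that balances the "{" at openIdx.
// It knows nothing about regex literals, strings, or comments.
function naiveBlockEnd(source: string, openIdx: number): number {
  let depth = 0;
  for (let i = openIdx; i < source.length; i++) {
    if (source[i] === "{") depth++;
    else if (source[i] === "}" && --depth === 0) return i;
  }
  return -1; // no balancing brace found
}

// The function body contains a "}" inside a regex literal.
const snippet = "function strip(s) { return s.replace(/}/g, ''); }";

const reportedEnd = naiveBlockEnd(snippet, snippet.indexOf("{"));
const actualEnd = snippet.lastIndexOf("}");

// The scanner stops at the brace inside the regex, before the real end.
const truncated = reportedEnd < actualEnd;
```

A parser-based approach, or at least a scanner that tracks regex and string literals, avoids cutting the function short like this.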

fix(i18n): use /iu flag on detectCategory entity branch - The entity branch regex contains heiße/heißt which needs Unicode case-folding (/iu) to match all-caps input. Changed /i to /iu on that branch.

- Expand AUTO_CAPTURE_EXPLICIT_REMEMBER_RE with German and English patterns
- Add 5 German trigger regexes to MEMORY_TRIGGERS array
- Covers: remember/merk dir, preferences, decisions, personal facts, temporal markers

Previous commit introduced mojibake due to encoding mismatch in the web editor. This commit restores correct UTF-8 for all CJK characters, emoji, and German umlauts (weißt, früher, kürzlich).

Fix mojibake in test descriptions and assertions: ü/ä/ö/ß/— now correctly encoded. Critical: line 193 "Weißt du noch" now actually validates the German retrieval trigger instead of passing by coincidence.

Add German keywords to all four detectCategory() branches so German memories are classified correctly (preference/decision/entity/fact) instead of falling through to "other". Addresses rwmjhb review feedback.

Banger455 force-pushed the feat/german-i18n-triggers branch from 66a7651 to 852e4f3 on April 5, 2026 20:59
@Banger455
Author

Hey, rebased onto current master and cleaned things up a bit:

  • Dropped the smart-extractor.ts changes entirely — master already has a much better solution with the extracted helper functions, so our two-step regex approach wasn't needed anymore. That also addresses the scope drift concern from @rwmjhb.
  • The detectCategory() German branches, retrieval triggers, and test file are all still there and working fine.
  • All 57 tests pass locally after rebase.

So the diff is now just 3 files (index.ts, adaptive-retrieval.ts, test file) — pure i18n, nothing else.

CI will probably need an "Approve and run" from a maintainer since this is a fork PR. Would appreciate a re-review when you get a chance!

- Add \b word boundaries to `ich will`, `ich mag` in MEMORY_TRIGGERS
  to prevent substring matches (e.g. "Ich willkommen")
- Add \b word boundary to `entschieden` in detectCategory() to prevent
  matching "unentschieden"
- Require `ich` prefix for `bevorzuge` in detectCategory() preference
  branch, consistent with other German patterns
- Extend FORCE_RETRIEVE_PATTERNS with `damals`, `letzte Woche/Zeit`
- Add detectCategory() test coverage for German (preference, decision,
  entity, fact + substring false-positive prevention)
- Add false-positive regression tests (willkommen, unentschieden)
- 69 tests total (was 57), all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Banger455 pushed a commit to Banger455/memory-lancedb-pro that referenced this pull request Apr 6, 2026
Documents OpenClaw instance setup, active agents, PR CortexReach#489 status,
local paths and working rules — persistent context for future sessions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3 participants