feat(memory): add access tracking for synaptic retrieval (ADR-005 Phase 1) #183
Conversation
Propose integrating SynapticRAG (Hou et al., ACL 2025) concepts into Hrafn's memory retrieval pipeline: temporal association scoring, stimulus propagation on the knowledge graph, and LIF-based dynamic memory selection. Four-phase implementation plan with risk tiers. https://claude.ai/code/session_01TwL6amvP6YXc6Ht918fvaA
Address CodeRabbit review feedback:

- Add language identifiers to all fenced code blocks (MD040)
- Add blank lines around lists (MD032)
- Fix table column alignment (MD060)
- Wrap lines to <=80 characters (MD013)
- Clarify that disabling w_temp triggers explicit renormalization of w_vec/w_kw with a documented fallback formula

https://claude.ai/code/session_01TwL6amvP6YXc6Ht918fvaA
- Add KnowledgeNode-to-MemoryEntry bridge (memory_kg_links table) to resolve the critical entity-linking gap in Phase 3
- Clarify query-relative vs pairwise temporal scoring (Phase 2 vs Phase 3)
- Document hybrid_merge API signature change and call-site impact
- Add weight migration path: w_temp defaults to 0.0 for existing configs, all weights renormalized to sum to 1.0 when enabled
- Note cache_key prerequisite fix (missing since/until params)
- Add temporal_candidate_cap (default 20) to bound O(n^2) cost
- Align dataset names with paper (SMRCs-EN/JA, PerLTQA-EN/CN)

https://claude.ai/code/session_01TwL6amvP6YXc6Ht918fvaA
Correct weight defaults (0.7/0.3, not 0.5/0.3), fix hybrid_merge call site list (only sqlite.rs), update fallback weights, and add context about the existing audit infrastructure that Phase 1 extends. https://claude.ai/code/session_01XpVcqwPDSLcnefFqrJBJdN
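The weight migration path described in these commits (defaults `w_vec = 0.7`, `w_kw = 0.3`, `w_temp = 0.0`, all renormalized to sum to 1.0 when temporal scoring is enabled) can be sketched as follows. The function shape and tuple return are illustrative, not Hrafn's actual scoring config API:

```rust
// Hypothetical sketch of the weight renormalization described above:
// w_temp defaults to 0.0 for existing configs, so defaults pass through
// unchanged; enabling w_temp rescales all three weights to sum to 1.0.
fn renormalize(w_vec: f64, w_kw: f64, w_temp: f64) -> (f64, f64, f64) {
    let sum = w_vec + w_kw + w_temp;
    assert!(sum > 0.0, "at least one weight must be positive");
    (w_vec / sum, w_kw / sum, w_temp / sum)
}

fn main() {
    // Existing configs: w_temp = 0.0 leaves the 0.7/0.3 defaults intact.
    let (v, k, t) = renormalize(0.7, 0.3, 0.0);
    assert!((v - 0.7).abs() < 1e-12 && (k - 0.3).abs() < 1e-12 && t == 0.0);

    // Enabling w_temp renormalizes all three weights to sum to 1.0.
    let (v, k, t) = renormalize(0.7, 0.3, 0.5);
    assert!(((v + k + t) - 1.0).abs() < 1e-12);
    println!("ok");
}
```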
cache_key() omitted the since/until time-range parameters, causing two recall queries with different time ranges to share a cache entry. Add both parameters to the key format string and add test coverage. Prerequisite for ADR-005 Phase 1 (synaptic memory retrieval). https://claude.ai/code/session_01XpVcqwPDSLcnefFqrJBJdN
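A minimal sketch of the fixed behavior, assuming a free-function shape for `cache_key()` (the real method lives on `RetrievalPipeline` and its exact signature may differ). Field delimiting is naively colon-joined here purely to show the time-range inclusion; a later commit in this PR replaces that with collision-safe length-prefixed encoding:

```rust
// Hypothetical sketch: the cache key must incorporate the optional
// since/until bounds so that two recalls differing only in time range
// never share a cache entry. Names mirror the PR description, not real code.
fn cache_key(
    query: &str,
    limit: usize,
    session_id: Option<&str>,
    namespace: Option<&str>,
    since: Option<i64>,
    until: Option<i64>,
) -> String {
    format!(
        "{}:{}:{}:{}:{}:{}",
        query,
        limit,
        session_id.unwrap_or("-"),
        namespace.unwrap_or("-"),
        since.map(|t| t.to_string()).unwrap_or_else(|| "-".into()),
        until.map(|t| t.to_string()).unwrap_or_else(|| "-".into()),
    )
}

fn main() {
    // Same query, different time windows: keys must differ.
    let a = cache_key("q", 5, None, None, Some(100), None);
    let b = cache_key("q", 5, None, None, None, Some(100));
    assert_ne!(a, b);
    println!("ok");
}
```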
New opt-in config for per-result access logging (ADR-005 Phase 1). Double-gated: requires audit_enabled = true. Defaults to false. https://claude.ai/code/session_01XpVcqwPDSLcnefFqrJBJdN
Add memory_access_log table to audit.db that records which specific memory entries are returned by each recall operation. This produces the "spike train" data needed for Phase 2 temporal association scoring.

- New table with memory_id, query, score, and timestamp per result
- log_recall_results() batch-inserts in a single transaction
- get_access_spike_train() returns chronological timestamps per memory
- get_coaccessed_memories() finds temporally co-accessed entries
- Opt-in via access_tracking constructor parameter
- Pruning extended to cover both audit and access log tables

https://claude.ai/code/session_01XpVcqwPDSLcnefFqrJBJdN
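A sketch of what this table and its two query helpers could look like in SQL. Column names and types are inferred from the PR description and review comments, not taken from the actual migration:

```sql
-- Hypothetical schema for the access log (names inferred from the PR text).
CREATE TABLE IF NOT EXISTS memory_access_log (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    memory_id   TEXT    NOT NULL,
    memory_key  TEXT,
    query       TEXT    NOT NULL,
    score       REAL,
    namespace   TEXT,
    session_id  TEXT,
    accessed_at INTEGER NOT NULL   -- one row per entry returned by a recall
);

-- get_access_spike_train(): chronological access timestamps for one memory.
SELECT accessed_at FROM memory_access_log
WHERE memory_id = ?1
ORDER BY accessed_at ASC;

-- get_coaccessed_memories(): entries accessed within window_secs (?2) of the
-- target memory (?1), scoped to the same namespace/session so unrelated
-- contexts don't correlate.
SELECT b.memory_id, COUNT(*) AS co_access_count
FROM memory_access_log a
JOIN memory_access_log b
  ON b.memory_id  != a.memory_id
 AND b.namespace   = a.namespace
 AND b.session_id  = a.session_id
 AND ABS(b.accessed_at - a.accessed_at) <= ?2
WHERE a.memory_id = ?1
GROUP BY b.memory_id
ORDER BY co_access_count DESC;
```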
Conditionally wrap all memory backends with AuditedMemory when audit_enabled is set. Uses maybe_wrap_audit() helper to wrap concrete backend types before boxing, preserving type safety. All return paths (SQLite, Lucid, MuninnDB, Qdrant, Markdown) are covered. Migration factory passes audit_enabled=false. https://claude.ai/code/session_01XpVcqwPDSLcnefFqrJBJdN
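The pattern described here (wrapping the concrete backend type before boxing, so no `impl Memory for Box<dyn Memory>` is needed) can be sketched with a stand-in trait. The real `Memory` trait and `AuditedMemory` carry far more than this:

```rust
// Simplified stand-ins for the real Memory trait and backends.
trait Memory {
    fn name(&self) -> String;
}

struct NoneMemory;
impl Memory for NoneMemory {
    fn name(&self) -> String { "none".into() }
}

// Decorator that would add audit/access logging around any backend.
struct AuditedMemory<M: Memory> {
    inner: M,
}
impl<M: Memory> Memory for AuditedMemory<M> {
    fn name(&self) -> String { format!("audited({})", self.inner.name()) }
}

// Wrap the concrete type *before* boxing: the generic bound keeps type
// safety and avoids a blanket `impl Memory for Box<dyn Memory>`.
fn maybe_wrap_audit<M: Memory + 'static>(backend: M, audit_enabled: bool) -> Box<dyn Memory> {
    if audit_enabled {
        Box::new(AuditedMemory { inner: backend })
    } else {
        Box::new(backend)
    }
}

fn main() {
    let plain = maybe_wrap_audit(NoneMemory, false);
    let audited = maybe_wrap_audit(NoneMemory, true);
    assert_eq!(plain.name(), "none");
    assert_eq!(audited.name(), "audited(none)");
    println!("ok");
}
```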
📝 Walkthrough

Implements Phase 1 of ADR-005: adds optional per-memory access tracking.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Pipeline as RetrievalPipeline
    participant Memory as AuditedMemory
    participant DB as AuditDB
    Client->>Pipeline: recall(query, session_id, namespace, since?, until?)
    Pipeline->>Pipeline: cache_key(query, limit, session_id, namespace, since, until)
    Pipeline->>Memory: recall_namespaced(query, session_id, namespace, since, until)
    Memory->>DB: insert rows into memory_access_log (per-memory, namespace, session, accessed_at)
    DB-->>Memory: insert OK
    Memory-->>Pipeline: recall results
    Pipeline-->>Client: results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed
🔍 Cymbal Impact Analysis

Changed symbols and their upstream callers:

{
"results": [
{
"symbol": "save_model_cache_state",
"caller": "cache_live_models_for_provider",
"file": "/workspace/src/onboard/wizard.rs",
"rel_path": "src/onboard/wizard.rs",
"line": 1854,
"depth": 1,
"context": [
"",
" save_model_cache_state(workspace_dir, \u0026state).await",
"}"
]
},
{
"symbol": "save_model_cache_state",
"caller": "model_cache_ttl_filters_stale_entries",
"file": "/workspace/src/onboard/wizard.rs",
"rel_path": "src/onboard/wizard.rs",
"line": 7590,
"depth": 1,
"context": [
"",
" save_model_cache_state(tmp.path(), \u0026stale).await.unwrap();",
""
]
},
{
"symbol": "cache_live_models_for_provider",
"caller": "run_models_refresh",
"file": "/workspace/src/onboard/wizard.rs",
"rel_path": "src/onboard/wizard.rs",
"line": 1984,
"depth": 2,
"context": [
" Ok(models) if !models.is_empty() =\u003e {",
" cache_live_models_for_provider(\u0026config.workspace_dir, \u0026provider_name, \u0026models).await?;",
" println!("
]
},
{
"symbol": "cache_live_models_for_provider",
"caller": "setup_provider",
"file": "/workspace/src/onboard/wizard.rs",
"rel_path": "src/onboard/wizard.rs",
"line": 2939,
"depth": 2,
"context": [
" Ok(live_model_ids) if !live_model_ids.is_empty() =\u003e {",
" cache_live_models_for_provider(",
" workspace_dir,"
]
},
{
"symbol": "cache_live_models_for_provider",
"caller": "model_cache_round_trip_returns_fresh_entry",
"file": "/workspace/src/onboard/wizard.rs",
"rel_path": "src/onboard/wizard.rs",
"line": 7565,
"depth": 2,
"context": [
"",
" cache_live_models_for_provider(tmp.path(), \"openai\", \u0026models)",
" .await"
]
},
{
"symbol": "cache_live_models_for_provider",
"caller": "run_models_refresh_uses_fresh_cache_without_network",
"file": "/workspace/src/onboard/wizard.rs",
"rel_path": "src/onboard/wizard.rs",
"line": 7607,
"depth": 2,
"context": [
"",
" cache_live_models_for_provider(tmp.path(), \"openai\", \u0026[\"gpt-5.1\".to_string()])",
" .await"
]
}
],
"version": "0.1"
}
@coderabbitai full review

Generated by Claude Code

✅ Actions performed: full review triggered.
Actionable comments posted: 4
🧹 Nitpick comments (1)
src/config/schema.rs (1)
5290-5294: Enforce the `audit_enabled` prerequisite in config validation.

Line 5291 documents that `access_tracking_enabled` requires audit, but this invariant is not fail-fast validated in this file. Consider rejecting invalid combinations early to avoid silent misconfiguration.

♻️ Proposed guard in `Config::validate()`:

```diff
 pub fn validate(&self) -> Result<()> {
+    if self.memory.access_tracking_enabled && !self.memory.audit_enabled {
+        anyhow::bail!(
+            "memory.access_tracking_enabled requires memory.audit_enabled = true"
+        );
+    }
+
     // Tunnel — OpenVPN
     if self.tunnel.provider.trim() == "openvpn" {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/schema.rs` around lines 5290 - 5294, The config documents that access_tracking_enabled requires audit_enabled, but there's no fail-fast check; update Config::validate() to reject configs where access_tracking_enabled is true while audit_enabled is false by returning/propagating a validation error (e.g., Config::validate should return an Err variant or panic with a clear message) referencing the fields access_tracking_enabled and audit_enabled; place the guard early in Config::validate() alongside other inter-field checks and use the existing validation error type/format used elsewhere in validate() for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/memory/audit.rs`:
- Around line 230-257: The query in get_coaccessed_memories currently only joins
on memory_id and timestamp, causing counts to mix different namespaces/sessions;
update the SQL in the prepared statement (still in memory_access_log JOIN in
get_coaccessed_memories) to also require b.namespace = a.namespace and
b.session_id = a.session_id (or whichever column names are used for
namespace/session) in the ON clause so only accesses from the same
namespace/session are considered, keep the existing window_secs/time condition,
and ensure the params and row mapping (memory_id, window_secs => row.get for
memory_id and count) remain consistent with the revised query.
- Around line 119-149: log_recall_results is currently inserting the
request-scoped namespace into memory_access_log (or NULL) instead of the actual
recalled entry's namespace; change it to prefer the MemoryEntry::namespace when
present and only fall back to the function parameter namespace. Concretely,
inside log_recall_results compute a single namespace value (e.g., let entry_ns =
entry.namespace.as_deref().or(namespace)) and pass that entry_ns to the INSERT
so the persisted namespace reflects the recalled entry’s namespace rather than
always using the request namespace or None.
In `@src/memory/mod.rs`:
- Around line 87-110: The None backend path skips the audit wrapper so
audit_enabled/access_tracking are ignored for MemoryBackendKind::None; replace
the direct Ok(Box::new(NoneMemory::new())) return with a call to
maybe_wrap_audit(Box::new(NoneMemory::new()), audit_enabled, access_tracking,
workspace_dir) so the NoneMemory instance is wrapped consistently like the other
branches (refer to MemoryBackendKind::None, NoneMemory::new, and
maybe_wrap_audit).
In `@src/memory/retrieval.rs`:
- Around line 71-79: The cache key built with format! using raw colon
concatenation (the format! call that combines query, limit, session_id,
namespace, since, until) is ambiguous and can collide for different inputs;
replace it with a collision-safe encoder—e.g., serialize a small struct or tuple
containing {query, limit, session_id, namespace, since, until} via
serde_json::to_string or percent-encode each field and use explicit sentinels
for None (like "__NONE__")—so that session_id.unwrap_or(""),
namespace.unwrap_or("") etc. are not conflated and distinct inputs always
produce distinct keys.
---
Nitpick comments:
In `@src/config/schema.rs`:
- Around line 5290-5294: The config documents that access_tracking_enabled
requires audit_enabled, but there's no fail-fast check; update
Config::validate() to reject configs where access_tracking_enabled is true while
audit_enabled is false by returning/propagating a validation error (e.g.,
Config::validate should return an Err variant or panic with a clear message)
referencing the fields access_tracking_enabled and audit_enabled; place the
guard early in Config::validate() alongside other inter-field checks and use the
existing validation error type/format used elsewhere in validate() for
consistency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d4cd5a85-3cad-49f8-b3f9-c76d9d1c7e71
📒 Files selected for processing (7)
- docs/architecture/adr-005-synaptic-memory-retrieval.md
- src/config/schema.rs
- src/memory/audit.rs
- src/memory/battle_tests.rs
- src/memory/mod.rs
- src/memory/retrieval.rs
- src/onboard/wizard.rs
- Use entry namespace in access log instead of request namespace
- Scope co-access query to same namespace/session to prevent false temporal associations across unrelated contexts
- Make cache key collision-safe with length-prefixed encoding
- Wrap NoneMemory backend with audit decorator consistently
- Add config validation: access_tracking_enabled requires audit_enabled

https://claude.ai/code/session_01XpVcqwPDSLcnefFqrJBJdN
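The cache-key collision and the length-prefixed fix mentioned in this commit can be demonstrated in a few lines. The helper names are illustrative, not the actual Hrafn functions:

```rust
// Why raw colon-joining collides, and a length-prefixed alternative.
fn naive_key(fields: &[&str]) -> String {
    fields.join(":")
}

fn length_prefixed_key(fields: &[&str]) -> String {
    // Prefix each field with its byte length so boundaries are unambiguous
    // even when a field itself contains the delimiter characters.
    fields
        .iter()
        .map(|f| format!("{}:{};", f.len(), f))
        .collect()
}

fn main() {
    // Distinct inputs collide under naive joining...
    assert_eq!(naive_key(&["a:b", "c"]), naive_key(&["a", "b:c"]));
    // ...but stay distinct once each field is length-prefixed.
    assert_ne!(
        length_prefixed_key(&["a:b", "c"]),
        length_prefixed_key(&["a", "b:c"])
    );
    println!("ok");
}
```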
♻️ Duplicate comments (1)
src/memory/audit.rs (1)
133-145: ⚠️ Potential issue | 🟠 Major

Flip the namespace precedence before inserting access rows.

Line 135 still prefers `request_namespace` over the recalled entry's namespace, so `recall_namespaced()` can persist the request scope even when the returned `MemoryEntry` belongs to a different namespace. That breaks the namespace-scoped analysis this table is supposed to support.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/memory/audit.rs` around lines 133 - 145, The namespace precedence is reversed: instead of using request_namespace.unwrap_or(entry.namespace), prefer the recalled entry's namespace and fall back to the request-scoped value. Update the assignment that computes namespace (currently using request_namespace and entry.namespace) so it selects entry.namespace when present and only uses request_namespace as a fallback before inserting the memory_access_log rows in the loop that handles results (the block referencing entry.namespace, request_namespace, and the conn.execute call).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/memory/audit.rs`:
- Around line 133-145: The namespace precedence is reversed: instead of using
request_namespace.unwrap_or(entry.namespace), prefer the recalled entry's
namespace and fall back to the request-scoped value. Update the assignment that
computes namespace (currently using request_namespace and entry.namespace) so it
selects entry.namespace when present and only uses request_namespace as a
fallback before inserting the memory_access_log rows in the loop that handles
results (the block referencing entry.namespace, request_namespace, and the
conn.execute call).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6ba4d070-9a69-49fb-9670-3d7ed316499d
📒 Files selected for processing (4)
- src/config/schema.rs
- src/memory/audit.rs
- src/memory/mod.rs
- src/memory/retrieval.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- src/memory/mod.rs
- src/memory/retrieval.rs
MemoryEntry::namespace is always populated (defaults to "default"), so it is the authoritative source. Only fall back to request namespace if the entry namespace is empty (defensive, should not happen). https://claude.ai/code/session_01XpVcqwPDSLcnefFqrJBJdN
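That precedence rule can be sketched as a small helper; the name and signature are illustrative:

```rust
// Hypothetical helper: prefer the recalled entry's own namespace and only
// fall back to the request-scoped namespace (or "default") when the
// entry's namespace is empty, which should not happen in practice.
fn effective_namespace<'a>(entry_ns: &'a str, request_ns: Option<&'a str>) -> &'a str {
    if entry_ns.is_empty() {
        request_ns.unwrap_or("default")
    } else {
        entry_ns
    }
}

fn main() {
    // Entry namespace wins over the request namespace.
    assert_eq!(effective_namespace("work", Some("scratch")), "work");
    // Defensive fallbacks for the (unexpected) empty case.
    assert_eq!(effective_namespace("", Some("scratch")), "scratch");
    assert_eq!(effective_namespace("", None), "default");
    println!("ok");
}
```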
🧹 Nitpick comments (1)
src/memory/audit.rs (1)
131-156: Transaction error handling could be improved.

The batch insert uses `execute_batch("BEGIN")` and `execute_batch("COMMIT")` with errors silently ignored. If `BEGIN` succeeds but an `INSERT` fails, the `COMMIT` still executes, potentially persisting partial data. While audit logging shouldn't break the main flow, consider at least logging failures for observability.

♻️ Optional: Add basic error logging for observability

```diff
 let _ = conn.execute_batch("BEGIN");
 for entry in results {
     let namespace = if entry.namespace.is_empty() {
         request_namespace.unwrap_or("default")
     } else {
         entry.namespace.as_str()
     };
-    let _ = conn.execute(
+    if let Err(e) = conn.execute(
         "INSERT INTO memory_access_log (memory_id, memory_key, query, score, namespace, session_id, accessed_at)
          VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7)",
         params![entry.id, entry.key, query, entry.score, namespace, session_id, now],
-    );
+    ) {
+        tracing::warn!(memory_id = %entry.id, error = %e, "Failed to log access entry");
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/memory/audit.rs` around lines 131 - 156, The transaction currently swallows all errors around conn.execute_batch("BEGIN") / conn.execute(...) / conn.execute_batch("COMMIT"), which can leave partial inserts; modify the loop around results to start a transaction with conn.execute_batch("BEGIN") and check results of BEGIN, then run each conn.execute and if any insert returns Err call conn.execute_batch("ROLLBACK") and log the error (include the failing entry.id/entry.key and error) instead of proceeding to COMMIT; on success call conn.execute_batch("COMMIT") and log success or at least suppress errors only after logging—update code paths referencing conn.execute_batch, conn.execute, memory_access_log, request_namespace, and results accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/memory/audit.rs`:
- Around line 131-156: The transaction currently swallows all errors around
conn.execute_batch("BEGIN") / conn.execute(...) / conn.execute_batch("COMMIT"),
which can leave partial inserts; modify the loop around results to start a
transaction with conn.execute_batch("BEGIN") and check results of BEGIN, then
run each conn.execute and if any insert returns Err call
conn.execute_batch("ROLLBACK") and log the error (include the failing
entry.id/entry.key and error) instead of proceeding to COMMIT; on success call
conn.execute_batch("COMMIT") and log success or at least suppress errors only
after logging—update code paths referencing conn.execute_batch, conn.execute,
memory_access_log, request_namespace, and results accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ddeae0b1-e3ff-4980-ad96-ca2dbeadcd77
📒 Files selected for processing (1)
src/memory/audit.rs
Closes #197
Summary
Implements Phase 1 of ADR-005 (Synaptic Memory Retrieval), laying the data-collection groundwork for temporal association scoring in later phases.
- `RetrievalPipeline::cache_key()` now uses collision-safe length-prefixed encoding and includes `since`/`until` time-range parameters
- `memory_access_log` table: records which specific memory entries are returned by each recall, producing per-memory "spike trains" for Phase 2 temporal scoring
- `AuditedMemory` wired into the memory factory: all memory backends are now conditionally wrapped with audit logging when `audit_enabled` is set

Key changes

- `docs/architecture/adr-005-synaptic-memory-retrieval.md`
- `src/memory/retrieval.rs`: `since`/`until` in the cache key
- `src/config/schema.rs`: `access_tracking_enabled` config + validation
- `src/memory/audit.rs`
- `src/memory/mod.rs`: `AuditedMemory` into the factory via `maybe_wrap_audit()`
- `src/onboard/wizard.rs`
- `src/memory/battle_tests.rs`

Design decisions

- Access tracking lives in `audit.db` rather than a separate database — the existing `memory_audit` table tracks operation-level events but not the per-result data needed for spike trains
- Double-gated: requires both `audit_enabled = true` AND `access_tracking_enabled = true`
- `maybe_wrap_audit()` helper wraps concrete types before boxing, avoiding the need for `impl Memory for Box<dyn Memory>`

Test plan

- `cargo fmt --all -- --check` — clean
- `cargo clippy --all-targets -- -D warnings` — clean
- `cargo test` — all tests passed

https://claude.ai/code/session_01XpVcqwPDSLcnefFqrJBJdN