Skip to content

Commit eea0f27

Browse files
csfet9claude
andauthored
feat: Add local LLM improvements for reasoning models and Docker startup (#88)
* feat: Add local LLM improvements for reasoning models and Docker startup ## Reasoning Model Support - Strip thinking tags from local LLM responses (<think>, <thinking>, <reasoning>, |startthink|/|endthink|) - Enables Qwen3, DeepSeek, and other reasoning models to work with JSON extraction - Non-breaking: only affects responses that contain thinking tags ## Docker Retry Start Script - New retry-start.sh waits for dependencies before starting Hindsight - Checks LLM Studio availability at /v1/models endpoint - Checks database connectivity (skipped for embedded pg0) - Configurable via HINDSIGHT_RETRY_MAX and HINDSIGHT_RETRY_INTERVAL env vars - Prevents startup failures when LLM Studio isn't ready yet Tested on Apple Silicon M4 Max with Qwen3 8B via LM Studio. * refactor: make thinking token stripping opt-in via env var * refactor: merge retry logic into start-all.sh (opt-in via HINDSIGHT_WAIT_FOR_DEPS) * fix: resolve pg0 stale instance config in Docker build - Remove stale pg0 instance data after pre-caching binaries to avoid port conflicts (was using hardcoded port 5555 from build time) - Remove unused cache copy logic from start-all.sh - Add database backup instructions to CLAUDE.md 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 964537f commit eea0f27

5 files changed

Lines changed: 108 additions & 17 deletions

File tree

CLAUDE.md

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,27 @@ PostgreSQL with pgvector. Schema managed via Alembic migrations in `hindsight-ap
8484

8585
Key tables: `banks`, `memory_units`, `documents`, `entities`, `entity_links`
8686

87+
### Database Backups (IMPORTANT)
88+
**Before any operation that may affect the database, run a backup:**
89+
```bash
90+
docker exec hindsight /backups/backup.sh
91+
```
92+
93+
Operations requiring backup:
94+
- Running database migrations
95+
- Modifying Alembic migration files
96+
- Rebuilding Docker images
97+
- Resetting or recreating containers
98+
- Any schema changes
99+
- Bulk data operations
100+
101+
Backups are stored in `~/hindsight-backups/` on the host.
102+
103+
To restore:
104+
```bash
105+
docker exec -it hindsight /backups/restore.sh <backup-file.sql.gz>
106+
```
107+
87108
## Key Conventions
88109

89110
### Memory Banks

docker/standalone/Dockerfile

Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -157,9 +157,7 @@ USER hindsight
157157
# Set PATH for hindsight user
158158
ENV PATH="/app/api/.venv/bin:${PATH}"
159159

160-
# Pre-cache PostgreSQL binaries by starting/stopping pg0-embedded
161-
ENV PG0_HOME=/home/hindsight/.pg0-cache
162-
160+
# pg0 will download PostgreSQL binaries on first run
163161
ENV PG0_HOME=/home/hindsight/.pg0
164162

165163
# Pre-download ML models to avoid runtime download (conditional)
@@ -272,16 +270,17 @@ USER hindsight
272270
ENV PATH="/app/api/.venv/bin:${PATH}"
273271

274272
# Pre-cache PostgreSQL binaries by starting/stopping pg0-embedded
275-
ENV PG0_HOME=/home/hindsight/.pg0-cache
273+
# Note: We use a temp instance just to download binaries, then delete instance data
274+
# to avoid stale port config. Only installation binaries are kept.
275+
ENV PG0_HOME=/home/hindsight/.pg0
276276
RUN /app/api/.venv/bin/python -c "\
277277
from pg0 import Pg0; \
278278
print('Pre-caching PostgreSQL binaries...'); \
279-
pg = Pg0(name='hindsight', port=5555, username='hindsight', password='hindsight', database='hindsight'); \
279+
pg = Pg0(name='temp-cache', username='hindsight', password='hindsight', database='hindsight'); \
280280
pg.start(); \
281281
pg.stop(); \
282-
print('PostgreSQL pre-cached to PG0_HOME')" || echo "Pre-download skipped"
283-
284-
ENV PG0_HOME=/home/hindsight/.pg0
282+
print('PostgreSQL binaries cached')" && \
283+
rm -rf /home/hindsight/.pg0/instances || echo "Pre-download skipped"
285284

286285
# Pre-download ML models to avoid runtime download (conditional)
287286
ARG PRELOAD_ML_MODELS

docker/standalone/start-all.sh

Lines changed: 63 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -5,16 +5,70 @@ set -e
55
ENABLE_API="${HINDSIGHT_ENABLE_API:-true}"
66
ENABLE_CP="${HINDSIGHT_ENABLE_CP:-true}"
77

8-
# Copy pre-cached PostgreSQL data if runtime directory is empty (first run with volume)
9-
if [ "$ENABLE_API" = "true" ]; then
10-
PG0_CACHE="/home/hindsight/.pg0-cache"
11-
PG0_HOME="/home/hindsight/.pg0"
12-
if [ -d "$PG0_CACHE" ] && [ "$(ls -A $PG0_CACHE 2>/dev/null)" ]; then
13-
if [ ! "$(ls -A $PG0_HOME 2>/dev/null)" ]; then
14-
echo "📦 Copying pre-cached PostgreSQL data..."
15-
cp -r "$PG0_CACHE"/* "$PG0_HOME"/ 2>/dev/null || true
16-
fi
8+
# =============================================================================
9+
# Dependency waiting (opt-in via HINDSIGHT_WAIT_FOR_DEPS=true)
10+
#
11+
# Problem: When running with LM Studio, the LLM may take time to load models.
12+
# If Hindsight starts before LM Studio is ready, it fails on LLM verification.
13+
# This wait loop ensures dependencies are ready before starting.
14+
# =============================================================================
15+
if [ "${HINDSIGHT_WAIT_FOR_DEPS:-false}" = "true" ]; then
16+
LLM_BASE_URL="${HINDSIGHT_API_LLM_BASE_URL:-http://host.docker.internal:1234/v1}"
17+
MAX_RETRIES="${HINDSIGHT_RETRY_MAX:-0}" # 0 = infinite
18+
RETRY_INTERVAL="${HINDSIGHT_RETRY_INTERVAL:-10}"
19+
20+
# Check if external database is configured (skip check for embedded pg0)
21+
SKIP_DB_CHECK=false
22+
if [ -z "${HINDSIGHT_API_DATABASE_URL}" ]; then
23+
SKIP_DB_CHECK=true
24+
else
25+
DB_CHECK_HOST=$(echo "$HINDSIGHT_API_DATABASE_URL" | sed -E 's|.*@([^:/]+):([0-9]+)/.*|\1 \2|')
1726
fi
27+
28+
check_db() {
29+
if $SKIP_DB_CHECK; then
30+
return 0
31+
fi
32+
if command -v pg_isready &> /dev/null; then
33+
pg_isready -h $(echo $DB_CHECK_HOST | cut -d' ' -f1) -p $(echo $DB_CHECK_HOST | cut -d' ' -f2) &>/dev/null
34+
else
35+
python3 -c "import socket; s=socket.socket(); s.settimeout(5); exit(0 if s.connect_ex(('$(echo $DB_CHECK_HOST | cut -d' ' -f1)', $(echo $DB_CHECK_HOST | cut -d' ' -f2))) == 0 else 1)" 2>/dev/null
36+
fi
37+
}
38+
39+
check_llm() {
40+
curl -sf "${LLM_BASE_URL}/models" --connect-timeout 5 &>/dev/null
41+
}
42+
43+
echo "⏳ Waiting for dependencies to be ready..."
44+
attempt=1
45+
46+
while true; do
47+
db_ok=false
48+
llm_ok=false
49+
50+
if check_db; then
51+
db_ok=true
52+
fi
53+
54+
if check_llm; then
55+
llm_ok=true
56+
fi
57+
58+
if $db_ok && $llm_ok; then
59+
echo "✅ Dependencies ready!"
60+
break
61+
fi
62+
63+
if [ "$MAX_RETRIES" -ne 0 ] && [ "$attempt" -ge "$MAX_RETRIES" ]; then
64+
echo "❌ Max retries ($MAX_RETRIES) reached. Dependencies not available."
65+
exit 1
66+
fi
67+
68+
echo " Attempt $attempt: DB=$( $db_ok && echo 'ok' || echo 'waiting' ), LLM=$( $llm_ok && echo 'ok' || echo 'waiting' )"
69+
sleep "$RETRY_INTERVAL"
70+
((attempt++))
71+
done
1872
fi
1973

2074
# Track PIDs for wait

hindsight-api/hindsight_api/config.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@
1818
ENV_LLM_BASE_URL = "HINDSIGHT_API_LLM_BASE_URL"
1919
ENV_LLM_MAX_CONCURRENT = "HINDSIGHT_API_LLM_MAX_CONCURRENT"
2020
ENV_LLM_TIMEOUT = "HINDSIGHT_API_LLM_TIMEOUT"
21+
ENV_LLM_STRIP_THINKING = "HINDSIGHT_API_LLM_STRIP_THINKING"
2122

2223
ENV_EMBEDDINGS_PROVIDER = "HINDSIGHT_API_EMBEDDINGS_PROVIDER"
2324
ENV_EMBEDDINGS_LOCAL_MODEL = "HINDSIGHT_API_EMBEDDINGS_LOCAL_MODEL"

hindsight-api/hindsight_api/engine/llm_wrapper.py

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
import json
77
import logging
88
import os
9+
import re
910
import time
1011
from typing import Any
1112

@@ -19,6 +20,7 @@
1920
DEFAULT_LLM_MAX_CONCURRENT,
2021
DEFAULT_LLM_TIMEOUT,
2122
ENV_LLM_MAX_CONCURRENT,
23+
ENV_LLM_STRIP_THINKING,
2224
ENV_LLM_TIMEOUT,
2325
)
2426

@@ -310,6 +312,20 @@ async def call(
310312

311313
content = response.choices[0].message.content
312314

315+
# Strip reasoning model thinking tags when enabled (opt-in for local LLMs)
316+
# Supports: <think>, <thinking>, <reasoning>, |startthink|/|endthink|
317+
# Enable with HINDSIGHT_API_LLM_STRIP_THINKING=true for reasoning models
318+
# that embed thinking in their output (e.g., Qwen3, DeepSeek on LM Studio)
319+
if content and os.getenv(ENV_LLM_STRIP_THINKING, "false").lower() == "true":
320+
original_len = len(content)
321+
content = re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL)
322+
content = re.sub(r"<thinking>.*?</thinking>", "", content, flags=re.DOTALL)
323+
content = re.sub(r"<reasoning>.*?</reasoning>", "", content, flags=re.DOTALL)
324+
content = re.sub(r"\|startthink\|.*?\|endthink\|", "", content, flags=re.DOTALL)
325+
content = content.strip()
326+
if len(content) < original_len:
327+
logger.debug(f"Stripped {original_len - len(content)} chars of reasoning tokens")
328+
313329
# For local models, they may wrap JSON in markdown code blocks
314330
if self.provider in ("lmstudio", "ollama"):
315331
clean_content = content

0 commit comments

Comments
 (0)