Three related bugs were diagnosed on a long-running production daemon (`new-plebbit` host, `bitsocial` container, image `ghcr.io/bitsocialnet/bitsocial-cli:latest` v0.19.49, up 11 days, healthy). Symptoms reported: `bitsocial logs` is stuck — no new output despite the daemon clearly being alive (RPC reachable, healthcheck green).
## Bug 1 — silent 20 MB cap permanently disables file logging
`src/cli/commands/daemon.ts` lines 121–134:

```ts
const logFile = fs.createWriteStream(logFilePath, { flags: "a" });
const isLogFileOverLimit = () => logFile.bytesWritten > 20000000; // 20mb
const writeTimestampedLine = (text: string, stream: "stdout" | "stderr") => {
    if (isLogFileOverLimit()) return; // ← drops all further writes for the rest of the daemon's life
    ...
    logFile.write(timestamped);
};
```
Once `logFile.bytesWritten` exceeds 20,000,000, every subsequent write is silently dropped for the lifetime of the daemon process. There is no in-process rotation (`_getNewLogfileByEvacuatingOldLogsIfNeeded` only runs at startup).
On the affected container this happened at `2026-04-25 14:38:48`, after ~28 hours of operation, ~10 days ago. The log file is exactly 20,022,081 bytes, and daemon PID 7 still holds fd 20 open (`pos: 20022081`, `flags: 02102001`). `bitsocial logs -f` is correctly tailing a file that no longer changes.
What filled it up: a flood of `pkc-js:base-client-manager:resolveAuthorNamesInBackground:error` debug entries (~1.5 KB each) for failing `.eth` resolutions.
### Fix
Replace the silent hard cap with front-truncate rotation: when `bytesWritten` exceeds 20 MB, drop the oldest portion of the file (keeping the most recent ~15 MB), reopen the stream in append mode (which resets `bytesWritten`), and continue writing. The file then never grows much past 20 MB.
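A minimal sketch of that rotation, assuming a simple byte counter and an illustrative timestamp format (`logFilePath` and `writeTimestampedLine` are the existing daemon.ts names; `frontTruncateAndReopen`, `MAX_BYTES`, and `KEEP_BYTES` are hypothetical):

```ts
import fs from "node:fs";

declare const logFilePath: string; // provided by the surrounding daemon.ts scope

const MAX_BYTES = 20_000_000;  // same 20 MB ceiling as the original check
const KEEP_BYTES = 15_000_000; // retain the newest ~15 MB when rotating

let logFile = fs.createWriteStream(logFilePath, { flags: "a" });
let bytesInFile = fs.existsSync(logFilePath) ? fs.statSync(logFilePath).size : 0;

// Drop the oldest portion of the file in place, then reopen the append stream.
const frontTruncateAndReopen = () => {
    logFile.end(); // let the old stream flush and close in the background
    const contents = fs.readFileSync(logFilePath);
    let tail = contents.subarray(Math.max(0, contents.length - KEEP_BYTES));
    const firstNewline = tail.indexOf(0x0a); // cut at a line boundary so no torn entry
    if (firstNewline !== -1) tail = tail.subarray(firstNewline + 1);
    fs.writeFileSync(logFilePath, tail);
    logFile = fs.createWriteStream(logFilePath, { flags: "a" }); // bytesWritten resets to 0
    bytesInFile = tail.length;
};

const writeTimestampedLine = (text: string, stream: "stdout" | "stderr") => {
    const timestamped = `${new Date().toISOString()} [${stream}] ${text}\n`; // format illustrative
    if (bytesInFile + Buffer.byteLength(timestamped) > MAX_BYTES) frontTruncateAndReopen();
    logFile.write(timestamped);
    bytesInFile += Buffer.byteLength(timestamped);
};
```

The synchronous read/rewrite blocks the event loop for one ~20 MB pass per rotation; a rename-based scheme would avoid even that, at the cost of the tailing `bitsocial logs -f` losing its fd.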
## Bug 2 — stderr override silently swallows debug output when the file logger fails
Same file, lines 157–164:

```ts
process.stderr.write = (...args) => {
    // Only write stderr to the log file, not to the terminal.
    writeTimestampedLine(asString(args[0]).trimStart(), "stderr");
    return true;
};
```
`process.stderr.write` was overridden to *only* route to the file logger. After Bug 1 trips, all stderr output from JS code disappears entirely (not in the file, not in `docker logs`) — only errors hand-routed via `writeErrorToTerminal` (the `uncaughtException`/`unhandledRejection` handlers) survive, because they call the captured original `stderrWrite`.
### Fix
If a file write fails or the stream is closed, fall back to writing to the original `stderr`. With Bug 1 fixed (rotation means the file logger always accepts writes), the fallback is mostly defensive, but it guarantees error output is never silently lost.
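A sketch of the fallback, reusing the `stderrWrite` capture and `asString` helper already present in daemon.ts (the `destroyed`/`closed` guard and the simplified signature are illustrative):

```ts
// Captured once, before the override, exactly as daemon.ts already does.
const stderrWrite = process.stderr.write.bind(process.stderr);

process.stderr.write = ((chunk: any) => {
    try {
        // If the log stream is unusable, don't pretend the write succeeded.
        if (logFile.destroyed || logFile.closed) throw new Error("log stream unusable");
        writeTimestampedLine(asString(chunk).trimStart(), "stderr");
    } catch {
        stderrWrite(chunk); // fallback: never silently drop stderr output
    }
    return true;
}) as typeof process.stderr.write;
```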
## Bug 3 — `tcpPortUsed.check` rejection in `keepKuboUpInterval` becomes an `unhandledRejection`
Same file, lines 533–545:

```ts
keepKuboUpInterval = setInterval(async () => {
    if (mainProcessExited) return;
    const isRpcPortTaken = await tcpPortUsed.check(Number(pkcRpcUrl.port), pkcRpcUrl.hostname);
    try {
        ...
    } catch (error) {
        log.trace(`keepKuboUp error (will retry): ${...}`);
    }
    await createOrConnectRpc();
}, 5000);
```
The `await tcpPortUsed.check(...)` sits outside the try/catch, and `tcp-port-used` rejects with anything other than `ECONNREFUSED` (e.g., `ETIMEDOUT`). So when a connect attempt hits a transient network hiccup and the kernel exhausts its SYN retries (~127 s), the async `setInterval` callback rejects → `unhandledRejection`. The daemon's handler then prints the bare error to stderr/docker logs:

```
Error: connect ETIMEDOUT 127.0.0.1:9138
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1637:16)
```
Observed on the affected container: 18 occurrences over ~5 hours on 2026-05-04, then nothing (a transient host networking issue). The daemon didn't crash, since the installed `unhandledRejection` handler only prints the error, but the noise pollutes the logs and points at a real bug.
### Fix
Move `await tcpPortUsed.check(...)` inside the try/catch and treat its rejection as "port unknown — retry next tick", the same way `keepKuboUp` errors are already handled.
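A sketch of the restructured tick body; all names are the existing daemon.ts ones, the comment marker stands in for the unchanged `keepKuboUp` logic, and skipping `createOrConnectRpc` on a failed check is one reasonable reading of "retry next tick":

```ts
keepKuboUpInterval = setInterval(async () => {
    if (mainProcessExited) return;
    try {
        // tcp-port-used rejects on anything other than ECONNREFUSED
        // (e.g. ETIMEDOUT), so the check must live inside the try.
        const isRpcPortTaken = await tcpPortUsed.check(Number(pkcRpcUrl.port), pkcRpcUrl.hostname);
        // ... existing keepKuboUp body, branching on isRpcPortTaken ...
    } catch (error) {
        // Port state unknown this tick: trace it and retry in 5 s, matching
        // the existing keepKuboUp error path.
        log.trace(`keepKuboUp error (will retry): ${error}`);
        return;
    }
    await createOrConnectRpc();
}, 5000);
```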
## Diagnostics summary (affected daemon)
- Container: `bitsocial`, started 2026-04-24T10:20:58Z, PID namespaces verified
- Log file: `/logs/bitsocial/bitsocial_cli_daemon_2026-04-24T10-21-02.253Z.log` — 20,022,081 bytes, last write 2026-04-25T14:38:48Z
- RPC: listening on `[::]:9138` (IPv6 dual-stack, also reachable via 127.0.0.1)
- Process tree: PID 1 tini, PID 7 daemon (still alive), PID 51 kubo, PID 45489 stuck `bitsocial logs -f` (correctly tailing a frozen file)
- Healthcheck: passing (uses IPv6-mapped localhost connect)
## Plan
- Add regression tests (unit-level, exercising the file-logger module in isolation and the keepKuboUp tick body); a test sketch follows this list
- Fix Bug 1 (rotation)
- Fix Bug 2 (stderr fallback)
- Fix Bug 3 (try/catch around tcpPortUsed.check)
- Build + run full test suite
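For the first plan item, one hypothetical shape of the rotation test, assuming a mocha/chai harness and that the file logger is extracted into its own module with injectable size caps (`createFileLogger`, `flushSync`, and the option names are all illustrative):

```ts
import { expect } from "chai";
import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// Hypothetical factory: assumes the logger is pulled out of daemon.ts so
// tests can use small caps instead of writing 20 MB.
import { createFileLogger } from "../src/cli/file-logger";

describe("file logger rotation (Bug 1 regression)", () => {
    it("keeps writing past the cap and retains the newest entries", () => {
        const dir = fs.mkdtempSync(path.join(os.tmpdir(), "bitsocial-log-"));
        const logPath = path.join(dir, "daemon.log");
        const logger = createFileLogger(logPath, { maxBytes: 10_000, keepBytes: 5_000 });

        for (let i = 0; i < 1_000; i++) logger.writeTimestampedLine(`entry ${i}`, "stderr");
        logger.flushSync();

        expect(fs.readFileSync(logPath, "utf8")).to.include("entry 999"); // newest survives
        expect(fs.statSync(logPath).size).to.be.lessThan(11_000);         // stays near the cap
    });
});
```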