Replace flat resource slice with a directed acyclic graph (heimdalr/dag). Resources execute in topological layer order: dependencies complete before dependents start, resources within the same layer run concurrently. Add DependsOn field to all Opts structs for explicit dependency declaration. Add RunPlanDAG/RunApplyDAG engine functions that iterate by topological layer. Ref #4
Add Watcher and Poller optional interfaces to extensions.Extension. Resources implementing Watcher get OS-level event notifications, others fall back to polling at configurable intervals. New daemon package (internal/daemon) runs initial convergence then starts per-resource Watch/poll goroutines with a central event loop. New CLI: converge watch <blueprint> [--once] Ref #4
Automatically detect dependency relationships between resources:
- service:X depends on package:X (name match)
- file:/a/b/c depends on file:/a/b (parent directory)
- service:X depends on file paths containing the service name
Edges that would create cycles are silently skipped. Wire auto-edges into BuildGraph, RunPlan, and RunApply paths.
Ref #4
Resources that fail to converge are retried with exponential backoff (baseDelay * 2^retryCount, capped at 5 minutes). After --max-retries (default 3), the resource is marked noncompliant and logged as a warning. Watching continues: new external events reset the retry count. Add Daemon.Status() for querying per-resource compliance state. Add --max-retries flag to converge watch command. Ref #4
Coalescer collapses multiple rapid events for the same resource into a single CheckApply after a configurable window (default 500ms). Per-resource rate limiter (golang.org/x/time/rate) prevents flapping resources from consuming excessive CPU. Ref #4
converge serve is a cleaner, more descriptive name for a persistent daemon mode than converge watch. Ref #4
Implement extensions.Watcher for the File resource on Linux using inotify via golang.org/x/sys/unix. Watches both the file and its parent directory to detect creation, modification, deletion, and attribute changes. Uses epoll for interruptible reads. This is the reference implementation for all platform-specific watchers. Other resources fall back to polling until their native watchers are implemented. Ref #4
Move all magic exit code numbers to a single package. Follows Puppet/Chef convention: 0=ok, 1=error, 2=changed, 3=partial fail, 4=all failed, 5=pending. Ref #4
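A centralized exit-code package might look like the sketch below. The values follow the convention in the commit message; the constant names and the precedence in `codeFor` (failures first, then pending, then changed) are assumptions, not taken from `internal/exit`.

```go
package main

import "fmt"

// Hypothetical constant names; values follow the Puppet/Chef convention above.
const (
	OK          = 0 // converged, nothing changed
	Error       = 1 // general error
	Changed     = 2 // converged, at least one resource changed
	PartialFail = 3 // some resources failed
	AllFailed   = 4 // every resource failed
	Pending     = 5 // changes pending
)

// codeFor picks an exit code from run counts (assumed precedence).
func codeFor(total, changed, failed, pending int) int {
	switch {
	case failed > 0 && failed == total:
		return AllFailed
	case failed > 0:
		return PartialFail
	case pending > 0:
		return Pending
	case changed > 0:
		return Changed
	default:
		return OK
	}
}

func main() {
	fmt.Println("exit code:", codeFor(3, 1, 0, 0))
}
```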
Update all documentation to reflect the new event-driven DAG daemon:
- README: converge serve replaces converge apply as primary command
- design.md: DAG engine, auto-edges, daemon mode, and retry/backoff; mermaid diagrams for DAG layers, daemon lifecycle, and plan flow
- cli.md: converge serve command, --once and --max-retries flags, centralized exit code reference
Ref #4
Native OS event watchers:
- File: kqueue (macOS), ReadDirectoryChangesW (Windows)
- Service: godbus/dbus PropertiesChanged (Linux), NotifyServiceStatusChange (Windows), poll (macOS)
- Registry: RegNotifyChangeKeyValue (Windows)
- Sysctl: inotify on /proc/sys/ (Linux)
- Plist: kqueue on plist file (macOS)
Poll-only watchers (no native OS events):
- Package: 5m, Exec: 30s, User: 60s, Firewall: 30s
- AuditPolicy: 60s, SecurityPolicy: 60s
Remove dead code:
- converge apply CLI command (replaced by converge serve)
- RunApply, CheckDuplicates flat-list engine functions
- isRoot from dsl/app.go
Update goreleaser to produce .deb packages via nfpms. Rewrite engine tests to use DAG functions. Add BuildGraph auto-edge test.
Ref #4
CRITICAL:
- Remove stale converge apply from cli.md, examples.md, terminal output
- Fix handleFailure goroutine leak (add ctx.Done select)
HIGH:
- Restore root privilege check in serve command
- Make event loop concurrent with per-resource processing lock
- Wire event reason constants (reasonPoll, reasonRetry)
- Fix inotify watch re-establishment after IN_DELETE_SELF
- Fix kqueue watch re-establishment after NOTE_DELETE/RENAME
- Fix sysctl path traversal (validate key, filepath.Clean)
- Add bounds checking to inotify/ReadDirectoryChangesW unsafe parsing
- Replace Windows service SCM watcher with polling (APC incompatible with Go)
- Fix D-Bus AddMatch error check, filter PropertiesChanged by ActiveState
- Fix autoedge false positives for short service names (min 3 chars, path component match)
- Update Security Model table, README features table, DependsOn docs
MEDIUM:
- Fix ReadDirectoryChangesW overlapped re-issue
- Add parent dir fallback to plist watcher
- Skip polling for noncompliant resources
- Only recover string panics in runBlueprint
- Propagate initial convergence error from Run
- Validate MaxRetries > 0
- Add Graph.Flatten() to deduplicate layer flattening
- Add Watcher/Poller docs to extending.md
- Fix "No implicit behavior" to "No implicit mutations"
LOW:
- Remove unused AllExtensions()
- Remove unused serviceNotify constants (Windows service watcher removed)
Ref #4
Linux: systemd unit (converge.service)
macOS: launchd plist (com.tseknet.converge.plist)
Windows: MSI ServiceInstall/ServiceControl in WiX
.deb postinst enables and starts the systemd service. .pkg postinstall bootstraps the launchd daemon. MSI registers and starts the Windows service via SCM. Packages handle upgrades: stop old service, install new binary, restart.
Ref #4
"baseline" is the standard term in CIS/STIG and config management for the minimum configuration every managed host must have. Rename across: blueprint function, registration, all docs, service files, VHS demo, and CLI examples. Ref #4
1. Wire coalescer + rate limiter into daemon event loop. Coalescer deduplicates burst events per resource (500ms window). Per-resource rate limiter (x/time/rate) throttles watch/poll events. Retry events bypass both (they have their own backoff).
2. Extract retry state machine into internal/daemon/retry.go. retryManager owns all per-resource state: shouldProcess, reset, recordFailure, isNoncompliant. Daemon is now the event loop coordinator, not a god object.
3. Replace panics with error accumulation in DSL. Run.err accumulates the first error. Blueprint functions no longer panic on duplicate resources, missing dependencies, or empty fields. BuildGraph checks run.Err() after execution. Stack traces preserved for genuine runtime panics.
4. Typed events (extensions.EventKind) replace string routing. EventWatch, EventPoll, EventRetry are compile-time-checked constants. Event.Reason -> Event.Kind (typed) + Event.Detail (human-readable). No more string comparisons for event routing in the daemon.
5. Auto-edge serviceToConfigFile path component matching (from review). Already applied in previous commit.
6. CoalesceWindow configurable via Options for testing.
Ref #4
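Typed event routing can be sketched with an integer-backed enum and a `String()` method. `Kind` and `Detail` come from the commit message; the `int` underlying type, the `ResourceID` field, and `bypassesThrottling` are assumptions for illustration.

```go
package main

import "fmt"

// EventKind replaces string reasons with compile-time-checked constants.
type EventKind int

const (
	EventWatch EventKind = iota // native OS notification
	EventPoll                   // periodic poll tick
	EventRetry                  // scheduled retry after a failure
)

func (k EventKind) String() string {
	switch k {
	case EventWatch:
		return "watch"
	case EventPoll:
		return "poll"
	case EventRetry:
		return "retry"
	default:
		return fmt.Sprintf("EventKind(%d)", int(k))
	}
}

// Event carries a typed Kind for routing and a free-form Detail for logs.
type Event struct {
	ResourceID string // hypothetical field
	Kind       EventKind
	Detail     string
}

// bypassesThrottling: retry events skip the coalescer and rate limiter
// because they already have their own backoff.
func bypassesThrottling(e Event) bool { return e.Kind == EventRetry }

func main() {
	e := Event{ResourceID: "file:/etc/motd", Kind: EventWatch, Detail: "IN_MODIFY"}
	fmt.Printf("%s event for %s (%s), bypass=%v\n", e.Kind, e.ResourceID, e.Detail, bypassesThrottling(e))
}
```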
Replace O(V^2) GetParents-per-node query with incremental in-degree tracking during AddEdge. TopologicalLayers now runs in O(V+E) using pre-computed adjacency lists.
Benchmarks at 2000 nodes:
- Linear chain (worst case): 0.48ms, 364KB, 4021 allocs
- Wide (10 layers x 200): 0.36ms, 261KB, 103 allocs
Ref #4
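Incremental in-degree tracking turns layer computation into Kahn's algorithm run one frontier at a time. The sketch below is a minimal illustration with hypothetical names; nodes left unvisited at the end indicate a cycle.

```go
package main

import "fmt"

// Graph keeps in-degree and adjacency up to date on every AddEdge.
type Graph struct {
	order    []string            // insertion order, for deterministic output
	children map[string][]string // from -> dependents
	inDegree map[string]int
}

func NewGraph() *Graph {
	return &Graph{children: map[string][]string{}, inDegree: map[string]int{}}
}

func (g *Graph) AddNode(id string) {
	if _, ok := g.inDegree[id]; !ok {
		g.inDegree[id] = 0
		g.order = append(g.order, id)
	}
}

// AddEdge records that to depends on from, in O(1).
func (g *Graph) AddEdge(from, to string) {
	g.AddNode(from)
	g.AddNode(to)
	g.children[from] = append(g.children[from], to)
	g.inDegree[to]++
}

// TopologicalLayers runs Kahn's algorithm layer by layer in O(V+E) on a
// copy of the in-degree map, so the graph can be queried repeatedly.
func (g *Graph) TopologicalLayers() ([][]string, error) {
	deg := make(map[string]int, len(g.inDegree))
	for id, d := range g.inDegree {
		deg[id] = d
	}
	var current []string
	for _, id := range g.order {
		if deg[id] == 0 {
			current = append(current, id)
		}
	}
	var layers [][]string
	visited := 0
	for len(current) > 0 {
		layers = append(layers, current)
		visited += len(current)
		var next []string
		for _, id := range current {
			for _, child := range g.children[id] {
				deg[child]--
				if deg[child] == 0 {
					next = append(next, child)
				}
			}
		}
		current = next
	}
	if visited != len(g.order) {
		return nil, fmt.Errorf("cycle detected: %d of %d nodes unreachable", len(g.order)-visited, len(g.order))
	}
	return layers, nil
}

func main() {
	g := NewGraph()
	g.AddEdge("package:nginx", "service:nginx")
	g.AddEdge("file:/etc/nginx", "file:/etc/nginx/nginx.conf")
	g.AddEdge("file:/etc/nginx/nginx.conf", "service:nginx")
	layers, err := g.TopologicalLayers()
	if err != nil {
		panic(err)
	}
	for i, l := range layers {
		fmt.Println("layer", i, l)
	}
}
```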
Replace the heimdalr/dag wrapper with a self-contained DAG using:
- Incremental in-degree + adjacency lists (O(V+E) topological sort)
- DFS cycle detection on AddEdge via transitive reachability check
- Insertion-order tracking for deterministic iteration
Removes 3 transitive dependencies (heimdalr/dag, emirpasic/gods, google/uuid). Same benchmark performance, simpler code, no wrapper.
Ref #4
Convert all tests to table-driven with t.Parallel() on every subtest.
Removed fluff tests that just verify constants or stdlib behavior:
- TestResourceState, TestServiceState (string constants)
- TestDefaultOptions (struct literal)
- TestWithTimeout (context.WithTimeout)
- TestIsCritical (type assertion)
- TestNodes, TestOrderedExtensions (map length)
- TestApp_Version (version != "")
Consolidated related tests into table-driven groups:
- graph: TestAddNode (2 cases), TestAddEdge (4 cases), TestTopologicalLayers (4 cases)
- autoedge: TestAddAutoEdges (7 cases)
- dsl: TestRun_Include (2 cases), TestRun_Firewall (3 cases)
- app: TestApp_RunPlan (3 cases), TestApp_BuildGraph (2 cases)
Added t.Helper() to all test helper functions.
Ref #4
Replace bare bool fields with atomic.Bool in mockExt and mockTransientFailExt. The inSync field is read by Check() in daemon goroutines and written by test goroutines, causing races under -race. Ref #4
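The race fix amounts to swapping a plain `bool` for `sync/atomic.Bool` (Go 1.19+). The mock below is a hypothetical shape; the real `Check()` likely takes a context and returns a state value.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// mockExt: inSync is read by Check() from daemon goroutines and written
// by the test goroutine, so a plain bool would be a data race under -race.
type mockExt struct {
	inSync atomic.Bool
}

func (m *mockExt) Check() bool { return m.inSync.Load() }

func main() {
	m := &mockExt{}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = m.Check() // concurrent reads, as in the daemon
		}()
	}
	m.inSync.Store(true) // concurrent write from the "test" goroutine
	wg.Wait()
	fmt.Println("in sync:", m.Check())
}
```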
CRITICAL (#1):
- Fix kqueue fd leak in darwin file/plist watchers: explicit fd management instead of defer capturing stale fd
HIGH (#2-3): Shared watcher multiplexer
- New internal/watch/inotify_linux.go: single inotify+epoll fd for all file and sysctl watchers. Prevents hitting inotify_max_user_instances (128) at 2000+ resources. 5 tests.
- File and sysctl watchers refactored to use shared multiplexer
HIGH (#4-6): Graph scaling
- AddEdge is now O(1) with lazy cycle detection via TopologicalLayers
- Duplicate edges silently deduplicated via edge set
- Auto-edge serviceToConfigFile uses exact config extension matching
- WouldCycle() BFS for auto-edge cycle avoidance
HIGH (#7-8): Daemon correctness
- Default Timeout to 5m when unset (prevents instant context expiry)
- Nil checks in retryManager for unknown resource IDs
HIGH (#9): DSL simplification
- Extract r.require() helper, cutting ~50 lines of boilerplate
HIGH (#10): Watcher dedup (via shared multiplexer above)
HIGH (#11): Unsafe pointer bounds
- Use unsafe.Offsetof for Windows FILE_NOTIFY_INFORMATION headerSize
HIGH (#12): DAG-aware drift remediation
- After successful Apply, schedule Check for dependent resources via Children()
MEDIUM (#13-20): Simplification + security
- Remove dead Nodes() allocation
- Systemd: NoNewPrivileges=yes, remove ProtectSystem=full
- eventMeta stores EventKind not full Event
- Remove retryManager.mu (states map is write-once)
- ResourceMeta struct embedded in all Opts (DependsOn+Critical consolidated)
- Error accumulation: []error with errors.Join, not single error
- Move isRoot() to internal/platform/root.go
- Registry watcher: re-register before sending event
LOW (#21-25): Tests, docs, minor
- Cycle detection test via TopologicalLayers
- Log dropped coalescer events
- Document Event struct and EventKind in extending.md
- Document default blueprint in Service Installation
- Rename coal -> coalescer
Ref #4
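Error accumulation with `errors.Join` (Go 1.20+) lets blueprint helpers keep going after a problem and report everything at once. `Run`, `File`, and this `require` signature are hypothetical stand-ins for the DSL types; `errors.Join` with no arguments returns nil, so a clean run needs no special case.

```go
package main

import (
	"errors"
	"fmt"
)

// Run is a hypothetical DSL run context that records errors instead of panicking.
type Run struct {
	errs []error
	seen map[string]bool
}

func NewRun() *Run { return &Run{seen: map[string]bool{}} }

// require records an error when a mandatory field is empty.
func (r *Run) require(kind, field, value string) bool {
	if value == "" {
		r.errs = append(r.errs, fmt.Errorf("%s: %s is required", kind, field))
		return false
	}
	return true
}

// File registers a file resource, reporting empty paths and duplicates.
func (r *Run) File(path string) {
	if !r.require("file", "path", path) {
		return
	}
	id := "file:" + path
	if r.seen[id] {
		r.errs = append(r.errs, fmt.Errorf("duplicate resource %s", id))
		return
	}
	r.seen[id] = true
}

// Err returns all accumulated errors joined, or nil if there were none.
func (r *Run) Err() error { return errors.Join(r.errs...) }

func main() {
	r := NewRun()
	r.File("/etc/motd")
	r.File("/etc/motd")
	r.File("")
	fmt.Println(r.Err())
}
```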
All platforms now use shared watcher multiplexers:
Linux:
- internal/watch/inotify_linux.go: one inotify fd for all file+sysctl watchers
- internal/watch/dbus_linux.go: one dbus connection for all service watchers
macOS:
- internal/watch/kqueue_darwin.go: one kqueue fd for all file+plist watchers
Windows:
- ReadDirectoryChangesW is already directory-scoped (one handle per dir)
- Service watcher uses polling (SCM notify incompatible with Go scheduler)
This prevents hitting OS limits at 2000+ resources:
- Linux: inotify_max_user_instances (128), dbus max-connections (256)
- macOS: per-process fd limits
Ref #4
README: lead with "Event-driven DAG daemon" tagline. Comparison table adds drift detection latency (<1s vs ~30min cron). Features table reordered with DAG and daemon first. Cross-platform quick start.
design.md: new "DAG + Event-Driven Difference" section. OS event mechanism table per platform. DAG-aware re-convergence explained. Updated Lessons from Chef with blind spot and propagation rows.
examples.md: cross-platform blueprint examples (Linux, macOS, Windows). DependsOn section with three-layer DAG. Daemon mode usage with --once.
Rename extending.md -> extensions.md for clarity.
Ref #4
…eout
AutoGrouping:
- Batch package installs into single transaction (apt install git curl neovim)
- All 9 package managers implement BatchInstaller (InstallBatch/RemoveBatch)
- PackageGroup in internal/engine/autogroup.go replaces individual packages in each topological layer where they share manager + state
- AutoGroup=false in ResourceMeta disables grouping per resource
Per-resource meta overrides (NodeMeta on graph nodes):
- Noop: skip Apply, only Check (per-resource dry-run)
- Retry: per-resource max retries (overrides daemon default)
- Limit: per-resource rate limit (0 = use daemon default)
- AutoEdge: disable auto-edges for specific resources
- AutoGroup: disable auto-grouping for specific resources
Watcher restart on failure:
- Watchers that fail (e.g., inotify max watches) now restart with exponential backoff (1s, 2s, 4s... capped at 5m) instead of dying permanently
Converged timeout (--converged-timeout):
- Exit after system is stable for N seconds (e.g., --converged-timeout 60s)
- Useful for Packer image builds and CI idempotency validation
- Tracks last change timestamp, exits when no Apply changes for the duration
Ref #4
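Auto-grouping within one topological layer can be sketched as bucketing packages by (manager, state) while preserving first-appearance order. The `Package` and `PackageGroup` fields below are hypothetical; in this sketch `AutoGroup` is set explicitly, and `false` turns a package into a single-member group.

```go
package main

import "fmt"

// Package is a hypothetical package resource within one topological layer.
type Package struct {
	Name      string
	Manager   string // e.g. "apt", "winget"
	State     string // "present" or "absent"
	AutoGroup bool   // explicit false opts this resource out of grouping
}

// PackageGroup is one batched install/remove for a manager+state pair.
type PackageGroup struct {
	Manager, State string
	Names          []string
}

// groupLayer batches packages that share manager and state. Group order
// follows first appearance, so output is deterministic.
func groupLayer(layer []Package) []PackageGroup {
	type key struct{ manager, state string }
	index := map[key]int{}
	var groups []PackageGroup
	for _, p := range layer {
		if !p.AutoGroup {
			groups = append(groups, PackageGroup{p.Manager, p.State, []string{p.Name}})
			continue
		}
		k := key{p.Manager, p.State}
		if i, ok := index[k]; ok {
			groups[i].Names = append(groups[i].Names, p.Name)
			continue
		}
		index[k] = len(groups)
		groups = append(groups, PackageGroup{p.Manager, p.State, []string{p.Name}})
	}
	return groups
}

func main() {
	layer := []Package{
		{"git", "apt", "present", true},
		{"curl", "apt", "present", true},
		{"neovim", "apt", "present", true},
		{"telnet", "apt", "absent", true},
		{"nginx", "apt", "present", false},
	}
	for _, g := range groupLayer(layer) {
		fmt.Println(g.Manager, g.State, g.Names)
	}
}
```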
…ource-timeout
--timeout on serve means "exit after stable for N seconds" (the intuitive meaning for a daemon). --resource-timeout is the per-resource Check/Apply deadline. Updated all docs.
Ref #4
--timeout 1s replaces --once (converge and exit after 1s of stability). --timeout 0 (default) runs forever. One flag, one concept. Remove Once field from daemon Options. Update all docs and tests. Ref #4
Winget Install/Remove/IsInstalled now use --id instead of positional name argument, preventing "multiple packages found" errors (e.g., git matching Git.Git and Git.Git.PreRelease). Baseline blueprint uses winget IDs on Windows: Git.Git, cURL.cURL, Neovim.Neovim. Ref #4
- Add divider line after banner (before resources start)
- Stream apply output as each resource completes (no buffering)
- Show field-level diffs in apply mode (content: old → new, mode: 0644)
- Fix spinner indentation to align with result checkmarks
- Carry Check() state.Changes through to Result for display
Ref #4
All three output formats (terminal, serial, JSON) now:
- Show only nonzero counts in summary (no "0 ok")
- Show field-level diffs in apply mode (content: old -> new)
- Use consistent 2-space indentation for resources
- Include divider after banner
JSON: omitempty on zero summary counts, Changes in apply results.
Serial: streaming diffs, nonzero-only summary, aligned indentation.
Ref #4
Demo now shows two commands:
1. converge plan baseline: field-level diffs with +/~ symbols
2. converge serve baseline --timeout 1s: streaming apply with diffs and timing
Regenerate GIF: vhs assets/demo.tape
Ref #4
In daemon mode (no --timeout), the initial convergence summary was confusing: it looked like the daemon was done. Now it shows "WATCHING drift detection active" instead. With --timeout, the normal APPLY summary still prints on exit. Ref #4
COM vtable approach crashed (access violation on INetFwRule property setters due to vtable offset mismatch). Reverted to registry-based approach with improved notification:
1. Try SERVICE_CONTROL_PARAMCHANGE (works on most Windows versions)
2. Fallback: stop/start mpssvc service to force full registry reload
3. Rules persist in registry regardless, take effect on next boot
Ref #4
…restart
The rule had PrimaryStatus=Error because the registry format was missing the Profile field. Added Profile=Public, Private, Domain. Reverted the stop/start mpssvc approach (destructive). PARAMCHANGE is sufficient when the rule format is correct.
Ref #4
…s Windows format)
Replace direct registry writes with the Windows Firewall COM API
(HNetCfg.FwPolicy2 / HNetCfg.FWRule) via go-ole IDispatch.
Rules take effect immediately, no service notification needed.
Proper COM lifecycle: CoInitializeEx, LockOSThread, Release.
Check reads rule properties via GetProperty for drift detection.
Apply creates via CreateObject("HNetCfg.FWRule") + Rules.Add.
New dependency: github.com/go-ole/go-ole v1.3.0
Ref #4
Replace 5s polling with WMI __InstanceModificationEvent subscription for Win32_Service. Detects service state changes in ~1 second via ExecNotificationQuery/NextEvent COM calls. Falls back to 5s polling if WMI is unavailable (e.g., restricted environments). Now go-ole is used for both firewall (HNetCfg.FwPolicy2) and service (WbemScripting.SWbemLocator) on Windows. Ref #4
feat: Add w32time service to Windows baseline
Winget exit codes are inconsistent across versions. Now checks output for the package ID string and "No installed package found" regardless of exit code. Fixes false drift detection on installed packages. Added Windows Time service (w32time) to baseline for testing service management on Windows.
Ref #4
Replace exec.Command("net user/localgroup") with native Win32 API:
- NetUserAdd (netapi32.dll) for account creation
- NetLocalGroupAddMembers for group membership
Replace 60s user poll with WMI __InstanceModificationEvent on
Win32_UserAccount for instant account change detection.
Falls back to 60s polling if WMI unavailable.
Ref #4
Baseline uses "ssh" on Ubuntu/Debian, "sshd" on RHEL/Fedora. Test script detects service name, skips service/firewall tests gracefully when systemd or nftables are unavailable (WSL, containers). Ref #4
Summary
- converge serve <blueprint> runs as a persistent service
- internal/exit/
- --once flag for CI/Packer (converge once, exit)
New packages
- internal/graph/
- internal/graph/autoedge/
- internal/daemon/
- internal/exit/
- internal/daemon/coalesce.go
- extensions/file/watch_linux.go
New dependencies
- github.com/heimdalr/dag (thread-safe DAG)
- golang.org/x/time (rate limiter)
Test plan
- go test ./... passes (34 test files, including 11 new graph tests, 7 daemon tests, 7 auto-edge tests, 3 coalescer tests, 3 inotify watcher tests)
- go vet ./... clean
- go build ./... compiles
- converge serve workstation starts daemon, touch managed file, observe re-convergence
- converge serve workstation --once exits after initial convergence
Ref #4