refactor(go/plugin/framework/functions): redesign manager#21850
refactor(go/plugin/framework/functions): redesign manager#21850ilyam8 merged 24 commits intonetdata:masterfrom
Conversation
There was a problem hiding this comment.
5 issues found across 17 files
Confidence score: 3/5
- Most significant risk:
scheduler.goreturnserrSchedulerQueueFullfor invalid/nil inputs, which can mislead callers and mask real validation failures. - Potential behavior bug: in
scheduler.gothe guarded nil branch incomplete()can leaks.pendingand drop queued items, which could cause incorrect queue state over time. - Score reflects moderate risk from correctness issues in scheduler/queue handling, though not necessarily merge-blocking if edge paths are rare.
- Pay close attention to
src/go/plugin/framework/functions/scheduler.goandsrc/go/plugin/framework/functions/manager.go- error handling and pending-count state consistency.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/go/plugin/framework/functions/scheduler.go">
<violation number="1" location="src/go/plugin/framework/functions/scheduler.go:48">
P2: Wrong error returned for invalid/nil request: `errSchedulerQueueFull` is returned when the request fails input validation (nil req, nil fn, empty UID, or empty scheduleKey). This misrepresents the actual problem — the queue isn't full, the input is invalid. Callers that branch on error identity (e.g., to emit HTTP-like 503 vs 400 responses) will produce misleading rejection reasons.</violation>
<violation number="2" location="src/go/plugin/framework/functions/scheduler.go:168">
P2: If the defensively-guarded nil branch is hit in `complete()`, the popped queue item is silently discarded without decrementing `s.pending`, causing a permanent pending-count leak. Additionally, any remaining items behind it in the lane queue become stranded until a new request for the same key arrives. Consider looping to find the next valid item, or at minimum decrementing `pending` for the skipped entry.</violation>
</file>
<file name="src/go/plugin/framework/functions/README.md">
<violation number="1" location="src/go/plugin/framework/functions/README.md:159">
P3: Typo: extra spaces in `pre-admission`</violation>
<violation number="2" location="src/go/plugin/framework/functions/README.md:175">
P3: Typo: extra spaces in `ctx.Done()`</violation>
</file>
<file name="src/go/plugin/framework/functions/manager.go">
<violation number="1" location="src/go/plugin/framework/functions/manager.go:235">
P2: Error from `scheduler.enqueue` is discarded — the response always says "function queue is full" even when the scheduler is stopping. Consider inspecting the error to return a more accurate message (e.g., 503 "functions manager is stopping" for `errSchedulerStopping`).</violation>
</file>
Architecture diagram
sequenceDiagram
participant Downstream as SD / Job Manager
participant Input as Stdin
participant Parser as Input Parser
participant Mgr as Manager (Dispatcher)
participant Sched as Keyed Scheduler
participant Worker as Worker Pool
participant Finalizer as Terminal Finalizer
participant Stdout as Stdout (Netdata API)
Note over Downstream, Mgr: NEW: Bounded Handoff Flow
Downstream->>Downstream: NEW: BoundedSend(ctx, chan, cap)
alt Channel Busy or Shutdown
Downstream-->>Stdout: NEW: 503 Busy/Shutdown (Direct)
else Success
Downstream->>Mgr: Push function to dyncfg channel
end
Note over Input, Mgr: Command Processing
Input->>Parser: Command line (FUNCTION, CANCEL, QUIT)
Parser->>Parser: CHANGED: Parse to Event model
Parser-->>Mgr: inputEvent (Call, Cancel, Progress, Quit)
alt Event: Call
Mgr->>Mgr: Admission Check (UID, State, Queue)
alt Duplicate UID
Mgr->>Finalizer: NEW: FinalizeTerminal(UID, 409)
else Accepted
Mgr->>Sched: NEW: Enqueue by Route Key
Sched->>Sched: Buffer in Lane (Serialized per Key)
end
else Event: Cancel
Mgr->>Mgr: Identify Transaction state
alt Queued
Mgr->>Sched: NEW: Remove from Lane
Mgr->>Finalizer: NEW: FinalizeTerminal(UID, 499)
else Running
Mgr->>Mgr: NEW: Start Fallback Timer (5s)
Mgr->>Worker: Signal Context Cancel
opt Fallback Timer Fires
Mgr->>Finalizer: NEW: FinalizeTerminal(UID, 499)
end
end
end
Note over Sched, Worker: Execution Flow
Worker->>Sched: next() (wait for ready)
Sched-->>Worker: invocationRequest (Key Free)
Worker->>Worker: CHANGED: Execute Handler
alt Handler Panic
Worker->>Finalizer: NEW: FinalizeTerminal(UID, 500)
else Handler Success
Worker->>Finalizer: FinalizeTerminal(UID, Result)
end
Note over Finalizer, Stdout: Single-Winner Finalization
Finalizer->>Finalizer: NEW: Check if UID tombstoned
alt First Terminal Result
Finalizer->>Finalizer: Set Tombstone (60s)
Finalizer->>Stdout: FUNCRESULT (stdout)
Finalizer->>Sched: NEW: Release Key (Promote Next)
else Late/Duplicate Result
Finalizer->>Finalizer: Drop result (silently)
end
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
2 issues found across 15 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/go/plugin/framework/functions/manager_flow_test.go">
<violation number="1" location="src/go/plugin/framework/functions/manager_flow_test.go:373">
P2: Goroutine leak: `block` channel is never closed</violation>
</file>
<file name="src/go/plugin/framework/functions/parser.go">
<violation number="1" location="src/go/plugin/framework/functions/parser.go:143">
P2: The narrowed condition correctly prevents payload data like "FUNCTIONAL" from incorrectly aborting payloads, but it swallows `QUIT` and unknown `FUNCTION_...` commands.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Pull request overview
This PR is a comprehensive redesign of the Go plugin framework's functions manager. It introduces a route-keyed scheduler with a worker pool, single-winner terminal finalization (tombstone-guarded), and cancellation semantics (queued/running/awaiting states). It also hardens shutdown with a bounded drain path, adds overlapping prefix rejection, and wires the manager-backed finalizer into dyncfg responders (Job Manager, Service Discovery).
Changes:
- New functions manager architecture: keyed scheduler (
scheduler.go), invocation state FSM, terminal finalizer with tombstone deduplication, and worker pool with cancel/panic handling - Parser extended with
FUNCTION_CANCEL,FUNCTION_PROGRESS,QUITevent types and pre-admission cancel detection in payload mode - Dyncfg responders now use a
TerminalFinalizerinterface and bounded downstream handoff (BoundedSend)
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
functions/manager.go |
Core redesign: invocation state FSM, tryFinalize, shutdown drain, scheduler wiring |
functions/manager_worker.go |
New: worker pool goroutine with panic recovery, cancel skip, awaiting-result transition |
functions/scheduler.go |
New: key-serialized scheduler with pending budget, lane promotion, and stop/drain semantics |
functions/scheduler_test.go |
New: unit tests for lane advancement, pending counters, stop paths, and queue recovery |
functions/parser.go |
Extended: cancel/progress/quit event parsing; pre-admission cancel in payload mode |
functions/parser_test.go |
New: comprehensive parser event tests including payload-mode cancel and progress |
functions/response_payload.go |
New: shared BuildJSONPayload helper extracted from duplicated code |
functions/response_payload_test.go |
New: tests for JSON payload structure |
functions/input.go |
Input interface narrowed to <-chan string; EOF closes channel |
functions/finalizer.go |
New: TerminalFinalizer type and DirectTerminalFinalizer bypass |
functions/ext.go |
Overlapping prefix rejection at registration time |
functions/manager_test.go |
Updated: unique UIDs per function, multi-worker profiles, thread-safe mock |
functions/manager_flow_test.go |
New: integration flow tests for cancellation, shutdown, concurrency, and duplicates |
functions/manager_snapshot_test.go |
Updated: consolidated and extended snapshot tests including overlap rejection |
functions/README.md |
New: architecture documentation with flow diagram |
dyncfg/responder.go |
Wired TerminalFinalizer into all response emit paths |
dyncfg/handoff.go |
New: generic BoundedSend with deadline-capped timeout |
dyncfg/handoff_test.go |
New: tests for BoundedSend edge cases |
jobmgr/manager.go |
Finalizer propagation from FnReg and SetDyncfgResponder |
jobmgr/dyncfg_handoff.go |
New: bounded enqueue with shutting-down vs busy distinction |
jobmgr/dyncfg_vnode.go |
Replaced unbounded select with enqueueDyncfgFunction |
jobmgr/dyncfg_collector.go |
Replaced unbounded select with enqueueDyncfgFunction |
sd/sd.go |
Finalizer propagation and SetDyncfgResponder update |
sd/dyncfg_handoff.go |
New: bounded enqueue with shutting-down vs busy distinction |
sd/dyncfg.go |
Replaced unbounded select with enqueueDyncfgFunction |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
Test Plan
Additional Information
For users: How does this change affect me?
Summary by cubic
Redesigned the Go functions manager with a route-key scheduler, worker pool, single-winner finalization, and runtime metrics so runs are serialized per route, cancellable, observable, and emit one terminal result. Hardened shutdown and registration, fixed payload-mode QUIT/control handling, and wired bounded dyncfg handoffs with manager-backed finalization in Job Manager and Service Discovery; runtime metrics are exported via the runtime service and chartengine metrics were re-namespaced.
New Features
Bug Fixes
Written for commit 915a135. Summary will update on new commits.