Skip to content

feat: Add iPXE script template and new variant of operating system#293

Open
pbreton wants to merge 2 commits intomainfrom
feat/ipxe-os
Open

feat: Add iPXE script template and new variant of operating system#293
pbreton wants to merge 2 commits intomainfrom
feat/ipxe-os

Conversation

@pbreton
Copy link
Copy Markdown
Contributor

@pbreton pbreton commented Mar 20, 2026

Description

Introduce a new variant of Operating System based on iPXE templates
Ensure synchronisation with 'core'.

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • RLA - RLA service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Breaking Changes: none expected.

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • ✅ Review completed - (🔄 Check again to review again)

Walkthrough

Adds end-to-end iPXE template support: REST read endpoints and API model, DB schema/DAO and migrations, site-agent manager hooks, Temporal workflows/activities for inventory discovery and reconciliation, and extensive OperatingSystem handler/model changes for iPXE, scope, and ownership semantics.

Changes

Cohort / File(s) Summary
API: iPXE handlers & routing
api/pkg/api/handler/ipxetemplate.go, api/pkg/api/handler/ipxetemplate_test.go, api/pkg/api/routes.go, api/pkg/api/routes_test.go
New GET handlers to list/fetch iPXE templates with provider/tenant auth checks, site/scoped filtering, pagination, and integration tests; routes registered and route-count test updated.
API: OperatingSystem & Site handlers
api/pkg/api/handler/operatingsystem.go, api/pkg/api/handler/operatingsystem_test.go, api/pkg/api/handler/operatingsystem_ownership_test.go, api/pkg/api/handler/site.go, api/pkg/api/handler/site_test.go
Expanded OS create/read/update/delete to support iPXE templates/parameters/artifacts, scope-aware site associations, ownership resolution (tenant vs provider), broadened role checks, altered workflow selection, and site creation triggering per-OS workflows; tests added/adjusted.
API models & util
api/pkg/api/model/ipxetemplate.go, api/pkg/api/model/operatingsystem.go, api/pkg/api/handler/util/common/common.go
Added API model APIIpxeTemplate; extended OperatingSystem API types/validation for iPXE fields and scopes; removed GetSiteOperatingSystemtID helper.
DB: iPXE model, DAO, migration & tests
db/pkg/db/model/ipxetemplate.go, db/pkg/db/model/ipxetemplate_test.go, db/pkg/migrations/20260306120000_ipxe_template.go
New Bun model IpxeTemplate with array fields and scope, DAO implementation (Create/Update/Delete/Get/GetAll with filters & pagination), SQL tests, and migration creating table, unique (site_id,name) constraint and indexes.
DB: OperatingSystem model & migrations
db/pkg/db/model/operatingsystem.go, db/pkg/db/model/operatingsystem_test.go, db/pkg/migrations/20260310100000_os_definition.go, db/pkg/migrations/20260320100000_os_scope.go
Added iPXE fields (template name, parameters, artifacts, definition hash, scope), optional caller-provided OS ID, removed controller OS id usage; adjusted GetAll filters to include provider IDs/scopes; tests and migrations updated.
DB: OS site association & migration
db/pkg/db/model/operatingsystemsiteassociation.go, db/pkg/migrations/20260321100000_ossa_core_status.go
Added nullable core_status column and update input/DAO handling; migration to add core_status.
Site-agent: iPXE manager & wiring
site-agent/pkg/components/managers/ipxetemplate/*, site-agent/pkg/components/managers/manager.go, site-agent/pkg/components/managers/manageraccess.go, site-agent/pkg/components/managers/managerapi/*, site-agent/pkg/components/managers/workflow/orchestrator.go, site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
New iPXE manager API, constructor, Manager accessor, cron/publisher/subscriber hooks, manager registration and workflow state field; integrated into manager orchestrator.
Site-agent: OperatingSystem manager updates
site-agent/pkg/components/managers/operatingsystem/*
Expanded cron/publisher/subscriber registration to include OperatingSystem inventory workflows/activities alongside OsImage processing.
Site-workflow: activities, workflows & mocks
site-workflow/pkg/activity/ipxetemplate.go, site-workflow/pkg/activity/ipxetemplate_test.go, site-workflow/pkg/activity/operatingsystem.go, site-workflow/pkg/activity/operatingsystem_test.go, site-workflow/pkg/grpc/client/testing.go
Activities to query Site Controller (Carbide) for templates/OS defs, publish inventories to Temporal, add on-site wrappers for OS CRUD, and unit tests; mocks extended for Forge RPCs.
Workflow: registrations & reconciliation
workflow/cmd/workflow/main.go, workflow/pkg/activity/ipxetemplate/*, workflow/pkg/workflow/ipxetemplate/*, workflow/pkg/activity/operatingsystem/*, workflow/pkg/workflow/operatingsystem/*, workflow/pkg/util/testing.go, workflow-schema/site-agent/workflows/v1/*
Registered new workflows/activities and added reconciliation activities/workflows for iPXE templates and OperatingSystems, proto/schema additions, workflow entrypoints parsing siteID, activity execution, and latency recording; tests added.
OpenAPI & SDK
openapi/spec.yaml, sdk/standard/*
OpenAPI expanded with iPXE Template endpoints and extended OperatingSystem schemas (ipxeTemplateName, ipxeParameters, ipxeArtifacts, scope); generated SDK/models updated (iPXE template, parameters, artifacts, OS create/update fields and client methods).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant API as IpxeTemplate Handler
    participant Auth as Auth/Org Resolver
    participant DAO as IpxeTemplateDAO
    participant DB as Database

    Client->>API: GET /v2/org/{org}/carbide/ipxe-template?siteId=...
    API->>Auth: SetupHandler, resolve dbUser, org membership, provider/tenant context
    alt provider or tenant authorized
        API->>DAO: GetAll(filter: siteId, scopes, pagination)
        DAO->>DB: SELECT ... FROM ipxe_template WHERE ...
        DB-->>DAO: rows + total
        DAO-->>API: templates + count
        API-->>Client: 200 JSON + pagination header
    else unauthorized
        API-->>Client: 403/400/500
    end
Loading
sequenceDiagram
    participant SiteCtrl as Site Controller (Carbide)
    participant Workflow as UpdateIpxeTemplateInventory WF
    participant Activity as ManageIpxeTemplate Activity
    participant DB as Database
    participant Metrics as Inventory Metrics

    SiteCtrl-->>Workflow: Start UpdateIpxeTemplateInventory(siteID, inventory)
    Workflow->>Activity: Execute UpdateIpxeTemplatesInDB(siteID, inventory)
    Activity->>DB: GetAll(existing templates for site)
    Activity->>DB: Insert/Update/Delete templates as needed
    DB-->>Activity: ack
    Activity-->>Workflow: success
    Workflow->>Metrics: RecordLatency("UpdateIpxeTemplateInventory")
    Metrics-->>Workflow: ack
    Workflow-->>SiteCtrl: finished
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 I hopped through templates, DB, and flows,
New routes and cronbeats where the inventory grows,
Agents and workflows hummed a tidy tune,
Providers and tenants sorted by the moon,
Happy rabbit — templates synced by noon. 🥕

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/ipxe-os

@pbreton pbreton marked this pull request as draft March 20, 2026 21:55
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Mar 20, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-03-20 21:56:13 UTC | Commit: 7246105

@github-actions
Copy link
Copy Markdown

🛡️ Vulnerability Scan

🚨 Found 65 vulnerability(ies)
📊 vs main: 65 (no change)

Severity Breakdown:

  • 🔴 Critical/High: 65
  • 🟡 Medium: 0
  • 🔵 Low/Info: 0

🔗 View full details in Security tab

🕐 Last updated: 2026-03-20 21:56:54 UTC | Commit: 7246105

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 20, 2026

Test Results

8 751 tests  +120   8 751 ✅ +120   8m 15s ⏱️ +20s
  145 suites +  2       0 💤 ±  0 
   14 files   ±  0       0 ❌ ±  0 

Results for commit caea22d. ± Comparison against base commit cf66bc7.

This pull request removes 1 and adds 121 tests. Note that renamed tests count towards both.
github.com/NVIDIA/ncx-infra-controller-rest/db/pkg/db/model ‑ TestOperatingSystemSQLDAO_Clear/can_clear_ControllerOperatingSystemID
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/cannot_retrieve_from_unmanaged_site
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/filter_by_scope_returns_subset
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/missing_siteId_returns_bad_request
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/missing_user_context_returns_500
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/mixed-role_user_fails_provider_check_but_passes_tenant_authorization
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/non-privileged_tenant_with_TenantAccount_can_retrieve_templates
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/successful_GetAll_with_valid_siteId
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/tenant_with_TenantAccount_can_retrieve_templates
github.com/NVIDIA/ncx-infra-controller-rest/api/pkg/api/handler ‑ TestGetAllIpxeTemplateHandler_Handle/tenant_without_TenantAccount_is_denied
…

♻️ This comment has been updated with latest results.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/pkg/api/handler/operatingsystem.go (1)

866-868: ⚠️ Potential issue | 🟡 Minor

Remove debug print statement.

This fmt.Printf appears to be leftover debug code that should not be in production.

🧹 Proposed fix
 	for _, os := range oss {
-		if os.Type == cdbm.OperatingSystemTypeImage {
-			fmt.Printf("Processing Operating System: %s, Type: %s\n", os.Name, os.Type)
-		}
-
 		curVal := os
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/operatingsystem.go` around lines 866 - 868, Remove the
leftover debug print by deleting the fmt.Printf call inside the branch that
checks for cdbm.OperatingSystemTypeImage (the line printing os.Name and
os.Type); if you need observable logging instead, replace it with the
appropriate logger used in this package (e.g., processLogger or the package's
logger) and use structured or leveled logging rather than fmt.Printf, but
otherwise simply remove the fmt.Printf invocation to avoid debug output in
production.
🧹 Nitpick comments (6)
db/pkg/db/model/ipxetemplate.go (1)

142-147: Embedded interface in struct appears unused.

The IpxeTemplateSQLDAO struct embeds IpxeTemplateDAO interface at line 145, but this embedding doesn't appear to serve a purpose since all interface methods are explicitly implemented. This pattern is sometimes used for partial interface implementation, but here all methods are defined.

Consider removing the embedded interface if it's not intentional:

♻️ Suggested simplification
 type IpxeTemplateSQLDAO struct {
 	dbSession *db.Session
-	IpxeTemplateDAO
 	tracerSpan *stracer.TracerSpan
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/pkg/db/model/ipxetemplate.go` around lines 142 - 147, The struct
IpxeTemplateSQLDAO currently embeds the IpxeTemplateDAO interface but all
interface methods are implemented explicitly; remove the embedded interface from
the IpxeTemplateSQLDAO struct declaration (delete the IpxeTemplateDAO field) and
update any constructors or initializers that referenced the embedded field if
present; run go build to ensure no compile-time uses remain and remove any
now-unused imports or fields (e.g., verify IpxeTemplateSQLDAO, IpxeTemplateDAO,
and related constructors).
db/pkg/migrations/20260306120000_ipxe_template.go (1)

87-90: Down migration does not roll back schema changes.

The down migration at lines 87-89 explicitly does nothing. While this may be intentional to prevent accidental data loss, it means schema rollback isn't possible through the migration system. If a rollback is ever needed, it would require manual intervention.

Consider adding a proper down migration or documenting why rollback is intentionally disabled:

📝 Example down migration (if rollback support is desired)
}, func(ctx context.Context, db *bun.DB) error {
    _, err := db.NewDropTable().Model((*model.IpxeTemplate)(nil)).IfExists().Exec(ctx)
    if err != nil {
        return err
    }
    fmt.Print(" [down migration] dropped ipxe_template table")
    return nil
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/pkg/migrations/20260306120000_ipxe_template.go` around lines 87 - 90, The
down migration currently no-ops (prints "No action taken") which prevents schema
rollback; replace the no-op in the down migration callback with logic that drops
the ipxe_template table (use bun's drop table builder on model.IpxeTemplate,
e.g., db.NewDropTable().Model((*model.IpxeTemplate)(nil)).IfExists().Exec(ctx)),
return any error from Exec, and log a concise message on success instead of the
current print so rollbacks via the migration system will actually revert the
schema change.
api/pkg/api/handler/site.go (1)

319-352: Consider workflow execution timing relative to transaction commit.

The workflow executions at lines 345-349 occur before tx.Commit() at line 354. If the transaction rollback happens after some workflows have been triggered, those workflows will execute against associations that don't exist in the database.

The current mitigation (logging warnings and relying on "next inventory cycle") handles workflow execution failures, but doesn't address the scenario where workflows succeed but the transaction fails to commit.

Consider moving workflow triggering after tx.Commit() succeeds, or ensure the workflows are idempotent and handle missing associations gracefully.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/site.go` around lines 319 - 352, Workflows for global
OSes are started via stc.ExecuteWorkflow (using tClient.StartWorkflowOptions and
cwssaws.CreateOperatingSystemRequest) before tx.Commit(), risking workflows
running against uncommitted/rolled-back DB state; either move the loop that
calls scp.GetClientByID / stc.ExecuteWorkflow so it runs only after tx.Commit()
succeeds, or modify the triggered workflow ("CreateOperatingSystem") to be fully
idempotent and tolerate missing site/association data (gracefully handle
not-found and retry/abort without side effects). Ensure you reference and update
the code paths around scp.GetClientByID, tClient.StartWorkflowOptions,
stc.ExecuteWorkflow, and the tx.Commit() call to implement one of these fixes.
db/pkg/migrations/20260320100000_os_scope.go (1)

50-52: Down migration should be reversible for this schema change.

Leaving down as no-op makes rollback incomplete after deployment issues.

Proposed down migration
-		fmt.Print(" [down migration] No action taken")
-		return nil
+		tx, terr := db.BeginTx(ctx, &sql.TxOptions{})
+		if terr != nil {
+			handlePanic(terr, "failed to begin transaction")
+		}
+
+		_, err := tx.Exec("ALTER TABLE operating_system DROP COLUMN IF EXISTS scope")
+		handleError(tx, err)
+
+		terr = tx.Commit()
+		if terr != nil {
+			handlePanic(terr, "failed to commit transaction")
+		}
+
+		fmt.Print(" [down migration] ")
+		return nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/pkg/migrations/20260320100000_os_scope.go` around lines 50 - 52, The down
migration currently is a no-op (the anonymous func with signature func(ctx
context.Context, db *bun.DB) error that prints " [down migration] No action
taken"), which leaves rollbacks incomplete; replace that no-op with code that
reverses whatever the up migration in this file performed (mirror the up steps:
drop any tables created, remove columns added, drop indexes or constraints
introduced by the up migration) using the same bun DB/schema-builder calls so
the migration is fully reversible and returns an error on failure.
site-workflow/pkg/activity/ipxetemplate_test.go (1)

95-96: Use a typed context key instead of a bare string.

Using "wantCount" as a string key for context.WithValue is not idiomatic Go and risks key collisions. Define an unexported type for the key.

♻️ Proposed fix
+type contextKey string
+
+const wantCountKey contextKey = "wantCount"
+
 // In the test:
-ctx = context.WithValue(ctx, "wantCount", tt.wantCount)
+ctx = context.WithValue(ctx, wantCountKey, tt.wantCount)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@site-workflow/pkg/activity/ipxetemplate_test.go` around lines 95 - 96,
Replace the bare string key used in the test's context.WithValue call with a
typed, unexported context key: define a private key type (e.g., type ctxKey
struct{}) and a package-private variable (e.g., var wantCountKey = &ctxKey{})
and use wantCountKey in the context.WithValue(ctx, wantCountKey, tt.wantCount)
and when reading the value; update any context.Value lookups in the test to use
the same wantCountKey.
api/pkg/api/handler/operatingsystem_ownership_test.go (1)

392-393: Remove unused blank identifier assignments.

These lines suppress "unused variable" warnings but are unnecessary since provOS and tnOS are used in the assertions below.

🧹 Proposed cleanup
 	require.NoError(t, err)
 
-	_ = provOS
-	_ = tnOS
-
 	execGetAll := func(t *testing.T, user *cdbm.User) []model.APIOperatingSystem {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/operatingsystem_ownership_test.go` around lines 392 -
393, Remove the unnecessary blank identifier assignments that suppress unused
variable warnings by deleting the lines "_ = provOS" and "_ = tnOS" in the test;
the variables provOS and tnOS are already used later in assertions (see
operatingsystem_ownership_test.go where provOS and tnOS are declared/used), so
simply remove those two assignments to clean up the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/pkg/api/handler/ipxetemplate.go`:
- Around line 106-113: GetSiteFromIDString can return a UUID/parse error that
should produce a 400, but current code only special-cases cdb.ErrDoesNotExist
and sends a 500 for parse errors; update the error handling in the site lookup
(the block around common.GetSiteFromIDString) to detect the parse/invalid-ID
error returned by GetSiteFromIDString (inspect the error type or use errors.Is
for the parse error value) and return cerr.NewAPIErrorResponse(c,
http.StatusBadRequest, "Invalid siteId in request data", nil) for that case,
leaving the existing cdb.ErrDoesNotExist and the final internal-server-error
branch intact.
- Around line 115-140: The current role-check short-circuits because the code
uses "if infrastructureProvider != nil { ... } else if tenant != nil { ... }";
change this to evaluate both roles independently: convert the "else if tenant !=
nil" block into a separate "if tenant != nil" block and track authorization
outcomes (e.g., booleans like providerAllowed and tenantAllowed) so the handler
only returns Forbidden if neither path authorizes the request; keep existing
checks inside the infrastructureProvider block (site.InfrastructureProviderID
match) and inside the tenant block (tenant.Config.TargetedInstanceCreation and
taDAO.GetAll verification via NewTenantAccountDAO/GetAll), return on internal
errors as before, and add a unit/integration test for a mixed-role service
account (both provider and tenant present) that fails the provider check but
passes tenant authorization to ensure the request succeeds; apply the same fix
pattern to the similar logic around the other occurrence handling lines 281-306.

In `@api/pkg/api/handler/operatingsystem_ownership_test.go`:
- Around line 119-139: The test sets different expectations for ExecuteWorkflow
on tempClient (5 args) vs siteMock (4 args), causing mismatches; update the
siteMock expectation to match the main mock signature used by the handler —
i.e., change the siteMock.On("ExecuteWorkflow", ...) call (the mock on the
siteMock object placed into scp.IDClientMap for site.ID.String()) to accept the
same number and shape of arguments as tempClient (use mock.Anything for the same
positions) so the site-specific workflow call matches the handler's
ExecuteWorkflow invocation.

In `@api/pkg/api/handler/operatingsystem.go`:
- Around line 1472-1531: The iPXE branch (when uos.Type ==
cdbm.OperatingSystemTypeIPXE) iterates dbossas but never populates it, so add
the same site-association retrieval logic used in the Image-type update path
before the for loop: populate dbossas based on uos.Scope (if global, load all
provider sites and create associations; if limited, validate and load the
provided site IDs; otherwise load existing associations from the DB), mirroring
the code flow that produced dbossas for Image OSes (use the same helper/DB calls
and validation used earlier in the Update handler), then proceed with the
existing loop that starts workflows using dbossa.SiteID and stc.ExecuteWorkflow.

In `@api/pkg/api/model/operatingsystem.go`:
- Around line 94-97: Validation allows Scope to be "limited" without any site
IDs; update the OperatingSystem validation where Scope and siteIds are checked
so that when Scope equals "limited" the siteIds field (siteIds / SiteIds /
SiteIDs as used in the struct) must be non‑nil and have length > 0, returning a
validation error if empty, and keep the existing prohibition that other scopes
must not have siteIds. Apply the same change to the other validation locations
referenced (the other OperatingSystem/create/update validators that currently
only forbid siteIds for non‑limited scopes) so all create/update paths enforce
"limited" requires at least one siteId.
- Around line 120-125: The current validation treats a non-nil ipxeTemplateName
pointer as sufficient (isIpxe) and allows empty strings, so add non-empty checks
for ipxeTemplateName wherever ipxeScript is validated: ensure ipxeTemplateName
!= nil && *ipxeTemplateName != "" before flipping isIpxe and accept it; return
validation.Errors with key "ipxeTemplateName" and an error when the pointer is
non-nil but the dereferenced string is empty (mirror the existing ipxeScript
checks). Update the same logic in all related validation functions/paths that
reference oscr.IpxeTemplateName, oscr.IpxeScript, and isIpxe so create/update
flows enforce non-empty template names and keep the mutual-exclusivity error for
ipxeScript vs ipxeTemplateName.

In `@db/pkg/db/model/operatingsystem_test.go`:
- Around line 1254-1258: The test binds the wrong updated variable to
expectPhoneHomeEnabled (it currently uses &updatedEnableBlockStorage); update
the expectation to use &updatedPhoneHomeEnabled instead (or define
updatedPhoneHomeEnabled if missing) so the PhoneHomeEnabled assertion checks the
correct field in the updated OperatingSystem test case.

In `@db/pkg/migrations/20260320100000_os_scope.go`:
- Around line 35-41: The migration currently adds operating_system.scope as a
nullable free-form TEXT which allows NULLs and arbitrary values; change the
migration to make scope NOT NULL with a DEFAULT 'local', add a CHECK constraint
limiting values to ('local','global','limited') (or use a domain/ENUM
equivalent), and ensure existing rows are backfilled to 'local' before adding
the NOT NULL constraint; update the tx.Exec call(s) in this migration (and keep
using handleError(tx, err)) to execute the ALTER/UPDATE/ADD CONSTRAINT
statements in the proper order so no NULLs or invalid values remain.

In `@site-agent/pkg/components/managers/ipxetemplate/publisher.go`:
- Around line 45-47: The call to api.RegisterCron() currently discards its
error; change the code in publisher.go to capture the returned error from
api.RegisterCron(), check if it's non-nil, and propagate or return that error
(instead of ignoring it) so cron registration failures are visible to callers;
reference the api.RegisterCron() call and the surrounding function that
currently returns nil to locate and modify the error handling.

In `@site-workflow/pkg/activity/ipxetemplate.go`:
- Line 62: The ExecuteWorkflow calls on mii.config.TemporalPublishClient (e.g.,
the call in ipxetemplate.go that currently passes context.Background()) should
use the activity context variable (ctx) instead of context.Background() so the
workflow publish inherits activity cancellation/timeouts; update both places
(the call at the shown ExecuteWorkflow and the similar one around line 76) to
pass ctx, keeping the same workflowOptions, workflowName, and parameters so
lifecycle and shutdown handling work correctly.

In `@site-workflow/pkg/activity/operatingsystem.go`:
- Around line 327-346: Validate that request.Id is present and contains a
non-empty UUID before calling carbide-core: inside the same block that checks
request and request.Name, add a guard that request.GetId() != nil and
request.GetId().GetValue() != "" and return a
temporal.NewNonRetryableApplicationError (similar to the other validation
errors) if missing; only after this validation extract idStr :=
request.GetId().GetValue() and proceed to call
mos.CarbideAtomicClient.GetClient().Carbide().CreateOperatingSystem so the
shared UUID contract is enforced and the function does not return an empty
string.

In `@workflow-schema/site-agent/workflows/v1/forge_carbide.proto`:
- Around line 5748-5757: IpxeOsArtifact.auth_token currently appears in the read
model and returns credentials; change the schema so credentials are not returned
by responses: remove auth_token from the public/read message IpxeOsArtifact and
instead accept credentials via a write-only field or secret reference used only
on input messages (e.g., add a separate IpxeOsArtifactInput or an
auth_token_ref/secret_ref string used by create/update RPCs and
OperatingSystemDefinition inputs). Update all RPC/methods that consume artifacts
to use the input-only credential field (or secret reference) and ensure
OperatingSystemDefinition and any Get/Find responses return only non-secret
fields (keep auth_type if needed but not auth_token). Also apply the same change
for the other artifact message instances referenced in the diff region.
- Around line 5732-5738: The OsScope enum is never included on the OS
resource/messages so callers cannot round-trip scope; add a field named "scope"
of type OsScope to OperatingSystemDefinition, CreateOperatingSystemRequest, and
UpdateOperatingSystemRequest so the resource and create/update RPCs carry the
scope. Make "scope" an optional/nullable enum with a sensible default (e.g.,
OS_SCOPE_LOCAL) and pick unique field numbers consistent with nearby fields;
update any related request/response messages referenced in the same sections
(the other OperatingSystemDefinition/create/update message variants in the file)
to include the same "scope" field so sync semantics are representable by the
API.
- Around line 5819-5832: The repeated fields ipxe_parameters and ipxe_artifacts
in UpdateOperatingSystemRequest lack presence semantics, so callers cannot
distinguish "not provided" vs "clear the list"; fix by introducing wrapper
messages (e.g., message IpxeOsParameters { repeated IpxeOsParameter items = 1; }
and message IpxeOsArtifacts { repeated IpxeOsArtifact items = 1; }) and change
the UpdateOperatingSystemRequest fields to optional IpxeOsParameters
ipxe_parameters = 23; and optional IpxeOsArtifacts ipxe_artifacts = 24; then
update any code that reads these fields to check for presence and use the inner
.items list when applying patches.

In `@workflow/pkg/activity/ipxetemplate/ipxetemplate.go`:
- Around line 141-154: The delete pass is using the page-local reportedNames map
and can remove templates reported on earlier pages; update the logic so
deletions only happen when we know the inventory is complete or when we
accumulate names across pages: either (A) only execute the deletion block when
inventory.InventoryPage == nil || inventory.InventoryPage.TotalPages == 0 (i.e.,
payload is guaranteed complete) by removing the CurrentPage==TotalPages check,
or (B) change reportedNames to be an accumulated map that persists across
paginated calls (e.g., a top-level reportedNamesAcc map that you update per page
and then run the existingTemplates loop against that accumulated map), keeping
references to inventory.InventoryPage, reportedNames/reportedNamesAcc,
existingTemplates and templateDAO.Delete to locate where to change behavior.

In `@workflow/pkg/activity/operatingsystem/operatingsystem.go`:
- Around line 643-659: UpdateOperatingSystemsInDB loads all iPXE OSes but should
only consider the current reconciliation boundary; restrict the osDAO.GetAll
call to the current site/provider by adding the appropriate filter (e.g.,
include SiteID/SiteIDs or owner field) in the cdbm.OperatingSystemFilterInput so
only OS records belonging to the reconciled site are returned, then continue
using reportedIDSet/dbOS.ID/dbOS.Scope as before; this prevents soft-deleting
local OS records owned by other sites when calling osDAO.Delete.

In `@workflow/pkg/workflow/ipxetemplate/update.go`:
- Line 40: The code uses time.Now() and time.Since() (e.g., the startTime :=
time.Now() assignment and subsequent time.Since(startTime)) inside workflow code
which breaks Temporal's deterministic replay; replace time.Now() with
workflow.Now(ctx) when capturing startTime and compute elapsed using
workflow.Now(ctx).Sub(startTime) (or assign elapsed :=
workflow.Now(ctx).Sub(startTime)) so all time calls use the Temporal workflow
API, and ensure the file imports go.temporal.io/sdk/workflow.

In `@workflow/pkg/workflow/operatingsystem/update.go`:
- Line 103: Replace non-deterministic time.Now()/time.Since() uses with
Temporal-safe workflow.Now(ctx): set startTime using workflow.Now(ctx) (using
the existing ctx) instead of time.Now(), and compute elapsed with
workflow.Now(ctx).Sub(startTime) where time.Since(startTime) is currently used;
update the references around startTime in update.go so all wall-clock calls use
workflow.Now(ctx) to preserve replay determinism.

---

Outside diff comments:
In `@api/pkg/api/handler/operatingsystem.go`:
- Around line 866-868: Remove the leftover debug print by deleting the
fmt.Printf call inside the branch that checks for cdbm.OperatingSystemTypeImage
(the line printing os.Name and os.Type); if you need observable logging instead,
replace it with the appropriate logger used in this package (e.g., processLogger
or the package's logger) and use structured or leveled logging rather than
fmt.Printf, but otherwise simply remove the fmt.Printf invocation to avoid debug
output in production.

---

Nitpick comments:
In `@api/pkg/api/handler/operatingsystem_ownership_test.go`:
- Around line 392-393: Remove the unnecessary blank identifier assignments that
suppress unused variable warnings by deleting the lines "_ = provOS" and "_ =
tnOS" in the test; the variables provOS and tnOS are already used later in
assertions (see operatingsystem_ownership_test.go where provOS and tnOS are
declared/used), so simply remove those two assignments to clean up the test.

In `@api/pkg/api/handler/site.go`:
- Around line 319-352: Workflows for global OSes are started via
stc.ExecuteWorkflow (using tClient.StartWorkflowOptions and
cwssaws.CreateOperatingSystemRequest) before tx.Commit(), risking workflows
running against uncommitted/rolled-back DB state; either move the loop that
calls scp.GetClientByID / stc.ExecuteWorkflow so it runs only after tx.Commit()
succeeds, or modify the triggered workflow ("CreateOperatingSystem") to be fully
idempotent and tolerate missing site/association data (gracefully handle
not-found and retry/abort without side effects). Ensure you reference and update
the code paths around scp.GetClientByID, tClient.StartWorkflowOptions,
stc.ExecuteWorkflow, and the tx.Commit() call to implement one of these fixes.

In `@db/pkg/db/model/ipxetemplate.go`:
- Around line 142-147: The struct IpxeTemplateSQLDAO currently embeds the
IpxeTemplateDAO interface but all interface methods are implemented explicitly;
remove the embedded interface from the IpxeTemplateSQLDAO struct declaration
(delete the IpxeTemplateDAO field) and update any constructors or initializers
that referenced the embedded field if present; run go build to ensure no
compile-time uses remain and remove any now-unused imports or fields (e.g.,
verify IpxeTemplateSQLDAO, IpxeTemplateDAO, and related constructors).

In `@db/pkg/migrations/20260306120000_ipxe_template.go`:
- Around line 87-90: The down migration currently no-ops (prints "No action
taken") which prevents schema rollback; replace the no-op in the down migration
callback with logic that drops the ipxe_template table (use bun's drop table
builder on model.IpxeTemplate, e.g.,
db.NewDropTable().Model((*model.IpxeTemplate)(nil)).IfExists().Exec(ctx)),
return any error from Exec, and log a concise message on success instead of the
current print so rollbacks via the migration system will actually revert the
schema change.

In `@db/pkg/migrations/20260320100000_os_scope.go`:
- Around line 50-52: The down migration currently is a no-op (the anonymous func
with signature func(ctx context.Context, db *bun.DB) error that prints " [down
migration] No action taken"), which leaves rollbacks incomplete; replace that
no-op with code that reverses whatever the up migration in this file performed
(mirror the up steps: drop any tables created, remove columns added, drop
indexes or constraints introduced by the up migration) using the same bun
DB/schema-builder calls so the migration is fully reversible and returns an
error on failure.

In `@site-workflow/pkg/activity/ipxetemplate_test.go`:
- Around line 95-96: Replace the bare string key used in the test's
context.WithValue call with a typed, unexported context key: define a private
key type (e.g., type ctxKey struct{}) and a package-private variable (e.g., var
wantCountKey = &ctxKey{}) and use wantCountKey in the context.WithValue(ctx,
wantCountKey, tt.wantCount) and when reading the value; update any context.Value
lookups in the test to use the same wantCountKey.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2a7f7471-9260-41b6-ad16-e6ad3bf39c79

📥 Commits

Reviewing files that changed from the base of the PR and between 385861b and 7246105.

⛔ Files ignored due to path filters (4)
  • workflow-schema/schema/site-agent/workflows/v1/forge_carbide.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/forge_carbide_grpc.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/ipxetemplate.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/operatingsystem.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (53)
  • api/pkg/api/handler/ipxetemplate.go
  • api/pkg/api/handler/ipxetemplate_test.go
  • api/pkg/api/handler/operatingsystem.go
  • api/pkg/api/handler/operatingsystem_ownership_test.go
  • api/pkg/api/handler/operatingsystem_test.go
  • api/pkg/api/handler/site.go
  • api/pkg/api/handler/site_test.go
  • api/pkg/api/handler/util/common/common.go
  • api/pkg/api/model/ipxetemplate.go
  • api/pkg/api/model/operatingsystem.go
  • api/pkg/api/routes.go
  • api/pkg/api/routes_test.go
  • db/pkg/db/model/ipxetemplate.go
  • db/pkg/db/model/ipxetemplate_test.go
  • db/pkg/db/model/operatingsystem.go
  • db/pkg/db/model/operatingsystem_test.go
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • db/pkg/migrations/20260310100000_os_definition.go
  • db/pkg/migrations/20260320100000_os_scope.go
  • site-agent/pkg/components/managers/ipxetemplate/access.go
  • site-agent/pkg/components/managers/ipxetemplate/cron.go
  • site-agent/pkg/components/managers/ipxetemplate/init.go
  • site-agent/pkg/components/managers/ipxetemplate/publisher.go
  • site-agent/pkg/components/managers/ipxetemplate/subscriber.go
  • site-agent/pkg/components/managers/manager.go
  • site-agent/pkg/components/managers/manageraccess.go
  • site-agent/pkg/components/managers/managerapi/ipxetemplate_api.go
  • site-agent/pkg/components/managers/managerapi/managerapi.go
  • site-agent/pkg/components/managers/operatingsystem/cron.go
  • site-agent/pkg/components/managers/operatingsystem/publisher.go
  • site-agent/pkg/components/managers/operatingsystem/subscriber.go
  • site-agent/pkg/components/managers/workflow/orchestrator.go
  • site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
  • site-workflow/pkg/activity/ipxetemplate.go
  • site-workflow/pkg/activity/ipxetemplate_test.go
  • site-workflow/pkg/activity/operatingsystem.go
  • site-workflow/pkg/activity/operatingsystem_test.go
  • site-workflow/pkg/grpc/client/testing.go
  • site-workflow/pkg/workflow/ipxetemplate.go
  • site-workflow/pkg/workflow/ipxetemplate_test.go
  • site-workflow/pkg/workflow/operatingsystem.go
  • workflow-schema/site-agent/workflows/v1/forge_carbide.proto
  • workflow-schema/site-agent/workflows/v1/ipxetemplate.proto
  • workflow-schema/site-agent/workflows/v1/operatingsystem.proto
  • workflow/cmd/workflow/main.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate_test.go
  • workflow/pkg/activity/operatingsystem/operatingsystem.go
  • workflow/pkg/util/testing.go
  • workflow/pkg/workflow/ipxetemplate/update.go
  • workflow/pkg/workflow/ipxetemplate/update_test.go
  • workflow/pkg/workflow/operatingsystem/update.go
💤 Files with no reviewable changes (1)
  • api/pkg/api/handler/util/common/common.go

@pbreton pbreton force-pushed the feat/ipxe-os branch 5 times, most recently from 5e92f7f to a703498 Compare March 30, 2026 17:04
@pbreton
Copy link
Copy Markdown
Contributor Author

pbreton commented Mar 31, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 31, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
api/pkg/api/handler/operatingsystem.go (3)

1786-1798: ⚠️ Potential issue | 🔴 Critical

Provider-owned deletes only check one tenant's instances.

The delete preflight is always scoped to TenantIDs: []uuid.UUID{tenant.ID}. For a provider-owned OS, that lets a Provider Admin delete an OS that is still referenced by instances from other tenants under the same provider.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/operatingsystem.go` around lines 1786 - 1798, The current
delete preflight always scopes instance lookup to TenantIDs:
[]uuid.UUID{tenant.ID}, allowing a provider-owned OS to be deleted while
instances in other tenants of the same provider still reference it; change the
instance query logic in the delete flow that uses NewInstanceDAO and
isDAO.GetAll: if the OS is provider-owned (check os.ProviderID or equivalent),
query instances across the provider by removing the tenant-only filter and
instead filtering by ProviderID (e.g., ProviderIDs: []uuid.UUID{os.ProviderID})
or omit TenantIDs so all tenants are checked; otherwise keep the existing
TenantIDs: []uuid.UUID{tenant.ID} behavior. Ensure the OperatingSystemIDs:
[]uuid.UUID{os.ID} filter remains in the GetAll call.

1203-1224: ⚠️ Potential issue | 🟠 Major

Use provider scope for provider-owned rename checks.

This duplicate-name query is always tenant-scoped. Renaming a provider-owned OS can therefore collide with another provider-owned OS, while an unrelated tenant-owned OS in the same org can falsely block the rename. Match the create handler and query by InfrastructureProviderIDs for provider-owned records.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/operatingsystem.go` around lines 1203 - 1224, The
tenant-scoped duplicate-name check uses OperatingSystemFilterInput with
TenantIDs only, which incorrectly blocks or allows renames for provider-owned
records; modify the filter passed to osDAO.GetAll (the call in the
apiRequest.Name != nil block) to use InfrastructureProviderIDs when the OS is
provider-owned (i.e., when the existing os has a provider ID)—for example, set
OperatingSystemFilterInput.InfrastructureProviderIDs to the provider's UUID(s)
instead of TenantIDs for provider-owned OSes, otherwise keep the TenantIDs
check; update the conditional logic around the osDAO.GetAll call (and any
variable names used) so provider-owned rename checks match the create handler.

705-727: ⚠️ Potential issue | 🟠 Major

siteId filtering ignores the provider-admin path.

Every requested site is validated only through TenantSiteDAO, so Provider Admins cannot filter by a provider site unless that site also happens to be tenant-associated. Check provider ownership as an alternative authorization path and reject only when neither provider nor tenant access applies.

Based on learnings: In Go handler code under api/pkg/api/handler, RBAC should not short-circuit on the first matching role; evaluate both authorization paths independently and only reject if all applicable paths fail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/operatingsystem.go` around lines 705 - 727, Currently
each queried site is authorized only via tsDAO.GetByTenantIDAndSiteID (tenant
path) which blocks provider-admin users; update the logic in the siteId loop
(after common.GetSiteFromIDString and before returning forbidden) to also check
the provider authorization path when the tenant check fails: detect
provider-admin role from the request context/tenant info and call the
provider-site lookup (e.g., a ProviderSite DAO method such as
GetByProviderIDAndSiteID or equivalent) to verify ownership; only return 403 if
both the tenant lookup (tsDAO.GetByTenantIDAndSiteID) and the provider lookup
fail, and if either succeeds append site.ID to filter.SiteIDs. Ensure error
logging/messages distinguish DB errors from simple non-existence and preserve
existing error responses for DB failures.
api/pkg/api/model/operatingsystem.go (1)

128-205: ⚠️ Potential issue | 🟠 Major

Don't accept iPXE definition payloads on image requests.

ipxeParameters and ipxeArtifacts are never tied to isIpxe / existingOS.Type, so a plain image create or update can persist iPXE JSON into an Image OS. cacheStrategy is also unchecked, and the handler later coerces unknown strings to the zero enum instead of rejecting them.

Also applies to: 381-457

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/model/operatingsystem.go` around lines 128 - 205, The image-OS
validation currently doesn't reject iPXE-only fields; update the image branch
(where oscr.ImageURL != nil and validation.ValidateStruct(&oscr) is called) to
add validation.Field rules that require ipxeParameters and ipxeArtifacts to be
nil/empty (validation.Nil/Error("ipxeParameters cannot be specified for image
Operating Systems") and validation.Nil/Error("ipxeArtifacts cannot be specified
for image Operating Systems")), and validate cacheStrategy by rejecting unknown
strings (use validation.When to require it be one of the allowed enum values or
validation.Nil/Error to forbid it for image OSes); mirror the same nil/enum
checks in the corresponding update/other branch referenced by lines 381-457 so
image create/update requests cannot persist iPXE JSON or unknown cacheStrategy
values (refer to oscr.ipxeParameters, oscr.ipxeArtifacts, and oscr.CacheStrategy
to locate the fields).
♻️ Duplicate comments (1)
api/pkg/api/model/operatingsystem.go (1)

120-125: ⚠️ Potential issue | 🟠 Major

Reject empty ipxeTemplateName values.

A non-nil pointer is enough to make the request look iPXE here, so {"ipxeTemplateName": ""} still bypasses the image-path checks and passes both create and update. Mirror the existing non-empty validation used for ipxeScript.

Also applies to: 228-233, 375-378, 459-463

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/model/operatingsystem.go` around lines 120 - 125, The validation
treats a non-nil IpxeTemplateName pointer as set even when it's an empty string,
allowing {"ipxeTemplateName": ""} to bypass image-path checks; update the checks
that currently do "oscr.IpxeTemplateName != nil" (the same blocks that check
oscr.IpxeScript) to require a non-empty string (e.g., oscr.IpxeTemplateName !=
nil && *oscr.IpxeTemplateName != "") and when rejecting use validation.Errors
with key "ipxeTemplateName" and the same error messaging pattern as ipxeScript;
apply this change in every validation block that mirrors the ipxeScript logic
(the mutual-exclusion checks and any places that mark the request as iPXE based
on IpxeTemplateName).
🧹 Nitpick comments (2)
api/pkg/api/handler/ipxetemplate.go (2)

285-311: Site access authorization duplicates GetAll logic — consider extraction.

Lines 285-311 duplicate the authorization logic from lines 118-144. Consider extracting this into a helper function to improve maintainability.

♻️ Proposed helper extraction
// validateSiteAccess checks if the caller has provider or tenant authorization for the given site.
// Returns nil if authorized, or an API error if not.
func validateSiteAccess(
    ctx context.Context,
    logger zerolog.Logger,
    dbSession *cdb.Session,
    site *cdbm.Site,
    infrastructureProvider *cdbm.InfrastructureProvider,
    tenant *cdbm.Tenant,
) *cerr.APIError {
    providerAllowed := infrastructureProvider != nil && site.InfrastructureProviderID == infrastructureProvider.ID

    tenantAllowed := false
    if !providerAllowed && tenant != nil {
        if !tenant.Config.TargetedInstanceCreation {
            logger.Warn().Msg("Tenant doesn't have targeted Instance creation capability")
        } else {
            taDAO := cdbm.NewTenantAccountDAO(dbSession)
            _, taCount, err := taDAO.GetAll(ctx, nil, cdbm.TenantAccountFilterInput{
                InfrastructureProviderID: &site.InfrastructureProviderID,
                TenantIDs:                []uuid.UUID{tenant.ID},
            }, cdbp.PageInput{}, []string{})
            if err != nil {
                logger.Error().Err(err).Msg("error retrieving Tenant Account for Site")
                return cerr.NewAPIError(http.StatusInternalServerError, "Error retrieving Tenant Account for Site", nil)
            }
            tenantAllowed = taCount > 0
        }
    }

    if !providerAllowed && !tenantAllowed {
        logger.Warn().Msg("neither provider nor tenant authorization succeeded for Site access")
        return cerr.NewAPIError(http.StatusForbidden, "Current org is not authorized to access the Site", nil)
    }
    return nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/ipxetemplate.go` around lines 285 - 311, Extract the
duplicated site-access authorization block into a reusable helper function
(e.g., validateSiteAccess) that takes ctx, logger, h.dbSession (or
*cdb.Session), the *cdbm.Site, *cdbm.InfrastructureProvider, and *cdbm.Tenant
and returns *cerr.APIError (nil when authorized). Move the
providerAllowed/tenantAllowed logic and the tenant-account lookup using
cdbm.NewTenantAccountDAO(...).GetAll into this helper, ensure it logs and
returns a cerr.APIError on DB errors or forbidden access, and replace both
inline blocks in ipxetemplate.go (the current block using
providerAllowed/tenantAllowed and the earlier duplicated block) with calls to
validateSiteAccess and early-return on any non-nil error.

150-153: Consider validating the scope query parameter value.

The scope filter accepts any string value. Consider validating against known scope values (internal, public) to provide better error messages for invalid input.

💡 Proposed validation
 // Optional scope filter
 if scopeStr := c.QueryParam("scope"); scopeStr != "" {
+    if scopeStr != cdbm.IpxeTemplateScopeInternal && scopeStr != cdbm.IpxeTemplateScopePublic {
+        return cerr.NewAPIErrorResponse(c, http.StatusBadRequest, 
+            fmt.Sprintf("Invalid scope value '%s'. Must be 'internal' or 'public'", scopeStr), nil)
+    }
     filterInput.Scopes = []string{scopeStr}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/ipxetemplate.go` around lines 150 - 153, The scope query
parameter (c.QueryParam("scope")) is currently accepted as any string and
assigned to filterInput.Scopes; validate it against the allowed values
"internal" and "public" before setting filterInput.Scopes, returning a 400 Bad
Request with a clear error message if the value is invalid; perform
case-normalization (e.g., strings.ToLower) and only set filterInput.Scopes =
[]string{scope} when the value matches one of the known scopes to prevent
invalid filters from being used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/pkg/api/handler/operatingsystem.go`:
- Around line 1184-1195: The current provider-admin branch only verifies the
provider ID when ownedByProvider is false, allowing any Provider Admin to modify
any provider-owned OS; change operatingsystem.go so that when isProviderAdmin is
true you always resolve the caller's provider via
common.GetInfrastructureProviderForOrg(ctx, nil, ush.dbSession, org) and compare
provider.ID against os.InfrastructureProviderID (nil-safe) and reject if they
differ; update the check around ownedByProvider/isProviderAdmin (and the same
logic at the other location noted) to return the same Forbidden response using
logger.Warn and cutil.NewAPIErrorResponse when the IDs do not match.
- Around line 1540-1602: The loop that starts the UpdateOperatingSystem workflow
for each dbossa is missing the step that marks the
OperatingSystemSiteAssociation as Syncing and calls GenerateAndUpdateVersion
(the same steps the image branch performs), and the workflow ID omits the
association version so retries reuse the same Temporal execution; before calling
stc.ExecuteWorkflow inside the for _, dbossa range, update the association
status to Syncing and call GenerateAndUpdateVersion (use the same helper used
elsewhere — e.g., GenerateAndUpdateVersion/ossaDAO methods) to get the new
association version, persist it, and then include that association version in
the workflowOptions.ID (e.g., append the version to
"ipxe-os-update-"+dbossa.SiteID+"-"+uos.ID) so each sync has a unique Temporal
ID and the association history reflects in-progress syncs.
- Around line 115-129: The current RBAC logic uses isProviderAdmin/isTenantAdmin
and immediately returns on the provider-admin branch, which causes dual-role
callers to be denied even though the tenant-admin path would permit the action;
update the authorization so both role-specific checks are evaluated
independently (e.g., compute allowedByProvider and allowedByTenant using the
existing isProviderAdmin, isTenantAdmin, and request fields like
apiRequest.IpxeTemplateName) and only return a 403 if neither allowedByProvider
nor allowedByTenant is true; apply the same pattern to the corresponding
update/delete checks that reference isProviderAdmin/isTenantAdmin so a user with
both roles is permitted if either role authorizes the operation.

In `@db/pkg/db/model/operatingsystem.go`:
- Around line 193-223: The new iPXE fields on OperatingSystemUpdateInput
(IpxeTemplateName, IpxeParameters, IpxeArtifacts, DefinitionHash, Scope) only
support "set" semantics and there are no corresponding clear flags in
OperatingSystemClearInput, so switching between template-based and raw iPXE
leaves stale data; add explicit clear/nullable flags to
OperatingSystemClearInput (e.g., ClearIpxeTemplateName, ClearIpxeParameters,
ClearIpxeArtifacts, ClearDefinitionHash, ClearScope) and update the update/clear
handling code paths that process OperatingSystemUpdateInput and
OperatingSystemClearInput so that when a clear flag is set the corresponding DB
columns are nulled (or removed) instead of left intact; apply the same changes
for the other occurrences noted (lines ~226-243 and ~654-673) to ensure both
update and clear flows can explicitly remove old iPXE representation.

---

Outside diff comments:
In `@api/pkg/api/handler/operatingsystem.go`:
- Around line 1786-1798: The current delete preflight always scopes instance
lookup to TenantIDs: []uuid.UUID{tenant.ID}, allowing a provider-owned OS to be
deleted while instances in other tenants of the same provider still reference
it; change the instance query logic in the delete flow that uses NewInstanceDAO
and isDAO.GetAll: if the OS is provider-owned (check os.ProviderID or
equivalent), query instances across the provider by removing the tenant-only
filter and instead filtering by ProviderID (e.g., ProviderIDs:
[]uuid.UUID{os.ProviderID}) or omit TenantIDs so all tenants are checked;
otherwise keep the existing TenantIDs: []uuid.UUID{tenant.ID} behavior. Ensure
the OperatingSystemIDs: []uuid.UUID{os.ID} filter remains in the GetAll call.
- Around line 1203-1224: The tenant-scoped duplicate-name check uses
OperatingSystemFilterInput with TenantIDs only, which incorrectly blocks or
allows renames for provider-owned records; modify the filter passed to
osDAO.GetAll (the call in the apiRequest.Name != nil block) to use
InfrastructureProviderIDs when the OS is provider-owned (i.e., when the existing
os has a provider ID)—for example, set
OperatingSystemFilterInput.InfrastructureProviderIDs to the provider's UUID(s)
instead of TenantIDs for provider-owned OSes, otherwise keep the TenantIDs
check; update the conditional logic around the osDAO.GetAll call (and any
variable names used) so provider-owned rename checks match the create handler.
- Around line 705-727: Currently each queried site is authorized only via
tsDAO.GetByTenantIDAndSiteID (tenant path) which blocks provider-admin users;
update the logic in the siteId loop (after common.GetSiteFromIDString and before
returning forbidden) to also check the provider authorization path when the
tenant check fails: detect provider-admin role from the request context/tenant
info and call the provider-site lookup (e.g., a ProviderSite DAO method such as
GetByProviderIDAndSiteID or equivalent) to verify ownership; only return 403 if
both the tenant lookup (tsDAO.GetByTenantIDAndSiteID) and the provider lookup
fail, and if either succeeds append site.ID to filter.SiteIDs. Ensure error
logging/messages distinguish DB errors from simple non-existence and preserve
existing error responses for DB failures.

In `@api/pkg/api/model/operatingsystem.go`:
- Around line 128-205: The image-OS validation currently doesn't reject
iPXE-only fields; update the image branch (where oscr.ImageURL != nil and
validation.ValidateStruct(&oscr) is called) to add validation.Field rules that
require ipxeParameters and ipxeArtifacts to be nil/empty
(validation.Nil/Error("ipxeParameters cannot be specified for image Operating
Systems") and validation.Nil/Error("ipxeArtifacts cannot be specified for image
Operating Systems")), and validate cacheStrategy by rejecting unknown strings
(use validation.When to require it be one of the allowed enum values or
validation.Nil/Error to forbid it for image OSes); mirror the same nil/enum
checks in the corresponding update/other branch referenced by lines 381-457 so
image create/update requests cannot persist iPXE JSON or unknown cacheStrategy
values (refer to oscr.ipxeParameters, oscr.ipxeArtifacts, and oscr.CacheStrategy
to locate the fields).

---

Duplicate comments:
In `@api/pkg/api/model/operatingsystem.go`:
- Around line 120-125: The validation treats a non-nil IpxeTemplateName pointer
as set even when it's an empty string, allowing {"ipxeTemplateName": ""} to
bypass image-path checks; update the checks that currently do
"oscr.IpxeTemplateName != nil" (the same blocks that check oscr.IpxeScript) to
require a non-empty string (e.g., oscr.IpxeTemplateName != nil &&
*oscr.IpxeTemplateName != "") and when rejecting use validation.Errors with key
"ipxeTemplateName" and the same error messaging pattern as ipxeScript; apply
this change in every validation block that mirrors the ipxeScript logic (the
mutual-exclusion checks and any places that mark the request as iPXE based on
IpxeTemplateName).

---

Nitpick comments:
In `@api/pkg/api/handler/ipxetemplate.go`:
- Around line 285-311: Extract the duplicated site-access authorization block
into a reusable helper function (e.g., validateSiteAccess) that takes ctx,
logger, h.dbSession (or *cdb.Session), the *cdbm.Site,
*cdbm.InfrastructureProvider, and *cdbm.Tenant and returns *cerr.APIError (nil
when authorized). Move the providerAllowed/tenantAllowed logic and the
tenant-account lookup using cdbm.NewTenantAccountDAO(...).GetAll into this
helper, ensure it logs and returns a cerr.APIError on DB errors or forbidden
access, and replace both inline blocks in ipxetemplate.go (the current block
using providerAllowed/tenantAllowed and the earlier duplicated block) with calls
to validateSiteAccess and early-return on any non-nil error.
- Around line 150-153: The scope query parameter (c.QueryParam("scope")) is
currently accepted as any string and assigned to filterInput.Scopes; validate it
against the allowed values "internal" and "public" before setting
filterInput.Scopes, returning a 400 Bad Request with a clear error message if
the value is invalid; perform case-normalization (e.g., strings.ToLower) and
only set filterInput.Scopes = []string{scope} when the value matches one of the
known scopes to prevent invalid filters from being used.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7224c5a1-0059-4dd3-8250-ed2879f6776d

📥 Commits

Reviewing files that changed from the base of the PR and between 7246105 and 01f02a2.

📒 Files selected for processing (21)
  • api/pkg/api/handler/ipxetemplate.go
  • api/pkg/api/handler/ipxetemplate_test.go
  • api/pkg/api/handler/operatingsystem.go
  • api/pkg/api/handler/operatingsystem_ownership_test.go
  • api/pkg/api/handler/operatingsystem_test.go
  • api/pkg/api/handler/site.go
  • api/pkg/api/handler/site_test.go
  • api/pkg/api/handler/util/common/common.go
  • api/pkg/api/model/ipxetemplate.go
  • api/pkg/api/model/operatingsystem.go
  • api/pkg/api/routes.go
  • api/pkg/api/routes_test.go
  • db/pkg/db/model/ipxetemplate.go
  • db/pkg/db/model/ipxetemplate_test.go
  • db/pkg/db/model/operatingsystem.go
  • db/pkg/db/model/operatingsystem_test.go
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • db/pkg/migrations/20260310100000_os_definition.go
  • db/pkg/migrations/20260320100000_os_scope.go
  • db/pkg/migrations/20260321100000_ossa_core_status.go
💤 Files with no reviewable changes (1)
  • api/pkg/api/handler/util/common/common.go
✅ Files skipped from review due to trivial changes (4)
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • db/pkg/migrations/20260320100000_os_scope.go
  • db/pkg/migrations/20260310100000_os_definition.go
  • api/pkg/api/handler/ipxetemplate_test.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • api/pkg/api/handler/operatingsystem_test.go
  • api/pkg/api/routes_test.go
  • api/pkg/api/routes.go
  • api/pkg/api/model/ipxetemplate.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • db/pkg/db/model/operatingsystem_test.go

@pbreton pbreton marked this pull request as ready for review March 31, 2026 20:09
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
api/pkg/api/model/operatingsystem.go (2)

330-339: ⚠️ Potential issue | 🟠 Major

PATCH can switch to limited without the sites that define it.

APIOperatingSystemUpdateRequest exposes scope, but there is no siteIds payload here. The handler therefore has no way to create or prune associations when a caller moves between local, global, and limited, so the OS can end up with zero or stale targets. Either add siteIds + validation to the update model, or reject scope transitions until association reconciliation exists. As per coding guidelines, "Document when you have intentionally omitted code that the reader might otherwise expect to be present."

Also applies to: 466-480

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/model/operatingsystem.go` around lines 330 - 339,
APIOperatingSystemUpdateRequest currently allows changing Scope without
providing site associations, so PATCH can set Scope="limited" leaving zero or
stale targets; add an optional SiteIDs field (e.g., SiteIds *[]string
`json:"siteIds"`) to APIOperatingSystemUpdateRequest and implement validation in
the update handler (validate SiteIds present when Scope is "limited", ensure IDs
exist/are unique, and reconcile create/prune associations for functions that
manage OS→site links such as the existing OS update flow), or alternatively
block scope transitions in the handler until association reconciliation is
implemented and document that omission in APIOperatingSystemUpdateRequest to
make the limitation explicit.

120-148: ⚠️ Potential issue | 🟠 Major

Reject empty ipxeTemplateName values.

A non-nil pointer with "" still routes the request through the iPXE/template path and currently passes both create and update validation. That lets {"ipxeTemplateName": ""} bypass the image checks and, on create, even satisfy the provider-admin gate. Validate *IpxeTemplateName != "" the same way ipxeScript is validated.

Also applies to: 223-233, 375-385, 459-464

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/model/operatingsystem.go` around lines 120 - 148, The validation
currently treats a non-nil oscr.IpxeTemplateName pointer with an empty string as
a valid iPXE path; update the checks that use oscr.IpxeTemplateName (and the
derived isIpxe flag) to treat empty string the same as nil by verifying
oscr.IpxeTemplateName != nil && *oscr.IpxeTemplateName != "" wherever iPXE
detection/validation occurs (e.g., the isIpxe assignment and the subsequent
mutually-exclusive, imageURL, and siteIds/scope validations); apply the same
non-empty-string check in each duplicate block referenced so that "" does not
bypass image checks or authorization gates.
🧹 Nitpick comments (2)
site-agent/pkg/components/managers/ipxetemplate/access.go (1)

32-39: Fail fast on nil constructor inputs

NewIpxeTemplateManager accepts pointer dependencies but does not validate them. Adding guards here avoids deferred nil dereferences in lifecycle paths.

Proposed hardening
 func NewIpxeTemplateManager(superForge *elektratypes.Elektra, superAPI *Manager.ManagerAPI, superConf *Manager.ManagerConf) *API {
+	if superForge == nil || superAPI == nil || superConf == nil {
+		// TODO: replace with structured error return if constructor signature is updated.
+		panic("ipxetemplate: NewIpxeTemplateManager received nil dependency")
+	}
 	ManagerAccess = &Manager.ManagerAccess{
 		Data: &Manager.ManagerData{
 			EB: superForge,
 		},
 		API:  superAPI,
 		Conf: superConf,
 	}
 	return &API{}
 }
As per coding guidelines, "Document when you have intentionally omitted code that the reader might otherwise expect to be present" and "Add TODO comments for features or nuances not important to implement right away".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@site-agent/pkg/components/managers/ipxetemplate/access.go` around lines 32 -
39, NewIpxeTemplateManager currently assigns pointers without validation; add
nil-guards at the top of NewIpxeTemplateManager to fail-fast: check superForge,
superAPI, and superConf for nil and if any are nil return nil (or panic if that
pattern is used elsewhere), and add a brief TODO comment explaining the choice;
ensure ManagerAccess and its Data (EB), API and Conf fields are only set after
these validations so you avoid deferred nil dereferences.
api/pkg/api/handler/site_test.go (1)

278-529: Add a create-site test for the new global iPXE expansion path.

This table only rewires scp; it never seeds a global iPXE OperatingSystem, checks the new OperatingSystemSiteAssociation rows, or asserts the site-scoped workflow dispatch. Please add one focused case for that path, and reset the OS/association tables in testSiteSetupSchema so it stays isolated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/site_test.go` around lines 278 - 529, The test lacks a
case for the global iPXE expansion path and the schema seeding isn't isolating
OS data; update testSiteSetupSchema to truncate/reset the OperatingSystem and
OperatingSystemSiteAssociation tables so OS state is isolated, then add a new
tests[] entry in TestCreateSiteHandler_Handle that seeds a global iPXE
OperatingSystem row (e.g., name/type indicating global iPXE), uses
sc.NewClientPool and the existing csh flow to create the site, and after success
asserts that an OperatingSystemSiteAssociation row was created for the new site
and that the site-scoped workflow was dispatched (verify via the temporal mock
or side-effect used elsewhere); reference testSiteSetupSchema,
TestCreateSiteHandler_Handle, OperatingSystem, OperatingSystemSiteAssociation,
and sc.NewClientPool when making the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/pkg/api/handler/ipxetemplate.go`:
- Around line 121-153: When tenantAllowed is true but providerAllowed is false,
restrict tenant-scoped access to PUBLIC templates only: in the list path (where
IpxeTemplateFilterInput filterInput is built) set filterInput.Scopes =
[]string{cdbm.IpxeScriptTemplateScope.PUBLIC} (ignoring any scope query param)
when tenantAllowed && !providerAllowed; and in the single-item Get path (the
handler that loads an IpxeTemplate by ID) check the loaded template's Scope
(compare to cdbm.IpxeScriptTemplateScope.INTERNAL) and return a 404/forbidden
when tenantAllowed && !providerAllowed and the template is INTERNAL so tenants
cannot read internal templates. Ensure you reference tenantAllowed,
providerAllowed, IpxeTemplateFilterInput, filterInput and
IpxeScriptTemplateScope to locate the places to change (both the list block
shown and the Get handler around the other mentioned lines).

In `@api/pkg/api/handler/operatingsystem.go`:
- Around line 993-1004: The current GET-by-ID path authorizes provider-owned
(iPXE) OSes via the ownedByProvider check (using
GetInfrastructureProviderForOrg) but doesn't load OperatingSystemSiteAssociation
rows for iPXE images, leaving siteAssociations empty; update the reader logic
after the ownership checks (around the code that computes
ownedByTenant/ownedByProvider and the subsequent fetch of siteAssociations) so
it also queries and hydrates OperatingSystemSiteAssociation entries for
provider-owned iPXE OSes (ensure the same reader/session used for image OS
associations is invoked for iPXE cases so siteAssociations is populated on
GET-by-ID).
- Around line 1184-1213: The uniqueness check for renames still only filters by
TenantID, so provider-owned updates allowed by the new ownership logic
(ownedByProvider / InfrastructureProviderID) can create duplicate names or be
blocked incorrectly; update the name-collision query used when updating an
Operating System (the code that currently filters by tenant_id) to choose its
owner filter based on the current owner — use InfrastructureProviderID when
os.InfrastructureProviderID != nil (provider-owned) and TenantID when
os.InfrastructureProviderID == nil (tenant-owned), matching the same owner-scope
logic used in the create path and the ownedByTenant/ownedByProvider checks so
collisions are checked against the correct owner field.

In `@workflow/pkg/activity/operatingsystem/operatingsystem.go`:
- Around line 588-601: The code currently swallows any non-not-found error from
ossaDAO.GetByOperatingSystemIDAndSiteID; change the logic in the block around
tenantStateToRestStatus/ossaDAO.GetByOperatingSystemIDAndSiteID so that if
GetByOperatingSystemIDAndSiteID returns an error other than a "not found"
sentinel you propagate or return that error (or at minimum log it and fail the
activity) instead of treating it as success; keep the existing behavior for the
case where ossaErr indicates "not found" (i.e., handle nil ossa separately), and
continue to call ossaDAO.Update(…) only when ossa != nil. Reference:
tenantStateToRestStatus, NewOperatingSystemSiteAssociationDAO,
GetByOperatingSystemIDAndSiteID, and Update.
- Around line 564-572: The corrective writes for IsActive and tenant_id
currently only log failures (see reported.IsActive handling and the tenant clear
path using osDAO.Update and OperatingSystemUpdateInput) so transient failures
never trigger a retry; change this so failed corrective updates are part of the
retryable path by returning the update error (i.e., propagate uerr) from the
reconciliation function instead of just logging, and/or modify the
reconciliation predicate (where coreUpdated is evaluated) to include checks for
cur.IsActive != reported.IsActive and cur.TenantID != nil so reconciliation will
requeue and retry; apply the same change to the other corrective block (the
tenant_id clear logic around the 608-642 section) so both corrective writes can
cause retries until they succeed.

---

Duplicate comments:
In `@api/pkg/api/model/operatingsystem.go`:
- Around line 330-339: APIOperatingSystemUpdateRequest currently allows changing
Scope without providing site associations, so PATCH can set Scope="limited"
leaving zero or stale targets; add an optional SiteIDs field (e.g., SiteIds
*[]string `json:"siteIds"`) to APIOperatingSystemUpdateRequest and implement
validation in the update handler (validate SiteIds present when Scope is
"limited", ensure IDs exist/are unique, and reconcile create/prune associations
for functions that manage OS→site links such as the existing OS update flow), or
alternatively block scope transitions in the handler until association
reconciliation is implemented and document that omission in
APIOperatingSystemUpdateRequest to make the limitation explicit.
- Around line 120-148: The validation currently treats a non-nil
oscr.IpxeTemplateName pointer with an empty string as a valid iPXE path; update
the checks that use oscr.IpxeTemplateName (and the derived isIpxe flag) to treat
empty string the same as nil by verifying oscr.IpxeTemplateName != nil &&
*oscr.IpxeTemplateName != "" wherever iPXE detection/validation occurs (e.g.,
the isIpxe assignment and the subsequent mutually-exclusive, imageURL, and
siteIds/scope validations); apply the same non-empty-string check in each
duplicate block referenced so that "" does not bypass image checks or
authorization gates.

---

Nitpick comments:
In `@api/pkg/api/handler/site_test.go`:
- Around line 278-529: The test lacks a case for the global iPXE expansion path
and the schema seeding isn't isolating OS data; update testSiteSetupSchema to
truncate/reset the OperatingSystem and OperatingSystemSiteAssociation tables so
OS state is isolated, then add a new tests[] entry in
TestCreateSiteHandler_Handle that seeds a global iPXE OperatingSystem row (e.g.,
name/type indicating global iPXE), uses sc.NewClientPool and the existing csh
flow to create the site, and after success asserts that an
OperatingSystemSiteAssociation row was created for the new site and that the
site-scoped workflow was dispatched (verify via the temporal mock or side-effect
used elsewhere); reference testSiteSetupSchema, TestCreateSiteHandler_Handle,
OperatingSystem, OperatingSystemSiteAssociation, and sc.NewClientPool when
making the changes.

In `@site-agent/pkg/components/managers/ipxetemplate/access.go`:
- Around line 32-39: NewIpxeTemplateManager currently assigns pointers without
validation; add nil-guards at the top of NewIpxeTemplateManager to fail-fast:
check superForge, superAPI, and superConf for nil and if any are nil return nil
(or panic if that pattern is used elsewhere), and add a brief TODO comment
explaining the choice; ensure ManagerAccess and its Data (EB), API and Conf
fields are only set after these validations so you avoid deferred nil
dereferences.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 86b19298-d770-40b2-a2b9-19e1a2821e65

📥 Commits

Reviewing files that changed from the base of the PR and between 01f02a2 and bbe58c3.

⛔ Files ignored due to path filters (4)
  • workflow-schema/schema/site-agent/workflows/v1/forge_carbide.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/forge_carbide_grpc.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/ipxetemplate.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/operatingsystem.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (54)
  • api/pkg/api/handler/ipxetemplate.go
  • api/pkg/api/handler/ipxetemplate_test.go
  • api/pkg/api/handler/operatingsystem.go
  • api/pkg/api/handler/operatingsystem_ownership_test.go
  • api/pkg/api/handler/operatingsystem_test.go
  • api/pkg/api/handler/site.go
  • api/pkg/api/handler/site_test.go
  • api/pkg/api/handler/util/common/common.go
  • api/pkg/api/model/ipxetemplate.go
  • api/pkg/api/model/operatingsystem.go
  • api/pkg/api/routes.go
  • api/pkg/api/routes_test.go
  • db/pkg/db/model/ipxetemplate.go
  • db/pkg/db/model/ipxetemplate_test.go
  • db/pkg/db/model/operatingsystem.go
  • db/pkg/db/model/operatingsystem_test.go
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • db/pkg/migrations/20260310100000_os_definition.go
  • db/pkg/migrations/20260320100000_os_scope.go
  • db/pkg/migrations/20260321100000_ossa_core_status.go
  • site-agent/pkg/components/managers/ipxetemplate/access.go
  • site-agent/pkg/components/managers/ipxetemplate/cron.go
  • site-agent/pkg/components/managers/ipxetemplate/init.go
  • site-agent/pkg/components/managers/ipxetemplate/publisher.go
  • site-agent/pkg/components/managers/ipxetemplate/subscriber.go
  • site-agent/pkg/components/managers/manager.go
  • site-agent/pkg/components/managers/manageraccess.go
  • site-agent/pkg/components/managers/managerapi/ipxetemplate_api.go
  • site-agent/pkg/components/managers/managerapi/managerapi.go
  • site-agent/pkg/components/managers/operatingsystem/cron.go
  • site-agent/pkg/components/managers/operatingsystem/publisher.go
  • site-agent/pkg/components/managers/operatingsystem/subscriber.go
  • site-agent/pkg/components/managers/workflow/orchestrator.go
  • site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
  • site-workflow/pkg/activity/ipxetemplate.go
  • site-workflow/pkg/activity/ipxetemplate_test.go
  • site-workflow/pkg/activity/operatingsystem.go
  • site-workflow/pkg/activity/operatingsystem_test.go
  • site-workflow/pkg/grpc/client/testing.go
  • site-workflow/pkg/workflow/ipxetemplate.go
  • site-workflow/pkg/workflow/ipxetemplate_test.go
  • site-workflow/pkg/workflow/operatingsystem.go
  • workflow-schema/site-agent/workflows/v1/forge_carbide.proto
  • workflow-schema/site-agent/workflows/v1/ipxetemplate.proto
  • workflow-schema/site-agent/workflows/v1/operatingsystem.proto
  • workflow/cmd/workflow/main.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate_test.go
  • workflow/pkg/activity/operatingsystem/operatingsystem.go
  • workflow/pkg/util/testing.go
  • workflow/pkg/workflow/ipxetemplate/update.go
  • workflow/pkg/workflow/ipxetemplate/update_test.go
  • workflow/pkg/workflow/operatingsystem/update.go
💤 Files with no reviewable changes (1)
  • api/pkg/api/handler/util/common/common.go
✅ Files skipped from review due to trivial changes (15)
  • site-agent/pkg/components/managers/managerapi/managerapi.go
  • workflow-schema/site-agent/workflows/v1/operatingsystem.proto
  • db/pkg/migrations/20260320100000_os_scope.go
  • db/pkg/migrations/20260321100000_ossa_core_status.go
  • site-agent/pkg/components/managers/ipxetemplate/subscriber.go
  • site-agent/pkg/components/managers/ipxetemplate/init.go
  • site-workflow/pkg/activity/ipxetemplate_test.go
  • site-workflow/pkg/activity/operatingsystem_test.go
  • site-agent/pkg/components/managers/managerapi/ipxetemplate_api.go
  • db/pkg/migrations/20260310100000_os_definition.go
  • api/pkg/api/model/ipxetemplate.go
  • site-workflow/pkg/workflow/operatingsystem.go
  • db/pkg/db/model/ipxetemplate_test.go
  • api/pkg/api/handler/ipxetemplate_test.go
  • site-workflow/pkg/activity/operatingsystem.go
🚧 Files skipped from review as they are similar to previous changes (23)
  • api/pkg/api/routes_test.go
  • site-agent/pkg/components/managers/workflow/orchestrator.go
  • site-agent/pkg/components/managers/manageraccess.go
  • api/pkg/api/handler/operatingsystem_test.go
  • workflow/pkg/util/testing.go
  • site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
  • site-agent/pkg/components/managers/operatingsystem/subscriber.go
  • api/pkg/api/routes.go
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • site-workflow/pkg/workflow/ipxetemplate.go
  • site-agent/pkg/components/managers/ipxetemplate/publisher.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate_test.go
  • workflow/pkg/workflow/ipxetemplate/update.go
  • site-agent/pkg/components/managers/ipxetemplate/cron.go
  • site-agent/pkg/components/managers/operatingsystem/publisher.go
  • site-agent/pkg/components/managers/operatingsystem/cron.go
  • db/pkg/db/model/operatingsystem_test.go
  • site-workflow/pkg/activity/ipxetemplate.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate.go
  • site-workflow/pkg/grpc/client/testing.go
  • db/pkg/db/model/operatingsystem.go
  • workflow/pkg/workflow/operatingsystem/update.go

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

♻️ Duplicate comments (3)
db/pkg/db/model/operatingsystem_test.go (1)

1254-1258: ⚠️ Potential issue | 🟡 Minor

Fix expected value binding for PhoneHomeEnabled assertion.

Line 1257 uses &updatedEnableBlockStorage for expectPhoneHomeEnabled. This validates the wrong field and can hide bugs. Both values are currently false, so the test passes by coincidence, but this will fail silently if the values diverge.

Proposed fix
-			expectPhoneHomeEnabled:           &updatedEnableBlockStorage,
+			expectPhoneHomeEnabled:           &updatedPhoneHomeEnabled,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/pkg/db/model/operatingsystem_test.go` around lines 1254 - 1258, The test
is asserting the wrong field for PhoneHomeEnabled: replace the mistaken
reference to &updatedEnableBlockStorage used for expectPhoneHomeEnabled with the
actual PhoneHomeEnabled variable (e.g., &updatedPhoneHomeEnabled) in the
operatingsystem_test.go assertion block so the test validates the intended field
(ensure the variable updatedPhoneHomeEnabled is declared/initialized above the
table entry if not already).
api/pkg/api/handler/ipxetemplate.go (1)

308-315: ⚠️ Potential issue | 🟡 Minor

Tenant-authorized requests can retrieve internal templates by ID.

Consistent with the issue in GetAllIpxeTemplateHandler, when tenantAllowed is true but providerAllowed is false, the single-item handler should reject or 404 templates with internal scope.

🔒 Suggested fix
 	if !providerAllowed && !tenantAllowed {
 		logger.Warn().Msg("neither provider nor tenant authorization succeeded for Site access")
 		return cerr.NewAPIErrorResponse(c, http.StatusForbidden, "Current org is not authorized to access the iPXE template's Site", nil)
 	}

+	// Tenants cannot access internal-scope templates
+	if tenantAllowed && !providerAllowed && tmpl.Scope == cdbm.IpxeTemplateScopeInternal {
+		return cerr.NewAPIErrorResponse(c, http.StatusNotFound, fmt.Sprintf("Could not find iPXE template with ID: %s", templateIDStr), nil)
+	}
+
 	logger.Info().Msg("finishing API handler")
 	return c.JSON(http.StatusOK, model.NewAPIIpxeTemplate(tmpl))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/ipxetemplate.go` around lines 308 - 315, The handler
currently allows tenant-authorized requests to return an internal-scoped
template when providerAllowed is false; update the post-authorization logic (the
block after providerAllowed/tenantAllowed checks where tmpl is returned via
model.NewAPIIpxeTemplate) to detect tmpl.Scope (or equivalent scope field) and,
if tmpl.Scope == "internal" (or the internal constant), refuse access when
providerAllowed is false even if tenantAllowed is true — return a 404 or
forbidden API error (matching GetAllIpxeTemplateHandler behavior) instead of
returning the template.
site-workflow/pkg/activity/operatingsystem.go (1)

327-346: ⚠️ Potential issue | 🟠 Major

Add ID validation to enforce the shared UUID contract.

The method comment at line 322 states request.Id must be pre-set, but lines 327-334 only validate request and request.Name. If Id is nil or empty, line 346 will return an empty string, breaking the correlation with carbide-rest.

The UpdateOperatingSystemOnSite and DeleteOperatingSystemOnSite methods correctly validate ID presence (lines 360-363 and 389-392). Apply the same pattern here.

Proposed fix
 	if request == nil {
 		err := errors.New("received empty create OS definition request")
 		return "", temporal.NewNonRetryableApplicationError(err.Error(), swe.ErrTypeInvalidRequest, err)
 	}
 	if request.Name == "" {
 		err := errors.New("received create OS definition request missing Name")
 		return "", temporal.NewNonRetryableApplicationError(err.Error(), swe.ErrTypeInvalidRequest, err)
 	}
+	if request.GetId().GetValue() == "" {
+		err := errors.New("received create OS definition request missing ID")
+		return "", temporal.NewNonRetryableApplicationError(err.Error(), swe.ErrTypeInvalidRequest, err)
+	}

 	carbideClient := mos.CarbideAtomicClient.GetClient()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@site-workflow/pkg/activity/operatingsystem.go` around lines 327 - 346, The
CreateOperatingSystemOnSite path currently checks request and request.Name but
not request.Id, so idStr may be empty; add the same ID validation used in
UpdateOperatingSystemOnSite/DeleteOperatingSystemOnSite: verify request.GetId()
!= nil and request.GetId().GetValue() != "" and if missing return a
temporal.NewNonRetryableApplicationError with swe.ErrTypeInvalidRequest
(matching the existing error style), before calling
carbideClient.Carbide().CreateOperatingSystem; then use
request.GetId().GetValue() for idStr as now.
🧹 Nitpick comments (6)
db/pkg/db/model/operatingsystem_test.go (1)

1164-1167: Update test description to reflect removed field.

The test description mentions controlleroperatingsystemid but this field has been removed from the update input/expected values. The description should be updated to avoid confusion.

-		desc: "can update uuid fields: infrastructureproviderid, tenantid, controlleroperatingsystemid",
+		desc: "can update uuid fields: infrastructureproviderid, tenantid",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/pkg/db/model/operatingsystem_test.go` around lines 1164 - 1167, Update the
test case description string that currently reads "can update uuid fields:
infrastructureproviderid, tenantid, controlleroperatingsystemid" to accurately
reflect the fields being tested (e.g., remove "controlleroperatingsystemid" and
list only "infrastructureproviderid, tenantid") so the test case for os1tenant1
matches the actual update input/expected values in operatingsystem_test.go; edit
the desc value in the test struct to the corrected text.
api/pkg/api/handler/ipxetemplate.go (2)

285-311: Consider extracting site authorization logic to a helper.

The provider/tenant site authorization logic (lines 121-144 and 288-311) is nearly identical between the two handlers. Extracting this to a shared helper function (e.g., validateSiteAccess) would reduce duplication and ensure consistent behavior, especially as scope restrictions are added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/ipxetemplate.go` around lines 285 - 311, The site
authorization logic duplicated across handlers should be moved into a shared
helper (e.g., validateSiteAccess) that returns (bool, *cerr.APIErrorResponse) or
similar; extract the providerAllowed/tenantAllowed checks (the providerAllowed
expression using infrastructureProvider and site.InfrastructureProviderID, the
tenant.Config.TargetedInstanceCreation branch, the TenantAccountDAO usage via
cdbm.NewTenantAccountDAO and its GetAll call with cdbm.TenantAccountFilterInput
and taCount check) into that helper, keep logging inside the helper (preserve
the warn/error messages), and replace the duplicated blocks in the handlers with
a single call to validateSiteAccess which enforces the same provider/tenant
semantics and returns the appropriate HTTP error when authorization fails.

42-60: Note: Temporal client field appears unused in this handler.

The tc tclient.Client field is included in both handler structs but not used in either Handle method. This appears to be for consistency with other handlers or future use.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/ipxetemplate.go` around lines 42 - 60, The temporal
client field tc is unused in GetAllIpxeTemplateHandler; remove the dead field
and constructor parameter to avoid confusion: delete the tc tclient.Client field
from the GetAllIpxeTemplateHandler struct, remove the tc parameter from
NewGetAllIpxeTemplateHandler and its use in the returned struct literal, and
update any NewGetAllIpxeTemplateHandler call sites to stop passing the temporal
client; if you need to preserve a future placeholder instead, rename the field
to _tc or add a comment explaining it's intentionally unused.
api/pkg/api/handler/site_test.go (2)

465-467: Discarded error from GetTemporalConfig() may mask test setup issues.

If GetTemporalConfig() fails, the test will proceed with a potentially nil or incomplete tcfg, which could cause confusing test failures. Consider checking the error.

🔧 Proposed fix
-			tcfg, _ := tt.fields.cfg.GetTemporalConfig()
+			tcfg, tcfgErr := tt.fields.cfg.GetTemporalConfig()
+			require.NoError(t, tcfgErr, "GetTemporalConfig should not fail")
 			scp := sc.NewClientPool(tcfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/site_test.go` around lines 465 - 467, The test is
discarding the error returned by GetTemporalConfig(); capture and check that
error before using tcfg so test setup failures surface immediately — call
tt.fields.cfg.GetTemporalConfig(), assert or t.Fatalf on non-nil error (so tcfg
is valid), then pass tcfg into sc.NewClientPool and NewCreateSiteHandler
(referencing GetTemporalConfig, tcfg, sc.NewClientPool, and NewCreateSiteHandler
to locate the change).

2312-2313: Same discarded error pattern.

Apply the same fix here for consistency.

🔧 Proposed fix
-	tcfg, _ := cfg.GetTemporalConfig()
+	tcfg, tcfgErr := cfg.GetTemporalConfig()
+	require.NoError(t, tcfgErr)
 	scp := sc.NewClientPool(tcfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/site_test.go` around lines 2312 - 2313, The test
currently discards the error return from cfg.GetTemporalConfig() and should
check it; change tcfg, _ := cfg.GetTemporalConfig() to capture the error (e.g.,
tcfg, err := cfg.GetTemporalConfig()) and handle it appropriately in the test
(fail the test or log and return) before calling sc.NewClientPool(tcfg); ensure
the error check references cfg.GetTemporalConfig and still uses
sc.NewClientPool(tcfg) only after the config is valid.
site-workflow/pkg/activity/operatingsystem.go (1)

272-272: Consider using activity context for workflow publish calls.

Lines 272 and 306 use context.Background() for ExecuteWorkflow calls. Similar to the iPXE template activity, this detaches from activity cancellation/timeout. If this is intentional fire-and-forget behavior, add a comment. Otherwise, propagate ctx.

Also applies to: 306-306

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@site-workflow/pkg/activity/operatingsystem.go` at line 272, The
ExecuteWorkflow calls are using context.Background() which detaches them from
the activity's cancellation/timeout; update both calls that invoke
m.config.TemporalPublishClient.ExecuteWorkflow (the one passing workflowName,
workflowOptions, m.config.SiteID, inv and the similar call later) to use the
activity's context variable (ctx) so cancellations/timeouts propagate, or if the
detached "fire-and-forget" semantics are intentional, add a clear inline comment
above each call explaining that deliberate choice and why it must not use ctx.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/pkg/api/handler/ipxetemplate.go`:
- Around line 146-153: The code currently lets tenant-authorized requests set
any scope via c.QueryParam("scope") even when providerAllowed is false; update
the logic around tenantAllowed and providerAllowed so that when tenantAllowed is
true and providerAllowed is false you forcibly set filterInput.Scopes =
[]string{"public"} (ignoring any supplied scope), otherwise preserve the
existing behavior (if providerAllowed true allow the query param, if neither
allowed handle accordingly); locate this in the handler where filterInput is
built (references: tenantAllowed, providerAllowed, filterInput,
filterInput.Scopes, c.QueryParam) and enforce the clamp to "public" before using
the filterInput.

In `@db/pkg/migrations/20260320100000_os_scope.go`:
- Around line 50-53: The down migration currently prints " [down migration] No
action taken" and returns nil, leaving the scope column added by the up
migration; replace that no-op in the down closure with an explicit rollback that
drops the column (e.g. execute ALTER TABLE <table_name> DROP COLUMN IF EXISTS
scope using db.ExecContext(ctx, ...) and return any error, and remove the
fmt.Print no-op), or if the no-op is intentional add a clear comment and
documentation stating that the down migration intentionally preserves the scope
column; locate this change inside the down migration closure where fmt.Print and
return nil appear.

In `@openapi/spec.yaml`:
- Line 8012: You added the new OpenAPI path
'/v2/org/{org}/carbide/ipxe-template' in openapi/spec.yaml but did not
regenerate the derived artifacts; run the project’s OpenAPI generation step (the
repo's SDK/docs generator or make target that produces sdk/standard/ and
docs/index.html) to regenerate the SDK and documentation so they match the
updated spec, then commit the updated files under sdk/standard/ and
docs/index.html; ensure the generated outputs include the new iPXE template
endpoints and any new schemas referenced by the spec so CI will pass.
- Around line 8039-8044: There's a stray leading space before the "shell"
command in the iPXE example inside the "template" block (the multiline template
containing "#!ipxe", "kernel {{.kernel}} autoinstall ds=nocloud-net", "initrd
{{.initrd}}", "boot || imgfree", " shell"); remove the leading space so the line
reads "shell" and apply the same fix to the second example instance of the same
"template" iPXE block to keep formatting consistent.

In `@site-agent/pkg/components/managers/ipxetemplate/publisher.go`:
- Around line 36-41: Replace the unsafe uuid.MustParse usage with error-checked
parsing: call uuid.Parse on ManagerAccess.Conf.EB.Temporal.ClusterID, handle the
parse error (return it or log and halt startup) before constructing
swa.NewManageIpxeTemplateInventory with ManageInventoryConfig.SiteID; ensure the
code references uuid.Parse, ManageIpxeTemplateInventory,
ManageInventoryConfig.SiteID and ManagerAccess so the service does not panic on
invalid ClusterID and fails gracefully.

In `@workflow/pkg/activity/ipxetemplate/ipxetemplate_test.go`:
- Around line 41-43: TestSetupSchema currently resets many models but omits the
IpxeTemplate table, causing tests (which call templateDAO.GetAll() and
templateDAO.GetBySiteAndName()) to see stale state; inside
workflow/pkg/util/testing.go add a ResetModel call for cdbm.IpxeTemplate in
TestSetupSchema (use dbSession.DB.ResetModel(context.Background(),
(*cdbm.IpxeTemplate)(nil)) and assert.Nil(t, err)) just like the other model
resets, placing it before the function's closing brace so
TestInitDB/TestSetupSchema will clear the ipxe_template table for tests.

In `@workflow/pkg/activity/ipxetemplate/ipxetemplate.go`:
- Around line 125-128: The code logs errors from templateDAO.Create in
ipxetemplate.go (the block calling templateDAO.Create(ctx, nil, input) and
logger.Error().Err(cerr).Str("Name", reported.Name).Msg(...)) but then
continues, causing partial failures to be ignored; modify the handler to not
silently continue: either return the error immediately (propagate cerr wrapped
with context) so the activity fails fast, or accumulate errors into a slice and
after processing all templates return a combined error summarizing which Create
calls failed (including reported.Name and cerr details); update the function's
return path accordingly so callers receive the failure instead of assuming
success.
- Around line 161-163: The Delete failure is only logged in the iPXE template
deletion path (templateDAO.Delete called with existing.ID) but not returned;
modify the handler in ipxetemplate.go so that after templateDAO.Delete(ctx, nil,
existing.ID) if derr != nil you return that error (or wrap it with context)
instead of just logging via logger.Error().Err(derr).Str("Name",
existing.Name).Msg(...), matching the create/update error handling pattern.
- Around line 145-147: The Update error for iPXE templates is only logged and
not propagated, causing the activity to report success on partial failures;
modify the code around templateDAO.Update (the call that currently does if _,
uerr := templateDAO.Update(ctx, nil, input); uerr != nil { logger.Error()... })
to return the error after logging (or wrap and return it) so the activity fails
on DB update errors instead of continuing silently; ensure the returned error
provides context (e.g., include reported.Name) and uses the existing error
handling path in this activity.

---

Duplicate comments:
In `@api/pkg/api/handler/ipxetemplate.go`:
- Around line 308-315: The handler currently allows tenant-authorized requests
to return an internal-scoped template when providerAllowed is false; update the
post-authorization logic (the block after providerAllowed/tenantAllowed checks
where tmpl is returned via model.NewAPIIpxeTemplate) to detect tmpl.Scope (or
equivalent scope field) and, if tmpl.Scope == "internal" (or the internal
constant), refuse access when providerAllowed is false even if tenantAllowed is
true — return a 404 or forbidden API error (matching GetAllIpxeTemplateHandler
behavior) instead of returning the template.

In `@db/pkg/db/model/operatingsystem_test.go`:
- Around line 1254-1258: The test is asserting the wrong field for
PhoneHomeEnabled: replace the mistaken reference to &updatedEnableBlockStorage
used for expectPhoneHomeEnabled with the actual PhoneHomeEnabled variable (e.g.,
&updatedPhoneHomeEnabled) in the operatingsystem_test.go assertion block so the
test validates the intended field (ensure the variable updatedPhoneHomeEnabled
is declared/initialized above the table entry if not already).

In `@site-workflow/pkg/activity/operatingsystem.go`:
- Around line 327-346: The CreateOperatingSystemOnSite path currently checks
request and request.Name but not request.Id, so idStr may be empty; add the same
ID validation used in UpdateOperatingSystemOnSite/DeleteOperatingSystemOnSite:
verify request.GetId() != nil and request.GetId().GetValue() != "" and if
missing return a temporal.NewNonRetryableApplicationError with
swe.ErrTypeInvalidRequest (matching the existing error style), before calling
carbideClient.Carbide().CreateOperatingSystem; then use
request.GetId().GetValue() for idStr as now.

---

Nitpick comments:
In `@api/pkg/api/handler/ipxetemplate.go`:
- Around line 285-311: The site authorization logic duplicated across handlers
should be moved into a shared helper (e.g., validateSiteAccess) that returns
(bool, *cerr.APIErrorResponse) or similar; extract the
providerAllowed/tenantAllowed checks (the providerAllowed expression using
infrastructureProvider and site.InfrastructureProviderID, the
tenant.Config.TargetedInstanceCreation branch, the TenantAccountDAO usage via
cdbm.NewTenantAccountDAO and its GetAll call with cdbm.TenantAccountFilterInput
and taCount check) into that helper, keep logging inside the helper (preserve
the warn/error messages), and replace the duplicated blocks in the handlers with
a single call to validateSiteAccess which enforces the same provider/tenant
semantics and returns the appropriate HTTP error when authorization fails.
- Around line 42-60: The temporal client field tc is unused in
GetAllIpxeTemplateHandler; remove the dead field and constructor parameter to
avoid confusion: delete the tc tclient.Client field from the
GetAllIpxeTemplateHandler struct, remove the tc parameter from
NewGetAllIpxeTemplateHandler and its use in the returned struct literal, and
update any NewGetAllIpxeTemplateHandler call sites to stop passing the temporal
client; if you need to preserve a future placeholder instead, rename the field
to _tc or add a comment explaining it's intentionally unused.

In `@api/pkg/api/handler/site_test.go`:
- Around line 465-467: The test is discarding the error returned by
GetTemporalConfig(); capture and check that error before using tcfg so test
setup failures surface immediately — call tt.fields.cfg.GetTemporalConfig(),
assert or t.Fatalf on non-nil error (so tcfg is valid), then pass tcfg into
sc.NewClientPool and NewCreateSiteHandler (referencing GetTemporalConfig, tcfg,
sc.NewClientPool, and NewCreateSiteHandler to locate the change).
- Around line 2312-2313: The test currently discards the error return from
cfg.GetTemporalConfig() and should check it; change tcfg, _ :=
cfg.GetTemporalConfig() to capture the error (e.g., tcfg, err :=
cfg.GetTemporalConfig()) and handle it appropriately in the test (fail the test
or log and return) before calling sc.NewClientPool(tcfg); ensure the error check
references cfg.GetTemporalConfig and still uses sc.NewClientPool(tcfg) only
after the config is valid.

In `@db/pkg/db/model/operatingsystem_test.go`:
- Around line 1164-1167: Update the test case description string that currently
reads "can update uuid fields: infrastructureproviderid, tenantid,
controlleroperatingsystemid" to accurately reflect the fields being tested
(e.g., remove "controlleroperatingsystemid" and list only
"infrastructureproviderid, tenantid") so the test case for os1tenant1 matches
the actual update input/expected values in operatingsystem_test.go; edit the
desc value in the test struct to the corrected text.

In `@site-workflow/pkg/activity/operatingsystem.go`:
- Line 272: The ExecuteWorkflow calls are using context.Background() which
detaches them from the activity's cancellation/timeout; update both calls that
invoke m.config.TemporalPublishClient.ExecuteWorkflow (the one passing
workflowName, workflowOptions, m.config.SiteID, inv and the similar call later)
to use the activity's context variable (ctx) so cancellations/timeouts
propagate, or if the detached "fire-and-forget" semantics are intentional, add a
clear inline comment above each call explaining that deliberate choice and why
it must not use ctx.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 695e4ae2-fbaa-4b4a-89f0-6523dafc0a7a

📥 Commits

Reviewing files that changed from the base of the PR and between bbe58c3 and ceb45de.

⛔ Files ignored due to path filters (4)
  • workflow-schema/schema/site-agent/workflows/v1/forge_carbide.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/forge_carbide_grpc.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/ipxetemplate.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/operatingsystem.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (56)
  • api/pkg/api/handler/ipxetemplate.go
  • api/pkg/api/handler/ipxetemplate_test.go
  • api/pkg/api/handler/operatingsystem.go
  • api/pkg/api/handler/operatingsystem_ownership_test.go
  • api/pkg/api/handler/operatingsystem_test.go
  • api/pkg/api/handler/site.go
  • api/pkg/api/handler/site_test.go
  • api/pkg/api/handler/util/common/common.go
  • api/pkg/api/model/ipxetemplate.go
  • api/pkg/api/model/operatingsystem.go
  • api/pkg/api/routes.go
  • api/pkg/api/routes_test.go
  • db/pkg/db/model/ipxetemplate.go
  • db/pkg/db/model/ipxetemplate_test.go
  • db/pkg/db/model/operatingsystem.go
  • db/pkg/db/model/operatingsystem_test.go
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • db/pkg/migrations/20260310100000_os_definition.go
  • db/pkg/migrations/20260320100000_os_scope.go
  • db/pkg/migrations/20260321100000_ossa_core_status.go
  • openapi/spec.yaml
  • site-agent/pkg/components/managers/ipxetemplate/access.go
  • site-agent/pkg/components/managers/ipxetemplate/cron.go
  • site-agent/pkg/components/managers/ipxetemplate/init.go
  • site-agent/pkg/components/managers/ipxetemplate/publisher.go
  • site-agent/pkg/components/managers/ipxetemplate/subscriber.go
  • site-agent/pkg/components/managers/manager.go
  • site-agent/pkg/components/managers/manageraccess.go
  • site-agent/pkg/components/managers/managerapi/ipxetemplate_api.go
  • site-agent/pkg/components/managers/managerapi/managerapi.go
  • site-agent/pkg/components/managers/operatingsystem/cron.go
  • site-agent/pkg/components/managers/operatingsystem/publisher.go
  • site-agent/pkg/components/managers/operatingsystem/subscriber.go
  • site-agent/pkg/components/managers/workflow/orchestrator.go
  • site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
  • site-workflow/pkg/activity/ipxetemplate.go
  • site-workflow/pkg/activity/ipxetemplate_test.go
  • site-workflow/pkg/activity/operatingsystem.go
  • site-workflow/pkg/activity/operatingsystem_test.go
  • site-workflow/pkg/grpc/client/testing.go
  • site-workflow/pkg/workflow/ipxetemplate.go
  • site-workflow/pkg/workflow/ipxetemplate_test.go
  • site-workflow/pkg/workflow/operatingsystem.go
  • workflow-schema/site-agent/workflows/v1/forge_carbide.proto
  • workflow-schema/site-agent/workflows/v1/ipxetemplate.proto
  • workflow-schema/site-agent/workflows/v1/operatingsystem.proto
  • workflow/cmd/workflow/main.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate_test.go
  • workflow/pkg/activity/operatingsystem/operatingsystem.go
  • workflow/pkg/activity/operatingsystem/operatingsystem_test.go
  • workflow/pkg/util/testing.go
  • workflow/pkg/workflow/ipxetemplate/update.go
  • workflow/pkg/workflow/ipxetemplate/update_test.go
  • workflow/pkg/workflow/operatingsystem/update.go
💤 Files with no reviewable changes (1)
  • api/pkg/api/handler/util/common/common.go
✅ Files skipped from review due to trivial changes (13)
  • site-agent/pkg/components/managers/managerapi/managerapi.go
  • api/pkg/api/handler/operatingsystem_test.go
  • site-agent/pkg/components/managers/manageraccess.go
  • workflow-schema/site-agent/workflows/v1/operatingsystem.proto
  • db/pkg/migrations/20260310100000_os_definition.go
  • db/pkg/migrations/20260321100000_ossa_core_status.go
  • site-workflow/pkg/activity/ipxetemplate_test.go
  • site-agent/pkg/components/managers/managerapi/ipxetemplate_api.go
  • site-workflow/pkg/activity/operatingsystem_test.go
  • workflow/pkg/workflow/ipxetemplate/update_test.go
  • db/pkg/db/model/ipxetemplate_test.go
  • workflow/pkg/activity/operatingsystem/operatingsystem_test.go
  • api/pkg/api/handler/ipxetemplate_test.go
🚧 Files skipped from review as they are similar to previous changes (19)
  • api/pkg/api/routes_test.go
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • api/pkg/api/routes.go
  • site-agent/pkg/components/managers/ipxetemplate/subscriber.go
  • site-agent/pkg/components/managers/operatingsystem/publisher.go
  • site-agent/pkg/components/managers/operatingsystem/subscriber.go
  • site-workflow/pkg/workflow/ipxetemplate.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • site-agent/pkg/components/managers/ipxetemplate/init.go
  • site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
  • workflow/pkg/workflow/ipxetemplate/update.go
  • site-agent/pkg/components/managers/operatingsystem/cron.go
  • workflow/pkg/workflow/operatingsystem/update.go
  • workflow/cmd/workflow/main.go
  • site-workflow/pkg/workflow/ipxetemplate_test.go
  • site-workflow/pkg/grpc/client/testing.go
  • db/pkg/db/model/operatingsystem.go
  • workflow/pkg/activity/operatingsystem/operatingsystem.go
  • api/pkg/api/handler/operatingsystem.go

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
sdk/standard/model_ipxe_os_artifact.go (1)

34-36: SDK accepts arbitrary strings for cache_strategy while proto constrains to enum values.

The proto definition (workflow-schema/site-agent/workflows/v1/forge_carbide.proto lines 5787-5794) constrains cache_strategy to ArtifactCacheStrategy enum values (ARTIFACT_CACHE_STRATEGY_UNSPECIFIED, NEVER, ALWAYS, ON_DEMAND). This SDK model accepts any string, so invalid values will only be caught at the server/wire level.

This is standard for OpenAPI-generated code and relies on server-side validation, but consider documenting the valid values in the OpenAPI spec description so SDK users know the expected values.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/standard/model_ipxe_os_artifact.go` around lines 34 - 36, The SDK model
field CacheStrategy (in model_ipxe_os_artifact.go) accepts any string while the
proto limits values to the ArtifactCacheStrategy enum; update the OpenAPI spec
for the cache_strategy property to list the allowed enum values
(ARTIFACT_CACHE_STRATEGY_UNSPECIFIED, NEVER, ALWAYS, ON_DEMAND) and regenerate
the SDK so the generated struct or docs reflect the constraints; if updating the
spec isn’t possible, change the SDK field to use the generated
ArtifactCacheStrategy type (or add a Validate/IsValid method on the struct to
enforce those enum values) and include the allowed values in the field
description.
openapi/spec.yaml (1)

8097-8104: Consider adding a default value for pageSize.

The pageNumber parameter specifies default: 1, but pageSize lacks a default despite having min/max constraints. Adding a default (e.g., 20 based on the example) would make behavior more predictable when clients omit this parameter.

Proposed fix
         - schema:
             type: integer
             minimum: 1
             maximum: 100
             example: 20
+            default: 20
           in: query
           name: pageSize
           description: Page size for pagination query
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openapi/spec.yaml` around lines 8097 - 8104, Add a default value for the
openapi query parameter "pageSize" to match the example and make client behavior
predictable; update the schema for parameter name pageSize (type: integer,
minimum: 1, maximum: 100) to include "default: 20" so that the specification and
any generated validators/clients assume 20 when pageSize is omitted.
sdk/standard/api_ipxe_template.go (1)

42-63: Expose the accepted scope and orderBy values in the generated API.

Both setters take arbitrary strings, but the server only accepts scope in {internal, public} and orderBy in {created, updated, name}. Please surface those constraints in the OpenAPI source/comments (or generate constants) so callers do not discover them via 400s.

As per coding guidelines, "Document when you have intentionally omitted code that the reader might otherwise expect to be present".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sdk/standard/api_ipxe_template.go` around lines 42 - 63, The generated
setters ApiGetAllIpxeTemplateRequest.Scope and
ApiGetAllIpxeTemplateRequest.OrderBy accept arbitrary strings but the server
only allows scope ∈ {internal, public} and orderBy ∈ {created, updated, name};
update the OpenAPI source or generated comments to document these allowed values
(or generate exported constants like ScopeInternal/ScopePublic and
OrderByCreated/OrderByUpdated/OrderByName) and update the generated file
comments above the Scope and OrderBy methods to list the accepted values so
callers see the constraints without trial-and-error 400s; keep the setter
signatures as-is but reference the new constants or documented enum values in
the method comments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openapi/spec.yaml`:
- Around line 14532-14548: The IpxeOsArtifact schema uses snake_case properties
(auth_type, auth_token, cache_strategy) which are inconsistent with the API's
camelCase convention; update those property names to authType, authToken, and
cacheStrategy in the openapi/spec.yaml (and anywhere the IpxeOsArtifact schema
or its fields are referenced) while preserving types, descriptions, and the
cache_strategy enum values (just rename the key), and ensure any refs, example
objects, and client generator inputs are updated to the new names so SDKs and
docs stay consistent.

In `@sdk/standard/api_ipxe_template.go`:
- Around line 113-118: The client currently only checks for nil siteId; update
the validation so empty string is treated as missing: in the request builder
(where r.siteId is validated before calling parameterAddToHeaderOrQuery) return
the same error when r.siteId is nil OR when *r.siteId == "" so SiteId("") is
rejected client-side; reference the r.siteId check and the
parameterAddToHeaderOrQuery call to locate the change.

---

Nitpick comments:
In `@openapi/spec.yaml`:
- Around line 8097-8104: Add a default value for the openapi query parameter
"pageSize" to match the example and make client behavior predictable; update the
schema for parameter name pageSize (type: integer, minimum: 1, maximum: 100) to
include "default: 20" so that the specification and any generated
validators/clients assume 20 when pageSize is omitted.

In `@sdk/standard/api_ipxe_template.go`:
- Around line 42-63: The generated setters ApiGetAllIpxeTemplateRequest.Scope
and ApiGetAllIpxeTemplateRequest.OrderBy accept arbitrary strings but the server
only allows scope ∈ {internal, public} and orderBy ∈ {created, updated, name};
update the OpenAPI source or generated comments to document these allowed values
(or generate exported constants like ScopeInternal/ScopePublic and
OrderByCreated/OrderByUpdated/OrderByName) and update the generated file
comments above the Scope and OrderBy methods to list the accepted values so
callers see the constraints without trial-and-error 400s; keep the setter
signatures as-is but reference the new constants or documented enum values in
the method comments.

In `@sdk/standard/model_ipxe_os_artifact.go`:
- Around line 34-36: The SDK model field CacheStrategy (in
model_ipxe_os_artifact.go) accepts any string while the proto limits values to
the ArtifactCacheStrategy enum; update the OpenAPI spec for the cache_strategy
property to list the allowed enum values (ARTIFACT_CACHE_STRATEGY_UNSPECIFIED,
NEVER, ALWAYS, ON_DEMAND) and regenerate the SDK so the generated struct or docs
reflect the constraints; if updating the spec isn’t possible, change the SDK
field to use the generated ArtifactCacheStrategy type (or add a Validate/IsValid
method on the struct to enforce those enum values) and include the allowed
values in the field description.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 142d3598-9e9a-4330-b746-9e2f6733121e

📥 Commits

Reviewing files that changed from the base of the PR and between ceb45de and 43455a0.

📒 Files selected for processing (16)
  • api/pkg/api/handler/ipxetemplate.go
  • api/pkg/api/handler/operatingsystem.go
  • openapi/spec.yaml
  • sdk/standard/api_ipxe_template.go
  • sdk/standard/api_operating_system.go
  • sdk/standard/client.go
  • sdk/standard/model_ipxe_os_artifact.go
  • sdk/standard/model_ipxe_os_parameter.go
  • sdk/standard/model_ipxe_template.go
  • sdk/standard/model_operating_system.go
  • sdk/standard/model_operating_system_create_request.go
  • sdk/standard/model_operating_system_update_request.go
  • site-agent/pkg/components/managers/ipxetemplate/publisher.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate.go
  • workflow/pkg/activity/operatingsystem/operatingsystem.go
  • workflow/pkg/util/testing.go
✅ Files skipped from review due to trivial changes (2)
  • sdk/standard/api_operating_system.go
  • workflow/pkg/activity/ipxetemplate/ipxetemplate.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • site-agent/pkg/components/managers/ipxetemplate/publisher.go
  • api/pkg/api/handler/operatingsystem.go
  • api/pkg/api/handler/ipxetemplate.go
  • workflow/pkg/activity/operatingsystem/operatingsystem.go

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

♻️ Duplicate comments (4)
site-workflow/pkg/activity/ipxetemplate.go (2)

62-62: ⚠️ Potential issue | 🟠 Major

Use activity ctx instead of context.Background() for workflow publish calls.

Detaching from activity context prevents proper lifecycle management during shutdown or activity timeout scenarios.

🐛 Proposed fix
-		if _, execErr := mii.config.TemporalPublishClient.ExecuteWorkflow(context.Background(), workflowOptions, workflowName, mii.config.SiteID, inventory); execErr != nil {
+		if _, execErr := mii.config.TemporalPublishClient.ExecuteWorkflow(ctx, workflowOptions, workflowName, mii.config.SiteID, inventory); execErr != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@site-workflow/pkg/activity/ipxetemplate.go` at line 62, Replace the use of
context.Background() with the activity context variable (ctx) when calling
mii.config.TemporalPublishClient.ExecuteWorkflow so the workflow publish honors
activity lifecycle; update the call that currently uses context.Background() to
pass ctx and keep the other parameters (workflowOptions, workflowName,
mii.config.SiteID, inventory) unchanged.

84-84: ⚠️ Potential issue | 🟠 Major

Same issue: use ctx instead of context.Background().

🐛 Proposed fix
-	if _, err = mii.config.TemporalPublishClient.ExecuteWorkflow(context.Background(), workflowOptions, workflowName, mii.config.SiteID, inventory); err != nil {
+	if _, err = mii.config.TemporalPublishClient.ExecuteWorkflow(ctx, workflowOptions, workflowName, mii.config.SiteID, inventory); err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@site-workflow/pkg/activity/ipxetemplate.go` at line 84, The ExecuteWorkflow
call currently uses context.Background(); replace that with the function's
incoming context variable (ctx) so the workflow inherits cancellation and
deadlines — update the call to
mii.config.TemporalPublishClient.ExecuteWorkflow(..., ctx, workflowOptions,
workflowName, mii.config.SiteID, inventory) ensuring you reference the existing
ctx variable in the surrounding function where mii is used.
api/pkg/api/model/operatingsystem.go (2)

375-379: ⚠️ Potential issue | 🟠 Major

Update path also needs empty ipxeTemplateName validation.

Same issue as the create path — a non-nil pointer with empty string bypasses intended validation. Add a check for empty string values.

🐛 Proposed fix
 	if osur.IpxeScript != nil && osur.IpxeTemplateName != nil {
 		return validation.Errors{
 			"ipxeTemplateName": errors.New("ipxeScript and ipxeTemplateName are mutually exclusive"),
 		}
 	}
+
+	if osur.IpxeTemplateName != nil && *osur.IpxeTemplateName == "" {
+		return validation.Errors{
+			"ipxeTemplateName": errors.New("cannot be empty"),
+		}
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/model/operatingsystem.go` around lines 375 - 379, The update-path
validation in operatingsystem.go currently only checks pointer nil-ness for
osur.IpxeTemplateName and so allows a non-nil pointer to an empty string; update
the validation in the same block that uses osur.IpxeScript and
osur.IpxeTemplateName so it also rejects empty/blank ipxeTemplateName values
(e.g., check if osur.IpxeTemplateName != nil and
strings.TrimSpace(*osur.IpxeTemplateName) == "" and return a validation.Errors
entry for "ipxeTemplateName"); keep the existing mutual-exclusivity check for
osur.IpxeScript and osur.IpxeTemplateName and add the empty-string validation
alongside it.

120-126: ⚠️ Potential issue | 🟠 Major

Reject empty ipxeTemplateName values.

The current logic (isIpxe := oscr.IpxeScript != nil || oscr.IpxeTemplateName != nil) treats a non-nil pointer as sufficient to flip isIpxe to true. A request with {"ipxeTemplateName": ""} will bypass image validation checks. Unlike ipxeScript which has a validation.Required check at lines 228-233, ipxeTemplateName has no equivalent validation.

🐛 Proposed fix to validate non-empty template name
 	isIpxe := oscr.IpxeScript != nil || oscr.IpxeTemplateName != nil
+
+	if oscr.IpxeTemplateName != nil && *oscr.IpxeTemplateName == "" {
+		return validation.Errors{
+			"ipxeTemplateName": errors.New("cannot be empty"),
+		}
+	}

 	if oscr.IpxeScript != nil && oscr.IpxeTemplateName != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/model/operatingsystem.go` around lines 120 - 126, The validation
treats a non-nil ipxeTemplateName pointer as valid even when empty, letting
requests like {"ipxeTemplateName": ""} bypass image checks; update the logic
around isIpxe and the mutually-exclusive check to require a non-empty template
name (i.e., treat ipxeTemplateName as present only if oscr.IpxeTemplateName !=
nil && strings.TrimSpace(*oscr.IpxeTemplateName) != ""), and add a
validation.Required-style check for ipxeTemplateName (parallel to ipxeScript's
validation.Required) so empty strings produce a validation error for
"ipxeTemplateName" (also ensure the mutual exclusion branch referencing
ipxeTemplateName uses the same non-empty check).
🧹 Nitpick comments (3)
api/pkg/api/handler/ipxetemplate.go (1)

118-144: Pull the site-access authorization into one helper.

The provider/tenant access check, TenantAccountDAO lookup, and mixed-role decision logic are duplicated in both handlers. A helper that returns {providerAllowed, tenantAllowed} plus any API error would keep those paths aligned, while each handler can still apply its own list/get scope rule afterward.

Based on learnings: In Go handler code under api/pkg/api/handler, evaluate both authorization paths independently and implement a helper that checks each role path and collects errors.

Also applies to: 287-313

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/ipxetemplate.go` around lines 118 - 144, Extract the
duplicated provider/tenant site-access logic in the handler into a single helper
(e.g., ValidateSiteAccess or CheckSiteAccess) that takes ctx, site, tenant,
infrastructureProvider, logger and h.dbSession and returns (providerAllowed
bool, tenantAllowed bool, apiErr *cerr.APIError) so callers can short-circuit on
apiErr; move the TenantAccountDAO creation and its GetAll call
(cdbm.NewTenantAccountDAO and taDAO.GetAll) into that helper, keep the
tenant.Config.TargetedInstanceCreation warning in the helper, and ensure the
helper implements the same mixed-role decision (providerAllowed ||
tenantAllowed) so handlers can call it and then apply their own list/get scope
rules.
openapi/spec.yaml (1)

14554-14604: Consider adding required array to IpxeTemplate schema.

Response schemas benefit from documenting which fields are always present. This helps SDK generators produce accurate types and informs API consumers about guaranteed fields.

Proposed enhancement
         updated:
           type: string
           format: date-time
           description: Date/time when the template was last updated
           readOnly: true
+      required:
+        - id
+        - siteId
+        - name
+        - template
+        - scope
+        - created
+        - updated
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openapi/spec.yaml` around lines 14554 - 14604, Add a required array to the
IpxeTemplate schema listing the fields that are guaranteed in responses (e.g.,
id, siteId, name, template, created, updated); update the IpxeTemplate
definition to include a "required" property that references those property names
so SDK generators and consumers know which fields are always present.
api/pkg/api/handler/operatingsystem_ownership_test.go (1)

452-455: Finish the dual-role read coverage.

The mixed-role suite still leaves the tenant-owned GetByID path untested, and the dual-role GetAll case only checks len >= 2. Please assert the seeded IDs in the list case and add a dual-role fetch for tnOS so a regression in the provider/tenant union path cannot slip through.
Based on learnings: "Add tests covering mixed-role scenarios and ensure CI gates enforce this behavior across all relevant handlers in this directory."

Also applies to: 538-541

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/pkg/api/handler/operatingsystem_ownership_test.go` around lines 452 -
455, Update the dual-role test to assert specific seeded OS IDs and add a
dual-role GetByID for the tenant-owned OS: in the t.Run "dual-role user sees
both provider-owned and tenant-owned OSes" replace the loose len check on oss
from execGetAll(t, env.dualUser) with assertions that the returned slice
contains the expected seeded IDs (e.g., prOS.ID and tnOS.ID) and then call the
GetByID helper (eg. execGetByID or the corresponding method used elsewhere) with
env.dualUser and tnOS.ID to assert the tenant-owned OS can be retrieved; apply
the same change for the corresponding test near lines 538-541 so both list and
single-item retrievals verify the mixed-role union behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/pkg/api/handler/operatingsystem.go`:
- Around line 653-659: The siteId authorization currently always uses
tenant-only validation (TenantSite) even when isProviderAdmin is true; update
the siteId branch and the similar check at lines ~687-697 to split validation by
role: keep using TenantSite when isTenantAdmin is true, but when isProviderAdmin
is true validate the requested site against the provider ownership path (e.g.,
ProviderSite or the provider-site lookup used elsewhere) so provider-owned sites
are allowed; if a caller has both roles you can prefer tenant validation or
accept either ownership check, and ensure you return the same 403 API error via
cutil.NewAPIErrorResponse when validation fails.
- Around line 2020-2031: In dbArtifactsToProto, validate the enum mapping before
converting: lookup
cwssaws.IpxeScriptArtifactCacheStrategy_value[a.CacheStrategy] and check whether
the key exists (ok bool) instead of using the zero value; if the mapping is
missing, log a warning with the artifact identifier (e.g., a.Name or a.URL) and
either skip the artifact or set CacheStrategy to a defined UNKNOWN/invalid
sentinel (or return an error) rather than silently defaulting to
CACHE_AS_NEEDED; then construct the cwssaws.IpxeScriptArtifact using the
validated cwssaws.IpxeScriptArtifactCacheStrategy value.

In `@api/pkg/api/handler/site.go`:
- Around line 340-343: The workflow ID built in tClient.StartWorkflowOptions
(workflowOptions.ID) omits the OS version
("ipxe-os-create-"+st.ID.String()+"-"+gos.ID.String()) while operatingsystem.go
includes a version suffix, causing different deduplication/retry behavior;
either make the ID consistent by appending the same version component (e.g., add
"-"+gos.Version.String() or the equivalent version field from gos) or add a
concise comment above the tClient.StartWorkflowOptions in site.go explaining why
the version is intentionally omitted and how that affects deduplication/retries.

In `@db/pkg/db/model/ipxetemplate.go`:
- Around line 149-165: Reject unknown scopes before persisting: in
IpxeTemplateSQLDAO.Create (and the corresponding Update method around the
283-300 region) validate input.Scope against the allowed scope constants used by
the system (e.g., the two known template scope constants) and return a clear
error if it is not one of them instead of persisting arbitrary strings;
implement the same check for IpxeTemplateCreateInput and the Update input,
and/or add a DB constraint to enforce the same allowed values so only the
accepted scope constants can be stored.

In `@db/pkg/db/model/operatingsystem.go`:
- Around line 157-161: The scope filter in GetAll treats nil and "local" as
equivalent but the SQL emitted only checks "os.scope IN (?)", causing rows with
NULL scope to be excluded; update the query in GetAll that references os.scope
(the block around the os.scope IN (?) predicate) to treat NULL as "local" by
using COALESCE(os.scope, 'local') IN (?) or by expanding the IN list to include
NULL when "local" is requested so migrated rows with NULL scope are returned
correctly.
- Around line 274-276: The reconciliation path that fetches all iPXE Operating
Systems must include soft-deleted rows; update the options passed to the iPXE OS
fetch in the reconciliation routine to set IncludeDeleted: true (the same flag
used in the deletion-detection fetch at the other path). Locate the code that
constructs the Operating System query/options for the iPXE reconciliation (the
call that retrieves "all iPXE Operating Systems" / the OperatingSystem fetch
used to detect deletions) and add IncludeDeleted: true to that options struct
(the same field name IncludeDeleted in the OperatingSystem options/type).

In `@openapi/spec.yaml`:
- Around line 14116-14124: The OpenAPI description for the "scope" property is
incorrect: it says "Only meaningful for iPXE template-based OSes" but the
validation in operatingsystem.go (see oscr, IpxeScript, IpxeTemplateName,
isIpxe, SiteIDs, Scope, OsScopeLimited) treats both ipxeScript and
ipxeTemplateName the same. Update the description string for the scope property
in openapi/spec.yaml to reflect that it applies to all iPXE-based OSes (e.g.,
"Synchronization scope. Only meaningful for iPXE-based OSes") and do not change
the existing type/enum values so they remain consistent with the validation
logic.
- Around line 14309-14317: Update the description for the scope property to
match validation logic and other schemas: in the OpenAPI spec's scope property
(and specifically the CreateRequest schema where the same wording appears)
replace "for iPXE template-based OS" with wording that covers both "script and
template-based iPXE OSes" (e.g., "for script and template-based iPXE OSes") so
the documentation matches validation; ensure the enum and nullable/type
structure for scope remains unchanged.
- Around line 14426-14434: The UpdateRequest schema's scope field is
inconsistent with the other definition: locate the UpdateRequest schema and
update its scope property to match this canonical definition by setting its type
to allow "string" and "null", its description to "Synchronization scope for iPXE
template-based OS", and the enum values to ["local","global","limited"] so both
schema definitions are identical (look for the UpdateRequest -> scope property).

---

Duplicate comments:
In `@api/pkg/api/model/operatingsystem.go`:
- Around line 375-379: The update-path validation in operatingsystem.go
currently only checks pointer nil-ness for osur.IpxeTemplateName and so allows a
non-nil pointer to an empty string; update the validation in the same block that
uses osur.IpxeScript and osur.IpxeTemplateName so it also rejects empty/blank
ipxeTemplateName values (e.g., check if osur.IpxeTemplateName != nil and
strings.TrimSpace(*osur.IpxeTemplateName) == "" and return a validation.Errors
entry for "ipxeTemplateName"); keep the existing mutual-exclusivity check for
osur.IpxeScript and osur.IpxeTemplateName and add the empty-string validation
alongside it.
- Around line 120-126: The validation treats a non-nil ipxeTemplateName pointer
as valid even when empty, letting requests like {"ipxeTemplateName": ""} bypass
image checks; update the logic around isIpxe and the mutually-exclusive check to
require a non-empty template name (i.e., treat ipxeTemplateName as present only
if oscr.IpxeTemplateName != nil && strings.TrimSpace(*oscr.IpxeTemplateName) !=
""), and add a validation.Required-style check for ipxeTemplateName (parallel to
ipxeScript's validation.Required) so empty strings produce a validation error
for "ipxeTemplateName" (also ensure the mutual exclusion branch referencing
ipxeTemplateName uses the same non-empty check).

In `@site-workflow/pkg/activity/ipxetemplate.go`:
- Line 62: Replace the use of context.Background() with the activity context
variable (ctx) when calling mii.config.TemporalPublishClient.ExecuteWorkflow so
the workflow publish honors activity lifecycle; update the call that currently
uses context.Background() to pass ctx and keep the other parameters
(workflowOptions, workflowName, mii.config.SiteID, inventory) unchanged.
- Line 84: The ExecuteWorkflow call currently uses context.Background(); replace
that with the function's incoming context variable (ctx) so the workflow
inherits cancellation and deadlines — update the call to
mii.config.TemporalPublishClient.ExecuteWorkflow(..., ctx, workflowOptions,
workflowName, mii.config.SiteID, inventory) ensuring you reference the existing
ctx variable in the surrounding function where mii is used.

---

Nitpick comments:
In `@api/pkg/api/handler/ipxetemplate.go`:
- Around line 118-144: Extract the duplicated provider/tenant site-access logic
in the handler into a single helper (e.g., ValidateSiteAccess or
CheckSiteAccess) that takes ctx, site, tenant, infrastructureProvider, logger
and h.dbSession and returns (providerAllowed bool, tenantAllowed bool, apiErr
*cerr.APIError) so callers can short-circuit on apiErr; move the
TenantAccountDAO creation and its GetAll call (cdbm.NewTenantAccountDAO and
taDAO.GetAll) into that helper, keep the tenant.Config.TargetedInstanceCreation
warning in the helper, and ensure the helper implements the same mixed-role
decision (providerAllowed || tenantAllowed) so handlers can call it and then
apply their own list/get scope rules.

In `@api/pkg/api/handler/operatingsystem_ownership_test.go`:
- Around line 452-455: Update the dual-role test to assert specific seeded OS
IDs and add a dual-role GetByID for the tenant-owned OS: in the t.Run "dual-role
user sees both provider-owned and tenant-owned OSes" replace the loose len check
on oss from execGetAll(t, env.dualUser) with assertions that the returned slice
contains the expected seeded IDs (e.g., prOS.ID and tnOS.ID) and then call the
GetByID helper (eg. execGetByID or the corresponding method used elsewhere) with
env.dualUser and tnOS.ID to assert the tenant-owned OS can be retrieved; apply
the same change for the corresponding test near lines 538-541 so both list and
single-item retrievals verify the mixed-role union behavior.

In `@openapi/spec.yaml`:
- Around line 14554-14604: Add a required array to the IpxeTemplate schema
listing the fields that are guaranteed in responses (e.g., id, siteId, name,
template, created, updated); update the IpxeTemplate definition to include a
"required" property that references those property names so SDK generators and
consumers know which fields are always present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee542494-0f80-46c7-a5c7-0fab77b2e50a

📥 Commits

Reviewing files that changed from the base of the PR and between ceb45de and d36196b.

⛔ Files ignored due to path filters (2)
  • workflow-schema/schema/site-agent/workflows/v1/common_carbide.pb.go is excluded by !**/*.pb.go
  • workflow-schema/schema/site-agent/workflows/v1/forge_carbide.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (53)
  • api/pkg/api/handler/ipxetemplate.go
  • api/pkg/api/handler/ipxetemplate_test.go
  • api/pkg/api/handler/operatingsystem.go
  • api/pkg/api/handler/operatingsystem_ownership_test.go
  • api/pkg/api/handler/operatingsystem_test.go
  • api/pkg/api/handler/site.go
  • api/pkg/api/handler/site_test.go
  • api/pkg/api/handler/util/common/common.go
  • api/pkg/api/model/ipxetemplate.go
  • api/pkg/api/model/operatingsystem.go
  • api/pkg/api/routes.go
  • api/pkg/api/routes_test.go
  • db/pkg/db/model/ipxetemplate.go
  • db/pkg/db/model/ipxetemplate_test.go
  • db/pkg/db/model/operatingsystem.go
  • db/pkg/db/model/operatingsystem_test.go
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • db/pkg/migrations/20260310100000_os_definition.go
  • db/pkg/migrations/20260320100000_os_scope.go
  • db/pkg/migrations/20260321100000_ossa_core_status.go
  • openapi/spec.yaml
  • sdk/standard/api_ipxe_template.go
  • sdk/standard/api_operating_system.go
  • sdk/standard/client.go
  • sdk/standard/model_ipxe_os_artifact.go
  • sdk/standard/model_ipxe_os_parameter.go
  • sdk/standard/model_ipxe_template.go
  • sdk/standard/model_operating_system.go
  • sdk/standard/model_operating_system_create_request.go
  • sdk/standard/model_operating_system_update_request.go
  • site-agent/pkg/components/managers/ipxetemplate/access.go
  • site-agent/pkg/components/managers/ipxetemplate/cron.go
  • site-agent/pkg/components/managers/ipxetemplate/init.go
  • site-agent/pkg/components/managers/ipxetemplate/publisher.go
  • site-agent/pkg/components/managers/ipxetemplate/subscriber.go
  • site-agent/pkg/components/managers/manager.go
  • site-agent/pkg/components/managers/manageraccess.go
  • site-agent/pkg/components/managers/managerapi/ipxetemplate_api.go
  • site-agent/pkg/components/managers/managerapi/managerapi.go
  • site-agent/pkg/components/managers/operatingsystem/cron.go
  • site-agent/pkg/components/managers/operatingsystem/publisher.go
  • site-agent/pkg/components/managers/operatingsystem/subscriber.go
  • site-agent/pkg/components/managers/workflow/orchestrator.go
  • site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
  • site-workflow/pkg/activity/ipxetemplate.go
  • site-workflow/pkg/activity/ipxetemplate_test.go
  • site-workflow/pkg/activity/operatingsystem.go
  • site-workflow/pkg/activity/operatingsystem_test.go
  • site-workflow/pkg/grpc/client/testing.go
  • site-workflow/pkg/workflow/ipxetemplate.go
  • site-workflow/pkg/workflow/ipxetemplate_test.go
  • site-workflow/pkg/workflow/operatingsystem.go
💤 Files with no reviewable changes (1)
  • api/pkg/api/handler/util/common/common.go
✅ Files skipped from review due to trivial changes (11)
  • site-agent/pkg/components/managers/managerapi/managerapi.go
  • db/pkg/db/model/operatingsystemsiteassociation.go
  • db/pkg/migrations/20260321100000_ossa_core_status.go
  • db/pkg/migrations/20260306120000_ipxe_template.go
  • site-workflow/pkg/activity/ipxetemplate_test.go
  • db/pkg/migrations/20260310100000_os_definition.go
  • db/pkg/db/model/ipxetemplate_test.go
  • api/pkg/api/model/ipxetemplate.go
  • site-workflow/pkg/activity/operatingsystem_test.go
  • api/pkg/api/handler/ipxetemplate_test.go
  • site-workflow/pkg/grpc/client/testing.go
🚧 Files skipped from review as they are similar to previous changes (24)
  • api/pkg/api/routes_test.go
  • site-agent/pkg/components/managers/manageraccess.go
  • site-agent/pkg/datatypes/managertypes/workflow/workflowtypes.go
  • api/pkg/api/routes.go
  • site-agent/pkg/components/managers/ipxetemplate/subscriber.go
  • db/pkg/migrations/20260320100000_os_scope.go
  • sdk/standard/client.go
  • api/pkg/api/handler/site_test.go
  • site-agent/pkg/components/managers/managerapi/ipxetemplate_api.go
  • site-agent/pkg/components/managers/operatingsystem/cron.go
  • site-agent/pkg/components/managers/operatingsystem/publisher.go
  • site-workflow/pkg/workflow/ipxetemplate_test.go
  • site-agent/pkg/components/managers/ipxetemplate/access.go
  • site-agent/pkg/components/managers/ipxetemplate/cron.go
  • api/pkg/api/handler/operatingsystem_test.go
  • site-agent/pkg/components/managers/ipxetemplate/publisher.go
  • sdk/standard/api_operating_system.go
  • site-workflow/pkg/workflow/ipxetemplate.go
  • sdk/standard/model_operating_system.go
  • sdk/standard/model_operating_system_create_request.go
  • site-workflow/pkg/activity/operatingsystem.go
  • sdk/standard/model_operating_system_update_request.go
  • sdk/standard/api_ipxe_template.go
  • db/pkg/db/model/operatingsystem_test.go

@pbreton pbreton requested a review from a team as a code owner April 3, 2026 01:15

// IpxeTemplateUpdateInput are input parameters for the Update method
type IpxeTemplateUpdateInput struct {
ID uuid.UUID
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most update inputs use explicit ID e.g. IpxeTemplateID

if _, ok := sttsmap[site.ID]; !ok {
return cutil.NewAPIErrorResponse(c, http.StatusBadRequest, fmt.Sprintf("Unable to associate Operating System with Site: %s, Tenant does not have access to Site", stID), nil)
}
if site.Config == nil || !site.Config.ImageBasedOperatingSystem {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only applies to Image based OS, but we're also in this conditional for limited templated iPXE OS.

}
} else {
// iPXE OS: push the definition to carbide-core using the same UUID as primary key.
workflowName = "CreateOperatingSystem"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to trigger SynchronizeOperatingSystem workflow that will be triggered asynchronously in cloud namespace, and the workflow would synchronously create/update/delete entries on Site (and update Association status as needed).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Scope bool
}

type OperatingSystemFilterInput struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the ability to filter by iPXE template

@pbreton pbreton force-pushed the feat/ipxe-os branch 6 times, most recently from 5537e9a to 5698895 Compare April 13, 2026 20:35
pbreton added 2 commits April 14, 2026 08:08
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Signed-off-by: Patrice Breton <pbreton@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants