
[chassis]: Add chassis provisioning HLD #2252

Open
liamkearney-msft wants to merge 1 commit into sonic-net:master from liamkearney-msft:liam/chassis-autoprovision

@liamkearney-msft commented Mar 4, 2026

Add an HLD for automatic module provisioning.
It introduces a new platform API, new module operational states, and a new pmon daemon to facilitate this within the SONiC layer.

Rendered .md link: https://github.com/liamkearney-msft/SONiC/blob/liam/chassis-autoprovision/doc/chassis/module-provisioning/chassis-linecard-provisioning-hld.md

@mssonicbld
Collaborator

/azp run

@azure-pipelines

No pipelines are associated with this pull request.

Signed-off-by: Liam Kearney <liamkearney@microsoft.com>
@Javier-Tan left a comment

LGTM



## New pmon daemon - sonic-provisiond

Is it necessary to have a separate daemon for this? Can this functionality not be folded into chassisd?

Author

I decided to go this way after a discussion with @arlakshm.
There are a few upsides to having it separate. As a separate daemon, the feature becomes trivial to disable. It also avoids scope creep in chassisd and lets it keep its role of syncing module state with STATE_DB. Finally, it simplifies keeping the state updated while a conversion may be running: provisiond can block in provision_module() without blocking the chassisd thread (and without spinning up new threads in chassisd).

So yes, it could be folded in, but keeping it separate keeps chassisd focused and is more modular. Since we are going through STATE_DB anyway, there is no real need to couple it tightly with chassisd.
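To illustrate the blocking argument, here is a minimal sketch of one polling pass of a provisiond-style loop. It assumes module objects expose `get_name()`, `get_oper_status()` and the proposed `provision_module()`; `FakeModule` and `provision_pass` are hypothetical names, and for simplicity the sketch reads status directly from the module objects rather than through STATE_DB as the HLD describes.

```python
import time

# "Online" mirrors an existing module status; "ProvisionReady" is the state
# proposed in this HLD.
MODULE_STATUS_ONLINE = "Online"
MODULE_STATUS_PROVISION_READY = "ProvisionReady"


class FakeModule:
    """Hypothetical stand-in for a vendor module object."""

    def __init__(self, name, status):
        self.name = name
        self.status = status
        self.provision_calls = 0

    def get_name(self):
        return self.name

    def get_oper_status(self):
        return self.status

    def provision_module(self):
        # Vendor conversion may block for a long time; simulated with a short sleep.
        self.provision_calls += 1
        time.sleep(0.01)
        self.status = MODULE_STATUS_ONLINE
        return True


def provision_pass(modules):
    """One polling pass of a hypothetical provisiond main loop.

    Calls provision_module() on every module reporting ProvisionReady. Because
    this runs in its own daemon, blocking here does not stall chassisd.
    Returns {module name: provision_module() result}.
    """
    results = {}
    for module in modules:
        if module.get_oper_status() == MODULE_STATUS_PROVISION_READY:
            results[module.get_name()] = module.provision_module()
    return results


modules = [FakeModule("LINE-CARD0", "Online"), FakeModule("LINE-CARD1", "ProvisionReady")]
print(provision_pass(modules))  # {'LINE-CARD1': True}
```

Since the state update happens inside the vendor call, a second pass finds nothing left to do; a real daemon would also need to handle failures and timeouts, which this sketch leaves out.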

@liamkearney-msft
Author

Hi reviewers, the PR for sonic-platform-common with the API stubs and new states can be found here: sonic-net/sonic-platform-common#635
cc @patrickmacarthur @kenneth-arista @arlakshm

```python
# Module state when module is detected, is able to run SONiC, but is not yet running SONiC.
# Modules in this state will be attempted to be converted to SONiC via calls to module.provision_module()
# This state & following "Provision" states should not be used if provision_module() is not implemented.
MODULE_STATUS_PROVISION_READY = "ProvisionReady"
```
Contributor

How is the vendor supposed to differentiate between MODULE_STATUS_PROVISION_PRESENT and MODULE_STATUS_PROVISION_READY?
You cannot easily differentiate between a linecard that has been powered off and a linecard that was just inserted.
The mechanisms I can think of would probably be better placed in the common infrastructure than in the platform vendor API.

Author

They are very similar states, but PRESENT indicates that the platform API does not support module provisioning, or that the module isn't ready for provisioning. Having a separate state makes the provisioning flow opt-in.
Module detection is fundamentally platform-specific, so what would the common mechanism for detecting this be? All vendors would have to agree if we want to move this logic into the SONiC layer, and my personal opinion is that SONiC shouldn't mandate the implementation details.
I don't see why it would be difficult to differentiate between a linecard that is powered off and one that was just inserted. I would expect the platform to be able to manage and monitor the power states of the modules, with "presence" decoupled from power state.
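A minimal sketch of how a vendor might report these states under the opt-in model described above, assuming presence is tracked independently of power state. `VendorModule` and its boolean inputs are hypothetical; a real platform would derive them from hardware. "Empty", "Present" and "Online" mirror existing module statuses, and "ProvisionReady" is the proposed one.

```python
MODULE_STATUS_EMPTY = "Empty"
MODULE_STATUS_PRESENT = "Present"
MODULE_STATUS_ONLINE = "Online"
MODULE_STATUS_PROVISION_READY = "ProvisionReady"


class VendorModule:
    """Hypothetical vendor module; the flags stand in for platform-specific probes."""

    def __init__(self, present, runs_sonic, sonic_capable, supports_provisioning):
        self.present = present                    # physical presence, independent of power
        self.runs_sonic = runs_sonic              # already booted into SONiC
        self.sonic_capable = sonic_capable        # hardware can run SONiC
        self.supports_provisioning = supports_provisioning  # platform implements provision_module()

    def get_oper_status(self):
        if not self.present:
            return MODULE_STATUS_EMPTY
        if self.runs_sonic:
            return MODULE_STATUS_ONLINE
        if self.supports_provisioning and self.sonic_capable:
            # Opt-in: only platforms implementing provision_module() report this.
            return MODULE_STATUS_PROVISION_READY
        # Without provisioning support the module stays in the existing Present state.
        return MODULE_STATUS_PRESENT
```

In this sketch a powered-off but inserted card still reports a non-Empty status, because presence is not derived from power.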

@liamkearney-msft
Author

Thanks for your comments @Staphylo. I've added some responses to your questions. Let me know if these answer your concerns, and I can update the HLD to make it clearer.

## Requirements
- When new linecards are inserted into a chassis, the supervisor card running SONiC is responsible for detecting the presence of these new modules. It is up to the vendors to implement a mechanism to detect this.
- A new platform API will be introduced as an entry point for vendor code to perform conversion on a module
- New module states will be introduced to represent the various provision states of a module
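The requirements above introduce a new platform API as the vendor entry point. A minimal sketch of what such a stub could look like, plus one possible way for a daemon to check whether a platform has opted in. The zero-argument `provision_module()` returning a bool is an assumption (the actual stub is in sonic-platform-common#635), and `ModuleBaseStub`, `VendorLinecard` and `supports_provisioning` are hypothetical names.

```python
class ModuleBaseStub:
    """Trimmed, hypothetical stand-in for the platform module base class."""

    def provision_module(self):
        """Convert this module to run SONiC (assumed signature).

        Returns True on success. Platforms that do not override this should
        never report a "Provision" state.
        """
        raise NotImplementedError


class VendorLinecard(ModuleBaseStub):
    """Hypothetical vendor subclass that opts in to provisioning."""

    def provision_module(self):
        # Vendor-specific conversion would happen here.
        return True


def supports_provisioning(module):
    """True if the module's class overrides provision_module()."""
    return type(module).provision_module is not ModuleBaseStub.provision_module
```

Comparing the class attribute against the base implementation avoids calling `provision_module()` just to discover a `NotImplementedError`.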

Rather than implement this as new module states, could we introduce a new table in CHASSIS_STATE_DB called something like MODULE_PROVISION_TABLE that would look something like:

```
MODULE_PROVISION_TABLE|LINE-CARD4
{
    "state": "not_started", # or "started" or "complete"
    "timestamp": <unix timestamp>
}
```

It feels like either chassisd or the new provisiond service should be able to track whether or not provisioning has been started, instead of putting this burden onto the vendor platform driver.

In this model, chassisd would be responsible for keeping a persistent database of provisioned linecards. When a linecard slot goes from Empty to Present or Online, it would determine whether the linecard was previously provisioned in that slot, and if not, it would create the MODULE_PROVISION_TABLE entry, which would be detected by provisiond.

On first boot, the linecard would look for and update its MODULE_PROVISION_TABLE entry to "complete".
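A minimal sketch of the alternative proposed above, to make the state transitions concrete. A plain dict stands in for CHASSIS_STATE_DB and a set stands in for chassisd's persistent record of provisioned slots; `ProvisionTracker` and its methods are hypothetical and not part of the HLD.

```python
import time

TABLE = "MODULE_PROVISION_TABLE"


class ProvisionTracker:
    """Hypothetical sketch of the MODULE_PROVISION_TABLE alternative."""

    def __init__(self):
        self.provisioned = set()  # stands in for chassisd's persistent record
        self.db = {}              # stands in for CHASSIS_STATE_DB

    def _key(self, module):
        return f"{TABLE}|{module}"

    def on_status_change(self, module, old, new):
        """chassisd side: create an entry when an unprovisioned slot is populated."""
        if old != "Empty" or new not in ("Present", "Online"):
            return
        if module in self.provisioned:
            return
        self.db[self._key(module)] = {"state": "not_started", "timestamp": int(time.time())}

    def pending(self):
        """provisiond side: modules whose provisioning has not started yet."""
        prefix = TABLE + "|"
        return [k[len(prefix):] for k, v in self.db.items() if v["state"] == "not_started"]

    def set_state(self, module, state):
        """provisiond writes "started"; the linecard writes "complete" on first boot."""
        entry = self.db[self._key(module)]
        entry["state"] = state
        entry["timestamp"] = int(time.time())
        if state == "complete":
            # chassisd records the slot so re-insertion does not re-provision it.
            self.provisioned.add(module)
```

Here the provisioning history lives in SONiC rather than in the vendor driver, which is the trade-off the comment raises.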


Discussed offline.

@liamkearney-msft
Author

@kenneth-arista @patrickmacarthur @Staphylo
If we are happy with the design as is, can I get some approvals so we can merge?
thanks!
