Skip to content

Comments

feat(dpf): provide imperitive sdk to abstract k8s implementation#343

Open
FrankSpitulski wants to merge 1 commit intoNVIDIA:mainfrom
FrankSpitulski:feat/dpf/sdk-beta
Open

feat(dpf): provide imperitive sdk to abstract k8s implementation#343
FrankSpitulski wants to merge 1 commit intoNVIDIA:mainfrom
FrankSpitulski:feat/dpf/sdk-beta

Conversation

@FrankSpitulski
Copy link
Contributor

Description

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

integration with carbide (which is already in a branch) will come as a follow up PR to keep the change list smaller.

@FrankSpitulski FrankSpitulski requested a review from a team as a code owner February 20, 2026 00:58
Signed-off-by: fspitulski <fspitulski@nvidia.com>
}

/// Builder for creating a DPU watcher.
pub struct DpuWatcherBuilder<
Copy link
Contributor

@kensimon kensimon Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of type parameters...

You say above "Purpose is to allow for generic async callbacks without having to box and pin the closure", but is that actually so bad? If we were ok wrapping in a Box::new(move |event| callback(event).boxed()), then Callbacks could just be:

type DpuCallbackFn<T> = Box<dyn Fn(T) -> BoxFuture<'static, Result<(), DpfError>> + Send + Sync>;

struct Callbacks {
    dpu_event: DpuCallbackFn<DpuEvent>,
    reboot: DpuCallbackFn<RebootRequiredEvent>,
    ready: DpuCallbackFn<DpuReadyEvent>,
    maintenance: DpuCallbackFn<MaintenanceEvent>,
    error: DpuCallbackFn<DpuErrorEvent>,
}

And all of these callback type parameters (DE, RB, etc) would disappear.

the methods on DpuWatcherBuilder would just need to box it up with Box::new(move |event| callback(event).boxed()), for example:

    pub fn on_dpu_event<F, Fut>(self, callback: F) -> DpuWatcherBuilder<R>
    where
        F: Fn(DpuEvent) -> Fut + Send + Sync + 'static,
        Fut: Future<Output = Result<(), DpfError>> + Send + 'static,
    {
        DpuWatcherBuilder {
            repo: self.repo,
            namespace: self.namespace,
            cbs: Callbacks {
                dpu_event: Box::new(move |event| callback(event).boxed()),
                reboot: self.cbs.reboot,
                ready: self.cbs.ready,
                maintenance: self.cbs.maintenance,
                error: self.cbs.error,
            },
        }
    }

and the rest of the code would be the same, but we could drop the DPUStateCallback trait, NoopFn, etc... Callbacks would have no type parameters, and DpuWatcherBuilder would only have one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did start off with dyn, boxing, and pinning, but it ended up with another long type. I think the extra characters for the type state builder are worth it and we end up with less pointer indirection at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each of the callbacks still get a distinct type

Copy link
Contributor

@kensimon kensimon Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my suggestion you wouldn't need to box/pin at the actual call site (ie. the code calling .on_dpu_event() doesn't need to box.) Only the body of the on_dpu_event (etc) functions need to add a few extra characters to box it up. I tried it on my machine and everything continues to compile with just this one file changing (29 insertions, 57 deletions.)

Here's a branch if you're curious: https://github.com/FrankSpitulski/bare-metal-manager-core/compare/feat/dpf/sdk-beta...kensimon:bare-metal-manager-core:dpf-beta-boxed-callbacks?expand=1

The pointer indirection is true, but IMO it's less of a big deal than the code bloat (and slightly increased compile time) resulting from all the monomorphizing of each callback type. But to me it's more about having to add a new type parameter every single time you add a new flavor of callbacks, which could lead to dozens of type parameters at some point. (Not that we couldn't just switch to boxing then...)

Anyways, it's not a big deal either way, it's not blocking and it's ok if we disagree :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These callbacks themselves get called in a dyn handler passed to the kube-rs controller, so that extra level felt a little wrong to me. I'm okay to change it, but keeping in mind that we expect a fair few events to be processed. There are multiple DPUs per host. Maybe a tie breaker opinion?


```bash
cd crates/dpf-beta
cargo make generate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we thought of adding kopium to dev-dependencies and doing the codegen as part of build.rs? Then we could .gitignore the whole crds directory... (We do similar things with tonic/prost and our protobuf files.)

I'm just thinking aloud though. Because to do this we'd either need to copy the CRD YAML itself into the repo, or use a submodule. (I'd lean towards putting the YAML in the repo with a make step to sync it from the helm chart, then putting kopium in build.rs, and gitignoring all the rust. But that's just an opinion I don't hold very strongly.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but I'm waiting for official support to be published kube-rs/kopium#335

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also got a PR up to let us reduce the manual string enum definitions. kube-rs/kopium#422

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants