Skip to content

Update LinuxIntelRdt to align with runtime-spec#3523

Open
tommady wants to merge 7 commits into
youki-dev:mainfrom
tommady:close-issue-3405
Open

Update LinuxIntelRdt to align with runtime-spec#3523
tommady wants to merge 7 commits into
youki-dev:mainfrom
tommady:close-issue-3405

Conversation

@tommady
Copy link
Copy Markdown
Collaborator

@tommady tommady commented May 3, 2026

Description

update the RDT (Resource DesAllocation Technology) implementation to align with runtime-spec v1.3.0

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test updates
  • CI/CD related changes
  • Other (please describe):

Testing

  • Added new unit tests
  • Added new integration tests
  • Ran existing test suite
  • Tested manually (please provide steps)

Related Issues

Fixes #3405

Additional Context

Signed-off-by: tommady <tommady@users.noreply.github.com>
tommady added 5 commits May 3, 2026 05:27
Signed-off-by: tommady <tommady@users.noreply.github.com>
Signed-off-by: tommady <tommady@users.noreply.github.com>
Signed-off-by: tommady <tommady@users.noreply.github.com>
…sctrl_subdirectory

Signed-off-by: tommady <tommady@users.noreply.github.com>
@tommady tommady changed the title add runtime-spec v1.3.0 new arch as UnsupportedArch Update LinuxIntelRdt to align with runtime-spec May 10, 2026
Signed-off-by: tommady <tommady@users.noreply.github.com>
@tommady tommady marked this pull request as ready for review May 12, 2026 02:06
@tommady
Copy link
Copy Markdown
Collaborator Author

tommady commented May 12, 2026

Hi @nayuta723
Please help review while you have time.
Thank you.

@nayuta723
Copy link
Copy Markdown
Contributor

@tommady
Thanks! I have some question for that.

  • Would it be possible to release oci-spec-rs before raising this PR? There are some changes that are not related to intelRdt.
  • Could you possibly share the execution results from a real environment?

@tommady
Copy link
Copy Markdown
Collaborator Author

tommady commented May 16, 2026

Hey @nayuta723
Thanks for the review.

  • Would it be possible to release oci-spec-rs before raising this PR? There are some changes that are not related to intelRdt.

Yes, absolutely. We should definitely get the oci-spec-rs changes released first since this PR relies on the updated struct definitions.
I can hold off on merging this until that crate is published.

  • Could you possibly share the execution results from a real environment?

I am sorry, I don't have L3 CAT resctrl supported hardware right now.
I can get some temporally (negotiating), ideally I'll test this under Intel, Amd and arch64 (MPAM) platforms.

Copy link
Copy Markdown
Contributor

@nayuta723 nayuta723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the scope of this PR seems quite broad, I’ve left comments on the high-level points for now. For intelRdt, even excluding enable_monitoring, we still need changes to align with v1.3.0 and additional tests. This will likely make the PR quite large, so I think it might be a good idea to split it.

Also, as an unreleased side effect, some changes to seccomp have slipped in. It might be better to cut a release before we proceed with reviewing this PR.

Lastly, since you mentioned that real-world testing hasn't been done yet, I'd like to see more rigorous tests added to the PR context. Also, it looks like the tests are being skipped in CI as well, so we might need to start by setting up a working environment. Could you check how runc handles this to see what they do?

Ok(compare_lines(&combined, &existing))
}

/// Retrieves the schemata data to be written to the resctrl filesystem.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both l3CacheSchema and memBwSchema are set, runtimes MUST write the values to the schemata file in that sub-directory discussed in closID. The runtimes MUST write l3CacheSchema first and memBwSchema last.
If either l3CacheSchema or memBwSchema is set, runtimes MUST write the value to the schemata file in the that sub-directory discussed in closID.
If schemata field is set, runtimes MUST write the value to the schemata file in the that sub-directory discussed in closID. If also l3CacheSchema or memBwSchema is set the value of schemata field must be written last, after the values from l3CacheSchema and memBwSchema has been written.

Please fix this, as it behaves differently from what is described above.

Copy link
Copy Markdown
Collaborator Author

@tommady tommady May 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!
I misunderstood the specification regarding how the legacy fields should interact with the new schemata list.

Currently, get_schemata_data returns early if the schemata list is present, completely ignoring l3CacheSchema and memBwSchema.

I will update the logic to combine all of them in the strict order mandated by the spec (L3 first, then MB, and finally the generic schemata list elements).

https://specs.opencontainers.org/runtime-spec/config-linux/?v=v1.3.0

If schemata field is set, runtimes MUST write the value to the schemata file in the that sub-directory discussed in closID. 
If also l3CacheSchema or memBwSchema is set the value of schemata field must be written last, after the values from l3CacheSchema and memBwSchema has been written.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configuration is out of scope for this PR, so I'd like to handle it in a separate PR.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I will revert the changes.

pid_file: self.pid_file.to_owned(),
};

let (init_pid, need_to_clean_up_intel_rdt_dir) =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind changing the return value here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My motivation for changing the return value to pass intel_rdt_dir and intel_rdt_monitoring_dir and storing these paths directly in state.json was to closely mirror how runc implements this.

runc explicitly stores both IntelRdtPath and IntelRdtMonPath directly in its container state object, and relies on those saved paths when it needs to clean up the directories during teardown.

You can see runc storing the paths in the state here:
https://github.com/opencontainers/runc/blob/main/libcontainer/container_linux.go#L79

}

if let Some(path) = container.intel_rdt_dir() {
if let Err(e) = delete_resctrl_subdirectory(path) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it throws an error when the file doesn't exist, it might be better not to return an error for ENOENT.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the logic to handle the ErrorKind::NotFound case gracefully so that if the directory is already gone, it silently succeeds instead of throwing an error during teardown.

/// This function includes a safety check (`parent == dir`) to ensure that only directories
/// immediately under the resctrl mount point are deleted, preventing accidental deletion
/// of nested kernel files or other sensitive directories.
pub fn delete_resctrl_subdirectory(path: &Path) -> Result<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be a strong motivation to switch to path here.

Suggested change
pub fn delete_resctrl_subdirectory(path: &Path) -> Result<()> {
pub fn delete_resctrl_subdirectory(id: &str) -> Result<()> {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the above.
#3523 (comment)

runc explicitly stores both the IntelRdtPath and IntelRdtMonPath directly in its linuxContainer state structure.

Reconstructing the exact paths dynamically during container deletion is complex (it requires re-parsing the spec for closID, container_id, and enable_monitoring), and storing the literal paths ensures the teardown phase always knows exactly which directories it created and needs to remove, matching the reference implementation's behavior.

pub use_systemd: bool,
// Specifies if the Intel RDT subdirectory needs be cleaned up.
pub clean_up_intel_rdt_subdirectory: Option<bool>,
// Specifies the Intel RDT subdirectory path that needs to be cleaned up.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here implies that this path exists when a deletion is required. Could you rename it to something that makes this intent clearer? Also, what was the motivation behind using a path instead of a bool as the state's responsibility?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer the second part first regarding the motivation for using a path instead of a boolean flag:

This was done to match runc's approach. runc stores the explicit filesystem paths (IntelRdtPath and IntelRdtMonPath) directly in its linuxContainer state structure.

Since youki aims for compatibility, saving the explicit paths avoids complex dynamic reconstruction of those paths during container deletion (which requires re-evaluating the spec, closID, etc.).

Regarding the naming: I just followed how runc named them (intel_rdt_path and intel_rdt_mon_path).

}
}

let mut config_opt = None;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind changing config_opt? Personally, I don't see the necessity for this change.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation behind introducing config_opt was simply to decouple the Intel RDT cleanup logic from the YoukiConfig processing block.

I wanted to group the RDT teardown operations clearly before the cgroup and hooks teardown, similar to runc's teardown order.

@@ -33,16 +35,46 @@ fn create_spec(

pub fn test_intel_rdt() -> TestResult {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the scope of this PR is strictly for v1.3.0 support, could you please add a note to the PR context describing the other behavioral changes (outside of enable_monitoring) and what the ideal behavior for v1.3.0 should look like?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code, it seems necessary to include support for schemata.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do!

@@ -10,11 +10,13 @@ fn create_spec(
maybe_l3_cache: Option<&str>,
maybe_mem_bw: Option<&str>,
maybe_clos_id: Option<&str>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding maybe_clos_id: Option<&str>, do we actually need this arg in create_spec under this context?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I will remove the maybe_clos_id argument from create_spec to simplify the test setup.

@nayuta723
Copy link
Copy Markdown
Contributor

nayuta723 commented May 17, 2026

@utam0k
I’d like to release oci-spec-rs version v0.10.0. Are there any blocking issues we need to address before that?

@utam0k
Copy link
Copy Markdown
Member

utam0k commented May 17, 2026

@utam0k I’d like to release oci-spec-rs version v0.10.0. Are there any blocking issues we need to address before that?

There are no blockers at the moment. Could you please open the PR for the new release?

@tommady
Copy link
Copy Markdown
Collaborator Author

tommady commented May 17, 2026

Since the scope of this PR seems quite broad, I’ve left comments on the high-level points for now. For intelRdt, even excluding enable_monitoring, we still need changes to align with v1.3.0 and additional tests. This will likely make the PR quite large, so I think it might be a good idea to split it.

Also, as an unreleased side effect, some changes to seccomp have slipped in. It might be better to cut a release before we proceed with reviewing this PR.

Lastly, since you mentioned that real-world testing hasn't been done yet, I'd like to see more rigorous tests added to the PR context. Also, it looks like the tests are being skipped in CI as well, so we might need to start by setting up a working environment. Could you check how runc handles this to see what they do?

I understand the concern regarding the scope of the PR and the changes to ContainerState.

To address this and ensure existing deployments are unaffected, I will put all the new OCI v1.3.0 intel_rdt features (including enable_monitoring, the schemata list handling, and the path caching in ContainerState which aligns us with runc) behind a new Cargo feature flag (e.g., OCI_v1_3_0). WDYT?

Regarding the real-world testing and CI environment, I investigated how runc handles this since it's the reference implementation.

It turns out that runc intentionally does not run any Intel RDT integration tests in CI or against a real kernel (you can check their tests/integration/ folder, there are zero RDT tests).

Instead, runc solely relies on unit tests that mock the /sys/fs/resctrl filesystem by creating a temporary directory (t.TempDir()) and writing plain text files to simulate the kernel interface (e.g., in libcontainer/intelrdt/util_test.go).

This is because the Linux kernel strictly enforces physical hardware capabilities (via MSRs) when interacting with the real /sys/fs/resctrl.

If the host doesn't physically have Xeon L3 CAT or MBA hardware (which GitHub Actions VMs and most consumer machines do not), the kernel outright rejects the hardcoded L3 or MB schema writes, causing integration tests to fail.

That being said, I completely agree we need real-world validation for a change this large.

Once the review is complete and the test cases and code changes are all approved, I am willing to manually provision a supported bare-metal instance

such as a GCP Intel C3, AMD C3D, or Google Axion ARM instance

to prove the implementation works end-to-end against a real kernel.
I will post the successful test output here before we merge.

Alternatively, if you agree that we should follow runc's approach of mocking the filesystem entirely for unit testing instead of relying on integration tests, I am happy to implement that as well ( to save my money lol ).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update LinuxIntelRdt to align with runtime-spec

3 participants