dont ignore failure to create cgroup after timeout#349
Merged
fuweid merged 1 commit intocontainerd:mainfrom Nov 5, 2024
Merged
dont ignore failure to create cgroup after timeout#349fuweid merged 1 commit intocontainerd:mainfrom
fuweid merged 1 commit intocontainerd:mainfrom
Conversation
c984793 to
8ac0138
Compare
Member
|
Looks good, but can you squash the commits and write the same blurb you have in the PR description in the commit message? Thanks! |
8ac0138 to
d919943
Compare
Contributor
Author
Done! |
dcantah
approved these changes
Sep 25, 2024
dcantah
reviewed
Sep 25, 2024
cgroup2/manager.go
Outdated
| log.G(ctx).Warnf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus. Continuing...", group) | ||
| case <-time.After(systemdStartUnitTimeout): | ||
| attemptFailedUnitReset(conn, group) | ||
| return fmt.Errorf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus after %v", group, systemdStartUnitTimeout) |
Member
There was a problem hiding this comment.
Actually, one nit: Make Timed lowercase to appease the Golang idiom gods.
Before this commit, creating a cgroup would silently ignore timeouts and carry on. Concretely, this caused cases where a cgroup failed to create, but the caller doesn't realize and ends up looking for files that should exist (e.g. cgroups.controllers), only to find they don't exist. It's very difficult as a caller to deal with this case, where NewSystemd succeeds but the group doesn't exist. The origins of this code seem to trace back to an initial implementation written 5+ years ago: containerd@5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672 runc added roughly the same logic here to deal with the same issue: opencontainers/runc#3782 Now, containerd will also error if a cgroup cannot be created within the timeout window. Signed-off-by: Josh Chorlton <jchorlton@gmail.com>
d919943 to
2e25118
Compare
dcantah
approved these changes
Sep 25, 2024
Contributor
Author
|
@dcantah what's the path to getting this merged? |
Member
|
@jchorl Just needs another approval, I'll shop this around to folks |
fuweid
approved these changes
Nov 5, 2024
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I noticed that creating a cgroup will silently ignore timeouts and continue on. Concretely, I've hit cases where a cgroup fails to get created, and the caller ends up looking for
cgroups.controllersonly to find it isn't there.It's very difficult as a caller to deal with this case, where
NewSystemdsucceeds but the group doesn't exist.I traced the origins of this code to 5efa14e#diff-3331981e4ac06a8d9b06e91842b7f2759c7af3b65287e489a88385948d311ebdR672 - which was written 5+ years ago.
runcadded roughly the same logic here: opencontainers/runc#3782