Skip to content

Conversation

@ajdecon
Copy link
Contributor

@ajdecon ajdecon commented Jun 30, 2021

Summary

Previously, the enroot configuration directory on a node was allocated on a per-group basis according to the user's primary group. However, the permissions are not set to be group readable as per #969.

From a permissions standpoint, we probably don't want the enroot cache directory to be group-readable by default. Users may have access to different sets of private container images, so we should default to making these permissions restrictive.

This PR therefore switches Enroot to use a per-user cache directory by default.

Test plan

Set up a test Slurm cluster with this configuration, and created two test users (user1 and user2) with the same primary group.

Started an enroot job as user1:

user1@virtual-login01:~$ id
uid=20001(user1) gid=100(users) groups=100(users),27(sudo)
user1@virtual-login01:~$ srun -n1 --container-image="centos" --pty bash
pyxis: importing docker image ...
root@virtual-gpu01:/#

Then started a simultaneous job on the same host as user2 and confirmed this works (addressing the error condition in #969):

user2@virtual-login01:~$ id
uid=20002(user2) gid=100(users) groups=100(users),27(sudo)
user2@virtual-login01:~$ srun -n1 -w virtual-gpu01 --container-image="centos" --pty bash
pyxis: importing docker image ...
root@virtual-gpu01:/#

Checked the cache directories on virtual-gpu01 and verified two cache directories exist:

vagrant@virtual-gpu01:~$ ls /var/lib/enroot-cache/
user-20001  user-20002

@ajdecon ajdecon requested a review from dholt June 30, 2021 17:08
@dholt
Copy link
Contributor

dholt commented Jun 30, 2021

I'm a little conflicted on this. The point was to re-use cached containers among many users. If users want their containers to remain private, can't they change the cache directory themselves to point to another location?

@ajdecon
Copy link
Contributor Author

ajdecon commented Jun 30, 2021

@dholt : I went back and forth on this to be honest, but the enroot configuration isn't well-understood by admins in most cases, let alone by regular users. I don't think expecting users to set their own private cache dirs is reasonable (unless we somehow expose this more prominently).

I generally favor "failing closed" rather than "failing open" which is why I made this change, but there's definitely a trade-off.

The cache should still be useful for users who use the same containers in subsequent jobs.

@dholt
Copy link
Contributor

dholt commented Jun 30, 2021

Fair enough.

Probably also need to change permissions here:

@ajdecon ajdecon merged commit 2ff8236 into NVIDIA:master Jun 30, 2021
@ajdecon ajdecon deleted the enroot-cache-dir-user branch June 30, 2021 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants