@ajdecon
Contributor

@ajdecon ajdecon commented Aug 6, 2021

Previously we built Slurm with statically linked binaries, but this
did not deliver the advantages we had hoped for. It has also caused
problems with certain Slurm commands such as `sstat` (see #720).

This commit changes the configure flag from `--without-shared-libslurm`
to `--with-shared-libslurm`. After testing, I can confirm that the
`sstat` command appears to work again.
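
One way to check which linking mode a build actually used is to inspect the `ldd` output of a Slurm command. The sketch below is only an illustration: the helper name `links_shared_libslurm` is hypothetical, and accepting either `libslurm.so` or `libslurmfull.so` is an assumption, since the exact library name the commands link against may differ between Slurm versions.

```shell
# Hypothetical helper: reads `ldd` output on stdin and succeeds if a
# libslurm (or libslurmfull) shared object is listed AND resolved to a path.
# A "=> not found" entry does not count as resolved.
links_shared_libslurm() {
    grep -Eq '^[[:space:]]*libslurm(full)?\.so(\.[0-9]+)*[[:space:]]+=>[[:space:]]+/'
}

# Usage sketch (assumes sstat is on PATH; not executed here):
#   if ldd "$(command -v sstat)" | links_shared_libslurm; then
#       echo "sstat is dynamically linked against libslurm"
#   else
#       echo "sstat does not list a resolved libslurm shared object"
#   fi
```

A statically linked build would be expected to take the second branch, because no libslurm shared object appears in the dependency list.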

@ajdecon ajdecon requested a review from dholt August 6, 2021 13:35
@elgalu
Contributor

elgalu commented Aug 7, 2021

Chore: multi-line YAML can ease readability a bit:

slurm_configure: >
  ./configure
    --prefix={{ slurm_install_prefix }}
    --disable-dependency-tracking
    --disable-debug
    --disable-x11
    --enable-really-no-cray
    --enable-salloc-kill-cmd
    --with-hdf5=no
    --sysconfdir={{ slurm_config_dir }}
    --enable-pam
    --with-pam_dir={{ slurm_pam_lib_dir }}
    --with-shared-libslurm
    --without-rpath
    --with-pmix={{ pmix_install_prefix }}
    --with-hwloc={{ hwloc_install_prefix }}

@dholt dholt self-assigned this Aug 11, 2021
@dholt dholt merged commit e606618 into NVIDIA:master Aug 12, 2021
@mathrock74

Sorry to disturb again (I was also involved with #720):
I'm seeing lines like
pam_slurm_adopt[1470]: Unable to dlopen libslurm.so.37.0.0: libslurm.so.37.0.0: cannot open shared object file: No such file or directory
in the worker nodes' syslog. pam_slurm_adopt itself seems to work fine. This happens on Ubuntu 20.04.3.
Is this a problem?

thx

~# ldd /usr/lib/x86_64-linux-gnu/security/pam_slurm_adopt.so
...
libslurm.so.37 => /usr/local/lib/libslurm.so.37 (0x00007f1550e4b000)
...

~# ls -l /usr/local/lib/libslurm.so.37*
lrwxrwxrwx 1 root root 18 Okt 27 21:04 /usr/local/lib/libslurm.so.37 -> libslurm.so.37.0.0
-rwxr-xr-x 1 root root 9409088 Okt 27 21:04 /usr/local/lib/libslurm.so.37.0.0

@itzsimpl
Contributor

On a freshly installed Slurm 21.08.5 on Ubuntu 20.04.4 (via deepops 22.01), I'm also seeing this log message:

pam_slurm_adopt[249857]: Unable to dlopen libslurm.so.37.0.0: libslurm.so.37.0.0: cannot open shared object file: No such file or directory

with an identical situation:

$~# ldd /usr/lib/x86_64-linux-gnu/security/pam_slurm_adopt.so 
        linux-vdso.so.1 (0x00007ffd147c5000)
        libslurm.so.37 => /usr/local/lib/libslurm.so.37 (0x00007f7335874000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7335851000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f733565f000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7335659000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f733550a000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f73354ee000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f7335a5c000)
$ ls -l
-rwxr-xr-x 1 root root      980 Mar 23 16:24 /usr/local/lib/libslurm.la
lrwxrwxrwx 1 root root       18 Mar 23 16:24 /usr/local/lib/libslurm.so -> libslurm.so.37.0.0
lrwxrwxrwx 1 root root       18 Mar 23 16:24 /usr/local/lib/libslurm.so.37 -> libslurm.so.37.0.0
-rwxr-xr-x 1 root root  9414752 Mar 23 16:24 /usr/local/lib/libslurm.so.37.0.0

Any idea what is going on?
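
One possible explanation, not confirmed anywhere in this thread: `ldd` resolves the soname `libslurm.so.37`, while the log message shows a `dlopen` of the full file name `libslurm.so.37.0.0`. The dynamic loader cache is normally keyed by soname, so a lookup by the full versioned name can fail even though the file exists in `/usr/local/lib`. The sketch below checks which exact names the cache knows about; the helper name `cache_has_name` is hypothetical.

```shell
# Hypothetical helper: reads `ldconfig -p` output on stdin and succeeds only
# if the exact library name passed as $1 appears as a cache entry.
# Each cache line looks like: "<name> (<flags>) => <path>", so the name is
# the first whitespace-separated field.
cache_has_name() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage sketch (not executed here):
#   for n in libslurm.so.37 libslurm.so.37.0.0; do
#       if ldconfig -p | cache_has_name "$n"; then
#           echo "$n: in loader cache"
#       else
#           echo "$n: not in loader cache"
#       fi
#   done
```

If only `libslurm.so.37` is reported as cached, that would be consistent with the `ldd` output succeeding while the `dlopen` by full name fails, though other causes are possible.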
