Ah. I suspect your issue may be CUDA 10.1, whose packages do not create/register all the appropriate symlinks and "provides".
I ran into that trying to install tensorflow.

If you can, downgrade to 10.0, which does a better job of installing itself.
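
One way to check this, sketched under assumptions: rpm resolves dependencies against the "provides" recorded in its own database, not against the ldconfig cache, so a library can be visible to the loader and still be unknown to rpm. The capability string below is copied from the error in this thread; the helper name provider_of is made up for illustration.

```shell
#!/bin/sh
# Capability string taken verbatim from the rpm error in this thread.
CAP='libnvidia-ml.so.1()(64bit)'

# Print the package(s) that claim to provide a capability, or "none".
# provider_of is a hypothetical helper, not part of rpm.
provider_of() {
    if out=$(rpm -q --whatprovides "$1" 2>/dev/null); then
        printf '%s\n' "$out"
    else
        printf 'none\n'
    fi
}

# Only query when rpm is actually available on this machine.
if command -v rpm >/dev/null 2>&1; then
    echo "RPM provider of $CAP: $(provider_of "$CAP")"
    # Which package (if any) owns the file that ldconfig reported.
    rpm -qf /usr/lib64/libnvidia-ml.so.1 || true
fi
```

If the first query reports "none" while the file exists on disk, the library was installed without the matching "provides" being registered, which would be consistent with the failed dependency.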

Brian

On 8/16/2019 5:47 AM, Lou Nicotra wrote:
Brian, the package is being built and installed on the master server.  I am testing by removing all instances of V18 and installing the newly created V19 slurm rpms. I get the error message on the slurm rpm install; all others (ctl, db, ...) install fine.

After I get the error message, I remove all rpms from V19 and reinstall V18 using the same procedure with no issues... And the system sees all nodes as it did before trying to install V19

The nvidia libraries are installed via the official Nvidia rpm... cuda-repo-rhel7-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm supporting cuda10. Multi GPU server currently used by multiple users (DNN training) with no errors of any type while utilizing the nvidia libs/code.

nvidia-smi command shows:  NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1

So, it is definitely something new to the V19 release... I have installed 18.08.0, .3, .4 and .8 on the same server and nodes since Sep of 2018 using the same procedures and never had any issues... Currently running 18.08.8

Thanks.
Lou

On Thu, Aug 15, 2019 at 3:07 PM Brian Andrus <toomuc...@gmail.com> wrote:

    Lou,

    Are you installing on the same machine you built?

    Are the nvidia libraries installed by RPM or a 'make install' on
    the box you compiled it on?

    Brian Andrus

    On 8/15/2019 7:53 AM, Lou Nicotra wrote:
    I have tried running ldconfig manually as suggested with 
    slurm-19.05.1-2 and it fails the same way...
    error: Failed dependencies:
            libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64

    ldconfig -p shows:
    root@panther02 slurm# ldconfig -p|grep libnvidia-ml.
            libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1
            libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
            libnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
            libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so

    Just tried the latest release slurm-19.05.2 and it fails in the
    same way...
    root@panther02 x86_64# rpm -Uvh slurm-19.05.2-1.el7.centos.x86_64.rpm
    error: Failed dependencies:
            libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.2-1.el7.centos.x86_64
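
Since ldconfig finds the library but rpm still refuses, it can help to list exactly which capabilities rpm considers missing, without changing the system. A minimal sketch, assuming the package file name from this thread sits in the current directory; missing_caps is a hypothetical helper, and --test asks rpm to check the transaction without installing anything.

```shell
#!/bin/sh
# Package file name as used earlier in this thread.
PKG=slurm-19.05.2-1.el7.centos.x86_64.rpm

# Read rpm's "Failed dependencies" text on stdin and print only the
# capability names (first field of each "... is needed by ..." line).
missing_caps() {
    awk '/ is needed by/ { print $1 }'
}

# Only run against rpm when it exists and the package file is readable.
if command -v rpm >/dev/null 2>&1 && [ -r "$PKG" ]; then
    # Dry run: report unmet dependencies but install nothing.
    rpm -Uvh --test "$PKG" 2>&1 | missing_caps
    # Show the NVIDIA-related requirements recorded inside the package.
    rpm -qp --requires "$PKG" | grep -i nvidia \
        || echo "no nvidia requirement recorded"
fi
```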

    Reinstalled slurm-18.08.8 and it installs with no issues... Just
    like slurm-18.08.03 and slurm-18.08.4 did...  All built on the
    same machine with rpmbuild -ta command...
    root@panther02 slurm-18.08.8# rpm -Uvh slurm-18.08.8-1.el7.centos.x86_64.rpm
    Preparing...  ################################# [100%]
    Updating / installing...
       1:slurm-18.08.8-1.el7.centos ################################# [100%]

    Oh, well...

    Lou



    On Mon, Aug 12, 2019 at 1:32 AM Barbara Krašovec
    <barbara.kraso...@ijs.si> wrote:

        What if you try to run ldconfig manually before building the rpm?

        Cheers,

        Barbara

        On 8/8/19 5:57 PM, Lou Nicotra wrote:
        I am running into an error while trying to
        install slurm-19.05.1-2.el7.centos.x86_64... Error is as
        follows:
        root@panther02 x86_64# rpm -Uvh slurm-19.05.1-2.el7.centos.x86_64.rpm
        error: Failed dependencies:
                libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64

        Packages are built using rpmbuild... And complete with no
        errors...
        + cd /root/rpmbuild/BUILD
        + cd slurm-19.05.1-2
        + rm -rf /root/rpmbuild/BUILDROOT/slurm-19.05.1-2.el7.centos.x86_64
        + exit 0

        Investigation of the output while building the rpm package
        shows that nvidia-ml is found:
        checking for nvmlInit in -lnvidia-ml... yes
        .
        .
        libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
        -I../../../../slurm -I../../../.. -I../../../../src/common
        -I/usr/local/cuda/include -I/usr/cuda/include
        -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
        -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
        -fstack-protector-strong --param=ssp-buffer-size=4
        -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
        -Wall -g -O1 -fno-strict-aliasing -c gpu_nvml.c  -fPIC -DPIC
        -o .libs/gpu_nvml.o
        libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
        -I../../../../slurm -I../../../.. -I../../../../src/common
        -I/usr/local/cuda/include -I/usr/cuda/include
        -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
        -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
        -fstack-protector-strong --param=ssp-buffer-size=4
        -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
        -Wall -g -O1 -fno-strict-aliasing -c gpu_nvml.c -o
        gpu_nvml.o >/dev/null 2>&1
        /bin/sh ../../../../libtool  --tag=CC --mode=link gcc
         -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
        -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
        -fstack-protector-strong --param=ssp-buffer-size=4
        -grecord-gcc-switches   -m64 -mtune=generic -pthread -ggdb3
        -Wall -g -O1 -fno-strict-aliasing -module -avoid-version
        --export-dynamic -Wl,-z,relro   -o gpu_nvml.la
        -rpath /usr/lib64/slurm gpu_nvml.lo
        -lnvidia-ml
        libtool: link: gcc -shared  -fPIC -DPIC  .libs/gpu_nvml.o
        -lnvidia-ml -O2 -g -fstack-protector-strong
        -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3 -g
        -O1 -Wl,-z -Wl,relro   -pthread -Wl,-soname -Wl,gpu_nvml.so
        -o .libs/gpu_nvml.so

        The Makefile in /root/rpmbuild/BUILD/slurm-19.05.1-2/src
        includes NVML_LIBS = -lnvidia-ml, but previous releases
        (slurm-18.08.8) did not, and I was able to compile and install
        that release with no issues after building it with rpmbuild...
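
For context, rpmbuild normally generates a package's library requirements automatically from the DT_NEEDED entries of the ELF files it packages, so linking gpu_nvml.so with -lnvidia-ml would be expected to surface as a libnvidia-ml.so.1()(64bit) requirement on the slurm rpm. A rough sketch for confirming that on a built plugin; the plugin path is an assumption based on the -rpath in the link line above, and needed_libs is a hypothetical helper.

```shell
#!/bin/sh
# Assumed install location, inferred from "-rpath /usr/lib64/slurm".
PLUGIN=/usr/lib64/slurm/gpu_nvml.so

# Read "readelf -d" output on stdin and print the shared libraries
# listed as NEEDED, one per line.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\([^]]*\)].*/\1/p'
}

# Only inspect the plugin when readelf exists and the file is readable.
if command -v readelf >/dev/null 2>&1 && [ -r "$PLUGIN" ]; then
    readelf -d "$PLUGIN" | needed_libs
fi
```

If libnvidia-ml.so.1 appears in that list, the new requirement comes from the plugin itself rather than from anything wrong in the build.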

        My LD_LIBRARY_PATH is
        /usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:/var/local/miniconda2/lib/:

        Can anyone provide suggestions on working out this issue?

        Thanks.
         --

        LOU NICOTRA

        IT Systems Engineer - SLT

        Interactions LLC

        o: 908-673-1833 <tel:781-405-5114>

        m: 908-451-6983 <tel:781-405-5114>

        _lnico...@interactions.com <mailto:lnico...@interactions.com>_

        www.interactions.com <http://www.interactions.com/>

        
*******************************************************************************

        This e-mail and any of its attachments may contain
        Interactions LLC proprietary information, which is
        privileged, confidential, or subject to copyright belonging
        to the Interactions LLC. This e-mail is intended solely for
        the use of the individual or entity to which it is
        addressed. If you are not the intended recipient of this
        e-mail, you are hereby notified that any dissemination,
        distribution, copying, or action taken in relation to the
        contents of and attachments to this e-mail is strictly
        prohibited and may be unlawful. If you have received this
        e-mail in error, please notify the sender immediately and
        permanently delete the original and any copy of this e-mail
        and any printout. Thank You.

        
*******************************************************************************






