Ah. I suspect your issue may be CUDA 10.1, which does not
create/register all the appropriate symlinks and "provides".
I ran into that trying to install TensorFlow.
If you can, downgrade to 10.0, which does a better job of installing itself.
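One way to test that theory is to ask the RPM database, rather than the dynamic linker, which package provides the library. The following is only a rough sketch: it shells out to `rpm -q --whatprovides` with the capability string copied from the rpm error message. The helper name `find_providers` and its injectable `run` parameter are made up for illustration.

```python
import subprocess

# Capability string copied from the "Failed dependencies" message in this thread.
CAPABILITY = "libnvidia-ml.so.1()(64bit)"


def find_providers(capability, run=subprocess.run):
    """Return installed packages that declare `capability` in the RPM database."""
    result = run(
        ["rpm", "-q", "--whatprovides", capability],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # rpm exits non-zero ("no package provides ...") when nothing matches.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        providers = find_providers(CAPABILITY)
    except FileNotFoundError:
        print("rpm is not available on this machine")
    else:
        if providers:
            print("provided by:", ", ".join(providers))
        else:
            print("nothing in the RPM database provides", CAPABILITY)
```

An empty result would be consistent with the library being on disk while no installed package registers it as a "provides".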
Brian
On 8/16/2019 5:47 AM, Lou Nicotra wrote:
Brian, the package is being built and installed on the master server.
I am testing by removing all instances of V18 and installing the newly
created V19 slurm rpms. I get the error message on the slurm rpm
install; all others (ctl, db, ... ) install fine.
After I get the error message, I remove all rpms from V19 and
reinstall V18 using the same procedure with no issues, and the
system sees all nodes as it did before trying to install V19.
The nvidia libraries are installed via the official Nvidia
rpm... cuda-repo-rhel7-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm
supporting cuda10. Multi GPU server currently used by multiple users
(DNN training) with no errors of any type while utilizing the nvidia
libs/code.
nvidia-smi command shows: NVIDIA-SMI 418.39 Driver Version:
418.39 CUDA Version: 10.1
So, it is definitely something new to the V19 release... I have
installed 18.08.0, .3, .4 and .8 on the same server and nodes since
Sep of 2018 using the same procedures and never had any issues...
Currently running 18.08.8
Thanks.
Lou
On Thu, Aug 15, 2019 at 3:07 PM Brian Andrus <toomuc...@gmail.com> wrote:
Lou,
Are you installing on the same machine you built?
Are the nvidia libraries installed by RPM or a 'make install' on
the box you compiled it on?
Brian Andrus
On 8/15/2019 7:53 AM, Lou Nicotra wrote:
I have tried running ldconfig manually as suggested with
slurm-19.05.1-2 and it fails the same way...
error: Failed dependencies:
libnvidia-ml.so.1()(64bit) is needed by
slurm-19.05.1-2.el7.centos.x86_64
ldconfig -p shows:
root@panther02 slurm# ldconfig -p|grep libnvidia-ml.
libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1
libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
libnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
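This listing shows that the dynamic linker cache knows the 64-bit library. rpm does not consult that cache, though; it resolves dependencies against its own database. That would explain why running ldconfig makes no difference. As a minimal sketch, this parses `ldconfig -p` style text and reports which entries are 64-bit. The function name `cached_libraries` is hypothetical, and the sample lines are copied from the output in this message with the mail line-wrapping undone.

```python
import re

# One cache entry looks like: "<soname> (<flags>) => <path>"
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<flags>[^)]*)\)\s+=>\s+(?P<path>\S+)\s*$"
)

SAMPLE = """\
\tlibnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1
\tlibnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
\tlibnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
\tlibnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
"""


def cached_libraries(ldconfig_output, soname):
    """Return (path, is_64bit) for every cache entry whose name equals `soname`."""
    entries = []
    for line in ldconfig_output.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("name") == soname:
            flags = [flag.strip() for flag in match.group("flags").split(",")]
            entries.append((match.group("path"), "x86-64" in flags))
    return entries


if __name__ == "__main__":
    for path, is_64bit in cached_libraries(SAMPLE, "libnvidia-ml.so.1"):
        print(path, "64-bit" if is_64bit else "not 64-bit")
```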
Just tried the latest release slurm-19.05.2 and it fails in the
same way...
root@panther02 x86_64# rpm -Uvh slurm-19.05.2-1.el7.centos.x86_64.rpm
error: Failed dependencies:
libnvidia-ml.so.1()(64bit) is needed by
slurm-19.05.2-1.el7.centos.x86_64
Reinstalled slurm-18.08.8 and it installs with no issues... Just
like slurm-18.08.03 and slurm-18.08.4 did... All built on the
same machine with rpmbuild -ta command...
root@panther02 slurm-18.08.8# rpm -Uvh slurm-18.08.8-1.el7.centos.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:slurm-18.08.8-1.el7.centos ################################# [100%]
Oh, well...
Lou
On Mon, Aug 12, 2019 at 1:32 AM Barbara Krašovec <barbara.kraso...@ijs.si> wrote:
What if you try to run ldconfig manually before building the rpm?
Cheers,
Barbara
On 8/8/19 5:57 PM, Lou Nicotra wrote:
I am running into an error while trying to
install slurm-19.05.1-2.el7.centos.x86_64... Error is as
follows:
root@panther02 x86_64# rpm -Uvh
slurm-19.05.1-2.el7.centos.x86_64.rpm
error: Failed dependencies:
libnvidia-ml.so.1()(64bit) is needed by
slurm-19.05.1-2.el7.centos.x86_64
Packages are built using rpmbuild... And complete with no
errors...
+ cd /root/rpmbuild/BUILD
+ cd slurm-19.05.1-2
+ rm -rf
/root/rpmbuild/BUILDROOT/slurm-19.05.1-2.el7.centos.x86_64
+ exit 0
Investigation of the output while building the rpm package
shows that nvidia-ml is found:
checking for nvmlInit in -lnvidia-ml... yes
.
.
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../..
-I../../../../slurm -I../../../.. -I../../../../src/common
-I/usr/local/cuda/include -I/usr/cuda/include
-DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
-Wall -g -O1 -fno-strict-aliasing -c gpu_nvml.c -fPIC -DPIC
-o .libs/gpu_nvml.o
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../..
-I../../../../slurm -I../../../.. -I../../../../src/common
-I/usr/local/cuda/include -I/usr/cuda/include
-DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
-Wall -g -O1 -fno-strict-aliasing -c gpu_nvml.c -o
gpu_nvml.o >/dev/null 2>&1
/bin/sh ../../../../libtool --tag=CC --mode=link gcc
-DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3
-Wall -g -O1 -fno-strict-aliasing -module -avoid-version
--export-dynamic -Wl,-z,relro -o gpu_nvml.la
-rpath /usr/lib64/slurm gpu_nvml.lo
-lnvidia-ml
libtool: link: gcc -shared -fPIC -DPIC .libs/gpu_nvml.o
-lnvidia-ml -O2 -g -fstack-protector-strong
-grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3 -g
-O1 -Wl,-z -Wl,relro -pthread -Wl,-soname -Wl,gpu_nvml.so
-o .libs/gpu_nvml.so
The Makefile in /root/rpmbuild/BUILD/slurm-19.05.1-2/src
includes: NVML_LIBS = -lnvidia-ml
but previous releases (slurm-18.08.8) did not, and I was able
to compile and install that release with no issues after
building it with rpmbuild...
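For context, rpmbuild's automatic dependency generator normally turns each shared library that a packaged binary links against into a Requires entry. A plugin linked with -lnvidia-ml is therefore a plausible source of the new dependency. The sketch below is one way to check that on the built package, assuming it is run from the directory holding the rpm file. The helper names `package_requires` and `nvidia_requires` are made up for illustration.

```python
import subprocess

# File name as it appears in the rpm -Uvh command in this thread.
PACKAGE_FILE = "slurm-19.05.1-2.el7.centos.x86_64.rpm"


def package_requires(package_file, run=subprocess.run):
    """List the capabilities that a not-yet-installed rpm file requires."""
    result = run(
        ["rpm", "-qp", "--requires", package_file],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def nvidia_requires(requires):
    """Keep only the entries that mention an NVIDIA library."""
    return [req for req in requires if "nvidia" in req.lower()]


if __name__ == "__main__":
    try:
        print(nvidia_requires(package_requires(PACKAGE_FILE)))
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not query {PACKAGE_FILE}: {exc}")
```

Running the same query against the 18.08.8 package would show whether the NVIDIA requirement is new in the V19 build.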
My LD_LIBRARY_PATH is
/usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:/var/local/miniconda2/lib/:
Can anyone provide suggestions on working out this issue?
Thanks.
--
LOU NICOTRA
IT Systems Engineer - SLT
Interactions LLC
o: 908-673-1833
m: 908-451-6983
lnico...@interactions.com
www.interactions.com
*******************************************************************************
This e-mail and any of its attachments may contain
Interactions LLC proprietary information, which is
privileged, confidential, or subject to copyright belonging
to the Interactions LLC. This e-mail is intended solely for
the use of the individual or entity to which it is
addressed. If you are not the intended recipient of this
e-mail, you are hereby notified that any dissemination,
distribution, copying, or action taken in relation to the
contents of and attachments to this e-mail is strictly
prohibited and may be unlawful. If you have received this
e-mail in error, please notify the sender immediately and
permanently delete the original and any copy of this e-mail
and any printout. Thank You.
*******************************************************************************