Ok, thank you so much for that hint... I will try doing that and report back.
Thanks!
Lou

On Fri, Aug 16, 2019 at 11:05 AM Brian Andrus <toomuc...@gmail.com> wrote:

> Ah. I suspect your issue may be CUDA 10.1, which does not
> create/register all the appropriate symlinks and "provides".
> I ran into that trying to install tensorflow.
>
> If you can, downgrade to 10.0, which does a better job of installing
> itself.
>
> Brian
>
> On 8/16/2019 5:47 AM, Lou Nicotra wrote:
>
> Brian, the package is being built and installed on the master server. I
> am testing by removing all instances of V18 and installing the newly
> created V19 slurm rpms. I get the error message on the slurm rpm install;
> all others (ctl, db, ...) install fine.
>
> After I get the error message, I remove all rpms from V19 and reinstall
> V18 using the same procedure with no issues... And the system sees all
> nodes as it did before trying to install V19.
>
> The nvidia libraries are installed via the official Nvidia
> rpm... cuda-repo-rhel7-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm
> supporting cuda10. Multi GPU server currently used by multiple users (DNN
> training) with no errors of any type while utilizing the nvidia libs/code.
>
> nvidia-smi command shows: NVIDIA-SMI 418.39 Driver Version: 418.39
> CUDA Version: 10.1
>
> So, it is definitely something new to the V19 release... I have installed
> 18.08.0, .3, .4 and .8 on the same server and nodes since Sep of 2018 using
> the same procedures and never had any issues... Currently running 18.08.8
>
> Thanks.
> Lou
>
> On Thu, Aug 15, 2019 at 3:07 PM Brian Andrus <toomuc...@gmail.com> wrote:
>
>> Lou,
>>
>> Are you installing on the same machine you built on?
>>
>> Are the nvidia libraries installed by RPM or a 'make install' on the box
>> you compiled it on?
>>
>> Brian Andrus
>>
>> On 8/15/2019 7:53 AM, Lou Nicotra wrote:
>>
>> I have tried running ldconfig manually as suggested with
>> slurm-19.05.1-2 and it fails the same way...
>> error: Failed dependencies:
>>     libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64
>>
>> ldconfig -p shows:
>> root@panther02 slurm# ldconfig -p|grep libnvidia-ml.
>> libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1
>> libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
>> libnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
>> libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
>>
>> Just tried the latest release slurm-19.05.2 and it fails in the same
>> way...
>> root@panther02 x86_64# rpm -Uvh slurm-19.05.2-1.el7.centos.x86_64.rpm
>> error: Failed dependencies:
>>     libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.2-1.el7.centos.x86_64
>>
>> Reinstalled slurm-18.08.8 and it installs with no issues... Just
>> like slurm-18.08.03 and slurm-18.08.4 did... All built on the same machine
>> with rpmbuild -ta command...
>> root@panther02 slurm-18.08.8# rpm -Uvh slurm-18.08.8-1.el7.centos.x86_64.rpm
>> Preparing... ################################# [100%]
>> Updating / installing...
>> 1:slurm-18.08.8-1.el7.centos ################################# [100%]
>>
>> Oh, well...
>>
>> Lou
>>
>> On Mon, Aug 12, 2019 at 1:32 AM Barbara Krašovec <barbara.kraso...@ijs.si>
>> wrote:
>>
>>> What if you try to run ldconfig manually before building the rpm?
>>>
>>> Cheers,
>>>
>>> Barbara
>>>
>>> On 8/8/19 5:57 PM, Lou Nicotra wrote:
>>>
>>> I am running into an error while trying to
>>> install slurm-19.05.1-2.el7.centos.x86_64... Error is as follows:
>>> root@panther02 x86_64# rpm -Uvh slurm-19.05.1-2.el7.centos.x86_64.rpm
>>> error: Failed dependencies:
>>>     libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64
>>>
>>> Packages are built using rpmbuild... And complete with no errors...
>>> + cd /root/rpmbuild/BUILD
>>> + cd slurm-19.05.1-2
>>> + rm -rf /root/rpmbuild/BUILDROOT/slurm-19.05.1-2.el7.centos.x86_64
>>> + exit 0
>>>
>>> Investigation of the output while building the rpm package shows that
>>> nvidia-ml is found:
>>> checking for nvmlInit in -lnvidia-ml... yes
>>> .
>>> .
>>> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../..
>>> -I../../../../slurm -I../../../.. -I../../../../src/common
>>> -I/usr/local/cuda/include -I/usr/cuda/include -DNUMA_VERSION1_COMPATIBILITY
>>> -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
>>> -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -c
>>> gpu_nvml.c -fPIC -DPIC -o .libs/gpu_nvml.o
>>> libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../..
>>> -I../../../../slurm -I../../../.. -I../../../../src/common
>>> -I/usr/local/cuda/include -I/usr/cuda/include -DNUMA_VERSION1_COMPATIBILITY
>>> -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
>>> -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -c
>>> gpu_nvml.c -o gpu_nvml.o >/dev/null 2>&1
>>> /bin/sh ../../../../libtool --tag=CC --mode=link gcc
>>> -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
>>> -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
>>> -grecord-gcc-switches -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1
>>> -fno-strict-aliasing -module -avoid-version --export-dynamic -Wl,-z,relro
>>> -o gpu_nvml.la -rpath /usr/lib64/slurm gpu_nvml.lo -lnvidia-ml
>>> libtool: link: gcc -shared -fPIC -DPIC .libs/gpu_nvml.o -lnvidia-ml
>>> -O2 -g -fstack-protector-strong -grecord-gcc-switches -m64
>>> -mtune=generic -pthread -ggdb3 -g -O1 -Wl,-z -Wl,relro -pthread
>>> -Wl,-soname -Wl,gpu_nvml.so -o .libs/gpu_nvml.so
>>>
>>> The Makefile in /root/rpmbuild/BUILD/slurm-19.05.1-2/src includes:
>>> NVML_LIBS = -lnvidia-ml
>>> but previous releases (slurm-18.08.8) did not, and I was able to compile
>>> and install that release with no issues after building it with rpmbuild...
>>>
>>> My LD_LIBRARY_PATH is
>>> /usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:/var/local/miniconda2/lib/:
>>>
>>> Can anyone provide suggestions on working out this issue?
>>>
>>> Thanks.
>>> --
>>>
>>> LOU NICOTRA
>>>
>>> IT Systems Engineer - SLT
>>>
>>> Interactions LLC
>>>
>>> o: 908-673-1833 <781-405-5114>
>>>
>>> m: 908-451-6983 <781-405-5114>
>>>
>>> *lnico...@interactions.com <lnico...@interactions.com>*
>>> www.interactions.com
>>>
>>> *******************************************************************************
>>>
>>> This e-mail and any of its attachments may contain Interactions LLC
>>> proprietary information, which is privileged, confidential, or subject to
>>> copyright belonging to the Interactions LLC. This e-mail is intended solely
>>> for the use of the individual or entity to which it is addressed. If you
>>> are not the intended recipient of this e-mail, you are hereby notified that
>>> any dissemination, distribution, copying, or action taken in relation to
>>> the contents of and attachments to this e-mail is strictly prohibited and
>>> may be unlawful. If you have received this e-mail in error, please notify
>>> the sender immediately and permanently delete the original and any copy of
>>> this e-mail and any printout. Thank You.
>>>
>>> *******************************************************************************
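
For anyone hitting the same error: the thread shows that ldconfig finds libnvidia-ml.so.1 on disk while rpm still reports a failed dependency. rpm resolves dependencies against the "provides" recorded in its package database, not against files in the linker cache, which fits Brian's hint about the CUDA 10.1 packages. Below is a minimal sketch of a check that compares the two views. The capability string is copied from the error message above; the `diagnose` helper and its wording are hypothetical, added only for illustration.

```shell
#!/bin/sh
# Capability name copied verbatim from the rpm error in the thread.
CAP='libnvidia-ml.so.1()(64bit)'

# diagnose LINKER_SEES RPM_PROVIDES  (each argument is "yes" or "no")
# Hypothetical helper: turns the two observations into a one-line summary.
diagnose() {
    case "$1:$2" in
        yes:yes) echo "ok: library is on disk and provided by an installed rpm" ;;
        yes:no)  echo "mismatch: library is on disk but no installed rpm provides it" ;;
        no:yes)  echo "stale: an rpm provides it but the linker cache does not list it" ;;
        *)       echo "missing: library is neither on disk nor provided by an rpm" ;;
    esac
}

# View 1: the dynamic linker cache (what 'ldconfig -p' printed in the thread).
linker_sees=no
if command -v ldconfig >/dev/null 2>&1 &&
   ldconfig -p 2>/dev/null | grep -q 'libnvidia-ml\.so\.1'; then
    linker_sees=yes
fi

# View 2: the rpm database. 'rpm -q --whatprovides' exits non-zero
# when no installed package provides the capability.
rpm_provides=no
if command -v rpm >/dev/null 2>&1 &&
   rpm -q --whatprovides "$CAP" >/dev/null 2>&1; then
    rpm_provides=yes
fi

diagnose "$linker_sees" "$rpm_provides"
```

On a box in the state described in this thread, one would expect the "mismatch" line, since the library is visible to ldconfig but the install fails on the rpm dependency. To see what a freshly built package actually requires, `rpm -qp --requires slurm-19.05.1-2.el7.centos.x86_64.rpm | grep nvidia` lists the matching entries.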