Roland, the easiest way is to use an external hwloc that is configured with --disable-nvml.
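A sketch of that external-hwloc route might look like the following. The hwloc version, install prefix, and the extra Open MPI flags are illustrative assumptions, not taken from this thread; adjust them to your site:

```shell
# Build a standalone hwloc with the NVML backend disabled.
# (hwloc version and prefix here are examples only.)
tar xf hwloc-1.11.6.tar.gz
cd hwloc-1.11.6
./configure --prefix=$HOME/sw/hwloc-no-nvml --disable-nvml
make && make install
cd ..

# Then point Open MPI's configure at that external hwloc instead
# of the embedded copy (add your usual configure flags as well).
cd openmpi-2.1.0
./configure --with-hwloc=$HOME/sw/hwloc-no-nvml --with-cuda=/usr
make && make install
```

Since hwloc is then built without NVML support, it never loads the NVML library at runtime, so the "NVIDIA: no NVIDIA devices found" message should not appear even when the CUDA libraries sit in a default location like /usr.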
Another option is to hack the embedded hwloc configure.m4 and pass --disable-nvml to the embedded hwloc configure. Note that this requires running autogen.sh, so you need recent autotools. I guess Open MPI 1.8 embeds an older hwloc that is not aware of NVML, hence the lack of warning.

Cheers,

Gilles

On Wednesday, March 22, 2017, Roland Fehrenbacher <r...@q-leap.de> wrote:

> >>>>> "SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:
>
> SJ> If you installed CUDA libraries and includes in /usr, then it's
> SJ> not surprising hwloc finds them even without defining CFLAGS.
>
> Well, that's the place where distribution packages install to :)
> I don't think a build system should misbehave if libraries are
> installed in default places.
>
> SJ> I'm just saying I think you won't get the error message if Open
> SJ> MPI finds CUDA but hwloc does not.
>
> OK, so I think I need to ask the original question again: Is there a
> way to suppress these warnings with a "normal" build? I guess the
> answer must be yes, since 1.8.x didn't have this problem. The real
> question then would be how ...
>
> Thanks,
>
> Roland
>
> SJ> On 03/21/2017 11:05 AM, Roland Fehrenbacher wrote:
> >>>>>>> "SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:
> >> Hi Sylvain,
> >>
> >> I get the "NVIDIA : ..." run-time error messages just by
> >> compiling with "--with-cuda=/usr":
> >>
> >> ./configure --prefix=${prefix} \
> >>   --mandir=${prefix}/share/man \
> >>   --infodir=${prefix}/share/info \
> >>   --sysconfdir=/etc/openmpi/${VERSION} \
> >>   --with-devel-headers \
> >>   --disable-memchecker \
> >>   --disable-vt \
> >>   --with-tm --with-slurm --with-pmi --with-sge \
> >>   --with-cuda=/usr \
> >>   --with-io-romio-flags='--with-file-system=nfs+lustre' \
> >>   --with-cma --without-valgrind \
> >>   --enable-openib-connectx-xrc \
> >>   --enable-orterun-prefix-by-default \
> >>   --disable-java
> >>
> >> Roland
> >>
> SJ> Hi Siegmar, I think this "NVIDIA : ..." error message comes from
> SJ> the fact that you add CUDA includes in the C*FLAGS. If you just
> SJ> use --with-cuda, Open MPI will compile with CUDA support, but
> SJ> hwloc will not find CUDA and that will be fine. However, setting
> SJ> CUDA in CFLAGS will make hwloc find CUDA, compile CUDA support
> SJ> (which is not needed), and then NVML will show this error message
> SJ> when not run on a machine with CUDA devices.
> >>
> SJ> I guess gcc picks up the environment variable while cc does not,
> SJ> hence the different behavior. So again, there is no need to add
> SJ> all those CUDA includes; --with-cuda is enough.
> >>
> SJ> About the opal_list_remove_item, we'll try to reproduce the
> SJ> issue and see where it comes from.
> >>
> SJ> Sylvain
> >>
> SJ> On 03/21/2017 12:38 AM, Siegmar Gross wrote:
> >> >> Hi,
> >> >>
> >> >> I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise
> >> >> Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Sometimes
> >> >> I get once more a warning about a missing item for one of my
> >> >> small programs (it doesn't matter if I use my cc or gcc
> >> >> version). My gcc version also displays the message "NVIDIA: no
> >> >> NVIDIA devices found" for the server without NVIDIA devices (I
> >> >> don't get the message for my cc version). I used the following
> >> >> commands to build the package (${SYSTEM_ENV} is Linux and
> >> >> ${MACHINE_ENV} is x86_64).
> >> >>
> >> >> mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
> >> >> cd openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
> >> >>
> >> >> ../openmpi-2.1.0rc4/configure \
> >> >>   --prefix=/usr/local/openmpi-2.1.0_64_cc \
> >> >>   --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \
> >> >>   --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
> >> >>   --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
> >> >>   JAVA_HOME=/usr/local/jdk1.8.0_66 \
> >> >>   LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 -L/usr/local/cuda/lib64" \
> >> >>   CC="cc" CXX="CC" FC="f95" \
> >> >>   CFLAGS="-m64 -mt -I/usr/local/include -I/usr/local/cuda/include" \
> >> >>   CXXFLAGS="-m64 -I/usr/local/include -I/usr/local/cuda/include" \
> >> >>   FCFLAGS="-m64" \
> >> >>   CPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
> >> >>   CXXCPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
> >> >>   --enable-mpi-cxx \
> >> >>   --enable-cxx-exceptions \
> >> >>   --enable-mpi-java \
> >> >>   --with-cuda=/usr/local/cuda \
> >> >>   --with-valgrind=/usr/local/valgrind \
> >> >>   --enable-mpi-thread-multiple \
> >> >>   --with-hwloc=internal \
> >> >>   --without-verbs \
> >> >>   --with-wrapper-cflags="-m64 -mt" \
> >> >>   --with-wrapper-cxxflags="-m64" \
> >> >>   --with-wrapper-fcflags="-m64" \
> >> >>   --with-wrapper-ldflags="-mt" \
> >> >>   --enable-debug \
> >> >>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >> >>
> >> >> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >> >> rm -r /usr/local/openmpi-2.1.0_64_cc.old
> >> >> mv /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old
> >> >> make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >> >> make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> >> >>
> >> >> Sometimes everything works as expected.
> >> >>
> >> >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
> >> >> Parent process 0: I create 2 slave processes
> >> >>
> >> >> Parent process 0 running on loki
> >> >> MPI_COMM_WORLD ntasks: 1
> >> >> COMM_CHILD_PROCESSES ntasks_local: 1
> >> >> COMM_CHILD_PROCESSES ntasks_remote: 2
> >> >> COMM_ALL_PROCESSES ntasks: 3
> >> >> mytid in COMM_ALL_PROCESSES: 0
> >> >>
> >> >> Child process 0 running on nfs1
> >> >> MPI_COMM_WORLD ntasks: 2
> >> >> COMM_ALL_PROCESSES ntasks: 3
> >> >> mytid in COMM_ALL_PROCESSES: 1
> >> >>
> >> >> Child process 1 running on nfs2
> >> >> MPI_COMM_WORLD ntasks: 2
> >> >> COMM_ALL_PROCESSES ntasks: 3
> >> >> mytid in COMM_ALL_PROCESSES: 2
> >> >>
> >> >> More often I get a warning.
> >> >>
> >> >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
> >> >> Parent process 0: I create 2 slave processes
> >> >>
> >> >> Parent process 0 running on loki
> >> >> MPI_COMM_WORLD ntasks: 1
> >> >> COMM_CHILD_PROCESSES ntasks_local: 1
> >> >> COMM_CHILD_PROCESSES ntasks_remote: 2
> >> >> COMM_ALL_PROCESSES ntasks: 3
> >> >> mytid in COMM_ALL_PROCESSES: 0
> >> >>
> >> >> Child process 0 running on nfs1
> >> >> MPI_COMM_WORLD ntasks: 2
> >> >> COMM_ALL_PROCESSES ntasks: 3
> >> >>
> >> >> Child process 1 running on nfs2
> >> >> MPI_COMM_WORLD ntasks: 2
> >> >> COMM_ALL_PROCESSES ntasks: 3
> >> >> mytid in COMM_ALL_PROCESSES: 2
> >> >> mytid in COMM_ALL_PROCESSES: 1
> >> >> Warning :: opal_list_remove_item - the item 0x25a76f0 is not on
> >> >> the list 0x7f96db515998
> >> >> loki spawn 144
> >> >>
> >> >> I would be grateful if somebody can fix the problem. Do you
> >> >> need anything else? Thank you very much for any help in
> >> >> advance.
> >> >>
> >> >> Kind regards
> >> >>
> >> >> Siegmar
> >> >> _______________________________________________
> >> >> users mailing list
> >> >> users@lists.open-mpi.org
> >> >> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users