Roland,

The easiest way is to use an external hwloc that is configured with
--disable-nvml.
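
For example, something like this (an untested sketch; the hwloc version
and the install prefix below are just placeholders):

    # assuming an hwloc tarball (version is a placeholder) is already unpacked
    cd hwloc-1.11.6
    ./configure --prefix=$HOME/hwloc-no-nvml --disable-nvml
    make && make install

    # then build Open MPI against that hwloc instead of the embedded copy
    cd ../openmpi-2.1.0
    ./configure --with-hwloc=$HOME/hwloc-no-nvml ...
    make && make install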

Another option is to hack the embedded hwloc's configure.m4 and pass
--disable-nvml to the embedded hwloc configure. Note that this requires
running autogen.sh, and hence recent autotools.
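
Roughly like this (again just a sketch; the component directory name and
the exact spot to edit depend on the hwloc version your Open MPI tarball
embeds):

    cd openmpi-2.1.0
    # edit e.g. opal/mca/hwloc/hwloc1112/configure.m4 and add
    #     enable_nvml=no
    # where the flags for the embedded hwloc configure are assembled,
    # then regenerate the configury and rebuild:
    ./autogen.pl        # autogen.sh in older source trees
    ./configure ...
    make && make install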

I guess Open MPI 1.8 embeds an older hwloc that is not aware of NVML,
hence the lack of a warning.
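
You can check which hwloc component your build actually carries with
ompi_info, e.g.

    ompi_info | grep hwloc

which should print the hwloc MCA component (something like "MCA hwloc:
hwloc1112 ..." for an embedded hwloc 1.11.2; the exact name varies per
release).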

Cheers,

Gilles

On Wednesday, March 22, 2017, Roland Fehrenbacher <r...@q-leap.de> wrote:

> >>>>> "SJ" == Sylvain Jeaugey <sjeau...@nvidia.com <javascript:;>> writes:
>
>     SJ> If you installed CUDA libraries and includes in /usr, then it's
>     SJ> not surprising hwloc finds them even without defining CFLAGS.
>
> Well, that's the place where distribution packages install to :)
> I don't think a build system should misbehave if libraries are installed
> in default places.
>
>     SJ> I'm just saying I think you won't get the error message if Open
>     SJ> MPI finds CUDA but hwloc does not.
>
> OK, so I think I need to ask the original question again: Is there a way
> to suppress these warnings with a "normal" build? I guess the answer
> must be yes, since 1.8.x didn't have this problem. The real question
> then would be how ...
>
> Thanks,
>
> Roland
>
>     SJ> On 03/21/2017 11:05 AM, Roland Fehrenbacher wrote:
>     >>>>>>> "SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:
>     >> Hi Sylvain,
>     >>
>     >> I get the "NVIDIA : ..." run-time error messages just by
>     >> compiling with "--with-cuda=/usr":
>     >>
>     >> ./configure --prefix=${prefix} \
>     >>   --mandir=${prefix}/share/man \
>     >>   --infodir=${prefix}/share/info \
>     >>   --sysconfdir=/etc/openmpi/${VERSION} \
>     >>   --with-devel-headers \
>     >>   --disable-memchecker \
>     >>   --disable-vt \
>     >>   --with-tm --with-slurm --with-pmi --with-sge \
>     >>   --with-cuda=/usr \
>     >>   --with-io-romio-flags='--with-file-system=nfs+lustre' \
>     >>   --with-cma --without-valgrind \
>     >>   --enable-openib-connectx-xrc \
>     >>   --enable-orterun-prefix-by-default \
>     >>   --disable-java
>     >>
>     >> Roland
>     >>
>     SJ> Hi Siegmar, I think this "NVIDIA : ..." error message comes from
>     SJ> the fact that you add the CUDA includes to the C*FLAGS. If you just
>     SJ> use --with-cuda, Open MPI will compile with CUDA support, but
>     SJ> hwloc will not find CUDA, and that will be fine. However, putting
>     SJ> CUDA in the CFLAGS will make hwloc find CUDA, compile its own CUDA
>     SJ> support (which is not needed), and then NVML will show this error
>     SJ> message when not run on a machine with CUDA devices.
>     >>
>     SJ> I guess gcc picks up the environment variable while cc does not,
>     SJ> hence the different behavior. So again, there is no need to add
>     SJ> all those CUDA includes; --with-cuda is enough.
>     >>
>     SJ> About the opal_list_remove_item, we'll try to reproduce the
>     SJ> issue and see where it comes from.
>     >>
>     SJ> Sylvain
>     >>
>     SJ> On 03/21/2017 12:38 AM, Siegmar Gross wrote:
>     >> >> Hi,
>     >> >>
>     >> >> I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise Server
>     >> >> 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Sometimes I once more
>     >> >> get a warning about a missing item for one of my small programs (it
>     >> >> doesn't matter whether I use my cc or gcc version). My gcc version
>     >> >> also displays the message "NVIDIA: no NVIDIA devices found" on the
>     >> >> server without NVIDIA devices (I don't get the message with my cc
>     >> >> version). I used the following commands to build the package
>     >> >> (${SYSTEM_ENV} is Linux and ${MACHINE_ENV} is x86_64).
>     >> >>
>     >> >>
>     >> >> mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
>     >> >> cd openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
>     >> >>
>     >> >> ../openmpi-2.1.0rc4/configure \
>     >> >>   --prefix=/usr/local/openmpi-2.1.0_64_cc \
>     >> >>   --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \
>     >> >>   --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
>     >> >>   --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
>     >> >>   JAVA_HOME=/usr/local/jdk1.8.0_66 \
>     >> >>   LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 -L/usr/local/cuda/lib64" \
>     >> >>   CC="cc" CXX="CC" FC="f95" \
>     >> >>   CFLAGS="-m64 -mt -I/usr/local/include -I/usr/local/cuda/include" \
>     >> >>   CXXFLAGS="-m64 -I/usr/local/include -I/usr/local/cuda/include" \
>     >> >>   FCFLAGS="-m64" \
>     >> >>   CPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
>     >> >>   CXXCPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
>     >> >>   --enable-mpi-cxx \
>     >> >>   --enable-cxx-exceptions \
>     >> >>   --enable-mpi-java \
>     >> >>   --with-cuda=/usr/local/cuda \
>     >> >>   --with-valgrind=/usr/local/valgrind \
>     >> >>   --enable-mpi-thread-multiple \
>     >> >>   --with-hwloc=internal \
>     >> >>   --without-verbs \
>     >> >>   --with-wrapper-cflags="-m64 -mt" \
>     >> >>   --with-wrapper-cxxflags="-m64" \
>     >> >>   --with-wrapper-fcflags="-m64" \
>     >> >>   --with-wrapper-ldflags="-mt" \
>     >> >>   --enable-debug \
>     >> >>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>     >> >>
>     >> >> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>     >> >> rm -r /usr/local/openmpi-2.1.0_64_cc.old
>     >> >> mv /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old
>     >> >> make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>     >> >> make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>     >> >>
>     >> >>
>     >> >> Sometimes everything works as expected.
>     >> >>
>     >> >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
>     >> >> Parent process 0: I create 2 slave processes
>     >> >>
>     >> >> Parent process 0 running on loki
>     >> >>   MPI_COMM_WORLD ntasks: 1
>     >> >>   COMM_CHILD_PROCESSES ntasks_local: 1
>     >> >>   COMM_CHILD_PROCESSES ntasks_remote: 2
>     >> >>   COMM_ALL_PROCESSES ntasks: 3
>     >> >>   mytid in COMM_ALL_PROCESSES: 0
>     >> >>
>     >> >> Child process 0 running on nfs1
>     >> >>   MPI_COMM_WORLD ntasks: 2
>     >> >>   COMM_ALL_PROCESSES ntasks: 3
>     >> >>   mytid in COMM_ALL_PROCESSES: 1
>     >> >>
>     >> >> Child process 1 running on nfs2
>     >> >>   MPI_COMM_WORLD ntasks: 2
>     >> >>   COMM_ALL_PROCESSES ntasks: 3
>     >> >>   mytid in COMM_ALL_PROCESSES: 2
>     >> >>
>     >> >>
>     >> >>
>     >> >> More often I get a warning.
>     >> >>
>     >> >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
>     >> >> Parent process 0: I create 2 slave processes
>     >> >>
>     >> >> Parent process 0 running on loki
>     >> >>   MPI_COMM_WORLD ntasks: 1
>     >> >>   COMM_CHILD_PROCESSES ntasks_local: 1
>     >> >>   COMM_CHILD_PROCESSES ntasks_remote: 2
>     >> >>   COMM_ALL_PROCESSES ntasks: 3
>     >> >>   mytid in COMM_ALL_PROCESSES: 0
>     >> >>
>     >> >> Child process 0 running on nfs1
>     >> >>   MPI_COMM_WORLD ntasks: 2
>     >> >>   COMM_ALL_PROCESSES ntasks: 3
>     >> >>
>     >> >> Child process 1 running on nfs2
>     >> >>   MPI_COMM_WORLD ntasks: 2
>     >> >>   COMM_ALL_PROCESSES ntasks: 3
>     >> >>   mytid in COMM_ALL_PROCESSES: 2
>     >> >>   mytid in COMM_ALL_PROCESSES: 1
>     >> >> Warning :: opal_list_remove_item - the item 0x25a76f0 is not on the list 0x7f96db515998
>     >> >> loki spawn 144
>     >> >>
>     >> >>
>     >> >>
>     >> >> I would be grateful if somebody could fix the problem. Do you
>     >> >> need anything else? Thank you very much in advance for any
>     >> >> help.
>     >> >>
>     >> >>
>     >> >> Kind regards
>     >> >>
>     >> >> Siegmar