If you installed the CUDA libraries and headers in /usr, then it's not surprising that hwloc finds them even without setting CFLAGS.

I'm just saying I think you won't get the error message if Open MPI finds CUDA but hwloc does not.
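
By the way, if you want to confirm what the Open MPI build itself picked up (independently of hwloc's own CUDA/NVML detection), a small check like the following works on Open MPI >= 2.0, which ships the CUDA extension in mpi-ext.h. Treat it as a sketch, not the canonical test:

#include <stdio.h>
#include <mpi.h>
#include <mpi-ext.h>   /* Open MPI extensions: MPIX_CUDA_AWARE_SUPPORT */

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    /* Compile-time answer: was Open MPI configured with CUDA support? */
    printf("compile time: CUDA-aware support: %s\n",
           MPIX_CUDA_AWARE_SUPPORT ? "yes" : "no");
    /* Run-time answer: is CUDA support actually usable right now? */
    printf("run time:     CUDA-aware support: %s\n",
           MPIX_Query_cuda_support() ? "yes" : "no");
#else
    printf("this MPI build does not define MPIX_CUDA_AWARE_SUPPORT\n");
#endif
    MPI_Finalize();
    return 0;
}

Running "ompi_info --parsable --all | grep mpi_built_with_cuda_support:value" should give the same compile-time answer without writing any code. Note that neither check tells you what the embedded hwloc detected; for that, hwloc's section of the configure output (or config.log) is the place to look.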

On 03/21/2017 11:05 AM, Roland Fehrenbacher wrote:
"SJ" == Sylvain Jeaugey <sjeau...@nvidia.com> writes:
Hi Sylvain,

I get the "NVIDIA : ..." run-time error messages just by compiling
with "--with-cuda=/usr":

./configure --prefix=${prefix} \
     --mandir=${prefix}/share/man \
     --infodir=${prefix}/share/info \
     --sysconfdir=/etc/openmpi/${VERSION} --with-devel-headers \
     --disable-memchecker \
     --disable-vt \
     --with-tm --with-slurm --with-pmi --with-sge \
     --with-cuda=/usr \
     --with-io-romio-flags='--with-file-system=nfs+lustre' \
     --with-cma --without-valgrind \
     --enable-openib-connectx-xrc \
     --enable-orterun-prefix-by-default \
     --disable-java

Roland
     SJ> Hi Siegmar, I think this "NVIDIA : ..." error message comes
     SJ> from the fact that you add CUDA includes in the C*FLAGS. If
     SJ> you just use --with-cuda, Open MPI will compile with CUDA
     SJ> support, but hwloc will not find CUDA, and that will be fine.
     SJ> However, setting CUDA in CFLAGS will make hwloc find CUDA,
     SJ> compile CUDA support (which is not needed), and then NVML
     SJ> will show this error message when not run on a machine with
     SJ> CUDA devices.

     SJ> I guess gcc picks up the environment variable while cc does
     SJ> not, hence the different behavior. So again, there is no need
     SJ> to add all those CUDA includes; --with-cuda is enough.

     SJ> About the opal_list_remove_item, we'll try to reproduce the
     SJ> issue and see where it comes from.

     SJ> Sylvain

     SJ> On 03/21/2017 12:38 AM, Siegmar Gross wrote:
     >> Hi,
     >>
     >> I have installed openmpi-2.1.0rc4 on my "SUSE Linux Enterprise
     >> Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0. Sometimes
     >> I once more get a warning about a missing item for one of my
     >> small programs (it doesn't matter if I use my cc or gcc
     >> version). My gcc version also displays the message "NVIDIA: no
     >> NVIDIA devices found" for the server without NVIDIA devices (I
     >> don't get the message for my cc version). I used the following
     >> commands to build the package (${SYSTEM_ENV} is Linux and
     >> ${MACHINE_ENV} is x86_64).
     >>
     >>
     >> mkdir openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
     >> cd openmpi-2.1.0rc4-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
     >>
     >> ../openmpi-2.1.0rc4/configure \
     >>   --prefix=/usr/local/openmpi-2.1.0_64_cc \
     >>   --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \
     >>   --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
     >>   --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
     >>   JAVA_HOME=/usr/local/jdk1.8.0_66 \
     >>   LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack -L/usr/local/lib64 -L/usr/local/cuda/lib64" \
     >>   CC="cc" CXX="CC" FC="f95" \
     >>   CFLAGS="-m64 -mt -I/usr/local/include -I/usr/local/cuda/include" \
     >>   CXXFLAGS="-m64 -I/usr/local/include -I/usr/local/cuda/include" \
     >>   FCFLAGS="-m64" \
     >>   CPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
     >>   CXXCPP="cpp -I/usr/local/include -I/usr/local/cuda/include" \
     >>   --enable-mpi-cxx \
     >>   --enable-cxx-exceptions \
     >>   --enable-mpi-java \
     >>   --with-cuda=/usr/local/cuda \
     >>   --with-valgrind=/usr/local/valgrind \
     >>   --enable-mpi-thread-multiple \
     >>   --with-hwloc=internal \
     >>   --without-verbs \
     >>   --with-wrapper-cflags="-m64 -mt" \
     >>   --with-wrapper-cxxflags="-m64" \
     >>   --with-wrapper-fcflags="-m64" \
     >>   --with-wrapper-ldflags="-mt" \
     >>   --enable-debug \
     >>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
     >>
     >> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
     >> rm -r /usr/local/openmpi-2.1.0_64_cc.old
     >> mv /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old
     >> make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
     >> make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc
     >>
     >>
     >> Sometimes everything works as expected.
     >>
     >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
     >> Parent process 0: I create 2 slave processes
     >>
     >> Parent process 0 running on loki
     >>   MPI_COMM_WORLD ntasks: 1
     >>   COMM_CHILD_PROCESSES ntasks_local: 1
     >>   COMM_CHILD_PROCESSES ntasks_remote: 2
     >>   COMM_ALL_PROCESSES ntasks: 3
     >>   mytid in COMM_ALL_PROCESSES: 0
     >>
     >> Child process 0 running on nfs1
     >>   MPI_COMM_WORLD ntasks: 2
     >>   COMM_ALL_PROCESSES ntasks: 3
     >>   mytid in COMM_ALL_PROCESSES: 1
     >>
     >> Child process 1 running on nfs2
     >>   MPI_COMM_WORLD ntasks: 2
     >>   COMM_ALL_PROCESSES ntasks: 3
     >>   mytid in COMM_ALL_PROCESSES: 2
     >>
     >>
     >>
     >> More often I get a warning.
     >>
     >> loki spawn 144 mpiexec -np 1 --host loki,nfs1,nfs2 spawn_intra_comm
     >> Parent process 0: I create 2 slave processes
     >>
     >> Parent process 0 running on loki
     >>   MPI_COMM_WORLD ntasks: 1
     >>   COMM_CHILD_PROCESSES ntasks_local: 1
     >>   COMM_CHILD_PROCESSES ntasks_remote: 2
     >>   COMM_ALL_PROCESSES ntasks: 3
     >>   mytid in COMM_ALL_PROCESSES: 0
     >>
     >> Child process 0 running on nfs1
     >>   MPI_COMM_WORLD ntasks: 2
     >>   COMM_ALL_PROCESSES ntasks: 3
     >>
     >> Child process 1 running on nfs2
     >>   MPI_COMM_WORLD ntasks: 2
     >>   COMM_ALL_PROCESSES ntasks: 3
     >>   mytid in COMM_ALL_PROCESSES: 2
     >>   mytid in COMM_ALL_PROCESSES: 1
     >> Warning :: opal_list_remove_item - the item 0x25a76f0 is not on the list 0x7f96db515998
     >> loki spawn 144
     >>
     >>
     >>
     >> I would be grateful if somebody could fix the problem. Do you
     >> need anything else? Thank you very much in advance for any help.
     >>
     >>
     >> Kind regards
     >>
     >> Siegmar
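
For anyone who wants to try to reproduce this before we get to it: the source of spawn_intra_comm wasn't posted, but judging from the output above it is a spawn-and-merge test. Below is a minimal sketch along those lines (a hypothetical reconstruction using MPI_Comm_spawn and MPI_Intercomm_merge, not Siegmar's actual program; names like COMM_ALL_PROCESSES are taken from the log):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, children, all;
    int nworld, nlocal, nremote, nall, mytid;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nworld);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent: spawn 2 children running this same binary. */
        printf("Parent process 0: I create 2 slave processes\n");
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);
        MPI_Comm_size(children, &nlocal);         /* local group: parent    */
        MPI_Comm_remote_size(children, &nremote); /* remote group: children */
        printf("COMM_CHILD_PROCESSES ntasks_local: %d ntasks_remote: %d\n",
               nlocal, nremote);
        MPI_Intercomm_merge(children, 0, &all);   /* parent ranks first     */
    } else {
        MPI_Intercomm_merge(parent, 1, &all);     /* child ranks after      */
    }

    MPI_Comm_size(all, &nall);
    MPI_Comm_rank(all, &mytid);
    printf("MPI_COMM_WORLD ntasks: %d COMM_ALL_PROCESSES ntasks: %d "
           "mytid in COMM_ALL_PROCESSES: %d\n", nworld, nall, mytid);

    MPI_Comm_free(&all);
    MPI_Finalize();
    return 0;
}

Built with mpicc and launched as in the logs (mpiexec -np 1 --host loki,nfs1,nfs2 ./spawn_test), this should reproduce the communicator sizes shown above; whether it also triggers the intermittent opal_list_remove_item warning is the open question.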




