Kawashima-san,

I'd rather consider this a bug in the README (!)


Heterogeneous support was broken for some time, but it was eventually
fixed.

The truth is that there are *very* limited resources (both human and
hardware) maintaining heterogeneous support, but that does not mean
heterogeneous support should not be used, nor that bug reports will be
ignored.
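
For what it is worth, "ompi_info | grep -i hetero" should tell you
whether heterogeneous support was compiled in. And since
MPI_Init/MPI_Finalize alone does not move any user data, a test that
actually sends something between ranks exercises the endian conversion
path much more. Here is a minimal sketch (hypothetical, untested here),
in the spirit of the reported test program:

/* Minimal heterogeneous smoke test (hypothetical sketch, not from the
 * original report): rank 0 sends a recognizable byte pattern to every
 * other rank; on a big/little-endian mix, a wrong value printed by a
 * receiver would point at a conversion problem. */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char *argv[])
{
  int rank, size, i, value = 0;

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Comm_size (MPI_COMM_WORLD, &size);

  if (rank == 0) {
    value = 0x12345678;   /* easy to spot if the bytes arrive swapped */
    for (i = 1; i < size; i++) {
      MPI_Send (&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    }
  } else {
    MPI_Recv (&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
              MPI_STATUS_IGNORE);
  }
  printf ("rank %d: value = 0x%08x\n", rank, value);

  MPI_Finalize ();
  return EXIT_SUCCESS;
}

It can be launched the same way, e.g.
"mpiexec -np 6 --host sunpc1,linpc1,tyr ./hetero_test"
(the binary name is just an example).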

Cheers,

Gilles

On 2014/12/24 9:26, Kawashima, Takahiro wrote:
> Hi Siegmar,
>
> Heterogeneous environments are not officially supported.
>
> The README of Open MPI master says:
>
> --enable-heterogeneous
>   Enable support for running on heterogeneous clusters (e.g., machines
>   with different endian representations).  Heterogeneous support is
>   disabled by default because it imposes a minor performance penalty.
>
>   *** THIS FUNCTIONALITY IS CURRENTLY BROKEN - DO NOT USE ***
>
>> Hi,
>>
>> Today I installed openmpi-dev-602-g82c02b4 on my machines (Solaris 10 Sparc,
>> Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and the
>> new Solaris Studio 12.4 compilers. All build processes finished without
>> errors, but I have a problem running a very small program. It works for
>> three processes but hangs for six processes. I have the same behaviour
>> for both compilers.
>>
>> tyr small_prog 139 time; mpiexec -np 3 --host sunpc1,linpc1,tyr 
>> init_finalize; time
>> 827.161u 210.126s 30:51.08 56.0%        0+0k 4151+20io 2898pf+0w
>> Hello!
>> Hello!
>> Hello!
>> 827.886u 210.335s 30:54.68 55.9%        0+0k 4151+20io 2898pf+0w
>> tyr small_prog 140 time; mpiexec -np 6 --host sunpc1,linpc1,tyr 
>> init_finalize; time
>> 827.946u 210.370s 31:15.02 55.3%        0+0k 4151+20io 2898pf+0w
>> ^CKilled by signal 2.
>> Killed by signal 2.
>> 869.242u 221.644s 33:40.54 53.9%        0+0k 4151+20io 2898pf+0w
>> tyr small_prog 141 
>>
>> tyr small_prog 145 ompi_info | grep -e "Open MPI repo revision:" -e "C 
>> compiler:"
>>   Open MPI repo revision: dev-602-g82c02b4
>>               C compiler: cc
>> tyr small_prog 146 
>>
>>
>> tyr small_prog 146 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
>> GNU gdb (GDB) 7.6.1
>> ...
>> (gdb) run -np 3 --host sunpc1,linpc1,tyr init_finalize
>> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 3 --host 
>> sunpc1,linpc1,tyr 
>> init_finalize
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>> [New LWP    2        ]
>> Hello!
>> Hello!
>> Hello!
>> [LWP    2         exited]
>> [New Thread 2        ]
>> [Switching to Thread 1 (LWP 1)]
>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to 
>> satisfy query
>> (gdb) run -np 6 --host sunpc1,linpc1,tyr init_finalize
>> The program being debugged has been started already.
>> Start it from the beginning? (y or n) y
>>
>> Starting program: /usr/local/openmpi-1.9.0_64_cc/bin/mpiexec -np 6 --host 
>> sunpc1,linpc1,tyr 
>> init_finalize
>> [Thread debugging using libthread_db enabled]
>> [New Thread 1 (LWP 1)]
>> [New LWP    2        ]
>> ^CKilled by signal 2.
>> Killed by signal 2.
>>
>> Program received signal SIGINT, Interrupt.
>> [Switching to Thread 1 (LWP 1)]
>> 0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
>> (gdb) bt
>> #0  0xffffffff7d1dc6b0 in __pollsys () from /lib/sparcv9/libc.so.1
>> #1  0xffffffff7d1cb468 in _pollsys () from /lib/sparcv9/libc.so.1
>> #2  0xffffffff7d170ed8 in poll () from /lib/sparcv9/libc.so.1
>> #3  0xffffffff7e69a630 in poll_dispatch ()
>>    from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
>> #4  0xffffffff7e6894ec in opal_libevent2021_event_base_loop ()
>>    from /usr/local/openmpi-1.9.0_64_cc/lib64/libopen-pal.so.0
>> #5  0x000000010000eb14 in orterun (argc=1757447168, argv=0xffffff7ed8550cff)
>>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/orterun.c:1090
>> #6  0x0000000100004e2c in main (argc=256, argv=0xffffff7ed8af5c00)
>>     at ../../../../openmpi-dev-602-g82c02b4/orte/tools/orterun/main.c:13
>> (gdb) 
>>
>> Any ideas? Unfortunately I'm leaving for vacation, so I cannot test
>> any patches until the end of the year. Nevertheless I wanted to report
>> the problem. At the moment I cannot test whether I see the same
>> behaviour in a homogeneous environment with three machines, because
>> the new version won't be available on the other machines before
>> tomorrow. I used the following configure command.
>>
>> ../openmpi-dev-602-g82c02b4/configure 
>> --prefix=/usr/local/openmpi-1.9.0_64_cc \
>>   --libdir=/usr/local/openmpi-1.9.0_64_cc/lib64 \
>>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
>>   JAVA_HOME=/usr/local/jdk1.8.0 \
>>   LDFLAGS="-m64 -mt" \
>>   CC="cc" CXX="CC" FC="f95" \
>>   CFLAGS="-m64 -mt" CXXFLAGS="-m64 -library=stlport4" FCFLAGS="-m64" \
>>   CPP="cpp" CXXCPP="cpp" \
>>   CPPFLAGS="" CXXCPPFLAGS="" \
>>   --enable-mpi-cxx \
>>   --enable-cxx-exceptions \
>>   --enable-mpi-java \
>>   --enable-heterogeneous \
>>   --enable-mpi-thread-multiple \
>>   --with-threads=posix \
>>   --with-hwloc=internal \
>>   --without-verbs \
>>   --with-wrapper-cflags="-m64 -mt" \
>>   --with-wrapper-cxxflags="-m64 -library=stlport4" \
>>   --with-wrapper-ldflags="-mt" \
>>   --enable-debug \
>>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
>>
>> Furthermore, I used the following test program.
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include "mpi.h"
>>
>> int main (int argc, char *argv[])
>> {
>>   MPI_Init (&argc, &argv);
>>   printf ("Hello!\n");
>>   MPI_Finalize ();
>>   return EXIT_SUCCESS;
>> }
