Dear All,

As already mentioned on this mailing list, I found that a simple Hello_world program does not necessarily terminate: it just hangs after MPI_Finalize(). I can printf the value reported by MPI_Finalized(), which confirms that the MPI part of the code has finished, but the final exit() or return never completes.
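In case it helps, here is roughly what my test program looks like (a minimal sketch; my actual source may differ in details):

############################################
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    int finalized = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello world from rank %d\n", rank);

    MPI_Finalize();

    /* This is printed and reports 1, so MPI itself has finished... */
    MPI_Finalized(&finalized);
    printf("MPI_Finalized: %d\n", finalized);

    return 0;  /* ...but the process hangs here, never back to the shell */
}
############################################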
So I tried MPI_Abort() instead, and observed two different behaviors (a description of the architecture is given below). Either the run ends with a segfault, or the application does not return to the shell even though the string "MPI_ABORT was [...] here)." appears on screen (the program just hangs, exactly as with MPI_Finalize()). This is annoying because I need several executions in one batch script: separate submissions cost a lot of time in the queues. So if you have any tip to work around the hang, I will gladly take it (even if it means recompiling Open MPI with specific options, of course).

Thank you!

.Yves.

Here is an example of the output produced on screen. Note that the errorcode is the rank of the process which called MPI_Abort().

############################################
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec has exited due to process rank 0 with PID 18062 on
node ha8000-1 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
--------------------------------------------------------------------------
[ha8000-1:18060] *** Process received signal ***
[ha8000-1:18060] Signal: Segmentation fault (11)
[ha8000-1:18060] Signal code: Address not mapped (1)
[ha8000-1:18060] Failing at address: 0x2aaaac1bd940
Segmentation fault
############################################
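The variant that produced the output above is essentially this (again a sketch, not my exact source; note the rank passed as the errorcode):

############################################
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello world from rank %d\n", rank);

    /* Abort instead of finalizing; the errorcode is the caller's rank,
       hence "errorcode 0" in the output above. */
    MPI_Abort(MPI_COMM_WORLD, rank);

    return 0;  /* should never be reached */
}
############################################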
The architecture is a Quad-Core AMD Opteron(tm) Processor 8356 with a MYRICOM Inc. Myri-10G Dual-Protocol NIC (10G-PCIE-8A) Ethernet controller. The version of Open MPI is 1.4.2, compiled with GCC 4.5:

$>ompi_info
Package: Open MPI p10015@ha8000-1 Distribution
Open MPI: 1.4.2
Open MPI SVN revision: r23093
Open MPI release date: May 04, 2010
Open RTE: 1.4.2
Open RTE SVN revision: r23093
Open RTE release date: May 04, 2010
OPAL: 1.4.2
OPAL SVN revision: r23093
OPAL release date: May 04, 2010
Ident string: 1.4.2
Prefix: /home/p10015/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: ha8000-1
Configured by: p10015
Configured on: Wed May 19 19:01:19 JST 2010
Configure host: ha8000-1
Built by: p10015
Built on: Wed May 19 21:03:33 JST 2010
Built host: ha8000-1
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-gcc-4.5.0
C compiler absolute:
C++ compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-g++
C++ compiler absolute:
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: yes, progress: yes)
Sparse Groups: yes
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: yes
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol visibility support: yes
FT Checkpoint support: no (checkpoint thread: no)
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.2)
MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.2)
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.2)
MCA carto: file (MCA v2.0, API v2.0, Component v1.4.2)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.2)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.2)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.2)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.2)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.2)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.2)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.2)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: self (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.2)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.2)
MCA io: romio (MCA v2.0, API v2.0, Component v1.4.2)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.2)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.2)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.2)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.2)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.2)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.2)
MCA pml: v (MCA v2.0, API v2.0, Component v1.4.2)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.2)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.2)
MCA btl: self (MCA v2.0, API v2.0, Component v1.4.2)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.2)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.2)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.2)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.2)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.2)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.2)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.2)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.2)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.2)
MCA odls: default (MCA v2.0, API v2.0, Component v1.4.2)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4.2)
MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4.2)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4.2)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.2)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.2)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.2)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.2)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.4.2)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.2)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4.2)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4.2)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4.2)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: env (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4.2)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.4.2)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4.2)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4.2)

--
Yves Caniou
Associate Professor at Université Lyon 1,
Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
* in Information Technology Center, The University of Tokyo,
  2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
  tel: +81-3-5841-0540
* in National Institute of Informatics
  2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
  tel: +81-3-4212-2412
http://graal.ens-lyon.fr/~ycaniou/