Weird - it works fine for me:

sjc-vpn5-109:mpi rhc$ mpirun -n 3 ./abort
Hello, World, I am 1 of 3
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 22980 on
node sjc-vpn5-109.cisco.com exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
Hello, World, I am 0 of 3
Hello, World, I am 2 of 3

I built it with gcc 4.2.1, though - I know we have a problem with shared
memory hanging when built with gcc 4.4.x, so I wonder if the issue here is
your use of gcc 4.5. Can you try running this again with -mca btl ^sm?
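For reference, a minimal abort test consistent with the output above would
look something like this (a sketch only - the actual ./abort source wasn't
posted, so the details are illustrative):

  #include <stdio.h>
  #include <mpi.h>

  /* Each rank prints a hello line; rank 1 then calls MPI_Abort()
     with errorcode 2, which should kill the whole job. */
  int main(int argc, char **argv)
  {
      int rank, size;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      printf("Hello, World, I am %d of %d\n", rank, size);

      if (rank == 1) {
          MPI_Abort(MPI_COMM_WORLD, 2);  /* does not return */
      }

      MPI_Finalize();
      return 0;
  }

And the suggested rerun, disabling the shared-memory BTL, would be:

  mpirun -mca btl ^sm -n 3 ./abort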
On Wed, Jun 2, 2010 at 3:49 AM, Yves Caniou <yves.can...@ens-lyon.fr> wrote:
> Dear All,
>
> As already said on this mailing list, I found that a simple Hello_world
> program does not necessarily end: the program just hangs after
> MPI_Finalize(), and I can printf MPI_FINALIZED, which confirms that the
> MPI part of the code has finished, but the exit() or return() never
> completes.
>
> So I tried to use MPI_Abort(), and observed two different behaviors
> (a description of the architecture is given below).
> Either it ends with a segfault, or the application doesn't return to the
> shell, even though the string "MPI_ABORT was [...] here)." appears on
> screen (the program just hangs, as with MPI_Finalize()).
>
> This is annoying since I need several executions in a batch script,
> because separate submissions cost a lot of time in queues. So if you have
> any tips to bypass the hanging of the application, I'll take them (even
> if it means recompiling Open MPI with specific options, of course).
>
> Thank you!
>
> .Yves.
>
> Here is an example of the output produced on screen. Note that the
> errorcode is the rank of the process which called MPI_Abort().
>
> ############################################
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 0.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec has exited due to process rank 0 with PID 18062 on
> node ha8000-1 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpiexec (as reported here).
> --------------------------------------------------------------------------
> [ha8000-1:18060] *** Process received signal ***
> [ha8000-1:18060] Signal: Segmentation fault (11)
> [ha8000-1:18060] Signal code: Address not mapped (1)
> [ha8000-1:18060] Failing at address: 0x2aaaac1bd940
> Segmentation fault
> ############################################
>
> The architecture is a Quad-Core AMD Opteron(tm) Processor 8356; the
> Ethernet controller is a MYRICOM Inc.
> Myri-10G Dual-Protocol NIC (10G-PCIE-8A). The version of Open MPI is
> 1.4.2, and it has been compiled with GCC 4.5.
>
> $> ompi_info
> Package: Open MPI p10015@ha8000-1 Distribution
> Open MPI: 1.4.2
> Open MPI SVN revision: r23093
> Open MPI release date: May 04, 2010
> Open RTE: 1.4.2
> Open RTE SVN revision: r23093
> Open RTE release date: May 04, 2010
> OPAL: 1.4.2
> OPAL SVN revision: r23093
> OPAL release date: May 04, 2010
> Ident string: 1.4.2
> Prefix: /home/p10015/openmpi
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: ha8000-1
> Configured by: p10015
> Configured on: Wed May 19 19:01:19 JST 2010
> Configure host: ha8000-1
> Built by: p10015
> Built on: Wed May 19 21:03:33 JST 2010
> Built host: ha8000-1
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-gcc-4.5.0
> C compiler absolute:
> C++ compiler: /home/p10015/gcc/bin/x86_64-unknown-linux-gnu-g++
> C++ compiler absolute:
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: yes, progress: yes)
> Sparse Groups: yes
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: yes
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.2)
> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.2)
> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.2)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.4.2)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)
> MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.2)
> MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.2)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.2)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.2)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.2)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.2)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.2)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.2)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.4.2)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.2)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.2)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.2)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.4.2)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.2)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.4.2)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.2)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.2)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.2)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.2)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.2)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.2)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.2)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.2)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.2)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.2)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.2)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.4.2)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.2)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4.2)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4.2)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4.2)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.4.2)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4.2)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4.2)
>
> --
> Yves Caniou
> Associate Professor at Université Lyon 1,
> Member of the team project INRIA GRAAL in the LIP ENS-Lyon,
> Délégation CNRS in Japan French Laboratory of Informatics (JFLI),
> * in Information Technology Center, The University of Tokyo,
>   2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8658, Japan
>   tel: +81-3-5841-0540
> * in National Institute of Informatics
>   2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
>   tel: +81-3-4212-2412
> http://graal.ens-lyon.fr/~ycaniou/
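For completeness, the hello-world-plus-MPI_Finalized check that Yves
describes in the quoted message would look roughly like this (a sketch
only - his actual program wasn't posted, so the details are assumptions):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, size, finalized = 0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("Hello, World, I am %d of %d\n", rank, size);

      MPI_Finalize();

      /* MPI_Finalized() is one of the few MPI calls that is legal
         after MPI_Finalize(); it sets the flag once finalization
         has completed. */
      MPI_Finalized(&finalized);
      printf("MPI finalized: %d\n", finalized);

      /* The reported hang is here: the MPI part is confirmed done,
         but the return (or an exit()) never completes. */
      return 0;
  }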