Are the procs still alive? Is this on a single node?

> On Jun 30, 2016, at 8:49 AM, Orion Poplawski <or...@cora.nwra.com> wrote:
> 
> I'm seeing hangs when MPI_Abort is called.  This is with Open MPI 1.10.3, e.g.:
> 
> program output:
> 
> Testing  -- big dataset test (bigdset)
> Proc 3: *** Parallel ERROR ***
>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
> aborting MPI processes
> Testing  -- big dataset test (bigdset)
> Proc 0: *** Parallel ERROR ***
>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
> aborting MPI processes
> Testing  -- big dataset test (bigdset)
> Proc 2: *** Parallel ERROR ***
>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
> with errorcode 1.
> 
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> Testing  -- big dataset test (bigdset)
> Proc 5: *** Parallel ERROR ***
>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
> aborting MPI processes
> aborting MPI processes
> Testing  -- big dataset test (bigdset)
> Proc 1: *** Parallel ERROR ***
>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
> aborting MPI processes
> Testing  -- big dataset test (bigdset)
> Proc 4: *** Parallel ERROR ***
>    VRFY (sizeof(MPI_Offset)>4) failed at line  479 in ../../testpar/t_mdset.c
> aborting MPI processes
> 
> 
> strace of mpiexec process:
> 
> poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=14, events=POLLIN}], 4, -1
> 
> mpiexec 21511 orion    1w      REG        8,3    10547 17696145 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/builddir/build/BUILD/hdf5-1.8.17/openmpi/testpar/testphdf5.chklog
> mpiexec 21511 orion    2w      REG        8,3    10547 17696145 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/builddir/build/BUILD/hdf5-1.8.17/openmpi/testpar/testphdf5.chklog
> mpiexec 21511 orion    3u     unix 0xdaedbc80      0t0  4818918 type=STREAM
> mpiexec 21511 orion    4u     unix 0xdaed8000      0t0  4818919 type=STREAM
> mpiexec 21511 orion    5u  a_inode       0,11        0     8731 [eventfd]
> mpiexec 21511 orion    6u      REG       0,38        0  4818921 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/dev/shm/open_mpi.0000 (deleted)
> mpiexec 21511 orion    7r     FIFO       0,10      0t0  4818922 pipe
> mpiexec 21511 orion    8w     FIFO       0,10      0t0  4818922 pipe
> mpiexec 21511 orion    9r      DIR        8,3     4096 15471703 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root
> mpiexec 21511 orion   10r      DIR       0,16        0       82 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/sys/firmware/devicetree/base/cpus
> mpiexec 21511 orion   11u     IPv4    4818926      0t0      TCP *:39619 (LISTEN)
> mpiexec 21511 orion   12r     FIFO       0,10      0t0  4818927 pipe
> mpiexec 21511 orion   13w     FIFO       0,10      0t0  4818927 pipe
> mpiexec 21511 orion   14r     FIFO        8,3      0t0 17965730 /var/lib/mock/fedora-rawhide-armhfp--orion-hdf5/root/tmp/openmpi-sessions-mockbuild@arm03-packager00_0/46622/0/debugger_attach_fifo
> 
> Any suggestions on what to look for?  FWIW, it was a 6-process run on a
> 4-core machine.
> 
> Thanks.
> 
> -- 
> Orion Poplawski
> Technical Manager                     303-415-9701 x222
> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
> 3380 Mitchell Lane                       or...@nwra.com
> Boulder, CO 80301                   http://www.nwra.com
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/06/29573.php
