Hello, yes, I already tried the 2.0.x git branch with the original problem. It now dies quite noisily:
forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
vasp-mpi-sca       00000000040DD64D  Unknown            Unknown     Unknown
...
mpirun has exited due to process rank 0 with PID 0 on node node109
exiting improperly. There are three reasons this could occur:
...

but apparently does not hang any more. Thanks to everyone involved for
fixing this!

Best Regards

Christof

On Mon, Dec 12, 2016 at 12:00:01PM -0700, users-requ...@lists.open-mpi.org wrote:
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 12 Dec 2016 09:32:25 +0900
> From: Gilles Gouaillardet <gil...@rist.or.jp>
> To: users@lists.open-mpi.org
> Subject: Re: [OMPI users] Abort/ Deadlock issue in allreduce
>
> Christof,
>
> Ralph fixed the issue;
> meanwhile, the patch can be manually downloaded at
> https://patch-diff.githubusercontent.com/raw/open-mpi/ompi/pull/2552.patch
>
> Cheers,
>
> Gilles
>
> On 12/9/2016 5:39 PM, Christof Koehler wrote:
> > Hello,
> >
> > In our case, libwannier.a is a "third party" library which is built
> > separately and then just linked in, so the vasp preprocessor never
> > touches it. As far as I can see, no preprocessing of the f90 source
> > is involved in the libwannier build process.
> >
> > I finally managed to set a breakpoint at the program exit of the root
> > rank:
> >
> > (gdb) bt
> > #0  0x00002b7ccd2e4220 in _exit () from /lib64/libc.so.6
> > #1  0x00002b7ccd25ee2b in __run_exit_handlers () from /lib64/libc.so.6
> > #2  0x00002b7ccd25eeb5 in exit () from /lib64/libc.so.6
> > #3  0x000000000407298d in for_stop_core ()
> > #4  0x00000000012fad41 in w90_io_mp_io_error_ ()
> > #5  0x0000000001302147 in w90_parameters_mp_param_read_ ()
> > #6  0x00000000012f49c6 in wannier_setup_ ()
> > #7  0x0000000000e166a8 in mlwf_mp_mlwf_wannier90_ ()
> > #8  0x00000000004319ff in vamp () at main.F:2640
> > #9  0x000000000040d21e in main ()
> > #10 0x00002b7ccd247b15 in __libc_start_main () from /lib64/libc.so.6
> > #11 0x000000000040d129 in _start ()
> >
> > So for_stop_core is apparently called? Of course it is below the
> > main() process of vasp, so additional things might happen which are
> > not visible. Is SIGCHLD (as observed when catching signals in mpirun)
> > the signal expected after a for_stop_core?
> >
> > Thank you very much for investigating this!
> >
> > Cheers
> >
> > Christof
> >
> > On Thu, Dec 08, 2016 at 03:15:47PM -0500, Noam Bernstein wrote:
> >>> On Dec 8, 2016, at 6:05 AM, Gilles Gouaillardet
> >>> <gilles.gouaillar...@gmail.com> wrote:
> >>>
> >>> Christof,
> >>>
> >>> There is something really odd with this stack trace:
> >>> count is zero, and some pointers do not point to valid addresses (!).
> >>>
> >>> In OpenMPI, MPI_Allreduce(..., count=0, ...) is a no-op, so that
> >>> suggests that the stack has been corrupted inside MPI_Allreduce(),
> >>> or that you are not using the library you think you use;
> >>> pmap <pid> will show you which lib is used.
> >>>
> >>> btw, this was not started with
> >>>     mpirun --mca coll ^tuned ...
> >>> right?
> >>>
> >>> Just to make it clear ...
> >>> a task from your program bluntly issues a Fortran STOP, and this is
> >>> kind of a feature.
> >>> The *only* issue is that mpirun does not kill the other MPI tasks
> >>> and mpirun never completes.
> >>> Did I get it right?
> >> I just ran across very similar behavior in VASP (which we just
> >> switched over to openmpi 2.0.1), also in an allreduce + STOP
> >> combination (some nodes call one, others call the other), and I
> >> discovered several interesting things.
> >>
> >> The most important is that when MPI is active, the preprocessor
> >> converts (via a #define in symbol.inc) fortran STOP into calls to
> >> m_exit() (defined in mpi.F), which is a wrapper around mpi_finalize.
> >> So in my case some processes in the communicator call mpi_finalize,
> >> others call mpi_allreduce. I'm not really surprised this hangs,
> >> because I think the correct thing to replace STOP with is mpi_abort,
> >> not mpi_finalize. If you know where the STOP is called, you can check
> >> the preprocessed equivalent file (.f90 instead of .F), and see if
> >> it's actually been replaced with a call to m_exit. I'm planning to
> >> test whether replacing m_exit with m_stop in symbol.inc gives more
> >> sensible behavior, i.e. program termination when the original source
> >> file executes a STOP.
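> >>
> >> To illustrate the mechanism (a sketch of the idea only, not the
> >> literal contents of symbol.inc or mpi.F): a source line such as
> >>
> >>       IF (IERR /= 0) STOP
> >>
> >> ends up in the preprocessed .f90 as a call to a wrapper that is
> >> essentially
> >>
> >>       SUBROUTINE m_exit()
> >>          USE mpi
> >>          INTEGER :: ierror
> >>          ! finalizes only this rank; ranks still sitting in a
> >>          ! collective never learn about it, hence the hang
> >>          CALL MPI_FINALIZE(ierror)
> >>          STOP
> >>       END SUBROUTINE m_exit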
> >>
> >> I'm assuming that a mix of mpi_allreduce and mpi_finalize is really
> >> expected to hang, but just in case that's surprising, here are my
> >> stack traces:
> >>
> >> hung in collective:
> >>
> >> (gdb) where
> >> #0  0x00002b8d5a095ec6 in opal_progress () from
> >>     /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libopen-pal.so.20
> >> #1  0x00002b8d59b3a36d in ompi_request_default_wait_all () from
> >>     /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi.so.20
> >> #2  0x00002b8d59b8107c in ompi_coll_base_allreduce_intra_recursivedoubling ()
> >>     from /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi.so.20
> >> #3  0x00002b8d59b495ac in PMPI_Allreduce () from
> >>     /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi.so.20
> >> #4  0x00002b8d598e4027 in pmpi_allreduce__ () from
> >>     /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi_mpifh.so.20
> >> #5  0x0000000000414077 in m_sum_i (comm=..., ivec=warning: Range for
> >>     type (null) has invalid bounds 1..-12884901892
> >>     ..., n=2) at mpi.F:989
> >> #6  0x0000000000daac54 in full_kpoints::set_indpw_full (grid=...,
> >>     wdes=..., kpoints_f=...) at mkpoints_full.F:1099
> >> #7  0x0000000001441654 in set_indpw_fock (t_info=..., p=warning: Range
> >>     for type (null) has invalid bounds 1..-1
> >>     ..., wdes=..., grid=..., latt_cur=..., lmdim=Cannot access memory
> >>     at address 0x1) at fock.F:1669
> >> #8  fock::setup_fock (t_info=..., p=warning: Range for type (null) has
> >>     invalid bounds 1..-1
> >>     ..., wdes=..., grid=..., latt_cur=..., lmdim=Cannot access memory
> >>     at address 0x1) at fock.F:1413
> >> #9  0x0000000002976478 in vamp () at main.F:2093
> >> #10 0x0000000000412f9e in main ()
> >> #11 0x000000383a41ed1d in __libc_start_main () from /lib64/libc.so.6
> >> #12 0x0000000000412ea9 in _start ()
> >>
> >> hung in mpi_finalize:
> >>
> >> #0  0x000000383a4acbdd in nanosleep () from /lib64/libc.so.6
> >> #1  0x000000383a4e1d94 in usleep () from /lib64/libc.so.6
> >> #2  0x00002b11db1e0ae7 in ompi_mpi_finalize () from
> >>     /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi.so.20
> >> #3  0x00002b11daf8b399 in pmpi_finalize__ () from
> >>     /usr/local/openmpi/2.0.1/x86_64/ib/intel/12.1.6/lib/libmpi_mpifh.so.20
> >> #4  0x00000000004199c5 in m_exit () at mpi.F:375
> >> #5  0x0000000000dab17f in full_kpoints::set_indpw_full (grid=...,
> >>     wdes=Cannot resolve DW_OP_push_object_address for a missing object
> >>     ) at mkpoints_full.F:1065
> >> #6  0x0000000001441654 in set_indpw_fock (t_info=..., p=Cannot resolve
> >>     DW_OP_push_object_address for a missing object) at fock.F:1669
> >> #7  fock::setup_fock (t_info=..., p=Cannot resolve
> >>     DW_OP_push_object_address for a missing object) at fock.F:1413
> >> #8  0x0000000002976478 in vamp () at main.F:2093
> >> #9  0x0000000000412f9e in main ()
> >> #10 0x000000383a41ed1d in __libc_start_main () from /lib64/libc.so.6
> >> #11 0x0000000000412ea9 in _start ()
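> >>
> >> For what it's worth, the mismatch is easy to reproduce outside VASP.
> >> A toy program along these lines (my own sketch, not VASP code) hangs
> >> the same way when run on two or more ranks:
> >>
> >>       PROGRAM finalize_vs_allreduce
> >>          USE mpi
> >>          IMPLICIT NONE
> >>          INTEGER :: rank, ierr, s, r
> >>          CALL MPI_INIT(ierr)
> >>          CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
> >>          IF (rank == 0) THEN
> >>             ! what the preprocessed STOP effectively does on one rank
> >>             CALL MPI_FINALIZE(ierr)
> >>             STOP
> >>          END IF
> >>          ! the remaining ranks wait here forever
> >>          s = 1
> >>          CALL MPI_ALLREDUCE(s, r, 1, MPI_INTEGER, MPI_SUM, &
> >>                             MPI_COMM_WORLD, ierr)
> >>          CALL MPI_FINALIZE(ierr)
> >>       END PROGRAM finalize_vs_allreduce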
> >>
> >> ____________
> >> ||
> >> |U.S. NAVAL|
> >> |_RESEARCH_|
> >> LABORATORY
> >>
> >> Noam Bernstein, Ph.D.
> >> Center for Materials Physics and Technology
> >> U.S. Naval Research Laboratory
> >> T +1 202 404 8628  F +1 202 404 7546
> >> https://www.nrl.navy.mil
>
> ------------------------------
>
> Message: 2
> Date: Mon, 12 Dec 2016 14:24:16 +0000
> From: Dave Love <d.l...@liverpool.ac.uk>
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] How to yield CPU more when not computing
>     (was curious behavior during wait for broadcast: 100% cpu)
>
> Andreas Schäfer <gent...@gmx.de> writes:
>
> >> Yes, as root, and there are N different systems to at least provide
> >> unprivileged read access on HPC systems, but that's a bit different,
> >> I think.
> >
> > LIKWID[1] uses a daemon to provide limited RW access to MSRs for
> > applications. I wouldn't wonder if support for this was added to
> > LIKWID by RRZE.
>
> Yes, that's one of the N I had in mind; others provide Linux modules.
>
> From a system manager's point of view it is not clear what the
> implications of the unprivileged access are, or even how much it really
> helps. I've seen enough setups suggested for HPC systems in areas I
> understand (and used by vendors) which allow privilege escalation more
> or less trivially, maybe without any real operational advantage. If
> it's clearly safe and helpful then great, but I couldn't assess that.

--
Dr. rer. nat. Christof Köhler       email: c.koeh...@bccms.uni-bremen.de
Universitaet Bremen/ BCCMS          phone: +49-(0)421-218-62334
Am Fallturm 1/ TAB/ Raum 3.12       fax:   +49-(0)421-218-62770
28359 Bremen
PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/