Ralph,

Thanks for the advice.  I have to set 'coll_sync_barrier_before=5' to get
the job to run.  This is a big change from the default value (1000), so our
application seems to be a pretty extreme case.
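
For reference, one way to pass this is on the mpirun command line; with our
usual launch line (shown further down) that would look roughly like the
following sketch (the process count, binding flags, and executable name are
just our particular setup):

  mpirun --mca coll_sync_barrier_before 5 -np 36 -bycore -bind-to-core program.exe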

T. Rosmond


On Mon, 2011-11-14 at 16:17 -0700, Ralph Castain wrote:
> Yes, this is well documented - it may be on the FAQ, but it has certainly come 
> up on the user list multiple times.
> 
> The problem is that one process falls behind, which causes it to begin 
> accumulating "unexpected messages" in its queue. This causes the matching 
> logic to run a little slower, thus making the process fall further and 
> further behind. Eventually, things hang because everyone is sitting in bcast 
> waiting for the slow proc to catch up, but its queue is saturated and it 
> can't.
> 
> The solution is to do exactly what you describe - add some barriers to force 
> the slow process to catch up. This happened often enough that we even added support 
> for it in OMPI itself so you don't have to modify your code. Look at the 
> following from "ompi_info --param coll sync"
> 
>                 MCA coll: parameter "coll_base_verbose" (current value: <0>, data source: default value)
>                           Verbosity level for the coll framework (0 = no verbosity)
>                 MCA coll: parameter "coll_sync_priority" (current value: <50>, data source: default value)
>                           Priority of the sync coll component; only relevant if barrier_before or barrier_after is > 0
>                 MCA coll: parameter "coll_sync_barrier_before" (current value: <1000>, data source: default value)
>                           Do a synchronization before each Nth collective
>                 MCA coll: parameter "coll_sync_barrier_after" (current value: <0>, data source: default value)
>                           Do a synchronization after each Nth collective
> 
> Take your pick - inserting a barrier before or after doesn't seem to make a 
> lot of difference, but most people use "before". Try different values until 
> you get something that works for you.
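> 
> You can set the parameter on the mpirun command line or via the matching 
> environment variable, so there is no need to touch the code or recompile. 
> Something like the following (the value of 100 is just an illustration, and 
> the process count and executable name are placeholders):
> 
>   mpirun --mca coll_sync_barrier_before 100 -np <nprocs> ./your_app
> 
>   # or, equivalently, in the environment before launching:
>   export OMPI_MCA_coll_sync_barrier_before=100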
> 
>  
> On Nov 14, 2011, at 3:10 PM, Tom Rosmond wrote:
> 
> > Hello:
> > 
> > A colleague and I have been running a large F90 application that does an
> > enormous number of mpi_bcast calls during execution.  I deny any
> > responsibility for the design of the code and why it needs these calls,
> > but it is what we have inherited and have to work with.
> > 
> > Recently we ported the code to an 8-node, 6-processor/node NUMA system
> > (lstopo output attached) running Debian Linux 6.0.3 with Open MPI 1.5.3,
> > and began having trouble with mysterious 'hangs' in the program inside
> > the mpi_bcast calls.  The hangs were always in the same calls, but not
> > necessarily at the same time during integration.  We originally didn't
> > have NUMA support, so we reinstalled with libnuma support added, but the
> > problem persisted.  Finally, just as a wild guess, we inserted
> > 'mpi_barrier' calls just before the 'mpi_bcast' calls, and the program
> > now runs without problems.
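> > 
> > The change amounts to something like this at each broadcast site (a
> > simplified sketch; 'buf', 'n', and 'root' stand in for whatever arguments
> > each actual call uses in our code):
> > 
> >    call MPI_Barrier(MPI_COMM_WORLD, ierr)
> >    call MPI_Bcast(buf, n, MPI_REAL, root, MPI_COMM_WORLD, ierr)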
> > 
> > I believe conventional wisdom is that properly formulated MPI programs
> > should run correctly without barriers, so do you have any thoughts on
> > why we found it necessary to add them?  The code has run correctly on
> > other architectures, e.g. the Cray XE6, so I don't think there is a bug
> > anywhere.  My only explanation is that some internal resource gets
> > exhausted because of the large number of 'mpi_bcast' calls in rapid
> > succession, and the barrier calls force synchronization which allows the
> > resource to be restored.  Does this make sense?  I'd appreciate any
> > comments and advice you can provide.
> > 
> > 
> > I have attached compressed copies of config.log and ompi_info for the
> > system.  The program is built with ifort 12.0 and typically runs with 
> > 
> >  mpirun -np 36 -bycore -bind-to-core program.exe
> > 
> > We have run both interactively and with PBS, but that doesn't seem to
> > make any difference in program behavior.
> > 
> > T. Rosmond
> > 
> > 
> > <lstopo_out.txt><config.log.bz2><ompi_info.bz2>
> 
> 