Given that you could only reproduce it either with your custom compiler or by 
forcibly introducing a delay, could this indicate an issue with the custom 
compiler? It does seem strange that we don't see this anywhere else, given the 
number of times that code gets run.

The only alternative solution I can think of would be to push the finalize 
request into the event loop, and thus execute the loopbreak from within an 
event. You might try that and see if it solves the problem.


> On Jan 14, 2015, at 11:54 PM, Leonid <lchis...@pathscale.com> wrote:
> 
> Hi all.
> 
> I believe there is a bug in event_base_loop() function from file event.c 
> (opal/mca/event/libevent2022/libevent/).
> 
> Consider the case where the application is being finalized and 
> event_base_loop() and event_base_loopbreak() are called at the same time in 
> parallel threads.
> 
> Then, if event_base_loopbreak() happens to acquire the lock first, it will set 
> "event_base->event_break = 1" but won't send any signal to the event loop, 
> because the loop has not started yet.
> 
> After that, event_base_loop() will acquire the lock and clear the 
> event_break flag with the following statement: "base->event_gotterm = 
> base->event_break = 0;". It will then go into polling with timeout = -1 and 
> thus block forever.
> 
> This issue was reproduced with a custom compiler (using the Lulesh benchmark 
> on an x86 4-core PC), but I can also reproduce it with GCC (on almost any 
> benchmark and with the same hardware) by adding a delay to the 
> orte_progress_thread_engine() function:
> 
> static void* orte_progress_thread_engine(opal_object_t *obj)
> {
>    while (orte_event_base_active) {
>      // add sleep to allow orte_ess_base_app_finalize() to set the
>      // orte_event_base_active flag to false
>      usleep(1000);
>      opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
>    }
>    return OPAL_THREAD_CANCELLED;
> }
> 
> I am not completely sure what the best fix for the described problem would be.
> 
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/01/26181.php
