[OMPI users] libevent hangs on app finalize stage

Leonid Thu, 15 Jan 2015 02:54:47 -0500 (EST)

Hi all.

I believe there is a bug in the event_base_loop() function in event.c (opal/mca/event/libevent2022/libevent/).

Consider the case where the application is about to finalize and both event_base_loop() and event_base_loopbreak() are called at the same time in parallel threads.

If event_base_loopbreak() happens to acquire the lock first, it sets "event_base->event_break = 1" but does not send any signal to the event loop, because the loop has not started yet.

After that, event_base_loop() acquires the lock and clears the event_break flag with the following statement: "base->event_gotterm = base->event_break = 0;". It then goes into polling with timeout = -1 and thus blocks forever.
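To make the ordering concrete, here is a minimal single-threaded sketch that replays the two steps in the problematic order. It is only a toy model, not libevent code: struct model_base, model_loopbreak(), model_loop_enter() and the loop_running field are hypothetical names introduced for illustration, and only the flag reset is taken from the statement quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two event_base flags discussed above. */
struct model_base {
    int  event_gotterm;
    int  event_break;
    bool loop_running;   /* assumption: true once the loop has been entered */
};

/* Models event_base_loopbreak(): sets the flag and reports whether a
 * wakeup could reach a loop that is already running. */
static bool model_loopbreak(struct model_base *b)
{
    b->event_break = 1;
    return b->loop_running;
}

/* Models the entry of event_base_loop(): both flags are reset
 * unconditionally. Returns true if a break request is still pending. */
static bool model_loop_enter(struct model_base *b)
{
    b->loop_running = true;
    b->event_gotterm = b->event_break = 0;
    return b->event_break != 0;
}

/* Replays the problematic order: loopbreak first, then loop entry. */
static void model_demo(void)
{
    struct model_base b = { 0, 0, false };

    bool woke = model_loopbreak(&b);
    assert(!woke);               /* no running loop to wake up */
    assert(b.event_break == 1);  /* the request is recorded... */

    bool pending = model_loop_enter(&b);
    assert(!pending);            /* ...and wiped out on loop entry */
}
```

In this model the break request is gone by the time the loop would start polling, which matches the hang described above.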

This issue was reproduced with a custom compiler (using the Lulesh benchmark on a 4-core x86 PC), but I can also reproduce it with GCC (on almost any benchmark, with the same hardware) by adding a delay to the orte_progress_thread_engine() function:


static void* orte_progress_thread_engine(opal_object_t *obj)
{
    while (orte_event_base_active) {

        /* Added delay: gives orte_ess_base_app_finalize() time to set
         * orte_event_base_active to false before the loop is entered. */
        usleep(1000);

        opal_event_loop(orte_event_base, OPAL_EVLOOP_ONCE);
    }
    return OPAL_THREAD_CANCELLED;
}

I am not completely sure what the best fix for the described problem would be.
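One possible direction, shown only as a sketch on a toy model, would be for the loop entry to honor a break request that is already pending instead of clearing it. The names struct sketch_base, sketch_loopbreak() and sketch_loop_enter() are hypothetical and this is not actual libevent or Open MPI code. Note that it changes what a break request means when no loop is running, so it may not be acceptable as is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two event_base flags. */
struct sketch_base {
    int event_gotterm;
    int event_break;
};

/* Models event_base_loopbreak(): just records the request. */
static void sketch_loopbreak(struct sketch_base *b)
{
    b->event_break = 1;
}

/* Models a modified loop entry. Returns true if the loop should go on
 * to poll, false if it should return immediately. */
static bool sketch_loop_enter(struct sketch_base *b)
{
    if (b->event_break) {
        b->event_break = 0;   /* consume the pending request */
        return false;         /* and skip polling entirely */
    }
    b->event_gotterm = 0;
    return true;
}

/* Replays the order from the report: loopbreak first, then loop entry. */
static void sketch_demo(void)
{
    struct sketch_base b = { 0, 0 };

    sketch_loopbreak(&b);
    assert(!sketch_loop_enter(&b));  /* pending break: do not poll */
    assert(sketch_loop_enter(&b));   /* next entry proceeds normally */
}
```

With this change, the order "loopbreak first, loop second" would make the loop return right away in the model, and a later call would behave as usual.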
