Which spurious wake-ups ? What is the problem with pthread_cond_wait ?

Anyway, my point is that the patch is not correct. It will lead to deadlocks in nearly all the situations. Let's just look at the req_wait.c file. Here is the diff:

Modified: trunk/ompi/request/req_wait.c
===================================================================
--- trunk/ompi/request/req_wait.c       2006-01-12 04:05:02 UTC (rev 8681)
+++ trunk/ompi/request/req_wait.c       2006-01-12 17:13:08 UTC (rev 8682)
@@ -66,7 +66,10 @@
     /* give up and sleep until completion */
     OPAL_THREAD_LOCK(&ompi_request_lock);
     ompi_request_waiting++;
-    do {
+    /*
+ * We will break out of while{} as soon as all requests have completed.
+     */
+    while (1) {
         rptr = requests;
         num_requests_null_inactive = 0;
         for (i = 0; i < count; i++, rptr++) {
@@ -87,10 +90,10 @@
         }
         if(num_requests_null_inactive == count)
             break;
-        if (completed < 0) {
+        while (completed < 0) {
opal_condition_wait(&ompi_request_cond, &ompi_request_lock);
         }
-    } while (completed < 0);
+    }
     ompi_request_waiting--;
     OPAL_THREAD_UNLOCK(&ompi_request_lock);

Imagine one thread goes in the ompi_request_wait_any function. It set completed to -1 (initialization) then it test the status of the requests ... and let's suppose they are all actives. completed is still set to -1 in this case. Then the thread reach the for loop around the requests starting at line 72. As any of the requests are completed when it goes out of this loop completed is still set to -1. Now the thread will reach the
+        while (completed < 0) {
opal_condition_wait(&ompi_request_cond, &ompi_request_lock); As completed is a local variable the only thread that can modify it is the thread that will be in the opal_condition_wait. As now we look around the condition as long as the completed is less than zero ... it look like we will loop there forever ... Basically, the if that was there before allow the thread to redo the check every time it get out from the opal_condition_wait, giving a chance to modify the completed variable.

Similar behavior happens in all the patched places. Tell me if I'm wrong ... otherwise the patch should be rolled-back ASAP as we are not thread safe anymore ...

  Thanks,
    george.



On Jan 11, 2006, at 3:05 AM, Rainer Keller wrote:

Hello dear all,
I had a point on the tbd-list, that I would like to ask here:
 - Shouldn't we have a while-loop condition around every occurence
   of opal_condition_wait (spurious wake-ups)
   As we may do a pthread_cond_wait,
    e.g. in opal_free_list.h and OPAL_FREE_LIST_WAIT ?

Occurrences:
      ompi/class/ompi_free_list.h
      opal/class/opal_free_list.h
      ompi/request/req_wait.c          /* Two Occurences: not a
               must, but... */
      orte/mca/gpr/proxy/gpr_proxy_compound_cmd.c
      orte/mca/iof/base/iof_base_flush.c:108
      orte/mca/pls/rsh/pls_rsh_module.c:892

May I check in the patch attached below?

"Half of what I say is meaningless; but I say it so that the other half may reach you"
                                  Kahlil Gibran


Reply via email to