Dear David,

I took a look at all occurrences of PeriodicTask_lock. We are in luck:

PeriodicTask::enroll and PeriodicTask::disenroll check for safepoints when
acquiring PeriodicTask_lock.

PeriodicTask::time_to_wait and PeriodicTask::real_time_tick are called only
by the watcher thread.

WatcherThread::unpark is used only in contexts where PeriodicTask_lock is
owned by the caller.

WatcherThread::sleep is called only by the watcher thread.

I took another look at WatcherThread::sleep and realized that there is a
path from acquiring PeriodicTask_lock to waiting on the lock where
_should_terminate is not checked. I added that to the minimal fix attached.

Looking at these methods made me want to clean up a little more. See
better.patch attached.

I feel pretty good about the patch with the limited usage of
no_safepoint_check_flag with PeriodicTask_lock.

Carsten


On Tue, Feb 3, 2015 at 11:28 PM, David Holmes <david.hol...@oracle.com>
wrote:

> On 4/02/2015 1:28 PM, Carsten Varming wrote:
>
>> Dear David Holmes,
>>
>> Please see inline response,
>>
>> On Tue, Feb 3, 2015 at 9:38 PM, David Holmes <david.hol...@oracle.com
>> <mailto:david.hol...@oracle.com>> wrote:
>>
>>     On 4/02/2015 5:00 AM, Carsten Varming wrote:
>>
>>         Greetings all,
>>
>>         I was recently introduced to the deadlock described in
>>         JDK-8047720 and
>>         fixed in JDK9. The fix seems a little messy to me, and it looks
>>         like it
>>         left some very short races in the code. So I decided to clean up
>> the
>>         code. See attached diff.
>>
>>         The change takes a step back and acquires PeriodicTask_lock in
>>         WatcherThread::stop (like before the fix in JDK9), but this time
>>         safepoints are allowed to proceed when acquiring
>> PeriodicTask_lock,
>>         preventing the deadlock.
>>
>>
>>     It isn't obvious that blocking for a safepoint whilst in this method
>>     will always be a safe thing to do. That would need to be examined in
>>     detail - potential interactions can be very subtle.
>>
>>
>> Absolutely true. For reference, the pattern is repeated with the
>> Terminator_lock a few lines below. The pattern is also used in
>> Threads::destroy_vm before and after calling before_exit, and the java
>> shutdown hooks are called right before the call to before_exit. So there
>> is a lot of evidence that safepoints are allowed to happen in this
>> context.
>>
>
> The thread calling before_exit is a JavaThread so of course it
> participates in safepoints. The issue is whether the interaction between
> that thread and the WatcherThread, via the PeriodicTask_lock, can allow for
> the JavaThread blocking at a safepoint whilst owning that lock. If another
> JavaThread can try to lock it without a safepoint check then we can get a
> deadlock.
>
> Cheers,
> David
>
>
>  Thank you for taking your time to look at this,
>> Carsten
>>
>>
>>     David
>>
>>
>>         Let me know what you think,
>>         Carsten
>>
>>
>>
diff --git a/src/share/vm/runtime/task.cpp b/src/share/vm/runtime/task.cpp
--- a/src/share/vm/runtime/task.cpp
+++ b/src/share/vm/runtime/task.cpp
@@ -74,8 +74,7 @@
 }
 
 int PeriodicTask::time_to_wait() {
-  MutexLockerEx ml(PeriodicTask_lock->owned_by_self() ?
-                     NULL : PeriodicTask_lock, Mutex::_no_safepoint_check_flag);
+  assert(PeriodicTask_lock->owned_by_self(), "precondition");
 
   if (_num_tasks == 0) {
     return 0; // sleep until shutdown or a task is enrolled
diff --git a/src/share/vm/runtime/thread.cpp b/src/share/vm/runtime/thread.cpp
--- a/src/share/vm/runtime/thread.cpp
+++ b/src/share/vm/runtime/thread.cpp
@@ -1198,6 +1198,10 @@
 int WatcherThread::sleep() const {
   MutexLockerEx ml(PeriodicTask_lock, Mutex::_no_safepoint_check_flag);
 
+  if (_should_terminate) {
+    return 0; // we did not sleep.
+  }
+
   // remaining will be zero if there are no tasks,
   // causing the WatcherThread to sleep until a task is
   // enrolled
@@ -1252,7 +1256,7 @@
   this->initialize_thread_local_storage();
   this->set_native_thread_name(this->name());
   this->set_active_handles(JNIHandleBlock::allocate_block());
-  while (!_should_terminate) {
+  while (true) {
     assert(watcher_thread() == Thread::current(), "thread consistency check");
     assert(watcher_thread() == this, "thread consistency check");
 
@@ -1288,6 +1292,10 @@
       }
     }
 
+    if (_should_terminate) {
+      break;
+    }
+
     PeriodicTask::real_time_tick(time_waited);
   }
 
@@ -1318,24 +1326,20 @@
 }
 
 void WatcherThread::stop() {
-  // Get the PeriodicTask_lock if we can. If we cannot, then the
-  // WatcherThread is using it and we don't want to block on that lock
-  // here because that might cause a safepoint deadlock depending on
-  // what the current WatcherThread tasks are doing.
-  bool have_lock = PeriodicTask_lock->try_lock();
-
   _should_terminate = true;
-  OrderAccess::fence();  // ensure WatcherThread sees update in main loop
-
-  if (have_lock) {
+  {
+    // Let safepoints happen to avoid the deadlock:
+    // The VM thread has Threads_lock and waits for Java threads to report back for a safepoint.
+    // The watcher thread has PeriodicTask_lock and tries to acquire Threads_lock.
+    // A Java thread executes WatcherThread::stop, tries to acquire PeriodicTask_lock,
+    // and holds up the VM thread by not reaching a safepoint.
+    MutexLocker mu(PeriodicTask_lock);
+
     WatcherThread* watcher = watcher_thread();
     if (watcher != NULL) {
-      // If we managed to get the lock, then we should unpark the
-      // WatcherThread so that it can see we want it to stop.
+      // Unpark the WatcherThread so that it can see that it should terminate.
       watcher->unpark();
     }
-
-    PeriodicTask_lock->unlock();
   }
 
   // it is ok to take late safepoints here, if needed
@@ -1358,9 +1362,7 @@
 }
 
 void WatcherThread::unpark() {
-  MutexLockerEx ml(PeriodicTask_lock->owned_by_self()
-                   ? NULL
-                   : PeriodicTask_lock, Mutex::_no_safepoint_check_flag);
+  assert(PeriodicTask_lock->owned_by_self(), "precondition");
   PeriodicTask_lock->notify();
 }
 
diff --git a/src/share/vm/runtime/thread.cpp b/src/share/vm/runtime/thread.cpp
--- a/src/share/vm/runtime/thread.cpp
+++ b/src/share/vm/runtime/thread.cpp
@@ -1198,6 +1198,10 @@
 int WatcherThread::sleep() const {
   MutexLockerEx ml(PeriodicTask_lock, Mutex::_no_safepoint_check_flag);
 
+  if (_should_terminate) {
+    return 0; // we did not sleep.
+  }
+
   // remaining will be zero if there are no tasks,
   // causing the WatcherThread to sleep until a task is
   // enrolled
@@ -1318,24 +1322,20 @@
 }
 
 void WatcherThread::stop() {
-  // Get the PeriodicTask_lock if we can. If we cannot, then the
-  // WatcherThread is using it and we don't want to block on that lock
-  // here because that might cause a safepoint deadlock depending on
-  // what the current WatcherThread tasks are doing.
-  bool have_lock = PeriodicTask_lock->try_lock();
-
   _should_terminate = true;
-  OrderAccess::fence();  // ensure WatcherThread sees update in main loop
-
-  if (have_lock) {
+  {
+    // Let safepoints happen to avoid the deadlock:
+    // The VM thread has Threads_lock and waits for Java threads to report back for a safepoint.
+    // The watcher thread has PeriodicTask_lock and tries to acquire Threads_lock.
+    // A Java thread executes WatcherThread::stop, tries to acquire PeriodicTask_lock,
+    // and holds up the VM thread by not reaching a safepoint.
+    MutexLocker mu(PeriodicTask_lock);
+
     WatcherThread* watcher = watcher_thread();
     if (watcher != NULL) {
-      // If we managed to get the lock, then we should unpark the
-      // WatcherThread so that it can see we want it to stop.
+      // Unpark the WatcherThread so that it can see that it should terminate.
       watcher->unpark();
     }
-
-    PeriodicTask_lock->unlock();
   }
 
   // it is ok to take late safepoints here, if needed

Reply via email to