Dear David, I took a look at all occurrences of PeriodicTask_lock. We are in luck:
PeriodicTask::enroll and PeriodicTask::disenroll check for safepoints when acquiring PeriodicTask_lock. PeriodicTask::time_to_wait and PeriodicTask::real_time_tick are called only by the watcher thread. WatcherThread::unpark is used only in contexts where PeriodicTask_lock is owned by the caller. WatcherThread::sleep is called only by the watcher thread. I took another look at WatcherThread::sleep and realized that there is a path from acquiring PeriodicTask_lock to waiting on the lock where _should_terminate is not checked. I added that to the minimal fix attached. Looking at these methods made me want to clean up a little more. See better.patch attached. I feel pretty good about the patch with the limited usage of no_safepoint_check_flag with PeriodicTask_lock. Carsten On Tue, Feb 3, 2015 at 11:28 PM, David Holmes <david.hol...@oracle.com> wrote: > On 4/02/2015 1:28 PM, Carsten Varming wrote: > >> Dear David Holmes, >> >> Please see inline response, >> >> On Tue, Feb 3, 2015 at 9:38 PM, David Holmes <david.hol...@oracle.com >> <mailto:david.hol...@oracle.com>> wrote: >> >> On 4/02/2015 5:00 AM, Carsten Varming wrote: >> >> Greetings all, >> >> I was recently introduced to the deadlock described in >> JDK-8047720 and >> fixed in JDK9. The fix seems a little messy to me, and it looks >> like it >> left some very short races in the code. So I decided to clean up >> the >> code. See attached diff. >> >> The change takes a step back and acquires PeriodicTask_lock in >> WatcherThread::stop (like before the fix in JDK9), but this time >> safepoints are allowed to proceed when acquiring >> PeriodicTask_lock, >> preventing the deadlock. >> >> >> It isn't obvious that blocking for a safepoint whilst in this method >> will always be a safe thing to do. That would need to be examined in >> detail - potential interactions can be very subtle. >> >> >> Absolutely true. For reference, the pattern is repeated with the >> Terminator_lock a few lines below. The pattern is also used in >> Threads::destroy_vm before and after calling before_exit, and the java >> shutdown hooks are called right before the call to before_exit. So there >> is a lot of evidence that safepoints are allowed to happen in this >> context. >> > > The thread calling before_exit is a JavaThread so of course it > participates in safepoints. The issue is whether the interaction between > that thread and the WatcherThread, via the PeriodicTask_lock, can allow for > the JavaThread blocking at a safepoint whilst owning that lock. If another > JavaThread can try to lock it without a safepoint check then we can get a > deadlock. > > Cheers, > David > > > Thank you for taking your time to look at this, >> Carsten >> >> >> David >> >> >> Let me know what you think, >> Carsten >> >> >>
diff --git a/src/share/vm/runtime/task.cpp b/src/share/vm/runtime/task.cpp --- a/src/share/vm/runtime/task.cpp +++ b/src/share/vm/runtime/task.cpp @@ -74,8 +74,7 @@ } int PeriodicTask::time_to_wait() { - MutexLockerEx ml(PeriodicTask_lock->owned_by_self() ? - NULL : PeriodicTask_lock, Mutex::_no_safepoint_check_flag); + assert(PeriodicTask_lock->owned_by_self(), "precondition"); if (_num_tasks == 0) { return 0; // sleep until shutdown or a task is enrolled diff --git a/src/share/vm/runtime/thread.cpp b/src/share/vm/runtime/thread.cpp --- a/src/share/vm/runtime/thread.cpp +++ b/src/share/vm/runtime/thread.cpp @@ -1198,6 +1198,10 @@ int WatcherThread::sleep() const { MutexLockerEx ml(PeriodicTask_lock, Mutex::_no_safepoint_check_flag); + if (_should_terminate) { + return 0; // we did not sleep. + } + // remaining will be zero if there are no tasks, // causing the WatcherThread to sleep until a task is // enrolled @@ -1252,7 +1256,7 @@ this->initialize_thread_local_storage(); this->set_native_thread_name(this->name()); this->set_active_handles(JNIHandleBlock::allocate_block()); - while (!_should_terminate) { + while (true) { assert(watcher_thread() == Thread::current(), "thread consistency check"); assert(watcher_thread() == this, "thread consistency check"); @@ -1288,6 +1292,10 @@ } } + if (_should_terminate) { + break; + } + PeriodicTask::real_time_tick(time_waited); } @@ -1318,24 +1326,20 @@ } void WatcherThread::stop() { - // Get the PeriodicTask_lock if we can. If we cannot, then the - // WatcherThread is using it and we don't want to block on that lock - // here because that might cause a safepoint deadlock depending on - // what the current WatcherThread tasks are doing. - bool have_lock = PeriodicTask_lock->try_lock(); - _should_terminate = true; - OrderAccess::fence(); // ensure WatcherThread sees update in main loop - - if (have_lock) { + { + // Let safepoints happen to avoid the deadlock: + // The VM thread has Threads_lock and waits for Java threads to report back for a safepoint. + // The watcher thread has PeriodicTask_lock and tries to acquire Threads_lock. + // A Java thread executes WatcherThread::stop, tries to acquire PeriodicTask_lock, + // and holds up the VM thread by not reaching a safepoint. + MutexLocker mu(PeriodicTask_lock); + WatcherThread* watcher = watcher_thread(); if (watcher != NULL) { - // If we managed to get the lock, then we should unpark the - // WatcherThread so that it can see we want it to stop. + // Unpark the WatcherThread so that it can see that it should terminate. watcher->unpark(); } - - PeriodicTask_lock->unlock(); } // it is ok to take late safepoints here, if needed @@ -1358,9 +1362,7 @@ } void WatcherThread::unpark() { - MutexLockerEx ml(PeriodicTask_lock->owned_by_self() - ? NULL - : PeriodicTask_lock, Mutex::_no_safepoint_check_flag); + assert(PeriodicTask_lock->owned_by_self(), "precondition"); PeriodicTask_lock->notify(); }
diff --git a/src/share/vm/runtime/thread.cpp b/src/share/vm/runtime/thread.cpp --- a/src/share/vm/runtime/thread.cpp +++ b/src/share/vm/runtime/thread.cpp @@ -1198,6 +1198,10 @@ int WatcherThread::sleep() const { MutexLockerEx ml(PeriodicTask_lock, Mutex::_no_safepoint_check_flag); + if (_should_terminate) { + return 0; // we did not sleep. + } + // remaining will be zero if there are no tasks, // causing the WatcherThread to sleep until a task is // enrolled @@ -1318,24 +1322,20 @@ } void WatcherThread::stop() { - // Get the PeriodicTask_lock if we can. If we cannot, then the - // WatcherThread is using it and we don't want to block on that lock - // here because that might cause a safepoint deadlock depending on - // what the current WatcherThread tasks are doing. - bool have_lock = PeriodicTask_lock->try_lock(); - _should_terminate = true; - OrderAccess::fence(); // ensure WatcherThread sees update in main loop - - if (have_lock) { + { + // Let safepoints happen to avoid the deadlock: + // The VM thread has Threads_lock and waits for Java threads to report back for a safepoint. + // The watcher thread has PeriodicTask_lock and tries to acquire Threads_lock. + // A Java thread executes WatcherThread::stop, tries to acquire PeriodicTask_lock, + // and holds up the VM thread by not reaching a safepoint. + MutexLocker mu(PeriodicTask_lock); + WatcherThread* watcher = watcher_thread(); if (watcher != NULL) { - // If we managed to get the lock, then we should unpark the - // WatcherThread so that it can see we want it to stop. + // Unpark the WatcherThread so that it can see that it should terminate. watcher->unpark(); } - - PeriodicTask_lock->unlock(); } // it is ok to take late safepoints here, if needed