Markus,

Thanks for the fast review!

Dan


On 6/27/14 11:46 AM, Markus Grönlund wrote:
Hi Dan,

This looks good, thanks for chasing this down!

Cheers
Markus



-----Original Message-----
From: Daniel D. Daugherty
Sent: 27 June 2014 18:18
To: hotspot-runtime-...@openjdk.java.net; serviceability-dev@openjdk.java.net
Subject: RFR (XS) fix for a safepoint deadlock (8047720)

Greetings,

I have a fix ready for the following bug:

      8047720 Xprof hangs on Solaris
      https://bugs.openjdk.java.net/browse/JDK-8047720

Here is the webrev URL:

http://cr.openjdk.java.net/~dcubed/8047720-webrev/0-jdk9-hs-rt/

This deadlock occurred between the following threads:

      Main thread   - Trying to stop the WatcherThread as part of
                      shutting down the VM; this thread is blocked
                      on the PeriodicTask_lock which keeps it from
                      reaching a safepoint.
      WatcherThread - Requested a VM_ForceSafepoint to complete
                      a JavaThread::java_suspend() call as part
                      of a FlatProfiler record_thread_ticks()
                      call; this thread owns the PeriodicTask_lock
                      since it is processing a periodic task.
      VMThread      - Trying to start a safepoint; this thread is
                      blocked waiting for the Main thread to reach
                      a safepoint.

The PeriodicTask_lock is one of the VM-internal locks and is typically acquired
with Mutex::_no_safepoint_check_flag to avoid deadlocks. Yes, the irony is
dripping on the floor... :-)

The interesting part of this deadlock is that I think other periodic tasks can
hit it too. Any periodic task that causes the WatcherThread to start a safepoint
while it is processing that task should be susceptible to this race. Think about
the -XX:+DeoptimizeALot option and how it causes VM_Deopt requests on thread
state transitions... Interesting...

Testing:
      - I found a way to add delays to the right spots in the
        VM to make the deadlock reproduce in just about every
        run of the test associated with the bug. The new
        os::naked_short_sleep() function is your friend. Thanks
        to Fred for adding that! See the bug report for the
        debugging diffs.
      - 72 hours of running the test in the bug report with
        delays enabled for product, fastdebug and jvmg bits
        in parallel on my Solaris X86 server.
      - JPRT test run
      - Aurora Adhoc results are in progress; we're having issues
        with both a broken testbase build and infra problems
        with results not being uploaded.

