Hi David, looks good now.
Thanks, Volker On Fri, Aug 5, 2016 at 4:28 AM, David Holmes <david.hol...@oracle.com> wrote: > Hi Volker, > > Thanks for looking at this. > > On 5/08/2016 1:48 AM, Volker Simonis wrote: >> >> Hi David, >> >> thanks for doing this change on all platforms. >> The fix looks good. Maybe you can just extend the following comment with >> something like: >> >> // Note that the SR_lock plays no role in this suspend/resume protocol. >> // It is only used in SR_handler as a thread termination indicator if >> NULL. > > > Darn this code is confusing - too many "SR"'s :( I have added > > // Note that the SR_lock plays no role in this suspend/resume protocol, > // but is checked for NULL in SR_handler as a thread termination indicator. > > Updated webrev: > > http://cr.openjdk.java.net/~dholmes/8159461/webrev.v2/ > > This also reminded me to follow up on why the Solaris SR_handler is > different and I found it is not actually installed as a direct signal > handler, but is called from the real signal handler if dealing with a > JavaThread or the VMThread. Consequently the Solaris version of the > SR_handler can not encounter this specific bug and so I have reverted the > changes to os_solaris.cpp > > Thanks, > David > > >> Regards, >> Volker >> >> On Wed, Aug 3, 2016 at 3:13 AM, David Holmes <david.hol...@oracle.com >> <mailto:david.hol...@oracle.com>> wrote: >> >> webrev: http://cr.openjdk.java.net/~dholmes/8159461/webrev/ >> <http://cr.openjdk.java.net/~dholmes/8159461/webrev/> >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8159461 >> <https://bugs.openjdk.java.net/browse/JDK-8159461> >> >> The suspend/resume signal (SR_signum) is never sent to a thread once >> it has started to terminate. On one platform (SuSE 12) we have seen >> what appears to be a "stuck" signal, which is only delivered when >> the terminating thread restores its original signal mask (as if >> pthread_sigmask makes the system realize there is a pending signal - >> we already check the signal was not blocked). At this point in the >> thread termination we have freed the osthread, so the the SR_handler >> would access deallocated memory. In debug builds we first hit an >> assertion that the current thread is a JavaThread or the VMThread - >> that assertion fails, even though it is a JavaThread, because we >> have already executed the ~JavaThread destructor and inside the >> ~Thread destructor we are a plain Thread not a JavaThread. >> >> The fix was to make a small adjustment to the thread termination >> process so that we delete the SR_lock before calling >> os::free_thread(). In the SR_handler() we can then use a NULL check >> of SR_lock() to indicate the thread has terminated and we return. >> >> While only seen on Linux I took the opportunity to apply the fix on >> all platforms and also cleaned up the code where we were using >> Thread::current() unsafely in a signal-handling context. >> >> Testing: regular tier 1 (JPRT) >> Kitchensink (in progress) >> >> As we can't readily reproduce the problem I tested this by having a >> terminating thread raise SR_signum directly from within the ~Thread >> destructor. >> >> Thanks, >> David >> >> >