Re: RFR: 8257831: Suspend with handshakes [v2]

David Holmes Wed, 31 Mar 2021 07:00:47 -0700

On 31/03/2021 8:51 pm, Robbin Ehn wrote:

On Wed, 31 Mar 2021 02:43:21 GMT, David Holmes <[email protected]> wrote:

src/hotspot/os/posix/signals_posix.cpp line 1587:

1585:   // destructor has completed.
1586:
1587:   if (thread->is_Java_thread() && 
thread->as_Java_thread()->is_terminated()) {


We need @dholmes-ora to verify that this version of the code will
still solve the bug he was fixing when he added old L1610.


Thanks @dcubed-ojdk - no it won't. The problem is that the signal can hit very 
late in a thread's termination process, after the JavaThread destructor has 
executed, so no query of JavaThread state is possible. So we needed something 
in the Thread state and the SR_lock was a good enough proxy for that. It may be 
possible to use a different Thread field (perhaps _ParkEvent but encapsulated 
in a Thread::has_terminated() helper method).


SR_handler is used for OS-level suspend/resume (not to conflict with this 
change-set).
This feature is only used by JFR (AFAIK), and JFR only samples threads on it's 
ThreadsList.
This means the JavaThread can never be terminated, hence this code will always 
pass.

I need to dig up the exact details but from memory the problem we werefixing was that the signal was seemingly not being delivered to thetarget thread until it modified its own signal mask sometime duringthread termination. IIRC JFR may have a target thread on a ThreadListand try to suspend it, but the signal may not get delivered,consequently JFR will timeout and cancel the suspend request. After thatthe thread can continue on its way and may start to terminate, at whichpoint the signal gets delivered and the SR_handler runs.

If someone else is using OS-level suspend/resume without a ThreadsList, the bug 
is there is no ThreadsList AFAICT.

Since ThreadLocalStorage::thread() is cleared last in ~Thread with 
Thread::clear_thread_current(); may be in the ~Thread destructor.
So the argument is that would be safe to read stuff from Thread but not 
JavaThread?

If the SR_handler gets a non-null value fromThread::current_or_null_safe() then we know that the signal gotdelivered before we called clear_current_thread() in the Threaddestructor. We don't exactly where, but we know that it could be afterthe JavaThread destructor and so trying to treat the thread as aJavaThread is definitely not safe (hence the way the bug wasdiscovered!). We deliberately nulled _SR_lock so that we could query itas a termination indicator and we do that while we still have a validosThread. Hence if you see a non-NULL _SR_lock you know you can safelyinteract with the osthread.

Since the compiler (and CPU) may reorder and optimize away code, so I argue 
reading from a half destroyed object is not a great idea.

The Thread object itself is not destroyed at this point, only objectsreferred to there-from.

E.g. Monitor* _SR_lock; is not a volatile pointer; since reading from this 
memory is UB after destruction, compiler is free to remove or not publish the 
store to NULL.

It may well be free to do so but that wasn't what we found. Perhaps itshould have been volatile with a storestore() for good measure.

So I suggest either to remove this check, since the only user is using a 
ThreadsList and any other should also be using that too.
Or call Thread::clear_thread_current() before the JavaThread destructor is 
called, that way we can be certain that there is no UB.

That would be very fragile IMO as it would require that no codeexecuting from the destructors can require access to Thread:current() towork. It would certainly take some analysis to determine if it could bedone that way. (Crash reporting would be affected.)


There is no UB in what we currently do AFAICS.

David
-----

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191

Re: RFR: 8257831: Suspend with handshakes [v2]

Reply via email to