On 31/03/2021 8:51 pm, Robbin Ehn wrote:
On Wed, 31 Mar 2021 02:43:21 GMT, David Holmes <dhol...@openjdk.org> wrote:

src/hotspot/os/posix/signals_posix.cpp line 1587:

1585:   // destructor has completed.
1586:
1587:   if (thread->is_Java_thread() && 
thread->as_Java_thread()->is_terminated()) {

We need @dholmes-ora to verify that this version of the code will
still solve the bug he was fixing when he added old L1610.

Thanks @dcubed-ojdk - no it won't. The problem is that the signal can hit very 
late in a thread's termination process, after the JavaThread destructor has 
executed, so no query of JavaThread state is possible. So we needed something 
in the Thread state and the SR_lock was a good enough proxy for that. It may be 
possible to use a different Thread field (perhaps _ParkEvent but encapsulated 
in a Thread::has_terminated() helper method).

SR_handler is used for OS-level suspend/resume (not to conflict with this 
change-set).
This feature is only used by JFR (AFAIK), and JFR only samples threads on it's 
ThreadsList.
This means the JavaThread can never be terminated, hence this code will always 
pass.

I need to dig up the exact details but from memory the problem we were fixing was that the signal was seemingly not being delivered to the target thread until it modified its own signal mask sometime during thread termination. IIRC JFR may have a target thread on a ThreadList and try to suspend it, but the signal may not get delivered, consequently JFR will timeout and cancel the suspend request. After that the thread can continue on its way and may start to terminate, at which point the signal gets delivered and the SR_handler runs.

If someone else is using OS-level suspend/resume without a ThreadsList, the bug 
is there is no ThreadsList AFAICT.

Since ThreadLocalStorage::thread() is cleared last in ~Thread with 
Thread::clear_thread_current(); may be in the ~Thread destructor.
So the argument is that would be safe to read stuff from Thread but not 
JavaThread?

If the SR_handler gets a non-null value from Thread::current_or_null_safe() then we know that the signal got delivered before we called clear_current_thread() in the Thread destructor. We don't exactly where, but we know that it could be after the JavaThread destructor and so trying to treat the thread as a JavaThread is definitely not safe (hence the way the bug was discovered!). We deliberately nulled _SR_lock so that we could query it as a termination indicator and we do that while we still have a valid osThread. Hence if you see a non-NULL _SR_lock you know you can safely interact with the osthread.

Since the compiler (and CPU) may reorder and optimize away code, so I argue 
reading from a half destroyed object is not a great idea.

The Thread object itself is not destroyed at this point, only objects referred to there-from.

E.g. Monitor* _SR_lock; is not a volatile pointer; since reading from this 
memory is UB after destruction, compiler is free to remove or not publish the 
store to NULL.

It may well be free to do so but that wasn't what we found. Perhaps it should have been volatile with a storestore() for good measure.

So I suggest either to remove this check, since the only user is using a 
ThreadsList and any other should also be using that too.
Or call Thread::clear_thread_current() before the JavaThread destructor is 
called, that way we can be certain that there is no UB.

That would be very fragile IMO as it would require that no code executing from the destructors can require access to Thread:current() to work. It would certainly take some analysis to determine if it could be done that way. (Crash reporting would be affected.)

There is no UB in what we currently do AFAICS.

David
-----

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191

Reply via email to