ThreadCpuTimesDeadlock.java crashes with SEGV in pthread_getcpuclockid+0x0

Thomas Stüfe Tue, 20 Nov 2018 02:22:43 -0800

Hi David,

out of interest, would pthread_getcpuclockid also crash when fed an
pthread_t of an existing thread which has been suspended and not yet
started (on Solaris, AIX)? Not that it would have much to report :)


--

1643   // exclude externally visible JavaThreads
1644   if (thread->is_Java_thread() &&
!thread->is_hidden_from_external_view()) {
1645     return;
1646   }
1647
1648   // NonJavaThread instances may not be fully initialized yet, so
we need to
1649   // skip any that aren't - check for zero stack_size()
1650   if (!thread->is_Java_thread() && thread->stack_size() == 0) {
1651     return;
1652   }

So do we really want to skip the check for externally hidden java
threads? If not, would not this b better:

1648   // Thread instances may not be fully initialized yet, so we need to
1649   // skip any that aren't - check for zero stack_size()
1650   if (thread->stack_size() == 0) {
1651     return;
1652   }

--

I agree an "is_initialized" flag would be better. We do have the
ThreadState in the associated OsThread, could that be used?

Thanks, Thomas
On Tue, Nov 20, 2018 at 9:53 AM David Holmes <[email protected]> wrote:
>
> After discussions with Kim I've decided to split out the NJT list update
> into a separate RFE:
>
> https://bugs.openjdk.java.net/browse/JDK-8214097
>
> So only the change in management.cpp needs reviewing and testing.
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~dholmes/8212207/webrev.v2/
>
> Thanks,
> David
>
> On 20/11/2018 10:01 am, David Holmes wrote:
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8212207
> > webrev: http://cr.openjdk.java.net/~dholmes/8212207/webrev/
> >
> > There is an internal management API that reports CPU times for
> > NonJavaThreads (NJTs). That functionality requires a valid/live target
> > thread so that we can use its pthread_t identity to obtain its CPU clock
> > via pthread_getcpuclockid().
> >
> > There is an iteration mechanism for NJTs in which the NJT is registered
> > during its constructor and de-registered during its destructor. A thread
> > that has only been constructed has not yet executed and so is not a
> > valid target for this management API. This seems to be the cause of
> > failures reported in this bug (and JDK-8213434). Registering a NJT only
> > when it starts executing is an appealing fix for this, but that impacts
> > all current users of the NJT list and straight-away causes a problem
> > with the BarrierSet initialization logic. So I don't attempt that.
> >
> > Instead the first part of the fix is for ThreadTimesClosure::do_thread
> > to skip threads that have not yet executed - which we can recognize by
> > seeing an uninitialized (i.e. zero) stackbase.
> >
> > A second part of the fix, which can be deferred to a separate RFE for
> > NJT lifecycle management if desired, tackles the problem of encountering
> > a terminated thread during iteration - which can also lead to SEGVs.
> > This can arise because NJT's are not actually "destructed", even if they
> > terminate, and so they never get removed from the NJT list. Calling
> > destructors is problematic because the code using these NJTs assume they
> > are always valid. So the fix in this case is to move the de-registering
> > from the NJT list out of the destructor and into the Thread::call_run()
> > method so it is done before a thread actually terminates. This can be
> > considered a first step in cleaning up the NJT lifecycle, where the
> > remaining steps touch on a lot of areas and so need to be handled
> > separately e.g. see JDK-8087340 for shutting down WorkGang GC worker
> > threads.
> >
> > Testing: tiers 1 -3
> >
> > I should point out that I've been unable to reproduce this failure
> > locally, even after thousands of runs. I'm hoping Zhengyu can test this
> > in the conditions reported in JDK-8213434.
> >
> > Thanks,
> > David

Re: RFR (S) 8212207: runtime/InternalApi/ThreadCpuTimesDeadlock.java crashes with SEGV in pthread_getcpuclockid+0x0

Reply via email to