On Tue, Nov 20, 2018 at 11:38 AM David Holmes <david.hol...@oracle.com> wrote: > > Hi Thomas, > > Thanks for taking a look at this. > > On 20/11/2018 8:20 pm, Thomas Stüfe wrote: > > Hi David, > > > > out of interest, would pthread_getcpuclockid also crash when fed an > > pthread_t of an existing thread which has been suspended and not yet > > started (on Solaris, AIX)? Not that it would have much to report :) > > I don't know for certain but I would certainly hope not! Once the > pthread_t has been returned to the caller it should be valid. Note > however on Solaris we don't normally use pthreads. > > > -- > > > > 1643 // exclude externally visible JavaThreads > > 1644 if (thread->is_Java_thread() && > > !thread->is_hidden_from_external_view()) { > > 1645 return; > > 1646 } > > 1647 > > 1648 // NonJavaThread instances may not be fully initialized yet, so > > we need to > > 1649 // skip any that aren't - check for zero stack_size() > > 1650 if (!thread->is_Java_thread() && thread->stack_size() == 0) { > > 1651 return; > > 1652 } > > > > So do we really want to skip the check for externally hidden java > > threads? If not, would not this b better: > > > > 1648 // Thread instances may not be fully initialized yet, so we need to > > 1649 // skip any that aren't - check for zero stack_size() > > 1650 if (thread->stack_size() == 0) { > > 1651 return; > > 1652 } > > JavaThreads can't be "not fully initialized" as they don't appear in the > ThreadsList until they are fully initialized.
Are you sure? See e.g. CompileBroker::make_thread() : Threads::add(thread); Thread::start(thread); since stack_size is still 0 before Thread::start() on platforms where the threads are created suspended, you could encounter threads with stack_size==0. I do not know whether this translates to an invalid pthread_t handle though (hence my question about suspended threads). > So I prefer to be clear > this only affects NJTs. > > > -- > > > > I agree an "is_initialized" flag would be better. We do have the > > ThreadState in the associated OsThread, could that be used? > > I suppose it could use: > > thread->osThread()->get_state() >= RUNNABLE > > but OSThread ThreadState is a bit of an anachronism - as it states in > the header: > > // Note: the ThreadState is legacy code and is not correctly implemented. > // Uses of ThreadState need to be replaced by the state in the JavaThread. > > so I'd rather not use that. Checking the stack_size has been set ensures > the thread has reached as "reasonable" point in its execution. It's > later than strictly necessary but we're racing regardless. > Querying for thread_stack==0 is certainly fine at this point. Thanks, Thomas > Thanks, > David > ----- > > > Thanks, Thomas > > On Tue, Nov 20, 2018 at 9:53 AM David Holmes <david.hol...@oracle.com> > > wrote: > >> > >> After discussions with Kim I've decided to split out the NJT list update > >> into a separate RFE: > >> > >> https://bugs.openjdk.java.net/browse/JDK-8214097 > >> > >> So only the change in management.cpp needs reviewing and testing. > >> > >> Updated webrev: > >> > >> http://cr.openjdk.java.net/~dholmes/8212207/webrev.v2/ > >> > >> Thanks, > >> David > >> > >> On 20/11/2018 10:01 am, David Holmes wrote: > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8212207 > >>> webrev: http://cr.openjdk.java.net/~dholmes/8212207/webrev/ > >>> > >>> There is an internal management API that reports CPU times for > >>> NonJavaThreads (NJTs). That functionality requires a valid/live target > >>> thread so that we can use its pthread_t identity to obtain its CPU clock > >>> via pthread_getcpuclockid(). > >>> > >>> There is an iteration mechanism for NJTs in which the NJT is registered > >>> during its constructor and de-registered during its destructor. A thread > >>> that has only been constructed has not yet executed and so is not a > >>> valid target for this management API. This seems to be the cause of > >>> failures reported in this bug (and JDK-8213434). Registering a NJT only > >>> when it starts executing is an appealing fix for this, but that impacts > >>> all current users of the NJT list and straight-away causes a problem > >>> with the BarrierSet initialization logic. So I don't attempt that. > >>> > >>> Instead the first part of the fix is for ThreadTimesClosure::do_thread > >>> to skip threads that have not yet executed - which we can recognize by > >>> seeing an uninitialized (i.e. zero) stackbase. > >>> > >>> A second part of the fix, which can be deferred to a separate RFE for > >>> NJT lifecycle management if desired, tackles the problem of encountering > >>> a terminated thread during iteration - which can also lead to SEGVs. > >>> This can arise because NJT's are not actually "destructed", even if they > >>> terminate, and so they never get removed from the NJT list. Calling > >>> destructors is problematic because the code using these NJTs assume they > >>> are always valid. So the fix in this case is to move the de-registering > >>> from the NJT list out of the destructor and into the Thread::call_run() > >>> method so it is done before a thread actually terminates. This can be > >>> considered a first step in cleaning up the NJT lifecycle, where the > >>> remaining steps touch on a lot of areas and so need to be handled > >>> separately e.g. see JDK-8087340 for shutting down WorkGang GC worker > >>> threads. > >>> > >>> Testing: tiers 1 -3 > >>> > >>> I should point out that I've been unable to reproduce this failure > >>> locally, even after thousands of runs. I'm hoping Zhengyu can test this > >>> in the conditions reported in JDK-8213434. > >>> > >>> Thanks, > >>> David