On Mon, 27 Mar 2023 17:30:03 GMT, Roman Kennke <rken...@openjdk.org> wrote:

>>> @rkennke Question about ZGC and LockStack::contains(): how does this work 
>>> with colored pointers? Don't we have to mask the color bits out somehow 
>>> when comparing? E.g. using `ZAddress::offset()` ?
>> 
>> That would be a question for @fisk and/or @stefank. AFAIK, the color bits 
>> should be masked by ZGC barriers *before* the oops enter the synchronization 
>> subsystem. But I kinda suspect that we are somehow triggering a ZGC bug 
>> here. Maybe we require barriers when reading oops from the lock-stack too?
>
>> > > @rkennke Question about ZGC and LockStack::contains(): how does this 
>> > > work with colored pointers? Don't we have to mask the color bits out 
>> > > somehow when comparing? E.g. using `ZAddress::offset()` ?
>> > 
>> > 
>> > That would be a question for @fisk and/or @stefank. AFAIK, the color bits 
>> > should be masked by ZGC barriers _before_ the oops enter the 
>> > synchronization subsystem. But I kinda suspect that we are somehow 
>> > triggering a ZGC bug here. Maybe we require barriers when reading oops 
>> > from the lock-stack too?
>> 
>> Oops that are processed in Thread::oops_do should not have load barriers. 
>> Other oops should have load barriers.
> 
> Ok, good. The lockstack is processed in JavaThread::oops_do_no_frames() which 
> is called from Thread::oops_do(). But help me here: I believe ZGC processes 
> this stuff concurrently, right? So there might be a window where the 
> lock-stack oops would be unprocessed. The lock-stack would not go under the 
> stack-watermark machinery. And if some code (like JVMTI deadlock detection 
> pause) inspects the lockstack, it might see invalid oops? Is that a plausible 
> scenario, or am I missing something?

> > > > > @rkennke Question about ZGC and LockStack::contains(): how does this 
> > > > > work with colored pointers? Don't we have to mask the color bits out 
> > > > > somehow when comparing? E.g. using `ZAddress::offset()` ?
> > 
> > 
> > > >
> > 
> > 
> > > >
> > 
> > 
> > > > That would be a question for @fisk and/or @stefank. AFAIK, the color 
> > > > bits should be masked by ZGC barriers _before_ the oops enter the 
> > > > synchronization subsystem. But I kinda suspect that we are somehow 
> > > > triggering a ZGC bug here. Maybe we require barriers when reading oops 
> > > > from the lock-stack too?
> > 
> > 
> > > 
> > 
> > 
> > > Oops that are processed in Thread::oops_do should not have load barriers. 
> > > Other oops should have load barriers.
> > 
> > 
> > Ok, good. The lockstack is processed in JavaThread::oops_do_no_frames() 
> > which is called from Thread::oops_do(). But help me here: I believe ZGC 
> > processes this stuff concurrently, right? So there might be a window where 
> > the lock-stack oops would be unprocessed. The lock-stack would not go under 
> > the stack-watermark machinery. And if some code (like JVMTI deadlock 
> > detection pause) inspects the lockstack, it might see invalid oops? Is that 
> > a plausible scenario, or am I missing something?
> 
> The JVMTI deadlock detection runs in a safepoint, doesn't it? Safepoints call 
> start_processing on all threads in safepoint cleanup for non-GC safepoints. 
> That means the lock stack oops should have been processed when the deadlock 
> detection logic runs in a safepoint.

There appears to be a single code-path that inspects the lock-stack (and also 
the usual stack under non-fast-locking) outside of safepoints:

V  [libjvm.so+0x180abd4]  Threads::owning_thread_from_monitor(ThreadsList*, 
ObjectMonitor*)+0x54  (threads.cpp:1433)
V  [libjvm.so+0x17a4bfc]  ObjectSynchronizer::get_lock_owner(ThreadsList*, 
Handle)+0x9c  (synchronizer.cpp:1109)
V  [libjvm.so+0x1802db0]  ThreadSnapshot::initialize(ThreadsList*, 
JavaThread*)+0x270  (threadService.cpp:942)
V  [libjvm.so+0x1803244]  
ThreadDumpResult::add_thread_snapshot(JavaThread*)+0x5c  (threadService.cpp:567)
V  [libjvm.so+0x12a0f64]  jmm_GetThreadInfo+0x480  (management.cpp:1136)
j  
sun.management.ThreadImpl.getThreadInfo1([JI[Ljava/lang/management/ThreadInfo;)V+0
 java.management@21-internal

Curiously, this seems to be in JMX code, which is also roughly where the 
failure happens. I came across this code a couple of times and couldn't really 
tell if it is safe to do that outside of a safepoint. In doubt I have to assume 
it is not, and maybe this is the source of the failure? WDYT?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485692007

Reply via email to