On Mon, 27 Mar 2023 17:30:03 GMT, Roman Kennke <rken...@openjdk.org> wrote:
>>> @rkennke Question about ZGC and LockStack::contains(): how does this work >>> with colored pointers? Don't we have to mask the color bits out somehow >>> when comparing? E.g. using `ZAddress::offset()` ? >> >> That would be a question for @fisk and/or @stefank. AFAIK, the color bits >> should be masked by ZGC barriers *before* the oops enter the synchronization >> subsystem. But I kinda suspect that we are somehow triggering a ZGC bug >> here. Maybe we require barriers when reading oops from the lock-stack too? > >> > > @rkennke Question about ZGC and LockStack::contains(): how does this >> > > work with colored pointers? Don't we have to mask the color bits out >> > > somehow when comparing? E.g. using `ZAddress::offset()` ? >> > >> > >> > That would be a question for @fisk and/or @stefank. AFAIK, the color bits >> > should be masked by ZGC barriers _before_ the oops enter the >> > synchronization subsystem. But I kinda suspect that we are somehow >> > triggering a ZGC bug here. Maybe we require barriers when reading oops >> > from the lock-stack too? >> >> Oops that are processed in Thread::oops_do should not have load barriers. >> Other oops should have load barriers. > > Ok, good. The lockstack is processed in JavaThread::oops_do_no_frames() which > is called from Thread::oops_do(). But help me here: I believe ZGC processes > this stuff concurrently, right? So there might be a window where the > lock-stack oops would be unprocessed. The lock-stack would not go under the > stack-watermark machinery. And if some code (like JVMTI deadlock detection > pause) inspects the lockstack, it might see invalid oops? Is that a plausible > scenario, or am I missing something? > > > > > @rkennke Question about ZGC and LockStack::contains(): how does this > > > > > work with colored pointers? Don't we have to mask the color bits out > > > > > somehow when comparing? E.g. using `ZAddress::offset()` ? > > > > > > > > > > > > > > > > > > > > > > > > That would be a question for @fisk and/or @stefank. AFAIK, the color > > > > bits should be masked by ZGC barriers _before_ the oops enter the > > > > synchronization subsystem. But I kinda suspect that we are somehow > > > > triggering a ZGC bug here. Maybe we require barriers when reading oops > > > > from the lock-stack too? > > > > > > > > > > > > > > Oops that are processed in Thread::oops_do should not have load barriers. > > > Other oops should have load barriers. > > > > > > Ok, good. The lockstack is processed in JavaThread::oops_do_no_frames() > > which is called from Thread::oops_do(). But help me here: I believe ZGC > > processes this stuff concurrently, right? So there might be a window where > > the lock-stack oops would be unprocessed. The lock-stack would not go under > > the stack-watermark machinery. And if some code (like JVMTI deadlock > > detection pause) inspects the lockstack, it might see invalid oops? Is that > > a plausible scenario, or am I missing something? > > The JVMTI deadlock detection runs in a safepoint, doesn't it? Safepoints call > start_processing on all threads in safepoint cleanup for non-GC safepoints. > That means the lock stack oops should have been processed when the deadlock > detection logic runs in a safepoint. There appears to be a single code-path that inspects the lock-stack (and also the usual stack under non-fast-locking) outside of safepoints: V [libjvm.so+0x180abd4] Threads::owning_thread_from_monitor(ThreadsList*, ObjectMonitor*)+0x54 (threads.cpp:1433) V [libjvm.so+0x17a4bfc] ObjectSynchronizer::get_lock_owner(ThreadsList*, Handle)+0x9c (synchronizer.cpp:1109) V [libjvm.so+0x1802db0] ThreadSnapshot::initialize(ThreadsList*, JavaThread*)+0x270 (threadService.cpp:942) V [libjvm.so+0x1803244] ThreadDumpResult::add_thread_snapshot(JavaThread*)+0x5c (threadService.cpp:567) V [libjvm.so+0x12a0f64] jmm_GetThreadInfo+0x480 (management.cpp:1136) j sun.management.ThreadImpl.getThreadInfo1([JI[Ljava/lang/management/ThreadInfo;)V+0 java.management@21-internal Curiously, this seems to be in JMX code, which is also roughly where the failure happens. I came across this code a couple of times and couldn't really tell if it is safe to do that outside of a safepoint. In doubt I have to assume it is not, and maybe this is the source of the failure? WDYT? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1485692007