On Fri, 6 Mar 2026 14:36:37 GMT, Zhengyu Gu <[email protected]> wrote:
>> Please review this change that fixes crashes by jmm_GetThreadInfo() call.
>>
>> There are several issues:
>> - ThreadIdTable::lazy_initialize() has typical double-checked locking
>> pattern without proper memory barrier, that may result in
>> uninitialized/partial initialized table to be observed by other threads.
>> Added release/acquire barrier to address this issue.
>> - Query ThreadIdTable can race add/remove thread operations. In short, the
>> thread returned from the query may be freed. Fortunately,
>> jmm_GetThreadInfo() acquires stable thread list before query, so we only
>> need to make sure that returned thread is in the list (checking
>> thread->is_exiting() does not help due to the race)
>> - I moved thread Id insertion code from ThreadSMR to Threads, to be
>> symmetric to thread Id removal code.
>>
>> Tests:
>> - [x] Tier1 on Linux and MacOSX (fastdebug)
>> (`tools/javac/annotations/typeAnnotations/IncorrectCastOffsetTest.java`
>> failure seems unrelated, it also fails in master)
>
> Zhengyu Gu has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Remove tailing whitespaces
Very surprised this racy initialization got through the initial review process.
:( The `_initialized` field could/should be replaced by an `Atomic<bool>`.
IIUC the main issue is the fact two threads can populate the table from two
different ThreadsLists and no guarantee a thread it then finds was covered by
the ThreadsList it was using - so good catch for that.
That said I have a few issues.
Thanks
src/hotspot/share/runtime/threadSMR.cpp line 730:
> 728: }
> 729: } else if (includes(thread)) {
> 730: // The thread is protected by this list
It needs to be in the list and not exiting.
src/hotspot/share/runtime/threadSMR.cpp line 889:
> 887: jlong tid = SharedRuntime::get_java_tid(thread);
> 888: ThreadIdTable::add_thread(tid, thread);
> 889: }
I would not move this code. Logically the management of the ThreadIdTable is
part of ThreadSMR. There must be reason we do delete from other code, but I'd
be looking at whether that code can move into ThreadSMR rather than moving this
code out.
-------------
Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/30105#pullrequestreview-3912465589
PR Review Comment: https://git.openjdk.org/jdk/pull/30105#discussion_r2902819304
PR Review Comment: https://git.openjdk.org/jdk/pull/30105#discussion_r2902841062