Re: RFR: 8378084: Implement JEP 528: Post-Mortem Crash Analysis with jcmd [v13]

David Holmes Thu, 07 May 2026 23:15:39 -0700

On Thu, 7 May 2026 16:44:11 GMT, Kevin Walls <[email protected]> wrote:


>> This implements "jcmd on core files" for Linux, and for MiniDumps on Windows 
>> (MacOS is "future work").
>> jcmd "revives" the VM memory and .so/.dll from the core/minidump, and runs 
>> the existing native diagnostic command parser and command implementations.
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Kevin Walls has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Attach API doc update

I've taken an initial walk through the VM changes to try and get a sense of 
this.

I can't really comment on the actual revival mechanism here (though I feel 
there must be some overlap with the CRaC project). I'm also not quite seeing 
the big picture here. IIUC you create a new process, load libjvm and load the 
coredump as a memory image so that this new process now looks like it was the 
crashed JVM process. You take the main thread of the new process and make it 
impersonate the original VMThread (as far as the executing code is concerned) 
and then pretend the VM is at a safepoint. This fake VMThread can then execute 
the actual DCmd logic as if it were the VMThread executing a VM safepoint op. 
Is that the gist of it? IIUC there are no other threads running. I'm actually 
quite surprised you can get away with as few of these revival checks as you 
have - though I find some things very suspect, like forcibly unlocking a Mutex 
that was in an unknown state!

I think we need clear explanations everywhere we have to check for revival.

I can imagine that it is going to be very easy to break this because new code 
may need a revival check and do something special. I can also see these revival 
checks spreading across the code as you support more jcmds.

src/hotspot/os/linux/os_linux.cpp line 4492:

> 4490: }
> 4491: 
> 4492: void os::Linux::revive_init(void) {

Why not just `os::revive_init`?

src/hotspot/os/windows/os_windows.cpp line 4480:

> 4478: }
> 4479: 
> 4480: void os::win32::revive_init(void) {

Again doesn't need to be in win32 class.

src/hotspot/share/runtime/mutex.cpp line 617:

> 615:   raw_set_owner(nullptr);
> 616:   _lock.unlock();
> 617: }

This seems more of a reset than a revive. And I see from your comments below 
you refer to it as a "clear" so that would be a better name to use. If you want 
to be explicit that this API is only related to the "revive" process then maybe 
`clear_for_revive`?

And if a mutex was locked at the time of the crash how can you just call unlock 
from the current thread when it was not the owner?

-------------

PR Review: https://git.openjdk.org/jdk/pull/31011#pullrequestreview-4249617176
PR Review Comment: https://git.openjdk.org/jdk/pull/31011#discussion_r3206404109
PR Review Comment: https://git.openjdk.org/jdk/pull/31011#discussion_r3206412176
PR Review Comment: https://git.openjdk.org/jdk/pull/31011#discussion_r3206500527

Re: RFR: 8378084: Implement JEP 528: Post-Mortem Crash Analysis with jcmd [v13]

Reply via email to