On Mon, 17 Jan 2022 09:00:25 GMT, Kevin Walls <[email protected]> wrote:
>> Hi @navyxliu,
>>
>> nice catch. I can see how this can be annoying.
>>
>> I propose a simpler and more robust way to fix it though. We (1) set up
>> general hotspot signal handling very early, then (2) proceed to initialize
>> the heap - which you have shown can take some time - then (3) set up SIGQUIT
>> handling. We core if we get a quit signal before (3).
>>
>> I would add SIGQUIT handling to the general signal handler too, just to
>> cover the time frame between (1) and (3). At (3), it would be overwritten,
>> but that would be fine. Before (3), we could just ignore SIGQUIT, or print
>> out some generic information (I assume thread dumps are not yet possible at
>> this stage).
>>
>> Since the documented behavior for the JVM is to threaddump on SIGQUIT unless
>> we run with -Xrs, I think this is acceptable behavior. Not printing thread
>> dump or only printing partial information is preferable to quitting with
>> core.
>>
>> Then, this would work for any client that sends sigquit to a JVM, not just
>> those using the attach framework. And it would work on all Posix platforms,
>> not just Linux. And we'd would not have to rely on parsing the proc fs.
>>
>> Als note that a solution implemented in the client as you did has the
>> disadvantage that I need a modern jcmd for it to work. However, I often just
>> use whatever jcmd is in the path. Better to handle this in the receiving
>> hotspot.
>>
>> I sketched out a simple patch to test if what I propose can work:
>>
>>
>> diff --git a/src/hotspot/os/posix/signals_posix.cpp
>> b/src/hotspot/os/posix/signals_posix.cpp
>> index 2c020a79408..3f0dd91e03b 100644
>> --- a/src/hotspot/os/posix/signals_posix.cpp
>> +++ b/src/hotspot/os/posix/signals_posix.cpp
>> @@ -615,6 +615,11 @@ int JVM_HANDLE_XXX_SIGNAL(int sig, siginfo_t* info,
>> #endif // ZERO
>> }
>>
>> +if (sig == BREAK_SIGNAL) {
>> + printf("too early for thread dumps...\n");
>> + signal_was_handled = true;
>> +}
>> +
>> // Ignore SIGPIPE and SIGXFSZ (4229104, 6499219).
>> if (!signal_was_handled &&
>> (sig == SIGPIPE || sig == SIGXFSZ)) {
>> @@ -1279,6 +1284,7 @@ void install_signal_handlers() {
>> set_signal_handler(SIGFPE);
>> PPC64_ONLY(set_signal_handler(SIGTRAP);)
>> set_signal_handler(SIGXFSZ);
>> + set_signal_handler(BREAK_SIGNAL); // just for initialization phase
>>
>> It still misses a number of things (I did not check signal mask setup and
>> ReduceSignalUsage must be handled too), but it shows the general direction
>> and worked as expected.
>>
>> Cheers, Thomas
>
>> I propose a simpler and more robust way to fix it though
>
> Great, this is the kind of thing I was heading towards with the conversation
> in the bug text. Although not sure why I could not reproduce the problem,
> with various different JDK versions.
Apologies @kevinjwalls as I also missed the discussion in the JBS issue.
Ideally we would know if the target VM is ready before we send the SIGQUIT to
attach - e.g. writing a well-known file that the attach mechanism looks for
before trying to attach. That seems feasible but perhaps costly and not always
possible(?) ...
What has been presented here is a side-channel way of knowing. In that respect
I like this. It is a pity it is Linux only.
The alternative suggestions of just making the window during which SIGQUIT
terminates the VM process small enough to be un-hittable, also has merit. I
don't think we have to try and accommodate extreme code that just looks up a
process in the process table and throws a signal at it. Ignoring SIGQUIT during
the early VM startup seems a reasonable solutions (we can't produce a thread
dump at that time anyway). It seems to me that we can simply install
UserHandler for SIGQUIT very early in the VM initialization process and it will
be a no-op until the JDK signal handling is properly initialized (just need to
fix one assert).
Cheers,
David
-------------
PR: https://git.openjdk.java.net/jdk/pull/7003