On Mon, 10 Jan 2022 05:19:26 GMT, Xin Liu <[email protected]> wrote:
> There is a handshake protocol between attach and HotSpot. Linux
> VirtualMachineImpl sends SIGQUIT(3) if the AttachListener has not been
> initialized. It expects "Signal Dispatcher" to handle SIGBREAK(same as
> SIGQUIT) and create AttachListener. However, it is possible that attach
> starts "handshake" before os::initialize_jdk_signal_support() is called. The
> signal handler which handles SIGQUIT has not been installed. Prior to
> os::initialize_jdk_signal_support(), universe_init() is called. Its time
> complexity will be linear to the initial size of heap with 'AlwaysPreTouch'.
> It takes 20~30 seconds to initialize 128g heap on a server-class host(eg. EC2
> m4.10xlarge). Many tools such jcmd, jstack etc may force initializing HotSpot
> quit prematurely.
>
> This patch checks '/proc/$pid/stat' SigCgt bitmask to ensure the signal will
> be caught by the target process before striking it with SIGQUIT. It will
> make HotSpot more robust. The fields of procfs are well
> [documented](https://www.kernel.org/doc/html/latest/filesystems/proc.html#id10)
> and have supported since Linux 2.6.30. libattach.so will not the only
> consumer of it. I see that os_perf_linux.cpp supports it in libjvm.so.
>
>
> Testing
>
> Before, this patch, once initialization takes long time, jcmd may quit the
> java process.
>
> $java -Xms64g -XX:+AlwaysPreTouch -Xlog:gc+heap=debug:stderr
> -XX:ParallelGCThreads=1 &
> [1] 9589
> [0.028s][debug][gc,heap] Minimum heap 68719476736 Initial heap 68719476736
> Maximum heap 68719476736
> [0.034s][debug][gc,heap] Running G1 PreTouch with 1 workers for 16384 work
> units pre-touching 68719476736B.
> $jcmd 9589 VM.flags
> 9589:
> [1] + 9589 quit java -Xms64g -XX:+AlwaysPreTouch
> -Xlog:gc+heap=debug:stderr
> java.io.IOException: No such process
> at jdk.attach/sun.tools.attach.VirtualMachineImpl.sendQuitTo(Native
> Method)
> at
> jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:100)
> at
> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
> at
> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
>
> With this patch, jcmd will timeout but won't disrupt 15274.
>
> $ java -Xms64g -XX:+AlwaysPreTouch -XX:ParallelGCThreads=1 &
> [1] 15274
> $ jcmd 15274 VM.flags
> 15274:
> com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file
> /proc/15274/root/tmp/.java_pid15274: target process 15274 doesn't respond
> within 10500ms or HotSpot VM not loaded
> at
> jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:105)
> at
> jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
> at
> jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
> at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
> at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)
Hi Xin,
A couple of comments below. I'm still thinking about this one ... seems okay
but I'm not certain ...
Thanks,
David
src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 119:
> 117: * Return -1 when it runs into any error.
> 118: */
> 119: static int check_sigquit_caught(jint pid) {
can't this just be a bool function? Even if int we don't need a ternary return
value as only zero is of interest.
src/jdk.attach/linux/native/libattach/VirtualMachineImpl.c line 164:
> 162: {
> 163: // Only give up sending SIGQUIT if we see that SigCgt is not set.
> 164: if (check_sigquit_caught(pid) == 0) return;
Suggestion (assumes bool function):
// Only send the SIGQUIT if we can see that the target JVM is ready to catch it.
if (check_sigquit_caught(pid) && kill((pid_t)pid, SIGQUIT) != 0) {
JNU_ThrowIOExceptionWithLastError(env, "kill");
}
-------------
PR: https://git.openjdk.java.net/jdk/pull/7003