Hi all,

It seems like my fix for https://bugs.openjdk.org/browse/JDK-8226919 regressed one use-case for Kubernetes debug containers (and other technically similar approaches). Quoting @jdoylei from https://github.com/openjdk/jdk/pull/17628#issuecomment-1969769654:

"We're running jcmd (OpenJDK build 17.0.10+7-LTS) and the target JVM in two separate containers in a Kubernetes pod. The target JVM container is already running, and then we use kubectl debug --target=... to start a Kubernetes debug container with jcmd that targets the first container. Given the --target option, they share the same Linux process namespace (both think the target JVM is PID 1). But since they are separate containers, they see different root filesystems (jcmd container sees the target JVM tmpdir under /proc/1/root/tmp but has its own distinct /tmp directory)."

I think I can confirm that there is a regression. I reproduced it with a locally built JDK from master as of 2024-04-28 (16c7dcdb04a7c220684a20eb4a0da4505ae03813), using raw Docker containers instead of Kubernetes + kubectl debug:


slovdahl@ubuntu2204:~/reproducer$ cat Reproducer.java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class Reproducer {
  public static void main(String[] args) throws InterruptedException, IOException {
    System.out.println("Hello, World!");
    try (var server = new ServerSocket()) {
      server.bind(new InetSocketAddress("localhost", 81));
      System.out.println("Bound to port 81");
      while (true) {
        Thread.sleep(1_000L);
      }
    }
  }
}

slovdahl@ubuntu2204:~/reproducer$ docker run --interactive --tty --rm --name app-container --volume ~/jdk/build/linux-x86_64-server-release/images/jdk/:/jdk --volume .:/app --workdir /app ubuntu:22.04 /bin/bash
root@d1f87b8059ea:/app# /jdk/bin/java -version
openjdk version "23-internal" 2024-09-17
OpenJDK Runtime Environment (build 23-internal-adhoc.slovdahl.jdk)
OpenJDK 64-Bit Server VM (build 23-internal-adhoc.slovdahl.jdk, mixed mode, sharing)

root@d1f87b8059ea:/app# /jdk/bin/java Reproducer.java
Hello, World!
Bound to port 81


Locally built JDK and jcmd from the host (works):

slovdahl@ubuntu2204:~/reproducer$ sudo ~/jdk/build/linux-x86_64-server-release/images/jdk/bin/jcmd 942781 VM.version
942781:
OpenJDK 64-Bit Server VM version 23-internal-adhoc.slovdahl.jdk
JDK 23.0.0

jcmd from a sidecar Docker container mounted into the same process namespace (does NOT work, regressed):

slovdahl@ubuntu2204:~/reproducer$ docker run --interactive --tty --rm --pid=container:app-container --volume ~/jdk/build/linux-x86_64-server-release/images/jdk/:/jdk ubuntu:22.04 /bin/bash
root@27d8be9186b7:/# /jdk/bin/jcmd
26 jdk.compiler/com.sun.tools.javac.launcher.SourceLauncher Reproducer.java
59 jdk.jcmd/sun.tools.jcmd.JCmd
root@27d8be9186b7:/# /jdk/bin/jcmd 26 VM.version
26:
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file /tmp/.java_pid26: target process 26 doesn't respond within 10500ms or HotSpot VM not loaded
    at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:99)
    at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
    at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
    at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:113)
    at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:97)


Using Temurin 17.0.11 from the host (works):

slovdahl@ubuntu2204:~/reproducer$ /usr/lib/jvm/temurin-17-jdk-amd64/bin/java -version
openjdk version "17.0.11" 2024-04-16
OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9)
OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing)
slovdahl@ubuntu2204:~/reproducer$ sudo /usr/lib/jvm/temurin-17-jdk-amd64/bin/jcmd 942781 VM.version
942781:
OpenJDK 64-Bit Server VM version 23-internal-adhoc.slovdahl.jdk
JDK 23.0.0


Temurin 17.0.11 jcmd from a sidecar Docker container mounted into the same process namespace (works):

slovdahl@ubuntu2204:~/reproducer$ docker run --interactive --tty --rm --pid=container:app-container eclipse-temurin:17.0.11_9-jdk-jammy /bin/bash
root@fcbd6e4be9eb:/# java -version
openjdk version "17.0.11" 2024-04-16
OpenJDK Runtime Environment Temurin-17.0.11+9 (build 17.0.11+9)
OpenJDK 64-Bit Server VM Temurin-17.0.11+9 (build 17.0.11+9, mixed mode, sharing)
root@fcbd6e4be9eb:/# jcmd
138 jdk.jcmd/sun.tools.jcmd.JCmd
26 jdk.compiler/com.sun.tools.javac.launcher.SourceLauncher Reproducer.java
root@fcbd6e4be9eb:/# jcmd 26 VM.version
26:
OpenJDK 64-Bit Server VM version 23-internal-adhoc.slovdahl.jdk
JDK 23.0.0


Curiously enough, there is a test that on the surface seems to be written specifically for this case (test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java). But the devil is in the details: in TestJcmdWithSideCar, /tmp in the main container is a volume that is also mounted into the sidecar container, so attaching from the sidecar works without going through /proc/<pid>/cwd, and hence it works both before and after my fix.
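
To make the difference between the two setups easier to see from inside a sidecar, here is a small probe. It is only a sketch and not the code in jdk.attach: the class name AttachSocketProbe and the helper candidates() are made up for illustration. The two paths are the ones mentioned above, the sidecar's own /tmp/.java_pid<pid> from the error message and the target's tmpdir as seen through /proc/<pid>/root/tmp.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class AttachSocketProbe {

  // Hypothetical helper (not part of the JDK): the two places where the
  // attach socket file of the target JVM could be visible from the sidecar.
  static List<Path> candidates(long pid) {
    String name = ".java_pid" + pid;
    return List.of(
        // The target's /tmp, reached through its root filesystem in procfs.
        Path.of("/proc", Long.toString(pid), "root", "tmp", name),
        // The sidecar's own /tmp; only shared with the target if it is a
        // common volume, as in TestJcmdWithSideCar.
        Path.of("/tmp", name));
  }

  public static void main(String[] args) {
    // Defaults to PID 26 from the transcript above if no argument is given.
    long pid = args.length > 0 ? Long.parseLong(args[0]) : 26L;
    for (Path p : candidates(pid)) {
      System.out.println(p + " exists=" + Files.exists(p));
    }
  }
}
```

Running it in the sidecar with the target PID as argument (e.g. /jdk/bin/java AttachSocketProbe.java 26) after an attach attempt should show which of the two locations the sidecar can actually see. Reading /proc/<pid>/root may require sufficient privileges in the sidecar.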

Knowing up front that /tmp needs to be a volume and that it needs to be mounted into the sidecar container feels like a hard ask to me, so I can definitely see why one would like to have the possibility to attach to containers without having to do that. So, I think it would make sense to get this regression fixed. Maybe also change the existing test to not mount /tmp between the containers? Or as an alternative, have tests for both the "mount /tmp" approach and for not doing it.

Thoughts about this? I could try to give it a look if you think it makes sense.

Best regards,

--
Sebastian Lövdahl
Senior Software Engineer, Hibox Systems - https://www.hibox.tv