On Fri, 9 Feb 2024 18:40:04 GMT, Sebastian Lövdahl <d...@openjdk.org> wrote:
>>> I'll still fix this. So, I should change the PR title to match JDK-8226919, >>> and issue an `/issue remove` command for JDK-8307977, is that correct? >> >> Yes exactly, thanks. > > Thank you @kevinjwalls and @jerboaa for reviewing and guiding me through this > process, this was a great as a first-time JDK contributor! > > One more question, can I do anything to help getting this backported to e.g. > 21 and 17? @slovdahl - Apologies for adding a comment to a closed Pull Request, but I happened on https://bugs.openjdk.org/browse/JDK-8307977 via the earlier https://bugs.openjdk.org/browse/JDK-8179498 after researching "AttachNotSupportedException: Unable to open socket file" and troubleshooting our own OpenJDK 17 jcmd setup on top of containers and Kubernetes. Reading the code changes and discussion here, I'm concerned that this change, which I understand is not yet in OpenJDK 17, might cause a regression with our setup. We're running jcmd (OpenJDK build 17.0.10+7-LTS) and the target JVM in two separate containers in a Kubernetes pod. The target JVM container is already running, and then we use `kubectl debug --target=...` to start a Kubernetes debug container with jcmd that targets the first container. Given the `--target` option, they share the same Linux process namespace (both think the target JVM is PID 1). But since they are separate containers, they see different root filesystems (jcmd container sees the target JVM tmpdir under /proc/1/root/tmp but has its own distinct /tmp directory). I believe the attach file and socket file paths then work like this in OpenJDK 17: * jcmd creates the .attach_pid1 attach file without issues using /proc/1/cwd * Target JVM finds the .attach_pid1 attach file in its cwd. * Target JVM creates the .java_pid1 socket file in its tmpdir /tmp * jcmd finds the .java_pid1 socket file in /proc/1/root/tmp I think this scenario with a Kubernetes debug container may be a little different from other Docker container scenarios because these are two different containers with _different root filesystems_ but _the same Linux process namespace_. So jcmd using `/proc/<pid>/root` is necessary to find the socket file, even though jcmd and the target JVM both agree the PID is the same (1). A similar scenario with just Docker Engine is described at [docker container run - Example, join another container's PID namespace](https://docs.docker.com/reference/cli/docker/container/run/#example-join-another-containers-pid-namespace). If I understand the code change for this PR, I think it will change the behavior in this scenario, because `findSocketFile` will have `pid == ns_pid`, and now will use /tmp instead of `/proc/<pid>/root/tmp`, based on `findTargetProcessTmpDirectory`. We are lucky currently that the only place the current OpenJDK 17 code checks `pid == ns_pid` is the `createAttachFile` catch block that runs if `/proc/<pid>/cwd/.attach_pid<ns_pid>` can't be created, since as long as `/proc/<pid>/cwd` works, we are fine. But the `pid != ns_pid` check there makes an assumption that if the process namespace is the same, the root filesystem is the same and /tmp can be used, so this catch block wouldn't work if we were to hit it in our scenario. I think propagating this catch block logic into `findSocketFile` will break our scenario - it will force using `/tmp/.java_pid<ns_pid>` and that won't work. Could the `findSocketFile` logic be made more robust to the different namespace/filesystem scenarios? E.g. attempt `/proc/<pid>/root` first? Or perhaps there is a way (not `pid != ns_pid`) to more accurately determine whether / and `/proc/<pid>/root` are the same filesystem and /tmp is OK? Thanks for your time! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1969769654