On Fri, 9 Feb 2024 18:40:04 GMT, Sebastian Lövdahl <d...@openjdk.org> wrote:

>>> I'll still fix this. So, I should change the PR title to match JDK-8226919, 
>>> and issue an `/issue remove` command for JDK-8307977, is that correct?
>> 
>> Yes exactly, thanks.
>
> Thank you @kevinjwalls and @jerboaa for reviewing and guiding me through this 
> process, this was a great as a first-time JDK contributor!
> 
> One more question, can I do anything to help getting this backported to e.g. 
> 21 and 17?

@slovdahl - Apologies for adding a comment to a closed Pull Request, but I 
happened on https://bugs.openjdk.org/browse/JDK-8307977 via the earlier 
https://bugs.openjdk.org/browse/JDK-8179498 after researching 
"AttachNotSupportedException: Unable to open socket file" and troubleshooting 
our own OpenJDK 17 jcmd setup on top of containers and Kubernetes.  Reading the 
code changes and discussion here, I'm concerned that this change, which I 
understand is not yet in OpenJDK 17, might cause a regression with our setup.

We're running jcmd (OpenJDK build 17.0.10+7-LTS) and the target JVM in two 
separate containers in a Kubernetes pod.  The target JVM container is already 
running, and then we use `kubectl debug --target=...` to start a Kubernetes 
debug container with jcmd that targets the first container.  Given the 
`--target` option, they share the same Linux process namespace (both think the 
target JVM is PID 1).  But since they are separate containers, they see 
different root filesystems (jcmd container sees the target JVM tmpdir under 
/proc/1/root/tmp but has its own distinct /tmp directory).  

I believe the attach file and socket file paths then work like this in OpenJDK 
17:
* jcmd creates the .attach_pid1 attach file without issues using /proc/1/cwd
* Target JVM finds the .attach_pid1 attach file in its cwd.
* Target JVM creates the .java_pid1 socket file in its tmpdir /tmp
* jcmd finds the .java_pid1 socket file in /proc/1/root/tmp

I think this scenario with a Kubernetes debug container may be a little 
different from other Docker container scenarios because these are two different 
containers with _different root filesystems_ but _the same Linux process 
namespace_.  So jcmd using `/proc/<pid>/root` is necessary to find the socket 
file, even though jcmd and the target JVM both agree the PID is the same (1).  
A similar scenario with just Docker Engine is described at [docker container 
run - Example, join another container's PID 
namespace](https://docs.docker.com/reference/cli/docker/container/run/#example-join-another-containers-pid-namespace).

If I understand the code change for this PR, I think it will change the 
behavior in this scenario, because `findSocketFile` will have `pid == ns_pid`, 
and now will use /tmp instead of `/proc/<pid>/root/tmp`, based on 
`findTargetProcessTmpDirectory`.

We are lucky currently that the only place the current OpenJDK 17 code checks 
`pid == ns_pid` is the `createAttachFile` catch block that runs if 
`/proc/<pid>/cwd/.attach_pid<ns_pid>` can't be created, since as long as 
`/proc/<pid>/cwd` works, we are fine.  But the `pid != ns_pid` check there 
makes an assumption that if the process namespace is the same, the root 
filesystem is the same and /tmp can be used, so this catch block wouldn't work 
if we were to hit it in our scenario.  I think propagating this catch block 
logic into `findSocketFile` will break our scenario - it will force using 
`/tmp/.java_pid<ns_pid>` and that won't work.

Could the `findSocketFile` logic be made more robust to the different 
namespace/filesystem scenarios?  E.g. attempt `/proc/<pid>/root` first?  Or 
perhaps there is a way (not `pid != ns_pid`) to more accurately determine 
whether / and `/proc/<pid>/root` are the same filesystem and /tmp is OK?

Thanks for your time!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17628#issuecomment-1969769654

Reply via email to