On Fri, 26 Jan 2024 14:52:49 GMT, Jaikiran Pai <j...@openjdk.org> wrote:

> Can I please get a review of this change which proposes to address 
> https://bugs.openjdk.org/browse/JDK-8324668?
> 
> This change proposes to fix the issue in jdwp where when launching a child 
> process (for the `launch=<cmd>` option), it iterates over an extremely large 
> number of file descriptors to close each one of those. Details about the 
> issue and the proposed fix are described in a subsequent comment in this PR 
> https://github.com/openjdk/jdk/pull/17588#issuecomment-1912256075. tier1, 
> tier2 and tier3 testing is currently in progress with this change.

The JVM can be launched with the jdwp agent, for example 
`-agentlib:jdwp=transport=xxx,server=y`. Among the options that the jdwp agent 
accepts is the `launch` option, which has the form `launch=<cmdline>`, where 
`cmdline` can be an arbitrary command. 
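
For instance (with illustrative values), a debuggee might be started with 
something like 
`-agentlib:jdwp=transport=dt_socket,server=y,launch=/path/to/some-tool`, in 
which case `/path/to/some-tool` is the command that jdwp will launch.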

When that option is set, jdwp launches that command during initialization. The 
implementation of launching the command (on *nix) involves a `fork()` followed 
by an `execvp()`. After the child process is forked, the jdwp implementation 
closes the file descriptors that were copied over (by `fork()`) from the parent 
(JVM) process into the (arbitrary) child process. Since jdwp doesn't know how 
many such file descriptors exist, it queries a system-level configuration value 
for the maximum number of file descriptors allowed for the process 
(`OPEN_MAX`). It then iterates over each of those file descriptors and calls 
`close()` on them sequentially, one at a time, irrespective of whether that 
many file descriptors were actually open.
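
To make the current behavior concrete, here is a minimal sketch (not the actual 
jdwp code; the function name is illustrative) of that kind of close loop:

```c
#include <unistd.h>

/* Sketch of the pre-fix approach: close every descriptor up to the
 * system-reported maximum, whether or not it was ever open. */
static void close_all_fds_naive(void) {
    long max_fd = sysconf(_SC_OPEN_MAX);   /* can be 2147483647 on recent macOS */
    for (long fd = 3; fd < max_fd; fd++) { /* skip stdin/stdout/stderr (0, 1, 2) */
        (void) close((int) fd);            /* most of these calls just fail with EBADF */
    }
}
```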

Until recently, on macOS, we used to limit the maximum number of file 
descriptors for the JVM to 10240. The jdwp code above would then iterate over 
those 10240 file descriptors (I'm leaving out the part where the code skips the 
standard input/output/error file descriptors) and close them one at a time 
(even if there weren't that many open). This apparently finished in a 
reasonable amount of time (likely slow, but it wasn't noticed to cause any 
issues).

In https://bugs.openjdk.org/browse/JDK-8300088 we stopped setting that 10240 
limit on macOS for the JVM. After that change, on macOS, the maximum number of 
file descriptors for the JVM turns out to be 2147483647. When jdwp launches the 
child process from the JVM, it now has to iterate almost 2147483647 times and 
close each of those file descriptors. This became extremely slow and 
noticeable, and caused 3 tests (https://bugs.openjdk.org/browse/JDK-8324578) 
which use the `launch=<cmdline>` option to start failing with a timeout - the 
jdwp process hadn't yet finished the iteration to close the file descriptors 
for the child process by the time jtreg noticed the test had gone past the 
allowed timeout.

In theory, this entire code path which deals with launching a child process 
from the JVM does what the native implementation of Java's 
`java.lang.ProcessBuilder.start()` does. So I had a look at how it's handled 
there, and it turns out that the implementation there has an optimized way to 
close these copied-over descriptors for the child process. That's done in the 
`closeDescriptors()` function here 
https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libjava/childproc.c#L74.
What that code does is rely on a platform-specific directory whose entries are 
the file descriptors that are open for the current process. Within the launched 
child process, that code lists that platform-specific directory and, for each 
returned entry, closes that file descriptor. The advantage of this is that it 
works from a more accurate and realistic set of file descriptors that have 
actually been copied over to the child process, instead of using the (extremely 
high) `OPEN_MAX` value as the upper bound. The code in libjava has an 
additional fallback which it uses if this logic of closing descriptors doesn't 
succeed. The fallback is the same as what jdwp currently does by default, i.e. 
using `OPEN_MAX` as the upper bound and iterating over and closing each 
descriptor.
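
A rough sketch of that directory-based approach (simplified, and not the exact 
libjava code; `FD_DIR`, the function name and the use of `dirfd()` here are 
illustrative choices) looks like this:

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

#if defined(__APPLE__)
#define FD_DIR "/dev/fd"        /* on macOS the open fds are listed here */
#else
#define FD_DIR "/proc/self/fd"  /* on Linux */
#endif

/* Returns 1 on success, 0 if the directory could not be read
 * (in which case the caller would fall back to the OPEN_MAX loop). */
static int close_open_fds_from_dir(void) {
    DIR *dp = opendir(FD_DIR);
    if (dp == NULL) {
        return 0;
    }
    int dir_fd = dirfd(dp);     /* don't close the fd backing the directory stream */
    struct dirent *entry;
    while ((entry = readdir(dp)) != NULL) {
        if (entry->d_name[0] < '0' || entry->d_name[0] > '9') {
            continue;           /* skip "." and ".." */
        }
        int fd = (int) strtol(entry->d_name, NULL, 10);
        if (fd > 2 && fd != dir_fd) {   /* keep stdin/stdout/stderr open */
            (void) close(fd);
        }
    }
    closedir(dp);
    return 1;
}
```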

The `closeDescriptors()` function in libjava isn't exported, so we can't just 
use it in jdwp. The commit in this PR copies the `closeDescriptors()` function 
over from libjava into the jdwp code and then uses it as the default approach 
for closing the copied-over file descriptors of the child process. If that 
fails, we fall back to the current approach of using `OPEN_MAX` as the upper 
bound for the iteration.
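
In other words, the overall structure in the child is roughly the following 
(again just a sketch; the function names are the illustrative ones from the 
snippets above, not the actual jdwp identifiers):

```c
/* In the forked child, before execvp(): close everything inherited
 * from the JVM except stdin/stdout/stderr. */
static void close_child_fds(void) {
    if (!close_open_fds_from_dir()) { /* preferred: enumerate the fds that are actually open */
        close_all_fds_naive();        /* fallback: iterate up to OPEN_MAX and close each one */
    }
}
```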

Relatively minor changes have also been introduced in the copied-over code, 
such as:

    - adding a `JDI_ASSERT` at a couple of places
    - additional code comments at relevant places
    - usage of `errno` in one place
    - using `rlim_t` (from `limits.h`) instead of `int` for the `max_fd` variable

I have tested this change with and without the changes done in 
https://bugs.openjdk.org/browse/JDK-8300088, and the 
`com/sun/jdi/OnThrowTest.java`, `com/sun/jdi/JITDebug.java` and 
`com/sun/jdi/JdwpOnThrowTest.java` tests now pass in both cases.

My knowledge of C is elementary, so there might be a few more things that need 
to be considered in this change. Looking forward to any suggestions on changes 
that might be necessary. For example, I'm not sure if `<limits.h>` on AIX 
defines the `rlim_t` type. However, one other place where `rlim_t` gets used in 
the native JDK code is in IOUtil.c 
https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libnio/ch/IOUtil.c#L161
and, looking at that code, I don't see any AIX-specific overridden files or 
code for its usage. So I'm guessing the usage of `rlim_t` that this change 
introduces should be OK on AIX too.

Finally, the libjava code for launching a child process has some more 
specifics, which I didn't want to introduce/copy over here. I decided to focus 
solely on
the file descriptors issue in this PR. I haven't yet reviewed whether jdwp 
would benefit from doing those additional things that libjava does when 
launching the child process.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17588#issuecomment-1912256075
