On Fri, 26 Jan 2024 14:52:49 GMT, Jaikiran Pai <j...@openjdk.org> wrote:
> Can I please get a review of this change which proposes to address https://bugs.openjdk.org/browse/JDK-8324668?
>
> This change proposes to fix the issue in jdwp where, when launching a child process (for the `launch=<cmd>` option), it iterates over an extremely large number of file descriptors to close each one of them. Details about the issue and the proposed fix are added in a subsequent comment in this PR: https://github.com/openjdk/jdk/pull/17588#issuecomment-1912256075. tier1, tier2 and tier3 testing is currently in progress with this change.

The JVM can be launched with the jdwp agent, e.g. `-agentlib:jdwp=transport=xxx,server=y`. Among the options the jdwp agent accepts is the `launch` option, which takes the form `launch=<cmdline>`, where `cmdline` can be any arbitrary command. When that option is set, jdwp launches that command during initialization.

The implementation of launching the command (on *nix) involves a `fork()` followed by an `execvp()`. After the child process is forked, the jdwp implementation closes the file descriptors that were copied over (by `fork()`) from the parent (JVM) process into the (arbitrary) child process. Since jdwp doesn't know how many such file descriptors exist, it queries a system-level configuration value for the maximum number of file descriptors allowed for the process (`OPEN_MAX`). It then iterates up to that limit and calls `close()` on each descriptor sequentially, one at a time, irrespective of whether that many file descriptors were actually open.

Until recently, on macOS, we used to limit the JVM's max file descriptors to 10240. The jdwp code above would then iterate over these 10240 file descriptors (I'm leaving out the part where the code skips the standard input/output/error file descriptors) and close them one at a time (even if there weren't that many open). This apparently finished in a reasonable amount of time (likely slow, but not noticed to cause any issues).

In https://bugs.openjdk.org/browse/JDK-8300088 we stopped setting that 10240 limit on macOS for the JVM. After that change, on macOS, the max file descriptors for the JVM turns out to be 2147483647. When jdwp launches the child process from the JVM, it now has to iterate almost 2147483647 times and close each of those file descriptors. This became extremely slow and noticeable, and caused 3 tests (https://bugs.openjdk.org/browse/JDK-8324578) which use the `launch=<cmdline>` option to start failing with a timeout: the jdwp process hadn't yet finished the iteration to close the file descriptors for the child process by the time jtreg noticed the test had gone past the allowed timeout.

In theory, this entire code path, which deals with launching a child process from the JVM, is what the native implementation of Java's `java.lang.ProcessBuilder.start()` might do. So I had a look at how it's handled there, and it turns out the implementation there has an optimized way to close these copied-over descriptors for the child process. That's done in the `closeDescriptors()` function here: https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libjava/childproc.c#L74. What that code does is rely on a platform-specific directory whose entries correspond to the file descriptors that are open for the current process. Within the launched child process, that code lists that platform-specific directory and, for each returned entry, closes that file descriptor. A minimal sketch of the idea follows below.
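To make that concrete, here is a minimal, hedged sketch of the directory-listing approach (this is not the actual libjava/jdwp code; `FD_DIR` and `closeAllDescriptors` are illustrative names for this sketch):

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Platform-specific directory whose entries are the open file
 * descriptors of the current process (assumption for this sketch:
 * "/dev/fd" on macOS/BSD, "/proc/self/fd" on Linux). */
#if defined(__APPLE__)
#define FD_DIR "/dev/fd"
#else
#define FD_DIR "/proc/self/fd"
#endif

/* Close every descriptor >= fromFd that is actually open, by listing
 * FD_DIR instead of iterating up to OPEN_MAX. Returns 0 on failure so
 * the caller can fall back to the brute-force loop. */
static int closeAllDescriptors(int fromFd) {
    DIR *dp;
    struct dirent *dirp;

    if ((dp = opendir(FD_DIR)) == NULL)
        return 0;

    while ((dirp = readdir(dp)) != NULL) {
        /* Entries are numeric fd names; "." and ".." parse to 0 and
         * are skipped because fromFd is past the stdio descriptors. */
        int fd = (int) strtol(dirp->d_name, NULL, 10);
        if (fd >= fromFd && fd != dirfd(dp))
            close(fd);
    }
    closedir(dp);
    return 1;
}
```

The key point is that `readdir()` only returns descriptors that are actually open, so the loop runs a handful of times rather than up to `OPEN_MAX`.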
The advantage of this is that it yields an accurate, realistic set of file descriptors that have actually been copied over to the child process, instead of using the (extremely high) `OPEN_MAX` value as the upper bound. The code in libjava has an additional fallback which it uses if this way of closing descriptors doesn't succeed. The fallback is the same as what jdwp currently does by default, i.e. using `OPEN_MAX` as the upper bound and iterating over and closing each descriptor.

The `closeDescriptors()` function in libjava isn't exported, so we can't just use it in jdwp. The commit in this PR copies the `closeDescriptors()` function from libjava into the jdwp code and uses that as the default approach for closing the copied-over file descriptors of the child process. If that fails, we fall back to the current approach of using `OPEN_MAX` as the upper bound for the iteration (a sketch of that fallback is included at the end of this comment). Relatively minor changes have also been introduced in the copied-over code, such as:

- adding a `JDI_ASSERT` at a couple of places
- additional code comments at relevant places
- usage of `errno` in one place
- using `rlim_t` (from `limits.h`) instead of `int` for the `max_fd` variable

I have tested this change with and without the changes done in https://bugs.openjdk.org/browse/JDK-8300088, and the `com/sun/jdi/OnThrowTest.java`, `com/sun/jdi/JITDebug.java` and `com/sun/jdi/JdwpOnThrowTest.java` tests now pass in both cases.

My knowledge of C is elementary, so there might be a few more things that need to be considered in this change. Looking forward to any such changes that might be necessary. For example, I'm not sure if `<limits.h>` on AIX defines the `rlim_t` type. However, one other place where `rlim_t` gets used in the native JDK code is in IOUtil.c (https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libnio/ch/IOUtil.c#L161), and looking at that code, I don't see any AIX-specific overridden files or code for its usage. So I'm guessing the usage of `rlim_t` that this change introduces should be OK on AIX too.

Finally, the libjava code for launching a child process has some more specifics which I didn't want to introduce/copy over here. I decided to focus solely on the file descriptors issue in this PR. I haven't yet reviewed whether jdwp would benefit from doing those additional things that libjava does when launching the child process.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17588#issuecomment-1912256075
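For completeness, here is a hedged sketch of what the `OPEN_MAX`-bounded fallback described above could look like, including the `rlim_t` typing of `max_fd` (illustrative naming, not the exact patch):

```c
#include <limits.h>
#include <sys/resource.h>  /* rlim_t on POSIX systems */
#include <unistd.h>

/* Brute-force fallback: iterate up to the system's max-descriptor
 * limit and close everything past the stdio descriptors. close() on
 * a descriptor that was never open just fails with EBADF, which is
 * harmless here. */
static void closeDescriptorsFallback(void) {
    rlim_t max_fd;
    long limit = sysconf(_SC_OPEN_MAX); /* the "system level configuration" query */

    if (limit < 0) {
#ifdef OPEN_MAX
        max_fd = OPEN_MAX;
#else
        max_fd = 1024; /* arbitrary last-resort bound for this sketch */
#endif
    } else {
        max_fd = (rlim_t) limit;
    }

    /* On macOS after JDK-8300088 this bound can be 2147483647,
     * which is exactly why this loop became the slow path. */
    for (rlim_t fd = 3; fd < max_fd; fd++)
        close((int) fd);
}
```

With the PR's change, this loop only runs when the directory-listing approach fails.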