Hi Chris,

On 22.05.2009 14:29, Christopher Schultz wrote:
>>> $ jmap -heap 1430
>>> Attaching to process ID 1430, please wait...
>>> Debugger attached successfully.
>>> Client compiler detected.
>>> JVM version is 11.3-b02
>>>
>>> using thread-local object allocation.
>>> Mark Sweep Compact GC
>>>
>>> Heap Configuration:
>>>    MinHeapFreeRatio = 40
>>>    MaxHeapFreeRatio = 70
>>>    MaxHeapSize      = 67108864 (64.0MB)
>>>    NewSize          = 1048576 (1.0MB)
>>>    MaxNewSize       = 4294901760 (4095.9375MB)
>>>    OldSize          = 4194304 (4.0MB)
>>>    NewRatio         = 12
>>>    SurvivorRatio    = 8
>>>    PermSize         = 12582912 (12.0MB)
>>>    MaxPermSize      = 67108864 (64.0MB)
>>>
>>> Heap Usage:
>>> New Generation (Eden + 1 Survivor Space):
>>>    capacity = 2228224 (2.125MB)
>>>    used     = 612888 (0.5844955444335938MB)
>>>    free     = 1615336 (1.5405044555664062MB)
>>>    27.505672679227942% used
>>> Eden Space:
>>>    capacity = 2031616 (1.9375MB)
>>>    used     = 612888 (0.5844955444335938MB)
>>>    free     = 1418728 (1.3530044555664062MB)
>>>    30.167511970766128% used
>>> From Space:
>>>    capacity = 196608 (0.1875MB)
>>>    used     = 0 (0.0MB)
>>>    free     = 196608 (0.1875MB)
>>>    0.0% used
>>> To Space:
>>>    capacity = 196608 (0.1875MB)
>>>    used     = 0 (0.0MB)
>>>    free     = 196608 (0.1875MB)
>>>    0.0% used
>>> tenured generation:
>>>    capacity = 28311552 (27.0MB)
>>>    used     = 20464784 (19.516738891601562MB)
>>>    free     = 7846768 (7.4832611083984375MB)
>>>    72.28421811704283% used
>>> Perm Generation:
>>>    capacity = 12582912 (12.0MB)
>>>    used     = 8834304 (8.425048828125MB)
>>>    free     = 3748608 (3.574951171875MB)
>>>    70.208740234375% used
>>>
>>> Here are my <Connector> configurations:
>>>
>>> <!-- Regular non-APR Coyote Connector -->
>>> <Connector port="8001"
>>>            protocol="org.apache.coyote.http11.Http11Protocol"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1non-APR"
>>> />
>>>
>>> <!-- APR Connector -->
>>> <Connector port="8002"
>>>            protocol="org.apache.coyote.http11.Http11AprProtocol"
>>>            useSendfile="true"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1APR"
>>> />
>>> <!-- APR without sendfile -->
>>> <Connector port="8003"
>>>            protocol="org.apache.coyote.http11.Http11AprProtocol"
>>>            useSendfile="false"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1APRw/osendfile"
>>> />
>>>
>>> <!-- NIO Connector -->
>>> <Connector port="8004"
>>>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>>>            useSendfile="true"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1NIO"
>>> />
>>>
>>> <!-- NIO without sendfile -->
>>> <Connector port="8005"
>>>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>>>            useSendfile="false"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1NIOw/osendfile"
>>> />
>>>
>>> All connectors are configured at once, so I should have a maximum of
>>> 40 threads in each pool. The command I ran to benchmark each
>>> connector was (for example):
>>>
>>> /usr/sbin/ab -c 40 -t 480 -n 10000000 http://localhost:8004/4kiB.bin
>>>
>>> This runs ApacheBench for 8 minutes with 40 client threads requesting
>>> a 4k file over and over again. This particular test succeeded, but
>>> there are 14 more tests, each using a file twice the size of the
>>> previous one. Every single test after the 128k file test fails.
>>>
>>> The last time I ran this (with only 1 thread instead of 40), the NIO
>>> connector died in the same way, but the NIO connector without
>>> sendfile enabled appeared to work properly. This time (40 threads),
>>> neither of the connectors worked properly: the NIO connector (with
>>> sendfile) failed to complete any tests after the 128k test, and the
>>> NIO connector without sendfile failed to complete /all/ of the tests
>>> (run automatically immediately following the NIO tests).
>>>
>>> No OOMEs were encountered: only the exception shown above (no more
>>> files). In my previous tests, lsof reported that only one of my files
>>> was still open by the process. After this most recent test, it
>>> appears that 954 of my static files are still open by the process
>>> (and the test ended over 24 hours ago).
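For reference, the test series described above (15 files, each twice the size of the previous one, from 4 KiB up to 64 MiB) can be sketched as a small loop. The /tmp/bench directory is an assumption, and the ab invocations are only echoed here rather than run, since they need a live Tomcat on port 8004:

```shell
# Sketch of the doubling-file-size benchmark series described above.
# /tmp/bench is a hypothetical staging directory; ab commands are
# echoed instead of executed.
mkdir -p /tmp/bench
size=4
for i in $(seq 1 15); do
  file="${size}kiB.bin"
  # seek with count=0 creates a sparse file of exactly $size KiB
  dd if=/dev/zero of="/tmp/bench/$file" bs=1024 count=0 seek="$size" 2>/dev/null
  echo "/usr/sbin/ab -c 40 -t 480 -n 10000000 http://localhost:8004/$file"
  size=$((size * 2))
done
```

The 15th and last file comes out at 65536 KiB (64 MiB), matching the "14 more tests, each twice the size" progression starting from 4 KiB.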
>>>
>>> The initial set of tests (c=1) seemed to recover, while this second
>>> set of tests (c=40) has not.
>>>
>>> My knee-jerk reaction is that the largest number of files that should
>>> ever be open is 40: one per request-processing thread. Something,
>>> somewhere is causing these file descriptors to stay open.
>>>
>>> Unfortunately, I don't have any GC information for the time period
>>> covering the test.
>>>
>>> I still have the JVM running, so I can probably inspect it for
>>> certain things if anyone has any questions. Unfortunately, I can't
>>> run any new JSP files (out of files!) and it looks like I can't
>>> connect using jconsole (probably because the JVM can't open a new
>>> socket).
>>>
>>> I'd love some suggestions as to what's going on here.
>>
>> Maybe looking at the directory /proc/PID/fd (with PID being your
>> process id) will give more info (at least whether the FDs are still
>> in use). Or you might inspect the process with "lsof". Sometimes proc
>> is enough, sometimes you need lsof to find out more.
>
> I already have lsof information for the process:
>
> $ lsof -p 1430 | grep 'ROOT/[0-9A-Za-z]*\.bin' | wc
>     954    8586  116376
>
> Note that this is now 48 hours or so after the test completed, so I
> don't think I'm just waiting around for a GC to occur (then again,
> there's probably little in the way of activity going on, so maybe I
> /am/ just waiting around for a GC). I am able to execute a JSP on the
> server side that gives me memory information... I ran it about 100
> times to watch the available memory decrease, then recover -- probably
> from a minor GC. Maybe a full GC is required. Maybe the interaction
> between the NIO connector and tcnative results in leaked file
> descriptors.
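The /proc inspection suggested above can be illustrated without a running Tomcat: every entry under /proc/PID/fd is a symlink whose target names the open file. The sketch below uses the shell's own PID ($$) so it is self-contained; against the real process you would substitute Tomcat's PID (1430 in this thread) and look for the leaked *.bin paths:

```shell
# Inspect open file descriptors via /proc on Linux, as suggested above.
# $$ is this shell's own PID; use the Tomcat PID (e.g. 1430) in practice.
exec 9< /etc/passwd          # open a file on descriptor 9
readlink /proc/$$/fd/9       # the symlink target names the open file
ls /proc/$$/fd | wc -l       # total number of descriptors currently open
exec 9<&-                    # close descriptor 9 again
```

Watching that count against the process's ulimit shows how close it is to the "no more files" failure mode.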
You could run a JSP including a call to System.gc(). If you don't have
DisableExplicitGC set, this usually triggers a major GC (it's not 100%
guaranteed, but every JVM I have seen so far starts a major GC). You
could call that JSP twice, and then check via your other JSP whether
the memory numbers have actually changed.

> This is what the raw output looks like (abbreviated, of course):
>
> COMMAND  PID  USER  FD   TYPE DEVICE   SIZE    NODE NAME
> java    1430 chris  68r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  70r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  71r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  72r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  73r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  74r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  75r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  76r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  77r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  78r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  79r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  80r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  81r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  82r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  83r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
>
> etc.
>
> It's not just the 256KiB files that are still open (those were from
> the first file-size test that failed to complete; all the later tests
> failed to complete as well), though that file represents most of them:
>
> $ lsof -p 1430 | grep 'ROOT/[0-9A-Za-z]*\.bin' | sed -e 's/[^/]\+//' |
> sort | uniq -c
>       6 /home/chris/app/connector-test/8785/webapps/ROOT/1MiB.bin
>     948 /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
>
> Is there any way to forcibly close a file handle on Linux from outside
> the process? I wasn't able to find a way to do this, or I would have
> tried to force a full GC.

First of all, this really looks bad and interesting, so we would
benefit from understanding what's going on. Either we have a bug
somewhere, or it really is some finalizer that needs to run in order to
close the file.

I would expect the finalizer to run only when the object is actually
garbage collected, so if the object has been promoted to the tenured
space, a minor GC will not be enough to get it collected. When there is
no load, Tomcat allocates only very few new objects (e.g. when checking
for changes to web.xml etc., things Tomcat does even when no requests
come in), so it is unlikely that a major GC (a GC of the tenured
generation) happens without load. That's why the above JSP with
System.gc() could help.

Is there a chance you can try again and see whether those additional
FDs for the static content go away if you trigger a major GC, either
via such a JSP, via JConsole, or any other way you like?

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
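[Editor's note] The System.gc() JSP discussed in the thread is never spelled out. A minimal version might look like the following, written as a shell heredoc so it could be dropped into the webapp. The file name gc.jsp, the target path (/tmp here for illustration; webapps/ROOT/gc.jsp in practice), and the output format are all assumptions, not Rainer's actual code:

```shell
# Hypothetical gc.jsp implementing the suggestion above: print free heap,
# request a GC (honored as a major GC unless -XX:+DisableExplicitGC is set),
# then print free heap again so two invocations can be compared.
cat > /tmp/gc.jsp <<'EOF'
<%
    Runtime rt = Runtime.getRuntime();
    out.println("free before gc: " + rt.freeMemory());
    System.gc();
    out.println("free after gc:  " + rt.freeMemory());
%>
EOF
```

Requesting it twice and then re-running the lsof pipeline from the thread would show whether the leaked descriptors were merely waiting for finalization during a major GC.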