Hi Chris,

On 22.05.2009 14:29, Christopher Schultz wrote:
>>> $ jmap -heap 1430
>>> Attaching to process ID 1430, please wait...
>>> Debugger attached successfully.
>>> Client compiler detected.
>>> JVM version is 11.3-b02
>>>
>>> using thread-local object allocation.
>>> Mark Sweep Compact GC
>>>
>>> Heap Configuration:
>>>    MinHeapFreeRatio = 40
>>>    MaxHeapFreeRatio = 70
>>>    MaxHeapSize      = 67108864 (64.0MB)
>>>    NewSize          = 1048576 (1.0MB)
>>>    MaxNewSize       = 4294901760 (4095.9375MB)
>>>    OldSize          = 4194304 (4.0MB)
>>>    NewRatio         = 12
>>>    SurvivorRatio    = 8
>>>    PermSize         = 12582912 (12.0MB)
>>>    MaxPermSize      = 67108864 (64.0MB)
>>>
>>> Heap Usage:
>>> New Generation (Eden + 1 Survivor Space):
>>>    capacity = 2228224 (2.125MB)
>>>    used     = 612888 (0.5844955444335938MB)
>>>    free     = 1615336 (1.5405044555664062MB)
>>>    27.505672679227942% used
>>> Eden Space:
>>>    capacity = 2031616 (1.9375MB)
>>>    used     = 612888 (0.5844955444335938MB)
>>>    free     = 1418728 (1.3530044555664062MB)
>>>    30.167511970766128% used
>>> From Space:
>>>    capacity = 196608 (0.1875MB)
>>>    used     = 0 (0.0MB)
>>>    free     = 196608 (0.1875MB)
>>>    0.0% used
>>> To Space:
>>>    capacity = 196608 (0.1875MB)
>>>    used     = 0 (0.0MB)
>>>    free     = 196608 (0.1875MB)
>>>    0.0% used
>>> tenured generation:
>>>    capacity = 28311552 (27.0MB)
>>>    used     = 20464784 (19.516738891601562MB)
>>>    free     = 7846768 (7.4832611083984375MB)
>>>    72.28421811704283% used
>>> Perm Generation:
>>>    capacity = 12582912 (12.0MB)
>>>    used     = 8834304 (8.425048828125MB)
>>>    free     = 3748608 (3.574951171875MB)
>>>    70.208740234375% used
>>>
>>> Here are my <Connector> configurations:
>>>
>>> <!-- Regular non-APR Coyote Connector -->
>>> <Connector port="8001"
>>>            protocol="org.apache.coyote.http11.Http11Protocol"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1non-APR"
>>> />
>>>
>>> <!-- APR Connector -->
>>> <Connector port="8002"
>>>            protocol="org.apache.coyote.http11.Http11AprProtocol"
>>>            useSendfile="true"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1APR"
>>> />
>>> <!-- APR without sendfile -->
>>> <Connector port="8003"
>>>            protocol="org.apache.coyote.http11.Http11AprProtocol"
>>>            useSendfile="false"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1APRw/osendfile"
>>> />
>>>
>>> <!-- NIO Connector -->
>>> <Connector port="8004"
>>>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>>>            useSendfile="true"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1NIO"
>>> />
>>>
>>> <!-- NIO without sendfile -->
>>> <Connector port="8005"
>>>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>>>            useSendfile="false"
>>>            connectionTimeout="20000"
>>>            server="Coyote1.1NIOw/osendfile"
>>> />
>>>
>>> All connectors are configured at once, so I should have a maximum of
>>> 40 threads in each pool. The command I ran to benchmark each
>>> connector was (for example):
>>>
>>> /usr/sbin/ab -c 40 -t 480 -n 10000000 http://localhost:8004/4kiB.bin
>>>
>>> This runs ApacheBench for 8 minutes with 40 client threads requesting
>>> a 4k file over and over again. This particular test succeeded, but
>>> there are 14 more tests, each using a file twice the size of the
>>> previous one. Every single test after the 128k file test fails.
>>>
>>> The last time I ran this (with only 1 thread instead of 40), the NIO
>>> connector died in the same way, but the NIO connector without
>>> sendfile enabled appeared to work properly. This time (40 threads),
>>> neither of the connectors worked properly: the NIO connector (with
>>> sendfile) failed to complete any tests after the 128k test, and the
>>> NIO connector without sendfile failed to complete /all/ of the tests
>>> (run automatically immediately following the NIO tests).
>>>
>>> No OOMEs were encountered: only the exception shown above (no more
>>> files). In my previous tests, lsof reported that only one of my files
>>> was still open by the process. After this most recent test, it
>>> appears that 954 of my static files are still open by the process
>>> (and the test ended over 24 hours ago).
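For reference, the test series described above (15 files, each twice the size of the previous one, from 4 KiB up to 64 MiB) can be sketched as a small loop. The /tmp/bench directory is an assumption, and the ab invocations are only echoed here rather than run, since they need a live Tomcat on port 8004:

```shell
# Sketch of the doubling-file-size benchmark series described above.
# /tmp/bench is a hypothetical staging directory; ab commands are
# echoed instead of executed.
mkdir -p /tmp/bench
size=4
for i in $(seq 1 15); do
  file="${size}kiB.bin"
  # seek with count=0 creates a sparse file of exactly $size KiB
  dd if=/dev/zero of="/tmp/bench/$file" bs=1024 count=0 seek="$size" 2>/dev/null
  echo "/usr/sbin/ab -c 40 -t 480 -n 10000000 http://localhost:8004/$file"
  size=$((size * 2))
done
```

The 15th and last file comes out at 65536 KiB (64 MiB), matching the "14 more tests, each twice the size" progression starting from 4 KiB.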
>>>
>>> The initial set of tests (c=1) seemed to recover, while this second
>>> set of tests (c=40) has not.
>>>
>>> My knee-jerk reaction is that the largest number of files that should
>>> ever be open is 40: one per request-processing thread. Something,
>>> somewhere is causing these file descriptors to stay open.
>>>
>>> Unfortunately, I don't have any GC information for the time period
>>> covering the test.
>>>
>>> I still have the JVM running, so I can probably inspect it for
>>> certain things if anyone has any questions. Unfortunately, I can't
>>> run any new JSP files (out of files!) and it looks like I can't
>>> connect using jconsole (probably because the JVM can't open a new
>>> socket).
>>>
>>> I'd love some suggestions as to what's going on here.
>>
>> Maybe looking at the directory /proc/PID/fd (with PID being your
>> process id) will give more info (at least whether the FDs are still
>> in use). Or you might inspect the process with "lsof". Sometimes proc
>> is enough, sometimes you need lsof to find out more.
>
> I already have lsof information for the process:
>
> $ lsof -p 1430 | grep 'ROOT/[0-9A-Za-z]*\.bin' | wc
>     954    8586  116376
>
> Note that this is now 48 hours or so after the test completed, so I
> don't think I'm just waiting around for a GC to occur (then again,
> there's probably little in the way of activity going on, so maybe I
> /am/ just waiting around for a GC). I am able to execute a JSP on the
> server side that gives me memory information... I ran it about 100
> times to watch the available memory decrease, then recover -- probably
> from a minor GC. Maybe a full GC is required. Maybe the interaction
> between the NIO connector and tcnative results in leaked file
> descriptors.
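The /proc inspection suggested above can be illustrated without a running Tomcat: every entry under /proc/PID/fd is a symlink whose target names the open file. The sketch below uses the shell's own PID ($$) so it is self-contained; against the real process you would substitute Tomcat's PID (1430 in this thread) and look for the leaked *.bin paths:

```shell
# Inspect open file descriptors via /proc on Linux, as suggested above.
# $$ is this shell's own PID; use the Tomcat PID (e.g. 1430) in practice.
exec 9< /etc/passwd          # open a file on descriptor 9
readlink /proc/$$/fd/9       # the symlink target names the open file
ls /proc/$$/fd | wc -l       # total number of descriptors currently open
exec 9<&-                    # close descriptor 9 again
```

Watching that count against the process's ulimit shows how close it is to the "no more files" failure mode.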
You could run a JSP including a call to System.gc(). If you don't have
DisableExplicitGC set, this usually triggers a major GC (it's not 100%
guaranteed, but every JVM I have seen so far starts a major GC). You
could call that JSP twice, and then check via your other JSP whether
the memory numbers have actually changed.

> This is what the raw output looks like (abbreviated, of course):
>
> COMMAND  PID  USER  FD   TYPE DEVICE   SIZE    NODE NAME
> java    1430 chris  68r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  70r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  71r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  72r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  73r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  74r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  75r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  76r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  77r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  78r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  79r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  80r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  81r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  82r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
> java    1430 chris  83r   REG    3,3 262144 8751655
> /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
>
> etc.
>
> It's not just the 256KiB files that are still open (those were from
> the first file-size test that failed to complete; all the later tests
> failed to complete as well), though that file represents most of them:
>
> $ lsof -p 1430 | grep 'ROOT/[0-9A-Za-z]*\.bin' | sed -e 's/[^/]\+//' |
> sort | uniq -c
>       6 /home/chris/app/connector-test/8785/webapps/ROOT/1MiB.bin
>     948 /home/chris/app/connector-test/8785/webapps/ROOT/256kiB.bin
>
> Is there any way to forcibly close a file handle on Linux from outside
> the process? I wasn't able to find a way to do this, or I would have
> tried to force a full GC.

First of all, this really looks bad and interesting, so we would
benefit from understanding what's going on. Either we have a bug
somewhere, or it really is some finalizer that needs to run in order to
close the file.

I would expect the finalizer to run only when the object is actually
garbage collected, so if the object has been promoted to the tenured
space, a minor GC will not be enough to get it collected. When there is
no load, Tomcat allocates only very few new objects (e.g. when checking
for changes to web.xml etc., things Tomcat does even when no requests
come in), so it is unlikely that a major GC (a GC of the tenured
generation) happens without load. That's why the above JSP with
System.gc() could help.

Is there a chance you can try again and see whether those additional
FDs for the static content go away if you trigger a major GC, either
via such a JSP, via JConsole, or any other way you like?

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
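[Editor's note] The System.gc() JSP discussed in the thread is never spelled out. A minimal version might look like the following, written as a shell heredoc so it could be dropped into the webapp. The file name gc.jsp, the target path (/tmp here for illustration; webapps/ROOT/gc.jsp in practice), and the output format are all assumptions, not Rainer's actual code:

```shell
# Hypothetical gc.jsp implementing the suggestion above: print free heap,
# request a GC (honored as a major GC unless -XX:+DisableExplicitGC is set),
# then print free heap again so two invocations can be compared.
cat > /tmp/gc.jsp <<'EOF'
<%
    Runtime rt = Runtime.getRuntime();
    out.println("free before gc: " + rt.freeMemory());
    System.gc();
    out.println("free after gc:  " + rt.freeMemory());
%>
EOF
```

Requesting it twice and then re-running the lsof pipeline from the thread would show whether the leaked descriptors were merely waiting for finalization during a major GC.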