> From: Mark Thomas <ma...@apache.org> > Sent: Tuesday, November 3, 2020 2:06 AM > To: Tomcat Users List <users@tomcat.apache.org> > Subject: Re: Weirdest Tomcat Behavior Ever? > > On 02/11/2020 12:16, Eric Robinson wrote: > > <snip/> > > > Gotcha, thanks for the clarification. Let's see what happens when the users > start hitting it at 8:00 am Eastern. > > Progress. The first attempt to write to the socket triggers the following > exception: > > 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13] > org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write > [301361476] > java.net.SocketException: Bad file descriptor (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at > org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndp > oint.java:1409) > at > org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOut > putBuffer.java:247) > at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480) > at > org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutput > Buffer.java:183) > at > org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Pr > ocessor.java:761) > at org.apache.coyote.Response.action(Response.java:174) > at org.apache.coyote.Response.finish(Response.java:274) > at > org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:322) > at > org.apache.catalina.connector.Response.finishResponse(Response.java:537) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:4 > 80) > at > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11P > rocessor.java:1083) > at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Ab > stractProtocol.java:640) > at > org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.ja > va:321) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.jav > a:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja > va:624) > at > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThr > ead.java:61) > at java.lang.Thread.run(Thread.java:748) > > Because this is an instance of an IOException, Tomcat assumes it has been > caused by the client dropping the connection and silently swallows it. I'll be > changing that later today so the exception is logged as DEBUG level for all > new Tomcat versions. > > Possible causes of "java.net.SocketException: Bad file descriptor" I've been > able to find are: > > 1. OS running out of file descriptors. > > 2.Trying to use a closed socket. > > I want to review the source code to see if there are any others. > > I don't think we are seeing 2 as there is no indication of the Socket, > InputStream or OutputStream being closed in the logs. > > That leaves 1. Possible causes here are a file descriptor leak or normal > operations occasionally needing more than the current limit. I don't think it > is > a leak as I'd expect to see many more errors of this type after the first and > we aren't seeing that. That leaves the possibility of the current limit being > a > little too low. > > My recommendation at this point is to increase the limit for file descriptors. > Meanwhile, I'll look at the JRE source to see if there are any other possible > triggers for this exception. > > Mark > >
On the tomcat server, max open file descriptors is currently 2853957 [root@001app01a ~]# sysctl fs.file-max fs.file-max = 2853957 Most of the time, the number of open files appears to run about 600,000. What do you think of watching the open file count and seeing if the number gets up around the ceiling when the socket write failure occurs? Something like... [root@001app01a ~]# while [ TRUE ];do FILES=$(lsof|wc -l);echo "$(date +%H:%M:%S) $FILES";done 09:11:15 591671 09:11:35 627347 09:11:54 626851 09:12:11 626429 09:12:26 545748 09:12:42 548578 09:12:58 551487 09:13:14 516700 09:13:30 513312 09:13:45 512830 09:14:02 533338 09:14:18 568233 09:14:35 570158 09:14:51 566269 09:15:07 547389 09:15:23 544203 09:15:38 546395 It's not ideal; as it seems to take 15-20 seconds to count them using lsof. Disclaimer : This email and any files transmitted with it are confidential and intended solely for intended recipients. If you are not the named addressee you should not disseminate, distribute, copy or alter this email. Any views or opinions presented in this email are solely those of the author and might not represent those of Physician Select Management. Warning: Although Physician Select Management has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments.