> From: Mark Thomas <ma...@apache.org>
> Sent: Tuesday, November 3, 2020 2:06 AM
> To: Tomcat Users List <users@tomcat.apache.org>
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 02/11/2020 12:16, Eric Robinson wrote:
>
> <snip/>
>
> > Gotcha, thanks for the clarification. Let's see what happens when the users
> start hitting it at 8:00 am Eastern.
>
> Progress. The first attempt to write to the socket triggers the following
> exception:
>
> 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13]
> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write
> [301361476]
>  java.net.SocketException: Bad file descriptor (Write failed)
>     at java.net.SocketOutputStream.socketWrite0(Native Method)
>     at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>     at
> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndp
> oint.java:1409)
>     at
> org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOut
> putBuffer.java:247)
>     at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
>     at
> org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutput
> Buffer.java:183)
>     at
> org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Pr
> ocessor.java:761)
>     at org.apache.coyote.Response.action(Response.java:174)
>     at org.apache.coyote.Response.finish(Response.java:274)
>     at
> org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:322)
>     at
> org.apache.catalina.connector.Response.finishResponse(Response.java:537)
>     at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:4
> 80)
>     at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11P
> rocessor.java:1083)
>     at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(Ab
> stractProtocol.java:640)
>     at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.ja
> va:321)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.jav
> a:1149)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:624)
>     at
> org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThr
> ead.java:61)
>     at java.lang.Thread.run(Thread.java:748)
>
> Because this is an instance of an IOException, Tomcat assumes it has been
> caused by the client dropping the connection and silently swallows it. I'll be
> changing that later today so the exception is logged as DEBUG level for all
> new Tomcat versions.
>
> Possible causes of "java.net.SocketException: Bad file descriptor" I've been
> able to find are:
>
> 1. OS running out of file descriptors.
>
> 2.Trying to use a closed socket.
>
> I want to review the source code to see if there are any others.
>
> I don't think we are seeing 2 as there is no indication of the Socket,
> InputStream or OutputStream being closed in the logs.
>
> That leaves 1. Possible causes here are a file descriptor leak or normal
> operations occasionally needing more than the current limit. I don't think it 
> is
> a leak as I'd expect to see many more errors of this type after the first and
> we aren't seeing that. That leaves the possibility of the current limit being 
> a
> little too low.
>
> My recommendation at this point is to increase the limit for file descriptors.
> Meanwhile, I'll look at the JRE source to see if there are any other possible
> triggers for this exception.
>
> Mark
>
>

On the tomcat server, max open file descriptors is currently 2853957

[root@001app01a ~]# sysctl fs.file-max
fs.file-max = 2853957

Most of the time, the number of open files appears to run about 600,000.

 What do you think of watching the open file count and seeing if the number 
gets up around the ceiling when the socket write failure occurs? Something 
like...

[root@001app01a ~]#  while [ TRUE ];do FILES=$(lsof|wc -l);echo "$(date 
+%H:%M:%S) $FILES";done
09:11:15 591671
09:11:35 627347
09:11:54 626851
09:12:11 626429
09:12:26 545748
09:12:42 548578
09:12:58 551487
09:13:14 516700
09:13:30 513312
09:13:45 512830
09:14:02 533338
09:14:18 568233
09:14:35 570158
09:14:51 566269
09:15:07 547389
09:15:23 544203
09:15:38 546395

It's not ideal; as it seems to take 15-20 seconds to count them using lsof.



Disclaimer : This email and any files transmitted with it are confidential and 
intended solely for intended recipients. If you are not the named addressee you 
should not disseminate, distribute, copy or alter this email. Any views or 
opinions presented in this email are solely those of the author and might not 
represent those of Physician Select Management. Warning: Although Physician 
Select Management has taken reasonable precautions to ensure no viruses are 
present in this email, the company cannot accept responsibility for any loss or 
damage arising from the use of this email or attachments.

Reply via email to