> > -----Original Message-----
> > From: Stefan Mayr <ste...@mayr-stefan.de>
> > Sent: Thursday, November 5, 2020 4:24 PM
> > To: users@tomcat.apache.org
> > Subject: Re: Weirdest Tomcat Behavior Ever?
> >
> > Am 03.11.2020 um 16:05 schrieb Eric Robinson:
> > >> -----Original Message-----
> > >> From: Eric Robinson <eric.robin...@psmnv.com>
> > >> Sent: Tuesday, November 3, 2020 8:21 AM
> > >> To: Tomcat Users List <users@tomcat.apache.org>
> > >> Subject: RE: Weirdest Tomcat Behavior Ever?
> > >>
> > >>> From: Mark Thomas <ma...@apache.org>
> > >>> Sent: Tuesday, November 3, 2020 2:06 AM
> > >>> To: Tomcat Users List <users@tomcat.apache.org>
> > >>> Subject: Re: Weirdest Tomcat Behavior Ever?
> > >>>
> > >>> On 02/11/2020 12:16, Eric Robinson wrote:
> > >>>
> > >>> <snip/>
> > >>>
> > >>>> Gotcha, thanks for the clarification. Let's see what happens when
> > >>>> the users start hitting it at 8:00 am Eastern.
> > >>>
> > >>> Progress. The first attempt to write to the socket triggers the
> > >>> following exception:
> > >>>
> > >>> 02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13]
> > >>> org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write [301361476]
> > >>>  java.net.SocketException: Bad file descriptor (Write failed)
> > >>>     at java.net.SocketOutputStream.socketWrite0(Native Method)
> > >>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
> > >>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> > >>>     at ...
> >
> > >>> Because this is an instance of an IOException, Tomcat assumes it
> > >>> has been caused by the client dropping the connection and silently
> > >>> swallows it. I'll be changing that later today so the exception is
> > >>> logged at DEBUG level for all new Tomcat versions.
> > >>>
> > >>> Possible causes of "java.net.SocketException: Bad file descriptor"
> > >>> I've been able to find are:
> > >>>
> > >>> 1. OS running out of file descriptors.
> > >>>
> > >>> 2. Trying to use a closed socket.
> > >>>
> > >>> I want to review the source code to see if there are any others.
> > >>>
> > >>> I don't think we are seeing 2 as there is no indication of the
> > >>> Socket, InputStream or OutputStream being closed in the logs.
> > >>>
> > >>> That leaves 1. Possible causes here are a file descriptor leak or
> > >>> normal operations occasionally needing more than the current limit.
> > >>> I don't think it is a leak as I'd expect to see many more errors
> > >>> of this type after the first and we aren't seeing that. That
> > >>> leaves the possibility of the current limit being a little too low.
> > >>>
> > >>> My recommendation at this point is to increase the limit for file
> > >>> descriptors.
> > >>> Meanwhile, I'll look at the JRE source to see if there are any
> > >>> other possible triggers for this exception.
> > >>>
> > >>> Mark
> > >>>
> > >>>
> > >>
> > >> On the tomcat server, max open file descriptors is currently 2853957:
> > >>
> > >> [root@001app01a ~]# sysctl fs.file-max
> > >> fs.file-max = 2853957
> > >>
> > >> Most of the time, the number of open files appears to run about 600,000.
> > >>
> > >>  What do you think of watching the open file count and seeing if
> > >> the number gets up around the ceiling when the socket write failure
> > >> occurs? Something like...
> > >>
> > >> [root@001app01a ~]# while [ TRUE ]; do FILES=$(lsof | wc -l); echo "$(date +%H:%M:%S) $FILES"; done
> > >> 09:11:15 591671
> > >> 09:11:35 627347
> > >> 09:11:54 626851
> > >> 09:12:11 626429
> > >> 09:12:26 545748
> > >> 09:12:42 548578
> > >> 09:12:58 551487
> > >> 09:13:14 516700
> > >> 09:13:30 513312
> > >> 09:13:45 512830
> > >> 09:14:02 533338
> > >> 09:14:18 568233
> > >> 09:14:35 570158
> > >> 09:14:51 566269
> > >> 09:15:07 547389
> > >> 09:15:23 544203
> > >> 09:15:38 546395
> > >>
> > >> It's not ideal, as it seems to take 15-20 seconds to count them using lsof.
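> > >>
> > >> A quicker alternative might be to poll /proc/sys/fs/file-nr, which the
> > >> kernel maintains itself; the first field is the number of allocated file
> > >> handles and the third is fs.file-max, so there is no need to walk every
> > >> process the way lsof does. A rough sketch (the 5-second interval is
> > >> arbitrary):
> > >>
> > >> # Sketch: poll the kernel's own handle counter instead of lsof.
> > >> # Fields in /proc/sys/fs/file-nr: allocated, free, maximum (fs.file-max).
> > >> while true; do
> > >>     echo "$(date +%H:%M:%S) $(cat /proc/sys/fs/file-nr)"
> > >>     sleep 5
> > >> done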
> > >>
> > >>
> > >>
> > >
> > > Wait, never mind. I realized the per-process limits are what matters. I
> > > checked, and nofile was set to 4096 for the relevant java process.
> > >
> > > I did...
> > >
> > > # prlimit --pid 8730 --nofile=16384:16384
> > >
> > > That should give java some extra breathing room if the issue is max open
> > > files, right?
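> > >
> > > One caveat, with a hedged sketch below: prlimit only changes the
> > > already-running process, so it is worth confirming the JVM actually picked
> > > up the new value and watching how many descriptors it really uses (8730 is
> > > the same PID as in the prlimit command above):
> > >
> > > # Sketch: verify the new limit on the running JVM and count its open fds.
> > > grep 'Max open files' /proc/8730/limits
> > > ls /proc/8730/fd | wc -l
> > >
> > > If raising the limit helps, the change would still need to be made
> > > persistent across restarts, e.g. via a nofile entry in
> > > /etc/security/limits.conf or LimitNOFILE= in the systemd unit, depending on
> > > how Tomcat is launched.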
> >
> > In my experience, you should see a different exception if you hit the
> > NOFILE limit: java.net.SocketException: Too many open files
> >
> > But I've only seen that when opening or accepting a new connection,
> > never later when something is written to an already open socket.
> >
> > To me, a bad file descriptor sounds more like a closed socket. It
> > reminds me of a database or http-client connection pool that handed
> > out connections whose sockets were already closed. That seems
> > suspicious here, because Mark wrote that this happens on the first
> > write to the socket.
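> >
> > A hedged way to test that theory would be to trace descriptor activity in
> > the JVM and look for a close() on the same fd shortly before the write()
> > that fails with EBADF. Rough sketch, assuming the Tomcat PID is 8730 (as in
> > the prlimit command earlier) and that the strace overhead is acceptable on
> > this box:
> >
> > # Sketch: log close/shutdown/write syscalls from every JVM thread with
> > # wall-clock timestamps.
> > strace -f -ttt -e trace=close,shutdown,write,writev \
> >        -p 8730 -o /tmp/tomcat-fd-trace.log
> > # After stopping strace, look for writes that failed with EBADF:
> > grep EBADF /tmp/tomcat-fd-trace.log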
> >
> > - Stefan
> >
>
> It's unclear whether increasing the max file descriptors had a positive
> effect. The error occurred only once yesterday, but that may have been
> because of low user activity.
>
> We may have a new clue, though. Until yesterday, every instance of the
> error was accompanied by a premature disconnection. In other words, the
> proxy would send a GET request, but the tomcat would FIN the connection
> without sending a response (even though tomcat recorded an HTTP 200 in
> the localhost_access log).
>
> Yesterday's error was different. In that case, tomcat apparently RESET the
> connection instead of FINing it. The nginx proxy recorded the following in its
> logs...
>
> 2020/11/05 08:35:59 [error] 951#951: *22627061 recv() failed (104:
> Connection reset by peer) while reading response header from upstream,
>
> 2020/11/05 08:35:59 [warn] 951#951: *22627061 upstream server temporarily
> disabled while reading response header from upstream,
>
> Unfortunately, I did not capture the whole exchange with tcpdump, so I am
> hoping to catch it in the act again today.
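>
> In case it helps for today, here is a hedged sketch of a ring-buffer capture
> that could be left running on the tomcat host so the next failure is caught
> without filling the disk. It assumes the affected connector is the bio-3016
> one from the earlier log line, so adjust the port and interface as needed:
>
> # Sketch: rotate through 20 files of ~100 MB each, capturing full packets
> # on the Tomcat connector port.
> tcpdump -i any -nn -s 0 'tcp port 3016' \
>         -w /var/tmp/tomcat-3016.pcap -C 100 -W 20
>
> Once a failure lands in the capture, the nginx error timestamp can be matched
> against the FIN/RST from port 3016, e.g. by reading the rotated file with
> tshark -r and a display filter like
> 'tcp.port==3016 && (tcp.flags.reset==1 || tcp.flags.fin==1)'.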
>
> --Eric

Increasing the max open files definitely did not help. There have been 10 
disconnection events already today.

