On 12/11/2020 14:19, Eric Robinson wrote:
>> From: Mark Thomas <ma...@apache.org>
<snip/>

>> I keep coming back to this. Something triggered this problem (note that
>> trigger is not necessarily the same as root cause). Given that the app,
>> Tomcat and JVM versions didn't change, that again points to some other
>> component.
>>
>
> Perfectly understandable. It's the oldest question in the diagnostic
> playbook. What changed? I wish I had an answer. Whatever it was, it
> impacted both upstream servers.
>
>> Picking just one of the wild ideas I've had: is there some sort of
>> firewall, IDS, IPS etc. that might be doing connection tracking and is,
>> for some reason, getting it wrong and closing the connection in error?
>>
>
> There is no firewall or IDS software running on the upstreams. The only
> thing that comes to mind that may have been installed during that
> timeframe is Sophos antivirus and Solar Winds RMM. Sophos was the first
> thing I disabled when I saw the packet issues.

ACK.

>>>> The aim with this logging is to provide evidence of whether or not
>>>> there is a file descriptor handling problem in the JRE. My
>>>> expectation is that with these logs we will have reached the limit of
>>>> what we can do with Tomcat but will be able to point you in the right
>>>> direction for further investigation.

I've had a chance to review these logs.

To answer your (off-list) question about HTTP/1.1 vs. HTTP/1.0 in the
Nginx logs: I *think* the Nginx logs are showing that the request received
by Nginx is using HTTP/1.1.

The logging does not indicate any issue with Java's handling of file
descriptors. The file descriptor associated with the socket where the
request fails is only ever observed to be associated with that socket.
There is no indication that the file descriptor is corrupted, nor is there
any indication that another thread tries to use the same file descriptor.

I dug a little into the exception where the write fails:

java.net.SocketException: Bad file descriptor (Write failed)
  at java.net.SocketOutputStream.socketWrite0(Native Method)
  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
  at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
  at org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndpoint.java:1491)
  at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:247)
  at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
  at org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutputBuffer.java:183)
  ...

I took a look at the JRE source code. That exception is triggered by an OS
level error (9, EBADF, "Bad file descriptor") when the JRE makes the OS
call to write to the socket.

Everything I have found online points to one of two causes for such an
error:

a) the socket has already been closed
b) the OS has run out of file descriptors

There is no indication that the JRE, Tomcat or the application is doing
a). Previous investigations have ruled out b).

The Wireshark trace indicates that the socket is closed before the write
takes place, which suggests a) rather more than b). Even so, I'd be
tempted to double-check b) and maybe try running Tomcat with
-XX:+MaxFDLimit just to be sure. (One way of checking the process's file
descriptor usage is sketched at the end of this mail.)

If you haven't already, I think now is the time to follow Paul
Carter-Brown's advice from earlier in this thread and use strace to see
what is going on between the JRE and the OS, the aim being to answer the
question "what is triggering the socket close?". (An example invocation is
also sketched at the end of this mail.)

I can try and help interpret that log but I am far from an expert. You may
want to seek help elsewhere.
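
For the double-check of b), one quick way to see the JVM's own view of its
file descriptor usage is the com.sun.management extension of the
OperatingSystemMXBean. The sketch below is only illustrative and assumes a
HotSpot/OpenJDK JVM on a Unix-like platform (the class name FdCheck is
just a placeholder); in practice you'd query the same attributes over JMX
(e.g. with jconsole) against the running Tomcat process rather than run a
separate program.

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            // File descriptors currently open in this JVM process
            System.out.println("Open FDs: " + unix.getOpenFileDescriptorCount());
            // Maximum the OS will let this process open
            System.out.println("Max FDs:  " + unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("Not a Unix-like JVM; use OS tools instead.");
        }
    }
}

If the open count gets anywhere near the maximum while the errors are
occurring, b) becomes a lot more plausible; if it stays well below the
maximum, that points back to a).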
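
As for strace, something along these lines should capture the socket and
file descriptor related system calls for the Tomcat process and all of its
threads. This is only a sketch: <tomcat-pid> is a placeholder for the real
process id, the exact options vary a little between strace versions, and
the output file can grow quickly on a busy server.

strace -f -tt -e trace=network,desc -p <tomcat-pid> -o /tmp/tomcat-strace.out

(-f follows the JVM's threads, -tt adds microsecond timestamps, and
-e trace=network,desc limits the output to network and file descriptor
system calls.) The interesting part of the resulting log is which thread,
if any, issues the close() on the file descriptor that later fails with
EBADF on the write().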
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org