On 11/11/2020 22:48, Eric Robinson wrote:
>> -----Original Message-----
>> From: Mark Thomas <ma...@apache.org>
>> Sent: Monday, November 9, 2020 5:59 AM
>> To: users@tomcat.apache.org
>> Subject: Re: Weirdest Tomcat Behavior Ever?
>>
>> Eric,
>>
>> Time to prune the history and provide another summary I think. This
>> summary isn't complete. There is more information in the history of the
>> thread. I'm trying to focus on what seems to be the key information.
>>
> 
> Hi Mark -- So sorry for going silent for a couple of days. Our organization 
> is neck-deep in a huge compliance project. Combine that with this issue we're 
> working on together, and it's Perfect Storm time around here. We have a big 
> meeting with the client and vendor tomorrow about all this and I'm working 
> like heck to prevent this important customer from jumping ship.

Understood. Let me know if there is anything I can do to help.

> Now back to it!
> 
>>
>> Overview:
>> A small number of requests are receiving a completely empty (no headers,
>> no body) response.
>>
> 
> Just a FIN packet and that's all.

Agreed.

>> Environment
>> Tomcat 7.0.72
>>  - BIO HTTP (issue also observed with NIO)
>>  - Source unknown (probably ASF)
>> Java 1.8.0_221, Oracle
>> CentOS 7.5, Azure
>> Nginx reverse proxy
>>  - Using HTTP/1.0
>>  - No keep-alive
>>  - No compression
>> No (known) environment changes in the time period when this issue started
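
For anyone following along, an nginx proxy block consistent with that
summary would look something like the sketch below. This is my
illustration of the described setup, not Eric's actual configuration;
the backend address is an assumption.

    location / {
        proxy_pass http://127.0.0.1:8080;      # assumed Tomcat address
        proxy_http_version 1.0;                # HTTP/1.0 to the backend
        proxy_set_header Connection close;     # no keep-alive to Tomcat
        proxy_set_header Accept-Encoding "";   # no compressed responses
    }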

I keep coming back to the lack of environment changes. Something
triggered this problem (note that the trigger is not necessarily the
same as the root cause). Given that the app, Tomcat and JVM versions
didn't change, that again points to some other component.

To pick just one of the wild ideas I've had: is there some sort of
firewall, IDS, IPS, etc. that might be doing connection tracking and,
for some reason, getting it wrong and closing the connection in error?

As an aside, I mentioned earlier in this thread a similar issue we have
been observing in the CI system. I tracked that down yesterday and I am
certain the issues are unrelated. The CI issue was NIO-specific (we see
this issue with both BIO and NIO) and was introduced by refactoring in
8.5.x (we see this issue in 7.0.x). Sorry this doesn't help.

>> Results from debug logging
>> - The request is read without error
>> - The connection close is initiated from the Tomcat/Java side
>> - The socket is closed before Tomcat tries to write the response
>> - The application is not triggering the close of the socket
>> - Tomcat is not triggering the close of the socket
>> - When Tomcat does try to write, we see the following exception:
>>     java.net.SocketException: Bad file descriptor (Write failed)
>>
>> We have confirmed that the Java process is not hitting the limit for file
>> descriptors.
>>
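
As a side note, if you want to double-check the file descriptor numbers
from inside the JVM rather than with ulimit/lsof, a minimal sketch for
an Oracle/OpenJDK 8 JRE on Unix follows. The cast to the com.sun class
is non-portable, and this is only a convenience check, not part of the
patch.

    import java.lang.management.ManagementFactory;
    import com.sun.management.UnixOperatingSystemMXBean;

    public class FdCheck {
        public static void main(String[] args) {
            // On Oracle/OpenJDK on Unix the OS MXBean exposes fd counts.
            UnixOperatingSystemMXBean os = (UnixOperatingSystemMXBean)
                    ManagementFactory.getOperatingSystemMXBean();
            System.out.println("open fds: "
                    + os.getOpenFileDescriptorCount()
                    + " / max: " + os.getMaxFileDescriptorCount());
        }
    }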
>> The file descriptor must have been valid when the request was read from
>> the socket.
>>
>> Other active connections from Nginx to Tomcat at the point the
>> connection is closed unexpectedly:
>>  - first debug log:  2
>>  - second debug log: 1
>>  - third debug log:  1
>>  - fourth debug log: 0
>>
>>
>> Analysis
>>
>> We know the connection close isn't coming from Tomcat or the application.
>> That leaves:
>> - the JVM
>> - the OS
>> - the virtualisation layer (since this is Azure I am assuming there is
>>   one)
>>
>> We are approaching the limit of what we can debug via Tomcat (and of
>> my area of expertise). The evidence so far is pointing to an issue
>> lower down the network stack (JVM, OS or virtualisation layer).
>>
> 
> Can't disagree with you there.
> 
>> I think the next, and possibly last, thing we can do from Tomcat is
>> log some information on the file descriptor associated with the
>> socket. That is going to require some reflection to read JVM internals.
>>
>> Patch files here:
>> http://home.apache.org/~markt/dev/v7.0.72-custom-patch-v4/
>>
>> Source code here:
>> https://github.com/markt-asf/tomcat/tree/debug-7.0.72
>>
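
For the curious, the kind of reflective read involved is sketched below.
This is my illustration only, not the actual patch code; the private
field names (impl, fd) are Java 8 implementation internals and will
break on other JVM versions.

    import java.io.FileDescriptor;
    import java.lang.reflect.Field;
    import java.net.Socket;
    import java.net.SocketImpl;

    public class FdPeek {
        // Pull the OS-level fd int out of a Socket via reflection.
        // Java 8 internals only; illustrative, not the real patch.
        static int fdOf(Socket socket) throws Exception {
            Field implField = Socket.class.getDeclaredField("impl");
            implField.setAccessible(true);
            SocketImpl impl = (SocketImpl) implField.get(socket);

            Field fdField = SocketImpl.class.getDeclaredField("fd");
            fdField.setAccessible(true);
            FileDescriptor fd = (FileDescriptor) fdField.get(impl);

            Field intField = FileDescriptor.class.getDeclaredField("fd");
            intField.setAccessible(true);
            return intField.getInt(fd);
        }
    }

Logging the fd value just before a write would then show whether the
descriptor has already been closed or reused underneath the socket.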
> 
> I will apply these tonight.
>
>> The file descriptor usage count is guarded by a lock object, so this
>> patch adds quite a few syncs. For the load you are seeing that
>> shouldn't be an issue, but there is a chance it will impact
>> performance.
>>
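
To make the performance note concrete: the pattern is essentially a
shared counter accessed under a lock, along the lines of the sketch
below (my illustration, not the patch code). Every logged request takes
the lock, which is where the theoretical cost under load comes from.

    public class FdUseCount {
        private final Object lock = new Object();
        private int useCount;

        // Each read or update synchronizes on the lock, so heavy
        // request traffic serializes briefly on this object.
        int current() {
            synchronized (lock) {
                return useCount;
            }
        }

        void increment() {
            synchronized (lock) {
                useCount++;
            }
        }
    }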
> 
> Based on observation of load, I'm not too concerned about that. Maybe a 
> little. I'll keep an eye on it.
> 
>> The aim with this logging is to provide evidence of whether or not
>> there is a file descriptor handling problem in the JRE. My expectation
>> is that with these logs we will have reached the limit of what we can
>> do with Tomcat but will be able to point you in the right direction
>> for further investigation.
>>
> 
> I'll get this done right away.

Thanks.

Mark
