> -----Original Message-----
> From: Mark Thomas <ma...@apache.org>
> Sent: Friday, October 16, 2020 8:02 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> On 16/10/2020 12:37, Eric Robinson wrote:
> >> From: Mark Thomas <ma...@apache.org>
>
> <snip/>
>
> >> I'd like to see those screen shots please. Better still would be
> >> access to the captures themselves (just the relevant connections not
> >> the whole thing). I believe what you are telling us but long
> >> experience tells me it is best to double check the original data as well.
> >>
> >
> > I'll send you a link to the screen shot first, then I'll package up the
> > captures and send a link to that in a bit. As the files may contain
> > somewhat sensitive information, I'll send a secure mail direct to your
> > inbox.
>
> Thanks. The screenshots didn't shed any light on this so far.
>
> >> I have observed something similar ish in the CI systems. In that case
> >> it is the requests that disappear. Client side logging shows the
> >> request was made but there is no sign of it ever being received by
> >> Tomcat. I don't have network traces for that (yet) so I'm not sure where
> >> the data is going missing.
> >>
> >> I am beginning to suspect there is a hard to trigger Tomcat or JVM
> >> bug here. I think a Tomcat bug is more likely although I have been
> >> over the code several times and I don't see anything.
> >>
> >
> > I'm thinking a bug of some kind, too, but I've been hosting about 1800
> > instances of tomcat for 15 years and I have never seen this behavior before.
> >
> >> A few more questions:
> >>
> >
> > This is where I will begin to struggle a bit.
> >
> >> Which HTTP connector are you using? BIO, NIO or APR/Native?
> >>
> >
> > I believe BIO is the default? server.xml just says...
> >
> >     <Connector port="3016" protocol="HTTP/1.1"
> >                connectionTimeout="20000"
> >                redirectPort="8443" />
>
> That will be BIO or APR/Native depending on whether you have Tomcat
> Native installed. If you look at the logs for when Tomcat starts you should
> see something like:
>
> INFO: Initializing ProtocolHandler ["http-bio-3016"] or
> INFO: Initializing ProtocolHandler ["http-apr-3016"]
>
> What do you see between the square brackets?

["http-bio-3016"]

>
> >> Is the issue reproducible if you switch to a different connector?
> >>
> >
> > In 15 years of using tomcat in production, we've never tried switching the
> > connector type. (Probably because the app vendor never suggested it.) I did
> > a little research and I'm beginning to think about the pros/cons.
>
> If you wanted to try this, I'd recommend:
>
> protocol="org.apache.coyote.http11.Http11NioProtocol"
>

We're in the middle of a production day so I want to avoid restarting tomcat if 
I can, but I'll plan to change that tonight.
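
For reference, the change would presumably look something like the sketch 
below: the existing connector from our server.xml with only the protocol 
attribute swapped for the class suggested above. I'm assuming the other 
attributes can stay exactly as they are.

```xml
<!-- Sketch only: same connector as before, with protocol switched from
     "HTTP/1.1" (BIO here) to the explicit NIO implementation class.
     port, connectionTimeout and redirectPort are unchanged. -->
<Connector port="3016" protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           redirectPort="8443" />
```

If that takes effect, I'd expect the startup log line to show something other 
than "http-bio-3016" between the square brackets.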

> >> How easy is it for you to reproduce this issue?
> >>
> >
> > It's not reproducible at will but it happens frequently enough that we don't
> > have to wait long for it to happen. I have wireshark capturing to disk
> > continuously and rotating the files at 10 minute intervals to keep them
> > smallish. Then I just tail the logs and wait.
>
> Ack.
>
> >> How are you linking the request you see in the access log with the
> >> request you see in Wireshark?
> >
> > Aside from the timestamp of the packets and the timestamp of the tomcat
> > log messages, each HTTP request also contains a high-resolution timestamp
> > and a unique random number. That way, even if the same request occurs
> > multiple times in rapid succession, we can still isolate the exact one that
> > failed.
>
> Excellent.
>
> >> How comfortable are you running a patched version of Tomcat (drop
> >> class files provided by me into $CATALINA_BASE/lib in the right
> >> directory structure and restart Tomcat)? Just thinking ahead about
> >> collecting additional debug information.
> >
> > That would be tricky in our production environment, but the users are
> > getting desperate enough that we'd be willing to explore that approach.
>
> Understood.
>
> Some other questions that have come to mind:
>
> - Has this app always had this problem?
>

No, it's been running fine in this environment since October 2018.

> - If not, when did it start and what changed at that point (JVM version,
> Tomcat version etc)
>

This is a new thing in the past month or so, but we can't think of what might 
have changed. There are 2 Linux tomcat servers, each running 17 instances of 
tomcat on 
different ports. The symptom seems to appear in many of the instances 
intermittently, but for 2 of them it is painfully frequent. The users are 
getting kicked out multiple times per day. The tomcat and java versions have 
not changed since at least July 2019.

> - I notice the requests in the screenshots aren't using HTTP keep-alive.
> Do you know why?
>

That's a good question. I checked the captures from the client side and they do 
have the HTTP keep-alive header. It must be something related to the proxy.

> - How heavily loaded is the system (typical requests per second etc)?
>

There are 17 instances of tomcat. Most of them get less than 10 
requests/second. The instance I'm focusing on is the busiest, with 20-60 
requests/second. From a workload standpoint, the servers are pretty bored.

> - I'm assuming there is nothing in the log files when the failed request
> happens. Is this correct?
>

If you mean the Linux system logs, I don't see anything remotely suspicious 
there. The localhost_access logs do show the failed requests with an HTTP 200 
response. However, according to Wireshark the response does not go out over the 
wire.

> Mark
>

