-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Darren,

(Sorry... just had to remove that monstrous stack trace...)

On 1/13/15 5:04 PM, Darren Davis wrote:
> Hi Christopher.  Yes, we've tried a show process list and can find
> no evidence of the validation query running on mysql.

Strange. Maybe you are waiting for the db server's buffer to flush or
something like that.

> We also just tried an experiment outside of Tomcat completely, by
> connecting to a downed web server host and manually opening up a
> mysql client connection to the data server and executing a single
> command.
> 
> We left that client window idle for an hour and 5 minutes, and
> attempted to execute a simple select count(*) command against a
> tiny table.  The client attempted to execute the query, and a
> netstat on that box showed an open connection between the two
> servers using port 3306.  We also checked the process list during
> this time and could not find any queries at all from the server in
> question.
> 
> At about the 15 minute wait mark, the client finally came back with
> this message: "ERROR 2013 (HY000): Lost connection to MySQL server
> during query."

Was this with the MySQL command-line client? What query did you issue
("SELECT 1")?

> Attempting to execute the command a 2nd time (using the up
> arrow) re-established the connection and it ran perfectly in a few
> milliseconds.

That's interesting. I've never experienced anything like that with
MySQL, but we use a VLAN between our application and database servers
with no hardware firewall, so we don't have any connection timeout
problems. Also, when connections are dropped due to inactivity, they
re-connect without any problems.

> I checked the mysql configuration and it is set to the default
> values for keeping connections/interactive connections open (for 8
> hours), so it seems that maybe the Cisco firewall between the two
> servers is terminating connections out from under us, but in a way
> that the O/S cannot detect.

What if you set that idle connection timeout to something like 5
minutes? Can you reproduce this issue more quickly? Can you look at
the firewall configuration to see if you can change the idle timeout /down/
to something more testable?

> I've also fired up the YourKit profiler on this box and am seeing
> other threads which have had to wait in the same
> SocketInputStream.read code, all three started a few seconds apart,
> it just wasn't detected as a deadlock, because it took place
> outside of any synchronized methods.

What makes you think it's a deadlock? Deadlock is a very specific thing.
Just because many threads are waiting in SocketInputStream.read
doesn't mean there are any threading issues at all. I suspect that
each SocketInputStream is distinct and only in use by a single thread.
The threads are blocked on I/O, right? So they aren't waiting on a
monitor. The best you could do would be to find the native file
descriptor for each socket and determine that they are different from
each other. I would be very surprised if they are the same, used
across threads. If you *are* using Connection objects across threads,
you should be very careful. Connection objects ought to be threadsafe
(I think) but use of Statement and ResultSet objects across threads is
a terrible idea.
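
To illustrate the difference, here is a rough sketch (plain sockets on
loopback, nothing from your setup; the class and method names are made
up for the example). A thread stuck in a socket read is waiting on I/O,
not on a monitor, so the JVM's own deadlock detector should report
nothing for it:

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: shows that a thread blocked in a socket read
// is not reported as deadlocked and is not in the BLOCKED (monitor) state.
final class BlockedReadDemo {

    /** What we observed about the thread stuck in read(). */
    static final class Observation {
        final Thread.State readerState;
        final boolean deadlockReported;

        Observation(Thread.State readerState, boolean deadlockReported) {
            this.readerState = readerState;
            this.deadlockReported = deadlockReported;
        }
    }

    static Observation observe() throws Exception {
        try (ServerSocket server =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                     new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            Thread reader = new Thread(() -> {
                try {
                    // Blocks: the accepted side never writes anything.
                    client.getInputStream().read();
                } catch (IOException ignored) {
                    // Closing the socket when observe() returns ends up here.
                }
            }, "blocked-reader");
            reader.setDaemon(true);
            reader.start();

            Thread.sleep(500); // give the reader time to enter the read

            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            long[] deadlocked = mx.findDeadlockedThreads(); // null = none found
            return new Observation(reader.getState(), deadlocked != null);
        }
    }

    public static void main(String[] args) throws Exception {
        Observation o = observe();
        System.out.println("reader state: " + o.readerState);
        System.out.println("deadlock reported: " + o.deadlockReported);
    }
}
```

If your profiler shows several threads like this, each on its own
socket, that points at stalled I/O rather than a threading problem.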

> It seems that sometime around the hour mark, connections get
> dropped, so we're thinking that either adding idle checking or
> dropping old connections may help us avoid this.  Although we are a
> little concerned by the various alleged Connector/J socket read
> issues, which may also be a possible problem.

I don't think you should blame Connector/J at this point. They may
have ClassLoader pinning issues (don't get me started), but the driver
is fairly robust and mature.

> We're running an older 5.1.18 version of the Connector/J driver,
> but aren't sure if moving to the latest .34 release would change
> anything.

We are also still using 5.1.18 and have never had any of these kinds
of issues. I would highly suspect the network environment. See what
you can find out by tinkering with the firewall and db idle policies.
You may find that the pipe across the network gets into a state where
the client is sure the connection is still valid, but it's simply
never going to return any data. In that case, you'll need to figure
out how to have that connection fail faster.

Do you have a read-timeout set on your driver?
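
In case it helps, here is a minimal sketch of what a read timeout buys
you, using a plain socket against a peer that never answers (which is
roughly what a silently dropped connection looks like to the client).
The class name, host, schema and timeout values are placeholders I made
up. With Connector/J, the equivalent knobs are the connectTimeout and
socketTimeout URL properties, both in milliseconds:

```java
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: a read on a silent connection fails fast
// once SO_TIMEOUT is set, instead of hanging for many minutes.
final class ReadTimeoutDemo {

    // Placeholder JDBC URL; host, schema and values are assumptions.
    static final String EXAMPLE_URL =
            "jdbc:mysql://dbhost:3306/mydb"
            + "?connectTimeout=5000&socketTimeout=30000";

    /** Returns true if read() gives up with a timeout on a silent peer. */
    static boolean readFailsFast(int timeoutMillis) throws Exception {
        try (ServerSocket server =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                     new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            client.setSoTimeout(timeoutMillis); // read timeout on the client
            try {
                client.getInputStream().read(); // the peer never writes
                return false;                   // data or EOF arrived instead
            } catch (SocketTimeoutException expected) {
                return true;                    // the read gave up in time
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("example URL: " + EXAMPLE_URL);
        System.out.println("read timed out: " + readFailsFast(200));
    }
}
```

Pick a socketTimeout comfortably longer than your slowest legitimate
query, since it applies to every read on the connection.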

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: GPGTools - http://gpgtools.org

iQIcBAEBCAAGBQJUteT3AAoJEBzwKT+lPKRY7BUP/3IV0Xsakr3rWRpqnro1IbUl
nbNHHIm9fqG+7mbvkeIQIE5XnZvA822HZvp9whC+4499kfQZNtrT0IIj1F20YH5r
SMMkalCbY6XIzj1ST4aPf7YE2MlBtBwFZUwIGG2aT2XUKwSwHVdTcQxI2H4sG5vf
iCkvS7YdJg5h6QSj5CQHg6dnsVR2hbF42tftti33hOsRPZu3cXOe0ajrXsoimMuk
WWt+hpk8rjWtEnrMgaKlyntKGAI2tqXYVzPxraR3wwevm1tbHjHk2U3hFrq9teuV
FA57RhWTlba/OJ+ph+LEiT39IdEdzESspTI+JeQvN5LJEsaMpxmRpnmLnhD/3EXx
aNRze3eRw5M7qG0CcMduCMFe1j2i8TCwBLtHHJnplXWzve9PgqbJBtk7acJpn/Ls
54j23u5Z26TvAAJxiCa6/zxiJ6xRZDLfxfZsYVMImRHpC9s+GDPuAylUjaXVmoa+
HAIEQGUxTI16oQQZIG6mevehNtT8ik+zwVLMSk+QonvDRnRxsyPr8jItSanG3YXb
th0kyE99y1rogJ+zeC9S+8NBiNkrU9EH7uUWZY7WyLuEHC+EyjCVaAV8SB+QBKKm
PLs/EFnEVLLQWpj7Gzl7/421Fy6ttemeDEfj+VO2kSo7Wsy4kW3hcH/5spFCqeFj
+WEiED5vvH5w9LrOkUcL
=hPlo
-----END PGP SIGNATURE-----
