-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Darren,

(Sorry... just had to remove that monstrous stack trace...)

On 1/13/15 5:04 PM, Darren Davis wrote:
> Hi Christopher. Yes, we've tried a show process list and can find
> no evidence of the validation query running on mysql.

Strange. Maybe you are waiting for the db server's buffer to flush or
something like that.

> We also just tried an experiment outside of Tomcat completely, by
> connecting to a downed web server host and manually opening up a
> mysql client connection to the data server and executing a single
> command.
>
> We left that client window idle for an hour and 5 minutes, and
> attempted to execute a simple select count(*) command against a
> tiny table. The client attempted to execute the query, and a
> NetStat on that box showed an open connection between the two
> servers using port 3306. We also checked the process list during
> this time and could not find any queries at all from the server in
> question.
>
> At about the 15 minute wait mark, the client finally came back with
> this message: "ERROR 2013 (HY000): Lost connection to MySQL server
> during query."

Was this with the MySQL command-line client? What query did you issue
("SELECT 1")?

> Attempting to execute the command a 2nd time (using the up
> arrow), re-established the connection and it ran perfectly in a few
> milliseconds.

That's interesting. I've never experienced anything like that with
MySQL, but we use a VLAN between our application and database servers
with no hardware firewall, so we don't have any connection timeout
problems. Also, when connections are dropped due to inactivity, they
re-connect without any problems.

> I checked the mysql configuration and it is set to the default
> values for keeping connections/interactive connections open (for 8
> hours), so it seems that maybe the Cisco firewall between the two
> servers is terminating connections out from under us, but in a way
> that the O/S cannot detect.

What if you set that idle connection timeout to something like 5
minutes? Can you reproduce this issue more quickly? Can you look at
the firewall configuration to see if you can change the idle timeout
/down/ to something more testable?

> I've also fired up the yourKit profiler on this box and am seeing
> other threads which have had to wait in the same
> SocketInputStream.read code, all three started a few seconds apart,
> it just wasn't detected as a deadlock, because it took place
> outside of any synchronized methods.

What makes you think it's a deadlock? Deadlock is a very specific
thing. Just because many threads are waiting in SocketInputStream.read
doesn't mean there are any threading issues at all. I suspect that
each SocketInputStream is distinct and only in use by a single thread.
The threads are blocked on I/O, right? So they aren't waiting on a
monitor.

The best you could do would be to find the native file descriptor for
each socket and determine that they are different from each other. I
would be very surprised if they are the same, used across threads.

If you *are* using Connection objects across threads, you should be
very careful. Connection objects ought to be threadsafe (I think) but
use of Statement and ResultSet objects across threads is a terrible
idea.

> It seems that sometime around the hour mark, connections get
> dropped, so we're thinking that either adding idle checking or
> dropping old connections may help us avoid this. Although we are a
> little concerned by the various Connector/J alleged socket read
> issues which may be a possible problem.

I don't think you should blame Connector/J at this point. They may
have ClassLoader pinning issues (don't get me started), but the driver
is fairly robust and mature.

> We're running an older 5.1.18 version of the Connector/J driver,
> but aren't sure if moving to the latest .34 release would change
> anything.

We are also still using 5.1.18 and have never had any of these kinds
of issues.

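Threads parked in SocketInputStream.read are consistent with a peer
that has simply gone silent, rather than with a deadlock. Here is a
minimal sketch of that situation using plain java.net sockets (not
Connector/J): a loopback "server" accepts the connection and then never
sends a byte, and a read timeout is what lets the client give up. The
class name SilentPeerDemo, the method readTimesOut and the silent
loopback peer are all made up for illustration. I believe Connector/J
exposes the same idea through its socketTimeout connection property
(in milliseconds), but check the driver docs for your version.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: simulates a peer that stays connected but silent.
class SilentPeerDemo {

    /**
     * Connects to a local peer that accepts the connection but never writes,
     * then attempts a single read with the given SO_TIMEOUT.
     * Returns true if the read gave up with SocketTimeoutException.
     */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        try (ServerSocket silentPeer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                 new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort());
             Socket accepted = silentPeer.accept()) {

            // Without a read timeout, read() would block for as long as the
            // peer stays silent; with it, the read fails after timeoutMillis.
            client.setSoTimeout(timeoutMillis);

            InputStream in = client.getInputStream();
            try {
                in.read();      // the peer never sends anything
                return false;   // only reached if data or EOF arrived
            } catch (SocketTimeoutException expected) {
                return true;    // the read failed fast instead of hanging
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

A firewall that silently drops an idle flow looks much the same from
the client's side: the socket still appears open, and nothing ever
comes back until some timeout fires.
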
I would highly suspect the network environment. See what you can find
out by tinkering with the firewall and db idle policies. You may find
that the pipe across the network gets into a state where the client is
sure the connection is still valid, but it's simply never going to
return any data. In that case, you'll need to figure out how to have
that connection fail faster. Do you have a read-timeout set on your
driver?

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: GPGTools - http://gpgtools.org

iQIcBAEBCAAGBQJUteT3AAoJEBzwKT+lPKRY7BUP/3IV0Xsakr3rWRpqnro1IbUl
nbNHHIm9fqG+7mbvkeIQIE5XnZvA822HZvp9whC+4499kfQZNtrT0IIj1F20YH5r
SMMkalCbY6XIzj1ST4aPf7YE2MlBtBwFZUwIGG2aT2XUKwSwHVdTcQxI2H4sG5vf
iCkvS7YdJg5h6QSj5CQHg6dnsVR2hbF42tftti33hOsRPZu3cXOe0ajrXsoimMuk
WWt+hpk8rjWtEnrMgaKlyntKGAI2tqXYVzPxraR3wwevm1tbHjHk2U3hFrq9teuV
FA57RhWTlba/OJ+ph+LEiT39IdEdzESspTI+JeQvN5LJEsaMpxmRpnmLnhD/3EXx
aNRze3eRw5M7qG0CcMduCMFe1j2i8TCwBLtHHJnplXWzve9PgqbJBtk7acJpn/Ls
54j23u5Z26TvAAJxiCa6/zxiJ6xRZDLfxfZsYVMImRHpC9s+GDPuAylUjaXVmoa+
HAIEQGUxTI16oQQZIG6mevehNtT8ik+zwVLMSk+QonvDRnRxsyPr8jItSanG3YXb
th0kyE99y1rogJ+zeC9S+8NBiNkrU9EH7uUWZY7WyLuEHC+EyjCVaAV8SB+QBKKm
PLs/EFnEVLLQWpj7Gzl7/421Fy6ttemeDEfj+VO2kSo7Wsy4kW3hcH/5spFCqeFj
+WEiED5vvH5w9LrOkUcL
=hPlo
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org