One of our hosting customers is a medical practice using a commercial EMR running on tomcat+mysql. It has operated well for over a year, but users have suddenly begun experiencing slowness for about an hour at the same time every day. During the slow times, we've done all the usual troubleshooting to catch the problem in the act. The servers have plenty of power and are not overworked. There are no slow database queries. Network connectivity is solid. Tomcat has plenty of memory. The numbers of database connections, threads, questions, queries, etc., remain steady, without spikes. There is no unusual disk latency. We have not found any maintenance tasks running during that timeframe.
The customer has another load-balanced tomcat instance on a different physical server, and the problem happens on that one, too. The servers were upgraded with a new kernel and packages on 4/5/24, but the issue did not appear until 5/6/24. The vendor enabled a new feature in the customer's software, and the problem appeared the next day, but they subsequently disabled the feature, and (reportedly) the problem did not go away. It is worth mentioning that the servers are multi-tenanted, with other customers running the same medical application, but the others do not experience the slowdowns, even though they are on the same servers.

There are no unusual errors in the tomcat or database server logs, EXCEPT this one:

java.sql.DriverManager.getConnection

During the periods of slowness, we see lots of those errors along with a large spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). It seems obvious that the threads are stuck because tomcat is waiting on a connection to the database. However, tcpdump shows that connectivity to the database is perfect at the network and application layers. There are no unanswered SYNs, no retransmissions, no half-open connections, no failures to allocate TCP ports, no conntrack messages, and no other indications of system resource exhaustion. Every time tomcat requests a connection to the DB, it completes in less than 1 ms. Ten thousand connection attempts completed successfully in about 15 seconds, with zero failures.

We are forced to conclude that some database connection requests are being initiated but are not being sent on the wire. The problem seems to be in the interaction between tomcat and the database driver, or in the driver itself.

Unfortunately, the application vendor is taking the "it's your infrastructure" position without providing any evidence or offering suggestions for configuration changes, other than to deploy more tomcat instances, which is just shooting in the dark.
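
One way to see where the stuck threads are actually waiting is to take a thread dump during the slow period. The following is only a minimal sketch under stated assumptions, not anything taken from the vendor's code: the class name StuckConnectionThreads is hypothetical, and the code would have to run inside the tomcat JVM (for example from a small diagnostic JSP or servlet) to see tomcat's threads. It lists each thread that has java.sql.DriverManager.getConnection on its stack, with its state and innermost frame. A jstack dump of the tomcat process should show the same information without deploying any code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper (name and structure are assumptions).
class StuckConnectionThreads {

    /** Returns true if any frame of the stack is java.sql.DriverManager.getConnection. */
    public static boolean isInGetConnection(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if ("java.sql.DriverManager".equals(frame.getClassName())
                    && "getConnection".equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }

    /**
     * One line per live thread in this JVM that is currently inside
     * getConnection: thread name, thread state, and the innermost frame
     * (the place where the thread is actually waiting).
     */
    public static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (stack.length > 0 && isInGetConnection(stack)) {
                Thread t = e.getKey();
                lines.add(t.getName() + " [" + t.getState() + "] at " + stack[0]);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Run standalone this prints nothing; it only sees threads of its own JVM.
        report().forEach(System.out::println);
    }
}
```

If most of the reported threads turn out to be BLOCKED on a monitor rather than sitting in socket code, that would be consistent with connection requests being initiated but never reaching the wire. If they are in socket or name-resolution frames instead, the network-side picture would deserve a second look.
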
The vendor doesn't know why the software is throwing java.sql.DriverManager.getConnection errors (even though it's their code), and they've relegated the investigation to us.

Any advice from the community would be greatly appreciated.

RHEL 8.9, kernel 4.18.0-513.18.1.el8_9.x86_64
Apache Tomcat/9.0.80, JVM 1.8.0_372-b07
(The tomcat and JVM versions are the ones recommended by the vendor.)

We're standing by to provide whatever other information the community may need.

Thanks tons!

-Eric