One of our hosting customers is a medical practice using a commercial EMR 
running on tomcat+mysql. It has operated well for over a year, but users have 
suddenly begun experiencing slowness for about an hour at the same time every 
day. During the slow times, we've done all the usual troubleshooting to catch 
the problem in the act. The servers have plenty of power and are not 
overworked. There are no slow database queries. Network connectivity is solid. 
Tomcat has plenty of memory. The numbers of database connections, threads, 
questions, queries, etc., remain steady, without spikes. There is no unusual 
disk latency. We have not found any maintenance tasks running during that 
timeframe.

The customer has another load-balanced tomcat instance on a different physical 
server, and the problem happens on that one, too. The servers were upgraded 
with a new kernel and packages on 4/5/24, but the issue did not appear until 
5/6/24. The vendor enabled a new feature in the customer's software, and the 
problem appeared the next day, but they subsequently disabled the feature, and 
(reportedly) the problem did not go away. It is worth mentioning that the 
servers are multi-tenanted, with other customers running the same medical 
application, but the others do not experience the slowdowns, even though they 
are on the same servers.

There are no unusual errors in the tomcat or database server logs, EXCEPT this 
one: java.sql.DriverManager.getConnection

During the periods of slowness, we see lots of those errors along with a large 
spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). It 
seems obvious that the threads are stuck because tomcat is waiting on a 
connection to the database. However, tcpdump shows that connectivity to the 
database is perfect at the network and application layers. There are no 
unanswered SYNs, no retransmissions, no half-open connections, no failures to 
allocate TCP ports, no conntrack messages, and no other indications of system 
resource exhaustion. Every time tomcat requests a connection to the DB, it 
completes in less than 1 ms. Ten thousand connection attempts completed 
successfully in about 15 seconds, with zero failures.
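
Because the wire-level numbers look clean, the same measurement could be repeated one layer up, at the JDBC call itself. Below is a minimal sketch, assuming the application obtains connections directly through java.sql.DriverManager (as the logged name suggests) and that the MySQL JDBC driver is on the classpath. The class name ConnectTimer, the command-line arguments, and the 100 ms "slow" threshold are all placeholders, not anything taken from the vendor's code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectTimer {
    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        if (args.length < 3) {
            System.err.println("usage: ConnectTimer <jdbc-url> <user> <password> [attempts]");
            return;
        }
        // Placeholder inputs: supply the real JDBC URL and credentials.
        final String url = args[0], user = args[1], password = args[2];
        int attempts = args.length > 3 ? Integer.parseInt(args[3]) : 100;
        long slowThresholdMs = 100; // arbitrary cut-off, chosen only for illustration

        AtomicInteger failures = new AtomicInteger();
        long worstMs = 0;
        int slow = 0;
        for (int i = 0; i < attempts; i++) {
            long ms = timeMillis(() -> {
                // Open a connection and close it again immediately.
                try (Connection c = DriverManager.getConnection(url, user, password)) {
                    // nothing to do; only the connect time matters here
                } catch (SQLException e) {
                    failures.incrementAndGet();
                }
            });
            worstMs = Math.max(worstMs, ms);
            if (ms > slowThresholdMs) {
                slow++;
            }
        }
        System.out.printf("attempts=%d failures=%d slow(>%dms)=%d worst=%dms%n",
                attempts, failures.get(), slowThresholdMs, slow, worstMs);
    }
}
```

Run standalone, this only exercises the driver and the network path. It would not reproduce contention that exists inside the Tomcat JVM, so a clean result here narrows the problem down rather than ruling anything out.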

We are forced to conclude that some database connection requests are being 
initiated but are not being sent on the wire. The problem seems to be in the 
interaction between tomcat and the database driver, or in the driver itself. 
Unfortunately, the application vendor is taking the "it's your infrastructure" 
position without providing any evidence or offering suggestions for 
configuration changes, other than to deploy more tomcat instances, which is 
just shooting in the dark. They don't know why the software is throwing 
java.sql.DriverManager.getConnection errors (even though it's their code), and 
they've left the investigation to us.
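
One way to see where the stuck threads are actually waiting is to look at their stacks during a slow period. The sketch below uses the standard java.lang.management API to list every thread in the current JVM whose stack contains a java.sql.DriverManager.getConnection frame, along with its state and any lock it is waiting on. The class and method names (StuckInGetConnection, hasFrame, describeThreadsIn) are hypothetical, and the code only sees the JVM it runs in, so it would have to be invoked from inside Tomcat (for example from a diagnostic JSP or servlet) to be useful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class StuckInGetConnection {
    /** True if any frame in the stack is className.methodName. */
    public static boolean hasFrame(StackTraceElement[] stack, String className, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /** One summary line per live thread that is currently inside className.methodName. */
    public static List<String> describeThreadsIn(String className, String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // false/false: skip locked-monitor and synchronizer details to keep the dump cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // thread ended while the dump was being taken
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (!hasFrame(stack, className, methodName)) {
                continue;
            }
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            lines.add(info.getThreadName()
                    + " state=" + info.getThreadState()
                    + " waitingOn=" + info.getLockName()      // null if not waiting on a lock
                    + " heldBy=" + info.getLockOwnerName()    // null if no owning thread
                    + " top=" + top);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeThreadsIn("java.sql.DriverManager", "getConnection")) {
            System.out.println(line);
        }
    }
}
```

If many threads show state=BLOCKED with the same waitingOn and heldBy values, that would point at lock contention inside the JVM rather than at the network. If instead the top frame is a socket read, the wait is on the database side of the connection. Running the JDK's jstack tool against the Tomcat process ID during a slow period gives comparable stack information without deploying any code.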

Any advice from the community would be greatly appreciated.

RHEL 8.9, kernel 4.18.0-513.18.1.el8_9.x86_64
Apache Tomcat/9.0.80, JVM 1.8.0_372-b07

(The tomcat and JVM versions are the ones recommended by the vendor.)

We're standing by to provide whatever other information the community may need.

Thanks tons!

-Eric


