Hi Eric,

I have some follow-up questions in-line. I have also read the other messages in this thread and added a couple of additional questions based on what I read there.


On 26/05/2024 02:58, Eric Robinson wrote:
One of our hosting customers is a medical practice using a commercial EMR 
running on tomcat+mysql. It has operated well for over a year, but users have 
suddenly begun experiencing slowness for about an hour at the same time every 
day.

What time does this problem start?

Does it occur every day of the week including weekends?

How does the slowness correlate to:
- request volume
- requests to any particular URL(s)?
- requests from any particular client IP?
- any other attribute of the request?

(I'm trying to see if there is something about the requests that triggers the issue.)
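If the AccessLogValve is enabled, one quick way to answer these is to bucket the access log by minute, by URL and by client IP, then compare the slow hour with a normal hour. Below is a minimal Java sketch, assuming the default "common" pattern (%h %l %u %t "%r" %s %b); AccessLogBuckets is a hypothetical name, not something that ships with Tomcat.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts access-log requests per minute, per URL and per client IP. */
final class AccessLogBuckets {
    // Assumes the "common" pattern: host ident user [timestamp] "METHOD URL PROTOCOL" status bytes
    // e.g. 10.0.0.5 - - [27/May/2024:09:15:02 +0000] "GET /emr/chart HTTP/1.1" 200 512
    private static final Pattern COMMON = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[(\\d{2}/\\w{3}/\\d{4}:\\d{2}:\\d{2}):\\d{2} [^\\]]+\\] \"\\S+ (\\S+)[^\"]*\" \\d{3} \\S+");

    final Map<String, Integer> perMinute = new TreeMap<>(); // key: dd/MMM/yyyy:HH:mm
    final Map<String, Integer> perUrl = new TreeMap<>();    // key: path without query string
    final Map<String, Integer> perClient = new TreeMap<>(); // key: client host/IP

    /** Adds one log line; returns false if it does not match the assumed format. */
    boolean add(String line) {
        Matcher m = COMMON.matcher(line);
        if (!m.find()) {
            return false;
        }
        perClient.merge(m.group(1), 1, Integer::sum);
        perMinute.merge(m.group(2), 1, Integer::sum);
        String url = m.group(3);
        int q = url.indexOf('?');
        perUrl.merge(q >= 0 ? url.substring(0, q) : url, 1, Integer::sum);
        return true;
    }
}
```

Feed it every line of one day's access log and compare perMinute across the slow hour: if the counts stay flat while users report slowness, that argues against a purely load-related problem.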

During the slow times, we've done all the usual troubleshooting to catch the 
problem in the act. The servers have plenty of power and are not overworked. 
There are no slow database queries. Network connectivity is solid. Tomcat has 
plenty of memory. The numbers of database connections, threads, questions, 
queries, etc., remain steady, without spikes. There is no unusual disk latency. 
We have not found any maintenance tasks running during that timeframe.

I would usually suggest taking three thread dumps approximately 5s apart and then diffing them to try and spot "slow moving" threads.

I see you have a script that triggers a thread dump when the slowness hits. If you haven't already, please configure it to capture at least three dumps approximately 5 seconds apart.

(If we can spot the slow moving threads we might be able to identify what it is that makes them slow moving.)
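Comparing the dumps by hand works, but it is easy to script. A minimal sketch, assuming the dumps are in the usual jstack text format (a quoted thread name on the header line followed by "at ..." frame lines); ThreadDumpDiff and its methods are hypothetical. It reports threads whose top frames are identical in every dump and whose stack mentions a marker string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares jstack-style thread dumps taken a few seconds apart. */
final class ThreadDumpDiff {

    /** Parses one dump into: thread name -> its "at ..." frames, top of stack first. */
    static Map<String, List<String>> parse(String dump) {
        Map<String, List<String>> threads = new LinkedHashMap<>();
        List<String> frames = null;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) { // header line: "thread name" #id daemon prio=...
                int end = line.indexOf('"', 1);
                if (end < 0) {
                    frames = null; // malformed header, ignore its frames
                    continue;
                }
                frames = new ArrayList<>();
                threads.put(line.substring(1, end), frames);
            } else if (frames != null && line.trim().startsWith("at ")) {
                frames.add(line.trim());
            }
        }
        return threads;
    }

    /** Threads present in every dump with the same top frames whose stack mentions marker. */
    static Set<String> slowMoving(List<String> dumps, int depth, String marker) {
        List<Map<String, List<String>>> parsed = new ArrayList<>();
        for (String dump : dumps) {
            parsed.add(parse(dump));
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : parsed.get(0).entrySet()) {
            List<String> first = top(entry.getValue(), depth);
            boolean unchanged = !first.isEmpty();
            for (Map<String, List<String>> later : parsed.subList(1, parsed.size())) {
                List<String> frames = later.get(entry.getKey());
                unchanged &= frames != null && first.equals(top(frames, depth));
            }
            boolean relevant = entry.getValue().stream().anyMatch(f -> f.contains(marker));
            if (unchanged && relevant) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    private static List<String> top(List<String> frames, int depth) {
        return frames.subList(0, Math.min(depth, frames.size()));
    }
}
```

For example, save "jstack <pid>" output three times ~5s apart, read each file into a String and call slowMoving(dumps, 10, "getConnection"). The marker filter matters because idle pool threads parked waiting for work also look unchanged between dumps.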

The customer has another load-balanced tomcat instance on a different physical 
server, and the problem happens on that one, too. The servers were upgraded 
with a new kernel and packages on 4/5/24, but the issue did not appear until 
5/6/24. The vendor enabled a new feature in the customer's software, and the 
problem appeared the next day, but they subsequently disabled the feature, and 
(reportedly) the problem did not go away.

Have you confirmed that the feature really is disabled? Or was it just hidden?

Has this feature been enabled for any other customers? If yes, have they experienced similar issues?

(It is suspicious that the issue persisted after the feature was disabled. I wonder if some elements of that change (e.g. a database change) are still in place and causing issues.)

It is worth mentioning that the servers are multi-tenanted, with other 
customers running the same medical application, but the others do not 
experience the slowdowns, even though they are on the same servers.

How does this customer compare, in terms of request volume, to other customers that are not experiencing this issue?

Is there anything unique or special about the customer experiencing the issue? Do they have some custom settings no-one else uses?

(I am trying to figure out whether the issue is load-related, customer-specific or something else.)

There are no unusual errors in the tomcat or database server logs, EXCEPT this 
one: java.sql.DriverManager.getConnection

Can we see the full stack trace, please?
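One thing to check when you pull it: Throwable.printStackTrace() prints the "Caused by:" chain but not SQLException's separate getNextException() chain, and the SQLState and vendor error code can be informative too. If you (or the vendor) can influence how the error is logged, a sketch along these lines captures all of it; FullTrace is a hypothetical helper, not part of the application or Tomcat.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.sql.SQLException;

/** Hypothetical helper: renders an exception, its causes and any chained SQLExceptions. */
final class FullTrace {

    static String render(Throwable t) {
        StringWriter out = new StringWriter();
        PrintWriter pw = new PrintWriter(out);
        describe(pw, t);
        t.printStackTrace(pw); // prints the "Caused by:" chain
        if (t instanceof SQLException) {
            // The getNextException() chain is separate; printStackTrace() does not show it.
            SQLException next = ((SQLException) t).getNextException();
            while (next != null) {
                pw.println("Next exception:");
                describe(pw, next);
                next.printStackTrace(pw);
                next = next.getNextException();
            }
        }
        pw.flush();
        return out.toString();
    }

    /** Prints SQLState and vendor error code for SQLExceptions; no-op otherwise. */
    private static void describe(PrintWriter pw, Throwable t) {
        if (t instanceof SQLException) {
            SQLException e = (SQLException) t;
            pw.println("SQLState=" + e.getSQLState() + ", errorCode=" + e.getErrorCode());
        }
    }
}
```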

During the periods of slowness, we see lots of those errors along with a large 
spike in the number of stuck tomcat threads (from 1 or 2 to as high as 100). It 
seems obvious that the threads are stuck because tomcat is waiting on a 
connection to the database. However, tcpdump shows that connectivity to the 
database is perfect at the network and application layers. There are no 
unanswered SYNs, no retransmissions, no half-open connections, no failures to 
allocate TCP ports, no conntrack messages, and no other indications of system 
resource exhaustion. Every time tomcat requests a connection to the DB, it 
completes in less than 1 ms. Ten thousand connection attempts completed 
successfully in about 15 seconds, with zero failures.

It sounds like things might be getting stuck somewhere in or near the JDBC driver.

Can you provide the exact version of the JDBC driver you are using?

Can you provide the full database configuration from context.xml (or wherever it is configured). Please redact sensitive information such as passwords.

We are forced to conclude that some database connection requests are being 
initiated but are not being sent on the wire. The problem seems to be in the 
interaction between tomcat and the database driver, or in the driver itself.

I agree.
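If you want to reproduce this outside the application, one approach is to time each acquisition against a deadline and, when the deadline is missed, grab the stack of the thread that is still trying to connect. That shows whether it is blocked in the driver, in DriverManager or elsewhere. A minimal sketch under those assumptions; ConnectionProbe and the timeout value are hypothetical, and it can only observe calls made through it, not calls inside the vendor's code (thread dumps remain the tool for those).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical probe: times one connection attempt and captures its stack if it stalls. */
final class ConnectionProbe {

    static final class Result {
        final long millis;
        final boolean timedOut;
        final String stack; // stack of the stalled attempt, empty if it completed in time

        Result(long millis, boolean timedOut, String stack) {
            this.millis = millis;
            this.timedOut = timedOut;
            this.stack = stack;
        }
    }

    static <T> Result time(Callable<T> acquire, long timeoutMillis) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connection-probe");
            t.setDaemon(true); // a stuck attempt must not keep the JVM alive
            return t;
        });
        AtomicReference<Thread> worker = new AtomicReference<>();
        long start = System.nanoTime();
        Future<T> future = exec.submit(() -> {
            worker.set(Thread.currentThread());
            return acquire.call();
        });
        try {
            T resource = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            long millis = (System.nanoTime() - start) / 1_000_000;
            if (resource instanceof AutoCloseable) {
                ((AutoCloseable) resource).close(); // e.g. a java.sql.Connection
            }
            return new Result(millis, false, "");
        } catch (TimeoutException e) {
            long millis = (System.nanoTime() - start) / 1_000_000;
            StringBuilder sb = new StringBuilder();
            Thread t = worker.get();
            if (t != null) {
                for (StackTraceElement frame : t.getStackTrace()) {
                    sb.append("\tat ").append(frame).append('\n');
                }
            }
            future.cancel(true); // a connection that arrives after this point is not closed here
            return new Result(millis, true, sb.toString());
        } finally {
            exec.shutdownNow();
        }
    }
}
```

For example, call ConnectionProbe.time(() -> DriverManager.getConnection(url, user, password), 2000) in a loop during the slow hour; any Result with timedOut set carries the stack of the stalled attempt.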

Unfortunately, the application vendor is taking the "it's your infrastructure" 
position without providing any evidence or offering suggestions for configuration changes,

I'm sorry to hear that. We'll do what we can to help.

other than to deploy more tomcat instances, which is just shooting in the dark. 
They don't know why the software is throwing 
java.sql.DriverManager.getConnection errors (even though it's their code), and 
they've relegated the investigation to us.

I'd have to say that the evidence is pointing towards some sort of application issue at this point. That said, just because the questions are currently heading in that direction, we aren't blind to the possibility that the root cause might be in Tomcat. If the evidence starts pointing that way, then that is where we will look.

When we have answers to the questions above, we might have enough evidence to start asking more pointed questions of the application vendor.

Any advice from the community would be greatly appreciated.

RHEL 8.9, kernel 4.18.0-513.18.1.el8_9.x86_64
Apache Tomcat/9.0.80, JVM 1.8.0_372-b07

Is that Tomcat 9.0.80 as provided by the ASF? If so, there are a number of known security vulnerabilities you should be (and probably are) aware of. There are steps you can take to mitigate those without upgrading; I just want to make sure they are on your radar.

(The tomcat and JVM versions are the ones recommended by the vendor.)

We're standing by to provide whatever other information the community may need.

Finally, if you consider any of the debugging information too sensitive to share on the public list, I am happy for you to send it directly to me and I can share it with any interested Tomcat committers. If you do need to do that, I'd encourage you to share a redacted version with the list if you can. There are lots of very experienced folks on the users list who can help but who aren't Tomcat committers.

Mark


Thanks tons!

-Eric





