Mark,

A few other thoughts come to mind. See below.

> -----Original Message-----
> From: Eric Robinson <eric.robin...@psmnv.com>
> Sent: Wednesday, May 29, 2024 7:39 AM
> To: Tomcat Users List <users@tomcat.apache.org>
> Subject: RE: Database Connection Requests Initiated but Not Sent on the Wire
> (Some, Not All)
>
> Hi Mark,
>
>
> > -----Original Message-----
> > From: Mark Thomas <ma...@apache.org>
> > Sent: Wednesday, May 29, 2024 5:35 AM
> > To: users@tomcat.apache.org
> > Subject: Re: Database Connection Requests Initiated but Not Sent on
> > the Wire (Some, Not All)
> >
> > On 29/05/2024 10:26, Mark Thomas wrote:
> > > On 28/05/2024 16:26, Eric Robinson wrote:
> > >
> > > <snip/>
> > >
> > >> Took a bunch of thread and heap dumps during today's painful debacle.
> > >> Will send a link to those as soon as I can.
> > >
> > > Thanks. I have them. I have taken a look and I am starting to form a
> > > theory. To help with that I have a couple of questions.
> >
> > Scratch that. I've found some further information in the data Eric
> > sent me off- list and I am now pretty sure what is going on.
> >
> > There are multiple web applications deployed on the servers. I assume
> > they are related but it actually doesn't matter.
> >
> > At least one application is using the "new" MySQL JDBC driver:
> > com.mysql.cj.jdbc.Driver
> >
> > At least one application is using the "old" MySQL JDBC driver:
> > com.mysql.jdbc.Driver
> >
> >
> > (I've told Eric off-list which application is using which).
> >
> > There are, therefore, two drivers registered with the
> > java.sql.DriverManager
> >
> >
> > The web applications are not using connection pooling. Or, if they are
> > using it, they are using it very inefficiently. The result is that
> > there is a high volume of calls to create new database connections.
> >
> > This is problem number 1. Creating a database connection is expensive.
> > That is why the concept of database connection pooling was created.
> >
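
To make the pooling point concrete, here is a toy sketch of the idea, not
anything the application or Tomcat actually does. The class and method names
(PoolSketch, borrow, giveBack) are made up for illustration, and a real
deployment would use an existing pool implementation rather than a
hand-rolled one. The expensive objects are created once up front and then
reused, so the creation cost is not paid per request.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool; illustration only. */
public class PoolSketch<T> {
    private final BlockingQueue<T> idle;
    private final AtomicInteger created = new AtomicInteger();

    /** Creates 'size' resources once, using the (expensive) factory. */
    public PoolSketch(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
            created.incrementAndGet();
        }
    }

    /** Waits until a resource is idle, then hands it out. */
    public T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    /** Returns a resource to the pool; silently dropped if the pool is full. */
    public void giveBack(T item) {
        idle.offer(item);
    }

    /** Number of times the factory was invoked. */
    public int createdCount() {
        return created.get();
    }
}
```

With a pool of two, a hundred borrow/giveBack cycles still invoke the factory
only twice.
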
> >
> > When a new connection is created, java.sql.DriverManager iterates over
> > the list of registered drivers and
> > - tests to see if the current class loader can see the driver
> > - if yes, tests to see if that driver can service the connection url
> > - if yes, use it and exit
> > - go on to the next driver in the list and repeat
> >
> > The test to see if the current class loader can use the driver is,
> > essentially, to call
> > Class.forName(driver.getClass().getName(), true, classLoader)
> >
> > And that is problem number 2. That check is expensive if the current
> > class loader can't load that driver.
> >
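
A rough sketch of the lookup described above, for illustration only. It is a
simplification rather than the JDK source: it uses acceptsURL() as the "can
this driver service the URL" test, and StubDriver, pick, isVisible and
isolatedLoader are hypothetical names invented for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.List;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverLookupSketch {

    /** True if 'loader' resolves the driver's class name to the same Class object. */
    static boolean isVisible(Driver driver, ClassLoader loader) {
        try {
            return Class.forName(driver.getClass().getName(), true, loader)
                    == driver.getClass();
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // the slow path: the loader searched and found nothing
        }
    }

    /** Returns the first registered driver that is visible and accepts the URL, else null. */
    static Driver pick(List<Driver> registered, ClassLoader caller, String url) {
        for (Driver d : registered) {
            if (!isVisible(d, caller)) {
                continue; // go on to the next driver in the list
            }
            try {
                if (d.acceptsURL(url)) {
                    return d;
                }
            } catch (SQLException e) {
                // treated as "cannot service this URL" in this sketch
            }
        }
        return null;
    }

    /** Hypothetical stand-in for a real JDBC driver; never opens a connection. */
    static class StubDriver implements Driver {
        private final String prefix;
        StubDriver(String prefix) { this.prefix = prefix; }
        @Override public boolean acceptsURL(String url) {
            return url != null && url.startsWith(prefix);
        }
        @Override public Connection connect(String url, Properties info) { return null; }
        @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
            return new DriverPropertyInfo[0];
        }
        @Override public int getMajorVersion() { return 1; }
        @Override public int getMinorVersion() { return 0; }
        @Override public boolean jdbcCompliant() { return false; }
        @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    /** A loader with only the bootstrap loader as parent; it cannot see application classes. */
    static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }
}
```

The point of the sketch is the ordering: a driver later in the list is only
reached after the visibility check has been run, and possibly failed, for
every driver ahead of it.
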
> >
> > It is also problem number 3. The reason it is expensive is that class
> > loaders don't cache misses so if a web application has a large number
> > of JARs, they all get scanned every time the DriverManager tries to
> > create a new connection.
> >

Maybe a potential solution is to have the class loader cache misses? Wait, I 
see you answered that further down...
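
For what it's worth, this is roughly what I picture by "caching misses". It
is only a sketch under my own assumptions, not Tomcat's class loader and not
the planned patch. MissCachingLoader and the scans counter are invented
names, and findClass() here never finds anything because it only stands in
for the expensive JAR scan. A real cache would also need to be invalidated
whenever the set of JARs changes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class MissCacheSketch {

    /** Hypothetical loader that remembers class names it has already failed to find. */
    static class MissCachingLoader extends ClassLoader {
        private final Set<String> misses = ConcurrentHashMap.newKeySet();
        /** Counts simulated JAR scans, for illustration only. */
        final AtomicInteger scans = new AtomicInteger();

        MissCachingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (misses.contains(name)) {
                // Cached miss: fail immediately without searching again.
                throw new ClassNotFoundException(name);
            }
            // Stand-in for scanning every JAR in the web application.
            scans.incrementAndGet();
            misses.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    /** Same shape as the visibility test: can this loader resolve the name? */
    static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, true, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Asking the same loader twice for a class it cannot load triggers the
simulated scan once; the second request fails straight from the cache.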

> >
> > The slowness occurs in the web application that depends on the second
> > JDBC driver in DriverManager's list. When a request that requires a
> > database connection is received, there is a short delay while the web
> > application tries, and fails, to load the first JDBC driver in the list.
> > Class loading is synchronized on class name being loaded so if any
> > other requests also need a database connection, they have to wait for
> > this request to finish the search for the JDBC driver before they can
> > continue. This creates a bottleneck. Requests are essentially rate
> > limited to 1 request that requires a database connection per however
> > long it takes to scan every JAR in the web application for a class
> > that isn't there. If the average rate of requests exceeds this rate
> > limit then a queue is going to build up and it won't subside until the
> > average rate of requests falls below this rate limit.
> >
> >
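
The rate limit described above can be put into numbers. The figures used
here (a 50 ms scan, 25 requests per second, a 60 second burst) are purely
hypothetical and were not measured on our servers; the method names are
invented for the example.

```java
public class RateLimitSketch {

    /** Ceiling on connections per second if each lookup holds the lock for scanMillis. */
    static double maxPerSecond(double scanMillis) {
        return 1000.0 / scanMillis;
    }

    /** Requests left waiting after 'seconds' of steady arrivals; never negative. */
    static double backlog(double arrivalsPerSecond, double scanMillis, double seconds) {
        double excess = arrivalsPerSecond - maxPerSecond(scanMillis);
        return Math.max(0.0, excess * seconds);
    }
}
```

With those assumed numbers the ceiling is 20 connections per second, and a
minute at 25 requests per second leaves about 300 requests queued. At 10
requests per second no queue forms at all, which matches the idea of a
threshold that is invisible until it is crossed.
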
> >
> > Problem number 1 is an application issue. It should be using pooling.
> > It seems unlikely that we'll see a solution from the application
> > vendor and, even if the vendor does commit to a fix, I suspect it will
> > take months.
> >

I believe your assessment is correct. How hard is it to enable pooling? Can it 
be bolted on, so to speak, through changes to the app context, such that the 
webapp itself does not necessarily need to implement special code?

> >
> > Problem number 2 is a JRE issue. I think there are potentially more
> > efficient ways to perform that check but that needs research as things
> > like OSGI and JPMS make class loading more complicated.
> >
> >
> > Problem number 3 is a Tomcat issue. It should be relatively easy to
> > start caching misses (i.e. this class loader cannot load this class)
> > and save the time spent repeatedly scanning JARs for a class that
> > isn't there.
> >
> >
> > I intend to work on a patch for Tomcat that will add caching that
> > should speed things up considerably. I hope to have something for Eric
> > to test today but it might take me until tomorrow as I have a few
> > other time-critical things fighting to get to the top of my TODO list
> > at the moment.
> >

That answers my earlier question. Standing by...

> >
> > Moving the JDBC driver JARs from WEB-INF/lib to $CATALINA_BASE/lib may
> > also be a short-term fix but is likely to create problems if the same
> > JAR ever exists in both locations at the same time.
> >

Would the problem be relieved if the vendor stuck to one driver?


> >
> > Mark
> >
>
> That's some great sleuthing and the explanation makes a ton of sense. It
> leaves me with a couple of questions.
>
> If you are correct, then it follows that historic activity has been hovering
> dangerously near the threshold where this symptom would manifest. Within the
> past month, an unknown change in the system climate now causes an uptick in
> the number of DB requests/second at roughly the same time daily (with
> occasional exceptions) and the system begins to trip over its own feet. I
> haven't seen anything in my Zabbix graphs that stood out as potentially
> problematic.
> Armed with this information, I am now taking a closer look.
>
> The natural next question is, what changed in the application or the users'
> workflow to push activity over the threshold? We'll dig into that.
>
>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> > For additional commands, e-mail: users-h...@tomcat.apache.org
>
> Disclaimer: This email and any files transmitted with it are confidential and
> intended solely for intended recipients. If you are not the named addressee
> you should not disseminate, distribute, copy or alter this email. Any views or
> opinions presented in this email are solely those of the author and might not
> represent those of Physician Select Management. Warning: Although Physician
> Select Management has taken reasonable precautions to ensure no viruses are
> present in this email, the company cannot accept responsibility for any loss
> or damage arising from the use of this email or attachments.