Gerhardt,

On 10/12/21 13:27, Martin, Gerhardt A wrote:
Running Tomcat 9.0.50 on CentOS 7.9.x Linux, using the Tomcat JDBC connection
pool to connect to my application's databases. My app connects to about a dozen
read-only databases and one read/write database. Here is a typical resource
definition with tuning configurations for the pool and the MS SQL Server JDBC
driver:

<Resource

Note that these are the only relevant attributes:

> factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"

You are using tomcat-pool, not the default commons-dbcp.

maxActive="20"
initialSize="1"
maxIdle="8"
minIdle="1"
maxWait="5000"
testOnBorrow="false"
testWhileIdle="false"
testOnConnect="false"
testOnReturn="false"
timeBetweenEvictionRunsMillis="10000"
removeAbandonedTimeout="15"
removeAbandoned="true"
logAbandoned="false"
minEvictableIdleTimeMillis="900000"
maxAge="600000"
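Pulled together, the attributes above would sit in a context.xml Resource element along the lines of the following sketch. The resource name, driver class, URL, and credentials are placeholders I have filled in for illustration; they are not values from the original post:

```xml
<Resource name="jdbc/ExampleDB"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver"
          url="jdbc:sqlserver://dbhost:1433;databaseName=exampledb"
          username="appuser" password="secret"
          maxActive="20" initialSize="1" maxIdle="8" minIdle="1"
          maxWait="5000"
          testOnBorrow="false" testWhileIdle="false"
          testOnConnect="false" testOnReturn="false"
          timeBetweenEvictionRunsMillis="10000"
          removeAbandoned="true" removeAbandonedTimeout="15"
          logAbandoned="false"
          minEvictableIdleTimeMillis="900000"
          maxAge="600000"/>
```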
Important NOTE: the framework we employ does NOT execute more than one SQL statement
per borrow from the connection pool. It is a constant cycle of getConnection() (borrow)
-> execute one SQL statement -> close() (return).

That's kind of wasteful, but okay.
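The per-statement cycle described above can be sketched as follows. StubPool here is a made-up stand-in (not the Tomcat JDBC API) that just counts borrows and returns, so the one-statement-per-borrow shape is visible:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Schematic stand-in for a connection pool: counts borrows and returns.
class StubPool {
    final AtomicInteger borrows = new AtomicInteger();
    final AtomicInteger returns = new AtomicInteger();

    AutoCloseable getConnection() {          // borrow from the pool
        borrows.incrementAndGet();
        return returns::incrementAndGet;     // close() == return to the pool
    }
}

public class BorrowCycle {
    public static void main(String[] args) throws Exception {
        StubPool pool = new StubPool();
        String[] statements = {"SELECT 1", "SELECT 2", "SELECT 3"};
        for (String sql : statements) {
            // getConnection (borrow) -> execute one SQL -> close (return)
            try (AutoCloseable conn = pool.getConnection()) {
                System.out.println("executing: " + sql);
            }
        }
        System.out.println(pool.borrows.get() + " borrows, "
                + pool.returns.get() + " returns");
    }
}
```

Each statement pays the full borrow/return round trip, which is why pool-side timeouts dominate the failure behavior.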

Why we are configured this way:
Ultimately we need to fail fast to avoid our app going into death by stuck
threads when we lose connectivity to the backend databases for a short period
of time (for example, a short 10-second period where the database(s) are
unreachable from the app servers).

Is this common?

-We want the evictor thread (Pool Cleaner) to run every 10 seconds
-We are only using the Pool Cleaner to clobber busy connections that have been
running 15 seconds or more
   NOTE: we rarely see SQL that takes longer than 2 seconds; most statements
are sub-100ms and the average is about 4ms

Note that the connections aren't clobbered; they are simply dropped from the pool. The work they generated is still continuing...

-We are not testing connections, period
-We want our connections replaced with fresh connections after 10 minutes of
use, hence we set maxAge="600000"
The connection pool settings work hand in hand with properties set on the SQL
Server JDBC driver in the URL:
The driver will kill any SQL statement that takes longer than 10 seconds

Ouch.

-When everything else is OK, the connection itself is returned intact to the
connection pool
-When the DB is unreachable, there is another timer, cancelQueryTimeout, set to
10 seconds, which is a safety valve for cases where the message telling the DB
server to cancel the SQL gracefully never reaches the DB. After the
cancelQueryTimeout, the client side of the driver gives up. This is problematic
because it actually returns a bad connection to the connection pool.
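For reference, the driver-side timeouts described above correspond to connection properties in the Microsoft JDBC driver's URL. A sketch with placeholder host and database names (queryTimeout and cancelQueryTimeout are in seconds, socketTimeout in milliseconds):

```
jdbc:sqlserver://dbhost:1433;databaseName=exampledb;queryTimeout=10;cancelQueryTimeout=10;socketTimeout=300000
```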

The combination of running the Pool Cleaner every 10 seconds and a
removeAbandonedTimeout of 15 seconds should work to remove any connection whose
SQL execution time exceeded 10 seconds before the cancelQueryTimeout kicks in.
This should prevent returning bad connections to the pool when bad things
happen, like a network switch going offline temporarily and dropping all
packets between the app server hosting Tomcat and the database server.

Sounds like you really just want to be validating connections, but you have explicitly disabled that feature. Why?
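If validation were enabled, it would be a small change to the Resource attributes. A sketch (validationQueryTimeout is in seconds, validationInterval in milliseconds; the exact values here are illustrative, not a recommendation):

```xml
testOnBorrow="true"
validationQuery="SELECT 1"
validationQueryTimeout="2"
validationInterval="10000"
```

With validationInterval set, the pool validates a given connection at most once per interval, so the per-borrow overhead stays bounded even at a high borrow rate.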

NOTE that for the MS SQL Server driver, all the timeouts default to waiting
indefinitely. This is one of the prime reasons why even short outages can kill
our app with stuck threads. Our web app hits Tomcat at about 2000 requests per
minute, and each request fires on average ~6 SQL statements, so we are running
about 12K SQL executions per minute. If we lose connectivity to a database for
even 15 seconds without a means to fail fast, we could easily exhaust the
allotted number of threads, creating a "log jam" that can take a very long
time to recover from.

Why doesn't your query-timeout handle this situation?

I am very sorry for all this lengthy background, but I thought it might help
readers and potential responders to know a bit about why we do what we do in
our connection pool.

Finally now onto the problem:

I was running a load test, simulating the temporary loss of connectivity to one
or more of the databases used in the app, and I noticed a strange behavior that
does not align with my understanding of how Tomcat JDBC removeAbandoned works.
While most of my database connections coming from the pool did die shortly
after the simulation started and "failed fast", a few connections actually ran
for up to 5 minutes before they terminated themselves. Given the configuration
I explained, I would have expected no database connection that was interacting
with the database to live beyond 30 seconds (given that there could be some
forward/reverse lag in the evictor thread timing, so there is a slight chance
that the first run of the evictor thread might not catch every connection that
was at the 15-second mark).

Remember that the pool won't kill the connection or anything else. It just lets the connection "leak" out of the pool and ... hopefully eventually be garbage-collected. But if the thread running that query is still running, it will continue to do so potentially forever. Any settings you have for canceling the query after some timeout have nothing to do with the pool itself; that's between the driver, the server, etc.

What I believe I saw in some cases was that the socket timeout I set on the
SQL Server JDBC driver, 300,000ms (5 mins), kicked in and a read timeout was
generated in the driver. This led to a bad connection being returned to the
connection pool and caused errors until the connection(s) aged out due to
maxAge.

It sounds to me like you need to reduce the socket read timeout for the connections in the driver. Your maxAge applies at checkout-time. The "bad" connection will indeed be back in the pool, but it should be removed on the next checkout. If "removeAbandoned" is working as expected, a 5-minute query-time should have already resulted in the connection being removed from the pool (due to abandonment).

Does anyone know why the Pool Cleaner would be able to run and NOT forcefully
remove a connection from the Tomcat JDBC pool that has been running for more
than the configured removeAbandonedTimeout?

Have you confirmed that there is a pool-cleaner thread running?

I searched Google and this list on MARC for any clues and found nothing. I
also reviewed the source code for the Tomcat JDBC module and was unable to
find a place in the code indicating it would be unable to clobber a running
connection that was borrowed from the pool.

Anyone able to help me understand what I saw in my testing? Could this be a 
Tomcat connection pool bug?

It's possible. Have you enabled debug logging on the pool and checked your test again? I would highly recommend enabling "logAbandoned" if you are trying to debug problems with abandoned connection cleanup.
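A sketch of the debugging changes suggested here: turning on abandonment logging in the Resource, plus (optionally) suspectTimeout, which logs a suspected-abandoned connection without actually closing it:

```xml
logAbandoned="true"
suspectTimeout="15"
```

Pool-internal debug logging can then be raised in conf/logging.properties with a line like `org.apache.tomcat.jdbc.pool.level = FINE`.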

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
