Gerhardt,
On 10/12/21 13:27, Martin, Gerhardt A wrote:
> Running Tomcat 9.0.50 on CentOS 7.9.x Linux and using the Tomcat JDBC connection
> pool to connect to my application's databases. My app connects to about a dozen
> read-only databases and one read/write database. Here is a typical resource
> definition with tuning configurations for the pool and the MS SQL Server JDBC
> driver:
> <Resource
Note that these are the only relevant attributes:
> factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
You are using tomcat-pool, not the default commons-dbcp.
> maxActive="20"
> initialSize="1"
> maxIdle="8"
> minIdle="1"
> maxWait="5000"
> testOnBorrow="false"
> testWhileIdle="false"
> testOnConnect="false"
> testOnReturn="false"
> timeBetweenEvictionRunsMillis="10000"
> removeAbandonedTimeout="15"
> removeAbandoned="true"
> logAbandoned="false"
> minEvictableIdleTimeMillis="900000"
> maxAge="600000"
> Important NOTE: the framework we employ does NOT execute more than one SQL statement
> per borrow from the connection pool. It's a constant cycle of getConnection (borrow)
> -> execute 1 SQL -> closeConnection (return).
That's kind of wasteful, but okay.
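For anyone following along, the cycle being described is just the standard JDBC
borrow/use/return idiom. A minimal sketch (the JNDI name and the query are
placeholders, not taken from the original post):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;

    public class OneStatementPerBorrow {
        public static void main(String[] args) throws Exception {
            // Placeholder JNDI name; it would match whatever the <Resource> above is called.
            DataSource ds = (DataSource) new InitialContext()
                    .lookup("java:comp/env/jdbc/ReadOnlyDb");

            // borrow -> execute exactly one statement -> return
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement("SELECT 1");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            } // close() hands the connection back to the pool; it does not close the socket
        }
    }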
> Why we are configured this way:
> Ultimately we need to fail fast to avoid our APP going into death by stuck
> threads when we lose connectivity to the backend databases for a short period
> of time (for example, a short 10-second period where the DB(s) is/are
> unreachable from the app servers).
Is this common?
> -We want the evictor thread (Pool Cleaner) to run every 10 seconds
> -We are only using the Pool Cleaner to clobber busy connections that have been
> running for 15 seconds or more
> NOTE: we rarely see SQL that takes longer than 2 seconds; most statements are
> sub-100ms and the average is about 4ms
Note that the connections aren't clobbered; they are simply dropped from
the pool. The work they generated is still continuing...
> -We are not testing connections, period
> -We want our connections replaced with fresh connections after 10 minutes of
> use, hence we set maxAge=600000
> The connection pool settings work hand in hand with properties set on the SQL
> Server JDBC driver in the URL:
> The driver will kill any SQL that takes longer than 10 seconds
Ouch.
> -When everything else is OK, the connection itself is returned intact to the
> connection pool
> -When the DB is unreachable, there is another timer, cancelQueryTimeout, set to
> 10 seconds, which is a safety valve for cases where the message to the DB server
> to cancel the SQL gracefully does not make it to the DB. After the
> cancelQueryTimeout, the client side of the driver gives up. This is problematic
> because it actually returns a bad connection back to the connection pool.
> The combination of the connection pool settings to run the Pool Cleaner every
> 10 seconds and removeAbandonedTimeout of 15 seconds should work together to
> remove any connection whose SQL execution time went past 10 seconds before the
> cancelQueryTimeout kicks in. This should prevent returning bad connections to
> the pool when bad things happen, like a network switch going offline temporarily
> and dropping all packets between the app server hosting Tomcat and the database
> server.
Sounds like you really just want to be validating connections, but you
have explicitly disabled that feature. Why?
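If it helps, enabling validation is only a few attributes. In tomcat-jdbc's
programmatic form it would look roughly like the sketch below (URL, driver class
and values are illustrative only, not a recommendation for your workload):

    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;

    public class ValidatingPool {
        public static DataSource build() {
            PoolProperties p = new PoolProperties();
            p.setUrl("jdbc:sqlserver://dbhost:1433;databaseName=appdb"); // placeholder
            p.setDriverClassName("com.microsoft.sqlserver.jdbc.SQLServerDriver");

            // Validate connections as they are borrowed, instead of relying on
            // abandonment to weed out the broken ones.
            p.setTestOnBorrow(true);
            p.setValidationQuery("SELECT 1");
            p.setValidationQueryTimeout(5);  // seconds: fail the check quickly
            p.setValidationInterval(30000);  // ms: don't re-validate more often than this

            DataSource ds = new DataSource();
            ds.setPoolProperties(p);
            return ds;
        }
    }

The same four settings can go straight onto the <Resource> element as
testOnBorrow, validationQuery, validationQueryTimeout and validationInterval.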
> NOTE that for the MS SQL Server driver, all of the timeouts default to waiting
> indefinitely. This is one of the prime reasons why even short outages can kill
> our APP with stuck threads. Our web app hits Tomcat at about 2000 requests per
> minute, and each request is going to fire on average ~6 SQL statements, so we
> are running about 12K SQL hits per minute. If we lose connectivity to a
> database for even 15 seconds without a means to fail fast, we could easily
> exhaust the allotted number of threads, which creates a "log jam" that can take
> a very long time to recover from.
Why doesn't your query-timeout handle this situation?
> So I am very sorry for all this lengthy background, but I thought it might help
> readers and potential responders to know a bit about why we do what we do in
> our connection pool.
> Finally, now on to the problem:
> I was running a load test and simulating the temporary loss of connectivity to one or
> more of the databases used in the APP, and I noticed a strange behavior that does not
> align with my understanding of how Tomcat JDBC removeAbandoned works. While most of my
> database connections coming from the pool did die shortly after the simulation started
> and "failed fast", there were a few connections that actually ran for up to 5
> minutes before they terminated themselves. Given the configuration I explained, I would
> have expected no database connection that was interacting with the database to live
> beyond 30 seconds. (Given that there could be some forward/reverse lag in the evictor
> thread timing, there is a slight chance that the first run of the evictor thread
> might not catch every connection that was at the 15-second mark.)
Remember that the pool won't kill the connection or anything else. It
just lets the connection "leak" out of the pool and ... hopefully
eventually be garbage-collected. But if the thread running that query is
still running, it will continue to do so potentially forever. Any
settings you have for canceling the query after some timeout have
nothing to do with the pool itself; that's between the driver, the
server, etc.
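In other words, a query timeout is a per-statement contract between your code
and the driver; the pool never sees it. A hedged sketch of what that looks like
in plain JDBC (the SQL is a placeholder, and exactly which SQLException subclass
you get on timeout is driver-specific):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class QueryTimeoutSketch {
        // 'con' is assumed to have been borrowed from the pool by the caller
        static void runOne(Connection con, String sql) throws SQLException {
            try (Statement st = con.createStatement()) {
                st.setQueryTimeout(10); // seconds; enforced by the driver/server, not by the pool
                try (ResultSet rs = st.executeQuery(sql)) {
                    while (rs.next()) {
                        // consume the row
                    }
                }
            }
            // If the timeout fires, executeQuery() throws a driver-specific SQLException;
            // whether the connection is still usable afterwards is up to the driver.
        }
    }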
> What I believe I saw in some cases was that the socket timeout that I set on
> the SQLServer JDBC driver to 300,000ms (5 mins) kicked in and a read timeout
> was generated in the driver. This led to a bad connection being returned to
> the connection pool and caused errors until the connection(s) aged out due to
> maxAge.
It sounds to me like you need to reduce the socket read timeout for the
connections in the driver. Your maxAge applies at checkout-time. The
"bad" connection will indeed be back in the pool, but it should be
removed on the next checkout. If "removeAbandoned" is working as
expected, a 5-minute query-time should have already resulted in the
connection being removed from the pool (due to abandonment).
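For the Microsoft driver that means the socketTimeout connection property (in
milliseconds), alongside the queryTimeout/cancelQueryTimeout (in seconds) you
already set. Purely as an illustration (host, database name and exact values
are made up), something like:

    jdbc:sqlserver://dbhost:1433;databaseName=appdb;queryTimeout=10;cancelQueryTimeout=10;socketTimeout=30000

keeps the worst-case hang on a dead socket near your abandonment window instead
of 5 minutes. The trade-off is that any legitimately slow read over the socket
will also be cut off at that point.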
> Does anyone know why the Pool Cleaner would be able to run and NOT forcefully
> remove a connection from the Tomcat JDBC pool that has been running for more
> than the configured removeAbandonedTimeout?
Have you confirmed that there is a pool-cleaner thread running?
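A thread dump (jstack <pid>) is the quickest way to check; the cleaner runs on
a shared timer thread whose name, in the versions I have looked at, contains
"Pool Cleaner". If you would rather check from inside the webapp, a rough sketch:

    // Must run in the same JVM as the pool (e.g. from a debug servlet).
    // The exact thread name can vary by version; look for something like
    // "Tomcat JDBC Pool Cleaner[...]".
    public class FindPoolCleaner {
        public static void listCandidates() {
            Thread.getAllStackTraces().keySet().stream()
                  .map(Thread::getName)
                  .filter(n -> n.toLowerCase().contains("pool cleaner"))
                  .forEach(System.out::println);
        }
    }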
> I searched Google and this list on MARC for any clues and found nothing. I
> also reviewed the source code for the Tomcat JDBC module and was unable to see
> a place in the code that indicated it would not be able to clobber a running
> connection that was borrowed from the pool.
> Anyone able to help me understand what I saw in my testing? Could this be a
> Tomcat connection pool bug?
It's possible. Have you enabled debug logging on the pool and checked
your test again? I would highly recommend enabling "logAbandoned" if you
are trying to debug problems with abandoned connection cleanup.
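Concretely, that is just a couple of attributes on the same <Resource> (values
are illustrative; suspectTimeout is optional but useful because it logs a
warning about a long-running connection without actually removing it):

    logAbandoned="true"
    suspectTimeout="15"

and, for pool debug logging, raising the level for the pool's package in
conf/logging.properties, e.g.:

    org.apache.tomcat.jdbc.pool.level = FINE

(the handler levels also need to allow FINE, which the stock logging.properties
already does).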
-chris