We are running Tomcat 9.0.50 on CentOS 7.9.x Linux and using the Tomcat JDBC connection pool to connect to my application's databases. My app connects to about a dozen read-only databases and one read/write database. Here is a typical resource definition with tuning configurations for the pool and the MSSQL Server JDBC driver:
<Resource name="jdbc/BLAHDataSource" url="jdbc:sqlserver://dbhost\sql2017;applicationName=FOO;databaseName=BAR;SendStringParametersAsUnicode=false;socketTimeout=300000;cancelQueryTimeout=15;queryTimeout=10;loginTimeout=2" username="USER" password="NOT TELLING" auth="Container" driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver" factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" jdbcInterceptors="StatementCache(prepared=true,max=150)" type="javax.sql.DataSource" maxActive="20" initialSize="1" maxIdle="8" minIdle="1" maxWait="5000" validationQuery="select 1" validationQueryTimeout="1" validationInterval="3600000" testOnBorrow="false" testWhileIdle="false" testOnConnect="false" testOnReturn="false" timeBetweenEvictionRunsMillis="10000" removeAbandonedTimeout="15" removeAbandoned="true" logAbandoned="false" minEvictableIdleTimeMillis="900000" maxAge="600000" /> Important NOTE: the framework we employ does NOT execute more than one SQL statement per borrow from the connection pool. It's a constant cycle of getConnection(borrow) ->execute 1 SQL->closeConnection(return) Why are we configured this way: Ultimately we need to fail fast to avoid our APP going into death by stuck threads when we lose connectivity to the backend databases for a short period of time. ( for example a short 10 second period where the DB(s) is/are unreachable from the app servers). -We want the evictor thread (Pool Cleaner) to run every 10 seconds -We are only using the Pool Cleaner to clobber busy connections that have been running 15 seconds or more NOTE: we rarely see SQL that takes longer than 2 seconds, most are sub 100ms and the avg is about 4ms -We are not testing connections period -We want our connections replaced with new fresh connections after 10 mins of use hence we set maxAge=600000 The connection pool settings work hand in hand with properties set on the SQL server JDBC driver in the URL: The driver will kill any SQL that takes longer than 10 seconds -When everything else is OK, then the connection itself is returned intact back to the connection pool -When the DB is unreachable, there is another timer cancelQueryTimeout set to 10 seconds which is a safety valve for cases where the message to the DB server to cancel the SQL gracefully does not make it to the DB. After the cancleQueryTimeout, the client side of the driver gives up. This is problematic because it actually returns a bad connection back to the connection pool. The combination of the connection pool settings to run the Pool cleaner every 10 seconds and removeAbandoned of 15 seconds should work together to remove any connection whose SQL execution time went past 10 seconds before the cancelQueryTimeout kicks in. This should prevent returning bad connections to the pool when bad things happen like a network switch going offline temporarily and dropping all packets between the app server hosting tomcat and the database server. NOTE that for the MSSQL server driver all their timeouts default to wait indefinitely. This is one of the prime reasons why even short outages can kill our APP with stuck threads. Our Web app hits tomcat about 2000 requests per minute and each request is going to fire on AVG ~6 SQL statements so we are running about 12K SQL hits per minute. If we lose connectivity to a database for even 15 seconds without a means to fail fast we could easily overcome the allotted number of threads which will create a "log jam" that can take a very long time to recover. 
I am sorry for all this lengthy background, but I thought it might help readers and potential responders to know a bit about why we do what we do in our connection pool. Now, on to the problem:

I was running a load test and simulating the temporary loss of connectivity to one or more of the databases used by the app, and I noticed a strange behavior that does not align with my understanding of how Tomcat JDBC removeAbandoned works. While most of the database connections coming from the pool did die shortly after the simulation started and "failed fast", there were a few connections that actually ran for up to 5 minutes before they terminated themselves. Given the configuration explained above, I would have expected no connection that was interacting with the database to live beyond 30 seconds (allowing that there could be some lag in the evictor thread timing, so the first run of the evictor thread might not catch every connection that had just reached the 15-second mark).

What I believe I saw in some cases was that the socketTimeout I set on the SQL Server JDBC driver, 300,000ms (5 minutes), kicked in and a read timeout was generated in the driver. This led to a bad connection being returned to the connection pool and caused errors until the connection(s) aged out due to maxAge.

Does anyone know why the Pool Cleaner would be able to run and NOT forcefully remove a connection from the Tomcat JDBC pool that has been busy longer than the configured removeAbandonedTimeout? I searched Google and this list on MARC for clues and found nothing. I also reviewed the source code of the Tomcat JDBC module and could not find anything indicating that it would be unable to clobber a running connection that was borrowed from the pool. Can anyone help me understand what I saw in my testing? Could this be a Tomcat connection pool bug?

-Regards-

Gerhardt Martin | Sr. Software Architect
Snap-on Business Solutions
4025 Kinross Lakes Parkway | Richfield, OH 44286
Office: 330.659.1315 | Cell: 330.242.2237
gerhardt.a.mar...@snapon.com