All - I am following this thread with some interest because many moons ago I had a weird problem after upgrading from pax-jdbc-0.9.0 to pax-jdbc-1.1.0. Intermittently the DB connection pool would get into a crazy loop and keep spinning off new OS threads until the OS (Windows Server in this case) ran out of threads/handles, at which point the whole application crashed and burned.
I was never able to reliably reproduce this, and it only seemed to happen on Windows when many new connections were opened 'simultaneously'. In the end we rolled back to 0.9.0 and life has been good ever since.

DB server is MySQL 5.7, Java 8.

I have NO idea if this is related or not, but it might… so I figured I'd share the info.

Cheers,
Erwin

On Nov 22, 2017, at 19:00, Phil Steitz <[email protected]> wrote:

> On 11/22/17 2:43 PM, Shawn Heisey wrote:
>> On 11/22/2017 12:04 PM, Phil Steitz wrote:
>>> On 11/22/17 9:43 AM, Shawn Heisey wrote:
>>>> I do have results from the isClosed method when the problem happens. That method *does* return true.
>>>
>>> That points to a Pool or DBCP bug, assuming you are sure that no other thread has a reference to the PoolableConnection or some other code path did not call close on it before you tested isClosed. If you are sure this is not happening, you should open a DBCP JIRA (which may end up reassigned to pool). Ideal would be to have a test case that makes it happen.
>>
>> I would absolutely love to have a test case that can reproduce it, but since I haven't got any idea what the root of the problem is, I wouldn't know how to write such a test case.
>>
>> What I'd really like to do is be able to look over dbcp2 and pool2 code to see if I can spot a problem, but I'm having a hard time following the code. I expected to find some kind of synchronization in the code branching from getConnection() to prevent different threads from being able to step on each other, but haven't seen any so far. I can't tell if this means I'm looking in the wrong place or not. The object inheritance is pretty extensive, so I'm having a hard time finding the right place to look.
>>
>> If it turns out that there is zero synchronization happening between the idle eviction thread and the depths of the code for things like getConnection, then I don't see how any kind of guarantee can be made.
>> So far the synchronization object in the eviction thread only seems to pair with other parts of eviction, NOT with anything else in the library.
>>
>> If the testXXX flags I've enabled do eliminate the problems I'm seeing, that's awesome, and it's *A* solution, but I think I'm still running into some kind of issue that needs to be fixed. I just need to figure out whether it's dbcp2, pool2, or something in my own environment.
>>
>> I'm willing to entertain the idea that it's my environment, but based on everything that I understand about my own code and our database servers (and I fully admit it's circumstantial evidence), it points to a problem with the idle eviction thread.
>>
>> I believe you when you say that the *intent* is for idle eviction to never close/evict a connection that's been requested from the pool. I would like to verify whether the intent and what's actually implemented are the same. If they're not the same, then I would like to attempt a patch. I'm going to need help in figuring out exactly where I should be looking in the code for dbcp2 and pool2.
>
> If the problem is the evictor closing a connection and having that connection delivered to a client, the problem is almost certainly in pool. The thread-safety of the pool in this regard is engineered in DefaultPooledObject, which is the wrapper that pool manages and delivers to DBCP.
>
> When the evictor visits a PooledObject (in GenericObjectPool#evict) it tries to start the eviction test on the object by calling its startEvictionTest method. This method is synchronized on the DefaultPooledObject. Look at the code in that method. It checks to make sure that the object is in fact idle in the pool.
>
> The other half of the protection here is in GenericObjectPool#borrowObject, which is what PoolingDataSource calls to get a connection. That method tries to get a PooledObject from the pool and before handing it out (or validating it), it calls the PooledObject's allocate method. Look at the code for that in DefaultPooledObject.
> That method (also synchronized on the PooledObject) checks that the object is not under eviction and sets its state to allocated. That is the core sync protection that *should* make it impossible for the evictor to do anything to an object that has been handed out to a client.
>
> The logical place to start to get a test case that shows this protection failing is to just set up a pool with very aggressive eviction config (very small idle object timeout), frequent eviction runs and a lot of concurrent borrowing. Make sure the factory's destroy method does something to simulate what PCF does to mark the object as dead and see if you get any corpses handed out to borrowers. Also make sure that there are enough idle instances in the pool for the evictor to visit. For that, you probably want to vary the borrowing load. You can set up JMX to observe the pool stats to see how many are idle at a given time or just log it using getNumIdle. A quick look at the existing pool2 test cases does not show exactly that scenario covered, so it would be good to add in any case.
>
> Phil
>
>> Thanks,
>> Shawn
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

Erwin Hogeweg
CTO
3690 Airport Road
Boca Raton, FL 33431
P. +1 (954) 556-6565
M. +1 (561) 306-7395
F. +1 (561) 948-2730
Seecago <http://www.seecago.com>
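
For anyone who wants to poke at the handshake Phil describes (startEvictionTest on the evictor side, allocate on the borrower side) without pulling in pool2, here is a minimal stdlib-only sketch. It is a simplified model and NOT the commons-pool2 source: the Pooled class, its State enum, the toy idle deque and the run helper are all hypothetical names made up for illustration. A stand-in destroy marks the object as closed, and borrowers count any closed object ("corpse") they are handed.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical model of the evictor/borrower handshake.
// Names only mirror the discussion; this is not the library's code.
class EvictionHandshakeSketch {

    enum State { IDLE, ALLOCATED, EVICTION, DEAD }

    static final class Pooled {
        private State state = State.IDLE;
        volatile boolean closed = false; // what a factory's destroy would mark

        // Evictor side: succeeds only if the object is idle in the pool.
        synchronized boolean startEvictionTest() {
            if (state != State.IDLE) return false;
            state = State.EVICTION;
            return true;
        }

        // Borrower side: succeeds only if the object is idle,
        // which implies it is not under eviction.
        synchronized boolean allocate() {
            if (state != State.IDLE) return false;
            state = State.ALLOCATED;
            return true;
        }

        synchronized void destroy() { state = State.DEAD; closed = true; }

        synchronized void release() {
            if (state == State.ALLOCATED) state = State.IDLE;
        }
    }

    // Runs one aggressive evictor against several borrowers and returns
    // how many times a borrower observed a closed object it was holding.
    static int run(int borrowers, int rounds) {
        ConcurrentLinkedDeque<Pooled> idle = new ConcurrentLinkedDeque<>();
        AtomicInteger corpses = new AtomicInteger();
        AtomicBoolean done = new AtomicBoolean(false);

        Thread evictor = new Thread(() -> {
            while (!done.get()) {
                for (Pooled p : idle) {          // weakly consistent iteration
                    if (p.startEvictionTest()) { // won the race: safe to kill
                        idle.remove(p);
                        p.destroy();
                    }
                }
                Thread.yield();
            }
        });
        evictor.start();

        Thread[] workers = new Thread[borrowers];
        for (int i = 0; i < borrowers; i++) {
            workers[i] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    Pooled p = idle.pollFirst();
                    if (p == null) p = new Pooled();   // pool empty: create
                    if (!p.allocate()) continue;       // lost to evictor: drop
                    if (p.closed) corpses.incrementAndGet();
                    Thread.yield();                    // simulate using it
                    if (p.closed) corpses.incrementAndGet();
                    p.release();
                    idle.addLast(p);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) join(w);
        done.set(true);
        join(evictor);
        return corpses.get();
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("corpses handed out: " + run(8, 20_000));
    }
}
```

A nonzero return from run would mean the protection failed in this model. A real reproduction would swap the toy deque for a GenericObjectPool configured with a very small idle timeout and frequent eviction runs, and have the factory's destroy method mark the object dead, as suggested in the quoted message.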
