Re: [DBCP] Connection just obtained from datasource is invalid

Erwin Hogeweg Wed, 22 Nov 2017 16:21:49 -0800

All -

I am following this thread with some interest because many moons ago I had a 
weird problem after upgrading from pax-jdbc-0.9.0 to pax-jdbc-1.1.0. 
Intermittently the DB connection pool would get into a crazy loop and continue 
to spin off new OS threads until the OS (Windows Server in this case) ran out 
of threads/handles, at which point the whole application crashed and burned.


I was never able to reliably reproduce this, and it only seemed to happen on 
Windows when many new connections were opened simultaneously. In the end we 
rolled back to 0.9.0 and life has been good ever since.

DB server is MySQL 5.7
Java 8

I have NO idea if this is related or not, but it might be… so I figured I would 
share the info.


Cheers,

Erwin


On Nov 22, 2017, at 19:00, Phil Steitz wrote:

On 11/22/17 2:43 PM, Shawn Heisey wrote:
On 11/22/2017 12:04 PM, Phil Steitz wrote:
On 11/22/17 9:43 AM, Shawn Heisey wrote:

I do have results from the isClosed method when the problem happens.
That method *does* return true.

That points to a Pool or DBCP bug, assuming you are sure that no
other thread has a reference to the PoolableConnection or some other
code path did not call close on it before you tested isClosed.  If
you are sure this is not happening, you should open a DBCP JIRA
(which may end up reassigned to pool).  Ideal would be to have a
test case that makes it happen.

I would absolutely love to have a test case that can reproduce it, but
since I haven't got any idea what the root of the problem is, I wouldn't
know how to write such a test case.

What I'd really like to do is be able to look over dbcp2 and pool2 code
to see if I can spot a problem, but I'm having a hard time following the
code.  I expected to find some kind of synchronization in the code
branching from getConnection() to prevent different threads from being
able to step on each other, but haven't seen any so far.  I can't tell
if this means I'm looking in the wrong place or not.  The object
inheritance is pretty extensive, so I'm having a hard time finding the
right place to look.

If it turns out that there is zero synchronization happening between the
idle eviction thread and the depths of the code for things like
getConnection, then I don't see how any kind of guarantee can be made.
So far the synchronization object in the eviction thread only seems to
pair with other parts of eviction, NOT with anything else in the library.

If the testXXX flags I've enabled do eliminate the problems I'm seeing,
that's awesome, and it's *A* solution, but I think I'm still running
into some kind of issue that needs to be fixed.  I just need to figure
out whether it's dbcp2, pool2, or something in my own environment.  I'm
willing to entertain the idea that it's my environment, but based on
everything that I understand about my own code and our database servers
(and I fully admit it's circumstantial evidence), it points to a problem
with the idle eviction thread.

I believe you when you say that the *intent* is for idle eviction to
never close/evict a connection that's been requested from the pool.  I
would like to verify whether the intent and what's actually implemented
are the same.  If they're not the same, then I would like to attempt a
patch.  I'm going to need help in figuring out exactly where I should be
looking in the code for dbcp2 and pool2.

If the problem is the evictor closing a connection and having that
connection delivered to a client, the problem is almost certainly in
pool.  The thread-safety of the pool in this regard is engineered in
DefaultPooledObject, which is the wrapper that pool manages and
delivers to DBCP.  When the evictor visits a PooledObject (in
GenericObjectPool#evict) it tries to start the eviction test on the
object by calling its startEvictionTest method.  This method is
synchronized on the DefaultPooledObject.  Look at the code in that
method.  It checks to make sure that the object is in fact idle in
the pool.  The other half of the protection here is in
GenericObjectPool#borrowObject, which is what PoolingDataSource
calls to get a connection.  That method tries to get a PooledObject
from the pool and before handing it out (or validating it), it calls
the PooledObject's allocate method.  Look at the code for that in
DefaultPooledObject.  That method (also synchronized on the
PooledObject) checks that the object is not under eviction and sets
its state to allocated.  That is the core sync protection that
*should* make it impossible for the evictor to do anything to an
object that has been handed out to a client.
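
The handshake described above can be sketched as a small state machine.
This is a simplified model written for illustration, not the pool2 source:
the class names PooledStateSketch and Wrapper are hypothetical, and the real
DefaultPooledObject tracks more states and bookkeeping than shown here.  The
point is only that both methods synchronize on the same wrapper and both
require the IDLE state, so at most one of the two sides can claim an object.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class PooledStateSketch {
    enum State { IDLE, ALLOCATED, EVICTION }

    // Hypothetical stand-in for the wrapper the pool manages.
    static final class Wrapper {
        private State state = State.IDLE;

        // Borrow path: succeeds only if the object is idle.
        synchronized boolean allocate() {
            if (state != State.IDLE) return false;
            state = State.ALLOCATED;
            return true;
        }

        // Evictor path: succeeds only if the object is idle.
        synchronized boolean startEvictionTest() {
            if (state != State.IDLE) return false;
            state = State.EVICTION;
            return true;
        }

        synchronized State state() { return state; }
    }

    // Races one borrower against one evictor on a fresh wrapper and
    // returns how many of them managed to claim it.
    static int raceOnce() {
        Wrapper w = new Wrapper();
        CountDownLatch go = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        Thread borrower = new Thread(() -> {
            awaitQuietly(go);
            if (w.allocate()) winners.incrementAndGet();
        });
        Thread evictor = new Thread(() -> {
            awaitQuietly(go);
            if (w.startEvictionTest()) winners.incrementAndGet();
        });
        borrower.start();
        evictor.start();
        go.countDown();          // release both threads together
        joinQuietly(borrower);
        joinQuietly(evictor);
        return winners.get();
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            if (raceOnce() != 1) throw new AssertionError("both sides claimed the object");
        }
        System.out.println("exactly one side claimed the object in every race");
    }
}
```

If a bug of the kind discussed in this thread exists, it would have to be in
a code path that bypasses this check or changes the state without holding
the wrapper's lock.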

The logical place to start to get a test case that shows this
protection failing is to just set up a pool with very aggressive
eviction config (very small idle object timeout), frequent eviction
runs and a lot of concurrent borrowing.  Make sure the factory's
destroy method does something to simulate what PCF
(PoolableConnectionFactory) does to mark the object as dead and see
if you get any corpses handed out to borrowers.  Also make sure that
there are enough idle instances in the pool for the evictor to visit.
For that, you probably want to vary the borrowing load.  You can set
up JMX to observe the pool stats to see how many are idle at a given
time or just log it using getNumIdle.  A quick look at the existing
pool2 test cases does
not show exactly that scenario covered, so it would be good to add
in any case.
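
The shape of such a test might look like the sketch below.  To keep it
runnable with only the JDK it drives a toy pool rather than pool2, so
everything in it is an assumption made for illustration: the names
EvictionStressSketch and Conn, the deque used as the idle list, and an
evictor that treats every idle object as expired (the extreme case of a very
small idle timeout).  A real test would swap the toy pool for a
GenericObjectPool whose factory marks destroyed objects as dead.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class EvictionStressSketch {
    enum State { IDLE, ALLOCATED, EVICTION }

    // Hypothetical stand-in for a pooled connection plus its pool wrapper.
    static final class Conn {
        final AtomicBoolean dead = new AtomicBoolean(false); // set by the simulated destroy
        private State state = State.IDLE;

        synchronized boolean allocate() {
            if (state != State.IDLE) return false;
            state = State.ALLOCATED;
            return true;
        }

        synchronized boolean startEvictionTest() {
            if (state != State.IDLE) return false;
            state = State.EVICTION;
            return true;
        }

        synchronized void deallocate() { state = State.IDLE; }
    }

    // Runs the scenario and returns how many dead objects borrowers received.
    static int run(int borrowers, int borrowsPerThread) {
        ConcurrentLinkedDeque<Conn> idle = new ConcurrentLinkedDeque<>();
        AtomicInteger corpses = new AtomicInteger();
        AtomicBoolean stop = new AtomicBoolean(false);

        // Aggressive evictor: visits the idle list continuously.
        Thread evictor = new Thread(() -> {
            while (!stop.get()) {
                for (Conn c : idle) {
                    if (c.startEvictionTest()) { // claims only objects that are idle
                        idle.remove(c);
                        c.dead.set(true);        // simulated factory destroy
                    }
                }
                Thread.yield();
            }
        });
        evictor.start();

        Thread[] workers = new Thread[borrowers];
        for (int i = 0; i < borrowers; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < borrowsPerThread; n++) {
                    Conn c;
                    do { // borrow: skip objects the evictor has already claimed
                        c = idle.pollFirst();
                        if (c == null) c = new Conn();
                    } while (!c.allocate());
                    if (c.dead.get()) corpses.incrementAndGet();
                    if (n % 3 == 0) Thread.yield(); // vary the borrowing load
                    if (c.dead.get()) corpses.incrementAndGet();
                    c.deallocate();
                    idle.addFirst(c);               // return to the pool
                }
            });
            workers[i].start();
        }

        try {
            for (Thread t : workers) t.join();
            stop.set(true);
            evictor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return corpses.get();
    }

    public static void main(String[] args) {
        System.out.println("corpses handed out: " + run(4, 20000));
    }
}
```

Borrowers check the dead flag both right after borrowing and again just
before returning, since the failure reported in this thread could in
principle show up at either point.  A nonzero count from the equivalent test
against pool2 would be the reproducible case a JIRA needs.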

Phil



Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: 
[email protected]<mailto:[email protected]>
For additional commands, e-mail: 
[email protected]<mailto:[email protected]>





Erwin Hogeweg
CTO
3690 Airport Road
Boca Raton, FL 33431
P. +1 (954) 556-6565
M. +1 (561) 306-7395
F. +1 (561) 948-2730
[Seecago]<http://www.seecago.com>
