On 11/22/2017 5:00 PM, Phil Steitz wrote:
> If the problem is the evictor closing a connection and having that
> connection delivered to a client, the problem is almost certainly in
> pool.  The thread-safety of the pool in this regard is engineered in
> DefaultPooledObject, which is the wrapper that pool manages and
> delivers to DBCP.  When the evictor visits a PooledObject (in
> GenericObjectPool#evict) it tries to start the eviction test on the
> object by calling its startEvictionTest method.  This method is
> synchronized on the DefaultPooledObject.  Look at the code in that
> method.  It checks to make sure that the object is in fact idle in
> the pool.  The other half of the protection here is in
> GenericObjectPool#borrowObject, which is what PoolingDataSource
> calls to get a connection.  That method tries to get a PooledObject
> from the pool and before handing it out (or validating it), it calls
> the PooledObject's allocate method.  Look at the code for that in
> DefaultPooledObject.  That method (also synchronized on the
> PooledObject) checks that the object is not under eviction and sets
> its state to allocated.  That is the core sync protection that
> *should* make it impossible for the evictor to do anything to an
> object that has been handed out to a client.

I see the synchronization you're talking about here.  It appears that
all of the critical methods in DefaultPooledObject are synchronized (on
the object).
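
To check my understanding of that handshake, here is a minimal sketch of it.
This is a simplified model, not the commons-pool source: the class name
PooledStateSketch, the Wrapper class, and the three-value State enum are
made up for illustration, and the real DefaultPooledObject tracks more
states than this.  The point is only that when both methods synchronize on
the same object and both require the idle state, a borrower and the evictor
cannot both claim the same object.

```java
// Simplified model of the borrow/evict handshake. NOT the commons-pool code;
// all names here are hypothetical and the state set is reduced to three values.
final class PooledStateSketch {
    enum State { IDLE, ALLOCATED, EVICTION }

    static final class Wrapper {
        private State state = State.IDLE;

        // Borrow path: succeeds only if the object is still idle.
        synchronized boolean allocate() {
            if (state == State.IDLE) {
                state = State.ALLOCATED;
                return true;
            }
            return false;
        }

        // Evictor path: succeeds only if the object is still idle.
        synchronized boolean startEvictionTest() {
            if (state == State.IDLE) {
                state = State.EVICTION;
                return true;
            }
            return false;
        }

        synchronized State getState() {
            return state;
        }
    }

    // Races one borrower against one evictor and returns how many of them won.
    static int raceOnce() throws InterruptedException {
        Wrapper w = new Wrapper();
        boolean[] won = new boolean[2];
        Thread borrower = new Thread(() -> won[0] = w.allocate());
        Thread evictor = new Thread(() -> won[1] = w.startEvictionTest());
        borrower.start();
        evictor.start();
        borrower.join();   // join() makes the writes to won[] visible here
        evictor.join();
        return (won[0] ? 1 : 0) + (won[1] ? 1 : 0);
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 10_000; i++) {
            if (raceOnce() != 1) {
                throw new AssertionError("expected exactly one winner");
            }
        }
        System.out.println("exactly one winner in every race");
    }
}
```

In this model there is no interleaving where both calls return true, which
is why the race-condition theory looks unlikely to me as long as every
borrow and every eviction really goes through the same wrapper object.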

If you're absolutely certain that DefaultPooledObject is the wrapper
behind every pool implementation my code is using, the protection looks
complete to me.  So I'm really curious as to why the connection is getting
closed.  I have seen the problem only minutes after restarting my
program, so it seems unlikely that the server side is closing the
connection, since the timeout for that is 8 hours.

I did add the code a while back to test on create, borrow, return, and
while idle, but it turns out that I hadn't actually pulled it down to
the test server and recompiled.  That is now done, so we'll see if that
makes any difference.

If testing the connection on pool actions does make a difference, then
what is your speculation about what was happening when I ran into the
closed connection only minutes after restart, and would it be worthy of
an issue in Jira?  The only theory I had was a race condition between
eviction and borrowing, but unless there's something amiss in how all
the object inheritance works out, it looks like that's probably not it. 
Some kind of issue with the TCP stack in Linux (either on the machines
running my code or the MySQL server) is the only other idea I can think
of.  Or maybe a hardware/firmware issue, since it's likely that at least
one of the NICs involved is doing TCP offload.  I think that virtually
every NIC in our infrastructure has that feature and that Linux enables it.

Thanks,
Shawn

