On Dec 6, 2013, at 10:16 AM, Sunil Adapa <su...@innopark.in> wrote:

> Hello,
> 
> I have faced a problem in my production server (gevent based); when a 
> connection attempt is made and MySQL server does not respond (due to listen 
> backlog full), the whole application hangs. This seems to be because 
> SQLAlchemy QueuePool does not allow multiple connection attempts 
> simultaneously. It is waiting for overflow count lock. I suggest that we 
> allow multiple connection attempts at the same time as I don't see any side 
> effects of doing so. Details follow.

OK, I see this is with gevent - while I like the idea of gevent, I’m not deeply 
familiar with best practices for it.  The QueuePool specifically uses 
thread-based locks to achieve its work.  I can’t comment on what modifications 
might be needed to it in order to work with gevent’s model, but overall I’d 
suggest an entirely different pool implementation optimized for gevent.
When I spent some time trying out gevent I noticed that QueuePool might have 
been having problems, and this is not surprising.

For starters, I’d probably use NullPool with a gevent-based application, if 
there are in fact gevent-specific issues occurring.

> 
> Analysis:
> 
> Before making a connection attempt the overflow counter lock is obtained and 
> it is released only after the connection either succeeds or fails. In 
> my case, a connection attempt hung, possibly because a surge in new DB 
> connections overflowed the SYN backlog on the database server (I have since 
> added a timeout and tuned my database server to have a much higher backlog). 
> While this connection didn't respond, any new connection attempt, as seen in 
> the above trace, waited trying to acquire the overflow lock. The whole 
> application became incapable of serving requests. The cause is this code:
> 
> class QueuePool(Pool):
>     def _do_get(self):
> 
> [...]
> 
>             if self._overflow_lock is not None:
>                 self._overflow_lock.acquire()
> 
>             if self._max_overflow > -1 and \
>                         self._overflow >= self._max_overflow:
>                 if self._overflow_lock is not None:
>                     self._overflow_lock.release()
>                 return self._do_get()
> 
>             try:
>                 con = self._create_connection()
>                 self._overflow += 1
>             finally:
>                 if self._overflow_lock is not None:
>                     self._overflow_lock.release()
>             return con
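
To illustrate the analysis above, here is a minimal sketch, using plain threads rather than gevent, of why a lock held across a hung connection attempt stalls every other caller. The names `slow_connect` and `get_connection_holding_lock` are hypothetical stand-ins, not SQLAlchemy functions, and the `Event` objects only simulate a server that has not responded yet.

```python
import threading

lock = threading.Lock()                # stands in for the overflow lock
connect_started = threading.Event()    # set once the "connect" is in progress
server_responds = threading.Event()    # set when the simulated server answers


def slow_connect():
    """Hypothetical connection attempt that hangs until the server responds."""
    connect_started.set()
    server_responds.wait(timeout=5)
    return "connection"


def get_connection_holding_lock():
    """Same shape as the quoted code: the lock is held for the whole connect."""
    with lock:
        return slow_connect()


worker = threading.Thread(target=get_connection_holding_lock)
worker.start()
connect_started.wait(timeout=5)

# A second caller arrives while the first connect is still hung.
second_caller_got_lock = lock.acquire(blocking=False)
if second_caller_got_lock:
    lock.release()

# Let the simulated server answer so the first caller can finish.
server_responds.set()
worker.join()
```

In this sketch `second_caller_got_lock` ends up `False`: until the first attempt returns or fails, nobody else can even check the overflow count.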

> 
> Changeset 5f0a7bb cleaned up this code but does not seem to have changed the 
> flow (behaviour should be the same on trunk). Since disabling the overflow 
> with max_overflow = -1 does not use the lock at all, this behaviour is 
> possibly an oversight rather than intended.

Noting that I haven’t deeply gotten into this code at the moment, overall I’m 
confused about “the application became incapable of serving requests” - if the 
QueuePool serves out as many connections as it’s supposed to, it’s supposed to 
block all callers at that point.  If you set max_overflow to -1, then there 
is no overflow_lock present at all, it’s set to None in the constructor.  
Otherwise, blocking on the call is what it’s supposed to do, in a traditionally 
threaded application.   If when using gevent this means that other workers are 
blocked because the whole thing expects any kind of waiting to be handled 
“async style”, then that suggests we need a totally different approach for 
gevent.

> Since the overflow lock seems to exist only to maintain the overflow count, I 
> suggest that we increment the counter *before* the connection attempt, don't 
> hold the lock during the connection attempt, and then decrement the counter 
> in case of an error. If there is interest in doing this, I shall find time 
> for a patch and possibly a test case.

How would that work with a traditionally threaded application?  My program 
goes to get a connection, the QueuePool says there’s none available yet and I 
should wait, then the call returns with…what, if it isn’t waiting?  I 
apologize that I have only a fuzzy view of how things work with gevent, and at 
this time of the morning I’m probably not engaging the traditional threading 
model in my head so well either.

