Hello,

I have faced a problem in my production server (gevent based); when a 
connection attempt is made and MySQL server does not respond (due to listen 
backlog full), the whole application hangs. This seems to be because 
SQLAlchemy QueuePool does not allow multiple connection attempts 
simultaneously. It is waiting for overflow count lock. I suggest that we 
allow multiple connection attempts at the same time as I don't see any side 
effects of doing so. Details follow.

Details of the problem:

python-sqlalchemy 0.7.4-1
python 2.7.3-0ubuntu2
python-gevent 1.0~b1-1
mysql-server  5.5.34-0ubuntu0.12.0

I opened a gevent backdoor connection to the hung server and created a test 
method and ran it.

def test():
  import pdb
  pdb.set_trace()
  import sqlalchemy
  import mysql.connector
  p = sqlalchemy.pool.manage(mysql.connector, pool_size=128)
  p.connect(host='myhost', port=3306, user='myuser', password='mypassword', 
buffered=True)

It ran with following trace (excerpt):

(Pdb) step
--Call--
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(931)connect()
-> def connect(self, *args, **kw):

[...]

(Pdb) 
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(210)connect()
-> return _ConnectionFairy(self).checkout()

[...]

(Pdb) 
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(371)__init__()
-> rec = self._connection_record = pool._do_get()

[...]

(Pdb) 
--Call--
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(671)_do_get()
-> def _do_get(self):

[...]

(Pdb) 
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(676)_do_get()
-> except sqla_queue.Empty:
(Pdb) 
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(677)_do_get()
-> if self._max_overflow > -1 and \
(Pdb) 
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(678)_do_get()
-> self._overflow >= self._max_overflow:
(Pdb) 
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(687)_do_get()
-> if self._overflow_lock is not None:
(Pdb) 
> /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py(688)_do_get()
-> self._overflow_lock.acquire()
(Pdb) 
> /usr/lib/pymodules/python2.7/gevent/coros.py(98)acquire()
-> def acquire(self, blocking=True, timeout=None):

Analysis:

Before making a connection attempt the overflow counter lock is obtained 
and it is being released only after the connection either succeeds or 
fails. In my case, a connection remained hung possibly because of a surge 
in new DB connections and SYN backlog overflew on the database server (I 
have since added a timeout and tuned my database server to have much higher 
backlog). While this connection didn't respond, any new connection attempt 
as seen in the above trace waited trying to acquire overflow lock. The 
whole application became in capable of serving requests. Cause is this code:

class QueuePool(Pool):
    def _do_get(self):

[...]

            if self._overflow_lock is not None:
                self._overflow_lock.acquire()

            if self._max_overflow > -1 and \
                        self._overflow >= self._max_overflow:
                if self._overflow_lock is not None:
                    self._overflow_lock.release()
                return self._do_get()

            try:
                con = self._create_connection()
                self._overflow += 1
            finally:
                if self._overflow_lock is not None:
                    self._overflow_lock.release()
            return con

Changeset 5f0a7bb cleaned up this code but does not seem to have changed 
the flow (behaviour should be the same on trunk). Since disabling the 
overflow with max_overflow = -1 does not use lock at all, this behaviour is 
possibly an oversight rather than intended behaviour.

Possible solution:

Since the overflow lock seems to be to only maintain overflow count, I 
suggest that we increment the counter *before* connection attempt, don't 
hold the lock during connection attempt and then decrement the counter in 
case of an error. If there is interest in doing this, I shall find time for 
a patch and possibly a test case.

Thank you,

-- 
Sunil Mohan Adapa

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to