It's not a problem with the lock. I tried acquiring _pool.mutex in create_connection, but that didn't help.
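The hypothesis below ("created, checked in, and used at the same time") can be sketched with a toy pool. This is hypothetical illustration code, not SQLAlchemy's actual pool; queue.Queue stands in for the real pool queue:

```python
import queue

# Toy pool, NOT SQLAlchemy's implementation: it only shows how a
# connection that is both handed to the caller AND left visible in
# the pool queue ends up shared by two users.
class ToyPool:
    def __init__(self):
        self._q = queue.Queue()

    def create_connection(self):
        return object()  # stands in for a DBAPI connection

    def get(self):
        return self._q.get_nowait()  # protected by the Queue's mutex

def buggy_checkout(pool):
    # suspected bug: the new connection is checked in to the pool...
    conn = pool.create_connection()
    pool._q.put(conn)
    # ...but also handed straight back to the caller
    return conn

pool = ToyPool()
a = buggy_checkout(pool)  # "coroutine A" creates and starts using conn
b = pool.get()            # "coroutine B" pulls the same conn from the pool
assert a is b             # two coroutines now share one connection
```

If checkout behaved like this, the logs below (same connection object 0x903f8c0 seen by two tids, with "Connections in pool" going 1 -> 0) would be exactly what you'd expect.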
Maybe we create the connection and put it into the pool, but then use it without checking it out?

On Tuesday, May 8, 2018 at 10:18:52 PM UTC+8, jiajun...@gmail.com wrote:
>
> I added logs in sqlalchemy/pool.py, in QueuePool.checkout:
>
>     @classmethod
>     def checkout(cls, pool):
>         rec = pool._do_get()
>         try:
>             dbapi_connection = rec.get_connection()
>             pool.logger.debug("jjs get conn success %s", dbapi_connection)
>         except:
>             pool.logger.debug("jjs get error in get_connection")
>             with util.safe_reraise():
>                 rec.checkin()
>         echo = pool._should_log_debug()
>         fairy = _ConnectionFairy(dbapi_connection, rec, echo)
>         rec.fairy_ref = weakref.ref(
>             fairy,
>             lambda ref: _finalize_fairy and
>             _finalize_fairy(
>                 dbapi_connection,
>                 rec, pool, ref, echo)
>         )
>         _refs.add(rec)
>         if echo:
>             pool.logger.debug("jjs conn %s, pool status %s",
>                               dbapi_connection, pool.status())
>             pool.logger.debug("Connection %r checked out from pool",
>                               dbapi_connection)
>         return fairy
>
> Then I got the logs below.
>
> Coroutine A:
>
> 2018-05-08T21:39:59.055+08:00 localhost nova-conductor INFO [pid:55528] [MainThread] [tid:197123760] [log.py:109 info] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait is False, timeout 30
> 2018-05-08T21:39:59.078+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new connection <connection object at 0x903f8c0; closed: 0>
> 2018-05-08T21:39:59.079+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn success <connection object at 0x903f8c0; closed: 0>
> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn <connection object at 0x903f8c0; closed: 0>, pool status Pool size: 10  Connections in pool: 1  Current Overflow: 36  Max Overflow: 40  Current Checked out connections: 45
> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:197123760] [log.py:104 debug] [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection <connection object at 0x903f8c0; closed: 0> checked out from pool
>
> Coroutine B:
>
> 2018-05-08T21:39:59.212+08:00 localhost nova-conductor INFO [pid:55528] [MainThread] [tid:151519312] [log.py:109 info] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait is False, timeout 30
> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn success <connection object at 0x903f8c0; closed: 0>
> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn <connection object at 0x903f8c0; closed: 0>, pool status Pool size: 10  Connections in pool: 0  Current Overflow: 36  Max Overflow: 40  Current Checked out connections: 46
> 2018-05-08T21:39:59.214+08:00 localhost nova-conductor DEBUG [pid:55528] [MainThread] [tid:151519312] [log.py:104 debug] [req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection <connection object at 0x903f8c0; closed: 0> checked out from pool
>
> It seems that when we create_connection in QueuePool._do_get, we may
> not be holding the lock _pool.mutex (~Queue.mutex).
>
> On Monday, May 7, 2018 at 8:41:14 PM UTC+8, Mike Bayer wrote:
>>
>> On Mon, May 7, 2018, 7:01 AM <jiajun...@gmail.com> wrote:
>>
>>> We added a coroutine_id in psycopg2, and found that two coroutines
>>> use the same connection before the pthread_mutex_lock is released.
>>>
>>> Maybe something in the connection pool goes wrong?
>>> [pid] [coroutine_id] msg
>>
>> This seems unlikely. OpenStack uses the SQLAlchemy connection pool
>> with eventlet patching in conjunction with the MySQL drivers, and
>> nothing like that happens.
>> The biggest issue SQLAlchemy had with greenlets is that a timed-out
>> greenlet would throw GreenletExit and cause a dirty connection to go
>> back into the pool; that was fixed in 1.2.
>
>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
>>>
>>> Below are the logs grepped by coroutine id:
>>>
>>> [49174] [0xa5db730]finish send query, before psyco_wait, conn 0x94122f0
>>> [49174] [0xa5db730]before have_wait_callback conn 0x94122f0
>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs conn 0x94122f0, cb 0x23ad320
>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE 0x94122f0
>>> [49174] [0xa5db730]conn_poll: poll writing
>>> [49174] [0xa5db730]conn_poll: async_status -> ASYNC_READ
>>>
>>> -----
>>>
>>> [49174] [0xa5db4b0]psyco_conn_cursor: new unnamed cursor for connection at 0x8de2d30
>>> [49174] [0xa5db4b0]cursor_setup: init cursor object at 0xa6c2650
>>> [49174] [0xa5db4b0]cursor_setup: parameters: name = (null), conn = 0x8de2d30
>>> [49174] [0xa5db4b0]cursor_setup: good cursor object at 0xa6c2650, refcnt = 1
>>> [49174] [0xa5db4b0]psyco_conn_cursor: new cursor at 0xa6c2650: refcnt = 1
>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn 0x94122f0, async_cursor 0x881ac00
>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0, autocommit = 0, status = 2
>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner 49174 __lock 1
>>>
>>> On Saturday, April 28, 2018 at 4:07:34 PM UTC+8, jiajun...@gmail.com wrote:
>>>>
>>>> We reproduced this problem, added logs in psycopg2, found a
>>>> confusing thing, and reported it to psycopg2 [1].
>>>>
>>>> It seems the sqlalchemy & psycopg2 call flow is:
>>>> 1. from sqlalchemy to psycopg2: pq_execute/pq_commit
>>>> 2. from psycopg2 to sqlalchemy: PyWeakref_NewRef(conn)
>>>> 3. an exception is raised in sqlalchemy, which calls do_rollback
>>>> 4. from sqlalchemy to psycopg2: pq_abort
>>>> 5. psycopg2 deadlocks on pthread_mutex_lock
>>>>
>>>> What we did to reproduce it: stop the master pg-server and promote
>>>> the slave pg-server to master, moving the FIP from the old master
>>>> to the slave. At the same time, let nova-conductor issue quite a
>>>> lot of DB query requests.
>>>>
>>>> [1] https://github.com/psycopg/psycopg2/issues/703
>>>>
>>>> On Monday, April 23, 2018 at 9:45:04 PM UTC+8, Mike Bayer wrote:
>>>>>
>>>>> On Mon, Apr 23, 2018 at 9:03 AM, <jiajun...@gmail.com> wrote:
>>>>> > Sorry for replying on this topic.
>>>>> >
>>>>> > We recently hit the same problem in our production environment.
>>>>> >
>>>>> > I found a patch in another lib [1], where they added
>>>>> > conn.close() in the exception handler for
>>>>> > psycopg2.ProgrammingError.
>>>>> >
>>>>> > Shall we do the same in [2]?
>>>>>
>>>>> SQLAlchemy does things much more carefully than that: we parse the
>>>>> error message for specific ones that correspond to "connection is
>>>>> no longer usable". We call these "is_disconnect", but it can be
>>>>> any invalidating condition.
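For illustration, the message-based detection Mike describes can be sketched like this. The pattern list and function name below are made up for the example; this is not SQLAlchemy's actual is_disconnect list:

```python
import re

# Illustrative only: a few error-message patterns that typically mean
# "this connection is no longer usable" (not SQLAlchemy's real list).
_DISCONNECT_PATTERNS = [
    re.compile(r"connection already closed"),
    re.compile(r"server closed the connection unexpectedly"),
    re.compile(r"asynchronous query is underway"),
]

def looks_like_disconnect(message: str) -> bool:
    """Return True if the DBAPI error message indicates the connection
    should be invalidated rather than reused."""
    return any(p.search(message) for p in _DISCONNECT_PATTERNS)

print(looks_like_disconnect(
    "execute cannot be used while an asynchronous query is underway"))  # True
print(looks_like_disconnect("syntax error at or near SELECT"))          # False
```

A filter like this is only as good as its pattern list, which is why SQLAlchemy and oslo.db maintain the real lists per-driver.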
>>>>> You can make these yourself, and they can also be made part of
>>>>> oslo.db, using the handle_error event:
>>>>>
>>>>> http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=handle_error#sqlalchemy.events.ConnectionEvents.handle_error
>>>>>
>>>>> Within oslo.db you would want to propose a change here:
>>>>>
>>>>> https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L387
>>>>>
>>>>> > [1] https://github.com/aio-libs/aiopg/pull/415/files?diff=split
>>>>> >
>>>>> > [2] https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L1289
>>>>> >
>>>>> > On Monday, November 13, 2017 at 10:44:31 AM UTC+8, JinRong Cai wrote:
>>>>> >>
>>>>> >> Hi Michael,
>>>>> >>
>>>>> >> I am using OpenStack with PostgreSQL, with the sqlalchemy and
>>>>> >> oslo_db modules, and there are some problems after my pg
>>>>> >> database switched over.
>>>>> >>
>>>>> >> Here is my switchover process:
>>>>> >> 1. nova-conductor (a Python application) is running with DB
>>>>> >>    connection strings pointing to the VIP, which is in the
>>>>> >>    primary site (A) of pg.
>>>>> >> 2. Switch the VIP from the primary (A) to the new primary (B).
>>>>> >> 3. Switch over pg: shut down the primary (A), promote the
>>>>> >>    standby (B) to new primary.
>>>>> >> 4. nova-conductor keeps running through the whole process.
>>>>> >>
>>>>> >> After some seconds, I found some nova-conductor processes hang
>>>>> >> with status futex_wait_queue_me, and the status of the query in
>>>>> >> the DB is "idle in transaction": the transaction was neither
>>>>> >> committed nor rolled back!
>>>>> >> I think disconnection is handled in oslo_db, which sends a
>>>>> >> ping (SELECT 1) to the DB. If the DB was switched over, the
>>>>> >> connection in the pool would be marked invalid and reconnect
>>>>> >> on the next checkout.
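That ping-and-invalidate behavior can be sketched roughly as below. This is a toy, not oslo.db's actual code: sqlite3 stands in for psycopg2, and the function names are made up for the example:

```python
import sqlite3

def make_conn():
    # stands in for a real DBAPI connect(), e.g. psycopg2.connect()
    return sqlite3.connect(":memory:")

def checkout(conn):
    # Before handing a pooled connection out, ping it with SELECT 1;
    # if the ping fails, invalidate the connection and reconnect.
    try:
        conn.execute("SELECT 1")
        return conn
    except sqlite3.Error:
        try:
            conn.close()  # discard the invalidated connection
        except sqlite3.Error:
            pass
        return make_conn()  # reconnect with a fresh one

healthy = make_conn()
assert checkout(healthy) is healthy   # live connection passes the ping

dead = make_conn()
dead.close()                          # simulate a dropped connection
assert checkout(dead) is not dead     # dead one is replaced on checkout
```

The hang described in this thread happens when the SELECT 1 ping itself lands on a connection whose async state is corrupted, so even the health check raises "execute cannot be used while an asynchronous query is underway."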
>>>>> >> ### Error messages from nova-conductor
>>>>> >>
>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [exc_filters.py:330 _raise_for_remaining_DBAPIError] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] DBAPIError exception wrapped from (psycopg2.ProgrammingError) execute cannot be used while an asynchronous query is underway [SQL: 'SELECT 1']
>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     context)
>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters   File "/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters ProgrammingError: execute cannot be used while an asynchronous query is underway
>>>>> >> 36365 ERROR oslo_db.sqlalchemy.exc_filters
>>>>> >> localhost nova-conductor ERROR [pid:36365] [MainThread] [tid:122397712] [log.py:122 error] [req-2bd8a290-e17b-4178-80a6-4b36d5793d85] Error closing cursor
>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool Traceback (most recent call last):
>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool   File "/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226, in _safe_close_cursor
>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool     cursor.close()
>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool ProgrammingError: close cannot be used while an asynchronous query is underway
>>>>> >> 36365 ERROR sqlalchemy.pool.QueuePool
>>>>> >>
>>>>> >> ### ps status of nova-conductor
>>>>> >> POD6-Mongodb03:/var/log/uvp-getosstat/statistics20171106101500log # cat /proc/33316/stack
>>>>> >> [<ffffffff810e4c24>] futex_wait_queue_me+0xc4/0x120
>>>>> >> [<ffffffff810e5799>] futex_wait+0x179/0x280
>>>>> >> [<ffffffff810e782e>] do_futex+0xfe/0x5b0
>>>>> >> [<ffffffff810e7d60>] SyS_futex+0x80/0x180
>>>>> >> [<ffffffff81654e09>] system_call_fastpath+0x16/0x1b
>>>>> >> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>> >>
>>>>> >> ### Stack of the nova-conductor process
>>>>> >>
>>>>> >> POD6-Mongodb03:/tmp # pstack 33316
>>>>> >> #0 0x00002b8449e35f4d in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>> >> #1 0x00002b8449e31d02 in _L_lock_791 () from /lib64/libpthread.so.0
>>>>> >> #2 0x00002b8449e31c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>> >> #3 0x00002b84554c44ab in pq_abort () from /python2.7/site-packages/psycopg2/_psycopg.so
>>>>> >> #4 0x00002b84554c955e in psyco_conn_rollback () from /python2.7/site-packages/psycopg2/_psycopg.so
>>>>> >> #5 0x00002b8449b42b50 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>>>> >> #6 0x00002b8449b42ad0 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>>>> >>
>>>>> >> psycopg2 was trying to close the cursor and acquire the mutex
>>>>> >> via pthread_mutex_lock, but it seems the cursor was in use by
>>>>> >> another session.
>>>>> >>
>>>>> >> Questions:
>>>>> >>
>>>>> >> 1. What does the error "ProgrammingError: close cannot be used
>>>>> >> while an asynchronous query is underway" mean?
>>>>> >> AFAIK, this is raised by psycopg2, and it means an asynchronous
>>>>> >> query was already executing on the connection.
>>>>> >> But I think sqlalchemy is thread safe here, since it was
>>>>> >> patched by eventlet; see details in
>>>>> >> eventlet/support/psycopg2_patcher.py.
>>>>> >> And we can see different green thread numbers in the log, e.g.:
>>>>> >> [pid:36365] [MainThread] [tid:122397712]
>>>>> >> [pid:36365] [MainThread] [tid:122397815]
>>>>> >> So I guess the connection pool within one process is safe.
>>>>> >>
>>>>> >> 2. nova-conductor is a Python service that forks several child
>>>>> >> processes:
>>>>> >> ps -elf | grep -i nova-conductor
>>>>> >> 30878 1 pool_s /usr/bin/nova-conductor
>>>>> >> 36364 1 ep_pol /usr/bin/nova-conductor
>>>>> >> 36365 1 futex_ /usr/bin/nova-conductor
>>>>> >> 36366 1 ep_pol /usr/bin/nova-conductor
>>>>> >> 36367 1 ep_pol /usr/bin/nova-conductor
>>>>> >> 36368 1 ep_pol /usr/bin/nova-conductor
>>>>> >>
>>>>> >> If nova-conductor is started with only one child, the problem
>>>>> >> does not happen. Does this mean the connection/engine CANNOT be
>>>>> >> shared in these child processes?
>>>>> >>
>>>>> >> Thanks.
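On question 2 above: DBAPI connections generally cannot be shared across a fork. The usual pattern, sketched here with SQLite purely so the example runs anywhere (the real deployment would be postgresql+psycopg2), is for each forked child to call engine.dispose() so it builds its own pool instead of reusing the parent's sockets. This is a sketch of the general pattern, not nova-conductor's actual code:

```python
import os
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))   # parent warms up the pool

pid = os.fork()
if pid == 0:
    # Child: never reuse pooled DBAPI connections inherited from the
    # parent; dispose the pool so it reconnects lazily, per-process.
    engine.dispose()
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
    os._exit(0)

_, status = os.waitpid(pid, 0)
assert os.WEXITSTATUS(status) == 0   # child completed on its own pool
```

With psycopg2, a connection shared across processes means two processes touching the same socket and the same internal connection mutex, which is consistent with the pthread_mutex_lock hang in the pstack output above.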
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.