

On Mon, May 14, 2018 at 9:38 PM,  <jiajunsu....@gmail.com> wrote:
> The tid is the coroutine id; we get it via id(greenlet.getcurrent()) in each
> coroutine (green thread).
>
> We run nova-conductor with monkey_patch, the same as [1],
> eventlet.monkey_patch(os=False)
>
> Our nova-conductor is running with multi workers.
>
> It's confusing that _finalize_fairy is called twice but only logs once at
> the entrance; that's why I guess it's called by the weakref's callback.
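For reference, a minimal sketch of how a "tid" like this can be attached to
log records, assuming eventlet/greenlet and the stdlib logging module; the
filter and format string below are illustrative, not the actual nova/oslo.log
code:

import logging

import greenlet


class GreenletIdFilter(logging.Filter):
    def filter(self, record):
        # id() of the current greenlet, analogous to the "tid" in these logs
        record.tid = id(greenlet.getcurrent())
        return True


handler = logging.StreamHandler()
handler.addFilter(GreenletIdFilter())
handler.setFormatter(logging.Formatter("[tid:%(tid)s] %(message)s"))

log = logging.getLogger("pool_trace")
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.debug("connection checked out")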

For 130446032, which is the tid you mentioned, _finalize_fairy is only
called once in your log:

$ cat nc_5679.log | grep -e "130446032.*jjs _finalize"
2018-05-14T11:11:39.955+08:00 localhost nova-conductor DEBUG
[pid:5679] [MainThread] [tid:130446032] [log.py:104 debug]
[req-841bb432-7cd2-4fa3-9def-559a34b05a21] jjs _finalize_fairy args
conn None, rec.conn None, rec, <sqlalchemy.pool._ConnectionRecord
object at 0x70cdb10>. ref None, fairy
<sqlalchemy.pool._ConnectionFairy object at 0x8a87390> debug

Also, the log shows that _finalize_fairy is never called as a result of a
"del" - that path would show the weakref "ref" as non-None:

$ cat nc_5679.log | grep -e "FINALIZE FAIRY.*REF:.*" | grep -v "REF: None"
$
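To illustrate the distinction: a weakref callback receives the now-dead
weakref object, so the "ref" argument is non-None on that path, whereas a
direct call (the normal checkin path) passes ref=None.  A self-contained
sketch of that stdlib behavior, not SQLAlchemy's actual pool code:

import weakref


class Fairy(object):
    """Stand-in for a pooled-connection proxy."""


def finalize(ref=None):
    # ref is None when called directly; it is the dead weakref object
    # when invoked as a garbage-collection callback.
    print("finalize called, ref = %r" % (ref,))


fairy = Fairy()
fairy_ref = weakref.ref(fairy, finalize)

finalize()   # direct call: prints ref = None
del fairy    # last strong reference dropped: callback fires with the weakref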

I would assume that by "_finalize_fairy called twice" you're referring to
the fact that you see the "exception during reset or similar" message.
But AFAICT that message cannot be emitted without the function being
entered, and without a corresponding "FINALIZE FAIRY" log line preceding
it, unless something is going wrong with the logging, or the interpreter
is somehow not functioning correctly (I wonder if eventlet ever had any
such issues).  I'm struggling to come up with any other possibility.
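One way to rule the logging in or out would be to tag every entry into the
function with the current greenlet id, e.g. by wrapping it at import time.
A rough sketch, assuming a 1.0-era SQLAlchemy where the function lives at
sqlalchemy.pool._finalize_fairy (verify the location against your copy);
weakrefs created before the patch may still point at the original:

import functools
import logging

import greenlet
from sqlalchemy import pool as sa_pool

_orig_finalize_fairy = sa_pool._finalize_fairy

@functools.wraps(_orig_finalize_fairy)
def _traced_finalize_fairy(*args, **kwargs):
    # Log the greenlet id on every entry, whether the call came from a
    # direct checkin or from a weakref callback.
    logging.getLogger("pool_trace").debug(
        "_finalize_fairy entered, greenlet id %s, args %r",
        id(greenlet.getcurrent()), args)
    return _orig_finalize_fairy(*args, **kwargs)

sa_pool._finalize_fairy = _traced_finalize_fairy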

One weird thing I'm trying to figure out is that I see a connection
being returned to the pool normally, but then nothing is logged about
it after that:

2018-05-14T11:11:21.183+08:00 localhost nova-conductor DEBUG
[pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
[req-19a57f61-bc64-4575-8253-00b5bfe62a74] Connection <connection
object at 0x7f419c0; closed: 0> being returned to pool,
connection_record.connection <connection object at 0x7f419c0; closed:
0> debug


The next log message from tid:130446352 should be "jjs conn reset".
But that never happens.  The next time we see that tid is:

2018-05-14T11:11:21.549+08:00 localhost nova-conductor DEBUG
[pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
[req-0ef61146-5cb6-4f46-9788-406b6d376843] jjs
_ConnectionFairy._checkout fariy None debug
2018-05-14T11:11:21.549+08:00 localhost nova-conductor DEBUG
[pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
[req-0ef61146-5cb6-4f46-9788-406b6d376843] jjs before pool._do_get
pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
0x7777e10>, <sqlalchemy.pool._ConnectionRecord object at 0x75fb7d0>,
<sqlalchemy.pool._ConnectionRecord object at 0x778d810>,
<sqlalchemy.pool._ConnectionRecord object at 0x7600610>,
<sqlalchemy.pool._ConnectionRecord object at 0x6ea7e10>,
<sqlalchemy.pool._ConnectionRecord object at 0x7798050>,
<sqlalchemy.pool._ConnectionRecord object at 0x796d110>,
<sqlalchemy.pool._ConnectionRecord object at 0x7798d50>,
<sqlalchemy.pool._ConnectionRecord object at 0x75f5710>]) debug

That is, it's checking out a new connection?  If it were blocking on
trying to reset the connection, we wouldn't see that tid again.  I
wonder if the greenlet was interrupted or something.  I'm finding
various impossibilities like this and am unable to come up with any
theory other than that the interpreter is somehow not functioning
correctly, or the logging here is somehow incomplete.
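On the "greenlet was interrupted" idea: one possible mechanism, offered only
as a guess, is a timeout or kill raised inside the greenlet while it is
blocked in the driver, so it never reaches the reset/checkin code and is
later rescheduled for a fresh request.  A tiny eventlet illustration of that
kind of interruption (not nova code):

import eventlet


def worker():
    try:
        with eventlet.Timeout(0.1):
            eventlet.sleep(10)           # stands in for a blocking DB call
            print("reset/checkin done")  # never reached if the timeout fires
    except eventlet.Timeout:
        print("interrupted before the reset could complete")


gt = eventlet.spawn(worker)
gt.wait()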

I notice you're on a very old version of SQLAlchemy (three years old).
You really should be on at least the 1.1 series, because for greenlets
you need this change:
http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#change-3803








>
> [1]https://github.com/openstack/nova/blob/master/nova/cmd/__init__.py
>
>
> On Monday, May 14, 2018 at 11:59:43 PM UTC+8, Mike Bayer wrote:
>>
>> I notice in this log file there are multiple "tid" values, which I assume
>> to be thread identifiers?  E.g. OS-level threads?  Are you using
>> eventlet or not?  Are you monkeypatching SQLAlchemy and then using
>> multiple OS-level threads at the same time?  Because that likely
>> doesn't work.
>>
>> Particularly, the log here illustrates an impossible codepath.
>>
>> We enter _finalize_fairy with connection as None:
>>
>> 2018-05-14T11:11:39.961+08:00 localhost nova-conductor DEBUG
>> [pid:5679] [MainThread] [tid:130446352] [log.py:104 debug]
>> [req-74ef6531-2d92-40ea-a524-f54e11edab06] FINALIZE FAIRY: DB CONN:
>> None, FAIRY: <sqlalchemy.pool._ConnectionFairy object at 0x87dd3d0>,
>> REF: None  CONNECTION_REC: <sqlalchemy.pool._ConnectionRecord object
>> at 0x776fe90> CONNECTION_REC REF <weakref at 0x7c5cdb8; to
>> '_ConnectionFairy' at 0x87dd3d0> debug
>>
>> This seems to make it through to the checkin at the bottom.
>>
>> Then, we see this:
>>
>> 2018-05-14T11:11:39.968+08:00 localhost nova-conductor ERROR
>> [pid:5679] [MainThread] [tid:130446032] [log.py:122 error]
>> [req-841bb432-7cd2-4fa3-9def-559a34b05a21] Exception during reset or
>> similar
>>
>> The only place that message is generated is *in* _finalize_fairy,
>> *before* the checkin occurs, and *only* if _finalize_fairy was called
>> *with* a connection that is non-None.  There is no other call to
>> _finalize_fairy with that request identifier, therefore I see no
>> evidence of where this "exception during reset" message can be
>> generated from - there would at least need to be an additional log
>> line with "FINALIZE FAIRY" and the connection non-None for this to be
>> reached.  I see no way that the pool.py shipped with SQLAlchemy can
>> possibly generate this series of steps.
>>
>> This is an extremely intricate issue and I still have only the vaguest
>> notion of how you are running SQLAlchemy / Nova.
>>
>>
>>
>>
>> On Mon, May 14, 2018 at 11:10 AM, Mike Bayer <mik...@zzzcomputing.com>
>> wrote:
>> > OK, I also need the exact pool.py you are using, including all of
>> > these logging statements and any other adjustments that were made.
>> >
>> >
>> >
>> > On Mon, May 14, 2018 at 8:02 AM,  <jiajun...@gmail.com> wrote:
>> >> I've uploaded the logs here:
>> >> https://raw.githubusercontent.com/jiajunsu/test_logs/master/nc_5679.log
>> >>
>> >> The ConnectionRecord is 0x70cdb10, tid(coroutine id) is 130446032
>> >>
>> >> At 2018-05-14T11:11:39.970, we got the log "jjs rec already in pool
>> >> checkin", but I had just deleted the `return` in the code, so the
>> >> problem happened again.
>> >>
>> >>
>> >>
>> >>
>> >> On Monday, May 14, 2018 at 10:48:49 AM UTC+8, Mike Bayer wrote:
>> >>>
>> >>> as well as the log that shows this:
>> >>>
>> >>> +    if self in pool._pool.queue:
>> >>> +        pool.logger.debug("jjs rec already in pool checkin, skip %r",
>> >>> self)
>> >>> +        return
>> >>>
>> >>>
>> >>> actually happening.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Sun, May 13, 2018 at 10:42 PM, Mike Bayer <mik...@zzzcomputing.com>
>> >>> wrote:
>> >>> > On Sun, May 13, 2018 at 10:35 PM,  <jiajun...@gmail.com> wrote:
>> >>> >> "jjs" is short for my name and "cjr" is my colleague's(JinRong
>> >>> >> Cai). We
>> >>> >> work
>> >>> >> together and add logs in the same environment, add prefix of name
>> >>> >> to
>> >>> >> get a
>> >>> >> keyword for log filter.
>> >>> >>
>> >>> >> The doubled ConnectionRecords(with None connection) are got by two
>> >>> >> different
>> >>> >> coroutines, and the variable record.connection is rewrite by the
>> >>> >> second
>> >>> >> coroutines.
>> >>> >> After the record being put back to QueuePool again, there are two
>> >>> >> same
>> >>> >> record with same connection in the pool, which will cause the next
>> >>> >> two
>> >>> >> coroutines get conflict and deadlock in psycopg2 as above.
>> >>> >
>> >>> > OK, I need the detailed step by step that shows how the two
>> >>> > coroutines
>> >>> > are interacting such that they both are returning the same
>> >>> > connection.
>> >>> >   The nova log above seems to show just one coroutine I assume since
>> >>> > there is just one nova request ID, if I understand correctly.
>> >>> >
>> >>> >>
>> >>> >> The logs of psycopg2 has been pasted at
>> >>> >> https://github.com/psycopg/psycopg2/issues/703
>> >>> >
>> >>> > The above issue says the bug has been found in SQLAlchemy, but
>> >>> > there is no bug report yet and I've not yet been shown how the
>> >>> > behavior here is possible.  I posted a straightforward example
>> >>> > above; can you please show me where the two coroutines get mixed up?
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> On Friday, May 11, 2018 at 3:05:46 AM UTC+8, Mike Bayer wrote:
>> >>> >>>
>> >>> >>> I tried to follow your log lines, which is difficult because I
>> >>> >>> don't know where those log statements actually are and I don't
>> >>> >>> know what acronyms like "jjs" and "cjr" mean.  But it does look
>> >>> >>> like it's all in one thread (so not async GC) and all in one Nova
>> >>> >>> request (so perhaps just one greenlet involved, unless other
>> >>> >>> greenlets are affecting this without being displayed in this log
>> >>> >>> snippet).  I do see the connection record doubled at the end.  So
>> >>> >>> here is an illustration of those steps in isolation, where the
>> >>> >>> connection record is not doubled.  Can you alter this program to
>> >>> >>> show this condition occurring?
>> >>> >>>
>> >>> >>> from sqlalchemy import create_engine
>> >>> >>>
>> >>> >>> e = create_engine("postgresql://scott:tiger@localhost/test",
>> >>> >>> echo_pool='debug')
>> >>> >>> e.connect().close()
>> >>> >>>
>> >>> >>> assert len(e.pool._pool.queue) == 1
>> >>> >>>
>> >>> >>> c1 = e.connect()
>> >>> >>>
>> >>> >>> assert len(e.pool._pool.queue) == 0
>> >>> >>>
>> >>> >>> c1.engine.pool._invalidate(c1._Connection__connection, None)
>> >>> >>> c1.invalidate(e)
>> >>> >>>
>> >>> >>> # connection record is back
>> >>> >>> assert len(e.pool._pool.queue) == 1
>> >>> >>>
>> >>> >>> # do a reconnect
>> >>> >>> c1.connection
>> >>> >>>
>> >>> >>> # uses the record again
>> >>> >>> assert len(e.pool._pool.queue) == 0
>> >>> >>>
>> >>> >>> # close the Connection
>> >>> >>> c1.close()
>> >>> >>>
>> >>> >>> # record is back
>> >>> >>> assert len(e.pool._pool.queue) == 1
>> >>> >>>
>> >>> >>> # etc.
>> >>> >>> c1 = e.connect()
>> >>> >>> assert len(e.pool._pool.queue) == 0
>> >>> >>>
>> >>> >>> c1.close()
>> >>> >>> assert len(e.pool._pool.queue) == 1
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> On Thu, May 10, 2018 at 12:25 PM, Mike Bayer
>> >>> >>> <mik...@zzzcomputing.com>
>> >>> >>> wrote:
>> >>> >>> > Here's a print statement you might want to try:
>> >>> >>> >
>> >>> >>> > diff --git a/lib/sqlalchemy/pool.py b/lib/sqlalchemy/pool.py
>> >>> >>> > index 89a4cea7c..452db4883 100644
>> >>> >>> > --- a/lib/sqlalchemy/pool.py
>> >>> >>> > +++ b/lib/sqlalchemy/pool.py
>> >>> >>> > @@ -689,6 +689,19 @@ def _finalize_fairy(connection,
>> >>> >>> > connection_record,
>> >>> >>> >      been garbage collected.
>> >>> >>> >
>> >>> >>> >      """
>> >>> >>> > +    print(
>> >>> >>> > +        "FINALIZE FAIRY:  GREENLET ID: %s, DB CONN: %s, FAIRY:
>> >>> >>> > %s,
>> >>> >>> > REF:
>> >>> >>> > %s  "
>> >>> >>> > +        "CONNECTION_REC: %s CONNECTION_REC REF %s" % (
>> >>> >>> > +            "put greenlet id here",
>> >>> >>> > +            connection,
>> >>> >>> > +            fairy,
>> >>> >>> > +            ref,
>> >>> >>> > +            connection_record,
>> >>> >>> > +            connection_record.fairy_ref
>> >>> >>> > +            if connection_record is not None else "n/a"
>> >>> >>> > +        )
>> >>> >>> > +    )
>> >>> >>> > +
>> >>> >>> >      _refs.discard(connection_record)
>> >>> >>> >
>> >>> >>> >      if ref is not None:
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Thu, May 10, 2018 at 12:21 PM, Mike Bayer
>> >>> >>> > <mik...@zzzcomputing.com>
>> >>> >>> > wrote:
>> >>> >>> >> On Thu, May 10, 2018 at 7:23 AM,  <jiajun...@gmail.com> wrote:
>> >>> >>> >>> Hi Mike,
>> >>> >>> >>>
>> >>> >>> >>> I added more logs and finally found the reason and a method to
>> >>> >>> >>> avoid the deadlock between coroutines.
>> >>> >>> >>>
>> >>> >>> >>> In summary, the deadlock happened because two different
>> >>> >>> >>> coroutines used the same ConnectionRecord with the same
>> >>> >>> >>> connection.
>> >>> >>> >>> It happens under QueuePool, when the DB connections are reset
>> >>> >>> >>> by the DB server (not related to the backend driver).
>> >>> >>> >>>
>> >>> >>> >>> 1. A coroutine gets a ConnectionRecord into a ConnectionFairy,
>> >>> >>> >>> and a weakref to the ConnectionFairy is created, with callback
>> >>> >>> >>> function `_finalize_fairy`.
>> >>> >>> >>> 2. When the connection gets disconnected, it receives an
>> >>> >>> >>> exception (psycopg2.OperationalError) and goes to
>> >>> >>> >>> Connection._handle_dbapi_exception.
>> >>> >>> >>> 3. In `_handle_dbapi_exception`'s `finally`, it calls both
>> >>> >>> >>> `ConnectionFairy.invalidate` and `self.invalidate`.
>> >>> >>> >>> 4. In `ConnectionFairy.invalidate`,
>> >>> >>> >>> `_ConnectionRecord._checkin` is called, which puts the
>> >>> >>> >>> ConnectionRecord back into the QueuePool.
>> >>> >>> >>> 5. In `self.invalidate`, `del self._root.__connection` is
>> >>> >>> >>> called, which **del**s the ConnectionFairy.
>> >>> >>> >>> 6. The **del** operation makes the weakref to the
>> >>> >>> >>> ConnectionFairy dead, and the callback `_finalize_fairy` is
>> >>> >>> >>> executed.
>> >>> >>> >>> 7. In `_finalize_fairy`, another `_ConnectionRecord._checkin`
>> >>> >>> >>> is called, putting the ConnectionRecord into the QueuePool
>> >>> >>> >>> again.
>> >>> >>> >>
>> >>> >>> >> Here's what does not make sense.  You are claiming that a
>> >>> >>> >> simple invalidation will result in the same ConnectionRecord
>> >>> >>> >> being checked in twice.  This is obviously not the case; what is
>> >>> >>> >> special about your situation that makes this happen?
>> >>> >>> >>
>> >>> >>> >> If I follow the steps you refer to, in step 4, when
>> >>> >>> >> ConnectionFairy._checkin is called, that calls _finalize_fairy
>> >>> >>> >> directly, which then calls ConnectionRecord.checkin().  Within
>> >>> >>> >> ConnectionRecord.checkin(), ConnectionRecord.fairy_ref is set
>> >>> >>> >> to None.  Then back in ConnectionFairy._checkin(), it also sets
>> >>> >>> >> self._connection_record = None.
>> >>> >>> >>
>> >>> >>> >> If we come back into finalize_fairy a *second* time then, as a
>> >>> >>> >> result
>> >>> >>> >> of the "del" which occurs in _handle_error(), and the
>> >>> >>> >> connection_record is present, the function will exit
>> >>> >>> >> immediately,
>> >>> >>> >> because:
>> >>> >>> >>
>> >>> >>> >>        if connection_record.fairy_ref is not ref:
>> >>> >>> >>             return
>> >>> >>> >>
>> >>> >>> >> So the situation as given is not possible without concurrent
>> >>> >>> >> access
>> >>> >>> >> to
>> >>> >>> >> the ConnectionFairy being introduced externally to the pool.
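Restating that guard as a toy example (not SQLAlchemy's actual code): a
normal checkin clears fairy_ref first, so a later weakref-triggered entry
that still carries the old ref returns immediately and cannot check the
record in a second time:

class Record(object):
    def __init__(self):
        self.fairy_ref = None
        self.checkins = 0

    def checkin(self):
        self.fairy_ref = None
        self.checkins += 1


def finalize(record, ref):
    # ref is None for the direct checkin path, and is the (stale) weakref
    # for a later "del"-triggered call; the guard below short-circuits it.
    if ref is not None and record.fairy_ref is not ref:
        return
    record.checkin()


record = Record()
ref = object()          # stands in for the weakref to the fairy
record.fairy_ref = ref

finalize(record, None)  # direct call during the normal checkin
finalize(record, ref)   # later weakref-callback call: guard bails out
assert record.checkins == 1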
>> >>> >>> >>
>> >>> >>> >> There is exactly one thing I can think of that is both unique
>> >>> >>> >> to
>> >>> >>> >> the
>> >>> >>> >> psycopg2-style of asynchronous connection, not the PyMySQL form
>> >>> >>> >> of
>> >>> >>> >> it,
>> >>> >>> >> that might be important here, and it's that I think psycopg2's
>> >>> >>> >> connection.close() method will defer to another greenlet.  That
>> >>> >>> >> means
>> >>> >>> >> when we're inside of ConnectionRecord.invalidate(), and we call
>> >>> >>> >> self.__close(), that could be deferring to another greenlet.
>> >>> >>> >> But
>> >>> >>> >> this all happens before the connection is returned to the pool,
>> >>> >>> >> so
>> >>> >>> >> I
>> >>> >>> >> still don't know what is happening.
>> >>> >>> >>
>> >>> >>> >>>
>> >>> >>> >>> The detailed logs are below. To prove my conclusion is right,
>> >>> >>> >>> I tried to avoid the scenario with this patch, and it worked.
>> >>> >>> >>>
>> >>> >>> >>> def checkin(self):
>> >>> >>> >>>     self.fairy_ref = None
>> >>> >>> >>>     connection = self.connection
>> >>> >>> >>>     pool = self.__pool
>> >>> >>> >>>
>> >>> >>> >>>     while self.finalize_callback:
>> >>> >>> >>>         finalizer = self.finalize_callback.pop()
>> >>> >>> >>>         finalizer(connection)
>> >>> >>> >>> +    if self in pool._pool.queue:
>> >>> >>> >>> +        pool.logger.debug("jjs rec already in pool checkin,
>> >>> >>> >>> skip
>> >>> >>> >>> %r",
>> >>> >>> >>> self)
>> >>> >>> >>> +        return
>> >>> >>> >>
>> >>> >>> >> as mentioned above it's not possible to reach this point
>> >>> >>> >> without
>> >>> >>> >> concurrent access to the pooled connection, which means this
>> >>> >>> >> fix
>> >>> >>> >> could
>> >>> >>> >> fail also (with at least threaded-style concurrency, not
>> >>> >>> >> greenlet
>> >>> >>> >> style).   The original source of the greenlet issue must still
>> >>> >>> >> be
>> >>> >>> >> identified.
>> >>> >>> >>
>> >>> >>> >> I would need a self-contained proof of concept that
>> >>> >>> >> demonstrates
>> >>> >>> >> this
>> >>> >>> >> issue occurring.  The log output is not that helpful because it
>> >>> >>> >> at
>> >>> >>> >> least would need to write out the identifier of the current
>> >>> >>> >> greenlet,
>> >>> >>> >> since it is very likely that greenlets have something to do
>> >>> >>> >> with
>> >>> >>> >> this
>> >>> >>> >> issue.
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >>>
>> >>> >>> >>>     if pool.dispatch.checkin:
>> >>> >>> >>>         pool.dispatch.checkin(connection, self)
>> >>> >>> >>>     pool._return_conn(self)
>> >>> >>> >>>
>> >>> >>> >>> Trace the ConnectionRecord 0x88fc5d0
>> >>> >>> >>>
>> >>> >>> >>> 2018-05-10T14:02:17.927+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Created new
>> >>> >>> >>> connection
>> >>> >>> >>> <connection object at 0x81f69b0; closed: 0>
>> >>> >>> >>> 2018-05-10T14:02:17.928+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs get conn
>> >>> >>> >>> success
>> >>> >>> >>> <connection
>> >>> >>> >>> object at 0x81f69b0; closed: 0>, rec.connection <connection
>> >>> >>> >>> object
>> >>> >>> >>> at
>> >>> >>> >>> 0x81f69b0; closed: 0>, pool queue
>> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord
>> >>> >>> >>> object at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord
>> >>> >>> >>> object at
>> >>> >>> >>> 0x79fdad0>])
>> >>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs checkout fairy
>> >>> >>> >>> <sqlalchemy.pool._ConnectionFairy object at 0x329c0d0> rec
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0> pool
>> >>> >>> >>> queue
>> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord object at
>> >>> >>> >>> 0x7a04810>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x79fdad0>])
>> >>> >>> >>> 2018-05-10T14:02:17.929+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Connection
>> >>> >>> >>> <connection
>> >>> >>> >>> object at
>> >>> >>> >>> 0x81f69b0; closed: 0> checked out from pool, status Pool size:
>> >>> >>> >>> 10
>> >>> >>> >>> Connections in pool: 2 Current Overflow: 0 Max Overflow: 40
>> >>> >>> >>> Current
>> >>> >>> >>> Checked
>> >>> >>> >>> out connections: 8, queue
>> >>> >>> >>> deque([<sqlalchemy.pool._ConnectionRecord
>> >>> >>> >>> object
>> >>> >>> >>> at 0x7a04810>, <sqlalchemy.pool._ConnectionRecord object at
>> >>> >>> >>> 0x79fdad0>])
>> >>> >>> >>> 2018-05-10T14:02:17.930+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
>> >>> >>> >>> _ConnectionFairy._checkout
>> >>> >>> >>> conn <connection object at 0x81f69b0; closed: 0>, rec
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>,
>> >>> >>> >>> rec.conn
>> >>> >>> >>> <connection object at 0x81f69b0; closed: 0>
>> >>> >>> >>> 2018-05-10T14:02:49.716+08:00 localhost nova-conductor ERROR
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr error conn
>> >>> >>> >>> <sqlalchemy.engine.base.Connection object at 0x329c150>, err
>> >>> >>> >>> connection
>> >>> >>> >>> pointer is NULL
>> >>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter newraise
>> >>> >>> >>> 2018-05-10T14:02:49.717+08:00 localhost nova-conductor ERROR
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] cjr enter finally
>> >>> >>> >>>>>Connection._handle_dbapi_exception --> finally -->
>> >>> >>> >>>>> **self.engine.pool._invalidate** & **self.invalidate**
>> >>> >>> >>>>>self.engine.pool._invalidate --> Pool._invalidate -->
>> >>> >>> >>>>> ConnectionFairy.invalidate --> _ConnectionRecord.invalidate
>> >>> >>> >>>>> &
>> >>> >>> >>>>> **self._checkin()**
>> >>> >>> >>>>>self.invalidate() --> **del self._root.__connection**
>> >>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs connectionFairy
>> >>> >>> >>> invalidate
>> >>> >>> >>> 2018-05-10T14:02:49.718+08:00 localhost nova-conductor INFO
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:109 info]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Invalidate
>> >>> >>> >>> connection
>> >>> >>> >>> <connection
>> >>> >>> >>> object at 0x81f69b0; closed: 2> (reason:
>> >>> >>> >>> OperationalError:connection
>> >>> >>> >>> pointer
>> >>> >>> >>> is NULL
>> >>> >>> >>>>>_ConnectionRecord.invalidate(e)  -->
>> >>> >>> >>>>> self.__pool.dispatch.invalidate(self.connection, self, e) &
>> >>> >>> >>>>> self.__close() &
>> >>> >>> >>>>> self.connection = None
>> >>> >>> >>>>>self.__close() --> self.finalize_callback.clear() &
>> >>> >>> >>>>> self.__pool._close_connection(self.connection)
>> >>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Closing connection
>> >>> >>> >>> <connection
>> >>> >>> >>> object at 0x81f69b0; closed: 2>
>> >>> >>> >>>>>Pool._close_connection(connection) -->
>> >>> >>> >>>>> self._dialect.do_close(connection)
>> >>> >>> >>>>> --> _ConnDialect.do_close(dbapi_connection) -->
>> >>> >>> >>>>> dbapi_connection.close() FIN
>> >>> >>> >>> 2018-05-10T14:02:49.719+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs
>> >>> >>> >>> _ConnectionFairy._checkin
>> >>> >>> >>> conn None rec <sqlalchemy.pool._ConnectionRecord object at
>> >>> >>> >>> 0x88fc5d0>
>> >>> >>> >>> rec.conn None
>> >>> >>> >>>>>_ConnectionFairy._checkin --> _finalize_fairy(conn, conn_rec,
>> >>> >>> >>>>> pool,
>> >>> >>> >>>>> None,
>> >>> >>> >>>>> echo, fairy) & self.conn = None & self.conn_rec = None
>> >>> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs _finalize_fairy
>> >>> >>> >>> args
>> >>> >>> >>> conn
>> >>> >>> >>> None, rec.conn None, rec, <sqlalchemy.pool._ConnectionRecord
>> >>> >>> >>> object at
>> >>> >>> >>> 0x88fc5d0>. ref None, fairy <sqlalchemy.pool._ConnectionFairy
>> >>> >>> >>> object
>> >>> >>> >>> at
>> >>> >>> >>> 0x329c0d0>
>> >>> >>> >>>>>_finalize_fairy --> connection_record.checkin()
>> >>> >>> >>> 2018-05-10T14:02:49.720+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
>> >>> >>> >>> connection_record.checkin
>> >>> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn
>> >>> >>> >>> checkin
>> >>> >>> >>> None,
>> >>> >>> >>> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
>> >>> >>> >>> 0x7524590>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>]),
>> >>> >>> >>> finalize_callback
>> >>> >>> >>> deque([])
>> >>> >>> >>> 2018-05-10T14:02:49.721+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn
>> >>> >>> >>> checkin,
>> >>> >>> >>> pool
>> >>> >>> >>> queue deque([<sqlalchemy.pool._ConnectionRecord object at
>> >>> >>> >>> 0x7524590>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
>> >>> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor ERROR
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:122 error]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] Exception during
>> >>> >>> >>> reset
>> >>> >>> >>> or
>> >>> >>> >>> similar
>> >>> >>> >>> 2018-05-10T14:02:49.723+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
>> >>> >>> >>> connection_record.invalidate
>> >>> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs going to
>> >>> >>> >>> connection_record.checkin
>> >>> >>> >>> 2018-05-10T14:02:49.724+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs before conn
>> >>> >>> >>> checkin
>> >>> >>> >>> None,
>> >>> >>> >>> pool queue deque([<sqlalchemy.pool._ConnectionRecord object at
>> >>> >>> >>> 0x7524590>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>]),
>> >>> >>> >>> finalize_callback
>> >>> >>> >>> deque([])
>> >>> >>> >>> 2018-05-10T14:02:49.732+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>> [pid:4528]
>> >>> >>> >>> [MainThread] [tid:122769072] [log.py:104 debug]
>> >>> >>> >>> [req-9e51bb8d-cdfe-434b-859a-9211ac2087f9] jjs after conn
>> >>> >>> >>> checkin,
>> >>> >>> >>> pool
>> >>> >>> >>> queue deque([<sqlalchemy.pool._ConnectionRecord object at
>> >>> >>> >>> 0x7524590>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7a2b990>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0xa856b10>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x7ebee90>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>,
>> >>> >>> >>> <sqlalchemy.pool._ConnectionRecord object at 0x88fc5d0>])
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>> On Tuesday, May 8, 2018 at 11:06:04 PM UTC+8, jiajun...@gmail.com wrote:
>> >>> >>> >>>>
>> >>> >>> >>>> I found why we could get the same connection twice from the
>> >>> >>> >>>> pool...
>> >>> >>> >>>>
>> >>> >>> >>>> Because we put it into the pool twice.
>> >>> >>> >>>>
>> >>> >>> >>>> Any suggestion for finding out why the same connection is put
>> >>> >>> >>>> into the pool by different coroutines?
>> >>> >>> >>>>
>> >>> >>> >>>>
>> >>> >>> >>>> 2018-05-08T22:39:17.448+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>> [pid:15755]
>> >>> >>> >>>> [MainThread] [tid:187954992] [log.py:104 debug]
>> >>> >>> >>>> [req-7c2199c0-d9a6-4131-8358-2aebcf5a4ce1] jjs conn checkin
>> >>> >>> >>>> <connection
>> >>> >>> >>>> object at 0x111d9740; closed: 0>
>> >>> >>> >>>> 2018-05-08T22:39:17.502+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>> [pid:15755]
>> >>> >>> >>>> [MainThread] [tid:178645264] [log.py:104 debug]
>> >>> >>> >>>> [req-7ab83c30-91bd-4278-b694-85a71035992c] jjs conn checkin
>> >>> >>> >>>> <connection
>> >>> >>> >>>> object at 0x111d9740; closed: 0>
>> >>> >>> >>>> ...
>> >>> >>> >>>> 2018-05-08T22:39:17.992+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>> [pid:15755]
>> >>> >>> >>>> [MainThread] [tid:309091856] [log.py:104 debug]
>> >>> >>> >>>> [req-dee5dd2d-866f-408e-999b-b415ea7ce953] Connection
>> >>> >>> >>>> <connection
>> >>> >>> >>>> object at
>> >>> >>> >>>> 0x111d9740; closed: 0> checked out from pool
>> >>> >>> >>>> 2018-05-08T22:39:18.039+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>> [pid:15755]
>> >>> >>> >>>> [MainThread] [tid:178645104] [log.py:104 debug]
>> >>> >>> >>>> [req-ac8017d7-c97e-44ca-92d2-99f180858e14] Connection
>> >>> >>> >>>> <connection
>> >>> >>> >>>> object at
>> >>> >>> >>>> 0x111d9740; closed: 0> checked out from pool
>> >>> >>> >>>>
>> >>> >>> >>>>
>> >>> >>> >>>> On Tuesday, May 8, 2018 at 10:18:52 PM UTC+8, jiajun...@gmail.com wrote:
>> >>> >>> >>>>>
>> >>> >>> >>>>> I added logs in sqlalchemy/pool.py
>> >>> >>> >>>>>
>> >>> >>> >>>>> QueuePool.checkout
>> >>> >>> >>>>>
>> >>> >>> >>>>> @classmethod
>> >>> >>> >>>>> def checkout(cls, pool):
>> >>> >>> >>>>>     rec = pool._do_get()
>> >>> >>> >>>>>     try:
>> >>> >>> >>>>>         dbapi_connection = rec.get_connection()
>> >>> >>> >>>>>         pool.logger.debug("jjs get conn success %s",
>> >>> >>> >>>>> dbapi_connection)
>> >>> >>> >>>>>     except:
>> >>> >>> >>>>>         pool.logger.debug("jjs get error in get_connection")
>> >>> >>> >>>>>         with util.safe_reraise():
>> >>> >>> >>>>>             rec.checkin()
>> >>> >>> >>>>>     echo = pool._should_log_debug()
>> >>> >>> >>>>>     fairy = _ConnectionFairy(dbapi_connection, rec, echo)
>> >>> >>> >>>>>     rec.fairy_ref = weakref.ref(
>> >>> >>> >>>>>         fairy,
>> >>> >>> >>>>>         lambda ref: _finalize_fairy and
>> >>> >>> >>>>>         _finalize_fairy(
>> >>> >>> >>>>>             dbapi_connection,
>> >>> >>> >>>>>             rec, pool, ref, echo)
>> >>> >>> >>>>>     )
>> >>> >>> >>>>>     _refs.add(rec)
>> >>> >>> >>>>>     if echo:
>> >>> >>> >>>>>         pool.logger.debug("jjs conn %s, pool status %s",
>> >>> >>> >>>>> dbapi_connection, pool.status())
>> >>> >>> >>>>>         pool.logger.debug("Connection %r checked out from
>> >>> >>> >>>>> pool",
>> >>> >>> >>>>>                           dbapi_connection)
>> >>> >>> >>>>>     return fairy
>> >>> >>> >>>>>
>> >>> >>> >>>>> Then I got logs below:
>> >>> >>> >>>>>
>> >>> >>> >>>>> coroutine A:
>> >>> >>> >>>>>
>> >>> >>> >>>>> 2018-05-08T21:39:59.055+08:00 localhost nova-conductor INFO
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:109 info]
>> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs Pool wait is
>> >>> >>> >>>>> False,
>> >>> >>> >>>>> timeout
>> >>> >>> >>>>> 30
>> >>> >>> >>>>> 2018-05-08T21:39:59.078+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:104 debug]
>> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Created new
>> >>> >>> >>>>> connection
>> >>> >>> >>>>> <connection object at 0x903f8c0; closed: 0>
>> >>> >>> >>>>> 2018-05-08T21:39:59.079+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:104 debug]
>> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs get conn
>> >>> >>> >>>>> success
>> >>> >>> >>>>> <connection
>> >>> >>> >>>>> object at 0x903f8c0; closed: 0>
>> >>> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:104 debug]
>> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] jjs conn
>> >>> >>> >>>>> <connection
>> >>> >>> >>>>> object at
>> >>> >>> >>>>> 0x903f8c0;  closed: 0>, pool status Pool size: 10
>> >>> >>> >>>>> Connections
>> >>> >>> >>>>> in
>> >>> >>> >>>>> pool: 1
>> >>> >>> >>>>> Current Overflow: 36 Max Overflow: 40 Current Checked out
>> >>> >>> >>>>> connections: 45
>> >>> >>> >>>>> 2018-05-08T21:39:59.080+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:197123760] [log.py:104 debug]
>> >>> >>> >>>>> [req-ba1ab897-c80a-4bc4-922a-39044d669da1] Connection
>> >>> >>> >>>>> <connection
>> >>> >>> >>>>> object at
>> >>> >>> >>>>> 0x903f8c0; closed: 0> checked out from pool
>> >>> >>> >>>>>
>> >>> >>> >>>>> coroutine B:
>> >>> >>> >>>>>
>> >>> >>> >>>>> 2018-05-08T21:39:59.212+08:00 localhost nova-conductor INFO
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:151519312] [log.py:109 info]
>> >>> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs Pool wait is
>> >>> >>> >>>>> False,
>> >>> >>> >>>>> timeout
>> >>> >>> >>>>> 30
>> >>> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:151519312] [log.py:104 debug]
>> >>> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs get conn
>> >>> >>> >>>>> success
>> >>> >>> >>>>> <connection
>> >>> >>> >>>>> object at 0x903f8c0;  closed: 0>
>> >>> >>> >>>>> 2018-05-08T21:39:59.213+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:151519312] [log.py:104 debug]
>> >>> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] jjs conn
>> >>> >>> >>>>> <connection
>> >>> >>> >>>>> object at
>> >>> >>> >>>>> 0x903f8c0; closed: 0>, pool status Pool size: 10
>> >>> >>> >>>>> Connections in
>> >>> >>> >>>>> pool: 0
>> >>> >>> >>>>> Current Overflow: 36 Max Overflow: 40 Current Checked out
>> >>> >>> >>>>> connections: 46
>> >>> >>> >>>>> 2018-05-08T21:39:59.214+08:00 localhost nova-conductor DEBUG
>> >>> >>> >>>>> [pid:55528]
>> >>> >>> >>>>> [MainThread] [tid:151519312] [log.py:104 debug]
>> >>> >>> >>>>> [req-a78075b5-c55c-40c4-86d0-65957e89cb56] Connection
>> >>> >>> >>>>> <connection
>> >>> >>> >>>>> object at
>> >>> >>> >>>>> 0x903f8c0; closed: 0> checked out from pool
>> >>> >>> >>>>>
>> >>> >>> >>>>>
>> >>> >>> >>>>> It seems that when we create_connection in
>> >>> >>> >>>>> QueuePool._do_get, we may not acquire the lock _pool.mutex
>> >>> >>> >>>>> (i.e. Queue.mutex).
>> >>> >>> >>>>>
>> >>> >>> >>>>>
>> >>> >>> >>>>>
>> >>> >>> >>>>> On Monday, May 7, 2018 at 8:41:14 PM UTC+8, Mike Bayer wrote:
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> On Mon, May 7, 2018, 7:01 AM <jiajun...@gmail.com> wrote:
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> We added coroutine_id in psycopg2, and found that two
>> >>> >>> >>>>>>> coroutines use the same connection before the
>> >>> >>> >>>>>>> pthread_mutex_lock is released.
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> Maybe something in the connection pool goes wrong?
>> >>> >>> >>>>>>> [pid] [coroutine_id] msg
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> This seems unlikely.   Openstack uses the SQLAlchemy
>> >>> >>> >>>>>> connection
>> >>> >>> >>>>>> pool
>> >>> >>> >>>>>> with eventlet patching in conjunction with the MySQL
>> >>> >>> >>>>>> drivers
>> >>> >>> >>>>>> and
>> >>> >>> >>>>>> nothing
>> >>> >>> >>>>>> like that happens.   The biggest issue SQLAlchemy had with
>> >>> >>> >>>>>> greenlets is that
>> >>> >>> >>>>>> a timed out greenlet would throw GreenletExit and cause a
>> >>> >>> >>>>>> dirty
>> >>> >>> >>>>>> connection
>> >>> >>> >>>>>> to go back into the pool, that was fixed in 1.2.
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs
>> >>> >>> >>>>>>> conn
>> >>> >>> >>>>>>> 0x94122f0,
>> >>> >>> >>>>>>> cb 0x23ad320
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE
>> >>> >>> >>>>>>> 0x94122f0
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn
>> >>> >>> >>>>>>> 0x94122f0,
>> >>> >>> >>>>>>> async_cursor 0x881ac00
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn
>> >>> >>> >>>>>>> 0x94122f0,
>> >>> >>> >>>>>>> async_cursor 0x881ac00
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0,
>> >>> >>> >>>>>>> autocommit
>> >>> >>> >>>>>>> = 0,
>> >>> >>> >>>>>>> status = 2
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner
>> >>> >>> >>>>>>> 49174
>> >>> >>> >>>>>>> __lock 1
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> Below is the logs grep by coroutine id:
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]finish send query, before psyco_wait,
>> >>> >>> >>>>>>> conn
>> >>> >>> >>>>>>> 0x94122f0
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]before have_wait_callback conn
>> >>> >>> >>>>>>> 0x94122f0
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]before PyObject_CallFunctionObjArgs
>> >>> >>> >>>>>>> conn
>> >>> >>> >>>>>>> 0x94122f0,
>> >>> >>> >>>>>>> cb 0x23ad320
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: status = 2, conn 0x94122f0
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status = ASYNC_WRITE
>> >>> >>> >>>>>>> 0x94122f0
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: poll writing
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db730]conn_poll: async_status -> ASYNC_READ
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> -----
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new unnamed cursor
>> >>> >>> >>>>>>> for
>> >>> >>> >>>>>>> connection
>> >>> >>> >>>>>>> at 0x8de2d30
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: init cursor object at
>> >>> >>> >>>>>>> 0xa6c2650
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: parameters: name =
>> >>> >>> >>>>>>> (null),
>> >>> >>> >>>>>>> conn =
>> >>> >>> >>>>>>> 0x8de2d30
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]cursor_setup: good cursor object at
>> >>> >>> >>>>>>> 0xa6c2650,
>> >>> >>> >>>>>>> refcnt = 1
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]psyco_conn_cursor: new cursor at
>> >>> >>> >>>>>>> 0xa6c2650:
>> >>> >>> >>>>>>> refcnt =
>> >>> >>> >>>>>>> 1
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn
>> >>> >>> >>>>>>> 0x94122f0,
>> >>> >>> >>>>>>> async_cursor 0x881ac00
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]before EXC_IF_ASYNC_IN_PROGRESS conn
>> >>> >>> >>>>>>> 0x94122f0,
>> >>> >>> >>>>>>> async_cursor 0x881ac00
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]pq_abort: enter pgconn = 0x94122f0,
>> >>> >>> >>>>>>> autocommit
>> >>> >>> >>>>>>> = 0,
>> >>> >>> >>>>>>> status = 2
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> [49174] [0xa5db4b0]before lock pgconn = 0x94122f0, owner
>> >>> >>> >>>>>>> 49174
>> >>> >>> >>>>>>> __lock 1
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>>
>> >>> >>> >>>>>>> On Saturday, April 28, 2018 at 4:07:34 PM UTC+8, jiajun...@gmail.com wrote:
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> We reproduced this problem and added logs in psycopg2,
>> >>> >>> >>>>>>>> found a confusing thing, and reported it to psycopg2.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> It seems sqlalchemy & psycopg2 stack flow is:
>> >>> >>> >>>>>>>> 1.from sqlalchemy to psycopg: pq_execute/pq_commit
>> >>> >>> >>>>>>>> 2.from psycopg to sqlalchemy PyWeakref_NewRef(conn)
>> >>> >>> >>>>>>>> 3.get exception in sqlalchemy, and do_rollback
>> >>> >>> >>>>>>>> 4.from sqlalchemy to psycopg: pq_abort
>> >>> >>> >>>>>>>> 5.psycopg get pthread_mutex_lock deadlock
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> What we have done to reproduce it is: stop the master
>> >>> >>> >>>>>>>> pg-server and promote the slave pg-server to master,
>> >>> >>> >>>>>>>> moving the FIP from the old master to the slave. At the
>> >>> >>> >>>>>>>> same time, let nova-conductor issue quite a lot of db
>> >>> >>> >>>>>>>> query requests.
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> [1] https://github.com/psycopg/psycopg2/issues/703
>> >>> >>> >>>>>>>>
>> >>> >>> >>>>>>>> On Monday, April 23, 2018 at 9:45:04 PM UTC+8, Mike Bayer wrote:
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> On Mon, Apr 23, 2018 at 9:03 AM,  <jiajun...@gmail.com>
>> >>> >>> >>>>>>>>> wrote:
>> >>> >>> >>>>>>>>> > Sorry for reply on this topic.
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> > We recently get the same problem in our production
>> >>> >>> >>>>>>>>> > environment.
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> > I found a patch in other lib [1], and they added
>> >>> >>> >>>>>>>>> > conn.close()
>> >>> >>> >>>>>>>>> > in
>> >>> >>> >>>>>>>>> > exception
>> >>> >>> >>>>>>>>> > psycopg2.ProgrammingError.
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> > Shall we do the same in [2] ?
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> SQLAlchemy does things much more carefully than that, we
>> >>> >>> >>>>>>>>> parse
>> >>> >>> >>>>>>>>> the
>> >>> >>> >>>>>>>>> error message for specific ones that correspond to
>> >>> >>> >>>>>>>>> "connection
>> >>> >>> >>>>>>>>> is no
>> >>> >>> >>>>>>>>> longer usable", we call these "is_disconnect" but it can
>> >>> >>> >>>>>>>>> be
>> >>> >>> >>>>>>>>> any
>> >>> >>> >>>>>>>>> invalidating condition.
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> You can make these yourself, and they can also be made
>> >>> >>> >>>>>>>>> to be
>> >>> >>> >>>>>>>>> part of
>> >>> >>> >>>>>>>>> oslo.db, using the handle_error event:
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> http://docs.sqlalchemy.org/en/latest/core/events.html?highlight=handle_error#sqlalchemy.events.ConnectionEvents.handle_error
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> within oslo.db you would want to propose a change here:
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L387
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>>
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> > [1]
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> > https://github.com/aio-libs/aiopg/pull/415/files?diff=split
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> > [2]
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> > https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L1289
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> >
>> >>> >>> >>>>>>>>> > On Monday, November 13, 2017 at 10:44:31 AM UTC+8, JinRong Cai wrote:
>> >>> >>> >>>>>>>>> >>
>> >>> >>> >>>>>>>>> >> Hi Michael,
>> >>> >>> >>>>>>>>> >>
>> >>> >>> >>>>>>>>> >> I am using OpenStack with PostgreSQL, where the
>> >>> >>> >>>>>>>>> >> sqlalchemy and oslo_db modules are used.
>> >>> >>> >>>>>>>>> >> And there are some problems after my pg database
>> >>> >>> >>>>>>>>> >> switched over.
>> >>> >>> >>>>>>>>> >>
>> >>> >>> >>>>>>>>> >> Here is my switch over process:
>> >>> >>> >>>>>>>>> >> 1. nova-conductor (python application) is running with
>> >>> >>> >>>>>>>>> >> DB connection strings pointing to a VIP, which is in
>> >>> >>> >>>>>>>>> >> the primary site (A) of pg.
>> >>> >>> >>>>>>>>> >> 2. switch the VIP from the primary (A) to the new primary (B)
>> >>> >>> >>>>>>>>> >> 3. switch over pg: shut down the primary (A), promote
>> >>> >>> >>>>>>>>> >> the standby (B) to new primary.
>> >>> >>> >>>>>>>>> >> 4. nova-conductor is running in the whole process.
>> >>> >>> >>>>>>>>> >>
>> >>> >>> >>>>>>>>> >> After some seconds, I found some nova-conductor
>> >>> >>> >>>>>>>>> >> processes are hung with status futex_wait_queue_me,
>> >>> >>> >>>>>>>>> >> and the status of the query in the DB is "idle in
>> >>> >>> >>>>>>>>> >> transaction"; the transaction was not committed or
>> >>> >>> >>>>>>>>> >> rolled back!
>> >>> >>> >>>>>>>>> >> I think disconnection is handled in oslo_db, which
>> >>> >>> >>>>>>>>> >> will send a ping (select 1) to the DB.
>> >>> >>> >>>>>>>>> >>
>> >>> >>> >>>>>>>>> >> If the DB was switched over, the connection in the
>> >>> >>> >>>>>>>>> >> pool would be set to status invalid, and would
>> >>> >>> >>>>>>>>> >> reconnect after the next checkout.
>> >>> >>> >>>>>>>>> >>
>> >>> >>> >>>>>>>>> >> ###error messages from nova-conductor
>> >>> >>> >>>>>>>>> >> localhost nova-conductor ERROR [pid:36365]
>> >>> >>> >>>>>>>>> >> [MainThread]
>> >>> >>> >>>>>>>>> >> [tid:122397712]
>> >
>

