I did a bit more digging. It looks like util.counter() is used in two places that are causing hangs in my forked subprocesses: in instances(), to assign a unique runid to the QueryContext, and in Session instantiation, to assign a unique hash_key to the session.
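(For context on why this deadlocks at all: the classic failure mode is fork() happening while some other thread holds a lock -- the child inherits a copy of the lock in the held state, but not the thread that would release it. Here is a minimal sketch of just that, independent of SQLAlchemy; the names are purely illustrative:)

    # Minimal reproduction of the failure mode (nothing SQLAlchemy-specific):
    # if fork() happens while another thread holds a lock, the child gets a
    # copy of the lock in the "held" state but no thread to ever release it,
    # so the next acquire() in the child blocks forever.
    import os
    import threading
    import time

    lock = threading.Lock()

    def hold_lock_briefly():
        with lock:
            time.sleep(2)   # stand-in for counter()'s critical section

    threading.Thread(target=hold_lock_briefly).start()
    time.sleep(0.1)         # make sure the lock is held at fork time

    pid = os.fork()
    if pid == 0:
        lock.acquire()      # child: hangs here forever
        os._exit(0)
    else:
        os.waitpid(pid, 0)  # parent: never returns, the child is stuck

That's the same shape as the acquire() the gdb stack below is stuck on.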
In both places the "counter" seems to simply be supplying a unique identifier, so here is my monkey-patch, which seems to be working:

    import uuid
    from sqlalchemy.orm import loading, session

    loading._new_runid = uuid.uuid4
    session._new_sessionid = uuid.uuid4

As far as I can tell so far, uuid appears to be both thread-safe (as of 2.6.1: http://legacy.python.org/download/releases/2.6.1/NEWS.txt) and multiprocess-safe (purely anecdotal, based on my experience of the last half hour). I'll leave my test running for a few hours, but using the counter() method to generate unique IDs I was reliably able to get a hung process in under ten minutes, so it's looking good so far. I'm not sure what the broader implications of such a change are (safety, performance, etc.), but I'm wondering if you would consider merging something like this into SQLAlchemy.

On Thursday, February 18, 2016 at 6:24:40 PM UTC-8, Mike Bayer wrote:
>
> On 02/18/2016 01:39 PM, Uri Okrent wrote:
> > Looks like forking from a thread causes other issues. I think I've
> > resolved the hang in psycopg2 by creating a brand new engine in the
> > forked subprocess, but now I'm getting occasional hangs here:
> >
> > #1 Waiting for a lock (e.g. GIL)
> > #2 <built-in method acquire of thread.lock object at remote 0x7fc061c17b58>
> > #4 file '/usr/lib64/python2.6/site-packages/sqlalchemy/util/langhelpers.py', in '_next'
> > #8 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/loading.py', in 'instances'
> > #16 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py', in '__getitem__'
> > #24 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py', in 'first'
> >
> > At this point:
> >
> >     def counter():
> >         """Return a threadsafe counter function."""
> >
> >         lock = compat.threading.Lock()
> >         counter = itertools.count(1)
> >
> >         # avoid the 2to3 "next" transformation...
> >         def _next():
> >             lock.acquire()
> >             try:
> >                 return next(counter)
> >             finally:
> >                 lock.release()
> >
> >         return _next
> >
> > Not sure a process pool will solve the issue since I can't necessarily
> > tell the volume of concurrent queries beforehand. I'm wondering if
> > there's a way to clear the lock manually in the forked process, or
> > somehow protect this section when forking. Either one is probably a
> > monkey patch...
>
> if you're using python multiprocessing, that thing has a lot of
> deadlock-ish things in it, especially in an older Python like 2.6 - it
> uses threads in various ways to communicate with process pools and such.
> Can't really say why your counter is locking, would have to at least
> see how you're using it, and also I don't quite get how this counter
> function is interacting with query.first(). But locks are not
> "interruptable" unless you kill the whole thread in which it runs, so
> you'd need to put a timeout on it.
>
> > On Wednesday, February 17, 2016 at 10:41:00 AM UTC-8, Mike Bayer wrote:
> >
> > On 02/17/2016 11:33 AM, Uri Okrent wrote:
> > > Maybe this is a psycopg question and if so please say so.
> > >
> > > I have a multi-threaded server which maintains a thread-pool (and a
> > > corresponding connection pool) for servicing requests. In order to
> > > mitigate python's high-water-mark memory usage behavior for large
> > > queries, I'm attempting to handle queries in particular using a forked
> > > subprocess from the request thread.
> > > I'm using the connection invalidation recipe described here (the second
> > > one that adds listeners to the Pool):
> > >
> > > http://docs.sqlalchemy.org/en/latest/core/pooling.html#using-connection-pools-with-multiprocessing
> > >
> > > It seems to be working correctly -- that is, I can see that the child
> > > process is indeed creating a new connection. However, I'm still
> > > experiencing intermittent hangs in the child process during connection
> > > creation. I've gotten a stack trace using gdb, and I think I understand
> > > what is going on, but I'm not sure how to protect the critical section.
> > >
> > > It looks like threads creating connections in the parent process acquire
> > > some threading synchronization primitive inside psycopg's _connect
> > > function (that's in C so I didn't see the actual source). This
> > > apparently occurs occasionally at the same time as the fork, so that the
> > > child process never sees the primitive release in the parent process and
> > > hangs forever. Interestingly, hangs stop after the server has been
> > > running for a while, presumably because the parent process is warmed up
> > > and has a full connection pool, and is no longer creating connections.
> > >
> > > Here is my stack on a hung process:
> > > #17 <built-in function _connect>
> > > #19 file '/usr/lib64/python2.6/site-packages/psycopg2/__init__.py', in 'connect'
> > > #24 file '/usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py', in 'connect'
> > > #29 file '/usr/lib64/python2.6/site-packages/sqlalchemy/engine/strategies.py', in 'connect'
> > > #33 file '/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py', in '__connect'
> > > #36 file '/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py', in 'get_connection'
> > > #39 file '/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py', in 'checkout'
> > > #43 file '/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py', in '_checkout'
> > > #47 file '/usr/lib64/python2.6/site-packages/sqlalchemy/pool.py', in 'connect'
> > > #50 file '/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py', in '_wrap_pool_connect'
> > > #54 file '/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py', in 'contextual_connect'
> > > #58 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py', in '_connection_for_bind'
> > > #61 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py', in '_connection_for_bind'
> > > #65 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py', in 'connection'
> > > #70 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py', in '_connection_from_session'
> > > #74 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py', in '_execute_and_instances'
> > > #77 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py', in '__iter__'
> > > #91 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py', in '__getitem__'
> > > #99 file '/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py', in 'first'
> > >
> > > I'm using sqlalchemy 1.0.12 and psycopg 2.5.3
> > >
> > > My quick and dirty fix would be to fill the connection pool in the
> > > parent process by
> > > force before servicing requests,
> >
> > well I'd not want to transfer a psycopg2 connection from a parent to a
> > child fork, because now that same filehandle is in both processes and
> > you'll get unsafe concurrent access on it.
> >
> > I've used multiprocessing with psycopg2 for years in a wide variety of
> > scenarios and I've never seen it hanging on the actual psycopg2.connect
> > call. But perhaps that's because I've never called fork() from inside
> > a thread that is not the main thread - if that is what's triggering it
> > here, I'd use a pattern such as a process pool or similar where the
> > forking is done from the main thread ahead of time.
> >
> > > but that is a hack, and in case of an invalidated connection the server
> > > would be susceptible to the issue again while recreating the invalid
> > > connection in the parent process.
> > > I apparently need to synchronize my fork in one thread with connections
> > > being created in others, but I'm not sure how to do that. Any pointers
> > > would be great.
> > >
> > > TIA,
> > > Uri
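P.S. For anyone finding this thread later: here is roughly the "fork ahead of time, from the main thread" pattern Mike describes above -- build a multiprocessing pool at startup, before any request threads or database connections exist, and hand queries to it instead of forking per request. This is only a sketch; the DSN, table name, worker count, and function names are illustrative, not from our actual code:

    import multiprocessing

    def run_query():
        # Each worker creates its own engine lazily, so no psycopg2 sockets,
        # pool locks, or counters are shared with the parent process.
        from sqlalchemy import create_engine
        engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")
        try:
            with engine.connect() as conn:
                return conn.execute("SELECT count(*) FROM some_table").scalar()
        finally:
            engine.dispose()

    if __name__ == "__main__":
        # Created in the main thread, before the threaded server starts,
        # so no other thread can be holding a lock when the workers fork.
        pool = multiprocessing.Pool(processes=4)

        # Later, from any request thread:
        result = pool.apply_async(run_query).get(timeout=30)
        print(result)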