Re: [sqlalchemy] Using a connection pool with multiple processes on the same machine

Maximilian Roos Fri, 22 Jan 2016 15:25:36 -0800

>
> A python process has absolutely no way to affect the state of another 
> python process without explicit communication channels (e.g. pipes) set 
> up between those processes, other than whatever resources were mutually 
> inherited from the parent, which can include filehandles (such as socket 
> connections to databases).



That's what I first thought... Then we had the problems we have, and I 
figured there must be some cross-process contamination. It makes sense that 
there needs to communication for there to be contamination.

We'll investigate further. It could be something like unwittingly pickling 
the engine and sending it across processes.

Thanks again

On Friday, January 22, 2016 at 10:03:38 AM UTC-5, Michael Bayer wrote:
>
>
>
> On 01/22/2016 01:23 AM, Maximilian Roos wrote: 
> > Great, thanks for the reply Mike. 
> > 
> > I looked further - Celery *is* using fork / multiprocessing, but the 
> > forking occurs before any import of our libraries / sqlalchemy / 
> > create_engine. Is there a risk of reusing connections in that state? 
> > 
> > Can I confirm that the 'cross-effects' between processes on the same 
> > machine include those the docs details between threads - and so require 
> > a scoped_session (or something that achieves the same goal)? 
>
> A python process has absolutely no way to affect the state of another 
> python process without explicit communication channels (e.g. pipes) set 
> up between those processes, other than whatever resources were mutually 
> inherited from the parent, which can include filehandles (such as socket 
> connections to databases). 
>
> If you are seeing the effects of connection sharing between processes 
> this suggests that database connections are being created before a fork 
> occurs.   The recipe in the docs for asserting this is not the case is 
> very handy for guaranteeing this is not possible. 
>
> > 
> > On Thursday, January 21, 2016 at 11:01:41 PM UTC-5, Michael Bayer wrote: 
> > 
> > 
> > 
> >     On 01/21/2016 08:43 PM, Maximilian Roos wrote: 
> >     > We're using celery, a job distribution package. On a single 
> machine, 
> >     > there are 20+ celery workers running, each with their own Python 
> >     > process. We had some issues with the processes attempting to use 
> the 
> >     > same SQLAlchemy connections (I think because they are referenced 
> >     by file 
> >     > descriptors, and so can cross process boundaries?), and so we're 
> now 
> >     > using a NullPool. 
> >     > 
> >     > But is there a way of doing this more efficiently than a NullPool? 
> > 
> >     yes!   Though if NullPool is doing the job, there might not be a 
> >     need to 
> >     switch. 
> > 
> >     > 
> >     > There's are two suggestions in the docs 
> >     > 
> >     <
> http://docs.sqlalchemy.org/en/latest/core/pooling.html#using-connection-pools-with-multiprocessing
>  
> >     <
> http://docs.sqlalchemy.org/en/latest/core/pooling.html#using-connection-pools-with-multiprocessing>>
>  
>
> > 
> >     > : 
> >     > 
> >     >   * Dispose the connection as soon as the other process receives 
> it. 
> >     >     This wouldn't work with celery, since the workers are started 
> on 
> >     >     their own, not forked from another python process. 
> > 
> >     If the worker isn't forked from another Python process, then if you 
> >     call 
> >     create_engine() in that process, that's a brand new engine.  You 
> >     wouldn't be re-using connection across processes in this case.   
> >     Only if 
> >     forking is happening, can the problem you refer to occur.  If 
> >     there's no 
> >     forking, then the process does not inherit any state. 
> > 
> >     However, if Celery is using Python multiprocessing, it's doing 
> forking 
> >     and there is a parent Python process. 
> > 
> > 
> >     >   * Implement a listener that invalidates connections created in 
> >     another 
> >     >     process. I'm unclear how this works and whether that would be 
> >     >     practical in this context. 
> > 
> >     this is a simple recipe and is straight from what we do in 
> Openstack, 
> >     and again, it only makes sense if there's forking going on. If 
> >     Celery is 
> >     starting brand new Python processes that don't inherit any state 
> from a 
> >     parent process, and the worker itself also isn't forking, then 
> there's 
> >     nothing to do. 
> > 
> >      Specifically, given the number of 
> >     >     processes we have running on each machine, is the pool that a 
> >     >     process is selecting & testing from shared between all 
> >     processes on 
> >     >     that machine? 
> > 
> >     it's not, since the pool is an application-level pool.  It exists 
> >     within 
> >     the space of a Python interpreter and therefore is local to a 
> process. 
> >     So when you have a lot of processes, pooling of this style is hard 
> >     because the pooling cannot coordinate between processes. 
> > 
> >     For Postgresql backends, there's already a solution, which is to use 
> >     PGBouncer: https://wiki.postgresql.org/wiki/PgBouncer 
> >     <https://wiki.postgresql.org/wiki/PgBouncer>  For PG, I'd stick 
> >     with NullPool and just use PgBouncer as the pool.   The issue with 
> >     PgBouncer is that it's entirely Postgresql specific and there aren't 
> >     really similar solutions for other backends. 
> > 
> >     For MySQL and others, there's not really a software solution out 
> there 
> >     with the possible exception of SQLRelay, but SQLRelay is kind of an 
> odd 
> >     quasi-commercial product with very complex licensing and an 
> unfortunate 
> >     home on Sourceforge. 
> > 
> >     One of the things I'm doing at Red Hat is attempting to address this 
> >     issue in some way that addresses MySQL and backends in general.   
> Some 
> >     approaches I'm looking into include building a coordination service 
> >     that 
> >     communicates with the otherwise process-local pools, building a 
> system 
> >     that somehow coordinates with an HAProxy service, or the last resort 
> >     which is that I build a brand-new PGBouncer / SQLRelay-like solution 
> >     that isn't hardwired to Postgresql and isn't burdened by weird 
> >     commercial licensing like SQLRelay. 
> > 
> >     If a pool is shared across 20 processes, the chances 
> >     >     that a connection chosen at random would have been created in 
> >     that 
> >     >     process is fairly low... 
> >     > 
> >     > It's also possible we're woefully misunderstanding how this works; 
> >     > corrections appreciated. 
> > 
> >     I haven't worked with Celery so you might want to get a clear answer 
> on 
> >     if the create_engine() you're calling is in fact in a parent process 
> >     that gets inherited (e.g. forked) or if it really is called 
> >     individually 
> >     in totally distinct Python processes.  If the latter, you wouldn't 
> have 
> >     any shared connections between processes. 
> > 
> > 
> >     > 
> >     > Many thanks, 
> >     > Max 
> >     > 
> >     > -- 
> >     > You received this message because you are subscribed to the Google 
> >     > Groups "sqlalchemy" group. 
> >     > To unsubscribe from this group and stop receiving emails from it, 
> >     send 
> >     > an email to sqlalchemy+...@googlegroups.com <javascript:> 
> >     > <mailto:sqlalchemy+unsubscr...@googlegroups.com <javascript:> 
> <javascript:>>. 
> >     > To post to this group, send email to sqlal...@googlegroups.com 
> >     <javascript:> 
> >     > <mailto:sqlal...@googlegroups.com <javascript:>>. 
> >     > Visit this group at https://groups.google.com/group/sqlalchemy 
> >     <https://groups.google.com/group/sqlalchemy>. 
> >     > For more options, visit https://groups.google.com/d/optout 
> >     <https://groups.google.com/d/optout>. 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> > an email to sqlalchemy+...@googlegroups.com <javascript:> 
> > <mailto:sqlalchemy+unsubscr...@googlegroups.com <javascript:>>. 
> > To post to this group, send email to sqlal...@googlegroups.com 
> <javascript:> 
> > <mailto:sqlal...@googlegroups.com <javascript:>>. 
> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Using a connection pool with multiple processes on the same machine

Reply via email to