Great, thanks for the reply Mike. Given that the processes are started separately, are the possible 'cross-effects' between processes on the same machine the same as those between threads within the same process, and so solved with scoped_session?
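For the within-process, multi-threaded half of that question, here is a minimal sketch of what scoped_session does per thread (the in-memory SQLite URL is a stand-in for illustration, not your real database):

```python
import threading

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")  # stand-in URL for illustration
Session = scoped_session(sessionmaker(bind=engine))

sessions = {}

def work(name):
    # scoped_session keeps a registry keyed by thread: each thread that
    # calls Session() gets, and on repeated calls reuses, its own Session.
    sessions[name] = Session()
    Session.remove()  # release this thread's session when the work is done

threads = [threading.Thread(target=work, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# sessions["a"] and sessions["b"] are distinct Session objects
```

Whether that also covers the cross-process case is a separate matter: as Mike's reply explains, separately started processes each build their own engine and pool, so there is nothing for scoped_session to coordinate across processes.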
On Thursday, January 21, 2016 at 11:01:41 PM UTC-5, Michael Bayer wrote:

> On 01/21/2016 08:43 PM, Maximilian Roos wrote:
> > We're using celery, a job distribution package. On a single machine,
> > there are 20+ celery workers running, each with their own Python
> > process. We had some issues with the processes attempting to use the
> > same SQLAlchemy connections (I think because they are referenced by
> > file descriptors, and so can cross process boundaries?), and so we're
> > now using a NullPool.
> >
> > But is there a way of doing this more efficiently than a NullPool?
>
> Yes! Though if NullPool is doing the job, there might not be a need to
> switch.
>
> > There are two suggestions in the docs
> > <http://docs.sqlalchemy.org/en/latest/core/pooling.html#using-connection-pools-with-multiprocessing>:
> >
> > * Dispose the connection as soon as the other process receives it.
> >   This wouldn't work with celery, since the workers are started on
> >   their own, not forked from another Python process.
>
> If the worker isn't forked from another Python process, then if you call
> create_engine() in that process, that's a brand-new engine. You wouldn't
> be re-using connections across processes in this case. Only if forking
> is happening can the problem you refer to occur. If there's no forking,
> then the process does not inherit any state.
>
> However, if Celery is using Python multiprocessing, it is doing forking
> and there is a parent Python process.
>
> > * Implement a listener that invalidates connections created in another
> >   process. I'm unclear how this works and whether it would be
> >   practical in this context.
>
> This is a simple recipe, straight from what we do in OpenStack, and
> again, it only makes sense if there's forking going on. If Celery is
> starting brand-new Python processes that don't inherit any state from a
> parent process, and the worker itself also isn't forking, then there's
> nothing to do.
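For reference, the invalidation-listener recipe from the pooling docs linked above looks roughly like the following; the SQLite URL here is only a stand-in so the sketch is self-contained:

```python
import os

from sqlalchemy import create_engine, event, exc

engine = create_engine("sqlite://")  # stand-in URL; use your real database URL

@event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    # Remember which process created this connection.
    connection_record.info["pid"] = os.getpid()

@event.listens_for(engine, "checkout")
def checkout(dbapi_connection, connection_record, connection_proxy):
    # If a forked child checks out a connection the parent created,
    # invalidate it so the pool opens a fresh one instead.
    pid = os.getpid()
    if connection_record.info["pid"] != pid:
        connection_record.connection = connection_proxy.connection = None
        raise exc.DisconnectionError(
            "Connection record belongs to pid %s, "
            "attempting to check out in pid %s"
            % (connection_record.info["pid"], pid)
        )
```

As Mike says, this only matters when forking is actually happening; in a never-forked process the pid check always passes and the listeners are inert.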
> > Specifically, given the number of processes we have running on each
> > machine, is the pool that a process is selecting & testing from shared
> > between all processes on that machine?
>
> It's not, since the pool is an application-level pool. It exists within
> the space of a Python interpreter and is therefore local to a process.
> So when you have a lot of processes, pooling of this style is hard,
> because the pooling cannot coordinate between processes.
>
> For Postgresql backends, there's already a solution, which is to use
> PgBouncer: https://wiki.postgresql.org/wiki/PgBouncer  For PG, I'd stick
> with NullPool and just use PgBouncer as the pool. The issue with
> PgBouncer is that it's entirely Postgresql-specific, and there aren't
> really similar solutions for other backends.
>
> For MySQL and others, there's not really a software solution out there,
> with the possible exception of SQLRelay; but SQLRelay is kind of an odd
> quasi-commercial product with very complex licensing and an unfortunate
> home on SourceForge.
>
> One of the things I'm doing at Red Hat is attempting to address this
> issue in some way that covers MySQL and backends in general. Some
> approaches I'm looking into include building a coordination service that
> communicates with the otherwise process-local pools, building a system
> that somehow coordinates with an HAProxy service, or, as a last resort,
> building a brand-new PgBouncer / SQLRelay-like solution that isn't
> hardwired to Postgresql and isn't burdened by weird commercial licensing
> like SQLRelay.
>
> > If a pool is shared across 20 processes, the chances that a connection
> > chosen at random would have been created in that process are fairly
> > low...
> >
> > It's also possible we're woefully misunderstanding how this works;
> > corrections appreciated.
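On the application side, the PG-plus-PgBouncer arrangement described above is just NullPool on the engine. A sketch, with a SQLite URL standing in so it runs anywhere; a real setup would point at PgBouncer's listen port (commonly 6432, an assumption about the bouncer's configuration):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# NullPool opens a real DBAPI connection per checkout and closes it on
# release; with PgBouncer in front, that "reconnect" is a cheap handshake
# with the local bouncer, which does the actual pooling. A real engine
# might look like:
#   create_engine("postgresql://app@127.0.0.1:6432/mydb", poolclass=NullPool)
engine = create_engine("sqlite://", poolclass=NullPool)

with engine.connect() as conn:
    result = conn.execute(text("select 1")).scalar()
```

Because the pool holds nothing, this also sidesteps the fork problem entirely, which is why NullPool is the low-effort answer for multi-process setups.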
> I haven't worked with Celery, so you might want to get a clear answer on
> whether the create_engine() you're calling is in fact in a parent
> process that gets inherited (e.g. forked), or if it really is called
> individually in totally distinct Python processes. If the latter, you
> wouldn't have any shared connections between processes.
>
> > Many thanks,
> > Max

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.