Great, thanks for the reply Mike. Given that the processes are started separately, are the possible 'cross-effects' between processes on the same machine the same as those between threads within the same process, and so solved with scoped_session?
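For the within-process, multi-threaded half of that question, here is a minimal sketch of what scoped_session does per thread (the in-memory SQLite URL is a stand-in for illustration, not your real database):

```python
import threading

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("sqlite://")  # stand-in URL for illustration
Session = scoped_session(sessionmaker(bind=engine))

sessions = {}

def work(name):
    # scoped_session keeps a registry keyed by thread: each thread that
    # calls Session() gets, and on repeated calls reuses, its own Session.
    sessions[name] = Session()
    Session.remove()  # release this thread's session when the work is done

threads = [threading.Thread(target=work, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# sessions["a"] and sessions["b"] are distinct Session objects
```

Whether that also covers the cross-process case is a separate matter: as Mike's reply explains, separately started processes each build their own engine and pool, so there is nothing for scoped_session to coordinate across processes.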
On Thursday, January 21, 2016 at 11:01:41 PM UTC-5, Michael Bayer wrote:

> On 01/21/2016 08:43 PM, Maximilian Roos wrote:
> > We're using celery, a job distribution package. On a single machine,
> > there are 20+ celery workers running, each with their own Python
> > process. We had some issues with the processes attempting to use the
> > same SQLAlchemy connections (I think because they are referenced by
> > file descriptors, and so can cross process boundaries?), and so we're
> > now using a NullPool.
> >
> > But is there a way of doing this more efficiently than a NullPool?
>
> Yes! Though if NullPool is doing the job, there might not be a need to
> switch.
>
> > There are two suggestions in the docs
> > <http://docs.sqlalchemy.org/en/latest/core/pooling.html#using-connection-pools-with-multiprocessing>:
> >
> > * Dispose the connection as soon as the other process receives it.
> >   This wouldn't work with celery, since the workers are started on
> >   their own, not forked from another Python process.
>
> If the worker isn't forked from another Python process, then if you call
> create_engine() in that process, that's a brand-new engine. You wouldn't
> be re-using connections across processes in this case. Only if forking
> is happening can the problem you refer to occur. If there's no forking,
> then the process does not inherit any state.
>
> However, if Celery is using Python multiprocessing, it is doing forking
> and there is a parent Python process.
>
> > * Implement a listener that invalidates connections created in another
> >   process. I'm unclear how this works and whether it would be
> >   practical in this context.
>
> This is a simple recipe, straight from what we do in OpenStack, and
> again, it only makes sense if there's forking going on. If Celery is
> starting brand-new Python processes that don't inherit any state from a
> parent process, and the worker itself also isn't forking, then there's
> nothing to do.
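For reference, the invalidation-listener recipe from the pooling docs linked above looks roughly like the following; the SQLite URL here is only a stand-in so the sketch is self-contained:

```python
import os

from sqlalchemy import create_engine, event, exc

engine = create_engine("sqlite://")  # stand-in URL; use your real database URL

@event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    # Remember which process created this connection.
    connection_record.info["pid"] = os.getpid()

@event.listens_for(engine, "checkout")
def checkout(dbapi_connection, connection_record, connection_proxy):
    # If a forked child checks out a connection the parent created,
    # invalidate it so the pool opens a fresh one instead.
    pid = os.getpid()
    if connection_record.info["pid"] != pid:
        connection_record.connection = connection_proxy.connection = None
        raise exc.DisconnectionError(
            "Connection record belongs to pid %s, "
            "attempting to check out in pid %s"
            % (connection_record.info["pid"], pid)
        )
```

As Mike says, this only matters when forking is actually happening; in a never-forked process the pid check always passes and the listeners are inert.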
> > Specifically, given the number of processes we have running on each
> > machine, is the pool that a process is selecting & testing from shared
> > between all processes on that machine?
>
> It's not, since the pool is an application-level pool. It exists within
> the space of a Python interpreter and is therefore local to a process.
> So when you have a lot of processes, pooling of this style is hard,
> because the pooling cannot coordinate between processes.
>
> For Postgresql backends, there's already a solution, which is to use
> PgBouncer: https://wiki.postgresql.org/wiki/PgBouncer  For PG, I'd stick
> with NullPool and just use PgBouncer as the pool. The issue with
> PgBouncer is that it's entirely Postgresql-specific, and there aren't
> really similar solutions for other backends.
>
> For MySQL and others, there's not really a software solution out there,
> with the possible exception of SQLRelay; but SQLRelay is kind of an odd
> quasi-commercial product with very complex licensing and an unfortunate
> home on SourceForge.
>
> One of the things I'm doing at Red Hat is attempting to address this
> issue in some way that covers MySQL and backends in general. Some
> approaches I'm looking into include building a coordination service that
> communicates with the otherwise process-local pools, building a system
> that somehow coordinates with an HAProxy service, or, as a last resort,
> building a brand-new PgBouncer / SQLRelay-like solution that isn't
> hardwired to Postgresql and isn't burdened by weird commercial licensing
> like SQLRelay.
>
> > If a pool is shared across 20 processes, the chances that a connection
> > chosen at random would have been created in that process are fairly
> > low...
> >
> > It's also possible we're woefully misunderstanding how this works;
> > corrections appreciated.
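On the application side, the PG-plus-PgBouncer arrangement described above is just NullPool on the engine. A sketch, with a SQLite URL standing in so it runs anywhere; a real setup would point at PgBouncer's listen port (commonly 6432, an assumption about the bouncer's configuration):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# NullPool opens a real DBAPI connection per checkout and closes it on
# release; with PgBouncer in front, that "reconnect" is a cheap handshake
# with the local bouncer, which does the actual pooling. A real engine
# might look like:
#   create_engine("postgresql://app@127.0.0.1:6432/mydb", poolclass=NullPool)
engine = create_engine("sqlite://", poolclass=NullPool)

with engine.connect() as conn:
    result = conn.execute(text("select 1")).scalar()
```

Because the pool holds nothing, this also sidesteps the fork problem entirely, which is why NullPool is the low-effort answer for multi-process setups.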
> I haven't worked with Celery, so you might want to get a clear answer on
> whether the create_engine() you're calling is in fact in a parent
> process that gets inherited (e.g. forked), or if it really is called
> individually in totally distinct Python processes. If the latter, you
> wouldn't have any shared connections between processes.
>
> > Many thanks,
> > Max

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.