On Wed, May 6, 2020, at 11:24 AM, Colton Allen wrote:
> Specifically: https://docs.sqlalchemy.org/en/13/errors.html#error-3o7r
> 
> I think I've got a good handle on what the problem is. I just don't have the 
> experience to know how to solve it effectively and with confidence.
> 
> Just some of my application's stats:
> 
> - Postgres database hosted with Amazon RDS (currently a t3.small).
> - Default QueuePool limits: 5 + 10 overflow.
> - Using flask + celery + gevent + concurrency of 1,000.
> 
> I only get this error when I'm saturating my celery queue. So I'm thinking I 
> can either reduce the concurrency or just add more connections. I'm hesitant 
> to reduce concurrency because the goal is to have even more throughput. 

a concurrency of 1000 is a number people reach for pretty often, but in my experience it is very high. having a pool of worker processes with a concurrency of 100-200 per process is more realistic.

also consider that if you are trying to get throughput by having 1000 jobs present at once, all waiting on results, then your database needs to keep up with that as well. so yes, you would have to allow that many connections at the same time, and also tune your database to be ready for that load.


> 
>  I could add more database connections (I think I have ~170 available in 
> total) but I'm unsure of the implications that action has (my app is the only 
> app that uses the database).

if 170 connections is your total available, then I don't see how you can run 1000 greenlets at once unless they only occasionally use the database. also, to get the error you are seeing, you not only have to request more connections than are actually available, your application also has to **time out** waiting for one, which takes 30 seconds by default; that indicates you have greenlets waiting a full 30 seconds for a connection.
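
for reference, the pool limits and that 30 second timeout are all settable on create_engine(); the numbers below are made up purely for illustration, not a recommendation, and the URL is a placeholder:

    from sqlalchemy import create_engine

    # illustrative numbers only -- settle on real values via load testing
    engine = create_engine(
        "postgresql://user:pass@localhost/mydb",  # placeholder URL
        pool_size=50,      # connections the pool keeps open persistently
        max_overflow=20,   # extra connections allowed beyond pool_size
        pool_timeout=30,   # seconds a checkout waits before raising the 3o7r error
    )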

the "1000 greenlet" number is pretty familiar to me from the OpenStack world, but my experience has been that a value this high tends to be extremely optimistic. It's likely better to start with small settings for everything, run load tests from the outside, and increase the settings until load testing shows the desired throughput. if numbers like the greenlet count are set too high, you have jobs or requests starting up, being placed into the system, and then just waiting for resources that aren't available.

I don't know if Celery also supports a multi-process model, but if you have multiple processes, you are better off having new jobs / requests rejected by a process that is already handling a few hundred greenlets and picked up by another process instead. I am more familiar with this calculus from the web request side of things and not as much with how Celery handles it.


> 
> If in the future I decide to add more concurrency is there some sort of 
> equation I should follow. Like "database connections * n = concurrency limit".

this depends highly on what your application does as well as what tolerance you have for jobs / requests that are left waiting. if every job uses a single database connection from start to finish, then you need to be able to have a database connection open for every job, or at least enough of them that jobs waiting for a connection don't wait longer than the configured timeout. or you could have a small number of connections and a huge timeout so that jobs simply wait a long time. I tend to think about this visually, in terms of things waiting to access some resource that is in scarce supply.
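
as a very rough back-of-the-envelope sketch (every number below is hypothetical, including the guess at how much of each job's lifetime is spent holding a connection):

    # hypothetical figures, purely illustrative
    processes = 4                 # celery worker processes
    greenlets_per_process = 200   # gevent concurrency within each process
    db_fraction = 0.25            # rough share of a job's lifetime spent holding a connection

    # approximate number of simultaneously checked-out connections;
    # this has to fit under the database's connection limit with headroom to spare
    peak_connections = processes * greenlets_per_process * db_fraction
    print(peak_connections)       # 200.0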


> 
> How does sqlalchemy decide to create a new connection(I'm using 
> *scoped_session* scoped to flask's *_app_ctx_stack.__ident_func__*)? Does it 
> happen every time *scoped_session()* is called?

scoped_session() is a registry that constructs a new Session() object, or, if one is already associated with the current "scope" (such as a greenlet or thread), returns that one. no database connection is procured when the Session is first constructed. once the Session is asked to run a database query, it requests a connection from the engine. assuming the Session is configured in its default non-autocommit mode, it holds onto that connection until the rollback(), commit(), or close() method is called.
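
as a minimal sketch of that lifecycle (the URL is a placeholder; the scopefunc matches what you described):

    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import scoped_session, sessionmaker
    from flask import _app_ctx_stack

    engine = create_engine("postgresql://user:pass@localhost/mydb")  # placeholder URL
    Session = scoped_session(
        sessionmaker(bind=engine),
        scopefunc=_app_ctx_stack.__ident_func__,
    )

    session = Session()                # constructs, or returns, the Session for this scope
    session.execute(text("SELECT 1"))  # first statement checks a connection out of the pool
    session.commit()                   # the connection goes back to the pool here
    Session.remove()                   # close and discard the Session at the end of the scope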


> 
> Let's say I want to scale up to a million concurrent users (as unlikely as 
> that is) surely I can't have 100k active database connections on a single 
> database. 

probably not

> 
> 
> I'd have to divide them among a couple slave databases. So then how does the 
> master database handle the write load? Would I start putting my write 
> operations into a queue and batching them?

you are here on the path to seeking higher TPS. This is the whole game really, and like anything else there is a whole world of practices and techniques for working it out. in MySQL, for example, we have a cluster called Galera that does multi-master clustering, but it still serializes transaction commits across the cluster, so in that regard it's more of an HA solution than a performance boost. I think the two primary hammers we can use to get TPS are 1. vertical scaling and 2. partitioning. #1 will get you pretty far and #2 will get you the rest of the way. googling for "tps postgresql" and "tps databases" leads to a lot of blogs about CPUs and whatnot. for partitioning, PostgreSQL offers native partitioning features that might be worth looking into. I tried to find some good intros but they are mostly in the weeds of something very specific, so the overall picture on Wikipedia might be useful: 
https://en.wikipedia.org/wiki/Partition_(database)
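
as a tiny illustration of the native feature (table and column names are made up; this is PostgreSQL 10+ declarative partitioning issued through SQLAlchemy):

    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://user:pass@localhost/mydb")  # placeholder URL

    with engine.begin() as conn:
        # parent table declares how rows are routed to partitions
        conn.execute(text("""
            CREATE TABLE measurements (
                id BIGINT,
                logged_at TIMESTAMP NOT NULL,
                value NUMERIC
            ) PARTITION BY RANGE (logged_at)
        """))
        # each child table holds one month of rows
        conn.execute(text("""
            CREATE TABLE measurements_2020_05 PARTITION OF measurements
            FOR VALUES FROM ('2020-05-01') TO ('2020-06-01')
        """))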





> 
> Sorry for the brain dump but I'm very lost!
> 

