Re: [sqlalchemy] SQLAlchemy + async

Fantix King Tue, 05 Sep 2017 01:35:43 -0700

>
> > Hence my question: what would 
> > happen if I try to use blocking Sqlalchemy ORM in an asyncio app? I 
> mean, I 
> > need async because of slow endpoints, for example doing HTTP requests to 
> 3rd 
> > parties. 
>
> Yeah I would offload that work into a queue and separate the CRUD part 
> of the application from the "make lots of slow backend requests" part 
> of it.



Yes! Use Celery for example. The CRUD part is normally the core business 
logic
which you don't want it broken by e.g. newly hired junior developers. It is 
both safe
and fast (fast as in coding and fast as in executing) to keep it in a 
blocking model,
if applicable, comparing to asynchronous programming. On the other hand, 
the I/O
bound code base is relatively less changing normally, and you can use 
Celery or
asyncio or multithreading or multiprocessing or whatever works because it 
is isolated
to be a simple task with a clear target.
 

> Explicit async and ORMs are not compatible with each other 
> from a programming point of view - the philosophy of explicit async is 
> that no IO should ever occur without the programmer typing out a bunch 
> of boilerplate saying that it is OK to do IO, whereas the ORM's main 
> purpose is to transparently fetch and persist data *without* 
> boilerplate. 
>

Exactly! That's also what I wanted to try with GINO - to have an explicit 
yet
objective data interface at the same time, and see what it could bring. So 
far
in the last two months, basic CRUD seemed working well. But when it comes
to relationships, things became harder, and probably need a different way 
out.


> Additionally, if you mix up making lots of slow HTTP requests to 
> backends with your CRUD, you also have database transactions being 
> held open for a long time just waiting - that will require many more 
> database connections which are fairly expensive in Postgresql as well 
> as idle transactions locking rows, which you want to avoid. 
>

Couldn't agree more on this. It is probably the same reason why the
Pyramid user suggested not to use implicit async (gevent) and ORM at the
same time - when you think it is safe to make a long HTTP call after an
explicit DB commit, it may actually not - some apparently innocent attribute
access may magically start a new transaction and lock the row in DB under
the hood, and soon cause a snowball disaster. Explicit async helps marking
the dangers, but only when the driver knew the way.
 
Of course there are scenarios when it is essential to access DB in async
code, subscribing to a PostgerSQL notification channel for example. In our
case we used asyncpg and wrote the transaction boundaries carefully. It is
fine to do this, but it would require a lot more effort to correctly make
**everything** async, and as stated in Mike's post, it is meaningless and 
not
a must in most cases.

> The database and ORM is really fast, at least for the common 
> > queries what you do in a REST API. So what would happen in theory if I'd 
> try 
> > to use SQLAlchemy ORM from something like sanic? 
> > https://github.com/channelcat/sanic 
>

If the risk Mike mentioned about long-opening DB transactions can be well 
handled,
it is theoretically doable, with another trouble I'll mention. We tried 
this. The very basic
and working solution is to handle sessions manually from scratch whenever 
DB access
is needed - borrow a connection and create a new session at the beginning, 
and make
sure the session is closed and connection is returned afterwards. It would 
be safe if no
`await` occur during DB access (so that DB transactions won't be blocked) 
and no
long-running SQL was emitted (so that the main loop won't be blocked for 
long). However
due to the implicit nature of traditional ORMs, it is still possible to get 
messed up when
e.g. accessing data object attributes outside those explicit DB access 
blocks. Then
it would be natural to encapsulate all "dangerous" DB-accessing code, into 
a thread pool
for example, exposing API with primitive-only parameters, in order to get 
rid of the
transaction boundary leaking issue and support long-running SQL safely. 
But, it is actually
similar to the task queue solution Mike suggested - to separate async code 
from sync, and
it may even be more complex to separate sync code from async. So, task 
queue is still the
best option I guess.

@zsolt Please give GINO a try, if async DB access happened to be essential 
in your case.
You don't have to use full GINO functionality, just treat it as a 
SQLAlchemy core table
definer and clause compiler (using only a few hundreds of lines of code in 
declarative.py
and dialect.py), you are then only dealing with vanilla asyncpg for the 
rest.

BR,
Fantix

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] SQLAlchemy + async

Reply via email to