> > > Hence my question: what would > > happen if I try to use blocking Sqlalchemy ORM in an asyncio app? I > mean, I > > need async because of slow endpoints, for example doing HTTP requests to > 3rd > > parties. > > Yeah I would offload that work into a queue and separate the CRUD part > of the application from the "make lots of slow backend requests" part > of it.
Yes! Use Celery for example. The CRUD part is normally the core business logic which you don't want it broken by e.g. newly hired junior developers. It is both safe and fast (fast as in coding and fast as in executing) to keep it in a blocking model, if applicable, comparing to asynchronous programming. On the other hand, the I/O bound code base is relatively less changing normally, and you can use Celery or asyncio or multithreading or multiprocessing or whatever works because it is isolated to be a simple task with a clear target. > Explicit async and ORMs are not compatible with each other > from a programming point of view - the philosophy of explicit async is > that no IO should ever occur without the programmer typing out a bunch > of boilerplate saying that it is OK to do IO, whereas the ORM's main > purpose is to transparently fetch and persist data *without* > boilerplate. > Exactly! That's also what I wanted to try with GINO - to have an explicit yet objective data interface at the same time, and see what it could bring. So far in the last two months, basic CRUD seemed working well. But when it comes to relationships, things became harder, and probably need a different way out. > Additionally, if you mix up making lots of slow HTTP requests to > backends with your CRUD, you also have database transactions being > held open for a long time just waiting - that will require many more > database connections which are fairly expensive in Postgresql as well > as idle transactions locking rows, which you want to avoid. > Couldn't agree more on this. It is probably the same reason why the Pyramid user suggested not to use implicit async (gevent) and ORM at the same time - when you think it is safe to make a long HTTP call after an explicit DB commit, it may actually not - some apparently innocent attribute access may magically start a new transaction and lock the row in DB under the hood, and soon cause a snowball disaster. Explicit async helps marking the dangers, but only when the driver knew the way. Of course there are scenarios when it is essential to access DB in async code, subscribing to a PostgerSQL notification channel for example. In our case we used asyncpg and wrote the transaction boundaries carefully. It is fine to do this, but it would require a lot more effort to correctly make **everything** async, and as stated in Mike's post, it is meaningless and not a must in most cases. > The database and ORM is really fast, at least for the common > > queries what you do in a REST API. So what would happen in theory if I'd > try > > to use SQLAlchemy ORM from something like sanic? > > https://github.com/channelcat/sanic > If the risk Mike mentioned about long-opening DB transactions can be well handled, it is theoretically doable, with another trouble I'll mention. We tried this. The very basic and working solution is to handle sessions manually from scratch whenever DB access is needed - borrow a connection and create a new session at the beginning, and make sure the session is closed and connection is returned afterwards. It would be safe if no `await` occur during DB access (so that DB transactions won't be blocked) and no long-running SQL was emitted (so that the main loop won't be blocked for long). However due to the implicit nature of traditional ORMs, it is still possible to get messed up when e.g. accessing data object attributes outside those explicit DB access blocks. Then it would be natural to encapsulate all "dangerous" DB-accessing code, into a thread pool for example, exposing API with primitive-only parameters, in order to get rid of the transaction boundary leaking issue and support long-running SQL safely. But, it is actually similar to the task queue solution Mike suggested - to separate async code from sync, and it may even be more complex to separate sync code from async. So, task queue is still the best option I guess. @zsolt Please give GINO a try, if async DB access happened to be essential in your case. You don't have to use full GINO functionality, just treat it as a SQLAlchemy core table definer and clause compiler (using only a few hundreds of lines of code in declarative.py and dialect.py), you are then only dealing with vanilla asyncpg for the rest. BR, Fantix -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.