On Sunday, April 19, 2020 at 4:44:56 PM UTC-4, Ben wrote:

> These seem to be embedded in the related SQL implementations but are
> clearly not ANSI standard. I'm not sure if that makes it disqualifying for
> a SQLAlchemy feature request, or if anyone else could even use it, but
> functionality like this is something that, at least for me, would make my
> implementation more DB independent.
The major database servers and clients do have similar extensions to ANSI and the Python DBAPI here, but they differ considerably from one another in how they work, in their input/output formats, and in their capabilities. They also are not necessarily supported by the underlying database drivers. Psycopg2 has "copy_*" methods on the cursor (https://www.psycopg.org/docs/usage.html#copy); there are several MySQL drivers, and of the two I looked at, neither has explicit support for this. I'm not sure how this would really work in any case - the feature is basically designed to be implemented in the C clients with local files.

The "bulk loading" functionality in SQLAlchemy is largely in the realm of low-level DBAPI operations for INSERT statements, and SQLAlchemy's performance there is about as close as you can get to using the driver itself - while still having the syntactic sugar and not needing to write against a specific database. I don't believe there is any code to invoke a database client to read a source file itself; I am often mistaken though, as Mike has put a ton of work into SQLAlchemy over the years.
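For the PostgreSQL case specifically, you can drop down to the psycopg2 cursor from a SQLAlchemy Engine yourself and use its non-standard copy_expert() method. A minimal sketch - the helper names, table, and columns here are my own illustrations, not SQLAlchemy API:

```python
def copy_sql(table, columns):
    """Build a PostgreSQL COPY ... FROM STDIN statement (text format)."""
    return "COPY {} ({}) FROM STDIN WITH (FORMAT text)".format(
        table, ", ".join(columns)
    )


def bulk_load(engine, table, columns, fileobj):
    """Bulk-load a file-like object via psycopg2's copy_expert().

    engine.raw_connection() returns the underlying DBAPI connection
    (psycopg2 here), whose cursor exposes the copy_* extensions that
    SQLAlchemy itself does not wrap.
    """
    raw_conn = engine.raw_connection()
    try:
        with raw_conn.cursor() as cur:
            cur.copy_expert(copy_sql(table, columns), fileobj)
        raw_conn.commit()
    finally:
        raw_conn.close()
```

Usage would be something like `bulk_load(create_engine("postgresql+psycopg2://..."), "users", ["id", "name"], open("users.tsv"))` - but note this ties the script to psycopg2, which is exactly the DB-dependence the original question was trying to avoid.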
From experience with large imports, I can tell you the following:

* The best performance for large updates involves:
  1) take the database offline to clients
  2) drop (and copy) all the foreign key and check constraints
  3) import your data from a file with the native client that ships with your database
  4) add the foreign key and check constraints back in

  If you feel the need to script this, you could use `subprocess` to invoke the database's native client on the machine.

* The second best performance: use SQLAlchemy's bulk strategies; segment out the ranges of your source data file; run multiple processes in parallel, each processing a subset of the source data file.

* The third best performance: write a script that uses the SQLAlchemy ORM with the "windowed query" option to iterate over a range of lines in your source file; have the script use a 3rd party tool like Redis to mark that it has "checked out" sections of the source data, and has completed/not-completed that range; run multiple processes in parallel, each processing a subset of the source data file.

In the second and third approaches, you can periodically run a query or script to calculate the right number of processes to run. It could be 7, or 20, or another number.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/88842622-82e1-4679-8bd4-bd61a7020604%40googlegroups.com.
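If you do script the first approach with `subprocess`, the call for PostgreSQL might look like this sketch; psql's `\copy` makes the client itself read the file. The function names, database URL, and paths are illustrative only:

```python
import subprocess


def copy_command(db_url, table, path):
    """Build the psql invocation; \\copy reads the file client-side."""
    return [
        "psql",
        db_url,
        "-c",
        "\\copy {} from '{}' with (format csv)".format(table, path),
    ]


def native_import(db_url, table, path):
    # check=True raises CalledProcessError if psql exits nonzero,
    # so a failed import stops the surrounding script.
    return subprocess.run(copy_command(db_url, table, path), check=True)
```

The equivalent for MySQL would invoke the `mysql` client with a `LOAD DATA LOCAL INFILE` statement instead; either way, the dropping and re-adding of constraints around the import would be separate steps in the same script.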