Simon,

The MySQLdb module performs this operation with very little difference from native (mysql < data.sql) speed, so I'm not concerned about that; the delta was <0.5 seconds.
In my original example (I didn't know I could pass a dict to execute() to have it fill in the parameters), I built the complete raw SQL by calling MySQLdb's escape_string on each value; formatting that SQL took slightly over 1 second, and I didn't even try to optimize that code. My bigger concern/curiosity is the overhead that SQLAlchemy Core is adding to these inserts, and figuring out why... One might say I'm giving this too much attention because I have a working, acceptable and optimal solution, but that solution bypasses the issue, and really bypasses sqlalchemy as a whole. In the end, I'm pretty sure it's related to the size of my dataset, but the time delta between doing things manually and letting sqlalchemy do its job is just too great to go unmentioned.

On Friday, October 5, 2018 at 4:27:11 AM UTC-5, Simon King wrote:
>
> MySQLdb builds a new SQL string by escaping each of the parameters and
> then substituting them into the original SQL:
>
> https://github.com/farcepest/MySQLdb1/blob/master/MySQLdb/cursors.py#L182
>
> For 89 columns * 10,550 rows, that's nearly 1 million parameters. Each
> one involves a call to "db.literal(item)", which in turn calls
> "self.escape(item, self.encoders)". It looks like self.escape is
> implemented in C.
>
> This means that you've got at least 2 million function calls (which
> are relatively slow in Python) just to build the SQL string to send to
> the server.
>
> Simon
>
> On Fri, Oct 5, 2018 at 12:01 AM <adam....@gmail.com> wrote:
> >
> > Mike,
> >
> > Er, yeah, not TSQL, my old MSSQL days are showing :D I meant a str() of SQL.
> >
> > I'm describing bindparams as the required input format of the params used to re-bind into the SQL in the optimal way; for example, in execute(__table__.insert(), bindparams) it would represent a list of dicts.
> >
> > I did a packet capture of execute(__table__.insert(), list(dict())) and it was doing the executemany-style calls (insert ...
(a, b) values (c, d), (e, f)), so there was no hangup there.
> >
> > I also just re-coded this to call MySQLdb and run executemany(str, [tuple()]); it takes about the same amount of time (~8s) as execute(__table__.insert(), list(dict())), so there must be something going on in the DBAPI that's unoptimal, as I can generate the proper SQL and insert it in ~3 seconds with a normal .execute(str).
> >
> > I tried the test case with PyMySQL with just the execute(str) case and it took about 2 seconds, so I didn't give it much more attention; I'll try it again with an executemany though.
> >
> > 2018-10-04 17:47:02,373 - INFO - Executing for 10,550 inserts, 89 columns, approx 6,252,460 bytes of raw data...
> > 2018-10-04 17:47:02,373 - INFO - Generating datasets
> > 2018-10-04 17:47:02,550 - INFO - Executing with raw_connection().executemany(str, [tuple(a, b, ...)]
> > 2018-10-04 17:47:10,061 - INFO - Committing
> >
> > log.info('Generating datasets')
> > sql_str = 'INSERT IGNORE INTO table ({}) VALUES ({})'.format(
> >     ','.join(data[0].keys()), ','.join(['%s'] * len(data[0])))
> > sql_data = [tuple(part.values()) for part in data]
> > log.info('Executing with raw_connection().executemany(str, [tuple(a, b, ...)]')
> > conn = sql.engine.raw_connection()
> > cur = conn.cursor()
> > cur.executemany(sql_str, sql_data)
> > log.info('Committing')
> > conn.commit()
> >
> > I still don't get the difference between sqlalchemy's execute(__table__.insert().values(dict())) though, and why it would be so much slower than execute(__table__.insert(), dict()), nor why execute(str, dict()) is that slow as well.
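For reference, the same executemany() pattern can be sketched against the stdlib sqlite3 driver, which follows the same PEP 249 interface as MySQLdb (placeholders are ? instead of %s; the table and data here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE target (a INTEGER, b TEXT)")

# one executemany() call with a list of parameter tuples;
# the driver decides how to batch the rows on the wire
data = [(i, "row%d" % i) for i in range(10000)]
cur.executemany("INSERT INTO target (a, b) VALUES (?, ?)", data)
conn.commit()

cur.execute("SELECT COUNT(*) FROM target")
print(cur.fetchone()[0])  # 10000
```

This single executemany() call is the same DBAPI entry point SQLAlchemy Core uses when a list of parameter dicts is passed to execute().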
> >
> > On Thursday, October 4, 2018 at 4:47:31 PM UTC-5, Mike Bayer wrote:
> >>
> >> On Thu, Oct 4, 2018 at 4:04 PM <adam....@gmail.com> wrote:
> >> >
> >> > Mike,
> >> >
> >> > No, I'm not calling execute 10k times, but I am doing what they do in your example with test_core_insert.
> >> >
> >> > My code for that is:
> >> > log.info('Executing sqlalchemy insert() with bindparams as execute param - session.execute(__table__.insert(), bindparams)')
> >> > sql.session.execute(models.target.__table__.insert().prefix_with('IGNORE'), data)
> >> > log.info('Committing')
> >> > sql.session.commit()
> >> >
> >> > where data = [OrderedDict(), OrderedDict(), ...] (I tried converting to a regular [dict(), dict(), ...] too, with no gains); this method is the fastest outside of direct or near-direct DBAPI calls, at about 8 seconds.
> >> >
> >> > However, I read somewhere (can't find the source again to cite, but I thought it was the sqlalchemy docs; I may just be remembering incorrectly) that this query chains together individual INSERT statements instead of doing a multi-insert,
> >> > i.e. insert into target (a, b, ...) values (c, d, ...); insert into target (a, b, ...) values (e, f, ...); ... instead of insert into target (a, b, ...) values (c, d, ...), (e, f, ...)
> >> > where the latter is the most optimal TSQL
> >>
> >> I'm not sure what "TSQL" means in this context.
> >>
> >> SQLAlchemy uses cursor.executemany() to invoke the statement just
> >> once; the DBAPI decides how it wants to batch it. Regardless, it
> >> should move very fast. On the Postgresql side, there's more of a big
> >> speed difference between the way psycopg2 executes a statement
> >> normally and some special modes it provides that allow it to chain
> >> together multiple statements in one batch. But that's still behind
> >> the executemany() call.
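A minimal sketch of the executemany path Mike describes, using SQLAlchemy Core against an in-memory SQLite engine rather than the mysql+mysqldb engine from the thread (table and rows are invented for illustration):

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

engine = create_engine("sqlite://")
metadata = MetaData()
target = Table(
    "target", metadata,
    Column("a", Integer),
    Column("b", String),
)
metadata.create_all(engine)

rows = [{"a": i, "b": "row%d" % i} for i in range(10000)]

# a single execute() call with a list of dicts takes the DBAPI
# executemany() path; SQLAlchemy does not emit 10,000 statements
with engine.begin() as conn:
    conn.execute(target.insert(), rows)

with engine.connect() as conn:
    print(len(conn.execute(target.select()).fetchall()))  # 10000
```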
> >>
> >> >
> >> > If I'm wrong there and it does do a multi-insert, what is the difference between execute(__table__.insert().values(bindparams)) and execute(__table__.insert(), bindparams), and why would the former be ~26 seconds slower in this use case?
> >>
> >> What is "bindparams" here, that is a list of many tuples? If so
> >> then yes, the DBAPI is not batching efficiently in the latter case.
> >> Try PyMySQL or mysqlclient.
> >>
> >> > On Thursday, October 4, 2018 at 2:05:59 PM UTC-5, Mike Bayer wrote:
> >> >>
> >> >> On Thu, Oct 4, 2018 at 2:45 PM <adam....@gmail.com> wrote:
> >> >> >
> >> >> > Hi all,
> >> >> >
> >> >> > Having some very major performance issues using sqlalchemy to insert data; I've been trying many different ways, but none seem to hold even a close candle to raw MySQLdb connections.
> >> >> >
> >> >> > Versions:
> >> >> > sqlalchemy: 1.2.9
> >> >> > MySQLdb: 1.3.13 (bound with mysql+mysqldb connector)
> >> >> > MySQL: 14.14
> >> >> > Python: 3.6.5
> >> >> > Ubuntu 18.04 x86_64
> >> >> >
> >> >> > Server:
> >> >> > 8GB ram
> >> >> > 2 x Intel(R) Xeon(R) CPU X5650 @ 2.67GHz threads running in VMWare ESXi
> >> >> >
> >> >> > Connected to localhost
> >> >> >
> >> >> > Base data info:
> >> >> > 2018-10-04 13:19:56,519 - INFO - Executing for 10,550 inserts, 89 columns, approx 6,252,460 bytes of raw data...
> >> >> >
> >> >> > Pre-formatted session execute, ~2 seconds:
> >> >> > 2018-10-04 13:19:56,520 - INFO - Building raw TSQL including all escaped value()s - insert into x (a,b, ...) values (escape_string(c), escape_string(d), ..) (escape_string(e), escape_string(f), ...)
> >> >> > 2018-10-04 13:19:57,640 - INFO - Executing raw TSQL (9,218,872 bytes) - session.execute(tsql)
> >> >> > 2018-10-04 13:19:59,435 - INFO - Committing
> >> >> >
> >> >> > Raw MySQLdb API, ~1 second:
> >> >> > 2018-10-04 13:19:59,436 - INFO - Executing same raw TSQL directly via MySQLdb API - engine.raw_connection(); cursor.execute(tsql)
> >> >> > 2018-10-04 13:20:00,326 - INFO - Committing
> >> >> >
> >> >> > Bindparam-formatted TSQL converted to text(), executed with bindparams as a dict, ~12 seconds to build text() and ~19 seconds to execute, ~31 seconds total:
> >> >> > 2018-10-04 13:20:00,326 - INFO - Building raw TSQL - insert into x (a, b, ...) values (:a1, :b1, ..), (:a2, :b2, ...)
> >> >> > 2018-10-04 13:20:01,017 - INFO - Rebuilding input data for TSQL bindparams - {a1: d, b1: e, a2: f, b2: g}
> >> >> > 2018-10-04 13:20:02,234 - INFO - Building text() object - sql.text(tsql)
> >> >> > 2018-10-04 13:20:14,259 - INFO - Executing text() with bindparams - session.execute(text, bindparams)
> >> >> > 2018-10-04 13:20:33,552 - INFO - Committing
> >> >> >
> >> >> > Bindparam-formatted TSQL NOT converted to text(), executed with bindparams as a dict, ~33 seconds:
> >> >> > 2018-10-04 13:20:33,552 - INFO - Executing raw TSQL with bindparams - session.execute(tsql, bindparams)
> >> >> > 2018-10-04 13:21:06,307 - INFO - Committing
> >> >> >
> >> >> > Using a sqlalchemy class table insert() using .values(), ~34 seconds:
> >> >> > 2018-10-04 13:21:06,311 - INFO - Executing sqlalchemy insert w/ bindparams as values - session.execute(__table__.insert().values(bindparams))
> >> >> > 2018-10-04 13:21:40,808 - INFO - Committing
> >> >> >
> >> >> > Using a sqlalchemy class table insert() passing bindparams to session.execute(), ~8 seconds:
> >> >> > 2018-10-04 13:21:40,809 - INFO - Executing sqlalchemy insert() with bindparams as execute param - session.execute(__table__.insert(), bindparams)
> >> >> > 2018-10-04 13:21:48,084 - INFO -
Committing
> >> >> >
> >> >> > Obviously the raw DB API will be the fastest as there's little to no processing or overhead; I would think session.execute(str) would come closer to matching that, but it's not too far off.
> >> >>
> >> >> A Core insert should be only slightly slower than the raw DBAPI;
> >> >> however, if you're inserting lots of rows, you need to be using
> >> >> executemany style, that is, a single execute() call with a list of
> >> >> parameters. There are no code examples here so I don't know what
> >> >> you're actually doing; if you are running session.execute() itself 10K
> >> >> times, that's the wrong way to do it. Check out
> >> >> https://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html
> >> >> for usage examples of how to bulk-insert with the different APIs, including
> >> >> Core (test_core_insert specifically). Additionally, I don't know
> >> >> if you are using bound parameters with your raw MySQL code or not, and
> >> >> if so, whether there is some unicode processing going on in the SQLAlchemy
> >> >> version that might be slowing things down.
> >> >>
> >> >> A complete code example like those you see in
> >> >> https://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html
> >> >> will allow me to see what you are doing.
> >> >>
> >> >> >
> >> >> > Having sqlalchemy.text(str) take 12 seconds to build seems rather excessive, but I'm assuming that's the hang with the other slow cases like execute(__table__.insert().values(bindparams)) too; however, why would execute(str, bindparams) do this? Does it shadow text() as well if arg0 is a str?
> >> >> >
> >> >> > Now the oddest part of all: doing 10,550 individual inserts in the last example is roughly 26 seconds faster than a single transaction with 10,550 sets of values. This just does not make transactional sense.
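To make the two forms from the timings above concrete, here is a sketch with a throwaway Table showing what each one compiles to (exact bound-parameter names may vary by SQLAlchemy version):

```python
from sqlalchemy import Column, Integer, MetaData, Table

metadata = MetaData()
target = Table("target", metadata, Column("a", Integer), Column("b", Integer))

# insert().values([...]) embeds every row as distinct bound parameters
# in one multi-VALUES statement; building and compiling this string is
# what gets expensive as the row count grows
multi = target.insert().values([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
print(str(multi))  # INSERT INTO target (a, b) VALUES (...), (...)

# a plain insert() compiles once; the rows passed to execute() are
# handed to the DBAPI's executemany() as separate parameter sets
plain = target.insert()
print(str(plain))  # INSERT INTO target (a, b) VALUES (:a, :b)
```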
> >> >> >
> >> >> > Is there something I can be doing better while utilizing Core, or am I over-expecting how sqlalchemy should perform with datasets like this?
> >> >> >
> >> >> > Hopefully this is enough detail, but I can provide more upon request.
> >> >> >
> >> >> > Thank you!
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.