Mike,
Er, yeah, not TSQL; my old MSSQL days are showing :D I meant a str() of SQL.
By bindparams I mean the required input format of the params that get
re-bound into the SQL in the optimal way; for example, in
execute(__table__.insert(), bindparams), bindparams would be a list of dicts.
I did a packet capture of execute(__table__.insert(), [dict(), ...]) and it
was doing the executemany-style call (insert ... (a, b) values (c, d), (e, f), ...),
so there was no hangup there.
I also re-coded this to call MySQLdb directly and run executemany(str,
[tuple(), ...]); it takes about the same amount of time (~8s) as
execute(__table__.insert(), [dict(), ...]), so there must be something
suboptimal going on in the DBAPI, since I can generate the multi-row SQL
myself and insert it in ~3 seconds with a plain .execute(str).
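For context, the "generate the proper SQL myself" approach is roughly this
kind of client-side multi-row statement builder (table and column names here
are placeholders, not my real schema):

```python
def build_multirow_insert(table, columns, rows):
    # Collapse N rows into one "INSERT ... VALUES (...), (...), ..." statement
    # with a flat parameter list, so the driver sends a single statement
    # instead of re-binding the same one-row statement once per row.
    row_ph = "(" + ",".join(["%s"] * len(columns)) + ")"
    placeholders = ",".join([row_ph] * len(rows))
    sql = "INSERT IGNORE INTO {} ({}) VALUES {}".format(
        table, ",".join(columns), placeholders)
    params = [v for row in rows for v in row]
    return sql, params

sql, params = build_multirow_insert("target", ["a", "b"], [(1, "x"), (2, "y")])
# sql == "INSERT IGNORE INTO target (a,b) VALUES (%s,%s),(%s,%s)"
# params == [1, "x", 2, "y"]
```

The resulting sql/params pair can then go to a plain cursor.execute(sql, params).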
I tried the test case with PyMySQL using just the execute(str) case and it
took about 2 seconds, so I didn't give it much more attention; I'll try it
again with executemany though.
2018-10-04 17:47:02,373 - INFO - Executing for 10,550 inserts, 89 columns,
approx 6,252,460 bytes of raw data...
2018-10-04 17:47:02,373 - INFO - Generating datasets
2018-10-04 17:47:02,550 - INFO - Executing with
raw_connection().executemany(str, [tuple(a, b, ...)])
2018-10-04 17:47:10,061 - INFO - Committing
log.info('Generating datasets')
# One-row INSERT with a %s placeholder per column; executemany()
# re-binds it once per row of sql_data.
sql_str = 'INSERT IGNORE INTO table ({}) VALUES ({})'.format(
    ','.join(data[0].keys()),
    ','.join(['%s'] * len(data[0].keys())))
sql_data = [tuple(part.values()) for part in data]
log.info('Executing with raw_connection().executemany(str, '
         '[tuple(a, b, ...)])')
conn = sql.engine.raw_connection()
cur = conn.cursor()
cur.executemany(sql_str, sql_data)
log.info('Committing')
conn.commit()
However, I still don't understand the difference between SQLAlchemy's
execute(__table__.insert().values(dict())) and execute(__table__.insert(),
dict()), why the former would be so much slower in this use case, or why
execute(str, dict()) is that slow as well.
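To make the comparison concrete, here's a minimal, self-contained sketch of
the two call styles (using SQLite and a throwaway table purely for
illustration; not my real schema or engine):

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

engine = create_engine("sqlite://")  # stand-in engine just for the sketch
metadata = MetaData()
target = Table("target", metadata,
               Column("a", Integer), Column("b", String(10)))
metadata.create_all(engine)
rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]

with engine.begin() as conn:
    # executemany path: the statement is compiled once and the whole
    # parameter list goes to the DBAPI in a single cursor.executemany() call.
    conn.execute(target.insert(), rows)
    # values() path: every row is inlined into one large multi-VALUES
    # construct, which must be built and compiled for that one call.
    conn.execute(target.insert().values(rows))
```

Both forms insert the same rows; the question is why the second compiles so
much more slowly at this row count.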
On Thursday, October 4, 2018 at 4:47:31 PM UTC-5, Mike Bayer wrote:
>
> On Thu, Oct 4, 2018 at 4:04 PM <[email protected]> wrote:
> >
> > Mike,
> >
> > No, I'm not calling execute 10k times, but I am doing what they do in
> your example with test_core_insert
> >
> > My code for that is:
> > log.info('Executing sqlalchemy insert() with bindparams as
> execute param - session.execute(__table__.insert(), bindparams)')
> >
> sql.session.execute(models.target.__table__.insert().prefix_with('IGNORE'),
> data)
> > log.info('Committing')
> > sql.session.commit()
> >
> > where data = [OrderedDict(), OrderedDict(), ...] (I tried converting to a
> regular [dict(), dict(), ...] too with no gains); this method is the fastest
> outside of direct or near-direct DBAPI calls, at about 8 seconds.
> >
> > However, I read somewhere (can't find the source again to cite, but I
> thought it was the sqlalchemy docs; I may just not be remembering
> correctly) that this query chains together individual INSERT statements
> instead of doing a multi-insert,
> > i.e. insert into target (a, b, ...) values (c, d, ...); insert into
> target (a, b, ...) values (e, f, ...); ... instead of insert into target
> (a, b, ...) values (c, d, ...), (e, f, ...),
> > where the latter is the most optimal TSQL
>
> I'm not sure what "TSQL" means in this context.
>
> SQLAlchemy uses cursor.executemany() to invoke the statement just
> once. the DBAPI decides how it wants to batch it. Regardless, it
> should move very fast. On the Postgresql side, there's more of a big
> speed difference between the way psycopg2 executes a statement
> normally, and some special modes it provides that allow it to chain
> together multiple statements in one batch. But that's still behind
> the executemany() call.
>
>
> >
> > If I'm wrong there and it does do a multi-insert, what is the difference
> between execute(__table__.insert().values(bindparams)) and
> execute(__table__.insert(), bindparams), and why would the former be ~26
> seconds slower in this use case?
>
> What is "bindparams" here, that is a list of many tuples ? If so
> then yes, the DBAPI is not batching efficiently in the latter case.
> Try PyMySQL or mysqlclient.
>
>
> >
> > On Thursday, October 4, 2018 at 2:05:59 PM UTC-5, Mike Bayer wrote:
> >>
> >> On Thu, Oct 4, 2018 at 2:45 PM <[email protected]> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > Having some very and major performance issues using sqlalchemy to
> insert data, been trying many different ways but none seem to hold even a
> close candle to raw MySQLdb connections
> >> >
> >> > Versions:
> >> > sqlalchemy: 1.2.9
> >> > MySQLdb: 1.3.13 (bound with mysql+mysqldb connector)
> >> > MySQL: 14.14
> >> > Python: 3.6.5
> >> > Ubuntu 18.04 x86_64
> >> >
> >> > Server:
> >> > 8GB ram
> >> > 2 x Intel(R) Xeon(R) CPU X5650 @ 2.67GHz threads running in VMWare
> ESXi
> >> >
> >> > Connected to localhost
> >> >
> >> > Base data info:
> >> > 2018-10-04 13:19:56,519 - INFO - Executing for 10,550 inserts, 89
> columns, approx 6,252,460 bytes of raw data...
> >> >
> >> > Pre-formatted session execute ~2 seconds:
> >> > 2018-10-04 13:19:56,520 - INFO - Building raw TSQL including all
> escaped value()s - insert into x (a,b, ...) values (escape_string(c),
> escape_string(d), ..) (escape_string(e), escape_string(f), ...)
> >> > 2018-10-04 13:19:57,640 - INFO - Executing raw TSQL (9,218,872 bytes)
> - session.execute(tsql)
> >> > 2018-10-04 13:19:59,435 - INFO - Committing
> >> >
> >> > Raw MySQLdb API ~1 second:
> >> > 2018-10-04 13:19:59,436 - INFO - Executing same raw TSQL directly via
> MySQLdb API - engine.raw_connection(); cursor.execute(tsql)
> >> > 2018-10-04 13:20:00,326 - INFO - Committing
> >> >
> >> > Bindparam formatted TSQL converted to text() executed with bindparams
> as a dict ~12 seconds to build text() and ~19 seconds to execute, ~31
> seconds total:
> >> > 2018-10-04 13:20:00,326 - INFO - Building raw TSQL - insert into x
> (a, b, ...) values (:a1, :b1, ..), (:a2, :b2, ...)
> >> > 2018-10-04 13:20:01,017 - INFO - Rebuilding Input data for TSQL
> bindparams - {a1: d, b1: e, a2: f, b2: g}
> >> > 2018-10-04 13:20:02,234 - INFO - Building text() object -
> sql.text(tsql)
> >> > 2018-10-04 13:20:14,259 - INFO - Executing text() with bindparams -
> session.execute(text, bindparams)
> >> > 2018-10-04 13:20:33,552 - INFO - Committing
> >> >
> >> > Bindparam formatted TSQL NOT converted to text, executed with
> bindparams as a dict: ~33 seconds
> >> > 2018-10-04 13:20:33,552 - INFO - Executing raw TSQL with bindparams -
> session.execute(tsql, bindparams)
> >> > 2018-10-04 13:21:06,307 - INFO - Committing
> >> >
> >> > Using a sqlalchemy class table insert() with using .values() ~34
> seconds:
> >> > 2018-10-04 13:21:06,311 - INFO - Executing sqlalchemy insert w/
> bindparams as values -
> session.execute(__table__.insert().values(bindparams))
> >> > 2018-10-04 13:21:40,808 - INFO - Committing
> >> >
> >> > Using a sqlalchemy class table insert() with passing bindparams to
> session.execute() ~8 seconds:
> >> > 2018-10-04 13:21:40,809 - INFO - Executing sqlalchemy insert() with
> bindparams as execute param - session.execute(__table__.insert(),
> bindparams)
> >> > 2018-10-04 13:21:48,084 - INFO - Committing
> >> >
> >> > Obviously the raw db api will be the fastest as there's little to no
> processing or overhead, I would think session.execute(str) would come
> closer to matching that, but its not too far off.
> >>
> >> A Core insert should be only slightly slower than the raw DBAPI,
> >> however, if you're inserting lots of rows, you need to be using
> >> executemany style, that is, a single execute() call with a list of
> >> parameters. There's no code examples here so I don't know what
> >> you're actually doing, if you are running session.execute() itself 10K
> >> times, that's the wrong way to do it. Check out
> >>
> https://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html
>
> >> for usage examples how to bulk-insert for different APIs including
> >> Core (test_core_insert specifically). Additionally, I don't know
> >> if you are using bound parameters with your raw MySQL code or not, and
> >> if so, is there some unicode processing going on in the SQLAlchemy
> >> version that might be slowing things down.
> >>
> >> A complete code example like those you see in
> >>
> https://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html
>
> >> will allow me to see what you are doing.
> >>
> >>
> >> >
> >> > Having sqlalchemy.text(str) take 12 seconds to execute seems rather
> excessive, but assuming that's the hang with the other two
> execute(__table__.insert().values(bindparams)) statement too, however why
> would excute(str, bindparams) do this? Does it shadow text() as well if
> arg0 is a str?
> >> >
> >> > Now the oddest part of all, doing 10,550 individual inserts in the
> last example is roughly 26 seconds faster than a single transaction with
> 10,550 sets of values. This just does not make transactional sense.
> >> >
> >> > Is there something I can be doing better while utilizing core, or am
> I over-expecting how sqlalchemy to perform with datasets like this?
> >> >
> >> >
> >> > Hopefully this is enough detail, but I can provide more upon request.
> >> >
> >> > Thank you!
> >> >
> >> > --
> >> > SQLAlchemy -
> >> > The Python SQL Toolkit and Object Relational Mapper
> >> >
> >> > http://www.sqlalchemy.org/
> >> >
> >> > To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> >> > ---
> >> > You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> >> > To unsubscribe from this group and stop receiving emails from it,
> send an email to [email protected].
> >> > To post to this group, send email to [email protected].
> >> > Visit this group at https://groups.google.com/group/sqlalchemy.
> >> > For more options, visit https://groups.google.com/d/optout.
> >
>