On Thu, Oct 4, 2018 at 4:04 PM <adam.bot...@gmail.com> wrote:
>
> Mike,
>
> No, I'm not calling execute 10k times, but I am doing what they do in your 
> example with test_core_insert
>
> My code for that is:
>     log.info('Executing sqlalchemy insert() with bindparams as execute param - session.execute(__table__.insert(), bindparams)')
>     sql.session.execute(models.target.__table__.insert().prefix_with('IGNORE'), data)
>     log.info('Committing')
>     sql.session.commit()
>
> where data = [OrderedDict(), OrderedDict(), ...] (I tried converting to a
> regular [dict(), dict(), ...] too, with no gains). This method is the fastest
> outside of direct or near-direct DBAPI calls, at about 8 seconds.
>
> However, I read somewhere (I can't find the source again to cite; I thought
> it was the sqlalchemy docs, but I may be misremembering) that this chains
> together individual INSERT statements instead of doing a multi-row insert,
> i.e. insert into target (a, b, ...) values (c, d, ...); insert into target
> (a, b, ...) values (e, f, ...); ... instead of insert into target (a, b, ...)
> values (c, d, ...), (e, f, ...)
> where the latter is the most optimal TSQL

I'm not sure what "TSQL" means in this context.

SQLAlchemy uses cursor.executemany() to invoke the statement just
once; the DBAPI decides how it wants to batch it.  Regardless, it
should move very fast.  On the PostgreSQL side there's a bigger speed
difference between the way psycopg2 executes a statement normally and
the special modes it provides that allow it to chain multiple
statements together in one batch, but that's still behind the
executemany() call.
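
Here is roughly what I mean by executemany style with Core, as a minimal
sketch (the engine URL, table, columns and data below are made up purely
for illustration).  As far as I recall, mysqlclient and PyMySQL rewrite a
plain INSERT into a single multi-row VALUES statement inside their
executemany(), so you get the multi-insert form you describe without
building it yourself:

    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

    # hypothetical table and data, purely to illustrate the call pattern
    engine = create_engine("mysql+mysqldb://user:pass@localhost/test")
    metadata = MetaData()
    target = Table(
        "target", metadata,
        Column("id", Integer, primary_key=True),
        Column("a", String(50)),
        Column("b", String(50)),
    )
    data = [{"a": "x%d" % i, "b": "y%d" % i} for i in range(10000)]

    with engine.begin() as conn:
        # one execute() call with a list of dicts; this goes through
        # cursor.executemany() on the DBAPI
        conn.execute(target.insert(), data)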


>
> If I'm wrong there and it does do a multi-insert, what is the difference 
> between execute(__table__.insert().values(bindparams)) and 
> execute(__table__.insert(), bindparams), and why would the former be ~26 
> seconds slower in this use case?

What is "bindparams" here?  Is that a list of many tuples?  If so, then
yes, the DBAPI is not batching efficiently in the latter case.
Try PyMySQL or mysqlclient.
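
To spell out the difference between those two calls, as a sketch (using
your model and "data" as the list of dicts):

    # compiles ONE giant multi-row INSERT with a bound parameter for every
    # single value; at 89 columns x 10,550 rows that is close to a million
    # parameters, all built and processed in Python before the DBAPI ever
    # sees the statement.  that is most likely where the extra ~26 seconds go
    session.execute(models.target.__table__.insert().values(data))

    # compiles a one-row INSERT just once, then hands the whole list to
    # cursor.executemany(); this is the fast form
    session.execute(models.target.__table__.insert(), data)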





>
>
>
>
> On Thursday, October 4, 2018 at 2:05:59 PM UTC-5, Mike Bayer wrote:
>>
>> On Thu, Oct 4, 2018 at 2:45 PM <adam....@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > I'm having some major performance issues using sqlalchemy to insert
>> > data.  I've been trying many different approaches, but none of them come
>> > even close to raw MySQLdb connections.
>> >
>> > Versions:
>> > sqlalchemy: 1.2.9
>> > MySQLdb: 1.3.13 (bound with mysql+mysqldb connector)
>> > MySQL: 14.14
>> > Python: 3.6.5
>> > Ubuntu 18.04 x86_64
>> >
>> > Server:
>> > 8GB ram
>> > 2 x Intel(R) Xeon(R) CPU X5650  @ 2.67GHz threads running in VMWare ESXi
>> >
>> > Connected to localhost
>> >
>> > Base data info:
>> > 2018-10-04 13:19:56,519 - INFO - Executing for 10,550 inserts, 89 columns, 
>> > approx 6,252,460 bytes of raw data...
>> >
>> > Pre-formatted session execute ~2 seconds:
>> > 2018-10-04 13:19:56,520 - INFO - Building raw TSQL including all escaped 
>> > value()s - insert into x (a,b, ...) values (escape_string(c), 
>> > escape_string(d), ..) (escape_string(e), escape_string(f), ...)
>> > 2018-10-04 13:19:57,640 - INFO - Executing raw TSQL (9,218,872 bytes)  - 
>> > session.execute(tsql)
>> > 2018-10-04 13:19:59,435 - INFO - Committing
>> >
>> > Raw MySQLdb API ~1 second:
>> > 2018-10-04 13:19:59,436 - INFO - Executing same raw TSQL directly via 
>> > MySQLdb API - engine.raw_connection(); cursor.execute(tsql)
>> > 2018-10-04 13:20:00,326 - INFO - Committing
>> >
>> > Bindparam formatted TSQL converted to text() executed with bindparams as a 
>> > dict ~12 seconds to build text() and ~19 seconds to execute, ~31 seconds 
>> > total:
>> > 2018-10-04 13:20:00,326 - INFO - Building raw TSQL - insert into x (a, b, 
>> > ...) values (:a1, :b1, ..), (:a2, :b2, ...)
>> > 2018-10-04 13:20:01,017 - INFO - Rebuilding Input data for TSQL bindparams 
>> > - {a1: d, b1: e, a2: f, b2: g}
>> > 2018-10-04 13:20:02,234 - INFO - Building text() object - sql.text(tsql)
>> > 2018-10-04 13:20:14,259 - INFO - Executing text() with bindparams - 
>> > session.execute(text, bindparams)
>> > 2018-10-04 13:20:33,552 - INFO - Committing
>> >
>> > Bindparam formatted TSQL NOT converted to text, executed with bindparams 
>> > as a dict: ~33 seconds
>> > 2018-10-04 13:20:33,552 - INFO - Executing raw TSQL with bindparams - 
>> > session.execute(tsql, bindparams)
>> > 2018-10-04 13:21:06,307 - INFO - Committing
>> >
>> > Using a sqlalchemy class table insert() with .values(), ~34 seconds:
>> > 2018-10-04 13:21:06,311 - INFO - Executing sqlalchemy insert w/ bindparams 
>> > as values - session.execute(__table__.insert().values(bindparams))
>> > 2018-10-04 13:21:40,808 - INFO - Committing
>> >
>> > Using a sqlalchemy class table insert() with passing bindparams to 
>> > session.execute() ~8 seconds:
>> > 2018-10-04 13:21:40,809 - INFO - Executing sqlalchemy insert() with 
>> > bindparams as execute param - session.execute(__table__.insert(), 
>> > bindparams)
>> > 2018-10-04 13:21:48,084 - INFO - Committing
>> >
>> > Obviously the raw DB API will be the fastest, as there's little to no
>> > processing or overhead; I would think session.execute(str) would come
>> > closer to matching that, but it's not too far off.
>>
>> A Core insert should be only slightly slower than the raw DBAPI;
>> however, if you're inserting lots of rows, you need to be using
>> executemany style, that is, a single execute() call with a list of
>> parameters.  There are no code examples here, so I don't know what
>> you're actually doing; if you are running session.execute() itself 10K
>> times, that's the wrong way to do it.  Check out
>> https://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html
>> for usage examples of how to bulk-insert with different APIs, including
>> Core (test_core_insert specifically).  Additionally, I don't know
>> whether you are using bound parameters with your raw MySQL code, and
>> whether there is some unicode processing going on in the SQLAlchemy
>> version that might be slowing things down.
>>
>> A complete code example like those you see in
>> https://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html
>> will allow me to see what you are doing.
>>
>>
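
Also, to narrow down where the ~8 seconds on the executemany form is going,
it would be useful to time the same batch through Core and through the raw
DBAPI side by side.  A sketch; the statement, column names and "data" are
placeholders for whatever your table actually has:

    import time

    # the Core executemany path
    start = time.time()
    with engine.begin() as conn:
        conn.execute(models.target.__table__.insert().prefix_with("IGNORE"), data)
    print("core executemany: %.2f sec" % (time.time() - start))

    # the same batch at the raw DBAPI level
    raw = engine.raw_connection()
    try:
        cursor = raw.cursor()
        stmt = "INSERT IGNORE INTO target (a, b) VALUES (%s, %s)"  # placeholder columns
        rows = [(d["a"], d["b"]) for d in data]
        start = time.time()
        cursor.executemany(stmt, rows)
        raw.commit()
        print("raw executemany:  %.2f sec" % (time.time() - start))
    finally:
        raw.close()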
>> >
>> > Having sqlalchemy.text(str) take 12 seconds to build seems rather
>> > excessive; I'm assuming that's also the hang with the other two slow
>> > statements, including execute(__table__.insert().values(bindparams)),
>> > but why would execute(str, bindparams) do this?  Does it go through
>> > text() as well if arg0 is a str?
>> >
>> > Now the oddest part of all: doing 10,550 individual inserts in the last
>> > example is roughly 26 seconds faster than a single transaction with
>> > 10,550 sets of values.  This just does not make transactional sense.
>> >
>> > Is there something I could be doing better while utilizing Core, or am I
>> > expecting too much of how sqlalchemy performs with datasets like this?
>> >
>> >
>> > Hopefully this is enough detail, but I can provide more upon request.
>> >
>> > Thank you!
>> >
>
