On Fri, Nov 2, 2018 at 1:08 PM Ruben Di Battista
<rubendibatti...@gmail.com> wrote:
>
> Thanks Mike as always,
>
> I'm diving a bit more in the problem. The solution they decided to apply is 
> to modify the code from above, like this:
>
>         header = 'INSERT INTO `passageData` (`time`, `azimuth`, ' \
>                  '`elevation`, `doppler`, `slant`, `passageID`) VALUES '
>
>        with profiled():
>             for ix in range(n):
>                time_i = times[ix].strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
>                az_i = round(azimuth[ix], 2)
>                el_i = round(elevation[ix], 2)
>                dopp_i = round(doppler[ix], 10)
>                slant_i = round(slant[ix], 2)
>
>                 header += '(\'{0}\', {1}, {2}, {3}, {4}, {5})'.\
>                    format(time_i, az_i, el_i, dopp_i, slant_i, self.id)
>
>                 if ix != n-1:
>                    header += ','
>
>             header += ';'
>            session.execute(header)
>
> And I'm profiling the previous one with:
>
> with profiled():
>     session.execute(
>                insert_query,
>                [
>                    {
>                        'time': times[i],
>                        'elevation': elevation[i],
>                        'azimuth': azimuth[i],
>                        'doppler': doppler[i],
>                        'slant': slant[i],
>                        'passageID': passage_id
>                    }
>                    for i in six.moves.range(0, n)
>                 ]
>            )
>
> And these are the lines I get from the profiling in the first case (with ugly 
> string interpolation):
>          420358 function calls in 1.591 seconds
>
>    Ordered by: cumulative time
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    1.114    1.114 
> /lib/python2.7/site-packages/sqlalchemy/orm/session.py:1047(execute)
>         1    0.000    0.000    0.971    0.971 
> /lib/python2.7/site-packages/sqlalchemy/engine/base.py:882(execute)
>         1    0.000    0.000    0.971    0.971 
> /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:267(_execute_on_connection)
>         1    0.000    0.000    0.971    0.971 
> /lib/python2.7/site-packages/sqlalchemy/engine/base.py:1016(_execute_clauseelement)
>         1    0.000    0.000    0.834    0.834 
> /lib/python2.7/site-packages/sqlalchemy/engine/base.py:1111(_execute_context)
>         1    0.000    0.000    0.789    0.789 
> /lib/python2.7/site-packages/sqlalchemy/engine/default.py:508(do_execute)
>         1    0.005    0.005    0.789    0.789 
> /lib/python2.7/site-packages/MySQLdb/cursors.py:204(execute)
>         1    0.000    0.000    0.780    0.780 
> /lib/python2.7/site-packages/MySQLdb/cursors.py:411(_query)
>         1    0.000    0.000    0.780    0.780 
> /lib/python2.7/site-packages/MySQLdb/cursors.py:372(_do_query)
>         1    0.780    0.780    0.780    0.780 
> /lib/python2.7/site-packages/MySQLdb/connections.py:267(query)
>         3    0.267    0.089    0.267    0.089 {method 'sub' of 
> '_sre.SRE_Pattern' objects}
>     60000    0.205    0.000    0.205    0.000 {method 'strftime' of 
> 'datetime.date' objects}
>         1    0.011    0.011    0.143    0.143 
> /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:4287(_literal_as_text)
>         1    0.000    0.000    0.137    0.137 <string>:1(<lambda>)
>         1    0.000    0.000    0.137    0.137 
> /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:379(compile)
>         1    0.000    0.000    0.137    0.137 
> /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:444(_compiler)
>         1    0.000    0.000    0.137    0.137 
> /lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:393(__init__)
>         1    0.000    0.000    0.137    0.137 
> /lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:180(__init__)
>         1    0.000    0.000    0.137    0.137 
> /lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:244(process)
>         1    0.000    0.000    0.137    0.137 
> /lib/python2.7/site-packages/sqlalchemy/sql/visitors.py:86(_compiler_dispatch)
>         1    0.000    0.000    0.137    0.137 
> /lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:759(visit_textclause)
>         1    0.000    0.000    0.132    0.132 
> /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:1259(__init__)
>    240000    0.120    0.000    0.120    0.000 {round}
>     60000    0.104    0.000    0.104    0.000 {method 'format' of 'str' 
> objects}
>     60000    0.047    0.000    0.047    0.000 
> /lib/python2.7/site-packages/sqlalchemy/orm/attributes.py:234(__get__)
>
> While in the second case I get:
>
>          5103248 function calls (5103183 primitive calls) in 3.829 seconds
>
>    Ordered by: cumulative time
>
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    3.829    3.829 
> lib/python2.7/site-packages/sqlalchemy/orm/session.py:1047(execute)
>         1    0.000    0.000    3.829    3.829 
> lib/python2.7/site-packages/sqlalchemy/engine/base.py:882(execute)
>         1    0.000    0.000    3.829    3.829 
> lib/python2.7/site-packages/sqlalchemy/sql/elements.py:267(_execute_on_connection)
>         1    0.000    0.000    3.829    3.829 
> lib/python2.7/site-packages/sqlalchemy/engine/base.py:1016(_execute_clauseelement)
>         1    0.001    0.001    3.829    3.829 
> lib/python2.7/site-packages/sqlalchemy/engine/base.py:1111(_execute_context)
>         1    0.000    0.000    2.829    2.829 
> lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py:104(do_executemany)
>         1    0.000    0.000    2.829    2.829 
> lib/python2.7/site-packages/MySQLdb/cursors.py:256(executemany)
>         1    0.130    0.130    2.829    2.829 
> lib/python2.7/site-packages/MySQLdb/cursors.py:286(_do_execute_many)
>     60000    0.198    0.000    1.970    0.000 
> lib/python2.7/site-packages/MySQLdb/cursors.py:105(_escape_args)
>    420000    0.169    0.000    1.245    0.000 
> lib/python2.7/site-packages/MySQLdb/cursors.py:111(<genexpr>)
>    360000    0.378    0.000    1.076    0.000 
> lib/python2.7/site-packages/MySQLdb/connections.py:302(literal)
>         1    0.368    0.368    0.998    0.998 
> lib/python2.7/site-packages/sqlalchemy/engine/default.py:595(_init_compiled)
>        61    0.001    0.000    0.703    0.012 
> lib/python2.7/site-packages/MySQLdb/cursors.py:204(execute)
>        61    0.000    0.000    0.702    0.012 
> lib/python2.7/site-packages/MySQLdb/cursors.py:411(_query)
>        61    0.001    0.000    0.701    0.011 
> lib/python2.7/site-packages/MySQLdb/cursors.py:372(_do_query)
>        61    0.699    0.011    0.699    0.011 
> lib/python2.7/site-packages/MySQLdb/connections.py:267(query)
>    360000    0.160    0.000    0.543    0.000 {method 'escape' of 
> '_mysql.connection' objects}
>     60000    0.181    0.000    0.521    0.000 {map}
>    360000    0.191    0.000    0.339    0.000 
> lib/python2.7/site-packages/MySQLdb/cursors.py:98(_ensure_bytes)
>   1560325    0.323    0.000    0.323    0.000 {isinstance}
>     60000    0.313    0.000    0.313    0.000 
> lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:526(construct_params)
>    240000    0.076    0.000    0.284    0.000 
> lib/python2.7/site-packages/sqlalchemy/sql/type_api.py:1181(process)
>    240000    0.085    0.000    0.208    0.000 
> /Users/rubendibattista/git/Leaf/pyggdrasill/pyggdrasill/sql/types/small_float.py:10(process_bind_param)
>     60000    0.034    0.000    0.206    0.000 
> lib/python2.7/site-packages/MySQLdb/times.py:125(DateTime2literal)
>    240000    0.155    0.000    0.155    0.000 
> lib/python2.7/site-packages/MySQLdb/converters.py:68(Float2Str)
>     60000    0.028    0.000    0.154    0.000 
> lib/python2.7/site-packages/MySQLdb/times.py:39(format_TIMESTAMP)
>     60000    0.126    0.000    0.126    0.000 {method 'format' of 'str' 
> objects}
>    240000    0.123    0.000    0.123    0.000 {round}
>    420015    0.033    0.000    0.033    0.000 {method 'append' of 'list' 
> objects}
>     60000    0.022    0.000    0.022    0.000 
> lib/python2.7/site-packages/MySQLdb/converters.py:58(Thing2Str)
>     60000    0.018    0.000    0.018    0.000 {_mysql.string_literal}
>    180206    0.014    0.000    0.014    0.000 {len}
>        61    0.001    0.000    0.002    0.000 
> lib/python2.7/site-packages/MySQLdb/cursors.py:182(_do_get_result)
>
>
> So I get a way bigger number of function calls. To me this seems related to 
> the args escaping from MySQLdb cursors.py. Is there any better way to 
> optimize that INSERT query?

So, the callcounts are inflated by lines like these:

         1    0.130    0.130    2.829    2.829
lib/python2.7/site-packages/MySQLdb/cursors.py:286(_do_execute_many)
        61    0.001    0.000    0.703    0.012
lib/python2.7/site-packages/MySQLdb/cursors.py:204(execute)
        61    0.000    0.000    0.702    0.012
lib/python2.7/site-packages/MySQLdb/cursors.py:411(_query)
        61    0.001    0.000    0.701    0.011
lib/python2.7/site-packages/MySQLdb/cursors.py:372(_do_query)
        61    0.699    0.011    0.699    0.011
lib/python2.7/site-packages/MySQLdb/connections.py:267(query)
60000    0.198    0.000    1.970    0.000
lib/python2.7/site-packages/MySQLdb/cursors.py:105(_escape_args)
    420000    0.169    0.000    1.245    0.000
lib/python2.7/site-packages/MySQLdb/cursors.py:111(<genexpr>)
    360000    0.378    0.000    1.076    0.000
lib/python2.7/site-packages/MySQLdb/connections.py:302(literal)
    360000    0.191    0.000    0.339    0.000
lib/python2.7/site-packages/MySQLdb/cursors.py:98(_ensure_bytes)

The first thing we note is that "site-packages/MySQLdb" is the
mysqlclient package, which as the DBAPI is independent from SQLAlchemy
and is documented at https://mysqlclient.readthedocs.io/.      So when
your colleague was incorrect, I didn't specify that he was not
actually observing this.  The source of your 65535 number is right
here in that package:

https://github.com/PyMySQL/mysqlclient-python/blob/master/MySQLdb/cursors.py#L52


This "max_stmt_length" is undocumented outside of the source code, but
it's an attribute on the cursor, so certainly if you wanted to make
that huge, you can set it just like this:

from sqlalchemy import event

@event.listens_for(my_engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, parameters,
context, executemany):
    cursor.max_stmt_length = 300000000    # go nuts

We can see that is the literal answer to your original question.
But if we look at your profile results, I can tell you right now that
changing that number is not going to do much for you as far as
callcounts.    As far as actual time spent, it may or may not help,
because allocating memory in Python is itself often slow, and under
concurrency the memory use can begin to get problematic more
generally.     Above, your profile shows that it invoked a statement
61 times rather than one time.   But the vast majority of callcounts
are spent in processing and escaping of individual parameters in the
statement, e.g. calls like "literal" and "_ensure_bytes", those
numbers aren't going to change whether you have one 10M statement or
60 smaller ones.   SQLAlchemy is always going to use bound parameters
for Core constructs like insert(), and the time spent here seems to be
mostly a product of the expense of bound parameters within the
mysqlclient driver.  So while I encourage you to try the above recipe
and such, to overcome that overhead you may still be stuck formatting
the entire string yourself, but I would encourage you to familiarize
with how MySQLdb is doing this over at
https://github.com/PyMySQL/mysqlclient-python/ , since that's where
you're hitting the limitations you are having.










>
> On Friday, November 2, 2018 at 5:23:51 PM UTC+1, Mike Bayer wrote:
>>
>> On Fri, Nov 2, 2018 at 11:17 AM Ruben Di Battista
>> <rubendi...@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > I have a huge insert of the type:
>> >
>> > ```
>> > session.execute(
>> >             insert_query,
>> >             [
>> >                 {
>> >                     'time': times[i],
>> >                     'elevation': elevation[i],
>> >                     'azimuth': azimuth[i],
>> >                     'doppler': doppler[i],
>> >                     'slant': slant[i],
>> >                     'passageID': passage_id
>> >                 }
>> >                 for i in six.moves.range(0, n)
>> >             ]
>> >         )
>> >
>> > ```
>> >
>> > where n is huge.
>> >
>> > I was told by a colleague that SQLALchemy limits query length to 65536 
>> > chars while MySQL can cappet 33MB big queries. Can I tailor this in 
>> > SQLAlchemy?
>>
>> your colleague is mistaken.  SQLAlchemy has no limit on the sizes of
>> query strings or any other string.
>>
>>
>> >
>> > --
>> > SQLAlchemy -
>> > The Python SQL Toolkit and Object Relational Mapper
>> >
>> > http://www.sqlalchemy.org/
>> >
>> > To post example code, please provide an MCVE: Minimal, Complete, and 
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>> > description.
>> > ---
>> > You received this message because you are subscribed to the Google Groups 
>> > "sqlalchemy" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an 
>> > email to sqlalchemy+...@googlegroups.com.
>> > To post to this group, send email to sqlal...@googlegroups.com.
>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to