On Fri, Nov 2, 2018 at 1:08 PM Ruben Di Battista <rubendibatti...@gmail.com> wrote: > > Thanks Mike as always, > > I'm diving a bit more in the problem. The solution they decided to apply is > to modify the code from above, like this: > > header = 'INSERT INTO `passageData` (`time`, `azimuth`, ' \ > '`elevation`, `doppler`, `slant`, `passageID`) VALUES ' > > with profiled(): > for ix in range(n): > time_i = times[ix].strftime("%Y-%m-%d %H:%M:%S.%f")[:-3] > az_i = round(azimuth[ix], 2) > el_i = round(elevation[ix], 2) > dopp_i = round(doppler[ix], 10) > slant_i = round(slant[ix], 2) > > header += '(\'{0}\', {1}, {2}, {3}, {4}, {5})'.\ > format(time_i, az_i, el_i, dopp_i, slant_i, self.id) > > if ix != n-1: > header += ',' > > header += ';' > session.execute(header) > > And I'm profiling the previous one with: > > with profiled(): > session.execute( > insert_query, > [ > { > 'time': times[i], > 'elevation': elevation[i], > 'azimuth': azimuth[i], > 'doppler': doppler[i], > 'slant': slant[i], > 'passageID': passage_id > } > for i in six.moves.range(0, n) > ] > ) > > And these are the lines I get from the profiling in the first case (with ugly > string interpolation): > 420358 function calls in 1.591 seconds > > Ordered by: cumulative time > > ncalls tottime percall cumtime percall filename:lineno(function) > 1 0.000 0.000 1.114 1.114 > /lib/python2.7/site-packages/sqlalchemy/orm/session.py:1047(execute) > 1 0.000 0.000 0.971 0.971 > /lib/python2.7/site-packages/sqlalchemy/engine/base.py:882(execute) > 1 0.000 0.000 0.971 0.971 > /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:267(_execute_on_connection) > 1 0.000 0.000 0.971 0.971 > /lib/python2.7/site-packages/sqlalchemy/engine/base.py:1016(_execute_clauseelement) > 1 0.000 0.000 0.834 0.834 > /lib/python2.7/site-packages/sqlalchemy/engine/base.py:1111(_execute_context) > 1 0.000 0.000 0.789 0.789 > /lib/python2.7/site-packages/sqlalchemy/engine/default.py:508(do_execute) > 1 0.005 0.005 0.789 0.789 > /lib/python2.7/site-packages/MySQLdb/cursors.py:204(execute) > 1 0.000 0.000 0.780 0.780 > /lib/python2.7/site-packages/MySQLdb/cursors.py:411(_query) > 1 0.000 0.000 0.780 0.780 > /lib/python2.7/site-packages/MySQLdb/cursors.py:372(_do_query) > 1 0.780 0.780 0.780 0.780 > /lib/python2.7/site-packages/MySQLdb/connections.py:267(query) > 3 0.267 0.089 0.267 0.089 {method 'sub' of > '_sre.SRE_Pattern' objects} > 60000 0.205 0.000 0.205 0.000 {method 'strftime' of > 'datetime.date' objects} > 1 0.011 0.011 0.143 0.143 > /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:4287(_literal_as_text) > 1 0.000 0.000 0.137 0.137 <string>:1(<lambda>) > 1 0.000 0.000 0.137 0.137 > /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:379(compile) > 1 0.000 0.000 0.137 0.137 > /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:444(_compiler) > 1 0.000 0.000 0.137 0.137 > /lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:393(__init__) > 1 0.000 0.000 0.137 0.137 > /lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:180(__init__) > 1 0.000 0.000 0.137 0.137 > /lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:244(process) > 1 0.000 0.000 0.137 0.137 > /lib/python2.7/site-packages/sqlalchemy/sql/visitors.py:86(_compiler_dispatch) > 1 0.000 0.000 0.137 0.137 > /lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:759(visit_textclause) > 1 0.000 0.000 0.132 0.132 > /lib/python2.7/site-packages/sqlalchemy/sql/elements.py:1259(__init__) > 240000 0.120 0.000 0.120 0.000 {round} > 60000 0.104 0.000 0.104 0.000 {method 'format' of 'str' > objects} > 60000 0.047 0.000 0.047 0.000 > /lib/python2.7/site-packages/sqlalchemy/orm/attributes.py:234(__get__) > > While in the second case I get: > > 5103248 function calls (5103183 primitive calls) in 3.829 seconds > > Ordered by: cumulative time > > ncalls tottime percall cumtime percall filename:lineno(function) > 1 0.000 0.000 3.829 3.829 > lib/python2.7/site-packages/sqlalchemy/orm/session.py:1047(execute) > 1 0.000 0.000 3.829 3.829 > lib/python2.7/site-packages/sqlalchemy/engine/base.py:882(execute) > 1 0.000 0.000 3.829 3.829 > lib/python2.7/site-packages/sqlalchemy/sql/elements.py:267(_execute_on_connection) > 1 0.000 0.000 3.829 3.829 > lib/python2.7/site-packages/sqlalchemy/engine/base.py:1016(_execute_clauseelement) > 1 0.001 0.001 3.829 3.829 > lib/python2.7/site-packages/sqlalchemy/engine/base.py:1111(_execute_context) > 1 0.000 0.000 2.829 2.829 > lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py:104(do_executemany) > 1 0.000 0.000 2.829 2.829 > lib/python2.7/site-packages/MySQLdb/cursors.py:256(executemany) > 1 0.130 0.130 2.829 2.829 > lib/python2.7/site-packages/MySQLdb/cursors.py:286(_do_execute_many) > 60000 0.198 0.000 1.970 0.000 > lib/python2.7/site-packages/MySQLdb/cursors.py:105(_escape_args) > 420000 0.169 0.000 1.245 0.000 > lib/python2.7/site-packages/MySQLdb/cursors.py:111(<genexpr>) > 360000 0.378 0.000 1.076 0.000 > lib/python2.7/site-packages/MySQLdb/connections.py:302(literal) > 1 0.368 0.368 0.998 0.998 > lib/python2.7/site-packages/sqlalchemy/engine/default.py:595(_init_compiled) > 61 0.001 0.000 0.703 0.012 > lib/python2.7/site-packages/MySQLdb/cursors.py:204(execute) > 61 0.000 0.000 0.702 0.012 > lib/python2.7/site-packages/MySQLdb/cursors.py:411(_query) > 61 0.001 0.000 0.701 0.011 > lib/python2.7/site-packages/MySQLdb/cursors.py:372(_do_query) > 61 0.699 0.011 0.699 0.011 > lib/python2.7/site-packages/MySQLdb/connections.py:267(query) > 360000 0.160 0.000 0.543 0.000 {method 'escape' of > '_mysql.connection' objects} > 60000 0.181 0.000 0.521 0.000 {map} > 360000 0.191 0.000 0.339 0.000 > lib/python2.7/site-packages/MySQLdb/cursors.py:98(_ensure_bytes) > 1560325 0.323 0.000 0.323 0.000 {isinstance} > 60000 0.313 0.000 0.313 0.000 > lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:526(construct_params) > 240000 0.076 0.000 0.284 0.000 > lib/python2.7/site-packages/sqlalchemy/sql/type_api.py:1181(process) > 240000 0.085 0.000 0.208 0.000 > /Users/rubendibattista/git/Leaf/pyggdrasill/pyggdrasill/sql/types/small_float.py:10(process_bind_param) > 60000 0.034 0.000 0.206 0.000 > lib/python2.7/site-packages/MySQLdb/times.py:125(DateTime2literal) > 240000 0.155 0.000 0.155 0.000 > lib/python2.7/site-packages/MySQLdb/converters.py:68(Float2Str) > 60000 0.028 0.000 0.154 0.000 > lib/python2.7/site-packages/MySQLdb/times.py:39(format_TIMESTAMP) > 60000 0.126 0.000 0.126 0.000 {method 'format' of 'str' > objects} > 240000 0.123 0.000 0.123 0.000 {round} > 420015 0.033 0.000 0.033 0.000 {method 'append' of 'list' > objects} > 60000 0.022 0.000 0.022 0.000 > lib/python2.7/site-packages/MySQLdb/converters.py:58(Thing2Str) > 60000 0.018 0.000 0.018 0.000 {_mysql.string_literal} > 180206 0.014 0.000 0.014 0.000 {len} > 61 0.001 0.000 0.002 0.000 > lib/python2.7/site-packages/MySQLdb/cursors.py:182(_do_get_result) > > > So I get a way bigger number of function calls. To me this seems related to > the args escaping from MySQLdb cursors.py. Is there any better way to > optimize that INSERT query?
So, the callcounts are inflated by lines like these: 1 0.130 0.130 2.829 2.829 lib/python2.7/site-packages/MySQLdb/cursors.py:286(_do_execute_many) 61 0.001 0.000 0.703 0.012 lib/python2.7/site-packages/MySQLdb/cursors.py:204(execute) 61 0.000 0.000 0.702 0.012 lib/python2.7/site-packages/MySQLdb/cursors.py:411(_query) 61 0.001 0.000 0.701 0.011 lib/python2.7/site-packages/MySQLdb/cursors.py:372(_do_query) 61 0.699 0.011 0.699 0.011 lib/python2.7/site-packages/MySQLdb/connections.py:267(query) 60000 0.198 0.000 1.970 0.000 lib/python2.7/site-packages/MySQLdb/cursors.py:105(_escape_args) 420000 0.169 0.000 1.245 0.000 lib/python2.7/site-packages/MySQLdb/cursors.py:111(<genexpr>) 360000 0.378 0.000 1.076 0.000 lib/python2.7/site-packages/MySQLdb/connections.py:302(literal) 360000 0.191 0.000 0.339 0.000 lib/python2.7/site-packages/MySQLdb/cursors.py:98(_ensure_bytes) The first thing we note is that "site-packages/MySQLdb" is the mysqlclient package, which as the DBAPI is independent from SQLAlchemy and is documented at https://mysqlclient.readthedocs.io/. So when your colleague was incorrect, I didn't specify that he was not actually observing this. The source of your 65535 number is right here in that package: https://github.com/PyMySQL/mysqlclient-python/blob/master/MySQLdb/cursors.py#L52 This "max_stmt_length" is undocumented outside of the source code, but it's an attribute on the cursor, so certainly if you wanted to make that huge, you can set it just like this: from sqlalchemy import event @event.listens_for(my_engine, 'before_cursor_execute') def receive_before_cursor_execute(conn, cursor, statement, parameters, context, executemany): cursor.max_stmt_length = 300000000 # go nuts We can see that is the literal answer to your original question. But if we look at your profile results, I can tell you right now that changing that number is not going to do much for you as far as callcounts. As far as actual time spent, it may or may not help, because allocating memory in Python is itself often slow, and under concurrency the memory use can begin to get problematic more generally. Above, your profile shows that it invoked a statement 61 times rather than one time. But the vast majority of callcounts are spent in processing and escaping of individual parameters in the statement, e.g. calls like "literal" and "_ensure_bytes", those numbers aren't going to change whether you have one 10M statement or 60 smaller ones. SQLAlchemy is always going to use bound parameters for Core constructs like insert(), and the time spent here seems to be mostly a product of the expense of bound parameters within the mysqlclient driver. So while I encourage you to try the above recipe and such, to overcome that overhead you may still be stuck formatting the entire string yourself, but I would encourage you to familiarize with how MySQLdb is doing this over at https://github.com/PyMySQL/mysqlclient-python/ , since that's where you're hitting the limitations you are having. > > On Friday, November 2, 2018 at 5:23:51 PM UTC+1, Mike Bayer wrote: >> >> On Fri, Nov 2, 2018 at 11:17 AM Ruben Di Battista >> <rubendi...@gmail.com> wrote: >> > >> > Hello, >> > >> > I have a huge insert of the type: >> > >> > ``` >> > session.execute( >> > insert_query, >> > [ >> > { >> > 'time': times[i], >> > 'elevation': elevation[i], >> > 'azimuth': azimuth[i], >> > 'doppler': doppler[i], >> > 'slant': slant[i], >> > 'passageID': passage_id >> > } >> > for i in six.moves.range(0, n) >> > ] >> > ) >> > >> > ``` >> > >> > where n is huge. >> > >> > I was told by a colleague that SQLALchemy limits query length to 65536 >> > chars while MySQL can cappet 33MB big queries. Can I tailor this in >> > SQLAlchemy? >> >> your colleague is mistaken. SQLAlchemy has no limit on the sizes of >> query strings or any other string. >> >> >> > >> > -- >> > SQLAlchemy - >> > The Python SQL Toolkit and Object Relational Mapper >> > >> > http://www.sqlalchemy.org/ >> > >> > To post example code, please provide an MCVE: Minimal, Complete, and >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> > description. >> > --- >> > You received this message because you are subscribed to the Google Groups >> > "sqlalchemy" group. >> > To unsubscribe from this group and stop receiving emails from it, send an >> > email to sqlalchemy+...@googlegroups.com. >> > To post to this group, send email to sqlal...@googlegroups.com. >> > Visit this group at https://groups.google.com/group/sqlalchemy. >> > For more options, visit https://groups.google.com/d/optout. > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To post to this group, send email to sqlalchemy@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.