Re: [sqlalchemy] engine.execute() slow with parameterized queries?

Michael Bayer Mon, 12 May 2014 16:50:24 -0700

On May 12, 2014, at 7:35 PM, Seth P <spadow...@gmail.com> wrote:

> Looks like other people have encountered similar problems with indices being 
> ignored by prepared sql statements: 
> http://www.postgresql.org/message-id/43250afa.7010...@arbash-meinel.com. (If 
> the diagnosis there is correct, then I'm guessing the server would use a 
> unique index where all the columns of the index are specified.) Also, Thierry 
> Florac's post https://groups.google.com/forum/#!topic/sqlalchemy/k_9ZGI-e85E 
> sounds similar.
> (I suspect my earlier hypothesis about int vs varchar is a red herring.)
> 
> I think it would be useful (albeit risky, if not careful) to have an option 
> to plug in parameters client-side. I presume not trivial to add to 
> SQLAlchemy? I don't see such an option for pyodbc.


there's mechanisms for this but they aren't very widely advertised since as you 
know allowing people to do such would be an *enormous* security hole, and I 
don't have the resources to be responsible for parameter escaping.   It would 
be better if you could try pymssql (much more actively maintained than pyodbc 
from what i can tell) and/or file a bug with pyodbc.


> 
> On Monday, May 12, 2014 7:09:08 PM UTC-4, Seth P wrote:
> Yep, it's not a SQLAlchemy issue. The following code demonstrates the problem 
> with direct pyodbc access.
> 
> import pyodbc
> import time
> 
> def print_timing(func):
>     def wrapper(*arg):
>         t1 = time.time()
>         rows = func(*arg)
>         t2 = time.time()
>         print("%30s() len=%d, last=%s, runtime=%0.3fs" % (str(func).split(' 
> at')[0][10:], len(rows), rows[-1], t2 - t1))
>         return t2 - t1
>     return wrapper
> 
> if __name__ == '__main__':
>     cnxn = pyodbc.connect('DRIVER={SQL 
> Server};SERVER=Compustat;DATABASE=Compustat')
>     cursor = cnxn.cursor()
>     sql_select_statement_base = "SELECT datadate, prcod FROM sec_dprc WHERE 
> gvkey = ? ORDER BY datadate"
>     key = '001045'
> 
>     @print_timing
>     def execute_explicit_query():
>         sql_select_statement_explicit = 
> sql_select_statement_base.replace("?", "'%s'" % key)
>         rows = cursor.execute(sql_select_statement_explicit).fetchall()
>         return rows
> 
>     @print_timing
>     def execute_parameterized_query():
>         rows = cursor.execute(sql_select_statement_base, key).fetchall()
>         return rows
> 
>     num_iterations = 5
>     explicit_runtime = 0.0
>     parameterized_runtime = 0.0
>     for i in range(num_iterations):
>         explicit_runtime += execute_explicit_query()
>         parameterized_runtime += execute_parameterized_query()
>     print("Total runtime for %d explicit queries = %0.3fs." % 
> (num_iterations, explicit_runtime))
>     print("Total runtime for %d parameterized queries = %0.3fs." % 
> (num_iterations, parameterized_runtime))
> 
> 
> On Monday, May 12, 2014 6:40:48 PM UTC-4, Michael Bayer wrote:
> 
> On May 12, 2014, at 6:33 PM, Seth P <spad...@gmail.com> wrote:
> 
>> Is it possible that the (primary key index (which is a composite index that 
>> begins with gvkey, and is the only index on the table) isn't being used 
>> because the the gvkey parameter is somehow passed as an integer rather than 
>> as a string?
> 
> There's nothing in SQLAlchemy that coerces strings to integers.  If the 
> actual type of the column on the DB is an integer, then there might be some 
> conversion within pyodbc or the ODBC driver.
> 
> if you've got it narrowed down this much the next step is to figure out a raw 
> pyodbc script that illustrates what the problem is. 
> 
> 
>> The first EXEC below is pretty much instantaneous, whereas the second takes 
>> about 8 seconds (and produces the same results).
>> 
>> EXEC sp_executesql
>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
>> sec_dprc_prcod
>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate',
>> N'@gvkey VARCHAR(6)', '001045'
>> 
>> EXEC sp_executesql
>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
>> sec_dprc_prcod
>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate',
>> N'@gvkey INT', 001045
>> 
>> 
>> 
>> On Monday, May 12, 2014 5:00:27 PM UTC-4, Michael Bayer wrote:
>> 
>> well there's only one parameter being processed here so there is clearly 
>> negligible difference in time spent within Python as far as getting the 
>> statement ready to execute and then executing it.
>> 
>> So the time is either in what SQL Server spends fetching the rows, or the 
>> number of rows being fetched (which seems to be the same).   Which leaves 
>> pretty much that SQL Server is making a different choice about the query 
>> plan for this SELECT statement, this is typically due to an INDEX being used 
>> or not.    You'd need to analyze the plan being used.   With SQL Server, the 
>> option to get a plan within programmatic execution seems to be per this 
>> answer 
>> http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan
>>  to execute "SET SHOWPLAN_TEXT ON" ahead of time.
>> 
>> Besides that, you can confirm where the time is being spent exactly using 
>> Python profiling.   A description on how to achieve that is here: 
>> http://stackoverflow.com/questions/1171166/how-can-i-profile-a-sqlalchemy-powered-application/1175677#1175677
>> 
>> 
>> 
>> On May 12, 2014, at 3:48 PM, Seth P <spad...@gmail.com> wrote:
>> 
>>> After tracking down some extreme slowness in loading a one-to-many 
>>> relationship (e.g. myobject.foobars), I seem to have isolated the issue to 
>>> engine.execute() being much slower with parameterized queries than with 
>>> explicit queries. The following is actual code and output for loading 
>>> 10,971 rows from a database table. (The actual table has more columns than 
>>> I'm including here, and is not designed by me.) Note that each explicit 
>>> query (where I explicitly set the WHERE clause parameter and pass the 
>>> resulting SQL statement to engine.execute()) runs in under 0.1 seconds, 
>>> whereas each parameterized query (where I let SQLAlchemy bind the WHERE 
>>> clause parameter) takes over 8 seconds.
>>> 
>>> The difference in runtimes is smaller when the number of rows returned is 
>>> smaller, which seems odd since I would have thought that the binding of the 
>>> WHERE clause parameters is just done once and would be virtually 
>>> instantaneous.
>>> 
>>> Any thoughts?
>>> 
>>> Thanks,
>>> 
>>> Seth
>>> 
>>> 
>>> import sqlalchemy as sa
>>> from sqlalchemy.orm import sessionmaker
>>> from sqlalchemy.ext.declarative import declarative_base
>>> import time
>>> 
>>> engine = sa.create_engine('mssql+pyodbc://Compustat/Compustat')
>>> session = sessionmaker(bind=engine, autoflush=False, 
>>> expire_on_commit=False)()
>>> 
>>> class FooBar(declarative_base()):
>>>     __tablename__ = 'sec_dprc'
>>>     gvkey = sa.Column(sa.String(6), primary_key=True)
>>>     datadate = sa.Column(sa.DateTime, primary_key=True)
>>>     value = sa.Column(sa.Float, name='prcod')
>>> 
>>> def print_timing(func):
>>>     def wrapper(*arg):
>>>         t1 = time.time()
>>>         rows = func(*arg)
>>>         t2 = time.time()
>>>         print("%30s() len=%d, last=%s, runtime=%0.3fs" % (str(func).split(' 
>>> at')[0][10:], len(rows), rows[-1], t2 - t1))
>>>         return t2 - t1
>>>     return wrapper
>>> 
>>> if __name__ == '__main__':
>>> 
>>>     key = '001045'
>>>     query = session.query(FooBar.datadate, 
>>> FooBar.value).filter(sa.and_(FooBar.gvkey == key)).order_by(FooBar.datadate)
>>>     sql_select_statement_base = str(query)
>>>     print(sql_select_statement_base)
>>> 
>>>     @print_timing
>>>     def execute_explicit_query():
>>>         sql_select_statement_explicit = 
>>> sql_select_statement_base.replace(":gvkey_1", "'%s'" % key)
>>>         rows = 
>>> engine.execute(sa.text(sql_select_statement_explicit)).fetchall()
>>>         return rows
>>> 
>>>     @print_timing
>>>     def execute_parameterized_query():
>>>         rows = engine.execute(sa.text(sql_select_statement_base), 
>>> {'gvkey_1':key}).fetchall()
>>>         return rows
>>> 
>>>     num_iterations = 5
>>>     explicit_runtime = 0.0
>>>     parameterized_runtime = 0.0
>>>     for i in range(num_iterations):
>>>         explicit_runtime += execute_explicit_query()
>>>         parameterized_runtime += execute_parameterized_query()
>>>     print("Total runtime for %d explicit queries = %0.3fs." % 
>>> (num_iterations, explicit_runtime))
>>>     print("Total runtime for %d parameterized queries = %0.3fs." % 
>>> (num_iterations, parameterized_runtime))
>>> 
>>> 
>>> SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
>>> sec_dprc_prcod 
>>> FROM sec_dprc 
>>> WHERE sec_dprc.gvkey = :gvkey_1 ORDER BY sec_dprc.datadate
>>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=0.082s
>>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=8.852s
>>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=0.032s
>>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=8.754s
>>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=0.039s
>>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=9.182s
>>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=0.028s
>>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=9.416s
>>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=0.080s
>>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>>> 5, 9, 0, 0), 37.96), runtime=8.425s
>>> Total runtime for 5 explicit queries = 0.260s.
>>> Total runtime for 5 parameterized queries = 44.629s.
>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to sqlalchemy+...@googlegroups.com.
>>> To post to this group, send email to sqlal...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/d/optout.
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to sqlalchemy+...@googlegroups.com.
>> To post to this group, send email to sqlal...@googlegroups.com.
>> Visit this group at http://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/d/optout.
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] engine.execute() slow with parameterized queries?

Reply via email to