Re: [sqlalchemy] engine.execute() slow with parameterized queries?

Seth P Tue, 13 May 2014 17:45:06 -0700

Yep, literal_column() fixed my performance problem.

Thanks again for all your help


On Tuesday, May 13, 2014 6:15:39 AM UTC-4, Michael Bayer wrote:
>
> OK well of course also, as we have the exact same thing being asked in 
> regards to Oracle right now in another thread, you can of course always 
> bypass a “bound” value in the most direct way, using text() or 
> literal_column():
>
> q = s.query(Something).filter(Something.foo = literal_column(“‘my value’”))
>
>
>
> On May 12, 2014, at 8:38 PM, Michael Bayer 
> <mik...@zzzcomputing.com<javascript:>> 
> wrote:
>
> well or a FreeTDS issue, more likely, if that’s what you’re using.
>
> the SQL compiler has a parameter called “literal_binds” that will make it 
> render a bound parameter as an inline string, but it only supports a few 
> very basic types.    As far as getting this parameter set for a general 
> class of queries, it depends on when you’d want it to happen and how.   It 
> likely would require some subclassing and possibly monkey patching.
>
>
> On May 12, 2014, at 8:23 PM, Seth P <spad...@gmail.com <javascript:>> 
> wrote:
>
> pymssql produces the same results as pyodbc. So it looks like a SQL Server 
> issue.
>
> On Monday, May 12, 2014 8:06:08 PM UTC-4, Seth P wrote:
>>
>> Fair enough. I'll take a look at pymssql, though I suspect it may be a 
>> SQL Server rather than a driver issue.
>>
>>
>> On Monday, May 12, 2014 7:50:03 PM UTC-4, Michael Bayer wrote:
>>>
>>>
>>> On May 12, 2014, at 7:35 PM, Seth P <spad...@gmail.com> wrote:
>>>
>>> Looks like other people have encountered similar problems with indices 
>>> being ignored by prepared sql statements: 
>>> http://www.postgresql.org/message-id/43250afa.7010...@arbash-meinel.com. 
>>> (If the diagnosis there is correct, then I'm guessing the server would use 
>>> a unique index where all the columns of the index are specified.) Also, 
>>> Thierry Florac's post 
>>> https://groups.google.com/forum/#!topic/sqlalchemy/k_9ZGI-e85E sounds 
>>> similar.
>>> (I suspect my earlier hypothesis about int vs varchar is a red herring.)
>>>
>>> I think it would be useful (albeit risky, if not careful) to have an 
>>> option to plug in parameters client-side. I presume not trivial to add to 
>>> SQLAlchemy? I don't see such an option for pyodbc.
>>>
>>>
>>> there’s mechanisms for this but they aren’t very widely advertised since 
>>> as you know allowing people to do such would be an *enormous* security 
>>> hole, and I don’t have the resources to be responsible for parameter 
>>> escaping.   It would be better if you could try pymssql (much more actively 
>>> maintained than pyodbc from what i can tell) and/or file a bug with pyodbc.
>>>
>>>
>>>
>>> On Monday, May 12, 2014 7:09:08 PM UTC-4, Seth P wrote:
>>>>
>>>> Yep, it's not a SQLAlchemy issue. The following code demonstrates the 
>>>> problem with direct pyodbc access.
>>>>
>>>> import pyodbc
>>>> import time
>>>>
>>>> def print_timing(func):
>>>>     def wrapper(*arg):
>>>>         t1 = time.time()
>>>>         rows = func(*arg)
>>>>         t2 = time.time()
>>>>         print("%30s() len=%d, last=%s, runtime=%0.3fs" % 
>>>> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1))
>>>>         return t2 - t1
>>>>     return wrapper
>>>>
>>>> if __name__ == '__main__':
>>>>     cnxn = pyodbc.connect('DRIVER={SQL 
>>>> Server};SERVER=Compustat;DATABASE=Compustat')
>>>>     cursor = cnxn.cursor()
>>>>     sql_select_statement_base = "SELECT datadate, prcod FROM sec_dprc 
>>>> WHERE gvkey = ? ORDER BY datadate"
>>>>     key = '001045'
>>>>
>>>>     @print_timing
>>>>     def execute_explicit_query():
>>>>         sql_select_statement_explicit = 
>>>> sql_select_statement_base.replace("?", "'%s'" % key)
>>>>         rows = cursor.execute(sql_select_statement_explicit).fetchall()
>>>>         return rows
>>>>
>>>>     @print_timing
>>>>     def execute_parameterized_query():
>>>>         rows = cursor.execute(sql_select_statement_base, key).fetchall()
>>>>         return rows
>>>>
>>>>     num_iterations = 5
>>>>     explicit_runtime = 0.0
>>>>     parameterized_runtime = 0.0
>>>>     for i in range(num_iterations):
>>>>         explicit_runtime += execute_explicit_query()
>>>>         parameterized_runtime += execute_parameterized_query()
>>>>     print("Total runtime for %d explicit queries = %0.3fs." % 
>>>> (num_iterations, explicit_runtime))
>>>>     print("Total runtime for %d parameterized queries = %0.3fs." % 
>>>> (num_iterations, parameterized_runtime))
>>>>
>>>>
>>>> On Monday, May 12, 2014 6:40:48 PM UTC-4, Michael Bayer wrote:
>>>>>
>>>>>
>>>>> On May 12, 2014, at 6:33 PM, Seth P <spad...@gmail.com> wrote:
>>>>>
>>>>> Is it possible that the (primary key index (which is a composite index 
>>>>> that begins with gvkey, and is the only index on the table) isn't being 
>>>>> used because the the gvkey parameter is somehow passed as an integer 
>>>>> rather 
>>>>> than as a string?
>>>>>
>>>>>
>>>>> There’s nothing in SQLAlchemy that coerces strings to integers.  If 
>>>>> the actual type of the column on the DB is an integer, then there might 
>>>>> be 
>>>>> some conversion within pyodbc or the ODBC driver.
>>>>>
>>>>> if you’ve got it narrowed down this much the next step is to figure 
>>>>> out a raw pyodbc script that illustrates what the problem is. 
>>>>>
>>>>>
>>>>> The first EXEC below is pretty much instantaneous, whereas the second 
>>>>> takes about 8 seconds (and produces the same results).
>>>>>
>>>>> EXEC sp_executesql
>>>>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
>>>>> sec_dprc_prcod
>>>>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY 
>>>>> sec_dprc.datadate',
>>>>> N'@gvkey VARCHAR(6)', '001045'
>>>>>
>>>>> EXEC sp_executesql
>>>>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
>>>>> sec_dprc_prcod
>>>>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY 
>>>>> sec_dprc.datadate',
>>>>> N'@gvkey INT', 001045
>>>>>
>>>>>
>>>>>
>>>>> On Monday, May 12, 2014 5:00:27 PM UTC-4, Michael Bayer wrote:
>>>>>>
>>>>>>
>>>>>> well there’s only one parameter being processed here so there is 
>>>>>> clearly negligible difference in time spent within Python as far as 
>>>>>> getting 
>>>>>> the statement ready to execute and then executing it.
>>>>>>
>>>>>> So the time is either in what SQL Server spends fetching the rows, or 
>>>>>> the number of rows being fetched (which seems to be the same).   Which 
>>>>>> leaves pretty much that SQL Server is making a different choice about 
>>>>>> the 
>>>>>> query plan for this SELECT statement, this is typically due to an INDEX 
>>>>>> being used or not.    You’d need to analyze the plan being used.   With 
>>>>>> SQL 
>>>>>> Server, the option to get a plan within programmatic execution seems to 
>>>>>> be 
>>>>>> per this answer 
>>>>>> http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-planto
>>>>>>  execute “SET SHOWPLAN_TEXT ON” ahead of time.
>>>>>>
>>>>>> Besides that, you can confirm where the time is being spent exactly 
>>>>>> using Python profiling.   A description on how to achieve that is here: 
>>>>>> http://stackoverflow.com/questions/1171166/how-can-i-profile-a-sqlalchemy-powered-application/1175677#1175677
>>>>>>
>>>>>>
>>>>>>
>>>>>> On May 12, 2014, at 3:48 PM, Seth P <spad...@gmail.com> wrote:
>>>>>>
>>>>>> After tracking down some extreme slowness in loading a one-to-many 
>>>>>> relationship (e.g. myobject.foobars), I seem to have isolated the issue 
>>>>>> to 
>>>>>> engine.execute() being much slower with parameterized queries than with 
>>>>>> explicit queries. The following is actual code and output for loading 
>>>>>> 10,971 rows from a database table. (The actual table has more columns 
>>>>>> than 
>>>>>> I'm including here, and is not designed by me.) Note that each explicit 
>>>>>> query (where I explicitly set the WHERE clause parameter and pass the 
>>>>>> resulting SQL statement to engine.execute()) runs in under 0.1 seconds, 
>>>>>> whereas each parameterized query (where I let SQLAlchemy bind the WHERE 
>>>>>> clause parameter) takes over 8 seconds.
>>>>>>
>>>>>> The difference in runtimes is smaller when the number of rows 
>>>>>> returned is smaller, which seems odd since I would have thought that the 
>>>>>> binding of the WHERE clause parameters is just done once and would be 
>>>>>> virtually instantaneous.
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Seth
>>>>>>
>>>>>>
>>>>>> import sqlalchemy as sa
>>>>>> from sqlalchemy.orm import sessionmaker
>>>>>> from sqlalchemy.ext.declarative import declarative_base
>>>>>> import time
>>>>>>
>>>>>> engine = sa.create_engine('mssql+pyodbc://Compustat/Compustat')
>>>>>> session = sessionmaker(bind=engine, autoflush=False, 
>>>>>> expire_on_commit=False)()
>>>>>>
>>>>>> class FooBar(declarative_base()):
>>>>>>     __tablename__ = 'sec_dprc'
>>>>>>     gvkey = sa.Column(sa.String(6), primary_key=True)
>>>>>>     datadate = sa.Column(sa.DateTime, primary_key=True)
>>>>>>     value = sa.Column(sa.Float, name='prcod')
>>>>>>
>>>>>> def print_timing(func):
>>>>>>     def wrapper(*arg):
>>>>>>         t1 = time.time()
>>>>>>         rows = func(*arg)
>>>>>>         t2 = time.time()
>>>>>>         print("%30s() len=%d, last=%s, runtime=%0.3fs" % 
>>>>>> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1))
>>>>>>         return t2 - t1
>>>>>>     return wrapper
>>>>>>
>>>>>> if __name__ == '__main__':
>>>>>>
>>>>>>     key = '001045'
>>>>>>     query = session.query(FooBar.datadate, 
>>>>>> FooBar.value).filter(sa.and_(FooBar.gvkey == 
>>>>>> key)).order_by(FooBar.datadate)
>>>>>>     sql_select_statement_base = str(query)
>>>>>>     print(sql_select_statement_base)
>>>>>>
>>>>>>     @print_timing
>>>>>>     def execute_explicit_query():
>>>>>>         sql_select_statement_explicit = 
>>>>>> sql_select_statement_base.replace(":gvkey_1", "'%s'" % key)
>>>>>>         rows = 
>>>>>> engine.execute(sa.text(sql_select_statement_explicit)).fetchall()
>>>>>>         return rows
>>>>>>
>>>>>>     @print_timing
>>>>>>     def execute_parameterized_query():
>>>>>>         rows = engine.execute(sa.text(sql_select_statement_base), 
>>>>>> {'gvkey_1':key}).fetchall()
>>>>>>         return rows
>>>>>>
>>>>>>     num_iterations = 5
>>>>>>     explicit_runtime = 0.0
>>>>>>     parameterized_runtime = 0.0
>>>>>>     for i in range(num_iterations):
>>>>>>         explicit_runtime += execute_explicit_query()
>>>>>>         parameterized_runtime += execute_parameterized_query()
>>>>>>     print("Total runtime for %d explicit queries = %0.3fs." % 
>>>>>> (num_iterations, explicit_runtime))
>>>>>>     print("Total runtime for %d parameterized queries = %0.3fs." % 
>>>>>> (num_iterations, parameterized_runtime))
>>>>>>
>>>>>>
>>>>>> SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
>>>>>> sec_dprc_prcod 
>>>>>> FROM sec_dprc 
>>>>>> WHERE sec_dprc.gvkey = :gvkey_1 ORDER BY sec_dprc.datadate
>>>>>>         execute_explicit_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.082s
>>>>>>    execute_parameterized_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.852s
>>>>>>         execute_explicit_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.032s
>>>>>>    execute_parameterized_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.754s
>>>>>>         execute_explicit_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.039s
>>>>>>    execute_parameterized_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.182s
>>>>>>         execute_explicit_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.028s
>>>>>>    execute_parameterized_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.416s
>>>>>>         execute_explicit_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.080s
>>>>>>    execute_parameterized_query() len=10971, 
>>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.425s
>>>>>> Total runtime for 5 explicit queries = 0.260s.
>>>>>> Total runtime for 5 parameterized queries = 44.629s.
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "sqlalchemy" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to sqlalchemy+...@googlegroups.com.
>>>>>> To post to this group, send email to sqlal...@googlegroups.com.
>>>>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>>
>>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "sqlalchemy" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to sqlalchemy+...@googlegroups.com.
>>>>> To post to this group, send email to sqlal...@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>>
>>>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to sqlalchemy+...@googlegroups.com.
>>> To post to this group, send email to sqlal...@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+...@googlegroups.com <javascript:>.
> To post to this group, send email to sqlal...@googlegroups.com<javascript:>
> .
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>
>
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+...@googlegroups.com <javascript:>.
> To post to this group, send email to sqlal...@googlegroups.com<javascript:>
> .
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] engine.execute() slow with parameterized queries?

Reply via email to