Fair enough. I'll take a look at pymssql, though I suspect it may be a SQL Server rather than a driver issue.
On Monday, May 12, 2014 7:50:03 PM UTC-4, Michael Bayer wrote: > > > On May 12, 2014, at 7:35 PM, Seth P <spad...@gmail.com <javascript:>> > wrote: > > Looks like other people have encountered similar problems with indices > being ignored by prepared sql statements: > http://www.postgresql.org/message-id/43250afa.7010...@arbash-meinel.com. > (If the diagnosis there is correct, then I'm guessing the server would use > a unique index where all the columns of the index are specified.) Also, > Thierry Florac's post > https://groups.google.com/forum/#!topic/sqlalchemy/k_9ZGI-e85E sounds > similar. > (I suspect my earlier hypothesis about int vs varchar is a red herring.) > > I think it would be useful (albeit risky, if not careful) to have an > option to plug in parameters client-side. I presume not trivial to add to > SQLAlchemy? I don't see such an option for pyodbc. > > > there’s mechanisms for this but they aren’t very widely advertised since > as you know allowing people to do such would be an *enormous* security > hole, and I don’t have the resources to be responsible for parameter > escaping. It would be better if you could try pymssql (much more actively > maintained than pyodbc from what i can tell) and/or file a bug with pyodbc. > > > > On Monday, May 12, 2014 7:09:08 PM UTC-4, Seth P wrote: >> >> Yep, it's not a SQLAlchemy issue. The following code demonstrates the >> problem with direct pyodbc access. >> >> import pyodbc >> import time >> >> def print_timing(func): >> def wrapper(*arg): >> t1 = time.time() >> rows = func(*arg) >> t2 = time.time() >> print("%30s() len=%d, last=%s, runtime=%0.3fs" % >> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1)) >> return t2 - t1 >> return wrapper >> >> if __name__ == '__main__': >> cnxn = pyodbc.connect('DRIVER={SQL >> Server};SERVER=Compustat;DATABASE=Compustat') >> cursor = cnxn.cursor() >> sql_select_statement_base = "SELECT datadate, prcod FROM sec_dprc >> WHERE gvkey = ? ORDER BY datadate" >> key = '001045' >> >> @print_timing >> def execute_explicit_query(): >> sql_select_statement_explicit = >> sql_select_statement_base.replace("?", "'%s'" % key) >> rows = cursor.execute(sql_select_statement_explicit).fetchall() >> return rows >> >> @print_timing >> def execute_parameterized_query(): >> rows = cursor.execute(sql_select_statement_base, key).fetchall() >> return rows >> >> num_iterations = 5 >> explicit_runtime = 0.0 >> parameterized_runtime = 0.0 >> for i in range(num_iterations): >> explicit_runtime += execute_explicit_query() >> parameterized_runtime += execute_parameterized_query() >> print("Total runtime for %d explicit queries = %0.3fs." % >> (num_iterations, explicit_runtime)) >> print("Total runtime for %d parameterized queries = %0.3fs." % >> (num_iterations, parameterized_runtime)) >> >> >> On Monday, May 12, 2014 6:40:48 PM UTC-4, Michael Bayer wrote: >>> >>> >>> On May 12, 2014, at 6:33 PM, Seth P <spad...@gmail.com> wrote: >>> >>> Is it possible that the (primary key index (which is a composite index >>> that begins with gvkey, and is the only index on the table) isn't being >>> used because the the gvkey parameter is somehow passed as an integer rather >>> than as a string? >>> >>> >>> There’s nothing in SQLAlchemy that coerces strings to integers. If the >>> actual type of the column on the DB is an integer, then there might be some >>> conversion within pyodbc or the ODBC driver. >>> >>> if you’ve got it narrowed down this much the next step is to figure out >>> a raw pyodbc script that illustrates what the problem is. >>> >>> >>> The first EXEC below is pretty much instantaneous, whereas the second >>> takes about 8 seconds (and produces the same results). >>> >>> EXEC sp_executesql >>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>> sec_dprc_prcod >>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate', >>> N'@gvkey VARCHAR(6)', '001045' >>> >>> EXEC sp_executesql >>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>> sec_dprc_prcod >>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate', >>> N'@gvkey INT', 001045 >>> >>> >>> >>> On Monday, May 12, 2014 5:00:27 PM UTC-4, Michael Bayer wrote: >>>> >>>> >>>> well there’s only one parameter being processed here so there is >>>> clearly negligible difference in time spent within Python as far as >>>> getting >>>> the statement ready to execute and then executing it. >>>> >>>> So the time is either in what SQL Server spends fetching the rows, or >>>> the number of rows being fetched (which seems to be the same). Which >>>> leaves pretty much that SQL Server is making a different choice about the >>>> query plan for this SELECT statement, this is typically due to an INDEX >>>> being used or not. You’d need to analyze the plan being used. With >>>> SQL >>>> Server, the option to get a plan within programmatic execution seems to be >>>> per this answer >>>> http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-planto >>>> execute “SET SHOWPLAN_TEXT ON” ahead of time. >>>> >>>> Besides that, you can confirm where the time is being spent exactly >>>> using Python profiling. A description on how to achieve that is here: >>>> http://stackoverflow.com/questions/1171166/how-can-i-profile-a-sqlalchemy-powered-application/1175677#1175677 >>>> >>>> >>>> >>>> On May 12, 2014, at 3:48 PM, Seth P <spad...@gmail.com> wrote: >>>> >>>> After tracking down some extreme slowness in loading a one-to-many >>>> relationship (e.g. myobject.foobars), I seem to have isolated the issue to >>>> engine.execute() being much slower with parameterized queries than with >>>> explicit queries. The following is actual code and output for loading >>>> 10,971 rows from a database table. (The actual table has more columns than >>>> I'm including here, and is not designed by me.) Note that each explicit >>>> query (where I explicitly set the WHERE clause parameter and pass the >>>> resulting SQL statement to engine.execute()) runs in under 0.1 seconds, >>>> whereas each parameterized query (where I let SQLAlchemy bind the WHERE >>>> clause parameter) takes over 8 seconds. >>>> >>>> The difference in runtimes is smaller when the number of rows returned >>>> is smaller, which seems odd since I would have thought that the binding of >>>> the WHERE clause parameters is just done once and would be virtually >>>> instantaneous. >>>> >>>> Any thoughts? >>>> >>>> Thanks, >>>> >>>> Seth >>>> >>>> >>>> import sqlalchemy as sa >>>> from sqlalchemy.orm import sessionmaker >>>> from sqlalchemy.ext.declarative import declarative_base >>>> import time >>>> >>>> engine = sa.create_engine('mssql+pyodbc://Compustat/Compustat') >>>> session = sessionmaker(bind=engine, autoflush=False, >>>> expire_on_commit=False)() >>>> >>>> class FooBar(declarative_base()): >>>> __tablename__ = 'sec_dprc' >>>> gvkey = sa.Column(sa.String(6), primary_key=True) >>>> datadate = sa.Column(sa.DateTime, primary_key=True) >>>> value = sa.Column(sa.Float, name='prcod') >>>> >>>> def print_timing(func): >>>> def wrapper(*arg): >>>> t1 = time.time() >>>> rows = func(*arg) >>>> t2 = time.time() >>>> print("%30s() len=%d, last=%s, runtime=%0.3fs" % >>>> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1)) >>>> return t2 - t1 >>>> return wrapper >>>> >>>> if __name__ == '__main__': >>>> >>>> key = '001045' >>>> query = session.query(FooBar.datadate, >>>> FooBar.value).filter(sa.and_(FooBar.gvkey == >>>> key)).order_by(FooBar.datadate) >>>> sql_select_statement_base = str(query) >>>> print(sql_select_statement_base) >>>> >>>> @print_timing >>>> def execute_explicit_query(): >>>> sql_select_statement_explicit = >>>> sql_select_statement_base.replace(":gvkey_1", "'%s'" % key) >>>> rows = >>>> engine.execute(sa.text(sql_select_statement_explicit)).fetchall() >>>> return rows >>>> >>>> @print_timing >>>> def execute_parameterized_query(): >>>> rows = engine.execute(sa.text(sql_select_statement_base), >>>> {'gvkey_1':key}).fetchall() >>>> return rows >>>> >>>> num_iterations = 5 >>>> explicit_runtime = 0.0 >>>> parameterized_runtime = 0.0 >>>> for i in range(num_iterations): >>>> explicit_runtime += execute_explicit_query() >>>> parameterized_runtime += execute_parameterized_query() >>>> print("Total runtime for %d explicit queries = %0.3fs." % >>>> (num_iterations, explicit_runtime)) >>>> print("Total runtime for %d parameterized queries = %0.3fs." % >>>> (num_iterations, parameterized_runtime)) >>>> >>>> >>>> SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>>> sec_dprc_prcod >>>> FROM sec_dprc >>>> WHERE sec_dprc.gvkey = :gvkey_1 ORDER BY sec_dprc.datadate >>>> execute_explicit_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.082s >>>> execute_parameterized_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.852s >>>> execute_explicit_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.032s >>>> execute_parameterized_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.754s >>>> execute_explicit_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.039s >>>> execute_parameterized_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.182s >>>> execute_explicit_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.028s >>>> execute_parameterized_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.416s >>>> execute_explicit_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.080s >>>> execute_parameterized_query() len=10971, >>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.425s >>>> Total runtime for 5 explicit queries = 0.260s. >>>> Total runtime for 5 parameterized queries = 44.629s. >>>> >>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "sqlalchemy" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to sqlalchemy+...@googlegroups.com. >>>> To post to this group, send email to sqlal...@googlegroups.com. >>>> Visit this group at http://groups.google.com/group/sqlalchemy. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>>> >>>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "sqlalchemy" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to sqlalchemy+...@googlegroups.com. >>> To post to this group, send email to sqlal...@googlegroups.com. >>> Visit this group at http://groups.google.com/group/sqlalchemy. >>> For more options, visit https://groups.google.com/d/optout. >>> >>> >>> > -- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+...@googlegroups.com <javascript:>. > To post to this group, send email to sqlal...@googlegroups.com<javascript:> > . > Visit this group at http://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. > > > -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.