pymssql produces the same results as pyodbc. So it looks like a SQL Server issue.
On Monday, May 12, 2014 8:06:08 PM UTC-4, Seth P wrote: > > Fair enough. I'll take a look at pymssql, though I suspect it may be a SQL > Server rather than a driver issue. > > > On Monday, May 12, 2014 7:50:03 PM UTC-4, Michael Bayer wrote: >> >> >> On May 12, 2014, at 7:35 PM, Seth P <spad...@gmail.com> wrote: >> >> Looks like other people have encountered similar problems with indices >> being ignored by prepared sql statements: >> http://www.postgresql.org/message-id/43250afa.7010...@arbash-meinel.com. >> (If the diagnosis there is correct, then I'm guessing the server would use >> a unique index where all the columns of the index are specified.) Also, >> Thierry Florac's post >> https://groups.google.com/forum/#!topic/sqlalchemy/k_9ZGI-e85E sounds >> similar. >> (I suspect my earlier hypothesis about int vs varchar is a red herring.) >> >> I think it would be useful (albeit risky, if not careful) to have an >> option to plug in parameters client-side. I presume not trivial to add to >> SQLAlchemy? I don't see such an option for pyodbc. >> >> >> there’s mechanisms for this but they aren’t very widely advertised since >> as you know allowing people to do such would be an *enormous* security >> hole, and I don’t have the resources to be responsible for parameter >> escaping. It would be better if you could try pymssql (much more actively >> maintained than pyodbc from what i can tell) and/or file a bug with pyodbc. >> >> >> >> On Monday, May 12, 2014 7:09:08 PM UTC-4, Seth P wrote: >>> >>> Yep, it's not a SQLAlchemy issue. The following code demonstrates the >>> problem with direct pyodbc access. >>> >>> import pyodbc >>> import time >>> >>> def print_timing(func): >>> def wrapper(*arg): >>> t1 = time.time() >>> rows = func(*arg) >>> t2 = time.time() >>> print("%30s() len=%d, last=%s, runtime=%0.3fs" % >>> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1)) >>> return t2 - t1 >>> return wrapper >>> >>> if __name__ == '__main__': >>> cnxn = pyodbc.connect('DRIVER={SQL >>> Server};SERVER=Compustat;DATABASE=Compustat') >>> cursor = cnxn.cursor() >>> sql_select_statement_base = "SELECT datadate, prcod FROM sec_dprc >>> WHERE gvkey = ? ORDER BY datadate" >>> key = '001045' >>> >>> @print_timing >>> def execute_explicit_query(): >>> sql_select_statement_explicit = >>> sql_select_statement_base.replace("?", "'%s'" % key) >>> rows = cursor.execute(sql_select_statement_explicit).fetchall() >>> return rows >>> >>> @print_timing >>> def execute_parameterized_query(): >>> rows = cursor.execute(sql_select_statement_base, key).fetchall() >>> return rows >>> >>> num_iterations = 5 >>> explicit_runtime = 0.0 >>> parameterized_runtime = 0.0 >>> for i in range(num_iterations): >>> explicit_runtime += execute_explicit_query() >>> parameterized_runtime += execute_parameterized_query() >>> print("Total runtime for %d explicit queries = %0.3fs." % >>> (num_iterations, explicit_runtime)) >>> print("Total runtime for %d parameterized queries = %0.3fs." % >>> (num_iterations, parameterized_runtime)) >>> >>> >>> On Monday, May 12, 2014 6:40:48 PM UTC-4, Michael Bayer wrote: >>>> >>>> >>>> On May 12, 2014, at 6:33 PM, Seth P <spad...@gmail.com> wrote: >>>> >>>> Is it possible that the (primary key index (which is a composite index >>>> that begins with gvkey, and is the only index on the table) isn't being >>>> used because the the gvkey parameter is somehow passed as an integer >>>> rather >>>> than as a string? >>>> >>>> >>>> There’s nothing in SQLAlchemy that coerces strings to integers. If the >>>> actual type of the column on the DB is an integer, then there might be >>>> some >>>> conversion within pyodbc or the ODBC driver. >>>> >>>> if you’ve got it narrowed down this much the next step is to figure out >>>> a raw pyodbc script that illustrates what the problem is. >>>> >>>> >>>> The first EXEC below is pretty much instantaneous, whereas the second >>>> takes about 8 seconds (and produces the same results). >>>> >>>> EXEC sp_executesql >>>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>>> sec_dprc_prcod >>>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate', >>>> N'@gvkey VARCHAR(6)', '001045' >>>> >>>> EXEC sp_executesql >>>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>>> sec_dprc_prcod >>>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate', >>>> N'@gvkey INT', 001045 >>>> >>>> >>>> >>>> On Monday, May 12, 2014 5:00:27 PM UTC-4, Michael Bayer wrote: >>>>> >>>>> >>>>> well there’s only one parameter being processed here so there is >>>>> clearly negligible difference in time spent within Python as far as >>>>> getting >>>>> the statement ready to execute and then executing it. >>>>> >>>>> So the time is either in what SQL Server spends fetching the rows, or >>>>> the number of rows being fetched (which seems to be the same). Which >>>>> leaves pretty much that SQL Server is making a different choice about the >>>>> query plan for this SELECT statement, this is typically due to an INDEX >>>>> being used or not. You’d need to analyze the plan being used. With >>>>> SQL >>>>> Server, the option to get a plan within programmatic execution seems to >>>>> be >>>>> per this answer >>>>> http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-planto >>>>> execute “SET SHOWPLAN_TEXT ON” ahead of time. >>>>> >>>>> Besides that, you can confirm where the time is being spent exactly >>>>> using Python profiling. A description on how to achieve that is here: >>>>> http://stackoverflow.com/questions/1171166/how-can-i-profile-a-sqlalchemy-powered-application/1175677#1175677 >>>>> >>>>> >>>>> >>>>> On May 12, 2014, at 3:48 PM, Seth P <spad...@gmail.com> wrote: >>>>> >>>>> After tracking down some extreme slowness in loading a one-to-many >>>>> relationship (e.g. myobject.foobars), I seem to have isolated the issue >>>>> to >>>>> engine.execute() being much slower with parameterized queries than with >>>>> explicit queries. The following is actual code and output for loading >>>>> 10,971 rows from a database table. (The actual table has more columns >>>>> than >>>>> I'm including here, and is not designed by me.) Note that each explicit >>>>> query (where I explicitly set the WHERE clause parameter and pass the >>>>> resulting SQL statement to engine.execute()) runs in under 0.1 seconds, >>>>> whereas each parameterized query (where I let SQLAlchemy bind the WHERE >>>>> clause parameter) takes over 8 seconds. >>>>> >>>>> The difference in runtimes is smaller when the number of rows returned >>>>> is smaller, which seems odd since I would have thought that the binding >>>>> of >>>>> the WHERE clause parameters is just done once and would be virtually >>>>> instantaneous. >>>>> >>>>> Any thoughts? >>>>> >>>>> Thanks, >>>>> >>>>> Seth >>>>> >>>>> >>>>> import sqlalchemy as sa >>>>> from sqlalchemy.orm import sessionmaker >>>>> from sqlalchemy.ext.declarative import declarative_base >>>>> import time >>>>> >>>>> engine = sa.create_engine('mssql+pyodbc://Compustat/Compustat') >>>>> session = sessionmaker(bind=engine, autoflush=False, >>>>> expire_on_commit=False)() >>>>> >>>>> class FooBar(declarative_base()): >>>>> __tablename__ = 'sec_dprc' >>>>> gvkey = sa.Column(sa.String(6), primary_key=True) >>>>> datadate = sa.Column(sa.DateTime, primary_key=True) >>>>> value = sa.Column(sa.Float, name='prcod') >>>>> >>>>> def print_timing(func): >>>>> def wrapper(*arg): >>>>> t1 = time.time() >>>>> rows = func(*arg) >>>>> t2 = time.time() >>>>> print("%30s() len=%d, last=%s, runtime=%0.3fs" % >>>>> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1)) >>>>> return t2 - t1 >>>>> return wrapper >>>>> >>>>> if __name__ == '__main__': >>>>> >>>>> key = '001045' >>>>> query = session.query(FooBar.datadate, >>>>> FooBar.value).filter(sa.and_(FooBar.gvkey == >>>>> key)).order_by(FooBar.datadate) >>>>> sql_select_statement_base = str(query) >>>>> print(sql_select_statement_base) >>>>> >>>>> @print_timing >>>>> def execute_explicit_query(): >>>>> sql_select_statement_explicit = >>>>> sql_select_statement_base.replace(":gvkey_1", "'%s'" % key) >>>>> rows = >>>>> engine.execute(sa.text(sql_select_statement_explicit)).fetchall() >>>>> return rows >>>>> >>>>> @print_timing >>>>> def execute_parameterized_query(): >>>>> rows = engine.execute(sa.text(sql_select_statement_base), >>>>> {'gvkey_1':key}).fetchall() >>>>> return rows >>>>> >>>>> num_iterations = 5 >>>>> explicit_runtime = 0.0 >>>>> parameterized_runtime = 0.0 >>>>> for i in range(num_iterations): >>>>> explicit_runtime += execute_explicit_query() >>>>> parameterized_runtime += execute_parameterized_query() >>>>> print("Total runtime for %d explicit queries = %0.3fs." % >>>>> (num_iterations, explicit_runtime)) >>>>> print("Total runtime for %d parameterized queries = %0.3fs." % >>>>> (num_iterations, parameterized_runtime)) >>>>> >>>>> >>>>> SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>>>> sec_dprc_prcod >>>>> FROM sec_dprc >>>>> WHERE sec_dprc.gvkey = :gvkey_1 ORDER BY sec_dprc.datadate >>>>> execute_explicit_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.082s >>>>> execute_parameterized_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.852s >>>>> execute_explicit_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.032s >>>>> execute_parameterized_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.754s >>>>> execute_explicit_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.039s >>>>> execute_parameterized_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.182s >>>>> execute_explicit_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.028s >>>>> execute_parameterized_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.416s >>>>> execute_explicit_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.080s >>>>> execute_parameterized_query() len=10971, >>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.425s >>>>> Total runtime for 5 explicit queries = 0.260s. >>>>> Total runtime for 5 parameterized queries = 44.629s. >>>>> >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "sqlalchemy" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to sqlalchemy+...@googlegroups.com. >>>>> To post to this group, send email to sqlal...@googlegroups.com. >>>>> Visit this group at http://groups.google.com/group/sqlalchemy. >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>>> >>>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "sqlalchemy" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to sqlalchemy+...@googlegroups.com. >>>> To post to this group, send email to sqlal...@googlegroups.com. >>>> Visit this group at http://groups.google.com/group/sqlalchemy. >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>>> >>>> >> -- >> You received this message because you are subscribed to the Google Groups >> "sqlalchemy" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to sqlalchemy+...@googlegroups.com. >> To post to this group, send email to sqlal...@googlegroups.com. >> Visit this group at http://groups.google.com/group/sqlalchemy. >> For more options, visit https://groups.google.com/d/optout. >> >> >> -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.