Yep, literal_column() fixed my performance problem. Thanks again for all your help
On Tuesday, May 13, 2014 6:15:39 AM UTC-4, Michael Bayer wrote: > > OK well of course also, as we have the exact same thing being asked in > regards to Oracle right now in another thread, you can of course always > bypass a “bound” value in the most direct way, using text() or > literal_column(): > > q = s.query(Something).filter(Something.foo = literal_column(“‘my value’”)) > > > > On May 12, 2014, at 8:38 PM, Michael Bayer > <mik...@zzzcomputing.com<javascript:>> > wrote: > > well or a FreeTDS issue, more likely, if that’s what you’re using. > > the SQL compiler has a parameter called “literal_binds” that will make it > render a bound parameter as an inline string, but it only supports a few > very basic types. As far as getting this parameter set for a general > class of queries, it depends on when you’d want it to happen and how. It > likely would require some subclassing and possibly monkey patching. > > > On May 12, 2014, at 8:23 PM, Seth P <spad...@gmail.com <javascript:>> > wrote: > > pymssql produces the same results as pyodbc. So it looks like a SQL Server > issue. > > On Monday, May 12, 2014 8:06:08 PM UTC-4, Seth P wrote: >> >> Fair enough. I'll take a look at pymssql, though I suspect it may be a >> SQL Server rather than a driver issue. >> >> >> On Monday, May 12, 2014 7:50:03 PM UTC-4, Michael Bayer wrote: >>> >>> >>> On May 12, 2014, at 7:35 PM, Seth P <spad...@gmail.com> wrote: >>> >>> Looks like other people have encountered similar problems with indices >>> being ignored by prepared sql statements: >>> http://www.postgresql.org/message-id/43250afa.7010...@arbash-meinel.com. >>> (If the diagnosis there is correct, then I'm guessing the server would use >>> a unique index where all the columns of the index are specified.) Also, >>> Thierry Florac's post >>> https://groups.google.com/forum/#!topic/sqlalchemy/k_9ZGI-e85E sounds >>> similar. >>> (I suspect my earlier hypothesis about int vs varchar is a red herring.) >>> >>> I think it would be useful (albeit risky, if not careful) to have an >>> option to plug in parameters client-side. I presume not trivial to add to >>> SQLAlchemy? I don't see such an option for pyodbc. >>> >>> >>> there’s mechanisms for this but they aren’t very widely advertised since >>> as you know allowing people to do such would be an *enormous* security >>> hole, and I don’t have the resources to be responsible for parameter >>> escaping. It would be better if you could try pymssql (much more actively >>> maintained than pyodbc from what i can tell) and/or file a bug with pyodbc. >>> >>> >>> >>> On Monday, May 12, 2014 7:09:08 PM UTC-4, Seth P wrote: >>>> >>>> Yep, it's not a SQLAlchemy issue. The following code demonstrates the >>>> problem with direct pyodbc access. >>>> >>>> import pyodbc >>>> import time >>>> >>>> def print_timing(func): >>>> def wrapper(*arg): >>>> t1 = time.time() >>>> rows = func(*arg) >>>> t2 = time.time() >>>> print("%30s() len=%d, last=%s, runtime=%0.3fs" % >>>> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1)) >>>> return t2 - t1 >>>> return wrapper >>>> >>>> if __name__ == '__main__': >>>> cnxn = pyodbc.connect('DRIVER={SQL >>>> Server};SERVER=Compustat;DATABASE=Compustat') >>>> cursor = cnxn.cursor() >>>> sql_select_statement_base = "SELECT datadate, prcod FROM sec_dprc >>>> WHERE gvkey = ? ORDER BY datadate" >>>> key = '001045' >>>> >>>> @print_timing >>>> def execute_explicit_query(): >>>> sql_select_statement_explicit = >>>> sql_select_statement_base.replace("?", "'%s'" % key) >>>> rows = cursor.execute(sql_select_statement_explicit).fetchall() >>>> return rows >>>> >>>> @print_timing >>>> def execute_parameterized_query(): >>>> rows = cursor.execute(sql_select_statement_base, key).fetchall() >>>> return rows >>>> >>>> num_iterations = 5 >>>> explicit_runtime = 0.0 >>>> parameterized_runtime = 0.0 >>>> for i in range(num_iterations): >>>> explicit_runtime += execute_explicit_query() >>>> parameterized_runtime += execute_parameterized_query() >>>> print("Total runtime for %d explicit queries = %0.3fs." % >>>> (num_iterations, explicit_runtime)) >>>> print("Total runtime for %d parameterized queries = %0.3fs." % >>>> (num_iterations, parameterized_runtime)) >>>> >>>> >>>> On Monday, May 12, 2014 6:40:48 PM UTC-4, Michael Bayer wrote: >>>>> >>>>> >>>>> On May 12, 2014, at 6:33 PM, Seth P <spad...@gmail.com> wrote: >>>>> >>>>> Is it possible that the (primary key index (which is a composite index >>>>> that begins with gvkey, and is the only index on the table) isn't being >>>>> used because the the gvkey parameter is somehow passed as an integer >>>>> rather >>>>> than as a string? >>>>> >>>>> >>>>> There’s nothing in SQLAlchemy that coerces strings to integers. If >>>>> the actual type of the column on the DB is an integer, then there might >>>>> be >>>>> some conversion within pyodbc or the ODBC driver. >>>>> >>>>> if you’ve got it narrowed down this much the next step is to figure >>>>> out a raw pyodbc script that illustrates what the problem is. >>>>> >>>>> >>>>> The first EXEC below is pretty much instantaneous, whereas the second >>>>> takes about 8 seconds (and produces the same results). >>>>> >>>>> EXEC sp_executesql >>>>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>>>> sec_dprc_prcod >>>>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY >>>>> sec_dprc.datadate', >>>>> N'@gvkey VARCHAR(6)', '001045' >>>>> >>>>> EXEC sp_executesql >>>>> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>>>> sec_dprc_prcod >>>>> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY >>>>> sec_dprc.datadate', >>>>> N'@gvkey INT', 001045 >>>>> >>>>> >>>>> >>>>> On Monday, May 12, 2014 5:00:27 PM UTC-4, Michael Bayer wrote: >>>>>> >>>>>> >>>>>> well there’s only one parameter being processed here so there is >>>>>> clearly negligible difference in time spent within Python as far as >>>>>> getting >>>>>> the statement ready to execute and then executing it. >>>>>> >>>>>> So the time is either in what SQL Server spends fetching the rows, or >>>>>> the number of rows being fetched (which seems to be the same). Which >>>>>> leaves pretty much that SQL Server is making a different choice about >>>>>> the >>>>>> query plan for this SELECT statement, this is typically due to an INDEX >>>>>> being used or not. You’d need to analyze the plan being used. With >>>>>> SQL >>>>>> Server, the option to get a plan within programmatic execution seems to >>>>>> be >>>>>> per this answer >>>>>> http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-planto >>>>>> execute “SET SHOWPLAN_TEXT ON” ahead of time. >>>>>> >>>>>> Besides that, you can confirm where the time is being spent exactly >>>>>> using Python profiling. A description on how to achieve that is here: >>>>>> http://stackoverflow.com/questions/1171166/how-can-i-profile-a-sqlalchemy-powered-application/1175677#1175677 >>>>>> >>>>>> >>>>>> >>>>>> On May 12, 2014, at 3:48 PM, Seth P <spad...@gmail.com> wrote: >>>>>> >>>>>> After tracking down some extreme slowness in loading a one-to-many >>>>>> relationship (e.g. myobject.foobars), I seem to have isolated the issue >>>>>> to >>>>>> engine.execute() being much slower with parameterized queries than with >>>>>> explicit queries. The following is actual code and output for loading >>>>>> 10,971 rows from a database table. (The actual table has more columns >>>>>> than >>>>>> I'm including here, and is not designed by me.) Note that each explicit >>>>>> query (where I explicitly set the WHERE clause parameter and pass the >>>>>> resulting SQL statement to engine.execute()) runs in under 0.1 seconds, >>>>>> whereas each parameterized query (where I let SQLAlchemy bind the WHERE >>>>>> clause parameter) takes over 8 seconds. >>>>>> >>>>>> The difference in runtimes is smaller when the number of rows >>>>>> returned is smaller, which seems odd since I would have thought that the >>>>>> binding of the WHERE clause parameters is just done once and would be >>>>>> virtually instantaneous. >>>>>> >>>>>> Any thoughts? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Seth >>>>>> >>>>>> >>>>>> import sqlalchemy as sa >>>>>> from sqlalchemy.orm import sessionmaker >>>>>> from sqlalchemy.ext.declarative import declarative_base >>>>>> import time >>>>>> >>>>>> engine = sa.create_engine('mssql+pyodbc://Compustat/Compustat') >>>>>> session = sessionmaker(bind=engine, autoflush=False, >>>>>> expire_on_commit=False)() >>>>>> >>>>>> class FooBar(declarative_base()): >>>>>> __tablename__ = 'sec_dprc' >>>>>> gvkey = sa.Column(sa.String(6), primary_key=True) >>>>>> datadate = sa.Column(sa.DateTime, primary_key=True) >>>>>> value = sa.Column(sa.Float, name='prcod') >>>>>> >>>>>> def print_timing(func): >>>>>> def wrapper(*arg): >>>>>> t1 = time.time() >>>>>> rows = func(*arg) >>>>>> t2 = time.time() >>>>>> print("%30s() len=%d, last=%s, runtime=%0.3fs" % >>>>>> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1)) >>>>>> return t2 - t1 >>>>>> return wrapper >>>>>> >>>>>> if __name__ == '__main__': >>>>>> >>>>>> key = '001045' >>>>>> query = session.query(FooBar.datadate, >>>>>> FooBar.value).filter(sa.and_(FooBar.gvkey == >>>>>> key)).order_by(FooBar.datadate) >>>>>> sql_select_statement_base = str(query) >>>>>> print(sql_select_statement_base) >>>>>> >>>>>> @print_timing >>>>>> def execute_explicit_query(): >>>>>> sql_select_statement_explicit = >>>>>> sql_select_statement_base.replace(":gvkey_1", "'%s'" % key) >>>>>> rows = >>>>>> engine.execute(sa.text(sql_select_statement_explicit)).fetchall() >>>>>> return rows >>>>>> >>>>>> @print_timing >>>>>> def execute_parameterized_query(): >>>>>> rows = engine.execute(sa.text(sql_select_statement_base), >>>>>> {'gvkey_1':key}).fetchall() >>>>>> return rows >>>>>> >>>>>> num_iterations = 5 >>>>>> explicit_runtime = 0.0 >>>>>> parameterized_runtime = 0.0 >>>>>> for i in range(num_iterations): >>>>>> explicit_runtime += execute_explicit_query() >>>>>> parameterized_runtime += execute_parameterized_query() >>>>>> print("Total runtime for %d explicit queries = %0.3fs." % >>>>>> (num_iterations, explicit_runtime)) >>>>>> print("Total runtime for %d parameterized queries = %0.3fs." % >>>>>> (num_iterations, parameterized_runtime)) >>>>>> >>>>>> >>>>>> SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS >>>>>> sec_dprc_prcod >>>>>> FROM sec_dprc >>>>>> WHERE sec_dprc.gvkey = :gvkey_1 ORDER BY sec_dprc.datadate >>>>>> execute_explicit_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.082s >>>>>> execute_parameterized_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.852s >>>>>> execute_explicit_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.032s >>>>>> execute_parameterized_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.754s >>>>>> execute_explicit_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.039s >>>>>> execute_parameterized_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.182s >>>>>> execute_explicit_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.028s >>>>>> execute_parameterized_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=9.416s >>>>>> execute_explicit_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=0.080s >>>>>> execute_parameterized_query() len=10971, >>>>>> last=(datetime.datetime(2014, 5, 9, 0, 0), 37.96), runtime=8.425s >>>>>> Total runtime for 5 explicit queries = 0.260s. >>>>>> Total runtime for 5 parameterized queries = 44.629s. >>>>>> >>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "sqlalchemy" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to sqlalchemy+...@googlegroups.com. >>>>>> To post to this group, send email to sqlal...@googlegroups.com. >>>>>> Visit this group at http://groups.google.com/group/sqlalchemy. >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>>> >>>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "sqlalchemy" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to sqlalchemy+...@googlegroups.com. >>>>> To post to this group, send email to sqlal...@googlegroups.com. >>>>> Visit this group at http://groups.google.com/group/sqlalchemy. >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>>> >>>>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "sqlalchemy" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to sqlalchemy+...@googlegroups.com. >>> To post to this group, send email to sqlal...@googlegroups.com. >>> Visit this group at http://groups.google.com/group/sqlalchemy. >>> For more options, visit https://groups.google.com/d/optout. >>> >>> >>> > -- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+...@googlegroups.com <javascript:>. > To post to this group, send email to sqlal...@googlegroups.com<javascript:> > . > Visit this group at http://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. > > > > -- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+...@googlegroups.com <javascript:>. > To post to this group, send email to sqlal...@googlegroups.com<javascript:> > . > Visit this group at http://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. > > > -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.