Chris Withers wrote:
>
> This is a bit puzzling. Surely all the taking of python Query, select,
> and_, join, etc objects, running them through dialects, etc and ending
> up with a string of sql only needs to be done once; then it should just
> be a case of formatting the values and plugging them into the text
> string at the bind points, which obviously needs to be done for each
> call...
>
> what am I missing?

I would consider that to be a minor-to-moderate optimization that would
increase complexity and would not usually even be used.

Piecing together a select() construct and compiling it to a string is a
relatively inexpensive operation.  In a typical Query, roughly a third to
half of the time is spent on the database side, and the rest in fetching
rows and instantiating/populating objects.  I would recommend running some
profiling to see this in effect (though maybe I wouldn't, since I don't
think you're going to like it much).  reddit.com uses SQLA expression
constructs, and they create and compile them fresh for each usage.  They
even built on SQLA 0.3 originally, where the compilation steps were far
more expensive.  SQL generation of a SELECT construct nowadays takes
200-500 function calls.  The overhead within Query is higher, as it has to
traverse mapped classes to generate the statement.  But it's a fraction of
what it takes to fetch rows, and that's the area we'd ultimately like to
reimplement in C.
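One way to run that profiling yourself is the stdlib cProfile module; a
minimal sketch, where run_query() is just a stand-in for executing your
actual Query (e.g. list(session.query(User).filter(...))):

```python
# Profile one query execution and show where the time goes, sorted by
# cumulative time.  run_query() is a placeholder for your real query.
import cProfile
import io
import pstats

def run_query():
    # stand-in for e.g. list(session.query(User).filter(...))
    return sum(range(1000))

profiler = cProfile.Profile()
profiler.enable()
run_query()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
print(out.getvalue())
```

In a real run against a mapped class, the row-fetching and
object-population frames dominate the report, not the SQL compilation.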

The mechanics of caching the compiled statement would be like this:

q = query(...).filter(...).order_by(...)

no SQL construct is generated.

then:

q.all() or iter(q)

Ultimately this calls q.__iter__(), which is the point at which SQL
execution occurs.
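The shape of that caching can be sketched in plain Python.  MiniQuery,
_compile(), and _execute() below are made-up names for illustration, not
SQLAlchemy API - the point is only that generative methods return new
objects while the cache is filled lazily on first iteration:

```python
# Hypothetical sketch: cache the compiled statement the first time
# __iter__ runs, so later executions of the same Query object skip
# the compile step.  All names here are invented for illustration.
class MiniQuery:
    def __init__(self, criteria=()):
        self._criteria = tuple(criteria)
        self._cached_stmt = None   # filled in on first iteration

    def filter(self, clause):
        # generative: returns a NEW query; the cache does not carry over
        return MiniQuery(self._criteria + (clause,))

    def _compile(self):
        # stand-in for select() construction plus string compilation
        return "SELECT ... WHERE " + " AND ".join(self._criteria)

    def __iter__(self):
        # the point at which SQL execution occurs
        if self._cached_stmt is None:
            self._cached_stmt = self._compile()
        return iter(self._execute(self._cached_stmt))

    def _execute(self, stmt):
        return [stmt]   # placeholder for a real DBAPI round trip

q = MiniQuery().filter("x = :x")
rows = list(q)    # compiles on first run
rows2 = list(q)   # second run reuses q._cached_stmt
```

Note that q.filter(...) produces a fresh MiniQuery with an empty cache,
which is exactly the limit()/offset() complication discussed further down.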

We have two choices.  We can generate the select() construct at this point
and store it, but not do the string compilation.  At this level, we have
the issue that if any Query objects were cached before subsequent
attributes were attached to mappers, the cache would break.  Granted, a
very edge case.

Or, we can generate the compiled() object, which contains the SQL string
as well as a lot of important metadata about the statement that's used
when fetching results.  But this isn't possible without access to a
dialect, and the result changes for every dialect - so we'd key the string
off the current dialect in a dictionary.  But here's another super edge
case: some inexperienced users create new engines on every request of
their application - the dictionary would grow and they'd say we have a
memory leak (this has actually happened).

We could key the string off the dialect's class instead, but each dialect
object has options configured - some user-configured, some derived from
server information about the database - so in a multi-engine scenario
with varied backend characteristics, this could again break.

These are all slim edge cases, but they would still be vulnerabilities in
such an approach.

Other points:

- it's a lot of work to implement and test.   We have a QueryContext that
carries information about the query along into result generation - those
are per-run, so we'd have to re-generate one for an individual compiled()
when pulling from the cache, and make sure whatever bits it needs are
still there.   Testing would then mean a lot of tests that run Query
objects twice, to ensure results are created in exactly the same way on
the second run as on the first.

- we currently render LIMIT and OFFSET values as part of the SQL string,
i.e. not as bind parameters.   There is a ticket for this, but it's not
yet clear that every backend supports them as binds (I'm not sure; I may
have heard that MS-SQL doesn't like it).  So caching wouldn't get us
anything for a query that's used in pagination until that is done.

Suppose we get LIMIT/OFFSET to work as binds.  If we cache at the point of
__iter__, but any subsequent generation blows away the cached statement
for the new Query, then calling limit() or offset() (or first(), one())
can't use the cached construct anyway - the limit/offset implies a
different SQL statement - so we must add more logic to account for this.
limit() and offset() would have to be smart enough to re-use a cached SQL
string (cached on the previous generation, probably) that included the
appropriate LIMIT/OFFSET syntax.

Suppose LIMIT/OFFSET work as binds but only on some dialects.  Now the
caching gets even more complicated, as it has to deal with whether or not
the dialect supports reusing the same statement with different
LIMIT/OFFSET.

Nothing there with limit/offset is impossible - just a PITA, and it
creates more complicated, harder-to-understand, more fragile code that
will ultimately generate more support requests.
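To make the PITA concrete, here is a rough sketch of what the cache key
would have to account for once LIMIT/OFFSET enter the picture.  Everything
here is invented for illustration - cache_key(), FakeDialect - but it
shows how the dialect's capabilities leak into the keying logic:

```python
# Hypothetical cache-key logic for statements with LIMIT/OFFSET.
# If the dialect takes limit/offset as bind parameters, one cached
# string serves every page; if it renders them into the SQL text,
# every distinct value is a distinct statement and caching buys
# nothing for pagination.
def cache_key(dialect_cls, supports_limit_binds, has_limit, has_offset):
    if supports_limit_binds:
        # key only on whether the clauses are present at all;
        # the actual values travel as binds
        return (dialect_cls, has_limit, has_offset)
    # values are baked into the string: no reusable key
    return None

class FakeDialect:
    pass

# binds supported: .limit(10) and .limit(20) share one cached entry
key_a = cache_key(FakeDialect, True, True, False)
key_b = cache_key(FakeDialect, True, True, False)
print(key_a == key_b)   # True

# binds unsupported: nothing worth caching for a paginated query
print(cache_key(FakeDialect, False, True, False))   # None
```

And since bind support varies per backend, this branch has to be evaluated
per dialect - which is exactly the extra complication described above.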

- Query doesn't have a "with_session()" method right now - few if any
users are actually doing what you want here, i.e. making a module-level
Query object and keeping it lying around.   It's not a pattern most people
care to use.   So despite the added complexity, it's questionable how
often it would even be worth it.

All of that said, we're always open to patches and new core developer
candidates, since a really well-designed implementation of this is
certainly something we could consider - particularly if it could be a
"pluggable" option that people don't have to use, which would let it
mature on its own among the people who do use it, without impacting the
current userbase.



>
>> second way, you can use bindparam() for the binds as others have
>> mentioned, and then params() to set the values as needed, but the
>> missing
>> link is that you want the Query against your own particular session at
>> the
>> moment.   I haven't yet gotten the chance to add a "with_session()"
>> method
>> to Query but you can do this easily enough yourself:
>>
>> from sqlalchemy.orm.query import Query, _generative
>>
>> class MyQuery(Query):
>>    @_generative
>
> what does @_generative do?
>
>>    def with_session(self, session):
>>         self.session = session
>>
>> Session = sessionmaker(query_cls=MyQuery)
>>
>> so have your query lying around:
>>
>> q = Session().query(...).filter(...)
>
> Could I instead just do:
>
> q = MyQuery(...).filter(...)
>
> and then use it with:
>
> q.with_session(my_session).params(foo='bar').all()
>
> ?
>
> cheers,
>
> Chris
>
> --
> Simplistix - Content Management, Batch Processing & Python Consulting
>              - http://www.simplistix.co.uk
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To post to this group, send email to sqlalch...@googlegroups.com.
> To unsubscribe from this group, send email to
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/sqlalchemy?hl=en.
>
>
