[sqlalchemy] PROPOSAL: whack query.select(), selectfirst(), selectone(), select_by(), selectfirst_by(), selectone_by(), get_by(), "auto-join" feature

Michael Bayer Sun, 03 Jun 2007 08:34:59 -0700


This is being discussed in a different thread and it was suggested I  
put this up in its own brightly lit thread so that people can notice it.


now that the Query object in 0.3 has merged an enhanced set of  
capabilities from the SelectResults extension, namely that it behaves  
in a "generative" fashion whereby criteria and other modifiers are  
applied via filter(), filter_by(), join(), order_by(), etc., its time  
to create "one and preferably only one obvious way to do it" for  
Query.   the current documentation at "Basic Data Mapping" spends a  
lot of effort explaining the dichotomy between "generative" and "non- 
generative" methods, being able to say query.select() which returns  
results immediately, or query.filter().list() which allows  
generation.   I spent a lot of time getting that doc to make some  
semblance of sense but it was pretty clear that the API had become  
muddied, with a better idea moving in and the older idea seeming kind  
of redundant.

the details of the proposal would be:

- the methods select(), selectfirst(), selectone(), select_by(),  
selectfirst_by(), selectone_by() and get_by() would be deprecated.   
this means they will remain present on the Query object but the  
documentation would be reorganized to talk only about filter(),  
filter_by(), list(), scalar(), and a new method called one() which is  
like scalar() but ensures that only one row was returned.  By  
deprecating the methods but retaining their exact behavior, the  
majority of upgrade paths for those still using these methods is  
painless.  the methods would remain present until version 0.5.

- the ability to place ClauseElements at the start of filter_by()  
would be removed.

- filter_by() would gain a new positional argument, which is  
optional, which is either a string or list of strings indicating  
property names, indicating a join path.  when this argument is  
present, a self-contained join among those mapped properties will be  
generated, and the keyword criterion of the filter_by() would be  
expressed against the endpoint of that join.  by "self-contained", i  
mean that the tables used in the join will be *aliased* anonymously,  
so that different filter_by()s which specify join paths that overlap  
can be used together.  the existing behavior of being able to say  
query.join(['x', 'y', 'z']).filter_by(**kwargs) remains as well (but  
is different in that it doesnt create aliases).

- for execution of a completely literal text statement, as well as a  
fully constructed select() object, a method query.from_statement()  
will be added, and the older select_text() method (that i dont think  
anyone knows about) will be deprecated like the others.  executions  
would look like query.from_statement("select * from table").scalar()  
as well as query.from_statement(sometable.select()).list()

- the behavior of join() changes slightly, such that each call to join 
() will reset the "joinpoint" of the resulting query back to the  
beginning before creating the join.  this means that join() will  
always build its joins from the original queried class. join(None)  
resets the joinpoint back to the beginning for subsequent filter()/ 
filter_by() calls.

- assignmapper would also keep all of its current methods with  
regards to selecting/filtering.  it seems like tools like Elixir are  
going to move away from assignmapper anyway which is a good thing.

- the behavior of select_by(), get_by(), join(), filter_by(), and  
others right now is such that when you name an attribute which is not  
immediately present on the queried class, a search is performed  
through all the mapped properties of the class, into its subclasses,  
until it finds a property of that name, upon which it constructs a  
join from the initial table to the target table, adding in a  
comparison criterion for the key selected.  this feature would remain  
in the methods select_by() and the other deprecated _by() methods,  
but would be removed from filter_by() as well as join().

- for those who have built MapperExtensions into select() and  
select_by(), we'll put some MapperExtension hooks into list() which  
is the execution point for queries.

- SelectResults remains deprecated until 0.5 when its also removed.

and now for the reasons:

- the select() methods are now redundant in light of the new  
generative methods.  all of the various flags which you can use with  
select() are present generatively, i.e. order_by(), limit(),  
with_lockmode(), filter(), select_from(), etc.  By favoring the  
generative methods we get a more flexible interface and only one way  
to do it.

- by removing the word "select" from Query's API entirely, the  
timeless confusion of "sql.select()? or query.select()?" goes away.   
the identifier "select" at last loses its ambiguity.  this is a big win.

- the "multi-dispatch" behavior of select() is also little known and  
confusing.  Did you know that select() can take not just a "where"  
criterion, but also a fully constructed select() object ?  neither  
did anyone else.   now we'll have from_statement() to express "ive  
constructed a full statement and dont want any statement compilation  
to occur".

- join() will reset the join point before joining because then you  
can build multiple joins starting from the root of the query chain.   
so query.join('a').join('b') builds two joins against the root table,  
query.join(['a', 'b']) builds a join from root to 'a' to 'b'.   the  
meaning of "joinpoint" is that any subsequent filter_by() call will  
construct its criterion against the endpoint of the join most  
recently constructed.  the new approach doesnt remove any  
capabilities but adds the capability for multiple joins off the  
root.  a join of join(None) brings the joinpoint back to the beginning.

- the ability to place ClauseElements inside of select_by() stems  
from the need to query based on keyword criterion *as well as* table  
based criterion at the same time.  but now that we have generative  
behavior, you can just use individual filter() and filter_by()  
methods to combine those.  The "multiple-use" nature of various  
methods, which is also very confusing (and hard to document), is not  
necessary with the generative approach.  So it will be removed from  
filter_by(), where Im hoping people havent been putting  
ClauseElements inside of too much.  (the deprecated methods of course  
keep it for backwards compatibility).

- the new "positional" argument of filter_by() is based on Gaetan's  
idea, as well as some light that was shed on the IRC channel.  it  
looks like this:

   session.query(User).filter_by(['orders', 'items', 'keywords'],  
keyword_name='foo').filter_by(['orders', 'items'],  
item_price=49.50).list()

above, we have created two joins from the "users" table, one of which  
joins from 'orders' to 'items' to 'keywords', and the other from  
'orders' to 'items'.  the two sets of joins are constructed using  
anonymous aliases, so that you get all users who have purchased an  
item that has the 'foo' keyword, and have also purchased an item that  
costs 49.50.  the two paths across 'orders' and 'items' are isolated  
from each other.

if aliases were not used above, the redundant 'orders' and 'items'  
tables would converge into one FROM clause each in the final  
statement since thats the behavior of select(), but then the  
semantics of the query would then be "all users who purchased items  
that cost 49.50 and also have the 'foo' keyword".

the above capability is currently available by constructing your own  
Alias object and building join criterion manually.  But it was  
pointed out on IRC that the Django ORM constructs filtering queries  
using aliases as above.  for a lot of cases the approach saves a lot  
of lines of code spent creating manual aliases.

- "auto-join": In the docs, the example of select_by() searching  
through all mapped properties looking for a match is given in the  
many-to-many page with something like session.query(Article).select_by 
(keyword_name='foo'), where a join is constructed from "articles" to  
"article_keywords", to "keywords" and then adding criterion  
"keywords.keyword_name='foo'".   It was known since day one, and  
eventually pointed out by a few users in case it wasnt known (which  
it was),  that this feature does not have "deterministic" behavior.   
if multiple attribute names of "keyword_name" exist, its going to  
pick the first one that it finds, and all subsequent occurences are  
forever ignored.   if mapped properties happen to be organized  
differently, then you get a different result - or, if you are  
querying from a different point in the "join" chain, the sudden  
presence of the same attribute name elsewhere suddenly changes the  
result.  the reason this feature was created, and why its stuck  
around, is that before we added generative behavior, there was really  
no way to join among properties without spelling out the join  
explicitly using tables.   the anemic methods join_via() and join_to 
() were added to assist in constructing these joins but I knew from  
the beginning i didnt like those.  but with generative behavior, we  
got join(), and now we even get the enhanced filter_by() idea.   so  
the "automatic join" idea isnt needed as there are now explicit ways  
to construct a join as desired.  again, hoping people arent relying  
on this with filter_by() or join() already (the deprecated methods of  
course keep it for backwards compatibility).

Cons:

upgrade path:  the only pain in this upgrade path is if you are using  
filter_by() with either ClauseElements or with "auto-join", using join 
() with "auto-join", or using multiple join()'s strung together to  
build a single join from the root.  filter_by() and join() are fairly  
new methods so i dont think their adoption is so deep as of yet, and  
also the features to be removed aren't so intuitive with them anyway  
so im betting they arent used much.  any current use of select(),  
select_by(), etc. would all continue to work identically.

more typing:  it is not as succinct to say query.filter_by(id=10).one 
(), instead of query.get_by(id=10).  but removing seven redundant  
methods which are not as flexible, making the purpose of each method  
much more focused, and taking out five or six paragraphs out of the  
docs thereby leaving room for deeper exposition of whats there, seems  
to be worth it.


so......whattaya say ?



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] PROPOSAL: whack query.select(), selectfirst(), selectone(), select_by(), selectfirst_by(), selectone_by(), get_by(), "auto-join" feature

Reply via email to