On Apr 28, 2011, at 2:24 PM, Jennifer Rodriguez-Mueller wrote:

> 
> Here is the core SQL (lots of distracting content trimmed):
> 
> SELECT s.*, a.*
>   FROM aliquot_table a
>     JOIN ( -- Biologically (p)lausible aliquot/specimen type logic
>       SELECT  3 AS a_type, 12 AS s_type UNION  -- Serum/Serum
>       SELECT 15 AS a_type,  7 AS s_type UNION  -- Gut Biopsy/Gut Biopsy
>       SELECT  7 AS a_type,  5 AS s_type UNION  -- Saliva/Saliva
>       SELECT 23 AS a_type, 13 AS s_type        -- NO_CODE/Swab
>       ) p ON a.a_type = p.a_type
>     JOIN specimen_table s ON (s.random_id = a.random_id AND s.s_type = 
> p.s_type)
>   WHERE a.test_id = 12345
> 
> Here is the (assumed) equivalent python:
> 
> def getSpecimenAliquots(fromDB):
>     from sqlalchemy.ext.sqlsoup import SqlSoup
>     plausibleTypes = set([( 3,12) #Serum/Serum
>                          ,(15, 7) #Gut Biopsy/Gut Biopsy
>                          ,( 7, 5) #Saliva/Saliva
>                          ,(23,13) #NO_CODE/Swab
>                          ])
>     soupDB = SqlSoup(fromDB)
>     aliq = soupDB.aliquot_table
>     spec = soupDB.specimen_table
>     join1 = soupDB.join(aliq, spec,
>         aliq.random_id == spec.random_id and (aliq.a_type, spec.s_type) in 
> plausibleTypes,
>         isouter=False)
>     join1 = join1.filter_by(aliq.test_id=12345)
>     return join1.all()
> 
> The code works in SQL, but when I try to translate it into sqlalchemy's 
> SqlSoup the number of rows is different (specifically the plausibility logic 
> is being silently ignored for some reason because there are 3 MappedJoin 
> objects instead of 1 row from the SQL and the two of the MappedObjects have 
> incompatible specimen/aliquot types).  
> I'm wondering if I expected too much integration between python and SQL when 
> I used a python set object to drive the join?  When I imagine what sqlalchemy 
> "should ideally do" with something like this, it would be to understand how 
> python types and comparisons over them correspond to database table/index 
> idioms and (perhaps?) create a temporary table in the DB with optimized 
> indexes based on the content of the set, write logically equivalent SQL to 
> use that temp table, get the results, and then clean up after itself.   This 
> would be much better than my SQL, which is logically sound but not 
> efficiently implemented, because (after all) it is only exploratory SQL, not 
> a real programming language.  

Well there's a leap here, which is that the original SQL uses UNION.   The 
"assumed" Python doesn't make any mention of unions.    There are also several 
elements here of Python that go above what the language can offer us in terms 
of operator overloading (that is, SQLAlchemy overloads things like ==, !=, but 
can only do this so much) - the Python "and" operator can't be intercepted (use 
and_(), or alternatively &&), the "(x, y) in set()" can't be intercepted 
('tuple' and 'set' are both Python builtins that aren't modified).

As a rule, when building things up you can always test individual expressions 
by calling str() on them.    Such as, if you were to say str(aliq.a_type == 5) 
you'd get a SQL expression.   If you call it on "aliq.a_type == 5 and 
spec.s_type==7" you'll get a string like "true" or "false" since the "and" 
operator isn't overridden, and the boundary of SQLAlchemy expression generation 
is crossed into plain Python.    Turning on SQL logging is also a good habit 
when building up queries to see what is being emitted as well as what's being 
returned (the docs describe how to do this).

The "(x, y) in set()" is not something that has a universally supported SQL 
representation.   Postgresql and MySQL for example could state an expression 
like that as "(x, y, z) IN ((1, 2, 3), (4, 5, 6), ...)", but SQLite and others 
cannot.   The approach you have uses UNIONs which is more verbose but probably 
better supported.   It doesn't seem like it would present a major inefficiency, 
if you wanted to note the difference between using temp tables and the inline 
expression you'd work with the databases EXPLAIN services, i.e. features that 
illustrate the query plan.

The best approach when translating from literal SQL is to build up as much of 
that exact statement as possible (which is typically the whole thing).    Your 
table of unions can be achieved as:

u = union(*[
    select([
        literal_column(str(tup[0])).label("a_type"),
        literal_column(str(tup[1])).label("s_type")
    ])
        for tup in plausibleTypes
]).alias()

'u' is then a selectable that represents the thing you have as the second 
element of your join - its columns are available as "u.c.<name>".      The join 
can then be constructed exactly as the SQL version is, substituting prefix 
conjunctions instead of infix (i.e. "and_(x, y)" instead of "x and y"):

soupDB.join(aliq, u, u.c.a_type==aliq.a_type).join(spec, 
and_(spec.random_id==aliq.random_id, spec.s_type==u.c.s_type))

> 
> How am I confused given that I'm surprised by the different number of 
> rows/objects? 

well you should do the SQL logging to see the whole conversation as it happens

> Why was no error thrown? 

you didn't actually make an "error", you essentially used Python to comprise an 
expression "aliq.random_id == True" - no errors there other than possible type 
conflicts.

> Also, assuming I've done something amazingly wrong, what is the *right* way 
> to specify join logic of this sort?  Perhaps I could translate the specimen 
> type logic into a giant python expression with lots of ((foo=3 and bar=12) or 
> (foo=15 and bar=7) or...)?  But that seems unlikely to be idiomatic with 
> sqlalchemy...

well there is not an idiomatic system for (x, y) in set() right now, so the 
union() approach above is one way, or if you wanted to use SQL tuples there is 
a tuple_() construct - you'd need to wrap two levels of tuple_() on the right 
hand side, the "IN" is accomplished by using the "in_()" operator:

from sqlalchemy import tuple_, literal_column
from sqlalchemy.sql import column

u = tuple_(column('x'), column('y')).in_(
    tuple_(*[tuple_(*[literal_column(str(v)) for v in tup]) 
    for tup in plausibleTypes])
)
print str(u)

returns:

(x, y) IN ((7, 5), (3, 12), (15, 7), (23, 13))



-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to