On Oct 17, 2011, at 3:28 PM, Phil Vandry wrote:

> 
> - Whenever I fetch FlightEvents, I want to automatically fetch the
> runways that may be referenced by the event, hence lazy="joined" in
> the runway = relationship. This relationship is an outer join because
> an event may or may not reference a runway.
> 
> - Whenever I fetch Runways, I want to automatically fetch the Airport
> where those runways are located, so another lazy="joined". This one is
> an inner join because a runway is guaranteed to have an associated
> airport.
> 
> The generated SQL contains this (formatting added):
> 
> FROM flight_events
>  LEFT OUTER JOIN world.runways AS runways_1
>    ON runways_1.runway_id = flight_events.runway_id
>  INNER JOIN world.airports AS airports_1
>    ON airports_1.airport_id = runways_1.airport_id
> 
> Unfortunately this doesn't work. Due to the INNER JOIN to airports, only
> events which reference runways (which in turn reference airports) are
> returned.

This is because you're using an ancient version of SQLAlchemy that was one of 
the first to ever feature the "innerjoin" flag.    Was fixed in 0.6.6.    I'd 
strongly advise, not just for SQLAlchemy but for Python overall, to use "pip" 
to install Python libraries instead of relying upon distribution installs, 
Python libraries are developed a lot more quickly than linux distros and you'll 
have a lot less headache.  Especially if starting a new project with 
SQLAlchemy, your app will run a lot faster with 0.7, be easier to develop, and 
be easier to upgrade as new releases come out.

Post 0.6.5 the full chain of eager loaders detects the outer join up the chain 
and converts them all to outer join.   Here's an example:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import *
from sqlalchemy.orm import *

Base = declarative_base()

class Airport(Base):
    __tablename__ = 'airport'
    id = Column(Integer, primary_key=True)

class Runway(Base):
    __tablename__ = 'runway'
    id = Column(Integer, primary_key=True)
    airport_id = Column(Integer, ForeignKey('airport.id'))
    airport = relationship(Airport, 
                        backref=backref("runways", collection_class=set), 
                        lazy="joined", innerjoin=True)

class FlightEvent(Base):
    __tablename__ = 'flightevent'
    id = Column(Integer, primary_key=True)
    runway_id = Column(Integer, ForeignKey('runway.id'))
    runway = relationship(Runway, lazy="joined")

s = Session()
print s.query(FlightEvent)

output:

SELECT flightevent.id AS flightevent_id, flightevent.runway_id AS 
flightevent_runway_id, airport_1.id AS airport_1_id, runway_1.id AS 
runway_1_id, runway_1.airport_id AS runway_1_airport_id 
FROM flightevent LEFT OUTER JOIN runway AS runway_1 ON runway_1.id = 
flightevent.runway_id LEFT OUTER JOIN airport AS airport_1 ON airport_1.id = 
runway_1.airport_id

its only if you have no "outer" joins in the preceding chain do you get the 
INNER join:

SELECT runway.id AS runway_id, runway.airport_id AS runway_airport_id, 
airport_1.id AS airport_1_id 
FROM runway JOIN airport AS airport_1 ON airport_1.id = runway.airport_id

so the correct results are returned.  However, OUTER JOIN does not perform as 
well as INNER JOIN, which is the real problem here.

> 
> The only way I know to make this work is for the INNER JOIN to be
> contained inside the RHS of the OUTER JOIN (where it logically belongs).
> If the SQL fragment above is replace with the following, it works:
> 
> FROM flight_events
>  LEFT OUTER JOIN (
>    world.runways AS runways_1
>      INNER JOIN world.airports AS airports_1
>        ON airports_1.airport_id = runways_1.airport_id
>  )
>    ON runways_1.runway_id = flight_events.runway_id
> 
> Is there a way to convince SQLAlchemy to generate this kind of SQL?

absolutely, but not using joinedload() or lazy="joined", read on...

> If so, I was not able to find how. It seems that the AST given as
> input to the SQL compiler has the joins defined flat, not
> hierarchically (i.e. the INNER JOIN of runways and airports is not an
> AST node under the RHS of the toplevel OUTER JOIN).


So assuming you want the performance increase potentially afforded by x outer 
join (y join z), as opposed to just the right results, you need to specify the 
joins yourself.   SQLAlchemy can't invoke "x JOIN (y JOIN z)" across the board 
because it is not supported by all backends, specifically sqlite:

sqlite> create table foo(id integer);
sqlite> create table bar(id integer);
sqlite> create table bat(id integer);
sqlite> select * from foo left outer join (bar join bat on bar.id=bat.id) on 
foo.id=bat.id;
Error: no such column: bat.id

It also would probably fail against an Oracle 8 backend, as we need to jump 
through some hoops to convert out all the JOINs to "=" and "x=y(+)" operators.  
 The ORM's construction of the chain of joins does so above the level of SQL 
rendering, so for the moment it unwraps it outwards the way it does for full 
cross-compatibility.    Having it nest the joins is an improvement we've 
considered which would be straightforward, but I'd be concerned about not just 
sqlite but also potential performance regressions on other backends that might 
not optimize as well with nested joins (with MySQL the lead culprit for this 
kind of thing).   Perhaps as a configurational option.   

for now, if you want that exact SQL, it's pretty easy to do (note I tend to not 
use "joined" loading at the relationship() level as it makes it more difficult 
to control how many joins a complex query might end up having):

from sqlalchemy.orm import join

s.query(FlightEvent).\
        outerjoin(join(Runway, Airport, Runway.airport)).\
        options(contains_eager(FlightEvent.runway, Runway.airport))   # 0.7 
style here

Where contains_eager() routes the columns from those joins into the related 
attribute.   On 0.6, things are slightly more tedious due to the workings of 
contains_eager():

s.query(FlightEvent).\
        outerjoin(join(Runway, Airport, Runway.airport)).\
        options(
            contains_eager(FlightEvent.runway),
            contains_eager(FlightEvent.runway, Runway.airport)
        )

where contains_eager() didn't establish for each element in the preceding path, 
so a separate option must be set up for each path (FlightEvent->Runway, 
FlightEvent->Runway->Airport).

> 
> Using SQLAlchemy 0.6.3 (from Debian stable... yeah, I know Debian is
> behind the current version) with MySQL 5.1.49.

to give some perspective how behind 0.6.3 is, here's a list of all the CHANGES 
since 0.6.3, a grep comes up with about 427 individual changes:

http://pastebin.com/bFLikGfF



-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to