Re: [sqlalchemy] Is there any way to get around the "redundancy" of coupled .join and contains_eager chains?

seaders Sun, 30 Sep 2018 16:40:55 -0700

Any further thought on the above at all?

Reason I ask is because I still regularly get caught / frustrated with 
situations similar to this, and am going to rock my own solution, with my 
own branch of sqlalchemy.


If I get it to a "happy" and good enough state, I'd also be happy to PR it 
back into main for you, if you'd take it.

My preferences would be what I stated, one of,

Odds.query
> .options( 

    joinedload(Odds.market_participant)
>     .joinedload(MarketParticipant.event_market)
>     .joinedload(EventMarket.event, extra_join_condition=Event.id > 123456)


(which is adding more to the already established ON condition of the join)
 
Or,

Odds.query
> .options( 

    joinedload(Odds.market_participant)
>     .joinedload(MarketParticipant.event_market)
>     .joinedload(EventMarket.event, with_filter=Event.id > 123456)


Which would add the filter for the general query, and operate much like 
your example, with target,

    Odds.query 
    .options( 
        joinedload(Odds.market_participant) 
        .joinedload(MarketParticipant.event_market) 
        .joinedload(EventMarket.event, target=Event)) 
    .filter(Event.id == 123456) 

Main reason I'd choose my ways over yours is because, I would still 
consider having to specify the Event table is going back into redundancy 
again.

As in your suggestion is basically just like how it already works (I'm 
changing the query to just EventMarket to Event, because they're a direct 
join),

  EventMarket.query
  .join(EventMarket.event)
  .options(
    joinedload(EventMarket.event)
  )
  .filter(Event.date == '2018-10-01')

  EventMarket.query
  .options(
    joinedload(EventMarket.event, target=Event)
  )
  .filter(Event.date == '2018-10-01')

It definitely reads better, but I would still consider having the filter 
condition accompanying the actual join line is more direct / clear.  I 
think it would probably be better other relationships which end up in a 
sub-statement, because they're indirect via secondary tables, and secondary 
joins.

On Sunday, September 9, 2018 at 6:21:00 PM UTC+1, seaders wrote:
>
> I think the main "way" I'd want to use it, closest to its current is the 
> joinedload way,
>
> Odds.query
>> .options( 
>
>     joinedload(Odds.market_participant)
>>     .joinedload(MarketParticipant.event_market)
>>     .joinedload(EventMarket.event))
>> .filter(Event.id == 123456)
>
>
> In which case either your method with target,
>
>     Odds.query 
>>     .options( 
>>         joinedload(Odds.market_
>> participant) 
>>         .joinedload(MarketParticipant.event_market) 
>>         .joinedload(EventMarket.event, target=Event)) 
>>     .filter(Event.id == 123456) 
>>
>
> Or "simply" (in quotes cos I've no idea how simple this would be), 
> enabling extra conditions for the ON load, 
>
> Odds.query
>> .options( 
>
>     joinedload(Odds.market_participant)
>>     .joinedload(MarketParticipant.event_market)
>>     .joinedload(EventMarket.event, extra_join_condition=Event.id > 123456)
>
>
> (changing event_id = 123456 to > 123456 so the example makes a bit more 
> sense)
>
> So the query would turn into,
>
> SELECT ...
>>  FROM odds
>>  LEFT OUTER JOIN market_participant AS market_participant_1 ON 
>> market_participant_1.id = odds.market_participant_id 
>>  LEFT OUTER JOIN event_market AS event_market_1 ON event_market_1.id = 
>> market_participant_1.event_market_id 
>>  LEFT OUTER JOIN event AS event_1 ON (event_1.id = 
>> event_market_1.event_id AND event.id > 123456)
>>
>  
> Or some way to specify the filter is "part of" / "paired with" that 
> relationship, so that the mapper doesn't just add in another "FROM event",
>
> Odds.query
>> .options( 
>
>     joinedload(Odds.market_participant)
>>     .joinedload(MarketParticipant.event_market)
>>     .joinedload(EventMarket.event, with_filter=Event.id > 123456)
>
>
> (changing event_id = 123456 to > 123456 so the example makes a bit more 
> sense)
>
> So the query would turn into,
>
> SELECT ...
>>  FROM odds
>>  LEFT OUTER JOIN market_participant AS market_participant_1 ON 
>> market_participant_1.id = odds.market_participant_id 
>>  LEFT OUTER JOIN event_market AS event_market_1 ON event_market_1.id = 
>> market_participant_1.event_market_id 
>>  LEFT OUTER JOIN event AS event_1 ON event_1.id = 
>> event_market_1.event_id 
>>
> WHERE event.id > 123456)
>>
>
> I accept your point about verbosity vs. redundancy, but when you've a 
> structure like this (and I've few more again!), you often want to do a 
> "full" load of your objects, with filters.
>
> I mean, for today, in my Db, there are about 20,000 "odds" objects, for 
> about 80 events.  I'll *never* want to load all odds for all events, but I 
> definitely do sometimes want to load all the data for 1 specific market of 
> an event.
>
> And that's something I've consistently struggled with, I think.  The 
> proper, stable way, after you've correctly set up all your relationships 
> between your tables, to do filtered multi-load of multiple tables.
>
> This question is probably more about the lack of options you have for 
> filtering joinedload, and the (IMO) redundancy of only being able to filter 
> it if you have redundancy in the code,
>
> Odds.query
>> .join(Odds.market_participant)
>> .options( 
>
>     joinedload(Odds.market_participant))
>> .filter(MarketParticipant.id == 1234567890)
>
>
> Seems to absolutely "fit" redundancy.  You need to do a "pre" join to 
> ensure your later joinedload and filter actually work together, as opposed 
> to
>
> Odds.query
>> .options(joinedload(Odds.market_participant))
>> .filter(MarketParticipant.id == 1234567890)
>
>
> With that, I *really* can't imagine anyone actually wants to do what it 
> does,
>
> SELECT ...
>>  FROM market_participant, odds
>>  LEFT OUTER JOIN market_participant AS market_participant_1 ON 
>> market_participant_1.id = odds.market_participant_id 
>> WHERE market_participant_1.id = 1234567890
>>
>
> But I accept that if everyones used to that behaviour, then changing it 
> would cause too much trouble.
>
> So then it just becomes how to custom filter your joins, during the join 
> in ON, or after in WHERE.
>
> I'm easy about either one, I just want to be able to do it.
>
> On Sunday, September 9, 2018 at 5:48:00 PM UTC+1, Mike Bayer wrote:
>>
>> On Sun, Sep 9, 2018 at 10:01 AM, seaders <sead...@gmail.com> wrote: 
>> > I'm using sqlalchemy's ORM, and run queries like, I want to get all 
>> Odds for 
>> > a specific Event. 
>> > 
>> > An Event has many EventMarkets, an EventMarket has many 
>> MarketParticipants, 
>> > and each MarketParticipant has many Odds. 
>> > 
>> > A query like, 
>> > 
>> > 
>> >> Odds.query 
>> >> .join(Odds.market_participant) 
>> >> .join(MarketParticipant.event_market) 
>> >> .join(EventMarket.event) 
>> >> .options( 
>> >> 
>> >>     contains_eager(Odds.market_participant) 
>> >>     .contains_eager(MarketParticipant.event_market) 
>> >>     .contains_eager(EventMarket.event)) 
>> >> .filter(Event.id == 123456) 
>> > 
>> > 
>> > This produces a query like, 
>> > 
>> >> SELECT ... 
>> >>  FROM odds 
>> >>  LEFT OUTER JOIN market_participant AS market_participant_1 ON 
>> >> market_participant_1.id = odds.market_participant_id 
>> >>  LEFT OUTER JOIN event_market AS event_market_1 ON event_market_1.id 
>> = 
>> >> market_participant_1.event_market_id 
>> >>  LEFT OUTER JOIN event AS event_1 ON event_1.id = 
>> event_market_1.event_id 
>> >> 
>> >> WHERE event.id = 123456 
>> > 
>> > 
>> > Which works perfectly, but I consider having to have both join and 
>> > contains_eager redundant, I'd much prefer to do the more 
>> straightforward, 
>> > 
>> >> Odds.query 
>> >> .options( 
>> >> 
>> >>     joinedload(Odds.market_participant) 
>> >>     .joinedload(MarketParticipant.event_market) 
>> >>     .joinedload(EventMarket.event)) 
>> >> .filter(Event.id == 123456) 
>> > 
>> > 
>> > But this produces a query like 
>> > 
>> >> SELECT ... 
>> >>  FROM event, odds 
>> >>  LEFT OUTER JOIN market_participant AS market_participant_1 ON 
>> >> market_participant_1.id = odds.market_participant_id 
>> >>  LEFT OUTER JOIN event_market AS event_market_1 ON event_market_1.id 
>> = 
>> >> market_participant_1.event_market_id 
>> >>  LEFT OUTER JOIN event AS event_1 ON event_1.id = 
>> event_market_1.event_id 
>> >> WHERE event.id = 123456 
>> > 
>> > 
>> > As it queries from both Event, and then Odds with the joins to Event 
>> again, 
>> > this time as "event_1", it joins the entire table, incorrectly. 
>> > 
>> > ---- 
>> > 
>> > I'm also aware I could do all the filters, 
>> > 
>> >> Odds.query 
>> >> .filter( 
>> >>     Odds.market_participant, 
>> >>     MarketParticipant.event_market, 
>> >>     EventMarket.event, 
>> >>     Event.id == 123456) 
>> > 
>> > 
>> > But that doesn't ensure the "bits" are loaded in (market_participant 
>> for 
>> > odds, event_market for market_participant, etc.), you still need to 
>> > redundantly add the contains_eager chain, 
>> > 
>> >> Odds.query 
>> >> .options(contains_eager(Odds.market_participant) 
>> >>          .contains_eager(MarketParticipant.event_market) 
>> >>          .contains_eager(EventMarket.event)) 
>> >> .filter( 
>> >>     Odds.market_participant, 
>> >>     MarketParticipant.event_market, 
>> >>     EventMarket.event, 
>> >>     Event.id == 123456) 
>> > 
>> > 
>> > This produces a perfectly acceptable query, with filtering via the 
>> tables, 
>> > instead of joins, 
>> > 
>> >> SELECT ... 
>> >> 
>> >> FROM event, event_market, market_participant, odds 
>> >>  WHERE market_participant.id = odds.market_participant_id 
>> >> 
>> >>  AND event_market.id = market_participant.event_market_id 
>> >>  AND event.id = event_market.event_id 
>> >> 
>> >>  AND event.id = 123456 
>> > 
>> > 
>> > ---- 
>> > 
>> > Again, I know coupling join with contains_eager (as is done in the 
>> first 
>> > example) works, but I absolutely consider it redundant. 
>> > 
>> > This redundancy is shown more clearly if you swap contains_eager with 
>> > joinedload, 
>> > 
>> >> Odds.query 
>> >> .join(Odds.market_participant) 
>> >> .join(MarketParticipant.event_market) 
>> >> .join(EventMarket.event) 
>> >> .options( 
>> >> 
>> >>      joinedload(Odds.market_participant) 
>> >>     .joinedload(MarketParticipant.event_market) 
>> >>     .joinedload(EventMarket.event)) 
>> >> .filter(Event.id == 123456) 
>> > 
>> > 
>> > That produces the exact same as the first query, coupling join with 
>> > contains_eager. 
>> > 
>> > ---- 
>> > 
>> > Is there any way to be able to specify what you want in joinedload in 
>> the 
>> > options without having to specify the join as well, *AND* be able to 
>> filter 
>> > on those tables, like the example. 
>> > 
>> > The main reason I'm asking this is because every now and again, with a 
>> > simple autcompleted typo, or forgetfulness, I forget something, or get 
>> > something wrong in the join chain, or the contains_eager chain, and 
>> instead 
>> > of a query coming back within seconds, it stalls for a few minutes, 
>> trying 
>> > to return a million rows, instead of fifty.  99% of the time with these 
>> > mistakes one side will be correct (the join chain), with a mistake in 
>> the 
>> > other (the contains_eager chain).  I understand those bugs / errors are 
>> "on 
>> > me", but even every time I do it, to do it right it frustrates me a 
>> bit. 
>> > 
>> > I've put a decent amount of effort into design the tables with minimal 
>> > redundancy, both on the Db side, and within my code, but then, when I'm 
>> > running a few queries, I do get frustrated every time when I have to 
>> write 
>> > double the "join" logic, in both .join / .filter, and .contains_eager. 
>>
>>
>> TLDR: I would accept that you find it "verbose", but it is not 
>> "redundant". 
>>
>> When you claim it's "redundant", that sounds like you're saying that 
>> nobody would ever want to do a join() *without* contains_eager(), and 
>> hence, there should be no joinedload() either.  There would just be 
>> join(), and it would always look for scalar relationships or 
>> collections to populate with those joined results.    But join() works 
>> much more generically than this and is also agnostic of the fact that 
>> the rows being joined constitute a scalar row, as seems to be the case 
>> here, or a collection, where it is not a given that a user would want 
>> a filtered set of rows from their join() to also form the population 
>> of their collections, e.g. they'd have objects with 
>> not-fully-populated collections.   People do want to use 
>> contains_eager() in this way, in fact, even though I find it to be 
>> misleading.     Also, join() can be expressed multiple times across 
>> the same relationship() path, or expressed from entities that don't 
>> actually include a given path, or from individual columns where 
>> relationship loading doesn't apply, and in all these cases it would be 
>> ambiguous for contains_eager() to invoke automatically, besides that 
>> it is usually (but not always) undesirable in the collection use case. 
>>
>> So with join() and contains_eager(), there are two separate intents 
>> being expressed and IMO there is no redundancy.    This may however be 
>> considered to be an undesirable verbosity. 
>>
>> As far as a feature that would attempt to run these two intents at 
>> once, I think as a built in feature it would be confusing and 
>> misleading for the reasons given above, not to mention it would fail 
>> to function unpredictably when invoked against the same relationship 
>> path multiple times. 
>>
>> The usual way to reduce query verbosity in this kind of case is that 
>> you can make your own option that does the two things you want at the 
>> same time.   The most expedient form is to make use of a 
>> "transformation" function: 
>>
>> def eager_join(criteria): 
>>     def transform(q): 
>>         return q.join(criteria).options(contains_eager(criteria)) 
>>     return transform 
>>
>> The above function can be used as: 
>>
>> query = query.with_transformation(eager_join(MyModel.some_attr)) 
>>
>>
>>
>> http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=query#sqlalchemy.orm.query.Query.with_transformation
>>  
>>
>>
>> Alternatively this function can be a standalone function that you pass 
>> a Query towards, or a method on Query subclass, which you then provide 
>> to your Session / sessionmaker using the query_cls parameter.   If you 
>> are using an existing system like Flask that already provides for a 
>> Query class, you can still subclass that. 
>>
>> If you'd still like to pursue that SQLAlchemy can have such a feature, 
>> feel free to specify the API you think it should have, however I'm at 
>> the moment skeptical such an API wouldn't introduce many new ambiguous 
>> scenarios that would need to be detected, when all the while a 
>> completely unambiguous system of producing this effect already exists. 
>> The simplest thing I can think of would be if joinedload() simply had 
>> an option to either not run its automatic aliasing feature, or if you 
>> could pass a desired target entity towards it so that you could filter 
>> on it, in your example it would be: 
>>
>>     Odds.query 
>>     .options( 
>>
>>         joinedload(Odds.market_participant) 
>>         .joinedload(MarketParticipant.event_market) 
>>         .joinedload(EventMarket.event, target=Event)) 
>>     .filter(Event.id == 123456) 
>>
>>
>> Above, we tell the last joinedload() to create its JOIN against the 
>> plain "event" table without anonymously aliasing it, so that the 
>> filter() catches it. 
>>
>> This is again explciitly expressing the two separate intents, but just 
>> in a less verbose way. 
>>
>>
>> > 
>> > -- 
>> > SQLAlchemy - 
>> > The Python SQL Toolkit and Object Relational Mapper 
>> > 
>> > http://www.sqlalchemy.org/ 
>> > 
>> > To post example code, please provide an MCVE: Minimal, Complete, and 
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>> > description. 
>> > --- 
>> > You received this message because you are subscribed to the Google 
>> Groups 
>> > "sqlalchemy" group. 
>> > To unsubscribe from this group and stop receiving emails from it, send 
>> an 
>> > email to sqlalchemy+...@googlegroups.com. 
>> > To post to this group, send email to sqlal...@googlegroups.com. 
>> > Visit this group at https://groups.google.com/group/sqlalchemy. 
>> > For more options, visit https://groups.google.com/d/optout. 
>>
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Is there any way to get around the "redundancy" of coupled .join and contains_eager chains?

Reply via email to