Re: [sqlalchemy] Large number of polymorphic subclasses

damon Mon, 01 May 2017 15:12:20 -0700

Hey Mike,

I just saw the new selectinload feature, and am super excited about it!


As far as I can tell, this is only for relationships, so I wanted to check 
what the plans are for an inheritance loader using selectinload as well, 
given that you initially mentioned them together and I can't find an issue 
in the tracker for it.
 

> I've created an issue for both a new relationship loader and an 
> inheritance loader at the same time, since they will use very similar 
> paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944 
> <https://www.google.com/url?q=https%3A%2F%2Fbitbucket.org%2Fzzzeek%2Fsqlalchemy%2Fissues%2F3944&sa=D&sntz=1&usg=AFQjCNHP34Q9-nl0O_qmczm7uZFo8daCAA>
> .


Thanks!
Damon

On Tuesday, April 4, 2017 at 8:24:07 PM UTC-7, Mike Bayer wrote:
>
> Reusing the original query is hard, and the case you have is due to 
> mismatched entities that would have to be handled, probably by wrapping the 
> original query in a subquery just like subquery eager loading does.  The 
> subquery eager loading feature took a few years to work out an enormous 
> number of issues with this approach.   
>
> The new feature which I'm now working on, several hours a day, in response 
> to this thread solves the whole problem in a much better way, using a 
> simple IN expression against the primary keys of all states loaded in the 
> query.   Betas of 1.2 will hopefully be available in a month or two.    The 
> feature can be replicated in older versions by just grabbing the list of 
> states coming out of __iter__ and using their primary keys in an IN clause. 
>   Parallel development of this other approach as an interim throwaway is 
> probably not worth it if it continues to become more complicated.
>
> On Apr 4, 2017 7:02 PM, <da...@benchling.com <javascript:>> wrote:
>
> Hey Mike,
>
> Thanks for those -- seems to have helped those cases, though we're running 
> into some pretty weird behavior with joins. Here's a simplified case that 
> shows one of the issues we're running into (use _loader_from_cls from 
> above):
>
>
> class W(Base):
>     __tablename__ = 'w'
>     id = Column(Integer, primary_key=True)
>     type = Column(String)
>
>     x_id = Column(Integer, ForeignKey('x.id'))
>     x = relationship('X')
>
>     __mapper_args__ = {'polymorphic_on': type}
>
>
> class W2(W):
>     __tablename__ = 'w2'
>     id = Column(Integer, ForeignKey('w.id'), primary_key=True)
>     w2 = Column(String)
>     __mapper_args__ = {'polymorphic_identity': 'w2'}
>
>
> class X(Base):
>     __tablename__ = 'x'
>     id = Column(Integer, primary_key=True)
>
>
> @event.listens_for(W, "load", propagate=True)
> def load_extra(target, context):
>      key = ('loader_by_cls', type(target))
>
>      if key not in context.attributes:
>          context.attributes[key] = _loader_for_cls(target, context)
>
>
> e = create_engine("sqlite://", echo=True)
> Base.metadata.create_all(e)
> s = Session(e)
>
> s.add(W2(x=X()))
> s.commit()
>
> s.query(W).join(W.x).first()
>
>
> One of the statements emitted by this is
>
> SELECT w2.id AS w2_id, w.id AS w_id, w.type AS w_type, w.x_id AS w_x_id, 
> w2.w2 AS w2_w2
> FROM w2, w JOIN x ON x.id = w.x_id
>  LIMIT ? OFFSET ?
>
> Specifically, the "FROM w2, w JOIN x" is not what we want here -- we'd 
> just want "FROM w2"
>
> Replacing 
>
>     q = orig_query.with_entities(target_cls)
>
> with
>
>     q = context.session.query(target_cls).join(orig_query.subquery())
>
> fixes the issue, though this feels less than ideal. Subclassing Query also 
> seems less than ideal. Do you have any ideas here?
>
>
> Thanks,
> Damon
>
> On Tuesday, April 4, 2017 at 6:42:32 AM UTC-7, Mike Bayer wrote:
>
>> This repaste of the example contains two refinements to address each of 
>> these issues distinctly.  We want to avoid re-entrant invocation of 
>> _loader_for_cls, so putting a flag into attributes handles that.   Then 
>> there's the issue of same class coming in that we are already querying, 
>> we can look at the type being loaded in column_descriptions, and this 
>> also solves the re-entrant problem though I think the flag is more 
>> direct.  The column_descriptions check likely needs more refinements, 
>> but the general idea is that if the entity being loaded is already a B2, 
>> then you wouldn't do any extra B2 queries (unless you have multiple 
>> levels of inheritance / polymorphic loading in which case that probably 
>> needs adjustment). 
>>
>> The "strong reference" comment, you can try commenting that out in the 
>> test and see what happens.  Because this loader trick loads all the 
>> subclasses up front upon seeing the first member of a particular 
>> subtype, it is loading for subsequent instances that aren't seen yet as 
>> well.  If we don't make a strong reference to them, they get lost and 
>> the extra attribute loading fails. 
>>
>> The feature I have in development works a lot better because all the 
>> additional loads are done *after* all the entities are loaded, and it 
>> does it using an IN query that only includes those entities that 
>> definitely need the load.  I'm not sure if there are other negative side 
>> effects from our loading of the "subclass" entity in some cases ahead of 
>> where the primary query gets at the entity.    The recipe here could be 
>> made to do all the extra loads after the primary query but theres no 
>> "after query" hook, you'd have to subclass Query and override __iter__ 
>> to add this step. 
>>
>> from sqlalchemy import * 
>> from sqlalchemy.orm import * 
>> from sqlalchemy.ext.declarative import declarative_base 
>> from sqlalchemy import event 
>>
>> Base = declarative_base() 
>>
>>
>> class A(Base): 
>>      __tablename__ = 'a' 
>>      id = Column(Integer, primary_key=True) 
>>      a1 = Column(String) 
>>      type = Column(String) 
>>
>>      __mapper_args__ = {'polymorphic_on': type} 
>>
>>
>> class B1(A): 
>>      __tablename__ = 'b1' 
>>      id = Column(Integer, ForeignKey('a.id'), primary_key=True) 
>>      b1 = Column(String) 
>>      b_data = Column(String) 
>>      __mapper_args__ = {'polymorphic_identity': 'b1'} 
>>
>>
>> class B2(A): 
>>      __tablename__ = 'b2' 
>>      id = Column(Integer, ForeignKey('a.id'), primary_key=True) 
>>      b2 = Column(String) 
>>      b_data = Column(String) 
>>      cs = relationship("C", lazy='subquery') 
>>
>>      __mapper_args__ = {'polymorphic_identity': 'b2'} 
>>
>>
>> class C(Base): 
>>      __tablename__ = 'c' 
>>      id = Column(Integer, primary_key=True) 
>>      b2_id = Column(ForeignKey('b2.id')) 
>>
>>
>> class B3(A): 
>>      __tablename__ = 'b3' 
>>      id = Column(Integer, ForeignKey('a.id'), primary_key=True) 
>>      b3 = Column(String) 
>>      b_data = Column(String) 
>>      __mapper_args__ = {'polymorphic_identity': 'b3'} 
>>
>>
>> def _loader_for_cls(target, context): 
>>      orig_query = context.query 
>>
>>      target_cls = type(target) 
>>
>>      if orig_query.column_descriptions[0]['type'] is target_cls: 
>>          return None 
>>
>>      # take the original query and chance the entity to the subclass 
>>      q = orig_query.with_entities(target_cls) 
>>
>>      # defer everything that's not PK / polymorphic_on from A.  this 
>> whole 
>>      # bit is just to avoid all those extra columns 
>>      to_defer = [] 
>>      mapper = inspect(target).mapper 
>>      inherited = mapper.inherits 
>>      while inherited is not None: 
>>          for attr in inherited.column_attrs: 
>>              if not attr.expression.primary_key and \ 
>>                      attr.expression is not inherited.polymorphic_on: 
>>                  to_defer.append(attr.key) 
>>          for attr in inherited.relationships: 
>>              to_defer.append(attr.key) 
>>          inherited = inherited.inherits 
>>      q = q.options(*[defer(k) for k in to_defer]) 
>>
>>      q._attributes["load_extra_called"] = True 
>>
>>      # store this strong reference so recs don't get lost while 
>>      # iterating 
>>      return q.all() 
>>
>>
>> @event.listens_for(A, "load", propagate=True) 
>> def load_extra(target, context): 
>>      if "load_extra_called" in context.attributes: 
>>          return 
>>
>>      key = ('loader_by_cls', type(target)) 
>>
>>      if key not in context.attributes: 
>>          context.attributes[key] = _loader_for_cls(target, context) 
>>
>>
>> e = create_engine("sqlite://", echo=True) 
>> Base.metadata.create_all(e) 
>>
>>
>> s = Session(e) 
>> s.add_all([ 
>>      B1(b_data='b11', b1='b1', a1='b11'), 
>>      B2(b_data='b21', b2='b2', a1='b21', cs=[C(), C(), C()]), 
>>      B3(b_data='b31', b3='b3', a1='b31'), 
>>      B1(b_data='b12', b1='b1', a1='b12'), 
>>      B1(b_data='b13', b1='b1', a1='b13'), 
>>      B2(b_data='b22', b2='b2', a1='b22', cs=[C(), C()]), 
>>      B3(b_data='b32', b3='b3', a1='b12'), 
>>      B3(b_data='b33', b3='b3', a1='b33') 
>> ]) 
>> s.commit() 
>>
>>
>> q = s.query(A).filter(A.a1.like('%2%')) 
>>
>> result = q.all() 
>>
>> print "----- no more SQL ----" 
>> for b in result: 
>>      if isinstance(b, B1): 
>>          print b.b1 
>>      elif isinstance(b, B2): 
>>          print b.cs 
>>          print b.b2 
>>      elif isinstance(b, B3): 
>>          print b.b3 
>>
>>
>>
>>
>> On 04/04/2017 12:16 AM, da...@benchling.com wrote: 
>> > Hey Mike, 
>> > 
>> > Looks like I spoke too soon -- a few more questions: 
>> > 
>> > Using the example code you posted, we're actually seeing 4 additional 
>> > queries (one per result model), rather than the expected 3 (one per 
>> > result model type). If you print context.query inside load_extra, I 
>> > think it's clear why: 
>> > 
>> > - the loader sequentially processes loaded models 
>> > (
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L75)
>>  
>>
>> > - on processing the first B2, it executes a query to load all B2's data 
>> > - when processing the results of this new query, it now processes the 
>> > second B2 record (under a new context) 
>> > - load_extra() gets called for the new B2 with a different context than 
>> > the first B2 
>> > - this now executes _loader_for_cls 
>> > 
>> > Note that if you don't call _loader_for_cls in load_extra, load_extra 
>> is 
>> > called for each record using the correct context. 
>> > 
>> > A related issue is that if you query directly for B2, it'll redundantly 
>> > execute _loader_for_cls. 
>> > 
>> > I think we could solve both of these issues by doing this instead: 
>> > 
>> >     q = orig_query.with_entities(target_cls) 
>> >     if q == orig_query: 
>> >         return 
>> > 
>> > Unfortunately, the == operator doesn't work for comparing queries. Do 
>> > you have a way to compare query equality, or alternatively have a 
>> > solution to both of these issues? 
>> > 
>> > Could you also elaborate on what you meant by this comment? 
>> > 
>> >     # store this strong reference so recs don't get lost while 
>> >     # iterating 
>> > 
>> > 
>> > Thanks again for all your help, 
>> > Damon 
>> > 
>> > On Monday, April 3, 2017 at 6:42:17 PM UTC-7, da...@benchling.com 
>> wrote: 
>> > 
>> >     Thanks a ton for your help, Mike! 
>> > 
>> >     We played around with it and are pretty happy with your solution 
>> >     using the load() event, so we'll be using that moving forward. 
>> > 
>> > 
>> >     Damon 
>> > 
>> >     On Wednesday, March 29, 2017 at 2:40:39 PM UTC-7, Mike Bayer wrote: 
>> > 
>> >         I have a working version of both loading relationships via IN 
>> as 
>> >         well as 
>> >         loading joined inheritance subclasses via IN, including your 
>> >         great idea 
>> >         that extra eager loaders should continue to work for the 
>> subclass 
>> >         loaders. 
>> > 
>> >         I've only tested it with one scenario so far and both patches 
>> >         have a 
>> >         long way to go re: tests and documentation. 
>> > 
>> >         the scratch test case is currently in the commit message for 
>> the 
>> >         second 
>> >         patch, which illustrates a base class + subclass load where 
>> both 
>> >         classes 
>> >         have an additional relationship.     This is at 
>> >         https://gerrit.sqlalchemy.org/#/c/359/ 
>> >         <https://gerrit.sqlalchemy.org/#/c/359/>, note this builds 
>> upon the 
>> >         previous gerrit at https://gerrit.sqlalchemy.org/#/c/352/ 
>> >         <https://gerrit.sqlalchemy.org/#/c/352/>.    The work 
>> >         at the moment will likely fall down with any kind of surprises 
>> >         but if 
>> >         you wanted to start running it and finding those cases, that is 
>> >         always 
>> >         helpful. 
>> > 
>> >         So far this looks promising as something that can be in 1.2 
>> with 
>> >         perhaps 
>> >         some "EXPERIMENTAL" warnings surrounding it, but overall 1.2 
>> was 
>> >         lacking 
>> >         a "killer ORM feature" so these two would be it. 
>> > 
>> > 
>> > 
>> >         On 03/23/2017 06:02 PM, mike bayer wrote: 
>> >         > 
>> >         > 
>> >         > On 03/23/2017 02:40 PM, mike bayer wrote: 
>> >         >> 
>> >         >> 
>> >         >> On 03/23/2017 12:53 PM, da...@benchling.com wrote: 
>> >         >>> Hey Mike, 
>> >         >>> 
>> >         >>> Thanks for the quick response! 
>> >         >>> 
>> >         >>> For developers that are pretty familiar with the SQLAlchemy 
>> >         API, but not 
>> >         >>> so much the internals, would implementing the subqueryloads 
>> to 
>> >         >>> contribute to SA be a reasonable endeavor? Could you 
>> >         ballpark how much 
>> >         >>> time how long it might take for us to do it? 
>> >         >> 
>> >         >> I haven't looked at what this would take, but it would be 
>> >         intricate and 
>> >         >> also need a lot of tests that are not easy to write.   
>>  Like, 
>> >         if you 
>> >         >> worked on it, you could probably get something working, but 
>> >         then that 
>> >         >> probably wouldn't be how it really needs to be because all 
>> >         kinds of 
>> >         >> things that are simple for simple cases don't work with the 
>> >         vast amount 
>> >         >> of edge cases which we have. 
>> >         > 
>> >         > I've created an issue for both a new relationship loader and 
>> an 
>> >         > inheritance loader at the same time, since they will use very 
>> >         similar 
>> >         > paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944 
>> >         <https://bitbucket.org/zzzeek/sqlalchemy/issues/3944>.    A 
>> POC 
>> >         > for the relationship loader so far looks to be very simple 
>> >         (but then 
>> >         > again all the loaders start out very simple...) and is at 
>> >         > https://gerrit.sqlalchemy.org/352 
>> >         <https://gerrit.sqlalchemy.org/352>.   The same infrastructure 
>> and 
>> >         > approach would also be used for the mapper inheritance 
>> loader, 
>> >         which 
>> >         > would be enabled via a mapper()-level flag, as well as 
>> >         per-query using a 
>> >         > new query option. 
>> >         > 
>> >         > I can't yet guarantee this will be a 1.2 thing, I'd have to 
>> >         get some 
>> >         > more momentum going w/ test cases and all that.  In a 1.2 
>> >         release it 
>> >         > would also be a little bit experimental as new loading styles 
>> >         usually 
>> >         > have small issues coming up for months or years, as people 
>> try 
>> >         more use 
>> >         > cases. 
>> >         > 
>> >         > 
>> >         > 
>> >         > 
>> >         > 
>> >         > 
>> >         >> 
>> >         >> The routine that's loading the additional columns just for 
>> >         one object at 
>> >         >> a time is here: 
>> >         >> 
>> >         >> 
>> >         
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L635
>>  
>> >         <
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L635>
>>  
>>
>> > 
>> >         >> 
>> >         >> and then here for the bulk of it: 
>> >         >> 
>> >         
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L2588
>>  
>> >         <
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L2588>
>>  
>>
>> > 
>> >         >> 
>> >         >> 
>> >         >> 
>> >         >> But the much harder part would be how to work this step into 
>> >         the loading 
>> >         >> infrastructure, which would be somewhere in 
>> >         >> 
>> >         
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L273
>>  
>> >         <
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L273>,
>>  
>>
>> > 
>> >         >> 
>> >         >> which is a very intricate function with over a decade of 
>> >         constant 
>> >         >> refactorings behind it, and I'd have to think pretty deeply 
>> >         for awhile 
>> >         >> how best to do this. 
>> >         >> 
>> >         >> Not to mention that there's more than one way to do this 
>> >         query, there's 
>> >         >> either re-using the criteria from the original query, or 
>> there's 
>> >         >> injecting the primary key ids of the whole list of objects 
>> >         into an IN 
>> >         >> clause after the fact.  The latter approach is probably more 
>> >         efficient 
>> >         >> but wouldn't work for composite primary keys outside of 
>> >         Postgresql.   As 
>> >         >> a built in feature I'd want "IN" loading to be an option at 
>> >         least. 
>> >         >> 
>> >         >> 
>> >         >> 
>> >         >>> 
>> >         >>> Regarding STI and relationships, is there any way to do 
>> that 
>> >         but still 
>> >         >>> get the benefits of JTI? e.g. is there an easy way to 
>> resolve 
>> >         >>> my_base_class_inst.subclass_prop as a proxy to the 
>> subclass? 
>> >         We could 
>> >         >>> roll our own using __getitem__ but it seems a little hairy. 
>> >         >> 
>> >         >> So proxying to a related item wouldn't be terrible, sure you 
>> >         could use 
>> >         >> __getitem__ or you could also add descriptors to the primary 
>> >         class, 
>> >         >> adding the descriptors to match the "info" could be 
>> automated 
>> >         as well 
>> >         >> (or even do it in the other direction, add special 
>> descriptor 
>> >         to main 
>> >         >> class == a column gets added to related class). 
>> >         >> 
>> >         >> 
>> >         >> Doing the thing where you query() for all the related 
>> classes 
>> >         after the 
>> >         >> fact yourself might not be that terrible.   you can use the 
>> >         load() event 
>> >         >> which receives the query context that has the query you need 
>> >         already.  I 
>> >         >> guess it's time for proof of concept.     Here's that.  You 
>> >         can see at 
>> >         >> the end we load all the "bs" without any per-object load. 
>> >         >> 
>> >         >> from sqlalchemy import * 
>> >         >> from sqlalchemy.orm import * 
>> >         >> from sqlalchemy.ext.declarative import declarative_base 
>> >         >> from sqlalchemy import event 
>> >         >> 
>> >         >> Base = declarative_base() 
>> >         >> 
>> >         >> 
>> >         >> class A(Base): 
>> >         >>     __tablename__ = 'a' 
>> >         >>     id = Column(Integer, primary_key=True) 
>> >         >>     a1 = Column(String) 
>> >         >>     type = Column(String) 
>> >         >> 
>> >         >>     __mapper_args__ = {'polymorphic_on': type} 
>> >         >> 
>> >         >> 
>> >         >> class B1(A): 
>> >         >>     __tablename__ = 'b1' 
>> >         >>     id = Column(Integer, ForeignKey('a.id <http://a.id>'), 
>> >         primary_key=True) 
>> >         >>     b1 = Column(String) 
>> >         >>     b_data = Column(String) 
>> >         >>     __mapper_args__ = {'polymorphic_identity': 'b1'} 
>> >         >> 
>> >         >> 
>> >         >> class B2(A): 
>> >         >>     __tablename__ = 'b2' 
>> >         >>     id = Column(Integer, ForeignKey('a.id <http://a.id>'), 
>> >         primary_key=True) 
>> >         >>     b2 = Column(String) 
>> >         >>     b_data = Column(String) 
>> >         >>     cs = relationship("C", lazy='subquery') 
>> >         >> 
>> >         >>     __mapper_args__ = {'polymorphic_identity': 'b2'} 
>> >         >> 
>> >         >> 
>> >         >> class C(Base): 
>> >         >>     __tablename__ = 'c' 
>> >         >>     id = Column(Integer, primary_key=True) 
>> >         >>     b2_id = Column(ForeignKey('b2.id <http://b2.id>')) 
>> >         >> 
>> >         >> 
>> >         >> class B3(A): 
>> >         >>     __tablename__ = 'b3' 
>> >         >>     id = Column(Integer, ForeignKey('a.id <http://a.id>'), 
>> >         primary_key=True) 
>> >         >>     b3 = Column(String) 
>> >         >>     b_data = Column(String) 
>> >         >>     __mapper_args__ = {'polymorphic_identity': 'b3'} 
>> >         >> 
>> >         >> 
>> >         >> def _loader_for_cls(target, context): 
>> >         >>     orig_query = context.query 
>> >         >> 
>> >         >>     target_cls = type(target) 
>> >         >> 
>> >         >>     # take the original query and chance the entity to the 
>> >         subclass 
>> >         >>     q = orig_query.with_entities(target_cls) 
>> >         >> 
>> >         >>     # defer everything that's not PK / polymorphic_on from 
>> A. 
>> >          this whole 
>> >         >>     # bit is just to avoid all those extra columns 
>> >         >>     to_defer = [] 
>> >         >>     mapper = inspect(target).mapper 
>> >         >>     inherited = mapper.inherits 
>> >         >>     while inherited is not None: 
>> >         >>         for attr in inherited.column_attrs: 
>> >         >>             if not attr.expression.primary_key and \ 
>> >         >>                     attr.expression is not 
>> >         inherited.polymorphic_on: 
>> >         >>                 to_defer.append(attr.key) 
>> >         >>         for attr in inherited.relationships: 
>> >         >>             to_defer.append(attr.key) 
>> >         >>         inherited = inherited.inherits 
>> >         >>     q = q.options(*[defer(k) for k in to_defer]) 
>> >         >> 
>> >         >>     # store this strong reference so recs don't get lost 
>> while 
>> >         >>     # iterating 
>> >         >>     return q.all() 
>> >         >> 
>> >         >> 
>> >         >> @event.listens_for(A, "load", propagate=True) 
>> >         >> def load_extra(target, context): 
>> >         >>     key = ('loader_by_cls', type(target)) 
>> >         >> 
>> >         >>     if key not in context.attributes: 
>> >         >>         context.attributes[key] = _loader_for_cls(target, 
>> >         context) 
>> >         >> 
>> >         >> 
>> >         >> e = create_engine("sqlite://", echo=True) 
>> >         >> Base.metadata.create_all(e) 
>> >         >> 
>> >         >> 
>> >         >> s = Session(e) 
>> >         >> s.add_all([ 
>> >         >>     B1(b_data='b11', b1='b1', a1='b11'), 
>> >         >>     B2(b_data='b21', b2='b2', a1='b21', cs=[C(), C(), C()]), 
>> >         >>     B3(b_data='b31', b3='b3', a1='b31'), 
>> >         >>     B1(b_data='b12', b1='b1', a1='b12'), 
>> >         >>     B1(b_data='b13', b1='b1', a1='b13'), 
>> >         >>     B2(b_data='b22', b2='b2', a1='b22', cs=[C(), C()]), 
>> >         >>     B3(b_data='b32', b3='b3', a1='b12'), 
>> >         >>     B3(b_data='b33', b3='b3', a1='b33') 
>> >         >> ]) 
>> >         >> s.commit() 
>> >         >> 
>> >         >> 
>> >         >> q = s.query(A).filter(A.a1.like('%2%')) 
>> >         >> 
>> >         >> result = q.all() 
>> >         >> 
>> >         >> print "----- no more SQL ----" 
>> >         >> for b in result: 
>> >         >>     if isinstance(b, B1): 
>> >         >>         print b.b1 
>> >         >>     elif isinstance(b, B2): 
>> >         >>         print b.cs 
>> >         >>         print b.b2 
>> >         >>     elif isinstance(b, B3): 
>> >         >>         print b.b3 
>> >         >> 
>> >         >> 
>> >         >> 
>> >         >> 
>> >         >> 
>> >         >>> 
>> >         >>> 
>> >         >>> Thanks again, 
>> >         >>> Damon 
>> >         >>> 
>> >         >>> On Wednesday, March 22, 2017 at 3:59:45 PM UTC-7, Mike 
>> Bayer 
>> >         wrote: 
>> >         >>> 
>> >         >>> 
>> >         >>> 
>> >         >>>     On 03/22/2017 02:17 PM, da...@benchling.com 
>> >         <javascript:> wrote: 
>> >         >>>     > Hey all, 
>> >         >>>     > 
>> >         >>>     > We were wondering if you had any advice on having a 
>> >         large (~10) 
>> >         >>>     number 
>> >         >>>     > of polymorphic subclasses for a single base class. 
>> Using 
>> >         >>>     > with_polymorphic: '*' causes SQLAlchemy to joinedload 
>> all 
>> >         >>> subclasses 
>> >         >>>     > like this: 
>> >         >>>     > 
>> >         >>>     > SELECT ... 
>> >         >>>     > FROM base_table 
>> >         >>>     > LEFT OUTER JOIN sub_table_1 ON base_table.id 
>> >         <http://base_table.id> 
>> >         >>>     <http://base_table.id> = sub_table_1.id 
>> >         <http://sub_table_1.id> <http://sub_table_1.id> 
>> >         >>>     > LEFT OUTER JOIN sub_table_2 ON base_table.id 
>> >         <http://base_table.id> 
>> >         >>>     <http://base_table.id> = sub_table_2.id 
>> >         <http://sub_table_2.id> <http://sub_table_2.id> 
>> >         >>>     > ... 
>> >         >>>     > 
>> >         >>>     > Postgres buckles under too many joins, and these 
>> >         queries start 
>> >         >>>     taking a 
>> >         >>>     > really long time. 
>> >         >>>     > 
>> >         >>>     > One other note is that for most of our queries, only 
>> a 
>> >         few of 
>> >         >>> these 
>> >         >>>     > sub-tables are actually needed, so most of the joins 
>> >         are wasted. 
>> >         >>>     > Unfortunately, ahead of time, we don't know which 
>> >         tables will be 
>> >         >>>     needed 
>> >         >>>     > -- we're relying on the discriminator. 
>> >         >>>     > 
>> >         >>>     > Ideally, we'd be able to specify that the ORM should 
>> >         >>> subqueryload the 
>> >         >>>     > subclasses (and only execute subqueries on the types 
>> >         that are 
>> >         >>>     present). 
>> >         >>>     > This would have to happen both when querying the base 
>> >         table, but 
>> >         >>> also 
>> >         >>>     > when accessing relationships. We'd want it to execute 
>> >         a query on 
>> >         >>> the 
>> >         >>>     > base table, then execute one query for each present 
>> >         subclass. 
>> >         >>>     > 
>> >         >>>     > Another solution might be to use some kind of hook 
>> that 
>> >         >>>     > 
>> >         >>>     > - is executed after a query returns with results (or 
>> >         after a 
>> >         >>> list of 
>> >         >>>     > models are added to the session?) 
>> >         >>>     > - groups the models by type and runs its own 
>> >         subqueries to load 
>> >         >>>     the data 
>> >         >>>     > 
>> >         >>>     > Any help here is greatly appreciated! 
>> >         >>> 
>> >         >>> 
>> >         >>>     The purpose of with_polymorphic is more about being 
>> able to 
>> >         >>> filter on 
>> >         >>>     multiple classes at the same time, which is why it uses 
>> >         joins, but 
>> >         >>>     these 
>> >         >>>     don't scale to many subclasses.    Adding a subquery 
>> >         load for the 
>> >         >>>     related tables would be something that the ORM can 
>> >         someday have as a 
>> >         >>>     feature, but it would need a lot of tests to ensure 
>> it's 
>> >         working as 
>> >         >>>     advertised. 
>> >         >>> 
>> >         >>>     There's a lot of ways to get those other tables loaded 
>> >         but none of 
>> >         >>> them 
>> >         >>>     look that great.   Turning off with_polymorphic(), one 
>> >         approach 
>> >         >>> is to 
>> >         >>>     collect all the distinct types and identifiers from 
>> your 
>> >         query 
>> >         >>> result; 
>> >         >>>     then do a separate query for each subtype: 
>> >         >>> 
>> >         >>>             result = 
>> session.query(BaseClass).filter(...).all() 
>> >         >>> 
>> >         >>>              types = sorted([(type(obj), obj.id 
>> >         <http://obj.id> <http://obj.id>) for obj 
>> >         >>>     in result], 
>> >         >>>     key=lambda t: t[0]) 
>> >         >>> 
>> >         >>>              for type, ids in itertools.groupby(types, 
>> >         key=lambda t: 
>> >         >>> t[0]): 
>> >         >>> 
>> >          session.query(type).filter(type.id.in_(ids)).all() 
>> >         >>> 
>> >         >>>     That will emit a query with an INNER JOIN for each 
>> class 
>> >         and will 
>> >         >>>     populate the remaining records in the identity map.  
>> The 
>> >         columns 
>> >         >>> that 
>> >         >>>     are already loaded are not re-accessed, though the 
>> DBAPI 
>> >         will still 
>> >         >>>     send 
>> >         >>>     them over the network to the cursor.   You can try 
>> >         limiting the 
>> >         >>> columns 
>> >         >>>     you query for in each statement as well by using the 
>> >         defer() option. 
>> >         >>> 
>> >         >>>     Another way is to use with_polymorphic() but to provide 
>> >         a different 
>> >         >>>     kind 
>> >         >>>     of SQL statement, like a polymorphic_union().   This 
>> >         would be a 
>> >         >>>     UNION of 
>> >         >>>     statements that each have an inner join.   the 
>> resulting 
>> >         SQL is a 
>> >         >>> beast 
>> >         >>>     but it at least isn't using those left outer joins.   I 
>> >         think you 
>> >         >>> can 
>> >         >>>     probably use sqlalchemy.orm.util.polymorphic_union() 
>> >         directly to get 
>> >         >>>     this UNION statement built up automatically. 
>> >         >>> 
>> >         >>>     Still another way is to reorganize the mappings to use 
>> >         single-table 
>> >         >>>     inheritance and relationship() to link out to the 
>> >         related table, 
>> >         >>> then 
>> >         >>>     the normal "subqueryload" feature to load them as 
>> >         relationships. 
>> >         >>> Even 
>> >         >>>     though this way is ugly, I might use this (short of 
>> >         implementing the 
>> >         >>>     related table subqueryload feature) just to make things 
>> >         simple. 
>> >         >>> 
>> >         >>> 
>> >         >>>     Definitely a feature that should be added but that's 
>> not an 
>> >         >>> immediate 
>> >         >>>     solution. 
>> >         >>> 
>> >         >>>     > 
>> >         >>>     > 
>> >         >>>     > Thanks, 
>> >         >>>     > Damon 
>> >         >>>     > 
>> >         >>>     > -- 
>> >         >>>     > SQLAlchemy - 
>> >         >>>     > The Python SQL Toolkit and Object Relational Mapper 
>> >         >>>     > 
>> >         >>>     > http://www.sqlalchemy.org/ 
>> >         >>>     > 
>> >         >>>     > To post example code, please provide an MCVE: 
>> Minimal, 
>> >         Complete, 
>> >         >>> and 
>> >         >>>     > Verifiable Example. See 
>> >         http://stackoverflow.com/help/mcve 
>> >         <http://stackoverflow.com/help/mcve> 
>> >         >>>     <http://stackoverflow.com/help/mcve 
>> >         <http://stackoverflow.com/help/mcve>> for a full 
>> >         >>>     > description. 
>> >         >>>     > --- 
>> >         >>>     > You received this message because you are subscribed 
>> >         to the Google 
>> >         >>>     > Groups "sqlalchemy" group. 
>> >         >>>     > To unsubscribe from this group and stop receiving 
>> >         emails from it, 
>> >         >>>     send 
>> >         >>>     > an email to sqlalchemy+...@googlegroups.com 
>> <javascript:> 
>> >         >>>     > <mailto:sqlalchemy+unsubscr...@googlegroups.com 
>> >         <javascript:>>. 
>> >         >>>     > To post to this group, send email to 
>> >         sqlal...@googlegroups.com 
>> >         >>>     <javascript:> 
>> >         >>>     > <mailto:sqlal...@googlegroups.com <javascript:>>. 
>> >         >>>     > Visit this group at 
>> >         https://groups.google.com/group/sqlalchemy 
>> >         <https://groups.google.com/group/sqlalchemy> 
>> >         >>>     <https://groups.google.com/group/sqlalchemy 
>> >         <https://groups.google.com/group/sqlalchemy>>. 
>> >         >>>     > For more options, visit 
>> >         https://groups.google.com/d/optout 
>> >         <https://groups.google.com/d/optout> 
>> >         >>>     <https://groups.google.com/d/optout 
>> >         <https://groups.google.com/d/optout>>. 
>> >         >>> 
>> >         >>> -- 
>> >         >>> SQLAlchemy - 
>> >         >>> The Python SQL Toolkit and Object Relational Mapper 
>> >         >>> 
>> >         >>> http://www.sqlalchemy.org/ 
>> >         >>> 
>> >         >>> To post example code, please provide an MCVE: Minimal, 
>> >         Complete, and 
>> >         >>> Verifiable Example. See http://stackoverflow.com/help/mcve 
>> >         <http://stackoverflow.com/help/mcve> for a full 
>> >         >>> description. 
>> >         >>> --- 
>> >         >>> You received this message because you are subscribed to the 
>> >         Google 
>> >         >>> Groups "sqlalchemy" group. 
>> >         >>> To unsubscribe from this group and stop receiving emails 
>> >         from it, send 
>> >         >>> an email to sqlalchemy+...@googlegroups.com 
>> >         >>> <mailto:sqlalchemy+unsubscr...@googlegroups.com>. 
>> >         >>> To post to this group, send email to 
>> sqlal...@googlegroups.com 
>> >         >>> <mailto:sqlal...@googlegroups.com>. 
>> >         >>> Visit this group at 
>> >         https://groups.google.com/group/sqlalchemy 
>> >         <https://groups.google.com/group/sqlalchemy>. 
>> >         >>> For more options, visit https://groups.google.com/d/optout 
>> >         <https://groups.google.com/d/optout>. 
>> > 
>> > -- 
>> > SQLAlchemy - 
>> > The Python SQL Toolkit and Object Relational Mapper 
>> > 
>> > http://www.sqlalchemy.org/ 
>> > 
>> > To post example code, please provide an MCVE: Minimal, Complete, and 
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>> > description. 
>> > --- 
>> > You received this message because you are subscribed to the Google 
>> > Groups "sqlalchemy" group. 
>> > To unsubscribe from this group and stop receiving emails from it, send 
>> > an email to sqlalchemy+...@googlegroups.com 
>> > <mailto:sqlalchemy+unsubscr...@googlegroups.com>. 
>> > To post to this group, send email to sqlal...@googlegroups.com 
>> > <mailto:sqlal...@googlegroups.com>. 
>> > Visit this group at https://groups.google.com/group/sqlalchemy. 
>> > For more options, visit https://groups.google.com/d/optout. 
>>
> -- 
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+...@googlegroups.com <javascript:>.
> To post to this group, send email to sqlal...@googlegroups.com 
> <javascript:>.
>
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>
>
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Large number of polymorphic subclasses

Reply via email to