This repaste of the example contains two refinements that address each of
these issues distinctly. We want to avoid re-entrant invocation of
_loader_for_cls, so placing a flag in the query's attributes handles that.
Then there's the issue of the same class coming in that we are already
querying; for that, we can look at the type being loaded in
column_descriptions. This also solves the re-entrant problem, though I
think the flag is more direct. The column_descriptions check likely needs
more refinement, but the general idea is that if the entity being loaded
is already a B2, then you wouldn't do any extra B2 queries (unless you
have multiple levels of inheritance / polymorphic loading, in which case
that probably needs adjustment).
As for the "strong reference" comment, you can try commenting that out in
the test and see what happens. Because this loader trick loads all the
subclasses up front upon seeing the first member of a particular subtype,
it is also loading for subsequent instances that haven't been seen yet.
If we don't keep a strong reference to them, they get garbage collected
and the extra attribute loading fails.
The feature I have in development works a lot better because all the
additional loads are done *after* all the entities are loaded, and it
does them using an IN query that includes only those entities that
definitely need the load. I'm not sure if there are other negative side
effects from our loading of the "subclass" entity, in some cases, ahead
of where the primary query gets at the entity. The recipe here could be
made to do all the extra loads after the primary query, but there's no
"after query" hook; you'd have to subclass Query and override __iter__
to add this step.
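To illustrate that last point, here's a minimal sketch of what such an
"after query" hook could look like. PostLoadQuery and run_extra_loads are
made-up names, not SQLAlchemy APIs; this assumes the legacy
Query.__iter__() code path, and the hook body is just a stub that records
which types it saw (a real version would group instances by subclass and
emit one extra query per subtype there):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Query, Session

try:
    # SQLAlchemy 1.4+ location
    from sqlalchemy.orm import declarative_base
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    a1 = Column(String)


class PostLoadQuery(Query):
    """Query subclass that runs a hook after all rows are loaded."""

    def __iter__(self):
        # materialize the full result first, so every instance is
        # present in the identity map before any extra loading happens
        results = list(super(PostLoadQuery, self).__iter__())
        self.run_extra_loads(results)
        return iter(results)

    def run_extra_loads(self, results):
        # a real implementation would group instances by subclass and
        # emit one query per subtype; this stub just records the types
        self.loaded_types = sorted({type(obj).__name__ for obj in results})


e = create_engine("sqlite://")
Base.metadata.create_all(e)

# query_cls makes the Session produce our subclass for every query
s = Session(e, query_cls=PostLoadQuery)
s.add_all([A(a1='x'), A(a1='y')])
s.commit()

q = s.query(A)
rows = list(q)      # __iter__ runs, then the post-load hook
print(q.loaded_types)
```

The point is only that __iter__ is the one choke point where you see the
full result list before it's handed back to the caller.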
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import event

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    a1 = Column(String)
    type = Column(String)

    __mapper_args__ = {'polymorphic_on': type}


class B1(A):
    __tablename__ = 'b1'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b1 = Column(String)
    b_data = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'b1'}


class B2(A):
    __tablename__ = 'b2'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b2 = Column(String)
    b_data = Column(String)
    cs = relationship("C", lazy='subquery')

    __mapper_args__ = {'polymorphic_identity': 'b2'}


class C(Base):
    __tablename__ = 'c'
    id = Column(Integer, primary_key=True)
    b2_id = Column(ForeignKey('b2.id'))


class B3(A):
    __tablename__ = 'b3'
    id = Column(Integer, ForeignKey('a.id'), primary_key=True)
    b3 = Column(String)
    b_data = Column(String)
    __mapper_args__ = {'polymorphic_identity': 'b3'}


def _loader_for_cls(target, context):
    orig_query = context.query

    target_cls = type(target)

    # if the query is already against this subclass, there's nothing
    # extra to load
    if orig_query.column_descriptions[0]['type'] is target_cls:
        return None

    # take the original query and change the entity to the subclass
    q = orig_query.with_entities(target_cls)

    # defer everything that's not PK / polymorphic_on from A.  this whole
    # bit is just to avoid all those extra columns
    to_defer = []
    mapper = inspect(target).mapper
    inherited = mapper.inherits
    while inherited is not None:
        for attr in inherited.column_attrs:
            if not attr.expression.primary_key and \
                    attr.expression is not inherited.polymorphic_on:
                to_defer.append(attr.key)
        for attr in inherited.relationships:
            to_defer.append(attr.key)
        inherited = inherited.inherits
    q = q.options(*[defer(k) for k in to_defer])

    # flag the new query so its own load events don't re-enter this
    # function
    q._attributes["load_extra_called"] = True

    # store this strong reference so recs don't get lost while
    # iterating
    return q.all()


@event.listens_for(A, "load", propagate=True)
def load_extra(target, context):
    if "load_extra_called" in context.attributes:
        return

    key = ('loader_by_cls', type(target))

    if key not in context.attributes:
        context.attributes[key] = _loader_for_cls(target, context)


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

s = Session(e)
s.add_all([
    B1(b_data='b11', b1='b1', a1='b11'),
    B2(b_data='b21', b2='b2', a1='b21', cs=[C(), C(), C()]),
    B3(b_data='b31', b3='b3', a1='b31'),
    B1(b_data='b12', b1='b1', a1='b12'),
    B1(b_data='b13', b1='b1', a1='b13'),
    B2(b_data='b22', b2='b2', a1='b22', cs=[C(), C()]),
    B3(b_data='b32', b3='b3', a1='b12'),
    B3(b_data='b33', b3='b3', a1='b33')
])
s.commit()

q = s.query(A).filter(A.a1.like('%2%'))

result = q.all()

print("----- no more SQL ----")
for b in result:
    if isinstance(b, B1):
        print(b.b1)
    elif isinstance(b, B2):
        print(b.cs)
        print(b.b2)
    elif isinstance(b, B3):
        print(b.b3)
On 04/04/2017 12:16 AM, da...@benchling.com wrote:
Hey Mike,
Looks like I spoke too soon -- a few more questions:
Using the example code you posted, we're actually seeing 4 additional
queries (one per result model), rather than the expected 3 (one per
result model type). If you print context.query inside load_extra, I
think it's clear why:
- the loader sequentially processes loaded models
(https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L75)
- on processing the first B2, it executes a query to load all B2's data
- when processing the results of this new query, it now processes the
second B2 record (under a new context)
- load_extra() gets called for the new B2 with a different context than
the first B2
- this now executes _loader_for_cls
Note that if you don't call _loader_for_cls in load_extra, load_extra is
called for each record using the correct context.
A related issue is that if you query directly for B2, it'll redundantly
execute _loader_for_cls.
I think we could solve both of these issues by doing this instead:
q = orig_query.with_entities(target_cls)
if q == orig_query:
return
Unfortunately, the == operator doesn't work for comparing queries. Do
you have a way to compare query equality, or alternatively have a
solution to both of these issues?
Could you also elaborate on what you meant by this comment?
# store this strong reference so recs don't get lost while
# iterating
Thanks again for all your help,
Damon
On Monday, April 3, 2017 at 6:42:17 PM UTC-7, da...@benchling.com wrote:
Thanks a ton for your help, Mike!
We played around with it and are pretty happy with your solution
using the load() event, so we'll be using that moving forward.
Damon
On Wednesday, March 29, 2017 at 2:40:39 PM UTC-7, Mike Bayer wrote:
I have a working version of both loading relationships via IN as well as
loading joined inheritance subclasses via IN, including your great idea
that extra eager loaders should continue to work for the subclass
loaders. I've only tested it with one scenario so far and both patches
have a long way to go re: tests and documentation.

The scratch test case is currently in the commit message for the second
patch, which illustrates a base class + subclass load where both classes
have an additional relationship. This is at
https://gerrit.sqlalchemy.org/#/c/359/; note this builds upon the
previous gerrit at https://gerrit.sqlalchemy.org/#/c/352/. The work at
the moment will likely fall down with any kind of surprises, but if you
wanted to start running it and finding those cases, that is always
helpful.

So far this looks promising as something that can be in 1.2 with perhaps
some "EXPERIMENTAL" warnings surrounding it, but overall 1.2 was lacking
a "killer ORM feature" so these two would be it.
On 03/23/2017 06:02 PM, mike bayer wrote:
>
>
> On 03/23/2017 02:40 PM, mike bayer wrote:
>>
>>
>> On 03/23/2017 12:53 PM, da...@benchling.com wrote:
>>> Hey Mike,
>>>
>>> Thanks for the quick response!
>>>
>>> For developers that are pretty familiar with the SQLAlchemy API, but
>>> not so much the internals, would implementing the subqueryloads to
>>> contribute to SA be a reasonable endeavor?  Could you ballpark how
>>> long it might take for us to do it?
>>
>> I haven't looked at what this would take, but it would be intricate
>> and also need a lot of tests that are not easy to write.  Like, if you
>> worked on it, you could probably get something working, but then that
>> probably wouldn't be how it really needs to be, because all kinds of
>> things that are simple for simple cases don't work with the vast
>> number of edge cases which we have.
>
> I've created an issue for both a new relationship loader and an
> inheritance loader at the same time, since they will use very similar
> paths, at https://bitbucket.org/zzzeek/sqlalchemy/issues/3944.  A POC
> for the relationship loader so far looks to be very simple (but then
> again all the loaders start out very simple...) and is at
> https://gerrit.sqlalchemy.org/352.  The same infrastructure and
> approach would also be used for the mapper inheritance loader, which
> would be enabled via a mapper()-level flag, as well as per-query using
> a new query option.
>
> I can't yet guarantee this will be a 1.2 thing; I'd have to get some
> more momentum going w/ test cases and all that.  In a 1.2 release it
> would also be a little bit experimental, as new loading styles usually
> have small issues coming up for months or years as people try more use
> cases.
>
>
>
>
>
>
>>
>> The routine that's loading the additional columns just for one
>> object at a time is here:
>>
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L635
>>
>> and then here for the bulk of it:
>>
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L2588
>>
>>
>>
>> But the much harder part would be how to work this step into the
>> loading infrastructure, which would be somewhere in
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/loading.py#L273,
>> which is a very intricate function with over a decade of constant
>> refactorings behind it, and I'd have to think pretty deeply for
>> awhile how best to do this.
>>
>> Not to mention that there's more than one way to do this query:
>> there's either re-using the criteria from the original query, or
>> there's injecting the primary key ids of the whole list of objects
>> into an IN clause after the fact.  The latter approach is probably
>> more efficient but wouldn't work for composite primary keys outside
>> of Postgresql.  As a built-in feature I'd want "IN" loading to be an
>> option at least.
>>
>>
>>
>>>
>>> Regarding STI and relationships, is there any way to do that but
>>> still get the benefits of JTI?  e.g. is there an easy way to resolve
>>> my_base_class_inst.subclass_prop as a proxy to the subclass?  We
>>> could roll our own using __getitem__ but it seems a little hairy.
>>
>> So proxying to a related item wouldn't be terrible; sure, you could
>> use __getitem__, or you could also add descriptors to the primary
>> class.  Adding the descriptors to match the "info" could be automated
>> as well (or even done in the other direction: add a special
>> descriptor to the main class == a column gets added to the related
>> class).
>>
>>
>> Doing the thing where you query() for all the related classes after
>> the fact yourself might not be that terrible.  You can use the load()
>> event, which receives the query context that has the query you need
>> already.  I guess it's time for a proof of concept; here it is.  You
>> can see at the end we load all the "bs" without any per-object load.
>>
>> from sqlalchemy import *
>> from sqlalchemy.orm import *
>> from sqlalchemy.ext.declarative import declarative_base
>> from sqlalchemy import event
>>
>> Base = declarative_base()
>>
>>
>> class A(Base):
>> __tablename__ = 'a'
>> id = Column(Integer, primary_key=True)
>> a1 = Column(String)
>> type = Column(String)
>>
>> __mapper_args__ = {'polymorphic_on': type}
>>
>>
>> class B1(A):
>> __tablename__ = 'b1'
>> id = Column(Integer, ForeignKey('a.id'), primary_key=True)
>> b1 = Column(String)
>> b_data = Column(String)
>> __mapper_args__ = {'polymorphic_identity': 'b1'}
>>
>>
>> class B2(A):
>> __tablename__ = 'b2'
>> id = Column(Integer, ForeignKey('a.id'), primary_key=True)
>> b2 = Column(String)
>> b_data = Column(String)
>> cs = relationship("C", lazy='subquery')
>>
>> __mapper_args__ = {'polymorphic_identity': 'b2'}
>>
>>
>> class C(Base):
>> __tablename__ = 'c'
>> id = Column(Integer, primary_key=True)
>> b2_id = Column(ForeignKey('b2.id'))
>>
>>
>> class B3(A):
>> __tablename__ = 'b3'
>> id = Column(Integer, ForeignKey('a.id'), primary_key=True)
>> b3 = Column(String)
>> b_data = Column(String)
>> __mapper_args__ = {'polymorphic_identity': 'b3'}
>>
>>
>> def _loader_for_cls(target, context):
>> orig_query = context.query
>>
>> target_cls = type(target)
>>
>> # take the original query and change the entity to the subclass
>> q = orig_query.with_entities(target_cls)
>>
>> # defer everything that's not PK / polymorphic_on from A.  this
>> # whole bit is just to avoid all those extra columns
>> to_defer = []
>> mapper = inspect(target).mapper
>> inherited = mapper.inherits
>> while inherited is not None:
>> for attr in inherited.column_attrs:
>> if not attr.expression.primary_key and \
>> attr.expression is not inherited.polymorphic_on:
>> to_defer.append(attr.key)
>> for attr in inherited.relationships:
>> to_defer.append(attr.key)
>> inherited = inherited.inherits
>> q = q.options(*[defer(k) for k in to_defer])
>>
>> # store this strong reference so recs don't get lost while
>> # iterating
>> return q.all()
>>
>>
>> @event.listens_for(A, "load", propagate=True)
>> def load_extra(target, context):
>> key = ('loader_by_cls', type(target))
>>
>> if key not in context.attributes:
>> context.attributes[key] = _loader_for_cls(target, context)
>>
>>
>> e = create_engine("sqlite://", echo=True)
>> Base.metadata.create_all(e)
>>
>>
>> s = Session(e)
>> s.add_all([
>> B1(b_data='b11', b1='b1', a1='b11'),
>> B2(b_data='b21', b2='b2', a1='b21', cs=[C(), C(), C()]),
>> B3(b_data='b31', b3='b3', a1='b31'),
>> B1(b_data='b12', b1='b1', a1='b12'),
>> B1(b_data='b13', b1='b1', a1='b13'),
>> B2(b_data='b22', b2='b2', a1='b22', cs=[C(), C()]),
>> B3(b_data='b32', b3='b3', a1='b12'),
>> B3(b_data='b33', b3='b3', a1='b33')
>> ])
>> s.commit()
>>
>>
>> q = s.query(A).filter(A.a1.like('%2%'))
>>
>> result = q.all()
>>
>> print "----- no more SQL ----"
>> for b in result:
>> if isinstance(b, B1):
>> print b.b1
>> elif isinstance(b, B2):
>> print b.cs
>> print b.b2
>> elif isinstance(b, B3):
>> print b.b3
>>
>>
>>
>>
>>
>>>
>>>
>>> Thanks again,
>>> Damon
>>>
>>> On Wednesday, March 22, 2017 at 3:59:45 PM UTC-7, Mike Bayer wrote:
>>>
>>>
>>>
>>> On 03/22/2017 02:17 PM, da...@benchling.com wrote:
>>> > Hey all,
>>> >
>>> > We were wondering if you had any advice on having a large (~10)
>>> > number of polymorphic subclasses for a single base class.  Using
>>> > with_polymorphic: '*' causes SQLAlchemy to joinedload all
>>> > subclasses like this:
>>> >
>>> > SELECT ...
>>> > FROM base_table
>>> > LEFT OUTER JOIN sub_table_1 ON base_table.id = sub_table_1.id
>>> > LEFT OUTER JOIN sub_table_2 ON base_table.id = sub_table_2.id
>>> > ...
>>> >
>>> > Postgres buckles under too many joins, and these queries start
>>> > taking a really long time.
>>> >
>>> > One other note is that for most of our queries, only a few of
>>> > these sub-tables are actually needed, so most of the joins are
>>> > wasted.  Unfortunately, ahead of time, we don't know which tables
>>> > will be needed -- we're relying on the discriminator.
>>> >
>>> > Ideally, we'd be able to specify that the ORM should subqueryload
>>> > the subclasses (and only execute subqueries on the types that are
>>> > present).  This would have to happen both when querying the base
>>> > table and when accessing relationships.  We'd want it to execute
>>> > a query on the base table, then execute one query for each
>>> > present subclass.
>>> >
>>> > Another solution might be to use some kind of hook that
>>> >
>>> > - is executed after a query returns with results (or after a list
>>> >   of models are added to the session?)
>>> > - groups the models by type and runs its own subqueries to load
>>> >   the data
>>> >
>>> > Any help here is greatly appreciated!
>>>
>>>
>>> The purpose of with_polymorphic is more about being able to filter
>>> on multiple classes at the same time, which is why it uses joins,
>>> but these don't scale to many subclasses.  Adding a subquery load
>>> for the related tables would be something that the ORM can someday
>>> have as a feature, but it would need a lot of tests to ensure it's
>>> working as advertised.
>>>
>>> There's a lot of ways to get those other tables loaded, but none
>>> of them look that great.  Turning off with_polymorphic(), one
>>> approach is to collect all the distinct types and identifiers from
>>> your query result, then do a separate query for each subtype:
>>>
>>>     result = session.query(BaseClass).filter(...).all()
>>>
>>>     types = sorted(
>>>         [(type(obj), obj.id) for obj in result],
>>>         key=lambda t: t[0].__name__)
>>>
>>>     for cls, group in itertools.groupby(types, key=lambda t: t[0]):
>>>         ids = [id_ for _, id_ in group]
>>>         session.query(cls).filter(cls.id.in_(ids)).all()
>>>
>>> That will emit a query with an INNER JOIN for each class and will
>>> populate the remaining records in the identity map.  The columns
>>> that are already loaded are not re-accessed, though the DBAPI will
>>> still send them over the network to the cursor.  You can try
>>> limiting the columns you query for in each statement as well by
>>> using the defer() option.
>>>
>>> Another way is to use with_polymorphic() but to provide a different
>>> kind of SQL statement, like a polymorphic_union().  This would be a
>>> UNION of statements that each have an inner join.  The resulting
>>> SQL is a beast, but it at least isn't using those left outer joins.
>>> I think you can probably use sqlalchemy.orm.util.polymorphic_union()
>>> directly to get this UNION statement built up automatically.
>>>
>>> Still another way is to reorganize the mappings to use single-table
>>> inheritance and relationship() to link out to the related table,
>>> then use the normal "subqueryload" feature to load them as
>>> relationships.  Even though this way is ugly, I might use this
>>> (short of implementing the related-table subqueryload feature) just
>>> to make things simple.
>>>
>>>
>>> Definitely a feature that should be added, but that's not an
>>> immediate solution.
>>>
>>> >
>>> >
>>> > Thanks,
>>> > Damon
>>> >
>>> > --
>>> > SQLAlchemy -
>>> > The Python SQL Toolkit and Object Relational Mapper
>>> >
>>> > http://www.sqlalchemy.org/
>>> >
>>> > To post example code, please provide an MCVE: Minimal, Complete,
>>> > and Verifiable Example.  See http://stackoverflow.com/help/mcve
>>> > for a full description.
>>> > ---
>>> > You received this message because you are subscribed to the
>>> > Google Groups "sqlalchemy" group.
>>> > To unsubscribe from this group and stop receiving emails from it,
>>> > send an email to sqlalchemy+...@googlegroups.com.
>>> > To post to this group, send email to sqlal...@googlegroups.com.
>>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>>> > For more options, visit https://groups.google.com/d/optout.
>>>