Re: [sqlalchemy] group_by and lazy=False relationship problem

Robert Tasarz Sat, 01 Feb 2014 07:32:46 -0800

On 01/31/2014 10:56 PM, Michael Bayer wrote:
> On Jan 31, 2014, at 2:49 PM, Robert Tasarz <robert.tas...@gmail.com> wrote:
>> Hi,
>>
>> I'm almost sure this is a bug, but maybe I'm missing something obvious. 
>> I've tested it with Python 3.3, SQLAlchemy 0.9.1,  PostgreSQL 9.3 and 
>> reduced the issue to the following code:
>>
>> Here's most relevant part of the exception:
>> Traceback (most recent call last):
>>   File 
>> "/srv/websites/sika/local/lib/python3.3/dist-packages/sqlalchemy/engine/base.py",
>>  line 867, in _execute_context
>>     context)
>>   File 
>> "/srv/websites/sika/local/lib/python3.3/dist-packages/sqlalchemy/engine/default.py",
>>  line 388, in do_execute
>>     cursor.execute(statement, parameters)
>> psycopg2.ProgrammingError: column "a_1.id" must appear in the GROUP BY 
>> clause or be used in an aggregate function
>> LINE 1: SELECT p.id AS p_id, sum(i.value) AS sum1, a_1.id AS a_1_id,…
> When joined eager loading is used, if LIMIT is also applied as you have in 
> that [:10], SQLAlchemy wraps the query to be limited inside of a subquery, so 
> that the LEFT OUTER JOIN for the eager loading can safely load all related 
> rows without being subject to the LIMIT.   This wrapping doesn’t occur when 
> GROUP BY is used - GROUP BY is not usually used in conjunction with loads of 
> a full entity, as this is typically inefficient - it is usually used with a 
> query that is only loading individual columns, and then if the query overall 
> is to return entities, a JOIN against the GROUP BY as a subquery is used (see 
> http://weblogs.sqlteam.com/jeffs/archive/2005/12/14/8546.aspx for a 
> description of this).


Yeah, I understand the process behind, as despite using SQLAlchemy
for quite some time already, I'm still mostly "thinking in SQL"
first. This time I've been building query in interactive shell and
in such cases I'm usually using something like [:4] to roughly see
if a query that SA produces does what I'm expecting. So it surprised
me that when I've finally got what I needed calling .all() on it
thrown an exception. That's why I've started digging through the
internet, but found nothing relevant, so posted here.
Regarding article you've linked. That's all nice and true, but when
I'm using ORM I'm expecting some performance penalty sometimes in
exchange for short and clean syntax abstracting away inner details.
This time for my particular case it is around 30%-40% which gives
~60ms longer execution time, so I can live with it. Now let's
compare code snippets based on my example classes from the first
post (assuming no eager load and no exception thrown).

Original:
sumq = func.sum(I.value).label('sum1')
result = sess.query(P,
sumq).outerjoin(P.items).group_by(P).order_by(sumq).all()

Optimized:
subq = db.query(I._elem,
func.sum(I.value).label('sum1')).group_by(I._elem).subquery()
result = sess.query(P,
subq.c.sum1).outerjoin(subq).order_by(subq.c.sum1).all()

As you can see the second case is quite noticeably longer and more
complicated, moreover it is "leaking abstraction" as it needs to use
I._elem which is an implementation detail how relations are built in
SQL (can it be rewritten without it?). And it is after all really
simple query. In comparison, the two queries from linked article are
similar in size and the second (preferred) one is IMO more clear
what developer wants to achieve - of course mostly because plain SQL
is much more verbose in the first place. And I'm writing it to show
that IMO, unlike in SQL, in SA first use case has some advantages in
not performance critical scenarios.

> So in that sense, you’re getting that subquery behavior out of the box due to 
> the LIMIT, but with plain all(), this isn’t applied.  you can call 
> from_self() to produce the self-wrapping effect:
>
> sess.query(P,sumq).outerjoin(P.items).group_by(P).order_by(sumq).from_self().all()

Thanks, thats the method I've been looking for :). I even thought
about something like
sess.query(sess.query(P,....)[...].subquery()).all() but thought
it's too ugly and certainly something cleaner should be possible.
Don't know how I've missed .from_self() when scanning through all
methods of the query object for the dozen time.

> perhaps the presence of GROUP BY should be added to the list of things that 
> cause the automatic wrapping with joined eager loading to occur, though the 
> current behavior has been this way for nearly 8 years and nobody’s asked for 
> it before.

I guess it should, after all a bug that unveils itself once every
eight years is still a bug ;). Should I file a bug report?

regards,
  Robert Tasarz

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] group_by and lazy=False relationship problem

Reply via email to