Thanks for the detailed explanation. I think I should explain why I
run into this problem, as it's in line with emitting a warning on this
use case.

1. I wrote the query without any kind of alias in SQLAlchemy.
2. Received an error from PostgreSQL "subquery in FROM must have an
alias", so I added it in both places "to be safe". One for the
subquery, one for the join.
3. After this the query worked - so I confusingly thought I did the
right thing - but became unusably slow.

Then I tried to debug step by step and arrived that the adding of
alias to join() made it into a very different query which happened to
be slow.

I think I'm not the only user who adds aliases "everywhere" once
receiving that alias is needed error. I thought that the j = ... is a
subquery as it's in a select_from() line.

So yes, in my case adding those warnings would be very helpful,
directing me in the right direction.


Zsolt

On Fri, 29 Nov 2019 at 04:50, Mike Bayer <mike...@zzzcomputing.com> wrote:
>
>
>
> On Thu, Nov 28, 2019, at 4:31 PM, Zsolt Ero wrote:
>
> I have the following query:
>
> min_values = (
>         sa.select(
>             [
>                 table.c.region_id,
>                 table.c.operator,
>                 table.c.day_str,
>                 
> sa.func.min(table.c.processed_time).label('min_processed_time'),
>             ]
>         )
>         .group_by(table.c.region_id, table.c.operator, table.c.day_str)
>         .order_by(table.c.region_id, table.c.operator, table.c.day_str)
>     ).alias('mv')
>
>     j = min_values.join(
>         config_mds,
>         sa.and_(
>             min_values.c.region_id == config_mds.c.region_id,
>             min_values.c.operator == config_mds.c.operator,
>         ),
>     ).alias('j')
>
>     stmt = (
>         sa.select([min_values.c.region_id, min_values.c.operator, 
> min_values.c.day_str])
>         .select_from(j)
>         .where(min_values.c.min_processed_time < 
> config_mds.c.last_reprocess_time)
>     )
>
>
> Which works well, as long as I do not add the .alias('j') part. If I add it, 
> it generates a wildly different SQL code, which doesn't finish in reasonable 
> time (I terminated it after a few minutes whereas the first one finishes in a 
> few seconds.
>
>
> sure, the "j" is shorthand for calling a SELECT of the join, then aliasing 
> it.   you can no longer refer to 'min_values" in the subsequent query because 
> that means you want to add it to the FROM clause separately from the "j" and 
> it will go poorly.
>
>
>
> Here is the generated SQL without alias('j'):
>
> SELECT
>   mv.region_id,
>   mv.operator,
>   mv.day_str
> FROM
>   (
>     SELECT
>       status_changes.region_id AS region_id,
>       status_changes.operator AS OPERATOR,
>       status_changes.day_str AS day_str,
>       min(status_changes.processed_time) AS min_processed_time
>     FROM
>       status_changes
>     GROUP BY
>       status_changes.region_id,
>       status_changes.operator,
>       status_changes.day_str
>     ORDER BY
>       status_changes.region_id,
>       status_changes.operator,
>       status_changes.day_str
>   ) AS mv
>   JOIN config_mds ON mv.region_id = config_mds.region_id
>   AND mv.operator = config_mds.operator
> WHERE
>   config_mds.last_reprocess_time > mv.min_processed_time
>
>
> And here is with the added alias('j'), which doesn't terminates:
>
> SELECT
>   mv.region_id,
>   mv.operator,
>   mv.day_str
> FROM
>   (
>     SELECT
>       status_changes.region_id AS region_id,
>       status_changes.operator AS OPERATOR,
>       status_changes.day_str AS day_str,
>       min(status_changes.processed_time) AS min_processed_time
>     FROM
>       status_changes
>     GROUP BY
>       status_changes.region_id,
>       status_changes.operator,
>       status_changes.day_str
>     ORDER BY
>       status_changes.region_id,
>       status_changes.operator,
>       status_changes.day_str
>   ) AS mv,
>   config_mds,
>   (
>     SELECT
>       mv.region_id AS mv_region_id,
>       mv.operator AS mv_operator,
>       mv.day_str AS mv_day_str,
>       mv.min_processed_time AS mv_min_processed_time,
>       config_mds.region_id AS config_mds_region_id,
>       config_mds.operator AS config_mds_operator,
>       config_mds.disable_download AS config_mds_disable_download,
>       config_mds.trips_url AS config_mds_trips_url,
>       config_mds.status_changes_url AS config_mds_status_changes_url,
>       config_mds.url_headers AS config_mds_url_headers,
>       config_mds.login_data AS config_mds_login_data,
>       config_mds.use_different_region AS config_mds_use_different_region,
>       config_mds.comment AS config_mds_comment,
>       config_mds.last_reprocess_time AS config_mds_last_reprocess_time,
>       config_mds.token_cache AS config_mds_token_cache
>     FROM
>       (
>         SELECT
>           status_changes.region_id AS region_id,
>           status_changes.operator AS OPERATOR,
>           status_changes.day_str AS day_str,
>           min(status_changes.processed_time) AS min_processed_time
>         FROM
>           status_changes
>         GROUP BY
>           status_changes.region_id,
>           status_changes.operator,
>           status_changes.day_str
>         ORDER BY
>           status_changes.region_id,
>           status_changes.operator,
>           status_changes.day_str
>       ) AS mv
>       JOIN config_mds ON mv.region_id = config_mds.region_id
>       AND mv.operator = config_mds.operator
>   ) AS j
> WHERE
>   config_mds.last_reprocess_time > mv.min_processed_time;
>
>
>
> This behaviour confuses me a bit. Why would adding an alias make a difference 
> to the generated query?
>
>
>
> you have:
>
> min_values = "(SELECT .... FROM ... ) AS mv"
>
> select([min_values.c.x]) =  "SELECT mv.x FROM (SELECT ... FROM ..) AS mv"
>
>
> then you say, "j = min_values.join(config_mds, ...).alias("j")"
>
> this is the same as saying:
>
> j = select([min_values.join(config_mds, ...)]).alias("j")
>
> an "alias" is only against a SELECT or a table;  there is no "SELECT * FROM 
> (x JOIN y) AS j" syntax.     Did a quick test and apparently Postgresql 
> allows that syntax but it fails on MySQL.   SQLite seems to support it but 
> that's a newer thing as SQLite didn't support nested JOINs at all some years 
> ago.   Adding the "AS <name>" to the plain parenthesies like that It's not 
> something I've seen in SQL before and SQLAlchemy has no concept of that right 
> now.
>
> so given that join().alias() is shorthand for select([join()]).alias(), this 
> means:
>
> select([j.c.x]) =   "SELECT j.x FROM (SELECT mv.*, config_mds.* FROM (SELECT 
> ... FROM ..) AS mv JOIN config_mds ON ...) AS j"
>
> In SQLAlchemy 1.4 there's been a big scale-back of how things like alias() 
> and join() can be used in order to decrease this kind of confusion, and I was 
> sort of hoping this particular use was also deprecated but apparently it 
> isn't.   so https://github.com/sqlalchemy/sqlalchemy/issues/5010 is added and 
> the join().alias() usage will emit a deprecation warning in 1.4 because it 
> does a thing that by today's standards is surprising.
>
> anyway, if you mix the join() and the join().alias() together, like:
>
> select([min_values.c.x]).select_from(j)...
>
> this is like:
>
> "SELECT mv.x FROM (SELECT ... FROM ..) AS mv, (SELECT mv.*, config_mds.* FROM 
> (SELECT ... FROM ..) AS mv JOIN config_mds ON ...) AS j"
>
> that is, saying select_from(j) doesn't mean that's the only thing you are 
> selecting from, it also has to select from min_values directly.
>
> TLDR join().alias() is doing a large implicit thing based on older 
> assumptions, when parenthesized "(a JOIN b)" clauses were not accepted by 
> SQLite and possibly others and SQLAlchemy always used SELECT... in order to 
> group joins.     you can group joins in parentheses now but you still can't 
> give them an alias() without a SELECT implied.
>
>
>
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/bbf11796-b9ed-4b6f-9188-c35585dfc01d%40googlegroups.com.
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to a topic in the Google 
> Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit 
> https://groups.google.com/d/topic/sqlalchemy/nPKcBGWWY3s/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/f26cf2af-cb16-4488-b75b-baca3c457e00%40www.fastmail.com.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/CAKw-smA-O2CPJwTys6r2KE0fFV%2BdnNL4jEgnB66VcGbJKO%2BObw%40mail.gmail.com.

Reply via email to