just as a note, don't access .c._all_columns, just iterate over selectable.c
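A minimal sketch of that public iteration, using ad-hoc Core constructs (the table and column names below are illustrative, not from the original query):

```python
# Sketch: iterate the public ``.c`` collection of a selectable rather than
# reaching into the private ``.c._all_columns`` attribute.
# Table/column names below are made up for illustration.
from sqlalchemy import table, column

post_metadata = table("post_metadata", column("post_id"), column("metadata_value"))
pm = post_metadata.alias("pm")

# iterating the alias's column collection yields each column in order
names = [c.name for c in pm.c]
print(names)  # ['post_id', 'metadata_value']
```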
On Thu, Feb 7, 2019 at 10:45 AM Ian Miller <irmille...@gmail.com> wrote:
>
> Hello Mike,
>
> Thank you for your response! I have now constructed the ORM
> implementation that you suggested. Here's the code:
>
>     def _rebuild_non_interval_query_for_group_by(self, session, query):
>         from sqlalchemy import table, column, select
>         from sqlalchemy.orm import aliased
>         from collections import defaultdict, OrderedDict
>
>         post_metadata = table(
>             "post_metadata", column("post_id"), column("metadata_value")
>         )
>         campaign_metadata = table(
>             "campaign_metadata", column("campaign_id"), column("metadata_value")
>         )
>         asset_metadata = table(
>             "asset_metadata", column("asset_id"), column("metadata_value")
>         )
>
>         vw_asset = table("vw_asset", column("id"))
>         vw_campaign = table("vw_campaign", column("id"))
>         vw_post = table("vw_post", column("id"))
>
>         METADATA_PRIMARY_TABLE_MAP = {
>             asset_metadata.name: vw_asset,
>             campaign_metadata.name: vw_campaign,
>             post_metadata.name: vw_post,
>         }
>
>         METADATA_NAME_TABLE_MAP = {
>             asset_metadata.name: asset_metadata,
>             campaign_metadata.name: campaign_metadata,
>             post_metadata.name: post_metadata,
>         }
>
>         primary_tables = set()
>         metadata_columns_count = defaultdict(int)
>         metadata_alias = OrderedDict()
>         columns = []
>         for c in query.c._all_columns:
>             if c.name == "metadata_value":
>                 parent_column = list(c.base_columns)[0]
>                 table = parent_column.table
>                 primary_tables.add(METADATA_PRIMARY_TABLE_MAP[table.name])
>                 metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] += 1
>                 alias_number = metadata_columns_count[
>                     METADATA_NAME_TABLE_MAP[table.name]
>                 ]
>                 alias_name = "{}_{}".format(table.name, alias_number)
>                 alias = aliased(parent_column.table, alias_name)
>                 metadata_alias[alias_name] = alias
>                 column = alias.c.metadata_value.label(
>                     "{}_{}_{}".format(table.name, alias_number, "metadata_value")
>                 )
>                 columns.append(column)
>             else:
>                 columns.append(c)
>
>         # start constructing query
>         non_interval_query = session.query(*columns).select_from(*primary_tables)
>
>         for alias_name, alias in metadata_alias.items():
>             object_type = self._get_object_type_from_metadata_name(
>                 alias.original.name
>             )
>             non_interval_query = non_interval_query.join(
>                 alias,
>                 getattr(alias.c, "{}_id".format(object_type))
>                 == METADATA_PRIMARY_TABLE_MAP[alias.original.name].c.id,
>             )
>
>         non_interval_query = non_interval_query.subquery("non_interval_query")
>
>         return non_interval_query
>
> The "metadata_alias" values are [('post_metadata_1', alias),
> ('post_metadata_2', alias)] - the aliases correspond to the
> post_metadata_1 and post_metadata_2 aliases in your example. However,
> when I reference these in the join, the aliased table names are not
> "post_metadata_1" or "post_metadata_2" - they're "post_metadata_3" and
> "post_metadata_4". I'm unable to figure out why a new join is seemingly
> created instead of one referencing the aliased tables that were passed
> in.
>
> Here's the query that the above generates:
>
>     SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
>            post_metadata_2.metadata_value AS post_metadata_2_metadata_value,
>            non_interval_query.created_at,
>            non_interval_query.coalesce_1
>     FROM (SELECT post_metadata_3.metadata_value AS metadata_value,
>                  post_metadata_4.metadata_value AS metadata_value,
>                  vw_post.created_at AS created_at,
>                  coalesce(count(DISTINCT vw_post.id), :coalesce_2) AS coalesce_1
>           FROM vw_post
>           JOIN post_metadata AS post_metadata_3
>             ON post_metadata_3.post_id = vw_post.id
>           JOIN post_metadata AS post_metadata_4
>             ON post_metadata_4.post_id = vw_post.id
>           WHERE post_metadata_3.metadata_value IN (:metadata_value_1, :metadata_value_2)
>             AND post_metadata_4.metadata_value IN (:metadata_value_3, :metadata_value_4)
>             AND vw_post.created_at >= :created_at_1
>             AND vw_post.created_at <= :created_at_2
>             AND post_metadata_3.schema_uid = :schema_uid_1
>             AND post_metadata_3.metadata_name = :metadata_name_1
>             AND post_metadata_4.schema_uid = :schema_uid_2
>             AND post_metadata_4.metadata_name = :metadata_name_2
>             AND vw_post.license_id IN (:license_id_1, :license_id_2)
>           GROUP BY vw_post.created_at, post_metadata_3.metadata_value,
>                    post_metadata_4.metadata_value, vw_post.created_at
>          ) AS non_interval_query, vw_post
>     JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = vw_post.id
>     JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = vw_post.id;
>
> On Tuesday, February 5, 2019 at 11:51:25 AM UTC-5, Ian Miller wrote:
> >
> > Hello all -
> >
> > I am relatively new to using SQLAlchemy for more complex use cases. I
> > am in the process of creating a time series query, but I am unable to
> > reference a column by its alias at the top level of the query.
> >
> > This is the query that SQLAlchemy is currently generating:
> >
> >     SELECT non_interval_query.metadata_value AS non_interval_query_metadata_value,
> >            coalesce(sum(non_interval_query.coalesce_2), 0) AS coalesce_1,
> >            timestamp
> >     FROM (SELECT generate_series(
> >                      date_trunc('day', date('2019-01-06T00:00:00+00:00')),
> >                      date_trunc('day', date('2019-01-12T00:00:00+00:00')),
> >                      '1 day') AS timestamp) AS time_series
> >     LEFT OUTER JOIN
> >          (SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
> >                  post_metadata_2.metadata_value AS post_metadata_2_metadata_value,
> >                  vw_post.created_at AS vw_post_created_at,
> >                  coalesce(count(DISTINCT vw_post.id), 0) AS coalesce_1
> >           FROM vw_post
> >           JOIN post_metadata AS post_metadata_1
> >             ON post_metadata_1.post_id = vw_post.id
> >           JOIN post_metadata AS post_metadata_2
> >             ON post_metadata_2.post_id = vw_post.id
> >           WHERE post_metadata_1.metadata_value IN ('<metadata_values>')
> >             AND post_metadata_2.metadata_value IN ('<metadata_value>')
> >             AND vw_post.created_at >= '2019-01-06T00:00:00+00:00'
> >             AND vw_post.created_at <= '2019-01-12T00:00:00+00:00'
> >             AND post_metadata_1.schema_uid = '<schema_uid>'
> >             AND post_metadata_1.metadata_name = '<metadata_name>'
> >             AND post_metadata_2.schema_uid = '<schema_uid>'
> >             AND post_metadata_2.metadata_name = '<metadata_name>'
> >             AND vw_post.license_id IN (<license_ids>)
> >           GROUP BY vw_post.created_at,
> >                    post_metadata_1.metadata_value,
> >                    post_metadata_2.metadata_value,
> >                    vw_post.created_at
> >          ) AS non_interval_query
> >       ON date_trunc('day', created_at) = timestamp;
> >
> > You'll notice that "non_interval_query.metadata_value AS
> > non_interval_query_metadata_value" at the top of the query is
> > ambiguous, because of the two "metadata_value" columns selected in the
> > "non_interval_query" subquery. What I'm trying to do is have two
> > selects at the top level - one for
> > "non_interval_query.post_metadata_1_metadata_value" and one for
> > "non_interval_query.post_metadata_2_metadata_value".
> >
> > For reference, here is the code used to generate the above query:
> >
> >     def apply_date_group_by(self, session, query, range_gb_params):
> >         field_name = self.db.get("column")
> >         model = self._object.get("model")
> >
> >         if not field_name or not model:
> >             raise ValueError("Invalid date group by")
> >
> >         gb_column = self._build_column()
> >         interval = range_gb_params.get("interval")
> >         interval_type = range_gb_params.get("interval_type")
> >
> >         time_series = func.generate_series(
> >             func.date_trunc(interval_type, func.date(range_gb_params["start"])),
> >             func.date_trunc(interval_type, func.date(range_gb_params["end"])),
> >             interval,
> >         ).label("timestamp")
> >
> >         ts_column = column("timestamp")
> >
> >         time_series_query = session.query(time_series).subquery("time_series")
> >         non_interval_query = query.subquery("non_interval_query")
> >         # have to replace the original gb_column with the 'timestamp' column
> >         # in order to properly merge the dataset into the time series dataset
> >         non_gb_columns, gbs = self._prepare_non_gb_columns(
> >             ts_column, gb_column, non_interval_query.columns
> >         )
> >
> >         # construct query with correct position passed in from `range_gb_params`
> >         query_position = range_gb_params.get("query_index_position", 0)
> >         non_gb_columns.insert(query_position, ts_column)
> >
> >         date_gb_query = session.query(*non_gb_columns).select_from(
> >             time_series_query.outerjoin(
> >                 non_interval_query,
> >                 func.date_trunc(interval_type, column(field_name)) == ts_column,
> >             )
> >         )
> >
> >         if gbs:
> >             date_gb_query = date_gb_query.group_by(*gbs)
> >
> >         return date_gb_query.order_by(ts_column)
> >
> > Any help on this would be greatly appreciated!
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
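On the original ambiguity question, the direction the thread converges on can be sketched in standalone Core terms: give each duplicate column a distinct label inside the subquery, then reference the labeled columns through the subquery's `.c` collection at the outer level. This sketch assumes SQLAlchemy 1.4+ `select()` style, and the table/column names are illustrative only:

```python
# Sketch: label duplicate columns inside a subquery, then reference them
# unambiguously from the outer query via the subquery's .c collection.
# Assumes SQLAlchemy 1.4+; names are illustrative, not the original schema.
from sqlalchemy import table, column, select

post_metadata = table("post_metadata", column("post_id"), column("metadata_value"))
pm1 = post_metadata.alias("post_metadata_1")
pm2 = post_metadata.alias("post_metadata_2")

inner = (
    select(
        pm1.c.metadata_value.label("post_metadata_1_metadata_value"),
        pm2.c.metadata_value.label("post_metadata_2_metadata_value"),
    )
    .select_from(pm1.join(pm2, pm1.c.post_id == pm2.c.post_id))
    .subquery("non_interval_query")
)

# The outer query names each column by its label, so nothing is ambiguous:
outer = select(
    inner.c.post_metadata_1_metadata_value,
    inner.c.post_metadata_2_metadata_value,
)
print(outer)
```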