Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query

Mike Bayer Thu, 07 Feb 2019 14:14:41 -0800

On Thu, Feb 7, 2019 at 2:49 PM Ian Miller <irmille...@gmail.com> wrote:
>
> The reason why I iterated over .c._all_columns was because it shows the 
> references to the 2 "metadata_value" columns, whereas selectable.c only 
> references 1.



so that's a sign that you need to apply labels to these columns
because they are conflicting.   with select(), call
select().apply_labels() so that these two columns are prefixed with
their table name in a label.


>
> For example, selectable.c shows the following:
>
> (Pdb++) list(query.c)
> [Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), 
> Column('created_at', TIMESTAMP(), table=<non_interval_query>), 
> <sqlalchemy.sql.elements.ColumnClause at 0x7f15e1f6fef0; %(139731962070520 
> coalesce)s>]Enter code here...
>
> selectable.c._all_columns shows the following:
>
> (Pdb++) query.c._all_columns
> [Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), 
> Column('metadata_value', VARCHAR(length=255), table=<non_interval_query>), 
> Column('created_at', TIMESTAMP(), table=<non_interval_query>), 
> <sqlalchemy.sql.elements.ColumnClause at 0x7f15e1f6fef0; %(139731962070520 
> coalesce)s>]Enter code here...
>
>
>
>
> On Thursday, February 7, 2019 at 12:53:51 PM UTC-5, Mike Bayer wrote:
>>
>> just as a note, don't access .c._all_columns, just iterate over selectable.c
>>
>> On Thu, Feb 7, 2019 at 10:45 AM Ian Miller <irmil...@gmail.com> wrote:
>> >
>> > Hello Mike,
>> >
>> > Thank you for your response! I have currently constructed the ORM 
>> > implementation that you suggested in your response. Here's the code:
>> >
>> > def _rebuild_non_interval_query_for_group_by(self, session, query):
>> >     from sqlalchemy import table, column, select
>> >     from sqlalchemy.orm import aliased
>> >     from collections import defaultdict, OrderedDict
>> >
>> >     post_metadata = table(
>> >         "post_metadata", column("post_id"), column("metadata_value")
>> >     )
>> >     campaign_metadata = table(
>> >         "campaign_metadata", column("campaign_id"), 
>> > column("metadata_value")
>> >     )
>> >     asset_metadata = table(
>> >         "asset_metadata", column("asset_id"), column("metadata_value")
>> >     )
>> >
>> >     vw_asset = table("vw_asset", column("id"))
>> >     vw_campaign = table("vw_campaign", column("id"))
>> >     vw_post = table("vw_post", column("id"))
>> >
>> >     METADATA_PRIMARY_TABLE_MAP = {
>> >         asset_metadata.name: vw_asset,
>> >         campaign_metadata.name: vw_campaign,
>> >         post_metadata.name: vw_post,
>> >     }
>> >
>> >     METADATA_NAME_TABLE_MAP = {
>> >         asset_metadata.name: asset_metadata,
>> >         campaign_metadata.name: campaign_metadata,
>> >         post_metadata.name: post_metadata,
>> >     }
>> >
>> >     primary_tables = set()
>> >     metadata_columns_count = defaultdict(int)
>> >     metadata_alias = OrderedDict()
>> >     columns = []
>> >     for c in query.c._all_columns:
>> >         if c.name == "metadata_value":
>> >             parent_column = list(c.base_columns)[0]
>> >             table = parent_column.table
>> >             primary_tables.add(METADATA_PRIMARY_TABLE_MAP[table.name])
>> >             metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] += 
>> > 1
>> >             alias_number = 
>> > metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]]
>> >             alias_name = "{}_{}".format(table.name, alias_number)
>> >             alias = aliased(parent_column.table, alias_name)
>> >             metadata_alias[alias_name] = alias
>> >             column = alias.c.metadata_value.label(
>> >                 "{}_{}_{}".format(table.name, alias_number, 
>> > "metadata_value")
>> >             )
>> >             columns.append(column)
>> >         else:
>> >             columns.append(c)
>> >
>> >     # start constructing query
>> >     non_interval_query = 
>> > session.query(*columns).select_from(*primary_tables)
>> >
>> >     for alias_name, alias in metadata_alias.items():
>> >         object_type = 
>> > self._get_object_type_from_metadata_name(alias.original.name)
>> >         non_interval_query = (
>> >             non_interval_query
>> >             .join(
>> >                 alias,
>> >                 getattr(alias.c, "{}_id".format(object_type)) == 
>> > METADATA_PRIMARY_TABLE_MAP[alias.original.name].c.id
>> >             )
>> >         )
>> >
>> >     non_interval_query = non_interval_query.subquery("non_interval_query")
>> >
>> >     return non_interval_query
>> >
>> >
>> >
>> > The "metadata_alias" values are [('post_metadata_1", alias), 
>> > ('post_metadata_2', alias)] - the alias correspond to the post_metadata_1 
>> > and post_metadata_2 alias in your example. However, when I reference these 
>> > in the join, the aliased table names are not "post_metadata_1" or 
>> > "post_metadata_2" - they're "post_metadata_3" and "post_metadata_4". I'm 
>> > unable to figure out why there's a new join seemingly created instead of 
>> > referencing the aliased tables that were passed in.
>> >
>> > Here's the query that the above generates:
>> >
>> > SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
>> >        post_metadata_2.metadata_value AS post_metadata_2_metadata_value,
>> >        non_interval_query.created_at,
>> >        non_interval_query.coalesce_1 \nFROM
>> >   (SELECT post_metadata_3.metadata_value AS metadata_value, 
>> > post_metadata_4.metadata_value AS metadata_value, vw_post.created_at AS 
>> > created_at, coalesce(count(DISTINCT vw_post.id), :coalesce_2) AS 
>> > coalesce_1 \nFROM vw_post
>> >    JOIN post_metadata AS post_metadata_3 ON post_metadata_3.post_id = 
>> > vw_post.id
>> >    JOIN post_metadata AS post_metadata_4 ON post_metadata_4.post_id = 
>> > vw_post.id \nWHERE post_metadata_3.metadata_value IN (:metadata_value_1, 
>> > :metadata_value_2)
>> >    AND post_metadata_4.metadata_value IN (:metadata_value_3, 
>> > :metadata_value_4)
>> >    AND vw_post.created_at >= :created_at_1
>> >    AND vw_post.created_at <= :created_at_2
>> >    AND post_metadata_3.schema_uid = :schema_uid_1
>> >    AND post_metadata_3.metadata_name = :metadata_name_1
>> >    AND post_metadata_4.schema_uid = :schema_uid_2
>> >    AND post_metadata_4.metadata_name = :metadata_name_2
>> >    AND vw_post.license_id IN (:license_id_1, :license_id_2)
>> >    GROUP BY vw_post.created_at, post_metadata_3.metadata_value, 
>> > post_metadata_4.metadata_value, vw_post.created_at) AS non_interval_query,
>> >                                       vw_post
>> > JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = 
>> > vw_post.id
>> > JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = 
>> > vw_post.id;
>> >
>> >
>> >
>> >
>> > On Tuesday, February 5, 2019 at 11:51:25 AM UTC-5, Ian Miller wrote:
>> > Hello all -
>> >
>> > I am relatively new to using SQLAlchemy for more complex use cases. I am 
>> > in the process of creating a time series query, but I am unable to 
>> > reference a column by its alias at the top level of the query.
>> >
>> > This is the query that I am trying to address that SQLAlchemy is currently 
>> > generating:
>> >
>> >
>> >
>> > SELECT non_interval_query.metadata_value AS 
>> > non_interval_query_metadata_value,
>> >        coalesce(sum(non_interval_query.coalesce_2), 0) AS coalesce_1,
>> >        timestamp
>> > FROM
>> >   (SELECT generate_series(date_trunc('day', 
>> > date('2019-01-06T00:00:00+00:00')), date_trunc('day', 
>> > date('2019-01-12T00:00:00+00:00')), '1 day') AS timestamp) AS time_series
>> > LEFT OUTER JOIN
>> >   (SELECT post_metadata_1.metadata_value AS post_metadata_1_metadata_value,
>> >           post_metadata_2.metadata_value AS post_metadata_2_metadata_value,
>> >           vw_post.created_at AS vw_post_created_at,
>> >           coalesce(count(DISTINCT vw_post.id), 0) AS coalesce_1
>> >    FROM vw_post
>> >    JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = 
>> > vw_post.id
>> >    JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = 
>> > vw_post.id
>> >    WHERE post_metadata_1.metadata_value IN ('<metadata_values>')
>> >      AND post_metadata_2.metadata_value IN ('<metadata_value>')
>> >      AND vw_post.created_at >= '2019-01-06T00:00:00+00:00'
>> >      AND vw_post.created_at <= '2019-01-12T00:00:00+00:00'
>> >      AND post_metadata_1.schema_uid = '<schema_uid>'
>> >      AND post_metadata_1.metadata_name = '<metadata_name>'
>> >      AND post_metadata_2.schema_uid = '<schema_uid>'
>> >      AND post_metadata_2.metadata_name = '<metadata_name>'
>> >      AND vw_post.license_id IN (<license_ids>)
>> >    GROUP BY vw_post.created_at,
>> >             post_metadata_1.metadata_value,
>> >             post_metadata_2.metadata_value,
>> >             vw_post.created_at) AS non_interval_query ON date_trunc('day', 
>> > created_at) = timestamp;
>> > You'll notice that "non_interval_query.metadata_value AS 
>> > non_interval_query_metadata_value" specified at the beginning of the query 
>> > is ambiguous due to the 2 "metadata_value" selects in the 
>> > "non_interval_query" subquery. What I'm trying to do is have 2 selects at 
>> > the top level - one for 
>> > "non_interval_query.post_metadata_1_metadata_value" and one for 
>> > "non_interval_query.post_metadata_2_metadata_value".
>> >
>> >
>> > For reference, here is the code used to generate the above query:
>> >
>> >
>> >
>> >
>> > def apply_date_group_by(self, session, query, range_gb_params):
>> >     field_name = self.db.get("column")
>> >     model = self._object.get("model")
>> >
>> >     if not field_name or not model:
>> >         raise ValueError("Invalid date group by")
>> >
>> >     gb_column = self._build_column()
>> >     interval = range_gb_params.get("interval")
>> >     interval_type = range_gb_params.get("interval_type")
>> >
>> >     time_series = func.generate_series(
>> >         func.date_trunc(interval_type, 
>> > func.date(range_gb_params["start"])),
>> >         func.date_trunc(interval_type, func.date(range_gb_params["end"])),
>> >         interval,
>> >     ).label("timestamp")
>> >
>> >     ts_column = column("timestamp")
>> >
>> >     time_series_query = session.query(time_series).subquery("time_series")
>> >     non_interval_query = query.subquery("non_interval_query")
>> >     # have to replace the original gb_column with the 'timestamp' column
>> >     # in order to properly merge the dataset into the time series dataset
>> >     non_gb_columns, gbs = self._prepare_non_gb_columns(
>> >         ts_column, gb_column, non_interval_query.columns
>> >     )
>> >
>> >     # construct query with correct position passed in from 
>> > `range_gb_params`
>> >     query_position = range_gb_params.get("query_index_position", 0)
>> >     non_gb_columns.insert(query_position, ts_column)
>> >
>> >     date_gb_query = session.query(*non_gb_columns).select_from(
>> >         time_series_query.outerjoin(
>> >             non_interval_query,
>> >             func.date_trunc(interval_type, column(field_name)) == 
>> > ts_column,
>> >         )
>> >     )
>> >
>> >     if gbs:
>> >         date_gb_query = date_gb_query.group_by(*gbs)
>> >
>> >     return date_gb_query.order_by(ts_column)
>> >
>> >
>> > Any help on this would be greatly appreciated!
>> >
>> >
>> > --
>> > SQLAlchemy -
>> > The Python SQL Toolkit and Object Relational Mapper
>> >
>> > http://www.sqlalchemy.org/
>> >
>> > To post example code, please provide an MCVE: Minimal, Complete, and 
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
>> > description.
>> > ---
>> > You received this message because you are subscribed to the Google Groups 
>> > "sqlalchemy" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an 
>> > email to sqlalchemy+...@googlegroups.com.
>> > To post to this group, send email to sqlal...@googlegroups.com.
>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> ---
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query

Reply via email to