Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query

Ian Miller Thu, 07 Feb 2019 16:04:53 -0800

:facepalm: i should have seen that much sooner. Thank you so much for your 
help. No longer need to do any of that crazy stuff I attempted!


On Thursday, February 7, 2019 at 5:13:41 PM UTC-5, Mike Bayer wrote:
>
> On Thu, Feb 7, 2019 at 2:49 PM Ian Miller <irmil...@gmail.com 
> <javascript:>> wrote: 
> > 
> > The reason why I iterated over .c._all_columns was because it shows the 
> references to the 2 "metadata_value" columns, whereas selectable.c only 
> references 1. 
>
>
> so that's a sign that you need to apply labels to these columns 
> because they are conflicting.   with select(), call 
> select().apply_labels() so that these two columns are prefixed with 
> their table name in a label. 
>
>
> > 
> > For example, selectable.c shows the following: 
> > 
> > (Pdb++) list(query.c) 
> > [Column('metadata_value', VARCHAR(length=255), 
> table=<non_interval_query>), Column('created_at', TIMESTAMP(), 
> table=<non_interval_query>), <sqlalchemy.sql.elements.ColumnClause at 
> 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]Enter code here... 
> > 
> > selectable.c._all_columns shows the following: 
> > 
> > (Pdb++) query.c._all_columns 
> > [Column('metadata_value', VARCHAR(length=255), 
> table=<non_interval_query>), Column('metadata_value', VARCHAR(length=255), 
> table=<non_interval_query>), Column('created_at', TIMESTAMP(), 
> table=<non_interval_query>), <sqlalchemy.sql.elements.ColumnClause at 
> 0x7f15e1f6fef0; %(139731962070520 coalesce)s>]Enter code here... 
> > 
> > 
> > 
> > 
> > On Thursday, February 7, 2019 at 12:53:51 PM UTC-5, Mike Bayer wrote: 
> >> 
> >> just as a note, don't access .c._all_columns, just iterate over 
> selectable.c 
> >> 
> >> On Thu, Feb 7, 2019 at 10:45 AM Ian Miller <irmil...@gmail.com> wrote: 
> >> > 
> >> > Hello Mike, 
> >> > 
> >> > Thank you for your response! I have currently constructed the ORM 
> implementation that you suggested in your response. Here's the code: 
> >> > 
> >> > def _rebuild_non_interval_query_for_group_by(self, session, query): 
> >> >     from sqlalchemy import table, column, select 
> >> >     from sqlalchemy.orm import aliased 
> >> >     from collections import defaultdict, OrderedDict 
> >> > 
> >> >     post_metadata = table( 
> >> >         "post_metadata", column("post_id"), column("metadata_value") 
> >> >     ) 
> >> >     campaign_metadata = table( 
> >> >         "campaign_metadata", column("campaign_id"), 
> column("metadata_value") 
> >> >     ) 
> >> >     asset_metadata = table( 
> >> >         "asset_metadata", column("asset_id"), 
> column("metadata_value") 
> >> >     ) 
> >> > 
> >> >     vw_asset = table("vw_asset", column("id")) 
> >> >     vw_campaign = table("vw_campaign", column("id")) 
> >> >     vw_post = table("vw_post", column("id")) 
> >> > 
> >> >     METADATA_PRIMARY_TABLE_MAP = { 
> >> >         asset_metadata.name: vw_asset, 
> >> >         campaign_metadata.name: vw_campaign, 
> >> >         post_metadata.name: vw_post, 
> >> >     } 
> >> > 
> >> >     METADATA_NAME_TABLE_MAP = { 
> >> >         asset_metadata.name: asset_metadata, 
> >> >         campaign_metadata.name: campaign_metadata, 
> >> >         post_metadata.name: post_metadata, 
> >> >     } 
> >> > 
> >> >     primary_tables = set() 
> >> >     metadata_columns_count = defaultdict(int) 
> >> >     metadata_alias = OrderedDict() 
> >> >     columns = [] 
> >> >     for c in query.c._all_columns: 
> >> >         if c.name == "metadata_value": 
> >> >             parent_column = list(c.base_columns)[0] 
> >> >             table = parent_column.table 
> >> >             primary_tables.add(METADATA_PRIMARY_TABLE_MAP[table.name]) 
>
> >> >             metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] 
> += 1 
> >> >             alias_number = 
> metadata_columns_count[METADATA_NAME_TABLE_MAP[table.name]] 
> >> >             alias_name = "{}_{}".format(table.name, alias_number) 
> >> >             alias = aliased(parent_column.table, alias_name) 
> >> >             metadata_alias[alias_name] = alias 
> >> >             column = alias.c.metadata_value.label( 
> >> >                 "{}_{}_{}".format(table.name, alias_number, 
> "metadata_value") 
> >> >             ) 
> >> >             columns.append(column) 
> >> >         else: 
> >> >             columns.append(c) 
> >> > 
> >> >     # start constructing query 
> >> >     non_interval_query = 
> session.query(*columns).select_from(*primary_tables) 
> >> > 
> >> >     for alias_name, alias in metadata_alias.items(): 
> >> >         object_type = self._get_object_type_from_metadata_name(
> alias.original.name) 
> >> >         non_interval_query = ( 
> >> >             non_interval_query 
> >> >             .join( 
> >> >                 alias, 
> >> >                 getattr(alias.c, "{}_id".format(object_type)) == 
> METADATA_PRIMARY_TABLE_MAP[alias.original.name].c.id 
> >> >             ) 
> >> >         ) 
> >> > 
> >> >     non_interval_query = 
> non_interval_query.subquery("non_interval_query") 
> >> > 
> >> >     return non_interval_query 
> >> > 
> >> > 
> >> > 
> >> > The "metadata_alias" values are [('post_metadata_1", alias), 
> ('post_metadata_2', alias)] - the alias correspond to the post_metadata_1 
> and post_metadata_2 alias in your example. However, when I reference these 
> in the join, the aliased table names are not "post_metadata_1" or 
> "post_metadata_2" - they're "post_metadata_3" and "post_metadata_4". I'm 
> unable to figure out why there's a new join seemingly created instead of 
> referencing the aliased tables that were passed in. 
> >> > 
> >> > Here's the query that the above generates: 
> >> > 
> >> > SELECT post_metadata_1.metadata_value AS 
> post_metadata_1_metadata_value, 
> >> >        post_metadata_2.metadata_value AS 
> post_metadata_2_metadata_value, 
> >> >        non_interval_query.created_at, 
> >> >        non_interval_query.coalesce_1 \nFROM 
> >> >   (SELECT post_metadata_3.metadata_value AS metadata_value, 
> post_metadata_4.metadata_value AS metadata_value, vw_post.created_at AS 
> created_at, coalesce(count(DISTINCT vw_post.id), :coalesce_2) AS 
> coalesce_1 \nFROM vw_post 
> >> >    JOIN post_metadata AS post_metadata_3 ON post_metadata_3.post_id = 
> vw_post.id 
> >> >    JOIN post_metadata AS post_metadata_4 ON post_metadata_4.post_id = 
> vw_post.id \nWHERE post_metadata_3.metadata_value IN (:metadata_value_1, 
> :metadata_value_2) 
> >> >    AND post_metadata_4.metadata_value IN (:metadata_value_3, 
> :metadata_value_4) 
> >> >    AND vw_post.created_at >= :created_at_1 
> >> >    AND vw_post.created_at <= :created_at_2 
> >> >    AND post_metadata_3.schema_uid = :schema_uid_1 
> >> >    AND post_metadata_3.metadata_name = :metadata_name_1 
> >> >    AND post_metadata_4.schema_uid = :schema_uid_2 
> >> >    AND post_metadata_4.metadata_name = :metadata_name_2 
> >> >    AND vw_post.license_id IN (:license_id_1, :license_id_2) 
> >> >    GROUP BY vw_post.created_at, post_metadata_3.metadata_value, 
> post_metadata_4.metadata_value, vw_post.created_at) AS non_interval_query, 
> >> >                                       vw_post 
> >> > JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = 
> vw_post.id 
> >> > JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = 
> vw_post.id; 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > On Tuesday, February 5, 2019 at 11:51:25 AM UTC-5, Ian Miller wrote: 
> >> > Hello all - 
> >> > 
> >> > I am relatively new to using SQLAlchemy for more complex use cases. I 
> am in the process of creating a time series query, but I am unable to 
> reference a column by its alias at the top level of the query. 
> >> > 
> >> > This is the query that I am trying to address that SQLAlchemy is 
> currently generating: 
> >> > 
> >> > 
> >> > 
> >> > SELECT non_interval_query.metadata_value AS 
> non_interval_query_metadata_value, 
> >> >        coalesce(sum(non_interval_query.coalesce_2), 0) AS coalesce_1, 
> >> >        timestamp 
> >> > FROM 
> >> >   (SELECT generate_series(date_trunc('day', 
> date('2019-01-06T00:00:00+00:00')), date_trunc('day', 
> date('2019-01-12T00:00:00+00:00')), '1 day') AS timestamp) AS time_series 
> >> > LEFT OUTER JOIN 
> >> >   (SELECT post_metadata_1.metadata_value AS 
> post_metadata_1_metadata_value, 
> >> >           post_metadata_2.metadata_value AS 
> post_metadata_2_metadata_value, 
> >> >           vw_post.created_at AS vw_post_created_at, 
> >> >           coalesce(count(DISTINCT vw_post.id), 0) AS coalesce_1 
> >> >    FROM vw_post 
> >> >    JOIN post_metadata AS post_metadata_1 ON post_metadata_1.post_id = 
> vw_post.id 
> >> >    JOIN post_metadata AS post_metadata_2 ON post_metadata_2.post_id = 
> vw_post.id 
> >> >    WHERE post_metadata_1.metadata_value IN ('<metadata_values>') 
> >> >      AND post_metadata_2.metadata_value IN ('<metadata_value>') 
> >> >      AND vw_post.created_at >= '2019-01-06T00:00:00+00:00' 
> >> >      AND vw_post.created_at <= '2019-01-12T00:00:00+00:00' 
> >> >      AND post_metadata_1.schema_uid = '<schema_uid>' 
> >> >      AND post_metadata_1.metadata_name = '<metadata_name>' 
> >> >      AND post_metadata_2.schema_uid = '<schema_uid>' 
> >> >      AND post_metadata_2.metadata_name = '<metadata_name>' 
> >> >      AND vw_post.license_id IN (<license_ids>) 
> >> >    GROUP BY vw_post.created_at, 
> >> >             post_metadata_1.metadata_value, 
> >> >             post_metadata_2.metadata_value, 
> >> >             vw_post.created_at) AS non_interval_query ON 
> date_trunc('day', created_at) = timestamp; 
> >> > You'll notice that "non_interval_query.metadata_value AS 
> non_interval_query_metadata_value" specified at the beginning of the query 
> is ambiguous due to the 2 "metadata_value" selects in the 
> "non_interval_query" subquery. What I'm trying to do is have 2 selects at 
> the top level - one for "non_interval_query.post_metadata_1_metadata_value" 
> and one for "non_interval_query.post_metadata_2_metadata_value". 
> >> > 
> >> > 
> >> > For reference, here is the code used to generate the above query: 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > def apply_date_group_by(self, session, query, range_gb_params): 
> >> >     field_name = self.db.get("column") 
> >> >     model = self._object.get("model") 
> >> > 
> >> >     if not field_name or not model: 
> >> >         raise ValueError("Invalid date group by") 
> >> > 
> >> >     gb_column = self._build_column() 
> >> >     interval = range_gb_params.get("interval") 
> >> >     interval_type = range_gb_params.get("interval_type") 
> >> > 
> >> >     time_series = func.generate_series( 
> >> >         func.date_trunc(interval_type, 
> func.date(range_gb_params["start"])), 
> >> >         func.date_trunc(interval_type, 
> func.date(range_gb_params["end"])), 
> >> >         interval, 
> >> >     ).label("timestamp") 
> >> > 
> >> >     ts_column = column("timestamp") 
> >> > 
> >> >     time_series_query = 
> session.query(time_series).subquery("time_series") 
> >> >     non_interval_query = query.subquery("non_interval_query") 
> >> >     # have to replace the original gb_column with the 'timestamp' 
> column 
> >> >     # in order to properly merge the dataset into the time series 
> dataset 
> >> >     non_gb_columns, gbs = self._prepare_non_gb_columns( 
> >> >         ts_column, gb_column, non_interval_query.columns 
> >> >     ) 
> >> > 
> >> >     # construct query with correct position passed in from 
> `range_gb_params` 
> >> >     query_position = range_gb_params.get("query_index_position", 0) 
> >> >     non_gb_columns.insert(query_position, ts_column) 
> >> > 
> >> >     date_gb_query = session.query(*non_gb_columns).select_from( 
> >> >         time_series_query.outerjoin( 
> >> >             non_interval_query, 
> >> >             func.date_trunc(interval_type, column(field_name)) == 
> ts_column, 
> >> >         ) 
> >> >     ) 
> >> > 
> >> >     if gbs: 
> >> >         date_gb_query = date_gb_query.group_by(*gbs) 
> >> > 
> >> >     return date_gb_query.order_by(ts_column) 
> >> > 
> >> > 
> >> > Any help on this would be greatly appreciated! 
> >> > 
> >> > 
> >> > -- 
> >> > SQLAlchemy - 
> >> > The Python SQL Toolkit and Object Relational Mapper 
> >> > 
> >> > http://www.sqlalchemy.org/ 
> >> > 
> >> > To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description. 
> >> > --- 
> >> > You received this message because you are subscribed to the Google 
> Groups "sqlalchemy" group. 
> >> > To unsubscribe from this group and stop receiving emails from it, 
> send an email to sqlalchemy+...@googlegroups.com. 
> >> > To post to this group, send email to sqlal...@googlegroups.com. 
> >> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> >> > For more options, visit https://groups.google.com/d/optout. 
> > 
> > -- 
> > SQLAlchemy - 
> > The Python SQL Toolkit and Object Relational Mapper 
> > 
> > http://www.sqlalchemy.org/ 
> > 
> > To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description. 
> > --- 
> > You received this message because you are subscribed to the Google 
> Groups "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to sqlalchemy+...@googlegroups.com <javascript:>. 
> > To post to this group, send email to sqlal...@googlegroups.com 
> <javascript:>. 
> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Re: Unable to reference label in subquery at top level of query

Reply via email to