Unfortunatelly, Beam SQL doesn’t support COUNT(DISTINCT) aggregation. More details about “why" is on this discussion [1] and the related open issue for that here [2].
— Alexey [1] https://lists.apache.org/thread/hvmy6d5dls3m8xcnf74hfmy1xxfgj2xh [2] https://github.com/apache/beam/issues/19398 > On 2 Nov 2023, at 20:52, Goutham Miryala <goutham.miry...@chartboost.com> > wrote: > > Hey Team, > > We're trying to implement an aggregation which involves several trillions of > rows using apache beam sql. > However I'm getting an exception > Exception in thread "main" java.lang.UnsupportedOperationException: Does not > support COUNT DISTINCT > > Here's the code for doing the aggregation: > > PCollection<Row> aggregate = joinedCollection.apply("Aggregation", > SqlTransform.query("SELECT" + > " exchange_name as adexchange," + > " strategy," + > " platform," + > " segment," + > " auction_type," + > " placement_type," + > " country," + > " COALESCE(loss, 0) AS loss_code," + > " COUNT(DISTINCT identifier) AS uniques," + > " no_bid_reason," + > " SUM(1) AS auctions," + > " SUM(CASE WHEN cpm_bid > 0 THEN 1 ELSE 0 END) AS > bids," + > " SUM(cpm_bid) AS total_bid_price," + > " SUM(CASE WHEN loss = 0 THEN 1 END) AS wins," + > " app_bundle AS app_bundle," + > " model_id AS model_id," + > " identifier_type AS identifier_type," + > " promotion_id AS promotion_id," + > " sub_floor_bid_min_price_cohort AS > sub_floor_bid_min_price_cohort," + > " bf_match_experiment AS bf_match_experiment," + > " bep_matched_floor AS bep_matched_floor," + > " SUM(p_ctr) AS p_ctr_total," + > " SUM(p_ir) AS p_ir_total," + > " SUM(p_cpa) AS p_cpa_total," + > " SUM(arppu) AS arppu_total," + > " SUM(spend) AS spend_total," + > " SUM(cpm_price) AS cpm_price_total" + > " FROM" + > " PCOLLECTION" + > " GROUP BY > exchange_name,strategy,platform,segment,auction_type" + > ",placement_type,country,loss,no_bid_reason,app_bundle" + > > ",model_id,identifier_type,promotion_id,sub_floor_bid_min_price_cohort" + > ",bf_match_experiment,bep_matched_floor") > ); > > Can you please guide us? > > Let me know in case you need any more information. > > Goutham Miryala > Senior Data Engineer > > <http://chartboost.com/>