Unfortunatelly, Beam SQL doesn’t support COUNT(DISTINCT) aggregation. 

More details about “why" is on this discussion [1] and the related open issue 
for that here [2].

—
Alexey

[1] https://lists.apache.org/thread/hvmy6d5dls3m8xcnf74hfmy1xxfgj2xh
[2] https://github.com/apache/beam/issues/19398


> On 2 Nov 2023, at 20:52, Goutham Miryala <goutham.miry...@chartboost.com> 
> wrote:
> 
> Hey Team,
> 
> We're trying to implement an aggregation which involves several trillions of 
> rows using apache beam sql.
> However I'm getting an exception 
> Exception in thread "main" java.lang.UnsupportedOperationException: Does not 
> support COUNT DISTINCT
> 
> Here's the code for doing the aggregation:
> 
> PCollection<Row> aggregate = joinedCollection.apply("Aggregation",
>         SqlTransform.query("SELECT" +
>                 "        exchange_name as adexchange," +
>                 "        strategy," +
>                 "        platform," +
>                 "        segment," +
>                 "        auction_type," +
>                 "        placement_type," +
>                 "        country," +
>                 "        COALESCE(loss, 0) AS loss_code," +
>                 "        COUNT(DISTINCT identifier) AS uniques," +
>                 "        no_bid_reason," +
>                 "        SUM(1) AS auctions," +
>                 "        SUM(CASE WHEN cpm_bid > 0 THEN 1 ELSE 0 END) AS 
> bids," +
>                 "        SUM(cpm_bid) AS total_bid_price," +
>                 "        SUM(CASE WHEN loss = 0 THEN 1 END) AS wins," +
>                 "        app_bundle AS app_bundle," +
>                 "        model_id AS model_id," +
>                 "        identifier_type AS identifier_type," +
>                 "        promotion_id AS promotion_id," +
>                 "        sub_floor_bid_min_price_cohort AS 
> sub_floor_bid_min_price_cohort," +
>                 "        bf_match_experiment AS bf_match_experiment," +
>                 "        bep_matched_floor AS bep_matched_floor," +
>                 "        SUM(p_ctr) AS p_ctr_total," +
>                 "        SUM(p_ir) AS p_ir_total," +
>                 "        SUM(p_cpa) AS p_cpa_total," +
>                 "        SUM(arppu) AS arppu_total," +
>                 "        SUM(spend) AS spend_total," +
>                 "        SUM(cpm_price) AS cpm_price_total" +
>                 "    FROM" +
>                 "        PCOLLECTION" +
>                 "    GROUP BY 
> exchange_name,strategy,platform,segment,auction_type" +
>                 ",placement_type,country,loss,no_bid_reason,app_bundle" +
>                 
> ",model_id,identifier_type,promotion_id,sub_floor_bid_min_price_cohort" +
>                 ",bf_match_experiment,bep_matched_floor")
> );
> 
> Can you please guide us?
> 
> Let me know in case you need any more information.
> 
> Goutham Miryala
> Senior Data Engineer
> 
>  <http://chartboost.com/>

Reply via email to