Hey Team,
We're trying to implement an aggregation over several trillion rows using
Apache Beam SQL.
However, I'm getting the following exception:
Exception in thread "main" java.lang.UnsupportedOperationException: Does not support COUNT DISTINCT
Here's the code for doing the aggregation:
PCollection<Row> aggregate = joinedCollection.apply("Aggregation",
SqlTransform.query("SELECT" +
" exchange_name as adexchange," +
" strategy," +
" platform," +
" segment," +
" auction_type," +
" placement_type," +
" country," +
" COALESCE(loss, 0) AS loss_code," +
" COUNT(DISTINCT identifier) AS uniques," +
" no_bid_reason," +
" SUM(1) AS auctions," +
" SUM(CASE WHEN cpm_bid > 0 THEN 1 ELSE 0 END) AS bids," +
" SUM(cpm_bid) AS total_bid_price," +
" SUM(CASE WHEN loss = 0 THEN 1 END) AS wins," +
" app_bundle AS app_bundle," +
" model_id AS model_id," +
" identifier_type AS identifier_type," +
" promotion_id AS promotion_id," +
" sub_floor_bid_min_price_cohort AS sub_floor_bid_min_price_cohort," +
" bf_match_experiment AS bf_match_experiment," +
" bep_matched_floor AS bep_matched_floor," +
" SUM(p_ctr) AS p_ctr_total," +
" SUM(p_ir) AS p_ir_total," +
" SUM(p_cpa) AS p_cpa_total," +
" SUM(arppu) AS arppu_total," +
" SUM(spend) AS spend_total," +
" SUM(cpm_price) AS cpm_price_total" +
" FROM" +
" PCOLLECTION" +
" GROUP BY exchange_name,strategy,platform,segment,auction_type" +
",placement_type,country,loss,no_bid_reason,app_bundle" +
",model_id,identifier_type,promotion_id,sub_floor_bid_min_price_cohort" +
",bf_match_experiment,bep_matched_floor")
);
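To make clear what we expect from the uniques column, here is a minimal plain-Java sketch (no Beam involved) of a per-group distinct count. The class name, the method name, and the sample rows are made up for illustration, and the single string key stands in for the full set of GROUP BY columns in the query above. It only describes the intended semantics and would not scale to our data volume.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of COUNT(DISTINCT identifier) ... GROUP BY key.
final class DistinctCountSketch {

    // Each row is {groupKey, identifier}. Both are stand-ins: groupKey
    // represents the combined GROUP BY columns, identifier the column
    // we want to count distinct values of.
    public static Map<String, Integer> uniquesPerGroup(List<String[]> rows) {
        // Collect the set of identifiers seen for every group key.
        Map<String, Set<String>> seen = new HashMap<>();
        for (String[] row : rows) {
            seen.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
        }
        // The distinct count of a group is the size of its identifier set.
        Map<String, Integer> uniques = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            uniques.put(e.getKey(), e.getValue().size());
        }
        return uniques;
    }

    public static void main(String[] args) {
        // Made-up sample data: "id1" appears twice in group "exchangeA"
        // but should be counted only once.
        List<String[]> rows = List.of(
            new String[] {"exchangeA", "id1"},
            new String[] {"exchangeA", "id1"},
            new String[] {"exchangeA", "id2"},
            new String[] {"exchangeB", "id1"});
        System.out.println(uniquesPerGroup(rows));
    }
}
```

In other words, repeated identifiers within the same group should contribute only once to uniques, while auctions (SUM(1)) should still count every row.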
Can you please guide us?
Let me know if you need any more information.
Goutham Miryala
Senior Data Engineer
<http://chartboost.com/>