Hi folks,

We have a Flink streaming Table / SQL job that we were looking to migrate from 
an older Flink release (1.6.x) to 1.9. As part of doing so, we have been seeing 
a few errors that I'm trying to figure out how to work around. Would 
appreciate any help / pointers.

The job essentially involves a nested query:
SELECT `timestamp`, cost, partnerid, impression_id, …
FROM my_kafka_stream

The Kafka stream has a `timestamp` field that tracks event time. We register 
this nested query as "base_query".
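
For context, the wiring looks roughly like this (a simplified sketch; names and 
the exact field declarations are illustrative, and the rowtime declaration may 
well be the part that matters here):

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

// Simplified sketch; kafkaStream is a DataStream<Row> with timestamps /
// watermarks already assigned upstream.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

// The .rowtime suffix is what declares `timestamp` as an event-time
// attribute; windowed GROUP BYs are only valid over such an attribute.
tableEnv.registerDataStream(
    "my_kafka_stream", kafkaStream,
    "timestamp.rowtime, cost, partnerid, impression_id");

Table base = tableEnv.sqlQuery(
    "SELECT `timestamp`, cost, partnerid, impression_id FROM my_kafka_stream");
tableEnv.registerTable("base_query", base);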

We now use this in a couple of outer aggregation queries, which differ only in 
the time window we aggregate over (1M, 1H, 6H, etc.):
SELECT
  CAST(SUM(cost) AS FLOAT) AS CostPerPartner,
  COUNT(impression_id) AS ImpsPerPartner,
  …
FROM
  base_query
GROUP BY
  partnerid,
  HOP(`timestamp`, INTERVAL '30' SECOND, INTERVAL '1' MINUTE)
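
To give a sense of it, we effectively generate these outer queries from a small 
list of slide / size combinations, roughly like this (the slide values for the 
1H / 6H windows below are just illustrative):

// One outer query per (slide, size) combination over the same base_query;
// only the HOP parameters change.
String[][] windows = {
    {"INTERVAL '30' SECOND", "INTERVAL '1' MINUTE"},
    {"INTERVAL '5' MINUTE",  "INTERVAL '1' HOUR"},   // illustrative slides
    {"INTERVAL '30' MINUTE", "INTERVAL '6' HOUR"},
};
for (String[] w : windows) {
    Table agg = tableEnv.sqlQuery(
        "SELECT" +
        "  CAST(SUM(cost) AS FLOAT) AS CostPerPartner," +
        "  COUNT(impression_id) AS ImpsPerPartner" +
        " FROM base_query" +
        " GROUP BY partnerid, HOP(`timestamp`, " + w[0] + ", " + w[1] + ")");
    // ... each result is converted back to a stream and written to its own sink
}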

While the outer query got translated and scheduled as a Flink streaming job 
just fine on 1.6, we run into this error when we bump our build to 1.9:

Exception in thread "main" org.apache.flink.table.api.ValidationException: 
Window can only be defined over a time attribute column.

Any suggestions on how we could work around this? I saw a thread suggesting 
using HOP_ROWTIME, but if I understand correctly, that would mean doing the hop 
window generation / GROUP BY in the nested query, which we'd like to avoid 
since we have several time window combinations to generate.
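
For clarity, my understanding of that suggestion is roughly the sketch below: 
the HOP / GROUP BY moves into the nested query, and HOP_ROWTIME produces the 
time attribute that downstream queries would window over, so we would end up 
with one such nested query per window combination:

// Sketch of the HOP_ROWTIME workaround as I understand it (one of these
// per window combination, which is what we want to avoid). wtime is a
// rowtime attribute that downstream queries could window over again.
Table windowedBase = tableEnv.sqlQuery(
    "SELECT" +
    "  partnerid," +
    "  CAST(SUM(cost) AS FLOAT) AS CostPerPartner," +
    "  COUNT(impression_id) AS ImpsPerPartner," +
    "  HOP_ROWTIME(`timestamp`, INTERVAL '30' SECOND, INTERVAL '1' MINUTE) AS wtime" +
    " FROM my_kafka_stream" +
    " GROUP BY partnerid, HOP(`timestamp`, INTERVAL '30' SECOND, INTERVAL '1' MINUTE)");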

Thanks,
-- Piyush
