Hi folks,

I’m trying to write a Flink job that computes a number of counters that require custom triggers, and I’m trying to figure out the best way to express that.

The query looks something like this:
SELECT userId, customUDAF(...) AS counter1, customUDAF(...) AS counter2, ...
FROM (
  SELECT * FROM my_kafka_stream
)
GROUP BY userId, HOP(`timestamp`, INTERVAL '6' HOUR, INTERVAL '7' DAY)

We sink this to a KV store (memcache / couchbase) for persistence.

Some of these counters span a pretty wide time window (the longest is 7 days), and to keep the state tractable we need a fairly large slide interval (6 hours or greater: with a 6-hour slide each row already falls into 28 overlapping 7-day windows, while a 1-minute slide would put it into 10,080). However, some of our users need the counters to be updated fairly frequently (e.g. every minute), so we’ve been looking at how to achieve that with the Table / SQL API. I see that this is possible using the custom triggers<https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/windows.html#triggers> support in the DataStream API, but I’m wondering whether the same can be done with the Table / SQL APIs.
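
For context, here’s roughly what we’d do with the DataStream API (a minimal, self-contained sketch: the fromElements source and sum(1) are just placeholders for our Kafka stream and custom UDAFs, and it uses ingestion time so the demo source gets timestamps automatically):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousEventTimeTrigger;

public class EarlyFiringCounters {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // ingestion time so the bounded demo source gets timestamps/watermarks for free;
    // the real job would use event time with a watermark assigner on the Kafka source
    env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

    // (userId, value) pairs standing in for the Kafka stream
    DataStream<Tuple2<String, Long>> events = env.fromElements(
        Tuple2.of("u1", 1L), Tuple2.of("u1", 1L), Tuple2.of("u2", 1L));

    events
        .keyBy(0)
        // 7-day window sliding every 6 hours, same as the HOP in the SQL above
        .window(SlidingEventTimeWindows.of(Time.days(7), Time.hours(6)))
        // fire an early (partial) result every minute instead of only at window end
        .trigger(ContinuousEventTimeTrigger.of(Time.minutes(1)))
        // sum(1) stands in for our custom UDAFs
        .sum(1)
        .print();

    env.execute("early-firing counters");
  }
}

The key line is the trigger() call, which replaces the default event-time trigger so the window emits a partial result every minute instead of only once when the window closes.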

I did see another thread where Fabian brought up this design doc<https://docs.google.com/document/d/1wrla8mF_mmq-NW9sdJHYVgMyZsgCmHumJJ5f5WUzTiM/edit?ts=59816a8b#heading=h.1zg4jlmqwzlr> which lays out what support for emit triggers could look like (across various streaming platforms). Is this something that is being actively worked on? If not, any suggestions on how we could get the ball rolling on this? (a Google doc design + JIRA?)

Thanks,

-- Piyush
