Guozhang,

Let me explain what I'm trying to do. The message volume is large (TB per
Day) and that is coming to a topic. Now I want to do per minute
aggregation(Windowed) and send the output to the downstream (a topic)

(Topic1 - Large Volume) -> [Stream App] -> (Topic2 - Large Volume)

I assume the internal topic would have the same amount of data as topic1
and the same goes to the local store, I know we can tweak retention period
but network traffic would be same (or even more)

The key point here is that most of incoming stream will end up a single
data point per a minute (aggregation window), but the variance of the key
is huge (high cardinality), then buffering wouldn't really help reduce the
data/traffic size.

I'm going to do something like per partition watermarking with offset
metadata. It wouldn't increase the traffic that much
thanks
-Kohki

Reply via email to