Guozhang, Let me explain what I'm trying to do. The message volume is large (TB per Day) and that is coming to a topic. Now I want to do per minute aggregation(Windowed) and send the output to the downstream (a topic)
(Topic1 - Large Volume) -> [Stream App] -> (Topic2 - Large Volume) I assume the internal topic would have the same amount of data as topic1 and the same goes to the local store, I know we can tweak retention period but network traffic would be same (or even more) The key point here is that most of incoming stream will end up a single data point per a minute (aggregation window), but the variance of the key is huge (high cardinality), then buffering wouldn't really help reduce the data/traffic size. I'm going to do something like per partition watermarking with offset metadata. It wouldn't increase the traffic that much thanks -Kohki