Hello Kohki,

Given your data traffic and the state volume I cannot think of a better
solution but suggest using large number of partitioned local states.

I'm wondering how would "per partition watermark" can help with your
traffic?

Guozhang

On Sun, Feb 26, 2017 at 10:45 AM, Kohki Nishio <tarop...@gmail.com> wrote:

> Guozhang,
>
> Let me explain what I'm trying to do. The message volume is large (TB per
> Day) and that is coming to a topic. Now I want to do per minute
> aggregation(Windowed) and send the output to the downstream (a topic)
>
> (Topic1 - Large Volume) -> [Stream App] -> (Topic2 - Large Volume)
>
> I assume the internal topic would have the same amount of data as topic1
> and the same goes to the local store, I know we can tweak retention period
> but network traffic would be same (or even more)
>
> The key point here is that most of incoming stream will end up a single
> data point per a minute (aggregation window), but the variance of the key
> is huge (high cardinality), then buffering wouldn't really help reduce the
> data/traffic size.
>
> I'm going to do something like per partition watermarking with offset
> metadata. It wouldn't increase the traffic that much
> thanks
> -Kohki
>



-- 
-- Guozhang

Reply via email to