Re: Kafka streams in Kubernetes

2019-06-10 Thread Matthias J. Sax
What I am trying to say is that compaction is not perfect. Assuming you have 100,000 unique keys and a message size of 1KB, your data set, if perfectly compacted, would be roughly 100MB. The default segment size is 1GB, and the active segment is not compacted. Hence, if the active
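
The size argument can be sketched as a small calculation. It rests on stated assumptions: 1KB is taken as 1,000 bytes, 100,000 keys is the key count that yields the roughly 100MB figure, the 1GB default segment size is taken as 1,073,741,824 bytes, and every closed segment is assumed to be already cleaned. The helper name is hypothetical.

```python
# Rough size model for a compacted changelog topic (assumptions in comments).
SEGMENT_BYTES = 1_073_741_824  # default segment size: 1GB


def changelog_size_bounds(unique_keys, message_bytes, segment_bytes=SEGMENT_BYTES):
    """Hypothetical helper: (perfectly compacted size, size with a full active segment)."""
    # Perfect compaction keeps exactly one record per key.
    compacted = unique_keys * message_bytes
    # The active segment is never compacted, so it may hold up to a full
    # segment of records, including older values for keys already counted above.
    # Assumes every closed segment has already been cleaned.
    with_active = compacted + segment_bytes
    return compacted, with_active


compacted, with_active = changelog_size_bounds(100_000, 1_000)
print(compacted, with_active)             # 100000000 1173741824
print(round(with_active / compacted, 1))  # 11.7
```

Under these assumptions, a restore could have to read close to 12 times more data than the compacted data set itself.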

Re: Kafka streams in Kubernetes

2019-06-10 Thread Scott Reynolds
We have been giving this a lot of thought lately. We attempted to replace the assignor configured via PARTITION_ASSIGNMENT_STRATEGY_CONFIG with our own implementation that hooks into our deployment service. The idea is simple: the new deployment gets *standby tasks assigned to it until they are caught up*. Once they are caugh
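
The "standby until caught up" idea can be sketched as a per-task decision rule. This is not Kafka Streams' assignor API and not the poster's actual implementation: the instance names, the `lag` map, the task IDs, and the `MAX_LAG` threshold are all hypothetical, and the sketch only decides which instance should run each task as active.

```python
from dataclasses import dataclass, field

MAX_LAG = 1_000  # hypothetical threshold: records still to restore and count as "caught up"


@dataclass
class Assignment:
    active: dict = field(default_factory=dict)   # task -> instance running it as active
    standby: dict = field(default_factory=dict)  # task -> instances restoring it as standby


def assign(tasks, current_owner, new_instances, lag):
    """Keep each task active on its current owner until a new instance has caught up.

    lag[(instance, task)] is the (hypothetical) number of changelog records the
    instance still has to restore for that task; missing entries mean "not started".
    """
    result = Assignment()
    for task in tasks:
        caught_up = [i for i in new_instances
                     if lag.get((i, task), float("inf")) <= MAX_LAG]
        if caught_up:
            # A new instance has (almost) all the state: move the active task to it.
            result.active[task] = caught_up[0]
        else:
            # Old owner stays active; new instances only warm up as standbys.
            result.active[task] = current_owner[task]
            result.standby[task] = list(new_instances)
    return result


plan = assign(
    tasks=["0_0", "0_1"],
    current_owner={"0_0": "old-pod", "0_1": "old-pod"},
    new_instances=["new-pod"],
    lag={("new-pod", "0_0"): 10, ("new-pod", "0_1"): 50_000},
)
print(plan.active)   # {'0_0': 'new-pod', '0_1': 'old-pod'}
print(plan.standby)  # {'0_1': ['new-pod']}
```

Rerunning the rule as lag shrinks moves each task over once its standby is warm, so no task is served by an instance that still has to restore most of its state.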

Re: Kafka streams in Kubernetes

2019-06-10 Thread Parthasarathy, Mohan
Matt, I read your email again, including this part that you point out:
> What you also need to take into account is, how often topics are
> compacted, and how large the segment size is, because the active segment
> is not subject to compaction.
Are you saying that compaction aff
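
To illustrate the quoted point, here is a toy log cleaner: it keeps only the latest record per key across the closed segments and leaves the active segment untouched. It is a simplification for illustration (real compaction also depends on cleaner scheduling and the dirty ratio), and the function name is hypothetical.

```python
def compact(segments):
    """Toy log cleaner. `segments` is a list of segments, each a list of
    (key, value) records in offset order; the last segment is the active one."""
    closed, active = segments[:-1], segments[-1]
    records = [record for segment in closed for record in segment]
    # Remember the offset of the latest record per key within the closed segments.
    latest = {key: offset for offset, (key, _) in enumerate(records)}
    # Keep only those latest records; the active segment is left untouched.
    cleaned = [record for offset, record in enumerate(records)
               if latest[record[0]] == offset]
    return [cleaned, list(active)]


log = [
    [("a", 1), ("b", 1), ("a", 2)],  # closed segment
    [("b", 2), ("a", 3)],            # closed segment
    [("a", 4), ("a", 5)],            # active segment
]
print(compact(log))  # [[('b', 2), ('a', 3)], [('a', 4), ('a', 5)]]
```

After cleaning, four records remain, while a perfectly compacted log would hold only two (`a=5`, `b=2`); a restore would still replay all four.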

Re: Kafka streams in Kubernetes

2019-06-10 Thread Parthasarathy, Mohan
Thanks. That helps me understand why recreating state might take time. -mohan

Re: Kafka streams in Kubernetes

2019-06-09 Thread Matthias J. Sax
By default, Kafka Streams does not "close" windows. To handle out-of-order data, windows are maintained until their retention time has passed, and they are updated each time an out-of-order record arrives (even if the window-end time has passed). Cf. https://stackoverflow.com/questions/38935904/how-to-send-final-k
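
A simplified model of this behavior: a tumbling-window count whose windows keep accepting out-of-order records after their end time, and are only dropped once stream time is more than the retention time past the window start. This is an illustrative assumption, not Kafka Streams' internal implementation, and the class name is hypothetical.

```python
from collections import defaultdict


class WindowedCount:
    """Simplified model (an assumption, not Kafka Streams internals): tumbling
    windows keep accepting late records until their retention time has passed."""

    def __init__(self, size_ms, retention_ms):
        self.size_ms = size_ms
        self.retention_ms = retention_ms
        self.stream_time = 0            # highest timestamp seen so far
        self.counts = defaultdict(int)  # window start -> count

    def process(self, ts):
        self.stream_time = max(self.stream_time, ts)
        start = ts - ts % self.size_ms
        if self.stream_time - start >= self.retention_ms:
            return False  # window already expired: the late record is dropped
        self.counts[start] += 1  # updated even if the window end has passed
        self._expire()
        return True

    def _expire(self):
        for start in [s for s in self.counts if self.stream_time - s >= self.retention_ms]:
            del self.counts[start]


w = WindowedCount(size_ms=60_000, retention_ms=24 * 60 * 60 * 1000)
w.process(0)
w.process(120_000)
w.process(30_000)  # out of order: window [0, 60000) has ended but is still updated
print(dict(w.counts))  # {0: 2, 120000: 1}
```

Because every window lives for the full retention time, the store (and its changelog) holds a day of windows by default, regardless of how short each window is.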

Re: Kafka streams in Kubernetes

2019-06-09 Thread Parthasarathy, Mohan
Pavel, Thanks for the pointer. I will take a look. -mohan

Re: Kafka streams in Kubernetes

2019-06-09 Thread Parthasarathy, Mohan
Matt, thanks for your response. I agree with you that there is no easy way to answer this. I was trying to find out what others' experience is, which could simply be "Don't bother; in practice a StatefulSet is better." Could you explain why there has to be more state than the window size? In a

Re: Kafka streams in Kubernetes

2019-06-08 Thread Matthias J. Sax
It depends on how much state you need to restore and how much restore time you can accept in your application. The amount of data that needs to be restored does not depend on the window size, but on the store retention time (default 1 day, configurable via `Materialized#withRetention()`). The window si
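
A back-of-the-envelope sketch of this point: the volume to restore grows with the store retention time, and the restore time is that volume divided by the restore throughput. The assumptions are loud: every input record is taken to produce one changelog write (no caching, no compaction), so this is an upper bound, and the workload and throughput figures in the example are hypothetical, not measured numbers.

```python
DEFAULT_RETENTION_MS = 24 * 60 * 60 * 1000  # 1 day, the default store retention


def restore_estimate(records_per_sec, record_bytes, restore_bytes_per_sec,
                     retention_ms=DEFAULT_RETENTION_MS):
    """Upper-bound estimate of (changelog bytes to restore, seconds to restore).

    Assumes every input record causes one changelog write (no caching, no
    compaction). Note that the window size is not an input.
    """
    changelog_bytes = records_per_sec * record_bytes * retention_ms // 1000
    return changelog_bytes, changelog_bytes / restore_bytes_per_sec


# Hypothetical workload: 100 records/s of 1,000 bytes, restored at 50 MB/s.
size, seconds = restore_estimate(100, 1_000, 50_000_000)
print(size, seconds)  # 8640000000 172.8
```

In this model, halving the retention via `Materialized#withRetention()` halves both the bytes and the restore time, while changing the window size has no effect.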

Re: Kafka streams in Kubernetes

2019-06-08 Thread Pavel Sapozhnikov
I suggest taking a look at the Strimzi project (https://strimzi.io/), a Kafka operator for deploying Kafka in a Kubernetes environment.

Kafka streams in Kubernetes

2019-06-08 Thread Parthasarathy, Mohan
Hi, I have read several articles about this topic. We will soon deploy our streaming apps inside k8s. My understanding from reading these articles is that a StatefulSet in k8s is not mandatory, as the application can rebuild its state if the state store is not present. Can people share th
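
Rebuilding state when the store is missing means replaying its changelog topic. The replay can be sketched conceptually: apply each record in offset order, treating a `None` value as a tombstone that deletes the key. This is plain Python modeling the changelog as a list of (key, value) pairs, not the Kafka Streams restore code.

```python
from typing import Iterable, Optional, Tuple


def restore(changelog: Iterable[Tuple[str, Optional[bytes]]]) -> dict:
    """Rebuild a key-value store by replaying changelog records in offset order."""
    store = {}
    for key, value in changelog:
        if value is None:
            store.pop(key, None)  # tombstone: the key was deleted
        else:
            store[key] = value    # a later record overwrites an earlier one
    return store


changelog = [("a", b"1"), ("b", b"2"), ("a", b"3"), ("b", None)]
print(restore(changelog))  # {'a': b'3'}
```

The replay reads every record still in the topic, not just the final values, so the rebuild time depends on how much uncompacted data the changelog holds.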