Matt, I read your email again, and this is the part that you point out:
> What you also need to take into account is how often topics are
> compacted, and how large the segment size is, because the active
> segment is not subject to compaction.

Are you saying that compaction affects the rebuilding time? Sorry, I am
not sure I understand what you meant by this in the current context.

-mohan

On 6/10/19, 10:22 AM, "Parthasarathy, Mohan" <mpart...@hpe.com> wrote:

    Thanks. That helps me understand why recreating state might take time.

    -mohan

    On 6/9/19, 11:50 PM, "Matthias J. Sax" <matth...@confluent.io> wrote:

        By default, Kafka Streams does not "close" windows. To handle
        out-of-order data, windows are maintained until their retention
        time has passed, and they are updated each time an out-of-order
        record arrives (even if the window-end time has passed). Cf.
        https://stackoverflow.com/questions/38935904/how-to-send-final-kafka-streams-aggregation-result-of-a-time-windowed-ktable
        for details; note that using `suppress()` and setting a grace
        period change the default behavior.

        Hence, you usually want to keep windows around until you are sure
        that no more out-of-order records for a window may arrive (which
        is usually some time after the window-end time). If 1 day is too
        large, you can of course reduce the retention time accordingly.

        If you use Interactive Queries, you would need to set the
        retention time large enough to allow your application to query
        the state. In this case, the retention time might be higher, to
        keep state around for serving queries even if you know it won't
        be updated any longer.

        For Kafka Streams, using 1 day as the default provides a good
        out-of-the-box experience. For production deployments, changing
        the retention time based on the needs of the application makes
        sense, of course.

        There is also one more config you might want to consider:
        `windowstore.changelog.additional.retention.ms` -- it's 1 day by
        default, too, and you might want to reduce it to reduce the
        amount of data that is restored.

        In general, there is nothing wrong with using stateful sets
        though, and for large state, it's recommended in order to avoid
        long recovery times.

        -Matthias
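A minimal sketch of the setup described above, pulling together the three
knobs Matthias mentions (grace period, store retention via
`Materialized#withRetention()`, and `suppress()`), assuming the Kafka
Streams 2.1+ Java API; the topic names, the 10-minute grace period, and
the 1-hour retention are illustrative, not taken from the thread:

    import java.time.Duration;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.Suppressed;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.state.WindowStore;

    public class WindowRetentionSketch {

        public static Topology build() {
            StreamsBuilder builder = new StreamsBuilder();

            builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()))
                   .groupByKey()
                   // 5-minute tumbling windows; out-of-order records are
                   // accepted until 10 minutes past the window end (the
                   // grace period)
                   .windowedBy(TimeWindows.of(Duration.ofMinutes(5))
                                          .grace(Duration.ofMinutes(10)))
                   // keep window state for 1 hour instead of the 1-day
                   // default; retention must cover window size + grace
                   .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("counts-store")
                                      .withRetention(Duration.ofHours(1)))
                   // emit a single final result per window once its grace
                   // period has passed, instead of one update per record
                   .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
                   .toStream((windowedKey, count) ->
                           windowedKey.key() + "@" + windowedKey.window().start())
                   .to("final-counts", Produced.with(Serdes.String(), Serdes.Long()));

            return builder.build();
        }
    }

With a setup like this, a restarted instance should only need to replay
on the order of an hour of changelog data per store, rather than a full
day's worth.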
        On 6/9/19 2:59 PM, Parthasarathy, Mohan wrote:
        > Matt,
        >
        > Thanks for your response. I agree with you that there is no
        > easy way to answer this. I was trying to see what others'
        > experience is, which could simply be "Don't bother, in practice
        > a stateful set is better".
        >
        > Could you explain why there has to be more state than the
        > window size? In a running application, as the data is being
        > processed from a topic, there is state being created depending
        > on the stateful primitives. Once the window is closed, this
        > state is not needed, and I can see why the grace period has to
        > be taken into account. So when would you need it for the whole
        > store retention time? Could you clarify?
        >
        > Thanks
        > Mohan
        >
        > On 6/8/19, 11:18 PM, "Matthias J. Sax" <matth...@confluent.io> wrote:
        >
        > It depends on how much state you need to restore and how much
        > restore time you can accept in your application.
        >
        > The amount of data that needs to be restored does not depend on
        > the window size, but on the store retention time (default 1 day,
        > configurable via `Materialized#withRetention()`). The window
        > size (and grace period, in case you use one) is a lower bound
        > for the configurable retention time though, i.e., retention
        > time >= grace period >= window size.
        >
        > What you also need to take into account is how often topics are
        > compacted, and how large the segment size is, because the active
        > segment is not subject to compaction.
        >
        > It's always hard to answer a question like this. I would
        > recommend doing some testing and benchmarking failover by
        > manually killing some instances to simulate a crash. This should
        > give the best insight -- by tuning the above parameters, you can
        > see what works for your application.
        >
        > -Matthias
        >
        > On 6/8/19 4:28 PM, Pavel Sapozhnikov wrote:
        > > I suggest taking a look at the Strimzi project: https://strimzi.io/
        > >
        > > A Kafka operator deployed in a Kubernetes environment.
        > >
        > > On Sat, Jun 8, 2019, 6:09 PM Parthasarathy, Mohan <mpart...@hpe.com> wrote:
        > >
        > >> Hi,
        > >>
        > >> I have read several articles about this topic. We are soon
        > >> going to deploy our streaming apps inside k8s. My
        > >> understanding from reading these articles is that a stateful
        > >> set in k8s is not mandatory, as the application can rebuild
        > >> its state if the state store is not present. Can people share
        > >> their experience or recommendations when it comes to
        > >> deploying streaming apps on k8s?
        > >>
        > >> Also, let us say the application is using a tumbling window
        > >> of 5 minutes. When an application restarts, is it correct to
        > >> say that it has to rebuild the state only for that 5-minute
        > >> window, for the partitions that it was handling before? I had
        > >> an instance of such a restart where it was running a long
        > >> time in REBALANCE, which makes me think that my understanding
        > >> is incorrect. In this case, the state store was available
        > >> during the restart. Can someone clarify?
        > >>
        > >> Thanks
        > >> Mohan
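For completeness, the `windowstore.changelog.additional.retention.ms`
setting discussed above is an ordinary Streams config. A minimal sketch
of setting it, with a hypothetical application id and broker address and
an illustrative 1-hour value:

    import java.time.Duration;
    import java.util.Properties;

    import org.apache.kafka.streams.StreamsConfig;

    public class RestoreTuningSketch {

        public static Properties streamsConfig() {
            Properties props = new Properties();
            // application id and bootstrap servers are placeholders
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");

            // Extra retention added on top of the store retention time
            // for window-store changelog topics (1 day by default).
            // Reducing it shrinks the amount of changelog data that has
            // to be replayed during restore.
            props.put(StreamsConfig.WINDOW_STORE_CHANGE_LOG_ADDITIONAL_RETENTION_MS_CONFIG,
                      Duration.ofHours(1).toMillis());

            return props;
        }
    }

The changelog topic backing a window store is created with retention
equal to the store retention plus this value, so the two settings are
worth tuning together.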