Hi Sachin,

Kafka Streams is built on top of standard Kafka consumers. For every topic 
it consumes from (whether a changelog topic or a source topic, it doesn't 
matter), the consumer stores the offset it last consumed from. Upon restart, 
by default it starts consuming from where it left off on each of the topics. 
So you can think of it this way: a restart should be no different than if you 
had left the application running (i.e., no restart). 

Thanks
Eno


> On 9 Nov 2016, at 13:59, Sachin Mittal <sjmit...@gmail.com> wrote:
> 
> Hi,
> I had some basic questions on sequence of tasks for streaming application
> restart in case of failure or otherwise.
> 
> Say my stream is structured this way
> 
> source-topic
>   branched into 2 kstreams
>    source-topic-1
>    source-topic-2
>   each mapped to 2 new kstreams (new key,value pairs) backed by 2 kafka
> topics
>       source-topic-1-new
>       source-topic-2-new
>       each aggregated to new ktable backed by internal changelog topics
>       source-topic-1-new-table (source-topic-1-new-changelog)
>       source-topic-2-new-table (source-topic-2-new-changelog)
>       table1 left join table2 -> to final stream
> Results of final stream are then persisted into another data storage
> 
> So if you see I have following physical topics or state stores
> source-topic
> source-topic-1-new
> source-topic-2-new
> source-topic-1-new-changelog
> source-topic-2-new-changelog
> 
> Now at a given point, if the streaming application is stopped, there is some
> data in all these topics.
> Barring the source-topic, all the other topics have data inserted by the
> streaming application.
> 
> Also I suppose the streaming application stores the last offset for each of
> these topics.
> 
> So when I restart the application, how does the processing start again?
> Will it pick up the data from the changelog topics where it last left off,
> process that first, and then process the source topic data from the last
> offset?
> 
> Or will it start from the source topic? I really don't want it to maintain
> offsets into the changelog topics, because any old key's value can be
> modified again as part of aggregation.
> 
> Bit confused here, any light would help a lot.
> 
> Thanks
> Sachin
