Hi Sachin,

On 3 February 2017 at 09:07, Sachin Mittal <sjmit...@gmail.com> wrote:

>
> 1. Now what I wanted to know is that for separate machines running same
> instance of the streams application, my application.id would be same
> right.
>

That is correct.


> If yes then how does kafka cluser know which partition to assign to which
> machine and which thread.
> Because what I see that on same machine each thread has its unique name, so
> it will get message from a given partition(s) only, but how does kafka
> cluster know that each machines thread are different from some other
> machines.
> Like how does it distinguish thread-1 from machine A vs machine B. Do I
> need to configure something here.
>
>
This is all taken care of by the Kafka Consumer. All application instances
and threads with the same application.id are part of the same consumer
group. So the kafka consumer will ensure the partitions are balanced across
the available threads.


> 2. Also my stream processing creates an internal changelog topic which is
> backed by rocksDB state store.
> - So should I have to partition that topic too in same number of partitions
> as my source topic. (Here I am creating that change log topic manually)
>
>
Any reason why you don't just let streams create the changelog topic? Yes
you should partition it the same as the source topic.



> 3. If I don't create that change log topic manually and let kafka stream
> create that automatically, then does what number of partitions it uses.
>
>
The same number as the source topic.


> 4. Suppose my change log topic has single partition (if that is allowed)
> and now we will have multiple threads accessing that. Is there any deadlock
> situation I need to worry about.
>

Multiple threads will never access a single partition in a kafka streams
app. A partition is only ever consumed by a single thread.


>
> 5. Also now multiple threads will access the same state store and attempt
> to recreate from change log topic if there is a need. How does this work.
>
>
It is done on a per partition basis. You have a state store for each
partition.


> 6. Lastly say one instance fails then other instances will try to balance
> off the load, now when i bring that instance up, how does partition get re
> assigned to it? Like at what point does some old thread stops processing
> that partition and new thread of new instance takes over. Is there any
> configuration needed gere?
>

When an instance fails or is shutdown it will be removed from the consumer
group. When it is removed a rebalance will be triggered. The partitions
will be re-assigned to the remaining threads.

There are some settings you can adjust that will effect the length of time
it takes for a dead consumer to be detected.

i.e.,
max.poll.interval.ms
heartbeat.interval.ms
session.timeout.ms

I suggest you take a look at the consumer config docs here:
https://kafka.apache.org/documentation/#newconsumerconfigs

Thanks,
Damian

Reply via email to