Hi!

Kinesis shards should be ideally evenly assigned to the source instances.
So, with your example of source parallelism of 10 and 20 shards, each
source instance will have 2 shards and will have 2 threads consuming them
(therefore, not in round robin).

For the Kafka consumer, in the source instances there will be one consuming
thread per broker, instead of partition. So, if a source instance is
assigned partitions that happen to be on the same broker, the source
instance will only create 1 thread to consume all of them.

You are correct that currently the Kafka consumer does not handle
repartitioning transparently like the Kinesis connector, but we’re working
on this :)

Regards,
Gordon

On August 23, 2016 at 6:50:31 PM, Sameer W (sam...@axiomine.com) wrote:

Hi,

The documentation says that there will be one thread per shard. If I my
streaming job runs with a parallelism of 10 and there are 20 shards, are
more threads going to be launched within  a task slot running a source
function to consume the additional shards or will one source function
instance consume 2 shards in round robin.

Is it any different for Kafka? Based on the documentation my understanding
is that if there are 10 source function instances and 20 partitions, each
one will read 2 partitions.

Also if partitions are added to Kafka are they handled by the existing
streaming job or does it need to be restarted? It appears as though Kinesis
handles it via the consumer constantly checking for more shards.

Thanks,
Sameer

Reply via email to