What purpose serves the repartition topic?

João Peixoto Tue, 16 May 2017 16:45:07 -0700

Certain operations, such as "selectKey" or "map", require a repartition
topic. What purpose does this repartition topic serve?


Sample record: {"key": "a", ...}

Stream: source.selectKey((k, v) -> k.toUpperCase()).groupByKey() //...

From my understanding, the repartition topic will guarantee that if we are
reading from partition N, the new key will be written to the same partition
N on the repartition topic, which allows the stream task to always handle
the same partition number all the way.
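
For reference, the partition assignment this reasoning depends on can be
sketched as below. This is only an illustration under stated assumptions:
the class PartitionSketch and the method partitionFor are hypothetical
names, and String.hashCode() stands in for the murmur2 hash over the
serialized key that Kafka's default partitioner uses, so the partition
numbers in a real cluster will differ. The sketch only shows that, with
key-hash partitioning, the target partition is a function of the key and
the partition count.

```java
// Hypothetical sketch, not Kafka code: a simplified key-hash partitioner.
class PartitionSketch {

    // Assumption: String.hashCode() replaces the murmur2 hash of the
    // serialized key bytes used by Kafka's default partitioner.
    static int partitionFor(String key, int numPartitions) {
        int hash = key.hashCode();
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3; // assumed partition count for the example
        String originalKey = "a";
        String newKey = originalKey.toUpperCase(); // same re-keying as selectKey above

        System.out.println("original key -> partition "
                + partitionFor(originalKey, numPartitions));
        System.out.println("new key      -> partition "
                + partitionFor(newKey, numPartitions));
    }
}
```

With three partitions in this sketch, "a" and "A" map to different
partition numbers, so whether a re-keyed record stays on partition N
depends on the hash of the new key.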

This seems relevant if the topology above is followed by:
/*...*/.toStream().leftJoin(kTable) //...
We are still processing the same partition number. If the source stream and
the kTable are co-partitioned, the repartition topic will be too.

However, when the topology contains no other operations such as joins, the
repartition topic seems useless.

There's a thread on this subject
<http://mail-archives.apache.org/mod_mbox/kafka-users/201705.mbox/%3CCAJikTEUHR=r0ika6vlf_y+qajxg8f_q19og_-s+q-gozpqb...@mail.gmail.com%3E>,
specific to topics with one partition only. The argument there is that
repartition does not make sense on a topic with 1 partition only. However,
even if you have multiple partitions but never join with anything else, it
may not make sense for the reasons above.

What purpose does the repartition topic serve?
