Certain operations, such as "selectKey" or "map", require a repartition
topic. What purpose does this repartition topic serve?

Sample record: {"key": "a", ...}

Stream: source.selectKey((k, v) -> k.toUpperCase()).groupByKey() //...

From my understanding, the repartition topic will guarantee that if we are
reading from partition N, the new key will be written to the same partition
N on the repartition topic, which allows the stream task to always handle
the same partition number all the way.

This seems relevant if the topology above is followed by:
/*...*/.toStream().leftJoin(kTable) //...
We are still processing the same partition number. If the source stream and
the kTable are co-partitioned, the repartition topic will be too.
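
To make "co-partitioned" concrete, here is a minimal sketch of what I mean.
It assumes a simplified stand-in for the partitioner: Kafka's default
partitioner hashes the serialized key with murmur2, whereas the hash below is
only illustrative, and the class and method names are hypothetical. The point
is that two topics with the same partition count and the same key-to-partition
function place a given key in the same partition number, so one stream task
can handle both sides of the join.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CoPartitionSketch {

    // Hypothetical stand-in for a key-based partitioner. Kafka's default uses
    // murmur2 over the serialized key bytes; Arrays.hashCode is used here only
    // to keep the sketch self-contained.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Mask the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    // True when the same key lands on the same partition number in both
    // topics, i.e. the same task would see the key on both sides of a join.
    static boolean sameTask(String key, int streamPartitions, int tablePartitions) {
        return partitionFor(key, streamPartitions) == partitionFor(key, tablePartitions);
    }

    public static void main(String[] args) {
        String key = "A";
        System.out.println("stream partition: " + partitionFor(key, 4));
        System.out.println("table partition:  " + partitionFor(key, 4));
        System.out.println("same task: " + sameTask(key, 4, 4));
    }
}
```

With equal partition counts, sameTask always holds for any key; with
different counts it is not guaranteed.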

However, in cases where the topology contains no other operations such as
joins, that repartition topic seems useless.

There's a thread on this subject
<http://mail-archives.apache.org/mod_mbox/kafka-users/201705.mbox/%3CCAJikTEUHR=r0ika6vlf_y+qajxg8f_q19og_-s+q-gozpqb...@mail.gmail.com%3E>,
specific to topics with only one partition. The argument there is that
repartitioning does not make sense for a single-partition topic. However,
even with multiple partitions, it may not make sense if you never join with
anything else, for the reasons above.
