Thanks,
As I understand it, for the Kafka case the way to do it is to define my
own kafka.Partitioner that is used when data is produced to Kafka, and then
reuse the same partitioning logic as a spark.Partitioner in the mapWithState spec.
I think I'll stick with that.
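The approach above could be sketched roughly like this. This is a hypothetical example, not code from the thread: the class name and the modulo hashing scheme are assumptions, and the Spark side must reproduce whatever the producer's kafka.Partitioner actually does, byte for byte.

```scala
import org.apache.spark.Partitioner

// Hypothetical sketch: mirror the partitioning used on the Kafka producer
// side in a spark.Partitioner, so mapWithState sees records already grouped
// the same way and can avoid an extra shuffle.
// Assumes the producer partitioned by hash(key) % numPartitions.
class KafkaMirrorPartitioner(partitions: Int) extends Partitioner {
  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = {
    // Must match the kafka.Partitioner logic exactly, or state will land
    // on different partitions than the incoming records.
    val h = key.hashCode % partitions
    if (h < 0) h + partitions else h
  }
}
```

The critical design point is that the two partitioners are kept in sync: if the producer's hashing or the topic's partition count changes, this class has to change with it.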
Thanks,
Andrii
2016-10-31 16:55 GMT+02:00 Cody Koeninger:
You may know that those streams share the same keys, but Spark doesn't
unless you tell it.
mapWithState takes a StateSpec, which should allow you to specify a partitioner.
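As a rough illustration of that, here is how a partitioner can be attached to a StateSpec. The state function and the HashPartitioner with 8 partitions are placeholders, not anything from the thread; in the Kafka scenario you would pass your custom partitioner instead.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.{State, StateSpec}

// Illustrative state function: keep a running sum per key.
def updateFn(key: String, value: Option[Long],
             state: State[Long]): Option[(String, Long)] = {
  val sum = value.getOrElse(0L) + state.getOption.getOrElse(0L)
  state.update(sum)
  Some((key, sum))
}

// StateSpec.partitioner tells Spark how state (and incoming records)
// should be partitioned for mapWithState.
val spec = StateSpec
  .function(updateFn _)
  .partitioner(new HashPartitioner(8)) // any spark.Partitioner works here

// Usage: stream.mapWithState(spec)
```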
On Mon, Oct 31, 2016 at 9:40 AM, Andrii Biletskyi wrote:
> Thanks for response,
>
> So as I understand there is no way to "t
If you call a transformation on an RDD using the same partitioner as that
RDD, no shuffle will occur. KafkaRDD doesn't have a partitioner; there's
no consistent partitioning scheme that works for all Kafka uses. You can
wrap each KafkaRDD with an RDD that has a custom partitioner that you write
to
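One way such a wrapper could look is sketched below. This is an assumption about the intended technique, not code from the thread: a thin RDD subclass that asserts a partitioner over an underlying RDD (e.g. a KafkaRDD) whose data is already laid out accordingly. If the assertion doesn't match how the data was actually produced, downstream operations will silently co-locate the wrong keys.

```scala
import org.apache.spark.{Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper: declares (does not create) a partitioner on the
// parent RDD. It forwards partitions and computation unchanged, so no data
// moves; only the partitioner metadata is added. The parent must have
// exactly p.numPartitions partitions, laid out the way p describes.
class AssumePartitionedRDD[K, V](prev: RDD[(K, V)], p: Partitioner)
    extends RDD[(K, V)](prev) {

  override val partitioner: Option[Partitioner] = Some(p)

  override def getPartitions: Array[Partition] = prev.partitions

  override def compute(split: Partition,
                       context: TaskContext): Iterator[(K, V)] =
    prev.iterator(split, context)
}
```

With the partitioner declared this way, a subsequent mapWithState using the same partitioner in its StateSpec can avoid shuffling the Kafka data.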