Re: MapWithState partitioning

2016-10-31 Thread Andrii Biletskyi
Thanks, As I understand for Kafka case the way to do it is to define my kafka.Partitioner that is used when data is produced to Kafka and just reuse this partitioner as spark.Partitioner in mapWithState spec. I think I'll stick with that. Thanks, Andrii 2016-10-31 16:55 GMT+02:00 Cody Koeninger

Re: MapWithState partitioning

2016-10-31 Thread Cody Koeninger
You may know that those streams share the same keys, but Spark doesn't unless you tell it. mapWithState takes a StateSpec, which should allow you to specify a partitioner. On Mon, Oct 31, 2016 at 9:40 AM, Andrii Biletskyi wrote: > Thanks for response, > > So as I understand there is no way to "t

Re: MapWithState partitioning

2016-10-31 Thread Cody Koeninger
If you call a transformation on an rdd using the same partitioner as that rdd, no shuffle will occur. KafkaRDD doesn't have a partitioner, there's no consistent partitioning scheme that works for all kafka uses. You can wrap each kafkardd with an rdd that has a custom partitioner that you write to