Re: How does MapWithStateRDD distribute the data

Ben Teeuwen Wed, 03 Aug 2016 09:33:12 -0700

Did you check the executor logs to see whether the Kafka offsets were pulled in
evenly across the 4 executors?


I recall a similar situation with uneven balancing from a Kafka stream, and I
ended up raising the amount of resources (RAM and cores). After that it
balanced out nicely. I don't understand the mechanism behind it, though.
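On the "some math" part of the question, here is a rough sketch of the key-to-partition step only. It assumes the state is keyed by Strings and partitioned with Spark's HashPartitioner. As far as I can tell from the 1.6 source, mapWithState uses HashPartitioner(sc.defaultParallelism) unless StateSpec.numPartitions or StateSpec.partitioner is set. HashPartitioner places a key in partition nonNegativeMod(key.hashCode, numPartitions), and the Python below emulates that, including Java's 32-bit String.hashCode. It does not model which executor ends up hosting each partition; that is decided by the task scheduler and locality preferences. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode(): h = 31*h + c with 32-bit signed overflow.

    Assumes BMP-only text (one UTF-16 code unit per character), e.g. ASCII keys.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h >= (1 << 31) else h


def non_negative_mod(x: int, mod: int) -> int:
    """Same result as Spark's Utils.nonNegativeMod for mod > 0.

    Java's % keeps the sign of x and Spark then adds mod to negative results;
    Python's % with a positive mod already returns a value in [0, mod).
    """
    return x % mod


def hash_partition(key: str, num_partitions: int) -> int:
    """Hypothetical helper: partition index a HashPartitioner would pick for a String key."""
    return non_negative_mod(java_string_hash(key), num_partitions)


def partition_sizes(keys: Iterable[str], num_partitions: int) -> List[int]:
    """Hypothetical helper: count how many of the given keys land in each partition."""
    counts = Counter(hash_partition(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]
```

For example, partition_sizes(["a", "b", "c", "d"], 16) puts one key in each of partitions 1 to 4 and leaves the other 12 empty, because those keys hash to 97 through 100. With only a handful of distinct keys (or keys whose hash codes are poorly spread), most state partitions stay empty no matter how many executors are available.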

> On Aug 3, 2016, at 4:42 PM, Soumitra Johri <soumitra.siddha...@gmail.com> wrote:
> 
> Hi,
> 
> I am running a streaming job with 4 executors and 16 cores so that each
> executor has two cores to work with. The input Kafka topic has 4 partitions.
> With this configuration I expected the MapWithStateRDD data to be evenly
> distributed across all executors; however, I see that it is distributed over
> only two executors. Sometimes all the data goes to a single executor.
> 
> How can this be explained? I'm pretty sure there is some math behind this
> behavior.
> 
> I am using a standard Spark 1.6.2 standalone cluster.
> 
> Thanks
> Soumitra



