Did you check the executors logs to check whether the kafka offsets pulled in evenly over the 4 executors?
I recall a similar situation with such uneven balancing from a kafka stream, and ended up raising the amount of resources (RAM and cores). Then it nicely balanced out. I don’t understand the mechanism behind it though. > On Aug 3, 2016, at 4:42 PM, Soumitra Johri <soumitra.siddha...@gmail.com> > wrote: > > Hi, > > I am running a steaming job with 4 executors and 16 cores so that each > executor has two cores to work with. The input Kafka topic has 4 partitions. > With this given configuration I was expecting MapWithStateRDD to be evenly > distributed across all executors, how ever I see that it uses only two > executors on which MapWithStateRDD data is distributed. Sometimes the data > goes only to one executor. > > How can this be explained and pretty sure there would be some math to > understand this behavior. > > I am using the standard standalone 1.6.2 cluster. > > Thanks > Soumitra --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org