Hi,

I have an iterative program written in Spark that I had previously tested under a
snapshot version of Spark 0.8. After I ported it to the 0.8 release
version, I see a performance drop on large datasets. I am wondering if
there is any clue as to why?

I monitored the number of partitions on each machine (by looking at
DAGScheduler.getCacheLocs). I observed that a machine may have 30
partitions in one iteration but fewer than 10 partitions in the
next iteration. This is something I did not observe in the older version.
Thus I am wondering whether the release version does task stealing
more aggressively (for better dynamic load balancing?).

Thank you!

Best Regards,
Wenlei

Reply via email to