I've run into a situation where it appears that foreachPartition is running 
on only one of my executors.

I have a small cluster (2 executors with 8 cores each).

When I run the job against a small file (16 partitions), I can see that all 
16 partitions are initialized, but they all appear to be initialized on only 
one executor. All of the work then runs on that one executor, even though 
there are 16 partitions. This seems odd, but at least it works; I'm not sure 
why the other executor was never used.
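
For reference, here is roughly the shape of the job. This is only a sketch: 
the input path, the partition count argument, and initConnection() are 
placeholders rather than my actual code.

import org.apache.spark.{SparkConf, SparkContext}

object ForeachPartitionInit {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("foreachPartition-init"))

    // Placeholder path; 16 matches the partition count I see in the UI.
    val lines = sc.textFile("hdfs:///path/to/input", 16)

    // Per-partition initialization -- this is the step that appears to
    // run entirely on one executor.
    lines.foreachPartition { iter =>
      // initConnection()  // placeholder for the real per-partition setup
      iter.foreach(_ => ())
    }

    // Subsequent work on the same data -- with the larger file this is
    // what ends up spread across both executors.
    val lengths = lines.map(_.length)  // placeholder transformation
    println(lengths.count())

    sc.stop()
  }
}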

However, when I run a larger file (again with 16 partitions), the 16 
partitions are once again all initialized on the same single executor, but 
this time the subsequent work is spread across both executors. This of course 
causes problems, because the second executor never ran any of the 
per-partition initialization; every partition was initialized on the first 
executor only.
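
To confirm where the partition tasks are landing, a check along these lines 
(again, just a sketch, where lines is the RDD from the snippet above) prints 
the host that each partition task actually runs on:

import java.net.InetAddress

val placement = lines.mapPartitionsWithIndex { (idx, iter) =>
  // Report the host this partition's task ran on and how many records it saw.
  Iterator((idx, InetAddress.getLocalHost.getHostName, iter.size))
}.collect()

placement.foreach { case (idx, host, n) =>
  println(s"partition $idx ran on $host ($n records)")
}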

Does anyone have any suggestions for where I should investigate?  Has anyone 
else seen something like this before?  Any thoughts/insights would be 
appreciated.  I'm using the standalone cluster manager, the cluster was 
started with the spark-ec2 scripts, and I'm submitting the job with 
spark-submit.
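
For completeness, the submit command looks roughly like this (the master URL, 
class name, and jar name are placeholders):

spark-submit \
  --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 \
  --class ForeachPartitionInit \
  my-job.jar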

Thanks.

Darin.
