We have a Spark program that iterates through a while loop over the same input DataFrame and produces a different result per iteration. In the Spark UI we can see that the workload is concentrated on a single core of the same worker. Is there any way to distribute the workload across different cores/workers, e.g. per iteration, since the iterations are independent of each other?
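
For illustration only, the loop is roughly shaped like the sketch below; the input DataFrame, the per-iteration function, and the iteration count are placeholders, not our actual code:

# illustrative placeholders only, not our real program
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
input_df = spark.range(1_000_000)   # stand-in for the real input DataFrame

def run_iteration(df: DataFrame, i: int) -> list:
    # placeholder per-iteration work; each iteration only reads df
    # and produces its own result, so iterations are independent
    return df.withColumn("iteration", F.lit(i)).groupBy("iteration").count().collect()

results = []
i = 0
while i < 10:                        # 10 is a placeholder iteration count
    results.append(run_iteration(input_df, i))   # one Spark job per iteration
    i += 1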

Certainly this type of problem could easily be parallelized with threads, e.g. by spawning a child thread for each iteration and waiting at the end of the loop, but threads apparently don't go beyond the worker boundary. We also thought about using MapReduce, but that doesn't seem straightforward, since map operates on rows rather than on whole DataFrames. Any thoughts/suggestions are highly appreciated.
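
For reference, here is a minimal sketch of the thread-per-iteration idea as we pictured it, submitting each iteration from the driver via a thread pool. All names (input_df, run_iteration, the pool size, the scheduler pool) are placeholders/assumptions, not our actual setup:

# illustrative sketch, not our real program
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
input_df = spark.range(1_000_000)   # stand-in for the real input DataFrame

def run_iteration(df: DataFrame, i: int) -> list:
    # placeholder per-iteration work, independent across iterations
    return df.withColumn("iteration", F.lit(i)).groupBy("iteration").count().collect()

def run_one(i: int) -> list:
    # optional: tag each iteration's jobs with a scheduler pool name
    # (only meaningful if FAIR scheduling is enabled; assumed here)
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", f"iteration-{i}")
    return run_iteration(input_df, i)

# spawn a thread per iteration and wait for all of them at the end of the loop
with ThreadPoolExecutor(max_workers=4) as executor:   # 4 is a placeholder
    results = list(executor.map(run_one, range(10)))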
