We have a Spark program that iterates through a while loop over the same input DataFrame and produces a different result per iteration. In the Spark UI we can see that the workload is concentrated on a single core of the same worker. Is there any way to distribute the workload across different cores/workers, e.g. per iteration, since the iterations are independent of each other?
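
For illustration only, the loop is roughly shaped like the sketch below; the input DataFrame, the per-iteration function, and the iteration count are placeholders, not our actual code:

# illustrative placeholders only, not our real program
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
input_df = spark.range(1_000_000)   # stand-in for the real input DataFrame

def run_iteration(df: DataFrame, i: int) -> list:
    # placeholder per-iteration work; each iteration only reads df
    # and produces its own result, so iterations are independent
    return df.withColumn("iteration", F.lit(i)).groupBy("iteration").count().collect()

results = []
i = 0
while i < 10:                        # 10 is a placeholder iteration count
    results.append(run_iteration(input_df, i))   # one Spark job per iteration
    i += 1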

Certainly this type of problem could easily be parallelized with threads, e.g. by spawning a child thread for each iteration and waiting at the end of the loop, but threads apparently don't go beyond the worker boundary. We also thought about using MapReduce, but that doesn't seem straightforward, since map operates on rows rather than on whole DataFrames. Any thoughts/suggestions are highly appreciated.
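
For reference, here is a minimal sketch of the thread-per-iteration idea as we pictured it, submitting each iteration from the driver via a thread pool. All names (input_df, run_iteration, the pool size, the scheduler pool) are placeholders/assumptions, not our actual setup:

# illustrative sketch, not our real program
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
input_df = spark.range(1_000_000)   # stand-in for the real input DataFrame

def run_iteration(df: DataFrame, i: int) -> list:
    # placeholder per-iteration work, independent across iterations
    return df.withColumn("iteration", F.lit(i)).groupBy("iteration").count().collect()

def run_one(i: int) -> list:
    # optional: tag each iteration's jobs with a scheduler pool name
    # (only meaningful if FAIR scheduling is enabled; assumed here)
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", f"iteration-{i}")
    return run_iteration(input_df, i)

# spawn a thread per iteration and wait for all of them at the end of the loop
with ThreadPoolExecutor(max_workers=4) as executor:   # 4 is a placeholder
    results = list(executor.map(run_one, range(10)))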
