Are you using RAPIDS for GPU support in Spark? Here are a couple of
options you may want to try:
1. In addition to turning on dynamic allocation, you may also need to
enable the external shuffle service.
2. It sounds like you are using Kubernetes. In that case, you may also
need to turn on shuffle tracking, since the external shuffle service
is not available there.
3. The "stages" are controlled by the APIs. APIs for dynamic resource
requests (changing requirements between stages) do exist, but only for
RDDs (e.g. TaskResourceRequest and ExecutorResourceRequest).
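For items 1 and 2, a minimal config sketch (property names are the
standard Spark ones; the values are illustrative, not a recommendation):

```properties
# Item 1: dynamic allocation plus the external shuffle service
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true

# Item 2: on Kubernetes, where the external shuffle service is not
# available, use shuffle tracking instead of spark.shuffle.service.enabled
spark.dynamicAllocation.shuffleTracking.enabled=true

# Illustrative bounds only
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=50
```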
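For item 3, a sketch of the RDD-level API using the builder classes
(`ExecutorResourceRequests`, `TaskResourceRequests`,
`ResourceProfileBuilder`); the core counts, GPU amounts, and vendor
string are illustrative assumptions, not values from the thread:

```scala
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

object GpuStageProfile {
  // Build a ResourceProfile for the GPU stage; amounts are illustrative.
  def build(): org.apache.spark.resource.ResourceProfile = {
    // Executor-side requests: cores plus one GPU per executor.
    val execReqs = new ExecutorResourceRequests()
      .cores(4)
      .resource("gpu", 1, vendor = "nvidia.com")

    // Task-side request: one GPU per task.
    val taskReqs = new TaskResourceRequests().resource("gpu", 1.0)

    new ResourceProfileBuilder()
      .require(execReqs)
      .require(taskReqs)
      .build()
  }
}
```

The profile is then attached to the RDD feeding the GPU stage with
`rdd.withResources(GpuStageProfile.build())`, and dynamic allocation
requests executors matching that profile for the stages computing that
RDD.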
On 11/2/22 11:30 AM, Shay Elbaz wrote:
Hi,
Our typical applications need fewer *executors* for a GPU stage than
for a CPU stage. We are using dynamic allocation with stage level
scheduling, and Spark tries to maximize the number of executors also
during the GPU stage, causing a bit of resources chaos in the cluster.
This forces us to use a lower value for 'maxExecutors' in the first
place, at the cost of the CPU stages' performance, or to try to solve
this at the Kubernetes scheduler level, which is not straightforward
and doesn't feel like the right way to go.
Is there a way to effectively use fewer executors in Stage Level
Scheduling? The API does not seem to include such an option, but maybe
there is some more advanced workaround?
Thanks,
Shay