Dear Spark Community,
Hello!
I am a graduate student at Yunnan Normal University. In my research, I am working on heterogeneous hybrid parallel computing by integrating PySpark with GPU acceleration. So far, I have achieved single-GPU acceleration per node using PySpark, ctypes, and dynamic libraries.
However, I would now like to scale up the number of GPUs used per node. My plan is to bind each executor to a specific GPU, but I have not found an API for this. Because I need more flexible GPU control, I cannot directly rely on Spark's built-in parameters such as 
spark.task.resource.gpu.amount.
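For context, my understanding is that the built-in mechanism would look roughly like the sketch below once spark.task.resource.gpu.amount and GPU discovery are configured, but it assigns GPU addresses per task automatically rather than letting me choose the binding myself (use_assigned_gpu is just an illustrative name):

    from pyspark import TaskContext

    def use_assigned_gpu(rows):
        # Spark exposes the GPU addresses it scheduled for this task
        ctx = TaskContext.get()
        gpu_addresses = ctx.resources()["gpu"].addresses  # e.g. ["0"]
        # ... run the CUDA work on gpu_addresses[0] ...
        for row in rows:
            yield row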
Within the tasks, I attempted to retrieve the executor ID using:


TaskContext.get().getLocalProperty("spark.executor.id")


However, the returned value is always 'Driver' or 'None', possibly because of how 
PySpark executes Python tasks under the hood.
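The workaround I have been considering is roughly the following sketch: derive a GPU index from the partition ID inside mapPartitions and set CUDA_VISIBLE_DEVICES before the dynamic library initializes CUDA. NUM_GPUS_PER_NODE, process_partition, and the library name are placeholders, and I have not verified that this binding is robust:

    import os
    from pyspark import TaskContext
    from pyspark.sql import SparkSession

    NUM_GPUS_PER_NODE = 2  # placeholder: actual GPU count per node

    def process_partition(rows):
        # TaskContext.get() is only non-None inside a task on an executor
        ctx = TaskContext.get()
        gpu_id = ctx.partitionId() % NUM_GPUS_PER_NODE
        # Must be set before the native library initializes CUDA
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
        # lib = ctypes.CDLL("libmykernels.so")  # placeholder dynamic library
        for row in rows:
            yield row  # placeholder for the GPU-accelerated computation

    spark = SparkSession.builder.getOrCreate()
    rdd = spark.sparkContext.parallelize(range(100), NUM_GPUS_PER_NODE * 4)
    result = rdd.mapPartitions(process_partition).collect()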
My questions are:
Does PySpark provide a reliable way to retrieve the executor ID from within a task?
Are there alternative approaches that allow more flexible GPU binding?
I look forward to your insights!
Warm regards,
Wu Chaowei
