Dear Spark Community,

Hello! I am a graduate student at Yunnan Normal University. In my research I am working on heterogeneous hybrid parallel computing by combining PySpark with GPU acceleration. So far I have achieved single-GPU acceleration per node using PySpark, ctypes, and a dynamic library. I would now like to scale up the number of GPUs used per node by binding each executor to a specific GPU, but I have not found an API for this. Because I need more flexible GPU control, I cannot simply rely on Spark's built-in parameters such as spark.task.resource.gpu.amount. Inside the tasks, I attempted to retrieve the executor ID with TaskContext.get().getLocalProperty("spark.executor.id"), roughly as in the sketch below.
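For context, here is a minimal sketch of the pattern; the library name (libkernel.so), its set_device/run_kernel functions, and the two-GPUs-per-node constant are placeholders rather than my actual code:

    import ctypes
    from pyspark import SparkContext, TaskContext

    NUM_GPUS_PER_NODE = 2  # placeholder: assumed number of GPUs on each worker node

    def process_partition(rows):
        ctx = TaskContext.get()
        # The call in question; inside the task it comes back as 'Driver' or 'None':
        executor_id = ctx.getLocalProperty("spark.executor.id")
        # Intended mapping from executor ID to GPU index (cannot work while the ID is wrong):
        gpu_id = int(executor_id) % NUM_GPUS_PER_NODE if executor_id and executor_id.isdigit() else 0
        lib = ctypes.CDLL("./libkernel.so")     # placeholder name for my CUDA dynamic library
        lib.set_device(ctypes.c_int(gpu_id))    # placeholder function that binds this task to one GPU
        for row in rows:
            yield lib.run_kernel(ctypes.c_double(row))

    sc = SparkContext.getOrCreate()
    result = sc.parallelize(range(100), 8).mapPartitions(process_partition).collect()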
However, the returned value is always 'Driver' or 'None', which may be due to PySpark's underlying execution mechanism.

My questions are:

1. Does PySpark provide a reliable way to retrieve the executor ID from within a task?
2. Are there alternative approaches for achieving more flexible GPU binding?

I look forward to your valuable insights!

Warm regards,
Wu Chaowei