Xi Chen created SPARK-49479:
-------------------------------

             Summary: Non-daemon Timer prevents Spark driver JVM from stopping
                 Key: SPARK-49479
                 URL: https://issues.apache.org/jira/browse/SPARK-49479
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.5.2
            Reporter: Xi Chen


It has been observed that when using [Spark Torch 
Distributor|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.torch.distributor.TorchDistributor.html],
 the Spark driver pod can keep running after {_}spark.stop(){_} is called. Although 
the SparkContext has been shut down, the JVM is still running.

The cause is a non-daemon Timer thread named 
{_}BarrierCoordinator barrier epoch increment timer{_}, which prevents the 
driver JVM from exiting: the JVM does not exit on its own while any non-daemon thread is still alive.
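To illustrate the mechanism, here is a standalone Java sketch, assuming the coordinator's timer behaves like a plain java.util.Timer created without the daemon flag. It is not Spark code; the class and method names are hypothetical, and the thread name is only borrowed from this report. A Timer created with new Timer(name) runs its thread as a non-daemon thread, while new Timer(name, true) runs it as a daemon. A non-daemon Timer that is never cancelled keeps the JVM alive after main() returns.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not part of Spark.
public class TimerDaemonDemo {

    /**
     * Runs one task on a fresh java.util.Timer and reports whether the
     * Timer's background thread was a daemon thread.
     */
    static boolean timerThreadIsDaemon(boolean isDaemon) {
        // Thread name borrowed from the report, for illustration only.
        Timer timer = new Timer("BarrierCoordinator barrier epoch increment timer", isDaemon);
        AtomicBoolean observed = new AtomicBoolean();
        CountDownLatch ran = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                observed.set(Thread.currentThread().isDaemon());
                ran.countDown();
            }
        }, 0L);
        try {
            if (!ran.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timer task did not run");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for timer task", e);
        } finally {
            // Without cancel(), a non-daemon Timer thread keeps waiting for
            // new tasks and blocks JVM exit after main() returns.
            timer.cancel();
        }
        return observed.get();
    }

    public static void main(String[] args) {
        // new Timer(name) is equivalent to new Timer(name, false): non-daemon.
        System.out.println("isDaemon=false -> " + timerThreadIsDaemon(false)); // prints "isDaemon=false -> false"
        System.out.println("isDaemon=true  -> " + timerThreadIsDaemon(true));  // prints "isDaemon=true  -> true"
    }
}
```

Removing the timer.cancel() call in the non-daemon case reproduces the symptom from this report: main() returns, but the process does not exit.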

This issue is already fixed in the master branch, as a side effect of 
[https://github.com/apache/spark/pull/44718/files#diff-c2ca635ca0080bea12bcb5e25272a830019b3b150fc6c1cee0d268e0c12b82ceR82]. 
We should backport SPARK-46895 and SPARK-46698 to branch-3.5 
to fix it there as well.
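This report does not describe the linked change in detail. As a hedged illustration only, not the code from that pull request, one common way to avoid this class of bug is to run delayed work on a single-thread ScheduledExecutorService whose ThreadFactory marks the thread as a daemon, and to shut the executor down explicitly when the owning component stops. EpochTicker and its methods are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical component that increments an epoch counter after a delay.
public class EpochTicker implements AutoCloseable {
    private final AtomicInteger epoch = new AtomicInteger();
    private final ScheduledExecutorService scheduler;

    public EpochTicker(String threadName) {
        // Single scheduler thread, explicitly marked as a daemon so that it
        // cannot by itself keep the JVM alive after main() returns.
        this.scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread t = new Thread(runnable, threadName);
            t.setDaemon(true);
            return t;
        });
    }

    /** Schedules a single epoch increment after delayMillis milliseconds. */
    public void scheduleIncrement(long delayMillis) {
        scheduler.schedule(() -> { epoch.incrementAndGet(); }, delayMillis, TimeUnit.MILLISECONDS);
    }

    public int epoch() {
        return epoch.get();
    }

    /**
     * Stops the scheduler. shutdown() lets already-scheduled one-shot tasks
     * run (the default ScheduledThreadPoolExecutor policy), then the
     * scheduler thread exits.
     */
    @Override
    public void close() {
        scheduler.shutdown();
        try {
            if (!scheduler.awaitTermination(5, TimeUnit.SECONDS)) {
                scheduler.shutdownNow(); // give up on tasks still pending
            }
        } catch (InterruptedException e) {
            scheduler.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        EpochTicker ticker = new EpochTicker("epoch-increment-scheduler");
        ticker.scheduleIncrement(10);
        ticker.close(); // waits for the pending increment, then releases the thread
        System.out.println("epoch after close: " + ticker.epoch()); // prints "epoch after close: 1"
    }
}
```

The daemon flag and the explicit close() are independent safeguards: the flag keeps a forgotten scheduler from blocking JVM exit, while close() releases the thread as soon as the owning component is stopped.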



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
