Ankur Dave created SPARK-45616:
----------------------------------

             Summary: Usages of ParVector are unsafe because it does not 
propagate ThreadLocals or SparkSession
                 Key: SPARK-45616
                 URL: https://issues.apache.org/jira/browse/SPARK-45616
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL, Tests
    Affects Versions: 3.5.0
            Reporter: Ankur Dave
            Assignee: Ankur Dave


CastSuiteBase and ExpressionInfoSuite use ParVector.foreach() to run Spark SQL 
queries in parallel. They incorrectly assume that each parallel operation will 
inherit the main thread’s active SparkSession. This is only true when the 
parallel operations run in freshly created threads. If other code has already 
run parallel operations before Spark was started, there may be existing 
threads that do not have an active SparkSession. In that case, these tests 
fail with NullPointerExceptions when creating SparkPlans or running SQL 
queries.
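
The failure mode can be reproduced in miniature with only the JVM standard 
library. The sketch below is an illustration, not Spark code: it assumes the 
active session behaves like a value stored in an InheritableThreadLocal, and 
the names InheritedThreadLocalDemo and activeSession, as well as the String 
used in place of a SparkSession, are hypothetical.

```scala
import java.util.concurrent.{Callable, Executors}

object InheritedThreadLocalDemo {
  // Hypothetical stand-in for the per-thread "active session" slot.
  val activeSession = new InheritableThreadLocal[String]

  /** Returns (value seen by a pre-existing pool thread, value seen by a fresh thread). */
  def run(): (String, String) = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      // Force the worker thread to be created BEFORE the value is set,
      // mimicking parallel work that ran before Spark was started.
      pool.submit(new Callable[Unit] { def call(): Unit = () }).get()

      activeSession.set("session-1")

      // Reused pool thread: it was created before set(), so it inherited nothing.
      val seenByPooled = pool.submit(new Callable[String] {
        def call(): String = activeSession.get()
      }).get()

      // Fresh thread: it copies the parent's value when it is constructed.
      var seenByFresh: String = null
      val t = new Thread(() => { seenByFresh = activeSession.get() })
      t.start()
      t.join()

      (seenByPooled, seenByFresh)
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val (pooled, fresh) = run()
    println(s"pre-existing pool thread sees: $pooled")
    println(s"freshly created thread sees:   $fresh")
  }
}
```

The pre-existing thread observes null while the fresh thread observes the 
parent's value, which matches the NullPointerException symptom described 
above.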

The fix is to use the existing method ThreadUtils.parmap(). This method creates 
fresh threads that inherit the current active SparkSession, and it propagates 
the Spark ThreadLocals.
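
To illustrate why fresh threads avoid the problem, here is a minimal sketch of 
a helper that builds a new pool on every call. It is not the implementation of 
ThreadUtils.parmap(); parmapSketch, FreshPoolParmapSketch and activeSession 
are hypothetical names, and the String again stands in for a SparkSession.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object FreshPoolParmapSketch {
  // Hypothetical stand-in for the per-thread "active session" slot.
  val activeSession = new InheritableThreadLocal[String]

  // Hypothetical helper, NOT Spark's ThreadUtils.parmap. A new pool is built
  // on every call, so each worker thread is created by the calling thread and
  // copies that thread's inheritable thread-local values.
  def parmapSketch[I, O](in: Seq[I], maxThreads: Int)(f: I => O): Seq[O] = {
    val pool = Executors.newFixedThreadPool(maxThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // Tasks are submitted from the calling thread, which is also the thread
      // that causes the pool to create its workers.
      val futures = in.map(x => Future(f(x)))
      // Collect results in input order.
      futures.map(fut => Await.result(fut, Duration.Inf))
    } finally {
      pool.shutdown()
    }
  }

  def run(): Seq[String] = {
    activeSession.set("session-1")
    parmapSketch(Seq(1, 2, 3), maxThreads = 2) { i =>
      // Every worker sees the caller's value because it was created after set().
      s"${activeSession.get()}:$i"
    }
  }
}
```

Creating threads per call costs more than reusing a shared pool, but it 
guarantees that no worker predates the caller's thread-local state.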

We should also add a scalastyle warning against use of ParVector.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

