[ https://issues.apache.org/jira/browse/SPARK-48871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-48871:
-----------------------------------

    Assignee: Carmen Kwan

> Fix INVALID_NON_DETERMINISTIC_EXPRESSIONS validation in CheckAnalysis 
> ----------------------------------------------------------------------
>
>                 Key: SPARK-48871
>                 URL: https://issues.apache.org/jira/browse/SPARK-48871
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.2, 3.4.4
>            Reporter: Carmen Kwan
>            Assignee: Carmen Kwan
>            Priority: Major
>              Labels: pull-request-available
>
> I encountered the following exception when attempting to use a
> non-deterministic UDF in my query.
> {code:java}
> [info] org.apache.spark.sql.catalyst.ExtendedAnalysisException: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a deterministic expression, but the actual expression is "[some expression]".; line 2 pos 1
> [info] [some logical plan]
> [info] at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
> [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:761)
> [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:182)
> [info] at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
> [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:182)
> [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:164)
> [info] at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188)
> [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:160)
> [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:150)
> [info] at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
> [info] at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:211)
> [info] at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
> [info] at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:208)
> [info] at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
> [info] at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
> [info] at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
> [info] at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
> [info] at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
> [info] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
> [info] at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
> [info] at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
> [info] at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
> [info] at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
> {code}
> The non-deterministic expression is safe to allow in my custom LogicalPlan,
> but the checkAnalysis phase rejects it. The check in CheckAnalysis is too
> strict: it also rejects reasonable uses of non-deterministic expressions.
> To fix this, we could add a trait that logical plans can extend, with a
> method that decides whether the operator may contain non-deterministic
> expressions, and have checkAnalysis consult that method. This delegates the
> validation to frameworks that extend Spark, so they can allowlist more
> operators than the few logical plans named explicitly in the check (e.g.
> `Project`, `Filter`).
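
A minimal sketch of the proposed delegation, written in Java with an interface standing in for the Scala trait. `Expr`, `Plan`, `AllowsNonDeterministicExpressions`, `CheckNonDeterministic` and the plan records are hypothetical, simplified stand-ins, not Catalyst's actual classes and not necessarily the API the eventual fix uses. The allowlist contains only the two operators the issue cites as examples.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified stand-ins for Catalyst's Expression and LogicalPlan.
// They are NOT Spark's real classes; they only model what the check needs.
interface Expr {
    boolean deterministic();
}

record Lit(int value) implements Expr {
    public boolean deterministic() { return true; }
}

// Models a non-deterministic UDF call such as the one in the report.
record NonDetUdf(String name) implements Expr {
    public boolean deterministic() { return false; }
}

interface Plan {
    String nodeName();
    List<Expr> expressions();
}

// Hypothetical opt-in interface standing in for the proposed trait: an operator
// implementing it decides for itself whether it may host non-deterministic expressions.
interface AllowsNonDeterministicExpressions {
    boolean allowNonDeterministicExpressions();
}

// A built-in operator on the hard-coded allowlist.
record Project(List<Expr> expressions) implements Plan {
    public String nodeName() { return "Project"; }
}

// A custom operator from an extension that does not opt in.
record LegacyPlan(List<Expr> expressions) implements Plan {
    public String nodeName() { return "LegacyPlan"; }
}

// A custom operator from an extension that opts in through the interface.
record CustomPlan(List<Expr> expressions, boolean allowsNonDet)
        implements Plan, AllowsNonDeterministicExpressions {
    public String nodeName() { return "CustomPlan"; }
    public boolean allowNonDeterministicExpressions() { return allowsNonDet; }
}

final class CheckNonDeterministic {
    // Simplified allowlist: only the two operators the issue names as examples.
    private static final Set<String> BUILT_IN_ALLOWED = Set.of("Project", "Filter");

    private CheckNonDeterministic() {}

    static boolean allowsNonDeterministic(Plan plan) {
        if (BUILT_IN_ALLOWED.contains(plan.nodeName())) {
            return true;
        }
        // Proposed fallback: ask operators that opt in via the interface.
        return plan instanceof AllowsNonDeterministicExpressions opt
                && opt.allowNonDeterministicExpressions();
    }

    // Returns an error message modeled on INVALID_NON_DETERMINISTIC_EXPRESSIONS,
    // or Optional.empty() if the operator passes the check.
    static Optional<String> check(Plan plan) {
        if (allowsNonDeterministic(plan)) {
            return Optional.empty();
        }
        for (Expr e : plan.expressions()) {
            if (!e.deterministic()) {
                return Optional.of("[INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator "
                        + plan.nodeName() + " expects a deterministic expression, "
                        + "but the actual expression is \"" + e + "\".");
            }
        }
        return Optional.empty();
    }
}

class Demo {
    public static void main(String[] args) {
        List<Expr> exprs = List.of(new Lit(1), new NonDetUdf("my_udf"));
        // Passes: Project is on the built-in allowlist.
        System.out.println(CheckNonDeterministic.check(new Project(exprs)));
        // Fails: LegacyPlan is neither allowlisted nor opted in.
        System.out.println(CheckNonDeterministic.check(new LegacyPlan(exprs)));
        // Passes: CustomPlan opts in, mirroring the reporter's custom LogicalPlan.
        System.out.println(CheckNonDeterministic.check(new CustomPlan(exprs, true)));
    }
}
```

Keeping the built-in allowlist and consulting the opt-in interface only as a fallback means existing operators behave as before, while an extension's own operator (like `CustomPlan` here) decides for itself whether non-deterministic expressions are safe.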



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
