[ https://issues.apache.org/jira/browse/SPARK-48871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-48871:
-----------------------------------

    Assignee: Carmen Kwan

> Fix INVALID_NON_DETERMINISTIC_EXPRESSIONS validation in CheckAnalysis
> ----------------------------------------------------------------------
>
>                 Key: SPARK-48871
>                 URL: https://issues.apache.org/jira/browse/SPARK-48871
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.2, 3.4.4
>            Reporter: Carmen Kwan
>            Assignee: Carmen Kwan
>            Priority: Major
>              Labels: pull-request-available
>
> I encountered the following exception when attempting to use a
> non-deterministic UDF in my query.
> {code:java}
> [info] org.apache.spark.sql.catalyst.ExtendedAnalysisException: [INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a deterministic expression, but the actual expression is "[some expression]".; line 2 pos 1
> [info] [some logical plan]
> [info]   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
> [info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:761)
> [info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:182)
> [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
> [info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:182)
> [info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:164)
> [info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188)
> [info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:160)
> [info]   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:150)
> [info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
> [info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:211)
> [info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
> [info]   at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:208)
> [info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
> [info]   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
> [info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
> [info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
> [info]   at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
> [info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
> [info]   at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
> [info]   at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
> [info]   at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
> [info]   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
> {code}
> The non-deterministic expression is safe to allow in my custom
> LogicalPlan, but it is rejected in the checkAnalysis phase. The CheckAnalysis
> rule is too strict, so reasonable use cases of non-deterministic
> expressions are rejected as well.
>
> To fix this, we could add a trait that logical plans can extend to implement
> a method that decides whether the operator may contain non-deterministic
> expressions, and check this method in checkAnalysis. This delegates the
> validation to frameworks that extend Spark, so we can allow-list more
> than just the few explicitly named logical plans (e.g. `Project`, `Filter`).
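
The fix proposed in the description (an opt-in trait that an operator implements, consulted by the analysis check) might be sketched roughly as follows. This is a minimal, self-contained model in plain Java, not Spark code: `Expr`, `Plan`, `AllowsNonDeterministic`, `CustomOp`, and `NonDeterministicCheck` are hypothetical names invented for illustration, and `Project` here is only a stand-in for an operator the existing check names explicitly. The actual change in Spark may be shaped differently.

```java
import java.util.List;

// Hypothetical stand-in for a Catalyst expression; only the determinism flag matters here.
final class Expr {
    final String sql;
    final boolean deterministic;
    Expr(String sql, boolean deterministic) {
        this.sql = sql;
        this.deterministic = deterministic;
    }
}

// Hypothetical stand-in for a logical plan operator.
interface Plan {
    List<Expr> expressions();
}

// Hypothetical opt-in interface (the "trait" from the proposal): an operator that
// implements it decides for itself whether non-deterministic expressions are acceptable.
interface AllowsNonDeterministic extends Plan {
    boolean allowNonDeterministicExpression();
}

// Stand-in for an operator that the check already allows by name.
final class Project implements Plan {
    private final List<Expr> exprs;
    Project(List<Expr> exprs) { this.exprs = exprs; }
    public List<Expr> expressions() { return exprs; }
}

// Stand-in for an operator defined by a framework that extends the engine.
final class CustomOp implements AllowsNonDeterministic {
    private final List<Expr> exprs;
    private final boolean allow;
    CustomOp(List<Expr> exprs, boolean allow) {
        this.exprs = exprs;
        this.allow = allow;
    }
    public List<Expr> expressions() { return exprs; }
    public boolean allowNonDeterministicExpression() { return allow; }
}

final class NonDeterministicCheck {
    // The operator's own answer wins if it opts in; otherwise fall back to the built-in list.
    static boolean isAllowed(Plan plan) {
        if (plan instanceof AllowsNonDeterministic) {
            return ((AllowsNonDeterministic) plan).allowNonDeterministicExpression();
        }
        return plan instanceof Project;
    }

    // Rejects the first non-deterministic expression found in an operator that does not allow them.
    static void check(Plan plan) {
        if (isAllowed(plan)) {
            return;
        }
        for (Expr e : plan.expressions()) {
            if (!e.deterministic) {
                throw new IllegalStateException(
                    "[INVALID_NON_DETERMINISTIC_EXPRESSIONS] The operator expects a deterministic "
                        + "expression, but the actual expression is \"" + e.sql + "\".");
            }
        }
    }
}
```

With this shape, `NonDeterministicCheck.check(new CustomOp(List.of(new Expr("rand()", false)), true))` passes, while the same operator constructed with `false` is rejected. The point of the design is that the check no longer needs to know about every operator: a framework adds its own operator and answers the question locally.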