Bruce Robbins created SPARK-38666:
-------------------------------------

             Summary: Missing aggregate filter checks
                 Key: SPARK-38666
                 URL: https://issues.apache.org/jira/browse/SPARK-38666
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Bruce Robbins


h2. Window function in filter
{noformat}
select sum(a) filter (where nth_value(a, 2) over (order by b) > 1)
from (select 1 a, '2' b);
{noformat}
This query should produce an analysis error, but instead produces a stack 
overflow:
{noformat}
java.lang.StackOverflowError: null
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1(TreeNode.scala:305) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1$adapted(TreeNode.scala:305) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:264) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:265) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:265) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
        at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library.jar:?]
...
{noformat}
h2. Non-boolean filter expression
{noformat}
select sum(a) filter (where a) from (select 1 a, '2' b);
{noformat}
This query should produce an analysis error, but instead causes a projection 
compilation error or whole-stage codegen error (depending on the datatype of 
the expression):
{noformat}
22/03/26 17:19:03 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 50, Column 6: Not a boolean expression
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 50, Column 6: Not a boolean expression
        at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12021) ~[janino-3.0.16.jar:?]
        at org.codehaus.janino.UnitCompiler.compileBoolean2(UnitCompiler.java:4049) ~[janino-3.0.16.jar:?]
        at org.codehaus.janino.UnitCompiler.access$6300(UnitCompiler.java:226) ~[janino-3.0.16.jar:?]
        at org.codehaus.janino.UnitCompiler$14.visitIntegerLiteral(UnitCompiler.java:4016) ~[janino-3.0.16.jar:?]
...
22/03/26 17:19:05 WARN MutableProjection: Expr codegen error and falling back to interpreter mode
java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 15: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 15: Not a boolean expression
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) ~[guava-14.0.1.jar:?]
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) ~[guava-14.0.1.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-14.0.1.jar:?]
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) ~[guava-14.0.1.jar:?]
        at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) ~[guava-14.0.1.jar:?]
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) ~[guava-14.0.1.jar:?]
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) ~[guava-14.0.1.jar:?]
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257) ~[guava-14.0.1.jar:?]
        at com.google.common.cache.LocalCache.get(LocalCache.java:4000) ~[guava-14.0.1.jar:?]
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004) ~[guava-14.0.1.jar:?]
...
NULL
Time taken: 5.397 seconds, Fetched 1 row(s)
{noformat}
Interestingly, despite these errors, the query still returns a result (NULL).
h2. Aggregate expression in filter expression
{noformat}
select max(b) filter (where max(a) > 1) from (select 1 a, '2' b);
{noformat}
This query should produce an analysis error, but instead causes a projection 
compilation error or whole-stage codegen error (depending on the datatype of 
the expression being aggregated):
{noformat}
22/03/26 17:26:38 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 2) (10.0.0.106 executor driver): org.apache.spark.SparkUnsupportedOperationException: Cannot evaluate expression: max(1)
        at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:79)
        at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:344)
        at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:343)
        at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.eval(interfaces.scala:99)
        at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:593)
        at org.apache.spark.sql.catalyst.expressions.If.eval(conditionalExpressions.scala:68)
...
{noformat}
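All three cases share one gap: nothing rejects an invalid predicate in an aggregate's FILTER (WHERE ...) clause before planning. The following is a minimal sketch of the three checks this report implies are missing, written in Python against a toy expression tree. `Expr`, `check_aggregate_filter`, the node kinds, and the error messages are all hypothetical; they are not Spark's Catalyst API or its actual analyzer code (which is written in Scala).

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical toy expression node; NOT Spark's Catalyst Expression class."""
    kind: str                          # e.g. "column", "literal", "gt", "window", "agg"
    data_type: str                     # result type, e.g. "int", "string", "boolean"
    children: Tuple["Expr", ...] = ()

    def walk(self) -> Iterator["Expr"]:
        """Yield this node and all of its descendants (pre-order)."""
        yield self
        for child in self.children:
            yield from child.walk()


def check_aggregate_filter(filter_expr: Expr) -> List[str]:
    """Return analysis-style errors for an aggregate FILTER (WHERE ...) predicate.

    Hypothetical helper modeling the three checks this report says are missing.
    """
    errors: List[str] = []
    # Check 1: the predicate itself must evaluate to a boolean.
    if filter_expr.data_type != "boolean":
        errors.append(f"FILTER predicate must be boolean, got {filter_expr.data_type}")
    # Checks 2 and 3: no window or aggregate function anywhere in the predicate tree.
    kinds = {node.kind for node in filter_expr.walk()}
    if "window" in kinds:
        errors.append("FILTER predicate must not contain a window function")
    if "agg" in kinds:
        errors.append("FILTER predicate must not contain an aggregate function")
    return errors


# The three FILTER predicates from this report, modeled as toy trees.
a = Expr("column", "int")
one = Expr("literal", "int")
window_case = Expr("gt", "boolean", (Expr("window", "int", (a,)), one))     # nth_value(a, 2) over (...) > 1
non_boolean_case = a                                                          # where a
aggregate_case = Expr("gt", "boolean", (Expr("agg", "int", (a,)), one))     # max(a) > 1

for name, pred in [("window", window_case),
                   ("non-boolean", non_boolean_case),
                   ("aggregate", aggregate_case)]:
    print(name, check_aggregate_filter(pred))
```

A plain predicate such as `a > 1` passes all three checks. Rejecting the other three up front would let each query fail with a clear analysis error instead of the StackOverflowError, codegen failure, or task failure shown above.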



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
