Bruce Robbins created SPARK-41035: ------------------------------------- Summary: Incorrect results or NPE when a literal is reused across distinct aggregations Key: SPARK-41035 URL: https://issues.apache.org/jira/browse/SPARK-41035 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.1, 3.2.2, 3.4.0 Reporter: Bruce Robbins
This query produces incorrect results: {noformat} select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2 from values (1, 2), (4, 5) as data(a, b) group by a; +---+----+----+ |a |cnt1|cnt2| +---+----+----+ |1 |1 |0 | |4 |1 |0 | +---+----+----+ {noformat} The values for {{cnt2}} should be 1 and 1 (not 0 and 0). If you change the literal used in the first aggregate function, the second aggregate function now works correctly: {noformat} select a, count(distinct 101) as cnt1, count(distinct b, 100) as cnt2 from values (1, 2), (4, 5) as data(a, b) group by a; +---+----+----+ |a |cnt1|cnt2| +---+----+----+ |1 |1 |1 | |4 |1 |1 | +---+----+----+ {noformat} The same bug causes the following query to get a NullPointerException: {noformat} select a, count(distinct 1), count_min_sketch(distinct b, 0.5d, 0.5d, 1) from values (1, 2), (4, 5) as data(a, b) group by a; {noformat} If your change the literal used in the first aggregation, then the query succeeds: {noformat} select a, count(distinct 2), count_min_sketch(distinct b, 0.5d, 0.5d, 1) from values (1, 2), (4, 5) as data(a, b) group by a; +---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |a |count(DISTINCT 2)|count_min_sketch(DISTINCT b, 0.5, 0.5, 1) | +---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |1 |1 |[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]| |4 |1 |[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]| +---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org