[PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-14 Thread via GitHub
pan3793 opened a new pull request, #44352:
URL: https://github.com/apache/spark/pull/44352

### What changes were proposed in this pull request?

This PR enhances the analyzer to properly handle the following plan pattern:

```
Sort
- Filter
  - Aggregate
```
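To make the pattern concrete, here is a minimal toy model in Python. It assumes simplified plan nodes whose expressions are plain strings. `Sort`, `Filter`, `Aggregate`, `Project` and `resolve_sort_over_having` are hypothetical names for illustration only. They are not Spark's Catalyst classes or the code changed in `ColumnResolutionHelper.scala`. The sketch shows one way an analyzer could resolve an ORDER BY expression that is missing from the aggregate output: compute it in the `Aggregate` as a hidden column, sort on it, and project it away.

```python
from dataclasses import dataclass

# Hypothetical toy plan nodes (not Spark's Catalyst classes).
# Expressions are plain strings such as "c1" or "sum(c2)".


@dataclass
class Aggregate:
    grouping: list
    aggregates: list  # expressions this node outputs

    def output(self):
        return list(self.aggregates)


@dataclass
class Filter:
    condition: str
    child: object

    def output(self):
        return self.child.output()  # a filter passes its child's columns through


@dataclass
class Sort:
    order: list
    child: object

    def output(self):
        return self.child.output()


@dataclass
class Project:
    columns: list
    child: object

    def output(self):
        return list(self.columns)


def resolve_sort_over_having(plan):
    """Rewrite Sort(Filter(Aggregate)) when ORDER BY uses expressions the
    Aggregate does not output: add them as hidden aggregate columns, then
    restore the original output with a Project on top."""
    if not (isinstance(plan, Sort)
            and isinstance(plan.child, Filter)
            and isinstance(plan.child.child, Aggregate)):
        return plan  # not the Sort - Filter - Aggregate shape
    having, agg = plan.child, plan.child.child
    original_output = agg.output()
    missing = [e for e in plan.order if e not in original_output]
    if not missing:
        return plan  # every sort key already resolves against the child
    new_agg = Aggregate(agg.grouping, agg.aggregates + missing)
    new_sort = Sort(plan.order, Filter(having.condition, new_agg))
    return Project(original_output, new_sort)


plan = Sort(["sum(c2)"], Filter("c1 > 0", Aggregate(["c1"], ["c1"])))
fixed = resolve_sort_over_having(plan)
print(fixed.output())                    # ['c1']
print(fixed.child.child.child.output())  # ['c1', 'sum(c2)']
```

For a query shaped like `SELECT c1 FROM t GROUP BY c1 HAVING c1 > 0 ORDER BY sum(c2)`, the toy `Aggregate` gains a hidden `sum(c2)` column while the top-level output stays `['c1']`. The HAVING condition is deliberately kept on the grouping column so the toy does not have to resolve it as well.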

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-14 Thread via GitHub
pan3793 commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1855834233 cc @cloud-fan @wangyum @yaooqinn @ulysses-you

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-14 Thread via GitHub
ulysses-you commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1427477125 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -323,12 +323,14 @@ trait ColumnResolutionHelper extends Lo

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-14 Thread via GitHub
pan3793 commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1427525939 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -323,12 +323,14 @@ trait ColumnResolutionHelper extends Loggin

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-14 Thread via GitHub
cloud-fan commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1427552666 ## sql/core/src/test/resources/sql-tests/analyzer-results/udf/postgreSQL/udf-select_having.sql.out: ## @@ -102,12 +102,11 @@ Project [udf(b)#x, udf(c)#x] SELECT udf(

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-14 Thread via GitHub
pan3793 commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1427566432 ## sql/core/src/test/resources/sql-tests/analyzer-results/udf/postgreSQL/udf-select_having.sql.out: ## @@ -102,12 +102,11 @@ Project [udf(b)#x, udf(c)#x] SELECT udf(b)

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-14 Thread via GitHub
cloud-fan commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1427591478 ## sql/core/src/test/resources/sql-tests/analyzer-results/udf/postgreSQL/udf-select_having.sql.out: ## @@ -102,12 +102,11 @@ Project [udf(b)#x, udf(c)#x] SELECT udf(

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-14 Thread via GitHub
pan3793 commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1427596518 ## sql/core/src/test/resources/sql-tests/analyzer-results/udf/postgreSQL/udf-select_having.sql.out: ## @@ -102,12 +102,11 @@ Project [udf(b)#x, udf(c)#x] SELECT udf(b)

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-15 Thread via GitHub
ulysses-you commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1427717512 ## sql/core/src/test/resources/sql-tests/analyzer-results/udf/postgreSQL/udf-select_having.sql.out: ## @@ -102,12 +102,11 @@ Project [udf(b)#x, udf(c)#x] SELECT ud

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-15 Thread via GitHub
pan3793 commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1427800437 ## sql/core/src/test/resources/sql-tests/analyzer-results/udf/postgreSQL/udf-select_having.sql.out: ## @@ -102,12 +102,11 @@ Project [udf(b)#x, udf(c)#x] SELECT udf(b)

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-15 Thread via GitHub
cloud-fan commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1858490137 does this query work in other databases?

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-15 Thread via GitHub
cloud-fan commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1428490301 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala: ## @@ -1415,6 +1415,20 @@ class AnalysisSuite extends AnalysisTest with Mat

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-15 Thread via GitHub
cloud-fan commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1428490859 ## sql/core/src/test/resources/sql-tests/inputs/having.sql: ## @@ -33,3 +33,6 @@ SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY GROUPING SETS(t.c1) HAVING t. SE

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-16 Thread via GitHub
beliefer commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1428769800 ## sql/core/src/test/resources/sql-tests/inputs/having.sql: ## @@ -33,3 +33,6 @@ SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY GROUPING SETS(t.c1) HAVING t. SEL

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-16 Thread via GitHub
pan3793 commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1858818617

> does this query work in other databases?

Actually, a colleague of mine reported that the same SQL works on Impala but not on Spark. I will investigate other popular RDBMSs.

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-18 Thread via GitHub
cloud-fan commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1860415356 In the SQL standard, ORDER BY can only reference columns in the SELECT list, but many databases extend it to support other cases. I think in Spark the extension is we can push down grou
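The difference between the standard form and the extension can be illustrated outside Spark. The sketch below assumes SQLite through Python's built-in `sqlite3` module (SQLite is not one of the engines discussed in this thread) and a hypothetical table `t(c1, c2)` with made-up rows. It runs the extended query, whose ORDER BY references an aggregate that is absent from the SELECT list, next to a standard-style equivalent that selects the aggregate explicitly, orders by its alias, and drops the extra column afterwards. Both return the same rows for this data.

```python
import sqlite3

# Hypothetical sample data: table t(c1, c2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER, c2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, 5), (1, 5), (2, 1), (3, -4)])

# Extended form: ORDER BY uses an aggregate that is not in the SELECT list.
extended = conn.execute(
    "SELECT c1 FROM t GROUP BY c1 HAVING sum(c2) > 0 ORDER BY sum(c2)"
).fetchall()

# Standard-style form: put the aggregate in the SELECT list, order by its
# alias, then drop the extra column on the client side.
with_key = conn.execute(
    "SELECT c1, sum(c2) AS s FROM t GROUP BY c1 HAVING sum(c2) > 0 ORDER BY s"
).fetchall()
standard = [(c1,) for c1, _ in with_key]

print(extended)  # [(2,), (1,)]
print(standard)  # [(2,), (1,)]
conn.close()
```

Group `3` is removed by HAVING (its sum is -4), and the remaining groups are ordered by their sums (1, then 10), so both queries yield `[(2,), (1,)]`.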

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-18 Thread via GitHub
cloud-fan commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1860426324 I think we should dig into https://github.com/apache/spark/pull/44352#discussion_r1427552666 more. It seems we have an optimization that if the ORDER BY expression directly matches som

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-18 Thread via GitHub
pan3793 commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1860450330

> I think in Spark the extension is we can push down grouping expressions and aggregate functions from ORDER BY to SELECT.

@cloud-fan I believe Spark already supports it when HAVING

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-18 Thread via GitHub
pan3793 commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1860806255

> It seems we have an optimization that if the ORDER BY expression directly matches something from the SELECT list, we replace it with AttributeReference. Can you find out where the optimi
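The substitution described in the quoted comment can be sketched as follows. This is a hypothetical Python illustration, not the Spark code pan3793 was asked to locate: SELECT items are `(expression, alias)` pairs, and an ORDER BY expression that matches a SELECT item becomes a reference to that item's alias. The matching rule is an assumption: expressions are compared as strings after removing whitespace and lowercasing, whereas a real analyzer would compare expression trees.

```python
def normalize(expr):
    # Toy normalization (assumption): ignore whitespace and letter case.
    return "".join(expr.split()).lower()


def replace_with_select_refs(order_by, select_list):
    """Return ("ref", alias) for ORDER BY expressions that match a SELECT item,
    and ("expr", original) for those that must be resolved some other way."""
    alias_by_expr = {normalize(expr): alias for expr, alias in select_list}
    result = []
    for expr in order_by:
        key = normalize(expr)
        if key in alias_by_expr:
            result.append(("ref", alias_by_expr[key]))
        else:
            result.append(("expr", expr))
    return result


select_list = [("c1", "c1"), ("sum(c2)", "s")]
print(replace_with_select_refs(["SUM( c2 )", "max(c2)"], select_list))
# [('ref', 's'), ('expr', 'max(c2)')]
```

In this toy, `max(c2)` is left as an expression because nothing in the SELECT list matches it; that is the kind of sort key that would instead need the aggregate push-down discussed earlier in the thread.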

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-19 Thread via GitHub
pan3793 commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1862325392 @cloud-fan @beliefer I have updated the test cases as requested; please take another look when you have time.

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-19 Thread via GitHub
pan3793 commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1862624266 The CI failure seems unrelated: https://github.com/pan3793/spark/actions/runs/7259161309/job/19780922398

```
Notice: A new release of pip is available: 23.3.1 -> 23.3.2
Notice: To
```

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-19 Thread via GitHub
cloud-fan commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1431351538 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala: ## @@ -1415,6 +1415,20 @@ class AnalysisSuite extends AnalysisTest with Mat

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-19 Thread via GitHub
cloud-fan commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1431354583 ## sql/core/src/test/resources/sql-tests/inputs/having.sql: ## @@ -33,3 +33,27 @@ SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY GROUPING SETS(t.c1) HAVING t. S

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-19 Thread via GitHub
pan3793 commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1431429678 ## sql/core/src/test/resources/sql-tests/inputs/having.sql: ## @@ -33,3 +33,27 @@ SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY GROUPING SETS(t.c1) HAVING t. SEL

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-19 Thread via GitHub
pan3793 commented on code in PR #44352: URL: https://github.com/apache/spark/pull/44352#discussion_r1432220550 ## sql/core/src/test/resources/sql-tests/inputs/having.sql: ## @@ -33,3 +33,27 @@ SELECT c1 FROM VALUES (1, 2) as t(c1, c2) GROUP BY GROUPING SETS(t.c1) HAVING t. SEL

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-20 Thread via GitHub
yaooqinn closed pull request #44352: [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING URL: https://github.com/apache/spark/pull/44352

Re: [PR] [SPARK-28386][SQL] Cannot resolve ORDER BY columns with GROUP BY and HAVING [spark]

2023-12-20 Thread via GitHub
yaooqinn commented on PR #44352: URL: https://github.com/apache/spark/pull/44352#issuecomment-1864082194 Thanks, merged to master.