Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


andygrove merged PR #1253:
URL: https://github.com/apache/datafusion-comet/pull/1253


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


andygrove commented on PR #1253:
URL: 
https://github.com/apache/datafusion-comet/pull/1253#issuecomment-2581468945

   Thanks for the review @kazuyukitanimura 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


codecov-commenter commented on PR #1253:
URL: 
https://github.com/apache/datafusion-comet/pull/1253#issuecomment-2581266227

   ## 
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1253?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   Attention: Patch coverage is `42.85714%` with `4 lines` in your changes 
missing coverage. Please review.
   > Project coverage is 34.69%. Comparing base 
[(`4cf840f`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/4cf840f571b916b399e6ca6708bf0de4c4c1ca95?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`5bddb87`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/5bddb87ef6d9f2221f3b00a6851d7857f3b49365?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 7 commits behind head on main.
   
   | [Files with missing 
lines](https://app.codecov.io/gh/apache/datafusion-comet/pull/1253?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Patch % | Lines |
   |---|---|---|
   | 
[.../scala/org/apache/comet/serde/QueryPlanSerde.scala](https://app.codecov.io/gh/apache/datafusion-comet/pull/1253?src=pr&el=tree&filepath=spark%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fcomet%2Fserde%2FQueryPlanSerde.scala&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3Bhcmsvc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9jb21ldC9zZXJkZS9RdWVyeVBsYW5TZXJkZS5zY2FsYQ==)
 | 42.85% | [3 Missing and 1 partial :warning: 
](https://app.codecov.io/gh/apache/datafusion-comet/pull/1253?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 |
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ##   main#1253  +/-   ##
   
   - Coverage 34.88%   34.69%   -0.19% 
   - Complexity  990  991   +1 
   
 Files   116  116  
 Lines 4375544891+1136 
 Branches   9569 9864 +295 
   
   + Hits  1526315574 +311 
   - Misses2551326168 +655 
   - Partials   2979 3149 +170 
   ```
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1253?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


andygrove commented on code in PR #1253:
URL: https://github.com/apache/datafusion-comet/pull/1253#discussion_r1909275211


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -89,6 +89,24 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  // based on Spark's SQLWindowFunctionSuite test of the same name
+  test("window function: partition and order expressions") {
+for (shuffleMode <- Seq("auto", "native", "jvm")) {
+  withSQLConf(CometConf.COMET_SHUFFLE_MODE.key -> shuffleMode) {
+val df =
+  Seq((1, "a", 5), (2, "a", 6), (3, "b", 7), (4, "b", 8), (5, "c", 9), 
(6, "c", 10)).toDF(
+"month",
+"area",
+"product")
+df.createOrReplaceTempView("windowData")
+checkSparkAnswer(sql("""

Review Comment:
   I updated the test. I added assertions to check for the expected shuffle 
exchanges and added some notes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


andygrove commented on code in PR #1253:
URL: https://github.com/apache/datafusion-comet/pull/1253#discussion_r1909259510


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -89,6 +89,24 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  // based on Spark's SQLWindowFunctionSuite test of the same name
+  test("window function: partition and order expressions") {
+for (shuffleMode <- Seq("auto", "native", "jvm")) {
+  withSQLConf(CometConf.COMET_SHUFFLE_MODE.key -> shuffleMode) {
+val df =
+  Seq((1, "a", 5), (2, "a", 6), (3, "b", 7), (4, "b", 8), (5, "c", 9), 
(6, "c", 10)).toDF(
+"month",
+"area",
+"product")
+df.createOrReplaceTempView("windowData")
+checkSparkAnswer(sql("""

Review Comment:
   With `auto` and `native` we fall back to Spark currently



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


andygrove commented on code in PR #1253:
URL: https://github.com/apache/datafusion-comet/pull/1253#discussion_r1909260086


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -89,6 +89,24 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  // based on Spark's SQLWindowFunctionSuite test of the same name
+  test("window function: partition and order expressions") {
+for (shuffleMode <- Seq("auto", "native", "jvm")) {
+  withSQLConf(CometConf.COMET_SHUFFLE_MODE.key -> shuffleMode) {
+val df =
+  Seq((1, "a", 5), (2, "a", 6), (3, "b", 7), (4, "b", 8), (5, "c", 9), 
(6, "c", 10)).toDF(
+"month",
+"area",
+"product")
+df.createOrReplaceTempView("windowData")
+checkSparkAnswer(sql("""

Review Comment:
   Let me see if I can conditionally use `checkSparkAnswerAndOperator` for the 
`jvm` shuffle case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


andygrove commented on code in PR #1253:
URL: https://github.com/apache/datafusion-comet/pull/1253#discussion_r1909258219


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -89,6 +89,24 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  // based on Spark's SQLWindowFunctionSuite test of the same name
+  test("window function: partition and order expressions") {
+for (shuffleMode <- Seq("auto", "native", "jvm")) {

Review Comment:
   Not directly, but due to an existing issue in Comet, we fall back to Spark 
for this query if shuffle mode is auto or native, so we only see the issue when 
specifying jvm shuffle.
   
   See https://github.com/apache/datafusion-comet/issues/1248 for more info.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


andygrove commented on code in PR #1253:
URL: https://github.com/apache/datafusion-comet/pull/1253#discussion_r1909258219


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -89,6 +89,24 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  // based on Spark's SQLWindowFunctionSuite test of the same name
+  test("window function: partition and order expressions") {
+for (shuffleMode <- Seq("auto", "native", "jvm")) {

Review Comment:
   Not directly, but due to an existing issue in Comet, we fall back to Spark 
for if shuffle mode is auto or native, so we only see the issue when specifying 
jvm shuffle.
   
   See https://github.com/apache/datafusion-comet/issues/1248 for more info.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for unsupported partition or sort expressions in window aggregates [datafusion-comet]

2025-01-09 Thread via GitHub


kazuyukitanimura commented on code in PR #1253:
URL: https://github.com/apache/datafusion-comet/pull/1253#discussion_r1909251792


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -89,6 +89,24 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  // based on Spark's SQLWindowFunctionSuite test of the same name
+  test("window function: partition and order expressions") {
+for (shuffleMode <- Seq("auto", "native", "jvm")) {
+  withSQLConf(CometConf.COMET_SHUFFLE_MODE.key -> shuffleMode) {
+val df =
+  Seq((1, "a", 5), (2, "a", 6), (3, "b", 7), (4, "b", 8), (5, "c", 9), 
(6, "c", 10)).toDF(
+"month",
+"area",
+"product")
+df.createOrReplaceTempView("windowData")
+checkSparkAnswer(sql("""

Review Comment:
   What about using `checkSparkAnswerAndOperator` instead?



##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -89,6 +89,24 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  // based on Spark's SQLWindowFunctionSuite test of the same name
+  test("window function: partition and order expressions") {
+for (shuffleMode <- Seq("auto", "native", "jvm")) {

Review Comment:
   Does shuffle mode affect the behavior of window function?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org