Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-13 Thread via GitHub


andygrove merged PR #1262:
URL: https://github.com/apache/datafusion-comet/pull/1262


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-13 Thread via GitHub


andygrove commented on PR #1262:
URL: 
https://github.com/apache/datafusion-comet/pull/1262#issuecomment-2587330496

   Thanks for the review @huaxingao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub


andygrove commented on code in PR #1262:
URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911777608


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
-  test("distinct") {
+  // TODO enable once https://github.com/apache/datafusion-comet/issues/1267 
is implemented

Review Comment:
   I understand what is happening now. Spark optimizes some simple cases like 
`SELECT a, COUNT(DISTINCT b) ...` to first perform an aggregate to just get the 
distinct values, so by the time we perform the `COUNT` there are only distinct 
values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub


andygrove commented on code in PR #1262:
URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911770443


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
-  test("distinct") {
+  // TODO enable once https://github.com/apache/datafusion-comet/issues/1267 
is implemented

Review Comment:
   I'm looking at this test again on main and do not understand how this one is 
passing now :confused: 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub


codecov-commenter commented on PR #1262:
URL: 
https://github.com/apache/datafusion-comet/pull/1262#issuecomment-2584940287

   ## 
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1262?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 34.72%. Comparing base 
[(`be48839`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/be488390dcd89b2a5bff6f24b9ae7724383cffde?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`43c8f55`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/43c8f559fc948a94cf7e3a5dcb70149ef2e29acb?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 2 commits behind head on main.
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ##   main#1262  +/-   ##
   
   + Coverage 34.69%   34.72%   +0.03% 
   - Complexity  991  994   +3 
   
 Files   116  116  
 Lines 4489144907  +16 
 Branches   9864 9869   +5 
   
   + Hits  1557415595  +21 
 Misses2616826168  
   + Partials   3149 3144   -5 
   ```
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1262?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub


andygrove commented on code in PR #1262:
URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911767513


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
-  test("distinct") {
+  // TODO enable once https://github.com/apache/datafusion-comet/issues/1267 
is implemented

Review Comment:
   yes, many of the tests were passing because the aggregate was running in 
Spark rather than Comet or because the input values were already distinct



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub


comphead commented on code in PR #1262:
URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911745757


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
-  test("distinct") {
+  // TODO enable once https://github.com/apache/datafusion-comet/issues/1267 
is implemented

Review Comment:
   I'm thinking should we add a test with unique values, that fails now by 
correctness issue



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] fix: Fall back to Spark for distinct aggregates [datafusion-comet]

2025-01-10 Thread via GitHub


comphead commented on code in PR #1262:
URL: https://github.com/apache/datafusion-comet/pull/1262#discussion_r1911726694


##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
-  test("distinct") {
+  // TODO enable once https://github.com/apache/datafusion-comet/issues/1267 
is implemented

Review Comment:
   so now this tests passed? 



##
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##
@@ -974,7 +974,8 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
-  test("distinct") {
+  // TODO enable once https://github.com/apache/datafusion-comet/issues/1267 
is implemented

Review Comment:
   so now this test passed? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org