Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]

2025-04-13 Thread via GitHub


Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2800082794

   @adriangb FYI CI is passing, it's ready for review.
   I had to make some changes to the applied filter so that it respects 
lexicographic ordering (which made Q7 lose its speedup), but as far as I can 
see from the benchmarks it is still a big improvement. I filed 
https://github.com/apache/datafusion/issues/15698 to support multiple columns and 
use `BinaryExpr` in that case.
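A minimal sketch of why respecting lexicographic ordering weakens a single-column filter. It is plain Rust with no DataFusion types; `Row`, `first_key_filter` and `lexicographic_filter` are hypothetical names. It assumes an `ORDER BY a, b` (both ascending) TopK whose heap is full, with `threshold` being the current worst row. A filter on `a` alone has to be non-strict, because a row tied on `a` can still win on `b`. The multi-column form `a < t.a OR (a = t.a AND b < t.b)` is roughly what combining `BinaryExpr`s could express, though the exact shape proposed in the linked issue may differ.

```rust
// Hypothetical two-key sort row (ORDER BY a ASC, b ASC); not DataFusion's types.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Row {
    a: i64,
    b: i64,
}

// Filter on the first sort key only. It must be non-strict (`<=`):
// a row tied with the threshold on `a` may still beat it on `b`.
fn first_key_filter(row: &Row, threshold: &Row) -> bool {
    row.a <= threshold.a
}

// Full lexicographic filter: `a < t.a OR (a = t.a AND b < t.b)`.
// Assumes a row exactly equal to the current worst row need not replace it.
fn lexicographic_filter(row: &Row, threshold: &Row) -> bool {
    row.a < threshold.a || (row.a == threshold.a && row.b < threshold.b)
}

fn main() {
    // Current worst row in the (full) TopK heap.
    let threshold = Row { a: 5, b: 10 };
    let rows = [
        Row { a: 4, b: 99 },
        Row { a: 5, b: 3 },
        Row { a: 5, b: 20 },
        Row { a: 6, b: 0 },
    ];
    let first_key = rows.iter().filter(|r| first_key_filter(r, &threshold)).count();
    let lex = rows.iter().filter(|r| lexicographic_filter(r, &threshold)).count();
    println!("first-key filter keeps {first_key} rows, lexicographic filter keeps {lex}");
}
```

With the threshold `(5, 10)`, the first-key filter keeps 3 of the 4 rows while the lexicographic filter keeps 2: the row `(5, 20)` passes the first-key filter even though it can never enter the heap.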


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]

2025-04-13 Thread via GitHub


adriangb commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2800033871

   @Dandandan I'll be happy to review once CI is passing 😄 





Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]

2025-04-13 Thread via GitHub


Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799975208

   > Nice! We can even wire it up with the filter pushdown so that if an 
operator under us "absorbs" the filter (eg it got pushed down to the scan) we 
skip doing this internally.
   
   Yeah, that would be useful to avoid filtering twice, and it's the way to go 👍 





Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]

2025-04-13 Thread via GitHub


adriangb commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799971429

   Nice! We can even wire it up with the filter pushdown so that if an operator 
under us "absorbs" the filter (eg it got pushed down to the scan) we skip doing 
this internally.
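A minimal sketch of the pushdown idea, assuming a single static threshold (in the PR the threshold tightens as the heap fills) and hypothetical names (`PushdownOutcome`, `Scan::offer_filter`, `topk_input`); this is not DataFusion's filter pushdown API. TopK offers its filter to the operator below and only applies it itself when the child reports that it did not absorb it, so each row is filtered exactly once.

```rust
// Hypothetical names throughout; not DataFusion's filter pushdown API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushdownOutcome {
    Absorbed,    // the child applies the filter itself
    NotAbsorbed, // TopK has to keep filtering internally
}

// Hypothetical stand-in for the operator below TopK (e.g. a scan).
struct Scan {
    can_filter: bool,
    pushed_threshold: Option<i64>,
}

impl Scan {
    fn offer_filter(&mut self, threshold: i64) -> PushdownOutcome {
        if self.can_filter {
            self.pushed_threshold = Some(threshold);
            PushdownOutcome::Absorbed
        } else {
            PushdownOutcome::NotAbsorbed
        }
    }

    // Emits rows, applying the pushed filter (`value < threshold`) if it was absorbed.
    fn scan(&self, data: &[i64]) -> Vec<i64> {
        match self.pushed_threshold {
            Some(t) => data.iter().copied().filter(|&v| v < t).collect(),
            None => data.to_vec(),
        }
    }
}

// Returns the rows TopK works on, and whether TopK had to filter them itself.
fn topk_input(scan: &mut Scan, data: &[i64], threshold: i64) -> (Vec<i64>, bool) {
    let filter_internally = scan.offer_filter(threshold) == PushdownOutcome::NotAbsorbed;
    let rows = scan.scan(data);
    let rows: Vec<i64> = if filter_internally {
        rows.into_iter().filter(|&v| v < threshold).collect()
    } else {
        rows // already filtered by the child: skip the internal pass
    };
    (rows, filter_internally)
}

fn main() {
    let data = [1, 8, 3, 9, 2];
    let mut pushdown_scan = Scan { can_filter: true, pushed_threshold: None };
    let mut plain_scan = Scan { can_filter: false, pushed_threshold: None };
    let (a, a_internal) = topk_input(&mut pushdown_scan, &data, 5);
    let (b, b_internal) = topk_input(&mut plain_scan, &data, 5);
    println!("pushdown: {a:?} (filtered in TopK: {a_internal})");
    println!("no pushdown: {b:?} (filtered in TopK: {b_internal})");
}
```

Both paths yield `[1, 3, 2]`; only the scan that cannot filter leaves the work to TopK.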





Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]

2025-04-13 Thread via GitHub


Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799968795

   > If I understand correctly, the idea is to basically do the same thing we're 
going to do for the dynamic filters but essentially do the filtering inside of 
top K to avoid some extra work. Is that correct? If so, it sounds like a great 
idea and we're going to be able to reuse a lot of the code
   
   Yeah, that's totally correct! The gains won't be as impressive as with 
dynamic filters, but it still avoids work in TopK by not having to convert the 
sort keys to row format.
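The saving described here can be sketched as follows: before encoding a row's sort key, compare the raw column value against the current worst value in the heap and skip rows that cannot enter the result. `encode_sort_key` and `topk_with_prefilter` are hypothetical names, and the encoder is only a stand-in for the row-format conversion, not arrow's actual encoder. The sketch assumes a single ascending `i64` sort key and a threshold refreshed once per batch.

```rust
use std::collections::BinaryHeap;

// Hypothetical stand-in for the per-row sort-key conversion (not arrow's encoder).
// Flipping the sign bit makes big-endian bytes sort in the same order as i64.
fn encode_sort_key(v: i64) -> Vec<u8> {
    ((v as u64) ^ (1u64 << 63)).to_be_bytes().to_vec()
}

// Keeps the k smallest values (one ascending key).
// Returns the sorted result and how many rows had their key encoded.
fn topk_with_prefilter(batches: &[Vec<i64>], k: usize) -> (Vec<i64>, usize) {
    // Max-heap ordered by encoded key: the top is the current worst row.
    let mut heap: BinaryHeap<(Vec<u8>, i64)> = BinaryHeap::new();
    let mut encoded = 0;
    for batch in batches {
        // Threshold = worst value in the heap, known only once the heap is full.
        let threshold = if heap.len() == k {
            heap.peek().map(|(_, v)| *v)
        } else {
            None
        };
        for &v in batch {
            // Cheap comparison on the raw column value first.
            if let Some(t) = threshold {
                if v >= t {
                    continue; // cannot beat the current worst row: skip encoding
                }
            }
            let key = encode_sort_key(v);
            encoded += 1;
            if heap.len() < k {
                heap.push((key, v));
            } else {
                let beats_worst = heap.peek().map_or(false, |(worst, _)| key < *worst);
                if beats_worst {
                    heap.pop();
                    heap.push((key, v));
                }
            }
        }
    }
    let mut out: Vec<i64> = heap.into_iter().map(|(_, v)| v).collect();
    out.sort();
    (out, encoded)
}

fn main() {
    let batches = vec![vec![5, 1, 9, 3], vec![8, 7, 2, 6], vec![10, 0, 4]];
    let (top3, encoded) = topk_with_prefilter(&batches, 3);
    println!("top3 = {:?}, rows encoded = {}", top3, encoded);
}
```

For the batches in `main`, the result is `[0, 1, 2]` and only 6 of the 11 input rows are encoded; without the pre-filter all 11 would be.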

