Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]
Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2800082794

@adriangb FYI CI is passing, it's ready for review. I had to make some changes to the filter that is applied so that it respects lexicographic ordering (which made Q7 lose the speedup), but from what I can see in the benchmarks it still looks like a big improvement. I filed https://github.com/apache/datafusion/issues/15698 to support multiple columns + use `BinaryExpr` in that case.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
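To illustrate the lexicographic-ordering point in the comment above, here is a minimal Python sketch. It is not the code from the PR. The two-column rows, the threshold tuple, and all function names are hypothetical. It shows why a filter that looks only at the first sort column has to keep ties (`<=`), while a filter over all sort columns can be strict.

```python
# Hypothetical sketch: rows are (a, b) tuples sorted ascending by a, then b.
# `threshold` is the worst row currently kept by the top-K heap.

def first_column_filter(row, threshold):
    # Conservative filter on the first column only: ties on `a` are kept,
    # because the second column may still make the row sort earlier.
    return row[0] <= threshold[0]

def strict_first_column_filter(row, threshold):
    # Incorrect for multi-column sorts: drops rows tied on `a`.
    return row[0] < threshold[0]

def full_lexicographic_filter(row, threshold):
    # Exact filter over both columns: a < ta OR (a == ta AND b < tb).
    a, b = row
    ta, tb = threshold
    return a < ta or (a == ta and b < tb)

threshold = (3, 5)
rows = [(2, 9), (3, 4), (3, 7), (4, 0)]

kept_conservative = [r for r in rows if first_column_filter(r, threshold)]
kept_strict = [r for r in rows if strict_first_column_filter(r, threshold)]
kept_exact = [r for r in rows if full_lexicographic_filter(r, threshold)]
```

In this sketch the strict first-column filter drops `(3, 4)` even though it sorts before the threshold `(3, 5)`. The conservative filter keeps it, at the cost of also letting `(3, 7)` through.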
Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]
adriangb commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2800033871

@Dandandan will be happy to review once CI is passing 😄
Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]
Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799975208

> Nice! We can even wire it up with the filter pushdown so that if an operator under us "absorbs" the filter (eg it got pushed down to the scan) we skip doing this internally.

Yeah, that would be useful to avoid filtering twice, and it's the way to go 👍
Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]
adriangb commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799971429

Nice! We can even wire it up with the filter pushdown so that if an operator under us "absorbs" the filter (eg it got pushed down to the scan) we skip doing this internally.
Re: [PR] Optimize TopK with filter ~1.4x faster [datafusion]
Dandandan commented on PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2799968795

> If I understand correctly, the idea is to basically do the same thing we're going to do for the dynamic filters but essentially do the filtering inside of top K to avoid some extra work. Is that correct? If so, it sounds like a great idea and we're going to be able to reuse a lot of the code

Yeah, that's totally correct! The gains won't be as impressive as with the dynamic filter, but it still avoids work in TopK by not having to convert the sorting keys to row format.
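The idea discussed in this comment can be sketched roughly as follows. This is a simplified, single-column Python illustration, not DataFusion's implementation. The function name, the batch representation, and the `encoded` counter (standing in for the cost of converting sort keys to row format) are all assumptions made for the example.

```python
import heapq

def top_k_smallest(batches, k, prefilter=True):
    """Return the k smallest values plus how many values were 'encoded'.

    Hypothetical sketch: `encoded` counts the values that would have gone
    through the expensive sort-key conversion.
    """
    heap = []  # negated values, so -heap[0] is the largest value kept
    encoded = 0
    for batch in batches:
        if prefilter and len(heap) == k:
            # Once the heap is full, values that cannot beat the current
            # threshold are dropped before any conversion happens.
            threshold = -heap[0]
            batch = [v for v in batch if v < threshold]
        for v in batch:
            encoded += 1  # stands in for the costly key conversion
            if len(heap) < k:
                heapq.heappush(heap, -v)
            elif v < -heap[0]:
                heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap), encoded

batches = [[5, 1, 9], [7, 8, 2], [10, 0, 6]]
with_filter = top_k_smallest(batches, k=2, prefilter=True)
without_filter = top_k_smallest(batches, k=2, prefilter=False)
```

Both calls return the same top-2 values. The filtered run simply converts fewer values once the heap is full, which is the kind of saving the comment describes.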
