Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]
jonathanc-n commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2973341692 @Dandandan Yes, it was mentioned in the original pull request. I concluded that if it is better for the nested loop join to take a equijoin condition (where one table's rows are under a certain threshold) and the `null_equals_null = true`, then we will pass in the join condition under the join filter using the `<=>`; if `null_equals_null` = false, then we will just pass it in as `=`. I'll just move the todo statement to the physical planner. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]
Dandandan commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2972940062 > After some thinking I realized that it will still not need `null_equals_null` support in the nestedloopjoin this will happen at planner time, and it is more of a question on whether equijoins should be supported. I think this todo should be removed to deter anybody from trying to implement it in the nestedloopjoin I think it makes sense to support it for optimizing joins when one of the sides is only a small number of rows (which is faster than execution a hash join). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]
alamb commented on code in PR #16390:
URL: https://github.com/apache/datafusion/pull/16390#discussion_r2145093532
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -718,8 +718,6 @@ struct NestedLoopJoinStream {
inner_table: OnceFut,
/// Information of index and left / right placement of columns
column_indices: Vec,
-// TODO: support null aware equal
Review Comment:
Rather than just deleting this, Maybe we can add a comment explaining the
rationale for why it doesn't need null_equals_null.
```suggestion
// No need to support null aware equal because ...
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]
jonathanc-n commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2969175497 After some thinking I realized that it will still not need `null_equals_null` support in the nestedloopjoin this will happen at planner time, and it is more of a question on whether equijoins should be supported. I think this todo should be removed to deter anybody from trying to implement it in the nestedloopjoin -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]
jonathanc-n closed pull request #16390: fix: Remove `null_equals_null` todo in `NestedLoopJoin` URL: https://github.com/apache/datafusion/pull/16390 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]
UBarney commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2969058654 > however @UBarney was able to point out that a on clause already included in the join filter 🤦. I mean that non-equal condition(eg `<=>`) in `on` will be included in the join filter. (sorry for misundering) https://github.com/apache/datafusion/pull/16210#discussion_r2144133973 I think we can support `null_equals_null` in nlj by merging `on` condition into `filter` and reusing existing filter logical. For instance, if `null_equals_null` is enabled, a condition like `filter: t1.a < t2.a, on: t1.c = t2.c` would be transformed into a single filter: filter: `t1.a < t2.a and t1.c <=> t2.c`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
