Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-14 Thread via GitHub


jonathanc-n commented on PR #16390:
URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2973341692

   @Dandandan Yes, it was mentioned in the original pull request. I concluded 
that if it is better for the nested loop join to take a equijoin condition 
(where one table's rows are under a certain threshold) and the 
`null_equals_null = true`, then we will pass in the join condition under the 
join filter using the `<=>`; if `null_equals_null` = false, then we will just 
pass it in as `=`.
   
   I'll just move the todo statement to the physical planner. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-14 Thread via GitHub


Dandandan commented on PR #16390:
URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2972940062

   > After some thinking I realized that it will still not need 
`null_equals_null` support in the nestedloopjoin this will happen at planner 
time, and it is more of a question on whether equijoins should be supported. I 
think this todo should be removed to deter anybody from trying to implement it 
in the nestedloopjoin
   
   I think it makes sense to support it for optimizing joins when one of the 
sides is only a small number of rows (which is faster than execution a hash 
join). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-13 Thread via GitHub


alamb commented on code in PR #16390:
URL: https://github.com/apache/datafusion/pull/16390#discussion_r2145093532


##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -718,8 +718,6 @@ struct NestedLoopJoinStream {
 inner_table: OnceFut,
 /// Information of index and left / right placement of columns
 column_indices: Vec,
-// TODO: support null aware equal

Review Comment:
   Rather than just deleting this, Maybe we can add a comment explaining the 
rationale for why it doesn't need null_equals_null.
   
   ```suggestion
   // No need to support null aware equal because ...
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-13 Thread via GitHub


jonathanc-n commented on PR #16390:
URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2969175497

   After some thinking I realized that it will still not need 
`null_equals_null` support in the nestedloopjoin this will happen at planner 
time, and it is more of a question on whether equijoins should be supported. I 
think this todo should be removed to deter anybody from trying to implement it 
in the nestedloopjoin


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub


jonathanc-n closed pull request #16390: fix: Remove `null_equals_null` todo in 
`NestedLoopJoin`
URL: https://github.com/apache/datafusion/pull/16390


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub


UBarney commented on PR #16390:
URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2969058654

   > however @UBarney was able to point out that a on clause already included 
in the join filter 🤦.
   
   I mean that non-equal condition(eg `<=>`) in `on` will be included  in the 
join filter. (sorry for misundering)  
https://github.com/apache/datafusion/pull/16210#discussion_r2144133973
   
   
   I think we can support `null_equals_null` in nlj by merging `on` condition 
into `filter` and reusing existing filter logical. For instance, if 
`null_equals_null` is enabled, a condition like `filter: t1.a < t2.a, on: t1.c 
= t2.c` would be transformed into a single filter: filter: `t1.a < t2.a and 
t1.c <=> t2.c`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]