jonathanc-n opened a new issue, #16425:
URL: https://github.com/apache/datafusion/issues/16425
### Is your feature request related to a problem or challenge?
This is part of #13181 for looking into different joins.
## What is a Single Join
Single joins are similar to a regular left outer join however the build side
can only have zero or one match with the probe side. If there are two or
matches for one build side row, it will throw an error during runtime. The
reason we need a separate join type for this is because other join types do not
keep track of these number of matches.
## What is it used for
Single joins are used to optimize scalar subqueries. In Neumann's paper for
unnesting subqueries
([here](https://btw-2015.informatik.uni-hamburg.de/res/proceedings/Hauptband/Wiss/Neumann-Unnesting_Arbitrary_Querie.pdf)),
he writes on dependent joins, which is just a join where every iteration of
the left side will incur a search on the right side (right side dependent on
left).
Here is an example of a scalar subquery:
```
SELECT
e.name,
(
SELECT d.dept_name
FROM departments d
WHERE d.dept_id = e.dept_id
) AS department
FROM employees e;
```
Scalar subqueries act as a type of dependent join due to the subquery
needing to be calculated for every row on the left. We can eliminate this by
creating a hash table out of the subquery side and probing with the build side.
This cuts the `O(n^2)` runtime to `O(n)`. If the build side matches on more
than two rows we call an error as it would be an invalid scalar subquery.
### Describe the solution you'd like
- Optimize scalar subqueries into single joins
- Using the existing hash join execution plan we can create a path for
single join (duckdb does something similar to this)
- Do checks on each probe to see if it matched multiple rows; call execution
error if it did.
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]