Hi all,
I found this JIRA for an issue I ran into recently:
https://issues.apache.org/jira/browse/SPARK-28771
My initial idea for a fix is to change the requiredChildDistribution of
SortMergeJoinExec (and ShuffledHashJoinExec).
At least when all of the conditions below are met, we could require only a subset
Hi All,
For the use case where the expensive UDF has constant inputs (literals), we
have proposed the following JIRA and PR, which calculate the UDF only once
in the driver:
https://issues.apache.org/jira/browse/SPARK-27692
https://github.com/apache/spark/pull/24593
If considering revisiting the o