[ https://issues.apache.org/jira/browse/SPARK-46946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-46946.
---------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44988
[https://github.com/apache/spark/pull/44988]

> Supporting broadcast of multiple filtering keys in DynamicPruning
> -----------------------------------------------------------------
>
>                 Key: SPARK-46946
>                 URL: https://issues.apache.org/jira/browse/SPARK-46946
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0, 3.5.1
>            Reporter: Thang Long Vu
>            Assignee: Thang Long Vu
>            Priority: Major
>              Labels: pull-request-available, releasenotes
>             Fix For: 4.0.0
>
>
> This PR extends `DynamicPruningSubquery` to support broadcasting multiple
> filtering keys (instead of one, as before). Most of the PR simply
> generalises the singular to the plural.
>
> Note: This PR does not actually use the multi-key `DynamicPruningSubquery`
> yet. The change is meant to make it easier to support dynamic partition
> pruning (DPP) for Null Safe Equality or for multiple Equality predicates in
> the future.
>
> In a Null Safe Equality JOIN, the JOIN condition `a <=> b` is transformed to
> `Coalesce(key1, Literal(key1.dataType)) = Coalesce(key2,
> Literal(key2.dataType)) AND IsNull(key1) = IsNull(key2)`. To get the
> highest pruning efficiency, we broadcast the 2 keys `Coalesce(key,
> Literal(key.dataType))` and `IsNull(key)` and use them to prune the other
> side at the same time.
>
> Before, `DynamicPruningSubquery` had only one broadcasting key and we
> supported DPP for only one `EqualTo` JOIN predicate; now we are extending
> the subquery to multiple broadcasting keys. Please note that DPP is still
> not supported for multiple JOIN predicates.
>
> Put another way, at the moment we don't insert a DPP Filter for multiple
> JOIN predicates at the same time; we only potentially insert a DPP Filter
> for a given Equality JOIN predicate.
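
For illustration only, here is a minimal plain-Python sketch of the rewrite
described above. It does not use Spark: `None` stands in for SQL NULL, the
default literal is assumed to be `0` (integer keys), and every function name
below is hypothetical rather than part of any Spark API. It checks that the
`Coalesce`/`IsNull` pair agrees with `a <=> b`, and shows why pruning with
both keys together can drop more rows than pruning with the `Coalesce` key
alone.

```python
from itertools import product

DEFAULT = 0  # assumption: stand-in for the default literal of an integer key


def null_safe_eq(a, b):
    """SQL `a <=> b`: true if both are NULL, or both are equal non-NULL values."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def coalesce(value, default=DEFAULT):
    """Replace NULL (None) with the default literal."""
    return default if value is None else value


def rewritten_eq(a, b):
    """Coalesce(a, default) = Coalesce(b, default) AND IsNull(a) = IsNull(b)."""
    return coalesce(a) == coalesce(b) and (a is None) == (b is None)


def filtering_keys(value):
    """The two keys that would be broadcast together for one join key."""
    return (coalesce(value), value is None)


def prune_with_both_keys(probe_rows, build_rows):
    """Keep probe rows whose (Coalesce, IsNull) pair appears on the build side."""
    broadcast = {filtering_keys(v) for v in build_rows}
    return [v for v in probe_rows if filtering_keys(v) in broadcast]


def prune_with_coalesce_only(probe_rows, build_rows):
    """Keep probe rows whose Coalesce key alone appears on the build side."""
    broadcast = {coalesce(v) for v in build_rows}
    return [v for v in probe_rows if coalesce(v) in broadcast]


# The rewritten predicate agrees with `<=>` on every pair of sample values.
values = [None, 0, 1, 2]
equivalent = all(
    null_safe_eq(a, b) == rewritten_eq(a, b) for a, b in product(values, repeat=2)
)

build = [None, 2]
probe = [None, 0, 1, 2, 2]
both = prune_with_both_keys(probe, build)        # [None, 2, 2]
single = prune_with_coalesce_only(probe, build)  # [None, 0, 2, 2]
```

With only the `Coalesce` key, the probe row `0` survives because it collides
with the default value substituted for NULL on the build side. Adding the
`IsNull` key removes that false match.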