Johannes Mayer created SPARK-24859: -------------------------------------- Summary: Predicates pushdown on outer joins Key: SPARK-24859 URL: https://issues.apache.org/jira/browse/SPARK-24859 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 2.2.0 Environment: Cloudera CDH 5.13.1 Reporter: Johannes Mayer
I have two AVRO tables in Hive called FAct and DIm. Both are partitioned by a common column called part_col. Now I want to join both tables on their id but only for some of partitions. If I use an inner join, everything works well: {code:java} select * from FA f join DI d on(f.id = d.id and f.part_col = d.part_col) where f.part_col = 'xyz' {code} In the sql explain plan i can see, that the predicate part_col = 'xyz' is also used in the DIm HiveTableScan. When I execute the same query using a left join the full dim table is scanned. There are some workarounds for this issue, but i wanted to report this as a bug, since it works on an inner join, and i think the behaviour should be the same for an outer join -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org