Xiao Li created SPARK-29092:
-------------------------------

             Summary: EXPLAIN FORMATTED does not work well with DPP
                 Key: SPARK-29092
                 URL: https://issues.apache.org/jira/browse/SPARK-29092
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Xiao Li
{code:scala}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.internal.SQLConf

// Runs inside a Spark SQL test suite: withSQLConf, withTable, sql and tableFormat
// come from the enclosing suite (the plans below were produced with parquet).
withSQLConf(
    SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true",
    SQLConf.DYNAMIC_PARTITION_PRUNING_REUSE_BROADCAST.key -> "false") {
  withTable("df1", "df2") {
    // Fact-like table: 1000 rows, one partition per value of k
    spark.range(1000)
      .select(col("id"), col("id").as("k"))
      .write
      .partitionBy("k")
      .format(tableFormat)
      .mode("overwrite")
      .saveAsTable("df1")

    // Dimension-like table: 100 rows, also partitioned by k
    spark.range(100)
      .select(col("id"), col("id").as("k"))
      .write
      .partitionBy("k")
      .format(tableFormat)
      .mode("overwrite")
      .saveAsTable("df2")

    // Same join, explained in both modes
    sql("EXPLAIN FORMATTED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
      .show(false)
    sql("EXPLAIN EXTENDED SELECT df1.id, df2.k FROM df1 JOIN df2 ON df1.k = df2.k AND df2.id < 2")
      .show(false)
  }
}
{code}

The output of EXPLAIN EXTENDED is as expected. The df1 scan carries a dynamicpruningexpression partition filter, and the subquery over df2 that feeds it appears under that scan.

{code:java}
== Physical Plan ==
*(2) Project [id#2721L, k#2724L]
+- *(2) BroadcastHashJoin [k#2722L], [k#2724L], Inner, BuildRight
   :- *(2) ColumnarToRow
   :  +- FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, DataFilters: [], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN subquery2741)], PushedFilters: [], ReadSchema: struct<id:bigint>
   :        +- Subquery subquery2741, [id=#358]
   :           +- *(2) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L#2740L])
   :              +- Exchange hashpartitioning(k#2724L, 5), true, [id=#354]
   :                 +- *(1) HashAggregate(keys=[k#2724L], functions=[], output=[k#2724L])
   :                    +- *(1) Project [k#2724L]
   :                       +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
   :                          +- *(1) ColumnarToRow
   :                             +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true])), [id=#379]
      +- *(1) Project [k#2724L]
         +- *(1) Filter (isnotnull(id#2723L) AND (id#2723L < 2))
            +- *(1) ColumnarToRow
               +- FileScan parquet default.df2[id#2723L,k#2724L] Batched: true, DataFilters: [isnotnull(id#2723L), (id#2723L < 2)], Format: Parquet, Location: PrunedInMemoryFileIndex[file:/Users/lixiao/IdeaProjects/spark/sql/core/spark-warehouse/org.apache..., PartitionFilters: [isnotnull(k#2724L)], PushedFilters: [IsNotNull(id), LessThan(id,2)], ReadSchema: struct<id:bigint>
{code}

However, EXPLAIN FORMATTED does not show the effect of DPP on the FileScan node. The plan tree below contains no dynamic pruning subquery under Scan parquet default.df1 (1).

{code:java}
|== Physical Plan ==
* Project (9)
+- * BroadcastHashJoin Inner BuildRight (8)
   :- * ColumnarToRow (2)
   :  +- Scan parquet default.df1 (1)
   +- BroadcastExchange (7)
      +- * Project (6)
         +- * Filter (5)
            +- * ColumnarToRow (4)
               +- Scan parquet default.df2 (3)


(1) Scan parquet default.df1
Output: [id#2716L, k#2717L]
{code}
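One way to turn this report into a regression check is to assert on the plan text itself. In the EXTENDED output above, the df1 scan has a dynamicpruningexpression(...) partition filter. The FORMATTED excerpt has none.

Below is a minimal sketch in plain Scala with no Spark dependency. It assumes the plan strings have already been captured. The object and helper names (DppExplainCheck, hasDppFilter, partitionFilters) are hypothetical. The sample strings are abridged excerpts of the outputs in this report, and none of this is Spark's own test code.

```scala
import scala.util.matching.Regex

// Hypothetical helpers for checking whether an EXPLAIN string reflects DPP.
object DppExplainCheck {
  // Abridged excerpt of the df1 FileScan line from the EXPLAIN EXTENDED output.
  val extendedExcerpt: String =
    "FileScan parquet default.df1[id#2721L,k#2722L] Batched: true, DataFilters: [], " +
      "PartitionFilters: [isnotnull(k#2722L), dynamicpruningexpression(k#2722L IN subquery2741)], " +
      "PushedFilters: [], ReadSchema: struct<id:bigint>"

  // The df1 scan details from the EXPLAIN FORMATTED output, as reported.
  val formattedExcerpt: String =
    """(1) Scan parquet default.df1
      |Output: [id#2716L, k#2717L]""".stripMargin

  // Captures the contents of each "PartitionFilters: [...]" list (non-greedy).
  private val PartitionFiltersPattern: Regex =
    """PartitionFilters: \[(.*?)\], PushedFilters""".r

  // True if the plan text mentions a dynamic partition pruning filter.
  def hasDppFilter(plan: String): Boolean =
    plan.contains("dynamicpruningexpression(")

  // Returns every PartitionFilters list found in the plan text, in order.
  def partitionFilters(plan: String): List[String] =
    PartitionFiltersPattern.findAllMatchIn(plan).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    println(s"EXTENDED shows DPP:  ${hasDppFilter(extendedExcerpt)}")
    println(s"FORMATTED shows DPP: ${hasDppFilter(formattedExcerpt)}")
    partitionFilters(extendedExcerpt).foreach(f => println(s"PartitionFilters: [$f]"))
  }
}
```

In a Spark suite, the strings could instead come from something like sql("EXPLAIN FORMATTED ...").collect().head.getString(0). The check matches on the expression name rather than on exact plan text, so it does not depend on expression IDs such as #2722L, which differ between runs.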