michaelli created HIVE-26273:
--------------------------------

             Summary: "file does not exist" exception occurred when using spark dynamic partition pruning and small table is empty
                 Key: HIVE-26273
                 URL: https://issues.apache.org/jira/browse/HIVE-26273
             Project: Hive
          Issue Type: Bug
          Components: HiveServer2
    Affects Versions: 2.1.1
            Reporter: michaelli
         Attachments: execution plan for good run.txt, execution plan for issue run.txt, issue log.txt
*Issue summary:*

When inner joining tableA to tableB on the partition key of tableB, if dynamic partition pruning is enabled and tableA is empty, the query fails with the exception below:

Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed due to: File hdfs://nameservice1/tmp/hive/hive/fddbc5ac-3596-428d-8b42-cbc61952d182/hive_2022-05-30_14-03-17_139_1843975612196554546-15339/-mr-10003/2/1 does not exist. (state=42000,code=3)

I encountered this when using hive-2.1.1-cdh6.3.2, and I think it occurs in other versions too.

*Steps to reproduce the issue:*

1. Prepare the tables:

CREATE TABLE tableA (
  businsys_no decimal(10,0),
  acct_id string,
  prod_code string)
PARTITIONED BY (init_date int)
STORED AS ORC;

CREATE TABLE tableB (
  client_id string,
  open_date decimal(10,0),
  client_status string,
  organ_flag string)
PARTITIONED BY (businsys_no decimal(10,0))
STORED AS ORC;

2. Prepare data for the tables:

-- tableA should be empty
-- prepare some data for tableB

3. Run the following to reproduce the issue:

set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.spark.dynamic.partition.pruning.map.join.only=true;

select *
from (select * from tableA fp where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;
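For step 2, a minimal data load might look like the following; the partition value and row contents are illustrative only (any non-empty tableB alongside an empty tableA should do):

```sql
-- tableA: create it but insert nothing, so it stays empty

-- tableB: one partition with a single sample row (values are made up)
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE tableB PARTITION (businsys_no=1001)
VALUES ('C0001', 20220101, 'A', 'N');
```

With this data, the small-table side of the map join (tableA after the init_date filter) produces no rows, which is the condition that triggers the missing pruning-sink output file.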
*Workaround:*

Currently we turn off spark dynamic partition pruning to work around this:

set hive.execution.engine=spark;
set hive.auto.convert.join=true;
set hive.spark.dynamic.partition.pruning=false;
set hive.spark.dynamic.partition.pruning.map.join.only=false;

select *
from (select * from tableA fp where fp.init_date = 20220525) cfp
inner join (select ic.client_id, ic.businsys_no from tableB ic) ici
  on cfp.businsys_no = ici.businsys_no
 and cfp.acct_id = ici.client_id;

*Execution logs and execution plan:*

The execution logs and execution plans are attached.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)