Hi, I have a Hive 0.12 job on Hadoop 2.2 that is executed as a local map join. In one step Hive dumps the small table as a hashtable (according to the logs), but in a later step, when it tries to load the hashtable to perform the join, it looks for a different file name that does not exist, throws a FileNotFoundException, and the query fails.
I retrieved the following entries from the log file in /tmp/ec2-user/.log:

...
2013-12-04 01:26:50 Starting to launch local task to process map join; maximum memory = 1065484288
2013-12-04 01:26:52 Dump the hashtable into file: file:/tmp/ec2-user/hive_2013-12-04_13-26-26_721_6769305093209091981-1/-local-10007/HashTable-Stage-14/MapJoin-mapfile22--.hashtable
2013-12-04 01:26:52 Upload 1 File to: file:/tmp/ec2-user/hive_2013-12-04_13-26-26_721_6769305093209091981-1/-local-10007/HashTable-Stage-14/MapJoin-mapfile22--.hashtable File size: 261
2013-12-04 01:26:52 End of local task; Time Taken: 2.142 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
...
2013-12-04 13:27:03,848 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(397)) - Map task executor complete.
2013-12-04 13:27:03,850 WARN mapred.LocalJobRunner (LocalJobRunner.java:run(482)) - job_local785823418_0001
java.lang.Exception: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: /tmp/ec2-user/hive_2013-12-04_13-26-26_721_6769305093209091981-1/-local-10007/HashTable-Stage-14/MapJoin-mapfile20--.hashtable (No such file or directory)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: /tmp/ec2-user/hive_2013-12-04_13-26-26_721_6769305093209091981-1/-local-10007/HashTable-Stage-14/MapJoin-mapfile20--.hashtable (No such file or directory)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
...

There is no mention before the exception of the file MapJoin-mapfile20--.hashtable being dumped, only a very similar file in the same path named MapJoin-mapfile22--.hashtable.

In another query where I had a similar problem, I was joining a table partitioned by one field. When I tried the same query against the corresponding unpartitioned table, it worked fine. Could this be a bug in local map joins against partitioned tables? Any ideas on how to fix it, other than leaving the table unpartitioned or disabling local jobs?

Thanks,
Juan.
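In case it is useful to anyone reproducing this, the "disabling local jobs" workaround I mentioned can be applied per session with standard Hive settings, without touching hive-site.xml. This is only a sketch, on the assumption that skipping the local hashtable stage avoids the missing-file lookup:

```sql
-- Workaround sketch (assumption: forcing a regular reduce-side join
-- avoids the local hashtable dump/load where the failure occurs).
SET hive.auto.convert.join=false;      -- do not convert joins to map joins

-- Alternatively, keep automatic map joins but stop Hive from running
-- the job in local mode:
SET hive.exec.mode.local.auto=false;
```

Neither setting fixes the underlying mismatch between the dumped file (mapfile22) and the file the mapper requests (mapfile20); they just route around the local map-join path.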