Hi,

I have a Hive 0.12 job on Hadoop 2.2 that runs as a local map join. According
to the logs, the local task dumps the small table into a hashtable file, but in
a later step, when Hive tries to load a hashtable to perform the join, it looks
for a different filename that was never created, throws a FileNotFoundException,
and the query fails.
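
For context, the query follows roughly this shape (table and column names here
are placeholders, not the real ones):

  -- big_part is partitioned; small_dim is the small table that the local
  -- task dumps as the hashtable side of the map join
  SELECT b.id, s.label
  FROM big_part b
  JOIN small_dim s ON (b.dim_id = s.id)
  WHERE b.dt = '2013-12-03';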

These are the relevant entries from the log file in /tmp/ec2-user/.log:
...
2013-12-04 01:26:50  Starting to launch local task to process map join; maximum memory = 1065484288
2013-12-04 01:26:52  Dump the hashtable into file: file:/tmp/ec2-user/hive_2013-12-04_13-26-26_721_6769305093209091981-1/-local-10007/HashTable-Stage-14/MapJoin-mapfile22--.hashtable
2013-12-04 01:26:52  Upload 1 File to: file:/tmp/ec2-user/hive_2013-12-04_13-26-26_721_6769305093209091981-1/-local-10007/HashTable-Stage-14/MapJoin-mapfile22--.hashtable File size: 261
2013-12-04 01:26:52  End of local task; Time Taken: 2.142 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
...
2013-12-04 13:27:03,848 INFO  mapred.LocalJobRunner (LocalJobRunner.java:run(397)) - Map task executor complete.
2013-12-04 13:27:03,850 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(482)) - job_local785823418_0001
java.lang.Exception: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: /tmp/ec2-user/hive_2013-12-04_13-26-26_721_6769305093209091981-1/-local-10007/HashTable-Stage-14/MapJoin-mapfile20--.hashtable (No such file or directory)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: /tmp/ec2-user/hive_2013-12-04_13-26-26_721_6769305093209091981-1/-local-10007/HashTable-Stage-14/MapJoin-mapfile20--.hashtable (No such file or directory)
      at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:175)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
      at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:744)
...

Nowhere before the exception is the file MapJoin-mapfile20--.hashtable
mentioned as being dumped; the only hashtable dump logged is for a very
similarly named file in the same path, MapJoin-mapfile22--.hashtable.

In another query where I hit a similar problem, I was joining a table
partitioned by a single field. When I ran the same query against an
unpartitioned copy of that table, it worked fine.
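
By "unpartitioned" I mean I rebuilt the table as a plain copy and pointed the
same join at it, roughly like this (again with placeholder names):

  -- unpartitioned copy of the partitioned table
  CREATE TABLE big_flat AS SELECT * FROM big_part;

The same join against the copy completes without the FileNotFoundException.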

Could this be a bug in local map joins against partitioned tables? Any ideas
on how to fix it, other than leaving the table unpartitioned or disabling
local jobs?
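
For the record, these are the knobs I believe correspond to those two
workarounds (please correct me if I have the wrong ones):

  -- keep the join as a common (reduce-side) join instead of converting it
  -- to a map join with a local hashtable-dump task
  set hive.auto.convert.join=false;
  -- stop Hive from running small jobs through the LocalJobRunner
  set hive.exec.mode.local.auto=false;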

Thanks,
Juan.
