Disclaimer: Sending this to tez mailing list even though a Hive query
fails, because this query passes on MapReduce but fails on Tez.
I have a hive table with two partitions p=1 and p=2. p=1 contains Sequence
Files with key as BytesWritable, p=2 contains Sequence Files with key as
Text. A simple count(*) on the table fails with:
2015-07-28 21:36:55,094 ERROR [TezChild] io.HiveContextAwareRecordReader:
Caught and rethrowing java.io.IOException: java.io.IOException: While
processing file s3n://<location_hidden>/foo_bar_p/p=2/foo1. wrong key
class: org.apache.hadoop.io.Text is not class
org.apache.hadoop.io.BytesWritable
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:372)
at
org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at
org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:118)
at
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:137)
at
org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:290)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1635)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: While processing file
s3n://qubole-karma/quboleregression/customer_workload_data/pinterest/foo_bar_p/p=2/foo1.
wrong key class: org.apache.hadoop.io.Text is not class
org.apache.hadoop.io.BytesWritable
at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.handleExceptionWhenReadNext(HiveContextAwareRecordReader.java:386)
at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:368)
... 22 more
Caused by: java.io.IOException: wrong key class: org.apache.hadoop.io.Text
is not class org.apache.hadoop.io.BytesWritable
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2484)
at
org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
at
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:359)
... 22 more
It passes with MapReduce.
Any ideas why this happens?
Thanks,
Rajat
--
Sent from mobile device. Excuse brevity and tyops.