> Query is a simple group-by on top of sequence table.
...
> java.io.IOException: java.io.IOException: wrong key class:
>org.apache.hadoop.io.BytesWritable is not class
>org.apache.hadoop.io.NullWritable

I have seen this issue when mixing Sequence files written by Pig with
Sequence files written by Hive - primarily because the data ingestion
wasn't done properly via HCatalog writers.

In the last such report, the first sequence file had this as its header:

M?.io.LongWritable"org.apache.hadoop.io.BytesWritable)org.apache.hadoop.io.compress.SnappyCodec??


and the second one had

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text)org.apache.hadoop.io.compress.SnappyCodec?


You can cross-check the exception trace and make sure that the exception
is coming from the RecordReader, which fails when the key-value types
change between files.
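
If you want to compare the headers directly rather than eyeballing them,
something like the rough sketch below might help. It assumes you have
copied the first few hundred bytes of each file to local disk, that the
header looks like the ones above (the "SEQ" magic, a version byte, then
the key and value class names as length-prefixed strings), and that each
class name is under 128 bytes so its length fits in one byte. The
function names are mine, not anything from Hadoop.

```python
import sys

MAGIC = b"SEQ"


def _read_text(buf, pos):
    """Read one length-prefixed UTF-8 string starting at buf[pos].

    Assumption: the length is a single-byte vint (< 128), which holds
    for ordinary class names.
    """
    if pos >= len(buf):
        raise ValueError("truncated header")
    length = buf[pos]
    if length > 127:
        raise ValueError("multi-byte length not handled in this sketch")
    start, end = pos + 1, pos + 1 + length
    if end > len(buf):
        raise ValueError("truncated header")
    return buf[start:end].decode("utf-8"), end


def read_seq_header(buf):
    """Return (version, key_class, value_class) from raw header bytes."""
    if buf[:3] != MAGIC or len(buf) < 4:
        raise ValueError("not a SequenceFile: missing SEQ magic")
    version = buf[3]
    key_class, pos = _read_text(buf, 4)
    value_class, _ = _read_text(buf, pos)
    return version, key_class, value_class


def mismatched(headers):
    """True if the headers do not all share one (key, value) class pair."""
    return len({(k, v) for _, k, v in headers}) > 1


if __name__ == "__main__":
    # Usage (hypothetical file names): python seqhdr.py part-0.head part-1.head
    headers = []
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            hdr = read_seq_header(f.read(512))
        headers.append(hdr)
        print(path, hdr)
    if mismatched(headers):
        print("key/value classes differ between files")
```

For the two headers above this would report BytesWritable values for the
first file and Text values for the second, which is the kind of mismatch
I mean.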

This mostly doesn't show up in Hive on MR at small scale, but it can
happen with both MR and Tez.

To hit this via CombineInputFormat, you need files which have been split
up between machines, and two such files with mismatched schemas ending up
in the same combined split.

Tez is more aggressive at splitting, since it relies on the file format
splits, not HDFS locations.

If you confirm that this is indeed the cause of the issue, I might have an
idea how to fix it.

Cheers,
Gopal 

