Hadoop 0.22.0-RC0

I have the following reducer:

public static class MergeRecords extends Reducer<Text, MapWritable, Text, MapWritable>
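For illustration, a stripped-down reduce with this shape might look like the following (the merge logic here is just a placeholder, not my actual code):

    @Override
    protected void reduce(Text key, Iterable<MapWritable> values, Context context)
            throws IOException, InterruptedException {

        // Placeholder: fold every MapWritable seen for this key into a single output map.
        MapWritable merged = new MapWritable();
        for (MapWritable value : values) {
            merged.putAll(value);
        }
        context.write(key, merged);
    }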
The MapWritables that are handled by the reducer all have Text 'keys' and contain different 'value' classes, including Text, DoubleWritable, and a custom Writable, MapArrayWritable. The reduce works as expected if each MapWritable contains both a DoubleWritable and a MapArrayWritable. The reduce fails with the following exception if some of the MapWritables contain only a DoubleWritable value:

-----------
java.lang.IllegalArgumentException: Id 1 exists but maps to com.realcomp.data.hadoop.record.MapArrayWritable and not org.apache.hadoop.io.DoubleWritable
    at org.apache.hadoop.io.AbstractMapWritable.addToMap(AbstractMapWritable.java:75)
    at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:203)
    at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:148)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:145)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
    at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:292)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
    at org.apache.hadoop.mapred.ReduceTask.
------------

Digging into the source a little, I stumbled upon the fact that the default constructor for AbstractMapWritable does not register DoubleWritable the way it does all the other base Writables. This looks like an omission to me; if DoubleWritable were registered, I would probably never have noticed this problem, as there would be only one custom class in the MapWritable.

Question 1: Should I be able to reduce on MapWritables that contain different (custom) value classes?

Question 2: It appears the org.apache.hadoop.io.serializer.WritableSerialization class reuses the first MapWritable instance for each deserialization. This is probably a performance optimization, and it explains why I am getting the exception. Is it possible for me to register my own serialization class that would allow me to deserialize MapWritables with different value classes? Are there examples of this available? (A rough sketch of what I have in mind is at the end of this post.)

Note: I realize I am running off of a release candidate, but I thought I would ask here first before I go through the trouble of upgrading the cluster.

thanks,
Kyle
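To make Question 2 concrete, here is the rough, untested shape of what I have in mind, assuming the Serialization/Serializer/Deserializer interfaces in org.apache.hadoop.io.serializer work in 0.22 the way I understand them: a serialization for MapWritable whose deserializer always builds a fresh instance instead of reusing the one passed in. The class name FreshMapWritableSerialization is made up by me.

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.serializer.Deserializer;
    import org.apache.hadoop.io.serializer.Serialization;
    import org.apache.hadoop.io.serializer.Serializer;

    /** Hypothetical serialization that never reuses a MapWritable across records. */
    public class FreshMapWritableSerialization extends Configured
            implements Serialization<MapWritable> {

        public boolean accept(Class<?> c) {
            return MapWritable.class.isAssignableFrom(c);
        }

        public Serializer<MapWritable> getSerializer(Class<MapWritable> c) {
            return new Serializer<MapWritable>() {
                private DataOutputStream out;
                public void open(OutputStream os) { out = new DataOutputStream(os); }
                public void serialize(MapWritable w) throws IOException { w.write(out); }
                public void close() throws IOException { out.close(); }
            };
        }

        public Deserializer<MapWritable> getDeserializer(Class<MapWritable> c) {
            return new Deserializer<MapWritable>() {
                private DataInputStream in;
                public void open(InputStream is) { in = new DataInputStream(is); }
                public MapWritable deserialize(MapWritable ignored) throws IOException {
                    // Always build a new instance so class-id mappings left over
                    // from the previous record cannot conflict with this one.
                    MapWritable w = new MapWritable();
                    w.readFields(in);
                    return w;
                }
                public void close() throws IOException { in.close(); }
            };
        }
    }

I assume it would then have to be listed ahead of WritableSerialization in io.serializations (along with whatever else the default list contains), something like:

    conf.setStrings("io.serializations",
        FreshMapWritableSerialization.class.getName(),
        "org.apache.hadoop.io.serializer.WritableSerialization");

Is that the right approach, or am I off in the weeds?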