Hi,

Since I am not on the ORC mailing list… and since the ORC Java code is in the Hive APIs… this seems like a good place to start. ;-)
So… we ran into a little problem. One of my developers was writing a map/reduce job to read records from a source and, after some filtering, write the result set to an ORC file. There’s an example of how to do this at:
http://hadoopcraft.blogspot.com/2014/07/generating-orc-files-using-mapreduce.html

So far, so good. But now here’s the problem. Large source data means many mappers, and with the filter, the output is only a fraction of the input in terms of size. So we want to send everything to a single reducer (an identity reducer) so that we get only a single file.

Here’s the snag. We were using the OrcSerde class to serialize the data and generate an ORC row, which we then wrote to the file. Looking at the source code for OrcSerde, OrcSerde.serialize() returns an OrcSerdeRow. See:
http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java

OrcSerdeRow implements Writable, and as we can see in the example code for the map-only case, context.write(Text, Writable) works.

However, if we attempt to make this into a map/reduce job, we run into a problem at run time. context.write() throws the following exception:

"Error: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Writable, received org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow"

The goal was to send the ORC rows through the reducer and then write them out there. I’m curious as to why context.write() fails. The error is a bit cryptic: OrcSerdeRow implements Writable, so the message doesn’t make sense.

Now, the quick fix is to borrow ArrayListWritable from Giraph, put the list of fields into an ArrayListWritable, and pass that to the reducer, which will then use it to generate the ORC file.

I’m still trying to figure out why context.write() fails when sending to a reducer, while it works if it’s a map-side write.

The documentation on the ORC site is… well… to be polite… lacking.
;-) I have some ideas why it doesn’t work; however, I would like to confirm my suspicions.

Thx

-Mike
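
P.S. To make one of my suspicions concrete, here is a minimal, self-contained Java sketch (no Hadoop on the classpath). It assumes the map-side collector compares the value's runtime class to the configured map output value class by identity rather than with instanceof. I believe that is what the framework does, but that is exactly the part I would like confirmed. The Writable interface and the OrcSerdeRowStandIn class below are stand-ins I made up for illustration, not the real Hadoop or Hive types.

```java
import java.io.IOException;

public class TypeMismatchSketch {

    // Stand-in for org.apache.hadoop.io.Writable (methods omitted for brevity).
    public interface Writable {}

    // Hypothetical stand-in for OrcSerde's OrcSerdeRow: it implements Writable.
    public static final class OrcSerdeRowStandIn implements Writable {}

    // ASSUMPTION: the collector checks for an exact class match against the
    // declared map output value class, not an "is a Writable" relationship.
    public static void collect(Class<?> declaredValueClass, Object value)
            throws IOException {
        if (value.getClass() != declaredValueClass) {
            throw new IOException("Type mismatch in value from map: expected "
                    + declaredValueClass.getName() + ", received "
                    + value.getClass().getName());
        }
    }

    // Returns the error message, or null if the value was accepted.
    public static String tryCollect(Class<?> declaredValueClass, Object value) {
        try {
            collect(declaredValueClass, value);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Writable row = new OrcSerdeRowStandIn();

        // The row really is a Writable...
        System.out.println(row instanceof Writable);

        // ...but an exact-class check against the interface still rejects it.
        System.out.println(tryCollect(Writable.class, row));

        // Declaring the concrete class passes this particular check (null).
        System.out.println(tryCollect(OrcSerdeRowStandIn.class, row));
    }
}
```

If that assumption holds, declaring Writable as the map output value class can never match a concrete row object, which would explain the message even though OrcSerdeRow implements Writable. As far as I can tell it would not bite the map-only job, since with zero reducers the value goes straight to the output format and is never collected for the shuffle.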