Michael! Although there is a little overlap in the communities, I strongly suggest you email u...@orc.apache.org ( https://orc.apache.org/help/ ). I don't know if you have to be subscribed to the mailing list to get replies to your email address.
Ravi

On Wed, Oct 19, 2016 at 11:29 AM, Michael Segel <msegel_had...@hotmail.com> wrote:

> Just to follow up…
>
> This appears to be a bug in the Hive version of the code, fixed in the ORC
> library. NOTE: There are two different libraries.
>
> Documentation is a bit lax, but in terms of design, it's better to do the
> build completely in the reducer, keeping the mapper code cleaner.
>
>
> > On Oct 19, 2016, at 11:00 AM, Michael Segel <msegel_had...@hotmail.com> wrote:
> >
> > Hi,
> > Since I am not on the ORC mailing list, and since the ORC Java code is
> > in the Hive APIs, this seems like a good place to start. ;-)
> >
> > So…
> >
> > Ran into a little problem…
> >
> > One of my developers was writing a map/reduce job to read records from
> > a source and, after some filtering, write the result set to an ORC file.
> > There's an example of how to do this at:
> > http://hadoopcraft.blogspot.com/2014/07/generating-orc-files-using-mapreduce.html
> >
> > So far, so good.
> > But now here's the problem… The source data is large, which means many
> > mappers, and after the filter the output rows are a small fraction of
> > the input. So we want to write through a single reducer (an identity
> > reducer) so that we get only a single file.
> >
> > Here's the snag.
> >
> > We were using the OrcSerde class to serialize the data and generate an
> > ORC row, which we then wrote to the file.
> >
> > Looking at the source code for OrcSerde, OrcSerde.serialize() returns an
> > OrcSerdeRow.
> > See: http://grepcode.com/file/repo1.maven.org/maven2/co.cask.cdap/hive-exec/0.13.0/org/apache/hadoop/hive/ql/io/orc/OrcSerde.java
> >
> > OrcSerdeRow implements Writable, and as we can see in the example code
> > for a map-only job, context.write(Text, Writable) works.
> >
> > However, if we attempt to make this into a map/reduce job, we run into
> > a problem at run time. The context.write() throws the following
> > exception:
> > "Error: java.io.IOException: Type mismatch in value from map: expected
> > org.apache.hadoop.io.Writable, received
> > org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow"
> >
> > The goal was to reduce the ORC rows and then write them out in the
> > reducer.
> >
> > I'm curious as to why the context.write() fails.
> > The error is a bit cryptic, since OrcSerdeRow implements Writable, so
> > the error message doesn't make sense.
> >
> > Now the quick fix is to borrow the ArrayListWritable from Giraph,
> > collect the fields into an ArrayListWritable, and pass that to the
> > reducer, which will then use it to generate the ORC file.
> >
> > Trying to figure out why the context.write() fails when sending to the
> > reducer, while it works if it's a map-side write.
> >
> > The documentation on the ORC site is… well… to be polite… lacking. ;-)
> >
> > I have some ideas why it doesn't work, however I would like to confirm
> > my suspicions.
> >
> > Thx
> >
> > -Mike
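For readers hitting the same exception: the mismatch comes from how the MapReduce framework validates intermediate outputs. In a map-only job, records go straight from the mapper to the OutputFormat's RecordWriter, so no type check is applied. Once a reducer is involved, map outputs pass through the shuffle buffer, and MapTask compares each value's class against the class configured via job.setMapOutputValueClass() with an exact equality test, not instanceof. Setting the map output value class to the Writable interface therefore fails for every concrete implementation, including OrcSerdeRow. The snippet below paraphrases the relevant check; the exact source varies by Hadoop version.

    // Paraphrased from Hadoop's MapTask.MapOutputBuffer.collect():
    // the value's class must *equal* the configured map output value
    // class; merely implementing it (instanceof) is not enough.
    if (value.getClass() != valClass) {
      throw new IOException("Type mismatch in value from map: expected "
          + valClass.getName()
          + ", received " + value.getClass().getName());
    }

Even if that check were bypassed, an OrcSerdeRow could not survive the shuffle: in the Hive 0.13 source linked above, OrcSerdeRow is not a public class, and its write(DataOutput) and readFields(DataInput) methods simply throw rather than serialize, because the ORC RecordWriter unwraps the row object directly. It is designed to travel straight from a map-side or reduce-side write into the ORC writer, never through intermediate serialization.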
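Below is a minimal sketch of the workaround described in the thread, using a plain ArrayWritable subclass in place of Giraph's ArrayListWritable: the mapper ships the filtered fields as a concrete Writable the shuffle can serialize, and the reducer rebuilds the row and calls OrcSerde.serialize(), so the OrcSerdeRow goes directly to the ORC RecordWriter. MyRow, the two-column schema, and the filter are hypothetical placeholders; OrcNewOutputFormat is the new-API output format used in the blog example linked above.

    import java.io.IOException;

    import org.apache.hadoop.hive.ql.io.orc.OrcSerde;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // ArrayWritable has no no-arg constructor, so the shuffle cannot
    // deserialize it as-is; a trivial subclass fixes that. (This is the
    // same problem Giraph's ArrayListWritable solves.)
    public class TextArrayWritable extends ArrayWritable {
      public TextArrayWritable() { super(Text.class); }
      public TextArrayWritable(Writable[] values) { super(Text.class, values); }
    }

    // Hypothetical row class mirroring the ORC schema (two columns).
    class MyRow {
      String name;
      int count;
      MyRow(String name, int count) { this.name = name; this.count = count; }
    }

    class FilterMapper extends Mapper<LongWritable, Text, Text, TextArrayWritable> {
      private static final Text SINGLE_KEY = new Text("all"); // one reducer, one key

      @Override
      protected void map(LongWritable key, Text value, Context ctx)
          throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length != 2) {      // stand-in for the real filter
          return;
        }
        // Ship plain Text fields; ORC serialization happens in the reducer.
        ctx.write(SINGLE_KEY, new TextArrayWritable(
            new Writable[] { new Text(fields[0]), new Text(fields[1]) }));
      }
    }

    class OrcWriterReducer
        extends Reducer<Text, TextArrayWritable, NullWritable, Writable> {
      private final OrcSerde serde = new OrcSerde();
      private ObjectInspector inspector;

      @Override
      protected void setup(Context ctx) {
        inspector = ObjectInspectorFactory.getReflectionObjectInspector(
            MyRow.class, ObjectInspectorFactory.ObjectInspectorOptions.JAVA);
      }

      @Override
      protected void reduce(Text key, Iterable<TextArrayWritable> values, Context ctx)
          throws IOException, InterruptedException {
        for (TextArrayWritable v : values) {
          Writable[] cols = v.get();
          MyRow row = new MyRow(cols[0].toString(),
                                Integer.parseInt(cols[1].toString()));
          // The OrcSerdeRow produced here never crosses a shuffle: it goes
          // straight to the ORC RecordWriter, the same path as a map-side write.
          ctx.write(NullWritable.get(), serde.serialize(row, inspector));
        }
      }
    }

In the driver, setMapOutputValueClass(TextArrayWritable.class), setOutputFormatClass(OrcNewOutputFormat.class), and setNumReduceTasks(1) should then yield the single ORC output file the thread is after.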