Hadoop streaming inserts tabs into mapper output

2012-10-18 Thread Jason Wang
With Hadoop streaming and no reducer, I would expect the output written to HDFS to be the exact STDOUT from the mapper. I noticed that tab characters (0x9) are getting inserted before every newline character (0xa). This is problematic for me because the output of my mapper is binary data which I
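
For illustration only, the symptom described above could be undone after the fact with a small post-processing filter. This is a minimal sketch, not a fix for the underlying cause: it assumes that exactly one tab (0x9) was inserted before every newline (0xa), as the post reports, and the function name strip_inserted_tabs is hypothetical.

```python
import sys


def strip_inserted_tabs(data: bytes) -> bytes:
    """Remove one tab (0x09) that sits immediately before each newline (0x0a).

    Assumption: the job inserted exactly one tab before every newline.
    A tab that was already part of the data, as in b"\t\n", shows up as
    b"\t\t\n" in the job output, so dropping one tab restores it.
    """
    return data.replace(b"\t\n", b"\n")


if __name__ == "__main__":
    # Work on raw bytes so that binary payloads are not decoded or altered.
    sys.stdout.buffer.write(strip_inserted_tabs(sys.stdin.buffer.read()))
```

The filter reads and writes bytes rather than text because the mapper output in question is binary. Newlines that are not preceded by a tab are left untouched.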

Re: hadoop streaming with custom RecordReader class

2012-10-18 Thread Jason Wang
> > mkdir mypackage > > mv mypackage/ > > jar cvf NLineRecordReader.jar mypackage > > [Use this jar] > > > > On Thu, Oct 18, 2012 at 10:54 AM, Jason Wang wrote: > >> 1. I did try using NLineInputFormat, but this causes the > >> "stream.map.input.

Re: hadoop streaming with custom RecordReader class

2012-10-17 Thread Jason Wang
nto the front-end too? > > $ export HADOOP_CLASSPATH=/path/to/your/jar > $ command… > > 3. Does jar -tf carry a proper mypackage.NLineRecordReader? > > 4. Is your class marked public? > > On Thu, Oct 18, 2012 at 9:32 AM, Jason Wang wrote: > > Hi all, > > I'm

hadoop streaming with custom RecordReader class

2012-10-17 Thread Jason Wang
Hi all, I'm experimenting with Hadoop streaming on build 1.0.3. To give background info, I'm streaming a text file into a mapper written in C. Using the default settings, streaming uses TextInputFormat, which creates one record from each line. The problem I am having is that I need record boundarie