Hello, I wrote a very simple InputFormat and RecordReader to send binary data to mappers. The binary data can contain anything (including \n, \t, and \r); here is what next() may actually send:
public class MyRecordReader
        implements RecordReader<BytesWritable, BytesWritable> {
    ...
    public boolean next(BytesWritable key, BytesWritable ignore)
            throws IOException {
        ...
        byte[] result = new byte[8];
        for (int i = 0; i < result.length; ++i)
            result[i] = (byte) (i + 1);
        result[3] = (byte) '\n';
        result[4] = (byte) '\n';
        key.set(result, 0, result.length);
        return true;
    }
}

As you can see, I am using BytesWritable to send eight bytes: 01 02 03 0a 0a 06 07 08. I also use HADOOP-1722 typed bytes (by setting -D stream.map.input=typedbytes). According to the typed bytes documentation, the mapper should receive the following byte sequence:

00 00 00 08 01 02 03 0a 0a 06 07 08

However, the bytes are somehow modified and I get the following sequence instead:

00 00 00 08 01 02 03 09 0a 09 0a 06 07 08

(0a = '\n', 09 = '\t'.) It seems that Hadoop (streaming?) treats the newline character as a record separator and inserts '\t', which I assume is the key/value separator for streaming. Is there any workaround to send *exactly* the same byte sequence, no matter what characters it contains?

Thanks in advance.

Best regards,
Youssef Hatem
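P.S. For anyone who wants to reproduce the input outside Hadoop: the key construction from next() can be checked in isolation with a standalone sketch (the class name KeyBytesDemo is hypothetical, and there is no Hadoop dependency). It rebuilds the same eight bytes and prints them in hex, so the payload before any streaming-layer escaping is unambiguous:

```java
import java.util.Arrays;

public class KeyBytesDemo {
    // Rebuilds the 8-byte key exactly as next() does above.
    static byte[] buildKey() {
        byte[] result = new byte[8];
        for (int i = 0; i < result.length; ++i) {
            result[i] = (byte) (i + 1);  // 01 02 03 04 05 06 07 08
        }
        result[3] = (byte) '\n';         // overwrite 4th byte with 0x0a
        result[4] = (byte) '\n';         // overwrite 5th byte with 0x0a
        return result;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : buildKey()) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim()); // 01 02 03 0a 0a 06 07 08
    }
}
```

Diffing this payload against the hex dump of what the mapper actually reads makes it easy to see exactly which bytes the streaming layer rewrote.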