Not sure if anything else exists, but you can easily implement your own
RecordReader that gets an FSDataInputStream from the FileSystem for the
FileSplit and then reads records from it like you would from any other
InputStream (with offset, length, byte[], etc.).
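To make that concrete, here is a rough sketch of just the reading loop, written against a plain java.io.InputStream so it runs without Hadoop on the classpath. It assumes fixed-length records, which your message does not actually say, and the class name FixedLengthRecordSketch and its methods are made up for illustration. The byte offset of each record is used as the key, the same way TextInputFormat does it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads fixed-length binary records keyed by byte offset. */
final class FixedLengthRecordSketch {
    private final DataInputStream in;
    private final int recordLength;
    private final long end;      // first byte past this split
    private final byte[] value;  // buffer reused for every record
    private long pos;            // file offset of the next record
    private long key = -1;

    /** The stream must already be positioned at byte offset start of the file. */
    FixedLengthRecordSketch(InputStream stream, long start, long length,
                            int recordLength) throws IOException {
        this.in = new DataInputStream(stream);
        this.recordLength = recordLength;
        this.end = start + length;
        this.value = new byte[recordLength];
        // Round start up to the next record boundary; a record that straddles
        // the start of this split belongs to the previous split.
        this.pos = ((start + recordLength - 1) / recordLength) * recordLength;
        long toSkip = pos - start;
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                break; // hit end of stream; next() will then return false
            }
            toSkip -= skipped;
        }
    }

    /** Reads the next record; returns false once the split is exhausted. */
    boolean next() throws IOException {
        if (pos >= end) {
            return false;
        }
        try {
            // A record that starts inside the split is read whole, even if it
            // runs past the end of the split.
            in.readFully(value, 0, recordLength);
        } catch (EOFException e) {
            return false; // a trailing partial record is dropped
        }
        key = pos;
        pos += recordLength;
        return true;
    }

    long getKey() { return key; }

    byte[] getValue() { return value; }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        FixedLengthRecordSketch reader = new FixedLengthRecordSketch(
                new ByteArrayInputStream(data), 0, data.length, 4);
        while (reader.next()) {
            System.out.println(reader.getKey() + " -> first byte " + reader.getValue()[0]);
        }
    }
}
```

In a real RecordReader the stream would come from FileSystem.open(split.getPath()), and you would call seek(split.getStart()) on the returned FSDataInputStream before reading. If your records are variable-length, you would need some other way to find record boundaries inside a split.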
On Thu, Apr 29, 2010 at 5:36 AM, Pete
Hello all -
I am currently trying to integrate the numpy Python fast-arrays package with
Hadoop. I am basically looking for a way to read binary data similar to a
SequenceFile, except without keys. That is, similar to how the
TextInputFormat emits the position in the file as the key, I would like