Re: Custom file formats

2010-05-03 Thread William Kinney
Not sure if anything else exists, but you can easily implement your own RecordReader that gets an FSDataInputStream from the FileSystem for the FileSplit and then reads records from it as you would from any other InputStream (with offset, length, byte[], etc.). On Thu, Apr 29, 2010 at 5:36 AM, Pete
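
A rough sketch of the reading logic described above, using only java.io so it runs without a Hadoop installation. Everything here is an assumption for illustration: the class name FixedLengthRecordReader is hypothetical, records are assumed to have a fixed length, and splits are assumed to start on a record boundary. In a real job the stream would presumably be the FSDataInputStream opened for the FileSplit's path, and the same logic would sit inside a RecordReader subclass. As with TextInputFormat, the key is the byte position of the record in the file.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper: reads fixed-length binary records from one split of a
 * stream and exposes the byte offset of each record as its key.
 */
class FixedLengthRecordReader {
    private final DataInputStream in;
    private final int recordLength;
    private final long end;        // first byte offset past this split
    private long pos;              // offset of the next record to read
    private long currentKey = -1;
    private byte[] currentValue;

    FixedLengthRecordReader(InputStream raw, long start, long length,
                            int recordLength) throws IOException {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        this.in = new DataInputStream(raw);
        this.recordLength = recordLength;
        this.end = start + length;

        // Advance to the start of the split. With a seekable stream such as
        // FSDataInputStream, a single seek(start) would replace this loop.
        long remaining = start;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may legally return 0, so fall back to reading a byte.
                if (in.read() < 0) {
                    throw new EOFException("split starts past end of stream");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        this.pos = start;
    }

    /** Reads the next record; returns false once the split is exhausted. */
    boolean nextKeyValue() throws IOException {
        if (pos + recordLength > end) {
            return false;
        }
        byte[] buf = new byte[recordLength];
        try {
            in.readFully(buf, 0, recordLength);
        } catch (EOFException e) {
            return false;          // trailing partial record is ignored
        }
        currentKey = pos;
        currentValue = buf;
        pos += recordLength;
        return true;
    }

    long getCurrentKey() {
        return currentKey;
    }

    byte[] getCurrentValue() {
        return currentValue;
    }
}
```

A caller would loop on nextKeyValue() and read getCurrentKey() and getCurrentValue() after each successful call. Variable-length records would need a length prefix or a sync marker so that a split can find its first record, which this sketch does not attempt.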

Custom file formats

2010-04-29 Thread Pete Hunt
Hello all - I am currently trying to integrate NumPy, the Python fast-array package, with Hadoop. I am basically looking for a way to read binary data similar to a SequenceFile, except without keys. That is, similar to how the TextInputFormat emits the position in the file as the key, I would like