Hadoop Custom InputFormat (SequenceFileInputFormat vs FileInputFormat)

2016-07-15 Thread Travis Chung
I'm working with a single image file that consists of headers and a multitude of different data segment types (each data segment having its own sub-header that contains metadata). Example file layout: | Header | Seg A-1 Sub-Header | Seg A-1 Data | Seg A-2 SubHdr | Seg A-2 Data | Seg B-1 Subhd
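
To make the layout above concrete, here is a rough sketch in plain Java (no Hadoop dependency) that walks such a file and records the byte range of each segment's data. The post does not give the real header formats, so everything specific here is an assumption for illustration: an 8-byte file header, a 5-byte sub-header made of a 1-byte type tag and a 4-byte big-endian data length, and the names `SegmentIndexSketch`, `Segment`, and `index`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class SegmentIndexSketch {
    // Hypothetical sizes: the actual header layout is not stated in the post.
    static final int FILE_HEADER_BYTES = 8;
    static final int SUB_HEADER_BYTES = 5; // 1 type byte + 4-byte big-endian length

    /** One data segment: its type tag and the byte range of its data in the file. */
    static final class Segment {
        final char type;
        final long dataStart;
        final int dataLength;

        Segment(char type, long dataStart, int dataLength) {
            this.type = type;
            this.dataStart = dataStart;
            this.dataLength = dataLength;
        }
    }

    /** Skips the file header, then reads sub-headers one after another. */
    static List<Segment> index(byte[] file) {
        List<Segment> segments = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(file); // ByteBuffer is big-endian by default
        int pos = FILE_HEADER_BYTES;
        while (pos + SUB_HEADER_BYTES <= file.length) {
            char type = (char) buf.get(pos);
            int length = buf.getInt(pos + 1);
            int dataStart = pos + SUB_HEADER_BYTES;
            if (length < 0 || (long) dataStart + length > file.length) {
                throw new IllegalArgumentException("Bad segment length at offset " + pos);
            }
            segments.add(new Segment(type, dataStart, length));
            pos = dataStart + length; // the next sub-header follows the data
        }
        return segments;
    }

    public static void main(String[] args) {
        // Build a tiny sample: header, segment A (3 bytes), segment B (2 bytes).
        ByteBuffer sample = ByteBuffer.allocate(8 + 5 + 3 + 5 + 2);
        sample.put(new byte[8]);
        sample.put((byte) 'A').putInt(3).put(new byte[] {1, 2, 3});
        sample.put((byte) 'B').putInt(2).put(new byte[] {4, 5});

        for (Segment s : index(sample.array())) {
            System.out.println(s.type + " start=" + s.dataStart + " length=" + s.dataLength);
        }
    }
}
```

Each (start, length) pair found this way is the kind of byte range a custom InputFormat could turn into a split, though how the splits should actually be cut depends on the real format.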

FileSplit clarification

2016-08-02 Thread Travis Chung
I wanted to get clarification on the start parameter. If I understand correctly, it's the byte offset from the beginning of the file.

/** Constructs a split with host information
 *
 * @param file the file name
 * @param start the position of the first byte in the file to process
 * @param
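
As a small illustration of that reading of start, the sketch below uses only the Java standard library rather than Hadoop itself. It treats start as a byte offset from the beginning of the file and reads length bytes from there. The class name `SplitRangeSketch`, the helper `readRange`, and the sample file contents are made up for this example and are not part of the Hadoop API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SplitRangeSketch {
    /** Reads the bytes in [start, start + length), clamped to the end of the file. */
    static byte[] readRange(Path file, long start, long length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long available = Math.max(0, raf.length() - start);
            int toRead = (int) Math.min(length, available);
            byte[] buf = new byte[toRead];
            raf.seek(start);    // offset is counted from byte 0 of the file
            raf.readFully(buf); // read exactly toRead bytes
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("split-sketch", ".bin");
        try {
            // 6-byte "header" followed by two 4-byte "segments".
            Files.write(tmp, "HEADERsegAsegB".getBytes(StandardCharsets.US_ASCII));
            byte[] first = readRange(tmp, 6, 4);
            System.out.println(new String(first, StandardCharsets.US_ASCII));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

A record reader handed a split would typically do the same thing: seek to the split's start and stop once it has consumed length bytes.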