Hi Francesco,

By default, TextInputFormat splits the input on '\n' and emits one record per line, with the byte offset of the line as the key and the line contents as the value. In your case the file is binary and is just a sequence of integers, and you need the offset of each integer rather than the offset of each line. I believe you may have to write your own custom RecordReader (plus an InputFormat that returns it) to get this done.
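For illustration, here is a minimal sketch of the logic such a RecordReader would contain. It is written in plain Java (no Hadoop dependencies) so it can be tried locally, and it assumes the file holds 4-byte big-endian ints as written by DataOutput.writeInt. The class name BinaryIntSplitReader is hypothetical. In Hadoop you would move the same code into initialize()/nextKeyValue() of a RecordReader<LongWritable, IntWritable> returned by a FileInputFormat subclass, taking the start and length from the FileSplit and using FSDataInputStream.seek()/readInt() instead of RandomAccessFile.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical plain-Java core of a record reader for a binary file of
 * 4-byte big-endian ints. Key = byte offset of each int, value = the int.
 */
class BinaryIntSplitReader {
    static final int RECORD_SIZE = 4; // assumption: fixed-width 32-bit ints

    private final RandomAccessFile file;
    private final long end;   // records starting at or after this offset belong to the next split
    private long pos;         // byte offset of the next record to read
    private long currentKey;
    private int currentValue;

    public BinaryIntSplitReader(RandomAccessFile file, long splitStart, long splitLength)
            throws IOException {
        this.file = file;
        // Round the split start up to a record boundary: a record that straddles
        // the boundary starts in the previous split, which has already emitted it.
        this.pos = ((splitStart + RECORD_SIZE - 1) / RECORD_SIZE) * RECORD_SIZE;
        this.end = splitStart + splitLength;
        file.seek(pos);
    }

    /** Advances to the next (offset, int) pair; returns false when the split is exhausted. */
    public boolean nextKeyValue() throws IOException {
        if (pos >= end) {
            return false;
        }
        try {
            // May read a few bytes past 'end'; the record still belongs to this split.
            currentValue = file.readInt();
        } catch (EOFException e) {
            return false; // trailing partial record (length not a multiple of 4) is ignored
        }
        currentKey = pos;
        pos += RECORD_SIZE;
        return true;
    }

    public long getCurrentKey() { return currentKey; }

    public int getCurrentValue() { return currentValue; }
}
```

Rounding the split start up to a multiple of 4 means an integer that crosses a split boundary is read only by the split in which it starts, similar in spirit to how LineRecordReader handles lines that cross split boundaries. If you want the index of the integer rather than its byte offset as the key, divide the key by 4.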
On Mon, Sep 3, 2012 at 8:38 PM, Francesco Silvestri <yuri....@gmail.com> wrote:
> Hi Mohammad,
>
> SequenceFileInputFormat <http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html>
> requires the file to be a sequence of key/value stored in binary (i.e., the key is
> stored in the file). In my case, the key is implicitly given by the
> position of the value within the file.
>
> Thank you,
> Francesco
>
> On Mon, Sep 3, 2012 at 5:01 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>> Hello Francesco,
>>
>> Have a look at SequenceFileInputFormat:
>> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html
>>
>> Regards,
>> Mohammad Tariq
>>
>> On Mon, Sep 3, 2012 at 8:26 PM, Francesco Silvestri <yuri....@gmail.com> wrote:
>>> Hello,
>>>
>>> I have a binary file of integers and I would like an input format that
>>> generates pairs <key,value>, where value is an integer in the file and key
>>> the position of the integer in the file. Which class should I use? (i.e.
>>> I'm looking for a kind of TextInputFormat for binary files)
>>>
>>> Thank you for your consideration,
>>>
>>> Francesco