I am actually trying to reduce the number of mappers because my application uses a lot of memory (on the order of 1-2 GB of RAM per mapper). I want to be able to use a few mappers but still maintain good CPU utilization through multithreading within a single mapper. MultithreadedMapper doesn't work for me because it duplicates the in-memory data structures.
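(For what it's worth: MultithreadedMapper duplicates state because it creates a separate Mapper instance per thread, so any large structure held in an instance field is loaded once per thread. One workaround is to hold the structure in a lazily-initialized static shared by all threads in the JVM. A minimal standalone sketch, plain Java, no Hadoop classes; the names SharedLookup and load() are hypothetical:)

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch: a large read-only lookup table loaded once per JVM and shared by
// all mapper threads, instead of being duplicated per Mapper instance.
class SharedLookup {
    // volatile + double-checked locking so the table is built exactly once
    private static volatile Map<String, Integer> table;

    static Map<String, Integer> get() {
        if (table == null) {                       // first check, no lock
            synchronized (SharedLookup.class) {
                if (table == null) {               // second check, under lock
                    table = Collections.unmodifiableMap(load());
                }
            }
        }
        return table;
    }

    // Stand-in for loading the 1-2 GB structure (e.g. from distributed cache).
    private static Map<String, Integer> load() {
        Map<String, Integer> m = new HashMap<>();
        m.put("example", 42);
        return m;
    }
}
```

Every thread then calls SharedLookup.get() and sees the same object, so memory cost is paid once per JVM rather than once per map thread.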
Thanks

Yunming

On Sun, Sep 29, 2013 at 6:59 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> Wouldn't you rather just change your split size so that you can have more
> mappers work on your input? What else are you doing in the mappers?
>
> Sent from my iPad
>
> On Sep 30, 2013, at 2:22 AM, yunming zhang <zhangyunming1...@gmail.com>
> wrote:
>
> Hi,
>
> I was playing with the Hadoop code, trying to have a single Mapper read an
> input split using multiple threads. I am getting an "All datanodes are bad"
> IOException, and I am not sure what the issue is.
>
> The reason for this work is that I suspect my computation was slow because
> it takes too long to create the Text() objects from the input split using
> a single thread. I modified LineRecordReader (since I am mostly using
> TextInputFormat) to provide additional methods to retrieve lines from the
> input split: getCurrentKey2(), getCurrentValue2(), nextKeyValue2(). I
> created a second FSDataInputStream and a second LineReader object for
> getCurrentKey2() and getCurrentValue2() to read from. Essentially I am
> trying to open the input split twice with different start points (one at
> the very beginning, the other in the middle of the split) so that two
> threads can read the input split in parallel.
>
> In the org.apache.hadoop.mapreduce.Mapper.run() method, I modified it to
> read simultaneously using getCurrentKey() and getCurrentKey2() with Thread
> 1 and Thread 2 (both threads running at the same time):
>
> Thread 1:
>     while (context.nextKeyValue()) {
>         map(context.getCurrentKey(), context.getCurrentValue(), context);
>     }
>
> Thread 2:
>     while (context.nextKeyValue2()) {
>         map(context.getCurrentKey2(), context.getCurrentValue2(), context);
>         //System.out.println("two iter");
>     }
>
> However, this causes the "All datanodes are bad" exception. I think I made
> sure that I closed the second file.
> I have attached a copy of my LineRecordReader file to show what I changed
> to enable two simultaneous reads of the input split.
>
> I have modified other files (org.apache.hadoop.mapreduce.RecordReader.java,
> mapred.MapTask.java, ....) just to enable Mapper.run() to call
> LineRecordReader.getCurrentKey2() and the other access methods for the
> second file.
>
> I would really appreciate it if anyone could give me a bit of advice, or
> just point me in a direction as to where the problem might be.
>
> Thanks
>
> Yunming
>
> <LineRecordReader.java>
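An alternative to opening the split twice is to keep a single reader thread on the input stream and fan the records out to worker threads through a bounded queue, so only one thread ever touches HDFS. A standalone sketch in plain Java, assuming an in-memory list stands in for the single LineRecordReader loop (class and method names here are hypothetical, not Hadoop API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: one reader thread feeds N worker threads through a bounded queue.
class QueuedSplitReader {
    private static final String POISON = "\u0000EOF";  // end-of-input marker

    static List<String> process(List<String> lines, int workers)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();

        // Workers: each takes lines until it sees its poison pill.
        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    for (String line = queue.take(); !line.equals(POISON);
                            line = queue.take()) {
                        results.add(line.toUpperCase());  // stand-in for map()
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[i].start();
        }

        // Single reader: the only thread consuming the input.
        for (String line : lines) {
            queue.put(line);
        }
        for (int i = 0; i < workers; i++) {
            queue.put(POISON);   // one pill per worker
        }
        for (Thread t : pool) {
            t.join();
        }
        return new ArrayList<>(results);
    }
}
```

In a real mapper the reader loop would wrap the existing nextKeyValue()/getCurrentValue() calls, and the workers would do the expensive per-record computation; record order across workers is not preserved, which is fine when map() calls are independent.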