Re: All datanodes are bad IOException when trying to implement multithreading serialization

Sonal Goyal Sun, 29 Sep 2013 17:07:10 -0700

Wouldn't you rather just change your split size so that you can have more 
mappers work on your input? What else are you doing in the mappers?
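
As a rough sketch of why a smaller split size gives you more mappers: as far as I know, FileInputFormat picks the split size as max(minSize, min(maxSize, blockSize)), and in the new API the cap can be set with FileInputFormat.setMaxInputSplitSize(job, bytes). The plain-Java example below only mirrors that arithmetic. The class name, the 1 GB file, and the 64 MB block size are made-up values for illustration, and the split count is an estimate (Hadoop lets the last split run slightly over, so the real count can be one lower).

```java
public class SplitSizeSketch {

    // Mirrors the split-size rule: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Estimated number of splits (and so mappers) for one file: ceiling division.
    static long estimateSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 1024 * mb; // hypothetical 1 GB input file
        long blockSize = 64 * mb;    // hypothetical HDFS block size

        // No cap on the split size: one split per block.
        long defaultSplit = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        // Cap the split size at 16 MB: four splits per block.
        long cappedSplit = computeSplitSize(blockSize, 1, 16 * mb);

        System.out.println(estimateSplits(fileLength, defaultSplit)); // prints 16
        System.out.println(estimateSplits(fileLength, cappedSplit));  // prints 64
    }
}
```

With the cap in place the same input is spread over four times as many map tasks, with no changes to LineRecordReader or Mapper.run().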


On Sep 30, 2013, at 2:22 AM, yunming zhang <zhangyunming1...@gmail.com> wrote:

> Hi, 
> 
> I was playing with Hadoop code trying to have a single Mapper support reading 
> an input split using multiple threads. I am getting an "All datanodes are bad" 
> IOException, and I am not sure what the issue is. 
> 
> The reason for this work is that I suspect my computation is slow because it 
> takes too long to create the Text() objects from the input split using a single 
> thread. I tried modifying LineRecordReader (since I am mostly using 
> TextInputFormat) to provide additional methods for retrieving lines from the 
> input split: getCurrentKey2(), getCurrentValue2(), and nextKeyValue2(). I created 
> a second FSDataInputStream and a second LineReader object for 
> getCurrentKey2() and getCurrentValue2() to read from. Essentially, I am trying to 
> open the input split twice with different start points (one at the very 
> beginning, the other in the middle of the split) so that two threads can read 
> the split in parallel. 
> 
> In the org.apache.hadoop.mapreduce.Mapper.run() method, I changed the code to 
> read simultaneously through getCurrentKey() and getCurrentKey2() on Thread 1 
> and Thread 2 (both threads running at the same time):
>       Thread 1:
>         while (context.nextKeyValue()) {
>             map(context.getCurrentKey(), context.getCurrentValue(), context);
>         }
> 
>       Thread 2:
>         while (context.nextKeyValue2()) {
>             map(context.getCurrentKey2(), context.getCurrentValue2(), context);
>             //System.out.println("two iter");
>         }
> 
> However, this causes the "All datanodes are bad" exception. I think I 
> made sure that I closed the second file. I have attached a copy of my 
> LineRecordReader file to show what I changed to enable two 
> simultaneous reads of the input split. 
> 
> I have modified other files (org.apache.hadoop.mapreduce.RecordReader.java, 
> mapred.MapTask.java ....) just to enable Mapper.run to call 
> LineRecordReader.getCurrentKey2() and the other access methods for the second 
> file. 
> 
> 
> I would really appreciate it if anyone could give me a bit of advice or just 
> point me in a direction as to where the problem might be. 
> 
> Thanks
> 
> Yunming 
> 
> <LineRecordReader.java>
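
The two-start-point read described in the quoted message can be sketched outside Hadoop as follows. This is plain Java on a local file, not HDFS and not the attached LineRecordReader, so it says nothing about the cause of the exception. It only shows one way two independent readers can divide a byte range without dropping or repeating a line: a reader that starts mid-file discards the line it lands in, and each reader keeps going while the next line starts at or before its end offset. The class and method names are hypothetical, and RandomAccessFile.readLine() is unbuffered and only suitable for ASCII-style text.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoReaderSketch {

    // Returns the lines whose first byte lies in (start, end], plus the line
    // at offset 0 when start == 0. Each call opens its own file handle.
    static List<String> readRange(Path file, long start, long end) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(start);
            if (start != 0) {
                in.readLine(); // discard the line that the previous range owns
            }
            while (in.getFilePointer() <= end) {
                String line = in.readLine();
                if (line == null) {
                    break; // end of file
                }
                lines.add(line);
            }
        }
        return lines;
    }

    // Reads the first half and the second half of the file on two threads.
    static List<String> readWithTwoThreads(Path file) throws Exception {
        long length = Files.size(file);
        long mid = length / 2;
        if (mid == 0) {
            return readRange(file, 0, length); // too small to divide
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<String>> first = pool.submit(() -> readRange(file, 0, mid));
            Future<List<String>> second = pool.submit(() -> readRange(file, mid, length));
            List<String> all = new ArrayList<>(first.get());
            all.addAll(second.get());
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("split", ".txt");
        Files.write(file, List.of("alpha", "beta", "gamma", "delta", "epsilon"));
        System.out.println(readWithTwoThreads(file)); // [alpha, beta, gamma, delta, epsilon]
        Files.delete(file);
    }
}
```

Each thread here collects into its own list and the results are merged afterwards, so nothing is shared while reading. In the Mapper.run() change above, both threads call map() on the same context, which is a separate concern this sketch does not cover.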
