Re: Mapping one key per Map Task

2011-05-23 Thread Moustafa Gaber
I think you don't need to split your input file so that each map is assigned one key. Your goal is load balancing. Each of your map tasks will initiate a new MR sub-job. This sub-job will be assigned a new master/workers, which means the map task of the sub-job may be scheduled to wor

Re: Mapping one key per Map Task

2011-05-23 Thread Vincent Xue
Thanks for the suggestions!

On Mon, May 23, 2011 at 5:50 PM, Harsh J wrote:
> Vincent,
>
> You _might_ lose locality by splitting beyond the block splits, and
> the tasks although better 'parallelized', may only end up performing
> worse. A good way to instead increase task #s is to go the block

Re: Mapping one key per Map Task

2011-05-23 Thread Harsh J
Vincent, You _might_ lose locality by splitting beyond the block splits, and the tasks, although better 'parallelized', may end up performing worse. A good way to increase the number of tasks instead is to go the block size way (a lower block size gives more splits, at the cost of a little extra NameNode space).
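To make the block-size trade-off concrete, here is a rough sketch of the arithmetic only. It assumes one map task per block-aligned split and ignores the other details of Hadoop's real split computation (minimum/maximum split size settings, handling of the final partial block). The class and method names are hypothetical and are not part of Hadoop.

```java
public class SplitCountSketch {
    // Approximate number of block-aligned splits: ceil(fileBytes / blockBytes).
    // Assumption for illustration: one split (and so one map task) per block.
    static long numSplits(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("fileBytes must be >= 0 and blockBytes > 0");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileBytes = 1024 * mb; // a hypothetical 1 GB input file

        System.out.println(numSplits(fileBytes, 64 * mb)); // prints 16
        System.out.println(numSplits(fileBytes, 16 * mb)); // prints 64
    }
}
```

Under these assumptions, cutting the block size to a quarter gives four times as many splits, and the NameNode has four times as many blocks to track for that file.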

Re: Mapping one key per Map Task

2011-05-23 Thread Jason
Look at NLineInputFormat

Sent from my iPhone

On May 23, 2011, at 2:09 AM, Vincent Xue wrote:
> Hello Hadoop Users,
>
> I would like to know if anyone has ever tried splitting an input
> sequence file by key instead of by size. I know that this is unusual
> for the map reduce paradigm but I am

Re: Mapping one key per Map Task

2011-05-23 Thread Joey Echeverria
Look at getSplits() of SequenceFileInputFormat.

-Joey

On May 23, 2011 5:09 AM, "Vincent Xue" wrote:
> Hello Hadoop Users,
>
> I would like to know if anyone has ever tried splitting an input
> sequence file by key instead of by size. I know that this is unusual
> for the map reduce paradigm

Mapping one key per Map Task

2011-05-23 Thread Vincent Xue
Hello Hadoop Users, I would like to know if anyone has ever tried splitting an input sequence file by key instead of by size. I know that this is unusual for the MapReduce paradigm, but I am in a situation where I need to perform some large tasks on each key pair in a load-balanced fashion.
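To illustrate the difference between the two splitting strategies, here is a small plain-Java sketch of the byte-range arithmetic only. It is not Hadoop code: the Range class and both methods are hypothetical stand-ins, and a real implementation would most likely need a custom InputFormat and would have to respect the sequence file's sync markers.

```java
import java.util.ArrayList;
import java.util.List;

public class PerKeySplitSketch {
    /** A byte range of the input; a hypothetical stand-in for an input split. */
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Splitting by key: one range per record, each starting where the previous one ended.
    static List<Range> splitPerRecord(long[] recordBytes) {
        List<Range> ranges = new ArrayList<>();
        long offset = 0;
        for (long len : recordBytes) {
            ranges.add(new Range(offset, len));
            offset += len;
        }
        return ranges;
    }

    // Splitting by size: fixed-size ranges, with a shorter final range if needed.
    static List<Range> splitBySize(long totalBytes, long splitBytes) {
        if (splitBytes <= 0) {
            throw new IllegalArgumentException("splitBytes must be > 0");
        }
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < totalBytes; start += splitBytes) {
            ranges.add(new Range(start, Math.min(splitBytes, totalBytes - start)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        long[] recordBytes = {10, 200, 30}; // made-up record sizes
        for (Range r : splitPerRecord(recordBytes)) {
            System.out.println("per-key split: start=" + r.start + " length=" + r.length);
        }
        for (Range r : splitBySize(240, 100)) {
            System.out.println("size split:    start=" + r.start + " length=" + r.length);
        }
    }
}
```

With per-key ranges the number of map tasks equals the number of records regardless of how large each record is, which is the load-balancing behaviour asked about; size-based ranges can put several small records into one task.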