Hmm. That feels like a join. Can't you read the input file on the map side
and output those keys along with the original map output keys? That way
the reducer would automatically get both together.
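Here is a minimal plain-Java simulation of that idea (a reduce-side join). It assumes the lookup file is added as a second map input and that both kinds of mapper emit the same key. The class name, the `FILE`/`DATA` tags and the sample records are hypothetical. No Hadoop classes are used: `shuffle` only imitates the framework by hashing each key with the HashPartitioner formula and grouping by key. The point is that a lookup line and the records that share its key always reach the same reducer call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop dependency) of a reduce-side join:
// lookup-file lines and regular records are emitted under the same key,
// tagged by origin, and the simulated shuffle groups them together.
public class TaggedJoinSketch {

    // One map output record: the join key plus a tag saying where it came from.
    static final class Tagged {
        final String key;
        final String tag;     // hypothetical tags: "FILE" or "DATA"
        final String payload;

        Tagged(String key, String tag, String payload) {
            this.key = key;
            this.tag = tag;
            this.payload = payload;
        }
    }

    // Same formula as the HashPartitioner quoted later in the thread.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Simulated shuffle: reducer partition -> key -> all records with that key.
    static Map<Integer, Map<String, List<Tagged>>> shuffle(List<Tagged> mapOutput, int numReduceTasks) {
        Map<Integer, Map<String, List<Tagged>>> byReducer = new TreeMap<>();
        for (Tagged t : mapOutput) {
            byReducer
                .computeIfAbsent(partitionFor(t.key, numReduceTasks), p -> new TreeMap<>())
                .computeIfAbsent(t.key, k -> new ArrayList<>())
                .add(t);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<Tagged> mapOutput = List.of(
            new Tagged("k1", "FILE", "lookup line for k1"),
            new Tagged("k1", "DATA", "record A"),
            new Tagged("k2", "DATA", "record B"),
            new Tagged("k2", "FILE", "lookup line for k2"));
        shuffle(mapOutput, 3).forEach((reducer, groups) ->
            groups.forEach((key, recs) -> {
                StringBuilder sb = new StringBuilder();
                for (Tagged t : recs) {
                    sb.append(' ').append(t.tag).append('=').append(t.payload);
                }
                System.out.println("reducer " + reducer + " key " + key + ":" + sb);
            }));
    }
}
```

In a real job, the reducer would check each value's tag to tell lookup lines apart from regular records, so nothing needs to be preloaded in `setup`.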


On Thu, Mar 28, 2013 at 5:20 PM, Alberto Cordioli <
cordioli.albe...@gmail.com> wrote:

> Hi Hemanth,
>
> thanks for your reply.
> Yes, this partially answers my question. I know how the hash
> partitioner works and I guessed something similar.
> The piece I was missing was that mapred.task.partition returns the
> partition number of the reducer.
> So, putting all the pieces together, I understand that for each key in
> the file I have to call the HashPartitioner,
> then compare the returned index with the one retrieved by
> Configuration.getInt("mapred.task.partition", -1).
> If they are equal, that key will be served by this reducer. Is this
> correct?
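Here is a minimal sketch of that check. It is written in plain Java so it runs without Hadoop on the classpath. `ReducerKeyFilter`, `partitionFor` and `keysForPartition` are hypothetical names. Inside a real `Reducer.setup`, `myPartition` would come from `context.getConfiguration().getInt("mapred.task.partition", -1)` and `numReduceTasks` from `context.getNumReduceTasks()`.

The key assumption is that each key is hashed exactly as the mapper emits it. If the map output key is a `Text`, hash `new Text(line)`, because `Text.hashCode()` is not the same as `String.hashCode()`. If the job sets a custom partitioner, call that one instead.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of "which lines of the file will this reducer see?".
public class ReducerKeyFilter {

    // Mirrors HashPartitioner.getPartition: mask off the sign bit, then take the modulus.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Keep only the lines whose key this reducer (myPartition) will receive.
    // ASSUMPTION: each line is used as-is as the key. In a real job the key
    // must be built as the mapper's output key type (e.g. new Text(line)),
    // because Text.hashCode() is not the same as String.hashCode().
    static Set<String> keysForPartition(List<String> lines, int myPartition, int numReduceTasks) {
        Set<String> mine = new LinkedHashSet<>();
        for (String line : lines) {
            if (partitionFor(line, numReduceTasks) == myPartition) {
                mine.add(line);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("apple", "banana", "cherry", "date", "elderberry");
        int numReduceTasks = 3;
        for (int p = 0; p < numReduceTasks; p++) {
            System.out.println("reducer " + p + " -> " + keysForPartition(lines, p, numReduceTasks));
        }
    }
}
```

Every reducer applies the same deterministic function. As a result, the sets returned for partitions `0..numReduceTasks-1` are disjoint and together cover every line of the file.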
>
>
> To answer your question:
> on the reduce side of an MR job, I want to load some data from a file
> into an in-memory structure. I don't need to store the whole file
> in each reducer, only the lines related to the keys that particular
> reducer will receive.
> So, I want to know the keys in the setup method so that I can store only
> the needed lines.
>
> Thanks,
> Alberto
>
>
> On 28 March 2013 11:01, Hemanth Yamijala <yhema...@thoughtworks.com>
> wrote:
> > Hi,
> >
> > Not sure if I am answering your question, but this is the background.
> > Every MapReduce job has a partitioner associated with it. The default
> > partitioner is a HashPartitioner. You can, as a user, write your own
> > partitioner as well and plug it into the job. The partitioner is
> > responsible for splitting the map output key space among the reducers.
> >
> > So, to know which reducer a key will go to, look at the value
> > returned by the partitioner's getPartition method. For example, this is
> > the code in the HashPartitioner:
> >
> >   public int getPartition(K2 key, V2 value,
> >                           int numReduceTasks) {
> >     return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
> >   }
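This small plain-Java illustration shows why the quoted code masks the sign bit with `& Integer.MAX_VALUE` instead of applying `%` directly. `hashCode()` can be negative, and Java's `%` keeps the sign of the dividend, which would produce an invalid negative partition. `Math.abs` is not a safe substitute either, because `Math.abs(Integer.MIN_VALUE)` is still negative. `PartitionMaskDemo` and its methods are hypothetical names used only for this demonstration.

```java
// Why the HashPartitioner formula masks the sign bit before taking the modulus.
public class PartitionMaskDemo {

    // The formula from the HashPartitioner code quoted above, applied to a raw hash code.
    static int getPartition(int hashCode, int numReduceTasks) {
        return (hashCode & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Naive version: Java's % keeps the sign of the dividend, so this can be negative.
    static int naivePartition(int hashCode, int numReduceTasks) {
        return hashCode % numReduceTasks;
    }

    // Math.abs is not a fix: Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE.
    static int absPartition(int hashCode, int numReduceTasks) {
        return Math.abs(hashCode) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(naivePartition(-7, 4));              // -3
        System.out.println(getPartition(-7, 4));                // 1
        System.out.println(absPartition(Integer.MIN_VALUE, 3)); // -2
        System.out.println(getPartition(Integer.MIN_VALUE, 3)); // 0
    }
}
```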
> >
> > mapred.task.partition is the configuration property that holds the
> > partition number of this reducer.
> >
> > I guess you can piece these bits together into what you want. However,
> > I am interested in understanding why you want to know this. Can you
> > share some info?
> >
> > Thanks
> > Hemanth
> >
> >
> > On Thu, Mar 28, 2013 at 2:17 PM, Alberto Cordioli
> > <cordioli.albe...@gmail.com> wrote:
> >>
> >> Hi everyone,
> >>
> >> How can I know, in the setup method, which keys are associated with a
> >> particular reducer?
> >> Suppose that in the setup method I read from a file where each line
> >> is a string that will become a key emitted by the mappers.
> >> For each of these lines I would like to know whether the string will be
> >> a key associated with the current reducer or not.
> >>
> >> I read something about mapred.task.partition and mapred.task.id, but I
> >> didn't understand how to use them.
> >>
> >>
> >> Thanks,
> >> Alberto
> >>
> >>
> >> --
> >> Alberto Cordioli
> >
> >
>
>
>
> --
> Alberto Cordioli
>
