Hmm. That feels like a join. Can't you read the input file on the map side and output those keys along with the original map output keys? That way the reducer would automatically get both together.
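If you do still want to filter in setup(), the check described in the quoted messages below could look roughly like this. It is only a plain-Java sketch of the arithmetic, not Hadoop code: the class and method names (ReducerKeyFilter, partitionFor, keysForPartition) are made up for illustration, and it assumes the job uses the default HashPartitioner. It also uses String keys, so the hash values are only meaningful if you hash the same key type the mappers emit (for example, Text.hashCode() is not the same as String.hashCode()).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mirrors the HashPartitioner formula
// quoted in this thread so it can be tried without a cluster.
final class ReducerKeyFilter {

    // Same arithmetic as the quoted HashPartitioner.getPartition:
    // clear the sign bit of the hash, then take it modulo the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Keep only the candidate keys whose partition equals this reducer's partition.
    // Assumption: in a real job, myPartition would be read from the
    // "mapred.task.partition" property and numReduceTasks from the job settings.
    static List<String> keysForPartition(List<String> candidateKeys,
                                         int myPartition,
                                         int numReduceTasks) {
        List<String> mine = new ArrayList<>();
        for (String key : candidateKeys) {
            if (partitionFor(key, numReduceTasks) == myPartition) {
                mine.add(key);
            }
        }
        return mine;
    }
}
```

With a custom partitioner this check would have to call that partitioner's getPartition instead, which is one more reason the map-side approach may be simpler.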
On Thu, Mar 28, 2013 at 5:20 PM, Alberto Cordioli <cordioli.albe...@gmail.com> wrote:

> Hi Hemanth,
>
> thanks for your reply.
> Yes, this partially answered my question. I know how the hash
> partitioner works and I guessed something similar.
> The piece that I missed was that mapred.task.partition returns the
> partition number of the reducer.
> So, putting all the pieces together I understand that: for each key in
> the file I have to call the HashPartitioner.
> Then I have to compare the returned index with the one retrieved by
> Configuration.getInt("mapred.task.partition").
> If it is equal then such a key will be served by that reducer. Is this
> correct?
>
>
> To answer your question:
> On the reduce side of a MR job, I want to load some data from a file into
> an in-memory structure. Actually, I don't need to store the whole file
> for each reducer, but only the lines that are related to the keys a
> particular reducer will receive.
> So, my intention is to know the keys in the setup method to store only
> the needed lines.
>
> Thanks,
> Alberto
>
>
> On 28 March 2013 11:01, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
> > Hi,
> >
> > Not sure if I am answering your question, but this is the background. Every
> > MapReduce job has a partitioner associated with it. The default partitioner is
> > a HashPartitioner. You can as a user write your own partitioner as well and
> > plug it into the job. The partitioner is responsible for splitting the map
> > output key space among the reducers.
> >
> > So, the reducer a key will go to is basically the value
> > returned by the partitioner's getPartition method. For example, this is the code
> > in the HashPartitioner:
> >
> >   public int getPartition(K2 key, V2 value,
> >                           int numReduceTasks) {
> >     return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
> >   }
> >
> > mapred.task.partition is the key that defines the partition number of this
> > reducer.
> >
> > I guess you can piece together these bits into what you'd want. However, I
> > am interested in understanding why you want to know this. Can you share
> > some info?
> >
> > Thanks
> > Hemanth
> >
> >
> > On Thu, Mar 28, 2013 at 2:17 PM, Alberto Cordioli
> > <cordioli.albe...@gmail.com> wrote:
> >>
> >> Hi everyone,
> >>
> >> how can I know the keys that are associated with a particular reducer in
> >> the setup method?
> >> Let's assume the setup method reads from a file where each line
> >> is a string that will become a key emitted from mappers.
> >> For each of these lines I would like to know if the string will be a
> >> key associated with the current reducer or not.
> >>
> >> I read something about mapred.task.partition and mapred.task.id, but I
> >> didn't understand the usage.
> >>
> >>
> >> Thanks,
> >> Alberto
> >>
> >>
> >> --
> >> Alberto Cordioli
> >
> >
> >
>
> --
> Alberto Cordioli