Hi,

Consider a single-iteration k-means clustering job in which I only wish to schedule reduce jobs for one clusterId (the key in k-means): that of the cluster to which the 1st point in the dataset is assigned. In other words, I wish to check the clusterId of the first point in the input file and get reduce jobs only for that specific clusterId.
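To make the intent concrete, here is a rough sketch in plain Java. It does not use any Hadoop classes, and every name in it (`SingleClusterFilter`, `nearestCluster`, `pointsForFirstCluster`, `reduceToCentroid`) is hypothetical. It assumes the current centroids are available up front and that all points fit in memory, which a real distributed job would not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; no Hadoop API is used and all names are made up.
class SingleClusterFilter {

    // "Map" step for one point: index of the nearest centroid (squared Euclidean distance).
    static int nearestCluster(double[] point, double[][] centroids) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - centroids[c][i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Keep only the points whose clusterId equals the clusterId of the first point.
    static List<double[]> pointsForFirstCluster(double[][] points, double[][] centroids) {
        int target = nearestCluster(points[0], centroids);
        List<double[]> kept = new ArrayList<>();
        for (double[] p : points) {
            if (nearestCluster(p, centroids) == target) {
                kept.add(p);
            }
        }
        return kept;
    }

    // "Reduce" step for that single key: the new centroid is the mean of the kept points.
    static double[] reduceToCentroid(List<double[]> pts) {
        double[] mean = new double[pts.get(0).length];
        for (double[] p : pts) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += p[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= pts.size();
        }
        return mean;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {10, 10}};
        double[][] points = {{9, 9}, {1, 1}, {11, 11}, {0, 2}};
        List<double[]> kept = pointsForFirstCluster(points, centroids);
        double[] c = reduceToCentroid(kept);
        System.out.println(kept.size() + " points, centroid = (" + c[0] + ", " + c[1] + ")");
    }
}
```

In this single-process sketch the target clusterId is known before any other point is examined. In a real job the first point is seen by only one map task, so the other tasks cannot know the target key while they run.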
I think we shall have to wait for all mappers to end.

Thanks,
Aseem

On Fri, Sep 14, 2012 at 4:43 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:

> Hi,
>
> When do you know the keys to ignore ? You mentioned "after the map stage"
> .. is this at the end of each map task, or at the end of all map tasks ?
>
> Thanks
> hemanth
>
>
> On Fri, Sep 14, 2012 at 4:36 PM, Aseem Anand <aseem.ii...@gmail.com> wrote:
>
>> Hi,
>> Is there anyway I can ignore all keys except a certain key ( determined
>> after the map stage) to start only 1 reduce job using a partitioner? If so
>> could someone suggest such a method.
>>
>> Regards,
>> Aseem