Mighty users@hadoop, does anyone have thoughts on this?
On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <rahul.rec....@gmail.com> wrote:

> Hi,
>
> I have a question about Hadoop's input sampler, which is used to
> investigate the data set beforehand using random selection, sampling,
> etc. It is mainly used for total sort, and in Pig's skewed join
> implementation as well.
>
> The question is this:
>
> Mapper<K,V,OK,OV>
>
> K and V are the input key and value of the mapper, essentially coming in
> from the input format. OK and OV are the output key and value emitted
> from the mapper.
>
> Looking at the input sampler's code, it looks like it creates the
> partitions based on the input key of the mapper.
>
> I think the partitions should be created from the output key (OK),
> and the output key sort comparator should be used for sorting the samples.
>
> If partitioning is done based on the input key and the mapper emits a
> different key, then the total sort wouldn't hold.
>
> Is there a condition that the input sampler is only to be used with
> Mapper<K,V,K,V1>?
>
> Thanks,
> Rahul
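To make the concern in the quoted question concrete, here is a minimal sketch in plain Java. It does not use any Hadoop classes. The class name SamplerSketch, the helper methods, and the mapper that emits offset % 10 are all hypothetical and exist only for illustration; the split-point and partition logic is a simplified stand-in for what a sampler and a range partitioner might do, not a copy of Hadoop's code.

```java
import java.util.Arrays;

public class SamplerSketch {
    // Hypothetical mapper: the input key is a record offset,
    // the output key (OK) is offset % 10, so OK differs from K.
    static int mapOutputKey(int inputKey) {
        return inputKey % 10;
    }

    // Sort the sampled keys and pick numPartitions - 1 evenly spaced cut points.
    static int[] splitPoints(int[] sampledKeys, int numPartitions) {
        int[] sorted = sampledKeys.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return splits;
    }

    // Partition index = number of split points that are <= key.
    static int partition(int key, int[] splits) {
        int p = 0;
        while (p < splits.length && key >= splits[p]) {
            p++;
        }
        return p;
    }

    // Count how many mapper output keys land in each partition.
    static int[] histogram(int[] outputKeys, int[] splits) {
        int[] counts = new int[splits.length + 1];
        for (int key : outputKeys) {
            counts[partition(key, splits)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] inputKeys = new int[n];
        int[] outputKeys = new int[n];
        for (int i = 0; i < n; i++) {
            inputKeys[i] = i;
            outputKeys[i] = mapOutputKey(i);
        }
        // Split points derived from the mapper's input keys (K).
        System.out.println(Arrays.toString(histogram(outputKeys, splitPoints(inputKeys, 4))));
        // Split points derived from the mapper's output keys (OK).
        System.out.println(Arrays.toString(histogram(outputKeys, splitPoints(outputKeys, 4))));
    }
}
```

In this toy setup, split points taken from the input keys send all 1000 output keys to the first partition, while split points taken from the output keys spread them as 200, 300, 200, 300 across four partitions. That only illustrates the mismatch when K and OK differ; it says nothing about what Hadoop's sampler actually requires.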