Re: Hadoop sampler related query!

Rahul Bhattacharjee Tue, 23 Apr 2013 03:43:36 -0700

+ mapred dev


On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <
rahul.rec....@gmail.com> wrote:

> Hi,
>
> I have a question about Hadoop's input sampler, which is used to
> inspect the data set beforehand using random selection, sampling, etc.
> It is mainly used for total sort, and in Pig's skewed join implementation
> as well.
>
> The question here is -
>
> Mapper<K,V,OK,OV>
>
> K and V are the input key and value of the mapper, essentially coming in
> from the input format. OK and OV are the output key and value emitted by
> the mapper.
>
> Looking at the input sampler's code, it appears to create the
> partitions based on the input key of the mapper.
>
> I think the partitions should be created considering the output key (OK)
> and the output key sort comparator should be used for sorting the samples.
>
> If partitioning is done based on the input key and the mapper emits a
> different key, then the total sort would no longer hold.
>
> Is there a condition that the input sampler is only to be used with
> Mapper<K,V,K,V1>?
>
>
> Thanks,
> Rahul
>
>
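
To make the concern concrete, here is a minimal sketch in plain Java. It does not use any Hadoop classes: the class name, the mapKey function (input key plus 1000) and the 0..99 key range are all hypothetical, chosen only for illustration. It computes split points once from the sampled input keys and once from the keys the mapper emits, then range-partitions the emitted keys with each set of split points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation; none of these names come from Hadoop.
final class SamplerKeyMismatch {

    // Hypothetical mapper logic: the emitted key differs from the input key.
    static int mapKey(int inputKey) {
        return inputKey + 1000;
    }

    // Input keys 0..99, standing in for what the input format would supply.
    static List<Integer> inputKeys() {
        List<Integer> keys = new ArrayList<>();
        for (int k = 0; k < 100; k++) {
            keys.add(k);
        }
        return keys;
    }

    // Apply the mapper to every input key.
    static List<Integer> outputKeys(List<Integer> inputs) {
        List<Integer> out = new ArrayList<>();
        for (int k : inputs) {
            out.add(mapKey(k));
        }
        return out;
    }

    // Sort the samples and pick numPartitions - 1 evenly spaced split points.
    static int[] splitPoints(List<Integer> samples, int numPartitions) {
        List<Integer> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return splits;
    }

    // Range partitioning: the index is the number of split points <= key.
    static int partitionFor(int key, int[] splits) {
        int p = 0;
        while (p < splits.length && key >= splits[p]) {
            p++;
        }
        return p;
    }

    // Count how many keys land in each partition.
    static int[] partitionSizes(List<Integer> keys, int[] splits) {
        int[] sizes = new int[splits.length + 1];
        for (int k : keys) {
            sizes[partitionFor(k, splits)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> in = inputKeys();
        List<Integer> out = outputKeys(in);
        int[] fromInput = splitPoints(in, 3);   // split points from input keys
        int[] fromOutput = splitPoints(out, 3); // split points from output keys
        System.out.println(Arrays.toString(partitionSizes(out, fromInput)));  // [0, 0, 100]
        System.out.println(Arrays.toString(partitionSizes(out, fromOutput))); // [33, 33, 34]
    }
}
```

In this toy case the split points taken from the input keys send every emitted key to the last partition, while split points taken from the emitted keys spread them evenly. The sketch assumes input and output keys share a type; when the types differ, comparing emitted keys against input-key split points is not meaningful at all.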
