Sorry for the very slow response, but here it is, hopefully better late than
never.
On Jul 25, 2012, at 4:28 PM, Prasanth J wrote:
Thanks Alan.
The requirement for me is that I want to load N samples based on the input
file size and perform naive cube computation to determine the ...
I see. Thanks Alan for your reply.
Also, one more question that I posted earlier: I used RandomSampleLoader and
specified a sample size of 100. The number of map tasks executed is 110, so I
am expecting the total number of samples received on the reducer to be
110*100 = 11000, but it's ...
I think we decided to instead stub in a special loader that reads a
few records from each underlying split, in a single mapper (by using a
single wrapping split), right?
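Something along these lines, maybe (very rough and untested; the class names
and the per-split record count are made up, and a real Pig version would sit
behind a LoadFunc rather than being a bare InputFormat):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SamplingWrapInputFormat extends TextInputFormat {

      static final int RECORDS_PER_SPLIT = 100;  // stand-in for the sample size

      // One split that carries every real split of the input.
      public static class WrappingSplit extends InputSplit implements Writable {
        List<FileSplit> wrapped = new ArrayList<FileSplit>();

        public WrappingSplit() {}  // needed for deserialization
        public WrappingSplit(List<InputSplit> real) {
          for (InputSplit s : real) wrapped.add((FileSplit) s);
        }
        @Override public long getLength() throws IOException, InterruptedException {
          long len = 0;
          for (FileSplit s : wrapped) len += s.getLength();
          return len;
        }
        @Override public String[] getLocations() { return new String[0]; }
        @Override public void write(DataOutput out) throws IOException {
          out.writeInt(wrapped.size());
          for (FileSplit s : wrapped) s.write(out);
        }
        @Override public void readFields(DataInput in) throws IOException {
          int n = in.readInt();
          wrapped = new ArrayList<FileSplit>(n);
          for (int i = 0; i < n; i++) {
            FileSplit s = new FileSplit();
            s.readFields(in);
            wrapped.add(s);
          }
        }
      }

      @Override
      public List<InputSplit> getSplits(JobContext job) throws IOException {
        // Compute the real splits, but hand MapReduce a single wrapping split,
        // so it schedules exactly one mapper for the whole input.
        return Collections.<InputSplit>singletonList(
            new WrappingSplit(super.getSplits(job)));
      }

      @Override
      public RecordReader<LongWritable, Text> createRecordReader(
          InputSplit split, TaskAttemptContext ctx) {
        final WrappingSplit ws = (WrappingSplit) split;
        return new RecordReader<LongWritable, Text>() {
          int idx = -1;    // which wrapped split we are reading
          int taken = 0;   // records taken from the current split
          LineRecordReader cur;
          TaskAttemptContext context;

          boolean advance() throws IOException, InterruptedException {
            if (cur != null) cur.close();
            if (++idx >= ws.wrapped.size()) { cur = null; return false; }
            cur = new LineRecordReader();
            cur.initialize(ws.wrapped.get(idx), context);
            taken = 0;
            return true;
          }
          @Override public void initialize(InputSplit s, TaskAttemptContext c)
              throws IOException, InterruptedException {
            context = c;
            advance();
          }
          @Override public boolean nextKeyValue()
              throws IOException, InterruptedException {
            while (cur != null) {
              // Take the first few records of each split, then move on.
              if (taken < RECORDS_PER_SPLIT && cur.nextKeyValue()) {
                taken++;
                return true;
              }
              if (!advance()) return false;
            }
            return false;
          }
          @Override public LongWritable getCurrentKey() { return cur.getCurrentKey(); }
          @Override public Text getCurrentValue() { return cur.getCurrentValue(); }
          @Override public float getProgress() {
            return ws.wrapped.isEmpty() ? 1f : idx / (float) ws.wrapped.size();
          }
          @Override public void close() throws IOException {
            if (cur != null) cur.close();
          }
        };
      }
    }

The point is that getSplits() still computes the real splits but hands
MapReduce exactly one, so the whole sample comes out of a single map task.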
On Thu, Aug 23, 2012 at 7:55 PM, Prasanth J buckeye.prasa...@gmail.com wrote:
I see. Thanks Alan for your reply. Also one ...
Oh yeah, this question is not related to the cube sampling stuff that we
discussed; I just wanted to know the reason behind it, out of curiosity :)
Thanks
-- Prasanth
On Aug 23, 2012, at 11:20 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
I think we decided to instead stub in a special ...
Hello everyone,
I would like to know if there is a way to determine the number of mappers
while compiling the physical plan to the MR plan.
Thanks
-- Prasanth
No. The number of mappers is determined by the InputFormat used by your load
function (TextInputFormat if you're using the default PigStorage loader) when
the Hadoop job is submitted. Pig doesn't have access to that information until
it has handed the job off to MapReduce.
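If a rough estimate before submission is enough, you can ask the InputFormat
for its splits yourself, the same way MapReduce does at submit time. An
untested sketch (the input path here is made up):

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class EstimateMappers {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());
        // Made-up input path, just for illustration.
        FileInputFormat.addInputPath(job, new Path("/user/prasanth/input"));
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        // MapReduce launches one map task per split.
        System.out.println("expected number of mappers: " + splits.size());
      }
    }

Each split becomes one map task, so the size of that list is the mapper count
you would see when the job actually runs.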
Alan.
On Jul 25, 2012, at ...
Thanks Alan.
The requirement for me is that I want to load N samples based on the input
file size and perform naive cube computation to determine the large groups
that will not fit in the reducer's memory. I need to know the exact number of
samples to calculate the partition factor for ...