Re: Number of mappers in MRCompiler

2012-08-23 Thread Alan Gates
Sorry for the very slow response, but here it is; hopefully better late than never. On Jul 25, 2012, at 4:28 PM, Prasanth J wrote: …

Re: Number of mappers in MRCompiler

2012-08-23 Thread Prasanth J
I see. Thanks, Alan, for your reply. One more question I posted earlier: I used RandomSampleLoader and specified a sample size of 100, and 110 map tasks were executed. So I expected the total number of samples received at the reducer to be 110 * 100 = 11,000, but it's …
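(The message cuts off, but the arithmetic being checked is total samples ≈ map tasks × samples per task. A sampler capped at k samples per split can only emit min(k, records in that split), so short or empty tail splits pull the reducer-side total below the product. A toy illustration of that accounting; all record counts below are made up, not the thread's data:

import java.util.Arrays;

// Toy accounting for a per-split sampler. Each map task emits at most
// min(samplesPerTask, recordsInItsSplit) tuples, so the reducer can see
// fewer than mapTasks * samplesPerTask. All record counts are invented.
public class SampleCountCheck {
  public static void main(String[] args) {
    int samplesPerTask = 100;
    int[] recordsPerSplit = new int[110];
    Arrays.fill(recordsPerSplit, 5000);
    recordsPerSplit[109] = 37; // a short tail split
    long total = 0;
    for (int records : recordsPerSplit) {
      total += Math.min(samplesPerTask, records);
    }
    System.out.println("samples at reducer: " + total); // 10937, not 11000
  }
}

)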

Re: Number of mappers in MRCompiler

2012-08-23 Thread Dmitriy Ryaboy
I think we decided to instead stub in a special loader that reads a few records from each underlying split, in a single mapper (by using a single wrapping split), right?
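(A minimal sketch of that wrapping-split idea against the Hadoop mapreduce API. The class names and per-split budget are illustrative rather than Pig's actual sampler, and split serialization (Writable) is elided, so this compiles but is not cluster-ready:

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Sketch only: one composite split carries every real split, so Hadoop
// schedules exactly one map task, and that task samples a few records
// from each underlying split. Not Pig's actual sampler.
public class SingleMapperSampleInputFormat extends InputFormat<LongWritable, Text> {

  static final int RECORDS_PER_SPLIT = 10; // hypothetical per-split sample budget

  // Composite split wrapping all real splits; length/locations are stubs.
  static class WrappingSplit extends InputSplit {
    final List<InputSplit> underlying;
    WrappingSplit(List<InputSplit> underlying) { this.underlying = underlying; }
    @Override public long getLength() { return underlying.size(); }
    @Override public String[] getLocations() { return new String[0]; }
  }

  private final TextInputFormat delegate = new TextInputFormat();

  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    // Collapsing all real splits into one wrapping split => one mapper.
    return Collections.<InputSplit>singletonList(
        new WrappingSplit(delegate.getSplits(job)));
  }

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext ctx) {
    return new RecordReader<LongWritable, Text>() {
      private List<InputSplit> splits;
      private TaskAttemptContext context;
      private RecordReader<LongWritable, Text> current;
      private int index = -1;
      private int taken;

      @Override
      public void initialize(InputSplit s, TaskAttemptContext c) {
        splits = ((WrappingSplit) s).underlying;
        context = c;
      }

      @Override
      public boolean nextKeyValue() throws IOException, InterruptedException {
        while (true) {
          // Take up to RECORDS_PER_SPLIT records from the current split.
          if (current != null && taken < RECORDS_PER_SPLIT && current.nextKeyValue()) {
            taken++;
            return true;
          }
          // Budget exhausted (or split drained): move on to the next split.
          if (current != null) current.close();
          if (++index >= splits.size()) return false;
          current = delegate.createRecordReader(splits.get(index), context);
          current.initialize(splits.get(index), context);
          taken = 0;
        }
      }

      @Override public LongWritable getCurrentKey() throws IOException, InterruptedException {
        return current.getCurrentKey();
      }
      @Override public Text getCurrentValue() throws IOException, InterruptedException {
        return current.getCurrentValue();
      }
      @Override public float getProgress() {
        return splits.isEmpty() ? 1f : Math.max(index, 0) / (float) splits.size();
      }
      @Override public void close() throws IOException {
        if (current != null) current.close();
      }
    };
  }
}

The key move is getSplits() returning a singleton list, which is what forces Hadoop to schedule exactly one map task.)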

Re: Number of mappers in MRCompiler

2012-08-23 Thread Prasanth J
Oh yeah, this question is not related to the cube sampling stuff we discussed; I just wanted to know the reason behind it out of curiosity :) Thanks -- Prasanth

Number of mappers in MRCompiler

2012-07-25 Thread Prasanth J
Hello everyone, I would like to know if there is a way to determine the number of mappers while compiling the physical plan to the MR plan. Thanks -- Prasanth

Re: Number of mappers in MRCompiler

2012-07-25 Thread Alan Gates
No. The number of mappers is determined by the InputFormat used by your load function (TextInputFormat if you're using the default PigStorage loader) when the Hadoop job is submitted. Pig doesn't have access to that information until it has handed the jobs off to MapReduce. Alan
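(Since the split computation is what fixes the map-task count, the number can be estimated ahead of a Pig run by invoking the InputFormat's getSplits directly. A minimal sketch using the old mapred API; the input path is hypothetical:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

// Estimate the map-task count by asking the InputFormat for its splits,
// the same computation Hadoop runs at job-submission time.
public class SplitCountEstimate {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    FileInputFormat.setInputPaths(conf, new Path("/user/example/input")); // hypothetical path
    TextInputFormat format = new TextInputFormat();
    format.configure(conf); // TextInputFormat is JobConfigurable
    InputSplit[] splits = format.getSplits(conf, 1); // hint of 1; actual count follows block sizes
    System.out.println("map tasks Hadoop would launch: " + splits.length);
  }
}

)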

Re: Number of mappers in MRCompiler

2012-07-25 Thread Prasanth J
Thanks Alan. My requirement is that I want to load N samples based on the input file size and perform a naive cube computation to determine the large groups that will not fit in a reducer's memory. I need to know the exact number of samples to calculate the partition factor for …
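(A size-driven sample count of the kind described might be derived along these lines; the sampling rate and floor here are assumptions for illustration, not values from the thread:

// Illustrative only: derive a sample count N from the total input size.
// The rate and minimum below are assumptions, not values from the thread.
public class SampleSizeEstimate {
  public static void main(String[] args) {
    long inputBytes = 10L << 30;      // e.g. 10 GB of input (assumed)
    long bytesPerSample = 1L << 20;   // one sample per MB (assumed rate)
    long n = Math.max(1000, inputBytes / bytesPerSample);
    System.out.println("samples to load: " + n); // 10240
  }
}

)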