Not sure what the scope of the experiment is, but some useful comparisons could be against:
a) a job written directly against the mapred API.
b) Hadoop streaming.
c) Pig streaming.

It also depends on the actual script/job being run: whether it uses combiners or
multiple outputs, the 'depth' of the pipeline, how many MapReduce jobs it ends up
compiling into, etc.



If you are interested only in testing how Pig scales, then interesting parameters
to vary could be (a rough sweep-driver sketch follows the list):
a) size of the input.
b) with/without compression.
c) number of mappers.
d) number of reducers.
e) output size (depending on what you are running, I guess).
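
A minimal Python sketch of such a sweep driver. The script name, input paths,
and the INPUT/REDUCERS parameter names are assumptions for illustration, not
anything from this thread; the .pig script would have to read $INPUT and use
$REDUCERS in a PARALLEL clause for the reducer knob to take effect.

    #!/usr/bin/env python
    # Hypothetical sweep driver: times one Pig run per (input, reducers)
    # combination.  Script name, paths, and parameter names are assumed.
    import subprocess
    import time

    PIG_SCRIPT = "job.pig"                   # assumed script name
    INPUTS = ["/data/1gb", "/data/10gb"]     # assumed HDFS input paths
    REDUCERS = [1, 4, 16]                    # reducer counts to try

    def run_pig(input_path, reducers):
        """Run the script once and return wall-clock seconds."""
        cmd = ["pig",
               "-param", "INPUT=%s" % input_path,   # read as $INPUT in the script
               "-param", "REDUCERS=%d" % reducers,  # used in a PARALLEL $REDUCERS clause
               "-f", PIG_SCRIPT]
        start = time.time()
        subprocess.check_call(cmd)
        return time.time() - start

    if __name__ == "__main__":
        for inp in INPUTS:
            for red in REDUCERS:
                elapsed = run_pig(inp, red)
                print("input=%s reducers=%d time=%.1fs" % (inp, red, elapsed))

Wall-clock time around the pig invocation includes job-submission overhead,
which is usually what you want when measuring end-to-end scaling.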


Regards,
Mridul


On Thursday 21 April 2011 01:27 AM, Lai Will wrote:
Hi there,

I'm planning to do some performance measurements of my Hadoop Pig code in order
to see how it scales.
Does anyone have suggestions on how to do that?

I thought of measuring the time needed for completion on a fixed cluster size
while increasing the input data, and then fixing the input data and adding
cluster nodes. Does anyone have experience doing that? I thought of writing a
script that starts/stops a timer and executes the pig command. Maybe there's a
better way?

Best,
Will
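
For the fixed-input, growing-cluster runs described above, the numbers are
usually summarized as speedup and parallel efficiency. A small sketch; the
measured times below are placeholders, not real results:

    # Hypothetical post-processing for the fixed-input, growing-cluster runs.
    # The node counts and times are placeholder values.
    times = {1: 3600.0, 2: 1900.0, 4: 1050.0, 8: 640.0}  # nodes -> seconds

    base = times[1]
    for nodes in sorted(times):
        speedup = base / times[nodes]    # S(n) = T(1) / T(n)
        efficiency = speedup / nodes     # E(n) = S(n) / n
        print("%d nodes: speedup %.2fx, efficiency %.0f%%"
              % (nodes, speedup, efficiency * 100))

Efficiency well below 100% as nodes are added usually points at fixed per-job
overhead or skewed reducers rather than a problem with the Pig script itself.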
