Hi there,

I'm planning to do some performance measurements of my Hadoop Pig code in
order to see how it scales. Does anyone have suggestions on how to do that?

I thought of measuring the time needed for completion on a fixed cluster
size while increasing the input data, and then fixing the input data while
increasing the cluster size.
> Not sure what the scope of the experiment is, but some useful
> comparisons could be against:
>
> a) a job using only the mapred API
> b) Hadoop streaming
> c) Pig streaming
>
> It also depends on the actual script/job being run - if it is using
> combiners, multiple outputs, 'depth of pipeline', how many [...]
How would you measure that? With a bash script? Python? How would you set
up the benchmark?
Best,
Will
-Original Message-
From: Mridul Muralidharan [mailto:mrid...@yahoo-inc.com]
Sent: Wednesday, 20 April 2011 23:13
To: user@pig.apache.org
Cc: Lai Will
Subject: Re: Benchmark Hadoop and Pig UDFs
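Not part of the original thread, but the measurement Will describes (time a
fixed job while growing the input, then fix the input and grow the cluster)
could be driven by a small harness like the sketch below. The `pig -param`
invocation, the script name `script.pig`, and the `/bench/input_*` HDFS
paths are placeholder assumptions, not anything specified in the thread:

```python
import subprocess
import time

def time_command(cmd):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start

def speedup(t_baseline, t_scaled):
    """Speedup of a scaled run relative to the baseline run."""
    return t_baseline / t_scaled

if __name__ == "__main__":
    # Fixed cluster, growing input: one timed run per pre-generated
    # data set. Script name, parameter name, and paths are hypothetical.
    for size in ["1GB", "10GB", "100GB"]:
        cmd = ["pig", "-param", f"input=/bench/input_{size}", "script.pig"]
        print(size, f"{time_command(cmd):.1f}s")
```

For the second series (fixed input, growing cluster), the same timings can
be summarized as `speedup(t_small_cluster, t_large_cluster)` at each
cluster size; repeating each run a few times and taking the median helps
damp run-to-run variance on a shared cluster.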