Benchmark Hadoop and Pig UDFs

2011-04-20 Thread Lai Will
Hi there, I'm planning to do some performance measurements of my Hadoop Pig code in order to see how it scales. Does anyone have suggestions on how to do that? I thought of measuring the time needed for completion on a fixed cluster size while increasing the input data size. Then by fixing the
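
One way the first measurement described above (fixed cluster, growing input) could be scripted is sketched below in Python. This is only a rough sketch under assumptions: the script name `myscript.pig`, the `$input` parameter it is assumed to read, and the HDFS paths are all hypothetical, and the helper names `pig_command` and `time_runs` are made up for illustration. The only Pig feature relied on is the `-param name=value` command-line substitution.

```python
import subprocess
import time


def pig_command(script, input_path):
    """Build a Pig invocation for one input size.

    Assumption: the script loads its data from a parameter named $input,
    which is filled in through Pig's -param substitution.
    """
    return ["pig", "-param", f"input={input_path}", script]


def time_runs(inputs, build_command, run=subprocess.check_call, repeats=3):
    """Time each (label, path) pair and keep the fastest wall-clock run.

    `run` is injectable so the harness can be exercised without a cluster.
    Taking the minimum over several repeats reduces noise from other jobs.
    """
    results = {}
    for label, path in inputs:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(build_command(path))  # blocks until the job finishes
            timings.append(time.perf_counter() - start)
        results[label] = min(timings)
    return results


if __name__ == "__main__":
    # Hypothetical input sets of increasing size, already loaded into HDFS.
    inputs = [
        ("1GB", "/bench/input_1gb"),
        ("2GB", "/bench/input_2gb"),
        ("4GB", "/bench/input_4gb"),
    ]
    timings = time_runs(inputs, lambda p: pig_command("myscript.pig", p))
    for label, seconds in timings.items():
        print(f"{label}\t{seconds:.1f}s")
```

Wall-clock time measured this way includes job submission and scheduling overhead, so very small inputs may mostly measure startup cost rather than the script itself.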

Re: Benchmark Hadoop and Pig UDFs

2011-04-20 Thread Mridul Muralidharan
Not sure what the scope of the experiment is, but some useful comparisons could be against: a) a job using only the mapred API, b) Hadoop streaming, c) Pig streaming. It also depends on the actual script/job being run - if it is using combiners, multiple outputs, 'depth of pipeline', how many

Re: Benchmark Hadoop and Pig UDFs

2011-04-20 Thread Guy Bayes
that? Bash script? Python? How would you set up the benchmark? Best, Will -----Original Message----- From: Mridul Muralidharan [mailto:mrid...@yahoo-inc.com] Sent: Wednesday, April 20, 2011 23:13 To: user@pig.apache.org Cc: Lai Will Subject: Re: Benchmark Haddop and Pig UDFs Not sure