Hi Users,

I have a 12-node CDH3 cluster on which I am planning to run some benchmark tests. My main intention is to run the benchmarks first with the default Hadoop configuration, then analyze the results and tune the Hadoop configuration accordingly to improve the performance of my cluster.
Can someone suggest which Hadoop metrics are the most important to observe during benchmarking? Also, I have seen that the ratio of "Avg Map Tasks" to "Avg Reduce Tasks" Execution Time is recorded for various benchmarks. How significant is that ratio for judging cluster performance, and how would it help me analyze and tune the Hadoop cluster for better performance?

So far I have run the following benchmarks without tuning the cluster (with the default Hadoop configuration):

- Sort
- WordCount
- TeraSort
- TestDFSIO

Please suggest which other benchmarks I should run, especially those in "hadoop-test.jar" in the $HADOOP_HOME directory, and what each of those jobs is used for.

Thanks,
Gaurav Dasgupta