While trying to run Mapreduce Jobs on a HA Configured cluster, I saw major performance degradation when the machine with active RM went down and the cluster was operating with only one machine. The following job
[ hadoop-mapreduce-examples-2.6.0.jar pi 2 4 ] which normally takes 20-30 seconds to succeed ran for 220 seconds. I believed that this is probably caused by the value of [ ipc.client.connect.timeout ] which is 20 seconds by default. When I changed this value to 5sec, the run time of the job was reduced to 70-80 seconds but I saw it reaching high values intermittently. I also observed that when trying to connect to the active NN or RM, the recent state of the machine is not taken into consideration even when trying to connect the second time in the same job. Thanks, Kunal.