Hi, I changed my cluster config so a failed NodeManager can be detected in
about 30 seconds. When I'm running a wordcount, the reduce gets stuck at 25%
for quite a while and the logs show nodes trying to connect to the failed node:
org.apache.hadoop.ipc.Client: Retrying connect to server:
hadoop-telles
Date: Saturday, February 7, 2015 at 8:37 PM
To: "user@hadoop.apache.org"
Subject: Max Connect retries
> From: Telles Nobrega
> Reply-To: "user@hadoop.apache.org"
> Date: Saturday, February 7, 2015 at 8:37 PM
> To: "user@hadoop.apache.org"
> Subject: Max Connect retries
>
> Hi, I changed my cluster config so a failed nodemanager can be detected
> in
>> [...] retries by configuring
>>
>> ipc.client.connect.max.retries.on.timeouts
>>
>> in core-site.xml
>>
>>
>> Thanks
>>
>> Xuan Gong
>>
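For reference, the property named in the reply above would be set in core-site.xml as an ordinary property element. This is a minimal sketch: the value 5 is an assumed, illustrative number (the thread does not give one), and Hadoop's shipped default for this property is 45. With the default connect timeout of 20 seconds (ipc.client.connect.timeout), that default can add up to roughly 15 minutes of retries against a dead node, which would be consistent with the long stall described.

```xml
<!-- core-site.xml: illustrative sketch, the value is an assumption -->
<configuration>
  <property>
    <name>ipc.client.connect.max.retries.on.timeouts</name>
    <!-- Shipped default is 45; 5 is only an example, tune for your cluster -->
    <value>5</value>
  </property>
</configuration>
```

The setting is read by the IPC client, so it most likely needs to be present on the nodes that are doing the retrying, and a lower value trades faster failure detection for less tolerance of brief network hiccups.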
>> From: Telles Nobrega
>> Reply-To: "user@hadoop.apache.org"
>> Date: Saturday, February 7, 2015 at 8:37 PM