Re: Data node not able to contact the resource manager

Daniel Santos Mon, 05 Aug 2019 16:00:41 -0700

Hello

I found the cause of the error. When I submit a job to the cluster, I 
supply an XML configuration file with the properties of the cluster I am 
connecting to. I had to replicate some of the YARN address properties in 
that configuration file.


I thought that the cluster configuration would be sufficient, but it was not.
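
The exact properties are not listed in this message. As an illustrative sketch only, assuming they mirror the resource manager address entries of the yarn-site.xml quoted further down in this thread (same hostname and ports), the client-side file supplied at job submission might contain something like this:

```xml
<configuration>
        <!-- Illustrative assumption: values copied from the cluster's
             yarn-site.xml quoted later in this thread. The original
             message does not say which properties were replicated. -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hadoopresourcemanager</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>hadoopresourcemanager:8032</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>hadoopresourcemanager:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>hadoopresourcemanager:8031</value>
        </property>
</configuration>
```

In Hadoop 2.x, yarn.resourcemanager.hostname defaults to 0.0.0.0 and yarn.resourcemanager.scheduler.address defaults to ${yarn.resourcemanager.hostname}:8030. The retries against 0.0.0.0:8030 in the log are therefore consistent with the job configuration not carrying these values, although that reading is an inference rather than something confirmed in the thread.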

Thanks for your interest
Regards


> On 5 Aug 2019, at 19:21, Jon Mack <[email protected]> wrote:
> 
> It doesn't look like the client is resolving the IP address correctly (i.e. 
> 0.0.0.0/0.0.0.0:8030). Try an nslookup on one of the clients (e.g. 
> nslookup hadoopresourcemanager) to see what the client is resolving it to. 
> Change the configuration to use the IP address instead of the hostname if 
> possible.
> 
> Also do a netstat -an | grep 8030 on hadoopresourcemanager to verify the 
> resource manager service is running.
> 
> 
> On Mon, Aug 5, 2019 at 12:38 PM Daniel Santos <[email protected]> wrote:
> Hello,
> I am using hosts files on all machines, centrally managed through Puppet. 
> When I run the YARN startup script on the hadoopresourcemanager machine, it 
> starts the node managers, one on each slave.
> 
> Regards
> 
> Sent from my iPhone
> 
> On 5 Aug 2019, at 16:01, Jeff Hubbs <[email protected]> wrote:
> 
>> Does "hadoopresourcemanager" resolve to a machine that's a Hadoop resource 
>> manager? In Hadoop, it's absolutely vital that all names resolve correctly 
>> in both directions.
>> 
>> On 8/5/19 10:55 AM, Daniel Santos wrote:
>>> Hello Jon,
>>> 
>>> I have the following yarn-site.xml:
>>> 
>>> <configuration>
>>>         <!-- Site specific YARN configuration properties -->
>>>         <property>
>>>                 <name>yarn.acl.enable</name>
>>>                 <value>0</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.resourcemanager.hostname</name>
>>>                 <value>hadoopresourcemanager</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.nodemanager.aux-services</name>
>>>                 <value>mapreduce_shuffle</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.nodemanager.resource.memory-mb</name>
>>>                 <value>1536</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.scheduler.maximum-allocation-mb</name>
>>>                 <value>1536</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.scheduler.minimum-allocation-mb</name>
>>>                 <value>128</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.nodemanager.vmem-check-enabled</name>
>>>                 <value>false</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.resourcemanager.address</name>
>>>                 <value>hadoopresourcemanager:8032</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.resourcemanager.scheduler.address</name>
>>>                 <value>hadoopresourcemanager:8030</value>
>>>         </property>
>>>         <property>
>>>                 <name>yarn.resourcemanager.resource-tracker.address</name>
>>>                 <value>hadoopresourcemanager:8031</value>
>>>         </property>
>>> </configuration>
>>> 
>>> So I can say that I have already tried your suggestion.
>>> 
>>> Cheers
>>> 
>>>> On 5 Aug 2019, at 15:22, Jon Mack <[email protected]> wrote:
>>>> 
>>>> Based on the port it's trying to connect to, it looks to me like the 
>>>> resource manager configuration is missing.
>>>> 
>>>> On Mon, Aug 5, 2019 at 9:15 AM Daniel Santos <[email protected]> wrote:
>>>> Hello,
>>>> 
>>>> I have a cluster with one machine holding the name nodes (primary and 
>>>> secondary), a YARN node (resource manager), and four data nodes.
>>>> I am running Hadoop 2.7.0.
>>>> 
>>>> When I submit a job to the cluster, I can see it on the scheduler web 
>>>> page. If I go to the container page and check the logs, the end of the 
>>>> syslog file shows the following:
>>>> 
>>>> 2019-08-05 14:58:05,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry 
>>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
>>>> sleepTime=1000 MILLISECONDS)
>>>> 2019-08-05 14:58:06,962 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry 
>>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
>>>> sleepTime=1000 MILLISECONDS)
>>>> 2019-08-05 14:58:07,963 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry 
>>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
>>>> sleepTime=1000 MILLISECONDS)
>>>> 2019-08-05 14:58:08,965 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 5 time(s); retry 
>>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
>>>> sleepTime=1000 MILLISECONDS)
>>>> 2019-08-05 14:58:09,966 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 6 time(s); retry 
>>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
>>>> sleepTime=1000 MILLISECONDS)
>>>> 2019-08-05 14:58:10,967 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 7 time(s); retry 
>>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
>>>> sleepTime=1000 MILLISECONDS)
>>>> 2019-08-05 14:58:11,968 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 8 time(s); retry 
>>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
>>>> sleepTime=1000 MILLISECONDS)
>>>> 2019-08-05 14:58:12,969 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
>>>> connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 9 time(s); retry 
>>>> policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
>>>> sleepTime=1000 MILLISECONDS)
>>>> 
>>>> I have checked the configuration of the resource manager and of the data 
>>>> node where the application is running, and the property 
>>>> yarn.resourcemanager.hostname that I set in yarn-site.xml is shown there.
>>>> I have disabled IPv6 on the YARN machine, as some posts on the internet 
>>>> suggested. All the configuration files are the same on every node of the 
>>>> cluster.
>>>> 
>>>> Still, I am getting these errors, and the application ends with a timeout.
>>>> 
>>>> What am I doing wrong?
>>>> 
>>>> Thanks
>>>> Regards
>>> 
>>
