Hi,

I've set up a Hadoop 2.4 cluster with three nodes. The NameNode and ResourceManager 
run on one node, the DataNodes and NodeManagers on the other two. All 
services start up without problems (as far as I can see), and the web apps show 
all nodes as running.

However, I am not able to run MapReduce jobs. Running
yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 1000000
submits the job and it appears in the web app, but its state is stuck in ACCEPTED. 
Instead of making progress, I receive these messages:

14/05/13 12:15:48 INFO mapreduce.Job: Task Id : 
attempt_1399971492349_0004_m_000000_0, Status : FAILED
14/05/13 12:15:48 INFO mapreduce.Job: Task Id : 
attempt_1399971492349_0004_m_000001_0, Status : FAILED


The log shows:

2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: 
job.xml:an attempt to override final parameter: mapreduce.cluster.temp.dir;  
Ignoring.
2014-05-13 12:15:27,896 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
2014-05-13 12:15:28,146 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 
10 second(s).
2014-05-13 12:15:28,146 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system 
started
2014-05-13 12:15:28,185 INFO [main] org.apache.hadoop.mapred.YarnChild: 
Executing with tokens:
2014-05-13 12:15:28,192 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: 
mapreduce.job, Service: job_1399971492349_0004, Ident: 
(org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879)
2014-05-13 12:15:28,453 INFO [main] org.apache.hadoop.mapred.YarnChild: 
Sleeping for 0ms before retrying again. Got null now.
2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client: Address 
change detected. Old: localhost/127.0.1.1:41395 New: localhost/127.0.0.1:41395
2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 0 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 1 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 2 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 3 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 4 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 5 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 6 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 7 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 8 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:41395. Already tried 9 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
MILLISECONDS)
2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : java.net.ConnectException: Call From 
hd-slave-172.ffm.telekom.de/164.26.155.172 to localhost:41395 failed on 
connection exception: java.net.ConnectException: Verbindungsaufbau abgelehnt; 
For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
        at org.apache.hadoop.ipc.Client.call(Client.java:1414)
        at org.apache.hadoop.ipc.Client.call(Client.java:1363)
        at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
        at com.sun.proxy.$Proxy9.getTask(Unknown Source)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136)
Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
        at org.apache.hadoop.ipc.Client.call(Client.java:1381)
        ... 4 more
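
Pauses in a log like the one above can be located by comparing the timestamps of consecutive entries. The following is only a minimal sketch, assuming the "yyyy-MM-dd HH:mm:ss,SSS" timestamp layout seen in these entries; the function name find_gaps and the 30-second threshold are made up for illustration.

```python
import re
from datetime import datetime

# Matches the leading timestamp of a log entry, e.g. "2014-05-13 12:13:56,702".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def find_gaps(lines, min_gap_seconds=30.0):
    """Return (previous, current, seconds) for consecutive entries that are far apart."""
    gaps = []
    prev = None
    for line in lines:
        m = TS.match(line)
        if not m:
            # Wrapped continuation lines and stack frames carry no timestamp.
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        ts = ts.replace(microsecond=int(m.group(2)) * 1000)
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((prev, ts, delta))
        prev = ts
    return gaps


if __name__ == "__main__":
    # First lines of three entries copied from the log above.
    sample = [
        "2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: ",
        "2014-05-13 12:15:27,896 INFO [main] ",
        "2014-05-13 12:15:28,146 INFO [main] ",
    ]
    for prev, cur, secs in find_gaps(sample):
        print(f"{prev} -> {cur}: {secs:.3f}s")
```

This only shows where the time goes missing, not what the process was doing in between.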

I'm not sure about two things:
a) The roughly 90-second gap between 12:13:56 and 12:15:27. I think I'm running into some 
kind of timeout, but I don't know how to find out what the system is doing 
during that time.
b) The localhost:41395 address. Using netstat, I cannot find a daemon listening on that port. I 
suppose this is some kind of local IPC daemon that is also affected by a 
timeout?
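
The log shows localhost being resolved first as 127.0.1.1 and then as 127.0.0.1, so one thing that could be checked on each node is how the names involved resolve and whether anything accepts connections on the port from the log. The following is a minimal sketch using only the Python standard library. The helper names are made up for illustration, it does not query Hadoop itself, and it is not a confirmed diagnosis.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True for any address in 127.0.0.0/8, such as 127.0.0.1 or 127.0.1.1."""
    return ipaddress.ip_address(address).is_loopback


def port_open(host, port, timeout=2.0):
    """Attempt a TCP connect; True if something accepts the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Names a node might use for itself, plus "localhost" from the log.
    for name in ("localhost", socket.gethostname(), socket.getfqdn()):
        try:
            addrs = resolve_ipv4(name)
        except socket.gaierror as exc:
            print(f"{name}: resolution failed ({exc})")
            continue
        for addr in addrs:
            kind = "loopback" if is_loopback(addr) else "non-loopback"
            print(f"{name} -> {addr} ({kind})")
    # Port number taken from the log above; it will likely differ per job.
    print("localhost:41395 open:", port_open("localhost", 41395))
```

Run on each node, this would show whether a node's own hostname maps to a loopback address, which is the kind of detail the "Address change detected" warning hints at.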

Any ideas?

Cheers
Seb.
