Hi, I've set up a Hadoop 2.4 cluster with three nodes. Namenode and Resourcemanager are running on one node, Datanodes and Nodemanagers on the other two. All services are starting up without problems (as far as I can see), web apps show all nodes as running.
However, I am not able to run MapReduce jobs: yarn jar hadoop-mapreduce-examples-2.4.0.jar pi 5 1000000 submits the job, it appears in the web app, but state is stuck in ACCEPTED. Instead I'm receiving messages: 14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_000000_0, Status : FAILED 14/05/13 12:15:48 INFO mapreduce.Job: Task Id : attempt_1399971492349_0004_m_000001_0, Status : FAILED the log shows: 2014-05-13 12:13:56,702 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.cluster.temp.dir; Ignoring. 2014-05-13 12:15:27,896 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2014-05-13 12:15:28,146 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started 2014-05-13 12:15:28,185 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens: 2014-05-13 12:15:28,192 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1399971492349_0004, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@364879) 2014-05-13 12:15:28,453 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now. 2014-05-13 12:15:28,662 WARN [main] org.apache.hadoop.ipc.Client: Address change detected. Old: localhost/127.0.1.1:41395 New: localhost/127.0.0.1:41395 2014-05-13 12:15:29,664 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:30,665 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:31,666 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:32,667 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:33,668 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:34,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:35,669 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:36,670 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:37,671 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:38,672 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:41395. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 2014-05-13 12:15:38,675 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From hd-slave-172.ffm.telekom.de/164.26.155.172 to localhost:41395 failed on connection exception: java.net.ConnectException: Verbindungsaufbau abgelehnt; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1414) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231) at com.sun.proxy.$Proxy9.getTask(Unknown Source) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:136) Caused by: java.net.ConnectException: Verbindungsaufbau abgelehnt at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699) at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462) at org.apache.hadoop.ipc.Client.call(Client.java:1381) ... 4 more Not sure about a) the 90 seconds break between 12:13 - 12:15. I think I'm running into some kind of timeout, but I don't know how to find out what the system is doing during that time. b) the localhost:41395. I cannot find a deamon listening using netstat. I suppose this is some kind of local IPC deamon which is also affected by a timeout? Any ideas? Cheers Seb.